Lucene Index Dumper
LuceneAnalyzer is a quick hack for dumping and inspecting a Lucene index. It prints plain text, so it is something for the 'sort-uniq-cut-awk' guys out there. :-)
- release 0.0.4 (for Lucene 3.1)
- release 0.0.3 (for Lucene 2.x)
Show global statistics of the index:
shell> ./luceneanalyzer -g /dir_to_some_lucene_index
Global Information:
===================
number of documents: 17
total number of features: 955
total number of tokens: 1442
version: 1328361447856
still current: true
maximal document number: 17
has deletions: false
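Since the global statistics are plain 'key: value' lines, they are easy to post-process. Here is a minimal Python sketch, assuming the output looks exactly like the sample above. The name parse_global_stats and the SAMPLE string are made up for illustration and are not part of LuceneAnalyzer.

```python
def parse_global_stats(text):
    """Turn the 'key: value' lines of the -g output into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            # Skips the "Global Information:" header and the "====" rule.
            continue
        key = key.strip()
        if value.isdigit():
            stats[key] = int(value)
        elif value in ("true", "false"):
            stats[key] = (value == "true")
        else:
            stats[key] = value
    return stats


# Sample lines copied from the output shown above.
SAMPLE = """\
Global Information:
===================
number of documents: 17
total number of tokens: 1442
has deletions: false
"""

stats = parse_global_stats(SAMPLE)
avg_tokens = stats["total number of tokens"] / stats["number of documents"]
print("average tokens per document: %.1f" % avg_tokens)
```

For a real index, the text would come from the output of ./luceneanalyzer -g (for example read from sys.stdin) instead of SAMPLE.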
Show field information:
shell> ./luceneanalyzer -f /dir_to_some_lucene_index
Field Information:
==================
Fields of type 'ALL':
store_0_coordinate
text
...
Fields of type 'INDEXED_WITH_TERMVECTOR':
includes
Fields of type 'TERMVECTOR':
Fields of type 'TERMVECTOR_WITH_OFFSET':
Fields of type 'TERMVECTOR_WITH_POSITION':
Fields of type 'TERMVECTOR_WITH_POSITION_OFFSET':
includes
Fields of type 'UNINDEXED':
store
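The field listing can be grouped by type with a few lines of Python. This is only a sketch. It assumes each group starts with a "Fields of type '...':" line followed by one field name per line, as in the sample above. parse_field_info is a hypothetical helper, not something LuceneAnalyzer ships.

```python
import re

# Matches the group header lines, e.g.: Fields of type 'ALL':
HEADER = re.compile(r"^Fields of type '([^']+)':\s*$")


def parse_field_info(text):
    """Map each field type to the list of field names printed under it."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # A new group starts; an empty group keeps an empty list.
            current = groups.setdefault(match.group(1), [])
        elif current is not None:
            current.append(line)
        # Lines before the first header (title and "====" rule) are ignored.
    return groups


# Shortened sample based on the output shown above.
SAMPLE = """\
Field Information:
==================
Fields of type 'ALL':
store_0_coordinate
text
Fields of type 'TERMVECTOR':
Fields of type 'UNINDEXED':
store
"""

fields = parse_field_info(SAMPLE)
for field_type, names in fields.items():
    print(field_type, len(names))
```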
Show information about terms, statistics and positions:
shell> ./luceneanalyzer -t -vv /dir_to_some_lucene_index
Terms:
======
cat camera 12[0]
cat connector 3[0],4[0]
cat copier 11[0]
cat electronics 1[0],2[0],3[0],4[0],5[0],6[0],7[0],8[0],9[0],10[0],11[0],12[0],15[0],16[0]
...
text using 13[415]
text utf 14[3]
text v 8[2]
text va902b 9[1]
text valueselect 7[1]
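The term dump can be turned into per-term document counts. The Python sketch below rests on assumptions: each entry sits on its own line as "field term doc[n],doc[n],...", and the number before the brackets is the document number. What the bracketed value means is not spelled out here (a position seems likely), so the code keeps it as an uninterpreted string. parse_term_line and document_frequency are illustrative names only.

```python
import re

# One posting looks like 13[415]: a document number, then a bracketed value.
POSTING = re.compile(r"(\d+)\[([^\]]*)\]")


def parse_term_line(line):
    """Split 'field term postings' into (field, term, [(doc, bracket_text)])."""
    pieces = line.strip().rsplit(None, 1)
    if len(pieces) != 2:
        return None  # header, rule, "..." or empty line
    head, postings_text = pieces
    name = head.split(None, 1)
    if len(name) != 2:
        return None
    postings = [(int(doc), inner) for doc, inner in POSTING.findall(postings_text)]
    if not postings:
        return None
    return name[0], name[1], postings


def document_frequency(lines):
    """Count the distinct documents listed for every (field, term) pair."""
    df = {}
    for line in lines:
        parsed = parse_term_line(line)
        if parsed is None:
            continue
        field, term, postings = parsed
        df[(field, term)] = len({doc for doc, _ in postings})
    return df


# Sample lines taken from the output shown above.
SAMPLE = [
    "Terms:",
    "======",
    "cat camera 12[0]",
    "cat connector 3[0],4[0]",
    "text utf 14[3]",
]

df = document_frequency(SAMPLE)
for (field, term), count in sorted(df.items()):
    print(field, term, count)
```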
A Git repository is accessible at git://gitorious.org/luceneanalyzer/luceneanalyzer.git (or at http://gitorious.org/luceneanalyzer)
In case of questions, contact me at <abaumann at yahoo dot com>.