Lucene Index Dumper

LuceneAnalyzer is a quick hack for dumping and inspecting a Lucene index from the command line. Its plain-text output is meant for the ‘sort-uniq-cut-awk’ guys out there. :-)

Show global statistics of the index:

shell> ./luceneanalyzer -g /dir_to_some_lucene_index

Global Information:
===================
        number of documents: 17
        total number of features: 955
        total number of tokens: 1442
        version: 1328361447856
        still current: true
        maximal document number: 17
        has deletions: false
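
Because the statistics are plain "name: value" lines, they are easy to post-process. The following is a minimal sketch, assuming the output layout shown above; the sample lines are embedded in a here-document so the snippet runs on its own, and the function name stats is only illustrative. In real use the pipeline would read from ./luceneanalyzer -g instead.

```shell
#!/bin/sh
# Excerpt of the global statistics shown above (assumed layout: "name: value").
stats() {
cat <<'EOF'
        number of documents: 17
        total number of features: 955
        total number of tokens: 1442
EOF
}

# Split each line at the colon and derive the average number of tokens
# per document from the two counters.
avg_tokens=$(stats | awk -F': *' '
    /number of documents/    { docs = $2 }
    /total number of tokens/ { tokens = $2 }
    END { if (docs > 0) printf "%.1f", tokens / docs }')

echo "average tokens per document: $avg_tokens"
```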

Show field information:

shell> ./luceneanalyzer -f /dir_to_some_lucene_index

Field Information:
==================
Fields of type 'ALL':
        store_0_coordinate
        text
...
Fields of type 'INDEXED_WITH_TERMVECTOR':
        includes
Fields of type 'TERMVECTOR':
Fields of type 'TERMVECTOR_WITH_OFFSET':
Fields of type 'TERMVECTOR_WITH_POSITION':
Fields of type 'TERMVECTOR_WITH_POSITION_OFFSET':
        includes
Fields of type 'UNINDEXED':
        store

Show information about terms, statistics and positions:

shell> ./luceneanalyzer -t -vv /dir_to_some_lucene_index

Terms:
======
cat     camera  12[0]
cat     connector       3[0],4[0]
cat     copier  11[0]
cat     electronics     1[0],2[0],3[0],4[0],5[0],6[0],7[0],8[0],9[0],10[0],11[0],12[0],15[0],16[0]
...
text    using   13[415]
text    utf     14[3]
text    v       8[2]
text    va902b  9[1]
text    valueselect     7[1]
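
The term dump lends itself to the usual text tools. The following is a minimal sketch, assuming each line holds a field name, a term, and a comma-separated list of postings, separated by whitespace as in the sample above. The lines are an excerpt of that sample, embedded in a here-document so the snippet runs on its own; the function name terms is only illustrative. In real use the pipeline would read from ./luceneanalyzer -t -vv instead.

```shell
#!/bin/sh
# Excerpt of the term dump shown above (assumed columns: field, term, postings).
terms() {
cat <<'EOF'
cat     camera  12[0]
cat     connector       3[0],4[0]
cat     copier  11[0]
text    utf     14[3]
text    v       8[2]
EOF
}

# Number of terms per field: take the first column, then count duplicates.
terms_per_field=$(terms | awk '{ print $1 }' | sort | uniq -c | awk '{ print $2 "=" $1 }')
echo "$terms_per_field"

# Number of postings per term: count the comma-separated entries in column 3.
docs_per_term=$(terms | awk '{ n = split($3, p, ","); print $1, $2, n }')
echo "$docs_per_term"
```

For the excerpt above, the first pipeline reports three terms for the field cat and two for text, and the second reports two postings for the term connector.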

The Git repository is available at git://git.andreasbaumann.cc/LuceneAnalyzer.git (or browse it at http://git.andreasbaumann.cc/cgit/LuceneAnalyzer/).

In case of questions, contact me via email.