Lupy is a full-text indexer for Python. It is a port of Jakarta Lucene to Python, and reads, writes, and searches indexes in Lucene's binary format. Like Lucene, it is sophisticated, scalable, and Unicode-aware.
| Tags | Text Processing, Indexing |
|---|---|
| Licenses | LGPL |
| Implementation | Python |
Recent releases


Changes: The code has been reorganized into modules. Some iteration constructs have been converted to Python iterators and generators. All text processing internally is now handled as Unicode. Analyzers are back as generators of tokens. The changes to the code to make it more Pythonic appear to have resulted in trading space for time: preliminary tests indicate about a 5% speedup on one dataset in exchange for a 20% increase in memory usage.
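
As an illustration of what "analyzers as generators of tokens" can look like, here is a minimal sketch in plain Python. It does not use Lupy's actual classes; the names `tokenize`, `stop_filter`, `analyze`, and the stop-word list are hypothetical and chosen only for this example.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"a", "an", "the", "of"})


def tokenize(text):
    """Lazily yield lowercased word tokens from a Unicode string."""
    for match in re.finditer(r"\w+", text):
        yield match.group().lower()


def stop_filter(tokens, stop_words=STOP_WORDS):
    """Pass through every token that is not a stop word."""
    for token in tokens:
        if token not in stop_words:
            yield token


def analyze(text):
    """Chain the generators; no intermediate token list is built."""
    return stop_filter(tokenize(text))


tokens = list(analyze("The Art of Über-fast Indexing"))
```

Because each stage is a generator, tokens flow through the chain one at a time, and new filters can be added by wrapping another generator around the existing ones.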


Changes: This version fixes a bug related to another bug fixed previously.


Changes: The main reason for this release is to clean up a minor bug in the indexer.Index wrapper. The default mergeFactor has been changed from 9 to 20 for better performance. The example in simple.py now uses a Keyword field for the filename instead of a tokenized and stored Text field. SegmentInfos and FieldInfos have been tidied up to be more Pythonic. close() is now called on the open searcher in indexer.Index.setupIndexer.


Changes: This version fixes a Windows-only bug in IndexWriter, and adds setMergeFactor to the Index to allow for tuning.
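
To give a rough feel for what tuning a merge factor trades off, the sketch below simulates a simplified merge policy: every added document starts as its own segment, and whenever the newest `merge_factor` segments have the same size they are merged into one. This is an assumed toy model, not Lupy's or Lucene's actual merge code, and `simulate_merges` is a hypothetical name.

```python
def simulate_merges(num_docs, merge_factor):
    """Return (segment_sizes, merge_count) after adding num_docs documents.

    Toy model: merge the newest merge_factor segments whenever they
    all have the same size.
    """
    segments = []  # segment sizes, oldest first
    merges = 0
    for _ in range(num_docs):
        segments.append(1)  # each new document begins as a one-document segment
        while (len(segments) >= merge_factor
               and len(set(segments[-merge_factor:])) == 1):
            merged_size = sum(segments[-merge_factor:])
            del segments[-merge_factor:]
            segments.append(merged_size)
            merges += 1
    return segments, merges


low = simulate_merges(100, 10)
high = simulate_merges(100, 20)
```

In this model a larger merge factor performs fewer merges while indexing but leaves more segments on disk, which a searcher then has to consult.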


Changes: Some minor changes were made for Python 2.3, although a couple of warnings about bit operations remain. This release breaks some code: field.Keyword() must now be used instead of field.Field.Keyword(). If you are using the Indexer wrapper, searches are now more accurate because the query is tokenized first.
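
The point about tokenizing the query first can be illustrated with a tiny inverted index in plain Python. This is a sketch under stated assumptions, not Lupy's Indexer wrapper: `tokenize`, `build_index`, `search`, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercased word tokens."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query, tokenize_query=True):
    """Return ids of documents containing every query term."""
    terms = tokenize(query) if tokenize_query else [query]
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Lupy is a port of Lucene",
    2: "Python full-text indexing",
}
index = build_index(docs)

raw_hits = search(index, "Lucene Port", tokenize_query=False)
tokenized_hits = search(index, "Lucene Port")
```

The raw query string never appears as a term in the index, so it matches nothing; running the query through the same tokenizer as the documents yields terms that line up with what was indexed.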