Sherlock Holmes

Sherlock Holmes is a modular system for gathering, indexing, and searching textual and image data. Its most popular application is, of course, indexing Web pages, from small Web sites to whole top-level domains, but other data sources, parsers, and user interfaces can be added easily.

Tags: Internet, Web, Indexing/Search, Text Processing, Indexing
Licenses: GPL
Operating Systems: POSIX, Linux
Implementation: C, Perl


Recent releases

  • 13 Apr 2009 17:57

    Changes: This release moves almost all features from the commercial version into the free version. The most prominent new features include the computation of dynamic page weights, a new gatherer that intelligently chooses which pages to crawl by their weights, site compression, and extended image and audio search. The indexer is much faster thanks to new sorting routines.

    • 16 May 2007 13:19

    Changes: This release can download JPEG, PNG, and GIF images, store their thumbnails, and search in them through their reference texts. HTML documents can now be filtered by their content. This release significantly speeds up the indexer and the search server on multi-processor systems. It can be used on Darwin (Mac OS X).

    • 25 Jul 2006 21:33

    Changes: Sherlock now contains a new library for analyzing the contents of documents. An existing index can now be quickly patched with new cards. The search server dumps the context of long cards better, and it can serve as a simple database by allowing browsing of all cards. A faster utility, "shcp", was added for copying the index to other machines. The configuration mechanism has been improved. Sherlock now supports the AMD64 architecture. Most modules have been substantially optimized, cleaned up, and corrected.

    • 20 Jun 2005 14:52

    Changes: The limitation on indexing only the first 4096 words in a document has been removed. Two morphological stemmers and utilities to create tables for them have been added. The customization interface, the makefiles, and the configuration system have been greatly improved. A major cleanup of the code has been done, several bugs have been fixed, and many small features have been added.

    • 23 Feb 2005 10:50

    Changes: This release fixes a bug in the gatherer concerning compressed buckets. Upgrading is essential.
