libTextCat

libTextCat is a library of functions that implement the classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization". It was developed primarily for language guessing, a task on which it is known to perform with near-perfect accuracy. Considerable effort went into making this implementation fast and efficient. The language guesser processes over 100 documents/second on a simple PC, which makes it practical for many uses.

Tags: Scientific/Engineering, Artificial Intelligence, Software Development, Libraries, Text Processing, Linguistic
Licenses: BSD Original
Operating Systems: POSIX, Linux
Implementation: C

Recent releases

  • 05 Dec 2003 11:03

Changes: A long overdue autoconfig script has been added.

  • 20 May 2003 06:35

Changes: The distribution now contains Gertjan van Noord's language models for the automatic recognition of over 70 languages. The makefiles were cleaned up to make them more portable.
