dbacl

dbacl is a digramic Bayesian text classifier. Given some text, it calculates the posterior probabilities that the input resembles one of any number of previously learned document collections. It can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish an English text from a French text. It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
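To make the idea of posterior probabilities over learned categories concrete, here is a minimal Python sketch. It is not dbacl's implementation: it assumes a plain naive Bayes model over words and adjacent word pairs (digrams) with add-one smoothing and a uniform prior, rather than dbacl's maximum entropy models. The function names `features`, `learn`, and `posteriors` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def features(text):
    """Lowercased word tokens plus adjacent-word pairs (digrams)."""
    # [^\W\d_]+ matches runs of letters in any script, not just ASCII.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return words + [a + " " + b for a, b in zip(words, words[1:])]


def learn(documents):
    """Count feature occurrences over one category's training documents."""
    counts = Counter()
    for doc in documents:
        counts.update(features(doc))
    return counts


def posteriors(text, categories):
    """Return P(category | text) for each category, assuming a uniform prior."""
    vocab = set()
    for counts in categories.values():
        vocab.update(counts)
    feats = features(text)
    log_scores = {}
    for name, counts in categories.items():
        total = sum(counts.values())
        # Add-one smoothing; the extra +1 reserves mass for unseen features.
        denom = total + len(vocab) + 1
        log_scores[name] = sum(math.log((counts[f] + 1) / denom) for f in feats)
    # Normalise in log space to avoid floating-point underflow.
    top = max(log_scores.values())
    weights = {n: math.exp(s - top) for n, s in log_scores.items()}
    norm = sum(weights.values())
    return {n: w / norm for n, w in weights.items()}


# Toy training data, invented for this example.
categories = {
    "english": learn(["the cat sat on the mat", "the dog is in the house"]),
    "french": learn(["le chat est sur le tapis", "le chien est dans la maison"]),
}
result = posteriors("the dog sat on the mat", categories)
best = max(result, key=result.get)
```

Here `result` maps each category name to a probability, the values sum to one, and `best` is the most probable category. A real classifier would train on far larger collections and would typically use informed priors.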

Tags: Scientific/Engineering, Artificial Intelligence, Text Processing, Filters, Communications, Email, Linguistic, Adaptive Technologies, Information Management, Metadata/Semantic Models
Licenses: GPL
Operating Systems: POSIX
Implementation: C

Recent releases

  • 26 Mar 2006 02:13

Changes: This is a hodge-podge of fixes and improvements. A new hypex command, the TREC 2005 options files, and an essay on chess are now in the tarball. Several improvements to the parsing engine were made, including a new -e char option and bugfixes. Compilation problems on various architectures were fixed, and libslang2 support was added.

  • 02 Jul 2005 02:20

Changes: This release fixed some bugs, cleaned up the behaviour of the -w switch, changed the "complexity" accounting algorithm, and improved the organization of the man page and tutorials.

  • 20 May 2005 04:16

Changes: This release includes various bugfixes and small usability improvements in the documentation and default switch handling. The major additions are support for the TREC spamjig and improved memory mapping for faster online learning.

  • 13 Nov 2004 00:27

Changes: This release added a new MAP confidence score (-U, to complement the -X switch), some new scoring types in mailinspect, and a new parsing switch for trace headers in email (-T email:theaders). Category learning now accepts directory names as well as file names, and preliminary work on a new header mining tool (hmine) was performed. Category files are now written in 'portable' format by default.

  • 29 Jul 2004 00:47

Changes: Many bugs were discovered and fixed. A test suite, run with make check, was added to prevent future regressions. Memory management was improved, which makes classification much faster, and a putative confidence score is now available via the -X switch. Some documentation changes were made.
