cpdetector

cpdetector is a small framework for codepage detection that integrates several strategies. It may be used as a library by third-party software that accesses textual data over a network. It also includes a best-practice implementation in the form of a command line tool that sorts and transforms large collections of documents based on their codepage. Available strategies include: jchardet (exclusion, frequency analysis, and guessing), detection of the HTML charset property, and detection of the XML encoding declaration.
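The two declaration-based strategies named above (HTML charset property and XML encoding declaration) can be illustrated with a minimal sketch. This is not cpdetector's own code or API: the class name `DeclaredCharsetSniffer`, the method `sniff`, and the regular expressions are assumptions made for illustration, using only the Java standard library. It looks at the first bytes of a document and returns whatever charset name the document declares for itself, if any.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of cpdetector.
class DeclaredCharsetSniffer {

    // Matches encoding="..." inside an XML declaration at the start of the text.
    private static final Pattern XML_DECL = Pattern.compile(
            "^\\s*<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");

    // Matches charset=... in <meta charset="..."> or in a Content-Type meta value.
    private static final Pattern HTML_META = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset name declared in the document head, if one is found.
    static Optional<String> sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so this decoding
        // never fails and leaves the ASCII markup readable.
        String text = new String(head, StandardCharsets.ISO_8859_1);

        Matcher xml = XML_DECL.matcher(text);
        if (xml.find()) {
            return Optional.of(xml.group(1));
        }
        Matcher html = HTML_META.matcher(text);
        if (html.find()) {
            return Optional.of(html.group(1));
        }
        // No declaration found: a statistical strategy would have to take over.
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] sample = "<html><head><meta charset=\"utf-8\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(sample).orElse("no declaration found"));
    }
}
```

A declared charset is only a claim made by the document, so a sketch like this is best treated as one strategy among several, with a statistical detector as the fallback when nothing is declared.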

Tags: Communications, Information Management, Internet, Web, Indexing/Search, Software Development, Internationalization, Libraries, Java Libraries
Licenses: MPL
Implementation: Java

Recent releases

  •  17 Jun 2008 21:19

Changes: The release structure has been changed: cpdetector.jar no longer contains third-party library files. Public functions that were missing are included again. The proguard shrinker has been updated from version 3.8 to 4.2.

  •  15 Jun 2008 09:22

Changes: The proguard shrinker is now used, so the cpdetector jar is now more than ten times smaller. System.out is no longer used for logging in JChardetFacade. All packages were renamed with the prefix "info.monitorenter".

  •  21 Apr 2007 14:09

Changes: Severe errors, such as a potential infinite loop and incorrect file handling, have been fixed.

  •  02 Mar 2005 07:18

Changes: A bug in the Ant build of the source release has been fixed. Instructions for document tests with FIT were added.

  •  14 Dec 2004 11:05

Changes: It is now possible to let cpdetector guess the codepage from the remaining candidates when the set cannot be narrowed down to one. This version marks the start of testing with FIT. A new best-practice command line tool prints the codepage name for file arguments.
