Unicode Utilities

The Unicode Utilities are a set of programs for manipulating and analyzing Unicode text. uniname prints, for each character, any combination of its character offset, byte offset, hex code value, encoding, glyph, and name. unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 determines whether a sequence of bytes is a valid UTF-8 encoding and explains why or why not. unirev reverses UTF-8 strings. unifuzz generates test input for checking other programs' Unicode handling.
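As an illustration of the kind of check ExplicateUTF8 performs, here is a minimal sketch of UTF-8 validation. It is not taken from the package; the function names `utf8_sequence_length` and `utf8_is_valid` are hypothetical, and the rules applied (continuation-byte pattern, no overlong forms, no surrogates, nothing above U+10FFFF) are the standard well-formedness rules for UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Unicode Utilities.
   Returns the length in bytes (1 to 4) of the well-formed UTF-8
   sequence starting at s, or 0 if the bytes are ill-formed or
   truncated. n is the number of bytes available at s. */
size_t utf8_sequence_length(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp, min;

    if (n == 0)
        return 0;

    if (s[0] < 0x80)
        return 1;                                  /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;          /* stray continuation byte or invalid lead byte */

    if (n < len)
        return 0;                                  /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                              /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }

    if (cp < min)
        return 0;                                  /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                  /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return 0;                                  /* beyond the Unicode range */

    return len;
}

/* Returns 1 if the n bytes at s form well-formed UTF-8, else 0. */
int utf8_is_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        size_t len = utf8_sequence_length(s + i, n - i);
        if (len == 0)
            return 0;
        i += len;
    }
    return 1;
}
```

A caller would pass a byte buffer and its length, for example `utf8_is_valid((const unsigned char *)buf, strlen(buf))`. Unlike ExplicateUTF8, this sketch only reports validity and does not explain the reason for a failure.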

Tags: Utilities, Text Processing, Linguistic
Implementation: C

Recent releases

  • 18 Feb 2009 12:13

Changes: This release updates character data to Unicode version 5.1 and fixes a bug in the validation option of uniname as well as a couple of other minor bugs.

  • 03 Apr 2008 23:09

Changes: This release adds a new utility, unifuzz, which generates test input for programs expecting Unicode. In addition to generating random sequences of characters, unifuzz can generate a character from each range, tokens of various potentially problematic characters and sequences, very long lines, strings with embedded nulls, and ill-formed UTF-8.
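One well-known kind of ill-formed UTF-8 that such test input can contain is an overlong encoding, where an ASCII character is spread over two bytes. The sketch below shows how one could be produced; it is an illustration only, not unifuzz's code, and the function name `overlong_two_byte` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of unifuzz.
   Writes the overlong (and therefore ill-formed) two-byte UTF-8
   encoding of the ASCII character c into out.
   Returns 2 on success, or 0 if c is not ASCII. */
size_t overlong_two_byte(unsigned char c, unsigned char out[2])
{
    if (c >= 0x80)
        return 0;

    out[0] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte: 0xC0 or 0xC1 */
    out[1] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation byte */
    return 2;
}
```

For '/' (0x2F) this writes the bytes C0 AF. A conforming decoder must reject that sequence rather than treat it as a slash.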

  • 30 Jun 2007 00:34

Changes: This release adds an option to unidesc that causes it to list the ranges detected after reading all input rather than listing them as they are encountered. uniname now has an option that causes it to ignore characters within the Basic Multilingual Plane.

  • 30 Jan 2007 06:31

Changes: This release adds the utility unirev, a filter that reverses UTF-8 strings character-by-character. The package name has changed.
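Reversing UTF-8 character by character differs from reversing bytes, because the bytes of each multibyte character must stay in their original order. A minimal sketch of one common approach follows; it is not unirev's actual implementation, the names `reverse_bytes` and `utf8_reverse` are hypothetical, and the input is assumed to be well-formed UTF-8.

```c
#include <stddef.h>

/* Reverse the bytes in the half-open range [lo, hi) of s. */
static void reverse_bytes(char *s, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        char t = s[lo];
        s[lo++] = s[--hi];
        s[hi] = t;
    }
}

/* Hypothetical helper, not part of unirev.
   Reverses the n-byte UTF-8 string s in place, code point by code point.
   Assumes s is well-formed UTF-8. */
void utf8_reverse(char *s, size_t n)
{
    size_t i = 0;

    /* Pass 1: reverse the bytes inside each encoded character. */
    while (i < n) {
        size_t j = i + 1;
        /* Continuation bytes have the bit pattern 10xxxxxx. */
        while (j < n && ((unsigned char)s[j] & 0xC0) == 0x80)
            j++;
        reverse_bytes(s, i, j);
        i = j;
    }

    /* Pass 2: reverse the whole buffer. This puts the characters in
       reverse order and restores the byte order within each one. */
    reverse_bytes(s, 0, n);
}
```

This sketch works on code points, so a combining mark ends up before its base character after reversal.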

  • 12 Jan 2007 03:40

Changes: uniname and unidesc now report the unofficial ranges within the Private Use Areas registered with the ConScript Unicode Registry.
