unac is a C library and command that removes accents from a string. For instance, the string été becomes ete. The command line interface removes accents from standard input or from a string given as an argument. In both the library function and the command, the charset of the input is specified as an argument. The input is converted to UTF-16 using iconv(3), accents are stripped, and the result is converted back to the original charset. On GNU/Linux, the iconv -l command lists all supported charsets. unac currently has Perl, PHP3, and PHP4 interfaces.
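The convert, strip, convert-back pipeline described above can be sketched in a few lines of Python. This is an illustration only, not unac's code: the function name strip_accents is hypothetical, Python's codecs stand in for iconv(3), a Python str stands in for the intermediate UTF-16 buffer, and NFD decomposition from the standard unicodedata module stands in for unac's own tables.

```python
import unicodedata


def strip_accents(data: bytes, charset: str) -> bytes:
    """Hypothetical helper that mirrors the three steps of the description."""
    # Step 1: decode from the caller's charset (unac uses iconv to reach UTF-16).
    text = data.decode(charset)
    # Step 2: decompose each character, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 3: convert the result back to the original charset.
    return stripped.encode(charset)


if __name__ == "__main__":
    print(strip_accents("été".encode("iso-8859-1"), "iso-8859-1"))  # b'ete'
```

As in unac, the caller names the charset explicitly, so the same call works for ISO-8859-1 and UTF-8 input.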
| Tags | Text Processing |
|---|---|
| Licenses | GPL |
Recent releases


Changes: A Unicode 3.2 bug was fixed, debug information was added, and the regression test now reports more detailed information. The manual pages were also rewritten for clarity and content.


Changes: An upgrade from Unicode 3.0.1 to Unicode 3.2, and updates to the autotools files.


Changes: This release has better detection of the iconv library using the AM_ICONV macro. Autotools files have been upgraded, and there are minor documentation upgrades.


Changes: When unac_string finds an illegal sequence while converting, it now replaces it with a space. For instance, the ¼ ISO-8859-1 character is converted to 1 4 (one, space, four) because the fraction slash character produced by its decomposition does not exist in ISO-8859-1. The new unac_version function returns the version number.
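The replace-with-a-space behavior can be reproduced in a short Python sketch. It rests on assumptions: NFKD compatibility decomposition stands in for unac's tables, the helper names are hypothetical, and the per-character encoding loop is only suitable for simple stateless charsets such as ISO-8859-1. Under NFKD, ¼ (U+00BC) decomposes to the digit 1, the fraction slash U+2044, and the digit 4. The fraction slash has no ISO-8859-1 encoding, so it is the character that becomes a space.

```python
import unicodedata


def encode_with_spaces(text: str, charset: str) -> bytes:
    """Encode text, substituting a space for any character the charset lacks."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            # Illegal in the target charset: emit a space instead of failing.
            out += " ".encode(charset)
    return bytes(out)


def unaccent_lossy(data: bytes, charset: str) -> bytes:
    """Hypothetical stand-in for the behavior described in this release."""
    text = unicodedata.normalize("NFKD", data.decode(charset))
    stripped = "".join(ch for ch in text if not unicodedata.combining(ch))
    return encode_with_spaces(stripped, charset)


if __name__ == "__main__":
    print(unaccent_lossy(b"\xbc", "iso-8859-1"))  # b'1 4'
```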


Changes: Added support for systems that do not define UTF-16BE and only provide UTF-16, which is implicitly big-endian there; as a result, unac works with both glibc-2.1.3 and glibc-2.1.94. An occasional allocation bug was fixed, the returned buffer is now allocated even when the input is an empty string, and more regression tests were added.