Sat, Sep 06th 19:29 UTC

 uni2ascii 4.4 (Default)
Sections: Mac OS X, Unix

 

Added: Mon, Jan 3rd 2005 22:19 UTC (3 years, 8 months ago) Updated: Sun, Aug 31st 2008 07:09 UTC (6 days ago)


About:
uni2ascii and ascii2uni convert between UTF-8 Unicode and 29 7-bit ASCII equivalents including: hexadecimal and decimal HTML and SGML numeric character references, \u-escapes, standard hexadecimal, raw hexadecimal, and RFC2396 URI format. Such ASCII equivalents are useful for entering Unicode in program source or in programs that are not 8-bit safe, and for testing and debugging. Several options allow Unicode to be converted to approximately equivalent ASCII, e.g. by stripping diacritics. An optional GUI is provided.
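
To make the formats above concrete, here is a minimal Python sketch of what a few of these ASCII equivalents look like for a single character. It is not uni2ascii's code and does not use its command-line options; the function name `ascii_equivalents` and the dictionary keys are hypothetical, and the URI form assumes percent-encoded UTF-8 bytes.

```python
from urllib.parse import quote


def ascii_equivalents(ch: str) -> dict:
    """Return a few 7-bit ASCII spellings of one Unicode character."""
    cp = ord(ch)
    return {
        "html_hex": f"&#x{cp:04X};",      # hexadecimal numeric character reference
        "html_decimal": f"&#{cp};",       # decimal numeric character reference
        "u_escape": f"\\u{cp:04X}",       # \u-escape; unambiguous only up to U+FFFF
        "uri": quote(ch, safe=""),        # percent-encoded UTF-8 bytes (assumption)
    }


forms = ascii_equivalents("\u00e9")       # Latin small letter e with acute
for name, text in forms.items():
    print(f"{name:13} {text}")
```

Every value in the result contains only 7-bit ASCII characters, which is what makes these forms safe to paste into program source or into programs that are not 8-bit safe.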

Release focus: Minor feature enhancements

Changes:
This release adds an option to uni2ascii that produces single-character approximations for the same characters that the -x option converts to multi-character approximations. The license is now GPL v.3.

Author:
Bill Poser

Rating:
8.46/10.00 (1 vote)

Homepage:
http://billposer.org/Software/uni2ascii.html
Tar/GZ:
http://billposer.org/Software/Downloads/uni2ascii.tar.gz
Tar/BZ2:
http://billposer.org/Software/Downloads/uni2ascii.tar.bz2
Zip:
http://billposer.org/Software/Downloads/uni2ascii.zip
Changelog:
http://billposer.org/Software/uni2ascii.html#changelog
RPM package:
http://dag.wieers.com/packages/uni2ascii
Debian package:
http://packages.debian.org/testing/text/uni2ascii
OS X package:
http://pdb.finkproject.org/pdb/package.php/uni2ascii
BSD Ports URL:
http://www.freshports.org/textproc/uni2ascii/

Trove categories:
[Development Status]  5 - Production/Stable
[Environment]  Console (Text Based)
[Intended Audience]  Developers, End Users/Desktop
[License]  OSI Approved :: GNU General Public License (GPL), OSI Approved :: GNU General Public License v3
[Operating System]  POSIX
[Programming Language]  C, Tcl
[Topic]  Software Development, Software Development :: Internationalization, Text Processing :: General, Text Processing :: Linguistic, Text Processing :: Markup, Text Processing :: Markup :: HTML/XHTML, Text Processing :: Markup :: SGML

Dependencies:
Tcl/Tk (optional)

 
Project admins:
» Bill Poser (Owner)

» Rating: 8.46/10.00 (Rank N/A)
» Vitality: 2.34% (Rank 200)
» Popularity: 2.52% (Rank 1929)

   Record hits: 31,752
   URL hits: 11,123
   Subscribers: 55



 Branches

Branch   Version  Last release  License                        URLs
Default  4.10     31-Aug-2008   GNU General Public License v3  Homepage, Tar/GZ, Changelog

 Releases

Version  Focus                       Date
4.10     Minor bugfixes              31-Aug-2008 07:09
4.9      Minor bugfixes              07-May-2008 10:13
4.8      Major bugfixes              04-May-2008 14:03
4.7      Minor feature enhancements  27-Apr-2008 01:22
4.6      Minor bugfixes              03-Apr-2008 00:14
4.5      Minor feature enhancements  21-Mar-2008 09:12
4.4      Minor feature enhancements  15-Jan-2008 09:06
4.3.2    Minor bugfixes              07-Aug-2007 10:26
4.3      Minor feature enhancements  12-Mar-2007 07:44
4.2      Minor bugfixes              03-Mar-2007 11:45

 Comments

[»] Recode
by Ed Avis - Jan 12th 2006 02:24:11

How does this compare to GNU recode?

--
Ed Avis



    [»] Re: Recode
    by Bill Poser - Jan 13th 2006 09:42:49

    Recode and uni2ascii are complementary. Briefly put, Recode converts from one encoding to another (where the expectation is that the target character set will be the same as, or a superset of, the source character set), whereas Uni2ascii converts between UTF-8 Unicode and ASCII representations of Unicode. In practical terms, Uni2ascii will not convert between, say, ASCII and EBCDIC, which Recode will, whereas Recode will not convert between Unicode and the \x{00E9} format, which Uni2ascii will. (I should say that Recode lists but does not explain the encodings that it knows so it is not always easy to figure out what it handles. It is possible that it can handle things that I am not aware of. But at least as far as I can tell, it does not handle the textual representations of Unicode characters that Uni2ascii handles.)

    Thus, if you've got a text in, say, TIS-620 (the Thai national standard) and you want to get it into Unicode, you would use Recode. If you want to include that Thai text in a blog posting using Movable Type, which is not 8-bit safe, you would use Uni2ascii to convert your Unicode version of the Thai text to HTML numeric character references. Similarly, if you wanted to include that Thai text as a string in a program in Java, Python, Scheme, or Tcl, you would use uni2ascii to convert the Unicode to the \uxxxx format.
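
The two-step workflow just described can be sketched in Python, with the standard `tis_620` codec standing in for the Recode step and a small hypothetical helper, `to_html_ncr`, standing in for the Uni2ascii step. Neither program is actually invoked, and the two-byte sample input is made up for illustration.

```python
def to_html_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)


tis620_bytes = b"\xa1\xd2"                      # two Thai letters encoded in TIS-620
unicode_text = tis620_bytes.decode("tis_620")   # step 1: encoding conversion to Unicode
ascii_text = to_html_ncr(unicode_text)          # step 2: ASCII representation of the Unicode text
print(ascii_text)
```

The first step changes which numbers stand for the Thai letters; the second changes only how those Unicode characters are spelled, so the result survives software that is not 8-bit safe.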

    My conception of the difference is this. When you have the same character set but different associations between the characters and the integers, conversion between the two is pure encoding conversion. ASCII and EBCDIC are different encodings of the same character set; converting between them is a matter of encoding conversion. On the other hand, when you have radically different character sets, conversion from one to the other is a matter of transliteration. Transliteration may be perfect, or nearly so, if both writing systems have been adapted for the same language (e.g. the Roman and Cyrillic writing systems for Serbo-Croatian), or quite imperfect (e.g. when Vietnamese is written using only the English alphabet).
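
As a small illustration of pure encoding conversion, the following Python sketch encodes the same characters in ASCII and in EBCDIC. It assumes code page 037 (Python's `cp037` codec) as a representative EBCDIC variant; other EBCDIC code pages assign some characters differently.

```python
text = "Hello"

ascii_bytes = text.encode("ascii")    # same characters, ASCII integers
ebcdic_bytes = text.encode("cp037")   # same characters, EBCDIC (code page 037) integers

print(ascii_bytes.hex())
print(ebcdic_bytes.hex())

# Decoding either byte string recovers the identical characters.
round_trip = ebcdic_bytes.decode("cp037")
print(round_trip == ascii_bytes.decode("ascii"))
```

The characters are unchanged throughout; only the integers associated with them differ, which is the sense of encoding conversion used here.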

    A third situation is when you use escape sequences to represent the characters of one character set in another. That's what we're doing when we use the sequence of ASCII characters \x{00E9} to represent the Unicode character U+00E9 "Latin small letter e with acute".
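
A rough Python sketch of this third case, going in both directions with the \x{00E9} format. The helper names `to_escapes` and `from_escapes` are hypothetical, and this is not how uni2ascii or ascii2uni are implemented; in particular, it does not protect input that already contains text resembling an escape, and it does not check that the hexadecimal value is a valid code point.

```python
import re

# Matches escapes such as \x{00E9}: a backslash, "x", and 1-6 hex digits in braces.
ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{1,6})\}")


def to_escapes(text: str) -> str:
    """Spell every non-ASCII character as an ASCII \\x{XXXX} escape sequence."""
    return "".join(ch if ord(ch) < 128 else f"\\x{{{ord(ch):04X}}}" for ch in text)


def from_escapes(text: str) -> str:
    """Turn \\x{XXXX} escape sequences back into the characters they stand for."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


escaped = to_escapes("caf\u00e9")
print(escaped)
print(from_escapes(escaped) == "caf\u00e9")
```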

    Recode is basically intended to handle encoding conversion. Uni2ascii, on the other hand, is aimed at the third case, the representation of Unicode characters by ASCII escape sequences. Other programs (e.g. my own Xlit) deal with transliteration.

    Of course, the division I've made here, while I think it is the one that people usually make, is not quite so simple, since what are generally thought of as different encodings of the same character set may in fact use somewhat different character sets. For example, decomposed Unicode uses sequences of two or more Unicode characters to represent what in other encodings are single characters: e with acute accent is a single character in ISO-8859-1 (0xE9) but is a two-character sequence (0x0065 0x0301) in decomposed Unicode, where it is treated as plain e followed by acute accent. Encoding conversion programs like Recode are therefore, in the strict sense, doing more than pure encoding conversion.
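
The composed and decomposed forms mentioned above can be inspected with Python's standard `unicodedata` module. This is only an illustration of the two representations, not of anything Recode or Uni2ascii does internally.

```python
import unicodedata

composed = "\u00e9"                                    # single character, as in ISO-8859-1
decomposed = unicodedata.normalize("NFD", composed)    # plain e followed by combining acute

print([f"U+{ord(c):04X}" for c in composed])
print([f"U+{ord(c):04X}" for c in decomposed])
print(composed.encode("latin-1"))                      # the single ISO-8859-1 byte

# Recomposing gives back the single-character form.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```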

    At one level, all of these conversions are the same since they can all be treated as mappings of one set of byte strings to another. However, there is a conceptual difference among them that, with some fuzzy edges, seems to correspond to the functionality of the software designed to handle them.

    Returning to practicalities, Uni2ascii and Recode also provide different approaches to and degrees of control over disparities between character sets, e.g. what to do with characters with diacritics when converting to ASCII.
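
One simple way to handle diacritics when reducing text to ASCII is to decompose each character and drop the combining marks. The Python sketch below shows that approach under the assumption that plain stripping is what is wanted; the function name `strip_diacritics` is hypothetical, and this is not a description of how Uni2ascii or Recode actually do it.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters, then drop the combining marks that carry the diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


stripped = strip_diacritics("Cr\u00e8me br\u00fbl\u00e9e")
print(stripped)

# Letters with no canonical decomposition, such as o with stroke, pass through unchanged.
unchanged = strip_diacritics("\u00f8")
print(unchanged == "\u00f8")
```

Characters that have no canonical decomposition need an explicit substitution table, which is one place where tools can differ in the degree of control they offer.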





© Copyright 2008 SourceForge, Inc., All Rights Reserved.