Jericho HTML Parser is a Java library allowing analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognized or invalid HTML. It also provides high-level HTML form manipulation functions.
| Tags | Text Processing, Markup, HTML/XHTML, Software Development, Libraries, Java Libraries, Internet, Web, Dynamic Content |
|---|---|
| Licenses | LGPL |
| Operating Systems | OS Independent |
| Implementation | Java |
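
The description's promise of "reproducing verbatim any unrecognized or invalid HTML" can be illustrated with position-based editing: only the recognized spans are replaced, and every other character of the source is copied through untouched. The following is a minimal sketch of that idea using only the Java standard library. It is not the Jericho API; the class `VerbatimRewriteSketch`, the method `prefixHrefs`, and the narrow `href` pattern are all hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of the Jericho API. It only illustrates
// position-based editing: recognized spans are replaced and all other
// source text (server tags, unclosed elements) is copied unchanged.
final class VerbatimRewriteSketch {

    // Matches a double-quoted href attribute; group 1 is the attribute value.
    // Deliberately narrow: a real parser handles far more syntax than this.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the source with the given prefix added to every matched href value.
    static String prefixHrefs(String source, String prefix) {
        StringBuilder out = new StringBuilder(source.length() + 16);
        Matcher m = HREF.matcher(source);
        int copiedUpTo = 0;
        while (m.find()) {
            // Copy everything up to the attribute value verbatim.
            out.append(source, copiedUpTo, m.start(1));
            // Emit the modified value in place of the original one.
            out.append(prefix).append(m.group(1));
            copiedUpTo = m.end(1);
        }
        // Copy the remainder of the document verbatim, valid or not.
        out.append(source, copiedUpTo, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"a.html\">A</a> <% server.tag() %> <b>unclosed";
        System.out.println(prefixHrefs(html, "/docs/"));
    }
}
```

Because the output is assembled from slices of the original string, the server tag and the unclosed `<b>` element in the sample input pass through exactly as written; only the `href` value changes.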
Recent releases


Changes: Important bugfixes and a new stream-based parsing option allowing memory-efficient processing of large files.


Changes: This version is a major new release that requires the Java 5 runtime or later. It introduces major API changes such as generics and enums, as well as some new features.


Changes: This version includes important bugfixes and the following enhancements. Non-server tags are no longer recognized inside server tags. Microsoft downlevel-revealed conditional comments are now recognized. All unnecessary white space can now be removed from a source document. Various other enhancements were made to existing features.


Changes: This version includes important bugfixes and introduces the following minor enhancements: elements inside SCRIPT elements are ignored, encoding detection and analysis were improved, and parsing of attributes containing server tags was improved.


Changes: This version has been released under a dual licence system, allowing a choice between the Eclipse Public License (EPL) and the LGPL. It includes important bugfixes and introduces the following major features: simple rendering of HTML markup into text, integrated logging with various logging frameworks, and easier parsing of HTML tags containing server tags.