HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags, and easy-to-use JavaBeans. It is a fast, robust, and well-tested package.
| Tags | Internet Web Dynamic Content Software Development Libraries Java Libraries Text Processing Markup HTML/XHTML |
|---|---|
| Licenses | LGPL |
| Implementation | Java |
Recent releases


Changes: The license has been changed to the CPL. Maven 2 is now used as the build environment. Subversion is used for the source repository. A new Web site was created. <<tag> is now correctly parsed as text. A method to render the start of a tag in HTML was added. CssSelectorNodeFilter does not accept [attr|=val].


Changes: Support was added for commonly requested composite tags. Several enhancements were made to the filtering functionality. Additions were made to the HTTP connection processing subsystem. Other user-requested features were added and further bugs were fixed.


Changes: This is the first candidate for the final 1.6 release. All outstanding bugs have been fixed. A new XorFilter rounds out the logical node filters.
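
As a rough illustration of what an exclusive-or over node filters means, the sketch below combines plain `java.util.function.Predicate` objects so that an item is accepted when an odd number of them match, which is one common reading of XOR over several predicates. The class `XorFilterSketch` and its methods are hypothetical and do not use HTML Parser's own classes; tag names stand in for nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: these names are illustrative, not HTML Parser's API.
final class XorFilterSketch {

    /** Builds a predicate that accepts an item when an odd number of filters accept it. */
    @SafeVarargs
    static <T> Predicate<T> xor(Predicate<T>... filters) {
        return item -> {
            int accepted = 0;
            for (Predicate<T> filter : filters) {
                if (filter.test(item)) {
                    accepted++;
                }
            }
            return accepted % 2 == 1;
        };
    }

    /** Returns the items accepted by the filter, preserving input order. */
    static <T> List<T> select(List<T> items, Predicate<T> filter) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (filter.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Tag names stand in for parsed nodes in this sketch.
        List<String> tags = Arrays.asList("A", "P", "PRE", "IMG");
        Predicate<String> oneLetter = name -> name.length() == 1;
        Predicate<String> startsWithP = name -> name.startsWith("P");
        // "P" matches both filters and "IMG" matches neither, so both are rejected.
        System.out.println(select(tags, xor(oneLetter, startsWithP))); // prints [A, PRE]
    }
}
```

With two filters this reduces to the familiar "one or the other, but not both".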


Changes: NodeTreeWalker, a utility class to traverse a tree of Node objects using either depth-first or breadth-first tree order, has been added. Several other bugfixes and patches have been incorporated.
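
The difference between the two traversal orders can be sketched on a small stand-in tree. The `TreeNode` class and both methods below are hypothetical and written only for illustration; they are not the NodeTreeWalker implementation and do not reflect its actual method names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: a stand-in tree, not HTML Parser's Node or NodeTreeWalker.
final class TreeWalkSketch {

    static final class TreeNode {
        final String name;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name, TreeNode... kids) {
            this.name = name;
            for (TreeNode kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Visits a node, then each of its subtrees from left to right (pre-order). */
    static List<String> depthFirst(TreeNode root) {
        List<String> order = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeNode node = stack.pop();
            order.add(node.name);
            // Push children in reverse so the leftmost child is popped first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return order;
    }

    /** Visits nodes level by level, left to right within each level. */
    static List<String> breadthFirst(TreeNode root) {
        List<String> order = new ArrayList<>();
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode node = queue.poll();
            order.add(node.name);
            queue.addAll(node.children);
        }
        return order;
    }

    static TreeNode sample() {
        return new TreeNode("html",
                new TreeNode("head", new TreeNode("title")),
                new TreeNode("body", new TreeNode("h1"), new TreeNode("p")));
    }

    public static void main(String[] args) {
        System.out.println(depthFirst(sample()));   // prints [html, head, title, body, h1, p]
        System.out.println(breadthFirst(sample())); // prints [html, head, body, title, h1, p]
    }
}
```

Depth-first order follows the document's source order, while breadth-first order groups nodes by nesting level.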


Changes: Support has been added for commonly requested composite tags, P, H1-H6, and definition list tags (DL, DT, DD). The node interface has been augmented with get first/last child and get previous/next sibling methods to ease traversing the HTML document.
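
To show why first/last child and previous/next sibling accessors ease traversal, here is a minimal sketch on a hypothetical `SimpleNode` class. The class and its method names are assumptions made for this example and are not the library's node interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SimpleNode is illustrative, not HTML Parser's Node interface.
final class SiblingSketch {

    static final class SimpleNode {
        final String name;
        SimpleNode parent;
        final List<SimpleNode> children = new ArrayList<>();

        SimpleNode(String name) {
            this.name = name;
        }

        /** Appends a child and returns this node so calls can be chained. */
        SimpleNode add(SimpleNode child) {
            child.parent = this;
            children.add(child);
            return this;
        }

        SimpleNode firstChild() {
            return children.isEmpty() ? null : children.get(0);
        }

        SimpleNode lastChild() {
            return children.isEmpty() ? null : children.get(children.size() - 1);
        }

        SimpleNode nextSibling() {
            return siblingAt(1);
        }

        SimpleNode previousSibling() {
            return siblingAt(-1);
        }

        // Looks up the sibling at the given offset, or null at either end of the list.
        private SimpleNode siblingAt(int offset) {
            if (parent == null) {
                return null;
            }
            int index = parent.children.indexOf(this) + offset;
            return (index >= 0 && index < parent.children.size())
                    ? parent.children.get(index)
                    : null;
        }
    }

    /** Walks the children of a node using only firstChild and nextSibling. */
    static List<String> childNames(SimpleNode parent) {
        List<String> names = new ArrayList<>();
        for (SimpleNode c = parent.firstChild(); c != null; c = c.nextSibling()) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        SimpleNode dl = new SimpleNode("dl")
                .add(new SimpleNode("dt"))
                .add(new SimpleNode("dd"));
        System.out.println(childNames(dl));                      // prints [dt, dd]
        System.out.println(dl.lastChild().previousSibling().name); // prints dt
    }
}
```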