HTML Parser

HTML Parser is a Java library that parses HTML in either a linear or a nested fashion. Used primarily for transformation or extraction, it features filters, visitors, custom tags, and easy-to-use JavaBeans. It is a fast, robust, and well-tested package.

Tags: Internet, Web, Dynamic Content, Software Development, Libraries, Java Libraries, Text Processing, Markup, HTML/XHTML
Licenses: LGPL
Implementation: Java

Recent releases

Changes: The license has been changed to the CPL. Maven2 is now used as the build environment. Subversion is now used for the source repository. A new Web site was created. <<tag> is now correctly parsed as text. A method to render the start of a tag in HTML was added. CssSelectorNodeFilter does not accept [attr|=val].

10 Jun 2006 21:50

Changes: Support was added for commonly requested composite tags. Several enhancements were made to the filtering functionality. Additions were made to the HTTP connection processing subsystem. Other user-requested features were added and bugs were fixed.

Changes: This is the first candidate for the final 1.6 release. All outstanding bugs have been fixed. A new XorFilter rounds out the logical node filters.
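To illustrate what an exclusive-or filter does alongside the usual AND/OR/NOT combinators, here is a minimal, self-contained sketch. It does not use the library's own classes: `XorSketch` and its `xor` method are hypothetical names, and `java.util.function.Predicate` stands in for a node filter. The sketch assumes that, for more than two operands, XOR means "an odd number of filters accept"; the library's actual behavior should be checked against its documentation.

```java
import java.util.function.Predicate;

// Hypothetical illustration only; this is not the library's XorFilter class.
final class XorSketch {

    // Returns a predicate that accepts an item when an odd number of the
    // given predicates accept it. With two predicates this is plain XOR:
    // exactly one of them must accept.
    @SafeVarargs
    static <T> Predicate<T> xor(Predicate<T>... filters) {
        return item -> {
            boolean accepted = false;
            for (Predicate<T> filter : filters) {
                accepted ^= filter.test(item); // toggle on every match
            }
            return accepted;
        };
    }
}
```

For example, combining "is an `a` tag" with "has a `class` attribute" this way would keep links without a class and non-link elements with a class, and reject everything that matches both or neither.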

Changes: NodeTreeWalker, a utility class to traverse a tree of Node objects using either depth-first or breadth-first tree order, has been added. Several other bugfixes and patches have been incorporated.
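The difference between the two traversal orders can be sketched on a tiny tree. The code below is a self-contained illustration, not the NodeTreeWalker API: `TreeNode` and `WalkSketch` are hypothetical names, and the sketch includes the root in the output, which the library class may or may not do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal tree node; stands in for a parsed HTML node.
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }

    TreeNode add(TreeNode child) { children.add(child); return this; }
}

// Hypothetical helper showing both traversal orders.
final class WalkSketch {

    // Depth-first (pre-order): visit a node, then each subtree in turn.
    static List<String> depthFirst(TreeNode root) {
        List<String> visited = new ArrayList<>();
        Deque<TreeNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            TreeNode node = stack.pop();
            visited.add(node.name);
            // Push children in reverse so the first child is popped first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return visited;
    }

    // Breadth-first: visit all nodes of one level before the next level.
    static List<String> breadthFirst(TreeNode root) {
        List<String> visited = new ArrayList<>();
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode node = queue.remove();
            visited.add(node.name);
            queue.addAll(node.children);
        }
        return visited;
    }
}
```

For a tree where `html` has children `head` (containing `title`) and `body` (containing `p`), depth-first order is html, head, title, body, p, while breadth-first order is html, head, body, title, p.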

Changes: Support has been added for commonly requested composite tags, P, H1-H6, and definition list tags (DL, DT, DD). The node interface has been augmented with get first/last child and get previous/next sibling methods to ease traversing the HTML document.
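As a rough picture of how first/last child and previous/next sibling accessors ease traversal, here is a self-contained sketch. `DomNode` and its methods are hypothetical stand-ins, not the library's node interface; they only mirror the navigation operations described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type; not the library's Node interface.
final class DomNode {
    final String name;
    DomNode parent;
    final List<DomNode> children = new ArrayList<>();

    DomNode(String name) { this.name = name; }

    // Appends a child and records this node as its parent.
    DomNode append(DomNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    DomNode firstChild() { return children.isEmpty() ? null : children.get(0); }

    DomNode lastChild() { return children.isEmpty() ? null : children.get(children.size() - 1); }

    DomNode nextSibling() { return sibling(+1); }

    DomNode previousSibling() { return sibling(-1); }

    // Looks up the neighbor at the given offset in the parent's child list;
    // returns null at either end or when this node has no parent.
    private DomNode sibling(int offset) {
        if (parent == null) return null;
        int index = parent.children.indexOf(this) + offset;
        return (index >= 0 && index < parent.children.size()) ? parent.children.get(index) : null;
    }
}
```

With accessors like these, the entries of a definition list can be walked with a simple loop such as `for (DomNode n = dl.firstChild(); n != null; n = n.nextSibling())`, without managing an index into the child list.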
