NekoHTML is a simple HTML scanner and tag balancer that enables Java application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make when writing HTML documents. NekoHTML is written using the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation. This lets application programmers use the NekoHTML parser with existing XNI tools without modifying or rewriting code.
| Tags | Text Processing, Markup, HTML/XHTML, XML |
|---|---|
| Licenses | Apache |
| Implementation | Java |
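
To make the idea of "tag balancing" concrete, here is a minimal sketch in plain Java. It is not NekoHTML's algorithm and does not use the NekoHTML API: the class `TagBalanceSketch` and its methods are hypothetical names invented for this illustration, and the sketch assumes attribute-free tags and text that needs no escaping. It closes elements that were left open, drops stray end tags, and then hands the result to the JDK's standard DOM parser, which is the kind of "standard XML interface" the description refers to.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative only: a toy tag balancer, not the NekoHTML implementation. */
public class TagBalanceSketch {
    // Assumption: tags carry no attributes, e.g. <p> or </b>.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)>");

    /** Closes elements left open and drops end tags that match no open element. */
    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();   // stack of currently open elements
        Matcher m = TAG.matcher(html);
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());     // copy the text between tags
            last = m.end();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (m.group(1).isEmpty()) {            // start tag: remember it
                open.push(name);
                out.append('<').append(name).append('>');
            } else if (open.contains(name)) {      // end tag: close everything up to it
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // Otherwise the end tag is stray and is silently dropped.
        }
        out.append(html, last, html.length());
        while (!open.isEmpty()) {                  // close whatever is still open
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    /** Parses balanced markup with the JDK's standard DOM parser. */
    public static Document toDom(String balanced) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(balanced)));
    }

    public static void main(String[] args) throws Exception {
        String fixed = balance("<p>Hello <b>world</i></p>");
        System.out.println(fixed);                 // prints <p>Hello <b>world</b></p>
        Document doc = toDom(fixed);
        System.out.println(doc.getDocumentElement().getTagName()); // prints p
    }
}
```

A real HTML parser has to do far more than this, including handling attributes, entities, comments, and element-specific nesting rules, so treat the sketch as an explanation of the concept rather than a substitute for the library.
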
Recent releases


Changes: A charset regression was fixed.


Changes: The license was changed to Apache 2.0 and the version number was bumped to reflect the maturity of the project. Project files were reorganized to decouple them from the rest of the CyberNeko Tools for XNI. xercesMinimal.jar and its source were updated so that NekoHTML compiles against Xerces-J 2.9.1. Attribute values are no longer normalized by default, and a new feature was added to let users turn normalization on. The build was modified to target Java 1.3. Suggested paragraph tag balancing was adjusted and various reported bugs were fixed.


Changes: A feature was added that lets the scanner fix character entity references for Microsoft Windows characters. The nekohtmlXni.jar file is no longer built by default. Tag balancing was changed to allow headers inside links. Fixes cover the handling of the blockquote tag, a tag-balancing bug for unknown elements, the mapping of the encoding name in meta tags, various namespace binding bugs, and a no-such-method exception when using the augmentations feature with older versions of Xerces2.


Changes: This release added features for stripping CDATA delimiters from script and style tags, made augmentations, bugfixes, and performance enhancements, and fixed some tag balancing issues.


Changes: This version implements scanning of the XML declaration, fixes a script tag scanning bug, and adds a version class and manifest entries for querying product information.
Recent comments
Re: Apache license or Cyberneko license?
If anyone is interested, I sent this question to licensing AT gnu DOT org:
> My special exception uses the wording "the Apache license".
> Would I have to change this special exception, and what wording would you recommend to allow for "Apache-style" licenses?
And I got this answer:
> Something like "any license with terms identical to the Apache license version 1.1 but for names" ought to do it.
Re: Apache license or Cyberneko license?
> To be more specific, I'm working for a
> company which is looking to GPL our
> software. We are using a couple of
> Apache libraries, which are under the
> Apache license.
> Therefore we include a GPL "special
> exception" which allows linking with
> software licensed under "The Apache
> License".
> Since CyberNeko is under an Apache-style
> license, but not "The" Apache License,
> this special exception would not include
> the CyberNeko License, right?
Here are a few relevant links for you:
http://www.gnu.org/philosophy/license-list.html
http://www.apache.org/foundation/licence-FAQ.html#GPL
Note: The CyberNeko license is based on the Apache version 1.1 license.
Re: Apache license or Cyberneko license?
To be more specific, I'm working for a company which is looking to GPL our software. We are using a couple of Apache libraries, which are under the Apache license.
Therefore we include a GPL "special exception" which allows linking with software licensed under "The Apache License".
Since CyberNeko is under an Apache-style license, but not "The" Apache License, this special exception would not include the CyberNeko License, right?
Re: Apache license or Cyberneko license?
> Which is it? The Apache license or the
> Cyberneko license? My GPL project with
> special exception for the Apache license
> can't use the Cyberneko license, right?
The CyberNeko license is an Apache-style license. In other words, the wording is exactly the same but the project is not associated with the Apache Software Foundation. So you can use NekoHTML with the same freedom that you use Apache-based software.
Apache license or Cyberneko license?
Which is it? The Apache license or the Cyberneko license? My GPL project with special exception for the Apache license can't use the Cyberneko license, right?