CyberNeko HTML Parser

NekoHTML is a simple HTML scanner and tag balancer that enables Java application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML is written using the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation. This enables application programmers to use the NekoHTML parser with existing XNI tools without modifying or rewriting code.

Tags: Text Processing, Markup, HTML/XHTML, XML
Licenses: Apache
Implementation: Java

Recent releases

  • 24 Jan 2008 01:54

Changes: A charset regression was fixed.

  • 15 Dec 2007 03:50

Changes: The license was changed to Apache 2.0 and the version number was boosted to reflect the maturity of the project. Project files were reorganized to decouple them from the rest of the CyberNeko Tools for XNI. xercesMinimal.jar and source were updated so that NekoHTML compiles using Xerces-J 2.9.1. The default behavior was changed to not normalize attribute values and a new feature was added to allow users to turn on normalization. The build was modified to target compilation for Java 1.3. Suggested paragraph tag balancing was adjusted and various reported bugs were fixed.

  • 19 Jun 2005 03:23

Changes: A feature to allow a scanner to fix character entity references for Microsoft Windows characters was added. The nekohtmlXni.jar file is no longer built by default. Tag-balancing was changed to allow headers inside of links. Handling of the blockquote tag, a tag-balancing bug for unknown elements, the mapping of the encoding name in meta tags, various namespace binding bugs, and a no-such-method exception when using the augmentations feature with older versions of Xerces2 were fixed.
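To illustrate the kind of problem the Windows character feature addresses: some authoring tools emit numeric references such as `&#146;` that use Windows-1252 byte values (128 to 159) where Unicode code points are expected. The following is a minimal, standalone sketch of one way to remap such references. It is not NekoHTML's implementation and does not use its API; the class name `WindowsEntityFix` and the method `fix` are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NekoHTML: rewrites decimal character
// references in the range 128-159 by decoding them as Windows-1252 bytes.
class WindowsEntityFix {
    private static final Pattern NUMERIC_REF = Pattern.compile("&#(\\d+);");
    private static final Charset CP1252 = Charset.forName("windows-1252");

    static String fix(String html) {
        Matcher m = NUMERIC_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: leave the reference as-is
            int code;
            try {
                code = Integer.parseInt(m.group(1));
            } catch (NumberFormatException e) {
                code = -1; // too many digits to be a valid reference; skip it
            }
            if (code >= 0x80 && code <= 0x9F) {
                String decoded = new String(new byte[] {(byte) code}, CP1252);
                // If decoding yields the replacement character, keep the original text.
                if (decoded.charAt(0) != '\uFFFD') {
                    replacement = decoded;
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // &#146;, &#147; and &#148; are remapped; &#38; is outside the range and is kept.
        System.out.println(fix("It&#146;s &#147;quoted&#148; &#38; done"));
    }
}
```

A real scanner would apply this mapping while tokenizing character references rather than with a regular expression over the whole document; the regex here only keeps the sketch short.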

  • 18 Nov 2004 02:24

Changes: This release added features for stripping CDATA delimiters from script and style tags, made augmentations, bugfixes, and performance enhancements, and fixed some tag balancing issues.

  • 30 Jun 2004 03:52

Changes: This version implements scanning of the XML declaration, fixes a script tag scanning bug, and adds a version class and manifest entries for querying product information.

Recent comments

14 Jan 2004 05:21, kreiger

Re: Apache license or Cyberneko license?

If anyone is interested, i sent this question to licensing AT gnu DOT org:

> My special exception uses the wording "the Apache license".
> Would i have to change this special exception, and what wording would you recommend to allow for "Apache-style" licenses?

And i got this answer:

> Something like "any license with terms identical to the Apache license version 1.1 but for names" ought to do it.

22 Dec 2003 10:12, andyc2

Re: Apache license or Cyberneko license?

> To be more specific, i'm working for a
> company which is looking to GPL our
> software. We are using a couple of
> Apache libraries, which are under the
> Apache license.
> Therefore we include a GPL "special
> exception" which allows linking with
> software licensed under "The Apache
> License".
> Since CyberNeko is under an Apache-style
> license, but not "The" Apache License,
> this special exception would not include
> the CyberNeko License, right?

Here are a few relevant links for you:

http://www.gnu.org/philosophy/license-list.html
http://www.apache.org/foundation/licence-FAQ.html#GPL

Note: The CyberNeko license is based on the Apache version 1.1 license.

13 Dec 2003 04:02, kreiger

Re: Apache license or Cyberneko license?
To be more specific, i'm working for a company which is looking to GPL our software. We are using a couple of Apache libraries, which are under the Apache license.
Therefore we include a GPL "special exception" which allows linking with software licensed under "The Apache License".
Since CyberNeko is under an Apache-style license, but not "The" Apache License, this special exception would not include the CyberNeko License, right?

13 Dec 2003 01:16, andyc2

Re: Apache license or Cyberneko license?

> Which is it? The Apache license or the
> Cyberneko license? My GPL project with
> special exception for the Apache license
> can't use the Cyberneko license, right?

The CyberNeko license is an Apache-style license. In other words, the wording is exactly the same but the project is not associated with the Apache Software Foundation. So you can use NekoHTML with the same freedom that you use Apache-based software.

12 Dec 2003 16:42, kreiger

Apache license or Cyberneko license?
Which is it? The Apache license or the Cyberneko license? My GPL project with special exception for the Apache license can't use the Cyberneko license, right?
