doclifter

doclifter is a tool that transcodes {n,t,g}roff documentation to DocBook XML markup. It parses man, mandoc, ms, me, or TkMan page sources, does structural analysis, and recognizes common troff-markup cliches. The result is usable without further hand-hacking about 95% of the time.

Tags: Documentation, Text Processing, Markup, SGML, XML
License: GPL
Operating Systems: OS Independent
Implementation: Python

Recent releases

25 Dec 2006 04:21

Changes: A bug in db2man.xsl was worked around. Markus Hoenicka's requested behavior for multiple-file conversions was implemented. Translation of the groff extended .cc and .c2 requests was implemented. The .TA macro, which duplicates .ta in X.org manual pages, is now ignored. The program can now cope with unresolved .Sx references in mdoc. .Ex and .Ee are handled. The X Consortium macro preamble is now handled better. .RS/.RE is now fully handled, with no more spurious warnings.

14 Jan 2005 21:49

Changes: Interrupt handlers were refactored so that manlifter can be aborted with a single ^C. As a result, exit values 4 and 5 were swapped. In manlifter the result file is no longer removed unless running in batch mode. This release lifts 96% of the 11121 pages in a full Fedora Core 3 install.

24 Dec 2004 22:09

Changes: manlifter was added to the distribution. doclifter no longer strips off file extensions before appending .xml. Major improvements were made in the parsing of displays, and C function prototypes are now recognized in them.

13 Aug 2004 14:15

Changes: The manual date now goes in refentryinfo. Correct parsing of multi-command synopses has been restored.

03 Aug 2004 10:15

Changes: Handling of the mdoc .Brq macro was implemented. Code no longer chokes on multiple Synopsis headers.

Recent comments

18 Sep 2002 01:38 by esr

Re: Unique and powerful addition to DocBook toolchain

> The only significant problem I've run into with the
> 1.0.0 version is in the implementation it uses for dealing
> with ISO character entities: In some XML instances, it
> generates internal DTD subsets that include entity
> declarations which reference the SGML versions of the ISO
> character-entity sets instead of the XML versions.
>
> But that's a really minor issue, and one that I'm sure
> Eric will probably have fixed in the next release.

Your wish is granted. :-)

05 Sep 2002 23:06 by xmldoc

Unique and powerful addition to DocBook toolchain

This is an important addition to the DocBook toolchain.
It fills a big need and is unique in that (as far as I
know) there are no other tools available -- open-source or
proprietary -- for converting man/roff docs to DocBook.

There's some very clever logic in it for making
inferences about structure from some of the
not-that-explicitly-structured roff markup and turning it
into fairly structured DocBook markup. In particular, it
can:

* parse command/function synopses and convert them into
DocBook markup (using "real" markup like Cmdsynopsis, Arg,
Replaceable, etc.)

* recognize things like use of italics in a FILES
section to mark filenames, and convert them to correct
DocBook markup (e.g., using the Filename element)

* recognize patterns such as URLs, email addresses, man
page references, and C program listings, and convert them
to correct DocBook markup
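The reference recognition described in the last bullet can be illustrated with a minimal Python sketch. It is not doclifter's own code: the regular expressions and the function name `mark_up_references` are hypothetical, and they only cover simple cases such as `ls(1)`, bare `http`/`ftp` URLs, and plain email addresses. Matches are wrapped in DocBook 4 elements (`citerefentry`, `ulink`, `email`), and all other text is XML-escaped.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical recognizers (not doclifter's); they cover only simple, common cases.
REFERENCE = re.compile(
    r'(?P<url>\b(?:https?|ftp)://[^\s<>"]*[^\s<>".,;:)])'   # URL, minus trailing punctuation
    r'|(?P<email>\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+)'           # user@host.domain
    r'|(?P<man>\b(?P<title>[A-Za-z_][\w.+-]*)\((?P<vol>[1-9][a-z]*)\))'  # name(section)
)


def mark_up_references(text: str) -> str:
    """Return text as XML, wrapping recognized references in DocBook 4 markup."""
    out = []
    pos = 0
    for m in REFERENCE.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text between matches
        if m.group("url"):
            url = m.group("url")
            out.append(f"<ulink url={quoteattr(url)}>{escape(url)}</ulink>")
        elif m.group("email"):
            out.append(f"<email>{escape(m.group('email'))}</email>")
        else:
            # Man page reference such as ls(1): title plus manual section.
            out.append(
                "<citerefentry>"
                f"<refentrytitle>{escape(m.group('title'))}</refentrytitle>"
                f"<manvolnum>{m.group('vol')}</manvolnum>"
                "</citerefentry>"
            )
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```

For example, `mark_up_references("See ls(1).")` yields `See <citerefentry><refentrytitle>ls</refentrytitle><manvolnum>1</manvolnum></citerefentry>.`. Scanning with `finditer` and escaping the gaps separately keeps characters like `<` and `&` in the surrounding text from leaking into a URL match.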

The only significant problem I've run into with the
1.0.0 version is in the implementation it uses for dealing
with ISO character entities: In some XML instances, it
generates internal DTD subsets that include entity
declarations which reference the SGML versions of the ISO
character-entity sets instead of the XML versions.

A workaround is simply to delete any ISO character
entity declarations from doclifter-generated XML documents.
The declarations are actually redundant at best, because
both the DocBook XML and SGML DTDs already reference the
appropriate sets.
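The workaround above amounts to a text edit on the document prolog. The Python sketch below assumes the offending declarations take the form `<!ENTITY % name PUBLIC "ISO 8879:1986//ENTITIES ..." ...>`, followed by a `%name;` reference inside the DOCTYPE internal subset. That shape, and the helper name `strip_iso_entity_decls`, are assumptions, since the exact output of doclifter 1.0.0 is not shown here. It uses regular expressions rather than a real DTD parser, so treat it as a quick cleanup aid only.

```python
import re

# Assumed declaration shape: <!ENTITY % name PUBLIC "ISO 8879:1986//ENTITIES ..." ...>
# Matches both the SGML (".../EN") and XML (".../EN//XML") public identifiers.
ISO_DECL = re.compile(
    r'<!ENTITY\s+%\s+(?P<name>[\w.-]+)\s+PUBLIC\s+'
    r'"ISO 8879:1986//ENTITIES[^"]*"[^>]*>'
)
# DOCTYPE head, internal subset between [ and ], and the closing >.
SUBSET = re.compile(r'(<!DOCTYPE[^\[>]*)\[(.*?)\](\s*>)', re.DOTALL)


def strip_iso_entity_decls(doc: str) -> str:
    """Remove ISO entity-set declarations (and their %refs;) from the internal subset."""
    m = SUBSET.search(doc)
    if m is None:
        return doc  # no internal subset, nothing to remove
    subset = m.group(2)
    names = [d.group("name") for d in ISO_DECL.finditer(subset)]
    subset = ISO_DECL.sub("", subset)
    for name in names:
        # Drop the parameter-entity reference that pulled the set in.
        subset = subset.replace("%" + name + ";", "")
    if subset.strip():
        new = m.group(1) + "[" + subset + "]" + m.group(3)
    else:
        # Subset is now empty: emit a plain DOCTYPE without brackets.
        new = m.group(1).rstrip() + m.group(3).lstrip()
    return doc[:m.start()] + new + doc[m.end():]
```

If the internal subset ends up empty, the brackets are dropped as well, leaving a plain DOCTYPE declaration that relies on the DocBook DTD's own references to the entity sets.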

But that's a really minor issue, and one that I'm sure
Eric will probably have fixed in the next release.
