Microsoft Word 2002 Unmunger

The Word Unmunger is a small Python program that strips much of the HTML cruft produced by Microsoft Word 2002 (Word version 10), making the files far easier to edit by hand. It removes XML namespace declarations, smart tags, meta tags, HTML comments, style sheets, DIVs, the Microsoft Office file list, CSS classes, and Microsoft Office grammar and spelling error markers.
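
As a rough illustration of this kind of cleanup, here is a minimal sketch of a regex rule table covering a few of the categories listed above. The patterns, the RULES table, and the unmunge function are assumptions made for this example; they are not taken from the Unmunger's source, and the real rules may differ considerably.

```python
import re

# Hypothetical rule table: (description, compiled pattern, replacement).
# The patterns are illustrative guesses, not the Unmunger's actual rules.
RULES = [
    # Style sheets go first, because Word wraps their content in a comment.
    ("style sheets", re.compile(r"<style\b.*?</style>", re.I | re.S), ""),
    ("HTML comments", re.compile(r"<!--.*?-->", re.S), ""),
    ("meta tags", re.compile(r"<meta\b[^>]*>", re.I), ""),
    ("XML namespace declarations",
     re.compile(r'\s+xmlns(?::\w+)?="[^"]*"', re.I), ""),
    # Naive: would also match ' class=...' appearing in body text.
    ("CSS classes", re.compile(r'\s+class=(?:"[^"]*"|\w+)', re.I), ""),
    # Assumes smart tags use the st1: prefix; the tag is dropped, its text kept.
    ("smart tags", re.compile(r"</?st1:[^>]*>", re.I), ""),
    ("DIVs", re.compile(r"</?div\b[^>]*>", re.I), ""),
]


def unmunge(html):
    """Apply every rule in order and return the cleaned markup."""
    for _description, pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    sample = (
        '<html xmlns:o="urn:schemas-microsoft-com:office:office"><head>'
        '<meta name=Generator content="Microsoft Word 10">'
        "<style><!-- p.MsoNormal {margin:0} --></style></head>"
        "<body><div class=Section1><p class=MsoNormal>"
        "<st1:place>Paris</st1:place></p></div></body></html>"
    )
    print(unmunge(sample))
```

Keeping the rules in a flat list means a new cleanup step is a single extra entry, at the cost of the usual fragility of processing HTML with regular expressions.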

Tags: Text Processing
Licenses: MIT/X
Implementation: Python


Recent releases

11 Mar 2003 01:11

Changes: The program no longer crashes on larger documents. The crashes were caused by limitations in Python's default regular expression implementation (sre); the pre implementation is now used instead. A debug mode that prints regular expressions as they are used was added, along with more robust handling of command line arguments.

21 Dec 2002 22:53

Changes: Based on a request from a user, Word Unmunger now features a batch mode for automatic processing of several files at once. The code has also been cleaned up to allow new unmunging rules to be added more easily.
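
A batch mode of this kind can be sketched as a loop over input paths. The function name unmunge_files, the transform callable, the ".clean" output suffix, and the encoding default are all assumptions for illustration; the listing does not describe how the actual batch mode names its output or which options it takes.

```python
def unmunge_files(paths, transform, suffix=".clean", encoding="utf-8"):
    """Clean each file in `paths` and write the result next to the original.

    `transform` is any callable that maps an HTML string to a cleaned string.
    The suffix and encoding are assumed defaults for this sketch; Word output
    is often in a Windows code page, so the encoding may need changing.
    """
    written = []
    for path in paths:
        with open(path, encoding=encoding) as source:
            cleaned = transform(source.read())
        out_path = path + suffix
        with open(out_path, "w", encoding=encoding) as target:
            target.write(cleaned)
        written.append(out_path)
    return written
```

Writing to a separate file instead of overwriting the input keeps the original Word export available if a rule turns out to be too aggressive.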

01 Dec 2002 12:43

Changes: This release adds a new filter for files exported from Word X for Macintosh. Word X puts in a large number of <![ ... ]> tags for conditionals. These are now removed.
