SILVERCODERS DocToText is a powerful utility that converts MS Word binary format (DOC), Rich Text Format (RTF), OpenDocument (also known as ODF and ISO/IEC 26300), and Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML, or MSOOXML) documents to plain text. Text extracted from doc, rtf, odt, ods, odp, docx, xlsx, and pptx files can then be used for searching, indexing, or archiving. DocToText can also be used as a fast console viewer.
| Tags | Utilities, Text Processing, Archiving, Office/Business |
|---|---|
| Licenses | GPL |
| Operating Systems | Unix, POSIX, Linux, Windows |
| Implementation | C++ |
Recent releases


Changes: In addition to bugfixes and optimizations, Office Open XML (ISO/IEC 29500, also called OOXML, OpenXML, or MSOOXML) documents are supported.


Changes: Support for ODT (OpenDocument) documents was added. Fixes were made in RTF format support.


Changes: Support for RTF documents was added.


No changes have been submitted for this release.
Recent comments
Re: prior art
> Thanks for both the utility and
> description update then; will have a
> look :-)
There is one more thing: catdoc has not been actively developed since 2005. Doctotext was started in 2006 and will gain new features, such as PDF support. You can consider it a future replacement.
We could have tried to add something to catdoc, but we started a new project because of licensing issues (we need to use doctotext in our commercial software).
Re: prior art
Thanks for both the utility and description update then; will have a look :-)
Re: prior art
> ...is it much better than catdoc(1)? :)
It supports more formats (OpenDocument, Office Open XML) and, as far as I know, handles some problematic DOC documents better.
prior art
...is it much better than catdoc(1)? :)