
Hachoir metadata

Hachoir metadata can extract metadata from archives (bzip2, gzip, zip, tar), audio (MPEG audio/MP3, WAV, Sun/NeXT audio, Ogg/Vorbis, MIDI, AIFF, AIFC, Real Audio), images (BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF), and video (ASF/WMV, AVI, Matroska, QuickTime, Ogg/Theora, Real Media). It tolerates invalid or truncated files and handles Unicode text. It can remove duplicate values and filter metadata according to priority.

Tags: multimedia
Licenses: GPL
Operating Systems: OS Independent
Implementation: Python


Recent releases

  • 03 Sep 2008 14:47

Changes: The --maxlen option was created for the hachoir-metadata program. --maxlen=0 disables the arbitrary string length limit. A FLAC metadata extractor was created. Multiple comments in a GIF image are supported.
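The length limit described above can be pictured with a short sketch. This is not hachoir-metadata's actual code: the function name `truncate_value`, the `"(...)"` marker, and the default of 800 are illustrative assumptions (800 echoes the limit cited in the 24 Jan 2007 notes on this page). Only the rule that a limit of 0 disables truncation comes from the release notes.

```python
def truncate_value(text, max_length=800):
    """Hypothetical helper: shorten text to at most max_length characters.

    A max_length of 0 means "no limit", mirroring --maxlen=0.
    The "(...)" suffix is an assumption made for this sketch.
    """
    if max_length and len(text) > max_length:
        return text[:max_length] + "(...)"
    return text
```

Calling `truncate_value(value, 0)` returns the value unchanged whatever its length, which is the behaviour the option is described as enabling.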

  • 11 Jul 2007 16:33

Changes: This release reads the number of channels, bit rate, and sample rate, and computes the compression rate of Real Audio. It reads user comments of JPEG pictures. It computes the frame rate of Windows ANI. It normalizes language for ID3 and MKV. OLE2 and FLV extractors are now fault tolerant.

  • 14 Apr 2007 23:26

Changes: hachoir-metadata is now fault tolerant (like hachoir-core and hachoir-parser). It is also robust against fuzzing tests. New supported formats include Microsoft Office documents such as Word (.doc), Excel (.xls), and PowerPoint (.ppt), X11 Portable Compiled Font (.pcf), New-style Executable (Windows 16-bit program), and Microsoft Archive (.mar). A distinction is made between the raw value and the formatted value. Plugins were added for Nautilus and Konqueror.

  • 24 Jan 2007 04:42

Changes: Very long strings (more than 800 characters) are truncated. setup.py uses distutils by default (and not setuptools). The package no longer depends on hachoir-core or hachoir-parser. Duration computation for AIFF is skipped if the rate is null. KeyError for XCF is now caught. For JPEG (EXIF), exposure is only converted to "1/%g" if the value is a float. The WAVE extractor was rewritten, which fixes bit rate and duration computation. WAVE files with 6 channels and the IEEE (32-bit float) codec are supported. Division by zero in AVI duration computation is now avoided. For Matroska, strings are converted to Unicode if needed.
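Two of the fixes above follow a common defensive pattern that a small sketch may make clearer. The function names are hypothetical and this is not the project's code. In particular, treating the "1/%g" form as the reciprocal of an exposure time in seconds is an assumption, as is the extra check for positive values.

```python
def format_exposure(value):
    """Hypothetical: render a float exposure time as "1/N".

    Assumes the value is a duration in seconds and that "1/%g" is
    filled with its reciprocal. Non-float values are returned as-is.
    """
    if isinstance(value, float) and value > 0:
        return "1/%g" % (1 / value)
    return value


def duration_seconds(sample_count, rate):
    """Hypothetical: compute a duration, skipping a null or zero rate."""
    if not rate:
        return None  # no usable rate, so no duration and no division by zero
    return sample_count / rate
```

Returning `None` rather than raising lets a caller drop the missing field and keep the rest of the metadata, which fits the fault-tolerant behaviour described in the later releases.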

Changes: The following new formats are supported: Aldus Placeable Metafile (APM) picture, Audio Interchange File Format (AIFF), Audio Interchange File Format Compressed (AIFC), Microsoft Enhanced Metafile (EMF) picture, Microsoft Windows Metafile (WMF) picture, Real Audio, Real Media, and Targa picture. For strings, spaces are stripped and then empty strings are skipped. For ICO, 8 bits/pixel are used if bpp=0. For JPEG, the format version is "JFIF %u.%02u" (and not "JPEG %s.%s"). JPEG quality is not computed if needed fields are missing. The duration of each RIFF stream is computed since an audio stream may be shorter than a video stream.
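The string cleanup and the JFIF version format mentioned above can be illustrated as follows. The helper names are made up for this sketch, and only the "strip spaces, then skip empty strings" rule and the "JFIF %u.%02u" format string are taken from the notes.

```python
def clean_strings(values):
    """Hypothetical: strip spaces, then skip strings that end up empty."""
    cleaned = []
    for value in values:
        stripped = value.strip(" ")
        if stripped:
            cleaned.append(stripped)
    return cleaned


def jfif_version(major, minor):
    """Format a version with the pattern quoted in the release notes."""
    return "JFIF %u.%02u" % (major, minor)
```

With this format, a file declaring major version 1 and minor version 2 is reported as "JFIF 1.02", since the minor number is zero-padded to two digits.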
