urlwatch is a script that watches URLs and notifies you by email when their content changes. Each notification includes the URL that changed and a unified diff of the change. The script runs out of a single directory, so nothing needs to be installed, and its state files are kept in that same directory. Parts of a page that change on every request can be stripped with a filter hook function. It is typically run as a cron job.
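
As a rough illustration of the filter-then-diff idea described above, the following sketch strips a volatile line before comparing two snapshots. The function names, the hook signature, and the "Generated at" pattern are assumptions made for this example and are not taken from urlwatch itself.

```python
import difflib
import re


# Hypothetical filter hook: name and signature are assumptions for
# illustration, not urlwatch's documented interface.
def strip_volatile(url, data):
    """Drop lines that change on every fetch (here: a 'Generated at' stamp)."""
    kept = [line for line in data.splitlines()
            if not re.search(r"Generated at", line)]
    return "\n".join(kept)


def report_change(url, old, new):
    """Return a unified diff of the filtered contents, or '' if unchanged."""
    old_lines = strip_volatile(url, old).splitlines()
    new_lines = strip_volatile(url, new).splitlines()
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile=url + " (old)",
                                tofile=url + " (new)",
                                lineterm="")
    return "\n".join(diff)
```

With this filter, two snapshots that differ only in the timestamp line produce an empty diff, so no notification would be needed.
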
| Tags | Internet, Web, Dynamic Content, Indexing/Search, Site Management, Link Checking, Text Processing, Filters, Markup |
|---|---|
| Operating Systems | OS Independent |
| Implementation | Python |
Recent releases


Changes: This version now allows you to convert HTML of Web pages to plain text using either Lynx (via "-dump"), html2text, or simply by stripping all HTML tags via a regular expression. This feature has to be enabled on a per-URL basis in the user-defined hooks.
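
The simplest of the three approaches, stripping tags with a regular expression, might look like the sketch below. The function name and the pattern are assumptions for illustration and are not taken from the release itself.

```python
import re

# Matches anything that looks like an HTML tag: '<', then non-'>' chars, then '>'.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup):
    """Remove everything that looks like an HTML tag; leave the text as-is."""
    return TAG_RE.sub("", markup)


print(strip_tags("<p>Hello <b>world</b></p>"))
```

A regular expression of this kind does not decode entities or handle a `>` inside an attribute value, which is one reason a tool such as Lynx or html2text may give cleaner output.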


Changes: This release adds support for Python 2.6 and above by using the hashlib module instead of the (deprecated) sha module for generating hashes. Python versions before 2.5 are still supported and will use the sha module for generating hashes, just like the previous versions.
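
The fallback described above is commonly written as a guarded import. The sketch below shows the general pattern; the helper name `sha1_hex` is hypothetical, and the choice of SHA-1 is an assumption based on the name of the legacy `sha` module.

```python
try:
    import hashlib  # part of the standard library since Python 2.5

    def sha1_hex(data):
        """Return the hex SHA-1 digest of a byte string using hashlib."""
        return hashlib.sha1(data).hexdigest()
except ImportError:
    import sha  # legacy module, reached only on interpreters without hashlib

    def sha1_hex(data):
        """Return the hex SHA-1 digest of a byte string using the sha module."""
        return sha.new(data).hexdigest()
```

Callers use `sha1_hex` without needing to know which module provided it.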


Changes: Support for system-wide installation was added. The ~/.urlwatch/ directory is used for user settings. The BSD license is used. A setup.py script was added. Command-line options and verbose logging mode were added. Example files are copied on first start. A Unix manual page was added.


Changes: This release adds support for cleaning bad HTML (long lines, etc.) with python-utidylib (W3C's HTMLTidy) and adds a module and support for converting iCalendar (*.ics) files to plaintext for easy-to-use iCalendar watching.
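
To show why a plaintext rendering makes iCalendar files easier to diff, here is a minimal sketch that reduces each event to one line. It is not the module shipped with the release: the function name and output format are assumptions, and it ignores folded lines and most of the iCalendar format.

```python
def ics_to_text(ics):
    """Render each VEVENT as 'DTSTART: SUMMARY', one event per line."""
    events = []
    current = None  # properties of the event being read, or None outside one
    for raw in ics.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append("%s: %s" % (current.get("DTSTART", "?"),
                                      current.get("SUMMARY", "")))
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ';TZID=...' from the property name.
            current[name.split(";", 1)[0].upper()] = value
    return "\n".join(events)
```

Because each event becomes a single stable line, adding or moving an event shows up as a one-line change in a unified diff.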


Changes: This version adds support for sending a correct User-agent header to the remote HTTP server.
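
Setting a User-agent header on an outgoing request can be sketched as follows. This uses Python 3's `urllib.request` rather than whatever the release itself used, and the header value and function name are placeholders invented for the example.

```python
import urllib.request

# Hypothetical identifier; the string actually sent by urlwatch is not given here.
USER_AGENT = "urlwatch-example/0.0"


def build_request(url):
    """Create a request object that carries an explicit User-agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("http://example.com/")
print(req.get_header("User-agent"))
```

The request is only constructed here, not sent; passing it to `urllib.request.urlopen` would perform the fetch.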