Esolang:Wiki dumps

A dump of the wiki's content in MediaWiki's XML format is available. It is updated every Sunday at around 4:00 a.m. London time (GMT or BST, depending on the time of year).

You can download the compressed dump directly, but if you plan to update the dump regularly, zsync is strongly recommended: it updates the dump incrementally, which reduces the bandwidth load on this server. On Debian-derived systems (such as Ubuntu), you can install zsync with apt-get install zsync (run this as root, e.g. with sudo on Ubuntu). The following command will download or update the esolang.xml file in the current directory:

$ zsync http://esolangs.org/dump/esolang.xml.zsync

Downloading these dumps lets you easily access all the content (including page history and files), allowing you to back up the wiki in case of future failure, perform bulk analysis on the data, or set up a mirror.
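For a quick look at what the dump contains, plain text tools go a long way. MediaWiki escapes markup inside page text (a literal <page> in an article becomes &lt;page&gt;), so the real element tags can be matched directly. This is a minimal sketch that assumes the usual export layout, with each <page>, <revision> and <title> tag starting on its own line; the function names are made up for this example.

```sh
#!/bin/sh
# Rough statistics on a MediaWiki XML dump using grep and sed.
# Assumes each <page>, <revision> and <title> tag starts on its own line,
# as in MediaWiki's export format. Function names are hypothetical.

# Number of pages in the dump.
count_pages() {
    grep -c '<page>' "$1"
}

# Number of revisions (history entries) across all pages.
count_revisions() {
    grep -c '<revision>' "$1"
}

# Page titles, one per line. XML entities such as &amp; are left as-is.
list_titles() {
    grep -o '<title>[^<]*</title>' "$1" | sed -e 's/^<title>//' -e 's/<\/title>$//'
}
```

For example, count_pages esolang.xml prints the number of pages, and list_titles esolang.xml | sort lists every title alphabetically. For anything more involved, a real XML parser is the safer choice.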

Verifying authenticity

Unfortunately, the standard zsync client does not support HTTPS. If you want to make sure the file you downloaded is authentic, you have two options:

  1. Use zsync-curl, which replaces the HTTP client with libcurl, allowing for HTTPS support.
  2. Download the data over HTTP, and verify a checksum downloaded over HTTPS. The esolang.xml.sha256.txt file contains the SHA-256 checksum for the most recent dump. You can validate it by running curl -s https://esolangs.org/dump/esolang.xml.sha256.txt | sha256sum -c in the directory you downloaded the esolang.xml file to.
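The sha256sum -c one-liner has to be run next to the dump because it looks the file up by the name recorded in the checksum file. If you keep the dump elsewhere, a small shell function can compare the digests directly instead. This is a sketch that assumes the checksum file uses the usual sha256sum format (digest first, then the file name) and that GNU sha256sum is installed; verify_dump is a hypothetical name.

```sh
#!/bin/sh
# Check a dump against a sha256sum-style checksum line read from standard
# input, so the dump can live in any directory.
# Assumes the first field of the first line is the hex digest.

verify_dump() {
    dump=$1
    # Expected digest: first whitespace-separated field of the first line.
    expected=$(awk 'NR == 1 {print $1}')
    # sha256sum prints "<digest>  <file>"; keep only the digest.
    actual=$(sha256sum "$dump" | awk '{print $1}')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $dump"
        return 0
    fi
    echo "MISMATCH: $dump" >&2
    return 1
}
```

Usage: curl -s https://esolangs.org/dump/esolang.xml.sha256.txt | verify_dump /path/to/esolang.xml. An empty or unreadable checksum counts as a mismatch, so a failed download is never reported as OK.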

Keeping updated with cron

If you're on a Unix-like operating system, you can create a crontab file (e.g. with crontab -e) containing a line like this:

30 6 * * 0 zsync -o /path/to/esolang.xml http://esolangs.org/dump/esolang.xml.zsync

Adjust the first two fields (minute and hour), and possibly the fifth (day of week, 0 = Sunday), according to your time zone, and change the path to wherever you want to keep the dump (for example, /home/elliott/esolang.xml). Also consider whether monthly rather than weekly copies would be enough for you.
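Since the dump is regenerated at around 4:00 a.m. London time, one way to pick those fields is to convert a London time a little after that into your own time zone. The sketch below assumes GNU date (it relies on the TZ="..." prefix inside -d and on the %-H and %-M format flags); london_to_cron_fields is a hypothetical helper, and 06:30 is simply the time from the example line.

```sh
#!/bin/sh
# Convert a London wall-clock time on a given Sunday into the first five
# crontab fields ("minute hour * * day-of-week") for the local time zone.
# Assumes GNU date. The helper name and defaults are hypothetical.

london_to_cron_fields() {
    when=${1:-06:30}          # London time to run at
    sunday=${2:-2024-01-07}   # any Sunday; 2024-01-07 falls in GMT
    # %w gives the local day of week (0 = Sunday), which may differ from
    # Sunday if the conversion crosses midnight.
    date -d "TZ=\"Europe/London\" $sunday $when" '+%-M %-H * * %w'
}
```

For instance, london_to_cron_fields 06:30 2024-06-02 does the same conversion for a summer Sunday (BST). If your time zone changes its clocks on different dates from the UK, or not at all, the winter and summer results can differ, so you may want to use the later of the two.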

The example line will incrementally update your copy of the dump every Sunday at 6:30 a.m. your local time. Each run should download much less than the roughly 300 megabytes that the full compressed dump takes up. Note that the dump is stored uncompressed on your computer, which takes up around 1.4 gigabytes at the time of writing.

This method works best on a server; if your computer is not always on at 6:30 a.m. (or whatever time you have set), you can look into using something like anacron to schedule the updates instead.
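For example, assuming a system-wide /etc/anacrontab (the file anacron reads by default, edited as root), an entry like the following runs the update once every 7 days, 10 minutes after anacron starts. The fields are the period in days, the delay in minutes, a job identifier (esolang-dump is an arbitrary name chosen here), and the command; the path is a placeholder, as in the cron example. Because anacron does not run jobs at a fixed time, the update may happen on any day of the week, which just means you get whichever dump is newest.

```
# period(days)  delay(minutes)  job-identifier  command
7               10              esolang-dump    zsync -o /path/to/esolang.xml http://esolangs.org/dump/esolang.xml.zsync
```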

Validating the checksum automatically before overwriting the file in this approach is left as an exercise for the reader.
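One possible approach is sketched below, as a starting point rather than a tested recipe. It assumes zsync (including its -i option for extra seed files), curl and GNU sha256sum are installed; update_dump, the .new working file and the script name used afterwards are all hypothetical. It downloads into a separate working file, seeds zsync with the existing dump so the update stays incremental, and only renames the new file over the old one once its digest matches esolang.xml.sha256.txt.

```sh
#!/bin/sh
# Update a dump via a working copy and replace the real file only after
# its SHA-256 digest matches the published checksum.

update_dump() {
    dest=$1                  # e.g. /path/to/esolang.xml
    work=$dest.new           # hypothetical working copy next to it
    zsync_url=http://esolangs.org/dump/esolang.xml.zsync
    sum_url=https://esolangs.org/dump/esolang.xml.sha256.txt

    # Use the current dump (if any) as seed data so only changes are fetched.
    if [ -f "$dest" ]; then
        zsync -i "$dest" -o "$work" "$zsync_url" || return 1
    else
        zsync -o "$work" "$zsync_url" || return 1
    fi

    # Checksum over HTTPS; first field of the first line is the digest.
    expected=$(curl -fsS "$sum_url" | awk 'NR == 1 {print $1}')
    actual=$(sha256sum "$work" | awk '{print $1}')
    if [ -z "$expected" ] || [ "$expected" != "$actual" ]; then
        echo "update_dump: checksum mismatch, keeping $dest" >&2
        return 1
    fi

    # Same directory, so this is a rename: the old dump is replaced in one step.
    mv "$work" "$dest"
}

# Run only when given a path, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    update_dump "$1"
fi
```

Saved as, say, /path/to/update-esolang-dump.sh, it could take the place of the zsync command in the crontab line: 30 6 * * 0 sh /path/to/update-esolang-dump.sh /path/to/esolang.xml. If the checksum file changes between the zsync run and the curl request, the check fails and the old dump is kept until the next run.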