HackEso


HackEso is an IRC bot used by the esoteric languages community in the #esoteric IRC channel on the freenode network. It runs arbitrary Linux commands in a sandbox, and any IRC user may edit those commands, so the bot is highly programmable and shaped by the community. The IRC nick of the bot is HackEso, and its commands are invoked with a backtick (`) prefix. HackEso is a reincarnation of a bot called HackEgo.

File system

/hackenv

The most important part of the file system of HackEso's sandbox is the user-writable directory /hackenv; this path is also available as the $HACKENV environment variable. The directory persists between command executions: changes accumulate and have not been reset since the bot's birth in 2009.

The directory is under version control in a Mercurial repository that tracks all its changes. There is an implicit commit after any command that modifies hackenv. The full command line and the IRC nick that issued it are recorded in the commit description. Commands have read-only access to the repository through the hg program (Mercurial), which reads the actual repository from a file system mounted read-only at /hackenv/.hg. The repository contains the history since 2012-02; the bot has always used a Mercurial repository for hackenv, but the history before that date is no longer readily accessible.

/hackenv/tmp

The directory /hackenv/tmp is the initial working directory for each command. This directory is not version controlled, but it is still user-writable and persistent between commands.

Take care when modifying both /hackenv/tmp and the rest of /hackenv from the same command. The reason is as follows. To let multiple read-only commands run in parallel while avoiding conflicts between commands that write to the file system, the bot runs a command a second time if it appears to have changed /hackenv. Before rerunning the command, the bot rolls back the changes to /hackenv, but it can do that only for the version-controlled part.

In particular, you should not try commands such as mv /hackenv/tmp/file /hackenv/bin/file. On the first run, this moves the specified file out of /hackenv/tmp and into /hackenv/bin. Then the repository cleanup removes the file from /hackenv/bin, under the assumption that the file will be created anew when the command is repeated. Unfortunately, the file no longer exists in /hackenv/tmp either, so mv fails on the second run and the file is permanently gone.
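The failure mode can be reproduced with a toy model. The sketch below is not the bot's code: it stands in two temporary directories for /hackenv/bin (rolled back between runs) and /hackenv/tmp (never rolled back), and the helper names run_with_rerun and make_env are made up for illustration. It assumes only the run, roll back, run again sequence described above.

```python
import shutil
import tempfile
from pathlib import Path

def run_with_rerun(command, versioned: Path) -> str:
    """Toy model: run, roll back only the versioned tree, then run again."""
    snapshot = Path(tempfile.mkdtemp()) / "snapshot"
    shutil.copytree(versioned, snapshot)   # state before the command
    command()                              # first run modifies the tree
    shutil.rmtree(versioned)               # the rollback covers only this tree
    shutil.copytree(snapshot, versioned)
    try:
        command()                          # second run, whose result is kept
    except OSError:
        return "failed on rerun"
    return "ok"

def make_env():
    """Create stand-ins for /hackenv/bin and /hackenv/tmp with one scratch file."""
    root = Path(tempfile.mkdtemp())
    versioned, scratch = root / "bin", root / "tmp"
    versioned.mkdir()
    scratch.mkdir()
    (scratch / "file").write_text("payload\n")
    return versioned, scratch

# Moving: the source disappears on the first run, so the rerun has nothing to move.
versioned, scratch = make_env()
mv_result = run_with_rerun(
    lambda: shutil.move(str(scratch / "file"), str(versioned / "file")), versioned)

# Copying: the source survives the first run, so the rerun succeeds.
versioned2, scratch2 = make_env()
cp_result = run_with_rerun(
    lambda: shutil.copy(str(scratch2 / "file"), str(versioned2 / "file")), versioned2)

print(mv_result, (versioned / "file").exists())
print(cp_result, (versioned2 / "file").exists())
```

In this model, copying instead of moving (the equivalent of cp rather than mv) leaves the source in place, so the repeated command can recreate the file.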

/tmp

The directory /tmp is a user-writable in-memory filesystem, local to each individual command execution. Changes to it are not persisted, but it can be used for temporary files during the execution of one command.

Everything else

The rest of the file system, such as /usr, contains a typical Debian installation with many useful programs.

Core IRC usage

Command

The main interface of HackEso works as follows. An IRC user asks HackEso to run a command by saying a backtick character ` followed by a command name, optionally followed by a space and an argument. They can say this on the #esoteric channel if they want to demonstrate something to other people, or in a private message to HackEso or on the #esoteric-blah channel if the commands would distract the main channel. If you make changes to the bot outside the #esoteric channel to avoid spam, it is customary to at least show the changes in the channel afterwards with a command that reads them, or to mention them in the channel in some other form.

The command name is generally an executable in the PATH, or a pathname to an executable. Apart from the usual directories /usr/bin:/bin, the PATH includes /hackenv/bin as its first element (so it takes priority). After the command name, the rest of the IRC line is taken as at most one command-line argument. For example, if someone says `perl -e print "foo" in the #esoteric channel, then the program /usr/bin/perl is run with the single argument -e print "foo", so the output is foo.

Sometimes you want to run a command with multiple arguments. For that, you can use one of the shell wrappers /hackenv/bin/` or /hackenv/bin/``, which run their argument as a bash shell command. As one backtick is required to invoke a command in the first place, and a space is needed before an argument, the wrappers are invoked with `` ... or ``` .... For example, if you say ``` grep "hey, hey" /hackenv/quotes in the channel, then the shell invokes grep with two arguments.
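The splitting rule can be sketched in a few lines. This is only an illustration of the rule as described here, not the bot's actual parser: the function name parse_invocation is hypothetical, and shlex.split merely approximates how bash would break the wrapper's argument into words.

```python
import shlex

def parse_invocation(line: str):
    """Split an IRC line into (command name, single argument or None)."""
    if not line.startswith("`"):
        return None                        # not addressed to the bot
    name, sep, arg = line[1:].partition(" ")
    return name, (arg if sep else None)    # everything after the first space is ONE argument

# A plain command: perl receives a single argument.
print(parse_invocation('`perl -e print "foo"'))

# A shell wrapper: the command name is `` and the shell later splits the argument.
name, arg = parse_invocation('``` grep "hey, hey" /hackenv/quotes')
print(name, shlex.split(arg))
```

The second example shows why the wrappers exist: the bot passes one argument, and only the shell turns it into grep plus two arguments.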

Output

The bot gives only one line of output for each command. This line collects what the command writes to its standard output and standard error, and generally includes as much of it as fits in an IRC line. HackEso replaces each linefeed in the output with a space, a backslash and a space ( \ ), except for linefeeds that are part of trailing whitespace. If the output contains a carriage return or nul byte, HackEso discards everything after it. HackEso sometimes adds a short prefix before the output to discourage triggering other bots. HackEso does not otherwise reformat the output, so e.g. mIRC color codes are written as is. When you give the command in a channel, HackEso writes the output to the same channel; if you give the command in a private message to HackEso, it replies to you in a private message.
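As a rough illustration of these rules, the function below flattens command output into one line. It is a sketch under assumptions: the name format_output and the 400-character limit are invented here, the real length limit depends on the IRC line, and the optional prefix is left out.

```python
def format_output(raw: str, limit: int = 400) -> str:
    """Flatten command output into one IRC line (limit is a made-up value)."""
    # Drop everything from the first carriage return or nul byte onwards.
    cut = len(raw)
    for ch in ("\r", "\0"):
        pos = raw.find(ch)
        if pos != -1:
            cut = min(cut, pos)
    text = raw[:cut].rstrip()              # trailing whitespace, including linefeeds, is dropped
    text = text.replace("\n", " \\ ")      # remaining linefeeds become space, backslash, space
    return text[:limit]                    # keep only what fits in one line

print(format_output("first\nsecond\n"))
print(format_output("kept\rdiscarded"))
```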

Sandbox environment

The bot runs commands in a sandbox, which has its own file system and permission restrictions. There is a time limit of approximately 30 seconds, after which the command is terminated, as well as some resource limits (on file size, memory use and number of processes). If the timeout is reached, the output of the command so far is still printed (as long as the command has actually written it rather than buffering it), and there is no message about the timeout. The standard input of commands is connected to /dev/null. The sandbox does not have any access to the network. Some environment variables give information about the context of the IRC command: $IRC_NICK, $IRC_IDENT, $IRC_HOST, $IRC_COMMAND, $IRC_TARGET, $IRC_MESSAGE. The environment variable $HACKENV gives the absolute path of the hackenv directory.
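A script stored in /hackenv/bin could read these variables like any other environment variables. The sketch below is hypothetical: the function names are made up, and it assumes from the variable's name that $IRC_NICK holds the nick of the person who issued the command. The exact contents of the other variables are not specified here, so the code only collects them.

```python
import os

IRC_VARS = ("IRC_NICK", "IRC_IDENT", "IRC_HOST",
            "IRC_COMMAND", "IRC_TARGET", "IRC_MESSAGE")

def irc_context(environ=os.environ) -> dict:
    """Collect the IRC-related variables, using "" for any that are unset."""
    return {name: environ.get(name, "") for name in IRC_VARS}

def greeting(environ=os.environ) -> str:
    """Greet the caller by nick (assumes IRC_NICK is the caller's nick)."""
    nick = irc_context(environ)["IRC_NICK"] or "stranger"
    return f"Hello, {nick}!"

# Outside the sandbox the variables are normally unset, so pass a fake environment.
print(greeting({"IRC_NICK": "alice"}))
```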

Builtin commands

There are also a few built-in commands that are handled partly outside the sandbox. Names of built-in commands include revert, fetch, run, help. These commands cannot be executed by scripts inside the sandbox, as they do not actually exist.

  • The revert command lets you revert hackenv to a previous revision even if you broke something so badly that you can't run commands inside; it takes a Mercurial revision number to revert to as its argument.
  • The fetch command lets you download a file from the internet over HTTP and store it in the sandboxed file system. This is necessary because normal commands run in a sandbox that can't access the internet at all. The command takes an optional filename with path (without whitespace inside) followed by an http(s) URL.
  • The run command runs a shell command. It's quite similar to the `` ... wrapper, but as a built-in command it does not rely on the contents of /hackenv/bin.
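The argument format of fetch (an optional filename, then a URL) can be sketched as follows. This illustrates the described format only and is not the built-in's real parser; parse_fetch is a hypothetical name and the example.org URL is a placeholder.

```python
def parse_fetch(arg: str):
    """Split a fetch argument into (filename or None, url)."""
    parts = arg.split()
    if len(parts) == 1:
        filename, url = None, parts[0]     # URL only
    elif len(parts) == 2:
        filename, url = parts              # filename (no whitespace inside), then URL
    else:
        raise ValueError("expected [filename] url")
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http(s) URLs are accepted")
    return filename, url

print(parse_fetch("https://example.org/data.txt"))
print(parse_fetch("/hackenv/tmp/data.txt https://example.org/data.txt"))
```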

Web interface

HackEso has a web interface that gives you read-only access to the bot. This consists of three parts.

https://hack.esolangs.org/repo/ is a view of the hackenv Mercurial repository. You can access this directly with the hg version control program, or browse the repository through a web client with any web browser. The url command is a convenience program that takes a filename as argument and gives a deep link to that file in the web client. hurl is similar but links to the file's history page.

https://hack.esolangs.org/tmp/ lets you read regular files in the /hackenv/tmp directory.

The paste convenience command is a combination of the above. If you give it a filename as argument, it gives you a URL under https://hack.esolangs.org/repo/ for version-controlled files or under https://hack.esolangs.org/tmp/ for files under /hackenv/tmp. If you invoke it without an argument, it reads stdin, writes it to a temporary file under /hackenv/tmp, and gives you the URL for that.

Finally, https://hack.esolangs.org/edit/ is a textarea interface that lets you read the contents of a regular file in /hackenv, even if it's under /hackenv/tmp. It also lets you edit the contents of files, but it doesn't write directly to the HackEso file system; instead it writes to a temporary location under https://hack.esolangs.org/get/, from which you can then copy to the file with the fetch command. This service might be buggy if the new contents of the file are not pure ASCII. To use this interface, form the URL by appending the pathname of the file relative to /hackenv to the base URL https://hack.esolangs.org/edit/, e.g. https://hack.esolangs.org/edit/quotes to edit the quotes file. The edit convenience command gives you the URL from a filename.
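The URL rule for the edit interface is simple enough to sketch. The function below is an illustration, not the source of the edit command: edit_url is a hypothetical name, and percent-encoding the path with urllib.parse.quote is an assumption made here for safety rather than documented behaviour of the service.

```python
import posixpath
from urllib.parse import quote

HACKENV = "/hackenv"
EDIT_BASE = "https://hack.esolangs.org/edit/"

def edit_url(path: str) -> str:
    """Build the edit URL for an absolute path inside /hackenv."""
    rel = posixpath.relpath(posixpath.normpath(path), HACKENV)
    if rel in (".", "..") or rel.startswith("../"):
        raise ValueError(f"{path} is not a file under {HACKENV}")
    return EDIT_BASE + quote(rel)          # quote() leaves "/" unescaped by default

print(edit_url("/hackenv/quotes"))
print(edit_url("/hackenv/tmp/notes.txt"))
```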

The wisdom and quote databases

HackEso has a knowledge database of one-line answers keyed with strings. This was probably originally intended to contain useful answers that we frequently want to show to people in the channel, but it didn't evolve that way. Instead, this database contains mostly joke entries, so much that it is now difficult to add actually useful serious entries there.

We refer to this as the wisdom database or wisdome, as it is stored in the directory /hackenv/wisdom. Each entry is stored as a single file, with the filename being its key and the single-line file contents (ending in a linefeed) being the value. The wisdom database is generally queried with the ? command, which takes the key as its argument. Wisdom keys are encoded in UTF-8 and are all lowercase, and the ? command lowercases the letters of the key. In addition, we have the wisdom and w commands, which print a random wisdom, or a random wisdom whose key contains the command argument as an infix.
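The one-file-per-key layout makes lookups easy to model. The sketch below imitates the described behaviour of ? and wisdom against a temporary directory; it is not the source of those commands, the function names are hypothetical, and the two entries are invented demo data.

```python
import random
import tempfile
from pathlib import Path

def lookup(wisdom_dir: Path, key: str):
    """Return the value stored under key (lowercased), or None if absent."""
    entry = wisdom_dir / key.lower()
    if not entry.is_file():
        return None
    return entry.read_text(encoding="utf-8").rstrip("\n")

def random_wisdom(wisdom_dir: Path, infix: str = "", rng=random):
    """Pick a random (key, value) among entries whose key contains infix."""
    keys = sorted(p.name for p in wisdom_dir.iterdir()
                  if p.is_file() and infix in p.name)
    if not keys:
        return None
    key = rng.choice(keys)
    return key, lookup(wisdom_dir, key)

# Invented demo entries in a stand-in for /hackenv/wisdom.
demo = Path(tempfile.mkdtemp())
(demo / "welcome").write_text("Welcome to the channel!\n", encoding="utf-8")
(demo / "hackeso").write_text("HackEso is a bot.\n", encoding="utf-8")

print(lookup(demo, "HackEso"))             # the key is lowercased before the lookup
print(random_wisdom(demo, "wel"))
```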

We have commands to modify the wisdom database, including slashlearn, which adds an entry and takes as its argument the key followed by a double slash followed by the value, and forget, which removes an entry. In practice, wisdom entries are often modified with more usual Unix commands too.
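The key//value argument of slashlearn can be illustrated like this. It is a sketch of the format only, with hypothetical function names; lowercasing the key on write is an assumption based on the statement that stored keys are all lowercase, and the real command may treat whitespace or repeated slashes differently.

```python
import tempfile
from pathlib import Path

def parse_slashlearn(arg: str):
    """Split 'key//value' at the first double slash."""
    key, sep, value = arg.partition("//")
    if not sep or not key:
        raise ValueError("expected key//value")
    return key.lower(), value              # assumption: keys are stored lowercase

def learn(wisdom_dir: Path, arg: str) -> Path:
    """Store the entry as a one-line file named after the key."""
    key, value = parse_slashlearn(arg)
    path = wisdom_dir / key
    path.write_text(value + "\n", encoding="utf-8")
    return path

demo_dir = Path(tempfile.mkdtemp())        # stand-in for /hackenv/wisdom
stored = learn(demo_dir, "Example//An example is a made-up entry.")
print(stored.name, repr(stored.read_text(encoding="utf-8")))
```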

There is also a second, rarely used shadow database mirroring the format of the wisdom database, but which may contain serious entries where the wisdom database has joke entries. It is known as the tomfoolery database, for it is in the directory /hackenv/tmflry. The ?? command retrieves a tomfoolery by key.

The wisdom database contains the welcome message that we use to greet new visitors on the channel at /hackenv/wisdom/welcome. There's a special command to access this: welcome, which takes the nick of the person to be greeted as its optional argument. There are also several commands that print a variant of this message, of which the most well known one is relcome. The message also has several translated versions where the wisdom key has an ISO-639 language code in its name, such as /hackenv/wisdom/welcome.nb, and commands for each of them named from the first word of the translated message, such as velkommen.

HackEso also has a heavily maintained collection of interesting lines that people say in the #esoteric channel. This is in the file /hackenv/quotes. The quotes database is usually read with the allquotes, quote and q commands, and written with the addquote and delquote commands. The quotes don't have keys. They have line numbers, but as existing quotes are often deleted, the line numbers can change.

There is also a nicely formatted PDF listing the wisdoms and quotes, but it's quite old.

Interpreters

Other useful or interesting commands

  • TODO `, ``, New Zealand locale, nooodl
  • TODO sport, 1, 2, n
  • TODO webcomic notification lists: olist, smlist, pbflist, slist
  • TODO dontaskdonttelllist
  • TODO list
  • TODO words and coins
  • TODO 8-ball
  • TODO recipe
  • TODO cards-by-name, random-card
  • TODO ctof, ftoc, toroman, fromroman
  • TODO thanks, karma

Implementation details

The basic bot framework used by HackEso is multibot, a minimal general-purpose IRC bot framework. It handles keeping an IRC connection alive, and responding to the usual PINGs. For every incoming message, it will also look for the most specific user-provided executable it could run. For example, for the message PRIVMSG #foo :!bar, it would try to execute PRIVMSG/tr_21.cmd, PRIVMSG-chan.cmd and PRIVMSG.cmd, in that order of precedence. It also provides a Unix domain socket that commands (and even unrelated programs) can use to send arbitrary messages to the IRC server.
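The lookup order in that example can be written out as a small function. Only the three filenames for PRIVMSG #foo :!bar come from the description above; everything else in this sketch is a guess, including the reading of tr_21 as the hexadecimal code of the first character of the message (0x21 is "!"), the assumption that the -chan variant applies only to targets starting with "#", and the function name itself.

```python
def handler_candidates(command: str, target: str, text: str) -> list:
    """List handler filenames from most to least specific (guessed scheme)."""
    candidates = []
    if text:
        # Guess: "tr_" plus the two-digit hex code of the leading trigger character.
        candidates.append(f"{command}/tr_{ord(text[0]):02X}.cmd")
    if target.startswith("#"):
        # Guess: the "-chan" variant applies to messages sent to a channel.
        candidates.append(f"{command}-chan.cmd")
    candidates.append(f"{command}.cmd")    # generic fallback for the IRC command
    return candidates

print(handler_candidates("PRIVMSG", "#foo", "!bar"))
```

Under the same guess, a message starting with a backtick (character code 0x60) would map to PRIVMSG/tr_60.cmd.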

The second big piece of implementation is umlbox, a sandboxing solution based on User-mode Linux, a method for running the Linux kernel as a regular user-mode executable. That functionality is part of the standard Linux kernel: umlbox adds a wrapper script and an init binary that together conspire to make it easy to run a single command, rather than a full Linux distribution.

By default, the Linux system implemented by the UML kernel has no access to the host system. However, the hostfs filesystem driver can be used to mount parts of the host environment's virtual filesystem tree into the guest's, either read-only or read-write. In addition, the UML kernel provides ways to attach serial lines to file descriptors of the hosting process, to mount files as block devices, and to support networking in various ways. The umlbox wrapper provides a single command-line argument to mount most of the important host directories (/bin, /usr and so on) in read-only mode: this way commands can be executed almost as they would be using the userland (commands and libraries) of the host system, except in a strong sandbox.

The umlbox init binary is baked into an otherwise empty initrd, which the kernel mounts into / and then executes. The binary creates a few important device nodes (/console, /ttyN and /null to use as the standard input/output file descriptors of commands, /ubda to read the configuration data from the host), then parses the configuration. All requested mounts from the host system (and the standard proc, sysfs and tmpfs file systems) are mounted under /host. Then all the specified commands are executed, chrooted in /host. For the most part this should be transparent to the commands, although they can see the original device names (like /null) when looking at /proc/self/fd/0.

The /tty1 device inside the UML kernel is always a TTY, even when connected to a non-TTY file descriptor in the host system. To make programs that change their output based on that behave the same, umlbox will pipe the command output through a cat when the external stdout is not connected to a TTY. On HackEso, this is always the case, as umlbox is being executed as a subprocess with its output connected to a pipe. Commands on HackEso can observe this cat, though normally its existence can be ignored.
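The kind of program this matters for is one that checks whether its output is a terminal. The sketch below shows such a check with os.isatty, using a pipe to stand in for the situation on HackEso where output passes through cat; the render function and the choice of a bold escape sequence are illustrative only.

```python
import os

def render(text: str, fd: int) -> str:
    """Add terminal styling only when fd refers to a TTY."""
    if os.isatty(fd):
        return f"\x1b[1m{text}\x1b[0m"     # bold on, text, reset
    return text                            # plain text for pipes and files

# A pipe is never a TTY, so the text comes back unstyled.
read_end, write_end = os.pipe()
plain = render("hello", write_end)
os.close(read_end)
os.close(write_end)
print(repr(plain))
```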

The final piece of the puzzle is hackbot, which glues the system together. It's a set of multibot commands which provide the implementation for the HackEso ` prefix, by invoking the requested command inside a umlbox, with read-write access to the version-controlled repository's working copy. The scripts also handle serializing all mutations to files, and committing the changes to the repository.

For HackEso, this entire system runs inside a systemd-nspawn container, providing some additional level of isolation through dedicated filesystem, user, process and network namespaces, should something escape from the UML sandbox. Finally, the container is hosted on top of the virtualization solution of the hosting provider (Qemu-KVM), so there are arguably four nested levels of logical Linux systems.

History

The IRC bot HackEgo was created and hosted by Gregor Richards around 2009-06, and eventually ran on a now infamous hosting provider called CaC. The hosting provider discontinued the service in 2018-03, so the bot became temporarily unavailable. In 2018-04, fizzie reincarnated the bot from a backup of the hackenv repository under the name HackEso, hosting it elsewhere together with the esolangs.org wiki, and he has been running the bot ever since. The bot is sometimes referred to as HackBot, though this may also refer to just the part of the bot that does the sandboxing, without the IRC bridge, which is handled by software called multibot.