sed


sed is a UNIX utility that performs text transformations on an input stream, using a simple programming language. sed was developed by Lee E. McMahon and introduced in 1974. At the time, it was one of the first tools to support regular expressions.

This article refers to sed in the context of esoteric programming. For more general information, please visit the Wikipedia article on sed.

Mode of operation

sed is line-oriented: it reads text from an input stream, line by line, into an internal buffer called the pattern space. Each line read this way starts a cycle. sed applies to the pattern space one or more operations specified in a script, which is written in a programming language with about 30 commands. After running the script on an input line, sed ordinarily outputs the pattern space as modified by the script and begins the cycle again with the next line. This behavior can be changed by certain sed options and commands.
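The effect of the automatic print at the end of each cycle can be seen in a minimal sketch like the one below. It assumes GNU sed and uses a three-line input invented purely for illustration.

```shell
# Hypothetical three-line input, used only for illustration.
input=$(printf 'one\ntwo\nthree')

# p prints the pattern space, and the end of each cycle prints it again,
# so every line appears twice.
doubled=$(printf '%s\n' "$input" | sed 'p')

# With -n the automatic print is suppressed; only the explicit p remains.
once=$(printf '%s\n' "$input" | sed -n 'p')

printf '%s\n---\n%s\n' "$doubled" "$once"
```

The second command reproduces the input unchanged, which shows that the duplicate lines in the first command come from the automatic print and not from p itself.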

A separate special buffer, the hold space, may be used by a few sed commands to store and accumulate text between cycles. Although sed doesn't support variables, the pattern space (cycle lifetime) and the hold space (execution lifetime) are the effective equivalent for storage. This, combined with the power of regular expressions and goto-like branching, enables one to write fairly complex sed programs.
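As a small illustration of the hold space accumulating text between cycles, the following sketch reverses the order of its input lines. It assumes GNU sed; the three input lines are made up for the example.

```shell
# 1!G : on every line except the first, append the hold space to the pattern space
# h   : copy the pattern space (all lines so far, newest first) to the hold space
# $p  : on the last line, print the accumulated text
reversed=$(printf 'a\nb\nc\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

The pattern space is rebuilt on every cycle, while the hold space carries the lines seen so far from one cycle to the next.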

Synopsis

Today sed is available on most operating systems. Although multiple variants of sed exist, the rest of this article is focused primarily on GNU sed.

Usage: sed [option]... {script} [input_file]...

Please read the GNU sed manual for comprehensive details on how GNU sed works and which options, commands and regular expressions it supports.

Frequently used command line options

| Option | Description |
|---|---|
| -e commands | add sed commands to the script to be executed |
| -f file | add file content to the script to be executed |
| -i[suffix] | edit file(s) in place (make backup if suffix is given) |
| -n | suppress automatic printing of pattern space |
| -r | use extended regular expressions in the script |
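The difference made by -r can be sketched with a single substitution written twice. The example assumes GNU sed, and the input string is arbitrary.

```shell
# Without -r, grouping needs escaped parentheses (basic regular expressions).
basic=$(echo 'hello world' | sed -e 's/\(hello\) \(world\)/\2 \1/')

# With -r, the same grouping uses bare parentheses (extended regular expressions).
extended=$(echo 'hello world' | sed -r -e 's/(hello) (world)/\2 \1/')

printf '%s\n%s\n' "$basic" "$extended"
```

Both commands swap the two words; only the regular expression syntax differs.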

Frequently used script commands

A sed script consists of a sequence of commands. Each command is executed on every cycle unless it is preceded by an address, in which case sed executes it only when the current line matches that address. An address can be a line number (3: third line, $: last line), a regular expression (/regex/: every line that matches), or a range (2,7: lines 2 to 7 inclusive); more elaborate forms are described in the manual.
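The address forms listed above can be tried out with the p command under -n, so that only the addressed lines are printed. This is a sketch assuming GNU sed; the five-line input is invented for the example.

```shell
# Hypothetical five-line input, used only for illustration.
input=$(printf 'alpha\nbeta\ngamma\ndelta\nepsilon')

third=$(printf '%s\n' "$input" | sed -n '3p')        # line number
last=$(printf '%s\n' "$input" | sed -n '$p')         # last line
matched=$(printf '%s\n' "$input" | sed -n '/ta$/p')  # lines ending in "ta"
range=$(printf '%s\n' "$input" | sed -n '2,4p')      # lines 2 to 4 inclusive

printf '%s\n' "$third" "$last" "$matched" "$range"
```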

| Command | Description | Line address | Can use regex |
|---|---|---|---|
| s/regex/replacement/flags | Attempt to match regex against the pattern space. If successful, replace the portion matched with replacement. Special replacement symbols: & (the pattern space portion matched), \1..\9 (the corresponding matching sub-expressions in the regex). Modifier flags (see manual): g, p, i, m, etc. | none, one or range | yes |
| p / P | Print the pattern space. / Print the pattern space up to the first newline. | none, one or range | no |
| d / D | Delete the pattern space and start a new cycle. / Delete the pattern space up to the first newline, then restart the cycle with the resulting pattern space (without reading a new line). | none, one or range | no |
| a text / i text | Queue text for output at the end of the cycle, after the pattern space. / Output text immediately, before the pattern space is printed. | none or one | only escaped chars |
| c text | Delete the pattern space, print text and start a new cycle. | none, one or range | only escaped chars |
| n / N | Replace the pattern space with the next line of input. / Append a newline and the next line of input to the pattern space. Neither starts a new cycle. | none, one or range | no |
| h / H | Overwrite / append to the hold space with the content of the pattern space. | none, one or range | no |
| g / G | Overwrite / append to the pattern space with the content of the hold space. | none, one or range | no |
| x | Swap the contents of the pattern and hold spaces. | none, one or range | no |
| y/source/dest/ | Transliterate each character in the pattern space that appears in source to the corresponding character in dest. | none, one or range | no |
| q [code] / Q [code] | Immediately quit the script with an optional exit code. With Q, the automatic printing of the pattern space at the end of the cycle is disabled. | none or one | no |
| : label | Label for the goto-like commands b and t. | none | no |
| b [label] | Jump to label, or to the end of the script if label is missing. | none, one or range | no |
| t [label] / T [label] | t: jump to label if an s command has made a successful substitution since the last input line was read and since the last t or T command. T: jump to label if no such substitution has been made. If label is missing, jump to the end of the script. | none, one or range | no |
| = | Print the current line number. | none or one | no |
| { | Begin a block of commands, which must end with }. | none, one or range | no |
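A few of the commands above can be sketched in isolation as follows. The examples assume GNU sed, and all inputs are made up for illustration.

```shell
# N appends the next line to the pattern space, so each cycle sees a pair
# of lines; s then replaces the embedded newline with a space.
joined=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/ /')

# y maps each character of the first list to the one at the same
# position in the second list; other characters are left alone.
upper=$(echo 'hello' | sed 'y/abcdefgh/ABCDEFGH/')

# A block applies several commands to the same address range.
quoted=$(printf 'a\nb\nc\nd\n' | sed -n '2,3{s/^/> /;p;}')

printf '%s\n' "$joined" "$upper" "$quoted"
```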

The s command is the most powerful command in sed, since it accepts arbitrary regular expressions and its behavior can be adjusted through flags. A common advanced technique built on s is the look-up table.
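One possible shape of a look-up table is sketched below. It is only an illustration, not a canonical form: the table layout (entries written as ;key=value;) and the digit-to-word mapping are assumptions made for this example, and GNU sed is assumed. The first s appends the table to the pattern space; the second uses a back-reference inside the regex to find the entry whose key equals the first character, and keeps only its value.

```shell
# Step 1: append a hypothetical table of ";key=value" entries to each line.
# Step 2: capture the first character as \1, find ";\1=" in the table,
#         capture the value up to the next ";" as \2, and keep only \2.
words=$(printf '2\n0\n1\n' | sed \
  -e 's/$/;0=zero;1=one;2=two;/' \
  -e 's/^\(.\).*;\1=\([^;]*\);.*/\2/')

printf '%s\n' "$words"
```

The sketch assumes one-character keys and values that contain no semicolon; a different separator would be needed otherwise.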

Examples

Hello, World!

s/.*/Hello, World!/;q

The first input line is read into the pattern space. The s command replaces that content with 'Hello, World!'. The q command stops sed from starting a new read cycle, but the default printing of the pattern space still happens (end of current cycle).
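Since sed only runs its script once a line has been read, the program needs at least one line of input. A minimal way to run it, assuming GNU sed, is sketched here with inputs chosen for illustration:

```shell
# echo supplies a single empty line, which is enough to start a cycle.
hello=$(echo | sed 's/.*/Hello, World!/;q')

# With several input lines the text is still printed only once,
# because q quits during the first cycle.
hello_once=$(printf 'x\ny\nz\n' | sed 's/.*/Hello, World!/;q')

printf '%s\n%s\n' "$hello" "$hello_once"
```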

Number of input lines

$=;d

The = command prints the current line number, and the $ address restricts it to the last line. On every input line, d deletes the pattern space and starts a new cycle immediately, so nothing is printed automatically at the end of the cycle. Alternatively, sed -n -e '$=' would work too.
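Both variants can be compared side by side in a short sketch, assuming GNU sed and a three-line input made up for the example:

```shell
# Script from the example: print the line number on the last line,
# delete every pattern space so nothing else is printed.
count=$(printf 'a\nb\nc\n' | sed '$=;d')

# Alternative: suppress automatic printing with -n instead of using d.
count_n=$(printf 'a\nb\nc\n' | sed -n -e '$=')

printf '%s\n%s\n' "$count" "$count_n"
```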

Print variable-sized triangle

Here is a more complex sed script:

x;G                # prepend a newline to the unary string
:loop              # start loop
   P               # print first line only
   s:\n(.):\1\n:   # shift one unary char from 2nd line to 1st
t loop             # repeat

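The sed script above takes a unary string as input, i.e. a single symbol repeated several times, and prints a triangle whose height equals the length of the string. The comments on the right explain each step. Because the s command uses unescaped parentheses for grouping, the script must be run with the -r option; the -n option keeps the final pattern space from being printed a second time at the end of the cycle.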
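One way to run the triangle script is sketched below, assuming GNU sed. The commands are passed as separate -e fragments without the comments, and the input string #### is chosen arbitrarily.

```shell
# -r enables the unescaped ( ) grouping used by the s command;
# -n suppresses the automatic print at the end of the cycle.
triangle=$(echo '####' | sed -rn \
  -e 'x;G' \
  -e ':loop' \
  -e 'P' \
  -e 's:\n(.):\1\n:' \
  -e 't loop')

printf '%s\n' "$triangle"
```

The first P runs before any character has been shifted, so the output starts with an empty line, followed by rows of one to four # characters.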

Turing completeness

In 2001 Christophe Blaess emulated a Turing Machine in sed, thus proving the language is capable of performing any computable task (arithmetic, compilation, etc.). Complex sed programs already exist, such as converters, emulators of UNIX commands (dc) and games (Tic Tac Toe, sokoban, chess, tetris).

Here is a relevant article on the Turing completeness of sed.