EPARM

From Esolang

EPARM is a protocol by which a program whose only input channel is standard input (stdin) can receive command line arguments. With EPARM, command line arguments are sent to a program along standard input in a prefix before the normal standard input data.

EPARM was originally designed for SNUSP, but can be applied to other programming languages with only one input channel, such as Brainfuck.

Introduction

Motivation

For anyone working on a SNUSP interactive command-line shell, it would be nice to be able to send command line arguments to SNUSP programs when they're executed.

However, part of the beauty of SNUSP (and Brainfuck) is that all input into and output from the program is through stdin and stdout, using the ',' and '.' instructions.

Fortunately, these two ideas are compatible.

The goal of EPARM is to satisfy this pair of ideas.

Design considerations

First, let's call any program that implements argument reading as specified here EPARM compliant.

A program that only reads input from one channel shall be referred to as a single input channel program.

EPARM was created under the assumption that it is not possible to create a command-line-argument-capable single input channel program whose behaviour is completely identical to that of a non-argument-capable single input channel program with the same functionality. Based on that, EPARM was designed to be as close as possible to backwards-compatible with non-compliant programs, and with people or programs that assume EPARM-compliant programs to be non-compliant.

One example that should be as close to backwards-compatible as possible is the case of a "cat" program.
Consider a non-EPARM-compliant cat program which is being used to write a file from its input.
If someone used an EPARM-compliant cat program without knowing it, we would want to minimise the potential damage caused.

EPARM is based on argument-specifying prefixes.

Choice of Prerequisite Standards

The design of EPARM assumes that EOF is 255.
More precisely,
EPARM assumes that the end-of-file specifier (EOF) is not 0, 253, or any printable ASCII value. If 0 is used as EOF, it is possible to implement a working modified form of EPARM. More explanation is in Syntax.

The examples assume that the new-line sequence is the single byte 10, and not (13, 10) like in a couple of operating systems out there.
EPARM itself does not assume this. It is compatible with any new-line sequence that doesn't contain 0, 253 or EOF.

SNUSP-start escape character

The first part of this protocol is the SNUSP-start escape character (SS escape character, or SSE). This feature is named after the language it was originally designed for.

The SS escape character may appear at the start of input to an EPARM-compliant program. After the SS escape character, the next byte specifies a SNUSP-start command the program must follow.

If the program detects an SS escape character as the first byte it reads, it will respond to it. If the SS escape character does not appear at the start of the input, and it appears later, the program will treat it as a normal character -- the character will lose its SS escape powers.

When an EPARM compliant program detects an SS escape character at the start of input, it treats the next character as a SNUSP-start command. Here we have a chance to create 255 different commands, though the selection is limited if you still want to keep things approximately backwards-compatible.
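
The start-of-input check described above can be sketched in a few lines of Python. This is only an illustration: the function name `read_start_command`, the use of a binary stream, and the handling of a lone escape byte followed by EOF are assumptions, not part of EPARM. The value 253 is the one EPARM chooses for the SS escape character.

```python
import io

SSE = 253  # SNUSP-start escape character used by EPARM

def read_start_command(stream):
    """Check the start of a binary stream for an SS escape character.

    Returns (command, pending): `command` is the byte value that follows
    the escape, or None if the input does not start with one; `pending`
    holds bytes that were read but belong to the normal input.
    """
    first = stream.read(1)
    if first != bytes([SSE]):
        # No escape at the start: hand the byte back as normal input.
        return None, first
    command = stream.read(1)
    if not command:
        # Assumption: a lone escape byte followed by EOF is treated as data.
        return None, first
    return command[0], b""
```

A caller would dispatch on the returned command (for EPARM, the byte 'a') and otherwise prepend `pending` to whatever it reads next.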

Choice of SNUSP-start character

EPARM uses a value of 253 as the SNUSP-start escape character (SSE). This value was chosen with the following constraints in mind:

  1. The SSE should not be a printable ASCII character.
  2. The SSE should not clash with the byte-order marks (BOMs) used in UTF-16 Unicode text files. These may start with 255 or 254.
  3. The SSE should not cause any screwup if it is printed onto the start of a UTF-8 Unicode text file.
  4. The SSE should not be an EOF marker.
  5. (desirable) The SSE should not be a low number (<32). In binary files, these characters are relatively common, compared to other non-printable characters.
  6. (desirable) Make the SSE simple for a SNUSP program to detect.
    In other words, if we made a Modular SNUSP, or Core SNUSP program to detect the SSE and act on it, it should be possible for that program to be very short.

The "desirable" constraints were considered to be ones that could be discarded if the full set of constraints was unsatisfiable. The choice of 253 satisfies all of them, except maybe (3).

^ These two websites suggest that in UTF-8, some byte values are invalid to appear at the start of a character sequence and those invalid bytes, along with other invalid UTF-8 sequences, are omitted when displayed or converted to other character-sets. They also suggest that 253 is an invalid sequence-starting byte value in UTF-8. If all this is true, then 253 is as good as we will get for UTF-8 constraints.
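
The constraints can be checked mechanically for the value 253. The sketch below assumes EOF is 255, as the rest of this document does, and uses Python's built-in UTF-8 decoder as one example of how a decoder treats a leading 253. It shows only what that one decoder does; other tools may behave differently.

```python
SSE = 253

# Constraints (1), (2), (4) and (5), with EOF assumed to be 255.
assert not (32 <= SSE <= 126)      # (1) not printable ASCII
assert SSE not in (254, 255)       # (2) not a UTF-16 BOM byte
assert SSE != 255                  # (4) not EOF
assert SSE >= 32                   # (5) not a low control value

data = bytes([SSE]) + "Hello".encode("utf-8")

# Constraint (3): a strict UTF-8 decoder rejects 253 as a start byte...
try:
    data.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while a lenient decoder can drop or replace it and keep the text.
ignored = data.decode("utf-8", errors="ignore")
replaced = data.decode("utf-8", errors="replace")
```

With the lenient error handlers the text after the 253 survives intact, which is the behaviour constraint (3) hopes for.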

SNUSP-start command

EPARM uses the SNUSP-start command 'a' to signify command-line arguments. If an EPARM-compliant program sees an 'a' right after the SS escape character, it will try to read command-line arguments.

'a' was chosen because:

  1. 'a' stands for argument.
  2. It probably helps if it is an ASCII printable character.
  3. It is the first SNUSP-start command to be made up.

Syntax

After the 'a', the EPARM-compliant program will read a list of strings from its input. The strings are the command-line arguments.

Argument termination

The argument strings will be null terminated, i.e. they will end with 0. This behaviour is inspired by C programs, whose "main" functions receive their arguments as null-terminated strings. Developers who use 0 as the EOF value can use 255, or pretty much any value except 253, as the argument string terminator.

List termination

The list will be terminated by another SSE character (253), followed by 'a'.

Constraints:

  1. We cannot use an ASCII printable character (such as 'a') as the list termination character (list terminator), because that would screw up any command arguments that contain 'a'.
  2. We cannot use 0 (eg two zeroes in a row terminates the current argument and the list) because empty command-line arguments have to be possible. An analogue is running $ someprogram "" hello in a Unix-like shell. The program would be passed an empty string as an argument.
  3. We cannot use EOF because that would close the "file" in the non-argument-compliant version of the program, when this one hasn't started reading it yet. That would break information-flow compatibility.
    If EOFs were used, the argument-compliant and non-argument-compliant versions would effectively have to read a different number of "files" from each other. If something went wrong in calling a program, the called program would refuse its normal input, or wait for more input after its intended input is finished.
  4. We shouldn't make up too many new special characters for these things.

Within any SNUSP-start command, 253 is used as an escape character, which can perform special functions. When used with the following byte 'a', it terminates the list of arguments.

Therefore the list terminator is 253 followed by 'a'.

The entire SNUSP-start command, including starting and ending characters, is the argument-specifying prefix.
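
From the calling side, building an argument-specifying prefix could look like the following Python sketch. The function name `encode_prefix` and the choice to raise an error on arguments containing 0 are assumptions for illustration. Doubling each 253 inside an argument follows the (253, 253) escape sequence that EPARM defines for inserting a literal 253.

```python
SSE = 253   # SNUSP-start escape character
NUL = 0     # argument terminator (assumes EOF is not 0)

def encode_prefix(args):
    """Build an argument-specifying prefix from a list of byte strings."""
    out = bytearray([SSE, ord("a")])          # opening (253, 'a')
    for arg in args:
        if NUL in arg:
            raise ValueError("arguments cannot contain the terminator 0")
        # Double any 253 inside an argument, per the (253, 253) escape.
        out += arg.replace(bytes([SSE]), bytes([SSE, SSE]))
        out.append(NUL)                       # terminate this argument
    out += bytes([SSE, ord("a")])             # closing (253, 'a')
    return bytes(out)
```

For instance, `encode_prefix([b"-u"]) + b"Hello World!\n"` yields the byte sequence 253,"a-u",0,253,"aHello World!",10.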

The second list terminator character

The second list terminator character is chosen to be 'a', but this is not considered a strong requirement of EPARM compliance.

The constraints on it are:

  1. It must not be EOF, that would break information flow.
  2. It shouldn't be 253; (253, 253) should be an escape sequence to insert 253 into the argument being read.

In the end, however, consistency is desirable.

Other escape sequence

EPARM defines two SNUSP-start escape sequences. The first one (SSE, 'a') is defined in the previous section.

The other is (SSE, SSE) i.e. (253, 253).

When (SSE, SSE) is encountered during a SNUSP-start command, the EPARM-compliant program inserts the SNUSP-start escape character into the command. This is an analogue to typing '\\' in a string literal in C code.
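
Putting the argument terminator, the list terminator and the (SSE, SSE) escape together, a receiving program's parsing could be sketched in Python as below. The name `decode_prefix` is hypothetical, and the error handling (rejecting unknown escapes, unterminated arguments and truncated prefixes) is an assumption, since EPARM does not say what to do in those cases.

```python
SSE = 253
NUL = 0

def decode_prefix(data):
    """Split `data` into (args, rest).

    `args` is None when `data` does not start with (253, 'a'); in that
    case `rest` is all of `data`.
    """
    if data[:2] != bytes([SSE, ord("a")]):
        return None, data
    args, current = [], bytearray()
    i = 2
    while i < len(data):
        byte = data[i]
        if byte == NUL:                       # end of one argument
            args.append(bytes(current))
            current.clear()
            i += 1
        elif byte == SSE:
            if i + 1 >= len(data):
                break                         # truncated escape sequence
            nxt = data[i + 1]
            if nxt == SSE:                    # (253, 253): literal 253
                current.append(SSE)
                i += 2
            elif nxt == ord("a"):             # (253, 'a'): end of list
                if current:
                    raise ValueError("argument not terminated by 0")
                return args, data[i + 2:]
            else:
                raise ValueError("unknown escape sequence")
        else:
            current.append(byte)
            i += 1
    raise ValueError("prefix is not terminated")
```

The returned `rest` is the normal standard input that the program would go on to process as if no prefix had been sent.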

Behaviour when there are no arguments

Suppose we have a single input channel program that might be EPARM-compliant and we want to execute it with no arguments.

There are two ways in which we can do this:

  1. Run the program with an empty argument-specifying prefix i.e. (253, 'a', 253, 'a').
  2. Run the program with no argument specifying prefix.

In deciding which of these approaches to use, we must consider the following two threats:

  1. The program isn't EPARM compliant.
  2. The normal input will start with 253.

The relative probability of these threats occurring determines which approach we should use.

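If threat (1) is more likely, i.e. the program probably isn't EPARM-compliant,
then use approach (2): run the program with no argument-specifying prefix, since a non-compliant program would treat the prefix as ordinary input.

If threat (2) is more likely, i.e. the program's normal input will probably start with 253,
then use approach (1): run the program with an empty argument-specifying prefix, so that the leading 253 is not mistaken for an SS escape character.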

EPARM does not specify which approach to take. The choice is left to the programmer.

A really smart program that executes programs might read the first byte of normal input beforehand, use approach (1) if and only if that byte is 253, otherwise approach (2), and then feed the called program the rest of the normal input.
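
A minimal Python sketch of such a launcher's input preparation follows. The function name `input_without_arguments` is hypothetical, and for brevity the sketch reads the whole normal input into memory, where a real launcher would more likely stream it.

```python
import io

SSE = 253
EMPTY_PREFIX = bytes([SSE, ord("a"), SSE, ord("a")])

def input_without_arguments(normal_input):
    """Return the bytes to feed a possibly-EPARM-compliant program when
    no arguments are wanted. `normal_input` is a binary stream."""
    first = normal_input.read(1)
    rest = normal_input.read()
    if first == bytes([SSE]):
        # Input starts with 253: protect it with an empty prefix.
        return EMPTY_PREFIX + first + rest
    # Otherwise send the input untouched, which suits both kinds of program.
    return first + rest
```

Only input that starts with 253 is altered, so a non-compliant program sees corrupted input only in the case where a compliant one would otherwise have misread it.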

Examples

Suppose we have two cat programs written in SNUSP, accat.snusp and cat.snusp. Both programs behave the same by default, i.e. accat.snusp behaves like cat.snusp when it is given no command-line arguments, but accat.snusp also accepts command-line arguments via this system. Let accat.snusp accept the following arguments:

  1. -u Convert the output to upper case
  2. -a ARG Append ARG to the output

Let's have a file called test.txt containing "Hello World!" and ending with a newline character (10).

Example 1

Command:
$ accat.snusp
Input:
The contents of test.txt
Effective input:
"Hello World!",10
Output:
accat.snusp:
Hello World!
cat.snusp:
Hello World!

Or with the arguments-prefix,

Effective Input:
253,"a",253,"aHello World!",10
with dots:
".a.aHello World!."
Output:
accat.snusp:
Hello World!
cat.snusp (with dots):
.a.aHello World!.

Example 2

Command:
$ accat.snusp -u
Input:
The contents of test.txt
Effective input:
253,"a-u",0,253,"aHello World!",10
With dots for nonprinting chars:
".a-u..aHello World!."
Output:
accat.snusp:
HELLO WORLD!
cat.snusp (with dots):
.a-u..aHello World!.

There is some data corruption screw-up in the output of the old form of cat when the argument-prefix is used, which includes every case where some arguments are sent.

When no argument-prefix is used, there isn't a problem.
This holds true as long as the first byte in the input isn't 253.

Example 3

Command:
$ accat.snusp -u -a "And many more"
Input:
The contents of test.txt
Effective input:
253,"a-u",0,"-a",0,"And many more",0,253,"aHello World!",10
with dots:
".a-u.-a.And many more..aHello World!."
Output:
accat.snusp:
HELLO WORLD!And many more
cat.snusp (with dots):
.a-u.-a.And many more..aHello World!.
Or, with "echo" commands,

Example 4

Command:
$ acecho.snusp Hello World!
Input:
nothing
Effective input:
253,"aHello",0,"World!",0,253,"a"
with dots:
".aHello.World!..a"
Output:
acecho.snusp:
Hello World!
echo.snusp:

Comments:
This is where echo.snusp is made to do nothing because it cannot read any arguments, and so cannot print any arguments.
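
The "with dots" notation used in the examples can be reproduced with a short Python helper. The name `with_dots` is made up for this sketch, and it assumes that "nonprinting" means any byte outside the printable ASCII range 32 to 126.

```python
def with_dots(data):
    """Render bytes as text, replacing non-printable bytes with '.'."""
    return "".join(chr(b) if 32 <= b <= 126 else "." for b in data)

# Effective inputs of Example 3 and Example 4.
example_3 = (bytes([253]) + b"a-u\x00-a\x00And many more\x00"
             + bytes([253]) + b"aHello World!\n")
example_4 = bytes([253]) + b"aHello\x00World!\x00" + bytes([253]) + b"a"

print(with_dots(example_3))  # .a-u.-a.And many more..aHello World!.
print(with_dots(example_4))  # .aHello.World!..a
```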

Example 5: An EPARM compliant echo implementation

    /@=@=@=@=++#
$>>@/<<\
/======/                                    
\@!\,+++?\,@\=?\=!/,+?\#EOF       EOF   EOH   
   |     #  |  #  |   \-?\>+<===\  #     #    
   >    EOH | EOH \\     \+++?\,+?!/@\-?!/@\==+?\==\  
   >        |     !\.---=\ /==/ |    |     |    |  |  
   >        |     \======|=|====/!===|=====|====/  |  
   >  /-====/!===========|=|=========/     \=======|==============+\
   |  \@=@=@=@=@=---#    | |                       | #+++=@=@=@=@=@/
   |                  /==/ \==\                    | 
   @                  \<=/?\!>/!===================/
   \=@=@+++#           /=/ \-\                        
   .                   \=>.<=/
   #

This program is written in Modular SNUSP.

The program prints all of its arguments to standard out, separated by single spaces. Once the arguments have been printed, the program exits.
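
For readers who do not want to trace the SNUSP code, the described behaviour can be modelled in Python. This is a behavioural sketch of the description above, not a translation of the SNUSP program. It reads one byte at a time, as a SNUSP program would with ','. What it does when the prefix is missing, when EOF arrives early, or when an unknown escape appears is assumed here, since the description does not cover those cases.

```python
import io

SSE, NUL = 253, 0

def acecho(stdin, stdout):
    """Print EPARM arguments separated by single spaces, then exit."""
    def get():
        b = stdin.read(1)
        return b[0] if b else None      # None stands for EOF

    if get() != SSE or get() != ord("a"):
        return                          # assumption: no prefix, no output
    finished = 0                        # complete arguments seen so far
    started = False                     # has the current argument begun?
    while True:
        byte = get()
        if byte is None:
            return                      # assumption: stop quietly at EOF
        if byte == SSE:
            byte = get()
            if byte != SSE:
                # (253, 'a') ends the list; assumption: any other
                # escape, or EOF, also ends the program.
                return
            # (253, 253) falls through as a literal 253.
        if not started:
            if finished:
                stdout.write(b" ")      # separator before later arguments
            started = True
        if byte == NUL:
            finished += 1               # argument terminator
            started = False
        else:
            stdout.write(bytes([byte]))
```

Feeding it the effective input of Example 4 through `io.BytesIO` objects produces the bytes of "Hello World!".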

The output of this program is consistent with the output for acecho.snusp in Example 4.

Conclusion

In conclusion, EPARM lets programs use only stdin and stdout for communication with the outside world and still receive command-line arguments. SNUSP programs and SNUSP-calling programs that follow EPARM are compatible with programs that do not follow EPARM under most conditions.

More ideas

EPARM contains some ideas that can be extended into other forms of functionality. The SNUSP-start command protocol used by EPARM can be used to request up to 254 other functions at the start of a SNUSP program's input.

When a SNUSP-start command is being read,
the SNUSP-start escape character can be used as a general escape character.

A SNUSP-start escape sequence could be defined to allow stating the EOF character without closing the input.

A SNUSP start command may be used to turn on escapable mode, whereby the program then performs its normal function except with SNUSP start escape characters enabled in the normal input.

EOF