From Esolang

Software language bootstrapping is a two-part activity that must be done with a great deal of care if you don't want to be considered a massive *Expletive deleted* who inflicted potentially multiple lifetimes of work upon others. To that end, we will cover what bootstrapping is, what it requires, and what you should never do if you don't want people hating you forever.

What is bootstrapping a language

A language is bootstrapped when its compiler is written in the language itself. Usually this is achieved by first writing a compiler or interpreter for the language in another, more complete language, and then transitioning to an implementation written in the new language, after which the old implementation can be archived.

For example, many C compilers are themselves written in C, but the C language was first implemented in the B language and later rebootstrapped from the cc_* family.

What is required to bootstrap a language

Bootstrapping a language is a rather straightforward task if you are honest with yourself and willing to openly share your mistakes with the world. Putting your source code under version control is strongly recommended: there have been multiple cases of programmers building a new feature into their language and using it, only to lose the binary that included that functionality and be left without the source code needed to rebuild it. So go install git first and follow this very simple rule:

NEVER check in code that can't be built by a previous commit

This will save you potentially years of work and will reduce the risk of a crazy person with a chainsaw showing up at your door.
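The rule can be modelled as a check over a commit history. The following is only a minimal sketch under stated assumptions: `Commit`, `source_uses`, `implements` and `first_unbuildable` are hypothetical names invented for this illustration, and a real check would actually build each commit with the compiler produced by the one before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical model of one commit of a self-hosted compiler."""
    name: str
    source_uses: frozenset  # language features the compiler's own source relies on
    implements: frozenset   # language features the resulting compiler supports


def first_unbuildable(history, bootstrap_features):
    """Return the name of the first commit that the compiler built from the
    previous commit cannot build, or None if the whole chain is buildable."""
    available = frozenset(bootstrap_features)
    for commit in history:
        if not commit.source_uses <= available:
            return commit.name
        # The compiler built from this commit is what builds the next one.
        available = commit.implements
    return None


# Safe: implement the feature in one commit, start using it in the next.
good = [
    Commit("add-structs", frozenset({"functions"}),
           frozenset({"functions", "structs"})),
    Commit("use-structs", frozenset({"functions", "structs"}),
           frozenset({"functions", "structs"})),
]

# Unsafe: the same commit both adds the feature and depends on it.
bad = [
    Commit("add-and-use-structs", frozenset({"functions", "structs"}),
           frozenset({"functions", "structs"})),
]

print(first_unbuildable(good, {"functions"}))
print(first_unbuildable(bad, {"functions"}))
```

In this model, adding a feature and depending on it have to land in separate commits, which is exactly what keeps every commit buildable by its predecessor.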

Next, you need to write the first implementation of your language in a different, ideally commonly available and supported, language. If your language can't be built with a commonly available language, expect it to die 5 minutes after you stop working on it. So please make both implementations available, the one in the bootstrapping language and the one in the language itself, and ensure there is a *VERY* clear line between them.

Finally, you'll need the ability to do a bunch of very hard work, because no one cares about your lazy *metoo* language which doesn't make a meaningful contribution. If you come up with a *really* good idea, you might be able to skate by with minimal work, but if you go that extra mile, your language will be one they still talk about with praise long after you are gone. So please put in that extra effort.

Things that doom you

The most common thing that dooms most languages is: there is no working implementation of the language publicly available.

The second most common thing that dooms a language is that people can't build it with standard tools (such as make, gcc, ghc, scheme, etc.). If yours is an amazing language with a large team of experts, you might manage to escape this trap (like Haskell barely did), but don't bank on it, because even if you succeed people will hate you for doing it that way.

The third most common thing that dooms a language is that its output is non-reproducible. If you don't have reproducible results, you can't properly debug what is going wrong, and nobody has time for a buggy language. So be sure to look up reproducible builds and how to ensure that not only your language implementation is reproducible but also its output.
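One simple way to test for reproducibility is to build the same input several times and compare cryptographic hashes of the results. This is a minimal sketch, not a full reproducible-builds setup: `is_reproducible`, `stable_build` and `stamped_build` are hypothetical names, and the two toy "build" functions merely stand in for running a real compiler.

```python
import hashlib
import itertools


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a build artifact."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(build, source: bytes, runs: int = 3) -> bool:
    """Build the same source several times; True only if every run
    yields a bit-for-bit identical artifact."""
    digests = {digest(build(source)) for _ in range(runs)}
    return len(digests) == 1


def stable_build(source: bytes) -> bytes:
    """Toy build step whose output depends only on its input."""
    return source.upper()


_run_counter = itertools.count()


def stamped_build(source: bytes) -> bytes:
    """Toy build step that embeds a value which changes on every run,
    the way an embedded timestamp or build ID would."""
    return source.upper() + str(next(_run_counter)).encode()


print(is_reproducible(stable_build, b"main"))
print(is_reproducible(stamped_build, b"main"))
```

Anything that varies between runs (timestamps, absolute paths, unordered iteration) shows up as a digest mismatch, which gives you something concrete to hunt down.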

The fourth most common thing that dooms a language is that the author is trying to do way too much all at once. It is hard enough to debug a new language implementation without also having to debug a new assembly language and a new linker, so don't do that. If you are crazy enough to do a new kernel too, expect to achieve net nothing for years, after which no one will care about the results. So if you are writing a compiler, output C, assembly, or LLVM's language-independent intermediate representation.
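To illustrate what "output C" can look like, here is a minimal sketch of a compiler for a made-up language (integer arithmetic in reverse Polish notation) that emits C source and leaves code generation, assembling and linking to an existing C toolchain. The toy language and the name `compile_rpn_to_c` are assumptions made purely for this example.

```python
OPERATORS = {"+", "-", "*", "/"}


def compile_rpn_to_c(expr: str) -> str:
    """Translate an integer RPN expression such as "1 2 + 3 *" into a C
    program that prints its value."""
    stack = []
    for token in expr.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            # Fully parenthesise so C precedence cannot change the meaning.
            stack.append(f"({left} {token} {right})")
        elif token.isdigit():
            stack.append(token)
        else:
            raise ValueError(f"unknown token {token!r}")
    if len(stack) != 1:
        raise ValueError("expression must leave exactly one value")
    return (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {stack[0]});\n'
        "    return 0;\n"
        "}\n"
    )


print(compile_rpn_to_c("1 2 + 3 *"))
```

The emitted text can be handed to any C compiler, so the only new thing left to debug is the translation itself.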

The fifth most common thing that dooms a language is that there is no free (as in freedom) implementation available. So release your language under an FSF-approved license unless you want it to be ignored forever (if the source code isn't released) or until the copyright expires (if the source code is publicly available).

The sixth most common thing that dooms a language is the author is unbearable to work with. Seriously, even great languages are dead ends if the people are insufferable *Expletive deleted* and the only hope is if they get forked by someone who actually is capable of acting like an emotionally healthy individual who respects reasonable boundaries. If you can't do that, get some therapy before you waste years of your life on driving away everyone who *MIGHT* work with you and create something of value together.

Assuming you have avoided all of the above and put real effort into producing something meaningful, use your language to produce something cool to demonstrate how cool it is. Good luck, it is hard to get attention out there.

Bootstrapping by example

Bootstrapping a real language that people care about is a considerable amount of work even when done by brilliant people working together (which is a rare feat in itself). So to that end I will detail the efforts taken to rebootstrap the C language over the course of 6 years of hard work by nearly a dozen people.

The stage0 project started with hex0 bootstrapping hex1, which was quite a tedious task. The GNU Mes developers noticed hex0 and decided to start by writing a Scheme interpreter in a minimal C subset and using that to self-host.
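To give a feel for how small the first step is, here is a minimal sketch of a hex0-style assembler: it turns pairs of hexadecimal digits into bytes and ignores everything else. It assumes that `#` and `;` start comments running to the end of the line; the Python function `hex0` is written only for illustration and is not the stage0 project's own code.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def hex0(text: str) -> bytes:
    """Convert hex-digit pairs into bytes, skipping comments and any
    other non-hex characters (whitespace, punctuation)."""
    out = bytearray()
    pending = None      # first digit of a pair, waiting for the second
    in_comment = False
    for ch in text:
        if in_comment:
            if ch == "\n":
                in_comment = False
        elif ch in "#;":
            in_comment = True
        elif ch in HEX_DIGITS:
            if pending is None:
                pending = ch
            else:
                out.append(int(pending + ch, 16))
                pending = None
        # Any other character is silently ignored.
    return bytes(out)


source = "7F 45 4C 46 # ELF magic\n01 ; class\n"
print(hex0(source))
```

A tool this simple can be audited byte by byte, which is the point of starting a bootstrap chain there.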

Then with some effort the projects started working together.

Which produced results which other people started to notice.

Which enabled even more progress.

That inspired even more notice.

And after all that

The bootstrap was completed and it produced a community of people who like each other.

Other bootstrapping examples

I (User:Abo-Junghichi) want to mention other efforts.


Unlike the stage0 project, this project targets the i386-linux user environment instead of a (virtual) machine with its own instruction set. It also drops the calculation of forward jumps in order to minimize the compilers of the earlier steps.

Collapse OS

This project targets old 8-bit-era microprocessors with 8k bytes of RAM. It is a dialect of the Forth language.


This project builds a modest programming language on top of a virtual machine in which each instruction is one byte. The virtual machine is implemented in C for portability, but it can also be implemented as an i386-linux binary of just 1k bytes. For example, the code below changes the behavior of '\n' from its default, undefined behavior, to doing nothing.


By default, not only '\n' but also ' ' has undefined behavior, so you can't even indent source code. Enough to be an esolang!
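The general idea of a one-byte-per-instruction machine whose bytes are undefined until a handler is installed can be sketched as follows. This is a hypothetical illustration only: `TinyVM`, `define`, `run` and the `1` and `+` instructions are invented here and are not the project's actual instruction set or syntax.

```python
class UndefinedInstruction(Exception):
    """Raised when the VM meets a byte that has no handler."""


class TinyVM:
    """Hypothetical VM in which every instruction is a single byte."""

    def __init__(self):
        self.table = {}  # byte value -> handler taking the data stack

    def define(self, byte: int, handler) -> None:
        """Install or replace the behavior of one instruction byte."""
        self.table[byte] = handler

    def run(self, program: bytes) -> list:
        """Execute the program on a fresh stack and return the stack."""
        stack = []
        for byte in program:
            handler = self.table.get(byte)
            if handler is None:
                raise UndefinedInstruction(f"byte {byte:#04x}")
            handler(stack)
        return stack


vm = TinyVM()
vm.define(ord("1"), lambda stack: stack.append(1))
vm.define(ord("+"), lambda stack: stack.append(stack.pop() + stack.pop()))

program = b"1\n1+"
try:
    vm.run(program)
except UndefinedInstruction as err:
    print("undefined:", err)

# Give '\n' a behavior: do nothing.
vm.define(ord("\n"), lambda stack: None)
print(vm.run(program))
```

Until the newline byte gets a handler, the same program that later runs fine is rejected, which is the situation described above for both '\n' and ' '.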

MinimalBinaryBoot gives up portability across ISAs and reaches 165 bytes as an i386-linux binary.

See also