My favorite is the 366-byte C program emulator that can run Linux and Doom [0]. ...

cpldcpu · 2026-06-07T09:52:34 1780825954

Wow! And it also implements a very interesting variant of SUBLEQ that is turing complete.

>This VM implements an OISC - a One Instruction Set Computer. That instruction takes three signed 32-bit operands, a, b and c, and runs a program from memory m[] as follows:

1 PC (program counter) starts at 0

2 Fetch the next instruction (32-bit signed operands a, b and c)

3 If the low bit on any operand is set, remove it, and replace that operand with m[operand] i.e., a dereference of that address

4 Set m[b] = m[b] - m[a]

5 If m[b] is 0 or negative, set the PC to c, otherwise increment PC by 3 words

6 Go to step 2

jpegqs · 2026-06-08T18:37:23 1780943843

> 3 If the low bit on any operand is set, remove it, and replace that operand with m[operand] i.e., a dereference of that address

This dereference option makes a second instruction, this is not OISC.

cpldcpu · 2026-06-09T11:33:53 1781004833

CISC view: Its another adressing mode.

RISC view: SUBLEQ is already four instructions (2x memory access, alu, branch)

jpegqs · 2026-06-09T16:03:14 1781020994

3x memory access then, because read+read+write. And extra 0 to 3 reads in this work.

stevekemp · 2026-06-07T12:09:49 1780834189

I've spent the past few weeks coming up with my own simple programming language, which compiles to linux/amd64 assembly.

I could have gone all out writing standard library routines for opening files, running shell commands, coding strstr, strcpy, and similar. And to be honest I did implement some things I didn't need as part of the learning process (for example print(getenv("HOME")) works). But I soon realized I needed some example programs to test things and show off.

So of course the first real program I implemented was a brainfuck interpreter. Which means my language is now, indirectly, turing complete!

My early versions took 9 minutes to output the famous mandelbrot program, so I had to make a bunch of optimizations, and later implemented support for switch/case statements to speed things up. Now I can generate the same output in two minutes - so room for improvement, but also a good bit of progress!

Cheating by implementing another language in my own was very very satisfying. Though of course this is all for fun/learning and not intended to be used seriously by anybody, not even myself!

https://github.com/skx/s-lang

gsliepen · 2026-06-08T10:48:27 1780915707

Nice! What surprised my about the IOCCC submission though was how fast it ran. Sure, subleq can at least do some useful arithmetic in one instructions, but it still requires a lot of instructions to do an integer multiplication, let alone a floating point operation. But DOOM was actually playable with a decent FPS.

s-macke · 2026-06-07T08:28:49 1780820929

Here is the video:

https://www.youtube.com/live/MoWCwZx1Swc?si=eIOlRsKWNKRVRZeB...

ralferoo · 2026-06-07T12:09:38 1780834178

I think I like this idea, but the linked-to Eternal Software Initiative [1] is a bit confusing. There are several different versions of the instructions to decode this, all conflicting.

There's the one here: Set m[b] = m[b] - m[a]

Then it links to the reference implementation on github [2] which says you just need the napkin notes [3], which is dividing everything read by 4, which is corroborated by the reference implmentation [4], but it's not clear why 4 is chosen here rather than 2, as it seems to waste a bit. Was this bit needed, or is it reserved for future expansion?

I presume the original implementation didn't do the divide by 4 and it was added later, but I don't see why it was needed, other than perhaps just making LLVM code gen a little easier. I'd need to work through lots of examples to work out if the system as described is impossible without dividing by 4 (although you'd presumably only be able to access even addresses, and the PC increases by 3 each time, so it would definitely be annoying to refer to code locations).

Then the reference implementation starts doing magic when location 64 is accessed, overwriting locations 64-67 with the current time, which is mentioned in the napkin description, but not the description on the main page.

Both descriptions mention the magic -1 address, so it seems strange that the very implementation-dependent UTC clock isn't also implemented with -ve addresses rather than trashing memory that is otherwise free for the implementation to use as desired.

Both descriptions also mention the regular timer interrupt process, which also seems disappointing, reusing address 0 as the interrupt handler location and 1 as the saved PC, which means that you have to overwrite the initial entry point at location 0 as soon as the program starts.

[1] https://eternal-software.org/

[2] https://github.com/adriancable/eternal

[3] https://github.com/adriancable/eternal/blob/main/docs/napkin...

[4] https://github.com/adriancable/eternal/blob/main/vm/vm.c

ralferoo · 2026-06-07T16:14:49 1780848889

Maybe answering my own question, but I'm now wondering if the reason that the divide by 4 was chosen (so essentially using byte addressing instead of word addressing) is so that the linker can do symbol fixup / relocation.

adriancable · 2026-06-15T02:26:54 1781490414

Hi - author of the VM here! The whole LLVM stack (including codegen at the start, all the way through to symbol fixup on the linker side) assumes that addresses are byte addresses. And indeed for the VM to be useful, we want to be able to compile and run software written for POSIX C (e.g. Linux), which assumes an 8-bit byte. This conflicts with the architecture of the VM where the smallest directly addressable unit is a 32-bit word.

We manage this at the MIR level - we have for example LW, LH and LB MIR instructions that load (aligned) 32-bit words, 16-bit halfwords and 8-bit bytes from byte addresses that LLVM generates, respectively. LW is implemented as a simple assembly instruction (since we can load 32-bit words directly), whereas LH and LB are a lot more complicated since they need to load the whole 32 bit word and then compute individual bits depending on the subaddress within the word. But the end result is always that generated instructions reference only (32-bit aligned) byte addresses.

So this means:

- that the VM has to divide those byte addresses by 4 to obtain word addresses (i.e. indexes into our 32-bit word memory)

- the bottom two bits of each byte address are guaranteed to be 0, since all byte addresses in generated code are 32-bit aligned, therefore we can repurpose the low bit to indicate the indirect addressing mode

Hope this helps!

plasticeagle · 2026-06-07T11:44:55 1780832695

I downloaded and built this, and I feel confident in stating that this is the most impressive thing I have ever seen.

adriancable · 2026-06-15T02:28:38 1781490518

I really appreciate your kind words - very happy to hear you enjoyed this. Thank you!

s-macke · 2026-06-07T13:46:53 1780840013

What I find fascinating is the fact that you can implement those few lines in an esoteric language such as FRACTRAN or Game of Life and even boot Linux on them. Seems doable now. In theory.