"Why use Python when you can spend 3 weeks writing C to do the same thing, but cooler?" (probably you, at some point during this project)
This is a hand-rolled, stack-based virtual machine written in pure C, complete with a polymorphic object system that supports integers, floats, strings, collections, and vectors. It's the kind of thing you build when you want to understand how languages actually work under the hood, or when you just really, really don't want to use Python's list.
Under the hood, there are two main systems working together:
- The Object System: A tagged union that can hold any of the supported data types, with a full suite of operations (add, subtract, multiply, divide, clone, compare, free).
- The Virtual Machine: A bytecode interpreter with its own operand stack, instruction pointer, and opcode dispatch loop.
Together, they form a surprisingly capable little runtime. It's not going to replace the JVM. But it will impress your systems professor.
Everything lives in one C file because we are brave (or slightly chaotic). Here's a mental map:
ghost_vm.c
│
├── Types & Structs
│   ├── object_kind_t → enum: INTEGER | FLOAT | STRING | COLLECTION | VECTOR
│   ├── object_data_t → union: holds whichever kind of data the object is
│   ├── object_t → the actual object (kind + data)
│   ├── collection → dynamic array of object_t pointers (doubles as a stack)
│   ├── vector → N-dimensional float array
│   └── vm_t → the virtual machine itself (bytecode + ip + operand stack)
│
├── Object Constructors
│   ├── new_object_integer(int)
│   ├── new_object_float(float)
│   ├── new_object_string(char*)
│   ├── new_object_vector(size_t, float*)
│   └── new_object_collection(size_t capacity, bool is_stack)
│
├── Collection Operations
│   ├── collection_append() → push to back
│   ├── collection_pop() → remove from back
│   ├── collection_access() → index-based read (non-stack only)
│   ├── collection_set() → index-based write
│   ├── stack_peek() → look at top without removing
│   ├── is_empty() → 1 if empty, 0 if not, -1 if you passed nonsense
│   └── is_full() → checks capacity
│
├── Object Operations (Polymorphic)
│   ├── object_add() → + (integers, floats, strings, collections, vectors)
│   ├── object_subtract() → - (integers, floats, stack-collections, vectors)
│   ├── object_multiply() → * (integers, floats, strings, collections, vectors)
│   ├── object_divide() → / (integers, floats, vectors)
│   ├── object_equals() → deep equality check
│   ├── object_clone() → deep copy
│   └── object_free() → recursive destructor
│
├── Virtual Machine
│   ├── new_virtual_machine() → allocates VM, sets up operand stack
│   └── run_vm() → the main fetch-decode-execute loop
│
└── main() → demo program (builds a 3D vector, adds a scalar, prints it)
The whole object system is built on C's union, a memory region that can be interpreted as different types depending on context. We use an enum tag (the kind field) to track what's actually stored.
typedef struct Object {
    object_kind_t kind; // what type is this?
    object_data_t data; // the actual data (interpreted based on kind)
} object_t;
This is essentially what dynamically-typed languages do internally. Python objects, Ruby objects, JavaScript values: at heart they all pair a type tag with a payload, which makes them tagged unions in disguise. We just wrote ours in C where there's no safety net and the floor is made of segfaults.
| Kind | C Type | Notes |
|---|---|---|
| INTEGER | int | Whole numbers. Classic. |
| FLOAT | float | Decimal numbers. Slightly treacherous. |
| STRING | char * | Heap-allocated, null-terminated. |
| COLLECTION | collection | Dynamic array. Can behave as a list or a stack. |
| VECTOR | vector | N-dimensional float coordinates. |
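To make the tag-plus-union idea concrete, here is a minimal sketch covering three of the five kinds. The type names follow the map above, but the union field names (v_int, v_float, v_string), the const char * parameter, and the object_kind_name() helper are assumptions made for illustration, not copied from ghost_vm.c.

```c
#include <stdlib.h>
#include <string.h>

/* Kinds mirror the table above; COLLECTION and VECTOR are left out for brevity. */
typedef enum { INTEGER, FLOAT, STRING } object_kind_t;

/* Field names are assumptions; the real union may use different ones. */
typedef union {
    int v_int;
    float v_float;
    char *v_string;
} object_data_t;

typedef struct Object {
    object_kind_t kind;  /* the tag: says which union member is live */
    object_data_t data;  /* the payload, interpreted according to kind */
} object_t;

object_t *new_object_integer(int value) {
    object_t *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->kind = INTEGER;
    obj->data.v_int = value;
    return obj;
}

/* Copies the text so the object owns its own heap string. */
object_t *new_object_string(const char *value) {
    object_t *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    size_t len = strlen(value);
    obj->data.v_string = malloc(len + 1);
    if (obj->data.v_string == NULL) {
        free(obj);
        return NULL;
    }
    memcpy(obj->data.v_string, value, len + 1);
    obj->kind = STRING;
    return obj;
}

/* Hypothetical helper: every reader of data must switch on kind first. */
const char *object_kind_name(const object_t *obj) {
    if (obj == NULL) return "NULL";
    switch (obj->kind) {
        case INTEGER: return "INTEGER";
        case FLOAT:   return "FLOAT";
        case STRING:  return "STRING";
    }
    return "UNKNOWN";
}
```

The point of the sketch is the discipline, not the details: nothing ever touches data without checking kind first.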
The collection type is doing double duty:
- As a list: supports indexed access, append, set, clone
- As a stack: supports push (append), pop, peek
The bool stack field on the struct is the flag that determines which mode it's in. Try calling collection_access() on a stack and you'll get a very stern fprintf(stderr, ...).
The VM's operand stack is itself a collection with stack = true. So yes, the object system eats its own cooking.
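Here's a rough sketch of how one struct can serve as both list and stack. It is deliberately simplified: it stores plain ints instead of object_t pointers, the int_collection_* names are hypothetical, and the doubling growth policy is an assumption (the real code may grow differently or not at all).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for the collection struct: ints, not object_t pointers. */
typedef struct {
    int *items;
    size_t length;
    size_t capacity;
    bool stack;  /* true: push/pop only; false: indexed access is allowed */
} int_collection_t;

bool int_collection_init(int_collection_t *c, size_t capacity, bool is_stack) {
    size_t cap = capacity ? capacity : 1;
    c->items = malloc(cap * sizeof *c->items);
    if (c->items == NULL) return false;
    c->length = 0;
    c->capacity = cap;
    c->stack = is_stack;
    return true;
}

/* Append doubles as push. Growth by doubling is an assumption of this sketch. */
bool int_collection_append(int_collection_t *c, int value) {
    if (c->length == c->capacity) {
        size_t new_cap = c->capacity * 2;
        int *grown = realloc(c->items, new_cap * sizeof *grown);
        if (grown == NULL) return false;
        c->items = grown;
        c->capacity = new_cap;
    }
    c->items[c->length++] = value;
    return true;
}

/* Removes the last item; fails on an empty collection. */
bool int_collection_pop(int_collection_t *c, int *out) {
    if (c->length == 0) return false;
    *out = c->items[--c->length];
    return true;
}

/* Indexed read: refused when the collection is in stack mode. */
bool int_collection_access(const int_collection_t *c, size_t index, int *out) {
    if (c->stack) {
        fprintf(stderr, "indexed access is not allowed on a stack\n");
        return false;
    }
    if (index >= c->length) return false;
    *out = c->items[index];
    return true;
}
```

One flag, one struct, two personalities. The cost is that every index-based function has to remember to check the flag.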
There is no garbage collector here. You malloc, you free. The object_free() function is your best friend:
void object_free(object_t *obj);
It's recursive: freeing a COLLECTION will free all its children. Freeing a STRING will free its heap-allocated string. Freeing a VECTOR will free its coords array. And then it frees the object shell itself.
⚠️ Double-free danger zone: The object_add() function for collections uses move semantics. It transfers ownership of items from a and b into a new collection, then sets their lengths to 0 before freeing the shells. This is intentional and documented in the source. Do not remove those lines. Do not look at those lines funny. Just trust them.
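If you want to see why zeroing the lengths matters, here is a stripped-down sketch of a recursive destructor plus a move-style merge. The node_t type, its functions, and the nodes_freed counter are all hypothetical stand-ins for the real collection code; the counter exists only so the behaviour can be observed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container: a node that owns an array of child nodes. */
typedef struct Node {
    struct Node **items;
    size_t length;
} node_t;

/* Instrumentation for this sketch only: counts how many nodes were freed. */
static size_t nodes_freed = 0;

node_t *node_new(size_t capacity) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->items = capacity ? calloc(capacity, sizeof *n->items) : NULL;
    if (capacity && n->items == NULL) {
        free(n);
        return NULL;
    }
    n->length = 0;
    return n;
}

/* Recursive destructor: children first, then the array, then the shell. */
void node_free(node_t *n) {
    if (n == NULL) return;
    for (size_t i = 0; i < n->length; i++) node_free(n->items[i]);
    free(n->items);
    free(n);
    nodes_freed++;
}

/* Merge with move semantics: a and b are consumed, their children live on in the result. */
node_t *node_merge_move(node_t *a, node_t *b) {
    node_t *out = node_new(a->length + b->length);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < a->length; i++) out->items[out->length++] = a->items[i];
    for (size_t i = 0; i < b->length; i++) out->items[out->length++] = b->items[i];
    /* Ownership has moved. Zero the lengths so freeing the shells skips the children. */
    a->length = 0;
    b->length = 0;
    node_free(a);
    node_free(b);
    return out;
}
```

Delete the two length = 0 lines and node_free(a) walks into children that the merged node now owns, which is exactly the double free the warning is about.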
Every arithmetic operation is overloaded by kind. Here's a quick reference for what works with what:
| a \ b | INTEGER | FLOAT | STRING | COLLECTION | VECTOR |
|---|---|---|---|---|---|
| INTEGER | ✅ int | ✅ float | ❌ | ❌ | ❌ |
| FLOAT | ✅ float | ✅ float | ❌ | ❌ | ❌ |
| STRING | ❌ | ❌ | ✅ concat | ❌ | ❌ |
| COLLECTION | ❌ | ❌ | ❌ | ✅ merge* | ❌ |
| VECTOR | ✅ broadcast | ✅ broadcast | ❌ | ❌ | ✅ elementwise |
*Collection merge uses move semantics. a and b are consumed.
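The numeric rows of that table boil down to two rules: int plus int stays int, anything involving a float becomes float, and a scalar added to a vector is applied to every coordinate. A small sketch of those rules follows. The num_t type and every function name here are hypothetical, and whether the real object_add() mutates in place or allocates a fresh result is not something this sketch claims to know.

```c
#include <stddef.h>

typedef enum { NUM_INTEGER, NUM_FLOAT } num_kind_t;

/* Hypothetical by-value numeric type, much smaller than the real object_t. */
typedef struct {
    num_kind_t kind;
    union { int i; float f; } as;
} num_t;

num_t num_int(int value) {
    num_t n;
    n.kind = NUM_INTEGER;
    n.as.i = value;
    return n;
}

num_t num_float(float value) {
    num_t n;
    n.kind = NUM_FLOAT;
    n.as.f = value;
    return n;
}

static float num_as_float(num_t n) {
    return n.kind == NUM_INTEGER ? (float)n.as.i : n.as.f;
}

/* int + int stays int; any float operand promotes the result to float. */
num_t num_add(num_t a, num_t b) {
    if (a.kind == NUM_INTEGER && b.kind == NUM_INTEGER) {
        return num_int(a.as.i + b.as.i);
    }
    return num_float(num_as_float(a) + num_as_float(b));
}

/* Broadcast: the same scalar is added to every coordinate, in place. */
void vector_add_scalar(float *coords, size_t dims, float scalar) {
    for (size_t i = 0; i < dims; i++) coords[i] += scalar;
}

/* Elementwise add into dst. Returns 0 on success, -1 on a dimension mismatch. */
int vector_add_vector(float *dst, size_t dst_dims, const float *src, size_t src_dims) {
    if (dst_dims != src_dims) return -1;
    for (size_t i = 0; i < dst_dims; i++) dst[i] += src[i];
    return 0;
}
```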
Strings and collections support repetition: "ha" * 3 gives you "hahaha", and [1, 2] * 3 gives you [1, 2, 1, 2, 1, 2]. Just like Python. Except you wrote it yourself in C. Feel good about that.
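String repetition is a nice small example of how much bookkeeping C asks for. The sketch below is one plausible way to do it; string_repeat() is a hypothetical name, and the overflow guard is an addition of this sketch rather than something the source is known to have.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding s repeated `times` times.
 * The caller owns the result and must free() it. Returns NULL on failure. */
char *string_repeat(const char *s, size_t times) {
    size_t len = strlen(s);
    /* Guard against len * times + 1 overflowing size_t. */
    if (len != 0 && times > (SIZE_MAX - 1) / len) return NULL;
    char *out = malloc(len * times + 1);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < times; i++) {
        memcpy(out + i * len, s, len);
    }
    out[len * times] = '\0';
    return out;
}
```

Repeating zero times yields an empty string rather than NULL, which keeps the result a valid string in every successful case.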
Subtracting two stack-typed collections pops matching items off the top of a. Both must be stacks, and the top items of a must equal the items of b. It's niche, but it's there.
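Since stack subtraction is the odd one out, a sketch may help. This version works on plain int arrays and assumes that b is compared against the top of a in the same bottom-to-top order; the real implementation compares object_t items and may order the comparison differently. stack_subtract() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* If the top b_len items of a equal b (same order), removes them from a
 * by shrinking *a_len and returns true. Otherwise leaves a untouched. */
bool stack_subtract(const int *a, size_t *a_len, const int *b, size_t b_len) {
    if (b_len > *a_len) return false;
    size_t start = *a_len - b_len;
    for (size_t i = 0; i < b_len; i++) {
        if (a[start + i] != b[i]) return false;
    }
    *a_len = start;  /* "pop" the matched items in one step */
    return true;
}
```

Checking everything before popping anything means a failed subtraction never leaves a half-modified stack behind.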
The VM is a classic stack machine:
- Bytecode array (size_t *): the program to execute
- Instruction Pointer (ip): points to the current instruction
- Operand Stack: an object_t collection where instructions push and pop their operands
No registers. No frames (yet). Just a stack and a loop.
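For a feel of what "just a stack and a loop" means, here is a toy fetch-decode-execute loop. It is not run_vm(): the MINI_* opcodes are hypothetical, the stack holds plain longs instead of object_t pointers, and the fixed stack size is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical opcodes for this sketch only. */
enum { MINI_PUSH, MINI_ADD, MINI_MUL, MINI_HALT };

#define MINI_STACK_MAX 64

/* Runs the program. On HALT, stores the top of the stack in *result and
 * returns 0. Returns -1 on any error (underflow, overflow, bad opcode). */
int mini_run(const size_t *bytecode, size_t length, long *result) {
    long stack[MINI_STACK_MAX];
    size_t sp = 0;  /* number of items on the operand stack */
    size_t ip = 0;  /* instruction pointer */

    while (ip < length) {
        size_t op = bytecode[ip++];  /* fetch */
        switch (op) {                /* decode and execute */
        case MINI_PUSH:
            if (ip >= length || sp == MINI_STACK_MAX) return -1;
            stack[sp++] = (long)bytecode[ip++];  /* operand follows the opcode */
            break;
        case MINI_ADD:
        case MINI_MUL: {
            if (sp < 2) {
                fprintf(stderr, "stack underflow\n");
                return -1;
            }
            long b = stack[--sp];
            long a = stack[--sp];
            stack[sp++] = (op == MINI_ADD) ? a + b : a * b;
            break;
        }
        case MINI_HALT:
            if (sp == 0) return -1;
            *result = stack[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
    return -1;  /* ran off the end of the program without a HALT */
}
```

A program like PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT leaves (2 + 3) * 4 on the stack. Swap the longs for object_t pointers and the arithmetic for calls to the polymorphic operations, and you have the general shape of the real thing.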
| Opcode | Operand(s) | Description |
|---|---|---|
| OP_PUSH_INT | int value | Pushes an integer object onto the stack |
| OP_PUSH_FLOAT | size_t raw_bits | Pushes a float (encoded as raw bits) onto the stack |
| OP_PUSH_STRING | size_t ptr | Pushes a string object (from a raw char* cast) |
| OP_BUILD_COLLECTION | size_t n | Pops n items, builds a collection, pushes it |
| OP_BUILD_VECTOR | size_t d | Pops d numeric items, builds a d-dimensional vector |
| OP_ADD | none | Pops 2, adds, pushes result |
| OP_SUB | none | Pops 2, subtracts, pushes result |
| OP_MUL | none | Pops 2, multiplies, pushes result |
| OP_DIV | none | Pops 2, divides, pushes result |
| OP_PRINT | none | Pops 1, prints it, frees it |
| OP_HALT | none | Stops the VM |
Floats are passed into the bytecode array as raw bit patterns reinterpreted as size_t. This is done via the *(size_t*)&f pattern:
float f = 10.5f;
size_t encoded = *(size_t*)&f; // reinterpret the bits, don't convert the value
And decoded inside the VM like so:
size_t raw_bits = vm->bytecode[vm->ip];
float value = *(float*)&raw_bits;
This is a type-punning trick. Strictly speaking, the pointer casts break C's strict aliasing rule, and on platforms where size_t is wider than float the encode step reads bytes past the end of f. Copying the bits with memcpy is the well-defined way to get the same effect. In practice it works here. Is it elegant? No. Does it work? Yes. Will it confuse anyone reading it for the first time? Absolutely.
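For comparison, here is what the same round trip looks like with memcpy, which the C standard does allow for reinterpreting bits. This is a sketch of an alternative, not what ghost_vm.c currently does; encode_float() and decode_float() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* The scheme only works if a float fits inside a bytecode slot. */
_Static_assert(sizeof(float) <= sizeof(size_t), "float must fit in size_t");

/* Copies the float's bytes into a zeroed size_t; no value conversion happens. */
size_t encode_float(float f) {
    size_t encoded = 0;
    memcpy(&encoded, &f, sizeof f);
    return encoded;
}

/* Copies the same bytes back out into a float. */
float decode_float(size_t raw_bits) {
    float f;
    memcpy(&f, &raw_bits, sizeof f);
    return f;
}
```

Starting from a zeroed size_t means the unused bytes are deterministic, and compilers typically turn these small fixed-size memcpy calls into plain register moves, so there is usually no runtime cost for doing it the legal way.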
Compile and run:
gcc -o ghost ghost_vm.c -Wall -Wextra
./ghost
Expected output (from the main() demo):
--- VM BOOT SEQUENCE INITIATED ---
<12.000000,22.000000,32.000000>
--- VM HALTED ----
What it did:
- Pushed floats 10.0, 20.0, 30.0 onto the stack
- Built a 3D vector <10, 20, 30>
- Pushed float 2.0
- Added 2.0 to the vector (broadcast scalar addition) → <12, 22, 32>
- Printed and halted
- Stack underflow is checked at the VM level before each arithmetic op. You'll get an fprintf(stderr, ...) and an early return. The VM doesn't crash silently.
- is_empty() checks NULL after checking kind; the null check should come first. This is a minor ordering quirk in the current implementation.
- object_clone() doesn't handle VECTOR yet. If you try to clone a vector kind, you'll get NULL. Something to add next.
- object_equals() doesn't handle VECTOR or FLOAT precision. Float equality is compared with ==, which is fine for exact matches but will bite you with computed values.
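Two of those gaps (vector cloning and float-tolerant equality) are small enough to sketch. Everything below is hypothetical: vec_t stands in for the real vector struct, the function names are invented, and the absolute tolerance is the simplest possible choice (a relative or ULP-based comparison may suit large values better). Note that each function checks for NULL before touching any field, which is the same ordering fix is_empty() needs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the vector struct. */
typedef struct {
    size_t dims;
    float *coords;
} vec_t;

/* Absolute-tolerance comparison; tolerance 0 means exact equality. */
int floats_nearly_equal(float a, float b, float tolerance) {
    float diff = a > b ? a - b : b - a;
    return diff <= tolerance;
}

/* Deep copy: the clone gets its own coords array. Returns NULL on failure. */
vec_t *vec_clone(const vec_t *src) {
    if (src == NULL) return NULL;  /* null check first, then field access */
    vec_t *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    copy->dims = src->dims;
    copy->coords = malloc((src->dims ? src->dims : 1) * sizeof *copy->coords);
    if (copy->coords == NULL) {
        free(copy);
        return NULL;
    }
    if (src->dims) memcpy(copy->coords, src->coords, src->dims * sizeof *copy->coords);
    return copy;
}

/* Returns 1 when both vectors have the same dims and all coords match within tolerance. */
int vec_equals(const vec_t *a, const vec_t *b, float tolerance) {
    if (a == NULL || b == NULL) return a == b;
    if (a->dims != b->dims) return 0;
    for (size_t i = 0; i < a->dims; i++) {
        if (!floats_nearly_equal(a->coords[i], b->coords[i], tolerance)) return 0;
    }
    return 1;
}
```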
If you want to extend this into something more serious, here's a natural roadmap:
- Variable store: a hash map from names to object_t *
- OP_LOAD / OP_STORE: load/store variables from/to the store
- OP_JUMP / OP_JUMP_IF: control flow
- Call frames: support for function calls with local scopes
- A bytecode compiler: take source text, emit opcodes
- Fix object_clone() for vectors
- A proper REPL
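As a taste of the control-flow item, here is one way jump opcodes could work in a toy VM. None of this exists in ghost_vm.c: the J_* opcodes, the absolute-index jump targets, and the "pop, then jump if nonzero" rule for the conditional jump are all design assumptions of this sketch.

```c
#include <stddef.h>

/* Hypothetical opcodes. Jump targets are absolute indices into the bytecode. */
enum { J_PUSH, J_DEC, J_DUP, J_JUMP, J_JUMP_IF, J_HALT };

#define J_STACK_MAX 32

/* Runs the program. On HALT, stores the stack top in *top and the number of
 * executed instructions in *steps, then returns 0. Returns -1 on any error. */
int jump_vm_run(const size_t *code, size_t length, long *top, size_t *steps) {
    long stack[J_STACK_MAX];
    size_t sp = 0;
    size_t ip = 0;
    *steps = 0;

    while (ip < length) {
        size_t op = code[ip++];
        (*steps)++;
        switch (op) {
        case J_PUSH:
            if (ip >= length || sp == J_STACK_MAX) return -1;
            stack[sp++] = (long)code[ip++];
            break;
        case J_DEC:  /* decrement the top of the stack in place */
            if (sp == 0) return -1;
            stack[sp - 1] -= 1;
            break;
        case J_DUP:  /* duplicate the top of the stack */
            if (sp == 0 || sp == J_STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        case J_JUMP:  /* unconditional: overwrite ip with the target */
            if (ip >= length || code[ip] >= length) return -1;
            ip = code[ip];
            break;
        case J_JUMP_IF: {  /* pop a value; jump only if it is nonzero */
            if (ip >= length || sp == 0 || code[ip] >= length) return -1;
            size_t target = code[ip++];
            if (stack[--sp] != 0) ip = target;
            break;
        }
        case J_HALT:
            if (sp == 0) return -1;
            *top = stack[sp - 1];
            return 0;
        default:
            return -1;
        }
    }
    return -1;
}
```

With these six opcodes, PUSH 3, DEC, DUP, JUMP_IF 2, HALT is a countdown loop: the DEC/DUP/JUMP_IF trio repeats until the counter hits zero. A jump is nothing more than writing to ip, which is why control flow is usually the cheapest feature to add to a stack machine.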
This codebase makes one big bet: the object is the unit of everything. Every value the VM touches is an object_t. The VM's stack is an object_t. Collections hold object_t pointers. Operations take object_t pointers and return object_t pointers.
This makes the architecture extremely uniform. Adding a new type means:
- Add it to the enum
- Add it to the union
- Write a constructor
- Handle it in each operation's switch
- Handle it in object_free() and object_clone()
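To show what that checklist looks like in practice, here is a sketch that adds a made-up BOOLEAN kind to a tiny two-kind object system. The K_* kinds, obj_t, and all function names are hypothetical and kept separate from the real identifiers on purpose; only one operation (equality) is shown for step four.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Step 1: add the kind to the enum. */
typedef enum { K_INTEGER, K_BOOLEAN } kind_t;

/* Step 2: add a member to the union. */
typedef union {
    int v_int;
    bool v_bool;
} data_t;

typedef struct {
    kind_t kind;
    data_t data;
} obj_t;

obj_t *new_obj_integer(int value) {
    obj_t *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->kind = K_INTEGER;
    o->data.v_int = value;
    return o;
}

/* Step 3: write a constructor. */
obj_t *new_obj_boolean(bool value) {
    obj_t *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->kind = K_BOOLEAN;
    o->data.v_bool = value;
    return o;
}

/* Step 4: handle the kind in each operation's switch (equality shown here). */
bool obj_equals(const obj_t *a, const obj_t *b) {
    if (a == NULL || b == NULL) return a == b;
    if (a->kind != b->kind) return false;
    switch (a->kind) {
        case K_INTEGER: return a->data.v_int == b->data.v_int;
        case K_BOOLEAN: return a->data.v_bool == b->data.v_bool;
    }
    return false;
}

/* Step 5: clone and free. Neither kind owns heap data, so a struct copy is a deep copy. */
obj_t *obj_clone(const obj_t *src) {
    if (src == NULL) return NULL;
    obj_t *copy = malloc(sizeof *copy);
    if (copy != NULL) *copy = *src;
    return copy;
}

void obj_free(obj_t *o) {
    free(o);
}
```

Leaving the default case out of the switch is deliberate: with warnings enabled, GCC and Clang can then flag any switch that forgets the new enum member, which is about the only safety net this design gets.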
It's verbose. It's explicit. It's C. Welcome.
Do whatever you want with this. Learn from it, break it, extend it, ship it. Just don't blame me when you get a segfault at 2am. That's between you and Valgrind.
*Built with stubbornness, malloc, and a deep suspicion of garbage collectors.*
License: MIT (Go wild).