A minimal secure kernel written in C and assembly that boots via UEFI (primary path) or GRUB Multiboot2 (legacy path) and targets x86-64 long mode.
Milestone M8 (Preemptive Multitasking + Process Lifecycle): complete (verified N=4/N=6). See docs/devlog/M8.md.
- Timer-driven preemptive scheduler (trapframe isr_timer, quantum, ring-3-only preemption) + kernel idle task
- SYS_EXIT → zombie + reaping; vmm_space_destroy fixed to free the private PML4[0] PDPT; no PMM leak across rounds
- Fixed a heap allocator bug (kfree coalesced non-physically-contiguous blocks) that corrupted page tables at N≥3
- Build the demo: make CFLAGS_EXTRA=-DM8_SCHED_DEMO=1; verify with tools/selftest.sh
Milestone M7 (Ring-3 Entry + Cooperative Scheduling): complete. Tag M7_STABLE. See docs/devlog/M7.md.
- arch_enter_user_mode() performs the ring-0 → ring-3 transition into a user process built from a synthetic ELF
- SYS_YIELD (syscall 0) drives sched_yield_from_syscall(), which saves the caller trapframe, picks the next READY process and resumes it via iretq
- Two ring-3 processes alternate cooperatively via SYS_YIELD (verified: hundreds of thousands of clean [SCHED] switch per run, both directions, no CPU exception, on MB2 + UEFI)
- Demo is gated behind M7_RING3_DEMO (off by default so normal boot reaches the shell): make CFLAGS_EXTRA=-DM7_RING3_DEMO=1
- Non-interactive verification: tools/selftest.sh (asserts M4 12/12 + alternating ring-3 switches + no [EXC])
- Four bugs were fixed to get here (premature timer preemption, EFLAGS.NT → iretq #GP, supervisor PML4[0] + shared user PDPT, 2× stride in ELF copy); see the devlog
Milestone M6 (Minimal Context Switch): complete. See docs/devlog/M6.md.
- trapframe_t* tf added to the PCB; process_create_from_elf builds the initial iret frame (RIP=entry, RSP=stack_top, RFLAGS=0x202)
- arch_iret_to_tf / arch_switch_to_process restore all 15 GPRs and iretq into the target after vmm_switch_space
- Syscall and exception handlers persist the live trapframe into current->tf
Milestone M5 (Trapframe Syscall Entry + Kernel Stack Hardening): complete. See docs/devlog/M5.md.
- Unified INT 0x80 entry into the canonical trapframe_t layout (matches isr_common GPR save/restore); C syscall_handler(trapframe_t*)
- Replaced PID-based kernel-stack slots with a bounded slot index
Milestone M4 (Stabilization & Isolation Test Suite): complete. Tag M4_STABLE. See docs/devlog/M4.md.
- 12-case in-kernel selftest suite for user_range_valid / copy_from/to_user (all PASS)
- vmm_map_in_space() supervisor enforcement aligned with vmm_map() (R1 closed)
- Smoke tests: MB2 (exit 124 PASS) + UEFI (exit 124 PASS)
- Selftest controlled by M4_SELFTEST_ENABLE (default: 1)
Milestone M3 (User/Kernel Isolation): complete. Tag M3_ISOLATION_BASE. See docs/devlog/M3.md.
- user_range_valid, copy_from_user, copy_to_user: safe boundary-crossing primitives
- vmm_map() supervisor enforcement, vmm_harden_user_space() fixed, PML4[256+] assert
- All syscall user pointers validated (SYS_OPEN/READ/WRITE/DRIVER)
Milestone M2 (Stack and Exception Hardening): complete. Tag M2_STABLE. See docs/devlog/M2.md.
- Dedicated virtual stack region at 0xFFFFFF8000000000 (PML4[511])
- Kernel main stack: 16 KB usable + guard_lo / guard_hi (not-present PTEs)
- IST1/2/3 stacks: 8 KB each, full guard pages, virtual addresses in TSS
- Debugcon boot markers: SECoS build <TS> git:<HASH> + [M2] Stack switch ok
- Deterministic smoke test: tools/smoke.sh (exit 124 = PASS for both MB2 and UEFI)
Milestone M1 (Memory Model Hardening): complete. Tag M1_STABLE. See docs/devlog/M1.md.
# Build and run (selftest is enabled by default, M4_SELFTEST_ENABLE=1):
make iso && tools/smoke.sh --mb2 --timeout 20 --log /tmp/secos_mb2.log
# Check debugcon log for [M4][SELFTEST] lines (expect: 12/12 PASS)
# Disable selftest at build time:
make iso CFLAGS_EXTRA=-DM4_SELFTEST_ENABLE=0
- ✅ Long Mode (64-bit) boot
- ✅ Multiboot 1/2 support (GRUB)
- ✅ UEFI boot support (see UEFI_IMPLEMENTATION.md)
- ✅ Basic VGA text terminal + framebuffer console
- ✅ Initial identity mapping (transitory)
- ✅ Working stack
- ✅ Interrupt Descriptor Table (IDT)
- ✅ PIT timer with periodic interrupts (IRQ0)
- ✅ Tick and uptime system
- ✅ Blocking sleep functions
- ✅ PS/2 keyboard driver with circular buffer (IRQ1)
- ✅ Physical Memory Manager (PMM) frame allocator
- ✅ Heap allocator (kmalloc/kfree) with dynamic expansion
- ✅ Virtual Memory Manager (VMM) with user space support and in-space translation
- ✅ NX Bit and W^X policy for kernel regions and ELF segments
- ✅ ELF64 loader (PT_LOAD segments, W^X enforcement, p_align handling, per-process page tracking)
- ✅ User process address spaces + stack with guard page
- ✅ Extended PCB (state, registers, manifest, mapped page list for precise unload)
- ✅ Multiboot memory map parsing
- ✅ Interactive shell with commands
- ✅ Error handling during boot & process unload (elfunload)
- ✅ ps command (basic process listing)
See our Development Roadmap for upcoming features!
sudo apt install nasm gcc binutils gnu-efi ovmf qemu-system-x86 mtools
make # produces kernel.bin
make uefi # produces dist/EFI/BOOT/BOOTX64.EFI
cp kernel.bin dist/kernel.elf
mcopy -i dist/test_esp.img -o dist/kernel.elf ::kernel.elf
mcopy -i dist/test_esp.img -o dist/EFI/BOOT/BOOTX64.EFI ::EFI/BOOT/BOOTX64.EFI
Build the ESP image and run with q35 (OVMF works best with the q35 chipset):
# Full rebuild
make clean && make && make uefi
# Rebuild ESP image (run once or after EFI/kernel changes)
dd if=/dev/zero of=dist/test_esp.img bs=1M count=64 2>/dev/null
mkfs.fat -F 32 dist/test_esp.img
mmd -i dist/test_esp.img ::/EFI ::/EFI/BOOT
mcopy -i dist/test_esp.img dist/EFI/BOOT/BOOTX64.EFI ::/EFI/BOOT/BOOTX64.EFI
mcopy -i dist/test_esp.img dist/kernel.elf ::/kernel.elf
# Run (exit 124 = timeout = kernel alive; exit 0 = kernel crashed)
timeout 30 qemu-system-x86_64 \
-machine q35,accel=tcg \
-m 512M \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=dist/OVMF_VARS_test.fd \
-drive if=ide,format=raw,file=dist/test_esp.img \
-net none \
-nographic \
-serial file:/tmp/secos_smoke.log
echo "Exit: $?"
Serial output lands in /tmp/secos_smoke.log.
# Multiboot2 / GRUB (build ISO + run 20s)
tools/smoke.sh --mb2 --timeout 20 --log /tmp/secos_mb2.log
# UEFI / OVMF / q35 (build + run 25s)
tools/smoke.sh --uefi --timeout 25 --log /tmp/secos_uefi.log
Exit 0 = PASS (kernel alive). Exit 1 = FAIL (triple fault / crash). Debugcon log captures: build timestamp, git hash, M2 stack switch RSP.
qemu-system-x86_64 \
-machine pc -cpu qemu64 -m 512M \
-drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
-drive if=pflash,format=raw,file=dist/OVMF_VARS_test.fd \
-drive if=ide,format=raw,file=dist/test_esp.img \
-serial file:/tmp/kernel_debug.log \
-debugcon file:/tmp/uefi_boot.log \
-display none -no-reboot
sudo apt install nasm gcc binutils grub-common grub-pc-bin xorriso qemu-system-x86
make iso # produces myos.iso (GRUB bootable)
make run # build ISO + run in QEMU (graphical window)
make run-serial # build ISO + run headless: interact with the shell in THIS terminal over COM1
make run-vnc # build ISO + run with a VNC server on localhost:5900
make clean # remove all artifacts
The interactive shell is usable without a graphical window: terminal output is mirrored to COM1 and serial RX feeds the shell input path, so
make run-serial # qemu ... -serial stdio -display none
drives the whole shell in the terminal (useful where WSLg/GTK/SDL do not work).
Quit QEMU with Ctrl-A then X. The framebuffer window (make run) and VNC
(make run-vnc) remain available.
UEFI path (primary):
OVMF → BOOTX64.EFI → efi_main() → elf_load_kernel()
→ ExitBootServices() → _uefi_start [BITS 64] → kernel_main(magic=0, info=&secos_boot_info)
Multiboot2 path (legacy):
GRUB → _start [BITS 32] → long mode setup → long_mode [BITS 64] → kernel_main(magic=0x36d76289, info=mb2_info)
kernel_main distinguishes the boot path via magic:
- magic == 0 → UEFI; info is a pointer to struct secos_boot_info
- magic == 0x36d76289 → Multiboot2; info is a pointer to the Multiboot2 info struct
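The dispatch on magic can be sketched as a small classifier. This is an illustration only: the enum, the macro names and the function boot_path_from_magic are hypothetical and are not taken from kernel.c; only the two magic values come from the text above.

```c
#include <stdint.h>

/* Magic values as described above; the macro names are hypothetical. */
#define BOOT_MAGIC_UEFI 0x0u         /* passed by _uefi_start */
#define BOOT_MAGIC_MB2  0x36d76289u  /* Multiboot2 bootloader magic */

/* Hypothetical enum used only for this sketch. */
typedef enum {
    BOOT_PATH_UEFI,
    BOOT_PATH_MULTIBOOT2,
    BOOT_PATH_UNKNOWN
} boot_path_t;

/* Classify the boot path from the magic value handed to kernel_main. */
boot_path_t boot_path_from_magic(uint64_t magic)
{
    if (magic == BOOT_MAGIC_UEFI)
        return BOOT_PATH_UEFI;        /* info points to struct secos_boot_info */
    if (magic == BOOT_MAGIC_MB2)
        return BOOT_PATH_MULTIBOOT2;  /* info points to the Multiboot2 info struct */
    return BOOT_PATH_UNKNOWN;         /* neither path: the caller should refuse to continue */
}
```

Treating any other value as unknown, rather than defaulting to one path, keeps a corrupted handoff from being interpreted with the wrong info layout.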
.
├── boot/
│ └── boot.asm # Multiboot2 entry (_start, 32-bit) + UEFI entry (_uefi_start, 64-bit)
├── arch/x86/
│ ├── idt_asm.asm # Interrupt stubs
│ └── syscall_asm.asm # Syscall entry
├── kernel/
│ ├── kernel.c # kernel_main, initialization
│ └── bootinfo.h # secos_boot_info struct (shared UEFI/Multiboot2)
├── uefi/
│ ├── boot.c # UEFI bootloader (efi_main)
│ ├── elf_load.c # ELF64 loader (used by UEFI bootloader)
│ ├── efi.h # UEFI type definitions
│ └── crt0.s # UEFI application CRT0 (replaces gnu-efi crt0)
├── mm/ # PMM, VMM, heap, ELF kernel loader
├── drivers/ # Framebuffer, keyboard, timer, RTC
├── fs/ # RAMFS, VFS, FAT32, ext2
├── lib/ # Terminal utilities
├── user/ # Embedded test ELF driver
├── dist/
│ ├── EFI/BOOT/BOOTX64.EFI # Built UEFI bootloader
│ ├── test_esp.img # FAT32 ESP image for QEMU
│ └── OVMF_VARS_test.fd # UEFI variable store
├── linker.ld # Kernel linker script
└── Makefile # Build system
Once the kernel boots you can use these commands:
- help - Show list of available commands
- clear - Clear the screen
- echo [text] - Print specified text
- info - Show system information
- uptime - Display system uptime
- sleep [ms] - Wait N milliseconds (1-10000)
- mem - Show memory statistics (PMM + Heap)
- memtest - Memory allocation/free test
- memstress - Heap allocator stress
- elfload - Load embedded test ELF
- elfunload - Destroy last loaded process
- ps - List active processes (minimal)
- colors - VGA color test
- reboot - Reboot system
The kernel includes a simple in-memory filesystem supporting mutable/immutable files and hierarchical directories. Main commands:
- rfls [path] - List direct children of a directory (root if omitted)
- rfcat - Show file content
- rfinfo - Show metadata + first bytes
- rfadd - Create mutable file with initial content
- rfwrite - Write (grows if needed) into mutable file
- rfdel - Delete mutable file
- rfmkdir / rfrmdir - Create / remove directory (empty)
- rfcd / rfpwd - Change / show RAMFS working directory
- rftree [path] - Print recursive tree with branch format (├─, └─)
- rfusage - Stats: number files, dirs, total bytes, free slots
- rfmv - Rename file or directory (updates descendant paths). Prevents cycles (cannot rename a dir inside itself: /a -> /a/b).
- rftruncate - Shrink/expand mutable file to size bytes
Special files generated at boot:
- VERSION - Dynamic build info (BUILD_TS and GIT_HASH from Makefile)
- init.rc - Command script auto-executed at boot (uses shell_run_line)
- sys/syscalls.txt - Placeholder syscall list
- sys/manifest.txt - Initial RAMFS manifest (list TYPE SIZE NAME of each entry)
Path notes: the resolver normalizes paths by removing duplicate slashes and resolving . and .. components.
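A minimal sketch of that kind of normalization, assuming absolute paths and a caller-supplied output buffer. The function path_normalize is hypothetical and is not the actual RAMFS resolver; it only illustrates collapsing duplicate slashes and resolving . and .. components.

```c
#include <stddef.h>
#include <string.h>

/*
 * Normalize a path into `out`, always rooted at "/":
 *   - runs of '/' collapse into one separator
 *   - "." components are dropped
 *   - ".." removes the previous component (never climbing above "/")
 * Returns 0 on success, -1 if `out` is too small.
 * Hypothetical helper for illustration only.
 */
int path_normalize(const char *in, char *out, size_t out_size)
{
    size_t len = 0;                  /* bytes written so far, excluding NUL */

    if (out_size < 2)
        return -1;
    out[len++] = '/';

    while (*in != '\0') {
        while (*in == '/')           /* skip any run of separators */
            in++;
        if (*in == '\0')
            break;

        const char *start = in;      /* measure the next component */
        while (*in != '\0' && *in != '/')
            in++;
        size_t n = (size_t)(in - start);

        if (n == 1 && start[0] == '.')
            continue;                /* ".": stay in place */

        if (n == 2 && start[0] == '.' && start[1] == '.') {
            while (len > 1 && out[len - 1] != '/')
                len--;               /* strip the last component */
            if (len > 1)
                len--;               /* and the separator before it */
            continue;
        }

        /* Append the component, adding a separator unless we are at the root. */
        size_t need = n + (len > 1 ? 1 : 0);
        if (len + need + 1 > out_size)
            return -1;
        if (len > 1)
            out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
    }

    out[len] = '\0';
    return 0;
}
```

With this sketch, /a//b/./c/../d becomes /a/b/d, and any number of leading .. components still resolves to the root.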
Quick example:
rfmkdir docs
rfadd docs/readme.txt HelloRAMFS
rfwrite docs/readme.txt 5 _World
rfcat docs/readme.txt # Output: Hello_World
rfmv docs/readme.txt docs/info.txt
rftree docs
rftruncate docs/info.txt 5
rfcat docs/info.txt # Output: Hello
A minimal VFS layer abstracts different filesystems under a unified API. Main components:
- vfs_mount_root() – mount a filesystem as root / (currently RAMFS).
- vfs_lookup(path) – resolve a generic inode (file or directory).
- vfs_readdir(path, cb) – iterate direct children of a directory.
- File ops: vfs_read_all, vfs_write, vfs_create, vfs_truncate, vfs_remove, vfs_rename, vfs_mkdir.
Shell VFS commands:
- vls [path] – list via VFS (shows absolute paths with leading /).
- vcat <path> – read file through the VFS layer.
- vinfo <path> – show type and size.
- vpwd – show CWD (reuses RAMFS working dir for now).
- vmount – show mount root status (RAMFS already mounted).
Planned next steps:
- FAT32 driver: parse BPB, FAT table and root directory (initial read-only).
- exFAT driver: different structure (allocation bitmap + up-case table); initial read-only support.
- Abstract block device interface (e.g. block_read(sector, buf)). Under QEMU this can be simulated with a memory buffer or, later, an ATA/virtio backend.
Design file in preparation: FAT32.md will outline initial parsing and inode VFS mapping.
Introduced a mediation subsystem for device access named Driver Space. It allows a user process to register as a "driver" for a device and perform granular operations via the SYS_DRIVER syscall, without exposing raw MMIO or sensitive structures.
Main components:
- device_desc_t – descriptor with register and memory region base/size plus capability mask.
- Device registry – array initialized by driver_registry_init().
- Process→device binding – created with shell command drvreg <id>.
- driver_call_t – request structure sent to driver_syscall() (user-space wrapper):
typedef struct {
    uint32_t opcode;      // DRIVER_OP_*
    uint32_t device_id;   // index into the device registry
    uint64_t reg_offset;  // offset into the register buffer
    uint64_t value;       // value for WRITE_REG or generic argument
    uint64_t mem_offset;  // offset into the device memory region
    uint64_t mem_length;  // length for map/unmap
    uint32_t flags;       // DRIVER_FLAG_*
} driver_call_t;
- Kernel dispatcher handle_driver_call() – validates and executes the operation.
- Permission engine – verifies binding, capability and range.
- Audit log – circular buffer of events (errors or calls with the audit flag set).
Initial supported opcodes:
- DRIVER_OP_READ_REG
- DRIVER_OP_WRITE_REG
- DRIVER_OP_MAP_MEM (stub)
- DRIVER_OP_UNMAP_MEM (stub)
- DRIVER_OP_GET_INFO
Primary result codes: DRV_OK, DRV_ERR_DEVICE, DRV_ERR_BINDING, DRV_ERR_OPCODE, DRV_ERR_RANGE, DRV_ERR_PERM, DRV_ERR_ARGS.
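The permission checks (binding, capability, range) can be sketched as follows for register accesses. Everything here is an assumption made for illustration: the numeric values of the opcodes and result codes, the CAP_* bits, the reduced device_desc_t fields, the 32-bit register width and the function driver_check_reg_access. The source only names the symbols and the order of concerns.

```c
#include <stdint.h>
#include <stddef.h>

/* Numeric values are assumptions; the document only names these symbols. */
enum { DRIVER_OP_READ_REG = 1, DRIVER_OP_WRITE_REG = 2 };
enum { DRV_OK = 0, DRV_ERR_DEVICE = -1, DRV_ERR_BINDING = -2,
       DRV_ERR_OPCODE = -3, DRV_ERR_RANGE = -4, DRV_ERR_PERM = -5 };
enum { CAP_READ_REG = 1 << 0, CAP_WRITE_REG = 1 << 1 };  /* hypothetical bits */

/* Reduced, hypothetical view of a device descriptor. */
typedef struct {
    uint64_t reg_size;   /* size in bytes of the shadow register buffer */
    uint32_t caps_mask;  /* operations the device permits */
    int      bound_pid;  /* owning process, or -1 if unbound */
} device_desc_t;

/* Validate a register access: device, binding, opcode, capability, range. */
int driver_check_reg_access(const device_desc_t *devs, size_t ndevs,
                            uint32_t device_id, int caller_pid,
                            uint32_t opcode, uint64_t reg_offset)
{
    if (device_id >= ndevs)
        return DRV_ERR_DEVICE;
    const device_desc_t *d = &devs[device_id];

    /* Exclusive binding: only the bound process may touch the device. */
    if (d->bound_pid < 0 || d->bound_pid != caller_pid)
        return DRV_ERR_BINDING;

    uint32_t needed;
    if (opcode == DRIVER_OP_READ_REG)
        needed = CAP_READ_REG;
    else if (opcode == DRIVER_OP_WRITE_REG)
        needed = CAP_WRITE_REG;
    else
        return DRV_ERR_OPCODE;

    if ((d->caps_mask & needed) == 0)
        return DRV_ERR_PERM;

    /* Assumes 32-bit registers; compared this way to avoid integer overflow. */
    if (d->reg_size < sizeof(uint32_t) ||
        reg_offset > d->reg_size - sizeof(uint32_t))
        return DRV_ERR_RANGE;

    return DRV_OK;
}
```

The range test subtracts from the buffer size instead of adding to the offset, so a huge reg_offset cannot wrap around and pass the check.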
Quick user-space example (testdriver):
driver_call_t dc = {0};
dc.device_id = 0;
dc.opcode = DRIVER_OP_WRITE_REG;
dc.reg_offset = 0x4;
dc.value = 0xABCD1234;
long r = driver_syscall(&dc);
Related shell commands:
- drvinfo – show devices and bindings.
- drvreg <id> / drvunreg <id> – bind / remove process binding.
- drvlog – print audit log.
- drvtest – execute a test operation sequence.
Current security:
- No direct access to real MMIO (shadow register buffer).
- Capability gating reduces surface.
- Exclusive binding per device.
- Audit errors always recorded.
Future hardening plans: rate limiting per process/opcode, advanced audit filter (drvlog errors|dev=<n>|op=<code>), DMA sandbox, IRQ subscribe, bulk transfer.
For more details see DRIVER_IF.md.
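The circular audit log mentioned above can be sketched as a fixed-size ring that overwrites its oldest entry once full. The event fields (tick, pid, opcode, result) follow the glossary entry for Audit Log; the type names, the capacity and the three functions are hypothetical and are not the kernel's actual implementation.

```c
#include <stdint.h>
#include <stddef.h>

#define AUDIT_CAPACITY 8  /* hypothetical capacity */

/* Event fields as listed in the glossary; the type name is hypothetical. */
typedef struct {
    uint64_t tick;
    int32_t  pid;
    uint32_t opcode;
    int32_t  result;
} audit_event_t;

typedef struct {
    audit_event_t slots[AUDIT_CAPACITY];
    uint64_t      total;  /* number of events ever recorded */
} audit_log_t;

/* Record an event, overwriting the oldest one when the ring is full. */
void audit_record(audit_log_t *al, audit_event_t ev)
{
    al->slots[al->total % AUDIT_CAPACITY] = ev;
    al->total++;
}

/* Number of events currently retained. */
size_t audit_count(const audit_log_t *al)
{
    return al->total < AUDIT_CAPACITY ? (size_t)al->total : AUDIT_CAPACITY;
}

/* Fetch the i-th retained event, i = 0 being the oldest; NULL if out of range. */
const audit_event_t *audit_get(const audit_log_t *al, size_t i)
{
    size_t n = audit_count(al);
    if (i >= n)
        return NULL;
    uint64_t first = al->total - n;  /* sequence number of the oldest retained event */
    return &al->slots[(first + i) % AUDIT_CAPACITY];
}
```

Keeping a monotonically increasing total, instead of separate head and tail indices, makes it easy to tell how many events were dropped: total minus audit_count.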
- ✅ Long Mode (64-bit) boot
- ✅ Basic VGA text terminal
- ✅ Identity mapping of first 1GB of memory
- ✅ Working stack
- ✅ Error handling during boot
To test on real hardware:
- Write the ISO to a USB flash drive:
  sudo dd if=myos.iso of=/dev/sdX bs=4M status=progress
  (Replace /dev/sdX with the correct device)
- Boot the computer from the USB drive
This kernel is a solid starting point. You can extend it with:
- Ring-3 transition + context switch - in progress (M5–M7): trapframe-based syscall entry, arch_iret_to_tf, and SYS_YIELD cooperative scheduling are implemented; the ring-3 demo does not yet complete (see Current Status)
- Preemptive scheduler - timer-driven (IRQ0) context switching on top of the M6/M7 trapframe machinery
- Security manifest - Parse .note.secos section for policy (loader support already present; see below)
The loader searches for an ELF note (PT_NOTE) with name SECOS and type QSEC containing a structure:
uint32_t version;
uint32_t flags; // MANIFEST_FLAG_REQUIRE_WX_BLOCK, STACK_GUARD, NX_DATA, RX_CODE
uint64_t max_mem; // active limit: abort if usage > max_mem
uint64_t entry_hint; // expected entry point (0 = ignore)
If present it is validated (entry match, supported flags). W|X segments are rejected unconditionally. The max_mem field is compared to total occupied memory (pages * 4096) after loading and before process start: if it exceeds the limit the process is aborted.
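A minimal sketch of these manifest checks, under stated assumptions: the struct layout follows the four fields listed above, but the type name, the flag bit values and the function manifest_allows_start are hypothetical. The source does not say whether max_mem == 0 means "no limit", so the comparison is applied literally here.

```c
#include <stdint.h>

#define MANIFEST_PAGE_SIZE 4096u

/* Bit assignments are assumptions; the document only names the flags. */
#define MANIFEST_FLAG_REQUIRE_WX_BLOCK (1u << 0)
#define MANIFEST_FLAG_STACK_GUARD      (1u << 1)
#define MANIFEST_FLAG_NX_DATA          (1u << 2)
#define MANIFEST_FLAG_RX_CODE          (1u << 3)
#define MANIFEST_SUPPORTED_FLAGS \
    (MANIFEST_FLAG_REQUIRE_WX_BLOCK | MANIFEST_FLAG_STACK_GUARD | \
     MANIFEST_FLAG_NX_DATA | MANIFEST_FLAG_RX_CODE)

/* Fields as listed above; the type name is hypothetical. */
typedef struct {
    uint32_t version;
    uint32_t flags;
    uint64_t max_mem;     /* active limit: abort if usage > max_mem */
    uint64_t entry_hint;  /* expected entry point (0 = ignore) */
} secos_manifest_t;

/* Returns 1 if the process may start, 0 if it must be aborted. */
int manifest_allows_start(const secos_manifest_t *m,
                          uint64_t elf_entry, uint64_t mapped_pages)
{
    /* Reject flags this loader does not understand. */
    if (m->flags & ~MANIFEST_SUPPORTED_FLAGS)
        return 0;

    /* Entry match, unless the hint is 0 (ignored). */
    if (m->entry_hint != 0 && m->entry_hint != elf_entry)
        return 0;

    /* Usage is pages * 4096; guard the multiplication against overflow. */
    if (mapped_pages > UINT64_MAX / MANIFEST_PAGE_SIZE)
        return 0;
    if (mapped_pages * MANIFEST_PAGE_SIZE > m->max_mem)
        return 0;

    return 1;
}
```

For example, with max_mem = 8192 a process that maps two pages passes, while a third page pushes usage to 12288 bytes and causes an abort.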
- ASLR - Address space layout randomization for code and stack
- File system - FAT32/exFAT + VFS
- File system - RAM or disk-based filesystem
- Device drivers - Mouse, serial port, AHCI/IDE
- Networking - Basic TCP/IP stack
- Advanced shell - Piping, redirection, job control
- Syscalls - User/kernel space interface
To debug with QEMU and GDB:
qemu-system-x86_64 -cdrom myos.iso -s -S
In another terminal:
gdb kernel.bin
(gdb) target remote localhost:1234
(gdb) continue
- OSDev Wiki - Comprehensive OS development resource
- Intel Manual - Intel CPU documentation
- AMD Manual - AMD CPU documentation
- GRUB Documentation - GRUB manual
This code is provided as an educational example and may be used freely under the MIT license (see LICENSE.md).
SECoS adopts a model with three logical levels. Kernel Space (Ring 0) owns the page tables,
PMM, VMM, IDT and scheduler. User Space (Ring 3) runs isolated ELF processes with no direct
hardware access. Driver Space is an intermediate level in Ring 3: a process bound to a
device can perform granular operations via SYS_DRIVER, mediated and audited by the kernel.
No Ring 3 process accesses MMIO; IOPL=0. Every driver operation is subject to a capability check
(caps_mask) and recorded in the circular log.
Full specification: docs/DRIVER_SPACE.md.
The kernel applies W^X to its sections and marks data regions NX. User pages are mapped with USER while shared kernel regions keep USER=0 after hardening (vmm_harden_user_space). The user stack has an unmapped guard page to catch overflow via page fault. The ELF loader enforces that no segment is both writable and executable and validates alignment (p_align 0 or 0x1000). Every code/data/stack page is tracked in the PCB for precise unload and memory accounting (manifest max_mem).
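The segment rules described above (no W|X segment, p_align of 0 or 0x1000, NX on non-executable pages) can be sketched as two small helpers. The PF_* values are the standard ELF program-header flag bits and the PTE bits are the standard x86-64 page-table bits; the function names and the idea of deriving PTE flags directly from p_flags are assumptions for illustration, not the loader's actual code.

```c
#include <stdint.h>

/* Standard ELF program header permission bits. */
#define PF_X 0x1u
#define PF_W 0x2u
#define PF_R 0x4u

/* Standard x86-64 page-table entry bits. */
#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_USER     (1ull << 2)
#define PTE_NX       (1ull << 63)

/* Returns 1 if a PT_LOAD segment satisfies the W^X and alignment rules. */
int segment_policy_ok(uint32_t p_flags, uint64_t p_align)
{
    if ((p_flags & PF_W) && (p_flags & PF_X))
        return 0;                 /* writable and executable: rejected */
    if (p_align != 0 && p_align != 0x1000)
        return 0;                 /* only 0 or 4 KB alignment accepted */
    return 1;
}

/* Hypothetical mapping from ELF permissions to user PTE flags. */
uint64_t segment_pte_flags(uint32_t p_flags)
{
    uint64_t flags = PTE_PRESENT | PTE_USER;
    if (p_flags & PF_W)
        flags |= PTE_WRITABLE;
    if (!(p_flags & PF_X))
        flags |= PTE_NX;          /* data pages are never executable */
    return flags;
}
```

Because segment_policy_ok rejects W|X up front, segment_pte_flags can never produce a page that is both writable and executable for an accepted segment.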
| Term | Definition |
|---|---|
| Long Mode | 64-bit operating mode of x86-64 CPUs enabled via EFER.LME and paging. |
| Multiboot | Boot specification allowing GRUB to pass memory and module info to the kernel. |
| IDT | Interrupt Descriptor Table: maps interrupt vectors to handler entry points. |
| PIC | Programmable Interrupt Controller (8259) remapped to avoid conflict with CPU exceptions. |
| PIT | Programmable Interval Timer generating periodic IRQ0 ticks (used for uptime and future scheduling). |
| PMM | Physical Memory Manager allocating/releasing 4KB frames using a bitmap. |
| Heap Allocator | Dynamic memory allocator (kmalloc/kfree) expanding with PMM frames. |
| VMM | Virtual Memory Manager handling page mapping, user spaces, physmap and hardening (USER bit removal for kernel regions). |
| Physmap | High virtual linear mapping of all physical memory for direct access (non-executable). |
| W^X | Policy ensuring writable pages are not executable (Write XOR Execute). |
| NX Bit | No-Execute bit set on data regions to block instruction fetches. |
| Guard Page | Unmapped page placed adjacent to a stack to catch overflow via page fault. |
| ELF Loader | Loads ELF64 binaries (PT_LOAD segments) enforcing alignment and W^X restrictions. |
| Manifest (SECOS note) | ELF PT_NOTE section carrying security flags (stack guard, NX data, RX code requirement, max memory). |
| PCB | Process Control Block containing PID, registers, memory space pointer, stack top, manifest pointer and accounting fields. |
| RAMFS | In-memory hierarchical filesystem supporting mutable and immutable entries. |
| VFS | Virtual File System abstraction layering over RAMFS and future FS drivers (ext2, FAT32). |
| Driver Space | Mediated interface allowing user processes to perform granular device operations via audited syscall. |
| Capability Mask | Bitmask indicating which driver operations (read/write/map/info) a device permits. |
| Audit Log | Circular buffer capturing driver syscall events (tick, pid, opcode, result) for inspection. |
| IST | Interrupt Stack Table entries used to switch to dedicated stacks for critical faults (e.g., double fault). |
| CR3 | Register holding base of current paging structures (PML4 in x86-64). |
| PML4 | Top-level page table in x86-64 4-level paging (with 5-level paging, a PML5 table sits above it). |
| Phys Address | Real hardware memory address before translation; often accessed via physmap. |
| USER bit | Page table permission bit allowing code/data access from ring3 (future user-mode). |
| Trap Gate | IDT entry type preserving IF flag state for syscalls/int instructions (as used for INT 0x80). |
| Tick | Increment from PIT interrupt used for timekeeping and scheduling. |
| Frame | 4KB physical memory unit managed by PMM. |
| Huge Page | Larger page (2MB) used to reduce page table overhead for physmap. |
| Demand-Zero | Technique allocating zeroed pages lazily on first access within a registered region. |