A pure C++ inference engine for Google's Gemma 2 2B, demonstrating the core mechanics of LLMs.
No abstraction layers from PyTorch or ONNX.
- Zero runtime dependencies: written in standard C++17. No PyTorch and no external math libraries are required for inference
- Gemma 2 Architecture: implements key architectural details:
  - RMSNorm with (1+w) scaling
  - Logit soft-capping (attention & final)
  - GeGLU activation
  - RoPE (rotary positional embeddings)
- Advanced sampling: implements top-P (nucleus) sampling (p=0.9) for coherent text generation
- Low memory footprint: uses `fseek` to stream weights from disk, allowing the 2B/9B models to run on systems with limited RAM (though disk I/O becomes the bottleneck)
- C++ Compiler: `g++` (GCC), Clang, or MSVC
- Python 3.x: required only for exporting the weights
- Disk Space: ~5GB for the model weights
Install the necessary libraries to download and convert the model:

```bash
pip install torch transformers huggingface_hub accelerate
```

Run the provided scripts to download Gemma 2 2B from Hugging Face and convert it to Aura's binary format:
```bash
# Downloads model (~5GB) and converts weights
python export_gemma.py

# Exports the tokenizer vocabulary
python export_tokenizer.py
```

This will create `gemma_weights.bin` and `tokenizer.bin` in your directory.
Compile `main.cpp`. Using `-O3` and `-march=native` is recommended for best performance:

```bash
g++ -O3 -march=native main.cpp -o aura
```

Start the engine:

```bash
./aura
```

Possible usage:
```
init aura engine...
ready...
chat mode enabled.
user: The capital of Switzerland is
```
Note on performance: since the engine streams weights from disk for every token, initial prompt processing will be slow (limited by your disk speed). Once generation starts, it produces tokens at a steady pace.
- `main.cpp`: the core C++ inference engine
- `export_gemma.py`: Python script to export Hugging Face weights to raw binary
- `export_tokenizer.py`: Python script to export the tokenizer vocabulary
MIT License. The Gemma 2 model weights are subject to Google's Gemma Terms of Use.