flux-llama

FLUX × llama.cpp — an experimental integration of FLUX bytecode agents with LLM inference.

The Idea

What if token sampling in language models were driven by bytecode programs running on a virtual machine? What if multiple agents, each running its own sampling strategy as FLUX bytecode, voted on each token via A2A-style consensus?

This is that experiment.

Architecture

LLM Output Logits
       │
       ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  Agent 0     │     │  Agent 1     │     │  Agent 2     │
│  (Conserv.)  │     │  (Creative)  │     │  (Penalty)   │
│              │     │              │     │              │
│  FLUX Byte   │     │  FLUX Byte   │     │  FLUX Byte   │
│  code:       │     │  code:       │     │  code:       │
│  logit * 2   │     │  pos-dep     │     │  freq-div    │
│              │     │  temperature │     │              │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       └────────────┬───────┴────────────────────┘
                    │
              ┌─────▼──────┐
              │  Weighted  │
              │   Vote     │
              │  (A2A)     │
              └─────┬──────┘
                    │
              ┌─────▼──────┐
              │  Selected  │
              │   Token    │
              └────────────┘
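The weighted-vote stage above can be sketched as a plain C function. This is an illustrative sketch, not the repo's actual API: the `FluxAgent` struct and `swarm_vote` name are assumptions, and the per-token scores stand in for the output of each agent's FLUX bytecode.

```c
#include <stddef.h>

#define N_AGENTS 3

/* Hypothetical sketch: each agent has already scored every candidate
 * token; the swarm picks the token with the highest weighted sum. */
typedef struct {
    double weight;   /* A2A-style trust weight */
} FluxAgent;

/* scores[a][t] = agent a's bytecode score for candidate token t */
int swarm_vote(const FluxAgent agents[N_AGENTS],
               const double *scores[N_AGENTS],
               size_t n_tokens)
{
    int best = 0;
    double best_sum = -1e300;
    for (size_t t = 0; t < n_tokens; t++) {
        double sum = 0.0;
        for (int a = 0; a < N_AGENTS; a++)
            sum += agents[a].weight * scores[a][t];
        if (sum > best_sum) { best_sum = sum; best = (int)t; }
    }
    return best;   /* index of the consensus token */
}
```

With the weights from the demo (0.5 / 0.3 / 0.2), a token that only the Conservative agent likes can still lose to one that two agents agree on, which is the point of the consensus step.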

Features

Multi-Agent Token Sampling

  • Each agent is a FLUX bytecode program that scores candidate tokens
  • Agents vote via weighted consensus (A2A-style trust scoring)
  • Strategies can be swapped at runtime by loading different bytecode

Bytecode Embedding Generation

  • Each agent's bytecode is converted to a 128-dim embedding vector
  • Opcode frequency → embedding dimension
  • Enables similarity comparison between agent strategies

Integration Paths

  1. Standalone (this demo): Simulated logits, pure FLUX VM sampling
  2. llama.cpp hook: Wire into llama_sample_token() callback
  3. ggml tensors: Map FLUX registers to tensor operations
  4. Custom models: Bytecode as a "programming layer" over any LLM
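The shape of integration path 2 can be sketched without committing to a specific llama.cpp version: the hook below rescores a raw logits array in place before the standard sampler runs. Everything here is an assumption for illustration; `flux_scorer` stands in for executing an agent's FLUX bytecode, and the Conservative agent's `logit * 2` strategy from the diagram is used as a concrete scorer.

```c
#include <stddef.h>

/* A scorer maps (raw logit, sequence position) -> agent score;
 * in the real system this would run FLUX bytecode on the VM. */
typedef double (*flux_scorer)(double logit, size_t pos);

/* Agent 0's "Conservative" strategy from the diagram: logit * 2 */
double flux_conservative(double logit, size_t pos)
{
    (void)pos;
    return logit * 2.0;
}

/* Hypothetical hook: replace each logit with the weighted sum of the
 * agents' scores, so whatever sampler runs next sees swarm opinion. */
void flux_sample_hook(double *logits, size_t n_vocab, size_t pos,
                      const flux_scorer scorers[], const double *weights,
                      size_t n_agents)
{
    for (size_t t = 0; t < n_vocab; t++) {
        double rescored = 0.0;
        for (size_t a = 0; a < n_agents; a++)
            rescored += weights[a] * scorers[a](logits[t], pos);
        logits[t] = rescored;
    }
}
```

Because the hook only rewrites the logits array, it composes with any downstream sampling method (greedy, top-k, nucleus) rather than replacing it.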

Building

# Standalone (no llama.cpp needed)
gcc -std=c11 -Wall -O2 -DFLUX_STANDALONE -o flux-llama src/flux_llama.c -lm
./flux-llama

# With llama.cpp (requires llama.cpp installed)
gcc -std=c11 -Wall -O2 -I/path/to/llama.cpp/include \
    -o flux-llama src/flux_llama.c -lm -lllama

Example Output

📊 Setting up 3-agent inference swarm...
  Agent 0 (Conservative): weight=0.5 — boosts high-logit tokens
  Agent 1 (Creative):     weight=0.3 — position-dependent temperature
  Agent 2 (Penalty):      weight=0.2 — penalizes high-frequency tokens

📝 Generating text (20 positions) via swarm consensus:
  the the the the the the the the the the the sea sea sea sea sea ...

Why This Matters

  • Agent-driven creativity: Different sampling strategies create different "voices"
  • Evolutionary optimization: Bytecode can be mutated and selected for quality
  • Transparent decisions: You can disassemble exactly why a token was chosen
  • Composable: Mix and match agent strategies like LEGO blocks
  • Fast: FLUX VM runs at 48K+ ops/sec on ARM — negligible overhead vs LLM inference

Future Directions

  • Real llama.cpp integration (sampling callback hook)
  • GPU-accelerated FLUX VM (CUDA) for batch scoring
  • Evolutionary agent optimization (mutate bytecode, select by output quality)
  • Bytecode embeddings as features for model fine-tuning
  • Multi-model swarms (different base models, FLUX coordination layer)
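The mutation step of the evolutionary direction can be sketched as a single-bit flip in a random byte of an agent's program. This assumes every byte value is a valid opcode, which a real FLUX mutator would have to validate; the deterministic PRNG is there only so mutations are reproducible in tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift32) for reproducible mutations;
 * seed must be nonzero. */
static uint32_t xorshift32(uint32_t *s)
{
    *s ^= *s << 13; *s ^= *s >> 17; *s ^= *s << 5;
    return *s;
}

/* Flip one random bit in one random byte of the bytecode. A real
 * mutator would re-validate the program against the opcode set. */
void mutate_bytecode(uint8_t *code, size_t len, uint32_t *seed)
{
    if (len == 0) return;
    size_t at = xorshift32(seed) % len;             /* pick a byte */
    code[at] ^= (uint8_t)(1u << (xorshift32(seed) % 8));  /* flip a bit */
}
```

Selection would then run the mutated agent in the swarm, score its output quality, and keep or discard the change, closing the evolutionary loop.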

License

MIT — SuperInstance (DiGennaro et al.)
