Skip to content

Oasis-Company/ASI-Evolve

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

12 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

ASI-Evolve โ€” Let AI Do the Research, You Keep the Insight

ASI-Evolve overview

"What if you could run a tireless AI researcher on your hardest problem โ€” one that reads the literature, designs experiments, runs them, and learns from every failure?"

That's ASI-Evolve. It is a general agentic framework that closes the loop between knowledge โ†’ hypothesis โ†’ experiment โ†’ analysis โ€” and repeats it autonomously, round after round, until it finds something that works.

We built it for AI research. But the loop doesn't care about domain.
A financial analyst, a biomedical engineer, a climate scientist, or a game developer can all plug their own problem into ASI-Evolve and let it search for better solutions than any human has time to manually explore.

Paper arXiv


What ASI-Evolve Proved

The paper validated ASI-Evolve on three hard AI problems โ€” each requiring expensive compute, open-ended search, and multi-dimensional feedback:

Domain What It Discovered Gain
Neural Architecture Design 105 SOTA linear attention architectures +0.97 pts over DeltaNet (โ‰ˆ3ร— recent human gains)
Pretraining Data Curation Evolved pipeline that selects cleaner training data +3.96 pts avg, +18 pts on MMLU
RL Algorithm Design Novel optimization mechanisms with mathematical innovations +12.5 pts on AMC32 vs GRPO
Biomedical (Drug-Target Interaction) Stronger architecture for cold-start generalization +6.94 AUROC

These are not toy benchmarks. These are real, frontier-level results โ€” produced autonomously.


The Core Loop (Works in Any Domain)

Step Action
1 LEARN โ€” retrieve relevant prior knowledge
2 DESIGN โ€” propose the next candidate program/idea
3 EXPERIMENT โ€” run it and collect metrics
4 ANALYZE โ€” turn results into reusable lessons

Repeat the loop and improve each round.

Three agents drive this loop:

  • Researcher โ€” reads the database and cognition store, proposes the next candidate
  • Engineer โ€” executes the candidate and collects structured metrics
  • Analyzer โ€” distills outcomes into transferable lessons for future rounds

Two memory systems keep the loop from going in circles:

  • Cognition Store โ€” inject your domain knowledge, papers, or heuristics upfront so the AI doesn't start from zero
  • Experiment Database โ€” every trial is stored with its motivation, code, result, and analysis; parent selection uses UCB1, greedy, random, or MAP-Elites island sampling

Who Should Use This

You don't need to be an AI researcher. You need:

  1. A problem where better code = better outcome โ€” optimization, algorithm design, pipeline tuning, simulation strategy
  2. An evaluation script โ€” something that takes a program and returns a score
  3. Some domain knowledge โ€” papers, rules of thumb, known good approaches

If you have those three things, ASI-Evolve can search the space for you.

Examples across domains:

Industry Problem You Can Throw At It
๐Ÿงฌ Biomedicine Drug-target interaction models, protein folding strategies, clinical trial algorithms
โšก ML Infra Inference schedulers, KV-cache eviction policies, batching strategies, kernel tiling
๐ŸŒ Climate / Energy Grid load-balancing heuristics, carbon footprint optimization pipelines
๐ŸŽฎ Game AI Bot strategies, procedural level generation algorithms, reward shaping functions
๐Ÿญ Manufacturing Quality control classifiers, scheduling heuristics, defect detection pipelines
๐Ÿ”ฌ Materials Science Crystal structure search algorithms, synthesis route optimization
๐Ÿ“ฆ Logistics Routing algorithms, warehouse assignment policies, demand forecasting models

Walkthrough: Inference Throughput Optimization

Scenario: You're an ML infra engineer. Your LLM serving system uses a fixed continuous-batching scheduler. You want to automatically discover a better request-scheduling policy that maximizes throughput while keeping P99 latency under budget โ€” without spending weeks hand-tuning heuristics.

1. Clone and install

git clone https://github.com/GAIR-NLP/ASI-Evolve.git
cd ASI-Evolve
pip install -r requirements.txt

2. Create your experiment

mkdir -p experiments/infer_scheduler/prompts
experiments/infer_scheduler/
โ”œโ”€โ”€ input.md          โ† problem description
โ”œโ”€โ”€ config.yaml       โ† API key + overrides
โ”œโ”€โ”€ initial_program   โ† baseline scheduler to evolve from
โ”œโ”€โ”€ init_cognition.py โ† inject your domain knowledge
โ”œโ”€โ”€ evaluator.py      โ† benchmark the candidate scheduler
โ””โ”€โ”€ eval.sh           โ† shell wrapper called each round

3. Describe the problem (input.md)

# LLM Serving Scheduler

Write a Python function `schedule(queue, gpu_mem_free_gb)` that selects
a batch of requests from `queue` (list of dicts with keys: seq_len, priority,
wait_ms) given available GPU memory, and returns a list of selected request ids.

Optimize for: tokens/sec throughput (primary), P99 latency ms (must stay < 500ms).

4. Baseline program (initial_program)

def schedule(queue, gpu_mem_free_gb):
    """FCFS baseline โ€” just take requests in arrival order."""
    budget = int(gpu_mem_free_gb * 1024)  # rough token budget
    selected, used = [], 0
    for req in sorted(queue, key=lambda r: r["wait_ms"], reverse=True):
        if used + req["seq_len"] <= budget:
            selected.append(req["id"])
            used += req["seq_len"]
    return selected

5. Seed the cognition store (init_cognition.py)

from cognition.store import CognitionStore

store = CognitionStore(storage_dir="experiments/infer_scheduler/cognition_data")
store.add([
    {
        "title": "Continuous Batching",
        "content": "Orca-style iteration-level scheduling avoids head-of-line blocking "
                   "by preempting long requests and inserting shorter ones mid-flight.",
    },
    {
        "title": "Sequence Length Bucketing",
        "content": "Grouping requests by similar sequence length reduces padding waste "
                   "and improves GPU utilization significantly.",
    },
    {
        "title": "Priority + Starvation",
        "content": "Pure priority scheduling causes starvation. Age-weighted priority "
                   "(priority * log(1 + wait_ms)) balances latency and throughput.",
    },
])
python experiments/infer_scheduler/init_cognition.py

6. Evaluator (evaluator.py + eval.sh)

import importlib.util, sys, json
from benchmark import run_serving_simulation  # your internal benchmark harness

def load(path):
    spec = importlib.util.spec_from_file_location("candidate", path)
    mod = importlib.util.module_from_spec(spec); spec.loader.exec_module(mod)
    return mod.schedule

if __name__ == "__main__":
    fn = load(sys.argv[1])
    result = run_serving_simulation(scheduler_fn=fn, trace="traces/sharegpt_1k.jsonl")
    # score = throughput; constraint violation penalizes automatically
    score = result["tokens_per_sec"] if result["p99_latency_ms"] < 500 else 0
    print(json.dumps({"score": score, "metrics": result}))
# eval.sh
#!/bin/bash
python /path/to/experiments/infer_scheduler/evaluator.py "$1"

7. Run

python main.py \
  --experiment infer_scheduler \
  --steps 40 \
  --sample-n 3 \
  --eval-script /path/to/experiments/infer_scheduler/eval.sh

ASI-Evolve will iterate through scheduling policies โ€” from simple heuristics to multi-factor priority functions with dynamic chunking โ€” and write a structured lesson after every trial. After 40 rounds you have a ranked database of every policy tried, why each worked or failed, and the best-performing code ready to deploy.


What ASI-Evolve Does That You Can't Do Manually

Manual Research ASI-Evolve
Try 5โ€“10 ideas per week 50โ€“200 candidates per run
Knowledge stays in one person's head Every insight written to the database
Cold start on each new hypothesis Cognition store primes every round
Hard to know why something worked Analyzer explains every outcome
Results hard to reproduce Full experiment tree stored on disk

Repository Layout

ASI-Evolve/
โ”œโ”€โ”€ main.py                         โ† entry point
โ”œโ”€โ”€ config.yaml                     โ† global defaults
โ”œโ”€โ”€ pipeline/                       โ† Researcher, Engineer, Analyzer agents
โ”œโ”€โ”€ cognition/                      โ† cognition store (embedding + FAISS)
โ”œโ”€โ”€ database/                       โ† experiment database (nodes + sampling)
โ”œโ”€โ”€ utils/
โ””โ”€โ”€ experiments/
    โ”œโ”€โ”€ circle_packing_demo/        โ† included runnable demo
    โ””โ”€โ”€ best/circle_packing/        โ† top programs from our ablation runs

Installation

pip install -r requirements.txt

Requirements:

  • Python 3.10+
  • bash and python3 on your system path
  • Any OpenAI-compatible API endpoint (GPT-4o, Claude, Gemini, local models via LiteLLM, etc.)
  • Optional: Weights & Biases for experiment tracking

Quick Start (Circle-Packing Demo)

The included demo evolves a program to pack 26 circles in a unit square โ€” a clean benchmark used in our ablation studies.

# Initialize the cognition store
python experiments/circle_packing_demo/init_cognition.py

# Run 10 evolution steps
python main.py \
  --experiment circle_packing_demo \
  --steps 10 \
  --sample-n 3 \
  --eval-script /path/to/experiments/circle_packing_demo/eval.sh

ASI-Evolve reaches SOTA-level circle-packing results in as few as 17 rounds.


Configuration Reference

Configuration merges in this order (later overrides earlier):

  1. config.yaml at repository root
  2. experiments/<name>/config.yaml
  3. An explicit file passed with --config

Key settings:

Key What It Controls
api.model The LLM driving all agents
pipeline.sample_n How many historical nodes to show the Researcher each round
pipeline.parallel.num_workers Parallel evolution workers (2โ€“4 for production)
database.sampling.algorithm ucb1 / greedy / random / island
cognition.retrieval.top_k How many cognition items to retrieve per round

Citation

If you use ASI-Evolve in your work, please cite:

@misc{asi_evolve_2026,
  title   = {ASI-Evolve: AI Accelerates AI},
  author  = {Xu, Weixian and Mi, Tiantian and Liu, Yixiu and Nan, Yang and Zhou, Zhimeng and Ye, Lyumanshan and Zhang, Lin and Qiao, Yu and Liu, Pengfei},
  year    = {2026},
  note    = {SJTU / SII / GAIR. https://github.com/GAIR-NLP/ASI-Evolve}
}

ASI-Evolve is open-source and ready for your domain.
Fork it. Point it at your problem. Let it run.

๐Ÿ“„ Paper ยท ๐Ÿ› Issues ยท ๐Ÿ’ฌ Discussions

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 95.2%
  • Jinja 3.5%
  • Shell 1.3%