Context Engineering Toolkit

Context window optimization library for LLM applications. Compression, prioritization, and benchmarking tools to maximize the value of every token.

Why I Built This

Context windows are a critical bottleneck in most LLM applications. Whether you are building RAG, agents, or chat systems, you are constantly trading off enough context for good answers against staying within token limits. Many teams handle this with naive truncation: cutting text at an arbitrary character count and hoping for the best.

This toolkit provides principled approaches:

Extractive compression that preserves information density by selecting the most important sentences
Token-aware truncation that respects token boundaries instead of splitting mid-word
Priority-based assembly that ensures the most relevant information fits first
Retention benchmarks that measure whether your context strategy actually preserves key information

The result: smaller context windows with higher information density, lower API costs, and better LLM outputs.
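
To make the difference between character-count truncation and token-aware truncation concrete, here is a minimal sketch. It assumes a toy regex tokenizer as a stand-in for a real model tokenizer; `toy_tokenize`, `naive_truncate`, and `token_aware_truncate` are hypothetical names for illustration, not this toolkit's API.

```python
import re

# Stand-in tokenizer (assumption): a token is a run of word characters or a
# single punctuation mark, together with any whitespace that precedes it.
# Real model tokenizers (e.g. BPE) split text differently.
_TOKEN_RE = re.compile(r"\s*(?:\w+|[^\w\s])")


def toy_tokenize(text: str) -> list[str]:
    return _TOKEN_RE.findall(text)


def naive_truncate(text: str, max_chars: int) -> str:
    """Cut at a character count; this can split a word in half."""
    return text[:max_chars]


def token_aware_truncate(text: str, max_tokens: int) -> str:
    """Keep only whole tokens, up to max_tokens."""
    return "".join(toy_tokenize(text)[:max_tokens])


text = "Quarterly revenue grew 14% year over year."
print(naive_truncate(text, 12))       # Quarterly re
print(token_aware_truncate(text, 3))  # Quarterly revenue grew
```

The character cut leaves a dangling fragment ("re"), while the token-aware cut always ends on a token boundary and its size is measured in the same unit the model's limit is.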

Features

Feature	Description
Multi-model token counting	Accurate counts for GPT-4, GPT-4o, Claude, Llama
Cost estimation	Per-request cost calculation by model
Extractive compression	TF-IDF sentence scoring with position bias
Smart truncation	Token-aware head/tail/middle-out strategies
Priority assembly	REQUIRED > HIGH > MEDIUM > LOW context ordering
Token budgeting	Section-based allocation with rebalancing
Retention benchmarks	Key term, entity, numeric, sentence coverage metrics
CLI interface	Count, compress, benchmark from the command line
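
The head/tail/middle-out strategies listed above can be sketched on a plain list of tokens. This is an illustration under assumptions: here "head" keeps the beginning, "tail" keeps the end, and "middle_out" keeps both ends and drops the middle. The toolkit's own definitions may differ, and `truncate_tokens` is a hypothetical helper.

```python
def truncate_tokens(tokens: list[str], max_tokens: int, strategy: str = "head") -> list[str]:
    """Reduce a token list to at most max_tokens using the given strategy."""
    if max_tokens <= 0:
        return []
    if len(tokens) <= max_tokens:
        return list(tokens)
    if strategy == "head":
        return tokens[:max_tokens]
    if strategy == "tail":
        return tokens[-max_tokens:]
    if strategy == "middle_out":
        head_len = (max_tokens + 1) // 2  # the head gets the extra token when odd
        tail_len = max_tokens - head_len
        tail = tokens[-tail_len:] if tail_len else []
        return tokens[:head_len] + tail
    raise ValueError(f"unknown strategy: {strategy}")


tokens = "a b c d e f g h".split()
print(truncate_tokens(tokens, 3, "head"))        # ['a', 'b', 'c']
print(truncate_tokens(tokens, 3, "tail"))        # ['f', 'g', 'h']
print(truncate_tokens(tokens, 3, "middle_out"))  # ['a', 'b', 'h']
```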

Quick Start

# Install
pip install -e ".[dev]"

# Run the demo
python -m src.cli demo

# Count tokens
echo "Your text here" | python -m src.cli count

# Compress a document to 500 tokens
python -m src.cli compress --file document.txt --target-tokens 500

# Benchmark compression quality
python -m src.cli benchmark original.txt compressed.txt

Architecture

src/
  tokens/
    counter.py           # Multi-model token counting with tiktoken
    budget.py            # Section-based token budget management
  compression/
    extractive.py        # TF-IDF sentence scoring and selection
    truncation.py        # Token-aware head/tail/middle truncation
  assembly/
    priority.py          # Priority-based context window assembly
  benchmarks/
    retention.py         # Information retention measurement
  cli.py                 # Click-based CLI interface

See docs/architecture.md for detailed Mermaid diagrams.

Usage

Token Counting

from src.tokens.counter import TokenCounter, ModelFamily

counter = TokenCounter(ModelFamily.GPT4O)
result = counter.count("Your text here")

print(f"Tokens: {result.token_count}")
print(f"Cost: ${result.estimated_input_cost_usd:.6f}")
print(f"Window usage: {result.utilization:.2%}")
print(f"Remaining: {result.remaining_tokens:,}")
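
For readers wondering how fields like these relate to each other, the sketch below shows one plausible derivation from a token count. It is not the toolkit's implementation: `CountSummary` is a hypothetical class, and the context window size and price are made-up illustrative inputs, not real model values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountSummary:
    token_count: int
    context_window: int                  # model limit, supplied by the caller
    usd_per_million_input_tokens: float  # price, supplied by the caller

    @property
    def utilization(self) -> float:
        """Fraction of the context window consumed."""
        return self.token_count / self.context_window

    @property
    def remaining_tokens(self) -> int:
        return max(self.context_window - self.token_count, 0)

    @property
    def estimated_input_cost_usd(self) -> float:
        return self.token_count / 1_000_000 * self.usd_per_million_input_tokens


# Illustrative numbers only.
summary = CountSummary(token_count=2_000, context_window=8_000,
                       usd_per_million_input_tokens=1.0)
print(f"{summary.utilization:.2%}")               # 25.00%
print(summary.remaining_tokens)                   # 6000
print(f"{summary.estimated_input_cost_usd:.6f}")  # 0.002000
```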

Extractive Compression

from src.compression.extractive import ExtractiveSummarizer
from src.tokens.counter import ModelFamily

summarizer = ExtractiveSummarizer(model=ModelFamily.GPT4O)

# Compress to target token count
compressed = summarizer.compress(long_text, target_tokens=500)

# Or compress by ratio
compressed = summarizer.compress_with_ratio(long_text, ratio=0.3)  # 30% of original
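
As a rough picture of what "TF-IDF sentence scoring with position bias" can look like, here is a standard-library sketch. It makes several assumptions: each sentence is treated as its own document for IDF, the position bonus decays linearly, and whitespace-separated words stand in for tokens. The function names are hypothetical and this is not necessarily how `ExtractiveSummarizer` works internally.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentences(sentences: list[str], position_weight: float = 0.1) -> list[float]:
    """Mean TF-IDF of each sentence's words plus a bonus for earlier sentences."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(sentences)
    # Number of sentences each term appears in.
    doc_freq = Counter(term for words in tokenized for term in set(words))
    scores = []
    for i, words in enumerate(tokenized):
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        tfidf = sum(
            (count / len(words)) * math.log((1 + n) / (1 + doc_freq[term]))
            for term, count in tf.items()
        )
        scores.append(tfidf + position_weight * (1 - i / n))
    return scores


def compress(text: str, target_tokens: int) -> str:
    """Greedily keep the highest-scoring sentences that fit the budget."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(sentences[i].split())  # word count as a token proxy
        if used + cost <= target_tokens:
            chosen.append(i)
            used += cost
    return " ".join(sentences[i] for i in sorted(chosen))  # original order


sample = "The cat sat. The cat sat. Quantum tunneling enables flash memory."
print(compress(sample, target_tokens=8))
# The cat sat. Quantum tunneling enables flash memory.
```

The repeated sentence scores low because its terms are common, and of the two copies the earlier one wins on position, so the duplicate is what gets dropped.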

Priority-Based Context Assembly

from src.assembly.priority import PriorityAssembler, ContextItem, ContextPriority
from src.tokens.counter import ModelFamily

assembler = PriorityAssembler(budget_tokens=4000, model=ModelFamily.GPT4O)

# System prompt always included
assembler.add(ContextItem(
    content="You are a helpful assistant.",
    priority=ContextPriority.REQUIRED,
))

# RAG results, ordered by relevance
assembler.add(ContextItem(
    content=rag_chunk_1,
    priority=ContextPriority.HIGH,
    relevance_score=0.95,
    category="retrieved_context",
))

# Chat history as supporting context
assembler.add(ContextItem(
    content=chat_history,
    priority=ContextPriority.MEDIUM,
    category="chat_history",
))

result = assembler.assemble()
print(f"Included {len(result.included_items)} items, excluded {len(result.excluded_items)}")
print(f"Token utilization: {result.utilization:.1%}")
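
The REQUIRED > HIGH > MEDIUM > LOW ordering can be illustrated with a small greedy packer. This is a sketch under assumptions, not the `PriorityAssembler` implementation: `Priority`, `Item`, and `assemble` are hypothetical names, ties within a priority are broken by relevance, an item that does not fit is skipped while smaller later items may still be included, and a REQUIRED item that does not fit raises an error.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    REQUIRED = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Item:
    content: str
    priority: Priority
    relevance: float = 0.0


def assemble(items, budget_tokens, count_tokens):
    """Pack items by priority, then by descending relevance, within a budget."""
    ordered = sorted(items, key=lambda it: (it.priority, -it.relevance))
    included, excluded, used = [], [], 0
    for item in ordered:
        cost = count_tokens(item.content)
        if used + cost <= budget_tokens:
            included.append(item)
            used += cost
        elif item.priority == Priority.REQUIRED:
            raise ValueError("REQUIRED items exceed the token budget")
        else:
            excluded.append(item)
    return included, excluded, used


def count_words(text: str) -> int:
    return len(text.split())  # word count as a token proxy


items = [
    Item("hi there", Priority.MEDIUM),
    Item("one two three four five six", Priority.HIGH, relevance=0.5),
    Item("You are a helpful assistant.", Priority.REQUIRED),
    Item("alpha beta gamma delta", Priority.HIGH, relevance=0.9),
]
included, excluded, used = assemble(items, budget_tokens=12, count_tokens=count_words)
print([it.content for it in included])
# ['You are a helpful assistant.', 'alpha beta gamma delta', 'hi there']
print(used)  # 11
```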

Token Budget Planning

from src.tokens.budget import TokenBudget, BudgetPriority

budget = TokenBudget(total_budget=8000, response_reserve=2000)
budget.add_section("system", system_prompt, 100, priority=BudgetPriority.CRITICAL)
budget.add_section("context", rag_results, 3000, priority=BudgetPriority.HIGH)
budget.add_section("history", chat_history, 2000, priority=BudgetPriority.MEDIUM)

report = budget.allocate()
print(report.summary())

# Rebalance if sections are uneven
rebalanced = budget.rebalance(report)
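
One way to think about section-based allocation with rebalancing: give each section what it needs up to its cap, then hand unused tokens to sections that were capped. The sketch below assumes that reading. `allocate` and `rebalance` here are hypothetical standalone functions, the third value in each tuple is treated as a per-section cap, and the "needed" figures are invented for illustration.

```python
def allocate(sections, total_budget, response_reserve):
    """sections: list of (name, needed_tokens, cap_tokens) in priority order."""
    available = total_budget - response_reserve
    allocation = {}
    for name, needed, cap in sections:
        grant = min(needed, cap, available)
        allocation[name] = grant
        available -= grant
    return allocation, available  # available is now the leftover


def rebalance(sections, allocation, leftover):
    """Give leftover tokens to sections whose need exceeded their grant."""
    result = dict(allocation)
    for name, needed, _cap in sections:
        if leftover <= 0:
            break
        extra = min(needed - result[name], leftover)
        if extra > 0:
            result[name] += extra
            leftover -= extra
    return result, leftover


sections = [
    ("system", 80, 100),
    ("context", 4200, 3000),
    ("history", 1500, 2000),
]
allocation, leftover = allocate(sections, total_budget=8000, response_reserve=2000)
print(allocation, leftover)
# {'system': 80, 'context': 3000, 'history': 1500} 1420
rebalanced, leftover_after = rebalance(sections, allocation, leftover)
print(rebalanced, leftover_after)
# {'system': 80, 'context': 4200, 'history': 1500} 220
```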

Retention Benchmarking

from src.benchmarks.retention import RetentionBenchmark

benchmark = RetentionBenchmark()
result = benchmark.evaluate(original_text, compressed_text)

print(f"Overall retention: {result.overall_score:.1%}")
print(f"Key terms: {result.key_term_retention:.1%}")
print(f"Entities: {result.entity_retention:.1%}")
print(f"Numbers: {result.numeric_retention:.1%}")
print(f"Compression ratio: {result.compression_ratio:.1%}")
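
To give a feel for what a retention metric measures, here is a minimal sketch of numeric and key-term retention as set overlap between the original and the compressed text. The regexes, the length-based notion of a "key term", and the function names are assumptions for illustration; `RetentionBenchmark` may define these metrics differently.

```python
import re


def _retention(original_items: set[str], compressed_items: set[str]) -> float:
    """Share of the original's items that survive in the compressed text."""
    if not original_items:
        return 1.0  # nothing to lose
    return len(original_items & compressed_items) / len(original_items)


def _numbers(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def _key_terms(text: str, min_len: int) -> set[str]:
    # Crude proxy: any alphabetic word of at least min_len characters.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}


def numeric_retention(original: str, compressed: str) -> float:
    return _retention(_numbers(original), _numbers(compressed))


def key_term_retention(original: str, compressed: str, min_len: int = 5) -> float:
    return _retention(_key_terms(original, min_len), _key_terms(compressed, min_len))


original = "Revenue rose 14% to 3.2 million in 2023, driven by enterprise contracts."
compressed = "Revenue rose 14% in 2023."
print(f"{numeric_retention(original, compressed):.1%}")   # 66.7%
print(f"{key_term_retention(original, compressed):.1%}")  # 20.0%
```

Here the compressed text keeps two of three numbers but only one of five longer terms, which is the kind of gap a single overall score can hide.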

Design Decisions

Development

pip install -e ".[dev]"
make test          # Run tests with coverage
make lint          # Lint with ruff
make typecheck     # Type check with mypy
make demo          # Run interactive demo

Related Projects

This project is part of a broader AI engineering portfolio:

ai-assistant — Production AI agent framework (Kaya) that uses this toolkit for context optimization
mcp-toolkit-server — MCP server that integrates with context engineering for tool-use optimization
meaningful_metrics — Evaluation framework for measuring AI effectiveness
modern-rag-pipeline — RAG pipeline that applies context engineering principles for retrieval optimization

License

MIT
