
feat: add Headroom context compression layer #1675

Open: baduyne wants to merge 1 commit into agent0ai:main from baduyne:feat/headroom_context_compression_layer


baduyne commented May 28, 2026

Summary

Integrate a Headroom-compatible compression layer into Agent Zero to reduce excessive token usage in memory recalls, tool outputs, and LiteLLM requests.

This PR introduces token-aware context optimization before runtime context injection, helping prevent context window overflows and improving long-running agent stability.


Motivation

Agent Zero currently runs into several token- and context-related problems:

  • oversized memory recalls
  • large tool outputs
  • browser annotation bloat
  • embedding failures from oversized chunks
  • LiteLLM ContextWindowExceededError

As the runtime grows more autonomous and tool-heavy, context accumulation becomes a major scalability issue.

This PR aims to reduce unnecessary token usage while preserving important semantic information.


What This PR Adds

Token-aware context compression

Introduces a compression layer that runs before context is sent into the LiteLLM pipeline.

Compression targets include:

  • memory recall outputs
  • tool execution logs
  • browser/DOM outputs
  • large intermediate context blocks
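
As an illustration of what compacting one of these targets could look like, here is a minimal sketch that keeps the head and tail of an oversized tool output and drops the middle. It is not the code in this PR and does not use the Headroom API; `compact_tool_output`, `max_chars`, and the omission marker are hypothetical names chosen for the example, and a character limit stands in for a real token count.

```python
def compact_tool_output(text: str, max_chars: int = 2000) -> str:
    """Keep the start and end of an oversized output and drop the middle.

    Hypothetical helper for illustration only; a real compression layer
    would likely work on tokens and on the structure of the output.
    """
    if len(text) <= max_chars:
        return text  # already small enough, leave untouched

    head_len = max_chars // 2
    tail_len = max_chars - head_len
    omitted = len(text) - head_len - tail_len

    head = text[:head_len]
    tail = text[len(text) - tail_len:]  # safe even when tail_len is 0
    return f"{head}\n[... {omitted} characters omitted ...]\n{tail}"


if __name__ == "__main__":
    log = "line of build output\n" * 500 + "BUILD FAILED: missing dependency"
    print(compact_tool_output(log, max_chars=200))
```

Keeping both ends reflects the assumption that tool logs tend to carry the command context at the top and the final status or error at the bottom; other targets such as DOM dumps may need a different strategy.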

Safer memory injection flow

Previous flow:

memory recall
    ↓
append directly to runtime context
    ↓
LiteLLM request

New flow:

memory recall
    ↓
compress / truncate / optimize
    ↓
token validation
    ↓
LiteLLM request
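
The new flow can be sketched roughly as follows. This is an assumption-laden illustration, not the implementation in this PR: `estimate_tokens`, `truncate_to_tokens`, and `prepare_memories` are hypothetical helpers, the 4-characters-per-token ratio is a crude heuristic rather than a real tokenizer, and recalled memories are assumed to arrive ordered from most to least relevant.

```python
from typing import List

CHARS_PER_TOKEN = 4  # crude heuristic; a real tokenizer would give different counts


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character length (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def truncate_to_tokens(text: str, max_tokens: int) -> str:
    """Cut text so that its estimated token count does not exceed max_tokens."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    return text if len(text) <= max_chars else text[:max_chars]


def prepare_memories(memories: List[str], budget_tokens: int,
                     per_item_tokens: int) -> List[str]:
    """Compress each recalled memory, then validate against a total budget.

    Memories are assumed to be sorted by relevance, so the loop stops at the
    first item that would push the running total over the budget.
    """
    selected: List[str] = []
    used = 0
    for memory in memories:
        # compress / truncate / optimize: collapse whitespace, cap item size
        compact = truncate_to_tokens(" ".join(memory.split()), per_item_tokens)
        # token validation: only keep the item if it still fits the budget
        cost = estimate_tokens(compact)
        if used + cost > budget_tokens:
            break
        selected.append(compact)
        used += cost
    return selected


if __name__ == "__main__":
    recalled = ["user  prefers\n dark mode", "x" * 1000, "y" * 1000]
    for item in prepare_memories(recalled, budget_tokens=60, per_item_tokens=50):
        print(estimate_tokens(item), item[:40])
```

Only the list returned by `prepare_memories` would then be appended to the runtime context, so the payload handed to the LiteLLM request stays under a known bound instead of growing with every recall.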

Goals

  • reduce token usage
  • prevent context window overflow
  • improve runtime stability
  • improve support for local/self-hosted models
  • reduce unnecessary LiteLLM payload size

Related Issues

Fixes:

Related to:


I have opened a discussion about this solution in #1674.

Notes

This PR is designed to be minimally invasive and compatible with the current LiteLLM-based architecture.

The implementation is intended as a foundation for future:

  • adaptive context compression
  • memory summarization
  • tool-output compaction
  • MCP schema optimization
  • token observability metrics

Demo: [image]

@3clyp50 @silverqx @regit

