
Add blog post: Glassbox — Grab vLLM's Attention #183

Draft
dmaniloff wants to merge 6 commits into vllm-project:main from dmaniloff:blog/glassbox-intro


dmaniloff commented Mar 27, 2026

Summary

  • Introduces glassbox, a vLLM plugin for extracting structured signals from transformer attention during inference
  • Authors: Diego Maniloff, Dominik Dahlem, Mac Misiura — Red Hat AI

Post structure

Three design pillars:

  1. Research-informed signals: five feature groups drawn from current literature and new research, namely spectral (pre-softmax SVD), AttentionTracker, LLM-Check, LapEigvals (EMNLP 2025), and routing/Hodge features from degree-normalized attention (Dahlem et al., upcoming)
  2. Built for inference: matrix-free SVD via matvec oracles and a fused Triton kernel, with configurable overhead (intervals, heads, signals on/off)
  3. vLLM-native: a custom attention backend registered through the vllm.general_plugins entry point group, with no modifications to vLLM source
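To make the "matrix-free SVD via matvec oracles" idea in pillar 2 concrete, here is a minimal NumPy sketch. It is not glassbox's implementation (the post describes a fused Triton kernel); it assumes the pre-softmax score matrix is S = Q Kᵀ / √d, ignores the causal mask, and swaps in plain subspace iteration as the solver. The function name `top_singular_values` and its parameters are hypothetical.

```python
import numpy as np


def top_singular_values(Q, K, k=3, n_iter=50, seed=0):
    """Estimate the top-k singular values of S = Q @ K.T / sqrt(d).

    S (n_q x n_k) is never materialized: it is only touched through
    two "oracles" that multiply by S and by S.T.
    """
    n_q, d = Q.shape
    n_k = K.shape[0]
    scale = 1.0 / np.sqrt(d)

    def matvec(V):
        # S @ V, computed as Q @ (K.T @ V); cost O((n_q + n_k) * d * k)
        return (Q @ (K.T @ V)) * scale

    def rmatvec(U):
        # S.T @ U, computed as K @ (Q.T @ U)
        return (K @ (Q.T @ U)) * scale

    # Random orthonormal starting block for the right singular subspace.
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((n_k, k)))

    # Subspace (block power) iteration, re-orthonormalizing each step.
    for _ in range(n_iter):
        U, _ = np.linalg.qr(matvec(V))
        V, _ = np.linalg.qr(rmatvec(U))

    # Rayleigh-Ritz step: singular values of the small k x k projection.
    B = U.T @ matvec(V)
    return np.linalg.svd(B, compute_uv=False)
```

The point of the design is memory: for a sequence of length n, the dense score matrix costs O(n²) per head, while the two oracles only ever hold n × k and d × k blocks.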

Also covers:

  • Three run modes: vllm serve, glassbox-run, glassbox-extract
  • Pluggable handler system: JSONL, OpenTelemetry spans, custom handlers
  • Terminal-style demo output from a real OPT-125m run
  • Vision: closing the loop from signals to action — inline detection, Observation Plugin RFC (#36998), external serving, llm-d shadow mode
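The pluggable handler system listed above can be pictured with a small sketch. Everything here is an assumption for illustration: the `SignalHandler` protocol, the `emit` method, the `dispatch` helper, and the record fields are hypothetical and are not glassbox's actual JsonlHandler or OtelHandler API. Only the signal names `spectral` and `routing` come from the PR's commit notes.

```python
import io
import json
from typing import Any, Dict, Iterable, Protocol, TextIO


class SignalHandler(Protocol):
    """Hypothetical handler interface: receives one signal record at a time."""

    def emit(self, record: Dict[str, Any]) -> None: ...


class JsonlSketchHandler:
    """Writes each record as one JSON object per line (JSONL)."""

    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def emit(self, record: Dict[str, Any]) -> None:
        self._stream.write(json.dumps(record, sort_keys=True) + "\n")


class CountingHandler:
    """Example of a custom handler: counts records per signal group."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def emit(self, record: Dict[str, Any]) -> None:
        name = record.get("signal", "unknown")
        self.counts[name] = self.counts.get(name, 0) + 1


def dispatch(handlers: Iterable[SignalHandler], record: Dict[str, Any]) -> None:
    """Fan a single record out to every registered handler."""
    for handler in handlers:
        handler.emit(record)


# Usage: one record goes to both a JSONL sink and a custom counter.
buffer = io.StringIO()
counter = CountingHandler()
dispatch(
    [JsonlSketchHandler(buffer), counter],
    {"signal": "spectral", "layer": 0, "head": 3, "value": 1.25},
)
```

Keeping the interface to a single method means a JSONL file, a tracing span exporter, and an inline detector could all sit behind the same call site.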

dmaniloff and others added 6 commits April 15, 2026 16:29
Introduces glassbox, a vLLM plugin for extracting structured signals
from transformer attention during inference. Covers research-informed
signals, matrix-free SVD for inference efficiency, and vLLM-native
integration via custom attention backends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Diego Maniloff <diego.maniloff@gmail.com>
…ision

- Add LapEigvals (EMNLP 2025) as 5th signal group
- Update YAML config to current signal names (spectral, routing, tracker, selfattn, laplacian)
- Add "Running glassbox" section with three run modes (vllm serve, glassbox-run, glassbox-extract)
- Add signal emission subsection (JsonlHandler, OtelHandler, custom handlers)
- Rewrite vision section around closing the loop: inline detection, RFC #36998 ABORT/CONTINUE, external serving, llm-d shadow mode
- Fix post-softmax description: clarify difference from FlashAttention tiling

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Diego Maniloff <diego.maniloff@gmail.com>
Show actual glassbox-run command and cleaned-up log output from a
real OPT-125m run. Add "What the features tell you" subheading for
the ratio trajectory table. Update table values to match real run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Diego Maniloff <diego.maniloff@gmail.com>
Signed-off-by: Diego Maniloff <diego.maniloff@gmail.com>
Signed-off-by: Diego Maniloff <diego.maniloff@gmail.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Diego Maniloff <diego.maniloff@gmail.com>