PR: M1 — Drop-in Instrumentation & End-to-End Optimization

Branch: feature/M1-instrument-and-optimize → feature/M0-technical-plan


Summary

This PR delivers Milestone 1 of the LangGraph OTEL Instrumentation API: two function calls to instrument and optimize any LangGraph agent end-to-end.

  • instrument_graph() — one-liner to wrap any StateGraph/CompiledGraph with full OTEL tracing, dual semantic conventions (param.* + gen_ai.*), and auto-derived Binding objects
  • optimize_graph() — one-liner optimization loop: invoke → flush OTLP → TGJ → ingest_tgj → optimizer backward/step → apply_updates() via bindings → next iteration uses updated prompts
  • 63 tests passing (StubLLM-only, CI-safe) including a 21-test E2E integration suite with a real LangGraph
  • M1 notebook with dual mode (StubLLM deterministic + Live OpenRouter), Colab badge, and executed outputs

All 7 M1 acceptance gates from the reviewed technical plan are verified.


Architecture

User Code
─────────
ig = instrument_graph(graph, llm, initial_templates={...})
result = optimize_graph(ig, queries=[...], iterations=3)

Under the Hood
──────────────
┌─────────────────────────────────────────────────────────────┐
│  InstrumentedGraph                                          │
│  ├── .session    → TelemetrySession (TracerProvider +       │
│  │                  InMemorySpanExporter, flush_otlp/tgj)   │
│  ├── .tracing_llm → TracingLLM (dual semconv:               │
│  │                    param.* parent + gen_ai.* child spans) │
│  ├── .templates  → {param_key: template_str}                │
│  ├── .bindings   → {param_key: Binding(get, set)}           │
│  └── .invoke()   → delegates to compiled LangGraph          │
└─────────────────────────────────────────────────────────────┘

Optimization Loop (per iteration)
──────────────────────────────────
invoke() → flush_otlp() → OTLP→TGJ → ingest_tgj()
→ ParameterNode + MessageNode → backward(feedback)
→ optimizer.step() → apply_updates(bindings) → next invoke()
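
Putting the two calls together, a minimal end-to-end sketch (the graph and llm objects are assumed to be built beforehand; the import path and the eval_fn keyword follow the layout described in this PR but may differ in detail from the shipped API):

```python
# Minimal sketch, not the shipped example. Assumes `graph` (a StateGraph or
# CompiledGraph) and `llm` (e.g. a StubLLM or OpenRouter client) already exist.
from opto.trace.io import instrument_graph, optimize_graph

ig = instrument_graph(
    graph,
    llm,
    initial_templates={
        "planner.prompt": "Plan the steps needed to answer: {query}",
        "synthesizer.prompt": "Write the final answer using this plan: {plan}",
    },
)

def eval_fn(result):
    # May return float, str, dict, or EvalResult — all are auto-normalized.
    return 1.0 if result.get("answer") else 0.0

report = optimize_graph(ig, queries=["What is OTEL?"], iterations=3, eval_fn=eval_fn)
print(report.best_parameters)   # snapshot captured at the best-scoring iteration
```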

New Modules (opto/trace/io/)

| File | Lines | Purpose |
|------|-------|---------|
| instrumentation.py | 138 | instrument_graph() + InstrumentedGraph dataclass |
| optimization.py | 412 | optimize_graph() + EvalResult, EvalFn, RunResult, OptimizationResult |
| telemetry_session.py | 188 | TelemetrySession — unified OTEL session with flush_otlp(), flush_tgj(), export_run_bundle() |
| bindings.py | 105 | Binding dataclass + apply_updates() + make_dict_binding() |
| otel_semconv.py | 126 | emit_reward(), emit_trace(), record_genai_chat(), set_span_attributes() |
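
To make the binding layer concrete, here is a minimal sketch of the shape implied by the table above (field and argument names beyond get, set, and strict are assumptions, not the shipped code):

```python
# Sketch of the Binding layer described above — illustrative, not the shipped module.
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Binding:
    get: Callable[[], Any]          # read the current parameter value
    set: Callable[[Any], None]      # write an updated value back to its home

def make_dict_binding(store: Dict[str, Any], key: str) -> Binding:
    """Bind a parameter key to one entry of a plain dict (e.g. ig.templates)."""
    return Binding(get=lambda: store[key], set=lambda v: store.__setitem__(key, v))

def apply_updates(updates: Dict[str, Any], bindings: Dict[str, Binding], strict: bool = True) -> None:
    """Push optimizer outputs through the bindings; strict=True raises on unknown keys."""
    for key, value in updates.items():
        if key not in bindings:
            if strict:
                raise KeyError(f"No binding registered for parameter key: {key}")
            continue
        bindings[key].set(value)
```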

Modified Modules

| File | Change |
|------|--------|
| langgraph_otel_runtime.py | TracingLLM enhanced: dual semconv parent (param.*) + child (gen_ai.*) spans; child spans carry trace.temporal_ignore = "true" to protect TGJ chaining |
| __init__.py | Exports all new M1 public APIs (21 symbols) |
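
For reference, an illustrative OpenTelemetry snippet showing the parent/child span topology described above. The trace.temporal_ignore and gen_ai.provider.name keys are taken from this PR; the remaining param.* and gen_ai.* attribute names, the span names, and the function itself are placeholders, not the TracingLLM implementation:

```python
# Illustrative only — shows the span topology and attribute layout, not TracingLLM itself.
from opentelemetry import trace

tracer = trace.get_tracer("opto.trace.io")

def node_call_sketch(param_key: str, template: str, prompt: str, completion: str) -> None:
    # Parent span: optimization-facing attributes under param.*
    with tracer.start_as_current_span("node_call") as parent:
        parent.set_attribute("param.key", param_key)
        parent.set_attribute("param.template", template)

        # Child span: gen_ai.* observability attributes; marked so the
        # OTLP→TGJ adapter never advances the temporal chain on it.
        with tracer.start_as_current_span("gen_ai.chat") as child:
            child.set_attribute("gen_ai.provider.name", "openrouter")
            child.set_attribute("gen_ai.prompt", prompt)
            child.set_attribute("gen_ai.completion", completion)
            child.set_attribute("trace.temporal_ignore", "true")
```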

Tests (63 passing)

| File | Tests | Scope |
|------|-------|-------|
| test_bindings.py | 10 | Binding, apply_updates(), strict/non-strict modes |
| test_otel_semconv.py | 5 | emit_reward(), emit_trace(), record_genai_chat() |
| test_telemetry_session.py | 6 | flush_otlp(), flush_tgj(), record_spans, span_attribute_filter |
| test_instrumentation.py | 10 | instrument_graph(), child span emission, temporal chaining |
| test_optimization.py | 11 | EvalResult, _normalise_eval(), data contracts |
| test_e2e_m1_pipeline.py | 21 | Full E2E: real LangGraph + StubLLM → invoke → OTLP → TGJ → ParameterNode → mock optimizer → apply_updates() → re-invoke with updated template |

Notebook and Docs

| File | Description |
|------|-------------|
| examples/notebooks/01_m1_instrument_and_optimize.ipynb | 10 sections, 20 code cells, dual mode (StubLLM + Live OpenRouter), Colab badge, 3-item dataset, temperature=0, max_tokens=256 budget guard, committed with executed outputs |
| docs/m1_README.md | Architecture diagrams, full API reference, data flow pipeline, semantic convention design, temporal chaining contract, file map, quick start, acceptance criteria status |
| requirements.txt | Pinned dependencies for uv/pip environments |

Key Design Decisions

  1. Generic — no hard-coded node names. trainable_keys=None means all nodes are trainable; pass an explicit set to restrict optimization to specific nodes. No "planner" or "synthesizer" baked into the optimization API.

  2. Dual semantic conventions. Each TracingLLM.node_call() emits a parent span (param.* for optimization) and an optional child span (gen_ai.* for Agent Lightning observability). Child spans carry trace.temporal_ignore = "true".

  3. Temporal chaining contract. The OTLP → TGJ adapter only advances the temporal chain on root spans (those without a parentSpanId). This prevents child LLM spans from corrupting the planner → synthesizer ordering that the optimizer relies on. Verified by 3 dedicated tests.

  4. Explicit Binding layer. Optimizer output keys map to Binding(get, set) objects. apply_updates() pushes values through the bindings. Auto-derived from initial_templates by default, but users can supply custom bindings for class attributes, DB rows, etc.

  5. Flexible EvalFn contract. eval_fn can return float, str, dict, or EvalResult — all auto-normalized. This avoids forcing users into a rigid evaluation interface.

  6. Closure-based wiring. Node functions close over ig.tracing_llm and the ig.templates dict. When apply_updates() modifies the dict, the next invoke() automatically reads the updated values — no globals needed (see the sketch below).
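
A self-contained illustration of this closure-based wiring (the node name, template key, and fake LLM are invented for the example; real nodes receive LangGraph state and call ig.tracing_llm):

```python
# Minimal sketch of closure-based wiring — not the shipped node functions.
templates = {"planner.prompt": "Plan the steps needed to answer: {query}"}

def fake_llm(prompt: str) -> str:
    return f"LLM saw: {prompt}"

def planner_node(state: dict) -> dict:
    # Closes over `templates`; it always reads the *current* value, so an update
    # written by apply_updates() is picked up on the very next invoke().
    prompt = templates["planner.prompt"].format(query=state["query"])
    return {**state, "plan": fake_llm(prompt)}

print(planner_node({"query": "What is OTEL?"})["plan"])

# After an optimization step, apply_updates() mutates the same dict in place:
templates["planner.prompt"] = "Think step by step, then plan how to answer: {query}"
print(planner_node({"query": "What is OTEL?"})["plan"])   # reflects the updated template
```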


M1 Acceptance Gates

| # | Gate | Criterion | Status |
|---|------|-----------|--------|
| 1 | OTLP export works | flush_otlp(clear=True) returns at least 1 span; a second flush returns 0 | PASS |
| 2 | TGJ conversion works | flush_tgj() docs are consumable by ingest_tgj() | PASS |
| 3 | Temporal chaining | child spans do NOT advance the TGJ temporal chain | PASS |
| 4 | Bindings apply deterministically | strict=True raises on missing keys | PASS |
| 5 | E2E update path (StubLLM) | optimize_graph(iterations>=2) changes at least 1 prompt | PASS |
| 6 | Notebook live validation | OTLP + TGJ with param.* spans from a real provider | PASS |
| 7 | Tests + notebook gate | every new API has at least 1 pytest; Colab badge present | PASS |
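
As an illustration of how a gate reads as a test, a pytest-style sketch of gate 1 (the session and instrumented_graph fixtures and the exact return type of flush_otlp() are assumptions; the shipped checks in the test files above may differ):

```python
# Sketch of the gate-1 check: flush_otlp(clear=True) drains the in-memory exporter.
def test_flush_otlp_clears_exporter(session, instrumented_graph):
    instrumented_graph.invoke({"query": "ping"})

    first = session.flush_otlp(clear=True)
    assert len(first) >= 1            # at least one span was exported

    second = session.flush_otlp(clear=True)
    assert len(second) == 0           # exporter was cleared by the first flush
```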

Test Plan

  • Run python -m pytest tests/unit_tests/test_bindings.py tests/unit_tests/test_otel_semconv.py tests/unit_tests/test_telemetry_session.py tests/unit_tests/test_instrumentation.py tests/unit_tests/test_optimization.py tests/features_tests/test_e2e_m1_pipeline.py -v — all 63 pass
  • Notebook runs end-to-end in StubLLM mode without API keys
  • Notebook live section runs when OPENROUTER_API_KEY is set (Colab Secrets or .env)
  • No secrets committed in notebook outputs
  • notebook_outputs/m1/ artifacts contain valid OTLP JSON and TGJ JSON

doxav and others added 21 commits February 12, 2026 15:01
…tion do not lose initial node to optimize (TODO: trainer might have a better solution)
- Add T1 technical plan for LangGraph OTEL Instrumentation API
- Add architecture & strategy doc (unified OTEL instrumentation design)
- Add M0 README with before/after boilerplate reduction comparison
- Add feedback analysis and API strategy comparison (Trace-first, dual semconv)
- Add prototype_api_validation.py with real LangGraph StateGraph + OpenRouter/StubLLM
- Add Jupyter notebook (prototype_api_validation.ipynb) for Colab-ready demo
- Add example trace output JSON files (notebook_trace_output, optimization_traces)
- Add .env.example for OpenRouter configuration
- Replace hardcoded API key with 3-tier auto-lookup (Colab Secrets → env → .env)
- Save all trace outputs to RUN_FOLDER (Google Drive on Colab, local fallback)
- Add run_summary.json export with scores and history
- Update configuration docs with key setup priority table
- Fix Colab badge URL with actual repo/branch path
Deliver Milestone 1 — drop-in OTEL instrumentation and end-to-end
optimization for any LangGraph agent via two function calls.

New modules (opto/trace/io/):
- instrumentation.py: instrument_graph() + InstrumentedGraph wrapper
- optimization.py: optimize_graph() loop + EvalResult/EvalFn contracts
- telemetry_session.py: TelemetrySession (TracerProvider + flush/export)
- bindings.py: Binding dataclass + apply_updates() + make_dict_binding()
- otel_semconv.py: emit_reward(), emit_trace(), record_genai_chat()

Modified modules:
- langgraph_otel_runtime.py: TracingLLM dual semconv (param.* parent +
  gen_ai.* child spans with trace.temporal_ignore)
- __init__.py: export all new M1 public APIs

Tests (63 passing, StubLLM-only, CI-safe):
- Unit tests for bindings, semconv, session, instrumentation, optimization
- E2E integration test (test_e2e_m1_pipeline.py): real LangGraph with
  StubLLM proving full pipeline instrument → invoke → OTLP → TGJ →
  optimizer → apply_updates → re-invoke with updated template

Notebook + docs:
- 01_m1_instrument_and_optimize.ipynb: dual-mode (StubLLM + live
  OpenRouter), Colab badge, executed outputs, <=3 item dataset,
  temperature=0, max_tokens=256 budget guard
- docs/m1_README.md: architecture, API reference, data flow, semantic
  conventions, acceptance criteria status
- requirements.txt: pinned dependencies for uv/pip environments
A. Live mode error handling:
 - A1: TracingLLM raises LLMCallError on HTTP errors/empty content instead of passing error strings as assistant content
 - A2: Notebook only prints [OK] when provider call actually succeeds with non-empty content
 - A3: gen_ai.provider.name correctly set to "openrouter" (not "openai") when using OpenRouter
 - A4: optimize_graph forces score=0 on invocation failure, bypassing eval_fn

B. TelemetrySession API correctness + redaction:
 - B5: flush_otlp(clear=False) properly peeks at spans without clearing the exporter
 - B6: span_attribute_filter now applied during flush_otlp; supports drop (return {}), redact, and truncate

C. TGJ/ingest correctness and optimizer safety:
 - C7: _deduplicate_param_nodes() strips numeric suffixes to collapse duplicate ParameterNodes
 - C8: _select_output_node() excludes child LLM spans, selects the true sink (synthesizer)

D. OTEL topology and temporal chaining:
 - D9: Root invocation span wraps graph.invoke(), producing a single trace ID per invocation
 - D10: Temporal chaining uses trace.temporal_ignore attribute instead of OTEL parent presence

E. optimize_graph semantics + trace-linked reward:
 - E11: best_parameters is a real snapshot captured at the best-scoring iteration
 - E12: eval.score attached to root invocation span before flush, linking reward to trace

F. Non-saturating scoring for Stub mode:
 - F13: StubLLM and eval_fn are structure-aware; stub optimization demonstrates score improvement

Files changed:
 - langgraph_otel_runtime.py: LLMCallError, _validate_content, flush_otlp(clear=)
 - telemetry_session.py: flush_otlp delegation, _apply_attribute_filter
 - otel_adapter.py: root span exclusion, trace.temporal_ignore chaining
 - instrumentation.py: _root_invocation_span context manager, root span on invoke/stream
 - optimization.py: _deduplicate_param_nodes, _select_output_node, _snapshot_parameters, eval-in-trace
 - __init__.py: export LLMCallError
 - test_optimization.py: updated for best_parameters field
 - 01_m1_instrument_and_optimize.ipynb: all fixes reflected in notebook
 - test_client_feedback_fixes.py: 20 new tests covering all 13 issues