PR: M1 — Drop-in Instrumentation & End-to-End Optimization #63
Draft: mjehanzaib999 wants to merge 21 commits into AgentOpt:experimental from mjehanzaib999:m1-for-upstream
…tion do not lose initial node to optimize (TODO: trainer might have a better solution)
…a lot of logs for further analysis
…ns and doc evaluation hooks
- Add T1 technical plan for LangGraph OTEL Instrumentation API
- Add architecture & strategy doc (unified OTEL instrumentation design)
- Add M0 README with before/after boilerplate reduction comparison
- Add feedback analysis and API strategy comparison (Trace-first, dual semconv)
- Add prototype_api_validation.py with real LangGraph StateGraph + OpenRouter/StubLLM
- Add Jupyter notebook (prototype_api_validation.ipynb) for Colab-ready demo
- Add example trace output JSON files (notebook_trace_output, optimization_traces)
- Add .env.example for OpenRouter configuration
- Replace hardcoded API key with 3-tier auto-lookup (Colab Secrets → env → .env)
- Save all trace outputs to RUN_FOLDER (Google Drive on Colab, local fallback)
- Add run_summary.json export with scores and history
- Update configuration docs with key setup priority table
- Fix Colab badge URL with actual repo/branch path
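The 3-tier key lookup above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `lookup_api_key` is hypothetical, and the `.env` branch is a deliberately simple `KEY=VALUE` parser rather than a full dotenv implementation. `google.colab.userdata.get` is Colab's secrets API and only imports inside Colab, so elsewhere the first tier is skipped.

```python
import os
from pathlib import Path
from typing import Optional


def lookup_api_key(name: str = "OPENROUTER_API_KEY", env_file: str = ".env") -> Optional[str]:
    """Hypothetical helper: Colab Secrets, then process env, then a local .env file."""
    # Tier 1: Colab Secrets (the import fails outside Google Colab).
    try:
        from google.colab import userdata  # type: ignore[import-not-found]

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        pass  # not on Colab, secret missing, or notebook access not granted

    # Tier 2: the process environment.
    value = os.environ.get(name)
    if value:
        return value

    # Tier 3: a .env file with simple KEY=VALUE lines (comments and blanks skipped).
    path = Path(env_file)
    if path.is_file():
        for raw in path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            if key.strip() == name:
                return val.strip().strip('"').strip("'") or None
    return None
```

Ordering the tiers this way means an explicitly exported environment variable always overrides a stale `.env` file, while Colab users never need a file on disk.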
…ace/io/otel_adapter.py
Deliver Milestone 1: drop-in OTEL instrumentation and end-to-end optimization for any LangGraph agent via two function calls.

New modules (opto/trace/io/):
- instrumentation.py: instrument_graph() + InstrumentedGraph wrapper
- optimization.py: optimize_graph() loop + EvalResult/EvalFn contracts
- telemetry_session.py: TelemetrySession (TracerProvider + flush/export)
- bindings.py: Binding dataclass + apply_updates() + make_dict_binding()
- otel_semconv.py: emit_reward(), emit_trace(), record_genai_chat()

Modified modules:
- langgraph_otel_runtime.py: TracingLLM dual semconv (param.* parent + gen_ai.* child spans with trace.temporal_ignore)
- __init__.py: export all new M1 public APIs

Tests (63 passing, StubLLM-only, CI-safe):
- Unit tests for bindings, semconv, session, instrumentation, optimization
- E2E integration test (test_e2e_m1_pipeline.py): real LangGraph with StubLLM proving the full pipeline instrument → invoke → OTLP → TGJ → optimizer → apply_updates → re-invoke with updated template

Notebook + docs:
- 01_m1_instrument_and_optimize.ipynb: dual-mode (StubLLM + live OpenRouter), Colab badge, executed outputs, <=3-item dataset, temperature=0, max_tokens=256 budget guard
- docs/m1_README.md: architecture, API reference, data flow, semantic conventions, acceptance criteria status
- requirements.txt: pinned dependencies for uv/pip environments
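The binding layer and the closure-based wiring it enables can be illustrated with a small, self-contained sketch. The names `Binding`, `apply_updates`, and `make_dict_binding` come from this PR, but the signatures below (a `get`/`set` callable pair, a keyword-only `strict` flag defaulting to `False`) are assumptions for illustration; the real `bindings.py` may differ. `planner_node` is a hypothetical node function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass
class Binding:
    """Maps one optimizer output key to a live value (assumed get/set shape)."""
    get: Callable[[], Any]
    set: Callable[[Any], None]


def make_dict_binding(store: Dict[str, Any], key: str) -> Binding:
    """Bind to one entry of a mutable dict, such as a prompt-template store."""
    def _set(value: Any) -> None:
        store[key] = value
    return Binding(get=lambda: store[key], set=_set)


def apply_updates(
    updates: Mapping[str, Any],
    bindings: Mapping[str, Binding],
    *,
    strict: bool = False,  # default is an assumption for this sketch
) -> Dict[str, Any]:
    """Push optimizer outputs through their bindings and return what was applied."""
    applied: Dict[str, Any] = {}
    for key, value in updates.items():
        binding = bindings.get(key)
        if binding is None:
            if strict:
                raise KeyError(f"no binding for optimizer key {key!r}")
            continue  # non-strict mode skips keys that have no binding
        binding.set(value)
        applied[key] = value
    return applied


# Closure-based wiring: the node reads the dict at call time, so an update
# applied through the binding is visible on the very next invocation.
templates = {"planner_prompt": "Plan: {query}"}


def planner_node(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"plan": templates["planner_prompt"].format(query=state["query"])}


bindings = {"planner_prompt": make_dict_binding(templates, "planner_prompt")}
apply_updates({"planner_prompt": "Step-by-step plan for: {query}"}, bindings, strict=True)
print(planner_node({"query": "refund policy"}))
# {'plan': 'Step-by-step plan for: refund policy'}
```

Because the setter is just a callable, the same `Binding` shape could point at a class attribute or a database row instead of a dict entry.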
A. Live mode error handling:
- A1: TracingLLM raises LLMCallError on HTTP errors/empty content instead of passing error strings as assistant content
- A2: Notebook only prints [OK] when provider call actually succeeds with non-empty content
- A3: gen_ai.provider.name correctly set to "openrouter" (not "openai") when using OpenRouter
- A4: optimize_graph forces score=0 on invocation failure, bypassing eval_fn
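The A4 rule, combined with the flexible `EvalFn` contract (an `eval_fn` may return a `float`, `str`, `dict`, or `EvalResult`), can be sketched as a normalizer plus a guarded evaluation step. The `EvalResult` field names (`score`, `feedback`), the treatment of a bare string as feedback with no score, and the helper names `normalise_eval` and `evaluate_example` are assumptions for this sketch, not the PR's exact code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EvalResult:
    # Field names are assumptions for this sketch.
    score: Optional[float] = None
    feedback: str = ""


def normalise_eval(raw: Any) -> EvalResult:
    """Coerce the allowed eval_fn return types into a single EvalResult."""
    if isinstance(raw, EvalResult):
        return raw
    if isinstance(raw, (int, float)):
        return EvalResult(score=float(raw))
    if isinstance(raw, str):
        return EvalResult(feedback=raw)  # text-only feedback, no numeric score
    if isinstance(raw, dict):
        score = raw.get("score")
        return EvalResult(
            score=None if score is None else float(score),
            feedback=str(raw.get("feedback", "")),
        )
    raise TypeError(f"unsupported eval_fn return type: {type(raw).__name__}")


def evaluate_example(
    invoke: Callable[[Any], Any],
    eval_fn: Callable[[Any], Any],
    example: Any,
) -> EvalResult:
    """Run one example; a failed invocation scores 0 without calling eval_fn."""
    try:
        output = invoke(example)
    except Exception as exc:
        # A4: provider/graph failure forces score=0 and bypasses eval_fn.
        return EvalResult(score=0.0, feedback=f"invocation failed: {exc}")
    return normalise_eval(eval_fn(output))
```

Short-circuiting before `eval_fn` keeps a lenient evaluator from rewarding an error message that would otherwise be passed along as if it were a real answer.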
B. TelemetrySession API correctness + redaction:
- B5: flush_otlp(clear=False) properly peeks at spans without clearing the exporter
- B6: span_attribute_filter now applied during flush_otlp; supports drop (return {}), redact, and truncate
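The B5/B6 semantics (peek with `clear=False`, drain with `clear=True`, attribute filtering at flush time) can be sketched with a toy in-memory buffer. `ToySpanBuffer` and `redact_and_truncate` are hypothetical, and spans are plain dicts rather than OTEL SDK objects. The sketch also assumes that a filter returning `{}` drops the span from the export; the PR text only says "drop (return {})", so the real semantics may differ.

```python
from typing import Any, Callable, Dict, List, Optional

AttributeFilter = Callable[[Dict[str, Any]], Dict[str, Any]]


class ToySpanBuffer:
    """Hypothetical stand-in for the in-memory exporter behind flush_otlp()."""

    def __init__(self, attribute_filter: Optional[AttributeFilter] = None) -> None:
        self._spans: List[Dict[str, Any]] = []
        self._filter = attribute_filter

    def record(self, name: str, attributes: Dict[str, Any]) -> None:
        self._spans.append({"name": name, "attributes": dict(attributes)})

    def flush(self, clear: bool = True) -> List[Dict[str, Any]]:
        exported: List[Dict[str, Any]] = []
        for span in self._spans:
            attrs = dict(span["attributes"])  # never mutate the stored span
            if self._filter is not None:
                attrs = self._filter(attrs)
                if not attrs:
                    continue  # assumed: an empty dict means "drop this span"
            exported.append({"name": span["name"], "attributes": attrs})
        if clear:
            self._spans.clear()  # clear=False only peeks
        return exported


def redact_and_truncate(attrs: Dict[str, Any], max_len: int = 32) -> Dict[str, Any]:
    """Example filter exercising drop, redact, and truncate."""
    if attrs.get("internal") == "true":
        return {}  # drop
    cleaned: Dict[str, Any] = {}
    for key, value in attrs.items():
        if key.endswith("api_key"):
            cleaned[key] = "[REDACTED]"  # redact secrets
        elif isinstance(value, str) and len(value) > max_len:
            cleaned[key] = value[:max_len] + "..."  # truncate long payloads
        else:
            cleaned[key] = value
    return cleaned
```

With `clear=True`, a first flush returns the recorded spans and a second returns an empty list, which mirrors the `flush_otlp(clear=True)` acceptance gate.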
C. TGJ/ingest correctness and optimizer safety:
- C7: _deduplicate_param_nodes() strips numeric suffixes to collapse duplicate ParameterNodes
- C8: _select_output_node() excludes child LLM spans, selects the true sink (synthesizer)
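The two C fixes can be sketched as small graph helpers. The suffix shape (`name:N`) is an assumption, as is "first occurrence wins" when collapsing duplicates; `deduplicate_param_nodes` and `select_output_node` are hypothetical stand-ins for the PR's private `_deduplicate_param_nodes()` and `_select_output_node()`, operating on plain names and edges rather than Trace nodes.

```python
import re
from typing import Any, Dict, Iterable, List, Optional, Set, Tuple

# Assumed suffix shape, e.g. "planner_prompt:1".
_NUMERIC_SUFFIX = re.compile(r":\d+$")


def deduplicate_param_nodes(params: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse names that differ only by a numeric suffix; first occurrence wins."""
    merged: Dict[str, Any] = {}
    for name, value in params.items():
        merged.setdefault(_NUMERIC_SUFFIX.sub("", name), value)
    return merged


def select_output_node(
    order: List[str],
    edges: Iterable[Tuple[str, str]],
    child_llm_nodes: Set[str],
) -> Optional[str]:
    """Pick the last sink among non-child nodes (no outgoing edge to a kept node)."""
    kept = [n for n in order if n not in child_llm_nodes]
    kept_set = set(kept)
    has_outgoing = {src for src, dst in edges if src in kept_set and dst in kept_set}
    sinks = [n for n in kept if n not in has_outgoing]
    return sinks[-1] if sinks else None
```

Without the exclusion set, a child LLM span such as `synthesizer.llm` is itself a sink and would be chosen as the output, which is the failure mode C8 describes.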
D. OTEL topology and temporal chaining:
- D9: Root invocation span wraps graph.invoke(), producing a single trace ID per invocation
- D10: Temporal chaining uses trace.temporal_ignore attribute instead of OTEL parent presence
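The D10 chaining rule can be sketched as: order spans by start time and link each span to the previous one, skipping any span marked `trace.temporal_ignore = "true"`. This is a simplified illustration with a hypothetical `temporal_edges` helper: attributes are a flat dict here (real OTLP JSON encodes them as key/value lists), and the root invocation span that the adapter also excludes (D9) is not modeled.

```python
from typing import Any, Dict, List, Optional, Tuple


def temporal_edges(spans: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Chain spans in start-time order, skipping those marked temporal_ignore."""
    ordered = sorted(spans, key=lambda s: int(s["startTimeUnixNano"]))
    edges: List[Tuple[str, str]] = []
    previous: Optional[str] = None
    for span in ordered:
        if span.get("attributes", {}).get("trace.temporal_ignore") == "true":
            continue  # child gen_ai.* spans never advance the chain
        if previous is not None:
            edges.append((previous, span["spanId"]))
        previous = span["spanId"]
    return edges
```

Keying on an explicit attribute rather than on `parentSpanId` means the chain stays correct even once every node span has a parent (the D9 root invocation span).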
E. optimize_graph semantics + trace-linked reward:
- E11: best_parameters is a real snapshot captured at the best-scoring iteration
- E12: eval.score attached to root invocation span before flush, linking reward to trace
F. Non-saturating scoring for Stub mode:
- F13: StubLLM and eval_fn are structure-aware; stub optimization demonstrates score improvement
Files changed:
- langgraph_otel_runtime.py: LLMCallError, _validate_content, flush_otlp(clear=)
- telemetry_session.py: flush_otlp delegation, _apply_attribute_filter
- otel_adapter.py: root span exclusion, trace.temporal_ignore chaining
- instrumentation.py: _root_invocation_span context manager, root span on invoke/stream
- optimization.py: _deduplicate_param_nodes, _select_output_node, _snapshot_parameters, eval-in-trace
- __init__.py: export LLMCallError
- test_optimization.py: updated for best_parameters field
- 01_m1_instrument_and_optimize.ipynb: all fixes reflected in notebook
- test_client_feedback_fixes.py: 20 new tests covering all 13 issues
Branch: `feature/M1-instrument-and-optimize` → `feature/M0-technical-plan`

Summary
This PR delivers Milestone 1 of the LangGraph OTEL Instrumentation API: two function calls to instrument and optimize any LangGraph agent end-to-end.
- `instrument_graph()`: a one-liner that wraps any `StateGraph`/`CompiledGraph` with full OTEL tracing, dual semantic conventions (`param.*` + `gen_ai.*`), and auto-derived `Binding` objects.
- `optimize_graph()`: a one-liner optimization loop (invoke → flush OTLP → TGJ → `ingest_tgj` → optimizer `backward`/`step` → `apply_updates()` via bindings), so each iteration runs with the prompts updated by the previous one.

All 7 M1 acceptance gates from the reviewed technical plan are verified.
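The optimization loop can be sketched generically. The names `optimization_loop`, `run_and_score`, and `propose_updates` are hypothetical stand-ins: in this PR the scoring step is invoke → flush OTLP → TGJ → `ingest_tgj` plus evaluation, and the update step is the optimizer's `backward`/`step` followed by `apply_updates()`. The sketch also shows the best-parameters snapshot described in fix E11.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple


def optimization_loop(
    params: Dict[str, Any],
    run_and_score: Callable[[Dict[str, Any]], float],
    propose_updates: Callable[[Dict[str, Any], float], Dict[str, Any]],
    iterations: int = 3,
) -> Tuple[Dict[str, Any], float, List[float]]:
    """Score, snapshot the best, then update params in place for the next round."""
    best_score = float("-inf")
    best_params: Dict[str, Any] = copy.deepcopy(params)
    history: List[float] = []
    for _ in range(iterations):
        score = run_and_score(params)  # stands in for invoke + flush + ingest + eval
        history.append(score)
        if score > best_score:
            best_score = score
            # Snapshot, not a live reference: later updates must not alter it.
            best_params = copy.deepcopy(params)
        # Stands in for optimizer backward/step + apply_updates().
        params.update(propose_updates(params, score))
    return best_params, best_score, history
```

The deep copy matters because `params` keeps being mutated in place; returning a reference instead would report the last parameters rather than the best ones.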
Architecture
New Modules (`opto/trace/io/`)

| Module | Contents |
|---|---|
| `instrumentation.py` | `instrument_graph()` + `InstrumentedGraph` dataclass |
| `optimization.py` | `optimize_graph()` + `EvalResult`, `EvalFn`, `RunResult`, `OptimizationResult` |
| `telemetry_session.py` | `TelemetrySession`: unified OTEL session with `flush_otlp()`, `flush_tgj()`, `export_run_bundle()` |
| `bindings.py` | `Binding` dataclass + `apply_updates()` + `make_dict_binding()` |
| `otel_semconv.py` | `emit_reward()`, `emit_trace()`, `record_genai_chat()`, `set_span_attributes()` |

Modified Modules
| Module | Change |
|---|---|
| `langgraph_otel_runtime.py` | `TracingLLM` enhanced: dual-semconv parent (`param.*`) + child (`gen_ai.*`) spans; child spans carry `trace.temporal_ignore = "true"` to protect TGJ chaining |
| `__init__.py` | Exports all new M1 public APIs |

Tests (63 passing)
| Test file | Coverage |
|---|---|
| `test_bindings.py` | `Binding`, `apply_updates()`, strict/non-strict modes |
| `test_otel_semconv.py` | `emit_reward()`, `emit_trace()`, `record_genai_chat()` |
| `test_telemetry_session.py` | `flush_otlp()`, `flush_tgj()`, `record_spans`, `span_attribute_filter` |
| `test_instrumentation.py` | `instrument_graph()`, child span emission, temporal chaining |
| `test_optimization.py` | `EvalResult`, `_normalise_eval()`, data contracts |
| `test_e2e_m1_pipeline.py` | `ParameterNode` → mock optimizer → `apply_updates()` → re-invoke with updated template |

Notebook and Docs
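| File | Description |
|---|---|
| `examples/notebooks/01_m1_instrument_and_optimize.ipynb` | `temperature=0`, `max_tokens=256` budget guard, committed with executed outputs |
| `docs/m1_README.md` | Architecture, API reference, data flow, semantic conventions, acceptance criteria status |
| `requirements.txt` | Pinned dependencies for `uv`/`pip` environments |

Key Design Decisions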
- **Generic: no hard-coded node names.** `trainable_keys=None` means all nodes are trainable; pass an explicit set to restrict training to those nodes. No "planner" or "synthesizer" is baked into the optimization API.
- **Dual semantic conventions.** Each `TracingLLM.node_call()` emits a parent span (`param.*`, used for optimization) and an optional child span (`gen_ai.*`, for Agent Lightning observability). Child spans carry `trace.temporal_ignore = "true"`.
- **Temporal chaining contract.** The OTLP-to-TGJ adapter only advances the temporal chain on root spans (those without `parentSpanId`). This prevents child LLM spans from corrupting the planner-to-synthesizer ordering that the optimizer relies on. Verified by 3 dedicated tests.
- **Explicit `Binding` layer.** Optimizer output keys map to `Binding(get, set)` objects, and `apply_updates()` pushes values through those bindings. Bindings are auto-derived from `initial_templates` by default, but users can supply custom bindings for class attributes, DB rows, etc.
- **Flexible `EvalFn` contract.** `eval_fn` can return a `float`, `str`, `dict`, or `EvalResult`; all are normalized automatically. This avoids forcing users into a rigid evaluation interface.
- **Closure-based wiring.** Node functions close over `ig.tracing_llm` and the `ig.templates` dict. When `apply_updates()` modifies the dict, the next `invoke()` automatically reads the updated values, so no globals are needed.

M1 Acceptance Gates
- `flush_otlp(clear=True)` returns at least 1 span; a second flush returns 0
- `flush_tgj()` docs are consumable by `ingest_tgj()`
- `strict=True` raises on missing keys
- `optimize_graph(iterations>=2)` changes at least 1 prompt
- `param.*` from real provider

Test Plan
- `python -m pytest tests/unit_tests/test_bindings.py tests/unit_tests/test_otel_semconv.py tests/unit_tests/test_telemetry_session.py tests/unit_tests/test_instrumentation.py tests/unit_tests/test_optimization.py tests/features_tests/test_e2e_m1_pipeline.py -v`: all 63 pass
- `OPENROUTER_API_KEY` is set (Colab Secrets or `.env`)
- `notebook_outputs/m1/` artifacts contain valid OTLP JSON and TGJ JSON