Observability: _execute_recipe drops tracebacks; callers see only str(e) and exception class #274

@bkrabach

Summary

When a recipe step raises any exception during dispatch, _execute_recipe's outer except handler in modules/tool-recipes/amplifier_module_tool_recipes/__init__.py:464 catches it and converts it to:

return ToolResult(
    success=False,
    error={
        "message": f"Recipe execution failed: {str(e)}",
        "type": type(e).__name__,
    },
)

The traceback is dropped: only str(e) and the exception class name are preserved. Callers consuming ToolResult.error get a one-line message with no file, line, or stack information. The CLI prints this message once, at the very end of stdout. In headless subprocess invocations, however, the message is buried in agent "Thinking..." progress output, and the captured-buffer tail truncates before showing it.
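To make the loss concrete, here is a minimal, self-contained sketch that is not taken from the Amplifier codebase. The functions merge and dispatch_step are hypothetical stand-ins for a deep call chain; only the shape of the error dict mirrors the handler quoted above.

```python
import traceback


def merge(settings):
    # Hypothetical deep call site: fails when handed a str instead of a dict.
    return settings.get("key")


def dispatch_step():
    # Hypothetical intermediate frame between the handler and the failure.
    return merge("not-a-dict")


def run_and_capture():
    """Return (one_line, full): two views of the same failure."""
    try:
        dispatch_step()
    except Exception as e:
        # What the current handler keeps: message and class name only.
        one_line = {
            "message": f"Recipe execution failed: {e}",
            "type": type(e).__name__,
        }
        # What is currently discarded: the frames leading to the failure.
        full = traceback.format_exc()
        return one_line, full


one_line, full = run_and_capture()
print(one_line)
print(full)
```

The one-line view names the exception but not the function that raised it. The formatted traceback includes the merge frame, which is the information a caller needs to locate the fault.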

Real-world impact

This blind spot directly cost about 6 hours of debugging time during Amplifier-Resolve reality-check capability E2E testing. The recipe consistently failed at step 2 with 'str' object has no attribute 'get'. The actual call site (amplifier-app-cli/lib/merge_utils.py:62, fixed in microsoft/amplifier-app-cli#169) was 7 stack frames downstream from _execute_recipe. We could not find it from ToolResult.error alone, despite five iterations of progressively deeper instrumentation. The bug finally surfaced only when we ran amplifier tool invoke recipes manually in an interactive session, with the outer except handler patched to call traceback.format_exc().

Proposed fix

Preserve the traceback in the error structure:

import traceback

except Exception as e:
    return ToolResult(
        success=False,
        error={
            "message": f"Recipe execution failed: {str(e)}",
            "type": type(e).__name__,
            "traceback": traceback.format_exc(),  # NEW
        },
    )

The same pattern applies to the corresponding handler in _resume_recipe. Both should preserve the traceback so callers can surface it (in logs, in failure envelopes, in UI).
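One way to keep the two handlers consistent is to route both through a shared helper. The sketch below is an assumption about how that could look, not the module's actual code: the ToolResult dataclass is a stand-in with only the two fields shown in this issue, failure_result is a hypothetical name, and the "Recipe resume failed" prefix is illustrative because the real _resume_recipe message is not quoted here.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    # Stand-in for the real ToolResult; only the fields used in this issue.
    success: bool
    error: Optional[dict[str, Any]] = None


def failure_result(e: BaseException, prefix: str) -> ToolResult:
    """Build a failed ToolResult that keeps the traceback (hypothetical helper)."""
    # format_exception works from the exception object itself, so the helper
    # does not have to be called from inside the except block.
    tb = "".join(traceback.format_exception(type(e), e, e.__traceback__))
    return ToolResult(
        success=False,
        error={
            "message": f"{prefix}: {e}",
            "type": type(e).__name__,
            "traceback": tb,
        },
    )


def _boom():
    raise ValueError("bad step")


def execute():
    try:
        _boom()
    except Exception as e:
        return failure_result(e, "Recipe execution failed")


def resume():
    try:
        _boom()
    except Exception as e:
        # Illustrative prefix; the real _resume_recipe wording may differ.
        return failure_result(e, "Recipe resume failed")
```

Using traceback.format_exception with the exception object, rather than traceback.format_exc(), is a design choice for the helper only; inside the except block the two produce the same text.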

Risk assessment

  • Compatibility: existing callers that read ToolResult.error.message are unaffected. The new traceback field is additive.
  • Size: tracebacks for typical recipe failures are 500 to 3,000 characters. That is negligible for ToolResult payloads, which already carry agent prompts and outputs in the kilobyte range.
  • Privacy: tracebacks may include local file paths (no different from any Python exception). If callers need to redact, they can do so on the traceback field specifically.
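For the privacy point, a caller that wants to redact could post-process only the traceback field. The following is a rough sketch under the assumption that frame lines use CPython's standard File "path", line N format; redact_traceback_paths is a hypothetical name, and reducing each path to its base name is just one possible policy.

```python
import re

# Matches the quoted path in CPython frame lines: File "/some/path.py", line N
_FRAME_PATH = re.compile(r'File "([^"]+)"')


def _basename(path: str) -> str:
    # Split on both separators so POSIX and Windows paths are handled alike.
    return re.split(r"[\\/]", path)[-1]


def redact_traceback_paths(tb_text: str) -> str:
    """Replace each frame's full path with its file name only."""
    return _FRAME_PATH.sub(
        lambda m: f'File "{_basename(m.group(1))}"', tb_text
    )
```

Because the message and type fields are left untouched, callers that ignore the traceback field see no change.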

Suggested test

import pytest


@pytest.mark.asyncio
async def test_recipe_execution_failure_preserves_traceback():
    """Regression: ToolResult.error must carry a 'traceback' field on failure
    so callers can diagnose without rerunning under a debugger."""
    # Arrange a recipe that will deterministically fail at step dispatch
    # (e.g., reference an agent that doesn't exist)
    ...
    result = await tool.execute(...)
    assert result.success is False
    assert "traceback" in result.error
    assert "File" in result.error["traceback"]
    assert len(result.error["traceback"]) > 50

Discovered while

Running the amplifier-bundle-reality-check recipe inside an Amplifier-Resolve reality-check runner sub-container. The downstream consumer (the runner) had to reverse-engineer what failed inside the recipe by reading events.jsonl post-mortem; preserving the traceback in ToolResult.error would have eliminated that entire debugging path.
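On the consumer side, a runner could surface the new field with a small formatter that also tolerates the current two-key error shape. This is a sketch only: describe_failure is a hypothetical name, and the error dict is assumed to have the keys proposed in this issue.

```python
def describe_failure(error: dict) -> str:
    """Render a ToolResult.error dict, appending the traceback when present."""
    header = f"{error.get('type', 'Error')}: {error.get('message', '')}"
    tb = error.get("traceback")
    # Older results without a traceback still render as a single line.
    return f"{header}\n{tb}" if tb else header
```

A runner could write this string to its own log or failure envelope instead of reconstructing the failure from events.jsonl afterwards.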

🤖 Generated with Amplifier

Metadata

Labels

bug (Something isn't working)
