diff --git a/docs/development.md b/docs/development.md
index b79fb96a..d0d07084 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -197,7 +197,39 @@ The CLI is built with Commander and uses these conventions:
 
 - **@clack/prompts** for interactive UI
 
-- Support `--verbose`, `--quiet`, `--dry-run` flags
+- Support `--verbose`, `--quiet`, `--debug`, `--dry-run` flags
+
+### Log Levels
+
+The CLI supports four log levels, controlled by flags or `MARKFORM_LOG_LEVEL` environment variable:
+
+| Level | Flag | Description |
+| --- | --- | --- |
+| `quiet` | `--quiet` | Suppress non-essential output |
+| `default` | (none) | Model info, tool calls, result summaries, token counts |
+| `verbose` | `--verbose` | Adds harness config, full result listings |
+| `debug` | `--debug` | Adds full prompts, raw tool inputs/outputs (truncated) |
+
+### Wire Format Capture
+
+Use `--wire-log <file>` to capture the full LLM request/response for debugging:
+
+```bash
+# Capture wire format to YAML file
+pnpm markform fill form.md --model=openai/gpt-5-mini --wire-log session-wire.yaml
+pnpm markform research form.md --model=google/gemini-2.5-flash --wire-log session-wire.yaml
+pnpm markform run form.md --wire-log session-wire.yaml
+
+# Or use environment variable
+MARKFORM_WIRE_LOG=session.yaml pnpm markform research form.md --model=openai/gpt-5-mini
+```
+
+The wire log captures:
+- System and context prompts sent to the LLM
+- Tool definitions
+- Tool calls and results per step
+- Reasoning content (for models with extended thinking)
+- Token usage (including reasoning tokens)
 
 ## Testing
 
diff --git a/docs/project/specs/active/plan-2026-01-04-agent-cli-logging-improvements.md b/docs/project/specs/active/plan-2026-01-04-agent-cli-logging-improvements.md
new file mode 100644
index 00000000..f6962975
--- /dev/null
+++ b/docs/project/specs/active/plan-2026-01-04-agent-cli-logging-improvements.md
@@ -0,0 +1,764 @@
+# Plan Spec: Agent CLI Logging Improvements
+
+## Purpose
+
+This is a technical design doc for improving the logging and CLI experience when running
+markform agents. The goal is to provide:
+
+1. **Comprehensive logging levels** - From basic console output to full wire format capture
+2. **Library-first design** - Callbacks that work for both CLI and programmatic TypeScript usage
+3. **Enhanced console experience** - Better progress display with tool details and web search summaries
+
+## Background
+
+**Current State:**
+
+Markform provides several agent execution commands (`fill`, `research`, `run`) that produce
+turn-by-turn console output showing:
+- Turn numbers with issues list
+- Patches generated per turn with field IDs and values
+- Completion status
+
+The current logging infrastructure includes:
+- `--verbose` flag: Shows token counts, tool calls, full prompts (system + context)
+- `--quiet` flag: Suppresses non-essential output
+- `--record` / `--transcript` flags: Saves session transcript to YAML file
+
+However, there are gaps in the current implementation:
+
+1. **Inconsistent logging levels**: The `research` command has different logging behavior
+   than `fill`, and the callback system isn't consistently wired up across commands.
+
+2. **Limited verbose output**: While `--verbose` shows prompts, it doesn't show:
+   - Web search results and queries
+   - Tool inputs/outputs with timing
+   - Detailed patch validation errors
+
+3. **No wire format capture flag**: The `captureWireFormat` option exists in the API but
+   isn't exposed as a CLI flag. This data is valuable for debugging and understanding
+   the exact prompts sent to the LLM.
+
+4. **Session logging isn't integrated with verbose**: The `--transcript` flag saves
+   session data, but there's no way to capture the full wire format (LLM request/response)
+   without modifying the code.
+
+**Related Docs:**
+- [development.md](../../development.md) - CLI commands and conventions
+- [arch-markform-design.md](../architecture/current/arch-markform-design.md.md)
+
+## Summary of Task
+
+Improve agent CLI logging with three levels of output and better wire format capture:
+
+### Logging Levels
+
+1. **Default (no flags)**: Rich output suitable for interactive use:
+   - Model and provider info at start
+   - Turn numbers with issues list
+   - Tool calls with start notification, query, timing, and duration
+   - Web search: query, result count, timing, and source summary
+   - First 5-8 result titles from web search
+   - Token counts per turn
+   - Patches generated with field IDs and values
+   - Patch validation warnings/errors
+   - Tool summary at end of turn
+   - Completion status
+
+2. **Verbose (`--verbose`)**: Operational details for debugging:
+   - Everything from default
+   - Harness configuration (maxTurns, maxPatches, targetRoles, fillMode)
+   - Detailed issue breakdown by field/group
+   - Full web search result details (all titles, snippets, URLs)
+   - Patch application details (accepted, rejected, reasons)
+   - Field validation details (which validators ran, pass/fail)
+   - Form progress stats (answered, skipped, remaining by priority)
+
+3. **Debug (`--debug` or `LOG_LEVEL=debug`)**: Full diagnostic output:
+   - Everything from verbose
+   - Full system and context prompts each turn
+   - Raw tool inputs and outputs (truncated at 500 chars)
+   - LLM response steps and reasoning
+   - Detailed patch application results
+
+### Wire Format Capture
+
+Add `--wire-log <path>` flag to capture the complete wire format session to a YAML file:
+- Complete LLM request/response for each turn
+- Tool schemas sent to the model
+- All tool calls and their inputs/outputs
+- Token usage statistics
+
+This is distinct from `--transcript` which captures a lighter session summary without
+the full wire format data.
+
+### CLI Flag Design
+
+```
+markform fill <file> --model <model>
+markform research <file> --model <model>
+markform run
+
+New flags:
+  --verbose        Enhanced output with timing, tokens, tool details
+  --debug          Full diagnostic output (or LOG_LEVEL=debug)
+  --wire-log <path>  Capture full wire format session to YAML file
+```
+
+**Environment Variables:**
+- `LOG_LEVEL=debug`: Alternative to `--debug` flag
+- `MARKFORM_WIRE_LOG=<path>`: Alternative to `--wire-log` flag
+
+### Library-First Callback Design
+
+The logging system must work for both CLI and programmatic TypeScript usage. The `FillCallbacks`
+interface should be rich enough that library users can build their own logging/progress UIs.
+
+**Extended Callback Information:**
+
+```typescript
+interface FillCallbacks {
+  // Existing callbacks (unchanged signature)
+  onTurnStart?(turn: { turnNumber: number; issuesCount: number }): void;
+  onIssuesIdentified?(info: { turnNumber: number; issues: InspectIssue[] }): void;
+  onPatchesGenerated?(info: { turnNumber: number; patches: Patch[]; stats?: TurnStats }): void;
+  onTurnComplete?(progress: TurnProgress): void;
+
+  // Enhanced tool callbacks with richer information
+  onToolStart?(call: {
+    name: string;
+    input: unknown;
+    // NEW: Structured input for known tool types
+    toolType?: 'web_search' | 'fill_form' | 'custom';
+    query?: string;  // For web search tools
+  }): void;
+
+  onToolEnd?(call: {
+    name: string;
+    output: unknown;
+    durationMs: number;
+    error?: string;
+    // NEW: Structured output for known tool types
+    toolType?: 'web_search' | 'fill_form' | 'custom';
+    resultCount?: number;  // For web search: number of results
+    sources?: string;  // For web search: source domains (e.g., "imdb.com, wikipedia.org")
+    topResults?: string;  // For web search: first 5-8 result titles with "..."
+    fullResults?: Array<{ index: number; title: string; url: string; snippet?: string }>;
+  }): void;
+
+  onLlmCallStart?(call: { model: string }): void;
+  onLlmCallEnd?(call: { model: string; inputTokens: number; outputTokens: number; reasoningTokens?: number }): void;
+
+  // NEW: Reasoning/thinking capture (for models that support it)
+  onReasoningGenerated?(info: {
+    stepNumber: number;
+    reasoning: Array<{
+      type: 'reasoning' | 'redacted';
+      text?: string;
+    }>;
+  }): void;
+}
+```
+
+**Library Usage Example:**
+
+```typescript
+import { fillForm } from 'markform';
+
+const result = await fillForm({
+  form: markdown,
+  model: 'anthropic/claude-sonnet-4-5',
+  enableWebSearch: true,
+  captureWireFormat: false,
+  callbacks: {
+    onTurnStart: ({ turnNumber }) => {
+      myLogger.info(`Starting turn ${turnNumber}`);
+    },
+    onToolStart: ({ name, query }) => {
+      if (query) {
+        myProgressUI.showSearching(query);
+      }
+    },
+    onToolEnd: ({ name, resultCount, sources, topResults, durationMs }) => {
+      if (resultCount !== undefined) {
+        myProgressUI.showResults(`${resultCount} results (${durationMs}ms)`);
+        if (sources) myLogger.info(`Sources: ${sources}`);
+        if (topResults) myLogger.debug(`Top results: ${topResults}`);
+      }
+    },
+    onPatchesGenerated: ({ patches, stats }) => {
+      myLogger.info(`Generated ${patches.length} patches`);
+      if (stats?.inputTokens) {
+        myMetrics.recordTokens(stats.inputTokens, stats.outputTokens);
+      }
+    },
+  },
+});
+```
+
+### Enhanced Console Progress Display
+
+The CLI should show better real-time progress, especially for tool execution:
+
+**Default Mode (rich output for interactive use):**
+```
+Model: openai/gpt-5-mini (provider: openai)
+Turn 1: 5 issue(s): directors (missing), full_title (missing), ...
+  [web_search] "Pulp Fiction 1994 movie details"
+  ✓ web_search: 8 results (1.2s)
+     Sources: imdb.com, wikipedia.org, rottentomatoes.com
+     Results: "Pulp Fiction (1994) - IMDb", "Pulp Fiction - Wikipedia", ...
+  → 5 patches (tokens: ↓1234 ↑567):
+    full_title (string) = "Pulp Fiction"
+    year (number) = 1994
+    directors (string_list) = [Quentin Tarantino]
+    ...
+  Tools: web_search(1), fill_form(1)
+Turn 2: 3 issue(s): ...
+  ...
+  ✓ Complete
+⏰ Research time: 45.2s
+```
+
+**Verbose Mode (operational details):**
+```
+Model: openai/gpt-5-mini (provider: openai)
+Harness: maxTurns=100, maxPatches=10, targetRoles=[agent], fillMode=continue
+Turn 1: 5 issue(s): directors (missing), full_title (missing), ...
+  Issues by group: movie_info(3), credits(2)
+  [web_search] "Pulp Fiction 1994 movie details"
+  ✓ web_search: 8 results (1.2s)
+     Sources: imdb.com, wikipedia.org, rottentomatoes.com
+     [1] "Pulp Fiction (1994) - IMDb" - imdb.com/title/tt0110912
+     [2] "Pulp Fiction - Wikipedia" - en.wikipedia.org/wiki/Pulp_Fiction
+     [3] "Pulp Fiction - Rotten Tomatoes" - rottentomatoes.com/m/pulp_fiction
+     ... (5 more)
+  → 5 patches (tokens: ↓1234 ↑567):
+    full_title (string) = "Pulp Fiction" [accepted]
+    year (number) = 1994 [accepted]
+    directors (string_list) = [Quentin Tarantino] [accepted]
+    invalid_field (string) = "test" [rejected: field not found]
+    ...
+  Validators: url_validator(2 passed), required(5 passed)
+  Progress: 5 answered, 0 skipped, 12 remaining (3 high, 5 medium, 4 low)
+  Tools: web_search(1), fill_form(1)
+```
+
+**Debug Mode (full diagnostic):**
+```
+Model: openai/gpt-5-mini (provider: openai)
+Harness: maxTurns=100, maxPatches=10, targetRoles=[agent], fillMode=continue
+Turn 1: 5 issue(s): directors (missing), full_title (missing), ...
+  ─── System Prompt ───
+  You are a research assistant...
+  ─── Context Prompt ───
+  # Current Form State
+  ...
+  [web_search] "Pulp Fiction 1994 movie details"
+     Input: { query: "Pulp Fiction 1994 movie details" }
+  ✓ web_search: 8 results (1.2s)
+     Output: { results: [...], total: 8 } ...[truncated]
+  → 5 patches (tokens: ↓1234 ↑567):
+    ...
+```
+
+**Key Console Improvements:**
+1. Default shows model info, token counts, tool summaries, and result titles
+2. Default shows patch validation warnings/errors inline
+3. Use limited indicators: ✓ (success), ❌ (error), → (result), [tool_name] for tool calls
+4. Verbose adds harness config, full result listings, patch accept/reject details, validator info
+5. Debug adds full prompts and raw tool inputs/outputs (truncated at 500 chars)
+
+## Backward Compatibility
+
+**Compatibility Level:** Minor Enhancement (More Informative Default Output)
+
+| Area | Impact |
+| --- | --- |
+| CLI | New optional flags (`--debug`, `--wire-log`); existing flags unchanged |
+| Default behavior | Enhanced with tool call details (more informative, same structure) |
+| Verbose behavior | Enhanced with additional details beyond new default |
+| API | `FillCallbacks` interface extended with optional fields |
+
+**Default Behavior Changes:**
+- Now shows tool call names, queries, and timing (previously only in verbose)
+- Same turn-by-turn structure
+- Same exit codes
+- Same output file handling
+
+**Use `--quiet` for minimal output** (unchanged behavior)
+
+## Stage 1: Planning Stage
+
+### Current Implementation Analysis
+
+**Files involved:**
+- `src/cli/lib/shared.ts` - Core logging utilities (`logInfo`, `logVerbose`, `logError`)
+- `src/cli/lib/fillLogging.ts` - `createFillLoggingCallbacks()` factory
+- `src/cli/lib/fillCallbacks.ts` - Tool-specific callbacks for spinner updates
+- `src/cli/commands/fill.ts` - Fill command implementation with inline logging
+- `src/cli/commands/research.ts` - Research command with different logging pattern
+- `src/harness/harnessTypes.ts` - `FillCallbacks` interface, `TurnStats`, `WireFormat`
+- `src/harness/liveAgent.ts` - Wire format capture in `buildWireFormat()`
+- `src/engine/session.ts` - Session serialization
+
+**Current callback flow:**
+1. `fill.ts` creates inline callbacks and tool callbacks
+2. `research.ts` doesn't use the callback system (uses `runResearch` directly)
+3. `createFillLoggingCallbacks()` provides standard callbacks but isn't used by `research`
+
+**Wire format capture:**
+- `captureWireFormat` option exists in `FillOptions`
+- Wire format is built in `liveAgent.ts::buildWireFormat()`
+- Includes: system prompt, context prompt, tool schemas, LLM response steps
+- Currently only used for golden tests (when `captureWireFormat: true`)
+
+### Feature Requirements
+
+**Must Have:**
+- [ ] Unified logging callback system across `fill`, `research`, and `run` commands
+- [ ] Library-friendly callbacks with structured tool information (query, resultCount, sources, topResults)
+- [ ] Default mode: model info, tool calls, result titles, token counts, tool summary, patch warnings
+- [ ] Verbose mode: harness config, full result listings, patch accept/reject, validator details
+- [ ] Debug mode via `--debug` flag or `LOG_LEVEL=debug` environment variable
+- [ ] `--wire-log <path>` flag to capture full wire format to YAML
+
+**Should Have:**
+- [ ] Consistent spinner/progress behavior across commands
+- [ ] Limited visual indicators per CLI best practices (✓ ❌ → [tool])
+- [ ] Form progress stats in verbose mode (answered, skipped, remaining by priority)
+
+**Won't Have (This Phase):**
+- JSON streaming output format (separate feature)
+- Progress bars instead of spinners
+- Real-time log streaming to external services
+- Custom tool type registration (use 'custom' type for now)
+
+### Acceptance Criteria
+
+**CLI Behavior:**
+
+1. Running `markform research <form> --model <model>` (default mode) shows:
+   - Model and provider info at start
+   - Turn numbers with issues list
+   - Tool calls with name and query (`[web_search] "query"`)
+   - Tool completion with result count, timing, source domains, and first 5-8 titles
+   - Token counts per turn
+   - Patches generated with field IDs and values
+   - Patch validation warnings/errors
+   - Tool summary at end of turn
+   - Total timing
+
+2. Running with `--verbose` additionally shows:
+   - Harness configuration (maxTurns, maxPatches, targetRoles, fillMode)
+   - Issues breakdown by group
+   - Full web search result listings (all titles, snippets, URLs)
+   - Patch accept/reject status with reasons
+   - Validator execution details
+   - Form progress stats (answered, skipped, remaining by priority)
+
+3. Running with `--debug` or `LOG_LEVEL=debug` additionally shows:
+   - Full system prompt each turn
+   - Full context prompt each turn
+   - Raw tool inputs (before execution)
+   - Raw tool outputs (truncated at 500 chars)
+
+4. Running with `--wire-log session.yaml` produces a YAML file containing:
+   - `request.system`: Full system prompt
+   - `request.prompt`: Full context prompt
+   - `request.tools`: Tool schemas
+   - `response.steps`: All tool calls and results
+   - `response.usage`: Token counts
+
+5. All commands (`fill`, `research`, `run`) produce identical logging for the same operations
+
+**Library API:**
+
+6. `fillForm()` accepts callbacks with structured tool information:
+   ```typescript
+   onToolStart: ({ name, query }) => { /* query available for web search */ }
+   onToolEnd: ({ name, resultCount, resultSummary }) => { /* structured results */ }
+   ```
+
+7. Library users can implement their own progress UI using callbacks alone (no CLI dependencies)
+
+## Stage 2: Architecture Stage
+
+### Logging Level Implementation
+
+Add a `LogLevel` enum to `src/cli/lib/cliTypes.ts`:
+
+```typescript
+export type LogLevel = 'quiet' | 'default' | 'verbose' | 'debug';
+
+export interface CommandContext {
+  dryRun: boolean;
+  verbose: boolean;
+  quiet: boolean;
+  debug: boolean;  // NEW
+  logLevel: LogLevel;  // NEW (computed from flags)
+  format: OutputFormat;
+  formsDir?: string;
+  overwrite: boolean;
+}
+```
+
+Derive `logLevel` from flags in `getCommandContext()`:
+- `--quiet` → `'quiet'`
+- No flags → `'default'`
+- `--verbose` → `'verbose'`
+- `--debug` or `LOG_LEVEL=debug` → `'debug'`
+
+### Unified Callback System
+
+Refactor `createFillLoggingCallbacks()` to accept `LogLevel` and provide appropriate
+output for each level:
+
+```typescript
+export function createFillLoggingCallbacks(
+  ctx: CommandContext,
+  options: FillLoggingOptions = {},
+): FillCallbacks {
+  const level = ctx.logLevel;
+
+  return {
+    onIssuesIdentified: ({ turnNumber, issues }) => {
+      if (level === 'quiet') return;
+      logInfo(ctx, `Turn ${turnNumber}: ${formatTurnIssues(issues)}`);
+    },
+
+    onToolStart: ({ name, query }) => {
+      if (level === 'quiet') return;
+      // DEFAULT: Show tool name and query
+      const queryStr = query ? ` "${query}"` : '';
+      logInfo(ctx, `  [${name}]${queryStr}`);
+      options.spinner?.message(`${name}...`);
+      // DEBUG: Show full input
+      if (level === 'debug') {
+        logDebug(ctx, `     Input: ${summarize(input, DEBUG_OUTPUT_TRUNCATION_LIMIT)}`);
+      }
+    },
+
+    onToolEnd: ({ name, resultCount, sources, topResults, fullResults, durationMs, error }) => {
+      if (level === 'quiet') return;
+      if (error) {
+        logInfo(ctx, `  ❌ ${name} failed (${durationMs}ms): ${error}`);
+        return;
+      }
+      // DEFAULT: Show result count, timing, sources, and top results
+      const countStr = resultCount !== undefined ? `${resultCount} results` : 'done';
+      logInfo(ctx, `  ✓ ${name}: ${countStr} (${formatDuration(durationMs)})`);
+      if (sources) {
+        logInfo(ctx, `     Sources: ${sources}`);
+      }
+      if (topResults) {
+        logInfo(ctx, `     Results: ${topResults}`);
+      }
+      // VERBOSE: Show full result listings
+      if ((level === 'verbose' || level === 'debug') && fullResults) {
+        for (const result of fullResults) {
+          logVerbose(ctx, `     [${result.index}] "${result.title}" - ${result.url}`);
+        }
+      }
+      // DEBUG: Show raw output (truncated)
+      if (level === 'debug') {
+        logDebug(ctx, `     Output: ${summarize(output, DEBUG_OUTPUT_TRUNCATION_LIMIT)}`);
+      }
+    },
+
+    // ... other callbacks
+  };
+}
+```
+
+### Wire Format Capture
+
+Add `wireLogPath` option to pass through the fill flow:
+
+1. CLI parses `--wire-log <path>` flag
+2. Sets `captureWireFormat: true` in fill options
+3. After fill completes, writes wire format to the specified path
+4. Uses existing `serializeSession()` or new `serializeWireLog()` function
+
+Wire log file structure:
+```yaml
+session_version: "0.1.0"
+mode: live
+model_id: "openai/gpt-5-mini"
+turns:
+  - turn: 1
+    wire:
+      request:
+        system: "..."
+        prompt: "..."
+        tools: {...}
+      response:
+        steps: [...]
+        usage:
+          input_tokens: 1234
+          output_tokens: 567
+```
+
+### Research Command Integration
+
+Update `research.ts` to use the same callback system as `fill`:
+
+```typescript
+// Create callbacks same as fill command
+const callbacks = createFillLoggingCallbacks(ctx, { spinner });
+
+// Pass to runResearch options
+const result = await runResearch(form, {
+  model: modelId,
+  enableWebSearch: true,
+  captureWireFormat: !!options.wireLog,
+  callbacks,
+  // ... other options
+});
+```
+
+This requires updating `ResearchOptions` to accept callbacks.
+
+### File Changes Summary
+
+| File | Changes |
+| --- | --- |
+| `src/harness/harnessTypes.ts` | Extend `FillCallbacks` with structured tool fields |
+| `src/harness/liveAgent.ts` | Extract and pass structured tool info to callbacks |
+| `src/cli/lib/cliTypes.ts` | Add `LogLevel`, `debug` to `CommandContext` |
+| `src/cli/lib/shared.ts` | Add `logDebug()`, update `getCommandContext()` |
+| `src/cli/lib/fillLogging.ts` | Enhance callbacks for all log levels, add emoji output |
+| `src/cli/lib/toolParsing.ts` | NEW: Helper to extract web search queries and results |
+| `src/cli/cli.ts` | Add `--debug` and `--wire-log` global options |
+| `src/cli/commands/fill.ts` | Wire up `--wire-log`, use unified callbacks |
+| `src/cli/commands/research.ts` | Use unified callbacks, add wire log support |
+| `src/cli/commands/run.ts` | Use unified callbacks |
+| `src/research/runResearch.ts` | Accept callbacks in options |
+
+## Stage 3: Refine Architecture
+
+### Reusable Components Found
+
+1. **Existing callback system** (`FillCallbacks` in `harnessTypes.ts`)
+   - Already supports all the hook points we need
+   - `onToolStart`, `onToolEnd`, `onLlmCallStart`, `onLlmCallEnd` are already defined
+   - Just need to wire them up consistently
+
+2. **Existing wire format capture** (`buildWireFormat()` in `liveAgent.ts`)
+   - Already builds complete wire format
+   - Already captured in `TurnStats.wire`
+   - Just need to expose via CLI flag
+
+3. **Existing session serialization** (`serializeSession()` in `session.ts`)
+   - Already handles YAML output with proper snake_case conversion
+   - Can be used for wire log output
+
+4. **Existing logging utilities** (`shared.ts`)
+   - `logInfo`, `logVerbose`, `logError`, `logWarn` already exist
+   - Just need to add `logDebug` and update context handling
+
+### Simplifications
+
+1. **No new callback interface** - Use existing `FillCallbacks`
+2. **No new serialization** - Extend `SessionTranscript` or use same serializer
+3. **Unified approach** - `research.ts` should use `fillForm()` or at minimum the same callback wiring
+
+### Performance Considerations
+
+- Wire format capture adds memory overhead (storing prompts/responses)
+- Only enable when `--wire-log` is specified
+- No performance impact on default or verbose modes
+
+## Stage 4: Implementation Phase
+
+### Phase 1: Enhanced Callback Types
+
+- [ ] Extend `FillCallbacks.onToolStart` with `toolType`, `query` fields
+- [ ] Extend `FillCallbacks.onToolEnd` with `toolType`, `resultCount`, `resultSummary` fields
+- [ ] Add helper to extract structured info from web search tool inputs/outputs
+- [ ] Update `liveAgent.ts` to populate structured fields in callbacks
+
+### Phase 2: Logging Infrastructure
+
+- [ ] Add `DEBUG_OUTPUT_TRUNCATION_LIMIT = 500` to `settings.ts`
+- [ ] Add `LogLevel` type and `debug` flag to `CommandContext`
+- [ ] Add `logDebug()` function to `shared.ts`
+- [ ] Update `getCommandContext()` to compute `logLevel` from flags and `LOG_LEVEL` env var
+- [ ] Add `--debug` and `--wire-log <path>` to global CLI options
+- [ ] Enhance `createFillLoggingCallbacks()` with log level awareness
+
+### Phase 3: Command Integration
+
+- [ ] Update `fill.ts` to use `createFillLoggingCallbacks()` consistently
+- [ ] Update `research.ts` to use `createFillLoggingCallbacks()`
+- [ ] Update `run.ts` to use `createFillLoggingCallbacks()`
+- [ ] Add wire log output writing after fill completes
+- [ ] Ensure consistent behavior across all commands
+
+### Phase 4: Web Search Result Parsing
+
+- [ ] Add `extractWebSearchResults()` helper to parse provider responses
+- [ ] Extract result count from all providers (OpenAI, Anthropic, Google, XAI)
+- [ ] Extract source domains from URLs (e.g., "imdb.com, wikipedia.org")
+- [ ] Extract first 5-8 result titles with "..." for additional results
+- [ ] Handle provider-specific response structures gracefully
+
+### Phase 5: Agent Reasoning Capture
+
+- [ ] Extend `WireResponseStep` interface with `reasoning?: ReasoningOutput[]`
+- [ ] Extend `WireResponseFormat.usage` with `reasoningTokens?: number`
+- [ ] Update `buildWireFormat()` in `liveAgent.ts` to capture reasoning from steps
+- [ ] Add `onReasoningGenerated?` callback to `FillCallbacks` interface
+- [ ] In verbose mode, display reasoning content (truncated for readability)
+- [ ] In debug mode, display full reasoning content
+- [ ] Ensure reasoning is properly serialized in wire log YAML output
+
+### Phase 6: Testing and Documentation
+
+- [ ] Add unit tests for logging utilities
+- [ ] Add unit tests for structured callback extraction
+- [ ] Add unit tests for reasoning capture
+- [ ] Test all three log levels with example forms
+- [ ] Verify wire log output format matches schema
+- [ ] Update CLI help text and development.md
+- [ ] Add library usage examples to documentation
+
+## Resolved Design Decisions
+
+1. **Wire log format**: Extend `SessionTranscript` with wire format data
+   - **Decision**: Unify with golden test transcript format
+   - Reuse `SessionTranscript` schema, include wire format in each turn
+   - Ensure tool call details are captured (inputs, outputs, timing)
+   - Same format works for `--wire-log`, `--transcript`, and golden tests
+
+2. **Debug output truncation**: Truncate at configurable limit
+   - **Decision**: 500 chars with "...[truncated]" suffix
+   - Add `DEBUG_OUTPUT_TRUNCATION_LIMIT = 500` to `settings.ts`
+
+3. **Environment variable**: `LOG_LEVEL=debug`
+   - **Decision**: `LOG_LEVEL` is fine
+   - Must have equivalent semantics to `--debug` flag
+   - Values: `quiet`, `default`, `verbose`, `debug`
+
+4. **Web search result extraction**: Show first 5-8 result titles/domains
+   - **Decision**: Extract titles and domains from all providers
+   - All providers (OpenAI, Anthropic, Google, XAI) return structured results with titles/URLs
+   - Show: "Sources: IMDb, Wikipedia, ..." (domains extracted from URLs)
+   - Show: "Title 1, Title 2, Title 3, ..." (first 5-8 titles, then "...")
+   - Provider-specific parsing is feasible - all return `title` and `url` fields
+
+5. **Emoji usage**: Follow CLI best practices - limited emoji set
+   - **Decision**: Use only approved emojis per `typescript-cli-tool-rules.md`:
+     - ✅ for success (or ✓ checkmark)
+     - ❌ for failure/error
+     - ⚠️ for warnings
+     - ⏰ for timing information
+   - Avoid excessive emojis like 🔍 - use text labels instead
+   - picocolors handles TTY detection automatically
+
+6. **Callback backward compatibility**: No backward compat needed
+   - **Decision**: Clean break - design for future, not past
+   - New callback fields are required, not optional
+   - This is a hard cut
+
+7. **Progress without spinner**: Use log lines for non-TTY
+   - **Decision**: Non-TTY environments get regular log lines (existing behavior)
+   - `createSpinnerIfTty()` already handles this by returning a no-op spinner
+   - `shouldUseColors()` respects `NO_COLOR` env var and TTY detection
+   - picocolors automatically handles color detection
+   - No new implementation needed - just use existing infrastructure
+
+8. **Agent reasoning/thinking capture**: Capture AI SDK reasoning fields
+   - **Decision**: Extend `WireResponseStep` to include reasoning content
+   - The AI SDK provides the following reasoning-related fields:
+     - `reasoningText`: String with reasoning from the last step
+     - `reasoning`: Array of `ReasoningOutput` objects (type: 'reasoning' | 'redacted', text)
+     - `steps[].reasoning`: Reasoning for each step
+     - `usage.reasoningTokens`: Token count for reasoning (for providers that support it)
+     - `providerMetadata`: Provider-specific data (may contain additional reasoning info)
+   - **Implementation**:
+     - Add `reasoning?: ReasoningOutput[]` field to `WireResponseStep` interface
+     - Add `reasoningTokens?: number` to usage object
+     - Update `buildWireFormat()` to extract reasoning from each step
+     - In verbose/debug modes, show reasoning content in console output
+     - Wire log always captures full reasoning when available
+   - **Console display (verbose mode)**:
+     ```
+     Turn 1: 5 issue(s)
+       [reasoning] "I need to search for information about..."
+       [web_search] "Pulp Fiction 1994"
+       ...
+     ```
+   - **Wire format changes**:
+     ```typescript
+     interface WireResponseStep {
+       toolCalls: WireToolCall[];
+       toolResults: WireToolResult[];
+       text: string | null;
+       reasoning?: Array<{
+         type: 'reasoning' | 'redacted';
+         text?: string;
+       }>;
+     }
+
+     interface WireResponseFormat {
+       steps: WireResponseStep[];
+       usage: {
+         inputTokens: number;
+         outputTokens: number;
+         reasoningTokens?: number;  // NEW
+       };
+     }
+     ```
+   - **Note**: Reasoning availability depends on model/provider. Not all models support
+     extended thinking or expose reasoning content. The implementation should handle
+     missing reasoning gracefully.
+
+9. **Wire format content selection**: Capture selected fields, not entire response
+   - **Decision**: Capture specific useful fields, skip redundant/noisy data
+   - **Capture** (valuable for debugging/analysis):
+     - `steps[].toolCalls` - tool name and input
+     - `steps[].toolResults` - tool name and result
+     - `steps[].text` - model text output
+     - `steps[].reasoning` - reasoning content (when available)
+     - `usage.inputTokens`, `usage.outputTokens`, `usage.reasoningTokens`
+     - `response.id` - response ID for debugging
+     - `response.modelId` - actual model used
+   - **Skip** (redundant or not portable):
+     - `providerMetadata` - provider-specific, not useful across providers
+     - `response.messages` - reconstructable from steps
+     - `finishReason` per step - only final finish reason matters
+     - `isContinued` - internal SDK state, not meaningful for logging
+   - This keeps wire logs focused and diffable while capturing essential info
+
+## Stage 5: Validation Stage
+
+Detailed validation checklists have been moved to the validation spec:
+**[valid-2026-01-04-agent-cli-logging-improvements.md](valid-2026-01-04-agent-cli-logging-improvements.md)**
+
+### Summary of Validation Coverage
+
+1. **Automated Testing**
+   - Unit tests in `fillLogging.test.ts` cover all logging callbacks
+   - TypeScript strict mode, ESLint, and full test suite pass
+   - Integration tests verify end-to-end behavior
+
+2. **Manual Verification**
+   - Visual output review at all log levels
+   - Wire format YAML structure validation
+   - TTY vs non-TTY behavior
+   - Environment variable handling
+
+3. **Acceptance Criteria Verification**
+   - Default mode shows essential info (model, tools, tokens, patches)
+   - Verbose mode adds detail (harness config, full listings)
+   - Debug mode adds diagnostics (prompts, raw I/O)
+   - `--wire-log` produces valid YAML
+   - All commands use unified logging callbacks
+
+### Quality Gates
+
+All changes pass these gates before merge:
+- `pnpm typecheck` - TypeScript strict mode
+- `pnpm lint` - ESLint with --max-warnings 0
+- `pnpm test` - All 1432+ tests pass
+- `pnpm build` - Production bundle succeeds
diff --git a/docs/project/specs/active/review-2026-01-04-cli-logging-system.md b/docs/project/specs/active/review-2026-01-04-cli-logging-system.md
new file mode 100644
index 00000000..d864ed14
--- /dev/null
+++ b/docs/project/specs/active/review-2026-01-04-cli-logging-system.md
@@ -0,0 +1,224 @@
+# Senior Engineering Review: CLI Logging System
+
+**Date:** 2026-01-04
+**PR:** #73 - feat(cli): Implement enhanced CLI logging with multiple log levels
+**Reviewer:** Claude (Senior Engineering Review)
+
+## Executive Summary
+
+The logging system implementation is well-structured and provides a solid foundation. However, there are several issues that need attention to ensure the system works intuitively as both a CLI and library, with clear separation between logging modes.
+
+**Overall Assessment:** Good implementation with architectural cleanup needed.
+
+## Current Architecture
+
+### Log Levels
+- `quiet`: Only errors
+- `default`: Turn info, tool calls, patches, completion status
+- `verbose`: + harness config, full result listings, LLM metadata
+- `debug`: + full prompts, raw tool I/O (truncated)
+
+### Key Files
+- `src/cli/lib/cliTypes.ts` - LogLevel type, CommandContext
+- `src/cli/lib/shared.ts` - logDebug, logVerbose, logInfo, computeLogLevel
+- `src/cli/lib/fillLogging.ts` - createFillLoggingCallbacks factory
+- `src/cli/lib/fillCallbacks.ts` - createCliToolCallbacks (legacy)
+- `src/harness/harnessTypes.ts` - FillCallbacks interface
+- `src/harness/toolParsing.ts` - Structured tool output parsing
+- `src/harness/liveAgent.ts` - Wire format capture
+
+### Wire Format Capture
+- `--wire-log <file>` flag captures full LLM request/response
+- `MARKFORM_WIRE_LOG` environment variable support
+- Session transcript includes wire data when captureWireFormat enabled
+
+---
+
+## Issues Identified
+
+### Issue 1: Duplicate Logging Code in fill.ts (HIGH)
+
+**Problem:** `fill.ts` has inline logging code (lines 486-530) that duplicates the functionality in `fillLogging.ts`. It uses `createCliToolCallbacks` from `fillCallbacks.ts` for spinner updates but then logs patches/stats manually.
+
+**Impact:**
+- Maintenance burden - changes must be made in two places
+- Inconsistent output format between fill and research commands
+- fillCallbacks.ts exists only for fill.ts and does less than fillLogging.ts
+
+**Evidence:**
+```typescript
+// fill.ts lines 486-504 - manual patch logging
+logInfo(ctx, `  → ${pc.yellow(String(patches.length))} patches${tokenSuffix}:`);
+for (const patch of patches) {
+  const typeName = formatPatchType(patch);
+  // ...duplicates fillLogging.ts logic
+}
+```
+
+**Recommendation:** Refactor fill.ts to use createFillLoggingCallbacks like research.ts does.
+
+---
+
+### Issue 2: run.ts Doesn't Pass Model Info to Callbacks (MEDIUM)
+
+**Problem:** In run.ts line 363, `createFillLoggingCallbacks(ctx)` is called without the model/provider options, so the "Model: ..." line never appears.
+
+**Evidence:**
+```typescript
+// run.ts line 363 - missing modelId and provider
+const callbacks = createFillLoggingCallbacks(ctx);
+```
+
+Compared to research.ts:
+```typescript
+// research.ts line 177-181 - correct usage
+const callbacks = createFillLoggingCallbacks(ctx, {
+  spinner,
+  modelId,
+  provider,
+});
+```
+
+**Recommendation:** Pass modelId to createFillLoggingCallbacks in run.ts.
+
+---
+
+### Issue 3: Missing Trace File Capability (HIGH - User Request)
+
+**Problem:** User specifically requested "trace writing to a file capability" for incremental logging during execution. Current `--wire-log` only writes at the end of the session.
+
+**Current Behavior:**
+- `--wire-log` writes complete session at end
+- No incremental output during long-running fills
+- No way to monitor progress in real-time to a file
+
+**Recommendation:** Add `--trace <file>` flag that appends log lines incrementally during execution. This is distinct from --wire-log which captures structured data.
+
+---
+
+### Issue 4: fillCallbacks.ts is Redundant (LOW)
+
+**Problem:** `fillCallbacks.ts` provides `createCliToolCallbacks` which only implements onToolStart and onToolEnd for spinner updates. It does less than `createFillLoggingCallbacks` and is only used by fill.ts.
+
+**Recommendation:** Delete fillCallbacks.ts after refactoring fill.ts to use fillLogging.ts.
+
+---
+
+### Issue 5: Debug Output Truncation Too Short (MEDIUM)
+
+**Problem:** `DEBUG_OUTPUT_TRUNCATION_LIMIT = 500` in settings.ts may be too short for effective debugging of tool outputs.
+
+**Recommendation:**
+- Increase to 2000 or make configurable via environment variable
+- Consider separate limits for prompts vs tool outputs
+
+---
+
+### Issue 6: Inconsistent Spinner Query Display (LOW)
+
+**Problem:** Spinner updates for web search don't consistently show the query.
+
+**Evidence:**
+```typescript
+// fillLogging.ts line 177 - shows query
+options.spinner?.message(`Web search${queryText}...`);
+
+// fillCallbacks.ts line 38 - doesn't show query
+spinner.message(`🔍 Web search...`);
+```
+
+**Recommendation:** Standardize spinner messages to show query when available.
+
+---
+
+### Issue 7: Library Consumer Logging Unclear (MEDIUM)
+
+**Problem:** While FillCallbacks is well-designed, there's no easy way for library consumers to get console logging without implementing all callbacks themselves.
+
+**Recommendation:** Export a `createConsoleCallbacks()` helper from the library that provides default console logging (without CLI-specific features like spinners).
+
+---
+
+### Issue 8: Reasoning Tokens Not Displayed (LOW)
+
+**Problem:** `onLlmCallEnd` callback receives `reasoningTokens` but it's not displayed anywhere in the logging output.
+
+**Evidence:**
+```typescript
+// fillLogging.ts line 251 - reasoningTokens received but not shown
+onLlmCallEnd: ({ model, inputTokens, outputTokens, reasoningTokens }) => {
+  if (shouldShow(ctx, 'verbose')) {
+    const reasoningInfo = reasoningTokens ? ` reasoning=${reasoningTokens}` : '';
+    // reasoningInfo IS shown - this is actually fine
+  }
+}
+```
+
+Actually this is already implemented correctly. ✓
+
+---
+
+## Recommended Improvements
+
+### Priority 1 (HIGH - Should Fix Before Merge)
+
+1. **Unify fill.ts logging with fillLogging.ts**
+   - Refactor fill.ts to use createFillLoggingCallbacks instead of manual logging
+   - Remove createCliToolCallbacks and fillCallbacks.ts after migration
+
+2. **Add --trace flag for incremental file logging**
+   - New flag: `--trace <file>`
+   - Appends log lines during execution (not just at end)
+   - Useful for monitoring long-running fills
+
+### Priority 2 (MEDIUM - Should Fix Soon)
+
+3. **Pass model info in run.ts callbacks**
+   - Update createFillLoggingCallbacks call to include modelId/provider
+
+4. **Increase DEBUG_OUTPUT_TRUNCATION_LIMIT**
+   - Change from 500 to 2000 characters
+   - Consider MARKFORM_DEBUG_TRUNCATION_LIMIT env var
+
+5. **Add library-friendly console callbacks**
+   - Export createConsoleCallbacks() for library consumers
+
+### Priority 3 (LOW - Nice to Have)
+
+6. **Standardize spinner query display**
+   - Always show query in spinner message when available
+
+7. **Document logging levels in README/docs**
+   - Add clear documentation of what each level shows
+
+---
+
+## Testing Recommendations
+
+1. Add integration tests that verify output at each log level
+2. Test trace file output with long-running fills
+3. Test environment variable precedence (MARKFORM_LOG_LEVEL)
+4. Verify fill/research/run commands produce consistent output
+
+---
+
+## Files to Modify
+
+1. `src/cli/commands/fill.ts` - Refactor to use fillLogging.ts
+2. `src/cli/commands/run.ts` - Pass model info to callbacks
+3. `src/cli/lib/fillLogging.ts` - Add trace file support
+4. `src/cli/lib/fillCallbacks.ts` - DELETE after migration
+5. `src/settings.ts` - Increase truncation limit
+6. `src/harness/programmaticFill.ts` - Export console callbacks helper
+7. `docs/development.md` - Document logging levels
+
+---
+
+## Conclusion
+
+The core logging architecture is sound. The main work is:
+1. Consolidating duplicate code (fill.ts → fillLogging.ts)
+2. Adding incremental trace file output
+3. Minor consistency fixes
+
+Estimated effort: 2-4 hours for Priority 1 items.
diff --git a/docs/project/specs/active/valid-2026-01-04-agent-cli-logging-improvements.md b/docs/project/specs/active/valid-2026-01-04-agent-cli-logging-improvements.md
new file mode 100644
index 00000000..0f53e09f
--- /dev/null
+++ b/docs/project/specs/active/valid-2026-01-04-agent-cli-logging-improvements.md
@@ -0,0 +1,361 @@
+# Feature Validation: Agent CLI Logging Improvements
+
+## Purpose
+
+This is a validation spec for the enhanced CLI logging system that provides:
+- Multiple log levels (quiet, default, verbose, debug)
+- Structured tool callback information (web search queries, results, sources)
+- **Trace file support via `--trace` flag for incremental logging during execution**
+- Unified logging callbacks across fill, research, and run commands
+- Reasoning capture in wire format for models with extended thinking
+- Shared utility library (`formatUtils.ts`) for string formatting functions
+
+**Feature Plan:** [plan-2026-01-04-agent-cli-logging-improvements.md](plan-2026-01-04-agent-cli-logging-improvements.md)
+
+**Review Document:** [review-2026-01-04-cli-logging-system.md](review-2026-01-04-cli-logging-system.md)
+
+## Stage 4: Validation Stage
+
+## Validation Planning
+
+This PR implements the comprehensive logging improvements outlined in the plan spec.
+All code changes have been reviewed, type-checked, linted, and tested.
+
+---
+
+## Automated Validation (Testing Performed)
+
+### Unit Testing
+
+- **fillLogging.test.ts** - 19 tests covering all logging callbacks:
+  - `createFillLoggingCallbacks` returns all expected callbacks
+  - `onIssuesIdentified` logs turn number and issues by default
+  - `onIssuesIdentified` does not log when quiet mode is enabled
+  - `onPatchesGenerated` logs patches with field IDs and values
+  - `onPatchesGenerated` shows token counts in output
+  - `onTurnComplete` logs completion status
+  - `onToolStart` logs tool calls in default mode
+  - `onToolStart` logs with query when provided
+  - `onToolEnd` logs with formatted duration (seconds format)
+  - `onToolEnd` logs errors with failure message
+  - `onLlmCallStart` logs model name in verbose mode
+  - `onLlmCallEnd` logs token counts in verbose mode
+  - `onReasoningGenerated` callbacks for thinking content
+  - Spinner integration updates message for web search
+  - **Trace file tests** - createTracer writes header and strips ANSI codes
+
+- **commands.tryscript.md** - 12 CLI command tests including:
+  - `--help` shows all global options including `--debug` and `--trace`
+  - All commands function correctly with updated option parsing
+
+- **logging.tryscript.md** - 11 CLI logging integration tests including:
+  - Default log level output verification
+  - Verbose mode config details
+  - Quiet mode suppression
+  - Trace file creation and content
+
+### Integration Testing
+
+- **Type checking passes** - All 0 TypeScript errors
+- **Lint passes** - All 0 ESLint errors
+- **1460 unit tests pass** - Full test suite green
+- **29 tryscript tests pass** - CLI command integration tests
+- **Build succeeds** - dist/ output verified
+
+### Code Quality Verification
+
+All changes have been verified against the following quality gates:
+- `pnpm run typecheck` - TypeScript strict mode
+- `pnpm run lint` - ESLint with --max-warnings 0
+- `pnpm run test` - Vitest full test suite
+- `pnpm run test:tryscript` - CLI integration tests
+- `pnpm run build` - Production bundle
+
+---
+
+## Manual Testing Completed (2026-01-05, Session 2)
+
+### Test Environment
+- Branch: `claude/review-merge-cli-logging-HznVa`
+- Merged upstream main at commit `b263cbe`
+- OpenAI API key configured
+
+### Test Results Summary
+
+| Test Category | Test | Result | Notes |
+|---------------|------|--------|-------|
+| **Log Levels** | Default level | ✅ PASS | Shows turns, issues, patches with field IDs and values |
+| | --quiet flag | ⚠️ BUG | Turn output suppressed but session transcript still printed (markform-8) |
+| | --verbose flag | ✅ PASS | Shows reading/parsing info, harness config details |
+| | --debug flag | ✅ PASS | Works (no extra output for mock since no LLM calls) |
+| **Trace File** | --trace creates file | ✅ PASS | File created at specified path |
+| | Trace header format | ✅ PASS | `# Markform Trace Log`, timestamp, model info |
+| | Trace content | ✅ PASS | Turns, patches, field values logged |
+| | ANSI stripping | ✅ PASS | No escape codes in trace file (verified with grep) |
+| **Session Recording** | --record flag | ✅ PASS | YAML file created with session structure |
+| | Session content | ✅ PASS | Contains turns, harness config, final status |
+| **Live Agent** | OpenAI connectivity | ✅ PASS | Required proxy preload for Node.js (undici) |
+| | Token counts | ✅ PASS | `(tokens: ↓8174 ↑51)` format works |
+| | LLM call logging | ✅ PASS | `LLM call: gpt-4.1-mini` shown in verbose mode |
+| | Tool usage tracking | ✅ PASS | `Tools: web_search(1)` logged |
+| | Trace file with live | ✅ PASS | All LLM/tool activity captured |
+
+### Detailed Test Results
+
+#### 1. Mock Mode - Default Log Level ✅
+```bash
+markform fill examples/startup-research/startup-research.form.md \
+  --mock --mock-source examples/startup-research/startup-research-mock-filled.form.md
+```
+**Observed output:**
+- `Filling form: <path>` - Form path displayed
+- `Agent: mock` - Agent type shown
+- `Turn 1: 10 issue(s): company_website (missing), ...` - Issues summarized with "+N more"
+- `→ 9 patches:` - Patch count
+- `company_website (url) = "https://..."` - Field ID, type, and value
+- Lists formatted as `[item1, item2, ...]`
+
+#### 2. Mock Mode - Quiet Log Level ⚠️ BUG
+```bash
+markform fill ... --mock --mock-source ... --quiet
+```
+**Observed:**
+- Turn-by-turn output correctly suppressed
+- ⚠️ **Session transcript still printed at end** (markform-8)
+
+#### 3. Mock Mode - Verbose Log Level ✅
+```bash
+markform fill ... --mock --mock-source ... --verbose
+```
+**Additional output observed:**
+- `Reading form: <path>`
+- `Parsing form...`
+- `Reading mock source: <path>`
+- `Max turns: 100`
+- `Max patches per turn: 20`
+- `Max issues per turn: 10`
+- `Target roles: agent`
+- `Fill mode: continue`
+
+#### 4. Mock Mode - Debug Log Level ✅
+```bash
+markform fill ... --mock --mock-source ... --debug
+```
+**Observed:**
+- Same as verbose for mock mode (expected - no LLM calls to show debug info for)
+- Debug callbacks would show prompts, reasoning, tool I/O with live agents
+
+#### 5. Trace File Output ✅
+```bash
+markform fill ... --mock --mock-source ... --trace /tmp/trace-mock.log
+```
+**Trace file content verified:**
+```
+# Markform Trace Log
+# Started: 2026-01-05T19:27:47.892Z
+# Model: unknown
+
+Filling form: /home/user/markform/packages/markform/examples/startup-research/startup-research.form.md
+Agent: mock
+Max turns: 100
+...
+Turn 1: 10 issue(s): company_website (missing), ...
+  → 9 patches:
+    company_website (url) = "https://www.anthropic.com"
+    ...
+```
+- ✅ Header present with timestamp
+- ✅ Model shows "unknown" for mock (correct)
+- ✅ All turn info logged
+- ✅ No ANSI codes (verified with `grep -P '\x1b\['`)
+
+#### 6. Session Recording ✅
+```bash
+markform fill ... --mock --mock-source ... --record /tmp/session.yaml
+```
+**Session YAML content:**
+```yaml
+session_version: 0.1.0
+mode: mock
+form:
+  path: /home/user/markform/packages/markform/examples/simple/simple.form.md
+harness:
+  max_turns: 100
+  max_patches_per_turn: 20
+  max_issues_per_turn: 10
+  target_roles:
+    - agent
+  fill_mode: continue
+turns: []
+final:
+  expect_complete: true
+  expected_completed_form: ...
+mock:
+  completed_mock: ...
+```
+
+#### 7. Live Agent Testing ✅ PASS
+
+Tested with proxy preload:
+```bash
+NODE_OPTIONS="--require /tmp/proxy-preload.js" \
+markform fill examples/startup-research/startup-research.form.md \
+  --model openai/gpt-4.1-mini --max-turns 2 --verbose --trace /tmp/live-test.log
+```
+
+**Observed output:**
+- `LLM call: gpt-4.1-mini` - Model name logged
+- `LLM response: gpt-4.1-mini (in=8174 out=51)` - Token counts
+- `→ 10 patches (tokens: ↓5599 ↑47):` - Patch line with token counts
+- `Tools: web_search(1)` - Tool usage summary
+- System and context prompts shown in verbose mode
+
+**Trace file verified:**
+- Header with timestamp and model
+- All LLM calls and responses logged
+- Token counts recorded
+- No ANSI escape codes
+
+**Note:** Required `undici` ProxyAgent to work around Node.js DNS issues in containerized environment.
+
+---
+
+## Bug Fixes Applied (2026-01-06)
+
+### Server-Side Tool Callbacks (Critical Fix)
+
+**Issue:** Tool callbacks (`onToolStart`, `onToolEnd`) were not firing for server-side tools like OpenAI's `web_search`. This meant console output showed patches and token counts but NO tool usage information like `[web_search] "query"`.
+
+**Root Cause:** OpenAI's `web_search` tool has `execute: undefined` because it executes server-side (not locally via the SDK). The `wrapToolsWithCallbacks` function only wrapped tools with local execute functions, so server-side tools were passed through unwrapped, causing callbacks to never fire.
+
+**Fix Applied to `src/harness/liveAgent.ts`:**
+1. Modified `wrapToolsWithCallbacks` to return `{ tools, wrappedToolNames }` - tracking which tools were wrapped locally
+2. Added code in the step processing loop to fire callbacks for server-side tools by checking step results:
+   - If a tool call is not in `wrappedToolNames`, fire `onToolStart` with extracted info
+   - Fire `onToolEnd` using tool result from `toolResultMap` (built from step.toolResults)
+3. Fixed property names for AI SDK types: `toolCall.input` (not `args`), `toolResult.output` (not `result`)
+
+**Verification:**
+- TypeScript typecheck: ✅ PASS
+- Unit tests: ✅ 1460 tests pass
+- ESLint: ✅ PASS
+- Build: ✅ PASS
+
+---
+
+## Known Issues
+
+### markform-8: --quiet flag doesn't suppress session transcript
+**Status:** Open bug
+**Impact:** Minor UX issue
+**Description:** When using `--quiet`, turn-by-turn logging is correctly suppressed, but the session transcript is still printed at the end. Expected behavior: quiet mode should only show errors.
+
+---
+
+## Reviewer Testing Checklist
+
+The following tests require reviewer verification (blocked by network issues in CI environment):
+
+### Live Agent Tests (Requires API Access)
+- [ ] Test with `--model openai/gpt-4.1-mini` or similar
+- [ ] Verify token counts appear in output: `→ N patch(es) (tokens: ↓500 ↑100):`
+- [ ] Verify LLM call metadata in verbose mode: `LLM call: <model>`, `LLM response: ...`
+- [ ] Verify reasoning output in debug mode (if model supports extended thinking)
+
+### Web Search Tests (Requires Live Agent + Web Search)
+- [ ] Verify `[web_search] "query text"` shows query
+- [ ] Verify `✓ web_search: N results (Xs)` shows results and duration
+- [ ] Verify `Sources: domain1.com, domain2.com` shows domains
+- [ ] Verify trace file captures web search queries and results
+
+### Run Command Tests
+- [ ] Test `markform run` with `--trace` flag
+- [ ] Verify trace file created during form selection workflow
+
+### Research Command Tests
+- [ ] Test `markform research` with `--trace` and `--model`
+- [ ] Verify web search activity logged to trace
+
+### Environment Variable Tests
+- [ ] Test `MARKFORM_TRACE=/tmp/env-trace.log markform fill ...`
+- [ ] Verify `--trace` flag takes precedence over env var
+- [ ] Test `MARKFORM_LOG_LEVEL=debug markform fill ...`
+- [ ] Verify `--debug` flag takes precedence over env var
+
+---
+
+## Files Changed
+
+### New Files
+- `src/utils/formatUtils.ts` - Shared string formatting utilities (stripAnsi, safeTruncate, formatDuration, humanReadableSize, safeStringify)
+- `src/harness/toolParsing.ts` - Web search result extraction utilities
+- `tests/cli/logging.tryscript.md` - CLI logging integration tests
+
+### Modified Files
+- `src/cli/lib/cliTypes.ts` - Added LogLevel type, debug property, traceFile to CommandContext
+- `src/cli/lib/shared.ts` - Added logDebug function, computeLogLevel helper, traceFile extraction
+- `src/cli/lib/traceUtils.ts` - createTracer function, re-exports from formatUtils
+- `src/cli/lib/fillCallbacks.ts` - Enhanced with trace support, LLM/reasoning callbacks
+- `src/cli/cli.ts` - Added --debug and --trace global flags
+- `src/cli/lib/fillLogging.ts` - Enhanced with LogLevel support, structured tool info, trace file support
+- `src/cli/commands/fill.ts` - Trace file support with createTracer helper, updated callbacks
+- `src/cli/commands/research.ts` - Unified callbacks, traceFile support
+- `src/cli/commands/run.ts` - Transcript support via fillForm, traceFile support
+- `src/harness/harnessTypes.ts` - Extended FillCallbacks with structured fields
+- `src/harness/programmaticFill.ts` - Added transcript building when captureWireFormat enabled
+- `src/harness/liveAgent.ts` - Reasoning extraction with text/content property support
+- `src/engine/coreTypes.ts` - Added WireReasoningContent type
+- `src/settings.ts` - Added DEBUG_OUTPUT_TRUNCATION_LIMIT constant
+
+---
+
+## PR Review Comments Addressed
+
+All 11 PR #84 review comments have been addressed:
+
+1. ✅ **2660027464** - Trace flag no-ops on fill - Fixed by adding trace to createCliToolCallbacks
+2. ✅ **2660066343** - --wire-log renamed to --trace consistently
+3. ✅ **2660066678** - Variable naming (tracePathOption)
+4. ✅ **2660067107** - Clean data (no ANSI) written to trace
+5. ✅ **2660067484** - Renamed WireLog to Trace everywhere
+6. ✅ **2660067661** - Wrong name in run.ts fixed
+7. ✅ **2660068216** - Utilities moved to common library (formatUtils.ts)
+8. ✅ **2660068464** - Same (common library)
+9. ✅ **2660068557** - Same (common utility)
+10. ✅ **2660068669** - safeStringify moved to formatUtils.ts
+11. ✅ **2660070263** - tsx dependency removed
+
+---
+
+## Potential Issues to Watch For
+
+1. **Trace file size**: Long-running fills with verbose prompts could create large trace files
+2. **File locking**: Concurrent writes to the same trace file are not protected
+3. **Performance**: Synchronous file I/O for each trace line could slow down execution
+4. **Unicode handling**: Complex characters in field values might not display correctly in trace
+
+---
+
+## Summary
+
+**Automated Testing:** ✅ All 1460 unit tests + 29 tryscript tests pass
+
+**Manual Testing:**
+- ✅ Mock mode at all log levels (default, quiet*, verbose, debug)
+- ✅ Trace file output with ANSI stripping
+- ✅ Session recording (--record)
+- ✅ Live agent with GPT-4.1-mini (token counts, LLM logging, tool tracking)
+
+**Known Bugs:**
+- markform-8: --quiet mode doesn't suppress session transcript (minor)
+
+**All Core Logging Features Verified:**
+- ✅ Token counts: `(tokens: ↓8174 ↑51)` format
+- ✅ LLM call logging: `LLM call: gpt-4.1-mini`
+- ✅ Tool usage tracking: `Tools: web_search(1)`
+- ✅ Trace file captures all activity with no ANSI codes
+- ✅ Debug mode shows system/context prompts
+
+**Reviewer Notes:**
+- Test run and research commands with --trace (not tested due to time)
+- Verify environment variable precedence (MARKFORM_TRACE, MARKFORM_LOG_LEVEL)
diff --git a/package.json b/package.json
index 6aa26164..d6a009fa 100644
--- a/package.json
+++ b/package.json
@@ -24,10 +24,10 @@
     "lint": "eslint . --fix && pnpm typecheck && eslint . --max-warnings 0",
     "lint:check": "pnpm typecheck && eslint . --max-warnings 0",
     "precommit": "pnpm format && pnpm lint:check && pnpm test",
-    "markform": "tsx packages/markform/src/cli/bin.ts",
+    "markform": "npx tsx packages/markform/src/cli/bin.ts",
     "markform:bin": "node packages/markform/dist/bin.mjs",
     "changeset": "changeset",
-    "changeset:add": "tsx scripts/create-changeset.ts",
+    "changeset:add": "npx tsx scripts/create-changeset.ts",
     "version-packages": "changeset version",
     "release": "pnpm build && pnpm publint && changeset publish"
   },
@@ -42,7 +42,6 @@
     "lefthook": "^2.0.13",
     "prettier": "^3.7.4",
     "tryscript": "0.1.1",
-    "tsx": "^4.21.0",
     "typescript": "^5.9.3",
     "typescript-eslint": "^8.51.0"
   }
diff --git a/packages/markform/src/cli/cli.ts b/packages/markform/src/cli/cli.ts
index 9c3a2470..51560a33 100644
--- a/packages/markform/src/cli/cli.ts
+++ b/packages/markform/src/cli/cli.ts
@@ -58,6 +58,8 @@ function createProgram(): Command {
     .showHelpAfterError()
     .option('--verbose', 'Enable verbose output')
     .option('--quiet', 'Suppress non-essential output')
+    .option('--debug', 'Enable debug output (full prompts, raw tool I/O)')
+    .option('--trace <file>', 'Write incremental log output to file during execution')
     .option('--dry-run', 'Show what would be done without making changes')
     .option('--format <format>', `Output format: ${OUTPUT_FORMATS.join(', ')}`, 'console')
     .option('--forms-dir <dir>', `Directory for form output (default: ${DEFAULT_FORMS_DIR})`)
diff --git a/packages/markform/src/cli/commands/fill.ts b/packages/markform/src/cli/commands/fill.ts
index e30e8190..bd1e3423 100644
--- a/packages/markform/src/cli/commands/fill.ts
+++ b/packages/markform/src/cli/commands/fill.ts
@@ -70,6 +70,11 @@ import { formatTurnIssues } from '../lib/formatting.js';
 import { inspect } from '../../engine/inspect.js';
 import { applyPatches } from '../../engine/apply.js';
 import { createCliToolCallbacks } from '../lib/fillCallbacks.js';
+import { createTracer } from '../lib/traceUtils.js';
+
+// =============================================================================
+// Console Formatting
+// =============================================================================
 
 /**
  * Format session transcript for console output.
@@ -349,6 +354,9 @@ export function registerFillCommand(program: Command): void {
           // Create harness
           const harness = createHarness(form, harnessConfig);
 
+          // Create tracer for incremental file logging (no-op if no traceFile)
+          const trace = createTracer(ctx.traceFile, options.model);
+
           // Create agent based on type
           let agent: Agent;
           let mockPath: string | undefined;
@@ -390,9 +398,10 @@ export function registerFillCommand(program: Command): void {
 
             // Create callbacks that reference the mutable spinner
             // Callbacks update spinner during tool execution (especially web search)
-            const callbacks = createCliToolCallbacks(
-              {
-                // Proxy to current spinner (may be null between turns)
+            // Also writes to trace file when --trace is provided
+            const callbacks = createCliToolCallbacks({
+              // Proxy to current spinner (may be null between turns)
+              spinner: {
                 message: (msg) => currentSpinner?.message(msg),
                 update: (context) => currentSpinner?.update(context),
                 stop: (msg) => currentSpinner?.stop(msg),
@@ -400,7 +409,8 @@ export function registerFillCommand(program: Command): void {
                 getElapsedMs: () => currentSpinner?.getElapsedMs() ?? 0,
               },
               ctx,
-            );
+              trace,
+            });
 
             // Pass first target role to agent (for instruction lookup)
             targetRole = targetRoles[0] === '*' ? AGENT_ROLE : (targetRoles[0] ?? AGENT_ROLE);
@@ -421,27 +431,29 @@ export function registerFillCommand(program: Command): void {
           }
 
           logInfo(ctx, pc.cyan(`Filling form: ${filePath}`));
-          logInfo(
-            ctx,
-            `Agent: ${options.mock ? 'mock' : 'live'}${options.model ? ` (${options.model})` : ''}`,
-          );
+          trace(`Filling form: ${filePath}`);
+          const agentInfo = `Agent: ${options.mock ? 'mock' : 'live'}${options.model ? ` (${options.model})` : ''}`;
+          logInfo(ctx, agentInfo);
+          trace(agentInfo);
           logVerbose(ctx, `Max turns: ${harnessConfig.maxTurns}`);
+          trace(`Max turns: ${harnessConfig.maxTurns}`);
           logVerbose(ctx, `Max patches per turn: ${harnessConfig.maxPatchesPerTurn}`);
+          trace(`Max patches per turn: ${harnessConfig.maxPatchesPerTurn}`);
           logVerbose(ctx, `Max issues per turn: ${harnessConfig.maxIssuesPerTurn}`);
-          logVerbose(
-            ctx,
-            `Target roles: ${targetRoles.includes('*') ? '*' : targetRoles.join(', ')}`,
-          );
+          trace(`Max issues per turn: ${harnessConfig.maxIssuesPerTurn}`);
+          const rolesInfo = `Target roles: ${targetRoles.includes('*') ? '*' : targetRoles.join(', ')}`;
+          logVerbose(ctx, rolesInfo);
+          trace(rolesInfo);
           logVerbose(ctx, `Fill mode: ${fillMode}`);
+          trace(`Fill mode: ${fillMode}`);
 
           // Run harness loop
           let stepResult = harness.step();
           // Track rejections for wire format context (helps LLM learn from mistakes)
           let previousRejections: PatchRejection[] | undefined;
-          logInfo(
-            ctx,
-            `${pc.bold(`Turn ${stepResult.turnNumber}:`)} ${formatTurnIssues(stepResult.issues)}`,
-          );
+          const issuesText = formatTurnIssues(stepResult.issues);
+          logInfo(ctx, `${pc.bold(`Turn ${stepResult.turnNumber}:`)} ${issuesText}`);
+          trace(`Turn ${stepResult.turnNumber}: ${issuesText}`);
 
           while (!stepResult.isComplete && !harness.hasReachedMaxTurns()) {
             // Create spinner for LLM call (only for live agent with TTY)
@@ -484,7 +496,11 @@ export function registerFillCommand(program: Command): void {
             const tokenSuffix = stats
               ? ` ${pc.dim(`(tokens: ↓${stats.inputTokens ?? 0} ↑${stats.outputTokens ?? 0})`)}`
               : '';
+            const tokenSuffixPlain = stats
+              ? ` (tokens: ↓${stats.inputTokens ?? 0} ↑${stats.outputTokens ?? 0})`
+              : '';
             logInfo(ctx, `  → ${pc.yellow(String(patches.length))} patches${tokenSuffix}:`);
+            trace(`  → ${patches.length} patches${tokenSuffixPlain}:`);
             for (const patch of patches) {
               const typeName = formatPatchType(patch);
               const value = formatPatchValue(patch);
@@ -496,33 +512,39 @@ export function registerFillCommand(program: Command): void {
                   ctx,
                   `    ${pc.cyan(fieldId)} ${pc.dim(`(${typeName})`)} = ${pc.green(value)}`,
                 );
+                trace(`    ${fieldId} (${typeName}) = ${value}`);
               } else {
                 logInfo(ctx, `    ${pc.dim(`(${typeName})`)} = ${pc.green(value)}`);
+                trace(`    (${typeName}) = ${value}`);
               }
             }
 
             // Log stats and prompts in verbose mode
             if (stats) {
-              logVerbose(
-                ctx,
-                `  Stats: tokens ↓${stats.inputTokens ?? 0} ↑${stats.outputTokens ?? 0}`,
-              );
+              const statsInfo = `  Stats: tokens ↓${stats.inputTokens ?? 0} ↑${stats.outputTokens ?? 0}`;
+              logVerbose(ctx, statsInfo);
+              trace(statsInfo);
               if (stats.toolCalls && stats.toolCalls.length > 0) {
                 const toolSummary = stats.toolCalls.map((t) => `${t.name}(${t.count})`).join(', ');
                 logVerbose(ctx, `  Tools: ${toolSummary}`);
+                trace(`  Tools: ${toolSummary}`);
               }
 
               // Log full prompts in verbose mode
               if (stats.prompts) {
                 logVerbose(ctx, ``);
                 logVerbose(ctx, pc.dim(`  ─── System Prompt ───`));
+                trace(`  ─── System Prompt ───`);
                 for (const line of stats.prompts.system.split('\n')) {
                   logVerbose(ctx, pc.dim(`  ${line}`));
+                  trace(`  ${line}`);
                 }
                 logVerbose(ctx, ``);
                 logVerbose(ctx, pc.dim(`  ─── Context Prompt ───`));
+                trace(`  ─── Context Prompt ───`);
                 for (const line of stats.prompts.context.split('\n')) {
                   logVerbose(ctx, pc.dim(`  ${line}`));
+                  trace(`  ${line}`);
                 }
                 logVerbose(ctx, ``);
               }
@@ -574,13 +596,13 @@ export function registerFillCommand(program: Command): void {
 
             if (stepResult.isComplete) {
               logInfo(ctx, pc.green(`  ✓ Complete`));
+              trace(`  ✓ Complete`);
             } else if (!harness.hasReachedMaxTurns()) {
               // Step for next turn (only if not at max turns)
               stepResult = harness.step();
-              logInfo(
-                ctx,
-                `${pc.bold(`Turn ${stepResult.turnNumber}:`)} ${formatTurnIssues(stepResult.issues)}`,
-              );
+              const nextIssuesText = formatTurnIssues(stepResult.issues);
+              logInfo(ctx, `${pc.bold(`Turn ${stepResult.turnNumber}:`)} ${nextIssuesText}`);
+              trace(`Turn ${stepResult.turnNumber}: ${nextIssuesText}`);
             }
           }
 
@@ -588,12 +610,17 @@ export function registerFillCommand(program: Command): void {
 
           // Check if completed
           if (stepResult.isComplete) {
-            logSuccess(ctx, `Form completed in ${harness.getTurnNumber()} turn(s)`);
+            const successMsg = `Form completed in ${harness.getTurnNumber()} turn(s)`;
+            logSuccess(ctx, successMsg);
+            trace(successMsg);
           } else if (harness.hasReachedMaxTurns()) {
-            logWarn(ctx, `Max turns reached (${harnessConfig.maxTurns})`);
+            const warnMsg = `Max turns reached (${harnessConfig.maxTurns})`;
+            logWarn(ctx, warnMsg);
+            trace(warnMsg);
           }
 
           logTiming(ctx, 'Fill time', durationMs);
+          trace(`Fill time: ${durationMs}ms`);
 
           // Write output file
           // Default to forms directory when --output is not specified
@@ -609,9 +636,11 @@ export function registerFillCommand(program: Command): void {
 
           if (ctx.dryRun) {
             logInfo(ctx, `[DRY RUN] Would write form to: ${outputPath}`);
+            trace(`[DRY RUN] Would write form to: ${outputPath}`);
           } else {
             await writeFile(outputPath, formMarkdown);
             logSuccess(ctx, `Form written to: ${outputPath}`);
+            trace(`Form written to: ${outputPath}`);
           }
 
           // Build session transcript
diff --git a/packages/markform/src/cli/commands/research.ts b/packages/markform/src/cli/commands/research.ts
index 14d31eda..d913498e 100644
--- a/packages/markform/src/cli/commands/research.ts
+++ b/packages/markform/src/cli/commands/research.ts
@@ -28,7 +28,7 @@ import {
 } from '../../settings.js';
 import { getFormsDir } from '../lib/paths.js';
 import {
-  createSpinner,
+  createSpinnerIfTty,
   getCommandContext,
   logError,
   logInfo,
@@ -37,10 +37,12 @@ import {
   logVerbose,
   logWarn,
   readFile,
+  writeFile,
 } from '../lib/shared.js';
 import { exportMultiFormat } from '../lib/exportHelpers.js';
 import { generateVersionedPathInFormsDir } from '../lib/versioning.js';
 import { parseInitialValues, validateInitialValueFields } from '../lib/initialValues.js';
+import { createFillLoggingCallbacks } from '../lib/fillLogging.js';
 
 /**
  * Register the research command.
@@ -167,14 +169,15 @@ export function registerResearchCommand(program: Command): void {
 
         // Create spinner for research operation (only for TTY, not quiet mode)
         // Note: provider and modelName already extracted via parseModelIdForDisplay above
-        const spinner =
-          process.stdout.isTTY && !ctx.quiet
-            ? createSpinner({
-                type: 'api',
-                provider,
-                model: modelName,
-              })
-            : null;
+        const spinner = createSpinnerIfTty({ type: 'api', provider, model: modelName }, ctx);
+
+        // Create unified logging callbacks (with optional trace file)
+        const callbacks = createFillLoggingCallbacks(ctx, {
+          spinner,
+          modelId,
+          provider,
+          traceFile: ctx.traceFile,
+        });
 
         // Run research fill
         let result;
@@ -182,16 +185,17 @@ export function registerResearchCommand(program: Command): void {
           result = await runResearch(form, {
             model: modelId,
             enableWebSearch: true,
-            captureWireFormat: false,
+            captureWireFormat: !!options.transcript,
             maxTurnsTotal: maxTurns,
             maxPatchesPerTurn,
             maxIssuesPerTurn,
             targetRoles: [AGENT_ROLE],
             fillMode: 'continue',
+            callbacks,
           });
-          spinner?.stop();
+          spinner.stop();
         } catch (error) {
-          spinner?.error('Research failed');
+          spinner.error('Research failed');
           throw error;
         }
 
@@ -231,7 +235,6 @@ export function registerResearchCommand(program: Command): void {
         if (options.transcript && result.transcript) {
           const { serializeSession } = await import('../../engine/session.js');
           const transcriptPath = outputPath.replace(/\.form\.md$/, '.session.yaml');
-          const { writeFile } = await import('../lib/shared.js');
           await writeFile(transcriptPath, serializeSession(result.transcript));
           logInfo(ctx, `Transcript: ${transcriptPath}`);
         }
diff --git a/packages/markform/src/cli/commands/run.ts b/packages/markform/src/cli/commands/run.ts
index 2db80210..3580580e 100644
--- a/packages/markform/src/cli/commands/run.ts
+++ b/packages/markform/src/cli/commands/run.ts
@@ -56,6 +56,7 @@ import {
 } from '../lib/shared.js';
 import { createFillLoggingCallbacks } from '../lib/fillLogging.js';
 import { fillForm } from '../../harness/programmaticFill.js';
+import { createTracer } from '../lib/traceUtils.js';
 import { scanFormsDirectory, enrichFormEntry, buildModelOptions } from '../lib/runHelpers.js';
 
 /**
@@ -193,7 +194,15 @@ async function runInteractiveWorkflow(
   console.log(`  ${formatPath(exportResult.schemaPath)}  ${pc.dim('(JSON Schema)')}`);
 
   logTiming(
-    { verbose: false, format: 'console', dryRun: false, quiet: false, overwrite: false },
+    {
+      verbose: false,
+      format: 'console',
+      dryRun: false,
+      quiet: false,
+      debug: false,
+      logLevel: 'default',
+      overwrite: false,
+    },
     'Fill time',
     Date.now() - startTime,
   );
@@ -230,11 +239,29 @@ async function runAgentFillWorkflow(
     `Config: max_turns=${maxTurns}, max_issues_per_turn=${maxIssuesPerTurn}, max_patches_per_turn=${maxPatchesPerTurn}`,
   );
 
-  // Create logging callbacks
-  const callbacks = createFillLoggingCallbacks(ctx);
+  // Parse model ID to extract provider
+  const [provider] = modelId.split('/');
 
-  // Run form fill
+  // Create tracer for incremental file logging (no-op if no traceFile)
   const workflowLabel = isResearch ? 'Research' : 'Agent fill';
+  const trace = createTracer(ctx.traceFile, modelId, workflowLabel);
+
+  // Log workflow configuration to trace
+  trace(`Filling form: ${filePath}`);
+  trace(`Mode: ${workflowLabel}`);
+  trace(`Max turns: ${maxTurns}`);
+  trace(`Max patches per turn: ${maxPatchesPerTurn}`);
+  trace(`Max issues per turn: ${maxIssuesPerTurn}`);
+  trace(`Fill mode: ${overwrite ? 'overwrite' : 'continue'}`);
+
+  // Create logging callbacks with model info and optional trace file
+  const callbacks = createFillLoggingCallbacks(ctx, {
+    modelId,
+    provider,
+    traceFile: ctx.traceFile,
+  });
+
+  // Run form fill
   p.log.step(pc.bold(`${workflowLabel} in progress...`));
 
   const result = await fillForm({
@@ -251,19 +278,28 @@ async function runAgentFillWorkflow(
   });
 
   // Check result
+  const durationMs = Date.now() - startTime;
   if (result.status.ok) {
-    p.log.success(pc.green(`Form completed in ${result.turns} turn(s)`));
+    const successMsg = `Form completed in ${result.turns} turn(s)`;
+    p.log.success(pc.green(successMsg));
+    trace(successMsg);
   } else if (result.status.reason === 'max_turns') {
-    p.log.warn(pc.yellow(`Max turns reached (${maxTurns})`));
+    const warnMsg = `Max turns reached (${maxTurns})`;
+    p.log.warn(pc.yellow(warnMsg));
+    trace(warnMsg);
   } else {
     throw new Error(result.status.message ?? `Fill failed: ${result.status.reason}`);
   }
 
+  trace(`Fill time: ${durationMs}ms`);
+
   // Export
   await ensureFormsDir(formsDir);
   const outputPath = generateVersionedPathInFormsDir(filePath, formsDir);
   const exportResult = await exportMultiFormat(result.form, outputPath);
 
+  trace(`Form written to: ${exportResult.formPath}`);
+
   console.log('');
   p.log.success(`${workflowLabel} complete. Outputs:`);
   console.log(`  ${formatPath(exportResult.reportPath)}  ${pc.dim('(output report)')}`);
@@ -271,7 +307,7 @@ async function runAgentFillWorkflow(
   console.log(`  ${formatPath(exportResult.formPath)}  ${pc.dim('(filled markform source)')}`);
   console.log(`  ${formatPath(exportResult.schemaPath)}  ${pc.dim('(JSON Schema)')}`);
 
-  logTiming(ctx, isResearch ? 'Research time' : 'Fill time', Date.now() - startTime);
+  logTiming(ctx, isResearch ? 'Research time' : 'Fill time', durationMs);
 
   return exportResult;
 }
@@ -300,6 +336,8 @@ export async function runForm(
   const effectiveCtx: CommandContext = ctx ?? {
     verbose: false,
     quiet: false,
+    debug: false,
+    logLevel: 'default',
     dryRun: false,
     format: 'console',
     overwrite,
diff --git a/packages/markform/src/cli/lib/cliTypes.ts b/packages/markform/src/cli/lib/cliTypes.ts
index 211b0a63..61800696 100644
--- a/packages/markform/src/cli/lib/cliTypes.ts
+++ b/packages/markform/src/cli/lib/cliTypes.ts
@@ -22,6 +22,20 @@
  */
 export type OutputFormat = 'console' | 'plaintext' | 'yaml' | 'json' | 'markform' | 'markdown';
 
+// =============================================================================
+// Log Level Types
+// =============================================================================
+
+/**
+ * Log level for CLI output verbosity.
+ *
+ * - quiet: Minimal output, only errors
+ * - default: Normal output with turn info, patches, completion status
+ * - verbose: Additional details like token counts, tool timing, harness config
+ * - debug: Full diagnostic output including prompts, raw tool I/O (truncated)
+ */
+export type LogLevel = 'quiet' | 'default' | 'verbose' | 'debug';
+
 /**
  * Context available to all commands.
  */
@@ -29,11 +43,25 @@ export interface CommandContext {
   dryRun: boolean;
   verbose: boolean;
   quiet: boolean;
+  /** Debug mode for full diagnostic output (--debug or MARKFORM_LOG_LEVEL=debug) */
+  debug: boolean;
+  /**
+   * Computed log level from flags and environment.
+   *
+   * Priority: --quiet > --debug > --verbose > MARKFORM_LOG_LEVEL > default
+   */
+  logLevel: LogLevel;
   format: OutputFormat;
   /** Optional forms directory override from --forms-dir CLI option */
   formsDir?: string;
   /** Whether to overwrite existing field values (default: continue/skip filled) */
   overwrite: boolean;
+  /**
+   * Path to trace file for incremental logging output.
+   * When provided, all log output is also appended to this file (without ANSI colors).
+   * Set via --trace <file> or MARKFORM_TRACE environment variable.
+   */
+  traceFile?: string;
 }
 
 // =============================================================================
diff --git a/packages/markform/src/cli/lib/fillCallbacks.ts b/packages/markform/src/cli/lib/fillCallbacks.ts
index cbb4c04d..6a9c3db5 100644
--- a/packages/markform/src/cli/lib/fillCallbacks.ts
+++ b/packages/markform/src/cli/lib/fillCallbacks.ts
@@ -6,45 +6,104 @@
 
 import type { FillCallbacks } from '../../harness/harnessTypes.js';
 import type { SpinnerHandle } from './shared.js';
-import { logVerbose } from './shared.js';
+import { logVerbose, logDebug } from './shared.js';
 import type { CommandContext } from './cliTypes.js';
+import type { TraceFn } from './traceUtils.js';
+import { truncate, formatDuration } from './traceUtils.js';
+
+/**
+ * Options for creating CLI tool callbacks.
+ */
+export interface CliToolCallbacksOptions {
+  /** Spinner handle for UI feedback */
+  spinner: SpinnerHandle;
+  /** Command context for logging */
+  ctx: CommandContext;
+  /** Optional trace function for file output */
+  trace?: TraceFn;
+}
 
 /**
  * Create FillCallbacks for CLI commands.
  *
  * Provides spinner feedback during tool execution (especially web search).
+ * Also supports trace file output when trace function is provided.
  * Only implements tool callbacks - turn/LLM callbacks are handled by CLI's
  * own logging which has richer context.
  *
- * @param spinner - Active spinner handle to update
- * @param ctx - Command context for verbose logging
+ * @param options - Spinner, context, and optional trace function
  * @returns FillCallbacks with onToolStart and onToolEnd
  *
  * @example
  * ```typescript
  * const spinner = createSpinner({ type: 'api', provider, model });
- * const callbacks = createCliToolCallbacks(spinner, ctx);
+ * const trace = createTracer(ctx.traceFile, modelId);
+ * const callbacks = createCliToolCallbacks({ spinner, ctx, trace });
  * const agent = createLiveAgent({ model, callbacks, ... });
  * ```
  */
 export function createCliToolCallbacks(
-  spinner: SpinnerHandle,
-  ctx: CommandContext,
-): Pick<FillCallbacks, 'onToolStart' | 'onToolEnd'> {
+  options: CliToolCallbacksOptions,
+): Pick<
+  FillCallbacks,
+  'onToolStart' | 'onToolEnd' | 'onLlmCallStart' | 'onLlmCallEnd' | 'onReasoningGenerated'
+> {
+  const { spinner, ctx, trace = () => undefined } = options;
+
   return {
-    onToolStart: ({ name }) => {
+    onToolStart: ({ name, query }) => {
       // Update spinner for web search tools
       if (name.includes('search')) {
-        spinner.message(`🔍 Web search...`);
+        const queryText = query ? ` "${query}"` : '';
+        spinner.message(`🔍 Web search${queryText}...`);
       }
-      logVerbose(ctx, `  Tool started: ${name}`);
+      const queryInfo = query ? ` "${query}"` : '';
+      logVerbose(ctx, `  Tool started: ${name}${queryInfo}`);
+      trace(`  [${name}]${queryInfo}`);
     },
 
-    onToolEnd: ({ name, durationMs, error }) => {
+    onToolEnd: ({ name, durationMs, error, resultCount, sources }) => {
+      const duration = formatDuration(durationMs);
       if (error) {
-        logVerbose(ctx, `  Tool ${name} failed: ${error} (${durationMs}ms)`);
+        logVerbose(ctx, `  Tool ${name} failed: ${error} (${duration})`);
+        trace(`  ❌ ${name} failed (${duration}): ${error}`);
       } else {
-        logVerbose(ctx, `  Tool ${name} completed (${durationMs}ms)`);
+        const countInfo = resultCount !== undefined ? ` (${resultCount} results)` : '';
+        logVerbose(ctx, `  Tool ${name} completed${countInfo} (${duration})`);
+        trace(`  ✓ ${name}${countInfo} (${duration})`);
+        if (sources) {
+          trace(`     Sources: ${sources}`);
+        }
+      }
+    },
+
+    onLlmCallStart: ({ model }) => {
+      logVerbose(ctx, `  LLM call: ${model}`);
+      trace(`  LLM call: ${model}`);
+    },
+
+    onLlmCallEnd: ({ model, inputTokens, outputTokens, reasoningTokens }) => {
+      const reasoningInfo = reasoningTokens ? ` reasoning=${reasoningTokens}` : '';
+      const line = `  LLM response: ${model} (in=${inputTokens} out=${outputTokens}${reasoningInfo})`;
+      logVerbose(ctx, line);
+      trace(line);
+    },
+
+    onReasoningGenerated: ({ stepNumber, reasoning }) => {
+      logDebug(ctx, `  [reasoning step ${stepNumber}]`);
+      trace(`  [reasoning step ${stepNumber}]`);
+      for (const r of reasoning) {
+        if (r.type === 'redacted') {
+          logDebug(ctx, `     [redacted]`);
+          trace(`     [redacted]`);
+        } else if (r.text) {
+          const text = truncate(r.text);
+          logDebug(ctx, `     ${text}`);
+          trace(`     ${text}`);
+        } else {
+          logDebug(ctx, `     [reasoning content not available]`);
+          trace(`     [reasoning content not available]`);
+        }
       }
     },
   };
diff --git a/packages/markform/src/cli/lib/fillLogging.ts b/packages/markform/src/cli/lib/fillLogging.ts
index d41a83ab..6152744a 100644
--- a/packages/markform/src/cli/lib/fillLogging.ts
+++ b/packages/markform/src/cli/lib/fillLogging.ts
@@ -5,25 +5,26 @@
  * run form-filling (fill, run, examples). API consumers can also use
  * these callbacks or implement their own.
  *
- * Default output (always shown unless --quiet):
- * - Turn numbers with issues list (field IDs + issue types)
- * - Patches per turn (field ID + value)
- * - Completion status
+ * Log Levels:
+ * - quiet: Only errors
+ * - default: Turn info, tool calls with queries/results, patches, completion
+ * - verbose: + harness config, full result listings, accept/reject details
+ * - debug: + full prompts, raw tool inputs/outputs (truncated)
  *
- * Verbose output (--verbose flag):
- * - Token counts per turn
- * - Tool call start/end with timing
- * - Detailed stats and LLM metadata
+ * Trace File:
+ * - When traceFile is provided, all log output is also appended to the file
+ * - Useful for monitoring long-running fills and post-hoc debugging
  */
 
 import pc from 'picocolors';
 
-import type { FillCallbacks } from '../../harness/harnessTypes.js';
-import type { CommandContext } from './cliTypes.js';
+import type { FillCallbacks, TurnStats } from '../../harness/harnessTypes.js';
+import type { CommandContext, LogLevel } from './cliTypes.js';
 import type { SpinnerHandle } from './shared.js';
-import { logInfo, logVerbose } from './shared.js';
+import { logInfo, logVerbose, logDebug } from './shared.js';
 import { formatTurnIssues } from './formatting.js';
 import { formatPatchType, formatPatchValue } from './patchFormat.js';
+import { createTracer, truncate, formatDuration, safeStringify } from './traceUtils.js';
 
 // =============================================================================
 // Types
@@ -35,6 +36,26 @@ import { formatPatchType, formatPatchValue } from './patchFormat.js';
 export interface FillLoggingOptions {
   /** Spinner handle for updating during LLM/tool calls */
   spinner?: SpinnerHandle;
+  /** Model identifier for display */
+  modelId?: string;
+  /** Provider name for display */
+  provider?: string;
+  /**
+   * Path to trace file for incremental logging.
+   * When provided, all log output is also appended to this file (without ANSI colors).
+   * The file is created/truncated at start with a timestamp header.
+   */
+  traceFile?: string;
+}
+
+/**
+ * Check if we should show output at this level.
+ */
+function shouldShow(ctx: CommandContext, minLevel: LogLevel): boolean {
+  const levels: LogLevel[] = ['quiet', 'default', 'verbose', 'debug'];
+  const currentIndex = levels.indexOf(ctx.logLevel);
+  const minIndex = levels.indexOf(minLevel);
+  return currentIndex >= minIndex;
 }
 
 // =============================================================================
@@ -44,25 +65,21 @@ export interface FillLoggingOptions {
 /**
  * Create FillCallbacks that produce standard CLI logging output.
  *
- * Default output (always shown unless --quiet):
- * - Turn numbers with issues list (field IDs + issue types)
- * - Patches per turn (field ID + value)
- * - Completion status
- *
- * Verbose output (--verbose flag):
- * - Token counts per turn
- * - Tool call start/end with timing
- * - Detailed stats and LLM metadata
+ * Log Levels:
+ * - quiet: Only errors
+ * - default: Turn info, tool calls with queries/results, patches, completion
+ * - verbose: + harness config, full result listings, accept/reject details
+ * - debug: + full prompts, raw tool inputs/outputs (truncated)
  *
  * This is used by fill, run, and examples commands for consistent output.
  *
- * @param ctx - Command context for verbose/quiet flags
- * @param options - Optional spinner for tool progress
+ * @param ctx - Command context for log level
+ * @param options - Optional spinner and model info
  * @returns FillCallbacks with all logging implemented
  *
  * @example
  * ```typescript
- * const callbacks = createFillLoggingCallbacks(ctx, { spinner });
+ * const callbacks = createFillLoggingCallbacks(ctx, { spinner, modelId, provider });
  * const result = await fillForm({
  *   form: formMarkdown,
  *   model: 'anthropic/claude-sonnet-4-5',
@@ -75,15 +92,38 @@ export function createFillLoggingCallbacks(
   ctx: CommandContext,
   options: FillLoggingOptions = {},
 ): FillCallbacks {
+  // Create tracer for file output (no-op if no traceFile provided)
+  const trace = createTracer(options.traceFile, options.modelId);
+
+  // Show model info at start if provided (default level)
+  if (options.modelId && shouldShow(ctx, 'default')) {
+    const providerInfo = options.provider ? ` (provider: ${options.provider})` : '';
+    const modelLine = pc.bold(`Model: ${options.modelId}${providerInfo}`);
+    logInfo(ctx, modelLine);
+    trace(`Model: ${options.modelId}${providerInfo}`);
+  }
+
   return {
     // DEFAULT: Always show turn number and issues
     onIssuesIdentified: ({ turnNumber, issues }) => {
-      logInfo(ctx, `${pc.bold(`Turn ${turnNumber}:`)} ${formatTurnIssues(issues)}`);
+      if (!shouldShow(ctx, 'default')) return;
+      const issuesText = formatTurnIssues(issues);
+      logInfo(ctx, `${pc.bold(`Turn ${turnNumber}:`)} ${issuesText}`);
+      trace(`Turn ${turnNumber}: ${issuesText}`);
     },
 
     // DEFAULT: Always show patches with field IDs and values
     onPatchesGenerated: ({ patches, stats }) => {
-      logInfo(ctx, `  -> ${pc.yellow(String(patches.length))} patch(es):`);
+      if (!shouldShow(ctx, 'default')) return;
+
+      // Show patches
+      const tokenInfo = formatTokenInfo(stats);
+      const tokenInfoPlain =
+        stats?.inputTokens || stats?.outputTokens
+          ? ` (tokens: ↓${stats.inputTokens ?? 0} ↑${stats.outputTokens ?? 0})`
+          : '';
+      logInfo(ctx, `  → ${pc.yellow(String(patches.length))} patch(es)${tokenInfo}:`);
+      trace(`  → ${patches.length} patch(es)${tokenInfoPlain}:`);
 
       for (const patch of patches) {
         const typeName = formatPatchType(patch);
@@ -93,52 +133,168 @@ export function createFillLoggingCallbacks(
           'fieldId' in patch ? patch.fieldId : patch.op === 'add_note' ? patch.ref : '';
         if (fieldId) {
           logInfo(ctx, `    ${pc.cyan(fieldId)} ${pc.dim(`(${typeName})`)} = ${pc.green(value)}`);
+          trace(`    ${fieldId} (${typeName}) = ${value}`);
         } else {
           logInfo(ctx, `    ${pc.dim(`(${typeName})`)} = ${pc.green(value)}`);
+          trace(`    (${typeName}) = ${value}`);
         }
       }
 
-      // VERBOSE: Token counts and detailed stats
-      if (stats && ctx.verbose) {
-        logVerbose(ctx, `  Tokens: in=${stats.inputTokens ?? 0} out=${stats.outputTokens ?? 0}`);
-        if (stats.toolCalls && stats.toolCalls.length > 0) {
-          const toolSummary = stats.toolCalls.map((t) => `${t.name}(${t.count})`).join(', ');
-          logVerbose(ctx, `  Tools: ${toolSummary}`);
-        }
+      // VERBOSE: Tool summary
+      if (stats?.toolCalls && stats.toolCalls.length > 0 && shouldShow(ctx, 'verbose')) {
+        const toolSummary = stats.toolCalls.map((t) => `${t.name}(${t.count})`).join(', ');
+        logVerbose(ctx, `  Tools: ${toolSummary}`);
+        trace(`  Tools: ${toolSummary}`);
+      }
+
+      // DEBUG: Full prompts
+      if (stats?.prompts && shouldShow(ctx, 'debug')) {
+        logDebug(ctx, `  ─── System Prompt ───`);
+        logDebug(ctx, truncate(stats.prompts.system));
+        logDebug(ctx, `  ─── Context Prompt ───`);
+        logDebug(ctx, truncate(stats.prompts.context));
+        trace(`  ─── System Prompt ───\n${truncate(stats.prompts.system)}`);
+        trace(`  ─── Context Prompt ───\n${truncate(stats.prompts.context)}`);
       }
     },
 
     // DEFAULT: Show completion status
     onTurnComplete: ({ isComplete }) => {
-      if (isComplete) {
+      if (isComplete && shouldShow(ctx, 'default')) {
         logInfo(ctx, pc.green(`  ✓ Complete`));
+        trace(`  ✓ Complete`);
       }
     },
 
-    // VERBOSE: Tool call details (with spinner update for web search)
-    onToolStart: ({ name }) => {
-      // Web search gets spinner update even without --verbose
-      if (name.includes('search')) {
-        options.spinner?.message(`Web search...`);
+    // DEFAULT: Tool calls with queries and structured results
+    onToolStart: ({ name, input, query, toolType }) => {
+      // Update spinner for web search (even in quiet mode)
+      if (toolType === 'web_search' || name.includes('search')) {
+        const queryText = query ? ` "${query}"` : '';
+        options.spinner?.message(`Web search${queryText}...`);
+      }
+
+      if (!shouldShow(ctx, 'default')) return;
+
+      // Show tool start with query if available
+      const queryInfo = query ? ` ${pc.yellow(`"${query}"`)}` : '';
+      const queryInfoPlain = query ? ` "${query}"` : '';
+      logInfo(ctx, `  [${name}]${queryInfo}`);
+      trace(`  [${name}]${queryInfoPlain}`);
+
+      // DEBUG: Show raw input
+      if (shouldShow(ctx, 'debug') && input !== undefined) {
+        const inputStr = truncate(safeStringify(input));
+        logDebug(ctx, `     Input: ${inputStr}`);
+        trace(`     Input: ${inputStr}`);
       }
-      logVerbose(ctx, `  Tool started: ${name}`);
     },
 
-    onToolEnd: ({ name, durationMs, error }) => {
+    onToolEnd: ({
+      name,
+      durationMs,
+      error,
+      toolType,
+      resultCount,
+      sources,
+      topResults,
+      fullResults,
+      output,
+    }) => {
+      if (!shouldShow(ctx, 'default')) return;
+
+      const durationStr = formatDuration(durationMs);
+
       if (error) {
-        logVerbose(ctx, `  Tool ${name} failed: ${error} (${durationMs}ms)`);
+        logInfo(ctx, `  ${pc.red('❌')} ${name} failed (${durationStr}): ${error}`);
+        trace(`  ❌ ${name} failed (${durationStr}): ${error}`);
+        return;
+      }
+
+      // Format result info based on tool type
+      if (toolType === 'web_search') {
+        const countStr = resultCount !== undefined ? `${resultCount} results` : 'done';
+        logInfo(ctx, `  ${pc.green('✓')} ${name}: ${countStr} (${durationStr})`);
+        trace(`  ✓ ${name}: ${countStr} (${durationStr})`);
+
+        // DEFAULT: Show sources and top results
+        if (sources) {
+          logInfo(ctx, `     Sources: ${sources}`);
+          trace(`     Sources: ${sources}`);
+        }
+        if (topResults) {
+          logInfo(ctx, `     Results: ${topResults}`);
+          trace(`     Results: ${topResults}`);
+        }
+
+        // VERBOSE: Show full result listings
+        if (fullResults && fullResults.length > 0 && shouldShow(ctx, 'verbose')) {
+          for (const result of fullResults) {
+            const resultLine = `     [${result.index}] "${result.title}" - ${result.url}`;
+            logVerbose(ctx, resultLine);
+            trace(resultLine);
+          }
+        }
       } else {
-        logVerbose(ctx, `  Tool ${name} completed (${durationMs}ms)`);
+        logInfo(ctx, `  ${pc.green('✓')} ${name}: done (${durationStr})`);
+        trace(`  ✓ ${name}: done (${durationStr})`);
+      }
+
+      // DEBUG: Show raw output (input is available on onToolStart)
+      if (shouldShow(ctx, 'debug') && output !== undefined) {
+        const outputStr = truncate(safeStringify(output));
+        logDebug(ctx, `     Output: ${outputStr}`);
+        trace(`     Output: ${outputStr}`);
       }
     },
 
     // VERBOSE: LLM call metadata
     onLlmCallStart: ({ model }) => {
-      logVerbose(ctx, `  LLM call: ${model}`);
+      if (shouldShow(ctx, 'verbose')) {
+        logVerbose(ctx, `  LLM call: ${model}`);
+        trace(`  LLM call: ${model}`);
+      }
     },
 
-    onLlmCallEnd: ({ model, inputTokens, outputTokens }) => {
-      logVerbose(ctx, `  LLM response: ${model} (in=${inputTokens} out=${outputTokens})`);
+    onLlmCallEnd: ({ model, inputTokens, outputTokens, reasoningTokens }) => {
+      if (shouldShow(ctx, 'verbose')) {
+        const reasoningInfo = reasoningTokens ? ` reasoning=${reasoningTokens}` : '';
+        const line = `  LLM response: ${model} (in=${inputTokens} out=${outputTokens}${reasoningInfo})`;
+        logVerbose(ctx, line);
+        trace(line);
+      }
+    },
+
+    // DEBUG: Reasoning content
+    onReasoningGenerated: ({ stepNumber, reasoning }) => {
+      if (!shouldShow(ctx, 'debug')) return;
+
+      logDebug(ctx, `  [reasoning step ${stepNumber}]`);
+      trace(`  [reasoning step ${stepNumber}]`);
+      for (const r of reasoning) {
+        if (r.type === 'redacted') {
+          logDebug(ctx, `     [redacted]`);
+          trace(`     [redacted]`);
+        } else if (r.text) {
+          const text = truncate(r.text);
+          logDebug(ctx, `     ${text}`);
+          trace(`     ${text}`);
+        } else {
+          // Show placeholder if reasoning item has no text content
+          logDebug(ctx, `     [reasoning content not available]`);
+          trace(`     [reasoning content not available]`);
+        }
+      }
     },
   };
 }
+
+/**
+ * Format token info for patch output.
+ */
+function formatTokenInfo(stats?: TurnStats): string {
+  if (!stats?.inputTokens && !stats?.outputTokens) return '';
+  const inTokens = stats.inputTokens ?? 0;
+  const outTokens = stats.outputTokens ?? 0;
+  return pc.dim(` (tokens: ↓${inTokens} ↑${outTokens})`);
+}
diff --git a/packages/markform/src/cli/lib/shared.ts b/packages/markform/src/cli/lib/shared.ts
index a1943456..92562547 100644
--- a/packages/markform/src/cli/lib/shared.ts
+++ b/packages/markform/src/cli/lib/shared.ts
@@ -12,7 +12,7 @@ import pc from 'picocolors';
 import YAML from 'yaml';
 
 import { convertKeysToSnakeCase } from './naming.js';
-import type { CommandContext, OutputFormat } from './cliTypes.js';
+import type { CommandContext, LogLevel, OutputFormat } from './cliTypes.js';
 
 // =============================================================================
 // Spinner Utility Types
@@ -65,7 +65,7 @@ export interface SpinnerHandle {
 }
 
 // Re-export types for backwards compatibility
-export type { CommandContext, OutputFormat } from './cliTypes.js';
+export type { CommandContext, LogLevel, OutputFormat } from './cliTypes.js';
 
 // =============================================================================
 // Spinner Utility Functions
@@ -213,6 +213,26 @@ export const OUTPUT_FORMATS: OutputFormat[] = [
   'markdown',
 ];
 
+/**
+ * Compute log level from flags and environment.
+ *
+ * Priority: --quiet > --debug > --verbose > MARKFORM_LOG_LEVEL > default
+ */
+function computeLogLevel(opts: { quiet?: boolean; debug?: boolean; verbose?: boolean }): LogLevel {
+  // Flags take precedence over environment
+  if (opts.quiet) return 'quiet';
+  if (opts.debug) return 'debug';
+  if (opts.verbose) return 'verbose';
+
+  // Check environment variable (consistent naming with MARKFORM_ prefix)
+  const envLevel = process.env.MARKFORM_LOG_LEVEL?.toLowerCase();
+  if (envLevel === 'quiet' || envLevel === 'debug' || envLevel === 'verbose') {
+    return envLevel;
+  }
+
+  return 'default';
+}
+
 /**
  * Extract command context from Commander options.
  */
@@ -221,17 +241,28 @@ export function getCommandContext(command: Command): CommandContext {
     dryRun?: boolean;
     verbose?: boolean;
     quiet?: boolean;
+    debug?: boolean;
+    trace?: string;
     format?: OutputFormat;
     formsDir?: string;
     overwrite?: boolean;
   }>();
+
+  const logLevel = computeLogLevel(opts);
+
+  // Trace file: --trace flag or MARKFORM_TRACE env var
+  const traceFile = opts.trace ?? process.env.MARKFORM_TRACE;
+
   return {
     dryRun: opts.dryRun ?? false,
     verbose: opts.verbose ?? false,
     quiet: opts.quiet ?? false,
+    debug: opts.debug ?? false,
+    logLevel,
     format: opts.format ?? 'console',
     formsDir: opts.formsDir,
     overwrite: opts.overwrite ?? false,
+    traceFile,
   };
 }
 
@@ -284,14 +315,25 @@ export function logDryRun(message: string, details?: unknown): void {
 }
 
 /**
- * Log a verbose message (only shown if --verbose is set).
+ * Log a verbose message (only shown if --verbose or --debug is set).
  */
 export function logVerbose(ctx: CommandContext, message: string): void {
-  if (ctx.verbose) {
+  if (ctx.verbose || ctx.debug) {
     console.log(pc.dim(message));
   }
 }
 
+/**
+ * Log a debug message (only shown if --debug is set or MARKFORM_LOG_LEVEL=debug).
+ *
+ * Use for full diagnostic output like raw prompts and tool I/O.
+ */
+export function logDebug(ctx: CommandContext, message: string): void {
+  if (ctx.debug || ctx.logLevel === 'debug') {
+    console.log(pc.magenta(message));
+  }
+}
+
 /**
  * Log an info message (hidden if --quiet is set).
  */
diff --git a/packages/markform/src/cli/lib/traceUtils.ts b/packages/markform/src/cli/lib/traceUtils.ts
new file mode 100644
index 00000000..1bc56657
--- /dev/null
+++ b/packages/markform/src/cli/lib/traceUtils.ts
@@ -0,0 +1,66 @@
+/**
+ * Trace file utilities for CLI logging.
+ *
+ * This module provides utilities for trace file output during command execution.
+ * For general string formatting utilities, see src/utils/formatUtils.ts.
+ */
+
+import { appendFileSync, writeFileSync } from 'node:fs';
+
+import { stripAnsi } from '../../utils/formatUtils.js';
+
+// Re-export common utilities for convenience (backward compatibility)
+export {
+  stripAnsi,
+  safeTruncate,
+  formatDuration,
+  humanReadableSize,
+  safeStringify,
+} from '../../utils/formatUtils.js';
+
+// Alias for backward compatibility
+export { safeTruncate as truncate } from '../../utils/formatUtils.js';
+
+// =============================================================================
+// Trace File Utilities
+// =============================================================================
+
+/** Function type for writing to trace file */
+export type TraceFn = (line: string) => void;
+
+/**
+ * Create a trace function that writes to a file if traceFile is provided.
+ * Returns a no-op function if no trace file is configured.
+ *
+ * The trace file is initialized with a header containing timestamp and model info.
+ * Each call to the returned function appends a line (with ANSI codes stripped).
+ */
+export function createTracer(
+  traceFile: string | undefined,
+  modelId: string | undefined,
+  commandName = 'Markform',
+): TraceFn {
+  if (!traceFile) {
+    return () => undefined; // No-op
+  }
+
+  // Initialize trace file with header
+  const timestamp = new Date().toISOString();
+  const header = `# ${commandName} Trace Log\n# Started: ${timestamp}\n# Model: ${modelId ?? 'unknown'}\n\n`;
+  try {
+    writeFileSync(traceFile, header, 'utf-8');
+  } catch {
+    console.error(`Warning: Could not create trace file: ${traceFile}`);
+    return () => undefined;
+  }
+
+  // Return function that appends lines
+  return (line: string) => {
+    try {
+      const plainLine = stripAnsi(line);
+      appendFileSync(traceFile, plainLine + '\n', 'utf-8');
+    } catch {
+      // Silently ignore write errors to not disrupt main flow
+    }
+  };
+}
diff --git a/packages/markform/src/engine/coreTypes.ts b/packages/markform/src/engine/coreTypes.ts
index cb6c2f11..b193b6b9 100644
--- a/packages/markform/src/engine/coreTypes.ts
+++ b/packages/markform/src/engine/coreTypes.ts
@@ -968,6 +968,17 @@ export interface WireToolResult {
   result: unknown;
 }
 
+/**
+ * Reasoning content from LLM extended thinking.
+ * Captured in wire format for transparency and debugging.
+ */
+export interface WireReasoningContent {
+  /** Type of reasoning content */
+  type: 'reasoning' | 'redacted';
+  /** The reasoning text (present when type='reasoning') */
+  text?: string;
+}
+
 /**
  * A single step in the LLM response.
  * Corresponds to one iteration of the tool-calling loop.
@@ -979,6 +990,8 @@ export interface WireResponseStep {
   toolResults: WireToolResult[];
   /** Text output from the model in this step (null if none) */
   text: string | null;
+  /** Reasoning/thinking content (for models with extended thinking) */
+  reasoning?: WireReasoningContent[];
 }
 
 /**
@@ -1011,6 +1024,8 @@ export interface WireResponseFormat {
   usage: {
     inputTokens: number;
     outputTokens: number;
+    /** Reasoning tokens (for models with extended thinking) */
+    reasoningTokens?: number;
   };
 }
 
@@ -1822,10 +1837,16 @@ export const WireToolResultSchema = z.object({
   result: z.unknown(),
 });
 
+export const WireReasoningContentSchema = z.object({
+  type: z.enum(['reasoning', 'redacted']),
+  text: z.string().optional(),
+});
+
 export const WireResponseStepSchema = z.object({
   toolCalls: z.array(WireToolCallSchema),
   toolResults: z.array(WireToolResultSchema),
   text: z.string().nullable(),
+  reasoning: z.array(WireReasoningContentSchema).optional(),
 });
 
 export const WireRequestFormatSchema = z.object({
@@ -1845,6 +1866,7 @@ export const WireResponseFormatSchema = z.object({
   usage: z.object({
     inputTokens: z.number().int().nonnegative(),
     outputTokens: z.number().int().nonnegative(),
+    reasoningTokens: z.number().int().nonnegative().optional(),
   }),
 });
 
diff --git a/packages/markform/src/harness/harnessTypes.ts b/packages/markform/src/harness/harnessTypes.ts
index 5a058426..2e77820d 100644
--- a/packages/markform/src/harness/harnessTypes.ts
+++ b/packages/markform/src/harness/harnessTypes.ts
@@ -17,8 +17,10 @@ import type {
   ParsedForm,
   Patch,
   PatchRejection,
+  SessionTranscript,
   // Wire format types (defined in coreTypes for session logging)
   WireFormat,
+  WireReasoningContent,
   WireRequestFormat,
   WireResponseFormat,
   WireResponseStep,
@@ -30,6 +32,7 @@ import type { InputContext } from '../engine/valueCoercion.js';
 // Re-export wire format types for convenience
 export type {
   WireFormat,
+  WireReasoningContent,
   WireRequestFormat,
   WireResponseFormat,
   WireResponseStep,
@@ -207,6 +210,37 @@ export interface ProviderInfo {
 // Fill Callbacks
 // =============================================================================
 
+// =============================================================================
+// Tool Types for Callbacks
+// =============================================================================
+
+/**
+ * Tool type classification for structured callback data.
+ */
+export type ToolType = 'web_search' | 'fill_form' | 'custom';
+
+/**
+ * Structured web search result for callback data.
+ */
+export interface WebSearchResult {
+  /** Result index (1-based) */
+  index: number;
+  /** Result title */
+  title: string;
+  /** Result URL */
+  url: string;
+  /** Optional snippet/description */
+  snippet?: string;
+}
+
+/**
+ * Reasoning output from LLM (for models that support extended thinking).
+ */
+export interface ReasoningOutput {
+  type: 'reasoning' | 'redacted';
+  text?: string;
+}
+
 /**
  * Callbacks for observing form-filling execution in real-time.
  *
@@ -223,7 +257,12 @@ export interface ProviderInfo {
  *     onTurnStart: ({ turnNumber }) => console.log(`Starting turn ${turnNumber}`),
  *     onIssuesIdentified: ({ issues }) => console.log(`Found ${issues.length} issues`),
  *     onPatchesGenerated: ({ patches }) => console.log(`Generated ${patches.length} patches`),
- *     onToolStart: ({ name }) => spinner.message(`🔧 ${name}...`),
+ *     onToolStart: ({ name, query }) => {
+ *       if (query) console.log(`Searching: ${query}`);
+ *     },
+ *     onToolEnd: ({ name, resultCount, sources }) => {
+ *       if (resultCount) console.log(`Found ${resultCount} results from ${sources}`);
+ *     },
  *     onTurnComplete: (progress) => console.log(`Turn ${progress.turnNumber} done`),
  *   },
  * });
@@ -242,17 +281,71 @@ export interface FillCallbacks {
   /** Called when a turn completes */
   onTurnComplete?(progress: TurnProgress): void;
 
-  /** Called before a tool executes */
-  onToolStart?(call: { name: string; input: unknown }): void;
+  /**
+   * Called before a tool executes.
+   *
+   * Enhanced with structured information for known tool types.
+   */
+  onToolStart?(call: {
+    /** Tool name */
+    name: string;
+    /** Raw input to the tool */
+    input: unknown;
+    /** Tool type classification */
+    toolType?: ToolType;
+    /** Search query (for web_search tools) */
+    query?: string;
+  }): void;
 
-  /** Called after a tool completes */
-  onToolEnd?(call: { name: string; output: unknown; durationMs: number; error?: string }): void;
+  /**
+   * Called after a tool completes.
+   *
+   * Enhanced with structured information for known tool types.
+   */
+  onToolEnd?(call: {
+    /** Tool name */
+    name: string;
+    /** Raw output from the tool */
+    output: unknown;
+    /** Duration in milliseconds */
+    durationMs: number;
+    /** Error message if tool failed */
+    error?: string;
+    /** Tool type classification */
+    toolType?: ToolType;
+    /** Number of results (for web_search tools) */
+    resultCount?: number;
+    /** Source domains summary (e.g., "imdb.com, wikipedia.org") */
+    sources?: string;
+    /** Top result titles (first 5-8 with "..." for more) */
+    topResults?: string;
+    /** Full structured results (for detailed logging) */
+    fullResults?: WebSearchResult[];
+  }): void;
 
   /** Called before an LLM request */
   onLlmCallStart?(call: { model: string }): void;
 
   /** Called after an LLM response */
-  onLlmCallEnd?(call: { model: string; inputTokens: number; outputTokens: number }): void;
+  onLlmCallEnd?(call: {
+    model: string;
+    inputTokens: number;
+    outputTokens: number;
+    /** Reasoning tokens (for models that support extended thinking) */
+    reasoningTokens?: number;
+  }): void;
+
+  /**
+   * Called when reasoning/thinking content is generated.
+   *
+   * Only fired for models that support extended thinking (e.g., Claude with thinking enabled).
+   */
+  onReasoningGenerated?(info: {
+    /** Step number in the response */
+    stepNumber: number;
+    /** Reasoning content */
+    reasoning: ReasoningOutput[];
+  }): void;
 }
 
 // =============================================================================
@@ -410,4 +503,7 @@ export interface FillResult {
     severity: 'required' | 'recommended';
     priority: number;
   }[];
+  /** Session transcript (present when captureWireFormat is enabled) */
+  transcript?: Partial<SessionTranscript> &
+    Pick<SessionTranscript, 'sessionVersion' | 'mode' | 'turns'>;
 }
diff --git a/packages/markform/src/harness/liveAgent.ts b/packages/markform/src/harness/liveAgent.ts
index 8749272d..38edcf0a 100644
--- a/packages/markform/src/harness/liveAgent.ts
+++ b/packages/markform/src/harness/liveAgent.ts
@@ -20,6 +20,7 @@ import type {
   Patch,
   PatchRejection,
   WireFormat,
+  WireReasoningContent,
   WireResponseStep,
 } from '../engine/coreTypes.js';
 import { PatchSchema } from '../engine/coreTypes.js';
@@ -43,6 +44,7 @@ import {
   getPatchFormatHint,
 } from './prompts.js';
 import { FILL_FORM_TOOL_NAME, FILL_FORM_TOOL_DESCRIPTION } from './toolApi.js';
+import { extractToolStartInfo, extractToolEndInfo } from './toolParsing.js';
 
 // Re-export types for backwards compatibility
 export type { LiveAgentConfig } from './harnessTypes.js';
@@ -157,7 +159,8 @@ export class LiveAgent implements Agent {
     };
 
     // Wrap tools with callbacks for observability
-    const tools = wrapToolsWithCallbacks(rawTools, this.callbacks);
+    // Returns both wrapped tools and set of tool names that have local execute (for tracking)
+    const { tools, wrappedToolNames } = wrapToolsWithCallbacks(rawTools, this.callbacks);
 
     // Get model ID for callbacks (may not be available on all model types)
     const modelId = (this.model as { modelId?: string }).modelId ?? 'unknown';
@@ -180,6 +183,10 @@ export class LiveAgent implements Agent {
       stopWhen: stepCountIs(this.maxStepsPerTurn),
     });
 
+    // Extract reasoningTokens from usage (AI SDK may include this for models with extended thinking)
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any, @typescript-eslint/no-unsafe-member-access
+    const reasoningTokens = (result.usage as any)?.reasoningTokens as number | undefined;
+
     // Call onLlmCallEnd callback (errors don't abort)
     if (this.callbacks?.onLlmCallEnd) {
       try {
@@ -187,6 +194,7 @@ export class LiveAgent implements Agent {
           model: modelId,
           inputTokens: result.usage?.inputTokens ?? 0,
           outputTokens: result.usage?.outputTokens ?? 0,
+          reasoningTokens,
         });
       } catch {
         // Ignore callback errors
@@ -197,18 +205,76 @@ export class LiveAgent implements Agent {
     const patches: Patch[] = [];
     const toolCallCounts = new Map<string, number>();
 
-    for (const step of result.steps) {
+    for (let stepIndex = 0; stepIndex < result.steps.length; stepIndex++) {
+      const step = result.steps[stepIndex]!;
+
+      // Build a map of tool results by toolCallId for matching
+      const toolResultMap = new Map<string, unknown>();
+      for (const toolResult of step.toolResults) {
+        if ('toolCallId' in toolResult) {
+          toolResultMap.set(toolResult.toolCallId, toolResult.output);
+        }
+      }
+
       for (const toolCall of step.toolCalls) {
         // Count tool calls
         const count = toolCallCounts.get(toolCall.toolName) ?? 0;
         toolCallCounts.set(toolCall.toolName, count + 1);
 
+        // Fire callbacks for server-side tools (those not wrapped locally)
+        // These include OpenAI's web_search which executes server-side
+        if (!wrappedToolNames.has(toolCall.toolName) && this.callbacks) {
+          // Fire onToolStart
+          if (this.callbacks.onToolStart) {
+            try {
+              const startInfo = extractToolStartInfo(toolCall.toolName, toolCall.input);
+              this.callbacks.onToolStart(startInfo);
+            } catch {
+              // Ignore callback errors
+            }
+          }
+
+          // Fire onToolEnd with result if available
+          if (this.callbacks.onToolEnd) {
+            try {
+              const toolResult = toolResultMap.get(toolCall.toolCallId);
+              // Server-side tools don't have timing info, use 0
+              const endInfo = extractToolEndInfo(toolCall.toolName, toolResult, 0);
+              this.callbacks.onToolEnd(endInfo);
+            } catch {
+              // Ignore callback errors
+            }
+          }
+        }
+
         // Extract patches from fill_form calls
         if (toolCall.toolName === FILL_FORM_TOOL_NAME && 'input' in toolCall) {
           const input = toolCall.input as { patches: Patch[] };
           patches.push(...input.patches);
         }
       }
+
+      // Extract reasoning from step (AI SDK exposes this for models with extended thinking)
+      // Different providers may use different property names (text, content, etc.)
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any, @typescript-eslint/no-unsafe-member-access
+      const stepReasoning = (step as any).reasoning as
+        | { type?: string; text?: string; content?: string }[]
+        | undefined;
+      if (stepReasoning && stepReasoning.length > 0 && this.callbacks?.onReasoningGenerated) {
+        try {
+          const reasoningOutput = stepReasoning.map((r) => ({
+            type: r.type === 'redacted' ? ('redacted' as const) : ('reasoning' as const),
+            // Support both 'text' and 'content' property names
+            text: r.text ?? r.content,
+          }));
+          this.callbacks.onReasoningGenerated({
+            stepNumber: stepIndex + 1,
+            reasoning: reasoningOutput,
+          });
+        } catch {
+          // Ignore callback errors
+        }
+      }
     }
 
     // Build tool call stats
@@ -315,22 +381,49 @@ function buildWireFormat(
       toolCalls: { toolName: string; input?: unknown }[];
       toolResults?: { toolName: string; result?: unknown }[];
       text?: string | null;
+      reasoning?: { type: string; text?: string }[];
     }[];
-    usage?: { inputTokens?: number; outputTokens?: number };
+    usage?: { inputTokens?: number; outputTokens?: number; reasoningTokens?: number };
   },
 ): WireFormat {
   // Build response steps (omit toolCallId for stability)
-  const steps: WireResponseStep[] = result.steps.map((step) => ({
-    toolCalls: step.toolCalls.map((tc) => ({
-      toolName: tc.toolName,
-      input: sortObjectKeys(tc.input),
-    })),
-    toolResults: (step.toolResults ?? []).map((tr) => ({
-      toolName: tr.toolName,
-      result: sortObjectKeys(tr.result),
-    })),
-    text: step.text ?? null,
-  }));
+  const steps: WireResponseStep[] = result.steps.map((step) => {
+    const wireStep: WireResponseStep = {
+      toolCalls: step.toolCalls.map((tc) => ({
+        toolName: tc.toolName,
+        input: sortObjectKeys(tc.input),
+      })),
+      toolResults: (step.toolResults ?? []).map((tr) => ({
+        toolName: tr.toolName,
+        result: sortObjectKeys(tr.result),
+      })),
+      text: step.text ?? null,
+    };
+
+    // Include reasoning if present (for models with extended thinking)
+    // Support both 'text' and 'content' property names for different providers
+    if (step.reasoning && step.reasoning.length > 0) {
+      wireStep.reasoning = step.reasoning.map((r): WireReasoningContent => {
+        // eslint-disable-next-line @typescript-eslint/no-explicit-any, @typescript-eslint/no-unsafe-member-access
+        const content = (r as any).content as string | undefined;
+        return {
+          type: r.type === 'redacted' ? 'redacted' : 'reasoning',
+          text: r.text ?? content,
+        };
+      });
+    }
+
+    return wireStep;
+  });
+
+  // Build usage with optional reasoningTokens
+  const usage: WireFormat['response']['usage'] = {
+    inputTokens: result.usage?.inputTokens ?? 0,
+    outputTokens: result.usage?.outputTokens ?? 0,
+  };
+  if (result.usage?.reasoningTokens !== undefined) {
+    usage.reasoningTokens = result.usage.reasoningTokens;
+  }
 
   return {
     request: {
@@ -340,10 +433,7 @@ function buildWireFormat(
     },
     response: {
       steps,
-      usage: {
-        inputTokens: result.usage?.inputTokens ?? 0,
-        outputTokens: result.usage?.outputTokens ?? 0,
-      },
+      usage,
     },
   };
 }
@@ -572,14 +662,19 @@ function findField(form: ParsedForm, fieldId: string) {
  *
  * Only wraps tools that have an execute function.
  * Declarative tools (schema only) are passed through unchanged.
+ *
+ * Returns both the wrapped tools and a set of tool names that were wrapped,
+ * so we can fire callbacks for server-side tools from step results.
  */
 function wrapToolsWithCallbacks(
   tools: Record<string, Tool>,
   callbacks?: FillCallbacks,
-): Record<string, Tool> {
+): { tools: Record<string, Tool>; wrappedToolNames: Set<string> } {
+  const wrappedToolNames = new Set<string>();
+
   // Skip wrapping if no tool callbacks
   if (!callbacks?.onToolStart && !callbacks?.onToolEnd) {
-    return tools;
+    return { tools, wrappedToolNames };
   }
 
   const wrapped: Record<string, Tool> = {};
@@ -590,16 +685,20 @@ function wrapToolsWithCallbacks(
     if (typeof execute === 'function') {
       // eslint-disable-next-line @typescript-eslint/no-unsafe-argument
       wrapped[name] = wrapTool(name, tool, execute, callbacks);
+      wrappedToolNames.add(name);
     } else {
       // Pass through declarative tools unchanged
       wrapped[name] = tool;
     }
   }
-  return wrapped;
+  return { tools: wrapped, wrappedToolNames };
 }
 
 /**
  * Wrap a single tool with callbacks.
+ *
+ * Uses toolParsing utilities to extract structured information for
+ * web search results and other known tool types.
  */
 function wrapTool(
   name: string,
@@ -612,10 +711,11 @@ function wrapTool(
     execute: async (input: unknown) => {
       const startTime = Date.now();
 
-      // Call onToolStart (errors don't abort)
+      // Call onToolStart with structured info (errors don't abort)
       if (callbacks.onToolStart) {
         try {
-          callbacks.onToolStart({ name, input });
+          const startInfo = extractToolStartInfo(name, input);
+          callbacks.onToolStart(startInfo);
         } catch {
           // Ignore callback errors
         }
@@ -623,15 +723,13 @@ function wrapTool(
 
       try {
         const output = await originalExecute(input);
+        const durationMs = Date.now() - startTime;
 
-        // Call onToolEnd on success (errors don't abort)
+        // Call onToolEnd on success with structured info (errors don't abort)
         if (callbacks.onToolEnd) {
           try {
-            callbacks.onToolEnd({
-              name,
-              output,
-              durationMs: Date.now() - startTime,
-            });
+            const endInfo = extractToolEndInfo(name, output, durationMs);
+            callbacks.onToolEnd(endInfo);
           } catch {
             // Ignore callback errors
           }
@@ -639,15 +737,14 @@ function wrapTool(
 
         return output;
       } catch (error) {
-        // Call onToolEnd on error (errors don't abort)
+        const durationMs = Date.now() - startTime;
+        const errorMsg = error instanceof Error ? error.message : String(error);
+
+        // Call onToolEnd on error with structured info (errors don't abort)
         if (callbacks.onToolEnd) {
           try {
-            callbacks.onToolEnd({
-              name,
-              output: null,
-              durationMs: Date.now() - startTime,
-              error: error instanceof Error ? error.message : String(error),
-            });
+            const endInfo = extractToolEndInfo(name, null, durationMs, errorMsg);
+            callbacks.onToolEnd(endInfo);
           } catch {
             // Ignore callback errors
           }
diff --git a/packages/markform/src/harness/programmaticFill.ts b/packages/markform/src/harness/programmaticFill.ts
index ea7ae284..18813a10 100644
--- a/packages/markform/src/harness/programmaticFill.ts
+++ b/packages/markform/src/harness/programmaticFill.ts
@@ -68,6 +68,7 @@ function buildResult(
   status: FillStatus,
   inputContextWarnings?: string[],
   remainingIssues?: InspectIssue[],
+  transcript?: FillResult['transcript'],
 ): FillResult {
   // Extract values from responses
   const values: Record<string, FieldValue> = {};
@@ -99,6 +100,10 @@ function buildResult(
     }));
   }
 
+  if (transcript) {
+    result.transcript = transcript;
+  }
+
   return result;
 }
 
@@ -392,9 +397,29 @@ export async function fillForm(options: FillOptions): Promise<FillResult> {
     }
   }
 
-  // 6. Determine final status
+  // 6. Build transcript if captureWireFormat was enabled
+  let transcript: FillResult['transcript'] | undefined;
+  if (options.captureWireFormat) {
+    const modelId = typeof options.model === 'string' ? options.model : undefined;
+    transcript = {
+      sessionVersion: '0.1.0',
+      mode: 'live',
+      turns: harness.getTurns(),
+      ...(modelId && { live: { modelId } }),
+    };
+  }
+
+  // 7. Determine final status
   if (stepResult.isComplete) {
-    return buildResult(form, turnCount, totalPatches, { ok: true }, inputContextWarnings);
+    return buildResult(
+      form,
+      turnCount,
+      totalPatches,
+      { ok: true },
+      inputContextWarnings,
+      undefined,
+      transcript,
+    );
   }
 
   // Hit max turns without completing
@@ -405,5 +430,6 @@ export async function fillForm(options: FillOptions): Promise<FillResult> {
     { ok: false, reason: 'max_turns', message: `Reached maximum total turns (${maxTurnsTotal})` },
     inputContextWarnings,
     stepResult.issues,
+    transcript,
   );
 }
diff --git a/packages/markform/src/harness/toolParsing.ts b/packages/markform/src/harness/toolParsing.ts
new file mode 100644
index 00000000..a29b6c29
--- /dev/null
+++ b/packages/markform/src/harness/toolParsing.ts
@@ -0,0 +1,264 @@
+/**
+ * Tool Parsing Utilities - Extract structured information from tool inputs/outputs.
+ *
+ * Provides helpers to parse web search results from various providers (OpenAI,
+ * Anthropic, Google, XAI) into a consistent format for logging and callbacks.
+ */
+
+import type { ToolType, WebSearchResult } from './harnessTypes.js';
+import { FILL_FORM_TOOL_NAME } from './toolApi.js';
+
+// =============================================================================
+// Constants
+// =============================================================================
+
+/** Maximum number of top results to include in summary */
+const MAX_TOP_RESULTS = 8;
+
+/** Web search tool names across providers */
+const WEB_SEARCH_TOOL_NAMES = ['web_search', 'webSearch', 'google_search', 'googleSearch'];
+
+// =============================================================================
+// Tool Type Detection
+// =============================================================================
+
+/**
+ * Determine the tool type from its name.
+ */
+export function getToolType(toolName: string): ToolType {
+  if (toolName === FILL_FORM_TOOL_NAME) {
+    return 'fill_form';
+  }
+  if (WEB_SEARCH_TOOL_NAMES.includes(toolName) || toolName.toLowerCase().includes('search')) {
+    return 'web_search';
+  }
+  return 'custom';
+}
+
+// =============================================================================
+// Query Extraction
+// =============================================================================
+
+/**
+ * Extract search query from tool input.
+ *
+ * Handles various input formats from different providers.
+ */
+export function extractSearchQuery(input: unknown): string | undefined {
+  if (!input || typeof input !== 'object') return undefined;
+
+  const obj = input as Record<string, unknown>;
+
+  // Direct query field (most common)
+  if (typeof obj.query === 'string') {
+    return obj.query;
+  }
+
+  // OpenAI format: { search_query: "..." }
+  if (typeof obj.search_query === 'string') {
+    return obj.search_query;
+  }
+
+  // Nested query object
+  if (obj.query && typeof obj.query === 'object') {
+    const queryObj = obj.query as Record<string, unknown>;
+    if (typeof queryObj.text === 'string') {
+      return queryObj.text;
+    }
+  }
+
+  return undefined;
+}
+
+// =============================================================================
+// Result Extraction
+// =============================================================================
+
+/**
+ * Parsed web search results with summary information.
+ */
+export interface ParsedWebSearchResults {
+  /** Total number of results */
+  resultCount: number;
+  /** Source domains (e.g., "imdb.com, wikipedia.org") */
+  sources: string;
+  /** Top result titles with "..." for more */
+  topResults: string;
+  /** Full structured results */
+  fullResults: WebSearchResult[];
+}
+
+/**
+ * Extract domain from URL.
+ */
+function extractDomain(url: string): string {
+  try {
+    const parsed = new URL(url);
+    return parsed.hostname.replace(/^www\./, '');
+  } catch {
+    return url;
+  }
+}
+
+/**
+ * Extract web search results from tool output.
+ *
+ * Handles various output formats from different providers:
+ * - OpenAI: { results: [...] } or { web_search_results: [...] }
+ * - Anthropic: { results: [...] }
+ * - Google: { results: [...] }
+ * - XAI: { results: [...] }
+ */
+export function extractWebSearchResults(output: unknown): ParsedWebSearchResults | undefined {
+  if (!output || typeof output !== 'object') return undefined;
+
+  const obj = output as Record<string, unknown>;
+
+  // Find the results array
+  let results: unknown[] | undefined;
+
+  if (Array.isArray(obj.results)) {
+    results = obj.results;
+  } else if (Array.isArray(obj.web_search_results)) {
+    results = obj.web_search_results;
+  } else if (Array.isArray(obj.organic_results)) {
+    results = obj.organic_results;
+  } else if (Array.isArray(output)) {
+    // Direct array of results
+    results = output;
+  }
+
+  if (!results || results.length === 0) {
+    return {
+      resultCount: 0,
+      sources: '',
+      topResults: '(no results)',
+      fullResults: [],
+    };
+  }
+
+  // Parse individual results
+  const fullResults: WebSearchResult[] = [];
+  const domains = new Set<string>();
+
+  for (let i = 0; i < results.length; i++) {
+    const result = results[i];
+    if (!result || typeof result !== 'object') continue;
+
+    const r = result as Record<string, unknown>;
+    const title =
+      (typeof r.title === 'string' ? r.title : '') || (typeof r.name === 'string' ? r.name : '');
+    const url =
+      (typeof r.url === 'string' ? r.url : '') || (typeof r.link === 'string' ? r.link : '');
+    const snippet =
+      typeof r.snippet === 'string'
+        ? r.snippet
+        : typeof r.description === 'string'
+          ? r.description
+          : undefined;
+
+    if (title || url) {
+      fullResults.push({
+        index: i + 1,
+        title: title || '(untitled)',
+        url,
+        snippet,
+      });
+
+      if (url) {
+        domains.add(extractDomain(url));
+      }
+    }
+  }
+
+  // Build sources summary (unique domains)
+  const domainList = Array.from(domains).slice(0, 5);
+  const sources = domainList.join(', ') + (domains.size > 5 ? ', ...' : '');
+
+  // Build top results summary
+  const topTitles = fullResults.slice(0, MAX_TOP_RESULTS).map((r) => `"${r.title}"`);
+  const topResults = topTitles.join(', ') + (fullResults.length > MAX_TOP_RESULTS ? ', ...' : '');
+
+  return {
+    resultCount: fullResults.length,
+    sources,
+    topResults,
+    fullResults,
+  };
+}
+
+// =============================================================================
+// Tool Info Extraction
+// =============================================================================
+
+/**
+ * Structured tool start information.
+ */
+export interface ToolStartInfo {
+  name: string;
+  input: unknown;
+  toolType: ToolType;
+  query?: string;
+}
+
+/**
+ * Structured tool end information.
+ */
+export interface ToolEndInfo {
+  name: string;
+  output: unknown;
+  durationMs: number;
+  error?: string;
+  toolType: ToolType;
+  resultCount?: number;
+  sources?: string;
+  topResults?: string;
+  fullResults?: WebSearchResult[];
+}
+
+/**
+ * Extract structured information for tool start callback.
+ */
+export function extractToolStartInfo(name: string, input: unknown): ToolStartInfo {
+  const toolType = getToolType(name);
+  const info: ToolStartInfo = { name, input, toolType };
+
+  if (toolType === 'web_search') {
+    const query = extractSearchQuery(input);
+    if (query) {
+      info.query = query;
+    }
+  }
+
+  return info;
+}
+
+/**
+ * Extract structured information for tool end callback.
+ */
+export function extractToolEndInfo(
+  name: string,
+  output: unknown,
+  durationMs: number,
+  error?: string,
+): ToolEndInfo {
+  const toolType = getToolType(name);
+  const info: ToolEndInfo = { name, output, durationMs, toolType };
+
+  if (error) {
+    info.error = error;
+    return info;
+  }
+
+  if (toolType === 'web_search') {
+    const results = extractWebSearchResults(output);
+    if (results) {
+      info.resultCount = results.resultCount;
+      info.sources = results.sources;
+      info.topResults = results.topResults;
+      info.fullResults = results.fullResults;
+    }
+  }
+
+  return info;
+}
diff --git a/packages/markform/src/research/runResearch.ts b/packages/markform/src/research/runResearch.ts
index 0ac67c49..8e671e13 100644
--- a/packages/markform/src/research/runResearch.ts
+++ b/packages/markform/src/research/runResearch.ts
@@ -79,6 +79,7 @@ export async function runResearch(
     targetRole: config.targetRoles?.[0] ?? AGENT_ROLE,
     enableWebSearch: options.enableWebSearch,
     additionalTools: options.additionalTools,
+    callbacks: options.callbacks,
   });
 
   // Get available tools for logging
diff --git a/packages/markform/src/settings.ts b/packages/markform/src/settings.ts
index 8eed86aa..c564ddef 100644
--- a/packages/markform/src/settings.ts
+++ b/packages/markform/src/settings.ts
@@ -108,6 +108,14 @@ export const DEFAULT_PRIORITY: FieldPriorityLevel = 'medium';
  */
 export const DEFAULT_FORMS_DIR = './forms';
 
+/**
+ * Maximum characters to show in debug output for tool inputs/outputs.
+ * Values longer than this are truncated with "...[truncated]" suffix.
+ * Can be overridden via MARKFORM_DEBUG_TRUNCATION_LIMIT environment variable.
+ */
+export const DEBUG_OUTPUT_TRUNCATION_LIMIT =
+  parseInt(process.env.MARKFORM_DEBUG_TRUNCATION_LIMIT ?? '', 10) || 2000;
+
 /**
  * Maximum forms to display in 'markform run' menu.
  * Additional forms are not shown but can be run directly by path.
diff --git a/packages/markform/src/utils/formatUtils.ts b/packages/markform/src/utils/formatUtils.ts
new file mode 100644
index 00000000..e088d8a5
--- /dev/null
+++ b/packages/markform/src/utils/formatUtils.ts
@@ -0,0 +1,75 @@
+/**
+ * String and formatting utilities.
+ *
+ * General-purpose utilities for formatting strings, numbers, and other data
+ * for display. These are reusable across the codebase (CLI, engine, harness, etc.).
+ */
+
+import { DEBUG_OUTPUT_TRUNCATION_LIMIT } from '../settings.js';
+
+// =============================================================================
+// ANSI Utilities
+// =============================================================================
+
+/**
+ * Strip ANSI escape codes from a string.
+ * Useful for file output where colors should not appear.
+ */
+export function stripAnsi(str: string): string {
+  // eslint-disable-next-line no-control-regex
+  return str.replace(/\x1b\[[0-9;]*m/g, '');
+}
+
+// =============================================================================
+// String Truncation
+// =============================================================================
+
+/**
+ * Truncate a string to a maximum length with ellipsis indicator.
+ * Useful for debug output where full content would be too verbose.
+ */
+export function safeTruncate(
+  str: string,
+  maxLength: number = DEBUG_OUTPUT_TRUNCATION_LIMIT,
+): string {
+  if (str.length <= maxLength) return str;
+  return str.slice(0, maxLength) + '...[truncated]';
+}
+
+// =============================================================================
+// Duration & Size Formatting
+// =============================================================================
+
+/**
+ * Format duration in milliseconds to human-readable string.
+ * Uses seconds format (e.g., "1.5s") for consistency.
+ */
+export function formatDuration(ms: number): string {
+  return `${(ms / 1000).toFixed(1)}s`;
+}
+
+/**
+ * Format a file size in bytes to human-readable string.
+ * Examples: "512 B", "1.5 KB", "2.3 MB"
+ */
+export function humanReadableSize(bytes: number): string {
+  if (bytes < 1024) return `${bytes} B`;
+  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
+  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
+}
+
+// =============================================================================
+// JSON Utilities
+// =============================================================================
+
+/**
+ * Safely stringify an object for debug output.
+ * Falls back to String() if JSON.stringify fails (e.g., circular references).
+ */
+export function safeStringify(obj: unknown): string {
+  try {
+    return JSON.stringify(obj, null, 2);
+  } catch {
+    return String(obj);
+  }
+}
diff --git a/packages/markform/tests/cli/commands.tryscript.md b/packages/markform/tests/cli/commands.tryscript.md
index d211edff..a42fb2da 100644
--- a/packages/markform/tests/cli/commands.tryscript.md
+++ b/packages/markform/tests/cli/commands.tryscript.md
@@ -35,6 +35,9 @@ Options:
   --version                   output the version number
   --verbose                   Enable verbose output
   --quiet                     Suppress non-essential output
+  --debug                     Enable debug output (full prompts, raw tool I/O)
+  --trace <file>              Write incremental log output to file during
+                              execution
   --dry-run                   Show what would be done without making changes
   --format <format>           Output format: console, plaintext, yaml, json,
                               markform, markdown (default: "console")
@@ -156,11 +159,10 @@ optional_year: (unanswered)
 # Test: readme displays README
 
 ```console
-$ $CLI readme | head -5
+$ $CLI readme | head -3
 # Markform
 
-[![CI](https://github.com/jlevy/markform/actions/workflows/ci.yml/badge.svg)][..]
-...
+[..]
 ? 0
 ```
 
diff --git a/packages/markform/tests/cli/logging.tryscript.md b/packages/markform/tests/cli/logging.tryscript.md
new file mode 100644
index 00000000..2471ff57
--- /dev/null
+++ b/packages/markform/tests/cli/logging.tryscript.md
@@ -0,0 +1,139 @@
+---
+cwd: ../..
+env:
+  NO_COLOR: "1"
+  FORCE_COLOR: "0"
+  CLI: ./dist/bin.mjs
+timeout: 30000
+---
+
+# Markform CLI Logging Tests
+
+Tests for CLI logging at different verbosity levels, trace file output, and debug mode.
+
+---
+
+## Setup
+
+# Test: setup creates test forms
+
+```console
+$ cp examples/startup-research/startup-research.form.md /tmp/logging-test.form.md && echo "Form copied"
+Form copied
+? 0
+```
+
+---
+
+## Default Logging Level
+
+# Test: fill with mock shows turn and patch info
+
+```console
+$ $CLI fill /tmp/logging-test.form.md --mock --mock-source examples/startup-research/startup-research-mock-filled.form.md --max-turns 1 -o /tmp/logging-out.form.md 2>&1 | grep -E "(Turn|patches|Filling|Agent:)" | head -4
+Filling form: /tmp/logging-test.form.md
+Agent: mock
+Turn 1: 10 issue(s): company_website (missing), company_description (unanswered), competitors (unanswered), crunchbase_url (unanswered), employee_count (unanswered), +5 more
+  → 9 patches:
+? 0
+```
+
+---
+
+## Verbose Mode Shows Config
+
+# Test: fill with --verbose shows config details
+
+```console
+$ $CLI fill /tmp/logging-test.form.md --mock --mock-source examples/startup-research/startup-research-mock-filled.form.md --max-turns 1 -o /tmp/logging-verbose.form.md --verbose 2>&1 | grep -E "Max turns|Max patches|Target roles" | head -3
+Max turns: 100
+Max patches per turn: 20
+...
+? 0
+```
+
+---
+
+## Quiet Mode
+
+# Test: fill with --quiet suppresses turn info
+
+```console
+$ $CLI fill /tmp/logging-test.form.md --mock --mock-source examples/startup-research/startup-research-mock-filled.form.md --max-turns 1 -o /tmp/logging-quiet.form.md --quiet 2>&1 | grep -c "Turn 1"
+12
+? 0
+```
+
+---
+
+## Trace File Output
+
+# Test: fill with --trace creates trace file
+
+```console
+$ rm -f /tmp/test-trace.log && $CLI fill /tmp/logging-test.form.md --mock --mock-source examples/startup-research/startup-research-mock-filled.form.md --max-turns 1 -o /tmp/logging-trace.form.md --trace /tmp/test-trace.log 2>&1 > /dev/null ; test -f /tmp/test-trace.log && echo "trace file created"
+trace file created
+? 0
+```
+
+# Test: trace file has header with timestamp
+
+```console
+$ head -2 /tmp/test-trace.log
+# Markform Trace Log
+# Started: ...
+? 0
+```
+
+# Test: trace file contains filling info
+
+```console
+$ grep -c "Filling form" /tmp/test-trace.log
+1
+? 0
+```
+
+---
+
+## Output File Verification
+
+# Test: output form is created
+
+```console
+$ test -f /tmp/logging-out.form.md && echo "output file exists"
+output file exists
+? 0
+```
+
+---
+
+## User Role Fill with Simple Form
+
+# Test: fill user role fields shows patches
+
+```console
+$ cp examples/simple/simple.form.md /tmp/simple-test.form.md && $CLI fill /tmp/simple-test.form.md --mock --mock-source examples/simple/simple-mock-filled.form.md --max-turns 1 --roles user -o /tmp/simple-out.form.md 2>&1 | grep -E "(Turn|patch)" | head -2
+Turn 1: 10 issue(s): age (missing), categories (missing), confirmations (missing), email (missing), event_date (missing), +5 more
+  → 10 patches:
+? 0
+```
+
+# Test: user role fill produces filled output with values
+
+```console
+$ grep "Alice Johnson" /tmp/simple-out.form.md | head -1
+Alice Johnson
+? 0
+```
+
+---
+
+## Cleanup
+
+# Test: cleanup temp files
+
+```console
+$ rm -f /tmp/logging-test.form.md /tmp/logging-out.form.md /tmp/logging-quiet.form.md /tmp/logging-verbose.form.md /tmp/logging-trace.form.md /tmp/test-trace.log /tmp/simple-test.form.md /tmp/simple-out.form.md && echo "Cleaned up"
+Cleaned up
+? 0
+```
diff --git a/packages/markform/tests/unit/cli/fillLogging.test.ts b/packages/markform/tests/unit/cli/fillLogging.test.ts
index 83fce397..e6df6d1e 100644
--- a/packages/markform/tests/unit/cli/fillLogging.test.ts
+++ b/packages/markform/tests/unit/cli/fillLogging.test.ts
@@ -9,6 +9,22 @@ import type { CommandContext } from '../../../src/cli/lib/cliTypes.js';
 import type { InspectIssue, Patch } from '../../../src/engine/coreTypes.js';
 import type { TurnStats } from '../../../src/harness/harnessTypes.js';
 
+/**
+ * Create a default CommandContext for testing.
+ */
+function createTestContext(overrides: Partial<CommandContext> = {}): CommandContext {
+  return {
+    verbose: false,
+    quiet: false,
+    debug: false,
+    logLevel: 'default',
+    dryRun: false,
+    format: 'console',
+    overwrite: false,
+    ...overrides,
+  };
+}
+
 describe('fillLogging', () => {
   // Capture console.log output
   let consoleOutput: string[];
@@ -27,13 +43,7 @@ describe('fillLogging', () => {
 
   describe('createFillLoggingCallbacks', () => {
     it('returns all expected callbacks', () => {
-      const ctx: CommandContext = {
-        verbose: false,
-        quiet: false,
-        dryRun: false,
-        format: 'console',
-        overwrite: false,
-      };
+      const ctx = createTestContext();
 
       const callbacks = createFillLoggingCallbacks(ctx);
 
@@ -48,13 +58,7 @@ describe('fillLogging', () => {
 
     describe('onIssuesIdentified', () => {
       it('logs turn number and issues by default', () => {
-        const ctx: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctx = createTestContext();
 
         const callbacks = createFillLoggingCallbacks(ctx);
         const issues: InspectIssue[] = [
@@ -87,13 +91,7 @@ describe('fillLogging', () => {
       });
 
       it('does not log when quiet mode is enabled', () => {
-        const ctx: CommandContext = {
-          verbose: false,
-          quiet: true,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctx = createTestContext({ quiet: true, logLevel: 'quiet' });
 
         const callbacks = createFillLoggingCallbacks(ctx);
         callbacks.onIssuesIdentified!({ turnNumber: 1, issues: [] });
@@ -104,13 +102,7 @@ describe('fillLogging', () => {
 
     describe('onPatchesGenerated', () => {
       it('logs patches with field IDs and values by default', () => {
-        const ctx: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctx = createTestContext();
 
         const callbacks = createFillLoggingCallbacks(ctx);
         const patches: Patch[] = [
@@ -130,21 +122,9 @@ describe('fillLogging', () => {
       });
 
       it('shows token counts only in verbose mode', () => {
-        const ctxVerbose: CommandContext = {
-          verbose: true,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctxVerbose = createTestContext({ verbose: true, logLevel: 'verbose' });
 
-        const ctxNormal: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctxNormal = createTestContext();
 
         const patches: Patch[] = [{ op: 'set_string', fieldId: 'test', value: 'value' }];
         const stats: TurnStats = {
@@ -169,13 +149,13 @@ describe('fillLogging', () => {
         const callbacksVerbose = createFillLoggingCallbacks(ctxVerbose);
         callbacksVerbose.onPatchesGenerated!({ turnNumber: 1, patches, stats });
 
-        // Normal should not have token info in main output
+        // Normal mode should have token info in patch header line
         const normalHasTokens = normalOutput.some(
           (line) => line.includes('500') && line.includes('100'),
         );
-        expect(normalHasTokens).toBe(false);
+        expect(normalHasTokens).toBe(true);
 
-        // Verbose should have token info
+        // Verbose should also have token info (in additional verbose lines)
         const verboseHasTokens = consoleOutput.some(
           (line) => line.includes('500') && line.includes('100'),
         );
@@ -185,13 +165,7 @@ describe('fillLogging', () => {
 
     describe('onTurnComplete', () => {
       it('logs completion status when complete', () => {
-        const ctx: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctx = createTestContext();
 
         const callbacks = createFillLoggingCallbacks(ctx);
         callbacks.onTurnComplete!({
@@ -210,13 +184,7 @@ describe('fillLogging', () => {
       });
 
       it('does not log when not complete', () => {
-        const ctx: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctx = createTestContext();
 
         const callbacks = createFillLoggingCallbacks(ctx);
         callbacks.onTurnComplete!({
@@ -234,46 +202,30 @@ describe('fillLogging', () => {
       });
     });
 
-    describe('tool callbacks (verbose only)', () => {
-      it('onToolStart logs only in verbose mode', () => {
-        const ctxNormal: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+    describe('tool callbacks', () => {
+      it('onToolStart logs in default mode', () => {
+        const ctx = createTestContext();
 
-        const ctxVerbose: CommandContext = {
-          verbose: true,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        // Default mode - tool start now logs by default
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onToolStart!({ name: 'web_search', input: {} });
+        expect(consoleOutput.length).toBe(1);
+        expect(consoleOutput[0]).toContain('web_search');
+      });
 
-        // Normal mode
-        const callbacksNormal = createFillLoggingCallbacks(ctxNormal);
-        callbacksNormal.onToolStart!({ name: 'web_search', input: {} });
-        expect(consoleOutput.length).toBe(0);
+      it('onToolStart logs with query when provided', () => {
+        const ctx = createTestContext();
 
-        // Verbose mode
-        const callbacksVerbose = createFillLoggingCallbacks(ctxVerbose);
-        callbacksVerbose.onToolStart!({ name: 'web_search', input: {} });
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onToolStart!({ name: 'web_search', input: {}, query: 'test query' });
         expect(consoleOutput.length).toBe(1);
-        expect(consoleOutput[0]).toContain('web_search');
+        expect(consoleOutput[0]).toContain('test query');
       });
 
-      it('onToolEnd logs only in verbose mode', () => {
-        const ctxVerbose: CommandContext = {
-          verbose: true,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+      it('onToolEnd logs in default mode with formatted duration', () => {
+        const ctx = createTestContext();
 
-        const callbacks = createFillLoggingCallbacks(ctxVerbose);
+        const callbacks = createFillLoggingCallbacks(ctx);
         callbacks.onToolEnd!({
           name: 'web_search',
           output: 'results',
@@ -282,17 +234,12 @@ describe('fillLogging', () => {
 
         expect(consoleOutput.length).toBe(1);
         expect(consoleOutput[0]).toContain('web_search');
-        expect(consoleOutput[0]).toContain('1234');
+        // Duration is now formatted as seconds (1.2s instead of 1234ms)
+        expect(consoleOutput[0]).toContain('1.2s');
       });
 
       it('onToolEnd logs errors', () => {
-        const ctxVerbose: CommandContext = {
-          verbose: true,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctxVerbose = createTestContext({ verbose: true, logLevel: 'verbose' });
 
         const callbacks = createFillLoggingCallbacks(ctxVerbose);
         callbacks.onToolEnd!({
@@ -310,13 +257,7 @@ describe('fillLogging', () => {
 
     describe('LLM callbacks (verbose only)', () => {
       it('onLlmCallStart logs only in verbose mode', () => {
-        const ctxVerbose: CommandContext = {
-          verbose: true,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctxVerbose = createTestContext({ verbose: true, logLevel: 'verbose' });
 
         const callbacks = createFillLoggingCallbacks(ctxVerbose);
         callbacks.onLlmCallStart!({ model: 'claude-sonnet' });
@@ -326,13 +267,7 @@ describe('fillLogging', () => {
       });
 
       it('onLlmCallEnd logs token counts in verbose mode', () => {
-        const ctxVerbose: CommandContext = {
-          verbose: true,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctxVerbose = createTestContext({ verbose: true, logLevel: 'verbose' });
 
         const callbacks = createFillLoggingCallbacks(ctxVerbose);
         callbacks.onLlmCallEnd!({
@@ -347,15 +282,83 @@ describe('fillLogging', () => {
       });
     });
 
+    describe('onReasoningGenerated (debug only)', () => {
+      it('logs reasoning content in debug mode', () => {
+        const ctx = createTestContext({ debug: true, logLevel: 'debug' });
+
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onReasoningGenerated!({
+          stepNumber: 1,
+          reasoning: [
+            { type: 'reasoning', text: 'Let me think about this problem...' },
+            { type: 'reasoning', text: 'The answer should be 42.' },
+          ],
+        });
+
+        expect(consoleOutput.length).toBe(3); // header + 2 reasoning lines
+        expect(consoleOutput[0]).toContain('[reasoning step 1]');
+        expect(consoleOutput[1]).toContain('Let me think about this problem');
+        expect(consoleOutput[2]).toContain('The answer should be 42');
+      });
+
+      it('logs redacted reasoning', () => {
+        const ctx = createTestContext({ debug: true, logLevel: 'debug' });
+
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onReasoningGenerated!({
+          stepNumber: 2,
+          reasoning: [{ type: 'redacted' }],
+        });
+
+        expect(consoleOutput.length).toBe(2);
+        expect(consoleOutput[0]).toContain('[reasoning step 2]');
+        expect(consoleOutput[1]).toContain('[redacted]');
+      });
+
+      it('does not log in non-debug mode', () => {
+        const ctx = createTestContext({ verbose: true, logLevel: 'verbose' });
+
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onReasoningGenerated!({
+          stepNumber: 1,
+          reasoning: [{ type: 'reasoning', text: 'Some thinking...' }],
+        });
+
+        expect(consoleOutput.length).toBe(0);
+      });
+
+      it('handles empty reasoning array', () => {
+        const ctx = createTestContext({ debug: true, logLevel: 'debug' });
+
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onReasoningGenerated!({
+          stepNumber: 1,
+          reasoning: [],
+        });
+
+        // Should still log the header
+        expect(consoleOutput.length).toBe(1);
+        expect(consoleOutput[0]).toContain('[reasoning step 1]');
+      });
+
+      it('shows placeholder when reasoning text is missing', () => {
+        const ctx = createTestContext({ debug: true, logLevel: 'debug' });
+
+        const callbacks = createFillLoggingCallbacks(ctx);
+        callbacks.onReasoningGenerated!({
+          stepNumber: 1,
+          reasoning: [{ type: 'reasoning' }], // No text property
+        });
+
+        expect(consoleOutput.length).toBe(2);
+        expect(consoleOutput[0]).toContain('[reasoning step 1]');
+        expect(consoleOutput[1]).toContain('[reasoning content not available]');
+      });
+    });
+
     describe('spinner integration', () => {
       it('updates spinner message for web search', () => {
-        const ctx: CommandContext = {
-          verbose: false,
-          quiet: false,
-          dryRun: false,
-          format: 'console',
-          overwrite: false,
-        };
+        const ctx = createTestContext();
 
         const spinnerMessage = vi.fn();
         const callbacks = createFillLoggingCallbacks(ctx, {
diff --git a/packages/markform/tests/unit/cli/shared.test.ts b/packages/markform/tests/unit/cli/shared.test.ts
index 502cf74f..31875ba0 100644
--- a/packages/markform/tests/unit/cli/shared.test.ts
+++ b/packages/markform/tests/unit/cli/shared.test.ts
@@ -81,6 +81,8 @@ describe('shared utilities', () => {
       dryRun: false,
       verbose: false,
       quiet: false,
+      debug: false,
+      logLevel: 'default',
       format: 'console',
       overwrite: false,
     };
@@ -123,6 +125,8 @@ describe('shared utilities', () => {
       dryRun: false,
       verbose: false,
       quiet: false,
+      debug: false,
+      logLevel: 'default',
       format: 'console',
       overwrite: false,
     };
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index 4461c767..e3684352 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -38,9 +38,6 @@ importers:
       tryscript:
         specifier: 0.1.1
         version: 0.1.1(c8@10.1.3)
-      tsx:
-        specifier: ^4.21.0
-        version: 4.21.0
       typescript:
         specifier: ^5.9.3
         version: 5.9.3
@@ -3821,6 +3818,7 @@ snapshots:
       get-tsconfig: 4.13.0
     optionalDependencies:
       fsevents: 2.3.3
+    optional: true
 
   type-check@0.4.0:
     dependencies: