
feat: add structural metadata to search JSON for evidence filtering #556

Merged
buger merged 1 commit into main from fix/search-evidence-metadata-555
Apr 6, 2026

@buger commented Apr 6, 2026

Summary

Fixes #555 — adds structural metadata fields to JSON search output so downstream tools (ReqProof, traceability tools, test-mapping tools) can reliably filter requirement evidence without custom heuristics.

New JSON fields

| Field | Type | Description |
| --- | --- | --- |
| scope | string | One of "test", "function", "declaration", "module", "doc", "example" |
| is_doc | bool? | true for markdown/docs/help files (omitted when false) |
| is_example | bool? | true for fenced code blocks in docs (omitted when false) |
| owner_symbol | string? | Owning function/class/method name extracted from code |

Fixes

  • is_test is no longer set to true for fenced code examples in docs. Previously, a fenced_code_block in a markdown file that contained func TestFoo() was incorrectly flagged is_test: true.
  • scope classifies blocks structurally, so consumers can filter on scope === "test" to get evidence-only results

Example output

For probe search --allow-tests --exact --no-merge -o json SYS-REQ-985:

# Real test evidence (WANT):
scope=test         is_test=true  owner=TestFormatNested...  file=workflow_code_mcdc_test.go

# Doc (can filter out):
scope=doc          is_doc=true                              file=code_mcdc_coverage.md

# Implementation declaration (can filter out):
scope=declaration                owner=codeMCDCCoverageCheck file=workflow_code_mcdc.go

Test plan

  • 24 new unit tests for is_doc_file, is_fenced_example, classify_scope, extract_owner_symbol, and 3 integration-level tests
  • Manual verification with repro scenario from issue (test files, doc with fenced example, implementation file)
  • All 361 unit tests pass, 11 integration tests pass, 21 CLI tests pass
  • Fenced code in docs no longer gets is_test: true
  • Owner symbol extraction works for Go, Rust, Python, JS/TS, Java

🤖 Generated with Claude Code

…ut (#555)

Adds structural metadata fields to JSON search output so downstream
tools can reliably distinguish test evidence from docs, fenced
examples, and implementation declarations without custom heuristics:

- scope: "test" | "function" | "declaration" | "module" | "doc" | "example"
- is_doc: true for markdown/docs/help files (omitted when false)
- is_example: true for fenced code blocks in docs (omitted when false)
- owner_symbol: owning function/class/method name extracted from code
- Fix is_test: no longer set true for fenced code examples in docs

Consumers can now filter `scope === "test"` for evidence-only results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger buger merged commit 045bbca into main Apr 6, 2026
15 checks passed
probelabs bot commented Apr 6, 2026

PR Overview: Add Structural Metadata to Search JSON for Evidence Filtering

Summary

This PR adds structural metadata fields to JSON search output, enabling downstream tools (ReqProof, traceability tools, test-mapping tools) to reliably filter requirement evidence without custom heuristics. It fixes issue #555, in which fenced code blocks in markdown documentation were incorrectly flagged as test code.

Files Changed

src/search/search_output.rs (+576 lines, -5 lines)

  • Modified JsonResult struct to include new metadata fields
  • Added is_doc_file() function to detect documentation files
  • Added is_fenced_example() function to detect fenced code blocks
  • Added classify_scope() function to categorize result blocks
  • Added extract_owner_symbol() function to extract owning symbol names
  • Updated format_and_print_json_results() to populate new fields
  • Added 24 new unit tests and 3 integration-level tests

Key Technical Changes

1. New JSON Output Fields

| Field | Type | Description |
| --- | --- | --- |
| scope | string | Structural classification: "test", "function", "declaration", "module", "doc", "example" |
| is_doc | bool? | true for markdown/docs/help files (omitted when false) |
| is_example | bool? | true for fenced code blocks in docs (omitted when false) |
| owner_symbol | string? | Owning function/class/method name extracted from code |
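
To illustrate the "omitted when false" behavior of the optional fields, here is a minimal sketch that uses only the standard library. The `ResultMetadata` struct and its `to_json` method are hypothetical names introduced for this example and are not Probe's `JsonResult`. The sketch also assumes that values contain no characters that need JSON escaping. In a serde-based codebase the same effect is commonly achieved with the `skip_serializing_if` field attribute, but this PR overview does not say which mechanism Probe uses.

```rust
/// Hypothetical stand-in for the new metadata fields (not Probe's JsonResult).
struct ResultMetadata {
    scope: &'static str,
    is_doc: bool,
    is_example: bool,
    owner_symbol: Option<String>,
}

impl ResultMetadata {
    /// Serializes the metadata by hand, leaving out `is_doc` and
    /// `is_example` when they are false and `owner_symbol` when it is None.
    /// Assumption: values are plain identifiers, so no escaping is done.
    fn to_json(&self) -> String {
        let mut fields = vec![format!("\"scope\":\"{}\"", self.scope)];
        if self.is_doc {
            fields.push("\"is_doc\":true".to_string());
        }
        if self.is_example {
            fields.push("\"is_example\":true".to_string());
        }
        if let Some(owner) = &self.owner_symbol {
            fields.push(format!("\"owner_symbol\":\"{}\"", owner));
        }
        format!("{{{}}}", fields.join(","))
    }
}
```

With this shape, a consumer has to treat a missing `is_doc` or `is_example` key as false rather than as an error.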

2. Bug Fix: Fenced Code No Longer Flagged as Test

Before: A fenced code block in markdown containing func TestFoo() was incorrectly flagged is_test: true

After: Documentation files are excluded from test detection logic:

let doc = is_doc_file(file_path);
let fenced_example = doc && is_fenced_example(&r.node_type);
let file_is_test = !doc && is_test_file(file_path);
let code_is_test = !doc && !file_is_test && is_test_code_block(&r.code, &r.node_type);
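
The precedence in the four lines above can be exercised in isolation. The following sketch replaces the real helper calls with plain boolean inputs. `Flags` and `compute_flags_sketch` are hypothetical names, and combining the two test signals with `||` into a single `is_test` value is an assumption of this example, not something the PR overview states.

```rust
/// Hypothetical container for the three boolean output fields.
#[derive(Debug, PartialEq)]
struct Flags {
    is_doc: bool,
    is_example: bool,
    is_test: bool,
}

/// Mirrors the precedence shown above, with the helper results passed in:
/// - `doc`: the path looks like documentation
/// - `fenced`: the node type is a fenced code block
/// - `test_path`: the path looks like a test file
/// - `test_code`: the code itself looks like a test
fn compute_flags_sketch(doc: bool, fenced: bool, test_path: bool, test_code: bool) -> Flags {
    let fenced_example = doc && fenced;
    // Documentation files never count as tests, whatever they contain.
    let file_is_test = !doc && test_path;
    let code_is_test = !doc && !file_is_test && test_code;
    Flags {
        is_doc: doc,
        is_example: fenced_example,
        // Assumption: either signal is enough to mark a result as a test.
        is_test: file_is_test || code_is_test,
    }
}
```

A fenced Go test inside a markdown file corresponds to `compute_flags_sketch(true, true, false, true)`, which yields `is_test: false` because the `!doc` guard short-circuits both test checks.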

3. Documentation File Detection

The is_doc_file() function identifies documentation by:

  • Extensions: .md, .mdx, .rst, .adoc, .txt
  • Directory patterns: /docs/, /doc/, /help/, /specs/, /.proof/, /examples/*.md
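
As a rough illustration of the two rules above, the sketch below checks the file extension first and then looks for a documentation directory anywhere in the path. `is_doc_file_sketch` is a hypothetical name and this is not the code from the PR. Case-insensitive extension matching and the normalization of backslashes are assumptions made here. The `/examples/*.md` pattern needs no separate branch in this sketch because the `.md` extension check already covers it.

```rust
use std::path::Path;

/// Extensions listed in the overview, without the leading dot.
const DOC_EXTENSIONS: [&str; 5] = ["md", "mdx", "rst", "adoc", "txt"];
/// Directory patterns listed in the overview.
const DOC_DIRS: [&str; 5] = ["/docs/", "/doc/", "/help/", "/specs/", "/.proof/"];

/// Returns true when the path looks like documentation, either by
/// extension or because it sits under a documentation directory.
fn is_doc_file_sketch(path: &str) -> bool {
    let ext_match = Path::new(path)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| DOC_EXTENSIONS.iter().any(|d| d.eq_ignore_ascii_case(ext)))
        .unwrap_or(false);

    // Prefix a slash so that relative paths such as "docs/x.go" still
    // match the "/docs/" pattern; treat backslashes as separators.
    let normalized = format!("/{}", path.replace('\\', "/"));
    let dir_match = DOC_DIRS.iter().any(|dir| normalized.contains(*dir));

    ext_match || dir_match
}
```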

4. Scope Classification Logic

The classify_scope() function categorizes results based on:

  • Test scope: When is_test: true
  • Example scope: When is_example: true (fenced code in docs)
  • Doc scope: When is_doc: true but not example
  • Function scope: Functions, methods, closures, arrow functions
  • Module scope: Modules, packages, namespaces, programs
  • Declaration scope: Classes, structs, interfaces, enums, consts, types
  • Comment handling: Comments that contain a function signature are classified as "function"
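
The ordering of the rules above can be sketched as a chain of early returns. `classify_scope_sketch` and `has_any` are hypothetical names, the substring matching on `node_type` is a simplification, and the keyword lists are assumptions based on the categories listed above. The overview does not say what an unmatched node type maps to, so this sketch returns `None` in that case.

```rust
/// True when `node_type` contains any of the given substrings.
fn has_any(node_type: &str, needles: &[&str]) -> bool {
    needles.iter().any(|n| node_type.contains(*n))
}

/// Flags take precedence over node types: test, then example, then doc.
fn classify_scope_sketch(
    node_type: &str,
    code: &str,
    is_test: bool,
    is_doc: bool,
    is_example: bool,
) -> Option<&'static str> {
    if is_test {
        return Some("test");
    }
    if is_example {
        return Some("example");
    }
    if is_doc {
        return Some("doc");
    }

    let nt = node_type.to_ascii_lowercase();

    // A comment that carries a function signature counts as "function".
    if nt.contains("comment") {
        let signature_keywords = ["fn ", "func ", "def ", "function "];
        if signature_keywords.iter().any(|k| code.contains(*k)) {
            return Some("function");
        }
    }
    if has_any(&nt, &["function", "method", "closure"]) {
        return Some("function");
    }
    if has_any(&nt, &["module", "package", "namespace", "program"]) {
        return Some("module");
    }
    if has_any(&nt, &["class", "struct", "interface", "enum", "const", "type"]) {
        return Some("declaration");
    }
    // Assumption: the fallback is unspecified in the overview.
    None
}
```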

5. Owner Symbol Extraction

The extract_owner_symbol() function parses code to extract symbol names:

  • Go: func Name(...) or func (recv) Name(...)
  • Rust: fn name(...) or pub fn name(...)
  • Python: def name(...) or class Name:
  • JS/TS: function name(...), async function, export function
  • Java/C#: public/protected/private + return type + name
  • Skips doc-only nodes (section, document, paragraph, heading)
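
A simplified, line-based version of this kind of extraction might look like the sketch below. It covers the Go, Rust, Python, and JS/TS shapes listed above and leaves out the Java/C# pattern. `extract_owner_symbol_sketch`, `leading_ident`, and the constant lists are hypothetical and are not taken from the PR. Returning the first match found in the block is also an assumption.

```rust
const DOC_ONLY_NODES: [&str; 4] = ["section", "document", "paragraph", "heading"];
const MODIFIERS: [&str; 3] = ["pub ", "export ", "async "];
const KEYWORDS: [&str; 4] = ["fn ", "def ", "class ", "function "];

/// Returns the identifier at the start of `s`, if there is one.
fn leading_ident(s: &str) -> Option<&str> {
    let end = s
        .find(|c: char| !(c.is_alphanumeric() || c == '_'))
        .unwrap_or(s.len());
    if end == 0 { None } else { Some(&s[..end]) }
}

/// Scans the block line by line and returns the first declared name.
fn extract_owner_symbol_sketch(code: &str, node_type: &str) -> Option<String> {
    // Prose-only nodes have no owning symbol.
    if DOC_ONLY_NODES.iter().any(|n| *n == node_type) {
        return None;
    }
    for line in code.lines() {
        // Strip leading modifiers such as `pub`, `export`, `async`.
        let mut rest = line.trim_start();
        loop {
            let before = rest;
            for m in MODIFIERS.iter() {
                if let Some(r) = rest.strip_prefix(*m) {
                    rest = r.trim_start();
                }
            }
            if rest == before {
                break;
            }
        }
        // Go: `func Name(...)` or `func (recv) Name(...)`.
        if let Some(after) = rest.strip_prefix("func ") {
            let after = after.trim_start();
            let after = match after.strip_prefix('(') {
                // Skip the receiver up to its closing parenthesis.
                Some(recv) => match recv.find(')') {
                    Some(i) => recv[i + 1..].trim_start(),
                    None => continue,
                },
                None => after,
            };
            if let Some(name) = leading_ident(after) {
                return Some(name.to_string());
            }
            continue;
        }
        // Rust `fn`, Python `def`/`class`, JS/TS `function`.
        for kw in KEYWORDS.iter() {
            if let Some(after) = rest.strip_prefix(*kw) {
                if let Some(name) = leading_ident(after.trim_start()) {
                    return Some(name.to_string());
                }
            }
        }
    }
    None
}
```

Because the scan is per line, a fenced block such as the `func TestShared` example in this PR still yields a name even though its first line is the opening fence.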

Architecture & Impact

Component Relationships

graph TD
    A[Search Results] --> B[format_and_print_json_results]
    B --> C[is_doc_file]
    B --> D[is_fenced_example]
    B --> E[is_test_file / is_test_code_block]
    B --> F[classify_scope]
    B --> G[extract_owner_symbol]
    C --> H[JsonResult with metadata]
    D --> H
    E --> H
    F --> H
    G --> H
    H --> I[JSON Output]
    
    J[Downstream Tools] --> I
    K[ReqProof] --> I
    L[Traceability Tools] --> I
    M[Test-Mapping Tools] --> I

Data Flow

sequenceDiagram
    participant CLI as probe search
    participant Output as search_output.rs
    participant JSON as JsonResult
    participant Consumer as Downstream Tool
    
    CLI->>Output: SearchResults with file, node_type, code
    Output->>Output: is_doc_file(path)
    Output->>Output: is_fenced_example(node_type)
    Output->>Output: is_test_file(path) & is_test_code_block()
    Output->>Output: classify_scope(node_type, code, ...)
    Output->>Output: extract_owner_symbol(code, node_type)
    Output->>JSON: Populate all fields
    Output->>Consumer: JSON with scope, is_doc, is_example, owner_symbol
    Consumer->>Consumer: Filter by scope === "test"

Scope Discovery & Context

Affected System Components

  1. Search Output Module (src/search/search_output.rs)

    • Core change: JsonResult struct and formatting logic
    • New helper functions for classification
    • Comprehensive test coverage
  2. Language Detection (src/language/test_detection.rs)

    • Existing is_test_file() function reused
    • Supports 15+ languages (Rust, Go, Python, JS/TS, Java, etc.)
  3. Tree-sitter Node Types (src/language/*.rs)

    • Scope classification relies on node_type strings
    • Covers: function_declaration, class_declaration, struct_item, module, etc.
  4. Downstream Consumers

    • ReqProof/Traceability tools: Can now filter scope === "test" for evidence
    • Test-mapping tools: Use owner_symbol to map tests to requirements
    • Documentation tools: Filter out is_doc: true or scope === "example"

Integration Points

  • CLI: probe search --format json now includes new fields
  • MCP Tools: JSON output available to AI agents
  • Node.js SDK: Programmatic access to enriched metadata
  • LSP Integration: Symbol signatures complement owner_symbol

Test Coverage

24 New Unit Tests:

  • test_is_doc_file_markdown() - Markdown file detection
  • test_is_doc_file_directories() - Directory pattern detection
  • test_is_doc_file_source_not_doc() - Source files excluded
  • test_is_fenced_example() - Fenced code block detection
  • test_classify_scope_*() - 8 tests for scope classification
  • test_extract_owner_*() - 10 tests for symbol extraction (Go, Rust, Python, JS)

3 Integration Tests:

  • test_fenced_code_in_docs_not_test() - Verifies bug fix
  • test_real_test_file_gets_test_scope() - Real tests get correct scope
  • test_implementation_file_gets_declaration_scope() - Implementation files classified correctly

Existing Tests: All 361 unit tests, 11 integration tests, 21 CLI tests pass

Example Usage

Before (Incorrect)

{
  "file": "docs/help/checks/code_mcdc_coverage.md",
  "node_type": "fenced_code_block",
  "code": "```go\nfunc TestShared(t *testing.T) {}\n```",
  "is_test": true
}

After (Correct)

{
  "file": "docs/help/checks/code_mcdc_coverage.md",
  "node_type": "fenced_code_block",
  "code": "```go\nfunc TestShared(t *testing.T) {}\n```",
  "scope": "example",
  "is_doc": true,
  "is_example": true,
  "owner_symbol": "TestShared"
}

Filtering for Evidence

# Get only test evidence (exclude docs and implementation)
probe search --allow-tests --format json "SYS-REQ-985" | \
  jq '.results[] | select(.scope == "test")'

# Get owner symbols for test mapping
probe search --format json "requirement" | \
  jq '.results[] | {file: .file, owner: .owner_symbol, scope: .scope}'
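
The same filter can be applied in a Rust consumer once the results have been deserialized. In the sketch below, `SearchHit` and `evidence_only` are hypothetical names. The struct carries only the fields used here, and how the JSON gets parsed into it is left out.

```rust
/// Hypothetical, already-deserialized subset of one search result.
#[derive(Debug, Clone, PartialEq)]
struct SearchHit {
    file: String,
    scope: String,
    owner_symbol: Option<String>,
}

/// Keeps only results whose structural scope is "test", which drops
/// docs, fenced examples, and implementation declarations.
fn evidence_only(hits: &[SearchHit]) -> Vec<&SearchHit> {
    hits.iter().filter(|hit| hit.scope == "test").collect()
}
```

Filtering on `scope` alone is enough here, since a fenced example in a doc file is reported with scope "example" rather than "test".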

References

Modified Files:

  • src/search/search_output.rs:563-700 - JsonResult struct and format_and_print_json_results()
  • src/search/search_output.rs:726-1030 - New helper functions (is_doc_file, is_fenced_example, classify_scope, extract_owner_symbol)
  • src/search/search_output.rs:3179-3750 - 24 new unit tests and 3 integration tests

Related Files (Context):

  • src/language/test_detection.rs - is_test_file() function (reused)
  • src/models.rs - SearchResult struct definition
  • src/language/language_trait.rs - LanguageImpl trait for node type handling
  • src/language/parser.rs - Important block types priority list
  • tests/json_format_tests.rs - JSON output integration tests
  • tests/json_schema_validation_tests.rs - Schema validation tests
Metadata
  • Review Effort: 3 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-04-06T09:46:29.180Z | Triggered by: pr_opened | Commit: 2a3eeba

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs bot commented Apr 6, 2026

Architecture Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | contract:0 | Output schema validation failed: must have required property 'issues' |

Performance Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | contract:0 | Output schema validation failed: must have required property 'issues' |

Quality Issues (1)

| Severity | Location | Issue |
| --- | --- | --- |
| 🟠 Error | contract:0 | Output schema validation failed: must have required property 'issues' |

Last updated: 2026-04-06T09:42:59.363Z | Triggered by: pr_opened | Commit: 2a3eeba

Development

Successfully merging this pull request may close these issues.

Requirement annotation search needs evidence-only mode (exclude docs, fenced examples, top-level declarations)
