Spike: evaluate BDD frameworks for domain behavior validation (behave, pytest-bdd, etc.) #136

## Context

As the DDD restructuring of `app/matching/` progresses (#23), we need automated validation that domain behaviors match business specifications. BDD (Behavior-Driven Development) frameworks allow expressing domain rules as executable Gherkin-style scenarios that serve as both documentation and tests.

Related issues:
- **#23** — DDD restructuring of `app/matching/` (primary consumer of BDD scenarios)
- **#134** — Architecture enforcement spike (complementary — enforces structure, BDD validates behavior)
- **#103** — Architecture hygiene audit (parent)

## Goal

Evaluate BDD frameworks for Python that integrate with our existing pytest-based test infrastructure. Determine which best fits the project for encoding domain rules as executable specifications.

## Candidates

### Primary

| Framework | Repo | Stars | Approach |
|-----------|------|-------|----------|
| **behave** | https://github.com/behave/behave | 3K+ | Standalone Gherkin runner, step definitions in Python |
| **pytest-bdd** | https://github.com/pytest-dev/pytest-bdd | 1.3K+ | Native pytest integration, `@given`/`@when`/`@then` decorators |

### Also consider

| Framework | Repo | Notes |
|-----------|------|-------|
| **radish-bdd** | https://github.com/radish-bdd/radish | Scenario loops, preconditions, pytest integration |
| **mamba** | https://github.com/nestorsalceda/mamba | RSpec-style BDD, not Gherkin |
| **pytest + approvaltests** | (already in use) | Current approach — approval tests for contract validation |

## Evaluation Criteria

### 1. Feature Fit
- [ ] Gherkin `.feature` file support (Given/When/Then scenarios)
- [ ] Step parameterization and data tables
- [ ] Scenario outlines (parameterized scenarios)
- [ ] Background steps (shared setup)
- [ ] Tags for filtering (e.g., `@slow`, `@integration`)
- [ ] Hooks (before/after scenario, step)
- [ ] Multi-scenario composition

### 2. pytest Integration
- [ ] Works as pytest plugin (not separate runner)
- [ ] Compatible with existing `conftest.py` fixtures
- [ ] Works with `pytest-asyncio` (project is async)
- [ ] Supports `pytest.mark` markers
- [ ] IDE support (PyCharm, VS Code test discovery)

### 3. DDD Alignment
- [ ] Can express domain events and aggregates
- [ ] Supports shared contexts across scenarios
- [ ] Step definitions map cleanly to domain service methods
- [ ] Readable by non-technical stakeholders

### 4. Security & Supply Chain
- [ ] Dependency footprint (transitive dependency count)
- [ ] Actively maintained (commit cadence, release frequency)
- [ ] No known CVEs in recent versions
- [ ] Can be pinned and locked with `uv`
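
The dependency footprint criterion can be approximated from an environment where a candidate is already installed. The following is a minimal sketch, assuming only the standard library's `importlib.metadata`; the helper names (`normalize`, `direct_requirements`, `transitive_requirements`) are hypothetical, and the walk ignores environment markers such as `python_version`, so it may overcount.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name (PEP 503 style) so lookups are consistent."""
    return re.sub(r"[-_.]+", "-", name).lower()


def direct_requirements(dist_name: str) -> set[str]:
    """Return the names of the direct, non-extra requirements of an installed distribution."""
    try:
        requires = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # Not installed in this environment: nothing to report.
        return set()
    names = set()
    for req in requires:
        if re.search(r"extra\s*==", req):
            continue  # skip optional extras such as 'foo; extra == "docs"'
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
        if match:
            names.add(normalize(match.group(0)))
    return names


def transitive_requirements(dist_name: str, lookup=direct_requirements) -> set[str]:
    """Walk the requirement graph depth-first and collect every reachable dependency."""
    root = normalize(dist_name)
    seen: set[str] = set()
    stack = [root]
    while stack:
        for dep in lookup(stack.pop()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(root)  # a dependency cycle must not count the root itself
    return seen
```

Calling `len(transitive_requirements("pytest-bdd"))` in a virtual environment that has the candidate installed gives a rough count for the comparison matrix; the resolved `uv` lockfile remains the more reliable source.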

### 5. Reputation & Ecosystem
- [ ] GitHub stars, contributors, response time on issues
- [ ] Documentation quality
- [ ] Community adoption (StackOverflow, blog posts)
- [ ] Compatibility with Python 3.12–3.14 (project's range)

## Deliverables

1. **Comparison matrix** — scoring each candidate across criteria above
2. **PoC spike** — implement 2–3 matching domain scenarios in the top candidate against `app/matching/` domain services
3. **Recommendation** — which framework to adopt, with migration path from current pytest-only approach
4. **Security audit** — dependency review of recommended framework
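
One way the comparison matrix could be turned into a ranking is a weighted sum over the five criteria groups. This is only a sketch: the weights below are hypothetical placeholders to be agreed during the spike, and the 0 to 5 scoring scale is an assumption.

```python
# Hypothetical weights per criteria group (assumed, not decided); they sum to 1.0.
WEIGHTS = {
    "feature_fit": 0.25,
    "pytest_integration": 0.30,
    "ddd_alignment": 0.20,
    "security_supply_chain": 0.15,
    "reputation_ecosystem": 0.10,
}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-group scores (assumed 0 to 5) into a single weighted score."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank(matrix: dict[str, dict[str, float]]) -> list[str]:
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(matrix, key=lambda name: weighted_score(matrix[name]), reverse=True)
```

Raising on a missing group keeps a half-filled matrix from silently producing a ranking.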

## Non-Goals

- Replacing existing pytest tests — BDD layer supplements, not replaces
- Frontend BDD (separate spike if needed)
- Full migration plan (follow-up issue if recommended)

## Example Domain Scenarios to Spike

From the #23 domain model, these behaviors would benefit from BDD validation:

```gherkin
Feature: Voter matching
  As a campaign organizer
  I want OCR-extracted names matched against voter records
  So that I can verify petition signatures accurately

  Scenario: Exact match with high confidence
    Given a voter record for "John Smith" at "Ward 3 Precinct 1"
    And an OCR candidate "John Smith" with confidence 0.95
    When the matching service processes the candidate
    Then the match prediction should be "accepted"
    And the confidence level should be "high"

  Scenario Outline: Fuzzy matching threshold
    Given a voter record for "<voter_name>"
    And an OCR candidate "<ocr_text>" with confidence <ocr_confidence>
    When the matching service processes the candidate
    Then the match status should be <expected_status>

    Examples:
      | voter_name    | ocr_text      | ocr_confidence | expected_status  |
      | John Smith    | Jon Smith     | 0.90           | accepted         |
      | John Smith    | J. Smith      | 0.85           | needs_review     |
      | John Smith    | Jane Doe      | 0.70           | rejected         |
```
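
To make concrete what a framework does with the `Scenario Outline` above, here is a minimal standard-library sketch of the expansion step: each `Examples` row is substituted into the step templates to produce one concrete scenario. It illustrates the mechanism only and is not how behave or pytest-bdd implement it; `parse_examples` and `expand_outline` are hypothetical names.

```python
import re

# Step templates and Examples table copied from the Scenario Outline above.
STEPS = [
    'Given a voter record for "<voter_name>"',
    'And an OCR candidate "<ocr_text>" with confidence <ocr_confidence>',
    "When the matching service processes the candidate",
    "Then the match status should be <expected_status>",
]

EXAMPLES = """
| voter_name    | ocr_text      | ocr_confidence | expected_status  |
| John Smith    | Jon Smith     | 0.90           | accepted         |
| John Smith    | J. Smith      | 0.85           | needs_review     |
| John Smith    | Jane Doe      | 0.70           | rejected         |
"""


def parse_examples(table: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited Examples table into one dict per data row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.strip().splitlines()
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def expand_outline(steps: list[str], examples: list[dict[str, str]]) -> list[list[str]]:
    """Substitute each row's values for the <placeholders> in every step."""
    def fill(step: str, row: dict[str, str]) -> str:
        return re.sub(r"<(\w+)>", lambda m: row[m.group(1)], step)

    return [[fill(step, row) for step in steps] for row in examples]


scenarios = expand_outline(STEPS, parse_examples(EXAMPLES))
```

Each entry in `scenarios` is the list of concrete steps for one row, which is the unit a framework would report as a separate test case. How each candidate surfaces these per-row cases in pytest output is worth checking under the scenario outline criterion.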
