Spike: evaluate BDD frameworks for domain behavior validation (behave, pytest-bdd, etc.) #136

Description

@kvithayathil

Context

As the DDD restructuring of app/matching/ progresses (#23), we need automated validation that domain behaviors match business specifications. BDD (Behavior-Driven Development) frameworks allow expressing domain rules as executable Gherkin-style scenarios that serve as both documentation and tests.

Related issues:

Goal

Evaluate BDD frameworks for Python that integrate with our existing pytest-based test infrastructure. Determine which best fits the project for encoding domain rules as executable specifications.

Candidates

Primary

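| Framework  | Repo                                   | Stars | Approach                                                  |
|------------|----------------------------------------|-------|-----------------------------------------------------------|
| behave     | https://github.com/behave/behave       | 3K+   | Standalone Gherkin runner, step definitions in Python     |
| pytest-bdd | https://github.com/pytest-dev/pytest-bdd | 1.3K+ | Native pytest integration, @given/@when/@then decorators |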

Also consider

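| Framework              | Repo                                  | Notes                                                       |
|------------------------|---------------------------------------|-------------------------------------------------------------|
| radish-bdd             | https://github.com/radish-bdd/radish  | Scenario loops, preconditions, pytest integration           |
| mamba                  | https://github.com/nestorsalceda/mamba | RSpec-style BDD, not Gherkin                               |
| pytest + approvaltests | (already in use)                      | Current approach: approval tests for contract validation    |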

Evaluation Criteria

1. Feature Fit

  • Gherkin .feature file support (Given/When/Then scenarios)
  • Step parameterization and data tables
  • Scenario outlines (parameterized scenarios)
  • Background steps (shared setup)
  • Tags for filtering (e.g., @slow, @integration)
  • Hooks (before/after scenario, step)
  • Multi-scenario composition

2. pytest Integration

  • Works as pytest plugin (not separate runner)
  • Compatible with existing conftest.py fixtures
  • Works with pytest-asyncio (project is async)
  • Supports pytest.mark markers
  • IDE support (PyCharm, VS Code test discovery)
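Because the project is async, one thing worth checking early is how step functions reach async domain services. The sketch below assumes the worst case, where the framework only calls plain synchronous step functions (something the spike should verify per candidate), and bridges to a coroutine with `asyncio.run`. `FakeMatchingService`, `MatchResult`, and `when_matching_service_processes` are hypothetical names used for illustration only, not the real `app/matching/` API.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class MatchResult:
    # Hypothetical result type; the real domain model may look different.
    status: str


class FakeMatchingService:
    """Stand-in for an async domain service (hypothetical name and API)."""

    async def process(self, ocr_text: str, voter_name: str) -> MatchResult:
        await asyncio.sleep(0)  # simulate awaiting I/O
        status = "accepted" if ocr_text == voter_name else "needs_review"
        return MatchResult(status=status)


def when_matching_service_processes(service, ocr_text: str, voter_name: str) -> MatchResult:
    """Synchronous step body that drives the coroutine to completion."""
    # asyncio.run creates a fresh event loop, so it must not be called
    # from code that is already running inside an event loop.
    return asyncio.run(service.process(ocr_text, voter_name))


if __name__ == "__main__":
    result = when_matching_service_processes(
        FakeMatchingService(), "John Smith", "John Smith"
    )
    print(result.status)
```

If a candidate turns out to support `async def` steps directly, or cooperates with pytest-asyncio fixtures, this bridge is unnecessary; noting which case applies would be a useful row in the comparison matrix.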

3. DDD Alignment

  • Can express domain events and aggregates
  • Supports shared contexts across scenarios
  • Step definitions map cleanly to domain service methods
  • Readable by non-technical stakeholders

4. Security & Supply Chain

  • Dependency footprint (number of transitive dependencies)
  • Actively maintained (commit cadence, release frequency)
  • Known CVEs in recent versions
  • Can be pinned and locked with uv
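For the dependency-footprint criterion, a rough count can be taken from the metadata of packages installed in a throwaway environment. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper names are made up for this example, environment markers such as `python_version` are not evaluated, and optional extras are skipped, so treat the number as an approximation rather than an audit.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the normalized distribution name from a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # PEP 503 style normalization: lowercase, runs of -, _ and . become one dash.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def is_optional_extra(requirement: str) -> bool:
    """True if the requirement only applies when an extra is requested."""
    _, _, marker = requirement.partition(";")
    return "extra" in marker


def installed_requires(name: str) -> list[str]:
    """Requirement strings declared by an installed distribution."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_deps(root: str, requires=installed_requires) -> set[str]:
    """Collect names of all non-extra dependencies reachable from root."""
    root_name = requirement_name(root)
    seen: set[str] = set()
    stack = [root_name]
    while stack:
        current = stack.pop()
        for req in requires(current):
            if is_optional_extra(req):
                continue
            dep = requirement_name(req)
            if dep != root_name and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


if __name__ == "__main__":
    # Result depends on what is installed; an empty set means "not installed".
    for candidate in ("behave", "pytest-bdd"):
        print(candidate, len(transitive_deps(candidate)))
```

The `requires` parameter is injectable so the traversal can be exercised against a hand-written graph without installing anything.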

5. Reputation & Ecosystem

  • GitHub stars, contributors, response time on issues
  • Documentation quality
  • Community adoption (StackOverflow, blog posts)
  • Compatibility with Python 3.12–3.14 (project's range)

Deliverables

  1. Comparison matrix — scoring each candidate across criteria above
  2. PoC spike — implement 2–3 matching domain scenarios in the top candidate against app/matching/ domain services
  3. Recommendation — which framework to adopt, with migration path from current pytest-only approach
  4. Security audit — dependency review of recommended framework
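One possible way to turn the comparison matrix into a single number per candidate is a weighted average over the five criteria. This is only a sketch: the criterion keys, the 1 to 5 scale, and the weights below are placeholders chosen for illustration, and the scores are made up rather than evaluation results.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores; every weighted criterion must be scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


# Placeholder weights (assumption): to be agreed on before scoring.
WEIGHTS = {
    "feature_fit": 2,
    "pytest_integration": 3,
    "ddd_alignment": 2,
    "supply_chain": 2,
    "ecosystem": 1,
}

if __name__ == "__main__":
    # Made-up scores for a hypothetical candidate, not real findings.
    example = {
        "feature_fit": 4,
        "pytest_integration": 5,
        "ddd_alignment": 3,
        "supply_chain": 4,
        "ecosystem": 2,
    }
    print(round(weighted_score(example, WEIGHTS), 2))
```

Fixing the weights before scoring the candidates keeps the recommendation from being tuned after the fact.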

Non-Goals

  • Replacing existing pytest tests: the BDD layer supplements them, it does not replace them
  • Frontend BDD (separate spike if needed)
  • Full migration plan (follow-up issue if recommended)

Example Domain Scenarios to Spike

From the #23 domain model, these behaviors would benefit from BDD validation:

Feature: Voter matching
  As a campaign organizer
  I want OCR-extracted names matched against voter records
  So that I can verify petition signatures accurately

  Scenario: Exact match with high confidence
    Given a voter record for "John Smith" at "Ward 3 Precinct 1"
    And an OCR candidate "John Smith" with confidence 0.95
    When the matching service processes the candidate
    Then the match prediction should be "accepted"
    And the confidence level should be "high"

  Scenario Outline: Fuzzy matching threshold
    Given a voter record for "<voter_name>"
    And an OCR candidate "<ocr_text>" with confidence <ocr_confidence>
    When the matching service processes the candidate
    Then the match status should be <expected_status>

    Examples:
      | voter_name    | ocr_text      | ocr_confidence | expected_status  |
      | John Smith    | Jon Smith     | 0.90           | accepted         |
      | John Smith    | J. Smith      | 0.85           | needs_review     |
      | John Smith    | Jane Doe      | 0.70           | rejected         |
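To make the Scenario Outline concrete: a Gherkin runner instantiates the outline once per Examples row, substituting each `<placeholder>` with the cell from the matching column. The sketch below reproduces that expansion with plain Python so the resulting step text is visible; `parse_examples` and `expand_outline` are illustrative helpers written for this issue, not part of behave or pytest-bdd.

```python
OUTLINE_STEPS = [
    'Given a voter record for "<voter_name>"',
    'And an OCR candidate "<ocr_text>" with confidence <ocr_confidence>',
    "When the matching service processes the candidate",
    "Then the match status should be <expected_status>",
]

EXAMPLES = """
| voter_name    | ocr_text      | ocr_confidence | expected_status  |
| John Smith    | Jon Smith     | 0.90           | accepted         |
| John Smith    | J. Smith      | 0.85           | needs_review     |
| John Smith    | Jane Doe      | 0.70           | rejected         |
"""


def parse_examples(table: str) -> list[dict[str, str]]:
    """Turn a pipe-delimited Examples table into one dict per data row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.strip().splitlines()
        if line.strip()
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def expand_outline(steps: list[str], examples: list[dict[str, str]]) -> list[list[str]]:
    """Produce one concrete list of steps per Examples row."""
    expanded = []
    for row in examples:
        scenario = []
        for step in steps:
            for key, value in row.items():
                step = step.replace(f"<{key}>", value)
            scenario.append(step)
        expanded.append(scenario)
    return expanded


if __name__ == "__main__":
    for scenario in expand_outline(OUTLINE_STEPS, parse_examples(EXAMPLES)):
        print("\n".join(scenario), end="\n\n")
```

Note that all substituted values arrive as strings, so step definitions in whichever framework is chosen would need to convert `ocr_confidence` to a float themselves or through the framework's step-argument parsing.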

Labels: area:backend (Backend (Python/FastAPI) changes), devex (Developer and contributor experience), priority:medium (Normal priority), spike (Research and investigation task)