
Refactor EventScraperTest to delegate to datamachine/test-handler #188

@chubes4

Context

During the ability contract alignment work (data-machine#999), we identified that data-machine-events/test-event-scraper (in inc/Abilities/EventScraperTest.php) directly imports and instantiates UniversalWebScraper to call get_fetch_data(). That duplicates what the core datamachine/test-handler ability already does generically.

Current architecture

EventScraperTest.php
  └── new UniversalWebScraper()        ← direct import, tight coupling
      └── handler->get_fetch_data()
      └── domain-specific analysis (coverage, extraction info, venue metadata)

Proposed architecture

EventScraperTest.php
  └── wp_get_ability('datamachine/test-handler')
      └── ->execute( array( 'handler_slug' => 'universal_web_scraper', 'config' => array( 'source_url' => $target_url ) ) )
      └── domain-specific analysis (coverage, extraction info, venue metadata)
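A minimal sketch of the proposed delegation follows. It assumes the WordPress Abilities API is loaded: `wp_get_ability()` returns an ability object or null, and its `execute()` returns either a result or a `WP_Error`. The input keys `handler_slug` and `config.source_url` come from the diagram above. The helper name `dme_run_test_handler` is hypothetical, and the shape of the returned data is not confirmed; whether it contains enough packet data is the open question in step 3 of "What to do".

```php
<?php
/**
 * Hypothetical helper: fetch through the core datamachine/test-handler
 * ability instead of instantiating UniversalWebScraper directly.
 *
 * Assumes the WordPress Abilities API (wp_get_ability(), WP_Error) is loaded.
 *
 * @param string $target_url URL to scrape.
 * @return mixed|WP_Error Ability result, or WP_Error on failure.
 */
function dme_run_test_handler( string $target_url ) {
	$ability = wp_get_ability( 'datamachine/test-handler' );

	// The core ability may be missing (e.g. Data Machine not active).
	if ( null === $ability ) {
		return new WP_Error(
			'ability_not_found',
			'The datamachine/test-handler ability is not registered.'
		);
	}

	// Same input as the proposed architecture; keys taken from the issue.
	$result = $ability->execute(
		array(
			'handler_slug' => 'universal_web_scraper',
			'config'       => array( 'source_url' => $target_url ),
		)
	);

	// Pass errors through unchanged so callers keep their error handling.
	if ( is_wp_error( $result ) ) {
		return $result;
	}

	// The existing domain analysis (coverage, extraction info, venue
	// metadata) would consume $result here, if it exposes raw packet data.
	return $result;
}
```

Keeping the null and `WP_Error` checks inside the helper means the existing analysis code only has to deal with a successful result. If the core ability turns out to return only packet summaries, a raw_packets-style option would be needed before that analysis can run unchanged.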

Why the events ability should stay

The events version is NOT just a wrapper. After fetching, it does substantial domain analysis:

  1. Log capture — intercepts datamachine_log warnings from the scrape
  2. Packet interpretation — parses JSON body, extracts event object
  3. 3 extraction pathways — raw HTML fallback, vision flyer, structured event
  4. Coverage analysis — missing time, missing venue, incomplete address, time data warnings
  5. Venue metadata merging — assembles structured event_data from packet + venue_metadata
  6. Domain-specific extraction_info — source_type, extraction_method, payload_type

This is events-domain knowledge that belongs in data-machine-events, not in core.

What to do

  1. Replace the direct new UniversalWebScraper() + get_fetch_data() call with a call to the datamachine/test-handler ability
  2. Keep all the domain-specific analysis code (coverage, extraction, venue merging)
  3. The core ability returns packet summaries, but EventScraperTest needs the raw packet data (it parses the JSON body). Check whether datamachine/test-handler returns enough, or whether it needs a raw_packets option

Slug reference cleanup

The consumer in extrachill-events (VenueQualificationAbilities.php line 356) still references the old slug datamachine/test-event-scraper. The ability was renamed to data-machine-events/test-event-scraper in PR #187, but this cross-plugin consumer wasn't updated. Fix it as part of this issue or separately.

Refs data-machine#999

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests