Expanded browser scraping #521

Merged
LiamMcFall merged 3 commits into main from expanded_browser_Scraping
May 13, 2026

LiamMcFall (Contributor) commented May 13, 2026

Adds two new data sources to the existing release_scraping job:

  • User-facing content: blog post feeds (Chrome, Edge, Brave, Opera, Vivaldi) and Firefox user-facing release notes (from the Mozilla product-details API). Stored at MARKET_RESEARCH/BLOGS/ and MARKET_RESEARCH/STRUCTURED/Firefox/user_release_*.

  • Job postings: Mozilla and Brave via the Greenhouse API, Opera via Teamtailor HTML scraping. Captures full descriptions for hiring-signal analysis. Stored as date-partitioned snapshots at MARKET_RESEARCH/JOBS/{Company}/{YYYYMMDD}/.
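
For reviewers who want a picture of the job-postings path layout, here is a minimal sketch in Python. It is not the code in this PR: the function names, the shape of the normalized record, and the exact Greenhouse response fields used (`jobs`, `id`, `title`, `location.name`, `content`) are assumptions for illustration, and the board tokens for Mozilla and Brave are left as parameters rather than guessed.

```python
from datetime import date

# Assumed endpoint shape for the public Greenhouse Job Board API;
# content=true is assumed to request full job descriptions.
GREENHOUSE_JOBS_URL = (
    "https://boards-api.greenhouse.io/v1/boards/{board_token}/jobs?content=true"
)


def greenhouse_url(board_token: str) -> str:
    """Build the listing URL for one company's board (token is hypothetical)."""
    return GREENHOUSE_JOBS_URL.format(board_token=board_token)


def snapshot_prefix(company: str, snapshot_date: date) -> str:
    """Return the date-partitioned prefix MARKET_RESEARCH/JOBS/{Company}/{YYYYMMDD}/."""
    return f"MARKET_RESEARCH/JOBS/{company}/{snapshot_date:%Y%m%d}/"


def extract_postings(payload: dict) -> list[dict]:
    """Flatten an (assumed) Greenhouse response into simple records."""
    postings = []
    for job in payload.get("jobs", []):
        postings.append(
            {
                "id": job.get("id"),
                "title": job.get("title"),
                # location may be missing or null, so guard before reading name
                "location": (job.get("location") or {}).get("name"),
                "description_html": job.get("content"),
            }
        )
    return postings
```

Partitioning by snapshot date means each run writes to a fresh prefix, so earlier snapshots stay intact and postings can be compared across days.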

Checklist for reviewer:

  • Commits should reference a bug or GitHub issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title)

  • Scan the PR and verify that no changes (particularly to .circleci/config.yml) will cause environment variables (particularly credentials) to be exposed in test logs

  • Ensure the container image will be using permissions granted to telemetry-airflow responsibly.

LiamMcFall requested a review from a team as a code owner May 13, 2026 17:29

gkatre left a comment

LGTM

LiamMcFall merged commit 2f787af into main May 13, 2026
3 checks passed
LiamMcFall deleted the expanded_browser_Scraping branch May 13, 2026 20:56