perf: improve GitHub commit activity fetching #258

Merged: ReenigneArcher merged 6 commits into master from fix/github-update-3 on May 19, 2026

@ReenigneArcher
Member

Description

Replace the previous ThreadPoolExecutor-based timeout approach with direct REST calls for repo commit activity. Add status constants (ready/pending/failed), helper functions (_commit_activity_url, _write_commit_activity, _fetch_commit_activity) to handle GitHub 202/204/200 responses, and a two-pass _collect_commit_activity to trigger and then collect pending stats. Update update_github to prefetch commit activity for active repos, include proper headers, and iterate only non-archived repos. Update tests to cover the new fetch/collect behavior and remove the previous timeout-based tests.
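The status handling described above could be sketched roughly as follows. This is an illustration only, not the code from src/updater.py: the function name `classify_commit_activity_response` and the constant names are hypothetical, and treating a 204 as "ready with no data" is an assumption about how the PR interprets an empty response.

```python
from typing import Any, Optional, Tuple

# Hypothetical constant names; the PR only states that ready/pending/failed exist.
STATUS_READY = "ready"
STATUS_PENDING = "pending"
STATUS_FAILED = "failed"


def classify_commit_activity_response(
    status_code: int, payload: Optional[Any]
) -> Tuple[str, list]:
    """Map an HTTP status code and decoded JSON body to (status, data)."""
    if status_code == 200 and isinstance(payload, list):
        # Stats are computed and returned as a list of weekly records.
        return STATUS_READY, payload
    if status_code == 204:
        # No content, e.g. nothing to report; assumed here to count as ready.
        return STATUS_READY, []
    if status_code == 202:
        # GitHub accepted the request and is still computing the stats.
        return STATUS_PENDING, []
    # Anything else (errors, unexpected body shape) is treated as a failure.
    return STATUS_FAILED, []
```

A caller would pass in `response.status_code` and the decoded body, then write the data only when the status is ready and queue the repository for a second look when it is pending.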

Screenshot

Issues Fixed or Closed

Roadmap Issues

Type of Change

  • feat: New feature (non-breaking change which adds functionality)
  • fix: Bug fix (non-breaking change which fixes an issue)
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
  • refactor: Code change that neither fixes a bug nor adds a feature
  • perf: Code change that improves performance
  • test: Adding missing tests or correcting existing tests
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Other changes that don't modify src or test files
  • revert: Reverts a previous commit
  • BREAKING CHANGE: Introduces a breaking change (can be combined with any type above)

Checklist

  • Code follows the style guidelines of this project
  • Code has been self-reviewed
  • Code has been commented, particularly in hard-to-understand areas
  • Code docstring/documentation-blocks for new or existing methods/components have been added or updated
  • Unit tests have been added or updated for any new or modified functionality

AI Usage

  • None: No AI tools were used in creating this PR
  • Light: AI provided minor assistance (formatting, simple suggestions)
  • Moderate: AI helped with code generation or debugging specific parts
  • Heavy: AI generated most or all of the code changes

@codecov

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (c3162f2) to head (04e9065).
⚠️ Report is 2 commits behind head on master.
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #258   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          862       955   +93     
=========================================
+ Hits           862       955   +93     
Flag Coverage Δ
Linux 100.00% <100.00%> (ø)

Files with missing lines Coverage Δ
src/updater.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Introduce a per-repository GitHub step runner with a timeout (GITHUB_REPO_STEP_TIMEOUT=90s) to guard against slow or failing API calls. Add _run_github_repo_step, which runs a callable in a daemon thread, returns a default on timeout or error, and logs a warning. Extract the helper functions _collect_open_pulls and _fetch_open_graph_image_url, and integrate the timeout wrapper into _process_github_repo for languages, pulls, code scanning alerts, star history, and OpenGraph image fetch/download (the image download uses a 30s timeout). Update tests: add test_run_github_repo_step_timeout and adjust an existing test to assert warning behavior instead of raising SystemExit.
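A timeout wrapper of the kind this commit describes might look like the sketch below. It is not the project's _run_github_repo_step: the name `run_step_with_timeout` and its signature are hypothetical, and only the 90-second default is taken from the commit message.

```python
import logging
import threading
from typing import Any, Callable

log = logging.getLogger(__name__)


def run_step_with_timeout(
    label: str,
    func: Callable[[], Any],
    default: Any = None,
    timeout: float = 90.0,  # mirrors GITHUB_REPO_STEP_TIMEOUT from the commit message
) -> Any:
    """Run func in a daemon thread; return default on timeout or error."""
    outcome = {}

    def _target() -> None:
        try:
            outcome["value"] = func()
        except Exception as exc:  # report any failure back to the caller
            outcome["error"] = exc

    # A daemon thread does not block interpreter exit if the step hangs.
    thread = threading.Thread(target=_target, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        log.warning("%s timed out after %ss", label, timeout)
        return default
    if "error" in outcome:
        log.warning("%s failed: %s", label, outcome["error"])
        return default
    return outcome["value"]
```

One trade-off of this design is that a timed-out thread is abandoned rather than cancelled, so the underlying call may keep running in the background until the process exits.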
Add retry polling to _collect_commit_activity to handle GitHub's 202/async stats calculation. Introduce COMMIT_ACTIVITY_POLL_ATTEMPTS and COMMIT_ACTIVITY_POLL_INTERVAL (defaults 6 and 15s) and allow callers to override poll_attempts and poll_interval. Between attempts the function logs and writes progress messages, sleeps for the configured interval, and retries only pending repos; when repos remain pending after all attempts it logs a consolidated warning. Update unit tests to pass poll parameters, add a test that verifies repeated polls and sleeps, and ensure existing tests monkeypatch time.sleep where needed.
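The retry polling from this commit can be illustrated with a small loop. The two constants and their defaults come from the commit message; the function `poll_pending`, the `fetch` callable that returns a status string, and the injectable `sleep` parameter are assumptions made so the sketch can be exercised without network access.

```python
import time
from typing import Callable, Iterable, List

COMMIT_ACTIVITY_POLL_ATTEMPTS = 6
COMMIT_ACTIVITY_POLL_INTERVAL = 15  # seconds


def poll_pending(
    repos: Iterable[str],
    fetch: Callable[[str], str],  # hypothetical: returns "ready", "pending" or "failed"
    poll_attempts: int = COMMIT_ACTIVITY_POLL_ATTEMPTS,
    poll_interval: float = COMMIT_ACTIVITY_POLL_INTERVAL,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Re-fetch pending repos up to poll_attempts times; return those still pending."""
    pending = list(repos)
    for attempt in range(1, poll_attempts + 1):
        # Only repos that were pending on the previous attempt are retried.
        pending = [name for name in pending if fetch(name) == "pending"]
        if not pending:
            break
        if attempt < poll_attempts:
            # Give GitHub time to finish computing before the next attempt.
            sleep(poll_interval)
    return pending
```

With the defaults, a repository that never becomes ready costs five sleeps of 15 seconds each, which is the kind of CI delay the later commits set out to remove.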
Switch commit-activity collection to use /stats/participation (weekly totals) instead of /stats/commit_activity to avoid long 202 responses in CI. Add cache helper paths and functions to read/write commitActivity and commitActivityHashes (by default-branch SHA), compute commitActivity-shaped records from participation totals, and only refresh stats when the default-branch SHA changes. Remove polling logic/constants and simplify collection to skip repos with up-to-date cached data. Also add a GH Actions step to restore the generated data cache before collection. Update unit tests to cover the new caching behaviour, participation conversion, and workflow changes.
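As a rough sketch of the caching and conversion idea in this commit: the helper names below are hypothetical, and the record layout (`week`, `total`, `days`) assumes the shape that /stats/commit_activity returns. The participation endpoint only provides weekly totals, ordered oldest to newest, so the per-day counts are filled with zeros here as a placeholder; how the PR actually fills them is not stated.

```python
from typing import Dict, List

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def needs_refresh(cached_hashes: Dict[str, str], repo: str, default_branch_sha: str) -> bool:
    """Refresh only when the cached default-branch SHA differs from the current one."""
    return cached_hashes.get(repo) != default_branch_sha


def participation_to_commit_activity(
    weekly_totals: List[int], latest_week_start: int
) -> List[dict]:
    """Turn weekly totals (oldest first) into commitActivity-shaped records.

    latest_week_start is the Unix timestamp of the most recent week's start;
    earlier weeks are derived by stepping back one week at a time.
    """
    last = len(weekly_totals) - 1
    return [
        {
            "week": latest_week_start - (last - index) * SECONDS_PER_WEEK,
            "total": total,
            "days": [0] * 7,  # placeholder: daily breakdown is not available
        }
        for index, total in enumerate(weekly_totals)
    ]
```

For example, `needs_refresh({"repo": "abc"}, "repo", "abc")` is false, so a repository whose default branch has not moved would be skipped entirely.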
Consolidate logging in _run_github_repo_step to use single log.warning calls (remove tqdm.write duplicates) and keep returning the default on timeout or error. Revamp _collect_commit_activity to use a two-pass approach: a priming pass to trigger GitHub stats calculation and a second pass to re-check only pending repositories, aggregating pending repo names into a single warning. Update unit tests to match the new logging/behavior (add test_run_github_repo_step_error, adjust timeout test expectations, replace the pending test with deterministic status sequences, and add a test ensuring early return when all repos are ready).
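The two-pass approach in this commit might be sketched as below. The function name `collect_two_pass` and the `fetch` callable returning a status string are hypothetical stand-ins for _collect_commit_activity and _fetch_commit_activity, whose real signatures are not shown in this PR conversation.

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger(__name__)


def collect_two_pass(repos: Iterable[str], fetch: Callable[[str], str]) -> List[str]:
    """Prime all repos, then re-check only the pending ones once."""
    # Pass 1: request stats for every repo; this also triggers calculation.
    pending = [name for name in repos if fetch(name) == "pending"]
    if not pending:
        # Early return: everything was ready (or failed) on the first pass.
        return []

    # Pass 2: re-check only the repos that were pending after priming.
    still_pending = [name for name in pending if fetch(name) == "pending"]
    if still_pending:
        # One consolidated warning instead of one per repository.
        log.warning("Commit activity still pending for: %s", ", ".join(still_pending))
    return still_pending
```

Compared with interval polling, this bounds the work at two requests per repository and never sleeps, at the cost of leaving some repositories without fresh stats for that run.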

ReenigneArcher merged commit 42ba71e into master on May 19, 2026 (17 checks passed)
ReenigneArcher deleted the fix/github-update-3 branch on May 19, 2026 21:40