feat: git sparse checkout mode for batch scanning (rebase of #1) #13

Merged
north-echo merged 1 commit into main from seckatie-pr1-rebased on May 28, 2026

@north-echo
Owner

Rebase of @SecKatie's #1 onto current main. The original PR was closed because main had moved substantially (v0.7.0 internal/scanner/ → pkg/scanner/ split, platform additions, migration numbering collision). Authorship on the commit is preserved as Katie Mulliken; credit is hers.

Summary

  • Adds --clone flag to batch and discover commands to scan repos via local git sparse checkout instead of the GitHub API, avoiding rate limits at scale
  • Implements sliding star-count windows in FetchTopRepos to paginate beyond GitHub's 1,000-result search limit
  • Caches repo lists in SQLite so --resume with --top N skips re-fetching
  • Hardens SQLite for concurrent goroutine writes (single-conn serialization + busy_timeout)
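
For readers unfamiliar with the sliding-window idea, here is a minimal sketch of how paging past a capped search can work. It is not the code in FetchTopRepos: `Repo`, `searchFunc`, `fetchTop`, `demoData` and `fakeSearch` are hypothetical names, and the search backend is faked in memory. The idea shown is that each query is bounded above by the lowest star count seen so far, and repos on the window boundary are deduplicated by name.

```go
package main

import (
	"fmt"
	"math"
)

// Repo is a hypothetical, minimal search result.
type Repo struct {
	FullName string
	Stars    int
}

// searchFunc stands in for one capped search query: repos with
// Stars <= maxStars, sorted by stars descending, at most limit results.
type searchFunc func(maxStars, limit int) []Repo

// fetchTop collects up to n repos by sliding the star-count upper bound
// down each time a window fills up to windowCap results.
func fetchTop(n, windowCap int, search searchFunc) []Repo {
	seen := make(map[string]bool)
	out := make([]Repo, 0, n)
	maxStars := math.MaxInt
	for len(out) < n {
		page := search(maxStars, windowCap)
		added, lowest := 0, maxStars
		for _, r := range page {
			lowest = r.Stars // page is sorted descending, so this ends at the minimum
			if seen[r.FullName] {
				continue // boundary repo already collected in the previous window
			}
			seen[r.FullName] = true
			out = append(out, r)
			added++
			if len(out) == n {
				return out
			}
		}
		if len(page) == 0 || len(page) < windowCap {
			break // window was not full, so there is nothing further down
		}
		if added == 0 {
			lowest-- // whole window was duplicates; step below it to keep moving
		}
		maxStars = lowest
	}
	return out
}

// demoData builds n fake repos with distinct, descending star counts.
func demoData(n int) []Repo {
	repos := make([]Repo, n)
	for i := range repos {
		repos[i] = Repo{FullName: fmt.Sprintf("owner/repo-%d", i), Stars: 2*n - i}
	}
	return repos
}

// fakeSearch serves queries from an in-memory, pre-sorted slice.
func fakeSearch(data []Repo) searchFunc {
	return func(maxStars, limit int) []Repo {
		var page []Repo
		for _, r := range data {
			if r.Stars <= maxStars && len(page) < limit {
				page = append(page, r)
			}
		}
		return page
	}
}

func main() {
	top := fetchTop(2500, 1000, fakeSearch(demoData(2500)))
	fmt.Println(len(top)) // prints 2500
}
```

One known limitation of this sketch: if more than `windowCap` repos share a single star count, the ones beyond the cap at that count are skipped when the window steps down.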

Rebase resolution notes

  • cmd/fluxgate/main.go — kept --clone / --concurrency / --keep alongside the newer --tokens PAT-rotation flag; both work together
  • internal/github/batch.go — sliding-window query param + withRetryRotate so search picks up token rotation when present
  • internal/store/migrations.go — original migration002AddRepoLists collided with migration002Disclosures that landed on main; renumbered to migration005AddRepoLists
  • SQLite hardening + RepoList* API came through clean
  • internal/git package included as-is
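
The migration002 collision is the kind of thing a small startup check can catch before two branches land the same number. The following is a hedged sketch only, not code from internal/store: the `Migration` type and `validateMigrations` function are hypothetical, and only the migration names mentioned above are used.

```go
package main

import "fmt"

// Migration is a hypothetical record pairing a version number with a name.
type Migration struct {
	Version int
	Name    string
}

// validateMigrations returns an error if two migrations claim the same version.
func validateMigrations(ms []Migration) error {
	seen := make(map[int]string, len(ms))
	for _, m := range ms {
		if prev, ok := seen[m.Version]; ok {
			return fmt.Errorf("migration version %d used by both %q and %q",
				m.Version, prev, m.Name)
		}
		seen[m.Version] = m.Name
	}
	return nil
}

func main() {
	// The pre-rebase state: two migrations numbered 002.
	collided := []Migration{
		{2, "migration002Disclosures"},
		{2, "migration002AddRepoLists"},
	}
	fmt.Println(validateMigrations(collided))

	// The post-rebase state: the repo-list migration renumbered to 005.
	fixed := []Migration{
		{2, "migration002Disclosures"},
		{5, "migration005AddRepoLists"},
	}
	fmt.Println(validateMigrations(fixed))
}
```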

Test plan

  • CGO_ENABLED=0 go build ./... clean
  • go test ./... passes
  • CI self-scan clean
  • Spot-check batch --top N --clone --resume on the research station before merge

Credit: @SecKatie — thanks for the patience on the rebase, this is a solid contribution. Closes the work originally proposed in #1.

Adds --clone flag to batch and discover commands, scanning repos via
local git sparse checkout instead of the GitHub API. This avoids API
rate limits when scanning large numbers of repos.
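
As a rough illustration of the sparse checkout approach (not the internal/git implementation, which is not shown here), the sketch below builds the git invocations for a shallow, blobless, sparse clone. The function names, the exact flag combination, and the `.github/workflows` path are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// sparseCloneArgs returns the git argument lists for a sparse clone of
// repoURL into dir that only materializes the given paths.
// Hypothetical helper; the flags are standard git options.
func sparseCloneArgs(repoURL, dir string, paths ...string) [][]string {
	// Shallow, blobless clone with sparse-checkout enabled from the start.
	clone := []string{"clone", "--depth=1", "--filter=blob:none", "--sparse", repoURL, dir}
	// Restrict the working tree to the requested paths.
	set := append([]string{"-C", dir, "sparse-checkout", "set"}, paths...)
	return [][]string{clone, set}
}

// runAll executes each git command in order and stops at the first failure.
func runAll(cmds [][]string) error {
	for _, args := range cmds {
		if out, err := exec.Command("git", args...).CombinedOutput(); err != nil {
			return fmt.Errorf("git %s: %w: %s", strings.Join(args, " "), err, out)
		}
	}
	return nil
}

func main() {
	// Only print the commands here; call runAll to actually execute them.
	cmds := sparseCloneArgs("https://github.com/owner/repo.git", "/tmp/repo", ".github/workflows")
	for _, args := range cmds {
		fmt.Println("git " + strings.Join(args, " "))
	}
}
```

Because only the requested paths are fetched and checked out, each clone transfers far less than a full clone, and no GitHub API quota is spent reading file contents.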

Key changes:
- internal/git: sparse clone package with concurrent clone-and-scan
- Sliding star-count windows in FetchTopRepos to paginate beyond
  GitHub's 1,000-result search limit
- Repo list caching in SQLite for --resume with --top N
- SQLite hardening: single-conn serialization + busy_timeout for
  concurrent goroutine writes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@north-echo merged commit 040d4df into main on May 28, 2026
1 check passed
@north-echo deleted the seckatie-pr1-rebased branch on May 28, 2026 15:21