Replace GitHub API sync with git clone, fully git-agnostic #159
Conversation
Replace the sequential N+1 API call pattern (1 list_contents + N get_file_content) with a single Git Trees API call to discover all spec files, followed by parallel blob fetches using std::thread::scope.

Key changes:
- Add `get_tree_recursive()` to fetch the full repo tree in one API call
- Add `get_blob_content()` and `get_blobs_parallel()` for concurrent fetches
- Rewrite `sync_specs_to_cache` to use tree + parallel blobs (10 concurrent)
- Rewrite `detect_specs` to use the tree API instead of sequential scanning

For a project with 50 specs, this reduces API calls from ~51 sequential requests to 1 tree request + 5 parallel batches of 10 blob requests, cutting sync time from ~25s to ~2-3s.

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg
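The request-count arithmetic above can be sketched as a one-liner (a simplified model of the sync described in this PR, not code from it):

```rust
/// Rough round-trip count for the sync: one tree request, then
/// ceil(specs / concurrency) parallel blob batches.
fn request_rounds(spec_count: usize, concurrency: usize) -> usize {
    // max(1) guards against a zero concurrency value
    1 + spec_count.div_ceil(concurrency.max(1))
}
```

With 50 specs at 10 concurrent fetches this gives 1 + 5 = 6 round trips, versus ~51 sequential requests before the change.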
Pull request overview
This PR optimizes GitHub spec discovery and syncing by switching from N+1 Contents API calls to a single recursive Git Trees API call, then fetching README blobs in parallel to reduce sync latency.
Changes:
- Add Git Trees + Git Blobs API response types to support tree/blob fetching.
- Implement recursive tree fetch and parallel blob fetch in GitHubClient, and use them for spec detection.
- Update the HTTP GitHub sync handler to use the same tree + parallel-blob approach for caching specs locally.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| rust/leanspec-http/src/handlers/github.rs | Uses tree discovery + parallel blob fetch to sync spec READMEs into the local cache. |
| rust/leanspec-core/src/github/types.rs | Adds deserialization types for Git Trees and Git Blobs API responses. |
| rust/leanspec-core/src/github/client.rs | Implements tree recursion, blob fetching, parallel blob fetching, and updates detect_specs to use them. |
```rust
    ) -> CoreResult<Vec<GitHubTreeItem>> {
        let url = format!(
            "{}/repos/{}/{}/git/trees/{}",
            GITHUB_API_BASE, repo_ref.owner, repo_ref.repo, branch
```
get_tree_recursive interpolates branch directly into the URL path (.../git/trees/{}); branch/ref names can legally contain / (e.g. feature/foo), which will produce an incorrect URL and likely 404. URL-encode the ref (or resolve the branch to a commit/tree SHA via the refs API) before building the URL.
Suggested change:

```diff
-            GITHUB_API_BASE, repo_ref.owner, repo_ref.repo, branch
+            GITHUB_API_BASE,
+            repo_ref.owner,
+            repo_ref.repo,
+            urlencoding::encode(branch)
```
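To illustrate why the encoding matters, here is a hypothetical minimal encoder (illustration only; the `urlencoding` crate suggested above is the complete option). It percent-encodes just the bytes that would break the trees URL path, and assumes ASCII ref names:

```rust
/// Hypothetical helper: make a ref like "feature/foo" safe to embed
/// in the .../git/trees/{ref} URL path. ASCII refs only.
fn encode_ref(r: &str) -> String {
    r.bytes()
        .map(|b| match b {
            // '/' is legal in ref names but is a path separator in the URL
            b'%' | b'/' | b'?' | b'#' => format!("%{:02X}", b),
            _ => (b as char).to_string(),
        })
        .collect()
}
```

So `encode_ref("feature/foo")` yields `feature%2Ffoo`, which GitHub resolves as a single ref segment instead of a nested path.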
```rust
        let tree: GitHubTreeResponse = resp
            .json()
            .map_err(|e| CoreError::Other(format!("Failed to parse tree response: {}", e)))?;
```
The Git Trees API can return truncated: true for large repositories, meaning tree.tree is incomplete. Currently this is ignored, which can cause false negatives in spec detection/sync. Consider checking tree.truncated and returning a clear error (or falling back to a non-recursive / targeted directory listing strategy).
Suggested change:

```rust
        if tree.truncated {
            return Err(CoreError::Other(
                "GitHub tree response was truncated; repository may be too large for recursive listing"
                    .to_string(),
            ));
        }
```
```rust
        let blob: GitHubBlobResponse = resp
            .json()
            .map_err(|e| CoreError::Other(format!("Failed to parse blob response: {}", e)))?;

        let cleaned: String = blob
            .content
            .chars()
            .filter(|c| !c.is_whitespace())
            .collect();

        use base64::Engine;
        let bytes = base64::engine::general_purpose::STANDARD
            .decode(&cleaned)
            .map_err(|e| CoreError::Other(format!("Failed to decode base64 blob: {}", e)))?;
```
get_blob_content always base64-decodes blob.content but doesn't validate blob.encoding. If GitHub ever returns a different encoding (or an unexpected payload), this will fail with a misleading decode error; check blob.encoding == "base64" and return a clearer error otherwise.
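A minimal sketch of the suggested guard (the struct fields mirror the PR's `GitHubBlobResponse`; the error type is simplified to `String` here):

```rust
struct GitHubBlobResponse {
    content: String,
    encoding: String,
}

/// Refuse to base64-decode unless the API actually said the payload
/// is base64; otherwise return a clear error instead of a confusing
/// decode failure.
fn blob_base64_payload(blob: &GitHubBlobResponse) -> Result<String, String> {
    if blob.encoding != "base64" {
        return Err(format!(
            "unexpected blob encoding '{}', expected 'base64'",
            blob.encoding
        ));
    }
    // GitHub wraps base64 content with newlines; strip whitespace
    // before handing it to the decoder, as the PR already does.
    Ok(blob.content.chars().filter(|c| !c.is_whitespace()).collect())
}
```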
```rust
    /// Fetch multiple blobs in parallel using thread scoping.
    ///
    /// Returns a Vec of (path, content) pairs for successfully fetched blobs.
    /// Fetches up to `max_concurrent` blobs simultaneously.
    pub fn get_blobs_parallel(
        &self,
        repo_ref: &RepoRef,
        items: &[(String, String)], // (path, sha) pairs
        max_concurrent: usize,
    ) -> Vec<(String, CoreResult<String>)> {
```
The doc comment for get_blobs_parallel says it "Returns a Vec of (path, content) pairs for successfully fetched blobs", but the function actually returns Vec<(String, CoreResult<String>)> including failures. Either adjust the docs to match the signature/behavior or change the return type to only include successes.
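If the mixed return type is kept, callers can split it trivially; a sketch with simplified types (`String` stands in for `CoreError`, and `split_results` is a hypothetical helper, not part of the PR):

```rust
/// Separate the mixed (path, result) pairs into successes and failures.
fn split_results(
    results: Vec<(String, Result<String, String>)>,
) -> (Vec<(String, String)>, Vec<(String, String)>) {
    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for (path, res) in results {
        match res {
            Ok(content) => ok.push((path, content)),
            Err(e) => failed.push((path, e)),
        }
    }
    (ok, failed)
}
```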
```rust
        for chunk in items.chunks(max_concurrent) {
            let chunk_results: Vec<_> = std::thread::scope(|s| {
                let handles: Vec<_> = chunk
                    .iter()
                    .map(|(path, sha)| {
                        let path = path.clone();
                        s.spawn(move || {
                            let content = self.get_blob_content(repo_ref, sha);
                            (path, content)
                        })
                    })
                    .collect();

                handles
                    .into_iter()
                    .map(|h| h.join().expect("blob fetch thread panicked"))
                    .collect()
```
get_blobs_parallel will panic if max_concurrent == 0 (due to chunks(0)) and will also panic on any thread panic via join().expect(...). Since this is part of the core client API, it should avoid panicking; validate max_concurrent > 0 and convert join failures into a CoreError/CoreResult instead.
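A panic-free variant could look roughly like this (signatures simplified for illustration: `fetch` stands in for `get_blob_content`, and `String` for `CoreError`):

```rust
use std::thread;

/// Non-panicking parallel fetch: clamps the batch size and converts
/// worker panics into per-item errors.
fn get_blobs_parallel_safe<F>(
    items: &[(String, String)], // (path, sha) pairs
    max_concurrent: usize,
    fetch: F,
) -> Vec<(String, Result<String, String>)>
where
    F: Fn(&str) -> Result<String, String> + Sync,
{
    let fetch = &fetch;
    // chunks(0) panics, so clamp the batch size to at least 1
    let size = max_concurrent.max(1);
    let mut out = Vec::with_capacity(items.len());
    for chunk in items.chunks(size) {
        let chunk_results: Vec<_> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|(path, sha)| (path.clone(), s.spawn(move || fetch(sha))))
                .collect();
            handles
                .into_iter()
                .map(|(path, h)| match h.join() {
                    Ok(res) => (path, res),
                    // Turn a worker panic into an error instead of propagating it
                    Err(_) => (path, Err("blob fetch thread panicked".to_string())),
                })
                .collect::<Vec<_>>()
        });
        out.extend(chunk_results);
    }
    out
}
```

Because every handle is joined explicitly, the scope itself no longer re-raises a worker panic, and `max_concurrent == 0` degrades to sequential fetching rather than crashing.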
```rust
        // Fetch README.md blobs in parallel (up to 50, 10 concurrent)
        let items_to_fetch: Vec<(String, String)> = spec_readmes.into_iter().take(50).collect();
        let blob_results = self.get_blobs_parallel(repo_ref, &items_to_fetch, 10);
```
In detect_specs, you take(50) README blobs from spec_readmes before sorting directory names, but later you build the response from the first 50 sorted_dirs. When there are >50 specs, this can lead to missing metadata for directories that are in the response but whose README wasn't fetched. Consider selecting the 50 directories first (after sorting), then fetching README blobs for exactly those directories.
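The reordering the comment asks for can be sketched like this (`spec_dirs` is a stand-in for the directory names derived from the tree; 50 matches the PR's cap):

```rust
/// Pick the first `cap` directories *after* sorting, so the READMEs
/// fetched are exactly the directories that end up in the response.
fn select_spec_dirs(mut spec_dirs: Vec<String>, cap: usize) -> Vec<String> {
    spec_dirs.sort();
    spec_dirs.truncate(cap);
    spec_dirs
}
```

Fetching blobs only for `select_spec_dirs(dirs, 50)` keeps the fetched set and the responded set in lockstep.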
```diff
     let prefix = format!("{}/", specs_path);

-    let local_dir = local_specs_dir.join(&item.name);
-    std::fs::create_dir_all(&local_dir)
-        .map_err(|e| leanspec_core::CoreError::Other(format!("Failed to create dir: {}", e)))?;
-
-    let readme_path = format!("{}/{}/README.md", specs_path, item.name);
-    match client.get_file_content(repo_ref, &readme_path, Some(branch)) {
-        Ok(content) => {
-            std::fs::write(local_dir.join("README.md"), &content).map_err(|e| {
-                leanspec_core::CoreError::Other(format!("Failed to write spec: {}", e))
-            })?;
-            synced += 1;
+    // Find all README.md files under numbered spec directories
+    let readme_items: Vec<(String, String)> = tree_items
+        .iter()
+        .filter_map(|item| {
+            if item.item_type != "blob" {
+                return None;
+            }
+            let rest = item.path.strip_prefix(&prefix)?;
+            let parts: Vec<&str> = rest.splitn(2, '/').collect();
+            if parts.len() == 2
+                && parts[1] == "README.md"
+                && parts[0].chars().next().is_some_and(|c| c.is_ascii_digit())
```
sync_specs_to_cache relies on strip_prefix(format!("{}/", specs_path)). If specs_path comes from user input and includes leading/trailing slashes (e.g. /specs or specs/), no tree entries will match and sync will silently do nothing. Normalizing specs_path (trim leading/trailing /) before building prefix would make this more robust.
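The suggested normalization is small; a sketch (`specs_prefix` is a hypothetical helper name):

```rust
/// Trim stray leading/trailing slashes from a user-supplied specs
/// path before building the tree-path prefix, so "/specs" and
/// "specs/" match tree entries the same way "specs" does.
fn specs_prefix(specs_path: &str) -> String {
    format!("{}/", specs_path.trim_matches('/'))
}
```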
…upport
Replace the GitHub REST API-based spec sync (Trees/Blobs API) with a
git clone-based approach using the system `git` binary. This enables:
- Any Git host (GitHub, GitLab, Gitea, self-hosted, SSH)
- No API token required — uses existing git credentials
- Read AND write support (edit specs, commit, push back)
- No rate limits
Key changes:
- New `rust/leanspec-core/src/git/` module with CloneManager
(shallow sparse clone, pull, commit+push, status)
- GitHubConfig → GitConfig with remote_url field (serde aliases for compat)
- ProjectSource::GitHub → ProjectSource::Git (serde alias for compat)
- HTTP handlers rewritten: /api/git/ endpoints + backward /api/github/ aliases
- New endpoints: POST /api/git/push/{id}, GET /api/git/status/{id}
- UI: removed token field from import form, accepts any git URL
- CLI: uses git clone instead of GitHub API
https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg
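CloneManager's internals aren't shown in this conversation; a rough sketch of the shallow sparse clone it describes might look like the following (the exact flags and helper names are assumptions, not lifted from the PR):

```rust
use std::process::Command;

/// Arguments for a shallow, blobless, sparse clone via the system git.
fn shallow_clone_args(remote_url: &str, dest: &str) -> Vec<String> {
    ["clone", "--depth", "1", "--filter=blob:none", "--sparse", remote_url, dest]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Run git with GIT_TERMINAL_PROMPT=0 so it fails fast instead of
/// blocking on a credential prompt (as operations.rs is described
/// to do above).
fn run_git(args: &[String]) -> std::io::Result<std::process::ExitStatus> {
    Command::new("git")
        .args(args)
        .env("GIT_TERMINAL_PROMPT", "0")
        .status()
}
```

A follow-up `git -C <dest> sparse-checkout set <specs-dir>` would then restrict the working tree to the specs directory.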
- Delete the old GitHub API client module (rust/leanspec-core/src/github/) and its cargo feature
- Rename all types: GitHubConfig → GitConfig, GitHubDetectResult → GitDetectResult, GitHubImportResult → GitImportResult, etc.
- Remove backward-compat serde aliases (#[serde(alias = "github")])
- Remove /api/github/* route aliases, keep only /api/git/*
- Rename CLI subcommand: `lean-spec github` → `lean-spec git`
- Rename UI components: GitHubImportForm → GitImportForm, etc.
- Update all button labels from "Import from GitHub" to "Import from Git"
- Replace the GitHub icon with the GitBranch icon in the UI
- Remove unused token fields from CLI args and API request types

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg
Summary
- Replaces the GitHub REST API spec sync with `git clone` using the system `git` binary

What changed
Removed:
- `rust/leanspec-core/src/github/` module (`GitHubClient`, `GitHubRepo`, etc.) fully deleted
- `github` cargo feature removed from `leanspec-core` and `leanspec-cli`

New module:
- `rust/leanspec-core/src/git/`
  - `CloneManager` — shallow sparse clone, pull, commit+push, status
  - `RemoteRef` — parses any git URL format (HTTPS, SSH, `owner/repo` shorthand)
  - `operations.rs` — wraps the `git` binary with `GIT_TERMINAL_PROMPT=0`

Backend
- `GitHubConfig` type alias removed; only `GitConfig` remains
- `ProjectSource::GitHub` serde alias removed; only `ProjectSource::Git`
- `handlers/github.rs` → `handlers/git.rs`
- `/api/github/*` route aliases removed; only `/api/git/*` endpoints

UI
- `GitHubImportForm` → `GitImportForm`, file renamed accordingly
- `GitHubImportDialog` → `GitImportDialog`, file renamed accordingly
- `GitHubDetectResult`/`GitHubImportResult` → `GitDetectResult`/`GitImportResult`
- `detectGithubSpecs`/`importGithubRepo`/`syncGithubProject` → `detectGitSpecs`/`importGitRepo`/`syncGitProject`
- `listGithubRepos()` removed (was already a no-op)
- `Github` icon replaced with `GitBranch` (from lucide-react)
- `ProjectSource` type: removed `'github'`, now only `'local' | 'git'`

CLI
- `lean-spec github` → `lean-spec git` subcommand
- `GitHubSubcommand` → `GitSubcommand`, `GitHubCommand` → `GitRepoCommand`
- `--token` flag removed from detect/import
- `repos` subcommand removed entirely

Test plan
- `cargo clippy -- -D warnings` — clean
- `cargo test` — all pass
- `cargo fmt -- --check` — clean
- `pnpm typecheck` — clean
- `pnpm build` — clean
- `pnpm test` — 96/96 pass
- `owner/repo` shorthand and SSH (`git@...`) URL formats exercised

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg