
Replace GitHub API sync with git clone, fully git-agnostic#159

Merged
tikazyq merged 3 commits into main from claude/fix-github-sync-speed-kDcUo
Mar 25, 2026
Conversation

Contributor

@tikazyq tikazyq commented Mar 24, 2026

Summary

  • Replace GitHub REST API-based spec sync (Trees/Blobs API) with shallow sparse git clone using the system git binary
  • Works with any Git host (GitHub, GitLab, Gitea, self-hosted, SSH) — no longer GitHub-only
  • No API token required — uses existing git credentials (SSH keys, credential helpers)
  • Enables editing specs and pushing changes back via new push/status endpoints
  • All GitHub-specific naming removed — types, routes, CLI commands, and UI all use generic "git" naming

What changed

Removed: rust/leanspec-core/src/github/ module

  • Old GitHub REST API client (GitHubClient, GitHubRepo, etc.) fully deleted
  • github cargo feature removed from leanspec-core and leanspec-cli

New module: rust/leanspec-core/src/git/

  • CloneManager — shallow sparse clone, pull, commit+push, status
  • RemoteRef — parses any git URL format (HTTPS, SSH, owner/repo shorthand)
  • operations.rs — wraps git binary with GIT_TERMINAL_PROMPT=0
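The PR description states only the behavior (shallow sparse clone, wrapping the system git binary with GIT_TERMINAL_PROMPT=0), not the actual API of the new module. A minimal sketch of that approach, with all function names and the exact clone flags being illustrative assumptions rather than the PR's real code:

```rust
use std::process::Command;

// Illustrative sketch only; the real API lives in rust/leanspec-core/src/git/
// and is not shown in this PR. The one behavior the description does state:
// every git invocation sets GIT_TERMINAL_PROMPT=0 so a missing credential
// fails fast instead of hanging on an interactive prompt.
fn git_command(repo_dir: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.current_dir(repo_dir)
        .env("GIT_TERMINAL_PROMPT", "0")
        .args(args);
    cmd
}

// One plausible argument set for a shallow sparse clone: depth-1 history,
// no blobs fetched up front, sparse checkout enabled at clone time.
fn shallow_sparse_clone_args(url: &str, dest: &str) -> Vec<String> {
    ["clone", "--depth", "1", "--filter=blob:none", "--sparse", url, dest]
        .iter()
        .map(|s| s.to_string())
        .collect()
}
```

Because the URL is handed straight to git, any transport git itself supports (HTTPS, SSH, credential helpers) works with no extra code, which is what makes the design host-agnostic.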

Backend

  • GitHubConfig type alias removed; only GitConfig remains
  • ProjectSource::GitHub serde alias removed; only ProjectSource::Git
  • Handler file renamed: handlers/github.rs → handlers/git.rs
  • /api/github/* route aliases removed; only /api/git/* endpoints
  • Token fields removed from request types

UI

  • GitHubImportForm → GitImportForm, file renamed accordingly
  • GitHubImportDialog → GitImportDialog, file renamed accordingly
  • GitHubDetectResult/GitHubImportResult → GitDetectResult/GitImportResult
  • detectGithubSpecs/importGithubRepo/syncGithubProject → detectGitSpecs/importGitRepo/syncGitProject
  • listGithubRepos() removed (was already a no-op)
  • Button labels: "Import from GitHub" → "Import from Git"
  • Icons: Github → GitBranch (from lucide-react)
  • ProjectSource type: removed 'github', only 'local' | 'git'

CLI

  • lean-spec github → lean-spec git subcommand
  • GitHubSubcommand → GitSubcommand, GitHubCommand → GitRepoCommand
  • --token flag removed from detect/import
  • repos subcommand removed entirely

Test plan

  • cargo clippy -- -D warnings — clean
  • cargo test — all pass
  • cargo fmt -- --check — clean
  • pnpm typecheck — clean
  • pnpm build — clean
  • pnpm test — 96/96 pass
  • TS bindings up-to-date
  • Manual: import a repo via the UI using owner/repo
  • Manual: import via SSH URL (git@...)
  • Manual: edit a spec and push changes back

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg

Replace the sequential N+1 API call pattern (1 list_contents + N
get_file_content) with a single Git Trees API call to discover all
spec files, followed by parallel blob fetches using std::thread::scope.

Key changes:
- Add get_tree_recursive() to fetch full repo tree in one API call
- Add get_blob_content() and get_blobs_parallel() for concurrent fetches
- Rewrite sync_specs_to_cache to use tree + parallel blobs (10 concurrent)
- Rewrite detect_specs to use tree API instead of sequential scanning

For a project with 50 specs, this reduces API calls from ~51 sequential
requests to 1 tree request + 5 parallel batches of 10 blob requests,
cutting sync time from ~25s to ~2-3s.

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg
Copilot AI review requested due to automatic review settings March 24, 2026 22:03

Copilot AI left a comment


Pull request overview

This PR optimizes GitHub spec discovery and syncing by switching from N+1 Contents API calls to a single recursive Git Trees API call, then fetching README blobs in parallel to reduce sync latency.

Changes:

  • Add Git Trees + Git Blobs API response types to support tree/blob fetching.
  • Implement recursive tree fetch and parallel blob fetch in GitHubClient, and use it for spec detection.
  • Update HTTP GitHub sync handler to use the same tree+parallel-blob approach for caching specs locally.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

  • rust/leanspec-http/src/handlers/github.rs: Uses tree discovery + parallel blob fetch to sync spec READMEs into the local cache.
  • rust/leanspec-core/src/github/types.rs: Adds deserialization types for Git Trees and Git Blobs API responses.
  • rust/leanspec-core/src/github/client.rs: Implements tree recursion, blob fetching, parallel blob fetching, and updates detect_specs to use them.


    ) -> CoreResult<Vec<GitHubTreeItem>> {
        let url = format!(
            "{}/repos/{}/{}/git/trees/{}",
            GITHUB_API_BASE, repo_ref.owner, repo_ref.repo, branch

Copilot AI Mar 24, 2026


get_tree_recursive interpolates branch directly into the URL path (.../git/trees/{}); branch/ref names can legally contain / (e.g. feature/foo), which will produce an incorrect URL and likely 404. URL-encode the ref (or resolve the branch to a commit/tree SHA via the refs API) before building the URL.

Suggested change:

    -        GITHUB_API_BASE, repo_ref.owner, repo_ref.repo, branch
    +        GITHUB_API_BASE,
    +        repo_ref.owner,
    +        repo_ref.repo,
    +        urlencoding::encode(branch)

    let tree: GitHubTreeResponse = resp
        .json()
        .map_err(|e| CoreError::Other(format!("Failed to parse tree response: {}", e)))?;


Copilot AI Mar 24, 2026


The Git Trees API can return truncated: true for large repositories, meaning tree.tree is incomplete. Currently this is ignored, which can cause false negatives in spec detection/sync. Consider checking tree.truncated and returning a clear error (or falling back to a non-recursive / targeted directory listing strategy).

Suggested change:

    +    if tree.truncated {
    +        return Err(CoreError::Other(
    +            "GitHub tree response was truncated; repository may be too large for recursive listing"
    +                .to_string(),
    +        ));
    +    }

Comment on lines +263 to +276
    let blob: GitHubBlobResponse = resp
        .json()
        .map_err(|e| CoreError::Other(format!("Failed to parse blob response: {}", e)))?;

    let cleaned: String = blob
        .content
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect();

    use base64::Engine;
    let bytes = base64::engine::general_purpose::STANDARD
        .decode(&cleaned)
        .map_err(|e| CoreError::Other(format!("Failed to decode base64 blob: {}", e)))?;

Copilot AI Mar 24, 2026


get_blob_content always base64-decodes blob.content but doesn't validate blob.encoding. If GitHub ever returns a different encoding (or an unexpected payload), this will fail with a misleading decode error; check blob.encoding == "base64" and return a clearer error otherwise.
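A hedged sketch of the check this comment asks for. The real code uses the base64 crate (visible in the context above); here the decoder is injected as a closure so the example stays dependency-free, and all names are illustrative:

```rust
// Illustrative sketch: validate the declared encoding before decoding so an
// unexpected payload fails with a clear message instead of a misleading
// base64 error. The decoder is injected to avoid the base64 crate dependency.
fn decode_blob<F>(content: &str, encoding: &str, decode: F) -> Result<Vec<u8>, String>
where
    F: Fn(&str) -> Result<Vec<u8>, String>,
{
    if encoding != "base64" {
        return Err(format!(
            "unsupported blob encoding {:?}; expected \"base64\"",
            encoding
        ));
    }
    // GitHub's base64 payload contains embedded newlines; strip whitespace first.
    let cleaned: String = content.chars().filter(|c| !c.is_whitespace()).collect();
    decode(&cleaned)
}
```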

Comment on lines +282 to +291
    /// Fetch multiple blobs in parallel using thread scoping.
    ///
    /// Returns a Vec of (path, content) pairs for successfully fetched blobs.
    /// Fetches up to `max_concurrent` blobs simultaneously.
    pub fn get_blobs_parallel(
        &self,
        repo_ref: &RepoRef,
        items: &[(String, String)], // (path, sha) pairs
        max_concurrent: usize,
    ) -> Vec<(String, CoreResult<String>)> {

Copilot AI Mar 24, 2026


The doc comment for get_blobs_parallel says it "Returns a Vec of (path, content) pairs for successfully fetched blobs", but the function actually returns Vec<(String, CoreResult<String>)> including failures. Either adjust the docs to match the signature/behavior or change the return type to only include successes.

Comment on lines +294 to +310
    for chunk in items.chunks(max_concurrent) {
        let chunk_results: Vec<_> = std::thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|(path, sha)| {
                    let path = path.clone();
                    s.spawn(move || {
                        let content = self.get_blob_content(repo_ref, sha);
                        (path, content)
                    })
                })
                .collect();

            handles
                .into_iter()
                .map(|h| h.join().expect("blob fetch thread panicked"))
                .collect()

Copilot AI Mar 24, 2026


get_blobs_parallel will panic if max_concurrent == 0 (due to chunks(0)) and will also panic on any thread panic via join().expect(...). Since this is part of the core client API, it should avoid panicking; validate max_concurrent > 0 and convert join failures into a CoreError/CoreResult instead.
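A panic-free variant along the lines this comment suggests can be sketched as follows. The `fetch` closure stands in for the client's blob call, and all names are illustrative, not the PR's API:

```rust
use std::thread;

// Sketch of a panic-safe get_blobs_parallel: clamp max_concurrent to at
// least 1 (chunks(0) panics) and turn a worker panic into a per-item Err
// instead of unwinding the whole call.
fn fetch_parallel<F>(
    items: &[(String, String)], // (path, sha) pairs
    max_concurrent: usize,
    fetch: F,
) -> Vec<(String, Result<String, String>)>
where
    F: Fn(&str) -> Result<String, String> + Sync,
{
    let max_concurrent = max_concurrent.max(1);
    let fetch = &fetch;
    let mut results = Vec::with_capacity(items.len());
    for chunk in items.chunks(max_concurrent) {
        // One scope per batch; scoped threads may borrow `chunk` and `fetch`.
        let contents: Vec<Result<String, String>> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|(_, sha)| s.spawn(move || fetch(sha)))
                .collect();
            handles
                .into_iter()
                // A joined panic becomes an Err rather than re-panicking.
                .map(|h| h.join().unwrap_or_else(|_| Err("fetch thread panicked".into())))
                .collect()
        });
        for ((path, _), content) in chunk.iter().zip(contents) {
            results.push((path.clone(), content));
        }
    }
    results
}
```

Keeping the paths outside the worker threads also means a panicking worker cannot lose track of which item failed.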

Comment on lines +376 to +378
    // Fetch README.md blobs in parallel (up to 50, 10 concurrent)
    let items_to_fetch: Vec<(String, String)> = spec_readmes.into_iter().take(50).collect();
    let blob_results = self.get_blobs_parallel(repo_ref, &items_to_fetch, 10);

Copilot AI Mar 24, 2026


In detect_specs, you take(50) README blobs from spec_readmes before sorting directory names, but later you build the response from the first 50 sorted_dirs. When there are >50 specs, this can lead to missing metadata for directories that are in the response but whose README wasn't fetched. Consider selecting the 50 directories first (after sorting), then fetching README blobs for exactly those directories.

Comment on lines +192 to +205
    let prefix = format!("{}/", specs_path);

    let local_dir = local_specs_dir.join(&item.name);
    std::fs::create_dir_all(&local_dir)
        .map_err(|e| leanspec_core::CoreError::Other(format!("Failed to create dir: {}", e)))?;

    let readme_path = format!("{}/{}/README.md", specs_path, item.name);
    match client.get_file_content(repo_ref, &readme_path, Some(branch)) {
        Ok(content) => {
            std::fs::write(local_dir.join("README.md"), &content).map_err(|e| {
                leanspec_core::CoreError::Other(format!("Failed to write spec: {}", e))
            })?;
            synced += 1;

    // Find all README.md files under numbered spec directories
    let readme_items: Vec<(String, String)> = tree_items
        .iter()
        .filter_map(|item| {
            if item.item_type != "blob" {
                return None;
            }
            let rest = item.path.strip_prefix(&prefix)?;
            let parts: Vec<&str> = rest.splitn(2, '/').collect();
            if parts.len() == 2
                && parts[1] == "README.md"
                && parts[0].chars().next().is_some_and(|c| c.is_ascii_digit())

Copilot AI Mar 24, 2026


sync_specs_to_cache relies on strip_prefix(format!("{}/", specs_path)). If specs_path comes from user input and includes leading/trailing slashes (e.g. /specs or specs/), no tree entries will match and sync will silently do nothing. Normalizing specs_path (trim leading/trailing /) before building prefix would make this more robust.
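The normalization this comment suggests is small enough to sketch directly. Function names here are illustrative, not the handler's actual helpers:

```rust
// Sketch of the suggested fix: trim leading/trailing slashes from the
// user-supplied specs path before building the "{path}/" prefix that is
// matched against tree entry paths with strip_prefix.
fn specs_prefix(specs_path: &str) -> String {
    format!("{}/", specs_path.trim_matches('/'))
}

// A tree entry path is a spec file iff it sits under the normalized prefix.
fn strip_spec_path<'a>(full_path: &'a str, specs_path: &str) -> Option<&'a str> {
    full_path.strip_prefix(&specs_prefix(specs_path))
}
```

With this, `/specs`, `specs/`, and `specs` all produce the prefix `specs/`, so sync no longer silently matches nothing when the user includes a stray slash.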

…upport

Replace the GitHub REST API-based spec sync (Trees/Blobs API) with a
git clone-based approach using the system `git` binary. This enables:

- Any Git host (GitHub, GitLab, Gitea, self-hosted, SSH)
- No API token required — uses existing git credentials
- Read AND write support (edit specs, commit, push back)
- No rate limits

Key changes:
- New `rust/leanspec-core/src/git/` module with CloneManager
  (shallow sparse clone, pull, commit+push, status)
- GitHubConfig → GitConfig with remote_url field (serde aliases for compat)
- ProjectSource::GitHub → ProjectSource::Git (serde alias for compat)
- HTTP handlers rewritten: /api/git/ endpoints + backward /api/github/ aliases
- New endpoints: POST /api/git/push/{id}, GET /api/git/status/{id}
- UI: removed token field from import form, accepts any git URL
- CLI: uses git clone instead of GitHub API

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg
@tikazyq tikazyq changed the title from "fix: speed up GitHub sync with Trees API and parallel fetching" to "Replace GitHub API sync with git clone for universal repo support" on Mar 24, 2026
- Delete old GitHub API client module (rust/leanspec-core/src/github/)
  and its cargo feature
- Rename all types: GitHubConfig → GitConfig, GitHubDetectResult →
  GitDetectResult, GitHubImportResult → GitImportResult, etc.
- Remove backward-compat serde aliases (#[serde(alias = "github")])
- Remove /api/github/* route aliases, keep only /api/git/*
- Rename CLI subcommand: `lean-spec github` → `lean-spec git`
- Rename UI components: GitHubImportForm → GitImportForm, etc.
- Update all button labels from "Import from GitHub" to "Import from Git"
- Replace GitHub icon with GitBranch icon in UI
- Remove unused token fields from CLI args and API request types

https://claude.ai/code/session_01WGBfJSXTVMHSLeXfVc8yzg
@tikazyq tikazyq changed the title from "Replace GitHub API sync with git clone for universal repo support" to "Replace GitHub API sync with git clone, fully git-agnostic" on Mar 25, 2026
@tikazyq tikazyq merged commit 3c9dfaf into main Mar 25, 2026
2 checks passed
@tikazyq tikazyq deleted the claude/fix-github-sync-speed-kDcUo branch March 25, 2026 03:58