feat: support Git repository cloning in spawn workflow #22

Merged
KerwinTsaiii merged 41 commits into develop from feature/support-custom-spawner
Feb 25, 2026

@MioYuuIH (Contributor) commented Feb 25, 2026

Summary

Let users optionally clone a Git repository when spawning a new session. The repository is cloned by an init container and removed automatically when the session ends, so the clone is ephemeral and does not persist between sessions.

Changes

Git Clone Backend (runtime/hub/core/spawner/kubernetes.py)

  • Add _validate_and_sanitize_repo_url() with URL normalization (prepend https://, strip /tree/<branch>, strip .git)
  • Add _extract_repo_name() and _build_git_init_container() for init container construction
  • Parse repo_url, repo_branch, access_token from spawn form
  • Clone repo via init container into the home volume; a preStop lifecycle hook removes the clone when the session ends
  • Auto-set default_url to open the cloned repo directory in JupyterLab
  • Configurable ALLOWED_GIT_PROVIDERS, MAX_CLONE_TIMEOUT, GIT_INIT_CONTAINER_IMAGE from values.yaml
  • Config defaults consolidated into single source of truth via configure_from_config()
  • Pod monitor guards against stale Failed status from reflector cache
  • Clone path dynamically derived from home volume mount via _get_home_mount_path() (no hardcoded /home/jovyan)
  • notebook_dir set to home mount path to ensure JupyterLab file browser matches clone target

Git Clone Script (runtime/hub/core/scripts/git-clone.sh)

  • Shell script for init container: validates env vars, runs git clone --depth 1 with optional branch, respects timeout
  • Removes leftover clone directory from previous sessions (handles skipped preStop hooks)
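The actual script is shell, but its two steps can be sketched in Python for illustration. The helper names are hypothetical, and the `timeout <seconds> <command>` form assumes a `timeout` utility with that syntax is present in the init container image.

```python
import shutil


def remove_leftover_clone(clone_dir):
    """Delete a clone left behind when a preStop hook was skipped."""
    # ignore_errors=True makes this a no-op when the directory is absent.
    shutil.rmtree(clone_dir, ignore_errors=True)


def build_clone_argv(repo_url, clone_dir, branch=None, timeout_s=300):
    """Build the argv for a shallow clone bounded by a timeout."""
    argv = ["timeout", str(timeout_s), "git", "clone", "--depth", "1"]
    if branch:
        argv += ["--branch", branch]
    # "--" ends option parsing so the URL cannot be read as a flag.
    argv += ["--", repo_url, clone_dir]
    return argv
```

Passing the arguments as a list rather than a formatted string keeps the URL and branch out of shell interpretation, which is the same concern the environment-variable approach in git-clone.sh addresses.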

Git Repo Validation (runtime/hub/core/handlers.py)

  • New ValidateRepoHandler at /hub/api/validate-repo — validates repo URL and optional branch via dulwich.porcelain.ls_remote
  • Pure Python validation (no subprocess), checks both refs/heads/ and refs/tags/ for branch existence
  • 10-second timeout, async via run_in_executor
  • dulwich added to requirements.txt
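A rough sketch of the validation flow described above. The `ls_remote` callable is injected so the example runs without dulwich; in the handler it would wrap `dulwich.porcelain.ls_remote`, whose return shape should be checked against the installed version. The function names and the response dictionary are assumptions, and refs are assumed to be keyed by bytes.

```python
import asyncio


def ref_exists(refs, name):
    """True if name matches a branch (refs/heads/) or a tag (refs/tags/)."""
    wanted = name.encode("utf-8")
    return (b"refs/heads/" + wanted) in refs or (b"refs/tags/" + wanted) in refs


async def validate_repo(url, branch, ls_remote, timeout=10.0):
    """Run the blocking ls_remote(url) in a thread, bounded by a timeout."""
    loop = asyncio.get_running_loop()
    try:
        refs = await asyncio.wait_for(
            loop.run_in_executor(None, ls_remote, url), timeout
        )
    except asyncio.TimeoutError:
        return {"valid": False, "error": "timed out"}
    except Exception as exc:  # network errors, missing repo, and so on
        return {"valid": False, "error": str(exc)}
    if branch and not ref_exists(refs, branch):
        return {"valid": False, "error": f"branch or tag not found: {branch}"}
    return {"valid": True, "error": None}
```

Note that `asyncio.wait_for` stops waiting after the timeout but cannot interrupt the worker thread, so the underlying network call may still run to completion in the background.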

GitSpawnHandler (runtime/hub/core/handlers.py)

  • New /hub/git/<provider/owner/repo> URL handler for direct repo spawning
  • Validates provider against allowlist, supports ?autostart=1&resource=<key>&accelerator=<key> query params
  • Returns allowedGitProviders in Resources API response
  • Group sorting: CUSTOM REPO and OTHERS pinned to bottom
  • Quota API returns graceful disabled response when QuotaManager is not initialized
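The /hub/git/ shortcut can be pictured as a small path-to-redirect translation. The sketch below is an assumption about its shape, not the handler itself: `build_spawn_redirect` and the `/hub/spawn` default are hypothetical, and only the three query parameters named above are forwarded.

```python
from urllib.parse import urlencode


def build_spawn_redirect(repo_path, query, allowed_providers, spawn_url="/hub/spawn"):
    """Map '<provider>/<owner>/<repo>' plus query params to a spawn URL."""
    parts = [p for p in repo_path.strip("/").split("/") if p]
    if len(parts) < 3:
        raise ValueError("expected <provider>/<owner>/<repo>")
    provider = parts[0].lower()
    if provider not in allowed_providers:
        raise ValueError(f"provider not allowed: {provider}")
    params = {"repo_url": "https://" + "/".join([provider] + parts[1:])}
    # Forward only the known parameters; anything else is dropped.
    for key in ("autostart", "resource", "accelerator"):
        if query.get(key):
            params[key] = query[key]
    return spawn_url + "?" + urlencode(params)
```

For example, `build_spawn_redirect("github.com/owner/repo", {"autostart": "1"}, {"github.com"})` yields a spawn URL with `repo_url` percent-encoded and `autostart=1` appended.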

Configuration (runtime/hub/core/config.py, runtime/values.yaml, runtime/chart/values.schema.yaml)

  • Add GitCloneSettings model (initContainerImage, allowedProviders, maxCloneTimeout)
  • Add allowGitClone flag per resource metadata
  • Add custom.gitClone section in values schema
  • Config defaults deduplicated — setup.py delegates to RemoteLabKubeSpawner.configure_from_config()
  • GPU resource default image updated to ghcr.io/amdresearch/base-gfx1151:v0.1-full
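As an illustration of how the settings above might be read, here is a stdlib-only sketch. The real `GitCloneSettings` in config.py may well be defined differently (the PR calls it a model); `from_values`, `git_clone_allowed`, and the default of `False` for a missing `allowGitClone` flag are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class GitCloneSettings:
    # Defaults copied from the values.yaml example in this PR.
    initContainerImage: str = "alpine/git:2.47.2"
    allowedProviders: List[str] = field(
        default_factory=lambda: ["github.com", "gitlab.com", "bitbucket.org"]
    )
    maxCloneTimeout: int = 300

    @classmethod
    def from_values(cls, values):
        """Build settings from a parsed values.yaml dict (custom.gitClone)."""
        section = (values.get("custom") or {}).get("gitClone") or {}
        known = ("initContainerImage", "allowedProviders", "maxCloneTimeout")
        return cls(**{k: section[k] for k in known if k in section})


def git_clone_allowed(values, resource_key):
    """Read custom.resources.metadata.<key>.allowGitClone (assumed default False)."""
    custom = values.get("custom") or {}
    metadata = (custom.get("resources") or {}).get("metadata") or {}
    return bool((metadata.get(resource_key) or {}).get("allowGitClone", False))
```

Keeping the defaults in one place like this is in the spirit of the single-source-of-truth change noted above, whatever form the real model takes.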

Spawn UI (runtime/hub/frontend/apps/spawn/)

  • Git URL input as resource card sub-option — rendered inside the selected resource card, same level as GPU node selector
  • Smart URL normalization: auto-prepend https://, extract branch from /tree/<branch>, strip .git
  • Real-time repo validation via dulwich ls-remote API (debounced 800ms), with branch existence check
  • Validating state indicator ("Checking repository..."), Launch button blocked during validation
  • Branch detection hint display
  • Git Repo badge on resource cards that support cloning
  • Shareable /hub/git/ link generation with resource and accelerator params
  • Auto-select resource and auto-submit via URL params (?autostart=1)
  • Accordion behavior: only one category group expanded at a time
  • Collapsing a category clears selection if the selected resource is in that group
  • Basic resources (cpu, gpu) moved to dedicated CUSTOM REPO group (pinned to bottom)
  • Shared validateRepo() API function in @auplc/shared package

Cleanup

  • Removed 1196-line inline spawn UI fallback from resource_options_form.html (reverted to React bundle loader)

Configuration

Git Clone Settings (runtime/values.yaml)

custom:
  gitClone:
    initContainerImage: "alpine/git:2.47.2"
    allowedProviders:
      - github.com
      - gitlab.com
      - bitbucket.org
    maxCloneTimeout: 300

Per-Resource allowGitClone Flag

custom:
  resources:
    metadata:
      cpu:
        group: "CUSTOM REPO"
        allowGitClone: true   # Show git URL input on spawn form
      Course-CV:
        group: "COURSE"
        allowGitClone: false  # Course already has its own content

Testing

  • Clone public repo via spawn form, verify repo appears in JupyterLab
  • Stop session, verify cloned directory is cleaned up (preStop hook)
  • Test /hub/git/github.com/owner/repo URL redirect
  • Test ?autostart=1 auto-submit flow
  • Test branch extraction from /tree/<branch> URLs
  • Test URL validation against allowed providers
  • Test shareable link generation and copy
  • Test leftover directory cleanup (handles force-deleted pods)
  • Test quota API with quota disabled
  • Test category collapse clears selection
  • Test accordion behavior (single group expanded)
  • Test dulwich ls-remote validation (valid repo, invalid repo, valid branch, invalid branch)
  • Test Git URL input as sub-option inside resource card
  • Verified on Kubernetes cluster

Checklist

  • Code follows project style guidelines (ruff format)
  • Changes are backward compatible
  • Tested on Kubernetes cluster

KerwinTsaiii and others added 20 commits February 24, 2026 16:05
Rewrite the git init container implementation with several fixes:
- Base64-encode the shell script to avoid KubeSpawner _expand_all
  treating shell braces as Python format string placeholders
- Reference existing home volume (volume-{username}) instead of adding
  a duplicate PVC mount which causes RWO pods to hang
- Set HOME=/tmp so alpine/git can write .gitconfig without errors
- Validate repo URL against an allowed providers whitelist
- Exit 0 on any failure so spawn is never blocked
- Switch to alpine/git:2.47.2 with imagePullPolicy: IfNotPresent
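The Base64 workaround in the first bullet can be sketched like this. It assumes the init container image provides `sh` and a `base64` utility that accepts `-d`; the helper name is hypothetical.

```python
import base64


def wrap_script_for_kubespawner(script):
    """Return a container command whose text contains no '{' or '}'."""
    # The Base64 alphabet is A-Z, a-z, 0-9, '+', '/' and '=', so the encoded
    # script cannot be mistaken for a Python format-string placeholder.
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    return ["sh", "-c", f"echo {encoded} | base64 -d | sh"]


# Why the raw script is a problem: str.format() treats ${HOME} as a field.
try:
    "echo ${HOME}".format()
except KeyError:
    pass  # KeyError: 'HOME'
```

The later move to a standalone git-clone.sh with environment variables removes the need to template the script at all, but the encoding step still illustrates why braces were an issue.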

Add optional Git repository URL input field to the React spawn UI.
- Add allowGitClone field to ResourceMetadata; only resources with this
  flag enabled show the git URL input (cpu resource enabled by default)
- Move repo URL input above runtime selector in spawn UI
- Add frontend URL validation using providers list from API (no hardcode)
- Expose allowedGitProviders in /api/resources response
- Backend double-checks allowGitClone before adding init container
- Add dark mode styles and inline error state for repo URL input
- Update values.schema.yaml and regenerate values.schema.json

- Add /hub/git/<provider/owner/repo> shortcut that validates the repo
  URL against allowed providers and redirects to the spawn page with
  repo_url pre-filled; supports ?autostart=1, ?resource=, ?accelerator=
- Frontend reads repo_url/autostart/resource/accelerator query params,
  auto-selects matching resource, and auto-submits the form when autostart=1
- Validate initial repo_url and query params with user-visible warnings
  for unknown resource or accelerator keys

Add a GPU-accelerated resource profile using the default image,
with 4 CPU / 16Gi memory / 1 AMD GPU, supporting strix-halo and dgpu
accelerators; allow git clone and grant access to the official team.

- Show a "Git Repo" badge on resource cards that support git clone
- Replace always-visible hint text with a hoverable question mark
  tooltip next to the Git Repository URL label
- Add dark mode styles for badge and tooltip

- Extract clone script to core/scripts/git-clone.sh (pure shell, no
  Python string interpolation); pass REPO_URL/CLONE_DIR/MAX_CLONE_TIMEOUT
  as environment variables so the script is shellcheck-clean
- Exit 1 on clone failure so Kubernetes marks the pod as Failed
- Set restartPolicy=Never on user pods so a Failed init container does
  not enter CrashLoopBackOff
- Add _monitor_pod_failure() coroutine that runs concurrently with
  super().start() and raises immediately when pod phase==Failed,
  avoiding the full start_timeout wait
- Wire _hub_config and git clone settings in setup.py so allowGitClone
  and provider/timeout config are applied at spawn time
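The fail-fast behaviour of `_monitor_pod_failure()` can be approximated with plain asyncio. This is a sketch under assumptions: `start` and `get_phase` stand in for `super().start()` and a pod-phase lookup, `PodFailedError` is a hypothetical exception type, and the guard against stale Failed statuses from the reflector cache is not modelled.

```python
import asyncio


class PodFailedError(RuntimeError):
    """Raised when the pod reports phase Failed during startup."""


async def monitor_pod_failure(get_phase, interval=1.0):
    """Poll the pod phase forever; the only way out is raising on Failed."""
    while True:
        if await get_phase() == "Failed":
            raise PodFailedError("pod entered phase Failed")
        await asyncio.sleep(interval)


async def start_with_failure_monitor(start, get_phase, interval=1.0):
    """Run start() and the monitor concurrently; return or raise on the first to finish."""
    start_task = asyncio.ensure_future(start())
    monitor_task = asyncio.ensure_future(monitor_pod_failure(get_phase, interval))
    tasks = (start_task, monitor_task)
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        if monitor_task in done:
            monitor_task.result()  # re-raises PodFailedError
        return start_task.result()
    finally:
        # Cancel whichever task is still running and collect both results
        # so no exception is left unretrieved.
        for task in tasks:
            if not task.done():
                task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

Racing the two tasks means a failed init container surfaces within one polling interval instead of after the full start_timeout.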

- Frontend normalizes URL on blur: adds https://, strips .git, extracts branch from /tree/<branch> path
- Extracted branch shown as hint below input; sent as repo_branch form field
- Backend mirrors normalization in _validate_and_sanitize_repo_url (defensive layer)
- Branch name sanitized with allowlist regex before passing to init container
- git-clone.sh: optional BRANCH env var, uses --branch on clone and fetch by ref

- Fix import sort order in kubernetes.py (I001)
- Remove unused access_token variable in start() (F841)
- Add 'raise ... from e' in handlers.py exception chain (B904)

Display value stays as typed; normalization (https prefix, .git strip,
/tree/<branch> extraction) happens silently via hidden inputs. Validation
still runs against the normalized form on every keystroke.

Frontend submits raw URL via name=repo_url; no hidden inputs needed.
Backend extracts branch from /tree/<branch> path before sanitization,
so both normalization and branch detection are fully server-side.

Without repo URL: generates /spawn?resource=...&accelerator=...
With repo URL: generates /hub/git/...?resource=...&accelerator=...

- Only the group containing the pre-selected resource expands (once)
- Other groups stay collapsed, respecting user manual collapse
- Fallback to first resource when pre-selection fails (invalid resource
  key, no allowGitClone resource) to avoid all groups being collapsed

Cloned repositories are now ephemeral: they live in an emptyDir volume
that is discarded when the session ends. This avoids accumulating stale
repos on the user's persistent storage and gives a clean environment on
each spawn. The shell script is simplified (no fetch/reset branch since
the volume is always empty). Tooltip updated accordingly.

Clone repos directly onto home PVC and clean up via preStop lifecycle
hook when session ends. Fix shareable URL regex to handle username in
spawn path (e.g. /hub/spawn/admin).

@KerwinTsaiii KerwinTsaiii force-pushed the feature/support-custom-spawner branch from e0831f0 to 1a394ff Compare February 25, 2026 09:46
…scroll container

Add horizontal padding to .gpu-options-container so box-shadow on selected
GPU option is not cut off by overflow.

Co-authored-by: Cursor <cursoragent@cursor.com>
@KerwinTsaiii KerwinTsaiii merged commit 87c5682 into develop Feb 25, 2026
4 checks passed
@MioYuuIH MioYuuIH deleted the feature/support-custom-spawner branch February 26, 2026 01:40