One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
Updated Oct 28, 2025 · Shell
Serve the home! An inference stack for your NVIDIA DGX Spark, aka the Grace Blackwell AI supercomputer on your desk. Mostly vLLM-based for now and single-Spark only. For the not-so-rich buddies.
Headless remote desktop to your DGX Spark in crystal-clear 4K.
Turn any NVIDIA GPU into a local AI platform. Inference + fine-tuning in your browser. One command to start, automatic clustering.
A lightweight web UI for managing AI models on the NVIDIA DGX Spark. Pull Ollama models, download from HuggingFace, manage LiteLLM routing, and control SGLang or vLLM — all from one browser tab.
(Experimental) A high-throughput and memory-efficient inference and serving engine for LLMs optimized for GB10 homelabs
GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.
SGLang optimizations for NVIDIA Spark (GB10) — SM121 Grace Blackwell
Enhanced GPU throttle diagnostic for DGX Spark (GB10): NVML direct telemetry, throttle cause decoder, PCIe link monitoring, baseline drift detection, timeline capture.
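A throttle-cause decoder like the one described above can be sketched in a few lines: NVML reports current throttle causes as a bitmask, and the bit values below are the documented `nvmlClocksThrottleReason*` constants from `nvml.h`. This is an illustrative pure-Python sketch, not the repo's actual code; `decode_throttle_reasons` is a hypothetical helper name.

```python
# NVML clocksThrottleReasons bitmask constants (values from nvml.h).
NVML_THROTTLE_REASONS = {
    0x0000000000000001: "GPU idle",
    0x0000000000000002: "applications clocks setting",
    0x0000000000000004: "SW power cap",
    0x0000000000000008: "HW slowdown",
    0x0000000000000010: "sync boost",
    0x0000000000000020: "SW thermal slowdown",
    0x0000000000000040: "HW thermal slowdown",
    0x0000000000000080: "HW power brake slowdown",
    0x0000000000000100: "display clocks setting",
}


def decode_throttle_reasons(mask: int) -> list[str]:
    """Decode an NVML clocks-throttle-reasons bitmask into readable causes.

    The mask comes from nvmlDeviceGetCurrentClocksThrottleReasons(); a zero
    mask means the GPU is running at full clocks with no active throttle.
    """
    if mask == 0:
        return ["none"]
    return [name for bit, name in NVML_THROTTLE_REASONS.items() if mask & bit]
```

On real hardware you would obtain the mask via `pynvml` and feed it to the decoder; the table makes an otherwise opaque hex value (e.g. `0x24` = SW power cap + SW thermal slowdown) immediately readable.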
DGX Spark (GB10/SM121) platform support for Meta's KernelAgent — auto-detect, hardware constraints, safe Triton configs
Pre-built PyTorch wheels and build scripts for NVIDIA DGX Spark (GB10, sm_121, Blackwell, CUDA 13.0, ARM64)
This project is the ARM-architecture port of Unsloth.
vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs
Cycle-accurate UMA fault latency and bandwidth measurement for NVIDIA GPUs. C and PTX. No Python. Pascal (SM 6.0) through Blackwell GB10 (SM 12.1).
A practical guide to multi-node NCCL over a switched RoCE fabric on NVIDIA GB10 (DGX Spark class), documenting the gaps in NVIDIA's official playbooks.
Deliver a scalable LLM inference API in TypeScript and Python with GPU scheduling, dynamic batching, and multi-modal support for production use.
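The dynamic-batching policy mentioned above follows a common pattern: a batch is flushed either when it reaches a maximum size or when a maximum wait has elapsed since its first request arrived. A minimal deterministic sketch of that policy (my own illustration, not this repo's API; `plan_batches` is a hypothetical name) operating on request arrival timestamps:

```python
def plan_batches(arrivals, max_batch=4, max_wait_s=0.05):
    """Simulate a dynamic batcher over sorted request arrival times.

    A batch flushes when it holds max_batch requests, or when the next
    request arrives after max_wait_s has elapsed since the batch's first
    request. Returns batches as lists of request indices.
    """
    batches, current, deadline = [], [], 0.0
    for i, t in enumerate(arrivals):
        # Flush a pending batch whose wait deadline has already passed.
        if current and t > deadline:
            batches.append(current)
            current = []
        if not current:
            deadline = t + max_wait_s  # deadline set by the first request
        current.append(i)
        if len(current) == max_batch:  # size-triggered flush
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```

For example, four requests arriving within 30 ms fill one batch of four, while a straggler 200 ms later lands in its own batch. A production server implements the same rule with a queue and a timer rather than a post-hoc plan.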
3-bit Lloyd-Max KV Cache Compression for LLM Inference on NVIDIA DGX Spark GB10 — 5.12x compression, 0.983 cosine similarity, pure numpy on ARM unified memory
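The Lloyd-Max quantizer behind that compression scheme is classical: alternate between assigning samples to their nearest level and moving each level to the centroid of its cell. A minimal numpy sketch for 3-bit (8-level) quantization follows; this is my own illustration of the technique, not the repo's implementation, and `lloyd_max_quantizer` is a hypothetical name. On Gaussian data the resulting cosine similarity lands near 0.98, consistent with the 0.983 figure quoted above.

```python
import numpy as np


def lloyd_max_quantizer(x, n_levels=8, n_iter=50):
    """Design a Lloyd-Max scalar quantizer for the samples in x.

    Returns (levels, idx): the n_levels codebook values and, for each
    sample, the index of its assigned level.
    """
    # Initialize levels at evenly spaced quantiles of the data.
    levels = np.quantile(x, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(n_iter):
        # Nearest-level assignment via midpoint thresholds.
        thresholds = (levels[:-1] + levels[1:]) / 2
        idx = np.searchsorted(thresholds, x)
        # Centroid update: move each level to the mean of its cell.
        for k in range(n_levels):
            mask = idx == k
            if mask.any():
                levels[k] = x[mask].mean()
    thresholds = (levels[:-1] + levels[1:]) / 2
    return levels, np.searchsorted(thresholds, x)


# Demo on synthetic Gaussian data (stand-in for KV-cache activations).
rng = np.random.default_rng(0)
x = rng.normal(size=8192).astype(np.float32)
levels, idx = lloyd_max_quantizer(x, n_levels=8)
xq = levels[idx]  # dequantized reconstruction
cos = float(x @ xq / (np.linalg.norm(x) * np.linalg.norm(xq)))
```

Storing 3-bit indices plus an 8-entry codebook in place of fp16 values is where the ~5x compression comes from; the quantizer design itself is offline and cheap.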