nvfp4

Here are 21 public repositories matching this topic...

NVlabs / Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

reinforcement-learning transformers pytorch diffusion dit video-generation sana text-to-video linear-transformer text-to-image-generation system-algorithm-deisgn nvfp4

Updated Apr 14, 2026
Python

intel / auto-round

One of the SOTA quantization algorithms for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

transformers rounding quantization int4 llms vllm gguf vlms sglang mxfp4 nvfp4

Updated Apr 14, 2026
Python

BenChaliah / NVFP4-on-4090-vLLM

AdaLLM is an NVFP4-first inference runtime for Ada Lovelace (RTX 4090) with FP8 KV cache and custom decode kernels. This repo targets NVFP4 weights and keeps the entire decode path in FP8.

gpu-acceleration gpu-computing inference-engine nvidia-gpu llm nvfp4

Updated Feb 15, 2026
Python

taishan1994 / LLM-Quantization

Notes and summaries on LLM quantization.

quantization llm gptq quarot qwen3 nvfp4

Updated Jan 8, 2026
Python

ChiefNakor / comfyui-blackwell-docker

A production-ready Docker setup for ComfyUI that unlocks the full potential of NVIDIA Blackwell GPUs (RTX 50 series) through 4-bit quantization with NVFP4.

docker pytorch nvidia image-generation nvidia-cuda ai-art stable-diffusion comfyui flux-ai nvidia-blackwell nvfp4

Updated Jan 28, 2026
Dockerfile

actypedef / ARCQuant

[ACL 2026 Main] Code for the paper "ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs"

quantization mixed-precision blackwell llm llm-inference microscaling nvfp4

Updated Apr 7, 2026
Cuda

waybarrios / dgx-spark-finetune-llm

LLM fine-tuning with LoRA + NVFP4/MXFP8 on NVIDIA DGX Spark (Blackwell GB10)

deep-learning pytorch nvidia lora quantization fine-tuning blackwell llm nvfp4 dgx-spark transformer-engine mxfp8

Updated Dec 22, 2025
Python

sayakpaul / diffusers-blackwell-quants

Easy recipes to reduce the latency of Flux, QwenImage, and LTX-2 with NVFP4 and MXFP8 on Blackwell.

pytorch image-gen diffusers video-gen torchao blackwell-gpu nvfp4 mxfp8

Updated Apr 10, 2026
Python

AEON-7 / vllm-dflash

DFlash vLLM for DGX Spark — Plug & Play Block-Diffusion Speculative Decoding

docker inference nvidia blackwell llm vllm qwen speculative-decoding block-diffusion nvfp4 dgx-spark dflash

Updated Apr 13, 2026
Shell

LianHe-BI / Blackwell-optimized-llama.cpp-Docker-image

Blackwell-optimized llama.cpp Docker image – works on all NVIDIA GPUs, but tuned for RTX 50 series. Built from scratch with CUDA 12.8, sm_120, NVFP4-ready. 250+ tok/s on 4B F16. Includes llama-chat script.

docker cpp docker-image cuda python3 pytorch nvidia quantization performance-optimization ready-to-use llm llamacpp rtx-50-series nvfp4 sm-120

Updated Mar 28, 2026

Navi-AI-Lab / nvllm

(Experimental) A high-throughput and memory-efficient inference and serving engine for LLMs optimized for GB10 homelabs

nvidia cuda-kernels cutlass local-inference vllm llm-inference qwen paged-attention self-hosted-ai gb10 sm120 nvfp4 dgx-spark fp4-quantization attention-kernel fp8-kv-cache

Updated Apr 14, 2026
Python

Sggin1 / DGX-SPARK

DGX Spark research and tests - containers, benchmarks, and investigation notes for running models on GB10 (SM 12.1)

aarch64 blackwell kv-cache vllm nvfp4 dgx-spark mamba-ssm sm121 turboquant

Updated Apr 12, 2026
Python

CodeBarrie / WanGP-Pinokio-RTX50XX-Upgrade

WanGP v10.61 RTX 50XX Pinokio Upgrade - Python 3.11, PyTorch 2.10, CUDA 13.0, NVFP4 kernels. Copy files to wan.git folder, Reset + Install + Update

cuda pytorch ai-video pinokio nvfp4 wangp rtx50xx

Updated Feb 4, 2026
JavaScript

AEON-7 / supergemma4-26b-abliterated-multimodal-nvfp4

NVFP4 AWQ Full quantization of SuperGemma4-26B-Abliterated-Multimodal for Blackwell GPUs — pre-built vLLM container + patches included

moe quantization multimodal blackwell awq llm vllm nvfp4 dgx-spark gemma4 modelopt

Updated Apr 13, 2026
Python

MoHussein197 / dgx-spark-finetune-llm

🔧 Fine-tune large language models efficiently on NVIDIA DGX Spark with LoRA adapters and optimized quantization for high performance.

deep-learning pytorch nvidia lora quantization fine-tuning blackwell llm nvfp4 dgx-spark transformer-engine mxfp8

Updated Apr 14, 2026
Python

AEON-7 / Gemma-4-E4B-DECKARD-HERETIC-Uncensored-NVFP4

EAGLE E4B speculative decoding drafter for Gemma 4 31B DECKARD HERETIC Uncensored NVFP4 — optimized for NVIDIA DGX Spark

eagle drafter blackwell awq vllm speculative-decoding nvfp4 dgx-spark gemma4 modelopt

Updated Apr 13, 2026

thupalo / tensorrt-on-dgx-spark

Deploy Nemotron 3 Nano 30B on NVIDIA DGX Spark using TensorRT-LLM (Blackwell GB10, NVFP4 quantization, OpenAI-compatible API)

docker inference aarch64 mamba mixture-of-experts blackwell local-llm tensorrt-llm nemotron openai-compatible nvfp4 nvidia-dgx-spark

Updated Mar 22, 2026
Shell

shettysach / nvfp4_competition_cutedsl_solns

My solutions for the GPUMODE NVFP4 competition, written in CuTe DSL

nvfp4 cute-dsl

Updated Feb 28, 2026
Python

nikhilj202 / comfyui-blackwell-docker

🚀 Accelerate image generation with ComfyUI's Docker for NVIDIA Blackwell GPUs, optimizing speed and memory usage through NVFP4 support.

docker pytorch nvidia image-generation nvidia-cuda ai-art stable-diffusion comfyui flux-ai nvidia-blackwell nvfp4

Updated Apr 14, 2026
HTML

PrimitiveContext / blackwell

Production LLM deployment specs for NVIDIA Blackwell GPUs (RTX Pro 6000, DGX Spark). Includes vLLM configurations, benchmarks, load balancer, and throughput calculators for NVFP4/FP8/MoE models.

benchmark nvidia moe blackwell vllm sglang mxfp4 nvfp4 dgx-spark rtx-pro-6000 msi-edgexpert

Updated Mar 25, 2026
Python
