High-performance CUDA implementation of the Muon optimizer for LLM training. Features Newton-Schulz polar decomposition, cuBLAS acceleration, and a transpose optimization yielding 8x FLOP savings on transformer FFN layers. Benchmarked on an NVIDIA A100 with Llama 3.1 8B-shaped FFN weight matrices (4096×11008).
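The core of Muon's update is orthogonalizing the gradient via an iterative Newton-Schulz approximation of the polar decomposition. The sketch below is a hypothetical NumPy illustration, not this repo's CUDA kernel: the quintic coefficients (3.4445, -4.7750, 2.0315) follow the widely circulated Muon reference implementation, and the transpose trick shown is the standard way to keep the Gram matrix at the smaller dimension (4096×4096 rather than 11008×11008 for an FFN weight), which is the source of the FLOP savings the description mentions.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Quintic Newton-Schulz iteration: approximates the orthogonal factor
    # of the polar decomposition of G. Coefficients are taken from the
    # public Muon reference implementation (assumption about this repo).
    a, b, c = 3.4445, -4.7750, 2.0315
    # Transpose so rows <= cols: X @ X.T is then the *smaller* Gram matrix,
    # which is where the FLOP savings on wide FFN weights come from.
    transposed = G.shape[0] > G.shape[1]
    X = G.T if transposed else G
    # Frobenius norm upper-bounds the spectral norm, so this puts all
    # singular values of X into [0, 1], where the iteration converges.
    X = X / (np.linalg.norm(X) + eps)
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

After a few steps the singular values cluster near 1, so the result acts as an (approximately) orthogonal replacement for the raw gradient.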
Few-Shot Adaptation for Vision-Language Models. Implements Base-to-Novel generalization on CLIP using LoRA, LP++, and the Muon optimizer to enhance performance on the Oxford Flowers-102 dataset.
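LoRA adapts a frozen weight matrix by training only a low-rank correction. The class below is a minimal NumPy sketch under stated assumptions (the rank `r`, scaling `alpha`, and zero-initialized `B` follow common LoRA practice; it is not this repo's CLIP integration): the frozen weight `W` is augmented with `(alpha / r) * B @ A`, so only `A` and `B` carry trainable parameters during few-shot adaptation.

```python
import numpy as np

class LoRALinear:
    # Hypothetical minimal LoRA layer: y = x @ (W + scale * B @ A).T,
    # where W stays frozen and only the low-rank factors A, B are trained.
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                          # frozen (out, in)
        self.A = rng.standard_normal((r, W.shape[1])) * 0.01
        self.B = np.zeros((W.shape[0], r))                  # zero init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained CLIP projection exactly, which keeps early few-shot training stable.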