- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
- Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
- DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models