Notebooks and assignment implementations from my study of CS336, Spring 2026!
“And in the end, the love you take is equal to the love you make.” —— The Beatles
I'm mungeryang, a master's student at the University of Chinese Academy of Sciences (UCAS-iie). In this repo, I have open-sourced all of my study notes, assignment implementations, and results.
Course homepage: https://cs336.stanford.edu/
For illustrated LLM architecture diagrams, see: Hand-Drawn-LLM
References: CS336 lecture notes, 李博杰's 大模型面试题 200 问, 百面大模型, and 大模型技术30讲
⭐️⭐️⭐️ A curated set of 50 questions
P.S. With limited time and energy, I have only compiled the 50 questions that I personally consider the most classic. All questions and answers are open-sourced; anyone interested is welcome to fork and extend them~ 👏👏👏
- Implement all of the components (tokenizer, model architecture, optimizer) necessary to train a standard Transformer language model
- Train a minimal language model
| Assignment 1 | Status | Link |
|---|---|---|
| train_bpe | ✅ | BPE Implementation |
| BPETokenizer | ✅ | Tiny_BPETokenizer Class Implementation |
| Linear | ✅ | Linear Class |
| Embedding | ✅ | Embedding Class |
| RMSNorm | ✅ | RMSNorm |
| SwiGLU | ✅ | SwiGLU FFN |
| RoPE | ✅ | RoPE Class |
| softmax | ✅ | softmax function |
| attention | ✅ | Scaled_Dot_Attn |
| multi-head attn | ✅ | MultiHeadAttn Class |
| LM block | ✅ | Transformer Block |
| cross-entropy | ✅ | train function |
| AdamW | ✅ | AdamW optimizer |
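
As a taste of what the table above covers, here is a minimal sketch of the `softmax` and scaled dot-product attention pieces; the function names, shapes, and mask convention are illustrative assumptions, not necessarily this repo's exact API:

```python
import math

import torch

def softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the max along `dim` before exponentiating for numerical stability.
    x = x - x.max(dim=dim, keepdim=True).values
    exp = torch.exp(x)
    return exp / exp.sum(dim=dim, keepdim=True)

def scaled_dot_product_attention(
    q: torch.Tensor,                   # (..., seq_len, d_k)
    k: torch.Tensor,                   # (..., seq_len, d_k)
    v: torch.Tensor,                   # (..., seq_len, d_v)
    mask: torch.Tensor | None = None,  # bool, True where attending is allowed
) -> torch.Tensor:
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    return softmax(scores, dim=-1) @ v
```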
- Profile and benchmark the model and layers from Assignment 1 using advanced tools; optimize attention with your own Triton implementation of FlashAttention-2 (a minimal timing sketch follows the table below)
- Build a memory-efficient, distributed version of the Assignment 1 model training code
| Assignment 2 | Status | Link |
|---|---|---|
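
One way to benchmark a layer from Assignment 1 is to time it with CUDA events; the `benchmark_ms` helper below is a hypothetical sketch, not part of the assignment's required API:

```python
import torch

def benchmark_ms(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    """Mean wall-clock milliseconds per call of fn(*args) on the current GPU."""
    for _ in range(warmup):       # warm up kernels and autotuning caches
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()      # wait until all queued kernels finish
    return start.elapsed_time(end) / iters

# Example: time PyTorch's fused attention as a baseline for a Triton kernel.
q = k = v = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
print(benchmark_ms(torch.nn.functional.scaled_dot_product_attention, q, k, v))
```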
- Understand the function of each component of the Transformer
- Query a training API to fit a scaling law and use it to project performance at larger scale (a curve-fitting sketch follows this list)
- Convert raw Common Crawl dumps into usable pretraining data
- Perform filtering and deduplication to improve model performance (a MinHash sketch follows this list)
- Apply supervised finetuning and reinforcement learning to train LMs to reason when solving math problems
- Optional Part 2: implement and apply safety alignment methods such as DPO (a DPO loss sketch follows this list)
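
For the scaling-law bullet above, the simplest fit assumes loss follows a power law in compute and does linear regression in log-log space; the numbers below are made-up placeholders, not real measurements:

```python
import numpy as np

# Hypothetical (compute budget in FLOPs, final loss) pairs from small runs.
C = np.array([1e17, 3e17, 1e18, 3e18, 1e19])
L = np.array([3.90, 3.40, 3.00, 2.65, 2.35])

# Assume L ≈ a * C**(-b); then log L = log a - b * log C is a straight line.
slope, intercept = np.polyfit(np.log(C), np.log(L), deg=1)
a, b = np.exp(intercept), -slope
print(f"L ≈ {a:.1f} * C^(-{b:.3f}); projected L(1e20 FLOPs) = {a * 1e20 ** (-b):.2f}")
```

For the filtering-and-deduplication bullet, a common approach is MinHash over word shingles; this stdlib-only sketch (function names and parameters are my own choices) estimates Jaccard similarity between documents so near-duplicates can be dropped:

```python
import hashlib

def shingles(text: str, n: int = 5) -> set[str]:
    """Word n-grams ('shingles') used as a document's feature set."""
    words = text.lower().split()
    return {" ".join(words[i : i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash(features: set[str], num_hashes: int = 64) -> list[int]:
    """For each of num_hashes seeded hash functions, keep the minimum value."""
    sig = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for f in features
        ))
    return sig

def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash(shingles("the quick brown fox jumps over the lazy dog and runs away"))
b = minhash(shingles("the quick brown fox jumps over the lazy dog and walks away"))
print(estimated_jaccard(a, b))  # high values flag near-duplicate pairs
```

And for the optional DPO part, the loss itself is short; a minimal sketch assuming you already have per-sequence log-probabilities from the policy and a frozen reference model (tensor names are my own):

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logp: torch.Tensor,    # log p_theta(y_w | x), shape (batch,)
    policy_rejected_logp: torch.Tensor,  # log p_theta(y_l | x)
    ref_chosen_logp: torch.Tensor,       # log p_ref(y_w | x), frozen reference
    ref_rejected_logp: torch.Tensor,     # log p_ref(y_l | x)
    beta: float = 0.1,
) -> torch.Tensor:
    # DPO pushes up the policy's preference margin for the chosen response,
    # measured relative to the reference model, through a logistic loss.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```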

