Replacing LayerNorm with Dynamic Tanh (DyT) in DistilGPT2 + LoRA, evaluated on RE-WILD, Alpaca, and ShareGPT.
deep-learning pytorch transformer research-project lora pythia fine-tuning peft huggingface hpml distilgpt2 dyt dynamic-tanh rewild
Updated Jun 3, 2026 · Jupyter Notebook
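For readers unfamiliar with DyT, here is a minimal PyTorch sketch of the idea. It is not taken from this repository. DyT, as proposed in "Transformers without Normalization", replaces a normalization layer with the elementwise map y = γ · tanh(α · x) + β, where α is a learnable scalar and γ, β are per-channel parameters. It computes no mean or variance statistics. The names `DyT` and `replace_layernorm_with_dyt`, the `alpha_init=0.5` default, and the choice to copy each pretrained LayerNorm's weight and bias into γ and β are all assumptions made for illustration. The Hugging Face GPT-2 implementation, which DistilGPT2 shares, builds its blocks from plain `nn.LayerNorm` modules, so a generic recursive swap like this one should reach every normalization layer.

```python
import torch
import torch.nn as nn


class DyT(nn.Module):
    """Dynamic Tanh: y = weight * tanh(alpha * x) + bias, applied elementwise (no statistics)."""

    def __init__(self, normalized_shape, alpha_init: float = 0.5):
        super().__init__()
        # Learnable scalar controlling how sharply inputs are squashed by tanh.
        # alpha_init=0.5 is an assumed default, not a value confirmed for this project.
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))
        # Per-channel affine parameters, shaped like LayerNorm's weight/bias.
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.alpha * x) * self.weight + self.bias


def replace_layernorm_with_dyt(module: nn.Module, alpha_init: float = 0.5) -> int:
    """Hypothetical helper: recursively swap every nn.LayerNorm for a DyT.

    Returns the number of layers replaced.
    """
    count = 0
    for name, child in module.named_children():
        if isinstance(child, nn.LayerNorm):
            dyt = DyT(child.normalized_shape, alpha_init)
            # Assumption: start gamma/beta from the pretrained LayerNorm affine parameters.
            if child.elementwise_affine:
                with torch.no_grad():
                    dyt.weight.copy_(child.weight)
                    if child.bias is not None:
                        dyt.bias.copy_(child.bias)
            setattr(module, name, dyt)
            count += 1
        else:
            # Descend into containers (e.g. transformer blocks) to find nested LayerNorms.
            count += replace_layernorm_with_dyt(child, alpha_init)
    return count


if __name__ == "__main__":
    # Toy stand-in for a transformer sub-block; a real run would pass the loaded model instead.
    toy = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16), nn.GELU(), nn.Linear(16, 4))
    replaced = replace_layernorm_with_dyt(toy)
    print(replaced, type(toy[1]).__name__)
```

The new DyT modules are created on the CPU in float32, so move the model to its target device and dtype (e.g. `model.to(device)`) after swapping. Also note that with LoRA the base model's weights are typically frozen, so whether the new α, γ, and β are trained or kept fixed has to be decided explicitly.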