Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 501
KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs Paper • 2604.13226 • Published 25 days ago • 10
Distilling 100B+ Models 40x Faster with TRL • TRL distillation for 100B+ teachers, 40x faster • Featured • 77