The Smol Training Playbook 📚: The secrets to building world-class LLMs • 2.69k
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation Paper • 2406.07529 • Published Jun 11, 2024
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29 • 221
Article: Finally, a Replacement for BERT: Introducing ModernBERT • Published Dec 19, 2024 • 713
Qwen2 Collection: Qwen2 language models, including pretrained and instruction-tuned variants in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Jul 21 • 374