Running Featured 65 QED-Nano: Teaching a Tiny Model to Prove Hard Theorems - Who needs 1T parameters? Olympiad proofs with a 4B model
Running 56 Bringing paper to life: A modern template for scientific writing - Explore and download a modern scientific paper template
Running 3.74k The Ultra-Scale Playbook - The ultimate guide to training LLMs on large GPU clusters
Running 89 Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks - Evaluate multilingual models using FineTasks
Running 133 TxT360: Trillion Extracted Text - Explore and download the TxT360 LLM pre-training dataset