CohereLabs/cohere-transcribe-03-2026 Automatic Speech Recognition • Updated 3 days ago • 253k • 933
Running on CPU Upgrade • 232 • The Synthetic Data Playbook: Generating Trillions of the Finest Tokens • Explore synthetic data experiments on a virtual bookshelf
Running • 224 • FineVision: Open Data is All You Need • A new open-source dataset for training VLMs
Running • 92 • Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks • Evaluate multilingual models using FineTasks
Running • Featured • 1.34k • FineWeb: decanting the web for the finest text data at scale • Explore and download the FineWeb web-text dataset
Running on CPU Upgrade • Featured • 3.15k • The Smol Training Playbook • The secrets to building world-class LLMs
Running • 80 • Maintain the unmaintainable • Explore the complex relationships between 400+ machine learning models