Article • Tokenization in Transformers v5: Simpler, Clearer, and More Modular • Dec 18, 2025 • 123
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 77
Article • BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡ • Jul 9, 2024 • 78
Article • Train 400x faster Static Embedding Models with Sentence Transformers • Jan 15, 2025 • 228
Article • DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • Feb 7, 2025 • 280
Article • Efficient LLM Pretraining: Packed Sequences and Masked Attention • Oct 7, 2024 • 69