view article Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 411
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 262
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows Paper • 2103.14030 • Published Mar 25, 2021 • 5
Enable Language Models to Implicitly Learn Self-Improvement From Data Paper • 2310.00898 • Published Oct 2, 2023 • 24
Toolformer: Language Models Can Teach Themselves to Use Tools Paper • 2302.04761 • Published Feb 9, 2023 • 12