Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 245
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29, 2025 • 69
view article Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 611
view article Article The N Implementation Details of RLHF with PPO +1 vwxyzjn, tianlinliu0121, lvwerra • Oct 24, 2023 • 72
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published Jan 11, 2025 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1, 2025 • 110
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024 • 17
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13, 2024 • 69
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated Feb 10, 2025 • 81
view article Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models +1 loubnabnl, anton-l, davanstrien • Mar 20, 2024 • 113
view article Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate +2 mirinflim, aldopareja, muellerzr, stas • Jun 13, 2024 • 62