AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning Paper • 2606.24526 • Published 4 days ago • 2
TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization Paper • 2601.16480 • Published Jan 23 • 50
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development Paper • 2601.11077 • Published Jan 16 • 67
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published Jan 4 • 19
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published Dec 4, 2025 • 85
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset Paper • 2507.03483 • Published Jul 4, 2025 • 24
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 24
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published Mar 2, 2025 • 13
Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial open-r1 • Jan 31, 2025 • 51
CritiQ: Mining Data Quality Criteria from Human Preferences Paper • 2502.19279 • Published Feb 26, 2025 • 11
Qwen2.5 Collection Qwen2.5 language models, both pretrained and instruction-tuned, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 43 items • Updated Mar 2 • 728
Article Saving Memory Using Padding-Free Transformer Layers during Finetuning mayank-mishra • Jun 11, 2024 • 21
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Dec 6, 2024 • 975
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation Paper • 2402.13013 • Published Feb 20, 2024 • 1