RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation Paper • 2601.08430 • Published 15 days ago • 57
Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering Paper • 2601.10402 • Published 13 days ago • 36
Urban Socio-Semantic Segmentation with Vision-Language Reasoning Paper • 2601.10477 • Published 13 days ago • 154
Evaluating Parameter Efficient Methods for RLVR Paper • 2512.23165 • Published about 1 month ago • 26
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 212
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs Paper • 2512.17008 • Published Dec 18, 2025 • 11
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers Paper • 2512.17351 • Published Dec 19, 2025 • 27
Are We on the Right Way to Assessing LLM-as-a-Judge? Paper • 2512.16041 • Published Dec 17, 2025 • 34
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows Paper • 2512.05150 • Published Dec 3, 2025 • 75
Guided Self-Evolving LLMs with Minimal Human Supervision Paper • 2512.02472 • Published Dec 2, 2025 • 53
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 254 • 6
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models Paper • 2511.10017 • Published Nov 13, 2025 • 7
MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples Paper • 2511.10047 • Published Nov 13, 2025 • 2
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents Paper • 2511.07685 • Published Nov 10, 2025 • 10
Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation Paper • 2511.10547 • Published Nov 13, 2025 • 5