GR00T N1: An Open Foundation Model for Generalist Humanoid Robots Paper • 2503.14734 • Published Mar 18, 2025 • 7
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 157
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19, 2025 • 89
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 41
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25, 2025 • 33
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving Paper • 2507.17596 • Published Jul 23, 2025 • 7
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement Paper • 2507.18742 • Published Jul 24, 2025 • 6
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI Paper • 2507.10510 • Published Jul 14, 2025 • 5
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning Paper • 2507.19457 • Published Jul 25, 2025 • 30
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report Paper • 2507.16534 • Published Jul 22, 2025 • 9
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17, 2025 • 263
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 253
GUI-G²: Gaussian Reward Modeling for GUI Grounding Paper • 2507.15846 • Published Jul 21, 2025 • 135
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning Paper • 2507.16784 • Published Jul 22, 2025 • 123
T-LoRA: Single Image Diffusion Model Customization Without Overfitting Paper • 2507.05964 • Published Jul 8, 2025 • 121
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19, 2025 • 136
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Paper • 2410.10813 • Published Oct 14, 2024 • 16
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? Paper • 2506.11928 • Published Jun 13, 2025 • 25
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents Paper • 2505.22954 • Published May 29, 2025 • 15
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis Paper • 2505.11581 • Published May 16, 2025 • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12, 2024 • 128
Gorilla: Large Language Model Connected with Massive APIs Paper • 2305.15334 • Published May 24, 2023 • 6
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace Paper • 2303.17580 • Published Mar 30, 2023 • 15
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework Paper • 2308.08155 • Published Aug 16, 2023 • 11
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025 • 37
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 110
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 191
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published Apr 3, 2025 • 58
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues Paper • 2501.10836 • Published Jan 18, 2025 • 1
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published Nov 4, 2024 • 37
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents Paper • 2401.00812 • Published Jan 1, 2024 • 12
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published Oct 28, 2025 • 31
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs Paper • 2509.18058 • Published Sep 22, 2025 • 12
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs Paper • 2508.10029 • Published Aug 8, 2025
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs Paper • 2508.10031 • Published Aug 9, 2025
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs Paper • 2508.20333 • Published Aug 28, 2025
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models Paper • 2509.17938 • Published Sep 22, 2025 • 4
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness Paper • 2509.14297 • Published Sep 17, 2025
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 513
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Paper • 2412.21199 • Published Dec 30, 2024 • 13
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization Paper • 2510.24592 • Published Oct 28, 2025 • 17
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 78
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance Paper • 2506.03828 • Published Jun 4, 2025 • 20
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 89
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 258
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 3 days ago • 61
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs Paper • 2604.10539 • Published 4 days ago • 1
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models Paper • 2604.04385 • Published 3 days ago
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation Paper • 2604.09212 • Published 6 days ago • 1
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation Paper • 2604.11290 • Published 3 days ago • 1
CocoaBench: Evaluating Unified Digital Agents in the Wild Paper • 2604.11201 • Published 3 days ago • 29
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models Paper • 2604.10949 • Published 3 days ago • 36
Zero-shot World Models Are Developmentally Efficient Learners Paper • 2604.10333 • Published 5 days ago • 6
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models Paper • 2604.02340 • Published 5 days ago • 6
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Paper • 2604.11778 • Published 3 days ago • 6
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding Paper • 2604.09557 • Published Feb 10 • 9
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models Paper • 2604.09459 • Published 3 days ago • 9
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator Paper • 2604.08121 • Published 7 days ago • 39
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation Paper • 2604.09132 • Published 6 days ago • 46
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation Paper • 2604.10098 • Published 5 days ago • 66
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization Paper • 2604.11259 • Published 3 days ago • 10
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks Paper • 2604.11753 • Published 3 days ago • 12
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory Paper • 2604.11544 • Published 3 days ago • 1
TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction Paper • 2604.08921 • Published 6 days ago • 2
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences? Paper • 2604.10718 • Published 4 days ago • 2
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain Paper • 2604.10425 • Published 4 days ago • 2
Learning Long-term Motion Embeddings for Efficient Kinematics Generation Paper • 2604.11737 • Published 3 days ago • 4
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration Paper • 2604.11446 • Published 3 days ago • 3
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context Paper • 2604.11716 • Published 3 days ago • 3
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind Paper • 2604.11666 • Published 3 days ago • 3
Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series Paper • 2604.10799 • Published 4 days ago • 4
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach Paper • 2604.11547 • Published 3 days ago • 4
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training Paper • 2604.10784 • Published 4 days ago • 5
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting Paper • 2604.10688 • Published 4 days ago • 6
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation Paper • 2604.10030 • Published 5 days ago • 12
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators Paper • 2604.11805 • Published 3 days ago • 13
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs Paper • 2604.10480 • Published 4 days ago • 16
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music Paper • 2604.10905 • Published 3 days ago • 22
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Paper • 2604.11297 • Published 3 days ago • 88
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation Paper • 2604.08570 • Published 22 days ago • 115