AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security Paper • 2605.29801 • Published May 28 • 144
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening Paper • 2605.19597 • Published May 19 • 21
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 165
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published May 4 • 140
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models Paper • 2604.10866 • Published Apr 13 • 68
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping Paper • 2604.11297 • Published Apr 13 • 144
GEMS: Agent-Native Multimodal Generation with Memory and Skills Paper • 2603.28088 • Published Mar 30 • 87
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Paper • 2406.16554 • Published Jun 24, 2024 • 1
Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA Paper • 2509.17743 • Published Sep 22, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published Jan 4 • 19
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training Paper • 2502.04066 • Published Feb 6, 2025