SkillNet: Create, Evaluate, and Connect AI Skills Paper • 2603.04448 • Published 15 days ago • 78 • 5
From Data to Behavior: Predicting Unintended Model Behaviors Before Training Paper • 2602.04735 • Published Feb 4 • 15 • 4
InnoGym: Benchmarking the Innovation Potential of AI Agents Paper • 2512.01822 • Published Dec 1, 2025 • 36 • 2
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published Oct 21, 2025 • 114 • 3
Executable Knowledge Graphs for Replicating AI Research Paper • 2510.17795 • Published Oct 20, 2025 • 15 • 2
OceanGym: A Benchmark Environment for Underwater Embodied Agents Paper • 2509.26536 • Published Sep 30, 2025 • 36 • 2
Towards Personalized Deep Research: Benchmarks and Evaluations Paper • 2509.25106 • Published Sep 29, 2025 • 30 • 1
Automating Steering for Safe Multimodal Large Language Models Paper • 2507.13255 • Published Jul 17, 2025 • 4 • 1
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25, 2025 • 10 • 1
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality Paper • 2506.19807 • Published Jun 24, 2025 • 7 • 1
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study Paper • 2506.19794 • Published Jun 24, 2025 • 8 • 1
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science Paper • 2506.10974 • Published Jun 12, 2025 • 19 • 2
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark Paper • 2506.10960 • Published Jun 12, 2025 • 12 • 2
Spatial Knowledge Graph-Guided Multimodal Synthesis Paper • 2505.22633 • Published May 28, 2025 • 4 • 1
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms Paper • 2505.20322 • Published May 23, 2025 • 14 • 2