FrenchBench Evaluation datasets Collection These datasets are used to evaluate models' French-language performance with https://github.com/EleutherAI/lm-evaluation-harness (from the CroissantLLM paper) • 11 items • Updated Jun 7, 2024 • 8
AutoResearch-RL: Perpetual Self-Evaluating Reinforcement Learning Agents for Autonomous Neural Architecture Discovery Paper • 2603.07300 • Published 8 days ago • 15
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 18 days ago • 197
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 16 days ago • 87
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning Paper • 2603.00889 • Published 15 days ago • 50
OpenAutoNLU: Open Source AutoML Library for NLU Paper • 2603.01824 • Published 14 days ago • 47
Heterogeneous Agent Collaborative Reinforcement Learning Paper • 2603.02604 • Published 13 days ago • 174
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval Paper • 2603.04743 • Published 11 days ago • 47
SkillNet: Create, Evaluate, and Connect AI Skills Paper • 2603.04448 • Published 18 days ago • 80
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published 21 days ago • 55
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs Paper • 2508.06601 • Published Aug 8, 2025 • 7
🎯 Liquid Nanos Collection Library of task-specific models: https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices • 26 items • Updated 1 day ago • 110