Human Psychometric Questionnaires Mischaracterize LLM Behavior Paper • 2509.10078 • Published May 29 • 36
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research Paper • 2606.09730 • Published 22 days ago • 54
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 29 days ago • 57
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 29 days ago • 59
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation Paper • 2605.31264 • Published May 29 • 120
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses Paper • 2606.02373 • Published 29 days ago • 59
KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks Paper • 2606.03458 • Published 28 days ago • 67
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval Paper • 2604.18584 • Published Apr 20 • 15
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published Apr 29 • 47
A Mathematical Framework for Custom Reward Functions in Job Application Evaluation using Reinforcement Learning Paper • 2511.16073 • Published Nov 20, 2025 • 1
Article Complete Guide: Training and Inference with π₀.₅ (pi05) on Custom Datasets Tonic • Dec 13, 2025 • 5
PromptRL: Prompt Matters in RL for Flow-Based Image Generation Paper • 2602.01382 • Published Feb 1 • 10
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models Paper • 2412.06071 • Published Dec 8, 2024 • 11
fev-bench: A Realistic Benchmark for Time Series Forecasting Paper • 2509.26468 • Published Sep 30, 2025 • 4
Article Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 16 • 73
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models Paper • 2604.09459 • Published Apr 13 • 14