DailyReport: An Open-ended Benchmark for Evaluating Search Agents on Daily Search Tasks Paper • 2606.12871 • Published 14 days ago • 10
Stream-T1: Test-Time Scaling for Streaming Video Generation Paper • 2605.04461 • Published May 6 • 109
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation Paper • 2605.03849 • Published May 5 • 129
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 328
A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces Paper • 2602.03442 • Published Feb 3 • 21
FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents Paper • 2602.01566 • Published Feb 2 • 52
Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles Paper • 2602.01590 • Published Feb 2 • 33
WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora Paper • 2602.02053 • Published Feb 2 • 41