ComfyUI-R1: Exploring Reasoning Models for Workflow Generation • Paper • 2506.09790 • Published Jun 11, 2025 • 53
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance • Paper • 2506.06444 • Published Jun 6, 2025 • 73
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents • Paper • 2506.11763 • Published Jun 13, 2025 • 73
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research • Paper • 2502.04644 • Published Feb 7, 2025 • 4
Deep Research Agents: A Systematic Examination And Roadmap • Paper • 2506.18096 • Published Jun 22, 2025 • 3
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers • Paper • 2507.02694 • Published Jul 3, 2025 • 19
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky • Paper • 2507.03336 • Published Jul 4, 2025 • 7
PresentAgent: Multimodal Agent for Presentation Video Generation • Paper • 2507.04036 • Published Jul 5, 2025 • 11
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs • Paper • 2507.09477 • Published Jul 13, 2025 • 88
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research • Paper • 2507.13300 • Published Jul 17, 2025 • 20
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off • Paper • 2508.04825 • Published Aug 6, 2025 • 60
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges • Paper • 2508.18076 • Published Aug 25, 2025 • 6
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers • Paper • 2508.21148 • Published Aug 28, 2025 • 140
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? • Paper • 2510.02209 • Published Oct 2, 2025 • 54
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation • Paper • 2510.09116 • Published Oct 10, 2025 • 96
Back to Basics: Let Denoising Generative Models Denoise • Paper • 2511.13720 • Published Nov 17, 2025 • 69
Rethinking Training Dynamics in Scale-wise Autoregressive Generation • Paper • 2512.06421 • Published Dec 6, 2025 • 7
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics • Paper • 2601.02075 • Published 25 days ago • 8
Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models • Paper • 2601.01321 • Published 26 days ago • 18
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility • Paper • 2601.17027 • Published 13 days ago • 39
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution • Paper • 2601.20380 • Published 2 days ago • 6