02. Prompt engineering - a dram023 Collection

updated Jun 19

Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

Paper • 2605.29648 • Published May 28 • 10
Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention

Paper • 2605.29548 • Published May 28 • 13
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

Paper • 2605.29861 • Published May 28 • 16
COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Paper • 2605.31264 • Published May 29 • 124
dMoE: dLLMs with Learnable Block Experts

Paper • 2605.30876 • Published May 29 • 38
On the Geometry of On-Policy Distillation

Paper • 2606.07082 • Published Jun 5 • 75
SWE-Explore: Benchmarking How Coding Agents Explore Repositories

Paper • 2606.07297 • Published Jun 5 • 123
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders

Paper • 2606.07473 • Published Jun 5 • 15
DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Paper • 2606.07299 • Published Jun 5 • 7
Why Muon Outperforms Adam: A Curvature Perspective

Paper • 2606.04662 • Published Jun 3 • 10
Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Paper • 2606.09365 • Published Jun 8 • 3
Kwai Keye-VL-2.0 Technical Report

Paper • 2606.10651 • Published Jun 9 • 194
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution

Paper • 2606.10917 • Published Jun 9 • 77
Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

Paper • 2606.12476 • Published Jun 10 • 2
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Paper • 2605.30789 • Published Jun 2 • 26
Kairos: A Native World Model Stack for Physical AI

Paper • 2606.16533 • Published Jun 16 • 42
SciOrch: Learning to Orchestrate Expert LLMs for Solving Frontier Multimodal Scientific Reasoning Tasks

Paper • 2606.15872 • Published Jun 14 • 12
CEO-Bench: Can Agents Play the Long Game?

Paper • 2606.18543 • Published Jun 16 • 8