OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification Paper • 2606.01476 • Published 29 days ago • 8
DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents Paper • 2605.04808 • Published May 6 • 20
Synthetic Sandbox for Training Machine Learning Engineering Agents Paper • 2604.04872 • Published Apr 6 • 14
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding Paper • 2412.06474 • Published Dec 9, 2024
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment Paper • 2501.09620 • Published Jan 16, 2025
S'MoRE: Structural Mixture of Residual Experts for LLM Fine-tuning Paper • 2504.06426 • Published Apr 8, 2025 • 2
CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning Paper • 2503.19900 • Published Mar 25, 2025
RecoWorld: Building Simulated Environments for Agentic Recommender Systems Paper • 2509.10397 • Published Sep 12, 2025 • 8
StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding Paper • 2508.15717 • Published Aug 21, 2025 • 1
Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning Paper • 2510.05251 • Published Oct 6, 2025 • 8