One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications Paper • 2606.25621 • Published 2 days ago • 12
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 10 days ago • 61
Cosmos3 Collection Omnimodal World Models for Physical AI • 16 items • Updated about 2 hours ago • 131
SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning Paper • 2606.13673 • Published 15 days ago • 106
FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding Paper • 2605.19846 • Published May 20 • 3
Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them Paper • 2606.06361 • Published 22 days ago • 16
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models Paper • 2605.30161 • Published 29 days ago • 60
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 30 days ago • 93
Article Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation nvidia • May 18 • 21
ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks Paper • 2603.27862 • Published Mar 29 • 33
TIPO: Text to Image with Text Presampling for Prompt Optimization Paper • 2411.08127 • Published Nov 12, 2024 • 4
TIPO Collection Text to Image with text presampling for Prompt Optimization • 6 items • Updated Jan 22, 2025 • 6
Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs Paper • 2602.01600 • Published Feb 2 • 21
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 229
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models Paper • 2506.15681 • Published Jun 18, 2025 • 43
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting Paper • 2512.20927 • Published Dec 24, 2025 • 17
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published Jan 14 • 26
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 56