REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation Paper • 2512.19562 • Published 23 days ago • 5
Large-scale Pre-training for Grounded Video Caption Generation Paper • 2503.10781 • Published Mar 13, 2025 • 16
TIM: A Time Interval Machine for Audio-Visual Action Recognition Paper • 2404.05559 • Published Apr 8, 2024