Article From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease • Oct 21, 2022 • 44
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published 6 days ago • 264
Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published 26 days ago • 24
VideoLLaMA2 Collection Optimized VideoLLaMA with improved spatial-temporal modeling and better audio understanding capability • 13 items • Updated Sep 2, 2025 • 20
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models Paper • 2512.02231 • Published Dec 1, 2025 • 9
LASER: Lip Landmark Assisted Speaker Detection for Robustness Paper • 2501.11899 • Published Jan 21, 2025