SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
Paper • 2606.09669 • Published • 46
ShutterMuse: Capture-Time Photography Guidance with MLLMs
Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation