Xiao

Yang1213112131

23 2

AI & ML interests

None yet


Organizations

None yet

upvoted a paper about 9 hours ago

Let RGB Be the Language of Vision

Paper • 2607.12450 • Published 6 days ago • 13

upvoted a paper 2 days ago

UniVR: Thinking in Visual Space for Unified Visual Reasoning

Paper • 2607.12800 • Published 6 days ago • 28

upvoted a paper 3 months ago

Let ViT Speak: Generative Language-Image Pre-training

Paper • 2605.00809 • Published May 1 • 33

upvoted 2 papers 4 months ago

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

Paper • 2603.29664 • Published Mar 31 • 51

ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment

Paper • 2603.23376 • Published Mar 24 • 3

upvoted 2 papers 5 months ago

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Paper • 2602.18422 • Published Feb 20 • 30

VideoWorld 2: Learning Transferable Knowledge from Real-world Videos

Paper • 2602.10102 • Published Feb 10 • 14

upvoted a paper 6 months ago

Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

Paper • 2601.16163 • Published Jan 22 • 15

upvoted 4 papers 7 months ago

ThinkGen: Generalized Thinking for Visual Generation

Paper • 2512.23568 • Published Dec 29, 2025 • 1

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published Dec 16, 2025 • 72

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Paper • 2512.20617 • Published Dec 23, 2025 • 44

StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation

Paper • 2512.09363 • Published Dec 10, 2025 • 74

upvoted a paper 9 months ago

PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling

Paper • 2505.23155 • Published May 29, 2025 • 2

upvoted an article 9 months ago

Cosmos Predict 2.5 & Transfer 2.5: Evolving the World Foundation Models for Physical AI

Article • nvidia • Oct 28, 2025 • 21

upvoted 4 papers 10 months ago

UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models

Paper • 2509.21760 • Published Sep 26, 2025 • 15

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Paper • 2509.09674 • Published Sep 11, 2025 • 81

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Paper • 2509.09174 • Published Sep 11, 2025 • 62

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Paper • 2509.08519 • Published Sep 10, 2025 • 130

upvoted a paper 11 months ago

From Editor to Dense Geometry Estimator

Paper • 2509.04338 • Published Sep 4, 2025 • 96

upvoted a paper about 1 year ago

Whole-Body Conditioned Egocentric Video Prediction

Paper • 2506.21552 • Published Jun 26, 2025 • 11
