Zhongang Cai

caizhongang

http://caizhongang.com/

AI & ML interests

Multimodal, Video Reasoning, Spatial Intelligence, Virtual Humans.

Recent Activity

upvoted a paper 8 days ago

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Paper • 2606.20515 • Published 9 days ago • 39

upvoted a paper 10 days ago

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Paper • 2606.15236 • Published 11 days ago • 21

liked a model 17 days ago

sensenova/SenseNova-U1-8B-MoT-Infographic

Any-to-Any • 18B • Updated May 16 • 627 • 49

upvoted a paper 30 days ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published about 1 month ago • 75

upvoted a paper about 1 month ago

PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects

Paper • 2605.21572 • Published May 20 • 55

liked a dataset about 1 month ago

sensenova/SenseNova-SI-8M

Viewer • Updated about 10 hours ago • 8.17M • 5.41k • 21

authored a paper about 1 month ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published May 12 • 194

upvoted a paper about 1 month ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published May 12 • 194

liked a model about 2 months ago

sensenova/SenseNova-SI-1.3-Qwen3-VL-8B

Image-Text-to-Text • 9B • Updated Apr 16 • 3.81k • 6

upvoted a paper about 2 months ago

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Paper • 2604.28185 • Published Apr 30 • 92

liked 2 models about 2 months ago

sensenova/SenseNova-U1-8B-MoT-SFT

Any-to-Any • 18B • Updated May 15 • 535 • 51

sensenova/SenseNova-U1-8B-MoT

Any-to-Any • 18B • Updated May 15 • 42.2k • 287

upvoted a collection 2 months ago

SenseNova-U1

Collection

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 10 items • Updated 14 days ago • 74

liked a dataset 3 months ago

Video-Reason/VBVR-Bench-Data

Viewer • Updated Apr 1 • 500 • 3.6k • 9

liked a model 3 months ago

sensenova/SenseNova-SI-1.5-InternVL3-8B

Image-Text-to-Text • 8B • Updated May 12 • 663 • 6

upvoted 2 papers 3 months ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 237

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

liked a model 3 months ago

sensenova/SenseNova-SI-1.4-InternVL3-8B

Image-Text-to-Text • 8B • Updated May 12 • 657 • 4

authored a paper 3 months ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published Mar 19 • 42

upvoted a paper 3 months ago