Shusheng Yang PRO

ShushengYang

https://shushengyang.com

AI & ML interests

computer vision, vision language model

Recent Activity

authored a paper 15 days ago

BLIP3o-NEXT: Next Frontier of Native Image Generation

authored a paper 15 days ago

VideoNSA: Native Sparse Attention Scales Video Understanding

authored a paper 15 days ago

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

authored 5 papers 15 days ago

BLIP3o-NEXT: Next Frontier of Native Image Generation

Paper • 2510.15857 • Published Oct 17, 2025 • 26

VideoNSA: Native Sparse Attention Scales Video Understanding

Paper • 2510.02295 • Published Oct 2, 2025 • 10

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

Paper • 2511.04655 • Published Nov 6, 2025 • 10

Cambrian-P: Pose-Grounded Video Understanding

Paper • 2605.22819 • Published May 21

Benchmarking Visual State Tracking in Multimodal Video Understanding

Paper • 2606.03920 • Published 24 days ago • 50

updated a dataset 15 days ago

ShushengYang/VSTAT5S

Viewer • Updated 15 days ago • 500 • 89 • 1

published a dataset 15 days ago

ShushengYang/VSTAT5S

Viewer • Updated 15 days ago • 500 • 89 • 1

updated a dataset 18 days ago

ShushengYang/RealOrFake

Viewer • Updated 18 days ago • 2k • 71

published a dataset 18 days ago

ShushengYang/RealOrFake

Viewer • Updated 18 days ago • 2k • 71

upvoted a paper 22 days ago

Benchmarking Visual State Tracking in Multimodal Video Understanding

Paper • 2606.03920 • Published 24 days ago • 50

updated a dataset 22 days ago

ShushengYang/VSTAT

Viewer • Updated 22 days ago • 1.5k • 81

published a dataset 22 days ago

ShushengYang/VSTAT

Viewer • Updated 22 days ago • 1.5k • 81

updated a dataset 3 months ago

nyu-visionx/VSI-590K-MetaInfo

Updated Apr 3 • 19

published a dataset 3 months ago

nyu-visionx/VSI-590K-MetaInfo

Updated Apr 3 • 19

upvoted a paper 4 months ago

Beyond Language Modeling: An Exploration of Multimodal Pretraining

Paper • 2603.03276 • Published Mar 3 • 107

upvoted a paper 5 months ago

Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55

updated a dataset 5 months ago

nyu-visionx/Cambrian-S-3M

Updated Jan 22 • 32k • 7

upvoted a collection 6 months ago

Cambrian-S-Data

Collection

Data used during Cambrian-S's 4-stage training • 4 items • Updated Feb 27 • 5

updated a model 6 months ago

nyu-visionx/Cambrian-S-3B-S3

3B • Updated Jan 4 • 2

updated a collection 6 months ago

Cambrian-S Models

Collection

18 items • Updated Mar 2 • 8
