🤝 Open to Collab

Sean Yu

yushaohan

3 19 2

https://yushaohan.github.io/

AI & ML interests

None yet

Recent Activity

Organizations

upvoted a paper about 12 hours ago

NormGuard: Reward-Preserving Norm Constraints in Flow-Matching Reinforcement Learning

Paper • 2606.27771 • Published 4 days ago • 3

upvoted an article 6 days ago

NEO-unify: Building Native Multimodal Unified Models End to End

Article • sensenova • Mar 5 • 167

upvoted a collection 8 days ago

V-JEPA 2

Collection

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 225

upvoted a paper 17 days ago

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Paper • 2606.13679 • Published 19 days ago • 82

upvoted a paper about 1 month ago

From Pixels to Words -- Towards Native One-Vision Models at Scale

Paper • 2605.28820 • Published May 27 • 75

upvoted 2 papers about 2 months ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published May 12 • 194

Continuous Latent Diffusion Language Model

Paper • 2605.06548 • Published May 7 • 85

upvoted a collection about 2 months ago

SenseNova-U1

Collection

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 10 items • Updated 17 days ago • 74

upvoted a paper 2 months ago

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Paper • 2604.24763 • Published Apr 27 • 71

commented a paper 2 months ago

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Paper • 2604.21686 • Published Apr 23 • 36

upvoted a paper 2 months ago

Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation

Paper • 2604.10030 • Published Apr 11 • 15

upvoted 3 papers 3 months ago

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published Mar 29 • 148

MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published Mar 23 • 138

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper • 2603.21065 • Published Mar 22 • 78

liked 2 datasets 4 months ago

HankYang428/GENIUS

Updated Mar 24 • 63 • 1

multimodal-reasoning-lab/Zebra-CoT

Viewer • Updated Jan 30 • 160k • 4.87k • 70

upvoted a paper 4 months ago

DeepSight: An All-in-One LM Safety Toolkit

Paper • 2602.12092 • Published Feb 12 • 15

updated 2 models 4 months ago

yushaohan/ProGuard-3B

Image-Text-to-Text • 4B • Updated Feb 15 • 2

yushaohan/ProGuard-7B

Image-Text-to-Text • 8B • Updated Feb 15 • 45

New activity in yushaohan/ProGuard-7B 4 months ago

Add model metadata and link to DeepSight paper

#1 opened 5 months ago by nielsr
