SenseNova-U1 Collection: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 4 items • Updated about 7 hours ago • 43
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond • Paper • 2604.22748 • Published 14 days ago • 224
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding • Paper • 2604.05015 • Published Apr 6 • 235
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer • Paper • 2603.19227 • Published Mar 19 • 42
MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction • Paper • 2603.19231 • Published Mar 19 • 36
Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation • Paper • 2603.16669 • Published Mar 17 • 70
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions • Paper • 2603.15612 • Published Mar 16 • 153
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? • Paper • 2603.03241 • Published Mar 3 • 87
UniT: Unified Multimodal Chain-of-Thought Test-time Scaling • Paper • 2602.12279 • Published Feb 12 • 20
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence • Paper • 2602.08683 • Published Feb 9 • 52
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition • Paper • 2602.08439 • Published Feb 9 • 28
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation • Paper • 2601.22153 • Published Jan 29 • 75
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding • Paper • 2512.19693 • Published Dec 22, 2025 • 68
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework • Paper • 2512.03041 • Published Dec 2, 2025 • 65