Ming Chen

ChenMing-thu14

AI & ML interests

3D Human Pose Estimation

Organizations

None yet

upvoted a paper 5 days ago

Wan-Streamer v0.2: Higher Resolution, Same Latency

Paper • 2607.04443 • Published 10 days ago • 37

upvoted a paper 16 days ago

PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

Paper • 2606.28128 • Published 19 days ago • 51

upvoted a paper 20 days ago

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Paper • 2606.25041 • Published 22 days ago • 116

upvoted 9 papers about 1 month ago

Avatar V: Scaling Video-Reference Avatar Video Generation

Paper • 2606.13872 • Published Jun 11 • 9

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

Paper • 2606.13432 • Published Jun 11 • 113

Echo-Memory: A Controlled Study of Memory in Action World Models

Paper • 2606.09803 • Published Jun 8 • 33

Latent Spatial Memory for Video World Models

Paper • 2606.09828 • Published Jun 8 • 71

AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation

Paper • 2606.03972 • Published Jun 2 • 14

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

Paper • 2606.04527 • Published Jun 3 • 28

Cosmos 3: Omnimodal World Models for Physical AI

Paper • 2606.02800 • Published Jun 1 • 140

StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

Paper • 2605.25659 • Published May 25 • 17

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

Paper • 2605.30351 • Published May 28 • 26

upvoted 5 papers about 2 months ago

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published May 26 • 145

WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation

Paper • 2605.25874 • Published May 25 • 105

Bernini: Latent Semantic Planning for Video Diffusion

Paper • 2605.22344 • Published May 21 • 20

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Paper • 2605.18678 • Published May 18 • 79

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published May 18 • 116

upvoted 3 papers 3 months ago

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Paper • 2604.21686 • Published Apr 23 • 36

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Paper • 2604.19636 • Published Apr 21 • 88

HDR Video Generation via Latent Alignment with Logarithmic Encoding

Paper • 2604.11788 • Published Apr 13 • 14