qiulu

qiulu66


AI & ML interests

None yet


upvoted a paper 2 months ago

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

Paper • 2604.19734 • Published Apr 21 • 33

upvoted 2 papers 4 months ago

Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens

Paper • 2603.19232 • Published Mar 19 • 33

OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens

Paper • 2603.02138 • Published Mar 2 • 151

upvoted a paper 6 months ago

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Paper • 2512.20557 • Published Dec 23, 2025 • 52

upvoted a collection 7 months ago

TimeLens

[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs • 5 items • Updated Feb 24 • 9

upvoted 2 papers 8 months ago

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

Paper • 2511.16669 • Published Nov 20, 2025 • 31

OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes

Paper • 2510.26800 • Published Oct 30, 2025 • 22

upvoted 3 papers 10 months ago

FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark

Paper • 2509.09680 • Published Sep 11, 2025 • 44

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Paper • 2508.20088 • Published Aug 27, 2025 • 21

T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

Paper • 2508.17472 • Published Aug 24, 2025 • 26

upvoted a paper 11 months ago

ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

Paper • 2507.20939 • Published Jul 28, 2025 • 58

upvoted 3 papers 12 months ago

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

Paper • 2507.07984 • Published Jul 10, 2025 • 43

OmniPart: Part-Aware 3D Generation with Semantic Decoupling and Structural Cohesion

Paper • 2507.06165 • Published Jul 8, 2025 • 60

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

Paper • 2507.05240 • Published Jul 7, 2025 • 49

upvoted 6 papers about 1 year ago

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning

Paper • 2506.16141 • Published Jun 19, 2025 • 27

DreamCube: 3D Panorama Generation via Multi-plane Synchronization

Paper • 2506.17206 • Published Jun 20, 2025 • 23

Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5, 2025 • 27

AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation

Paper • 2506.03126 • Published Jun 3, 2025 • 22

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Paper • 2505.21374 • Published May 27, 2025 • 29

Personalized Text-to-Image Generation with Auto-Regressive Models

Paper • 2504.13162 • Published Apr 17, 2025 • 18