Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 121
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30, 2025 • 109
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published Apr 14, 2025 • 15
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query Paper • 2506.03144 • Published Jun 3, 2025 • 7
CyberV: Cybernetics for Test-time Scaling in Video Understanding Paper • 2506.07971 • Published Jun 9, 2025 • 5
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23, 2025 • 56
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21, 2025 • 37
DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training Paper • 2510.11712 • Published Oct 13, 2025 • 31
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10, 2025 • 50
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents Paper • 2505.15277 • Published May 21, 2025 • 104
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 82
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14, 2025 • 27
An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published Apr 8, 2025 • 64
Sa2VA Model Zoo Collection Hugging Face model zoo for Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos. By ByteDance Seed CV Research • 12 items • Updated Nov 27, 2025 • 44