Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.25716

about 21 hours ago

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published 24 days ago • 155
TAPS: Task Aware Proposal Distributions for Speculative Sampling

Paper • 2603.27027 • Published 22 days ago • 142
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published 24 days ago • 154
LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published 21 days ago • 143

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published 24 days ago • 32
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 24 days ago • 131
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published 27 days ago • 135
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published 27 days ago • 123

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 176
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 63
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Paper • 2603.28762 • Published 20 days ago • 25
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published 24 days ago • 154
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published 24 days ago • 155
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published 24 days ago • 32

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28, 2025 • 35
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Paper • 2512.21337 • Published Dec 24, 2025 • 31
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Paper • 2512.15374 • Published Dec 17, 2025 • 6
Fast-weight Product Key Memory

Paper • 2601.00671 • Published Jan 2 • 6

about 21 hours ago

ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published 24 days ago • 155
TAPS: Task Aware Proposal Distributions for Speculative Sampling

Paper • 2603.27027 • Published 22 days ago • 142
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published 24 days ago • 154
LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Paper • 2603.27538 • Published 21 days ago • 143

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Paper • 2603.28762 • Published 20 days ago • 25
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Paper • 2603.25716 • Published 24 days ago • 154
ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

Paper • 2603.25746 • Published 24 days ago • 155
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published 24 days ago • 32

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published 24 days ago • 32
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published 24 days ago • 131
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published 27 days ago • 135
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published 27 days ago • 123

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 176
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

Mixture of Contexts for Long Video Generation

Paper • 2508.21058 • Published Aug 28, 2025 • 35
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models

Paper • 2512.21337 • Published Dec 24, 2025 • 31
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Paper • 2512.15374 • Published Dec 17, 2025 • 6
Fast-weight Product Key Memory

Paper • 2601.00671 • Published Jan 2 • 6

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 63
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs