Linearizing Vision Transformer with Test-Time Training Paper • 2605.02772 • Published 30 days ago • 20
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation Paper • 2605.14333 • Published May 14 • 35
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision Paper • 2605.05781 • Published May 7 • 5
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models Paper • 2604.25636 • Published Apr 28 • 24
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published Jan 21 • 75
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14, 2024 • 17
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published Jun 14, 2024 • 22
CODA: Repurposing Continuous VAEs for Discrete Tokenization Paper • 2503.17760 • Published Mar 22, 2025 • 4