Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility • Paper • 2601.17027 • Published 10 days ago • 19
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding • Paper • 2506.16035 • Published Jun 19, 2025 • 89
ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch • Paper • 2601.13606 • Published 7 days ago • 6
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models • Paper • 2601.15165 • Published 6 days ago • 64
GutenOCR: A Grounded Vision-Language Front-End for Documents • Paper • 2601.14490 • Published 6 days ago • 32
Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings • Paper • 2512.12167 • Published Dec 13, 2025 • 2
Toward Efficient Agents: Memory, Tool learning, and Planning • Paper • 2601.14192 • Published 7 days ago • 49
End-to-End Video Character Replacement without Structural Guidance • Paper • 2601.08587 • Published 14 days ago • 8
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting • Paper • 2403.08551 • Published Mar 13, 2024 • 11
FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing • Paper • 2601.01720 • Published 22 days ago • 6
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking • Paper • 2601.04720 • Published 19 days ago • 50
AgentOCR: Reimagining Agent History via Optical Self-Compression • Paper • 2601.04786 • Published 19 days ago • 28
ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation • Paper • 2601.03955 • Published 20 days ago • 3