Multimodal OCR: Parse Anything from Documents
Paper
• 2603.13032 • Published
CodePercept: Code-Grounded Visual STEM Perception for MLLMs
From Narrow to Panoramic Vision: Attention-Guided Cold-Start Reshapes Multimodal Reasoning