Eureka-Audio: Triggering Audio Intelligence in Compact Language Models • Paper • 2602.13954 • Published 27 days ago • 3
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • Paper • 2510.14528 • Published Oct 16, 2025 • 120
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning • Paper • 2505.15966 • Published May 21, 2025 • 53
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models • Paper • 2505.13444 • Published May 19, 2025 • 16
GUI Datasets • Collection • Datasets from the graphical user interfaces domain (screenshots). • 20 items • Updated Dec 3, 2024 • 8