Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism Paper • 2605.12524 • Published Apr 7 • 1
No One Knows the State of the Art in Geospatial Foundation Models Paper • 2605.12678 • Published 7 days ago • 1
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting Paper • 2506.01015 • Published 5 days ago • 1
Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction Paper • 2602.09016 • Published 8 days ago • 2
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning Paper • 2605.14040 • Published 6 days ago • 2
Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution Paper • 2605.15138 • Published 5 days ago • 2
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 4 days ago • 29
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items Paper • 2604.19748 • Published 28 days ago • 250
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications Paper • 2508.16279 • Published Aug 22, 2025 • 63
Flow-OPD: On-Policy Distillation for Flow Matching Models Paper • 2605.08063 • Published 11 days ago • 96
Very Large-Scale Multi-Agent Simulation in AgentScope Paper • 2407.17789 • Published Jul 25, 2024 • 42
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14, 2025 • 158
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published 19 days ago • 71
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16, 2025 • 58
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation Paper • 2410.17799 • Published Oct 23, 2024 • 13
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published 29 days ago • 94
Geometric Context Transformer for Streaming 3D Reconstruction Paper • 2604.14141 • Published Apr 15 • 21
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16, 2025 • 125