moonshotai/Kimi-VL-A3B-Thinking-2506 Image-Text-to-Text • 16B • Updated Aug 18, 2025 • 166k • 339
Chat with Kimi-VL-A3B-Thinking-2506 🤔 Space • Running on Zero • Featured • 184 • Chat with images, videos, or PDFs to generate text
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 7 items • Updated about 19 hours ago • 78
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems Paper • 2401.03945 • Published Jan 8, 2024
SpeechAlign: Aligning Speech Generation to Human Preferences Paper • 2404.05600 • Published Apr 8, 2024 • 1
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model Paper • 2408.02503 • Published Aug 5, 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models Paper • 2411.09691 • Published Nov 14, 2024
QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models Paper • 2405.13014 • Published May 14, 2024
LEGO: Language Enhanced Multi-modal Grounding Model Paper • 2401.06071 • Published Jan 11, 2024 • 12