PaDT
Multi-modal model series based on the Patch-as-Decodable-Token framework.
PaDT-MLLM/PaDT_OVD_3B • Any-to-Any • 4B • Updated Oct 10, 2025 • 819
PaDT-MLLM/PaDT_Pro_3B • Any-to-Any • 4B • Updated Oct 10, 2025 • 333 • 2
PaDT-MLLM/PaDT_Pro_7B • Any-to-Any • 8B • Updated Oct 10, 2025 • 19 • 2
PaDT-MLLM/PaDT_REC_7B • Any-to-Any • 8B • Updated Oct 10, 2025 • 5
PaDT-Paper
Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs • Paper • 2510.01954 • Published Oct 2, 2025 • 14
PaDT-Dataset
Preprocessed datasets used to train the PaDT framework.
PaDT-MLLM/ReferringImageCaptioning • Viewer • Updated Oct 10, 2025 • 575k • 363 • 3
PaDT-MLLM/COCO • Viewer • Updated Oct 10, 2025 • 123k • 375 • 1
PaDT-MLLM/RefCOCO • Viewer • Updated Oct 10, 2025 • 357k • 3.27k • 4