Create README.md
README.md CHANGED
@@ -1,12 +1,69 @@
 ---
-title: MiniCPM-V-4
-emoji:
-colorFrom:
-colorTo:
+title: MiniCPM-V-4.5 Multimodal Chat
+emoji: ๐
+colorFrom: blue
+colorTo: purple
 sdk: gradio
-sdk_version:
+sdk_version: "4.0.0"
 app_file: app.py
 pinned: false
+license: apache-2.0
 ---
 
-
+# MiniCPM-V-4.5 Multimodal Chat
+
+A powerful Gradio interface for the MiniCPM-V-4.5 multimodal model - a GPT-4V level MLLM with only 8B parameters!
+
+## Features
+
+- 📸 **Image Understanding**: Analyze single or multiple images with high-resolution support (up to 1.8M pixels)
+- 🎥 **Video Understanding**: Process videos with high refresh rate (up to 10 FPS) and efficient compression
+- **Document Parsing**: Strong OCR capabilities and PDF document parsing
+- 🧠 **Thinking Modes**: Choose between fast thinking for efficiency or deep thinking for complex problems
+- **Multilingual**: Support for 30+ languages
+- ⚙️ **Customizable**: Adjust FPS, context size, temperature, and system prompts
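
The video feature above mentions an adjustable FPS with a ceiling of 10. The following is a minimal sketch of how a frame sampler could honor that ceiling. It is not taken from the Space's `app.py`; the function name `sample_frame_indices` and its parameters are hypothetical, and only the 10 FPS cap comes from the README.

```python
import math

MAX_FPS = 10  # cap taken from the README's "up to 10 FPS"


def sample_frame_indices(total_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Pick evenly spaced frame indices so the clip is sampled at roughly target_fps.

    Hypothetical helper for illustration; the real app may sample differently.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        raise ValueError("total_frames, native_fps and target_fps must be positive")
    # Never sample faster than the cap or than the video itself provides.
    effective_fps = min(target_fps, MAX_FPS, native_fps)
    step = native_fps / effective_fps  # source frames per sampled frame
    count = math.ceil(total_frames / step)
    return [min(total_frames - 1, int(i * step)) for i in range(count)]


if __name__ == "__main__":
    # A one-second clip recorded at 30 FPS, sampled at the 10 FPS cap.
    print(sample_frame_indices(30, 30, 10))
```

Requesting more than 10 FPS, or more than the video's native rate, falls back to the lower of the two limits, so the caller never receives duplicate or out-of-range indices.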
+
+## Model Capabilities
+
+MiniCPM-V-4.5 achieves state-of-the-art performance across multiple benchmarks:
+- Surpasses GPT-4o-latest and Gemini-2.0 Pro on vision-language tasks
+- Leading OCR performance on OCRBench
+- Efficient video token compression (96x rate)
+- Trustworthy behaviors with multilingual support
+
+## Usage
+
+1. **Upload**: Choose an image or video file
+2. **Configure**: Adjust settings like FPS (for videos), context size, and temperature
+3. **Prompt**: Enter your question or use the system prompt for specific instructions
+4. **Generate**: Click the generate button to get the model's response
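
The four usage steps can be sketched as plain Python, independent of the Gradio UI. This is an illustration under stated assumptions, not the Space's actual code: `Settings`, `build_request`, `generate`, the default values, and the list of video extensions are all hypothetical, and the model is replaced by a stand-in callable so the sketch runs without weights.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumed set of video extensions; the README does not list supported formats.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}
MAX_FPS = 10  # from the README's "up to 10 FPS"


@dataclass
class Settings:
    """Step 2: the adjustable settings named in the README (defaults are placeholders)."""
    fps: int = 3
    context_size: int = 4096
    temperature: float = 0.7
    system_prompt: str = ""


def build_request(file_path: str, question: str, settings: Settings) -> dict:
    """Steps 1-3: combine the uploaded file, the settings, and the prompt into one payload."""
    if not question.strip():
        raise ValueError("question must not be empty")
    if not 1 <= settings.fps <= MAX_FPS:
        raise ValueError(f"fps must be between 1 and {MAX_FPS}")
    is_video = Path(file_path).suffix.lower() in VIDEO_EXTENSIONS
    request = {
        "media": file_path,
        "media_type": "video" if is_video else "image",
        "question": question.strip(),
        "context_size": settings.context_size,
        "temperature": settings.temperature,
    }
    if is_video:
        request["fps"] = settings.fps  # FPS only applies to videos
    if settings.system_prompt:
        request["system_prompt"] = settings.system_prompt
    return request


def generate(request: dict, model_fn) -> str:
    """Step 4: hand the payload to whatever callable wraps the model."""
    return model_fn(request)


if __name__ == "__main__":
    req = build_request(
        "clip.mp4",
        "Describe the main action happening in this video",
        Settings(fps=5),
    )
    # A stand-in model so the sketch runs without downloading weights.
    print(generate(req, lambda r: f"[{r['media_type']}] {r['question']}"))
```

Keeping validation in `build_request` means the generate step only ever sees a well-formed payload, which is one way to keep UI callbacks thin.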
+
+## Examples
+
+- "What objects do you see in this image?"
+- "Describe the main action happening in this video"
+- "Read and transcribe any text visible in the image"
+- "Analyze this image from an artistic perspective"
+
+## Technical Details
+
+- **Architecture**: Built on Qwen3-8B and SigLIP2-400M
+- **Parameters**: 8B total parameters
+- **Video Processing**: 3D-Resampler with temporal understanding
+- **Resolution**: Supports images up to 1344x1344 pixels
+- **Efficiency**: 4x fewer visual tokens than most MLLMs
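
The resolution figure above lines up with the "up to 1.8M pixels" claim in the feature list: 1344 × 1344 = 1,806,336 pixels. The sketch below checks that arithmetic and shows one way an input could be shrunk to fit. Treating the limit as a total pixel budget rather than a per-side limit is an assumption, and `fits_pixel_budget` and `downscale_to_budget` are hypothetical helpers, not functions from the model's library.

```python
import math

MAX_SIDE = 1344  # from "Supports images up to 1344x1344 pixels"
PIXEL_BUDGET = MAX_SIDE * MAX_SIDE  # 1,806,336 pixels, i.e. about 1.8M


def fits_pixel_budget(width: int, height: int, budget: int = PIXEL_BUDGET) -> bool:
    """Return True if the image's total pixel count is within the budget."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return width * height <= budget


def downscale_to_budget(width: int, height: int, budget: int = PIXEL_BUDGET) -> tuple[int, int]:
    """Shrink (width, height) uniformly, keeping the aspect ratio, until it fits."""
    if fits_pixel_budget(width, height, budget):
        return width, height
    scale = math.sqrt(budget / (width * height))
    # Round down so the result does not exceed the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


if __name__ == "__main__":
    print(f"{PIXEL_BUDGET:,} pixels")
    print(downscale_to_budget(4032, 3024))
```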
+
+## License
+
+This model is released under the MiniCPM Model License. Free for academic research and commercial use after registration.
+
+## Citation
+
+```bibtex
+@article{yao2024minicpm,
+  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
+  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
+  journal={Nat Commun 16, 5509 (2025)},
+  year={2025}
+}
+```