# Model Card — Qwen2-VL-ImgChat-2B
## Model Details
- **Model Name:** Qwen2-VL-ImgChat-2B
- **Model Type:** Vision-Language Model fine-tuned for multimodal dialog auto-completion
- **Language(s):** English
- **Base Model:** Qwen2-VL-2B
- **Fine-tuning Dataset:** ImageChat
- **License:** Same as base model (Qwen2-VL license)
- **Repository:** https://github.com/devichand579/MAC
---
## Intended Use
### Direct Use
This model generates conversational responses conditioned on both textual and visual context. It is suitable for:
- Multimodal dialog systems
- Image-grounded conversational agents
- Research on multimodal auto-completion
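
Conditioning a response on both the image and the conversation history typically means packing them into a single multimodal chat turn. The sketch below shows one way to assemble such a turn, following the interleaved `role`/`content` message convention commonly used by Qwen2-VL-style processors; the function name and field layout here are illustrative, not part of this repository's API.

```python
# Sketch: build a messages list for an image-grounded dialog turn in the
# interleaved chat format Qwen2-VL-style processors commonly accept.
# Field names follow that convention; adapt to your actual processor.

def build_turn(image_path, history, user_text):
    """history: list of (user_msg, assistant_msg) pairs from prior turns.
    Returns prior turns plus a new user turn carrying image + text."""
    messages = []
    for user_msg, assistant_msg in history:
        messages.append(
            {"role": "user", "content": [{"type": "text", "text": user_msg}]}
        )
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": assistant_msg}]}
        )
    messages.append(
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": user_text},
            ],
        }
    )
    return messages

msgs = build_turn(
    "photo.jpg",
    [("Hi!", "Hello! How can I help?")],
    "What is in this photo?",
)
```

A list like `msgs` can then be passed through the processor's chat template before generation.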
### Out-of-Scope Use
The model is not intended for:
- Medical, legal, or financial advice
- Safety-critical decision-making
- Autonomous systems requiring guaranteed correctness
---
## Limitations and Risks
- Outputs may contain inaccuracies or biases inherited from the training data.
- Response quality depends on the relevance of the input image and the coherence of the dialog context.
- The model is not explicitly safety-filtered.
---
## How to Use
Example usage with Hugging Face Transformers:
```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# NOTE: placeholder id — point this at the Hugging Face repository
# hosting Qwen2-VL-ImgChat-2B.
model_id = "devichand/Qwen2-VL-ImgChat-2B"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = processor(
    images=your_image,  # a PIL.Image (or similar) loaded beforehand
    text="Describe the image.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```