# Model Card — Qwen2-VL-ImgChat-2B
## Model Details
- **Model Name:** Qwen2-VL-ImgChat-2B
- **Model Type:** Vision-Language Model fine-tuned for multimodal dialog auto-completion
- **Language(s):** English
- **Base Model:** Qwen2-VL-2B
- **Fine-tuning Dataset:** ImageChat
- **License:** Same as base model (Qwen2-VL license)
- **Repository:** https://github.com/devichand579/MAC
---
## Intended Use
### Direct Use
This model generates conversational responses conditioned on both textual and visual context. It is suitable for:
- Multimodal dialog systems
- Image-grounded conversational agents
- Research on multimodal auto-completion
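
Conditioning a response on both the image and the conversation history typically means packing them into a single multimodal chat turn. The sketch below shows one way to assemble such a turn, following the interleaved `role`/`content` message convention commonly used by Qwen2-VL-style processors; the function name and field layout here are illustrative, not part of this repository's API.

```python
# Sketch: build a messages list for an image-grounded dialog turn in the
# interleaved chat format Qwen2-VL-style processors commonly accept.
# Field names follow that convention; adapt to your actual processor.

def build_turn(image_path, history, user_text):
    """history: list of (user_msg, assistant_msg) pairs from prior turns.
    Returns prior turns plus a new user turn carrying image + text."""
    messages = []
    for user_msg, assistant_msg in history:
        messages.append(
            {"role": "user", "content": [{"type": "text", "text": user_msg}]}
        )
        messages.append(
            {"role": "assistant", "content": [{"type": "text", "text": assistant_msg}]}
        )
    messages.append(
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": user_text},
            ],
        }
    )
    return messages

msgs = build_turn(
    "photo.jpg",
    [("Hi!", "Hello! How can I help?")],
    "What is in this photo?",
)
```

A list like `msgs` can then be passed through the processor's chat template before generation.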
### Out-of-Scope Use
The model is not intended for:
- Medical, legal, or financial advice
- Safety-critical decision-making
- Autonomous systems requiring guaranteed correctness
---
## Limitations and Risks
- Outputs may contain inaccuracies or biases inherited from the training data.
- Response quality depends on the relevance of the input image and the coherence of the dialog context.
- The model is not explicitly safety-filtered.
---
## How to Use
Example usage with Hugging Face Transformers:
```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# NOTE: placeholder id — point this at the Hugging Face repository
# hosting Qwen2-VL-ImgChat-2B.
model_id = "devichand/Qwen2-VL-ImgChat-2B"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = processor(
    images=your_image,  # a PIL.Image (or similar) loaded beforehand
    text="Describe the image.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```