# Model Card — Qwen2-VL-ImgChat-2B

## Model Details

- **Model Name:** Qwen2-VL-ImgChat-2B
- **Model Type:** Vision-language model fine-tuned for multimodal dialog auto-completion
- **Language(s):** English
- **Base Model:** Qwen2-VL-2B
- **Fine-tuning Dataset:** ImageChat
- **License:** Same as base model (Qwen2-VL license)
- **Repository:** https://github.com/devichand579/MAC

---

## Intended Use

### Direct Use

This model generates conversational responses conditioned on both textual and visual context. It is suitable for:

- Multimodal dialog systems
- Image-grounded conversational agents
- Research on multimodal auto-completion

### Out-of-Scope Use

The model is not intended for:

- Medical, legal, or financial advice
- Safety-critical decision-making
- Autonomous systems requiring guaranteed correctness

---

## Limitations and Risks

- Model outputs may contain inaccuracies or biases inherited from the training data.
- Performance depends on the relevance of the input image and the quality of the dialogue context.
- The model has not been explicitly safety-filtered.

---

## How to Use

Example usage with Hugging Face Transformers (the repository id below is illustrative and should be replaced with the actual hub id for this checkpoint; the original snippet pointed at a different model):

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Adjust to the actual Hugging Face repo id for Qwen2-VL-ImgChat-2B.
model_id = "devichand/Qwen2-VL-ImgChat-2B"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("example.jpg")  # any RGB image
inputs = processor(images=image, text="Describe the image.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
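Qwen2-VL checkpoints are typically driven through the Hugging Face chat-template convention rather than raw text, with each turn carrying a list of image and text content items. A minimal sketch of that message structure (the helper name, image path, and question are illustrative, not from this model card):

```python
# Sketch of the chat-style message format consumed by
# processor.apply_chat_template for Qwen2-VL-family models.
# build_messages is a hypothetical helper for this example.
def build_messages(image_path: str, question: str) -> list[dict]:
    """Return a single-turn, image-grounded conversation."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},  # visual context
                {"type": "text", "text": question},      # textual prompt
            ],
        }
    ]

messages = build_messages("example.jpg", "Describe the image.")
```

The resulting list can be passed to `processor.apply_chat_template(messages, add_generation_prompt=True)` to produce the prompt string containing the model's image placeholder tokens.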