# Model Card — Qwen2-VL-ImgChat-2B
## Model Details
- **Model Name:** Qwen2-VL-ImgChat-2B
- **Model Type:** Vision-Language Model fine-tuned for multimodal dialog auto-completion
- **Language(s):** English
- **Base Model:** Qwen2-VL-2B
- **Fine-tuning Dataset:** ImageChat
- **License:** Same as base model (Qwen2-VL license)
- **Repository:** https://github.com/devichand579/MAC
---
## Intended Use
### Direct Use
This model generates conversational responses conditioned on both textual and visual context. It is suitable for:
- Multimodal dialog systems
- Image-grounded conversational agents
- Research on multimodal auto-completion
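Since the model conditions its responses on prior dialogue turns as well as the image, inputs are typically assembled in the `messages` chat format before being rendered by the processor's chat template. The sketch below is a minimal, hypothetical helper (the `build_dialog` function and the dialogue content are illustrative, not part of this repository):

```python
# Hedged sketch: assembling a multi-turn, image-grounded chat prompt in the
# "messages" format consumed by Hugging Face chat templates.
def build_dialog(history, user_text, with_image=False):
    """Turn (role, text) history plus a new user turn into chat messages."""
    messages = [
        {"role": role, "content": [{"type": "text", "text": text}]}
        for role, text in history
    ]
    content = []
    if with_image:
        # Image placeholder; the actual pixels are passed to the processor separately.
        content.append({"type": "image"})
    content.append({"type": "text", "text": user_text})
    messages.append({"role": "user", "content": content})
    return messages

# Example: two prior turns plus a new image-grounded question.
history = [
    ("user", "What breed is this dog?"),
    ("assistant", "It looks like a corgi."),
]
msgs = build_dialog(history, "What are corgis known for?", with_image=True)
```

The resulting `msgs` list can be rendered to a prompt string with the processor's `apply_chat_template` and passed to the model together with the image.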
### Out-of-Scope Use
The model is not intended for:
- Medical, legal, or financial advice
- Safety-critical decision-making
- Autonomous systems requiring guaranteed correctness
---
## Limitations and Risks
- Model outputs may contain inaccuracies or biases inherited from training data.
- Performance depends on image relevance and dialogue context quality.
- The model is not explicitly safety-filtered.
---
## How to Use
Example usage with Hugging Face Transformers (the repo id below is assumed from this card's model name; verify it before use):
```python
from transformers import AutoProcessor, AutoModelForVision2Seq

# Repo id assumed to match this card's model name.
model_id = "devichand/Qwen2-VL-ImgChat-2B"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

# Qwen2-VL expects chat-formatted text with an image placeholder.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

# your_image: a PIL.Image loaded elsewhere.
inputs = processor(images=your_image, text=prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```