---
license: apache-2.0
base_model: HuggingFaceTB/SmolVLM-Instruct
tags:
- vision-language
- multimodal
- chat
- conversational
- text-generation
library_name: transformers
pipeline_tag: image-text-to-text
---

# SmolVLM Final Merged

This is a fine-tuned version of SmolVLM-Instruct, optimized for conversational AI and vision-language tasks.

## Model Details

- **Base Model**: HuggingFaceTB/SmolVLM-Instruct
- **Training**: Fine-tuned using LLaMA-Factory
- **Use Cases**: Chat, vision understanding, multimodal reasoning
- **License**: Apache 2.0
|
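Since the model consumes images as PIL objects at inference time, a small input-normalization helper can be handy. This is a sketch assuming Pillow is installed; `load_rgb` is a hypothetical name, not part of this repository:

```python
from PIL import Image


def load_rgb(path):
    """Open an image file and convert it to RGB.

    Palette, grayscale, or RGBA files are normalized so the
    processor always receives a consistent 3-channel image.
    """
    return Image.open(path).convert("RGB")
```

`Image.open` is lazy, but `convert` forces a decode, so the returned image is ready to pass to the processor.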
## Usage

The snippet below loads the model and runs a single image-plus-text turn through the chat template (the image path and prompt are placeholders):

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForVision2Seq.from_pretrained(
    "Tj/smolvlm-final-merged", torch_dtype=torch.bfloat16
).to(device)
processor = AutoProcessor.from_pretrained("Tj/smolvlm-final-merged")

# Chat-formatted prompt: one image placeholder plus a text question
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=[Image.open("example.jpg")], return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```