---
license: apache-2.0
language:
- vi
- en
tags:
- vision-language-model
- vlm
- qwen3
- fastvlm
- vietnamese
base_model: Qwen/Qwen3-0.6B
datasets:
- 5CD-AI/Viet-multimodal-open-r1-8k-verified
---
# Belle-VLM: Vietnamese Vision Language Model
## Model Description
Belle-VLM is a compact vision-language model fine-tuned for Vietnamese multimodal reasoning tasks.
### Architecture
- **LLM Backbone**: Qwen3-0.6B
- **Vision Encoder**: FastViTHD (MobileCLIP)
- **Projector**: 2-layer MLP (3072 -> 1024)
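As a rough illustration of the projector, a 2-layer MLP maps 3072-dim vision features into the 1024-dim LLM embedding space. The sketch below uses the layer sizes listed above; the GELU activation, initialization, and hidden width equal to the output width are assumptions for illustration, not confirmed implementation details:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, a common activation between projector layers
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class MLPProjector:
    """2-layer MLP mapping vision features (3072-d) to LLM embeddings (1024-d)."""
    def __init__(self, in_dim=3072, out_dim=1024, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.02, (in_dim, out_dim))
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.normal(0.0, 0.02, (out_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        # x: (num_image_tokens, 3072) -> (num_image_tokens, 1024)
        return gelu(x @ self.w1 + self.b1) @ self.w2 + self.b2

proj = MLPProjector()
vision_tokens = np.random.default_rng(1).normal(size=(16, 3072))
llm_tokens = proj(vision_tokens)
print(llm_tokens.shape)  # (16, 1024)
```

The projected tokens are then interleaved with text embeddings before being fed to the LLM backbone.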
### Training
- **Dataset**: 5CD-AI/Viet-multimodal-open-r1-8k-verified
- **Method**: LoRA fine-tuning
- **Epochs**: 2
- **Learning Rate**: 2e-05
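LoRA fine-tuning freezes the base weights and learns a low-rank update scaled by `alpha / rank`. A minimal sketch of the math, using the rank and alpha from the training config above (the weight shape here is illustrative, not the actual layer size):

```python
import numpy as np

rank, alpha = 8, 16          # LoRA rank and alpha from the training config
d_out, d_in = 1024, 1024     # illustrative weight shape

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(0.0, 0.01, (rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection, zero-initialized

# Effective weight during/after LoRA fine-tuning:
W_eff = W + (alpha / rank) * (B @ A)

# B starts at zero, so training begins exactly at the base model's weights.
print(np.allclose(W_eff, W))  # True
```

Only `A` and `B` (rank × d matrices) are updated during training, which is why LoRA fits on modest hardware even with a batch size of 1.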
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code is required: the VLM architecture is defined in the repo's custom code.
model = AutoModelForCausalLM.from_pretrained(
    "beyoru/Belle-VLM",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("beyoru/Belle-VLM", trust_remote_code=True)
```
## Training Details
| Parameter | Value |
|-----------|-------|
| Base Model | Qwen/Qwen3-0.6B |
| Vision Tower | mobileclip_l_384 |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| Batch Size | 1 × 1 (per-device × grad. accumulation) |
| Epochs | 2 |
## License
Apache 2.0