---
license: apache-2.0
language:
  - vi
  - en
tags:
  - vision-language-model
  - vlm
  - qwen3
  - fastvlm
  - vietnamese
base_model: Qwen/Qwen3-0.6B
datasets:
  - 5CD-AI/Viet-multimodal-open-r1-8k-verified
---

# Belle-VLM: Vietnamese Vision Language Model

## Model Description

Belle-VLM is a compact Vision Language Model for Vietnamese multimodal reasoning tasks. It pairs a Qwen3-0.6B language backbone with a FastViTHD vision encoder and is fine-tuned with LoRA on a verified Vietnamese multimodal reasoning dataset.

## Architecture

- **LLM Backbone:** Qwen/Qwen3-0.6B
- **Vision Encoder:** FastViTHD (MobileCLIP)
- **Projector:** 2-layer MLP (3072 → 1024)
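The projector maps FastViTHD image features (3072-dim) into the LLM's 1024-dim embedding space. A minimal PyTorch sketch of such a 2-layer MLP projector is shown below; the activation function and module names are assumptions, and the checkpoint's actual implementation may differ:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Sketch of a 2-layer MLP projector (GELU activation is an assumption)."""

    def __init__(self, vision_dim: int = 3072, llm_dim: int = 1024):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, llm_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(llm_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, 3072) from the vision encoder
        return self.fc2(self.act(self.fc1(image_features)))

projector = VisionProjector()
out = projector(torch.randn(1, 256, 3072))
print(out.shape)  # torch.Size([1, 256, 1024])
```

The projected patch embeddings can then be interleaved with text token embeddings before being fed to the language model.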

## Training

- **Dataset:** 5CD-AI/Viet-multimodal-open-r1-8k-verified
- **Method:** LoRA fine-tuning
- **Epochs:** 2
- **Learning Rate:** 2e-5
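LoRA fine-tuning freezes the base weight matrix `W` and learns a low-rank update `ΔW = (α/r) · B·A`, here with rank `r = 8` and `α = 16` (see the training table). A small NumPy illustration of the idea, using an illustrative 1024×1024 layer rather than the model's actual shapes:

```python
import numpy as np

r, alpha = 8, 16          # LoRA rank and alpha used for this model
d_out, d_in = 1024, 1024  # illustrative layer shape, not the checkpoint's

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero-init so ΔW = 0

delta_W = (alpha / r) * B @ A
W_adapted = W + delta_W

# At initialization the adapter is a no-op (B is zero)
assert np.allclose(W_adapted, W)
# Only A and B are trained: r*(d_in + d_out) params instead of d_in*d_out
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "frozen")
```

This is why LoRA keeps the fine-tuning footprint small: per adapted layer it trains `r·(d_in + d_out)` parameters instead of the full `d_in·d_out`.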

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code is required to load the model's custom VLM architecture
model = AutoModelForCausalLM.from_pretrained(
    "beyoru/Belle-VLM",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("beyoru/Belle-VLM", trust_remote_code=True)
```

## Training Details

| Parameter    | Value            |
|--------------|------------------|
| Base Model   | Qwen/Qwen3-0.6B  |
| Vision Tower | mobileclip_l_384 |
| LoRA Rank    | 8                |
| LoRA Alpha   | 16               |
| Batch Size   | 1 × 1            |
| Epochs       | 2                |

## License

Apache 2.0