---
license: apache-2.0
language:
  - vi
  - en
tags:
  - vision-language-model
  - vlm
  - qwen3
  - fastvlm
  - vietnamese
base_model: Qwen/Qwen3-0.6B
datasets:
  - 5CD-AI/Viet-multimodal-open-r1-8k-verified
---

# Belle-VLM: Vietnamese Vision Language Model

## Model Description

Belle-VLM is a compact Vision Language Model for Vietnamese multimodal reasoning tasks. It pairs a Qwen3-0.6B language backbone with a FastViTHD vision encoder and is fine-tuned with LoRA on a verified Vietnamese multimodal reasoning dataset.

## Architecture

- **LLM Backbone:** Qwen/Qwen3-0.6B
- **Vision Encoder:** FastViTHD (MobileCLIP)
- **Projector:** 2-layer MLP (3072 → 1024)
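The projector maps FastViTHD image features (3072-dim) into the LLM's 1024-dim embedding space. A minimal PyTorch sketch of such a 2-layer MLP projector is shown below; the activation function and module names are assumptions, and the checkpoint's actual implementation may differ:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Sketch of a 2-layer MLP projector (GELU activation is an assumption)."""

    def __init__(self, vision_dim: int = 3072, llm_dim: int = 1024):
        super().__init__()
        self.fc1 = nn.Linear(vision_dim, llm_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(llm_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, 3072) from the vision encoder
        return self.fc2(self.act(self.fc1(image_features)))

projector = VisionProjector()
out = projector(torch.randn(1, 256, 3072))
print(out.shape)  # torch.Size([1, 256, 1024])
```

The projected patch embeddings can then be interleaved with text token embeddings before being fed to the language model.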

## Training

- **Dataset:** 5CD-AI/Viet-multimodal-open-r1-8k-verified
- **Method:** LoRA fine-tuning
- **Epochs:** 2
- **Learning Rate:** 2e-5
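LoRA fine-tuning freezes the base weight matrix `W` and learns a low-rank update `ΔW = (α/r) · B·A`, here with rank `r = 8` and `α = 16` (see the training table). A small NumPy illustration of the idea, using an illustrative 1024×1024 layer rather than the model's actual shapes:

```python
import numpy as np

r, alpha = 8, 16          # LoRA rank and alpha used for this model
d_out, d_in = 1024, 1024  # illustrative layer shape, not the checkpoint's

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero-init so ΔW = 0

delta_W = (alpha / r) * B @ A
W_adapted = W + delta_W

# At initialization the adapter is a no-op (B is zero)
assert np.allclose(W_adapted, W)
# Only A and B are trained: r*(d_in + d_out) params instead of d_in*d_out
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "frozen")
```

This is why LoRA keeps the fine-tuning footprint small: per adapted layer it trains `r·(d_in + d_out)` parameters instead of the full `d_in·d_out`.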

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code is required to load the model's custom VLM architecture
model = AutoModelForCausalLM.from_pretrained(
    "beyoru/Belle-VLM",
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("beyoru/Belle-VLM", trust_remote_code=True)
```

## Training Details

| Parameter    | Value            |
|--------------|------------------|
| Base Model   | Qwen/Qwen3-0.6B  |
| Vision Tower | mobileclip_l_384 |
| LoRA Rank    | 8                |
| LoRA Alpha   | 16               |
| Batch Size   | 1 × 1            |
| Epochs       | 2                |

## License

Apache 2.0