ChindaMT-4B
ChindaMT-4B is an open-weight Thai-English machine translation model fine-tuned from Qwen/Qwen3.5-4B. It supports plain translation and instruction-following translation with auxiliary rules in the prompt.
- Task: Thai-English machine translation with instruction-following
- Base model: Qwen3.5-4B
- Parameter count: 4B
- License: Apache-2.0
Prompting
Plain translation. The same template works in both directions; swap the language line and the source tag:

```
Translate English to Thai.

EN: The weather is nice today.
```

```
Translate Thai to English.

TH: วันนี้อากาศดีมาก
```
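The plain template can be assembled with a small helper. This is an illustrative sketch; the function name and direction codes are not part of the model's API, and the blank line between the language line and the source line follows the prompt used in the Inference section below:

```python
def build_prompt(direction: str, text: str) -> str:
    """Assemble the plain translation prompt (helper name is illustrative)."""
    if direction == "en-th":
        return f"Translate English to Thai.\n\nEN: {text}"
    if direction == "th-en":
        return f"Translate Thai to English.\n\nTH: {text}"
    raise ValueError(f"unsupported direction: {direction}")

print(build_prompt("en-th", "The weather is nice today."))
```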
With instruction following. Add a Rules: block between the language line and the source line. Rules are free-form text:

```
Translate English to Thai.

Rules:
- Return only the translated text
- Use a clear, professional tone in Thai
- Keep all numerals in Arabic digits

EN: <source text>
```
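Building the instruction-following prompt programmatically is a matter of joining the three sections. A minimal sketch, with a hypothetical helper name; the exact blank-line placement between sections is an assumption based on the plain-translation template:

```python
def build_prompt_with_rules(src_lang: str, tgt_lang: str, tag: str,
                            rules: list[str], text: str) -> str:
    # Hypothetical helper: joins the language line, a Rules: block, and
    # the tagged source line into one prompt string.
    rules_block = "Rules:\n" + "\n".join(f"- {r}" for r in rules)
    return f"Translate {src_lang} to {tgt_lang}.\n\n{rules_block}\n\n{tag}: {text}"

prompt = build_prompt_with_rules(
    "English", "Thai", "EN",
    ["Return only the translated text", "Keep all numerals in Arabic digits"],
    "The invoice total is 1,250 USD.",
)
print(prompt)
```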
Inference
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "iapp/ChindaMT-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the translation prompt in the chat template before tokenizing.
prompt = "Translate English to Thai.\n\nEN: The weather is nice today."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs, max_new_tokens=1024, temperature=0.01, top_p=0.7, top_k=20,
    repetition_penalty=1.05, do_sample=True,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Evaluation datasets
The evaluation suites used during development will be released soon:
- iapp/ChindaMT-CoreEval: 5-domain primary evaluation
- iapp/ChindaMT-BroadEval: 10-domain generalization check
Limitations
- Supports Thai-English translation only; other language pairs are unsupported.
- Behavior on out-of-domain or paragraph-length inputs is not comprehensively characterized.
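One pragmatic mitigation for paragraph-length input is to split it into sentences and translate each chunk with the plain template. A minimal sketch for English-side splitting; the naive regex is an assumption, and Thai text would need a dedicated segmenter (e.g. from PyThaiNLP), since Thai lacks consistent sentence-final punctuation:

```python
import re

def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break after ., !, or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

chunks = split_sentences("The weather is nice today. Shall we go out?")
# Each chunk can then be wrapped in the plain-translation template above.
print(chunks)
```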
iApp AI Research