ThaiLLM-30B info
ThaiLLM-30B was produced by continued pre-training of Qwen3-30B-A3B on a diverse corpus of approximately 63 billion tokens.
Important Note: This is a base model that requires instruction fine-tuning to align with specific user requirements and use cases.
Data
The training corpus consists of the following datasets:
| Dataset | Tokens |
|---|---|
| Fineweb2-ENG | 24,000,000,000 |
| Fineweb2-TH | 31,525,674,209 |
| CuratedData | 8,054,246,789 |
CuratedData Breakdown
| Category | Token Count |
|---|---|
| Business & Finance | 736,071,807 |
| News | 1,700,662,378 |
| Education | 576,489,778 |
| Social | 211,000,000 |
| Government | 40,492,117 |
| Medical | 42,987,587 |
| Conversation | 80,919,390 |
| Code | 620,218 |
| Research Articles | 4,185,649,758 |
| Law | 467,994,847 |
| Travel | 6,948,290 |
| Others | 4,410,619 |
*Token counts were calculated using the Qwen3 tokenizer.
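As a quick consistency check, the per-dataset counts can be summed to confirm the corpus size of roughly 63 billion tokens, and the category counts can be summed to confirm that they add up to the CuratedData row. The sketch below only restates numbers from the two tables; the variable names are illustrative.

```python
# Token counts copied from the dataset table.
corpus_tokens = {
    "Fineweb2-ENG": 24_000_000_000,
    "Fineweb2-TH": 31_525_674_209,
    "CuratedData": 8_054_246_789,
}

# Token counts copied from the CuratedData breakdown table.
curated_tokens = {
    "Business & Finance": 736_071_807,
    "News": 1_700_662_378,
    "Education": 576_489_778,
    "Social": 211_000_000,
    "Government": 40_492_117,
    "Medical": 42_987_587,
    "Conversation": 80_919_390,
    "Code": 620_218,
    "Research Articles": 4_185_649_758,
    "Law": 467_994_847,
    "Travel": 6_948_290,
    "Others": 4_410_619,
}

total = sum(corpus_tokens.values())
curated_total = sum(curated_tokens.values())

print(f"Total corpus tokens: {total:,}")
print(f"CuratedData categories sum: {curated_total:,}")

# Share of each dataset in the full corpus.
for name, count in corpus_tokens.items():
    print(f"{name}: {count / total:.1%}")
```

The category sum matches the CuratedData row exactly, and the three datasets together come to about 63.6 billion tokens.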
Requirements
The code of Qwen3 has been integrated into the latest Hugging Face transformers library. We strongly recommend using the latest version of transformers.
With transformers<4.51.0, you will encounter the following error:
KeyError: 'qwen3'
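To catch this before loading the model, the installed version can be checked up front. The following is a minimal sketch using only the standard library; `parse_release` and `is_supported` are hypothetical helper names, and the parser deliberately ignores pre-release suffixes such as `rc1` or `.dev0`.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum version stated in the requirement above.
MIN_TRANSFORMERS = (4, 51, 0)


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric components, padded to three fields.

    Example: '4.51.0.dev0' -> (4, 51, 0), '4.51' -> (4, 51, 0).
    """
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts += [0] * (3 - len(parts))  # pad short versions with zeros
    return tuple(parts)


def is_supported(version_string: str) -> bool:
    """True if the version is at least the stated minimum."""
    return parse_release(version_string) >= MIN_TRANSFORMERS


try:
    installed = version("transformers")
    if is_supported(installed):
        print(f"transformers {installed}: OK")
    else:
        print(f"transformers {installed}: too old, run `pip install -U transformers`")
except PackageNotFoundError:
    print("transformers is not installed")
```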
Usage Training
Important: This is a base model and requires instruction fine-tuning before use to ensure optimal performance for your specific tasks and requirements.
Recommended Training Setup
We recommend using LLaMA-Factory for instruction fine-tuning. This framework provides an easy-to-use interface for training language models with various optimization techniques.
Quick Start with LLaMA-Factory
# Clone the repository
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
# Install dependencies
pip install -e .
# Example training command for LoRA
llamafactory-cli train \
    --model_name_or_path ThaiLLM/ThaiLLM-30B \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --dataset your_dataset \
    --template qwen3 \
    --cutoff_len 8192 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --output_dir saves/ThaiLLM-30B-lora \
    --bf16
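The batch-related flags in the command interact: the effective batch size per optimizer step is the per-device batch size times the gradient accumulation steps times the number of GPUs. The sketch below works this out for a hypothetical setup (4 GPUs, 10,000 training examples; neither number comes from the model card). The step count is a rough estimate, since the trainer's exact figure can differ slightly depending on how it handles the final partial batch.

```python
import math

# Values taken from the training command above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_train_epochs = 3.0


def effective_batch_size(num_gpus: int) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus


def total_optimizer_steps(num_examples: int, num_gpus: int) -> int:
    """Approximate number of optimizer steps over the whole run."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(num_gpus))
    return math.ceil(steps_per_epoch * num_train_epochs)


# Hypothetical setup: 4 GPUs and 10,000 training examples.
print(effective_batch_size(num_gpus=4))            # 64
print(total_optimizer_steps(10_000, num_gpus=4))   # 471
```

Under these assumptions, `--save_steps 100` would write a checkpoint roughly every 6,400 examples.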
Usage Inference
Below are code snippets to get quickly started with running the model. First, install the necessary libraries.
pip install -U transformers torch accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ThaiLLM/ThaiLLM-30B"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Example prompt (Thai for "What is the pH of pure water?")
prompt = "น้ำบริสุทธิ์มีค่า pH เท่าใด"
# Move the tokenized inputs to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
with torch.inference_mode():
    generate_ids = model.generate(
        **inputs,  # passes input_ids and attention_mask
        max_new_tokens=500,
        repetition_penalty=1.2,
        num_beams=1,
        do_sample=True,
        top_k=40,
        top_p=0.75,
        temperature=0.4,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.batch_decode(
    generate_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=True,
)[0]
print(response)
Benchmarks
We evaluated ThaiLLM-30B against Qwen3-30B-Base using multiple-choice question datasets in both Thai and English.
Each benchmark measures the probability of selecting the correct choice based on the model's next-token prediction.
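The model card does not name the evaluation harness or its normalization, so the following is only a sketch of one common way to score a multiple-choice question from next-token predictions: take the logits the model assigns to each answer label, apply a softmax over just those candidates, and read off the probability of the correct one. The function names and the example logits are hypothetical.

```python
import math


def choice_probabilities(choice_logits: dict) -> dict:
    """Softmax restricted to the logits of the candidate answer labels."""
    max_logit = max(choice_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(logit - max_logit) for label, logit in choice_logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def score_question(choice_logits: dict, correct_label: str) -> tuple:
    """Return (probability of the correct choice, whether it was the top choice)."""
    probs = choice_probabilities(choice_logits)
    predicted = max(probs, key=probs.get)
    return probs[correct_label], predicted == correct_label


# Hypothetical next-token logits for the labels of a four-way question.
example_logits = {"A": 1.2, "B": 3.4, "C": 0.3, "D": -0.5}
p_correct, is_correct = score_question(example_logits, correct_label="B")
print(f"P(correct) = {p_correct:.3f}, predicted correctly: {is_correct}")
```

In a real run the logits would come from the model's output at the position right after the prompt; averaging the per-question results over a dataset gives a single benchmark score.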
Natural Language Understanding (NLU)
| Task | Qwen3-30B-Base | ThaiLLM-30B (Qwen3-30B-A3B-cpt) | Δ |
|---|---|---|---|
| Belebele (TH) | 0.8704 | 0.8849 | +0.0145 |
| XNLI (TH) | 0.7507 | 0.7363 | -0.0144 |
| ThaiExam (Overall) | 0.5947 | 0.6478 | +0.0531 |
| ├── A-Level | 0.5276 | 0.6457 | +0.1181 |
| ├── IC | 0.6737 | 0.7158 | +0.0421 |
| ├── ONET | 0.5864 | 0.6296 | +0.0432 |
| ├── TGAT | 0.7538 | 0.7692 | +0.0154 |
| └── TPAT-1 | 0.5259 | 0.5517 | +0.0258 |
| M3Exam (Overall) | 0.5452 | 0.5660 | +0.0208 |
| MMLU (ENG, 5-shot) | 0.9600 | 0.9500 | -0.0100 |
| MMLU-Thai | 0.7004 | 0.7284 | +0.0280 |
| XCOPA-Thai | 0.8940 | 0.8760 | -0.0180 |
| M6Exam (Overall) | 0.5869 | 0.6196 | +0.0327 |
| ├── English | 0.8846 | 0.8462 | -0.0384 |
| ├── Math | 0.5294 | 0.5294 | 0.0000 |
| ├── Science | 0.6071 | 0.6786 | +0.0715 |
| ├── Social | 0.7091 | 0.7636 | +0.0545 |
| └── Thai | 0.4980 | 0.5388 | +0.0408 |
| Model | Average Score |
|---|---|
| Qwen3-30B-Base | 0.7378 |
| ThaiLLM-30B | 0.7511 |
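The average scores appear to be the unweighted mean of the eight top-level rows of the NLU table, with the indented sub-rows excluded. This is inferred from the numbers rather than stated by the authors; the sketch below reproduces both averages under that assumption.

```python
# Top-level rows of the NLU table: (Qwen3-30B-Base, ThaiLLM-30B).
nlu_scores = {
    "Belebele (TH)": (0.8704, 0.8849),
    "XNLI (TH)": (0.7507, 0.7363),
    "ThaiExam (Overall)": (0.5947, 0.6478),
    "M3Exam (Overall)": (0.5452, 0.5660),
    "MMLU (ENG, 5-shot)": (0.9600, 0.9500),
    "MMLU-Thai": (0.7004, 0.7284),
    "XCOPA-Thai": (0.8940, 0.8760),
    "M6Exam (Overall)": (0.5869, 0.6196),
}


def mean_score(column: int) -> float:
    """Unweighted mean of one model's column, rounded to four decimals."""
    values = [pair[column] for pair in nlu_scores.values()]
    return round(sum(values) / len(values), 4)


base_avg = mean_score(0)
thai_avg = mean_score(1)
print(f"Qwen3-30B-Base: {base_avg}")  # 0.7378
print(f"ThaiLLM-30B:    {thai_avg}")  # 0.7511
```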
MMLU-ProX
| Category | Qwen3-30B-Base | ThaiLLM-30B | Δ |
|---|---|---|---|
| Biology | 0.7294 | 0.7462 | +0.0168 |
| Business | 0.4411 | 0.4499 | +0.0088 |
| Chemistry | 0.4064 | 0.4046 | -0.0018 |
| Computer Science | 0.5122 | 0.5220 | +0.0098 |
| Economics | 0.6434 | 0.6339 | -0.0095 |
| Engineering | 0.4943 | 0.4881 | -0.0062 |
| Health | 0.4891 | 0.5226 | +0.0335 |
| History | 0.4514 | 0.4488 | -0.0026 |
| Law | 0.2982 | 0.2982 | 0.0000 |
| Math | 0.4537 | 0.4597 | +0.0060 |
| Other | 0.3918 | 0.4232 | +0.0314 |
| Philosophy | 0.3768 | 0.3627 | -0.0141 |
| Physics | 0.4450 | 0.4442 | -0.0008 |
| Psychology | 0.5952 | 0.6078 | +0.0126 |
| Model | Overall |
|---|---|
| Qwen3-30B-Base | 0.4739 |
| ThaiLLM-30B | 0.4797 |
Limitations
- This is a base model and requires instruction fine-tuning for optimal performance
- Performance on specialized domains may require domain-specific fine-tuning
- As with all language models, outputs should be verified for accuracy in critical applications
Citation
@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388},
}
Dataset Contributors
