Instructions for using ThaiLLM/ThaiLLM-30B with libraries and local apps.

Libraries

How to use ThaiLLM/ThaiLLM-30B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ThaiLLM/ThaiLLM-30B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ThaiLLM/ThaiLLM-30B")
model = AutoModelForCausalLM.from_pretrained("ThaiLLM/ThaiLLM-30B")

Local Apps

vLLM

How to use ThaiLLM/ThaiLLM-30B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ThaiLLM/ThaiLLM-30B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ThaiLLM/ThaiLLM-30B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
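
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM, and the response parsing assumes the usual OpenAI-style completions shape (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="ThaiLLM/ThaiLLM-30B",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request that the curl command above sends."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Usage (requires a running server):
#   print(complete("Once upon a time,"))
```

Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000 instead.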

Use Docker

docker model run hf.co/ThaiLLM/ThaiLLM-30B

SGLang

How to use ThaiLLM/ThaiLLM-30B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ThaiLLM/ThaiLLM-30B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ThaiLLM/ThaiLLM-30B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ThaiLLM/ThaiLLM-30B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ThaiLLM/ThaiLLM-30B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner

How to use ThaiLLM/ThaiLLM-30B with Docker Model Runner:

```
docker model run hf.co/ThaiLLM/ThaiLLM-30B
```

Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

ThaiLLM-30B info

ThaiLLM-30B is a continued pre-training of Qwen3-30B-A3B on a diverse corpus of approximately 63 billion tokens.

Important Note: This is a base model that requires instruction fine-tuning to align with specific user requirements and use cases.

Data

The training corpus consists of the following datasets:

Dataset	Tokens
Fineweb2-ENG	24,000,000,000
Fineweb2-TH	31,525,674,209
CuratedData	8,054,246,789

CuratedData Breakdown

Category	Token Count
Business & Finance	736,071,807
News	1,700,662,378
Education	576,489,778
Social	211,000,000
Government	40,492,117
Medical	42,987,587
Conversation	80,919,390
Code	620,218
Research Articles	4,185,649,758
Law	467,994,847
Travel	6,948,290
Others	4,410,619

*Token counts calculated using Qwen3 Tokenizer
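
As a quick consistency check, the figures in the two tables can be summed directly. The sketch below only re-adds the numbers listed above; the variable names are illustrative.

```python
# Token counts copied from the dataset table above.
corpus = {
    "Fineweb2-ENG": 24_000_000_000,
    "Fineweb2-TH": 31_525_674_209,
    "CuratedData": 8_054_246_789,
}

# Token counts copied from the CuratedData breakdown above.
curated = {
    "Business & Finance": 736_071_807,
    "News": 1_700_662_378,
    "Education": 576_489_778,
    "Social": 211_000_000,
    "Government": 40_492_117,
    "Medical": 42_987_587,
    "Conversation": 80_919_390,
    "Code": 620_218,
    "Research Articles": 4_185_649_758,
    "Law": 467_994_847,
    "Travel": 6_948_290,
    "Others": 4_410_619,
}

total = sum(corpus.values())
curated_total = sum(curated.values())

print(f"Corpus total: {total:,} tokens")
print(f"CuratedData breakdown total: {curated_total:,} tokens")

# Share of each dataset in the full corpus.
for name, tokens in corpus.items():
    print(f"{name}: {tokens / total:.1%}")
```

The corpus total comes to 63,579,920,998 tokens, and the twelve categories add up exactly to the CuratedData row.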

Requirements

Qwen3 support is built into recent releases of the Hugging Face transformers library. We strongly recommend using the latest version of transformers.

With transformers<4.51.0, loading the model fails with the following error:

KeyError: 'qwen3'
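
To fail early with a clearer message, the installed version can be compared against 4.51.0 before loading the model. This is a minimal sketch; `version_tuple` and `supports_qwen3` are hypothetical helpers, and the parsing deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re

# Minimum transformers version stated above.
MIN_TRANSFORMERS = (4, 51, 0)


def version_tuple(version):
    """Turn '4.51.0' or '4.52.0.dev0' into a (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '5.0'
    return tuple(parts)


def supports_qwen3(version):
    """Return True if this transformers version is at least 4.51.0."""
    return version_tuple(version) >= MIN_TRANSFORMERS


# Usage:
#   import transformers
#   if not supports_qwen3(transformers.__version__):
#       raise RuntimeError("Please upgrade: pip install -U transformers")
```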

Usage Training

Important: This is a base model and requires instruction fine-tuning before use to ensure optimal performance for your specific tasks and requirements.

Recommended Training Setup

We recommend using LLaMA-Factory for instruction fine-tuning. This framework provides an easy-to-use interface for training language models with various optimization techniques.

Quick Start with LLaMA-Factory

# Clone the repository
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory

# Install dependencies
pip install -e .

# Example training command for LoRA
llamafactory-cli train \
    --model_name_or_path ThaiLLM/ThaiLLM-30B \
    --stage sft \
    --do_train \
    --finetuning_type lora \
    --dataset your_dataset \
    --template qwen3 \
    --cutoff_len 8192 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --output_dir saves/ThaiLLM-30B-lora \
    --bf16
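
The batch-related flags above combine multiplicatively. The sketch below works out the effective batch size per optimizer step; the number of GPUs is not given in this card, so `num_devices` is an assumption supplied by the reader.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices):
    """Sequences contributing to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def max_tokens_per_step(batch_size, cutoff_len):
    """Upper bound on tokens per step (every sequence at cutoff_len)."""
    return batch_size * cutoff_len


# Values from the command above; num_devices is a placeholder assumption.
per_device_batch = 2      # --per_device_train_batch_size
grad_accum_steps = 8      # --gradient_accumulation_steps
cutoff_len = 8192         # --cutoff_len

for num_devices in (1, 8):
    batch = effective_batch_size(per_device_batch, grad_accum_steps, num_devices)
    tokens = max_tokens_per_step(batch, cutoff_len)
    print(f"{num_devices} GPU(s): {batch} sequences/step, up to {tokens:,} tokens/step")
```

On a single device this gives 16 sequences per step. If the GPU count changes, adjusting `--gradient_accumulation_steps` keeps the effective batch size the same.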

Usage Inference

The code snippets below show how to get started quickly with running the model. First, install the necessary libraries.

pip install -U transformers torch accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "ThaiLLM/ThaiLLM-30B"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    device_map="auto", 
    torch_dtype=torch.bfloat16
)

# Example prompt ("What is the pH of pure water?")
prompt = "น้ำบริสุทธิ์มีค่า pH เท่าใด"
# Move the inputs to the same device as the model's first layer
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response (passes both input_ids and attention_mask)
with torch.inference_mode():
    generate_ids = model.generate(
        **inputs,
        max_new_tokens=500,
        repetition_penalty=1.2,
        num_beams=1,
        do_sample=True,
        top_k=40,
        top_p=0.75,
        temperature=0.4,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.batch_decode(
    generate_ids, 
    skip_special_tokens=True, 
    clean_up_tokenization_spaces=True
)[0]

print(response)

Benchmarks

We evaluated ThaiLLM-30B against Qwen3-30B-Base using multiple-choice question datasets in both Thai and English.
Each benchmark measures the probability of selecting the correct choice based on the model’s next-token prediction.
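
The card does not spell out the exact scoring procedure. One common way to score multiple-choice questions from next-token predictions is to take the logits of the answer-label tokens, normalize them with a softmax, and read off the probability of the correct label. The sketch below illustrates that approach with made-up logits; the function names and the example values are hypothetical.

```python
import math


def choice_probabilities(choice_logits):
    """Softmax restricted to the logits of the answer-label tokens."""
    peak = max(choice_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(logit - peak) for label, logit in choice_logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def score_question(choice_logits, correct_label):
    """Return (probability of the correct choice, whether it ranks first)."""
    probs = choice_probabilities(choice_logits)
    predicted = max(probs, key=probs.get)
    return probs[correct_label], predicted == correct_label


# Hypothetical next-token logits for the labels of one question.
logits = {"A": 1.2, "B": 3.4, "C": 0.3, "D": -0.5}
p_correct, is_correct = score_question(logits, correct_label="B")
print(f"P(correct) = {p_correct:.4f}, predicted correctly: {is_correct}")
```

In a real evaluation the logits would come from the model's output at the position following the prompt, and accuracy would be the fraction of questions where the correct label ranks first.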

Natural Language Understanding (NLU)

Task	Qwen3-30B-Base	ThaiLLM-30B (Qwen3-30B-A3B-cpt)	Δ
Belebele (TH)	0.8704	0.8849	+0.0145
XNLI (TH)	0.7507	0.7363	-0.0144
ThaiExam (Overall)	0.5947	0.6478	+0.0531
├── A-Level	0.5276	0.6457	+0.1181
├── IC	0.6737	0.7158	+0.0421
├── ONET	0.5864	0.6296	+0.0432
├── TGAT	0.7538	0.7692	+0.0154
└── TPAT-1	0.5259	0.5517	+0.0258
M3Exam (Overall)	0.5452	0.5660	+0.0208
MMLU (ENG, 5-shot)	0.9600	0.9500	-0.0100
MMLU-Thai	0.7004	0.7284	+0.0280
XCOPA-Thai	0.8940	0.8760	-0.0180
M6Exam (Overall)	0.5869	0.6196	+0.0327
├── English	0.8846	0.8462	-0.0384
├── Math	0.5294	0.5294	0.0000
├── Science	0.6071	0.6786	+0.0715
├── Social	0.7091	0.7636	+0.0545
└── Thai	0.4980	0.5388	+0.0408

Model	Average Score
Qwen3-30B-Base	0.7378
ThaiLLM-30B	0.7511
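
The card does not state how the average is computed. The reported values are consistent with an unweighted mean over the eight top-level rows of the NLU table, with subtask rows excluded, as the following sketch shows.

```python
# (Qwen3-30B-Base, ThaiLLM-30B) scores for the top-level tasks in the table above.
scores = {
    "Belebele (TH)": (0.8704, 0.8849),
    "XNLI (TH)": (0.7507, 0.7363),
    "ThaiExam (Overall)": (0.5947, 0.6478),
    "M3Exam (Overall)": (0.5452, 0.5660),
    "MMLU (ENG, 5-shot)": (0.9600, 0.9500),
    "MMLU-Thai": (0.7004, 0.7284),
    "XCOPA-Thai": (0.8940, 0.8760),
    "M6Exam (Overall)": (0.5869, 0.6196),
}

# Unweighted mean per model.
base_avg = sum(base for base, _ in scores.values()) / len(scores)
thai_avg = sum(thai for _, thai in scores.values()) / len(scores)

print(f"Qwen3-30B-Base: {base_avg:.4f}")
print(f"ThaiLLM-30B:    {thai_avg:.4f}")

# Per-task difference (ThaiLLM-30B minus Qwen3-30B-Base).
for task, (base, thai) in scores.items():
    print(f"{task}: {thai - base:+.4f}")
```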

MMLU-ProX

Category	Qwen3-30B-Base	ThaiLLM-30B	Δ
Biology	0.7294	0.7462	+0.0168
Business	0.4411	0.4499	+0.0088
Chemistry	0.4064	0.4046	-0.0018
Computer Science	0.5122	0.5220	+0.0098
Economics	0.6434	0.6339	-0.0095
Engineering	0.4943	0.4881	-0.0062
Health	0.4891	0.5226	+0.0335
History	0.4514	0.4488	-0.0026
Law	0.2982	0.2982	0.0000
Math	0.4537	0.4597	+0.0060
Other	0.3918	0.4232	+0.0314
Philosophy	0.3768	0.3627	-0.0141
Physics	0.4450	0.4442	-0.0008
Psychology	0.5952	0.6078	+0.0126

Model	Overall
Qwen3-30B-Base	0.4739
ThaiLLM-30B	0.4797

Limitations

This is a base model and requires instruction fine-tuning for optimal performance
Performance on specialized domains may require domain-specific fine-tuning
As with all language models, outputs should be verified for accuracy in critical applications

Citation

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}


Safetensors: model size 31B params, tensor type BF16
