Instructions for using devrf/qwen32b-thai-lora with libraries and local inference servers.

Libraries

PEFT

How to use devrf/qwen32b-thai-lora with PEFT:

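import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in bf16 (the adapter was trained with bf16: true);
# the default fp32 load would need roughly twice the memory.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B", torch_dtype=torch.bfloat16)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "devrf/qwen32b-thai-lora")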
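
For deployment it can be convenient to fold the LoRA weights into the base model so that no PEFT wrapper is needed at inference time. The following is a minimal sketch, assuming `peft`, `transformers`, and `torch` are installed and that there is enough memory for the bf16 base weights. The function name `merge_adapter` and the output directory are hypothetical.

```python
def merge_adapter(
    base_id: str = "Qwen/Qwen3-32B",
    adapter_id: str = "devrf/qwen32b-thai-lora",
    output_dir: str = "./qwen32b-thai-merged",  # hypothetical output path
) -> str:
    """Merge the LoRA adapter into the base weights and save the result."""
    # Imports are local so that defining the function does not load the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    # merge_and_unload() folds the low-rank updates into the base weights
    # and returns a plain Transformers model.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)
    return output_dir
```

Calling `merge_adapter()` downloads both repositories, so it is best run once on a machine with sufficient RAM or VRAM; the saved directory can then be loaded with `AutoModelForCausalLM.from_pretrained` alone.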

Transformers

How to use devrf/qwen32b-thai-lora with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="devrf/qwen32b-thai-lora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

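# Load model directly. With `peft` installed, Transformers reads the adapter config
# and loads the base model (Qwen/Qwen3-32B) before attaching the adapter.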
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("devrf/qwen32b-thai-lora")
model = AutoModelForCausalLM.from_pretrained("devrf/qwen32b-thai-lora")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use devrf/qwen32b-thai-lora with vLLM:

Install from pip and serve model

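# Install vLLM from pip:
pip install vllm
# Start the vLLM server. This repository is a LoRA adapter, so serve the base model
# and register the adapter (its rank of 32 exceeds vLLM's default maximum of 16):
vllm serve "Qwen/Qwen3-32B" \
    --enable-lora \
    --max-lora-rank 32 \
    --lora-modules qwen32b-thai-lora=devrf/qwen32b-thai-lora
# Call the server using curl (OpenAI-compatible API).
# "model" is the adapter name registered with --lora-modules:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "qwen32b-thai-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'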
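
The same OpenAI-compatible endpoint can be called from Python. The sketch below uses only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the `model` argument must match whatever name the running server has registered.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(base_url, model, prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call (needs a server already listening on localhost:8000):
# print(chat("http://localhost:8000", "<served model name>", "เมืองหลวงของประเทศไทยคืออะไร"))
```

Separating request construction from sending keeps the payload easy to inspect or log before any network traffic happens.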

Use Docker

docker model run hf.co/devrf/qwen32b-thai-lora

SGLang

How to use devrf/qwen32b-thai-lora with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "devrf/qwen32b-thai-lora" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "devrf/qwen32b-thai-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "devrf/qwen32b-thai-lora" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "devrf/qwen32b-thai-lora",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use devrf/qwen32b-thai-lora with Docker Model Runner:
```
docker model run hf.co/devrf/qwen32b-thai-lora
```

Axolotl config

axolotl version: 0.13.0.dev0

adapter: lora
base_model: Qwen/Qwen3-32B
bf16: true
flash_attention: true
gradient_checkpointing: true

datasets:
- path: /workspace/data/wangchan_fixed
  type: alpaca
  split: train

val_set_size: 0
sequence_len: 2048
train_on_inputs: false

micro_batch_size: 4
gradient_accumulation_steps: 8

optimizer: adamw_torch
learning_rate: 1.0e-4
lr_scheduler: cosine
warmup_ratio: 0.03
weight_decay: 0.01
max_grad_norm: 1.0
num_epochs: 2

lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- down_proj
- up_proj

output_dir: ./outputs/qwen32b-thai
logging_steps: 10
save_steps: 300
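
To give a sense of what `lora_r: 32` across the seven target modules means in trainable parameters, here is a rough count. The projection shapes are assumptions about the Qwen3-32B architecture (hidden size 5120, MLP size 25600, 64 layers, 64 query heads and 8 key/value heads of dimension 128) and should be checked against the base model's `config.json`; only `lora_r`, `lora_alpha`, and the module names come from the config above.

```python
# Assumed Qwen3-32B shapes; verify against the base model's config.json.
HIDDEN, INTERMEDIATE, NUM_LAYERS = 5120, 25600, 64
Q_OUT, KV_OUT = 64 * 128, 8 * 128  # 64 query heads, 8 KV heads, head_dim 128

# (in_features, out_features) for each module listed in lora_target_modules.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

LORA_R, LORA_ALPHA = 32, 64  # from the axolotl config


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) to each adapted linear layer."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # multiplier applied to the low-rank update

print(f"trainable LoRA params: {total:,}")  # 268,435,456 under the assumed shapes
print(f"scaling (alpha / r): {scaling}")    # 2.0
```

Under these assumptions the adapter trains roughly 0.27 billion parameters, under 1% of the 32B base model.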

Qwen3-32B Thai LoRA

This model is a LoRA fine-tune of Qwen/Qwen3-32B on the WangchanThaiInstruct dataset, intended to improve Thai instruction following.

Model Description

This LoRA adapter enhances Qwen3-32B's ability to understand and respond to Thai language instructions across various domains including finance, general knowledge, creative writing, and classification tasks.

Base Model: Qwen/Qwen3-32B
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Language: Thai (th)
Training Loss: 0.85 (step 10) → 0.55 (step 1068), ~0.50 at the final step

Intended Uses & Limitations

Intended Uses

Thai language question answering
Thai instruction following
Thai content generation
Financial domain queries in Thai

Limitations

Performance may vary on domains not covered in the training data
Inherits limitations of the base Qwen3-32B model
Primarily optimized for Thai; multilingual performance may differ from the base model

Training and Evaluation Data

Dataset

Name: WangchanThaiInstruct
Training Samples: ~29,000 (after filtering out sequences longer than 2048 tokens)
Format: Alpaca-style (instruction, input, output)
Domains: Finance, General Knowledge, Creative Writing, Classification, Open QA, Closed QA
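
As an illustration of the Alpaca-style format, the sketch below renders one record into a prompt and shows how `train_on_inputs: false` is conventionally realized by masking prompt tokens out of the loss. The templates are the widely used Stanford Alpaca wording, which may differ from what axolotl renders internally. The helper names, the sample record, and the toy token IDs are all hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_alpaca(record: dict) -> tuple[str, str]:
    """Return (prompt, target) for one instruction/input/output record."""
    if record.get("input"):
        prompt = PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=record["instruction"])
    return prompt, record["output"]


def mask_prompt_labels(prompt_ids: list[int], target_ids: list[int]) -> dict:
    """Concatenate prompt and target; only target positions keep real labels."""
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [IGNORE_INDEX] * len(prompt_ids) + list(target_ids),
    }


# Made-up record for illustration; it is not taken from WangchanThaiInstruct.
record = {"instruction": "แปลเป็นภาษาอังกฤษ", "input": "สวัสดี", "output": "Hello"}
prompt, target = format_alpaca(record)
print(prompt + target)

# Toy token IDs stand in for real tokenizer output.
batch = mask_prompt_labels(prompt_ids=[11, 12, 13], target_ids=[21, 22])
print(batch["labels"])  # [-100, -100, -100, 21, 22]
```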

Training Procedure

Hardware

GPU: 1x NVIDIA H200 SXM (141GB VRAM)
Training Time: ~10 hours

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 43
training_steps: 1444
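
The derived values in this list follow from the config by simple arithmetic. The sketch below reproduces them and evaluates a linear-warmup cosine schedule at a few steps. It assumes a single GPU, warmup steps obtained by truncating `warmup_ratio * training_steps`, and decay to zero at the final step; the exact rounding and schedule implementation in axolotl may differ.

```python
import math

MICRO_BATCH, GRAD_ACCUM, NUM_GPUS = 4, 8, 1  # NUM_GPUS taken from the hardware section
PEAK_LR, WARMUP_RATIO = 1e-4, 0.03
TOTAL_STEPS = 1444

effective_batch = MICRO_BATCH * GRAD_ACCUM * NUM_GPUS  # 32, matches total_train_batch_size
warmup_steps = int(WARMUP_RATIO * TOTAL_STEPS)         # 43, assuming truncation


def lr_at(step: int) -> float:
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, warmup_steps)
for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

The peak rate of 1e-4 is reached at step 43, after which the rate decays smoothly toward zero by step 1444.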

Training Results

Step	Loss
10	0.85
20	0.78
1068	0.55
1444 (final)	~0.50
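
One way to read these numbers is as perplexity, the exponential of the loss. The conversion below assumes the reported loss is the mean per-token cross-entropy in nats over the unmasked response tokens, which is the usual convention for causal language model training but is not stated in this card.

```python
import math

# Loss values copied from the table above.
reported_losses = {10: 0.85, 20: 0.78, 1068: 0.55, 1444: 0.50}

# Perplexity = exp(mean cross-entropy in nats).
perplexities = {step: math.exp(loss) for step, loss in reported_losses.items()}

for step, ppl in perplexities.items():
    print(f"step {step:>4}: loss {reported_losses[step]:.2f} -> perplexity ~{ppl:.2f}")
```

Under that assumption, training perplexity falls from about 2.34 at step 10 to about 1.65 at the final step. Since `val_set_size` is 0, these are training-set figures only.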

Framework versions

PEFT 0.17.1
Transformers 4.57.3
PyTorch 2.7.1+cu126
Datasets 4.3.0
Tokenizers 0.22.1

Citation

If you use this model, please cite the original dataset and base model:

@misc{wangchanthaiinstruct,
  title={WangchanThaiInstruct},
  author={AIResearch.in.th},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/airesearch/WangchanThaiInstruct}
}

@misc{qwen3,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv}
}
