# ThaiLLM-8B-ToolUse
ThaiLLM-8B-ToolUse is a reinforcement-learning fine-tune of typhoon-ai/typhoon-s-thaillm-8b-instruct-research-preview, trained specifically to route a user's request to the correct medical tool call.
## Training Details
The model was trained using Prime Intellect's prime-rl framework.
### Training Configuration
This is the prime-rl configuration used to train the model:
```toml
max_steps = 100
seq_len = 8192

[deployment]
type = "single_node"
num_train_gpus = 2
num_infer_gpus = 6

[inference.parallel]
dp = 6

[wandb]
project = "med-tool-use"

[trainer.model]
attn = "flash_attention_3"
optimization_dtype = "bfloat16"
reduce_dtype = "bfloat16"

[trainer.optim]
lr = 5e-5

[orchestrator]
batch_size = 512
rollouts_per_example = 16
num_train_workers = 2

[orchestrator.wandb.log_extras]
samples = true
interval = 1

[orchestrator.sampling]
max_tokens = 4096

[[orchestrator.env]]
id = "med_app_env"
name = "med_app_env"
args = { dataset_name = "datasets/med-app-env" }
```
## Reward Function
The environment was developed with the verifiers framework and uses the following reward function:
```python
async def correct_tool_reward(completion, answer) -> float:
    response = completion[-1]["content"]
    has_tag = "<tool_call>" in response
    tool_call = extract_tool_call(response)
    # "negatives" examples must be answered without any tool call.
    if answer == "negatives":
        if has_tag:
            return -1.0
        # Reward plain-text answers of reasonable length.
        return 1.0 if 30 <= len(response) <= 3000 else 0.5
    # A <tool_call> tag that fails to parse is penalized; no tag scores 0.
    if tool_call is None:
        return -0.5 if has_tag else 0.0
    # Full reward only when the called tool matches the gold answer.
    return 1.0 if tool_call.get("name") == answer else -0.5
```
## Performance
| Model | Accuracy | Trigger F1 | Macro F1 |
|---|---|---|---|
| typhoon-s-thaillm-8b-instruct-research-preview | 0.675 | 0.475 | 0.394 |
| Qwen3-30B-A3B-Thinking-2507 | 0.990 | 0.992 | 0.978 |
| ThaiLLM-8B-ToolUse | 0.999 | 1.000 | 0.993 |
### Per-Tool F1 Performance
| Tool | typhoon-s-thaillm-8b-instruct-research-preview | Qwen3-30B-A3B-Thinking-2507 | ThaiLLM-8B-ToolUse |
|---|---|---|---|
| create_appointment | 0.071 | 0.987 | 0.995 |
| create_reminder | 0.360 | 0.988 | 1.000 |
| get_health_emergency_contact | 0.303 | 0.994 | 1.000 |
| list_appointment | 0.519 | 0.994 | 0.981 |
| list_reminder | 0.564 | 0.940 | 0.987 |
| prescreen | 0.051 | 0.934 | 0.981 |
| search_medical_facts | 0.517 | 0.990 | 0.999 |
| no_tool | 0.766 | 0.993 | 1.000 |
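The Macro F1 in the first table is consistent with the unweighted mean of the per-tool F1 scores above (this is my reading of the metric, not something the card states explicitly). For ThaiLLM-8B-ToolUse:

```python
# Per-tool F1 scores for ThaiLLM-8B-ToolUse, copied from the table above.
per_tool_f1 = {
    "create_appointment": 0.995,
    "create_reminder": 1.000,
    "get_health_emergency_contact": 1.000,
    "list_appointment": 0.981,
    "list_reminder": 0.987,
    "prescreen": 0.981,
    "search_medical_facts": 0.999,
    "no_tool": 1.000,
}

# Macro F1 as the unweighted mean over tools (assumption; it matches
# the reported 0.993 for this model).
macro_f1 = sum(per_tool_f1.values()) / len(per_tool_f1)
print(round(macro_f1, 3))  # → 0.993
```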