# ThaiLLM-8B-ToolUse

ThaiLLM-8B-ToolUse is a reinforcement-learning fine-tune of typhoon-ai/typhoon-s-thaillm-8b-instruct-research-preview, trained specifically to route a user's request to the correct medical tool call.

## Training Details

The model was trained using Prime-Intellect's prime-rl framework.

### Training Configuration

This is the prime-rl configuration used to train the model:

```toml
max_steps = 100
seq_len = 8192

[deployment]
type = "single_node"
num_train_gpus = 2
num_infer_gpus = 6

[inference.parallel]
dp = 6

[wandb]
project = "med-tool-use"

[trainer.model]
attn = "flash_attention_3"
optimization_dtype = "bfloat16"
reduce_dtype = "bfloat16"

[trainer.optim]
lr = 5e-5

[orchestrator]
batch_size = 512
rollouts_per_example = 16
num_train_workers = 2

[orchestrator.wandb.log_extras]
samples = true
interval = 1

[orchestrator.sampling]
max_tokens = 4096

[[orchestrator.env]]
id = "med_app_env"
name = "med_app_env"
args = { dataset_name = "datasets/med-app-env" }
```
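One consequence of the orchestrator settings worth noting (assuming `batch_size` counts individual rollouts rather than unique prompts, as in GRPO-style grouped sampling): with 16 rollouts per example, each 512-rollout step covers 32 unique prompts, and the 100-step run consumes 51,200 rollouts in total.

```python
# Derived from the configuration above; assumes batch_size counts
# individual rollouts rather than unique prompts.
max_steps = 100
batch_size = 512
rollouts_per_example = 16

unique_prompts_per_step = batch_size // rollouts_per_example
total_rollouts = batch_size * max_steps

print(unique_prompts_per_step)  # 32
print(total_rollouts)           # 51200
```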

## Reward Function

The environment was developed with the verifiers framework and uses the following reward function:

```python
async def correct_tool_reward(completion, answer) -> float:
    response = completion[-1]["content"]
    has_tag = "<tool_call>" in response
    tool_call = extract_tool_call(response)
    if answer == "negatives":
        # No tool should be called: penalize any tool-call tag,
        # and reward a plain-text reply of reasonable length.
        if has_tag:
            return -1.0
        return 1.0 if 30 <= len(response) <= 3000 else 0.5
    if tool_call is None:
        # A tag without a parseable call is worse than no attempt.
        return -0.5 if has_tag else 0.0
    # Full reward only when the called tool matches the expected one.
    return 1.0 if tool_call.get("name") == answer else -0.5
```
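To illustrate what each branch rewards, here is a runnable sketch that reproduces the function together with a hypothetical `extract_tool_call` helper (the environment's actual parser may differ):

```python
import asyncio
import json
import re
from typing import Optional

def extract_tool_call(response: str) -> Optional[dict]:
    # Hypothetical helper: parse the JSON object between <tool_call> tags.
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None

async def correct_tool_reward(completion, answer) -> float:
    response = completion[-1]["content"]
    has_tag = "<tool_call>" in response
    tool_call = extract_tool_call(response)
    if answer == "negatives":
        if has_tag:
            return -1.0
        return 1.0 if 30 <= len(response) <= 3000 else 0.5
    if tool_call is None:
        return -0.5 if has_tag else 0.0
    return 1.0 if tool_call.get("name") == answer else -0.5

# Three synthetic completions exercising the main branches.
correct = [{"content": '<tool_call>{"name": "create_reminder", "arguments": {}}</tool_call>'}]
wrong_tool = [{"content": '<tool_call>{"name": "list_reminder", "arguments": {}}</tool_call>'}]
plain_chat = [{"content": "Paracetamol is commonly used for mild pain and fever relief."}]

assert asyncio.run(correct_tool_reward(correct, "create_reminder")) == 1.0
assert asyncio.run(correct_tool_reward(wrong_tool, "create_reminder")) == -0.5
assert asyncio.run(correct_tool_reward(plain_chat, "negatives")) == 1.0
```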

## Performance

| Model | Accuracy | Trigger F1 | Macro F1 |
|---|---|---|---|
| typhoon-s-thaillm-8b-instruct-research-preview | 0.675 | 0.475 | 0.394 |
| Qwen3-30B-A3B-Thinking-2507 | 0.990 | 0.992 | 0.978 |
| ThaiLLM-8B-ToolUse | 0.999 | 1.000 | 0.993 |

### Per-Tool F1 Performance

| Tool | typhoon-s-thaillm-8b-instruct-research-preview | Qwen3-30B-A3B-Thinking-2507 | ThaiLLM-8B-ToolUse |
|---|---|---|---|
| create_appointment | 0.071 | 0.987 | 0.995 |
| create_reminder | 0.360 | 0.988 | 1.000 |
| get_health_emergency_contact | 0.303 | 0.994 | 1.000 |
| list_appointment | 0.519 | 0.994 | 0.981 |
| list_reminder | 0.564 | 0.940 | 0.987 |
| prescreen | 0.051 | 0.934 | 0.981 |
| search_medical_facts | 0.517 | 0.990 | 0.999 |
| no_tool | 0.766 | 0.993 | 1.000 |
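As a consistency check, the Macro F1 column in the summary table is the unweighted mean of the eight per-tool F1 scores:

```python
# Per-tool F1 scores from the table above, in row order
# (create_appointment through no_tool).
per_tool_f1 = {
    "typhoon-s-thaillm-8b-instruct-research-preview":
        [0.071, 0.360, 0.303, 0.519, 0.564, 0.051, 0.517, 0.766],
    "Qwen3-30B-A3B-Thinking-2507":
        [0.987, 0.988, 0.994, 0.994, 0.940, 0.934, 0.990, 0.993],
    "ThaiLLM-8B-ToolUse":
        [0.995, 1.000, 1.000, 0.981, 0.987, 0.981, 0.999, 1.000],
}

for model, scores in per_tool_f1.items():
    macro = sum(scores) / len(scores)
    # Matches the Macro F1 column (0.394, 0.978, 0.993) within rounding.
    print(f"{model}: {macro:.3f}")
```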