# ThaiLLM-8B-ToolUse
ThaiLLM-8B-ToolUse is a reinforcement-learning fine-tune of typhoon-ai/typhoon-s-thaillm-8b-instruct-research-preview, trained specifically to route a user's request to the correct medical tool call.
## Training Details
The model was trained using Prime Intellect's prime-rl framework.
### Training Configuration
This is the prime-rl configuration used to train the model:
```toml
max_steps = 100
seq_len = 8192

[deployment]
type = "single_node"
num_train_gpus = 2
num_infer_gpus = 6

[inference.parallel]
dp = 6

[wandb]
project = "med-tool-use"

[trainer.model]
attn = "flash_attention_3"
optimization_dtype = "bfloat16"
reduce_dtype = "bfloat16"

[trainer.optim]
lr = 5e-5

[orchestrator]
batch_size = 512
rollouts_per_example = 16
num_train_workers = 2

[orchestrator.wandb.log_extras]
samples = true
interval = 1

[orchestrator.sampling]
max_tokens = 4096

[[orchestrator.env]]
id = "med_app_env"
name = "med_app_env"
args = { dataset_name = "datasets/med-app-env" }
```
## Reward Function
The environment was developed with the verifiers framework and uses the following reward function:
```python
async def correct_tool_reward(completion, answer) -> float:
    response = completion[-1]["content"]
    has_tag = "<tool_call>" in response
    tool_call = extract_tool_call(response)
    # "negatives" examples must be answered without any tool call.
    if answer == "negatives":
        if has_tag:
            return -1.0
        # Reward plain-text answers of reasonable length.
        return 1.0 if 30 <= len(response) <= 3000 else 0.5
    # A <tool_call> tag that fails to parse is penalized; no tag scores 0.
    if tool_call is None:
        return -0.5 if has_tag else 0.0
    # Full reward only when the called tool matches the gold answer.
    return 1.0 if tool_call.get("name") == answer else -0.5
```
## Performance
| Model | Accuracy | Trigger F1 | Macro F1 |
|---|---|---|---|
| typhoon-s-thaillm-8b-instruct-research-preview | 0.675 | 0.475 | 0.394 |
| Qwen3-30B-A3B-Thinking-2507 | 0.990 | 0.992 | 0.978 |
| ThaiLLM-8B-ToolUse | 0.999 | 1.000 | 0.993 |
### Per-Tool F1 Performance
| Tool | typhoon-s-thaillm-8b-instruct-research-preview | Qwen3-30B-A3B-Thinking-2507 | ThaiLLM-8B-ToolUse |
|---|---|---|---|
| create_appointment | 0.071 | 0.987 | 0.995 |
| create_reminder | 0.360 | 0.988 | 1.000 |
| get_health_emergency_contact | 0.303 | 0.994 | 1.000 |
| list_appointment | 0.519 | 0.994 | 0.981 |
| list_reminder | 0.564 | 0.940 | 0.987 |
| prescreen | 0.051 | 0.934 | 0.981 |
| search_medical_facts | 0.517 | 0.990 | 0.999 |
| no_tool | 0.766 | 0.993 | 1.000 |
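The Macro F1 in the first table is consistent with the unweighted mean of the per-tool F1 scores above (this is my reading of the metric, not something the card states explicitly). For ThaiLLM-8B-ToolUse:

```python
# Per-tool F1 scores for ThaiLLM-8B-ToolUse, copied from the table above.
per_tool_f1 = {
    "create_appointment": 0.995,
    "create_reminder": 1.000,
    "get_health_emergency_contact": 1.000,
    "list_appointment": 0.981,
    "list_reminder": 0.987,
    "prescreen": 0.981,
    "search_medical_facts": 0.999,
    "no_tool": 1.000,
}

# Macro F1 as the unweighted mean over tools (assumption; it matches
# the reported 0.993 for this model).
macro_f1 = sum(per_tool_f1.values()) / len(per_tool_f1)
print(round(macro_f1, 3))  # → 0.993
```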