ThaiLLM-27B-Prescreen

ThaiLLM-27B-Prescreen is a reinforcement learning fine-tuned version of google/medgemma-27b-text-it, trained specifically for patient pre-screening. Given a patient profile and current symptoms, the model predicts the likely disease, recommends the appropriate hospital department, and estimates clinical severity.

Training Details

The model was trained using Prime Intellect's prime-rl framework.

Data

The model was trained on the ThaiLLM/med-prescreen dataset (https://huggingface.co/datasets/ThaiLLM/med-prescreen) with Prime Intellect's verifiers framework.

Training Configuration

This is the prime-rl configuration used to train the model:

max_steps = 500
seq_len = 16384

[deployment]
type = "single_node"
num_train_gpus = 2
num_infer_gpus = 6

[inference.parallel]
dp = 6

[trainer.model]
attn = "flash_attention_3"
optimization_dtype = "bfloat16"
reduce_dtype = "bfloat16"

[trainer.model.lora]
rank = 64
alpha = 128

[trainer.model.ac]

[trainer.optim]
lr = 5e-5

[orchestrator]
batch_size = 512
rollouts_per_example = 16
num_train_workers = 2

[orchestrator.wandb.log_extras]
samples = true
interval = 1

[orchestrator.sampling]
max_tokens = 8192

[[orchestrator.env]]
id = "prescreen_classification"
name = "prescreen_classification"

[ckpt]
interval = 50
keep_interval = 50

Reward Functions

The environment was developed following the verifiers framework, using the four reward functions below with weights [2.0, 1.0, 1.0, 0.3] respectively:

async def disease_reward(completion, answer):
    response = completion[-1]["content"]
    predicted = _extract_tag(response, "disease")
    if predicted is None:
        return 0.0
    predicted = predicted.lower()
    true_disease = answer.get("disease", "").lower()
    if predicted == true_disease:
        return 1.0
    return 0.0

async def department_reward(completion, answer):
    response = completion[-1]["content"]
    predicted = _extract_tag(response, "department")
    if predicted is None:
        return 0.0
    predicted = predicted.lower()
    true_department = answer.get("department", "").lower()
    return 1.0 if predicted == true_department else 0.0

async def severity_reward(completion, answer):
    response = completion[-1]["content"]
    predicted = _extract_tag(response, "severity")
    if predicted is None:
        return 0.0
    predicted = predicted.lower()
    true_severity = answer.get("severity", "").lower()
    return 1.0 if predicted == true_severity else 0.0

async def format_reward(completion, answer) -> float:
    response = completion[-1]["content"]
    # medgemma uses the <unused94> token instead of <think>
    text_without_think = re.sub(r"<unused94>.*?</unused94>", "", response, flags=re.DOTALL | re.IGNORECASE)
    tags = ["disease", "department", "severity"]
    present = sum(1 for t in tags if f"<{t}>" in text_without_think.lower() and f"</{t}>" in text_without_think.lower())
    return present / len(tags)

Performance

We benchmark against four baselines spanning general-purpose reasoning models (Qwen3-30B-A3B-Thinking-2507, Qwen3-8B) and medical-domain models (medgemma-27b-text-it, medgemma1.5-4b-it). ThaiLLM-27B-Prescreen improves disease F1 by +0.448 over its base model (0.287 → 0.735) and outperforms Qwen3-30B-A3B-Thinking-2507 at 0.515. Department routing also improves meaningfully (+0.048 F1 over the base, +0.077 over Qwen3-30B-A3B-Thinking-2507), with the largest gain appearing in accuracy (0.436 → 0.677), suggesting the model is substantially better at picking the single correct department rather than hedging across plausible ones.

There is, however, a severity trade-off: severity F1 is slightly below the base medgemma-27b-text-it (0.571 vs 0.601) and noticeably below Qwen3-30B-A3B-Thinking-2507 (0.659). At the same time, ThaiLLM-27B-Prescreen achieves the highest severity accuracy of any model tested (0.799), and the per-class breakdown below shows why the two metrics diverge: the model is strong on the two clinically consequential classes (Emergency and Visit Hospital / Clinic) and fails entirely on Observe at Home.

Overall Performance (F1)

| Model | Disease | Department | Severity |
|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.515 | 0.464 | 0.659 |
| Qwen3-8B | 0.157 | 0.449 | 0.574 |
| medgemma1.5-4b-it | 0.095 | 0.424 | 0.525 |
| medgemma-27b-text-it | 0.287 | 0.493 | 0.601 |
| ThaiLLM-27B-Prescreen | 0.735 | 0.541 | 0.571 |

Disease Classification

| Model | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.515 | 0.562 | 0.509 | 0.510 |
| Qwen3-8B | 0.157 | 0.215 | 0.148 | 0.149 |
| medgemma1.5-4b-it | 0.095 | 0.131 | 0.082 | 0.076 |
| medgemma-27b-text-it | 0.287 | 0.336 | 0.266 | 0.286 |
| ThaiLLM-27B-Prescreen | 0.735 | 0.776 | 0.730 | 0.729 |

Department Classification

| Model | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.464 | 0.466 | 0.677 | 0.420 |
| Qwen3-8B | 0.449 | 0.419 | 0.648 | 0.358 |
| medgemma1.5-4b-it | 0.424 | 0.394 | 0.541 | 0.358 |
| medgemma-27b-text-it | 0.493 | 0.469 | 0.678 | 0.436 |
| ThaiLLM-27B-Prescreen | 0.541 | 0.606 | 0.518 | 0.677 |

Severity Classification

| Model | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.659 | 0.722 | 0.639 | 0.774 |
| Qwen3-8B | 0.574 | 0.858 | 0.601 | 0.771 |
| medgemma1.5-4b-it | 0.525 | 0.616 | 0.529 | 0.715 |
| medgemma-27b-text-it | 0.601 | 0.835 | 0.609 | 0.755 |
| ThaiLLM-27B-Prescreen | 0.571 | 0.548 | 0.599 | 0.799 |

Per-class severity breakdown for ThaiLLM-27B-Prescreen:

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Emergency | 0.878 | 0.857 | 0.867 | 84 |
| Observe At Home | 0.000 | 0.000 | 0.000 | 36 |
| Visit Hospital / Clinic | 0.767 | 0.940 | 0.845 | 168 |

The model never predicts Observe at Home: those 36 cases are being absorbed into Visit Hospital / Clinic instead. The collapse of the Observe at Home class is a real limitation of the system and should be taken into account when deploying the model.
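The divergence between severity macro-F1 (0.571) and accuracy (0.799) can be reproduced directly from the per-class table above. Macro-F1 averages the three classes equally, so the collapsed Observe At Home class drags it down, while accuracy is dominated by the two well-predicted (and larger) classes:

```python
# Per-class severity numbers for ThaiLLM-27B-Prescreen, from the table above.
support = {"Emergency": 84, "Observe At Home": 36, "Visit Hospital / Clinic": 168}
recall = {"Emergency": 0.857, "Observe At Home": 0.000, "Visit Hospital / Clinic": 0.940}
f1 = {"Emergency": 0.867, "Observe At Home": 0.000, "Visit Hospital / Clinic": 0.845}

# Macro-F1: the zero on Observe At Home pulls the average down.
macro_f1 = sum(f1.values()) / len(f1)  # ≈ 0.571

# Accuracy: recover per-class correct counts as recall × support.
correct = sum(round(recall[c] * support[c]) for c in support)  # 72 + 0 + 158 = 230
accuracy = correct / sum(support.values())  # 230 / 288 ≈ 0.799
```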

Usage

The model expects a specific system prompt (specified in system_prompt.py). The lists of possible diseases and departments can be retrieved from https://github.com/vistec-AI/thaillm-prescreen-rulesets/blob/main/v1/const/diseases.yaml and https://github.com/vistec-AI/thaillm-prescreen-rulesets/blob/main/v1/const/departments.yaml respectively.

vLLM

uv run --with vllm vllm serve google/medgemma-27b-text-it \
     --enable-lora \
     --lora-modules prescreen=ThaiLLM/ThaiLLM-27B-Prescreen \
     --max-lora-rank 64