# ThaiLLM-27B-Prescreen

ThaiLLM-27B-Prescreen is a reinforcement-learning fine-tuned version of [google/medgemma-27b-text-it](https://huggingface.co/google/medgemma-27b-text-it), trained specifically for patient pre-screening. Given a patient profile and current symptoms, the model predicts the likely disease, recommends the appropriate hospital department, and estimates clinical severity.
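
The model returns its prediction in three XML-style tags (the same tags checked by the format reward below). An illustrative completion, with hypothetical disease and department values, might look like:

```text
<disease>influenza</disease>
<department>internal medicine</department>
<severity>Visit Hospital / Clinic</severity>
```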
## Training Details

The model was trained with Prime Intellect's prime-rl framework.
### Data

The model was trained on the [ThaiLLM/med-prescreen](https://huggingface.co/datasets/ThaiLLM/med-prescreen) dataset using Prime Intellect's verifiers framework.
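
A minimal sketch of loading the dataset from the Hub (inspect the dataset card for the exact splits and schema):

```python
from datasets import load_dataset

# Download the pre-screening dataset from the Hugging Face Hub
ds = load_dataset("ThaiLLM/med-prescreen")
print(ds)  # inspect the available splits and column names
```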
### Training Configuration

The prime-rl configuration used to train the model:

```toml
max_steps = 500
seq_len = 16384
[deployment]
type = "single_node"
num_train_gpus = 2
num_infer_gpus = 6
[inference.parallel]
dp = 6
[trainer.model]
attn = "flash_attention_3"
optimization_dtype = "bfloat16"
reduce_dtype = "bfloat16"
[trainer.model.lora]
rank = 64
alpha = 128
[trainer.model.ac]
[trainer.optim]
lr = 5e-5
[orchestrator]
batch_size = 512
rollouts_per_example = 16
num_train_workers = 2
[orchestrator.wandb.log_extras]
samples = true
interval = 1
[orchestrator.sampling]
max_tokens = 8192
[[orchestrator.env]]
id = "prescreen_classification"
name = "prescreen_classification"
[ckpt]
interval = 50
keep_interval = 50
```
## Reward Functions

The environment was developed following the verifiers framework and uses four reward functions, weighted [2.0, 1.0, 1.0, 0.3] respectively: exact disease match, exact department match, exact severity match, and output-format compliance. (The `_extract_tag` helper below is an assumed implementation, reconstructed from how it is used; the original is not shown in this card.)
```python
import re


def _extract_tag(text: str, tag: str) -> str | None:
    # Assumed helper (original implementation not shown in this card):
    # returns the contents of <tag>...</tag>, or None if the tag is absent.
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


async def disease_reward(completion, answer):
    """1.0 for an exact (case-insensitive) disease match, else 0.0."""
    response = completion[-1]["content"]
    predicted = _extract_tag(response, "disease")
    if predicted is None:
        return 0.0
    return 1.0 if predicted.lower() == answer.get("disease", "").lower() else 0.0


async def department_reward(completion, answer):
    """1.0 for an exact (case-insensitive) department match, else 0.0."""
    response = completion[-1]["content"]
    predicted = _extract_tag(response, "department")
    if predicted is None:  # check before lowercasing to avoid an AttributeError
        return 0.0
    return 1.0 if predicted.lower() == answer.get("department", "").lower() else 0.0


async def severity_reward(completion, answer):
    """1.0 for an exact (case-insensitive) severity match, else 0.0."""
    response = completion[-1]["content"]
    predicted = _extract_tag(response, "severity")
    if predicted is None:
        return 0.0
    return 1.0 if predicted.lower() == answer.get("severity", "").lower() else 0.0


async def format_reward(completion, answer) -> float:
    """Fraction of the three required tags present outside the reasoning block."""
    response = completion[-1]["content"]
    # medgemma uses the <unused94> token instead of <think>
    text_without_think = re.sub(
        r"<unused94>.*?</unused94>", "", response, flags=re.DOTALL | re.IGNORECASE
    )
    tags = ["disease", "department", "severity"]
    present = sum(
        1 for t in tags
        if f"<{t}>" in text_without_think.lower() and f"</{t}>" in text_without_think.lower()
    )
    return present / len(tags)
```
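
These functions are combined by the environment's rubric. A minimal sketch of how they and their weights might be registered, assuming the verifiers `Rubric(funcs=..., weights=...)` interface (check the installed version for the exact signature):

```python
import verifiers as vf

# Weighted rubric: the rollout reward is the weighted sum of the four
# individual rewards, so correct disease prediction dominates at weight 2.0.
rubric = vf.Rubric(
    funcs=[disease_reward, department_reward, severity_reward, format_reward],
    weights=[2.0, 1.0, 1.0, 0.3],
)
```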
## Performance

We benchmark against four baselines spanning general-purpose reasoning models (Qwen3-30B-A3B-Thinking-2507, Qwen3-8B) and medical-domain models (medgemma-27b-text-it, medgemma1.5-4b-it). ThaiLLM-27B-Prescreen improves disease F1 by +0.448 over its base model (0.287 → 0.735) and outperforms Qwen3-30B-A3B-Thinking-2507 (0.515). Department routing also improves meaningfully (+0.048 F1 over the base, +0.077 over Qwen3-30B-A3B-Thinking-2507), with the largest gain in accuracy (0.436 → 0.677), suggesting the model is substantially better at picking the single correct department rather than hedging across plausible ones.

There is, however, a severity trade-off: severity F1 is slightly below the base medgemma-27b-text-it (0.571 vs. 0.601) and noticeably below Qwen3-30B-A3B-Thinking-2507 (0.659). ThaiLLM-27B-Prescreen nonetheless achieves the highest severity accuracy of any model tested (0.799), and the per-class breakdown below shows why the two metrics diverge: the model is strong on the two clinically consequential classes (Emergency and Visit Hospital / Clinic) but fails entirely on Observe at Home.
### Overall Performance (F1)
| Model | Disease | Department | Severity |
|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.515 | 0.464 | 0.659 |
| Qwen3-8B | 0.157 | 0.449 | 0.574 |
| medgemma1.5-4b-it | 0.095 | 0.424 | 0.525 |
| medgemma-27b-text-it | 0.287 | 0.493 | 0.601 |
| ThaiLLM-27B-Prescreen | 0.735 | 0.541 | 0.571 |
### Disease Classification
| Model | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.515 | 0.562 | 0.509 | 0.510 |
| Qwen3-8B | 0.157 | 0.215 | 0.148 | 0.149 |
| medgemma1.5-4b-it | 0.095 | 0.131 | 0.082 | 0.076 |
| medgemma-27b-text-it | 0.287 | 0.336 | 0.266 | 0.286 |
| ThaiLLM-27B-Prescreen | 0.735 | 0.776 | 0.730 | 0.729 |
### Department Classification
| Model | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.464 | 0.466 | 0.677 | 0.420 |
| Qwen3-8B | 0.449 | 0.419 | 0.648 | 0.358 |
| medgemma1.5-4b-it | 0.424 | 0.394 | 0.541 | 0.358 |
| medgemma-27b-text-it | 0.493 | 0.469 | 0.678 | 0.436 |
| ThaiLLM-27B-Prescreen | 0.541 | 0.606 | 0.518 | 0.677 |
### Severity Classification
| Model | F1 | Precision | Recall | Accuracy |
|---|---|---|---|---|
| Qwen3-30B-A3B-Thinking-2507 | 0.659 | 0.722 | 0.639 | 0.774 |
| Qwen3-8B | 0.574 | 0.858 | 0.601 | 0.771 |
| medgemma1.5-4b-it | 0.525 | 0.616 | 0.529 | 0.715 |
| medgemma-27b-text-it | 0.601 | 0.835 | 0.609 | 0.755 |
| ThaiLLM-27B-Prescreen | 0.571 | 0.548 | 0.599 | 0.799 |

Per-class severity results for ThaiLLM-27B-Prescreen:

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Emergency | 0.878 | 0.857 | 0.867 | 84 |
| Observe At Home | 0.000 | 0.000 | 0.000 | 36 |
| Visit Hospital / Clinic | 0.767 | 0.940 | 0.845 | 168 |
The model never predicts Observe at Home; those 36 cases are absorbed into Visit Hospital / Clinic instead. The collapse of the Observe at Home class is a real limitation of the system and should be taken into account when deploying the model.
## Usage

The model expects a specific system prompt (defined in `system_prompt.py`). The lists of possible diseases and departments can be retrieved from https://github.com/vistec-AI/thaillm-prescreen-rulesets/blob/main/v1/const/diseases.yaml and https://github.com/vistec-AI/thaillm-prescreen-rulesets/blob/main/v1/const/departments.yaml, respectively.
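
A minimal sketch of fetching those lists, assuming the standard raw-file equivalents of the blob URLs above (the YAML schema is not documented in this card, so the sketch only loads and prints them):

```python
import requests
import yaml

# Raw GitHub URLs derived from the blob links above
BASE = "https://raw.githubusercontent.com/vistec-AI/thaillm-prescreen-rulesets/main/v1/const"

diseases = yaml.safe_load(requests.get(f"{BASE}/diseases.yaml", timeout=30).text)
departments = yaml.safe_load(requests.get(f"{BASE}/departments.yaml", timeout=30).text)
print(diseases, departments)
```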
### vLLM

```bash
uv run --with vllm vllm serve google/medgemma-27b-text-it \
    --enable-lora \
    --lora-modules prescreen=ThaiLLM/ThaiLLM-27B-Prescreen \
    --max-lora-rank 64
```
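
Once the server is running, the adapter can be queried through vLLM's OpenAI-compatible API. A minimal sketch, assuming the default port 8000; the system prompt is a placeholder (substitute the real one from `system_prompt.py`) and the patient description is illustrative:

```python
import re

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

SYSTEM_PROMPT = "..."  # placeholder: load the real prompt from system_prompt.py

response = client.chat.completions.create(
    model="prescreen",  # the --lora-modules name registered above
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {
            "role": "user",
            "content": "56-year-old male, chest pain radiating to the left arm, "
            "sweating, onset 30 minutes ago.",
        },
    ],
    max_tokens=8192,
)
text = response.choices[0].message.content


def extract_tag(text: str, tag: str) -> str | None:
    """Return the contents of <tag>...</tag>, or None if absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL | re.IGNORECASE)
    return m.group(1).strip() if m else None


print({t: extract_tag(text, t) for t in ("disease", "department", "severity")})
```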