---
base_model: Qwen/Qwen3-32B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- zh
- en
tags:
- lora
- transformers
- medical
- perturbation-robust
- qwen3
- chain-of-thought
- grpo
- reinforcement-learning
---
# PrMed — Perturbation-Resilient Medical Foundation Model
Large language models (LLMs) have achieved strong performance on medical benchmarks, yet their reliability in real-world clinical settings remains insufficient. We identify a key source of this gap: a mismatch between real patient expressions — which often contain linguistic perturbations such as colloquial, vague, dialectal, and emotionally charged language — and the relatively clean and standardized corpora on which most existing LLMs are trained.
We curated **569,913 real-world Chinese patient utterances** from six clinical specialties and found that **95.1%** contained at least one perturbation type, while **83.6%** contained two or more, indicating that linguistic perturbations are pervasive in real medical communication. Perturbation-gradient experiments showed that, although several leading LLMs approached or even exceeded open-book physician performance under clean inputs, their performance **declined sharply** under mild-to-severe perturbations, whereas physicians remained substantially more stable.
Error-pattern analysis revealed that linguistic perturbations not only impaired key-information extraction, but more importantly **disrupted reasoning accuracy and induced reasoning drift**, suggesting that the main limitation of current medical LLMs lies not in insufficient medical knowledge, but in fragile understanding and reasoning under non-standard patient language.
To address this gap, we developed **PrMed**, a perturbation-resilient medical foundation model trained in two stages on **1.2 million multi-source medical samples**. Stage 1 fine-tunes the base model with LoRA on perturbation-resilient chain-of-thought data; stage 2 applies GRPO-based reinforcement learning with a patient simulator to strengthen multi-turn interactive reasoning. PrMed was consistently more robust than the other LLMs evaluated: its accuracy dropped by only **2.71 percentage points** from formal to heavily perturbed input, and it better preserved reasoning stability, safety, completeness, and actionable advice in long-form dialogues.
## Model Training
We developed a two-stage training framework to enable LLMs to perform perturbation-resilient complex medical reasoning through structured multi-step inference.
### Stage 1: Perturbation-Resilient Reasoning CoT
**Training data construction.** We curate high-quality training samples by searching for correct reasoning trajectories under a strict rubric-based verification system. The rubric comprises three layers: a CoT layer with five axes, a response layer with five axes, and a cross layer with three axes to quantify the coherence and alignment between the CoT and the final response. The reasoning procedure follows five ordered steps:
1. **Emotion perception** — recognizing implicit emotional signals in perturbations to guide response tone and style
2. **Perturbation identification** — determining whether perturbations are present, labeling them at corresponding spans, and interpreting intended meaning
3. **Utterance correction** — reconstructing the patient message into a more clinically interpretable form
4. **Chief complaint extraction** — filtering distractions to focus on the core clinical request
5. **Medical reasoning** — conducting thorough and rigorous medical reasoning grounded in the extracted chief complaint
After generation, an independent judge agent scores the output using the predefined rubric on a 5-point Likert scale. A sample is included in the final training corpus only if **all axes receive scores > 4**. This generate–evaluate–refine loop is repeated for up to three iterations.
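The acceptance rule and retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the axis names are placeholders (the rubric axes are not listed here), `generate` and `judge` are hypothetical callables standing in for the generator and the judge agent, and feeding the failing scores back into the next attempt is an assumption about how refinement works.

```python
from typing import Callable, Dict, Optional

# Rubric layout from the text: 5 CoT axes, 5 response axes, 3 cross axes.
# The axis names are placeholders; the source does not list them.
RUBRIC_AXES = (
    [f"cot_{i}" for i in range(1, 6)]
    + [f"response_{i}" for i in range(1, 6)]
    + [f"cross_{i}" for i in range(1, 4)]
)
MIN_SCORE = 4       # a sample passes only if every axis scores > 4
MAX_ITERATIONS = 3  # the loop runs for up to three iterations


def passes_rubric(scores: Dict[str, int]) -> bool:
    """Return True only if every rubric axis is scored strictly above MIN_SCORE."""
    return all(scores.get(axis, 0) > MIN_SCORE for axis in RUBRIC_AXES)


def curate_sample(
    utterance: str,
    generate: Callable[[str, Optional[Dict[str, int]]], str],
    judge: Callable[[str, str], Dict[str, int]],
) -> Optional[str]:
    """Generate, score, and retry; return an accepted sample or None."""
    feedback: Optional[Dict[str, int]] = None
    for _ in range(MAX_ITERATIONS):
        candidate = generate(utterance, feedback)  # hypothetical generator call
        scores = judge(utterance, candidate)       # hypothetical judge-agent call
        if passes_rubric(scores):
            return candidate
        feedback = scores  # assumed: failing scores guide the next attempt
    return None
```

A missing axis counts as a failure here, so a judge that skips an axis cannot let a sample through by omission.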
**Fine-tuning procedure.** We select Qwen3-32B as the base model and perform parameter-efficient fine-tuning using LoRA.
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-32B |
| PEFT method | LoRA |
| LoRA modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Max context length | 8192 tokens |
| Precision | bfloat16 (mixed precision) |
| Batch size | 1 per GPU, gradient accumulation 4 (effective batch size = 4) |
| Optimizer | AdamW, lr = 5×10⁻⁵, cosine schedule, 3% warmup |
| Training | Up to 5 epochs with early stopping on validation loss |
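For readers who want to reproduce a similar setup, the table can be written down as plain Python. The dictionary keys mirror the argument names of `peft.LoraConfig` (`r`, `lora_alpha`, `lora_dropout`, `target_modules`), so it could be unpacked into that class. The schedule function is only a sketch of "cosine schedule, 3% warmup": linear warmup and decay to zero are assumptions, since the table does not state the warmup shape or a learning-rate floor.

```python
import math

# Values copied from the table above; key names mirror peft.LoraConfig arguments.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

PEAK_LR = 5e-5
WARMUP_RATIO = 0.03

# Standard LoRA scales the low-rank update by lora_alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay.

    Assumes decay to zero at total_steps; the source does not state a floor.
    """
    warmup_steps = max(1, round(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With rank 16 and alpha 32, the adapter update is scaled by a factor of 2 before being added to the frozen weights.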
### Stage 2: Reinforcement Learning with GRPO
We further refine the Stage 1 model using Group Relative Policy Optimization (GRPO). For each prompt, GRPO generates multiple candidate responses from the current policy, scores them using a reward function, and updates the policy based on the relative advantage within each group. Training proceeds in two complementary phases:
- **Single-turn phase**: The model generates candidate responses to individual patient queries and is optimized based on rubric scores.
- **Multi-turn phase**: A DeepSeek-V3-based patient simulator generates follow-up utterances, and the model's next-turn response is evaluated under the same rubric, yielding an adaptive closed loop of simulate–evaluate–optimize.
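The "relative advantage within each group" can be illustrated with a small, dependency-free sketch. It follows the common GRPO formulation, in which each reward is centered on the group mean and divided by the group standard deviation; the use of the population standard deviation, the `eps` term, and the example scores are assumptions made for illustration, not details taken from the PrMed training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own group's mean and spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every candidate gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric scores for four candidate responses to one prompt.
rubric_scores = [3.0, 4.0, 4.0, 5.0]
advantages = group_relative_advantages(rubric_scores)
```

Candidates scoring above the group mean receive positive advantages and are reinforced, those below are penalized, and a group whose candidates all score the same contributes no learning signal.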
## Quick Start
### Install Dependencies
```bash
pip install torch transformers peft accelerate
```
### Download Base Model
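Via ModelScope (recommended for users in China; requires `pip install modelscope`):
```python
from modelscope import snapshot_download
# The returned path is where the weights land; use it as base_model_path below.
model_dir = snapshot_download("Qwen/Qwen3-32B", cache_dir="./")
print(model_dir)
```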
Or via HuggingFace:
```python
from huggingface_hub import snapshot_download
snapshot_download("Qwen/Qwen3-32B", local_dir="./Qwen3-32B")
```
### Load Model with PrMed
```python
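import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Local paths: the downloaded base model and the PrMed LoRA adapter.
# If you downloaded via ModelScope, point base_model_path at the returned model_dir.
base_model_path = "./Qwen3-32B"
PrMed_path = "./PrMed"

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Load the base weights in bfloat16 and shard them across the available devices.
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach the PrMed adapter on top of the base model.
model = PeftModel.from_pretrained(model, PrMed_path)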
```
### Inference
```python
# Chinese prompt (primary use case)
messages_zh = [
    {"role": "system", "content": "你是一个抗语言扰动的医疗专家,通过多步骤思考过程,给出高质量的医学回复。"},
    {"role": "user", "content": "医生你好,我最近总是头疼,有时候还会恶心,这是怎么回事?"},
]

# English prompt
messages_en = [
    {"role": "system", "content": "You are a perturbation-resilient medical expert. Reason step by step and provide a high-quality medical response."},
    {"role": "user", "content": "Hi doctor, I've been having headaches a lot lately, sometimes with nausea. What could be going on?"},
]

# Pick one conversation to run.
messages = messages_zh  # or messages_en

# Render the chat template to text, then tokenize and move tensors to the model's device.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
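If the decoded response carries its chain of thought in Qwen3-style `<think>...</think>` tags, a small helper can separate the reasoning from the final answer. This is an assumption about the output format rather than documented PrMed behavior, and `split_reasoning` is a hypothetical helper name; when no closing tag is present, the whole response is treated as the answer.

```python
from typing import Tuple


def split_reasoning(response: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Split a decoded response into (reasoning, answer) at the last close tag."""
    head, sep, tail = response.rpartition(close_tag)
    if not sep:  # no tag found: treat the whole response as the answer
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()
```

For example, `reasoning, answer = split_reasoning(response)` would let an application show only `answer` to the user while logging `reasoning` for review.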
## Limitations
- This model is a **research prototype** and should **NOT** be used for actual clinical decision-making.
- Performance is optimized for Chinese medical text with linguistic perturbations.
- Requires Qwen3-32B as the base model (~60 GB in bfloat16).
## Authors
**Xinti Sun, Yuexuan Long, Qiyang Hong, Yinbo Xiao, Erping Long**
Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital
Contact: sunxinti@tmu.edu.cn
## Citation
```bibtex
@misc{prmed2026,
title={PrMed: Perturbation-Resilient Medical Foundation Model},
author={Xinti Sun and Yuexuan Long and Qiyang Hong and Yinbo Xiao and Erping Long},
year={2026},
url={https://huggingface.co/Xinti/PrMed}
}
```