---
base_model: Qwen/Qwen3-32B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- zh
- en
tags:
- lora
- transformers
- medical
- perturbation-robust
- qwen3
- chain-of-thought
- grpo
- reinforcement-learning
---

# PrMed: Perturbation-Resilient Medical Foundation Model

Large language models (LLMs) have achieved strong performance on medical benchmarks, yet their reliability in real-world clinical settings remains insufficient. We identify a key source of this gap: a mismatch between the way real patients express themselves (often with linguistic perturbations such as colloquial, vague, dialectal, and emotionally charged language) and the relatively clean, standardized corpora on which most existing LLMs are trained.

We curated 569,913 real-world Chinese patient utterances from six clinical specialties and found that 95.1% contained at least one perturbation type and 83.6% contained two or more, indicating that linguistic perturbations are pervasive in real medical communication. In perturbation-gradient experiments, several leading LLMs approached or even exceeded the performance of open-book physicians on clean inputs, but their performance declined sharply under mild-to-severe perturbations, whereas physicians remained substantially more stable.

Error-pattern analysis revealed that linguistic perturbations not only impaired key-information extraction but, more importantly, disrupted reasoning accuracy and induced reasoning drift. This suggests that the main limitation of current medical LLMs lies not in insufficient medical knowledge but in fragile understanding and reasoning when faced with non-standard patient language.

To address this gap, we developed PrMed, a perturbation-resilient medical foundation model trained in two stages on 1.2 million multi-source medical samples. Stage 1 applies LoRA fine-tuning on perturbation-resilient chain-of-thought (CoT) data; Stage 2 applies GRPO-based reinforcement learning with a patient simulator to strengthen multi-turn interactive reasoning. PrMed was consistently more robust than the other LLMs evaluated, with an accuracy drop of only 2.71 percentage points from formal to heavily perturbed inputs, and it better preserved reasoning stability, safety, completeness, and actionable advice in long-form dialogues.

## Model Training

We developed a two-stage training framework that enables LLMs to perform perturbation-resilient, complex medical reasoning through structured multi-step inference.

### Stage 1: Perturbation-Resilient Reasoning CoT

Training data construction. We curate high-quality training samples by searching for correct reasoning trajectories under a strict, rubric-based verification system. The rubric has three layers: a CoT layer with five axes, a response layer with five axes, and a cross layer with three axes that quantify the coherence and alignment between the CoT and the final response. The reasoning procedure follows five ordered steps:

1. Emotion perception: recognizing implicit emotional signals in perturbations to guide response tone and style
2. Perturbation identification: determining whether perturbations are present, labeling the corresponding spans, and interpreting the intended meaning
3. Utterance correction: reconstructing the patient message into a more clinically interpretable form
4. Chief complaint extraction: filtering out distracting details to focus on the core clinical request
5. Medical reasoning: conducting thorough, rigorous medical reasoning grounded in the extracted chief complaint
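
The five steps can be pictured as fixed, ordered sections of a single CoT target. The sketch below is hypothetical: the `CoTTrace` class, its field names, the `[Step n: ...]` markers, and the example text are illustrative assumptions, not PrMed's actual training format.

```python
from dataclasses import dataclass, fields


@dataclass
class CoTTrace:
    """Hypothetical container for one five-step reasoning trace (field names are illustrative)."""

    emotion_perception: str
    perturbation_identification: str
    utterance_correction: str
    chief_complaint_extraction: str
    medical_reasoning: str

    def render(self) -> str:
        """Join the steps in their fixed declaration order, one labeled section per step."""
        sections = []
        for i, f in enumerate(fields(self), start=1):
            title = f.name.replace("_", " ").capitalize()
            sections.append(f"[Step {i}: {title}]\n{getattr(self, f.name)}")
        return "\n\n".join(sections)


trace = CoTTrace(
    emotion_perception="The patient sounds worried; keep the tone calm and reassuring.",
    perturbation_identification="'A lot lately' is vague: frequency and duration are unspecified.",
    utterance_correction="Recurrent headaches in recent weeks, sometimes with nausea; asking for the cause.",
    chief_complaint_extraction="Possible causes of recurrent headache with nausea.",
    medical_reasoning="Weigh common primary headaches against warning signs that need prompt evaluation.",
)
print(trace.render())
```

Keeping the steps as separate fields makes it easy for a verifier to score each one individually before they are flattened into text.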

After generation, an independent judge agent scores the output on every rubric axis using a 5-point Likert scale. A sample enters the final training corpus only if every axis receives a score above 4. This generate–evaluate–refine loop is repeated for up to three iterations.
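
The acceptance rule and the retry loop can be sketched as follows. This is a minimal illustration under stated assumptions. The axis counts (5 + 5 + 3) come from the rubric described above. Everything else is hypothetical: the axis names, the `generate` and `judge` callables, the way failing scores are fed back, and the choice to discard samples that never pass.

```python
from typing import Callable, Dict, Optional

# Axis counts follow the rubric above; the axis names themselves are placeholders.
RUBRIC_LAYERS = {"cot": 5, "response": 5, "cross": 3}
AXES = [f"{layer}_{i}" for layer, n in RUBRIC_LAYERS.items() for i in range(1, n + 1)]
MAX_ITERATIONS = 3
THRESHOLD = 4  # an axis passes only with a score strictly greater than 4

Scores = Dict[str, int]


def passes_rubric(scores: Scores) -> bool:
    """True only if every axis is present and scored above THRESHOLD."""
    return all(scores.get(axis, 0) > THRESHOLD for axis in AXES)


def curate(
    prompt: str,
    generate: Callable[[str, Optional[Scores]], str],
    judge: Callable[[str, str], Scores],
) -> Optional[str]:
    """Generate-evaluate-refine loop: return the first sample that passes, else None.

    `generate(prompt, feedback)` and `judge(prompt, sample)` are hypothetical
    stand-ins for the generator model and the independent judge agent.
    """
    feedback: Optional[Scores] = None
    for _ in range(MAX_ITERATIONS):
        sample = generate(prompt, feedback)
        scores = judge(prompt, sample)
        if passes_rubric(scores):
            return sample
        feedback = scores  # hand the failing scores back for the next refinement
    return None  # assumption: samples that never pass are dropped


# Toy stand-ins: the judge only awards full marks from the second draft onward.
attempts = []


def toy_generate(prompt: str, feedback: Optional[Scores]) -> str:
    attempts.append(feedback)
    return f"draft {len(attempts)}"


def toy_judge(prompt: str, sample: str) -> Scores:
    score = 4 if sample == "draft 1" else 5
    return {axis: score for axis in AXES}


print(curate("headache with nausea", toy_generate, toy_judge))  # draft 2
```

Because the threshold is strict and scores are integers, a single 4 on any of the 13 axes is enough to send the sample back for refinement.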

Fine-tuning procedure. We select Qwen3-32B as the base model and perform parameter-efficient fine-tuning using LoRA.

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-32B |
| PEFT method | LoRA |
| LoRA target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Max context length | 8192 tokens |
| Precision | bfloat16 (mixed precision) |
| Batch size | 1 per GPU, gradient accumulation 4 (effective batch size = 4) |
| Optimizer | AdamW, lr = 5×10⁻⁵, cosine schedule, 3% warmup |
| Training | Up to 5 epochs with early stopping on validation loss |
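
The adapter settings in the table correspond to the usual `peft.LoraConfig` keyword arguments (`r`, `lora_alpha`, `lora_dropout`, `target_modules`). The sketch below estimates how many trainable parameters such an adapter adds. It assumes Qwen3-32B shapes of hidden size 5120, MLP width 25600, 64 layers, 64 query heads, and 8 key/value heads of dimension 128. These shapes should be checked against the base model's `config.json`. Each adapted `Linear(in, out)` gains an `A` matrix (r × in) and a `B` matrix (out × r).

```python
from typing import Sequence

# Adapter settings from the table, as peft.LoraConfig keyword arguments
# (task_type="CAUSAL_LM" is an assumption; the table does not state it).
LORA_KWARGS = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Assumed Qwen3-32B shapes; verify against the base model's config.json.
HIDDEN = 5120          # hidden_size
INTERMEDIATE = 25600   # intermediate_size (MLP width)
LAYERS = 64            # num_hidden_layers
Q_DIM = 64 * 128       # 64 query heads x head_dim 128
KV_DIM = 8 * 128       # 8 key/value heads x head_dim 128 (grouped-query attention)

# (in_features, out_features) of each targeted projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(r: int, modules: Sequence[str]) -> int:
    """Each adapted Linear(in, out) gains A (r x in) and B (out x r): r * (in + out) weights."""
    per_layer = sum(r * (SHAPES[m][0] + SHAPES[m][1]) for m in modules)
    return per_layer * LAYERS


n_trainable = lora_param_count(LORA_KWARGS["r"], LORA_KWARGS["target_modules"])
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]  # LoRA output is scaled by alpha / r
print(f"trainable LoRA parameters: {n_trainable:,}")  # 134,217,728 under these assumptions
print(f"scaling alpha / r: {scaling}")               # 2.0
```

With `peft` installed, `LoraConfig(**LORA_KWARGS)` should build the matching adapter configuration. The count is weights only and ignores optimizer state.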

### Stage 2: Reinforcement Learning with GRPO

We further refine the Stage 1 model using Group Relative Policy Optimization (GRPO). For each prompt, GRPO samples a group of candidate responses from the current policy, scores each one with a reward function, and updates the policy according to each response's advantage relative to the rest of its group. Training proceeds in two complementary phases:

- Single-turn phase: the model generates candidate responses to individual patient queries and is optimized using their rubric scores as rewards.
- Multi-turn phase: a DeepSeek-V3-based patient simulator generates follow-up utterances, and the model's next-turn response is evaluated under the same rubric, forming an adaptive simulate–evaluate–optimize closed loop.
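
The "relative advantage within each group" can be made concrete with the normalization used in the original GRPO formulation (DeepSeekMath). Each candidate's advantage is its reward minus the group mean, divided by the group standard deviation. The sketch below covers only that advantage step; it omits the clipped policy-gradient update and the KL penalty. The rewards are toy numbers standing in for rubric scores, and PrMed's actual reward aggregation is not specified here.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r_i - mean(group)) / (std(group) + eps).

    Uses the population standard deviation; some implementations use the
    unbiased (n - 1) estimator instead, which rescales the advantages slightly.
    """
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Toy rewards (e.g. aggregated rubric scores) for four candidate replies to one prompt.
advantages = group_relative_advantages([4.0, 3.0, 5.0, 4.0])
print([round(a, 3) for a in advantages])  # [0.0, -1.414, 1.414, 0.0]

# Identical rewards give zero advantage everywhere: the group carries no learning signal.
print(group_relative_advantages([5.0, 5.0, 5.0]))  # [0.0, 0.0, 0.0]
```

In the multi-turn phase the same computation would presumably apply to a group of candidate next-turn replies sampled after a simulated follow-up. How those groups are formed is an assumption, not something this card specifies.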

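## Quick Start

### Install Dependencies

```bash
pip install torch "transformers>=4.51.0" peft accelerate  # Qwen3 needs transformers >= 4.51.0
pip install modelscope  # optional, only for the ModelScope download route
```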

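### Download Base Model

Via ModelScope (recommended for users in China):

```python
from modelscope import snapshot_download
# Returns the local download directory; use it as base_model_path when loading.
model_dir = snapshot_download("Qwen/Qwen3-32B", cache_dir="./")
```

Or via Hugging Face:

```python
from huggingface_hub import snapshot_download
snapshot_download("Qwen/Qwen3-32B", local_dir="./Qwen3-32B")
snapshot_download("Xinti/PrMed", local_dir="./PrMed")  # PrMed LoRA adapter (repo id from the citation URL)
```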

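### Load Model with PrMed

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_path = "./Qwen3-32B"  # local Qwen3-32B weights (or the ModelScope model_dir)
prmed_path = "./PrMed"           # local directory containing the PrMed LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)

# Load the base model in bfloat16 and let accelerate place it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach the PrMed LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(model, prmed_path)
```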

### Inference

```python
# Chinese (primary); pick one of the two conversations below.
messages_zh = [
    {"role": "system", "content": "你是一个抗语言扰动的医疗专家，通过多步骤思考过程，给出高质量的医学回复。"},
    {"role": "user", "content": "医生你好，我最近总是头疼，有时候还会恶心，这是怎么回事？"},
]

# English
messages_en = [
    {"role": "system", "content": "You are a perturbation-resilient medical expert. Reason step by step and provide a high-quality medical response."},
    {"role": "user", "content": "Hi doctor, I've been having headaches a lot lately, sometimes with nausea. What could be going on?"},
]

messages = messages_zh  # or messages_en

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
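
Qwen3-based models usually wrap their reasoning in `<think>...</think>` before the final answer. The helper below assumes those markers remain in the decoded text. Whether they survive `skip_special_tokens=True` depends on the tokenizer configuration, so check your own output first. Under that assumption, the helper separates the multi-step reasoning from the reply shown to the patient. `split_reasoning` is a hypothetical helper, not part of transformers or PEFT.

```python
from typing import Tuple


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a decoded response into (reasoning, answer) at the last </think> tag.

    If no closing tag is present, the whole text is treated as the answer.
    """
    head, sep, tail = response.rpartition("</think>")
    if not sep:
        return "", response.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


demo = "<think>\nStep 1: the patient sounds worried...\n</think>\n\nHeadache with nausea has many possible causes."
reasoning, answer = split_reasoning(demo)
print(answer)  # Headache with nausea has many possible causes.
```

Applied to the generated text, `reasoning, answer = split_reasoning(response)` keeps the intermediate steps available for inspection while only `answer` is displayed.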

## Limitations

- This model is a research prototype and should NOT be used for actual clinical decision-making.
- Performance is optimized for Chinese medical text with linguistic perturbations.
- Requires Qwen3-32B as the base model (~60 GB in bfloat16).
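
The ~60 GB figure is roughly the parameter count times the bytes per parameter. The back-of-the-envelope sketch below assumes the 32.8B total parameters reported on the Qwen3-32B model card and counts weights only. The KV cache for long generations (such as `max_new_tokens=8192`), activations, and the LoRA adapter all add to this.

```python
# Parameter count reported on the Qwen3-32B model card (assumption: verify for your checkpoint).
QWEN3_32B_PARAMS = 32.8e9
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB; excludes activations, KV cache, and framework overhead."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(QWEN3_32B_PARAMS, dtype):.1f} GiB")
# bfloat16 comes out at about 61 GiB (about 65.6 GB), consistent with the ~60 GB quoted above.
```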

## Authors

Xinti Sun, Yuexuan Long, Qiyang Hong, Yinbo Xiao, Erping Long

Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital

Contact: sunxinti@tmu.edu.cn

## Citation

```bibtex
@misc{prmed2026,
  title={PrMed: Perturbation-Resilient Medical Foundation Model},
  author={Xinti Sun and Yuexuan Long and Qiyang Hong and Yinbo Xiao and Erping Long},
  year={2026},
  url={https://huggingface.co/Xinti/PrMed}
}
```