---
base_model: Qwen/Qwen3-32B
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
language:
- zh
- en
tags:
- lora
- transformers
- medical
- perturbation-robust
- qwen3
- chain-of-thought
- grpo
- reinforcement-learning
---

# PrMed: Perturbation-Resilient Medical Foundation Model

Large language models (LLMs) have achieved strong performance on medical benchmarks, yet their reliability in real-world clinical settings remains insufficient. We identify a key source of this gap: real patient expressions often contain linguistic perturbations such as colloquial, vague, dialectal, and emotionally charged language, whereas most existing LLMs are trained on relatively clean and standardized corpora.

We curated **569,913 real-world Chinese patient utterances** from six clinical specialties and found that **95.1%** contained at least one perturbation type, while **83.6%** contained two or more, indicating that linguistic perturbations are pervasive in real medical communication. Perturbation-gradient experiments showed that, although several leading LLMs approached or even exceeded open-book physician performance on clean inputs, their performance **declined sharply** under mild-to-severe perturbations, whereas physicians remained substantially more stable.

Error-pattern analysis revealed that linguistic perturbations not only impaired key-information extraction but, more importantly, **disrupted reasoning accuracy and induced reasoning drift**. This suggests that the main limitation of current medical LLMs lies not in insufficient medical knowledge, but in fragile understanding and reasoning under non-standard patient language.

To address this gap, we developed **PrMed**, a perturbation-resilient medical foundation model trained in two stages on **1.2 million multi-source medical samples**. Stage 1 uses perturbation-resilient chain-of-thought data for LoRA fine-tuning; stage 2 uses GRPO-based reinforcement learning with a patient simulator to enhance multi-turn interactive reasoning. PrMed consistently showed stronger robustness than the other LLMs evaluated, with an accuracy drop of only **2.71 percentage points** from formal to heavy perturbation, while better preserving reasoning stability, safety, completeness, and actionable advice in long-form dialogues.

## Model Training

We developed a two-stage training framework to enable LLMs to perform perturbation-resilient complex medical reasoning through structured multi-step inference.

### Stage 1: Perturbation-Resilient Reasoning CoT

**Training data construction.** We curate high-quality training samples by searching for correct reasoning trajectories under a strict rubric-based verification system. The rubric comprises three layers: a CoT layer with five axes, a response layer with five axes, and a cross layer with three axes to quantify the coherence and alignment between the CoT and the final response. The reasoning procedure follows five ordered steps:

1. **Emotion perception**: recognizing implicit emotional signals in perturbations to guide response tone and style
2. **Perturbation identification**: determining whether perturbations are present, labeling them at corresponding spans, and interpreting intended meaning
3. **Utterance correction**: reconstructing the patient message into a more clinically interpretable form
4. **Chief complaint extraction**: filtering distractions to focus on the core clinical request
5. **Medical reasoning**: conducting thorough and rigorous medical reasoning grounded in the extracted chief complaint

After generation, an independent judge agent scores the output using the predefined rubric on a 5-point Likert scale. A sample is included in the final training corpus only if **all axes receive scores > 4**. This generate–evaluate–refine loop is repeated for up to three iterations.
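
The acceptance rule and the bounded generate–evaluate–refine loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the axis identifiers, the `generate` and `judge` callables, and the way failed scores are fed back into the next attempt are all assumptions made for the example. Only the layer sizes (5, 5, 3), the "all axes > 4" rule, and the three-iteration cap come from the text.

```python
from typing import Callable, Dict, List, Optional

# Axis counts per rubric layer, as stated above (5 + 5 + 3 = 13 axes).
RUBRIC_LAYERS = {"cot": 5, "response": 5, "cross": 3}
MAX_ITERATIONS = 3  # the loop is repeated for up to three iterations
MIN_SCORE = 4       # every axis must score strictly above this value


def axis_names() -> List[str]:
    """Hypothetical axis identifiers such as 'cot_1' ... 'cross_3'."""
    return [f"{layer}_{i}" for layer, n in RUBRIC_LAYERS.items() for i in range(1, n + 1)]


def passes_rubric(scores: Dict[str, int]) -> bool:
    """Accept only if every axis is present and scores above MIN_SCORE."""
    return all(scores.get(axis, 0) > MIN_SCORE for axis in axis_names())


def curate(
    utterance: str,
    generate: Callable[[str, Optional[Dict[str, int]]], str],
    judge: Callable[[str, str], Dict[str, int]],
) -> Optional[str]:
    """Generate, evaluate, and refine until the rubric passes or the budget runs out."""
    feedback: Optional[Dict[str, int]] = None
    for _ in range(MAX_ITERATIONS):
        candidate = generate(utterance, feedback)
        scores = judge(utterance, candidate)
        if passes_rubric(scores):
            return candidate
        feedback = scores  # assumption: failed scores guide the next attempt
    return None  # the sample is dropped from the training corpus
```

In this sketch a sample that never clears all 13 axes within three attempts is simply discarded, which matches the strict inclusion rule stated above.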

**Fine-tuning procedure.** We select Qwen3-32B as the base model and perform parameter-efficient fine-tuning using LoRA.

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-32B |
| PEFT method | LoRA |
| LoRA modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Dropout | 0.05 |
| Max context length | 8192 tokens |
| Precision | bfloat16 (mixed precision) |
| Batch size | 1 per GPU, gradient accumulation 4 (effective batch size = 4) |
| Optimizer | AdamW, lr = 5×10⁻⁵, cosine schedule, 3% warmup |
| Training | Up to 5 epochs with early stopping on validation loss |
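
For readers who want to reproduce a similar setup, the table above maps fairly directly onto `peft.LoraConfig` keyword arguments and a warmup-plus-cosine learning-rate curve. The sketch below is illustrative only: `task_type="CAUSAL_LM"`, a linear warmup, and decay to zero are assumptions, since the table does not specify them, and the actual training script is not part of this card.

```python
import math

# Values copied from the hyperparameter table; task_type is an assumption.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

try:
    from peft import LoraConfig
    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    lora_config = None  # peft not installed; the kwargs above still document the setup

PEAK_LR = 5e-5
WARMUP_RATIO = 0.03


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, round(WARMUP_RATIO * total_steps))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With rank 16 and alpha 32, the adapter update is scaled by a factor of 2 relative to the raw low-rank product.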

### Stage 2: Reinforcement Learning with GRPO

We further refine the Stage 1 model using Group Relative Policy Optimization (GRPO). For each prompt, GRPO generates multiple candidate responses from the current policy, scores them using a reward function, and updates the policy based on the relative advantage within each group. Training proceeds in two complementary phases:

- **Single-turn phase**: The model generates candidate responses to individual patient queries and is optimized based on rubric scores.
- **Multi-turn phase**: A DeepSeek-V3-based patient simulator generates follow-up utterances, and the model's next-turn response is evaluated under the same rubric, yielding an adaptive closed loop of simulate–evaluate–optimize.
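
The "relative advantage within each group" can be made concrete with the commonly used GRPO normalization, in which each candidate's reward is centered on the group mean and divided by the group standard deviation. The sketch below assumes that standard formulation; this card does not state the exact normalization or reward scale PrMed used, and the reward values are made up for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when all candidates receive the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric scores for four candidate responses to one prompt.
rewards = [4.2, 3.1, 4.8, 3.9]
advantages = group_relative_advantages(rewards)
```

Candidates that score above their group's mean receive a positive advantage and are reinforced; those below the mean are discouraged. No separate value network is needed because the group itself serves as the baseline.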

## Quick Start

### Install Dependencies

```bash
pip install torch transformers peft accelerate
```

### Download Base Model

Via ModelScope (recommended for users in China; requires `pip install modelscope`):

```python
from modelscope import snapshot_download

# snapshot_download returns the local directory of the downloaded model;
# use that path as base_model_path when loading.
model_dir = snapshot_download("Qwen/Qwen3-32B", cache_dir="./")
print(model_dir)
```

Or via Hugging Face:

```python
from huggingface_hub import snapshot_download

# Download the base model into ./Qwen3-32B.
snapshot_download("Qwen/Qwen3-32B", local_dir="./Qwen3-32B")
```

### Load Model with PrMed

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Path to the base model (or the model_dir returned by ModelScope).
base_model_path = "./Qwen3-32B"
# Local directory containing the PrMed adapter, or the Hub ID "Xinti/PrMed".
PrMed_path = "./PrMed"

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, PrMed_path)
```

### Inference

```python
# Uses `model` and `tokenizer` from the previous section.

# Chinese (primary)
messages_zh = [
    {"role": "system", "content": "你是一个抗语言扰动的医疗专家,通过多步骤思考过程,给出高质量的医学回复。"},
    {"role": "user", "content": "医生你好,我最近总是头疼,有时候还会恶心,这是怎么回事?"}
]

# English
messages_en = [
    {"role": "system", "content": "You are a perturbation-resilient medical expert. Reason step by step and provide a high-quality medical response."},
    {"role": "user", "content": "Hi doctor, I've been having headaches a lot lately, sometimes with nausea. What could be going on?"}
]

# Pick one conversation to run; switch to messages_en for the English prompt.
messages = messages_zh

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
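
Because PrMed is trained to reason step by step before answering, the decoded `response` may contain the reasoning trace followed by the final reply. The helper below is a small sketch for separating the two. It assumes the adapter keeps the base model's `<think>…</think>` convention and that these tags survive decoding; neither is confirmed by this card, so treat `split_reasoning` as a hypothetical utility and check it against real outputs.

```python
from typing import Tuple


def split_reasoning(text: str, close_tag: str = "</think>") -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no closing tag is found."""
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No reasoning block detected: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Example with a hand-written string standing in for a model response.
reasoning, answer = split_reasoning("<think>step 1</think>\n\nFinal reply")
```

Called as `split_reasoning(response)`, this keeps the patient-facing reply separate from the intermediate reasoning, which can be useful for logging or evaluation.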

## Limitations

- This model is a **research prototype** and should **NOT** be used for actual clinical decision-making.
- Performance is optimized for Chinese medical text with linguistic perturbations.
- Requires Qwen3-32B as the base model (~60 GB in bfloat16).

## Authors

**Xinti Sun, Yuexuan Long, Qiyang Hong, Yinbo Xiao, Erping Long**

Chinese Academy of Medical Sciences and Peking Union Medical College, Peking Union Medical College Hospital

Contact: sunxinti@tmu.edu.cn
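
The memory figure in the last item can be sanity-checked with simple arithmetic: bfloat16 stores each parameter in 2 bytes. The snippet below is a rough estimate that assumes a nominal 32 billion parameters (taken from the model name, not an exact count) and covers weights only, not activations, the KV cache, or the LoRA adapter.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights in GiB (bfloat16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


# Nominal parameter count assumed from the "32B" in the model name.
base_gib = weight_memory_gib(32e9)
print(f"{base_gib:.1f} GiB")
```

Actual requirements at inference time will be higher, especially with long generations such as the 8192-token budget used in the Quick Start.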

## Citation

```bibtex
@misc{prmed2026,
  title={PrMed: Perturbation-Resilient Medical Foundation Model},
  author={Xinti Sun and Yuexuan Long and Qiyang Hong and Yinbo Xiao and Erping Long},
  year={2026},
  url={https://huggingface.co/Xinti/PrMed}
}
```