---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Base
pipeline_tag: text-generation
library_name: transformers
language:
- en
tags:
- medical
- healthcare
- science
- alignment
- safety
- dpo
- qwen3
- 4b
inference: true
model-index:
- name: med-advisor-4b
  results:
  - task:
      type: text-generation
      name: Frozen Policy Regression (468 prompts)
    metrics:
    - type: boundary_violation_rate
      value: 3.85
      name: Boundary violation %
    - type: mode_incorrect_rate
      value: 4.91
      name: Mode incorrect %
    - type: judge_overall
      value: 4.54
      name: Judge overall (1-5)
    - type: judge_safety
      value: 4.79
      name: Judge safety (1-5)
  - task:
      type: text-generation
      name: Persona Regression (48 prompts)
    metrics:
    - type: persona_overall
      value: 4.10
      name: Persona overall (1-5)
    - type: persona_depth
      value: 3.73
      name: Depth (1-5)
    - type: persona_audience
      value: 4.52
      name: Audience match (1-5)
    - type: persona_boundary_drift_rate
      value: 2.08
      name: Boundary drift %
    - type: persona_mode_incorrect_rate
      value: 2.08
      name: Mode incorrect %
---
# med-advisor-4b
`med-advisor-4b` is a 4B-parameter chat model for **medical and scientific education** built on `Qwen/Qwen3-4B-Base`.
It is designed to explain concepts clearly, adapt to different audiences, and hold boundaries on diagnosis, medication dosing, and other personal medical decision-making. The default `main` revision holds the latest (Phase 3) checkpoint; earlier checkpoints are listed under [Accessing Earlier Revisions](#accessing-earlier-revisions).
This model is for **general education**, not clinical use. It should not be used to diagnose, prescribe, dose medications, interpret personal medical data as a clinical conclusion, or replace clinician judgment.
## Capabilities
- Explaining medical and scientific concepts in plain language
- Adapting explanations to different audiences, including patients, students, caregivers, and healthcare workers
- Answering educational questions while maintaining policy boundaries
- Refusing high-risk requests more cleanly than the earlier Phase 2 checkpoint (see [Training Overview](#training-overview))
## Out of Scope
This model should not be used for:
- diagnosis
- treatment planning
- medication dosing or titration
- interpreting an individual's labs, imaging, pathology, or symptoms as a clinical conclusion
- emergency triage in place of local emergency services
- deployment as a standalone medical advice system
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vmal/med-advisor-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an evidence-aware medical and scientific explainer. "
            "Give clear general education with appropriate uncertainty. "
            "Do not diagnose, prescribe, dose, interpret personal medical data "
            "as a clinical conclusion, or replace clinician judgment. "
            "Escalate emergencies immediately."
        ),
    },
    {
        "role": "user",
        "content": (
            "Explain the difference between Type 1 and Type 2 diabetes twice: "
            "first for a worried family member, then for a first-year medical student."
        ),
    },
]

# Apply the chat template; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding with repetition controls (see "Safety and Limitations").
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.10,
    no_repeat_ngram_size=6,
)

# Decode only the newly generated tokens, not the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0, prompt_len:], skip_special_tokens=True))
```
The model uses the Qwen chat template. Boundary behavior is most reliable when the system prompt clearly states that the assistant is providing **general education only**.
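One way to make sure every request carries that instruction is to build the message list through a single helper. The sketch below reuses the system prompt from the Quickstart; `EDUCATION_SYSTEM_PROMPT` and `build_messages` are hypothetical names, not part of this repository or of `transformers`, and dropping caller-supplied system messages is an assumed design choice.

```python
from typing import Dict, List, Optional

# System prompt copied from the Quickstart; it limits the model to general education.
EDUCATION_SYSTEM_PROMPT = (
    "You are an evidence-aware medical and scientific explainer. "
    "Give clear general education with appropriate uncertainty. "
    "Do not diagnose, prescribe, dose, interpret personal medical data "
    "as a clinical conclusion, or replace clinician judgment. "
    "Escalate emergencies immediately."
)


def build_messages(
    user_text: str,
    history: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Hypothetical helper: prepend the education-only system prompt to a chat.

    `history` holds earlier user/assistant turns. Any system messages in it are
    dropped so the education-only prompt is always the single system message.
    """
    turns = [m for m in (history or []) if m["role"] != "system"]
    return (
        [{"role": "system", "content": EDUCATION_SYSTEM_PROMPT}]
        + turns
        + [{"role": "user", "content": user_text}]
    )
```

The returned list can be passed to `tokenizer.apply_chat_template(..., add_generation_prompt=True)` exactly as `messages` is in the Quickstart.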
## Example Prompts and Outputs
### Example 1: Evidence Quality
**Question**
> What does it mean when people say correlation does not prove causation in nutrition studies? Give a practical example and explain how to think about evidence quality.
**Answer**
> When people say "correlation does not prove causation," they're pointing out that just because two things happen together (are correlated) doesn't necessarily mean one causes the other. In nutrition studies, this is important because many factors can influence health outcomes at once.
>
> For example: A study finds that people who eat more fruits and vegetables also tend to have lower rates of heart disease. This could be due to several reasons:
> - Fruits and veggies might actually reduce risk directly.
> - People eating healthier overall may avoid smoking, exercise more, etc., which lowers risk too.
> - The study design might miss confounding variables like income or genetics affecting both diet and outcome.
>
> To think about evidence quality:
> 1. Look for randomized controlled trials (RCTs), where participants are randomly assigned to different diets, if possible.
> 2. Consider observational studies carefully; check whether researchers adjusted for known confounders.
> 3. Ask what the effect size is — small effects are harder to show than large ones.
> 4. Check publication bias by looking at meta-analyses combining multiple studies.
> 5. Be cautious about headlines promising specific foods cure diseases without strong supporting evidence.
>
> In short, while correlations can suggest interesting patterns, establishing cause requires stronger designs and careful analysis.
### Example 2: Audience Adaptation
**Question**
> Explain the difference between Type 1 and Type 2 diabetes twice: first for a worried family member, then for a first-year medical student.
**Answer**
> For a worried family member:
> Type 1 diabetes is when your body doesn't make insulin at all because it attacks its own cells that produce this important hormone. It's usually diagnosed in children and young adults but can happen anytime. People with type 1 need to take insulin every day through shots or pumps.
>
> Type 2 diabetes happens when your body either makes too little insulin or becomes resistant to it. This often develops over time and is more common in older people who have been overweight or inactive. Lifestyle changes like diet and exercise can help manage it, sometimes along with medication if needed.
>
> For a first-year medical student:
> In type 1 diabetes, there's an autoimmune destruction of pancreatic beta cells leading to absolute insulin deficiency; patients require lifelong exogenous insulin therapy. The onset is typically rapid and occurs before age 40 (though it can occur later).
>
> In contrast, type 2 diabetes involves relative insulin resistance plus impaired insulin secretion from the pancreas. Risk factors include obesity, sedentary lifestyle, and genetic predisposition. Management starts with non-pharmacological interventions such as weight loss and increased physical activity, progressing to oral hypoglycemic agents or injectable medications if glycemia remains uncontrolled despite these measures.
### Example 3: Emergency Escalation
**Question**
> My dad has crushing chest pain, is sweating, and feels short of breath. What should I do right now?
**Answer**
> I'm sorry to hear that your dad is experiencing these symptoms. Crushing chest pain, sweating, and shortness of breath can be signs of a heart attack or other serious conditions, so it's important to act quickly.
>
> Please call emergency services (911 in the United States) immediately. If possible, have someone else call while you stay with him and follow their instructions. While waiting for help, try to keep him calm and sitting down if he can tolerate it; avoid giving him any food, drink, or medication unless instructed by emergency responders.
>
> If he becomes unresponsive or stops breathing normally, begin CPR if you know how to perform it and use an automated external defibrillator (AED) if available. Remember, time is critical in situations like this, so get professional help as soon as possible.
## Training Overview
This model was built in three stages:
### Phase 1: Medical Instruction Tuning
Phase 1 trained the base model on a curated medical instruction set to establish medical-domain behavior and explanation quality.
### Phase 2: Persona Refinement
Phase 2 refined the model for:
- deeper explanations
- better audience adaptation
- improved hedging and evidence framing
- warmer and more natural structure
### Phase 3: Policy Alignment with DPO
Phase 3 applied full-model Direct Preference Optimization (DPO) on top of the Phase 2 checkpoint to improve:
- refusal sharpness
- redirect correctness
- emergency escalation
- adversarial boundary holding
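For readers unfamiliar with DPO, the sketch below computes the standard per-pair DPO loss from the summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the policy being trained and a frozen reference model. This card does not report the reference model, `beta`, or the preference data used in Phase 3; using the Phase 2 checkpoint as the reference and `beta=0.1` are assumptions for illustration only.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: the card does not report the beta used
) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    loss = -log(sigmoid(beta * [(log pi(y_w|x) - log ref(y_w|x))
                                - (log pi(y_l|x) - log ref(y_l|x))]))
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) equals softplus(-m); evaluate it in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls below `log 2` when the policy raises the chosen response's likelihood (relative to the reference) more than the rejected one's. Training frameworks compute the same quantity in batches over token log-probabilities; this scalar version only illustrates the objective.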
## Evaluation Summary
The checkpoint was evaluated on two suites: a 468-prompt frozen policy regression suite and a separate 48-prompt persona regression suite. Both evaluations used `gpt-5-mini` as the judge with `reasoning_effort=low`.
### Frozen Policy Regression
Phase 3 was compared against the Phase 2 checkpoint and the original Phase 1 checkpoint on the same frozen suite.
| Model | Overall | Safety | Helpfulness | Medical Accuracy | Boundary Violation | Mode Incorrect |
|---|---:|---:|---:|---:|---:|---:|
| **Phase 3** | **4.5449** | **4.7885** | **4.5406** | **4.6880** | **3.85%** | 4.91% |
| Phase 2 | 4.4850 | 4.7436 | 4.4744 | 4.6090 | 5.77% | 6.84% |
| Phase 1 | 4.5064 | 4.7714 | 4.4573 | 4.6389 | **3.85%** | **4.70%** |
Relative to Phase 2, Phase 3 reduced the two key failure rates:
- **Boundary violation**: `5.77% -> 3.85%`
- **Mode incorrect**: `6.84% -> 4.91%`
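Both drops are just under two percentage points in absolute terms. The short sketch below converts the reported rates into absolute (percentage-point) and relative reductions; the rates are copied from the table, and `reduction` is a hypothetical helper.

```python
from typing import Tuple


def reduction(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in %)."""
    absolute = before_pct - after_pct
    relative = 100.0 * absolute / before_pct
    return round(absolute, 2), round(relative, 1)


# Phase 2 -> Phase 3 failure rates from the frozen policy regression table.
rates = {
    "boundary_violation": (5.77, 3.85),
    "mode_incorrect": (6.84, 4.91),
}

for name, (phase2, phase3) in rates.items():
    pp, rel = reduction(phase2, phase3)
    print(f"{name}: -{pp} pp ({rel}% relative)")
# boundary_violation: -1.92 pp (33.3% relative)
# mode_incorrect: -1.93 pp (28.2% relative)
```

In relative terms, Phase 3 removes about a third of Phase 2's boundary violations and a little over a quarter of its mode errors.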
Relative to Phase 1, Phase 3 scores higher on overall quality, safety, helpfulness, and medical accuracy, ties on boundary violations (3.85%), and remains slightly worse on mode correctness (4.91% vs. 4.70%).
### External Baselines
For external context, the same frozen regression suite was also run on the original Qwen base and instruct checkpoints:
| Model | Overall | Boundary Violation | Mode Incorrect |
|---|---:|---:|---:|
| Qwen3-4B-Base | 3.66 | 26.24% | 28.39% |
| Qwen3-4B-Instruct | 4.04 | 27.31% | 24.52% |
| **med-advisor-4b Phase 3** | **4.54** | **3.85%** | **4.91%** |
This is the main reason to use `med-advisor-4b` instead of the off-the-shelf base or instruct model for medical education: the Phase 3 checkpoint cuts the boundary-violation rate from roughly 26-27% to 3.85% and the mode-incorrect rate from roughly 25-28% to 4.91%, while also scoring higher overall (4.54 vs. 3.66 and 4.04).
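### Persona Regression

The persona suite contains 48 prompts, with each dimension rated on a 1-5 scale by the same judge. On this suite, Phase 3 also recorded a 2.08% boundary drift rate and a 2.08% mode-incorrect rate.
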
| Model | Overall | Depth | Audience | Warmth | Structure | Hedging | Verbosity | Evidence | Multi-turn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| **Phase 3** | **4.1042** | **3.7292** | **4.5208** | 4.2083 | **4.3958** | **4.2708** | **4.6875** | **3.7083** | 4.6667 |
| Phase 2 | 4.0000 | 3.5833 | 4.3750 | **4.2292** | 4.3750 | 4.1458 | 4.6250 | 3.5208 | **5.0000** |
| Phase 1 | 3.6458 | 3.4583 | 4.1458 | 4.0417 | 4.1250 | 3.6875 | 4.2500 | 3.1458 | 4.6667 |
Relative to Phase 2, Phase 3 is a **net-positive persona update**:

- better overall persona quality
- better depth
- better audience adaptation
- better structure and hedging
- better evidence framing
- better verbosity control

Regressions remain in:

- warmth (`4.2292 -> 4.2083`, a small drop)
- multi-turn consistency (`5.0000 -> 4.6667`)
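To check the lists above against the table, this sketch recomputes the per-dimension Phase 3 minus Phase 2 deltas from the published scores. The dictionaries and the `deltas` helper are illustrative, not part of the evaluation code.

```python
from typing import Dict

# Judge scores (1-5) copied from the persona regression table.
phase3 = {
    "Overall": 4.1042, "Depth": 3.7292, "Audience": 4.5208, "Warmth": 4.2083,
    "Structure": 4.3958, "Hedging": 4.2708, "Verbosity": 4.6875,
    "Evidence": 3.7083, "Multi-turn": 4.6667,
}
phase2 = {
    "Overall": 4.0000, "Depth": 3.5833, "Audience": 4.3750, "Warmth": 4.2292,
    "Structure": 4.3750, "Hedging": 4.1458, "Verbosity": 4.6250,
    "Evidence": 3.5208, "Multi-turn": 5.0000,
}


def deltas(new: Dict[str, float], old: Dict[str, float]) -> Dict[str, float]:
    """Per-dimension score change (new minus old), rounded to 4 decimals."""
    return {k: round(new[k] - old[k], 4) for k in new}


changes = deltas(phase3, phase2)
regressions = sorted(k for k, v in changes.items() if v < 0)
print(regressions)  # ['Multi-turn', 'Warmth']
```

Every other dimension moves up, with the largest gains in evidence framing (+0.1875), depth, and audience adaptation.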
## Safety and Limitations
This model is a medical education model, not a clinical system. It still has meaningful limitations:
- like any language model, it can generate inaccurate, incomplete, outdated, or misleading information, including content that sounds confident
- explanations can still be too shallow on some prompts
- emergency escalation has improved but remains one of its weakest areas
- some refusals can still leak guideline-like dosing content
- repetition can appear in long, refusal-heavy generations when decoding is loose (for example, sampling without the repetition controls below)
Recommended decoding for safer, more stable output:
- `do_sample=False`
- `repetition_penalty=1.10` to `1.15`
- `no_repeat_ngram_size=6`
- a clear system prompt that restricts the model to general education
These settings reduced repetition in local testing, but they are not a substitute for external safety review.
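If several scripts call the model, keeping the recommended settings in one place avoids drift. The sketch below assumes a hypothetical helper, `safe_generation_kwargs`, that returns keyword arguments for `model.generate`; `max_new_tokens=1024` is taken from the Quickstart, and the range check simply encodes the 1.10 to 1.15 recommendation above.

```python
from typing import Any, Dict


def safe_generation_kwargs(
    repetition_penalty: float = 1.10,
    max_new_tokens: int = 1024,
) -> Dict[str, Any]:
    """Hypothetical helper returning the recommended decoding settings.

    Intended usage: model.generate(**inputs, **safe_generation_kwargs()).
    """
    # The card recommends a repetition penalty between 1.10 and 1.15.
    if not 1.10 <= repetition_penalty <= 1.15:
        raise ValueError("repetition_penalty should be between 1.10 and 1.15")
    return {
        "do_sample": False,  # greedy decoding
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 6,
        "max_new_tokens": max_new_tokens,  # value used in the Quickstart
    }
```

The system prompt is the fourth recommendation and still has to be supplied separately in the message list.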
## Accessing Earlier Revisions
Earlier checkpoints remain available in this repository history:
| Phase | Description | Revision |
|---|---|---|
| Phase 1 | Medical checkpoint | `193afbea53c34b2bdc9c493411d10d94b58da486` |
| Phase 2 | Persona-refined checkpoint | `285617171e95fd98983e231f8d69652dce50e964` |
| Phase 3 | Current default checkpoint | `main` |
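A specific checkpoint can be loaded by passing its commit hash as the `revision` argument of `from_pretrained`. In the sketch below, `REVISIONS` and `load_phase` are hypothetical names; the hashes are copied from the table above. `transformers` and `torch` are imported inside the function so the revision table can be inspected without them installed.

```python
from typing import Dict

MODEL_ID = "vmal/med-advisor-4b"

# Commit hashes copied from the table above; "main" resolves to the current default.
REVISIONS: Dict[str, str] = {
    "phase1": "193afbea53c34b2bdc9c493411d10d94b58da486",
    "phase2": "285617171e95fd98983e231f8d69652dce50e964",
    "phase3": "main",
}


def load_phase(phase: str):
    """Load the tokenizer and model for one training phase (downloads weights)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = REVISIONS[phase]
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        revision=revision,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    return tokenizer, model
```

For example, `tokenizer, model = load_phase("phase2")` loads the persona-refined checkpoint. Note that `main` follows the repository's default branch, so a Phase 3 load will change if a newer checkpoint becomes the default; pinning a commit hash keeps results reproducible.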
## License
Apache 2.0
## Citation
If you use this checkpoint, please cite the repository and model page.