Instructions for using vmal/med-advisor-4b with libraries and local apps.

Libraries

How to use vmal/med-advisor-4b with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="vmal/med-advisor-4b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("vmal/med-advisor-4b")
model = AutoModelForCausalLM.from_pretrained("vmal/med-advisor-4b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use vmal/med-advisor-4b with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "vmal/med-advisor-4b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vmal/med-advisor-4b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
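
The same OpenAI-compatible request can be sent from Python. The block below is a minimal sketch using only the standard library. It assumes a server started with `vllm serve` is listening on `localhost:8000`; the helper names `build_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "vmal/med-advisor-4b"
# Assumption: the vLLM server from the commands above is running locally on port 8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_request(user_text, url=BASE_URL):
    """Build the same POST request that the curl example sends."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_text, url=BASE_URL):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(user_text, url)) as response:
        body = json.load(response)
    # OpenAI-style chat completions put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("What is the capital of France?"))` prints the reply. For the SGLang server, pass `url="http://localhost:30000/v1/chat/completions"` instead.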

Use Docker

```bash
docker model run hf.co/vmal/med-advisor-4b
```

SGLang

How to use vmal/med-advisor-4b with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "vmal/med-advisor-4b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vmal/med-advisor-4b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "vmal/med-advisor-4b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "vmal/med-advisor-4b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use vmal/med-advisor-4b with Docker Model Runner:
```
docker model run hf.co/vmal/med-advisor-4b
```



---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Base
pipeline_tag: text-generation
library_name: transformers
language:
- en
tags:
- medical
- healthcare
- science
- alignment
- safety
- dpo
- qwen3
- 4b
inference: true
model-index:
- name: med-advisor-4b
  results:
  - task:
      type: text-generation
      name: Frozen Policy Regression (468 prompts)
    metrics:
    - type: boundary_violation_rate
      value: 3.85
      name: Boundary violation %
    - type: mode_incorrect_rate
      value: 4.91
      name: Mode incorrect %
    - type: judge_overall
      value: 4.54
      name: Judge overall (1-5)
    - type: judge_safety
      value: 4.79
      name: Judge safety (1-5)
  - task:
      type: text-generation
      name: Persona Regression (48 prompts)
    metrics:
    - type: persona_overall
      value: 4.10
      name: Persona overall (1-5)
    - type: persona_depth
      value: 3.73
      name: Depth (1-5)
    - type: persona_audience
      value: 4.52
      name: Audience match (1-5)
    - type: persona_boundary_drift_rate
      value: 2.08
      name: Boundary drift %
    - type: persona_mode_incorrect_rate
      value: 2.08
      name: Mode incorrect %
---

	# med-advisor-4b

	`med-advisor-4b` is a 4B-parameter chat model for medical and scientific education built on `Qwen/Qwen3-4B-Base`.

	It is designed to explain concepts clearly, adapt to different audiences, and hold boundaries on diagnosis, medication dosing, and other personal medical decision-making. The current default release is the latest version of the model in this repository. Earlier checkpoints are listed later under [Accessing Earlier Revisions](#accessing-earlier-revisions).

	This model is for general education, not clinical use. It should not be used to diagnose, prescribe, dose medications, interpret personal medical data as a clinical conclusion, or replace clinician judgment.

	## Capabilities

	- Explaining medical and scientific concepts in plain language
	- Adapting explanations to different audiences, including patients, students, caregivers, and healthcare workers
	- Answering educational questions while maintaining policy boundaries
	- Refusing high-risk requests more cleanly than the Phase 2 checkpoint

	## Out of Scope

	This model should not be used for:

	- diagnosis
	- treatment planning
	- medication dosing or titration
	- interpreting an individual's labs, imaging, pathology, or symptoms as a clinical conclusion
	- emergency triage in place of local emergency services
	- deployment as a standalone medical advice system

	## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "vmal/med-advisor-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": (
            "You are an evidence-aware medical and scientific explainer. "
            "Give clear general education with appropriate uncertainty. "
            "Do not diagnose, prescribe, dose, interpret personal medical data "
            "as a clinical conclusion, or replace clinician judgment. "
            "Escalate emergencies immediately."
        ),
    },
    {
        "role": "user",
        "content": "Explain the difference between Type 1 and Type 2 diabetes twice: first for a worried family member, then for a first-year medical student.",
    },
]

# Apply the chat template; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
    return_dict=True,
).to(model.device)

# Greedy decoding with the repetition controls listed under Safety and Limitations.
outputs = model.generate(
    **inputs,  # pass input_ids together with attention_mask
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.10,
    no_repeat_ngram_size=6,
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0, inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

	The model uses the Qwen chat template. Boundary behavior is most reliable when the system prompt clearly states that the assistant is providing general education only.
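
One way to keep the general-education system prompt in place on every request is to build the message list through a small helper. The sketch below reuses the system prompt from the Quickstart. The helper name `build_messages` and its `history` argument are illustrative assumptions, not part of the model or of Transformers.

```python
# System prompt copied verbatim from the Quickstart.
GENERAL_EDUCATION_PROMPT = (
    "You are an evidence-aware medical and scientific explainer. "
    "Give clear general education with appropriate uncertainty. "
    "Do not diagnose, prescribe, dose, interpret personal medical data "
    "as a clinical conclusion, or replace clinician judgment. "
    "Escalate emergencies immediately."
)


def build_messages(user_text, history=None):
    """Return a chat message list that always starts with the system prompt."""
    messages = [{"role": "system", "content": GENERAL_EDUCATION_PROMPT}]
    # Drop system turns from earlier history so the boundary prompt
    # stays first and is never overridden by a later system message.
    for turn in history or []:
        if turn["role"] != "system":
            messages.append(turn)
    messages.append({"role": "user", "content": user_text})
    return messages
```

The returned list can be passed directly to `tokenizer.apply_chat_template(...)` in place of the hand-written `messages` list.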

	## Example Prompts and Outputs

	### Example 1: Evidence Quality

	Question

	> What does it mean when people say correlation does not prove causation in nutrition studies? Give a practical example and explain how to think about evidence quality.

	Answer

	> When people say "correlation does not prove causation," they're pointing out that just because two things happen together (are correlated) doesn't necessarily mean one causes the other. In nutrition studies, this is important because many factors can influence health outcomes at once.
	>
	> For example: A study finds that people who eat more fruits and vegetables also tend to have lower rates of heart disease. This could be due to several reasons:
	> - Fruits and veggies might actually reduce risk directly.
	> - People eating healthier overall may avoid smoking, exercise more, etc., which lowers risk too.
	> - The study design might miss confounding variables like income or genetics affecting both diet and outcome.
	>
	> To think about evidence quality:
	> 1. Look for randomized controlled trials (RCTs), where participants are randomly assigned to different diets, if possible.
	> 2. Consider observational studies carefully; check whether researchers adjusted for known confounders.
	> 3. Ask what the effect size is — small effects are harder to show than large ones.
	> 4. Check publication bias by looking at meta-analyses combining multiple studies.
	> 5. Be cautious about headlines promising specific foods cure diseases without strong supporting evidence.
	>
	> In short, while correlations can suggest interesting patterns, establishing cause requires stronger designs and careful analysis.

	### Example 2: Audience Adaptation

	Question

	> Explain the difference between Type 1 and Type 2 diabetes twice: first for a worried family member, then for a first-year medical student.

	Answer

	> For a worried family member:
	> Type 1 diabetes is when your body doesn't make insulin at all because it attacks its own cells that produce this important hormone. It's usually diagnosed in children and young adults but can happen anytime. People with type 1 need to take insulin every day through shots or pumps.
	>
	> Type 2 diabetes happens when your body either makes too little insulin or becomes resistant to it. This often develops over time and is more common in older people who have been overweight or inactive. Lifestyle changes like diet and exercise can help manage it, sometimes along with medication if needed.
	>
	> For a first-year medical student:
	> In type 1 diabetes, there's an autoimmune destruction of pancreatic beta cells leading to absolute insulin deficiency; patients require lifelong exogenous insulin therapy. The onset is typically rapid and occurs before age 40 (though it can occur later).
	>
	> In contrast, type 2 diabetes involves relative insulin resistance plus impaired insulin secretion from the pancreas. Risk factors include obesity, sedentary lifestyle, and genetic predisposition. Management starts with non-pharmacological interventions such as weight loss and increased physical activity, progressing to oral hypoglycemic agents or injectable medications if glycemia remains uncontrolled despite these measures.

	### Example 3: Emergency Escalation

	Question

	> My dad has crushing chest pain, is sweating, and feels short of breath. What should I do right now?

	Answer

	> I'm sorry to hear that your dad is experiencing these symptoms. Crushing chest pain, sweating, and shortness of breath can be signs of a heart attack or other serious conditions, so it's important to act quickly.
	>
	> Please call emergency services (911 in the United States) immediately. If possible, have someone else call while you stay with him and follow their instructions. While waiting for help, try to keep him calm and sitting down if he can tolerate it; avoid giving him any food, drink, or medication unless instructed by emergency responders.
	>
	> If he becomes unresponsive or stops breathing normally, begin CPR if you know how to perform it and use an automated external defibrillator (AED) if available. Remember, time is critical in situations like this, so get professional help as soon as possible.

	## Training Overview

	This model was built in three stages:

	### Phase 1: Medical Instruction Tuning

	Phase 1 trained the base model on a curated medical instruction set to establish medical-domain behavior and explanation quality.

	### Phase 2: Persona Refinement

	Phase 2 refined the model for:

	- deeper explanations
	- better audience adaptation
	- improved hedging and evidence framing
	- warmer and more natural structure

	### Phase 3: Policy Alignment with DPO

	Phase 3 applied full-model DPO on top of the Phase 2 checkpoint to improve:

	- refusal sharpness
	- redirect correctness
	- emergency escalation
	- adversarial boundary holding

	## Evaluation Summary

	The checkpoint was evaluated on two suites: a frozen policy regression suite and a separate persona regression suite. Both evaluations used `gpt-5-mini` as judge with `reasoning_effort=low`.

	### Frozen Policy Regression

	Phase 3 was compared against the Phase 2 checkpoint and the original Phase 1 checkpoint on the same frozen suite.

| Model | Overall | Safety | Helpfulness | Medical Accuracy | Boundary Violation | Mode Incorrect |
|---|---:|---:|---:|---:|---:|---:|
| Phase 3 | 4.5449 | 4.7885 | 4.5406 | 4.6880 | 3.85% | 4.91% |
| Phase 2 | 4.4850 | 4.7436 | 4.4744 | 4.6090 | 5.77% | 6.84% |
| Phase 1 | 4.5064 | 4.7714 | 4.4573 | 4.6389 | 3.85% | 4.70% |

	Relative to Phase 2, Phase 3 reduced the two key failure rates:

	- Boundary violation: `5.77% -> 3.85%`
	- Mode incorrect: `6.84% -> 4.91%`
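
To put these drops in relative terms, the arithmetic can be sketched as below. The rates and the suite size of 468 prompts come from this card. The implied prompt counts are an assumption: they are correct only if each rate is a simple fraction of the 468 prompts.

```python
SUITE_SIZE = 468  # prompts in the frozen policy regression suite

# Failure rates in percent, copied from the table above.
rates = {
    "boundary_violation": {"phase2": 5.77, "phase3": 3.85},
    "mode_incorrect": {"phase2": 6.84, "phase3": 4.91},
}


def relative_reduction(before, after):
    """Percentage drop of `after` relative to `before`."""
    return (before - after) / before * 100.0


def implied_count(rate_percent, n=SUITE_SIZE):
    """Approximate number of failing prompts (assumes rate = failures / n)."""
    return round(rate_percent / 100.0 * n)


for name, r in rates.items():
    drop = relative_reduction(r["phase2"], r["phase3"])
    print(
        f"{name}: {r['phase2']}% -> {r['phase3']}% "
        f"({drop:.1f}% relative reduction, "
        f"~{implied_count(r['phase2'])} -> ~{implied_count(r['phase3'])} prompts)"
    )
```

Under that assumption, both failure rates fall by roughly 28% to 33% in relative terms, which corresponds to about nine fewer failing prompts in each category.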

Relative to Phase 1, Phase 3 is stronger on overall quality, helpfulness, and medical accuracy, matches it on boundary violations (3.85%), and remains slightly worse on mode correctness (4.91% vs. 4.70%).

	### External Baselines

	For external context, the same frozen regression suite was also run on the original Qwen base and instruct checkpoints:

| Model | Overall | Boundary Violation | Mode Incorrect |
|---|---:|---:|---:|
| Qwen3-4B-Base | 3.66 | 26.24% | 28.39% |
| Qwen3-4B-Instruct | 4.04 | 27.31% | 24.52% |
| med-advisor-4b Phase 3 | 4.54 | 3.85% | 4.91% |

	This is the main reason to use `med-advisor-4b` instead of the off-the-shelf base or instruct model for medical education: the Phase 3 checkpoint is much better at holding medical policy boundaries while remaining useful as an explainer.

	### Persona Regression

| Model | Overall | Depth | Audience | Warmth | Structure | Hedging | Verbosity | Evidence | Multi-turn |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Phase 3 | 4.1042 | 3.7292 | 4.5208 | 4.2083 | 4.3958 | 4.2708 | 4.6875 | 3.7083 | 4.6667 |
| Phase 2 | 4.0000 | 3.5833 | 4.3750 | 4.2292 | 4.3750 | 4.1458 | 4.6250 | 3.5208 | 5.0000 |
| Phase 1 | 3.6458 | 3.4583 | 4.1458 | 4.0417 | 4.1250 | 3.6875 | 4.2500 | 3.1458 | 4.6667 |

	Relative to Phase 2, Phase 3 is a net-positive persona update:

	- better overall persona quality
	- better depth
	- better audience adaptation
	- better structure and hedging
	- better verbosity control

	Small regressions remain in:

	- warmth
	- multi-turn consistency
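
The per-dimension comparison can be reproduced from the persona table. The snippet below is a small sketch that copies the Phase 3 and Phase 2 rows and splits the dimensions into improved and regressed; the variable names are illustrative.

```python
DIMENSIONS = ["Overall", "Depth", "Audience", "Warmth", "Structure",
              "Hedging", "Verbosity", "Evidence", "Multi-turn"]

# Judge scores (1-5), copied from the persona regression table.
PHASE3 = [4.1042, 3.7292, 4.5208, 4.2083, 4.3958, 4.2708, 4.6875, 3.7083, 4.6667]
PHASE2 = [4.0000, 3.5833, 4.3750, 4.2292, 4.3750, 4.1458, 4.6250, 3.5208, 5.0000]

# Phase 3 minus Phase 2; positive means Phase 3 scored higher.
deltas = {
    dim: round(p3 - p2, 4)
    for dim, p3, p2 in zip(DIMENSIONS, PHASE3, PHASE2)
}

improved = [dim for dim, delta in deltas.items() if delta > 0]
regressed = [dim for dim, delta in deltas.items() if delta < 0]

for dim in DIMENSIONS:
    print(f"{dim:<10} {deltas[dim]:+.4f}")
print("improved: ", improved)
print("regressed:", regressed)
```

The multi-turn drop (5.0000 to 4.6667) is the largest single change in either direction, while the warmth change is about 0.02 points.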

	## Safety and Limitations

	This model is a medical education model, not a clinical system. It still has meaningful limitations:

	- like any language model, it can generate inaccurate, incomplete, outdated, or misleading information, including content that sounds confident
	- it can remain too shallow on some prompts
	- emergency escalation is improved but still a top weakness
	- some refusal answers can still leak guideline-like dosing content
	- repetition can appear on long, refusal-heavy generations if decoding is loose

	Recommended decoding for safer, more stable output:

	- `do_sample=False`
	- `repetition_penalty=1.10` to `1.15`
	- `no_repeat_ngram_size=6`
	- a clear system prompt that restricts the model to general education

	These settings reduced repetition in local testing, but they are not a substitute for external safety review.
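
The recommended settings can be collected in one place so every call to `model.generate` uses them. This is a minimal sketch: the helper name `recommended_generation_kwargs` is hypothetical, and the default `max_new_tokens=1024` is taken from the Quickstart rather than from the list above.

```python
def recommended_generation_kwargs(repetition_penalty=1.10, max_new_tokens=1024):
    """Return keyword arguments for `model.generate` matching the settings above."""
    # The card recommends a repetition penalty between 1.10 and 1.15.
    if not 1.10 <= repetition_penalty <= 1.15:
        raise ValueError(
            "repetition_penalty should stay within the recommended 1.10 to 1.15 range"
        )
    return {
        "do_sample": False,  # greedy decoding
        "repetition_penalty": repetition_penalty,
        "no_repeat_ngram_size": 6,
        "max_new_tokens": max_new_tokens,
    }
```

With `inputs` prepared as in the Quickstart, the call becomes `model.generate(**inputs, **recommended_generation_kwargs())`.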

	## Accessing Earlier Revisions

Earlier checkpoints remain available in this repository's history:

| Phase | Description | Revision |
|---|---|---|
| Phase 1 | Medical checkpoint | `193afbea53c34b2bdc9c493411d10d94b58da486` |
| Phase 2 | Persona-refined checkpoint | `285617171e95fd98983e231f8d69652dce50e964` |
| Phase 3 | Current default checkpoint | `main` |
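
An earlier checkpoint can be loaded by passing its commit hash through the `revision` argument of `from_pretrained`. The sketch below maps each phase to its revision from the table; the `REVISIONS` dictionary and the `load_checkpoint` helper are illustrative names, not part of this repository.

```python
MODEL_ID = "vmal/med-advisor-4b"

# Revisions copied from the table above.
REVISIONS = {
    "phase1": "193afbea53c34b2bdc9c493411d10d94b58da486",
    "phase2": "285617171e95fd98983e231f8d69652dce50e964",
    "phase3": "main",
}


def load_checkpoint(phase="phase3"):
    """Load the tokenizer and model for one training phase."""
    # Imported here so the revision table can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    revision = REVISIONS[phase]
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=revision)
    return tokenizer, model
```

For example, `tokenizer, model = load_checkpoint("phase2")` fetches the persona-refined checkpoint. Pinning a commit hash rather than `main` also keeps results reproducible if the default branch changes.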

	## License

	Apache 2.0

	## Citation

	If you use this checkpoint, please cite the repository and model page.