Instructions for using KingLLM/medical-finetuned with libraries, inference providers, notebooks, and local apps. The sections below show how to get started with each.
- Libraries
- PEFT
How to use KingLLM/medical-finetuned with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen3-4b-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "KingLLM/medical-finetuned")
```

- Transformers
How to use KingLLM/medical-finetuned with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="KingLLM/medical-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("KingLLM/medical-finetuned", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use KingLLM/medical-finetuned with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "KingLLM/medical-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "KingLLM/medical-finetuned",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/KingLLM/medical-finetuned
```
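The curl call above posts an OpenAI-compatible JSON body to the server's default port. As a minimal sketch, the same body can be built from Python's standard library (the endpoint and port are those of the vLLM example above; sending the request is left to `urllib.request` or the `openai` client):

```python
import json

# Same OpenAI-compatible body as the curl example above
payload = {
    "model": "KingLLM/medical-finetuned",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ],
}
body = json.dumps(payload)

# POST `body` to http://localhost:8000/v1/chat/completions
# with header Content-Type: application/json
```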
- SGLang
How to use KingLLM/medical-finetuned with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "KingLLM/medical-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "KingLLM/medical-finetuned",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "KingLLM/medical-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "KingLLM/medical-finetuned",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Unsloth Studio
How to use KingLLM/medical-finetuned with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for KingLLM/medical-finetuned to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for KingLLM/medical-finetuned to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for KingLLM/medical-finetuned to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="KingLLM/medical-finetuned",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use KingLLM/medical-finetuned with Docker Model Runner:
```shell
docker model run hf.co/KingLLM/medical-finetuned
```
Medical Fine-tuned Qwen3-4B
A LoRA adapter fine-tuned on top of Qwen3-4B for medical question answering. The model acts as an expert medical doctor, providing diagnosis guidance and treatment advice in response to patient questions.
Disclaimer: This model is for educational and research purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider.
Model Details
| Field | Value |
|---|---|
| Base model | unsloth/Qwen3-4B |
| Fine-tuning method | SFT (Supervised Fine-Tuning) with LoRA |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training dataset | chatdoctor_healthcaremagic (5,000 samples) |
| Model type | Causal LM (Qwen3 architecture) |
| Language | English |
| License | Apache 2.0 |
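As a rough sketch of what the rank/alpha pair in the table means: LoRA trains a low-rank update applied as `(alpha / r) * (B @ A)`, so with r = 16 and alpha = 32 the update is scaled by 2, and only a small fraction of each projection's parameters is trained. The 2048×2048 layer width below is illustrative only, not Qwen3-4B's actual dimensions:

```python
r, alpha = 16, 32        # LoRA rank and alpha from the table above
scaling = alpha / r      # updates are applied as (alpha / r) * (B @ A)

# Trainable-parameter count for one hypothetical d x d projection:
d = 2048                 # illustrative layer width, not the real model's
full_update = d * d             # a full-rank update trains every entry
lora_update = d * r + r * d     # A is (r x d), B is (d x r)

print(scaling)                    # 2.0
print(lora_update / full_update)  # 0.015625 -> ~1.6% of the parameters
```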
Quick Start
Option 1 — Load LoRA adapter (recommended, ~140 MB download)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen3-4B"
ADAPTER = "KingLLM/medical-finetuned"

device = "cuda" if torch.cuda.is_available() else \
         "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()  # bake LoRA into the base weights
model = model.to(device).eval()
```
Option 2 — On Kaggle / Colab (GPU, with Unsloth)
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    "KingLLM/medical-finetuned",
    max_seq_length=2048,
    load_in_4bit=True,
    dtype=torch.float16,
)
model.eval()
```
Inference
```python
from transformers import TextStreamer

SYSTEM_PROMPT = (
    "You are an expert medical doctor. "
    "Answer the patient's question with a clear diagnosis and treatment advice."
)

def ask(question: str):
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ], tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            do_sample=True,
            streamer=TextStreamer(tokenizer, skip_prompt=True),
        )

ask("I have had a fever of 39°C, sore throat, and fatigue for 3 days. What should I do?")
ask("I am a 45-year-old male with high blood pressure. Can I take ibuprofen?")
```
Training Details
Dataset
Malikeh1375/medical-question-answering-datasets — chatdoctor_healthcaremagic subset.
- 112k doctor–patient conversation pairs
- Fields used: `instruction`/`input` (question) and `output` (doctor response)
- 5,000 samples used for this run
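A minimal sketch of how the fields listed above can be mapped into chat messages for SFT. The field names follow the dataset description; the helper itself is hypothetical, not the actual training code:

```python
def to_messages(example: dict) -> list:
    # "instruction"/"input" hold the patient question, "output" the doctor reply
    question = example.get("input") or example["instruction"]
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": example["output"]},
    ]

sample = {"instruction": "I have a sore throat.", "input": "", "output": "Try warm fluids."}
messages = to_messages(sample)
```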
Procedure
Supervised fine-tuning (SFT) using the Qwen3 instruct chat template:
```
<|im_start|>system
You are an expert medical doctor...<|im_end|>
<|im_start|>user
{patient question}<|im_end|>
<|im_start|>assistant
{doctor response}<|im_end|>
```
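The template above can be reproduced with a small formatting helper. This is only a sketch of the string layout shown above; in practice `tokenizer.apply_chat_template` produces it for you:

```python
def format_qwen3_chat(system, question, response=None):
    # Mirrors the Qwen3 instruct template shown above
    text = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    if response is not None:       # training example: append the target reply
        text += f"{response}<|im_end|>"
    return text                    # without `response`, this is a generation prompt

prompt = format_qwen3_chat("You are an expert medical doctor...", "I have a fever.")
```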
Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size (per device) | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup steps | 10 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Max sequence length | 2048 |
| Precision | fp16 |
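A back-of-the-envelope check on the table above: with per-device batch 2 and gradient accumulation 4, the effective batch is 8, so one epoch over the 5,000-sample subset is roughly 625 optimizer steps (ignoring any dropped remainder batch):

```python
samples = 5_000            # training subset size
per_device_batch = 2
grad_accum = 4

effective_batch = per_device_batch * grad_accum   # 8
steps_per_epoch = samples // effective_batch      # 625 optimizer steps
warmup_steps = 10                                 # ~1.6% of the schedule
```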
Hardware
- GPU: NVIDIA Tesla T4 (16 GB)
- Platform: Kaggle (free tier)
- Framework: Unsloth + TRL SFTTrainer
Limitations & Risks
- Not a medical device. Outputs are not validated by clinical experts and must not be used for actual diagnosis or treatment decisions.
- Hallucination. Like all LLMs, the model can produce plausible-sounding but incorrect medical information.
- English only. Trained exclusively on English-language data.
- Narrow coverage. Trained on general GP-style Q&A; may perform poorly on specialist domains (oncology, rare diseases, paediatrics, etc.).
- No patient history. The model has no memory across turns and no access to lab results or imaging.
Citation
If you use this model, please cite the base model and dataset:
```bibtex
@misc{qwen3-2025,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-4B}
}

@dataset{malikeh-medical-qa,
  author = {Malikeh Ehghaghi},
  title  = {Medical Question Answering Datasets},
  year   = {2023},
  url    = {https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets}
}
```
Framework Versions
- PEFT 0.18.1
- TRL (SFTTrainer)
- Unsloth 2026.3.8
- Transformers ≥ 4.51
- PyTorch 2.10