Instructions for using KingLLM/medical-finetuned with libraries, inference providers, notebooks, and local apps. The sections below show how to get started with each.
- Libraries
- PEFT
How to use KingLLM/medical-finetuned with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen3-4b-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "KingLLM/medical-finetuned")
```

- Transformers
How to use KingLLM/medical-finetuned with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="KingLLM/medical-finetuned")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("KingLLM/medical-finetuned", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use KingLLM/medical-finetuned with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "KingLLM/medical-finetuned"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "KingLLM/medical-finetuned",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/KingLLM/medical-finetuned
```
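The curl call above posts an OpenAI-compatible JSON body to the server's default port. As a minimal sketch, the same body can be built from Python's standard library (the endpoint and port are those of the vLLM example above; sending the request is left to `urllib.request` or the `openai` client):

```python
import json

# Same OpenAI-compatible body as the curl example above
payload = {
    "model": "KingLLM/medical-finetuned",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"},
    ],
}
body = json.dumps(payload)

# POST `body` to http://localhost:8000/v1/chat/completions
# with header Content-Type: application/json
```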
- SGLang
How to use KingLLM/medical-finetuned with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "KingLLM/medical-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "KingLLM/medical-finetuned",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "KingLLM/medical-finetuned" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "KingLLM/medical-finetuned",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Unsloth Studio
How to use KingLLM/medical-finetuned with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for KingLLM/medical-finetuned to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for KingLLM/medical-finetuned to start chatting
```
Using HuggingFace Spaces for Unsloth
```
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for KingLLM/medical-finetuned to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="KingLLM/medical-finetuned",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use KingLLM/medical-finetuned with Docker Model Runner:
```shell
docker model run hf.co/KingLLM/medical-finetuned
```
Medical Fine-tuned Qwen3-4B
A LoRA adapter fine-tuned on top of Qwen3-4B for medical question answering. The model acts as an expert medical doctor, providing diagnosis guidance and treatment advice in response to patient questions.
Disclaimer: This model is for educational and research purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider.
Model Details
| Field | Value |
|---|---|
| Base model | unsloth/Qwen3-4B |
| Fine-tuning method | SFT (Supervised Fine-Tuning) with LoRA |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training dataset | chatdoctor_healthcaremagic (5,000 samples) |
| Model type | Causal LM (Qwen3 architecture) |
| Language | English |
| License | Apache 2.0 |
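As a rough sketch of what the rank/alpha pair in the table means: LoRA trains a low-rank update applied as `(alpha / r) * (B @ A)`, so with r = 16 and alpha = 32 the update is scaled by 2, and only a small fraction of each projection's parameters is trained. The 2048×2048 layer width below is illustrative only, not Qwen3-4B's actual dimensions:

```python
r, alpha = 16, 32        # LoRA rank and alpha from the table above
scaling = alpha / r      # updates are applied as (alpha / r) * (B @ A)

# Trainable-parameter count for one hypothetical d x d projection:
d = 2048                 # illustrative layer width, not the real model's
full_update = d * d             # a full-rank update trains every entry
lora_update = d * r + r * d     # A is (r x d), B is (d x r)

print(scaling)                    # 2.0
print(lora_update / full_update)  # 0.015625 -> ~1.6% of the parameters
```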
Quick Start
Option 1 — Load LoRA adapter (recommended, ~140 MB download)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen3-4B"
ADAPTER = "KingLLM/medical-finetuned"

device = "cuda" if torch.cuda.is_available() else \
         "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()  # bake LoRA into the base weights
model = model.to(device).eval()
```
Option 2 — On Kaggle / Colab (GPU, with Unsloth)
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    "KingLLM/medical-finetuned",
    max_seq_length=2048,
    load_in_4bit=True,
    dtype=torch.float16,
)
model.eval()
```
Inference
```python
from transformers import TextStreamer

SYSTEM_PROMPT = (
    "You are an expert medical doctor. "
    "Answer the patient's question with a clear diagnosis and treatment advice."
)

def ask(question: str):
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ], tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            do_sample=True,
            streamer=TextStreamer(tokenizer, skip_prompt=True),
        )

ask("I have had a fever of 39°C, sore throat, and fatigue for 3 days. What should I do?")
ask("I am a 45-year-old male with high blood pressure. Can I take ibuprofen?")
```
Training Details
Dataset
Malikeh1375/medical-question-answering-datasets — chatdoctor_healthcaremagic subset.
- 112k doctor–patient conversation pairs
- Fields used: `instruction`/`input` (question) and `output` (doctor response)
- 5,000 samples used for this run
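A minimal sketch of how the fields listed above can be mapped into chat messages for SFT. The field names follow the dataset description; the helper itself is hypothetical, not the actual training code:

```python
def to_messages(example: dict) -> list:
    # "instruction"/"input" hold the patient question, "output" the doctor reply
    question = example.get("input") or example["instruction"]
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": example["output"]},
    ]

sample = {"instruction": "I have a sore throat.", "input": "", "output": "Try warm fluids."}
messages = to_messages(sample)
```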
Procedure
Supervised fine-tuning (SFT) using the Qwen3 instruct chat template:
```
<|im_start|>system
You are an expert medical doctor...<|im_end|>
<|im_start|>user
{patient question}<|im_end|>
<|im_start|>assistant
{doctor response}<|im_end|>
```
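The template above can be reproduced with a small formatting helper. This is only a sketch of the string layout shown above; in practice `tokenizer.apply_chat_template` produces it for you:

```python
def format_qwen3_chat(system, question, response=None):
    # Mirrors the Qwen3 instruct template shown above
    text = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
    if response is not None:       # training example: append the target reply
        text += f"{response}<|im_end|>"
    return text                    # without `response`, this is a generation prompt

prompt = format_qwen3_chat("You are an expert medical doctor...", "I have a fever.")
```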
Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size (per device) | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup steps | 10 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Max sequence length | 2048 |
| Precision | fp16 |
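A back-of-the-envelope check on the table above: with per-device batch 2 and gradient accumulation 4, the effective batch is 8, so one epoch over the 5,000-sample subset is roughly 625 optimizer steps (ignoring any dropped remainder batch):

```python
samples = 5_000            # training subset size
per_device_batch = 2
grad_accum = 4

effective_batch = per_device_batch * grad_accum   # 8
steps_per_epoch = samples // effective_batch      # 625 optimizer steps
warmup_steps = 10                                 # ~1.6% of the schedule
```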
Hardware
- GPU: NVIDIA Tesla T4 (16 GB)
- Platform: Kaggle (free tier)
- Framework: Unsloth + TRL SFTTrainer
Limitations & Risks
- Not a medical device. Outputs are not validated by clinical experts and must not be used for actual diagnosis or treatment decisions.
- Hallucination. Like all LLMs, the model can produce plausible-sounding but incorrect medical information.
- English only. Trained exclusively on English-language data.
- Narrow coverage. Trained on general GP-style Q&A; may perform poorly on specialist domains (oncology, rare diseases, paediatrics, etc.).
- No patient history. The model has no memory across turns and no access to lab results or imaging.
Citation
If you use this model, please cite the base model and dataset:
```bibtex
@misc{qwen3-2025,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-4B}
}

@dataset{malikeh-medical-qa,
  author = {Malikeh Ehghaghi},
  title  = {Medical Question Answering Datasets},
  year   = {2023},
  url    = {https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets}
}
```
Framework Versions
- PEFT 0.18.1
- TRL (SFTTrainer)
- Unsloth 2026.3.8
- Transformers ≥ 4.51
- PyTorch 2.10