
Medical Fine-tuned Qwen3-4B

A LoRA adapter fine-tuned on top of Qwen3-4B for medical question answering. The model acts as an expert medical doctor, providing diagnosis guidance and treatment advice in response to patient questions.

Disclaimer: This model is for educational and research purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider.


Model Details

| Field | Value |
|---|---|
| Base model | unsloth/Qwen3-4B |
| Fine-tuning method | SFT (Supervised Fine-Tuning) with LoRA |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training dataset | chatdoctor_healthcaremagic (5,000 samples) |
| Model type | Causal LM (Qwen3 architecture) |
| Language | English |
| License | Apache 2.0 |
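The rank-16 configuration determines the adapter's size: for each targeted weight matrix W of shape (out × in), LoRA adds a pair of low-rank factors A (r × in) and B (out × r), contributing r·(in + out) trainable parameters. A minimal back-of-the-envelope sketch, where the per-layer dimensions and the 36-layer count are assumptions made for the arithmetic, not figures from this card:

```python
# Estimate LoRA adapter size: each adapted matrix W (out x in) gains
# factors A (r x in) and B (out x r), i.e. r * (in + out) trainable params.
def lora_params(shapes, r):
    return sum(r * (din + dout) for (dout, din) in shapes)

# Assumed (illustrative) per-layer shapes for the seven target modules;
# the real Qwen3-4B dimensions may differ.
HIDDEN, INTER, KV = 2560, 9728, 1024
layer_shapes = [
    (4096, HIDDEN),   # q_proj (heads * head_dim x hidden) -- assumption
    (KV, HIDDEN),     # k_proj
    (KV, HIDDEN),     # v_proj
    (HIDDEN, 4096),   # o_proj
    (INTER, HIDDEN),  # gate_proj
    (INTER, HIDDEN),  # up_proj
    (HIDDEN, INTER),  # down_proj
]
per_layer = lora_params(layer_shapes, r=16)
total = per_layer * 36   # assumed 36 transformer layers
print(per_layer, total)
```

Under these assumed shapes the adapter comes out around 33 M trainable parameters, on the order of 130 MB at fp32, which is broadly consistent with the ~140 MB adapter download quoted below.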

Quick Start

Option 1 — Load LoRA adapter (recommended, ~140 MB download)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "Qwen/Qwen3-4B"
ADAPTER    = "KingLLM/medical-finetuned"

device = "cuda" if torch.cuda.is_available() else \
         "mps"  if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()   # bake the LoRA weights into the base model
model = model.to(device).eval()
```

Option 2 — On Kaggle / Colab (GPU, with Unsloth)

```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    "KingLLM/medical-finetuned",
    max_seq_length = 2048,
    load_in_4bit   = True,
    dtype          = torch.float16,
)
FastLanguageModel.for_inference(model)   # enable Unsloth's fast inference mode
model.eval()
```

Inference

```python
from transformers import TextStreamer

SYSTEM_PROMPT = (
    "You are an expert medical doctor. "
    "Answer the patient's question with a clear diagnosis and treatment advice."
)

def ask(question: str):
    text = tokenizer.apply_chat_template([
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",   "content": question},
    ], tokenize=False, add_generation_prompt=True)

    # model.device works regardless of which loading option was used above
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        model.generate(
            **inputs,
            max_new_tokens = 512,
            temperature    = 0.7,
            do_sample      = True,
            streamer       = TextStreamer(tokenizer, skip_prompt=True),
        )

ask("I have had a fever of 39°C, sore throat, and fatigue for 3 days. What should I do?")
ask("I am a 45-year-old male with high blood pressure. Can I take ibuprofen?")
```
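The `temperature=0.7` passed to `generate` rescales the logits before sampling; values below 1 concentrate probability on the most likely tokens, values above 1 flatten the distribution. A minimal sketch of the effect, with made-up logit values for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to sampling probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits  = [2.0, 1.0, 0.1]                              # hypothetical token logits
p_sharp = softmax_with_temperature(logits, 0.7)        # value used in this card
p_flat  = softmax_with_temperature(logits, 1.5)
print(p_sharp[0], p_flat[0])   # lower temperature puts more mass on the top token
```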

Training Details

Dataset

The chatdoctor_healthcaremagic subset of Malikeh1375/medical-question-answering-datasets.

  • 112k doctor–patient conversation pairs
  • Fields used: instruction / input (question) and output (doctor response)
  • 5,000 samples used for this run
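Each record is mapped to a (patient question, doctor response) pair before templating. A minimal sketch of that mapping, assuming the records carry `instruction`, `input`, and `output` keys as listed above; the exact join logic used during training is an assumption:

```python
def to_pair(example: dict) -> tuple[str, str]:
    """Map a raw dataset record to (patient question, doctor response)."""
    instruction = example.get("instruction", "").strip()
    patient_q   = example.get("input", "").strip()
    # If an instruction is present, keep it as a prefix to the question.
    question = f"{instruction}\n{patient_q}".strip() if instruction else patient_q
    answer = example["output"].strip()
    return question, answer

# Hypothetical record in the shape described above
record = {
    "instruction": "If you are a doctor, please answer the medical question.",
    "input": "I have a persistent dry cough at night. What could it be?",
    "output": "A nocturnal dry cough is often caused by post-nasal drip...",
}
q, a = to_pair(record)
```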

Procedure

Supervised fine-tuning (SFT) using the Qwen3 instruct chat template:

```
<|im_start|>system
You are an expert medical doctor...<|im_end|>
<|im_start|>user
{patient question}<|im_end|>
<|im_start|>assistant
{doctor response}<|im_end|>
```
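`apply_chat_template` produces this layout automatically. A minimal sketch of the rendering for reference; the real Qwen3 template also handles tools and thinking blocks, which are omitted here:

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the ChatML-style format Qwen3 uses (simplified)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model generates the doctor response
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

text = render_chatml([
    {"role": "system", "content": "You are an expert medical doctor..."},
    {"role": "user",   "content": "I have had a fever for 3 days."},
])
print(text)
```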

Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size (per device) | 2 |
| Gradient accumulation | 4 (effective batch = 8) |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup steps | 10 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Max sequence length | 2048 |
| Precision | fp16 |
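The effective batch size follows from per-device batch × gradient accumulation (2 × 4 = 8), and with 5,000 samples at 1 epoch that gives 625 optimizer steps. A minimal sketch of the resulting schedule (linear warmup for 10 steps, then cosine decay to zero), assuming this is how the trainer implements it:

```python
import math

def lr_at(step, total_steps, base_lr=2e-4, warmup=10):
    """Linear warmup for `warmup` steps, then cosine decay to zero."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

TOTAL = 625   # 5,000 samples / effective batch of 8, 1 epoch
print(lr_at(0, TOTAL), lr_at(10, TOTAL), lr_at(TOTAL, TOTAL))
```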

Hardware

  • GPU: NVIDIA Tesla T4 (16 GB)
  • Platform: Kaggle (free tier)
  • Framework: Unsloth + TRL SFTTrainer

Limitations & Risks

  • Not a medical device. Outputs are not validated by clinical experts and must not be used for actual diagnosis or treatment decisions.
  • Hallucination. Like all LLMs, the model can produce plausible-sounding but incorrect medical information.
  • English only. Trained exclusively on English-language data.
  • Narrow coverage. Trained on general GP-style Q&A; may perform poorly on specialist domains (oncology, rare diseases, paediatrics, etc.).
  • No patient history. The model has no memory across turns and no access to lab results or imaging.

Citation

If you use this model, please cite the base model and dataset:

@misc{qwen3-2025,
  title  = {Qwen3 Technical Report},
  author = {Qwen Team},
  year   = {2025},
  url    = {https://huggingface.co/Qwen/Qwen3-4B}
}

@dataset{malikeh-medical-qa,
  author = {Malikeh Ehghaghi},
  title  = {Medical Question Answering Datasets},
  year   = {2023},
  url    = {https://huggingface.co/datasets/Malikeh1375/medical-question-answering-datasets}
}

Framework Versions

  • PEFT 0.18.1
  • TRL (SFTTrainer)
  • Unsloth 2026.3.8
  • Transformers ≥ 4.51
  • PyTorch 2.10