# Medical QA LLM: QLoRA Fine-Tuned Llama 3.1 8B
A medical question-answering model fine-tuned from Meta Llama 3.1 8B-Instruct using QLoRA (4-bit quantization + LoRA adapters) on 107K medical QA examples.
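The training script is not included in this card. The sketch below shows a typical QLoRA setup with `transformers` and `peft`; the LoRA rank, alpha, dropout, and target modules are illustrative assumptions, not the values used to train this model.

```python
# Illustrative QLoRA setup -- all hyperparameters here are assumptions,
# not the values used to train this model.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections (assumed targets)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```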
## Training Data
| Dataset | Examples | Description |
|---|---|---|
| ChatDoctor | 96,485 | Patient-doctor conversations |
| MedQA-USMLE | 10,178 | USMLE-style medical exam questions |
| PubMedQA | 1,000 | Biomedical research questions |
| Total | 107,663 | Train: 101,421 / Val: 5,338 |
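The card doesn't specify how these datasets were serialized for training. A plausible preprocessing step, assuming each example is rendered through the Llama 3.1 chat template (the `question`/`answer` field names and the system prompt are hypothetical):

```python
# Hypothetical example formatting -- field names and system prompt are assumptions.
def to_chat_text(example: dict, tokenizer) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]
    # Render the full conversation with the model's built-in chat template.
    return tokenizer.apply_chat_template(messages, tokenize=False)
```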
## Training Results
- Initial Loss: 3.37
- Final Loss: 1.18
- Best Loss: 1.13 (step 7,190)
- Loss Reduction: 65%
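The reduction figure is relative to the initial loss: (3.37 - 1.18) / 3.37 ≈ 0.65, i.e. a 65% drop.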
*(Figure: training loss curve)*
*(Figure: loss distribution by epoch)*
## Evaluation: PubMedQA Benchmark
| Model | Accuracy (300 samples) |
|---|---|
| Base Llama 3.1 8B-Instruct | 75.3% |
| Fine-Tuned (this model) | 71.7% |
**Note:** The fine-tuned model was primarily trained on conversational medical data (ChatDoctor is ~90% of the training set), optimizing for detailed, doctor-style responses rather than terse yes/no/maybe classification. The PubMedQA benchmark measures classification accuracy, which favors the base model's instruction-following format. The fine-tuned model excels at generating comprehensive medical explanations and patient-friendly responses.
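The evaluation harness isn't published with this card. Below is a minimal sketch of the kind of label-matching scorer that PubMedQA accuracy implies, where `extract_label` is a hypothetical helper mapping free-text answers onto the three classes:

```python
# Hypothetical PubMedQA scorer -- the actual harness is not published.
def extract_label(answer: str) -> str:
    """Map a free-text answer onto PubMedQA's yes/no/maybe labels."""
    head = answer.strip().lower()
    for label in ("yes", "no", "maybe"):
        if head.startswith(label):
            return label
    return "maybe"  # fallback when no label is recognized

def accuracy(predictions: list[str], gold: list[str]) -> float:
    correct = sum(extract_label(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

A scorer of this kind under-credits correct answers that aren't phrased with a leading label, which is one reason verbose, doctor-style responses can score lower on this benchmark.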
## Usage
### With PEFT (Recommended: uses less memory)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# Load the base model in 4-bit for memory efficiency
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the fine-tuned LoRA adapters
model = PeftModel.from_pretrained(base_model, "DinoCU/medical-qa-llama3.1-8b")
tokenizer = AutoTokenizer.from_pretrained("DinoCU/medical-qa-llama3.1-8b")

# Build a chat prompt and generate
messages = [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What are the common side effects of metformin?"},
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9,
    )

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
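### Merging the adapters (optional)

If you'd rather serve a standalone checkpoint with no `peft` dependency at inference time, the adapters can be folded into an unquantized copy of the base model. A sketch, assuming enough memory to hold the 8B weights in bf16 (roughly 16 GB):

```python
# Merge the LoRA adapters into a full-precision base model (assumption:
# ~16 GB of memory available for the bf16 weights).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
)
merged = PeftModel.from_pretrained(base, "DinoCU/medical-qa-llama3.1-8b")
merged = merged.merge_and_unload()  # folds adapter deltas into the base weights
merged.save_pretrained("medical-qa-llama3.1-8b-merged")
```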
## Limitations
- This model is for educational and research purposes only; it is not for clinical decision-making.
- Trained primarily on conversational medical data; may generate verbose responses.
- May produce inaccurate or hallucinated medical information.
- Always consult qualified healthcare professionals for medical advice.