MedSLM-SFT — Instruction-Tuned Medical Language Model
⚠️ Research Only — Not for Clinical Use
This model is intended for research and educational purposes only.
It must not be used for medical diagnosis, treatment recommendations, or any clinical decision-making.
Model Summary
MedSLM-SFT is a 330M-parameter medical language model fine-tuned for instruction following and question answering.
It was created by applying Supervised Fine-Tuning (SFT) with QLoRA (4-bit quantized LoRA) to the pre-trained base model Saminx22/MedSLM using the dataset Saminx22/medical_data_for_slm_SFT.
This repository contains the merged model (LoRA adapters baked into the base model at full fp16 precision). It can be used like any standard Hugging Face causal LM without requiring PEFT at inference time.
For the standalone LoRA adapter weights, see Saminx22/MedSLM-SFT-LoRA.
| Property | Value |
|---|---|
| Base model | Saminx22/MedSLM |
| Architecture | LLaMA-style (RMSNorm, RoPE, SwiGLU, GQA) |
| Parameters | ~330M |
| Context length | 1,024 tokens |
| Vocabulary | 50,257 (GPT-2 tokenizer) |
| Fine-tuning method | QLoRA (4-bit base + LoRA adapters) |
| LoRA rank / alpha | 16 / 32 |
| Trainable parameters | ~7.1M (3.59% of total) |
| Training data | 46,166 medical QA pairs |
| Training framework | Unsloth + TRL SFTTrainer |
| Hardware | Tesla T4 (15.6 GB VRAM) |
Architecture
MedSLM-SFT uses a modern LLaMA-style architecture with:
- RMSNorm for stable training
- Rotary Positional Embeddings (RoPE)
- SwiGLU activation in the feed-forward network
- Grouped-Query Attention (GQA) — 16 query heads / 8 key-value heads
The base model was pre-trained from scratch on ~148M tokens of medical text (PubMed abstracts, PMC full texts, and clinical guidelines).
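Under GQA, each key-value head is shared by a fixed group of query heads, which shrinks the KV cache without changing the number of query heads. A minimal sketch of the head mapping (the 16/8 split comes from the table above; the function is illustrative, not the model's actual attention code):

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int = 16, n_kv_heads: int = 8) -> int:
    """Map a query-head index to the key-value head it attends with under GQA."""
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly into KV groups"
    group_size = n_q_heads // n_kv_heads  # query heads sharing one KV head (here: 2)
    return q_head // group_size

# Heads 0-1 share KV head 0, heads 2-3 share KV head 1, and so on.
print([kv_head_for_query_head(h) for h in range(16)])
# → [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```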
Training Details
Dataset
- Repository: Saminx22/medical_data_for_slm_SFT
- Splits: 46,166 train / 2,565 validation / 2,565 test
- Sources: WikiDoc, medical Q&A corpora
- Average length: ~180 tokens per example
Training prompt template:

```
### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.

### User:
{question}

### Assistant:
{answer}
```
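For reference, a small helper that renders one QA pair into this template (the `question`/`answer` field names are assumptions; check the dataset schema before relying on them):

```python
SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def format_example(question: str, answer: str) -> str:
    """Render one QA pair into the SFT prompt template used during training."""
    return (
        f"### System:\n{SYSTEM_PROMPT}\n\n"
        f"### User:\n{question}\n\n"
        f"### Assistant:\n{answer}"
    )

text = format_example("What is hypertension?", "Persistently elevated arterial blood pressure.")
print(text.splitlines()[0])  # → "### System:"
```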
SFT Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-4 |
| LR scheduler | Cosine decay |
| Warmup ratio | 5% |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 8 |
| Effective batch size | 32 |
| Epochs | 3 |
| Weight decay | 0.01 |
| Max gradient norm | 1.0 |
| Optimizer | AdamW (8-bit) |
| Sequence packing | Enabled |
| Max sequence length | 1,024 tokens |
| Precision | fp16 |
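The warmup ratio and cosine decay in the table combine into a per-step learning rate roughly as sketched below (self-contained; the exact curve depends on the trainer's scheduler implementation):

```python
import math

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 2e-4, warmup_ratio: float = 0.05) -> float:
    """Linear warmup over the first 5% of steps, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 4 per device x 8 gradient-accumulation steps = 32
print(lr_at_step(50, 1000))  # end of warmup: peak LR of 2e-4
```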
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha (α) | 32 |
| Effective scaling | 2.0 |
| Dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
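To reproduce the adapter setup with PEFT, the table maps onto a `LoraConfig` roughly as follows (a sketch, not the exact training script; `task_type` is an assumption appropriate for a causal LM):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # rank
    lora_alpha=32,   # scaling numerator; effective scale = alpha / r = 2.0
    lora_dropout=0.0,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```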
How to Use
Installation
```bash
pip install transformers torch accelerate bitsandbytes
```
Quick Start Code
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Saminx22/MedSLM-SFT"

# 4-bit quantization for lower VRAM (optional)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,  # Remove this line for full fp16
    device_map="auto",
)
model.eval()

SYSTEM_PROMPT = (
    "You are a medical AI assistant. "
    "Provide accurate, evidence-based answers to medical questions."
)

def ask(question: str, max_new_tokens: int = 300) -> str:
    prompt = f"### System:\n{SYSTEM_PROMPT}\n\n### User:\n{question}\n\n### Assistant:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            top_k=50,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Strip the prompt tokens and decode only the generated continuation
    response = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(response, skip_special_tokens=True).strip()

# Example
print(ask("What are the warning signs of a stroke?"))
```
Recommended Prompt Template
Always use this exact format for best performance:
```
### System:
You are a medical AI assistant. Provide accurate, evidence-based answers to medical questions.

### User:
<your question here>

### Assistant:
```
Limitations and Risks
- Research only — Not for clinical use or patient care.
- Small model size (~330M parameters) makes it more prone to hallucination than larger models.
- No RLHF or safety alignment.
- Trained only for single-turn QA.
- Context length limited to 1,024 tokens.
Citation
```bibtex
@misc{medslm-sft-2025,
  title     = {MedSLM-SFT: Instruction-Tuned Medical Small Language Model},
  author    = {Saminx22},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Saminx22/MedSLM-SFT}
}
```
Related Repositories
- Base model: Saminx22/MedSLM
- SFT dataset: Saminx22/medical_data_for_slm_SFT
- LoRA adapters only: Saminx22/MedSLM-SFT-LoRA