---
base_model: google/medgemma-4b-it
library_name: peft
pipeline_tag: text-generation
license: mit
language:
- en
tags:
- lora
- transformers
- medical
- clinical-documentation
- soap-notes
- medgemma
- hai-def
- medgemma-impact-challenge
---
# MedScribe SOAP LoRA: Concise Clinical Note Generation
LoRA adapter for google/medgemma-4b-it that generates concise, clinician-ready SOAP notes from medical encounter transcripts.
Built for the Google MedGemma Impact Challenge 2026.
## What This Model Does
Converts medical encounter transcripts into structured SOAP (Subjective, Objective, Assessment, Plan) notes written in the concise shorthand that clinicians actually use, not the verbose textbook prose that base models default to.
Example:
| Input transcript | "54-year-old female presenting with shortness of breath. CT chest shows filling defects in segmental branches of right lower lobe..." |
|---|---|
| Base MedGemma | ~200 words, textbook prose, over-specified plan with 6-8 items |
| This adapter | ~104 words, clinical shorthand ("54 yo F c/o SOB"), focused 2-4 item plan |
## Key Metrics
| Metric | Base MedGemma | With This Adapter |
|---|---|---|
| Avg word count | ~200+ | 104 |
| Section completeness (S/O/A/P) | 85-95% | 100% |
| Hallucinated findings | 5-10% | 0% |
| WNL shortcuts | Present | 0% |
| Clinical style | Textbook verbose | Shorthand |
| PLAN items | 4-8 | 2-4 (focused) |
| Quality score | n/a | 90/100 |
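Word count and section completeness are mechanical to compute from a generated note. The helper below is an illustrative sketch of how such checks can be scripted; it is not the evaluation code behind the table above:

```python
import re

SECTIONS = ("SUBJECTIVE", "OBJECTIVE", "ASSESSMENT", "PLAN")

def note_metrics(note: str) -> dict:
    """Word count and S/O/A/P section completeness for one generated note."""
    # A section counts as present if its header starts a line, e.g. "PLAN:".
    present = [s for s in SECTIONS if re.search(rf"^{s}\s*:", note, re.MULTILINE)]
    return {
        "word_count": len(note.split()),
        "completeness": len(present) / len(SECTIONS),  # 1.0 = all four sections
    }

note = "SUBJECTIVE: 54 yo F c/o SOB.\nOBJECTIVE: RR 22.\nASSESSMENT: PE.\nPLAN: Start AC."
print(note_metrics(note))  # {'word_count': 14, 'completeness': 1.0}
```

Averaging these values over a held-out set gives the "Avg word count" and "Section completeness" rows of the table.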
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model (4-bit quantized)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/medgemma-4b-it")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Tushar-9802/medscribe-soap-lora")
model.eval()

# Generate SOAP note
transcript = "..."  # your encounter transcript here

prompt = f"""You are a clinical documentation assistant. Convert the following medical
text into a structured SOAP note.

MEDICAL TEXT:
{transcript}

Generate a SOAP note with these sections:
- SUBJECTIVE: Patient-reported symptoms and history
- OBJECTIVE: Physical exam findings and vital signs
- ASSESSMENT: Clinical impressions and diagnoses
- PLAN: Diagnostic tests, treatments, and follow-up

Write a complete PLAN (treatments, monitoring, follow-up). End with a full sentence.

SOAP NOTE:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        min_new_tokens=150,
        do_sample=False,
        use_cache=True,
    )
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```
## Training Details

### Training Data
712 curated transcript-SOAP pairs generated via GPT-4o Mini API ($1.28 total). Dataset: Tushar-9802/medscribe-soap-712
Each sample enforces:
- "Not documented in source" for any finding absent from the input transcript
- Zero WNL (Within Normal Limits) shortcuts: every finding explicitly stated
- Concise clinical shorthand style
- PLAN with specific, actionable items
### Training Configuration
| Parameter | Value |
|---|---|
| Base model | google/medgemma-4b-it |
| Method | LoRA |
| Rank | 16 |
| Alpha | 32 |
| Dropout | 0.1 |
| Target modules | All attention layers |
| Trainable parameters | ~4.2M (0.1% of 4B base) |
| Batch size | 2 (Γ 8 gradient accumulation = effective 16) |
| Learning rate | 2e-5 |
| Epochs | 5 (early stopping patience: 2) |
| Precision | BFloat16 |
| Quantization | 4-bit NF4 during training |
| Hardware | NVIDIA RTX 5070 Ti (16GB VRAM) |
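The table above maps onto a `peft` `LoraConfig` roughly as follows. This is a sketch: the exact `target_modules` list is not published here, so the attention projections named below are an assumption consistent with "All attention layers":

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,             # rank
    lora_alpha=32,
    lora_dropout=0.1,
    # Assumed: the four attention projections in each transformer block
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```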
### Training Results
| Metric | Value |
|---|---|
| Training loss | 0.828 |
| Validation loss | 0.782 |
| Overfitting | None (val < train) |
## Anti-Hallucination Behavior
The adapter was specifically trained to avoid clinical hallucination. When the input transcript does not contain information for a SOAP section, the model outputs "Not documented in source" rather than fabricating findings. This is critical for clinical safety: a missing field that is explicitly marked as missing is far safer than a plausible-sounding fabrication.
## Intended Use
- Converting medical encounter transcripts to structured SOAP notes
- Clinical documentation assistance (with physician review)
- Research and demonstration of efficient medical LLM fine-tuning
## Limitations
- English only
- Research prototype: not validated for clinical use in any jurisdiction
- Synthetic training data: 712 samples generated by GPT-4o Mini, not from real clinical encounters
- Requires physician review: all generated notes must be reviewed and approved by a licensed clinician before use in patient care
- Inference speed: ~25 seconds per note on an RTX 5070 Ti with 4-bit quantization
## Part Of
This adapter is one component of MedScribe, a clinical documentation workstation that combines MedASR (speech recognition), this fine-tuned MedGemma adapter (SOAP generation), and base MedGemma (clinical intelligence tools) into a single offline pipeline.
## Framework Versions
- PEFT 0.18.1
- Transformers 4.52+
- PyTorch 2.8+ (nightly for Blackwell/SM 12.0)
- bitsandbytes 0.45+
## Citation

```bibtex
@misc{medscribe2026,
  author = {Tushar},
  title = {MedScribe: Concise Clinical Documentation via Fine-tuned MedGemma},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Tushar-9802/medscribe-soap-lora}
}
```
## Contact
GitHub: @Tushar-9802