---
base_model: google/medgemma-4b-it
library_name: peft
pipeline_tag: text-generation
license: mit
language:
- en
tags:
- lora
- transformers
- medical
- clinical-documentation
- soap-notes
- medgemma
- hai-def
- medgemma-impact-challenge
---
# MedScribe SOAP LoRA: Concise Clinical Note Generation
LoRA adapter for google/medgemma-4b-it that generates concise, clinician-ready SOAP notes from medical encounter transcripts.
Built for the Google MedGemma Impact Challenge 2026.
## What This Model Does
Converts medical encounter transcripts into structured SOAP (Subjective, Objective, Assessment, Plan) notes written in the concise shorthand that clinicians actually use, not the verbose textbook prose that base models default to.
Example:
| Input transcript | "54-year-old female presenting with shortness of breath. CT chest shows filling defects in segmental branches of right lower lobe..." |
|---|---|
| Base MedGemma | ~200 words, textbook prose, over-specified plan with 6-8 items |
| This adapter | ~104 words, clinical shorthand ("54 yo F c/o SOB"), focused 2-4 item plan |
## Key Metrics
| Metric | Base MedGemma | With This Adapter |
|---|---|---|
| Avg word count | ~200+ | 104 |
| Section completeness (S/O/A/P) | 85-95% | 100% |
| Hallucinated findings | 5-10% | 0% |
| WNL shortcuts | Present | 0% |
| Clinical style | Textbook verbose | Shorthand |
| PLAN items | 4-8 | 2-4 (focused) |
| Quality score | n/a | 90/100 |
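Word count and section completeness are mechanical to compute from a generated note. The helper below is an illustrative sketch of how such checks can be scripted; it is not the evaluation code behind the table above:

```python
import re

SECTIONS = ("SUBJECTIVE", "OBJECTIVE", "ASSESSMENT", "PLAN")

def note_metrics(note: str) -> dict:
    """Word count and S/O/A/P section completeness for one generated note."""
    # A section counts as present if its header starts a line, e.g. "PLAN:".
    present = [s for s in SECTIONS if re.search(rf"^{s}\s*:", note, re.MULTILINE)]
    return {
        "word_count": len(note.split()),
        "completeness": len(present) / len(SECTIONS),  # 1.0 = all four sections
    }

note = "SUBJECTIVE: 54 yo F c/o SOB.\nOBJECTIVE: RR 22.\nASSESSMENT: PE.\nPLAN: Start AC."
print(note_metrics(note))  # {'word_count': 14, 'completeness': 1.0}
```

Averaging these values over a held-out set gives the "Avg word count" and "Section completeness" rows of the table.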
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model (4-bit quantized)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/medgemma-4b-it")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Tushar-9802/medscribe-soap-lora")
model.eval()

# Generate SOAP note
transcript = "..."  # your encounter transcript here

prompt = f"""You are a clinical documentation assistant. Convert the following medical
text into a structured SOAP note.

MEDICAL TEXT:
{transcript}

Generate a SOAP note with these sections:
- SUBJECTIVE: Patient-reported symptoms and history
- OBJECTIVE: Physical exam findings and vital signs
- ASSESSMENT: Clinical impressions and diagnoses
- PLAN: Diagnostic tests, treatments, and follow-up

Write a complete PLAN (treatments, monitoring, follow-up). End with a full sentence.

SOAP NOTE:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        min_new_tokens=150,
        do_sample=False,
        use_cache=True,
    )
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```
## Training Details

### Training Data
712 curated transcript-SOAP pairs generated via GPT-4o Mini API ($1.28 total). Dataset: Tushar-9802/medscribe-soap-712
Each sample enforces:
- "Not documented in source" for any finding absent from the input transcript
- Zero WNL (Within Normal Limits) shortcuts: every finding explicitly stated
- Concise clinical shorthand style
- PLAN with specific, actionable items
### Training Configuration
| Parameter | Value |
|---|---|
| Base model | google/medgemma-4b-it |
| Method | LoRA |
| Rank | 16 |
| Alpha | 32 |
| Dropout | 0.1 |
| Target modules | All attention layers |
| Trainable parameters | ~4.2M (0.1% of 4B base) |
| Batch size | 2 (Γ 8 gradient accumulation = effective 16) |
| Learning rate | 2e-5 |
| Epochs | 5 (early stopping patience: 2) |
| Precision | BFloat16 |
| Quantization | 4-bit NF4 during training |
| Hardware | NVIDIA RTX 5070 Ti (16GB VRAM) |
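The table above maps onto a `peft` `LoraConfig` roughly as follows. This is a sketch: the exact `target_modules` list is not published here, so the attention projections named below are an assumption consistent with "All attention layers":

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,             # rank
    lora_alpha=32,
    lora_dropout=0.1,
    # Assumed: the four attention projections in each transformer block
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```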
### Training Results
| Metric | Value |
|---|---|
| Training loss | 0.828 |
| Validation loss | 0.782 |
| Overfitting | None (val < train) |
## Anti-Hallucination Behavior
The adapter was specifically trained to avoid clinical hallucination. When the input transcript does not contain information for a SOAP section, the model outputs "Not documented in source" rather than fabricating findings. This is critical for clinical safety: a missing field that is explicitly marked as missing is far safer than a plausible-sounding fabrication.
## Intended Use
- Converting medical encounter transcripts to structured SOAP notes
- Clinical documentation assistance (with physician review)
- Research and demonstration of efficient medical LLM fine-tuning
## Limitations
- English only
- Research prototype: not validated for clinical use in any jurisdiction
- Synthetic training data: 712 samples generated by GPT-4o Mini, not from real clinical encounters
- Requires physician review: all generated notes must be reviewed and approved by a licensed clinician before use in patient care
- Inference speed: ~25 seconds per note on an RTX 5070 Ti with 4-bit quantization
## Part Of
This adapter is one component of MedScribe, a clinical documentation workstation that combines MedASR (speech recognition), this fine-tuned MedGemma adapter (SOAP generation), and base MedGemma (clinical intelligence tools) into a single offline pipeline.
## Framework Versions
- PEFT 0.18.1
- Transformers 4.52+
- PyTorch 2.8+ (nightly for Blackwell/SM 12.0)
- bitsandbytes 0.45+
## Citation

```bibtex
@misc{medscribe2026,
  author = {Tushar},
  title = {MedScribe: Concise Clinical Documentation via Fine-tuned MedGemma},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Tushar-9802/medscribe-soap-lora}
}
```
## Contact
GitHub: @Tushar-9802