Commit fe6588d by Edifon (verified) · Parent: 9f5d645

Update README.md

Files changed (1): README.md (+184 −5)
- transformers
- unsloth
- gemma3
- medical
- clinical-nlp
- soap-notes
license: apache-2.0
language:
- en
---

# SOAP_SFT_V1 — Medical SOAP Note Generator

**SOAP_SFT_V1** is a fine-tuned version of [Gemma 3 4B Instruct](https://huggingface.co/unsloth/gemma-3-4b-it-unsloth-bnb-4bit), trained to generate structured clinical **SOAP notes** (Subjective, Objective, Assessment, Plan) from doctor–patient dialogues.

Trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library on an H100 GPU.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

---

## Model Details

| Property | Value |
|---|---|
| **Developed by** | Edifon |
| **Base model** | `unsloth/gemma-3-4b-it-unsloth-bnb-4bit` |
| **Model type** | Causal Language Model (fine-tuned) |
| **Language** | English |
| **License** | Apache 2.0 |
| **Fine-tuning method** | Supervised Fine-Tuning (SFT) with LoRA |
| **Training hardware** | Google Colab H100 |

---

## Intended Use

This model is designed to assist healthcare professionals and clinical NLP researchers by automatically converting clinical consultation transcripts into structured SOAP notes.

**SOAP format:**
- **S (Subjective):** Patient-reported symptoms, history, and complaints
- **O (Objective):** Observable/measurable clinical findings and planned investigations
- **A (Assessment):** Differential diagnosis and clinical reasoning
- **P (Plan):** Treatment plan, referrals, and follow-up instructions

> ⚠️ **Disclaimer:** This model is intended as a research and assistive tool only. It is **not** a substitute for professional medical judgment or a licensed clinician's evaluation.
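
Because the model is prompted to emit plain `S:`, `O:`, `A:`, `P:` text (no markdown), a downstream consumer can split a generated note back into its four sections. A minimal sketch — the `split_soap` helper is hypothetical, not part of this repo:

```python
import re

def split_soap(note: str) -> dict:
    """Split a plain-text note of the form 'S: ... O: ... A: ... P: ...'
    into a dict keyed by section letter. Hypothetical helper, not shipped
    with the model."""
    # Each section starts with its letter at the beginning of a line and
    # runs (lazily) until the next section header or end of string.
    pattern = r"(?ms)^([SOAP]):\s*(.*?)(?=^[SOAP]:|\Z)"
    return {key: " ".join(body.split()) for key, body in re.findall(pattern, note)}

note = (
    "S: Headache for two weeks.\n"
    "O: Vitals within normal limits.\n"
    "A: Likely tension headache.\n"
    "P: Ibuprofen; follow-up in one week.\n"
)
sections = split_soap(note)
print(sections["A"])  # → Likely tension headache.
```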

---

## Training Details

### Dataset
- **Dataset:** [`syafiqassegaf/soap-dataset`](https://www.kaggle.com/datasets/syafiqassegaf/soap-dataset) (Kaggle)
- **Total examples:** 9,250
- **Train / Eval split:** 90% / 10% → 8,325 train | 925 eval
- **Features:** `dialogue`, `soap`, `prompt`, `messages`
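
As a quick sanity check on the quoted split sizes, in plain Python (assuming a simple 10% hold-out; the exact split call used in training is not shown in this card):

```python
# 9,250 examples with a 10% hold-out gives the 8,325 / 925 split above.
total_examples = 9250
eval_count = total_examples // 10          # 10% hold-out
train_count = total_examples - eval_count  # remainder used for training
print(train_count, eval_count)  # → 8325 925
```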

### LoRA Configuration

| Parameter | Value |
|---|---|
| Rank (`r`) | 8 |
| Alpha (`lora_alpha`) | 8 |
| Dropout | 0 |
| Bias | none |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Trainable parameters | 16,394,240 / 4,316,473,712 (**0.38%**) |
| Vision layers finetuned | No |
| Language layers finetuned | Yes |
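
The trainable-parameter share in the table can be reproduced directly:

```python
# 16,394,240 LoRA weights out of 4,316,473,712 total parameters.
trainable_params = 16_394_240
total_params = 4_316_473_712
pct = 100 * trainable_params / total_params
print(f"{pct:.2f}%")  # → 0.38%
```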

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 5 |
| Per-device batch size | 2 |
| Gradient accumulation steps | 4 (effective batch size = 8) |
| Learning rate | 2e-5 |
| LR scheduler | Linear |
| Optimizer | AdamW 8-bit |
| Weight decay | 0.001 |
| Warmup steps | 5 |
| Max sequence length | 2048 |
| Seed | 3407 |
| Total steps | 5,205 |

Training used `train_on_responses_only`: only model responses were used in the loss computation, not the user instructions.
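
The total step count is consistent with the figures above, assuming steps per epoch is the ceiling of train examples over the effective batch size:

```python
import math

# 8,325 train examples, effective batch 2 × 4 = 8, 5 epochs.
train_examples = 8325
effective_batch = 2 * 4
epochs = 5
steps_per_epoch = math.ceil(train_examples / effective_batch)  # 1041
total_steps = steps_per_epoch * epochs
print(total_steps)  # → 5205
```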

---

## How to Use

### With `transformers` (Standard)

```python
from transformers import AutoProcessor, AutoModelForImageTextToText, TextStreamer

processor = AutoProcessor.from_pretrained("Edifon/SOAP_SFT_V1")
model = AutoModelForImageTextToText.from_pretrained("Edifon/SOAP_SFT_V1", device_map="auto")

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": (
            "You are an expert medical professor assisting in the creation of medically accurate SOAP summaries. "
            "Please ensure the response follows the structured format: S:, O:, A:, P: without using markdown or special formatting."
        )}],
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": """Create a medical SOAP summary of this dialogue.

### Dialogue:
Doctor: Hello, what brings you in today?
Patient: I've been having severe headaches for the past few weeks...
[rest of dialogue]
"""}],
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Stream the generated note token by token; skip_prompt suppresses the echoed input.
_ = model.generate(
    **inputs,
    max_new_tokens=2048,
    streamer=TextStreamer(processor.tokenizer, skip_prompt=True),
)
```

### With Unsloth (Faster Inference)

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Edifon/SOAP_SFT_V1",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastModel.for_inference(model)
```

---

## Example Output

**Input dialogue (excerpt):**
> Patient reports photopsia in the left eye for ten days, including flashes of light and a dark spot on the nasal side. Had influenza-like symptoms two weeks prior. No history of eye disease.

**Model output:**
```
S: Patient reports experiencing photopsia in the left eye for ten days, describing flashes of light
and a dark spot on the nasal side. History of influenza-like symptoms two weeks prior.
No prior eye disease, operations, or treatments.

O: Patient presented with photopsia and a dark spot in the left eye. Comprehensive eye examination
planned (visual acuity, slit-lamp, fundus examination).

A: Differential includes post-infectious transient optic neuropathy or acute ocular involvement
secondary to influenza. Absence of prior eye disease supports opportunistic onset.

P: Order comprehensive eye examination. Schedule follow-up to review results and determine
treatment or referral plan. Encourage prompt completion of planned examination.
```

---

## Limitations

- Trained exclusively on English-language dialogues
- Performance may degrade on highly specialized subspecialty consultations underrepresented in the training data
- Should not be used for clinical decision-making without expert oversight
- Outputs may occasionally include disclaimers or formatting inconsistencies

---

## Citation

If you use this model in your research, please cite the base model and dataset:

```bibtex
@misc{soap_sft_v1,
  author = {Edifon},
  title = {SOAP\_SFT\_V1: Medical SOAP Note Generator},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Edifon/SOAP_SFT_V1}
}
```