File size: 14,613 Bytes
63d6b38
 
c29d417
8e6307b
 
c29d417
8e6307b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
63d6b38
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
c29d417
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
---
license: apache-2.0
language:
  - en
  - ar
tags:
  - medical
  - clinical-ai
  - medgemma
  - fine-tuned
  - diagnosis
  - differential-diagnosis
  - clinical-transcription
  - arabic-medical
  - qlora
  - healthcare
  - gemma3_text
base_model: google/medgemma-27b-text-it
datasets:
  - akemiH/NoteChat
  - starmpcc/Asclepius-Synthetic-Clinical-Notes
  - AGBonnet/augmented-clinical-notes
  - omi-health/medical-dialogue-to-soap-summary
  - openlifescienceai/medmcqa
  - GBaker/MedQA-USMLE-4-options
  - zhengyun21/PMC-Patients
  - lingshu-medical-mllm/ReasonMed
  - UCSC-VLAA/MedReason
  - FreedomIntelligence/medical-o1-reasoning-SFT
  - qiaojin/PubMedQA
  - appier-ai-research/StreamBench
  - MustafaIbrahim/medical-arabic-qa
  - MKamil/arabic_medical_50k
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: Sanad-1.0
    results:
      - task:
          type: text-generation
          name: Medical Question Answering
        dataset:
          type: GBaker/MedQA-USMLE-4-options
          name: MedQA USMLE
        metrics:
          - type: accuracy
            value: 87.7
            name: MedQA Accuracy
---

# ๐Ÿฅ Sanad-1.0 โ€” Clinical AI Assistant

<p align="center">
  <img src="https://img.shields.io/badge/Base_Model-MedGemma_27B-blue" alt="Base Model">
  <img src="https://img.shields.io/badge/Parameters-27B-green" alt="Parameters">
  <img src="https://img.shields.io/badge/Precision-BF16-yellow" alt="Precision">
  <img src="https://img.shields.io/badge/Training_Data-551K_examples-orange" alt="Training Data">
  <img src="https://img.shields.io/badge/Languages-English_|_Arabic-purple" alt="Languages">
  <img src="https://img.shields.io/badge/License-Apache_2.0-red" alt="License">
</p>

**Sanad-1.0** (ุณู†ุฏ โ€” meaning "support" or "pillar" in Arabic) is a fine-tuned clinical AI model built on Google's [MedGemma-27B-text-it](https://huggingface.co/google/medgemma-27b-text-it). It is purpose-built for **Mediscribe**, a comprehensive clinical AI platform providing medical diagnosis, differential diagnosis, clinical transcription, and bilingual Arabic-English medical support.

Sanad-1.0 has been trained on **551,491 curated medical examples** across 15 specialized healthcare datasets using a **4-stage progressive fine-tuning pipeline** with QLoRA.

> **Try it live:** [Sanad-1 Demo](https://huggingface.co/spaces/360kaUser/Sanad-1) | [Sanad Demo](https://huggingface.co/spaces/360kaUser/sanad)

---

## โœจ Key Capabilities

| Capability | Description |
|------------|-------------|
| ๐Ÿฅ **Clinical Transcription** | Converts doctor-patient conversations into structured SOAP notes and clinical documentation |
| ๐Ÿ”ฌ **Medical Diagnosis** | Analyzes patient presentations with systematic clinical reasoning to arrive at diagnoses |
| ๐Ÿ“‹ **Differential Diagnosis** | Generates ranked differential diagnoses with probability assessments and reasoning chains |
| ๐Ÿง  **Chain-of-Thought Reasoning** | Provides transparent, step-by-step medical reasoning with `<thinking>` traces |
| ๐ŸŒ **Arabic Medical Support** | Full bilingual capability (English + Arabic) for clinical consultations |
| ๐Ÿ“š **USMLE-Level Knowledge** | Trained on USMLE Step 1/2/3 questions across 21+ medical specialties |

---

## ๐Ÿš€ Quick Start

### Using Transformers

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "360kaUser/Sanad-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Mediscribe, a clinical diagnostic AI assistant. Analyze the patient presentation and provide a diagnosis with clinical reasoning."},
    {"role": "user", "content": "A 55-year-old male presents with sudden onset crushing chest pain radiating to the left arm, diaphoresis, and shortness of breath. He has a history of hypertension, type 2 diabetes, and smokes 1 pack per day. ECG shows ST elevation in leads II, III, and aVF. Troponin I is elevated at 2.5 ng/mL."}
]

inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=1024, temperature=0.7, top_p=0.9, do_sample=True)
response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
print(response)
```

### Using Unsloth (Faster Inference)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "360kaUser/Sanad-1.0",
    max_seq_length=2048,
    load_in_4bit=True,  # Use 4-bit for lower VRAM
)
FastLanguageModel.for_inference(model)

# Same message format as above
```

---

## ๐Ÿ’ฌ Usage Examples

### 1. Clinical Transcription

```python
messages = [
    {"role": "system", "content": "You are Mediscribe, a clinical documentation AI assistant. Given a doctor-patient conversation, generate a comprehensive clinical note."},
    {"role": "user", "content": """Doctor: Good morning, what brings you in today?
Patient: I've been having this terrible headache for the past 3 days. It's mainly on the right side.
Doctor: On a scale of 1-10, how bad is the pain?
Patient: About 7. And I've been feeling nauseous too.
Doctor: Any visual changes? Sensitivity to light?
Patient: Yes, bright lights make it worse.
Doctor: Have you had migraines before?
Patient: My mother gets them, but I've never had one this bad."""}
]
```

**Output:** Generates a structured clinical note with Chief Complaint, HPI (onset, location, severity, associated symptoms), Review of Systems, Family History, Assessment, and Plan.

### 2. Differential Diagnosis

```python
messages = [
    {"role": "system", "content": "You are Mediscribe, a clinical diagnostic AI assistant. Provide a ranked differential diagnosis with reasoning."},
    {"role": "user", "content": "A 30-year-old female presents with fatigue, weight gain of 15 pounds over 3 months, cold intolerance, constipation, and dry skin. Hair thinning and difficulty concentrating. HR 58 bpm, BP 110/70, temp 97.2F."}
]
```

**Output:** Ranked differential including Hypothyroidism (most likely), Depression, Anemia, with clinical reasoning for each and recommended next steps (TSH, Free T4, CBC).

### 3. Arabic Medical Query

```python
messages = [
    {"role": "system", "content": "ุฃู†ุช MediscribeุŒ ู…ุณุงุนุฏ ุฐูƒุงุก ุงุตุทู†ุงุนูŠ ุทุจูŠ. ู‚ู… ุจุชุญู„ูŠู„ ุงู„ุณุคุงู„ ุงู„ุทุจูŠ ูˆุชู‚ุฏูŠู… ุฅุฌุงุจุฉ ุฏู‚ูŠู‚ุฉ."},
    {"role": "user", "content": "ู…ุง ู‡ูŠ ุฃุนุฑุงุถ ู…ุฑุถ ุงู„ุณูƒุฑูŠ ู…ู† ุงู„ู†ูˆุน ุงู„ุซุงู†ูŠุŸ"}
]
```

**Output:** Comprehensive Arabic response covering all Type 2 Diabetes symptoms (ุงู„ุนุทุด ุงู„ุดุฏูŠุฏุŒ ุงู„ุชุจูˆู„ ุงู„ู…ุชูƒุฑุฑุŒ ุงู„ุฌูˆุน ุงู„ู…ูุฑุท, etc.) with medical terminology.

---

## ๐Ÿ—๏ธ Model Architecture

| Component | Detail |
|-----------|--------|
| **Base Model** | [google/medgemma-27b-text-it](https://huggingface.co/google/medgemma-27b-text-it) |
| **Architecture** | Gemma 3 (27B parameters) |
| **Context Window** | 128K tokens |
| **Tensor Type** | BF16 (Brain Float 16) |
| **Format** | Safetensors |
| **Fine-Tuning Method** | QLoRA (4-bit NF4 quantization during training) |
| **LoRA Configuration** | r=32, alpha=64, dropout=0.05 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |

---

## ๐Ÿ“Š Training Data โ€” 551K Medical Examples

### Stage 1: Clinical Transcription (152,930 train + 8,048 val)

| Dataset | Records | Purpose |
|---------|---------|---------|
| [NoteChat](https://huggingface.co/datasets/akemiH/NoteChat) | 60,000 | Doctor-patient conversation โ†’ clinical note |
| [Asclepius Clinical Notes](https://huggingface.co/datasets/starmpcc/Asclepius-Synthetic-Clinical-Notes) | 30,000 | Clinical note comprehension & QA |
| [Augmented Clinical Notes](https://huggingface.co/datasets/AGBonnet/augmented-clinical-notes) | 60,000 | Conversation โ†’ note + note โ†’ JSON summary |
| [OMI Health SOAP](https://huggingface.co/datasets/omi-health/medical-dialogue-to-soap-summary) | 9,250 | Medical dialogue โ†’ SOAP note |
| [MTS-Dialog](https://github.com/abachaa/MTS-Dialog) | 1,601 | Dialogue โ†’ clinical note sections |
| [ACI-Bench](https://github.com/wyim/aci-bench) | 127 | Ambient clinical intelligence |

### Stage 2: Medical Diagnosis (61,920 train + 3,258 val)

| Dataset | Records | Purpose |
|---------|---------|---------|
| [MedMCQA](https://huggingface.co/datasets/openlifescienceai/medmcqa) | 40,000 | MCQ across 21 specialties with explanations |
| [PMC-Patients](https://huggingface.co/datasets/zhengyun21/PMC-Patients) | 15,000 | Patient case narratives with diagnosis |
| [MedQA USMLE](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options) | 10,178 | USMLE Step 1/2/3 clinical vignettes |

### Stage 3: Differential Diagnosis & Reasoning (129,696 train + 6,826 val)

| Dataset | Records | Purpose |
|---------|---------|---------|
| [ReasonMed](https://huggingface.co/datasets/lingshu-medical-mllm/ReasonMed) | 80,000 | Multi-step medical reasoning chains |
| [MedReason](https://huggingface.co/datasets/UCSC-VLAA/MedReason) | 32,682 | Knowledge-grounded clinical reasoning |
| [Medical O1 Reasoning](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) | 19,704 | Chain-of-thought `<thinking>` traces |
| [DDXPlus](https://huggingface.co/datasets/appier-ai-research/StreamBench) | 3,136 | Symptom โ†’ ranked differential diagnosis |
| [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA) | 1,000 | Evidence-based biomedical QA |

### Stage 4: Arabic & General Medical (179,373 train + 9,440 val)

| Dataset | Records | Purpose |
|---------|---------|---------|
| [Arabic Medical QA](https://huggingface.co/datasets/MustafaIbrahim/medical-arabic-qa) | 52,657 | Arabic medical QA (30+ specialties) |
| [Arabic Medical 50K](https://huggingface.co/datasets/MKamil/arabic_medical_50k) | 50,000 | Arabic medical dialogues |
| Existing Clinical Training Data | 86,156 | ChatDoctor, Indian Medical QA, combined clinical data |

---

## ๐ŸŽฏ Training Configuration

### 4-Stage Progressive Fine-Tuning

Training used **decreasing learning rates** across stages to progressively build capabilities while preventing catastrophic forgetting:

```
Stage 1: Transcription    โ†’ LR: 2.0e-4  โ”‚ 152,930 examples โ”‚ Clinical documentation
Stage 2: Diagnosis        โ†’ LR: 1.5e-4  โ”‚  61,920 examples โ”‚ Diagnostic reasoning
Stage 3: DDx & Reasoning  โ†’ LR: 1.0e-4  โ”‚ 129,696 examples โ”‚ Advanced reasoning
Stage 4: Arabic & General โ†’ LR: 5.0e-5  โ”‚ 179,373 examples โ”‚ Bilingual + reinforcement
```

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Batch size | 2 per device |
| Gradient accumulation | 8 steps |
| Effective batch size | 16 |
| Max sequence length | 2,048 tokens |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Max gradient norm | 0.3 |
| Optimizer | Paged AdamW 8-bit |
| Precision | BF16 |
| Packing | Enabled |
| Gradient checkpointing | Unsloth optimized |
| Hardware | NVIDIA A100 40GB |

---

## ๐Ÿ“ˆ Performance

### Evaluation Results

| Category | Quality | Description |
|----------|---------|-------------|
| Clinical Transcription | โœ… High | Structured SOAP notes with CC, HPI, ROS, Assessment, Plan |
| Medical Diagnosis | โœ… High | Systematic analysis with risk factors, ECG interpretation, clinical reasoning |
| Differential Diagnosis | โœ… High | Ranked DDx with probability reasoning and recommended next steps |
| Chain-of-Thought | โœ… High | Transparent `<thinking>` reasoning traces |
| Arabic Medical | โœ… High | Comprehensive Arabic responses with medical terminology |

### Base Model Benchmarks (MedGemma-27B)

| Benchmark | Score |
|-----------|-------|
| MedQA (USMLE) | 87.7% |
| EHRQA | 90.0% |
| Path-VQA | 72.2% |
| AfriMed-QA | 78.8% |

---

## โš•๏ธ Medical Specialties

Trained coverage across **21+ medical specialties**:

<table>
<tr><td>Cardiology</td><td>Neurology</td><td>Pulmonology</td><td>Gastroenterology</td></tr>
<tr><td>Endocrinology</td><td>Nephrology</td><td>Hematology</td><td>Oncology</td></tr>
<tr><td>Ophthalmology</td><td>Dermatology</td><td>Orthopedics</td><td>Pediatrics</td></tr>
<tr><td>OB/GYN</td><td>Psychiatry</td><td>Surgery</td><td>Emergency Medicine</td></tr>
<tr><td>Infectious Disease</td><td>Rheumatology</td><td>Radiology</td><td>Pathology</td></tr>
<tr><td>Pharmacology</td><td>Anatomy</td><td>Biochemistry</td><td>Forensic Medicine</td></tr>
</table>

---

## โš ๏ธ Limitations & Ethical Considerations

### Important Disclaimers

> โš ๏ธ **Sanad-1.0 is an AI assistant designed to support healthcare professionals. It is NOT a replacement for clinical judgment.**

- **Not for self-diagnosis.** Patients should always consult qualified healthcare providers.
- **Training data limitations.** May not represent all populations, conditions, or clinical settings equally.
- **Arabic coverage depth.** Arabic medical capabilities may not fully match English-language depth in all specialties.
- **No real-time data.** Does not access real-time medical literature, drug interaction databases, or patient records.
- **Potential for errors.** Like all AI models, Sanad-1.0 may produce incorrect or incomplete information.

### Intended Use

โœ… Clinical decision support for licensed healthcare professionals
โœ… Medical education and training
โœ… Clinical documentation assistance
โœ… Research and academic applications

### Out-of-Scope Uses

โŒ Direct patient-facing medical advice without physician oversight
โŒ Emergency medical decision-making as sole source
โŒ Legal or forensic medical opinions
โŒ Prescribing medications without physician review

---

## ๐Ÿ“œ Citation

```bibtex
@misc{sanad1-2025,
  title={Sanad-1.0: A Fine-Tuned Clinical AI Model for Medical Diagnosis, Transcription, and Arabic Medical Support},
  author={360kaUser},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/360kaUser/Sanad-1.0},
  note={Fine-tuned from google/medgemma-27b-text-it on 551K medical examples using 4-stage QLoRA}
}
```

---

## ๐Ÿ™ Acknowledgments

- [Google Health AI](https://health.google/) โ€” MedGemma base model
- [Unsloth](https://unsloth.ai/) โ€” Efficient fine-tuning framework
- All dataset creators and contributors listed in the Training Data section
- The open-source medical AI community

---

<p align="center">
  <b>Sanad-1.0</b> โ€” ุณู†ุฏ<br>
  <i>Your AI pillar of clinical support</i>
</p>