---
base_model:
- TachyHealth/Gazal-R1-32B-sft-merged-preview
datasets:
- TachyHealth/medical_grpo
- TachyHealth/structured_medical
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- gazal-r1
- grpo
- qwen3
- conversational
- medical
- clinical
- healthcare
- reasoning
---

# Gazal-R1-32B: Medical Reasoning Language Model

The model was presented in the paper [Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training](https://huggingface.co/papers/2506.21594).

<a href="https://gazal.ai/" target="_blank" style="margin: 0px;">
    <img alt="Gazal AI" src="./logo.png" style="width: 70%;" />
</a>


## Model Highlights

Gazal-R1 is a state-of-the-art 32-billion-parameter language model specifically designed for medical reasoning and clinical decision-making. Built upon Qwen 3 32B, Gazal-R1 demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized medical domains.

Key features include:

- **πŸ”¬ Medical Expertise**: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
- **🧠 Transparent Reasoning**: Structured clinical thinking with step-by-step explanations in `<think></think>` tags, following established clinical reasoning frameworks
- **πŸ“Š State-of-the-Art Performance**: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12Γ— larger
- **⚑ Parameter Efficiency**: Advanced training techniques including Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
- **🎯 Alignment Optimization**: Refined through Group Relative Policy Optimization (GRPO) with sophisticated multi-component reward systems
- **🌍 Medical Knowledge**: Comprehensive understanding across multiple medical specialties and clinical scenarios

## Model Overview

**Gazal-R1-32B** has the following characteristics:
- **Type**: Causal Language Model (Medical Reasoning Specialist)
- **Base Model**: Qwen 3 32B
- **Training Stages**: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
- **Number of Parameters**: 32.8B
- **Number of Parameters (Non-Embedding)**: 31.2B
- **Context Length**: 32,768 tokens natively, extensible to 131,072 with YaRN
- **Training Data**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason) (32,682 examples)
- **Fine-tuning Method**: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
- **Alignment**: Group Relative Policy Optimization (GRPO)
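
Extending the context window from the native 32,768 tokens to 131,072 with YaRN typically follows the Qwen 3 convention of adding a `rope_scaling` entry to the model's `config.json`. The snippet below is a sketch of that convention (a 4.0 scaling factor over the native window), not a setting shipped with this checkpoint:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```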

For detailed methodology, training insights, and comprehensive evaluation, please refer to our [technical report](https://arxiv.org/abs/2506.21594).

## Performance Results

Gazal-R1 achieves exceptional performance across standard medical benchmarks:

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|-------|------|-------------------|---------|-------|----------|
| **Gazal-R1 (Final)** | **32B** | **81.6** | **71.9** | **87.1** | **79.6** |
| [Gazal-R1 (SFT-only)](https://huggingface.co/TachyHealth/Gazal-R1-32B-sft-merged-preview) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

**Key Achievements:**
- πŸ₯‡ Highest scores on MMLU Pro (Medical), MedQA, and PubMedQA
- πŸ“ˆ Significant improvements from GRPO training (+2.3% on MMLU Pro, +2.0% on PubMedQA)
- πŸš€ Outperforms models up to 12Γ— larger (Llama 3.1 405B) on medical reasoning tasks

## Quickstart

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Medical reasoning prompt
prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels. 
ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?"""

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response with medical reasoning
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    temperature=0.7,
    top_p=0.8,
    top_k=20
)

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Medical Assessment:", response)
```

### Structured Medical Reasoning Format

Gazal-R1 is trained to provide structured medical reasoning in the following format:

```
<think>
Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
Step 2: Interpret ECG findings - ST elevation in inferior leads
Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
Step 4: Identify culprit vessel - likely RCA given inferior lead changes
Step 5: Assess urgency - emergent intervention required
Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
</think>

## Clinical Assessment

**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall

**Reasoning**: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is pathognomonic for inferior STEMI, likely involving the right coronary artery (RCA).

**Immediate Management**:
1. **Reperfusion therapy**: Emergency cardiac catheterization with primary PCI
2. **Antiplatelet therapy**: Aspirin 325mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
3. **Anticoagulation**: Heparin or bivalirudin
4. **Supportive care**: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

**Follow-up**: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
```
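
When consuming model output programmatically, the `<think></think>` reasoning trace can be separated from the final clinical assessment. A minimal sketch (the `split_reasoning` helper is illustrative, not part of the model's API):

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the <think>...</think> trace from the final answer text."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    # Remove the reasoning block to leave only the clinical assessment.
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return reasoning, answer
```

This lets downstream applications log or display the reasoning trace separately from the user-facing answer.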

## Training Methodology

### Stage 1: Supervised Fine-Tuning (SFT)
- **Dataset**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason)
- **Techniques**: DoRA + rsLoRA with rank 256
- **Focus**: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios
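
The DoRA + rsLoRA setup above maps onto the flags exposed by Hugging Face `peft`'s `LoraConfig`. A hedged sketch follows: the rank of 256 comes from the card, while alpha and the target modules are illustrative assumptions, not the paper's exact hyperparameters:

```python
from peft import LoraConfig

# Sketch of a DoRA + rsLoRA adapter config; r=256 matches the card,
# lora_alpha and target_modules are illustrative assumptions.
peft_config = LoraConfig(
    r=256,
    lora_alpha=256,          # assumed; often set equal to r when using rsLoRA
    use_dora=True,           # Weight-Decomposed Low-Rank Adaptation
    use_rslora=True,         # Rank-Stabilized LoRA scaling (alpha / sqrt(r))
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```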

### Stage 2: Group Relative Policy Optimization (GRPO)
- **Algorithm**: Value-function-free reinforcement learning
- **Dataset**: UltraMedical subset (32K medical MCQs)
- **Rewards**: Multi-component system (accuracy, format, length control, repetition penalty)
- **Improvements**: Enhanced reasoning quality and format adherence
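
As an illustration of how such a multi-component reward can be combined, here is a minimal sketch; the weights, the `Answer: X` extraction convention, the length cap, and the repetition heuristic are all assumptions made for the example, not the paper's exact reward:

```python
import re

def reward(completion: str, gold_answer: str) -> float:
    """Toy multi-component reward: accuracy + format - length - repetition."""
    # Accuracy: does the extracted MCQ option match the gold answer?
    # (the "Answer: X" convention is assumed for this sketch)
    match = re.search(r"Answer:\s*([A-E])", completion)
    accuracy = 1.0 if match and match.group(1) == gold_answer else 0.0

    # Format: reward a well-formed <think>...</think> block.
    fmt = 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

    # Length control: soft penalty beyond an assumed word budget.
    over = max(0, len(completion.split()) - 1024)
    length_penalty = min(0.5, over / 2048)

    # Repetition penalty: fraction of duplicated 4-grams (assumed heuristic).
    words = completion.split()
    ngrams = [tuple(words[i:i + 4]) for i in range(max(0, len(words) - 3))]
    repetition = 1 - len(set(ngrams)) / len(ngrams) if ngrams else 0.0

    return accuracy + fmt - length_penalty - 0.5 * repetition
```

GRPO then normalizes such rewards within each sampled group of completions, removing the need for a learned value function.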

## Model Capabilities

### Clinical Reasoning Types
1. **Diagnostic Reasoning**: Systematic symptom analysis β†’ differential diagnosis
2. **Treatment Planning**: Evidence-based therapy selection with patient-specific factors
3. **Decision-Making Under Uncertainty**: Risk assessment and clinical judgment
4. **Prognostic Assessment**: Outcome prediction based on clinical evidence

### Medical Specialties Covered
- Internal Medicine
- Emergency Medicine  
- Cardiology
- Pulmonology
- Infectious Disease
- Pharmacology
- Pathophysiology
- Clinical Laboratory Medicine

## Limitations and Important Disclaimers

### ⚠️ Critical Safety Information
- **NOT A MEDICAL DEVICE**: Gazal-R1 is a research model and is **NOT** intended for direct clinical use, diagnosis, or treatment planning
- **REQUIRES PROFESSIONAL VERIFICATION**: All outputs must be independently verified by qualified medical professionals
- **NO REAL-TIME UPDATES**: Knowledge is static and does not reflect the latest medical research or guidelines

### Technical Limitations
- **Knowledge Cutoff**: Training data reflects medical knowledge up to the training date
- **Hallucination Risk**: May generate plausible-sounding but factually incorrect information
- **Evaluation Scope**: Primarily evaluated on multiple-choice questions; real-world clinical scenarios may differ
- **Regional Bias**: Training data may contain geographical or demographic biases

### Ethical Considerations
- **Professional Responsibility**: Final medical decisions must always rest with qualified healthcare providers
- **Accountability**: Users assume responsibility for verifying and appropriately applying model outputs
- **Patient Safety**: Never use for emergency medical situations or time-critical decisions

## Use Cases

### Research and Education
- Medical education and training
- Clinical reasoning research
- Medical knowledge assessment
- Academic medical writing assistance

### Professional Support (With Supervision)
- Literature review assistance
- Clinical case analysis support
- Medical documentation aid
- Differential diagnosis exploration

### NOT Suitable For
- Direct patient care
- Emergency medical decisions
- Replacing clinical judgment
- Unsupervised medical advice

## Citation

If you find Gazal-R1 helpful in your research, please cite our work:

```bibtex
@article{gazal-r1-2025,
    title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
    author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
    journal={arXiv preprint arXiv:2506.21594},
    year={2025},
    url={https://arxiv.org/abs/2506.21594}
}
```

## Model Access

- **Model Weights**: Available on Hugging Face Hub
- **Datasets**: Training datasets available at [TachyHealth/structured_medical](https://huggingface.co/datasets/TachyHealth/structured_medical) and [TachyHealth/medical_grpo](https://huggingface.co/datasets/TachyHealth/medical_grpo)

## License

This model is released under the Apache 2.0 License. Please review the license terms before use.

## Contact

For questions about Gazal-R1, please contact:
- **Research Team**: TachyHealth
- **Website**: [https://tachyhealth.com/](https://tachyhealth.com/)
- **Gazal Platform**: [Gazal.ai](https://gazal.ai)

---

*Developed by TachyHealth Research Team. This model represents a significant advancement in medical AI reasoning while emphasizing the critical importance of professional medical oversight.*