---
base_model:
- TachyHealth/Gazal-R1-32B-sft-merged-preview
datasets:
- TachyHealth/medical_grpo
- TachyHealth/structured_medical
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
pipeline_tag: text-generation
tags:
- gazal-r1
- grpo
- qwen3
- conversational
- medical
- clinical
- healthcare
- reasoning
---

# Gazal-R1-32B: Medical Reasoning Language Model

Gazal-R1 was presented in the paper [Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training](https://huggingface.co/papers/2506.21594).

<a href="https://gazal.ai/" target="_blank" style="margin: 0px;">
<img alt="Gazal AI" src="./logo.png" style=" width: 70%;" />
</a>

## Model Highlights

Gazal-R1 is a state-of-the-art 32-billion-parameter language model designed for medical reasoning and clinical decision-making. Built upon Qwen 3 32B, Gazal-R1 demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized medical domains.

Key features include:

- **🔬 Medical Expertise**: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
- **🧠 Transparent Reasoning**: Structured clinical thinking with step-by-step explanations in `<think></think>` tags, following established clinical reasoning frameworks
- **📊 State-of-the-Art Performance**: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12× larger on these benchmarks
- **⚡ Parameter Efficiency**: Fine-tuned with Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
- **🎯 Alignment Optimization**: Refined through Group Relative Policy Optimization (GRPO) with a multi-component reward (accuracy, format, length control, repetition penalty)
- **🌍 Medical Knowledge**: Coverage of multiple medical specialties and clinical scenarios

## Model Overview

**Gazal-R1-32B** has the following characteristics:
- **Type**: Causal Language Model (Medical Reasoning Specialist)
- **Base Model**: Qwen 3 32B
- **Training Stages**: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
- **Number of Parameters**: 32.8B
- **Number of Parameters (Non-Embedding)**: 31.2B
- **Context Length**: 32,768 tokens natively, extensible to 131,072 with YaRN
- **Training Data**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason) (32,682 examples)
- **Fine-tuning Method**: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
- **Alignment**: Group Relative Policy Optimization (GRPO)

For detailed methodology, training insights, and comprehensive evaluation, please refer to our [technical report](https://arxiv.org/abs/2506.21594).
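
The extended context length listed above depends on YaRN rope scaling. The sketch below only works out the scaling factor from the two context lengths in the overview. The `rope_scaling` keys are an assumption: they follow the convention documented for the Qwen 3 base model, and this card does not confirm them, so check the model's `config.json` before relying on them.

```python
NATIVE_CONTEXT = 32_768      # native context length from the overview
EXTENDED_CONTEXT = 131_072   # YaRN-extended context length from the overview

# YaRN scaling factor: how many times the native window is stretched.
yarn_factor = EXTENDED_CONTEXT / NATIVE_CONTEXT

# Assumed config entry (Qwen 3 convention, not confirmed by this card).
rope_scaling = {
    "rope_type": "yarn",
    "factor": yarn_factor,
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

print(rope_scaling)
```

A factor of 4.0 corresponds to the full 131,072-token window; shorter inputs may not need scaling at all.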

## Performance Results

Accuracy (%) on standard medical benchmarks. Gazal-R1 scores highest on three of the four:

| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
|-------|------|-------------------|---------|-------|----------|
| **Gazal-R1 (Final)** | **32B** | **81.6** | **71.9** | **87.1** | **79.6** |
| [Gazal-R1 (SFT-only)](https://huggingface.co/TachyHealth/Gazal-R1-32B-sft-merged-preview) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

**Key Achievements:**
- 🥇 Highest scores on MMLU Pro (Medical), MedQA, and PubMedQA
- 📈 Gains from GRPO training over the SFT-only checkpoint: +2.3 points on MMLU Pro (Medical) and +2.0 points on PubMedQA
- 🚀 Outperforms Llama 3.1 405B, a model roughly 12× larger, on three of the four benchmarks (Llama 3.1 405B leads on MedMCQA)
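
The GRPO gains quoted above can be reproduced directly from the table. In this small worked example the dictionary names are illustrative, and the numbers are copied from the two Gazal-R1 rows:

```python
# Accuracy (%) copied from the two Gazal-R1 rows of the results table.
final = {"MMLU Pro (Medical)": 81.6, "MedMCQA": 71.9, "MedQA": 87.1, "PubMedQA": 79.6}
sft_only = {"MMLU Pro (Medical)": 79.3, "MedMCQA": 72.3, "MedQA": 86.9, "PubMedQA": 77.6}

# Change in percentage points from the SFT-only checkpoint to the final model.
grpo_delta = {name: round(final[name] - sft_only[name], 1) for name in final}

# Parameter-count ratio behind the "12x larger" comparison (405B vs 32B).
size_ratio = 405 / 32

for name, delta in grpo_delta.items():
    print(f"{name}: {delta:+.1f} points")
print(f"Size ratio: {size_ratio:.1f}x")
```

The same calculation shows that the effect of GRPO is not uniform: MedQA moves by +0.2 points and MedMCQA by -0.4 points.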

## Quickstart

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Medical reasoning prompt
prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels.
ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?"""

messages = [
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate response with medical reasoning
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20
)

# Keep only the newly generated tokens (drop the prompt tokens)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Medical Assessment:", response)
```

### Structured Medical Reasoning Format

Gazal-R1 is trained to provide structured medical reasoning in the following format:

```
<think>
Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
Step 2: Interpret ECG findings - ST elevation in inferior leads
Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
Step 4: Identify culprit vessel - likely RCA given inferior lead changes
Step 5: Assess urgency - emergent intervention required
Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
</think>

## Clinical Assessment

**Primary Diagnosis**: ST-Elevation Myocardial Infarction (STEMI), inferior wall

**Reasoning**: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is pathognomonic for inferior STEMI, likely involving the right coronary artery (RCA).

**Immediate Management**:
1. **Reperfusion therapy**: Emergency cardiac catheterization with primary PCI
2. **Antiplatelet therapy**: Aspirin 325mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
3. **Anticoagulation**: Heparin or bivalirudin
4. **Supportive care**: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

**Follow-up**: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
```
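
Downstream code often needs the reasoning trace and the final assessment separately. The helper below is a minimal sketch of one way to split them, assuming the decoded text still contains literal `<think>` and `</think>` tags as in the format above. `split_reasoning` is a hypothetical helper name, not part of any library.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) for a response in the structured format."""
    match = THINK_PATTERN.search(response)
    if match is None:
        # No reasoning block found: treat the whole response as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


example = """<think>
Step 1: Analyze presenting symptoms
Step 2: Interpret ECG findings
</think>

## Clinical Assessment

**Primary Diagnosis**: STEMI, inferior wall"""

reasoning, answer = split_reasoning(example)
print(reasoning.splitlines())
print(answer.splitlines()[0])
```

If generation stops before the closing tag (for example, when `max_new_tokens` is reached), the pattern does not match and the whole text is returned as the answer, so callers may want to check for an empty reasoning string.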

## Training Methodology

### Stage 1: Supervised Fine-Tuning (SFT)
- **Dataset**: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason)
- **Techniques**: DoRA + rsLoRA with rank 256
- **Focus**: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios
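
rsLoRA changes how the low-rank update is scaled: standard LoRA multiplies the update by `alpha / r`, while rsLoRA uses `alpha / sqrt(r)`, so the update does not shrink as quickly when the rank grows. The comparison below uses the rank stated for Stage 1. The value `LORA_ALPHA = 16` is a placeholder, since this card does not state the alpha used in training.

```python
import math

RANK = 256        # LoRA rank stated for Stage 1
LORA_ALPHA = 16   # hypothetical value, not given in this card


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA scaling factor: alpha / r."""
    return alpha / rank


def rslora_scaling(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA scaling factor: alpha / sqrt(r)."""
    return alpha / math.sqrt(rank)


print(lora_scaling(LORA_ALPHA, RANK))    # 0.0625
print(rslora_scaling(LORA_ALPHA, RANK))  # 1.0
```

At rank 256 the two factors differ by a factor of 16, which is why the choice matters more at high ranks. In the Hugging Face PEFT library, both techniques are enabled through `LoraConfig` flags (`use_dora=True`, `use_rslora=True`); whether the authors used PEFT is not stated here.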

### Stage 2: Group Relative Policy Optimization (GRPO)
- **Algorithm**: Value-function-free reinforcement learning
- **Dataset**: UltraMedical subset (32K medical MCQs)
- **Rewards**: Multi-component system (accuracy, format, length control, repetition penalty)
- **Improvements**: Enhanced reasoning quality and format adherence
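
GRPO avoids a learned value function by scoring each sampled response against the other responses drawn for the same prompt. The sketch below shows that group-relative advantage in its commonly described form (reward minus the group mean, divided by the group standard deviation). It is an illustration and not the authors' training code: the example rewards are made up, and the population standard deviation is used for simplicity, whereas implementations differ on this detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption made for simplicity
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative total rewards for four sampled answers to one question:
# two answered correctly (1.0) and two incorrectly (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Responses that beat the group average get a positive advantage and the rest a negative one. When all responses in a group receive the same reward, every advantage is zero and that group contributes no learning signal.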

## Model Capabilities

### Clinical Reasoning Types
1. **Diagnostic Reasoning**: Systematic symptom analysis → differential diagnosis
2. **Treatment Planning**: Evidence-based therapy selection with patient-specific factors
3. **Decision-Making Under Uncertainty**: Risk assessment and clinical judgment
4. **Prognostic Assessment**: Outcome prediction based on clinical evidence

### Medical Specialties Covered
- Internal Medicine
- Emergency Medicine
- Cardiology
- Pulmonology
- Infectious Disease
- Pharmacology
- Pathophysiology
- Clinical Laboratory Medicine

## Limitations and Important Disclaimers

### ⚠️ Critical Safety Information
- **NOT A MEDICAL DEVICE**: Gazal-R1 is a research model and is **NOT** intended for direct clinical use, diagnosis, or treatment planning
- **REQUIRES PROFESSIONAL VERIFICATION**: All outputs must be independently verified by qualified medical professionals
- **NO REAL-TIME UPDATES**: Knowledge is static and does not reflect the latest medical research or guidelines

### Technical Limitations
- **Knowledge Cutoff**: Training data reflects medical knowledge up to the training date
- **Hallucination Risk**: May generate plausible-sounding but factually incorrect information
- **Evaluation Scope**: Primarily evaluated on multiple-choice questions; real-world clinical scenarios may differ
- **Regional Bias**: Training data may contain geographical or demographic biases

### Ethical Considerations
- **Professional Responsibility**: Final medical decisions must always rest with qualified healthcare providers
- **Accountability**: Users assume responsibility for verifying and appropriately applying model outputs
- **Patient Safety**: Never use for emergency medical situations or time-critical decisions

## Use Cases

### Research and Education
- Medical education and training
- Clinical reasoning research
- Medical knowledge assessment
- Academic medical writing assistance

### Professional Support (With Supervision)
- Literature review assistance
- Clinical case analysis support
- Medical documentation aid
- Differential diagnosis exploration

### NOT Suitable For
- Direct patient care
- Emergency medical decisions
- Replacing clinical judgment
- Unsupervised medical advice

## Citation

If you find Gazal-R1 helpful in your research, please cite our work:

```bibtex
@article{gazal-r1-2025,
  title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
  author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
  journal={arXiv preprint arXiv:2506.21594},
  year={2025},
  url={https://arxiv.org/abs/2506.21594}
}
```

## Model Access

- **Model Weights**: Available on Hugging Face Hub
- **Datasets**: Training datasets available at [TachyHealth/structured_medical](https://huggingface.co/datasets/TachyHealth/structured_medical) and [TachyHealth/medical_grpo](https://huggingface.co/datasets/TachyHealth/medical_grpo)

## License

This model is released under the Apache 2.0 License. Please review the license terms before use.

## Contact

For questions about Gazal-R1, please contact:
- **Research Team**: TachyHealth
- **Website**: [https://tachyhealth.com/](https://tachyhealth.com/)
- **Gazal Platform**: [Gazal.ai](https://gazal.ai)

---

*Developed by TachyHealth Research Team. This model represents a significant advancement in medical AI reasoning while emphasizing the critical importance of professional medical oversight.*