	---
	base_model:
	- TachyHealth/Gazal-R1-32B-sft-merged-preview
	datasets:
	- TachyHealth/medical_grpo
	- TachyHealth/structured_medical
	library_name: transformers
	license: apache-2.0
	license_link: https://huggingface.co/TachyHealth/Gazal-R1-32B-GRPO-preview/blob/main/LICENSE
	pipeline_tag: text-generation
	tags:
	- gazal-r1
	- grpo
	- qwen3
	- conversational
	- medical
	- clinical
	- healthcare
	- reasoning
	---

	# Gazal-R1-32B: Medical Reasoning Language Model

	The model was presented in the paper [Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training](https://huggingface.co/papers/2506.21594).

	<a href="https://gazal.ai/" target="_blank" style="margin: 0px;">
	<img alt="Gazal AI" src="./logo.png" style=" width: 70%;" />
	</a>


	## Model Highlights

	Gazal-R1 is a state-of-the-art 32-billion-parameter language model specifically designed for medical reasoning and clinical decision-making. Built upon Qwen 3 32B, Gazal-R1 demonstrates that strategic training can enable mid-sized models to outperform significantly larger counterparts in specialized medical domains.

	Key features include:

	- 🔬 Medical Expertise: Specialized training on 107,033 synthetic medical reasoning examples covering diagnostic reasoning, treatment planning, decision-making under uncertainty, and prognostic assessment
	- 🧠 Transparent Reasoning: Structured clinical thinking with step-by-step explanations in `<think></think>` tags, following established clinical reasoning frameworks
	- 📊 State-of-the-Art Performance: Achieves 87.1% on MedQA, 81.6% on MMLU Pro (Medical), and 79.6% on PubMedQA, surpassing models up to 12× larger
	- ⚡ Parameter Efficiency: Advanced training techniques including Weight-Decomposed Low-Rank Adaptation (DoRA) and Rank-Stabilized LoRA (rsLoRA)
	- 🎯 Alignment Optimization: Refined through Group Relative Policy Optimization (GRPO) with sophisticated multi-component reward systems
	- 🌍 Medical Knowledge: Comprehensive understanding across multiple medical specialties and clinical scenarios

	## Model Overview

	Gazal-R1-32B has the following characteristics:
	- Type: Causal Language Model (Medical Reasoning Specialist)
	- Base Model: Qwen 3 32B
	- Training Stages: Two-stage pipeline (Supervised Fine-Tuning + Reinforcement Learning)
	- Number of Parameters: 32.8B
	- Number of Parameters (Non-Embedding): 31.2B
	- Context Length: 32,768 tokens natively, extensible to 131,072 with YaRN
	- Training Data: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason) (32,682 examples)
	- Fine-tuning Method: DoRA + rsLoRA (Parameter-Efficient Fine-Tuning)
	- Alignment: Group Relative Policy Optimization (GRPO)

	For detailed methodology, training insights, and comprehensive evaluation, please refer to our [technical report](https://arxiv.org/abs/2506.21594).
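The context-length figures above imply a YaRN scaling factor of 131,072 / 32,768 = 4. The following is a minimal sketch of the matching `rope_scaling` entry. It assumes Gazal-R1 follows the convention documented for its Qwen 3 base model; the key names come from that convention and have not been checked against this checkpoint's `config.json`. The helper `yarn_rope_scaling` is hypothetical.

```python
NATIVE_CONTEXT = 32_768     # native context length stated above
EXTENDED_CONTEXT = 131_072  # YaRN-extended context length stated above


def yarn_rope_scaling(native: int, target: int) -> dict:
    """Build a rope_scaling dict that extends `native` context to `target`."""
    if target <= native:
        raise ValueError("target must exceed the native context length")
    return {
        # Assumed key names, following the Qwen 3 convention.
        "rope_type": "yarn",
        "factor": target / native,
        "original_max_position_embeddings": native,
    }


rope_scaling = yarn_rope_scaling(NATIVE_CONTEXT, EXTENDED_CONTEXT)
print(rope_scaling)
```

How the resulting dictionary is supplied (editing `config.json` or passing a flag to a serving framework) depends on the inference stack and is not covered here.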

	## Performance Results

	Gazal-R1 achieves the highest score among the compared models on three of the four standard medical benchmarks below:

	| Model | Size | MMLU Pro (Medical) | MedMCQA | MedQA | PubMedQA |
	|-------|------|-------------------|---------|-------|----------|
	| Gazal-R1 (Final) | 32B | 81.6 | 71.9 | 87.1 | 79.6 |
	| [Gazal-R1 (SFT-only)](https://huggingface.co/TachyHealth/Gazal-R1-32B-sft-merged-preview) | 32B | 79.3 | 72.3 | 86.9 | 77.6 |
	| Llama 3.1 405B Instruct | 405B | 70.2 | 75.8 | 81.9 | 74.6 |
	| Qwen 2.5 72B Instruct | 72B | 72.1 | 66.2 | 72.7 | 71.7 |
	| Med42-Llama3.1-70B | 70B | 66.1 | 72.4 | 80.4 | 77.6 |
	| Llama 3.1 70B Instruct | 70B | 74.5 | 72.5 | 78.4 | 78.5 |
	| QwQ 32B | 32B | 70.1 | 65.6 | 72.3 | 73.7 |
	| Qwen 3 32B | 32B | 78.4 | 71.6 | 84.4 | 76.7 |

	Key Achievements:
	- 🥇 Highest scores among the compared models on MMLU Pro (Medical), MedQA, and PubMedQA
	- 📈 Gains from GRPO training over the SFT-only checkpoint: +2.3 points on MMLU Pro (Medical) and +2.0 points on PubMedQA
	- 🚀 Outperforms Llama 3.1 405B, a model roughly 12× larger, on three of the four benchmarks (all except MedMCQA)
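The differences quoted above can be recomputed directly from the benchmark table. The short sketch below copies three rows from the table and reports per-benchmark differences in percentage points; the helper name `delta` is illustrative only.

```python
# Scores copied from the benchmark table above.
BENCHMARKS = ["MMLU Pro (Medical)", "MedMCQA", "MedQA", "PubMedQA"]
SCORES = {
    "Gazal-R1 (Final)": [81.6, 71.9, 87.1, 79.6],
    "Gazal-R1 (SFT-only)": [79.3, 72.3, 86.9, 77.6],
    "Llama 3.1 405B Instruct": [70.2, 75.8, 81.9, 74.6],
}


def delta(model_a: str, model_b: str) -> dict:
    """Per-benchmark difference (model_a minus model_b) in percentage points."""
    return {
        name: round(a - b, 1)
        for name, a, b in zip(BENCHMARKS, SCORES[model_a], SCORES[model_b])
    }


grpo_gain = delta("Gazal-R1 (Final)", "Gazal-R1 (SFT-only)")
vs_405b = delta("Gazal-R1 (Final)", "Llama 3.1 405B Instruct")
size_ratio = 405 / 32  # parameter-count ratio, in billions

print(grpo_gain)
print(vs_405b)
print(f"Llama 3.1 405B is about {size_ratio:.1f}x larger")
```

By these numbers, GRPO changes the SFT-only scores by +2.3, -0.4, +0.2, and +2.0 points respectively, so the MedMCQA score drops slightly while the other three rise.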

	## Quickstart

	### Basic Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "TachyHealth/Gazal-R1-32B-GRPO-preview"

	# Load the tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto"
	)

	# Medical reasoning prompt
	prompt = """A 65-year-old male presents with chest pain, shortness of breath, and elevated troponin levels.
	ECG shows ST-segment elevation in leads II, III, and aVF. What is the most likely diagnosis and immediate management?"""

	messages = [
	    {"role": "user", "content": prompt}
	]

	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True
	)

	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Generate response with medical reasoning
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=2048,
	    do_sample=True,  # sampling must be enabled for temperature/top_p/top_k to take effect
	    temperature=0.7,
	    top_p=0.8,
	    top_k=20
	)

	output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
	response = tokenizer.decode(output_ids, skip_special_tokens=True)

	print("Medical Assessment:", response)
	```

	### Structured Medical Reasoning Format

	Gazal-R1 is trained to provide structured medical reasoning in the following format:

	```
	<think>
	Step 1: Analyze presenting symptoms - chest pain, dyspnea, troponin elevation
	Step 2: Interpret ECG findings - ST elevation in inferior leads
	Step 3: Consider differential diagnoses - STEMI vs NSTEMI vs unstable angina
	Step 4: Identify culprit vessel - likely RCA given inferior lead changes
	Step 5: Assess urgency - emergent intervention required
	Step 6: Plan immediate management - dual antiplatelet, anticoagulation, cath lab
	</think>

	## Clinical Assessment

	Primary Diagnosis: ST-Elevation Myocardial Infarction (STEMI), inferior wall

	Reasoning: The combination of chest pain, elevated troponin, and ST-elevation in leads II, III, and aVF is pathognomonic for inferior STEMI, likely involving the right coronary artery (RCA).

	Immediate Management:
	1. Reperfusion therapy: Emergency cardiac catheterization with primary PCI
	2. Antiplatelet therapy: Aspirin 325mg + P2Y12 inhibitor (clopidogrel/ticagrelor)
	3. Anticoagulation: Heparin or bivalirudin
	4. Supportive care: O2 if hypoxic, nitroglycerin for pain (avoid if hypotensive)

	Follow-up: Post-PCI monitoring, echocardiogram, cardiac rehabilitation referral
	```
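Downstream code often needs the reasoning trace and the final assessment separately. The sketch below splits a decoded response on the `</think>` tag shown in the format above. The helper name `split_reasoning` is hypothetical. It also handles the case where the opening `<think>` tag is missing from the decoded text, which can happen if a chat template places it in the prompt (an assumption, not verified for this model).

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded model response."""
    head, sep, tail = response.partition("</think>")
    if not sep:
        # No closing tag: treat the whole response as the answer.
        return "", response.strip()
    # Drop the opening tag if present; keep only the reasoning text.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = (
    "<think>\nStep 1: Analyze presenting symptoms\n</think>\n\n"
    "## Clinical Assessment\n\nPrimary Diagnosis: STEMI"
)
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Keeping the two parts separate makes it easy to log the reasoning for review while showing only the assessment to a reader.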

	## Training Methodology

	### Stage 1: Supervised Fine-Tuning (SFT)
	- Dataset: 107,033 synthetic medical reasoning examples + [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason)
	- Techniques: DoRA + rsLoRA with rank 256
	- Focus: Structured clinical reasoning across diagnostic, therapeutic, and prognostic scenarios
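To make the two techniques concrete, here is a small pure-Python sketch of the arithmetic behind them, not the training code used for Gazal-R1. rsLoRA scales the low-rank update by `alpha / sqrt(r)` instead of `alpha / r`, and DoRA re-expresses the adapted weight as a learned magnitude vector times a column-normalized direction. The value of `ALPHA` and the toy matrices are hypothetical; only the rank of 256 comes from this card. In the PEFT library, the corresponding options are the `use_rslora` and `use_dora` flags of `LoraConfig`, though this card does not state which library was used.

```python
import math


def lora_scale(alpha: float, rank: int, rank_stabilized: bool) -> float:
    """Scaling applied to the low-rank update B @ A."""
    # Standard LoRA divides by r; rsLoRA divides by sqrt(r), so the update
    # does not shrink as quickly when the rank grows.
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def column_norms(w):
    """Euclidean norm of each column of a nested-list matrix."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, b, a, magnitude, scale):
    """W' = m * V / ||V||_c with V = W0 + scale * (B @ A), column-wise norm."""
    update = matmul(b, a)
    v = [[w + scale * d for w, d in zip(w_row, d_row)]
         for w_row, d_row in zip(w0, update)]
    norms = column_norms(v)
    return [[m * x / n for x, m, n in zip(row, magnitude, norms)] for row in v]


RANK = 256    # rank stated above
ALPHA = 32.0  # hypothetical value; not given in this card

rs_scale = lora_scale(ALPHA, RANK, rank_stabilized=True)
plain_scale = lora_scale(ALPHA, RANK, rank_stabilized=False)

# Toy 2x2 example with a rank-1 adapter. B starts at zero, as in LoRA.
w0 = [[3.0, 0.0], [4.0, 1.0]]
b = [[0.0], [0.0]]
a = [[0.5, -0.5]]
magnitude = column_norms(w0)  # DoRA initializes m to the column norms of W0
merged = dora_weight(w0, b, a, magnitude, rs_scale)
print(rs_scale, plain_scale, merged)
```

With `B` at zero and the magnitude initialized to the column norms of `W0`, the merged weight equals `W0`, so training starts from the base model's behavior.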

	### Stage 2: Group Relative Policy Optimization (GRPO)
	- Algorithm: Value-function-free reinforcement learning
	- Dataset: UltraMedical subset (32K medical MCQs)
	- Rewards: Multi-component system (accuracy, format, length control, repetition penalty)
	- Improvements: Enhanced reasoning quality and format adherence
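The "value-function-free" property can be illustrated with a short sketch: GRPO samples a group of completions per prompt, scores each one, and uses the group-normalized reward as the advantage instead of a learned critic. The code below is a simplified illustration under stated assumptions, not the reward system used for Gazal-R1. It implements only two of the four listed components (accuracy and format), and the `Answer: X` output convention, the weights, and all function names are hypothetical.

```python
import re
import statistics

ANSWER_PATTERN = re.compile(r"Answer:\s*([A-E])")  # hypothetical answer format
FORMAT_PATTERN = re.compile(r"\s*<think>.+?</think>\s*\S.*", re.DOTALL)


def format_reward(completion: str) -> float:
    """1.0 if the text is a <think>...</think> block followed by an answer."""
    return 1.0 if FORMAT_PATTERN.fullmatch(completion) else 0.0


def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the option letter after the reasoning matches the gold label."""
    answer_part = completion.rpartition("</think>")[2]
    match = ANSWER_PATTERN.search(answer_part)
    return 1.0 if match and match.group(1) == gold else 0.0


def total_reward(completion: str, gold: str, w_acc: float = 1.0, w_fmt: float = 0.5) -> float:
    """Weighted sum of components; the weights are illustrative only."""
    return w_acc * accuracy_reward(completion, gold) + w_fmt * format_reward(completion)


def group_advantages(rewards, eps: float = 1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled completions, gold answer "B".
group = [
    "<think>inferior STEMI</think>\nAnswer: B",
    "<think>unsure</think>\nAnswer: C",
    "Answer: B",
    "no answer",
]
rewards = [total_reward(c, "B") for c in group]
advantages = group_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because advantages are computed relative to the group mean, a completion is reinforced only when it scores better than its siblings for the same prompt; a group with identical rewards yields zero advantage for every member.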

	## Model Capabilities

	### Clinical Reasoning Types
	1. Diagnostic Reasoning: Systematic symptom analysis → differential diagnosis
	2. Treatment Planning: Evidence-based therapy selection with patient-specific factors
	3. Decision-Making Under Uncertainty: Risk assessment and clinical judgment
	4. Prognostic Assessment: Outcome prediction based on clinical evidence

	### Medical Specialties Covered
	- Internal Medicine
	- Emergency Medicine
	- Cardiology
	- Pulmonology
	- Infectious Disease
	- Pharmacology
	- Pathophysiology
	- Clinical Laboratory Medicine

	## Limitations and Important Disclaimers

	### ⚠️ Critical Safety Information
	- NOT A MEDICAL DEVICE: Gazal-R1 is a research model and is NOT intended for direct clinical use, diagnosis, or treatment planning
	- REQUIRES PROFESSIONAL VERIFICATION: All outputs must be independently verified by qualified medical professionals
	- NO REAL-TIME UPDATES: Knowledge is static and does not reflect the latest medical research or guidelines

	### Technical Limitations
	- Knowledge Cutoff: Training data reflects medical knowledge up to the training date
	- Hallucination Risk: May generate plausible-sounding but factually incorrect information
	- Evaluation Scope: Primarily evaluated on multiple-choice questions; real-world clinical scenarios may differ
	- Regional Bias: Training data may contain geographical or demographic biases

	### Ethical Considerations
	- Professional Responsibility: Final medical decisions must always rest with qualified healthcare providers
	- Accountability: Users assume responsibility for verifying and appropriately applying model outputs
	- Patient Safety: Never use for emergency medical situations or time-critical decisions

	## Use Cases

	### Research and Education
	- Medical education and training
	- Clinical reasoning research
	- Medical knowledge assessment
	- Academic medical writing assistance

	### Professional Support (With Supervision)
	- Literature review assistance
	- Clinical case analysis support
	- Medical documentation aid
	- Differential diagnosis exploration

	### NOT Suitable For
	- Direct patient care
	- Emergency medical decisions
	- Replacing clinical judgment
	- Unsupervised medical advice

	## Citation

	If you find Gazal-R1 helpful in your research, please cite our work:

	```bibtex
	@article{gazal-r1-2025,
	title={Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training},
	author={Ahmed M. Adly and Mostafa Samy and Amr Fawzy},
	journal={arXiv preprint arXiv:2506.21594},
	year={2025},
	url={https://arxiv.org/abs/2506.21594}
	}
	```

	## Model Access

	- Model Weights: Available on Hugging Face Hub
	- Datasets: Training datasets available at [TachyHealth/structured_medical](https://huggingface.co/datasets/TachyHealth/structured_medical) and [TachyHealth/medical_grpo](https://huggingface.co/datasets/TachyHealth/medical_grpo)

	## License

	This model is released under the Apache 2.0 License. Please review the license terms before use.

	## Contact

	For questions about Gazal-R1, please contact:
	- Research Team: TachyHealth
	- Website: [https://tachyhealth.com/](https://tachyhealth.com/)
	- Gazal Platform: [Gazal.ai](https://gazal.ai)

	---

	Developed by TachyHealth Research Team. This model represents a significant advancement in medical AI reasoning while emphasizing the critical importance of professional medical oversight.