---
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
base_model: google/medgemma-27b-text-it
tags:
- medical
- healthcare
- maternal-health
- sexual-health
- reproductive-health
- multilingual
- african-languages
- akan
- amharic
- luganda
- swahili
- lora
- peft
- medgemma
language:
- en
- am
- sw
- lg
- ak
library_name: peft
pipeline_tag: text-generation
---

# MedGemma 27B - Maternal, Sexual & Reproductive Health Oracle for African Languages

A fine-tune of Google's MedGemma 27B Text model for the Zindi ITU Multilingual Health QA Challenge.

The model specializes in answering Maternal, Sexual, and Reproductive Health (MSRH) questions in:
- Akan (Twi/Fante, Ghana)
- Amharic (Ethiopia)
- Luganda (Uganda)
- Swahili (Kenya)
- English (Ethiopia, Ghana, Kenya, Uganda)

## Model Description

A LoRA adapter for `google/medgemma-27b-text-it`, fine-tuned on 29,815 multilingual medical Q&A samples spanning 8 language-region pairs.

### Training Details

- Base model: google/medgemma-27b-text-it (27B params, medical text-only)
- Training method: QLoRA (4-bit quantization + LoRA)
- LoRA config: r=8, alpha=16, applied to attention modules only
- Trainable params: 16.7M (0.21% of total)
- Training data: 29,815 multilingual medical Q&A samples
- Optimizer: fused AdamW, lr=3e-5, linear warmup over the first 5% of steps
- Hardware: NVIDIA A40 (48GB VRAM)
- Final eval_loss: 1.39
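
The trainable-parameter figure above can be sanity-checked with a little arithmetic. The sketch below assumes LoRA is applied to the four attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) in every decoder layer, and assumes the base-model dimensions shown in the code (hidden size 5376, 62 layers, 32 query heads, 16 key/value heads, head dimension 128). Neither is stated in this card, so check both against the base model's `config.json`.

```python
# Assumed architecture values (not stated in this card; check the base model's config.json).
HIDDEN_SIZE = 5376
NUM_LAYERS = 62
NUM_Q_HEADS = 32
NUM_KV_HEADS = 16
HEAD_DIM = 128
LORA_RANK = 8  # r=8, from the training details above


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to a d_in -> d_out linear layer."""
    return rank * (d_in + d_out)


q_dim = NUM_Q_HEADS * HEAD_DIM    # width of the query / attention output projection
kv_dim = NUM_KV_HEADS * HEAD_DIM  # width of the key and value projections

# Assumed target modules: the four attention projections.
per_layer = (
    lora_params(HIDDEN_SIZE, q_dim, LORA_RANK)     # q_proj
    + lora_params(HIDDEN_SIZE, kv_dim, LORA_RANK)  # k_proj
    + lora_params(HIDDEN_SIZE, kv_dim, LORA_RANK)  # v_proj
    + lora_params(q_dim, HIDDEN_SIZE, LORA_RANK)   # o_proj
)
total_trainable = per_layer * NUM_LAYERS
print(f"{total_trainable:,} trainable LoRA parameters")
```

Under these assumptions the total is 16,760,832, which is consistent with the 16.7M figure listed above.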

### Loss Trajectory

| Step | eval_loss |
|------|-----------|
| 600  | 1.69 |
| 900  | 1.58 |
| 1200 | 1.50 |
| 1500 | 1.45 |
| 1800 | 1.42 |
| 1864 | 1.39 (best) |
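
If `eval_loss` is the mean per-token cross-entropy in nats (the usual convention for causal language model training, but an assumption here since the card does not say), each value converts to a perplexity via `exp(loss)`. A minimal sketch using the values from the table:

```python
import math

# (step, eval_loss) pairs copied from the table above.
eval_history = [
    (600, 1.69),
    (900, 1.58),
    (1200, 1.50),
    (1500, 1.45),
    (1800, 1.42),
    (1864, 1.39),
]

# Assumption: eval_loss is the mean token-level cross-entropy in nats.
perplexities = {step: math.exp(loss) for step, loss in eval_history}

for step, ppl in perplexities.items():
    print(f"step {step:>4}: perplexity ~ {ppl:.2f}")

# Step with the lowest evaluation loss.
best_step = min(eval_history, key=lambda pair: pair[1])[0]
print("lowest eval_loss at step", best_step)
```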

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization with double quantization to reduce memory use
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-27b-text-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
    quantization_config=quantization_config,
)

# Attach the LoRA adapter to the quantized base model
model = PeftModel.from_pretrained(base_model, "KYAGABA/medgemma-27b-msrh-african-oracle")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("KYAGABA/medgemma-27b-msrh-african-oracle")

# Example: build a chat-formatted prompt for a single question
question = "How can young people access reproductive health services?"
language = "English"

prompt_text = f"Answer this question in {language} about maternal, sexual, and reproductive health: {question}"
messages = [{"role": "user", "content": prompt_text}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts the BOS token, so skip special tokens here to avoid a duplicate
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Deterministic beam search (no sampling)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=400,
        do_sample=False,
        num_beams=3,
        repetition_penalty=1.1,
    )

# Decode only the newly generated tokens (everything after the prompt)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
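
The prompt format in the example carries over to the other supported languages. The helper below is a hypothetical convenience wrapper (`build_messages` and `SUPPORTED_LANGUAGES` are not part of this repository). It reuses the instruction wording from the example above and rejects languages the adapter was not trained on. Whether these exact language names are what the model expects in the instruction is an assumption.

```python
# Languages listed at the top of this card.
SUPPORTED_LANGUAGES = ("Akan", "Amharic", "Luganda", "Swahili", "English")


def build_messages(question: str, language: str = "English") -> list[dict[str, str]]:
    """Return chat messages using the same instruction wording as the usage example."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {SUPPORTED_LANGUAGES}"
        )
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    prompt_text = (
        f"Answer this question in {language} about maternal, sexual, "
        f"and reproductive health: {question}"
    )
    return [{"role": "user", "content": prompt_text}]


messages = build_messages(
    "How can young people access reproductive health services?", "Swahili"
)
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the example above.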

## Dataset

Trained on the Zindi ITU Multilingual Health QA Challenge dataset:

| Subset | Samples | Language | Region |
|--------|---------|----------|--------|
| Eng_Uga | 7,624 | English | Uganda |
| Aka_Gha | 4,455 | Akan | Ghana |
| Eng_Gha | 4,443 | English | Ghana |
| Eng_Eth | 3,915 | English | Ethiopia |
| Lug_Uga | 3,383 | Luganda | Uganda |
| Eng_Ken | 2,080 | English | Kenya |
| Swa_Ken | 2,070 | Swahili | Kenya |
| Amh_Eth | 1,845 | Amharic | Ethiopia |
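
The table is broken down by language-region pair; aggregating by language makes the imbalance easier to see. A short sketch using only the counts above:

```python
from collections import defaultdict

# (subset, samples, language, region) rows copied from the table above.
subsets = [
    ("Eng_Uga", 7624, "English", "Uganda"),
    ("Aka_Gha", 4455, "Akan", "Ghana"),
    ("Eng_Gha", 4443, "English", "Ghana"),
    ("Eng_Eth", 3915, "English", "Ethiopia"),
    ("Lug_Uga", 3383, "Luganda", "Uganda"),
    ("Eng_Ken", 2080, "English", "Kenya"),
    ("Swa_Ken", 2070, "Swahili", "Kenya"),
    ("Amh_Eth", 1845, "Amharic", "Ethiopia"),
]

total = sum(samples for _, samples, _, _ in subsets)

# Sum the sample counts for each language across regions.
per_language = defaultdict(int)
for _, samples, language, _ in subsets:
    per_language[language] += samples

# Print languages from largest to smallest with their share of the data.
for language, samples in sorted(per_language.items(), key=lambda kv: -kv[1]):
    print(f"{language:<8} {samples:>6,}  {samples / total:.1%}")
print(f"{'Total':<8} {total:>6,}")
```

English accounts for 18,062 of the 29,815 samples (about 61%), while Amharic is the smallest language at 1,845.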

## Intended Use

This model is intended for research and educational use in support of healthcare information access in African languages. It is NOT intended for direct clinical use; always consult qualified healthcare professionals.

## Limitations

- May add an English preamble at the start of responses
- Lower quality for Akan than for English (4,455 Akan training samples vs. 18,062 English)
- Trained for only ~1.13 epochs (compute constraints)
- Works best on MSRH topics

## Citation

```bibtex
@misc{medgemma27b-msrh-africa,
  author = {KYAGABA, Arul},
  title = {MedGemma 27B - MSRH African Oracle},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {https://huggingface.co/KYAGABA/medgemma-27b-msrh-african-oracle}
}
```

## Acknowledgements

- Google for the MedGemma 27B base model
- Zindi and ITU for the Multilingual Health QA Challenge
- The AfriMed-QA community for advancing African medical AI