---
language: ar
tags:
  - text-classification
  - xlm-roberta
  - arabic
  - healthcare
  - hierarchical
  - multi-label
base_model: FacebookAI/xlm-roberta-base
datasets:
  - perfectPresentation/phc-dataset
---

# PHC Multi-Label Hierarchical Classifier (v4)

**XLM-RoBERTa-base** fine-tuned on Arabic healthcare patient complaints.
For each complaint, the model predicts a **4-level PHC taxonomy code** (multi-label) and reports a confidence score at each level.

> Output heads are sized to the **full PHC taxonomy (117 codes)**. Of these, 90 codes have training examples; the remaining 27 have none and are covered zero-shot, from the taxonomy structure only.

## Taxonomy Structure

```
PHC  ->  L2 (service_area, 7)  ->  L3 (category, 21+)  ->  L4 (001/002/003)
```

| Level | Values |
|-------|--------|
| L2: Service Area | EMD, IPS, LAB, OPC, PHA, RAD, REC |
| L3: Category | ALT, APN, CDR, COM, DAV, DIC, EMS, ENV, EPS, EQU, FAC, HSK, INS, MAC, MAS, MBR, PCC, PED, PPD, PRE, QOI, QUE, REG, SAF, SCH, SRT, SYS, TRA, TRI, TRT, TTI, VOI, WAI |
| Full Code | 117 taxonomy codes |
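
To make the level structure concrete, here is a minimal sketch of splitting a full code into its four levels. It assumes a full code is written as hyphen-separated segments such as `PHC-OPC-WAI-001`; this card does not specify the separator, so `sep`, `PHCCode`, and `parse_phc_code` are hypothetical names used only for illustration. Only the L2 and L4 values listed above are validated.

```python
from typing import NamedTuple

# L2 and L4 values as listed in this card.
L2_AREAS = {"EMD", "IPS", "LAB", "OPC", "PHA", "RAD", "REC"}
L4_SUFFIXES = {"001", "002", "003"}


class PHCCode(NamedTuple):
    root: str          # always "PHC"
    service_area: str  # L2
    category: str      # L3
    item: str          # L4


def parse_phc_code(code: str, sep: str = "-") -> PHCCode:
    """Split a full code into levels (the separator is an assumption)."""
    parts = code.strip().split(sep)
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {code!r}")
    root, l2, l3, l4 = parts
    if root != "PHC":
        raise ValueError(f"unexpected root segment: {root!r}")
    if l2 not in L2_AREAS:
        raise ValueError(f"unknown L2 service area: {l2!r}")
    if l4 not in L4_SUFFIXES:
        raise ValueError(f"unknown L4 suffix: {l4!r}")
    return PHCCode(root, l2, l3, l4)


parsed = parse_phc_code("PHC-OPC-WAI-001")
print(parsed.service_area, parsed.category, parsed.item)
```

If the real codes use a different delimiter or are concatenated without one, only the splitting step would need to change.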

## Test Performance

| Level | Exact Match Acc | F1 (macro) |
|-------|-----------------|------------|
| L2 | 96.0% | 0.9779 |
| L3 | 87.2% | 0.5307 |
| L4 | 93.6% | 0.6026 |
| Full Code | 82.7% | 0.2438 |

> Best validation full-code exact-match accuracy during training: **85.2%**
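
The two metric columns can diverge sharply, and the sketch below illustrates one way that happens. It assumes a single predicted label per example at a given level and computes macro F1 over the labels seen in either the references or the predictions; the card does not state how its metrics were computed, so `exact_match_accuracy` and `macro_f1` are illustrative helpers, not the evaluation code behind the table.

```python
from collections import defaultdict


def exact_match_accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over labels seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: one rare label that is always missed.
y_true = ["OPC", "OPC", "OPC", "LAB"]
y_pred = ["OPC", "OPC", "OPC", "OPC"]
print(exact_match_accuracy(y_true, y_pred))  # 0.75
print(round(macro_f1(y_true, y_pred), 4))    # 0.4286
```

In the toy data, the frequent label is always right, so accuracy stays high, while the missed rare label contributes an F1 of 0 and pulls the macro average down. A long tail of rare or unseen codes would be consistent with the gap between accuracy and macro F1 reported above.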