---
language: ar
tags:
- text-classification
- xlm-roberta
- arabic
- healthcare
- hierarchical
- multi-label
base_model: FacebookAI/xlm-roberta-base
datasets:
- perfectPresentation/phc-dataset
---

# PHC Multi-Label Hierarchical Classifier (v4)

Fine-tuned **XLM-RoBERTa-base** on Arabic healthcare patient complaints.
Predicts a **4-level PHC taxonomy code** (multi-label) with a confidence score at each level.

> Output heads are sized to the **full PHC taxonomy (117 codes)**. Of these, 90 have training examples; the other 27 have none and are covered zero-shot, from the taxonomy structure only.

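As a rough illustration of what "confidence at each level" could look like, the sketch below takes one logit vector per taxonomy level, applies a softmax, and reports the top label with its probability. It assumes each level has its own single-choice softmax head, which this card does not confirm; the `decode_levels` helper and the toy logits are hypothetical and are not the model's actual output format.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_levels(level_logits, level_labels):
    """Return {level: (top_label, probability)} for each level's logits."""
    result = {}
    for level, logits in level_logits.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        result[level] = (level_labels[level][best], probs[best])
    return result


# Label vocabularies taken from the taxonomy listed in this card.
labels = {
    "L2": ["EMD", "IPS", "LAB", "OPC", "PHA", "RAD", "REC"],
    "L4": ["001", "002", "003"],
}
# Toy logits, invented for illustration only.
logits = {
    "L2": [0.1, 0.2, 0.0, 3.0, 0.5, 0.1, 0.3],
    "L4": [2.0, 0.5, 0.1],
}

decoded = decode_levels(logits, labels)
for level, (label, conf) in decoded.items():
    print(f"{level}: {label} (confidence {conf:.2f})")
```

If the heads are in fact multi-label, a per-class sigmoid would replace the softmax and more than one label per level could pass a threshold.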
## Taxonomy Structure

```
PHC -> L2 (service_area, 7) -> L3 (category, 21+) -> L4 (001/002/003)
```

| Level | Values |
|-------|--------|
| L2 (service area) | EMD, IPS, LAB, OPC, PHA, RAD, REC |
| L3 (category) | ALT, APN, CDR, COM, DAV, DIC, EMS, ENV, EPS, EQU, FAC, HSK, INS, MAC, MAS, MBR, PCC, PED, PPD, PRE, QOI, QUE, REG, SAF, SCH, SRT, SYS, TRA, TRI, TRT, TTI, VOI, WAI |
| Full code | 117 taxonomy codes |

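To make the four-level structure concrete, here is a minimal sketch that splits a full code into its levels and checks each part against the vocabularies listed above. The hyphen-separated string form (for example `PHC-OPC-WAI-001`) and the `split_code` helper are assumptions; this card does not specify how full codes are written, and the example code is not claimed to be one of the 117 real ones.

```python
L2_AREAS = {"EMD", "IPS", "LAB", "OPC", "PHA", "RAD", "REC"}
L3_CATEGORIES = {
    "ALT", "APN", "CDR", "COM", "DAV", "DIC", "EMS", "ENV", "EPS", "EQU",
    "FAC", "HSK", "INS", "MAC", "MAS", "MBR", "PCC", "PED", "PPD", "PRE",
    "QOI", "QUE", "REG", "SAF", "SCH", "SRT", "SYS", "TRA", "TRI", "TRT",
    "TTI", "VOI", "WAI",
}
L4_SUFFIXES = {"001", "002", "003"}


def split_code(code, sep="-"):
    """Split a full code into its four levels and validate each part.

    The separator is an assumption made for this sketch.
    """
    parts = code.split(sep)
    if len(parts) != 4:
        raise ValueError(f"expected 4 levels, got {len(parts)}: {code!r}")
    root, area, category, suffix = parts
    if root != "PHC":
        raise ValueError(f"unknown root: {root!r}")
    if area not in L2_AREAS:
        raise ValueError(f"unknown L2 service area: {area!r}")
    if category not in L3_CATEGORIES:
        raise ValueError(f"unknown L3 category: {category!r}")
    if suffix not in L4_SUFFIXES:
        raise ValueError(f"unknown L4 suffix: {suffix!r}")
    return {"L1": root, "L2": area, "L3": category, "L4": suffix}


# Hypothetical code string, used only to exercise the helper.
print(split_code("PHC-OPC-WAI-001"))

# Number of combinations that pass the per-level checks.
print(len(L2_AREAS) * len(L3_CATEGORIES) * len(L4_SUFFIXES))
```

The per-level vocabularies allow 7 × 33 × 3 = 693 combinations, far more than the 117 codes in the taxonomy, so passing these checks does not mean a code actually exists. A lookup against the real code list would be needed for that.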
## Test Performance

| Level | Exact-match accuracy | F1 (macro) |
|-------|----------------------|------------|
| L2 | 96.0% | 0.9779 |
| L3 | 87.2% | 0.5307 |
| L4 | 93.6% | 0.6026 |
| Full Code | 82.7% | 0.2438 |

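The gap between the two metric columns (for example 82.7% against 0.2438 at the full-code level) is the pattern one would expect when a few codes dominate the test set. Exact-match accuracy weights every example equally, while macro F1 weights every class equally, so rare codes that score near zero pull the average down. The card does not state that this is the cause. The toy sketch below only shows the mechanism, using invented labels and a plain-Python macro F1 that averages over the labels seen in the data, which may differ from how the reported numbers were computed.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class with no true positives contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: one frequent class and four rare ones; the model always
# predicts the frequent class.
y_true = ["A"] * 96 + ["B", "C", "D", "E"]
y_pred = ["A"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

Here accuracy is 0.96 while macro F1 is below 0.2, because four of the five classes contribute an F1 of zero.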

> Best validation full-code exact-match accuracy during training: **85.2%**