---
license: gpl-3.0
datasets:
- DT4H/CardioCCC
language:
- ro
- it
- nl
- es
- en
- cs
- sv
base_model:
- EuroBERT/EuroBERT-610m
pipeline_tag: token-classification
tags:
- medical
---
Trained using [CardioNER](https://github.com/DataTools4Heart/CardioNER).
# Settings
* Optimizer: AdamW
* Learning rate: 2e-5, with class weights
* Decay rate: 1e-4
* Batch size: 16
* 10-fold stratified CV, with 10 epochs for the statistics below
* Architecture: EuroBERT + 3-layer dense head.
* Output: multilabel, with probabilities for DISEASE, MEDICATION, PROCEDURE, SYMPTOM.
* Chunking: centered around the span of interest
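
The output and chunking settings above can be illustrated with a minimal sketch. It is not taken from the CardioNER code base: the function names `decode_token` and `centered_window`, the 0.5 threshold, and the windowing rule are all assumptions made for illustration. Only the four label names come from this card.

```python
import math

# Label names as listed in this card; everything else below is hypothetical.
LABELS = ["DISEASE", "MEDICATION", "PROCEDURE", "SYMPTOM"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_token(logits, threshold=0.5):
    """Multilabel decoding: one independent sigmoid per label.

    The 0.5 threshold is an assumption, not a documented setting.
    """
    probs = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    active = [label for label, p in probs.items() if p >= threshold]
    return probs, active


def centered_window(n_tokens, span_start, span_end, max_len):
    """Return [start, end) of a window of at most max_len tokens,
    roughly centered on the span [span_start, span_end)."""
    span_len = span_end - span_start
    if span_len >= max_len:
        # Span is longer than the window: keep its first max_len tokens.
        return span_start, span_start + max_len
    pad = max_len - span_len
    start = span_start - pad // 2
    end = start + max_len
    # Shift the window back inside the document if it overshoots an edge.
    if start < 0:
        start, end = 0, min(n_tokens, max_len)
    elif end > n_tokens:
        end = n_tokens
        start = max(0, end - max_len)
    return start, end


probs, active = decode_token([2.0, -3.0, 0.5, -1.0])
window = centered_window(n_tokens=100, span_start=50, span_end=54, max_len=20)
```

Because the output is multilabel, a token can receive several categories at once, or none at all.
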
Note: for inference we use ```poetry run python -m cardioner.main --inferency_only..``` from [CardioNER](https://github.com/DataTools4Heart/CardioNER) with ```--pipe=dt4h```.
# Performance on internal hold-outs
Use with caution and not as a standalone tool: performance may be considerably lower on external datasets.
## Strict
| Category | Precision | Recall | F1 |
| ---------- | ------------: | ------------: | ------------: |
| **Micro** | 0.706 ± 0.009 | 0.583 ± 0.005 | 0.640 ± 0.004 |
| **Macro** | 0.745 ± 0.008 | 0.616 ± 0.007 | 0.673 ± 0.008 |
| DISEASE | 0.671 ± 0.009 | 0.562 ± 0.007 | 0.611 ± 0.007 |
| MEDICATION | 0.909 ± 0.009 | 0.749 ± 0.016 | 0.821 ± 0.009 |
| PROCEDURE | 0.734 ± 0.010 | 0.587 ± 0.010 | 0.655 ± 0.007 |
| SYMPTOM | 0.666 ± 0.009 | 0.566 ± 0.009 | 0.611 ± 0.007 |
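
As a sanity check on how the summary rows relate to the class rows, the snippet below recomputes the macro scores from the strict table. It assumes that "Macro" means the unweighted mean over the four classes, which this card does not state explicitly. The helper name `macro_average` is hypothetical.

```python
# Per-class strict scores copied from the table above: (precision, recall, F1).
strict = {
    "DISEASE": (0.671, 0.562, 0.611),
    "MEDICATION": (0.909, 0.749, 0.821),
    "PROCEDURE": (0.734, 0.587, 0.655),
    "SYMPTOM": (0.666, 0.566, 0.611),
}


def macro_average(scores):
    """Unweighted mean of each metric over all classes (assumed definition)."""
    n = len(scores)
    return tuple(sum(row[i] for row in scores.values()) / n for i in range(3))


macro_p, macro_r, macro_f1 = macro_average(strict)
print(f"precision={macro_p:.4f} recall={macro_r:.4f} f1={macro_f1:.4f}")
```

Under this assumption the recomputed precision and recall agree with the reported 0.745 and 0.616, and the recomputed F1 of roughly 0.6745 falls within the reported 0.673 ± 0.008. A small gap of this kind would be expected if the reported macro F1 was averaged per fold rather than computed from the averaged class scores.
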
## Relaxed
| Category | Precision | Recall | F1 |
| ---------- | ------------: | ------------: | ------------: |
| **Micro** | 0.876 ± 0.009 | 0.728 ± 0.007 | 0.796 ± 0.005 |
| **Macro** | 0.894 ± 0.007 | 0.738 ± 0.009 | 0.809 ± 0.005 |
| DISEASE | 0.872 ± 0.010 | 0.733 ± 0.012 | 0.794 ± 0.007 |
| MEDICATION | 0.955 ± 0.008 | 0.787 ± 0.016 | 0.861 ± 0.011 |
| PROCEDURE | 0.898 ± 0.009 | 0.717 ± 0.010 | 0.797 ± 0.006 |
| SYMPTOM | 0.846 ± 0.009 | 0.720 ± 0.011 | 0.779 ± 0.007 |
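
This card does not define "strict" and "relaxed". A common convention, assumed here, is that a strict match requires identical span boundaries and label, while a relaxed match only requires the same label and some overlap. The sketch below shows how the two criteria diverge on a toy example; the `Span` layout and the `prf` helper are hypothetical and may differ from the CardioNER evaluation code.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label), hypothetical layout


def overlaps(a: Span, b: Span) -> bool:
    """True if both spans share a label and at least one position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def prf(gold: List[Span], pred: List[Span], relaxed: bool = False):
    """Span-level precision, recall and F1 under the assumed definitions."""
    if relaxed:
        # A prediction counts if it overlaps any gold span with the same label.
        tp_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        tp_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        # A prediction counts only if boundaries and label match exactly.
        gold_set, pred_set = set(gold), set(pred)
        tp_pred = sum(p in gold_set for p in pred)
        tp_gold = sum(g in pred_set for g in gold)
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(0, 12, "DISEASE"), (20, 27, "MEDICATION")]
pred = [(0, 8, "DISEASE"), (20, 27, "MEDICATION"), (30, 35, "SYMPTOM")]

strict_scores = prf(gold, pred)
relaxed_scores = prf(gold, pred, relaxed=True)
```

In this toy example the truncated DISEASE prediction counts as an error under the strict criterion but as a hit under the relaxed one, which is the kind of boundary disagreement that would explain relaxed scores being consistently higher than strict ones.
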