---
license: gpl-3.0
datasets:
- DT4H/CardioCCC
language:
- ro
- it
- nl
- es
- en
- cs
- sv
base_model:
- EuroBERT/EuroBERT-610m
pipeline_tag: token-classification
tags:
- medical
---

Trained using [CardioNER](https://github.com/DataTools4Heart/CardioNER).

Settings

* Optimizer: AdamW
* Learning rate: 2e-5, with class weights
* Decay rate: 1e-4
* Batch size: 16
* 10-fold stratified CV, with 10 epochs, for the statistics below
* Architecture: EuroBERT + 3-layer dense head
* Output: multilabel, with probabilities for DISEASE, MEDICATION, PROCEDURE and SYMPTOM
* Chunking: centered around the span of interest

Note: for inference we use `poetry run python -m cardioner.main --inferency_only..` from [CardioNER](https://github.com/DataTools4Heart/CardioNER) with `--pipe=dt4h`.

# Performance on internal hold-outs

Use with caution and not as a standalone system: performance may be considerably lower on external datasets.

## Strict

| Category   | Precision     | Recall        | F1            |
| ---------- | ------------: | ------------: | ------------: |
| **Micro**  | 0.706 ± 0.009 | 0.583 ± 0.005 | 0.640 ± 0.004 |
| **Macro**  | 0.745 ± 0.008 | 0.616 ± 0.007 | 0.673 ± 0.008 |
| DISEASE    | 0.671 ± 0.009 | 0.562 ± 0.007 | 0.611 ± 0.007 |
| MEDICATION | 0.909 ± 0.009 | 0.749 ± 0.016 | 0.821 ± 0.009 |
| PROCEDURE  | 0.734 ± 0.010 | 0.587 ± 0.010 | 0.655 ± 0.007 |
| SYMPTOM    | 0.666 ± 0.009 | 0.566 ± 0.009 | 0.611 ± 0.007 |

## Relaxed

| Category   | Precision     | Recall        | F1            |
| ---------- | ------------: | ------------: | ------------: |
| **Micro**  | 0.876 ± 0.009 | 0.728 ± 0.007 | 0.796 ± 0.005 |
| **Macro**  | 0.894 ± 0.007 | 0.738 ± 0.009 | 0.809 ± 0.005 |
| DISEASE    | 0.872 ± 0.010 | 0.733 ± 0.012 | 0.794 ± 0.007 |
| MEDICATION | 0.955 ± 0.008 | 0.787 ± 0.016 | 0.861 ± 0.011 |
| PROCEDURE  | 0.898 ± 0.009 | 0.717 ± 0.010 | 0.797 ± 0.006 |
| SYMPTOM    | 0.846 ± 0.009 | 0.720 ± 0.011 | 0.779 ± 0.007 |
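This card does not spell out how the strict and relaxed scores are computed. Below is a minimal sketch of span-level evaluation, assuming that *strict* means identical character offsets with the same label and *relaxed* means any character overlap with the same label. The `Span` type and the `evaluate` helper are hypothetical illustrations, not part of CardioNER. Micro scores pool match counts over all labels; macro scores take the unweighted mean of the per-label precision, recall and F1. The Macro rows above are consistent with the latter (e.g. the mean of the four per-label strict precisions is 0.745).

```python
from typing import NamedTuple

# Hypothetical span representation (not the CardioNER data format).
class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str

LABELS = ("DISEASE", "MEDICATION", "PROCEDURE", "SYMPTOM")

def _hit(a, b, relaxed):
    # Assumption: relaxed = any character overlap; strict = identical boundaries.
    if relaxed:
        return a.start < b.end and b.start < a.end
    return a.start == b.start and a.end == b.end

def counts(gold, pred, label, relaxed):
    """Return (matched_pred, n_pred, matched_gold, n_gold) for one label."""
    g = [s for s in gold if s.label == label]
    p = [s for s in pred if s.label == label]
    matched_pred = sum(any(_hit(x, y, relaxed) for y in g) for x in p)
    matched_gold = sum(any(_hit(y, x, relaxed) for x in p) for y in g)
    return matched_pred, len(p), matched_gold, len(g)

def prf(matched_pred, n_pred, matched_gold, n_gold):
    # Precision over predictions, recall over gold spans, F1 as harmonic mean.
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def evaluate(gold, pred, relaxed=False, labels=LABELS):
    per_label = {lab: counts(gold, pred, lab, relaxed) for lab in labels}
    scores = {lab: prf(*c) for lab, c in per_label.items()}
    # Micro: sum the four counts over all labels, then compute P/R/F1 once.
    pooled = [sum(c[i] for c in per_label.values()) for i in range(4)]
    scores["micro"] = prf(*pooled)
    # Macro: unweighted mean of the per-label precision, recall and F1.
    per = [scores[lab] for lab in labels]
    scores["macro"] = tuple(sum(v[i] for v in per) / len(per) for i in range(3))
    return scores

# Toy example: the DISEASE prediction is truncated, the MEDICATION one is exact.
gold = [Span(0, 12, "DISEASE"), Span(20, 29, "MEDICATION")]
pred = [Span(0, 8, "DISEASE"), Span(20, 29, "MEDICATION")]
used = ("DISEASE", "MEDICATION")
strict = evaluate(gold, pred, relaxed=False, labels=used)
relaxed = evaluate(gold, pred, relaxed=True, labels=used)
print("strict micro P/R/F1:", strict["micro"])
print("relaxed micro P/R/F1:", relaxed["micro"])
```

In the toy example the truncated DISEASE span counts as an error under strict matching but as a hit under relaxed matching, which mirrors why the Relaxed table scores higher than the Strict one. The labels are restricted to those present, since labels with no gold or predicted spans would otherwise contribute zeros to the macro average.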