---
license: gpl-3.0
language:
- nl
base_model:
- CLTL/MedRoBERTa.nl
pipeline_tag: token-classification
tags:
- ner
- medical
---

# Model Card for CardioNER MedRoBERTa.nl Multilabel
This is a MedRoBERTa.nl base model fine-tuned for span classification. This specific model was obtained by averaging the best checkpoint of each fold in a ten-fold cross-validation. For this model we used the IOB tagging scheme, which facilitates the aggregation of predictions over sequences.

For chunking we used paragraph-based chunking, assuming the maximum context length of the base model, i.e. 512 tokens.
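The card does not say exactly how the per-fold checkpoints were combined. One common approach is uniform, element-wise averaging of the parameters of checkpoints that share one architecture, and we assume that approach here. The sketch below applies it to state-dict-like mappings. `average_state_dicts` is a hypothetical helper, not part of the CardioNER code.

```python
from typing import Dict, List, Mapping


def average_state_dicts(state_dicts: List[Mapping[str, object]]) -> Dict[str, object]:
    """Uniformly average parameters across checkpoints with identical keys.

    Assumption: plain uniform averaging; the model card does not specify
    the exact procedure. Values only need to support `+` and division by
    an int, so floats, NumPy arrays and float PyTorch tensors all work.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints must share the same parameter names")

    n = len(state_dicts)
    averaged = {}
    for key in keys:
        # Sum this parameter over all checkpoints, then divide by the count
        total = state_dicts[0][key]
        for sd in state_dicts[1:]:
            total = total + sd[key]
        averaged[key] = total / n
    return averaged
```

With PyTorch, you would load each fold's `state_dict`, pass the list to this function, and call `model.load_state_dict(...)` on the result. Integer-valued buffers, if any, should be copied from one checkpoint instead of being averaged.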
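### Expected input and output
The input should be a string of **Dutch** clinical text from the cardiology domain.

CardioNER_MedRoBERTa.nl_multilabel is a multiclass span classification model.
The classes that can be predicted are disease, medication, procedure and symptom. Under IOB tagging these correspond to the token labels `O` plus the `B-` and `I-` variants of `DISEASE`, `MEDICATION`, `PROCEDURE` and `SYMPTOM`.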
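#### Extracting span predictions with CardioNER_MedRoBERTa.nl_multilabel

The following script converts a string of fewer than 512 tokens into a list of span predictions.
```python
from transformers import pipeline

# Placeholder: replace with the Hugging Face Hub ID or local path of this model
model = "path/to/CardioNER_MedRoBERTa.nl_multilabel"

# "simple" aggregation merges B-/I- token predictions into entity spans
le_pipe = pipeline("ner",
                   model=model,
                   tokenizer=model,
                   aggregation_strategy="simple",
                   device=-1)  # -1 runs on CPU; use 0 for the first GPU

# Illustrative Dutch clinical sentence
SOME_TEXT = "Patiënt met atriumfibrilleren, gestart met metoprolol."
named_ents = le_pipe(SOME_TEXT)
```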
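The `aggregation_strategy="simple"` option tells the pipeline to merge token-level predictions into entity spans, and this is where the IOB scheme helps. The simplified sketch below illustrates the idea on a plain list of tags. It is not the transformers implementation, which works on tokens with character offsets and scores. `iob_to_spans` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB tags into (label, start, end) spans over token indices.

    `start` is inclusive and `end` is exclusive. An I- tag that does not
    continue a span of the same label starts a new span (lenient decoding).
    """
    spans = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close any open span, then open a new one at this token
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        elif tag.startswith("I-"):
            # Same label as the open span: extend it
            continue
        else:
            # "O" closes any open span
            if label is not None:
                spans.append((label, start, i))
            label = None
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


print(iob_to_spans(["O", "B-DISEASE", "I-DISEASE", "O", "B-MEDICATION"]))
# [('DISEASE', 1, 3), ('MEDICATION', 4, 5)]
```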
To process a string of arbitrary length, split it into sentences or paragraphs, e.g. with pysbd or spaCy's sentencizer, and run the span-classification pipeline over each chunk in turn.
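Here is a minimal sketch of that approach. It assumes that paragraphs are separated by blank lines. Each paragraph goes through the pipeline on its own. The `start`/`end` character offsets in each aggregated prediction are relative to that chunk, so they are shifted back to positions in the full text. `paragraph_chunks` and `predict_long_text` are hypothetical helpers. `ner_pipe` can be the `le_pipe` object created earlier.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple


def paragraph_chunks(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (offset, paragraph) pairs, splitting on blank lines.

    Each paragraph is an exact slice of `text`, so offsets stay valid.
    """
    start = 0
    for sep in re.finditer(r"\n\s*\n", text):
        if text[start:sep.start()].strip():
            yield start, text[start:sep.start()]
        start = sep.end()
    if text[start:].strip():
        yield start, text[start:]


def predict_long_text(text: str, ner_pipe: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run `ner_pipe` per paragraph and map span offsets back to `text`."""
    entities = []
    for offset, chunk in paragraph_chunks(text):
        for ent in ner_pipe(chunk):
            ent = dict(ent)  # copy so the pipeline's output is not mutated
            ent["start"] += offset
            ent["end"] += offset
            entities.append(ent)
    return entities
```

A paragraph longer than the 512-token context would still need further splitting, for example into sentences with pysbd or spaCy. `paragraph_chunks` can be replaced by such a splitter as long as it returns each chunk's offset.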
# Data description

CardioCCC: a manually annotated, parallel-language corpus for the clinical cardiology domain.

Over the 10-fold cross-validation, the multilabel metrics (mean, median and standard deviation across the ten folds) are:

| Metric | Mean | Median | Stdev |
|---|---|---|---|
| | eval_f1_B-DISEASE | 0.782 | 0.777 | 0.024 | |
| | eval_f1_B-MEDICATION | 0.898 | 0.905 | 0.036 | |
| | eval_f1_B-PROCEDURE | 0.788 | 0.79 | 0.027 | |
| | eval_f1_B-SYMPTOM | 0.73 | 0.73 | 0.018 | |
| | eval_f1_I-DISEASE | 0.776 | 0.781 | 0.022 | |
| | eval_f1_I-MEDICATION | 0.8 | 0.803 | 0.086 | |
| | eval_f1_I-PROCEDURE | 0.759 | 0.757 | 0.018 | |
| | eval_f1_I-SYMPTOM | 0.725 | 0.723 | 0.017 | |
| | eval_f1_O | 0.935 | 0.936 | 0.005 | |
| | eval_f1_macro | 0.799 | 0.799 | 0.018 | |
| | eval_f1_micro | 0.884 | 0.886 | 0.008 | |
| | eval_loss | 0.095 | 0.092 | 0.01 | |
| | eval_precision_B-DISEASE | 0.784 | 0.774 | 0.029 | |
| | eval_precision_B-MEDICATION | 0.907 | 0.917 | 0.035 | |
| | eval_precision_B-PROCEDURE | 0.791 | 0.795 | 0.031 | |
| | eval_precision_B-SYMPTOM | 0.721 | 0.72 | 0.017 | |
| | eval_precision_I-DISEASE | 0.79 | 0.79 | 0.025 | |
| | eval_precision_I-MEDICATION | 0.835 | 0.863 | 0.075 | |
| | eval_precision_I-PROCEDURE | 0.784 | 0.779 | 0.023 | |
| | eval_precision_I-SYMPTOM | 0.727 | 0.72 | 0.021 | |
| | eval_precision_O | 0.935 | 0.938 | 0.009 | |
| | eval_precision_macro | 0.808 | 0.81 | 0.015 | |
| | eval_precision_micro | 0.888 | 0.889 | 0.008 | |
| | eval_rauc_macro | 0.883 | 0.885 | 0.012 | |
| | eval_rauc_micro | 0.933 | 0.934 | 0.005 | |
| | eval_recall_B-DISEASE | 0.781 | 0.785 | 0.025 | |
| | eval_recall_B-MEDICATION | 0.889 | 0.893 | 0.039 | |
| | eval_recall_B-PROCEDURE | 0.785 | 0.783 | 0.025 | |
| | eval_recall_B-SYMPTOM | 0.739 | 0.74 | 0.023 | |
| | eval_recall_I-DISEASE | 0.763 | 0.774 | 0.028 | |
| | eval_recall_I-MEDICATION | 0.77 | 0.767 | 0.103 | |
| | eval_recall_I-PROCEDURE | 0.735 | 0.744 | 0.025 | |
| | eval_recall_I-SYMPTOM | 0.724 | 0.724 | 0.032 | |
| | eval_recall_O | 0.934 | 0.934 | 0.004 | |
| | eval_recall_macro | 0.791 | 0.795 | 0.022 | |
| | eval_recall_micro | 0.88 | 0.883 | 0.009 | |
| | eval_roc_auc_B-DISEASE | 0.888 | 0.889 | 0.013 | |
| | eval_roc_auc_B-MEDICATION | 0.944 | 0.946 | 0.019 | |
| | eval_roc_auc_B-PROCEDURE | 0.89 | 0.889 | 0.013 | |
| | eval_roc_auc_B-SYMPTOM | 0.866 | 0.867 | 0.011 | |
| | eval_roc_auc_I-DISEASE | 0.873 | 0.879 | 0.014 | |
| | eval_roc_auc_I-MEDICATION | 0.884 | 0.883 | 0.052 | |
| | eval_roc_auc_I-PROCEDURE | 0.862 | 0.866 | 0.013 | |
| | eval_roc_auc_I-SYMPTOM | 0.85 | 0.851 | 0.016 | |
| | eval_roc_auc_O | 0.887 | 0.889 | 0.008 | |
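Averaging is linear, so the mean of the per-fold macro scores equals the average of the per-label means. The table can therefore be cross-checked, up to rounding. The arithmetic below suggests that `eval_f1_macro` includes the `O` class. Averaging all nine per-label mean F1 scores gives 0.799, which matches the reported macro F1. Excluding `O` gives 0.782 instead.

```python
# Mean per-label F1 scores copied from the table above (10-fold CV)
per_label_f1 = {
    "B-DISEASE": 0.782, "B-MEDICATION": 0.898, "B-PROCEDURE": 0.788, "B-SYMPTOM": 0.73,
    "I-DISEASE": 0.776, "I-MEDICATION": 0.8, "I-PROCEDURE": 0.759, "I-SYMPTOM": 0.725,
    "O": 0.935,
}

# Macro average over all nine labels, including the "O" (outside) class
macro_with_o = sum(per_label_f1.values()) / len(per_label_f1)

# Macro average over the eight entity labels only
entity_only = [v for k, v in per_label_f1.items() if k != "O"]
macro_without_o = sum(entity_only) / len(entity_only)

print(f"{macro_with_o:.3f}")     # 0.799
print(f"{macro_without_o:.3f}")  # 0.782
```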
# Acknowledgement

This is part of the [DT4H project](https://www.datatools4heart.eu/).

# DOI and reference

For more details about training, evaluation and other scripts, see the CardioNER [GitHub repo](https://github.com/DataTools4Heart/CardioNER).
For more information on the background, see DataTools4Heart on [Hugging Face](https://huggingface.co/DT4H) and its [website](https://www.datatools4heart.eu/).