UMCU/CardioNER.nl_128

Token Classification

lexical semantic

span classification


UMCU committed on Jun 26, 2025

Commit 9e78623 · verified · 1 Parent(s): bf1bed2

Update README.md

Files changed (1)

README.md +6 -8


@@ -22,20 +22,19 @@ pipeline_tag: token-classification
 This is a UMCU/CardioBERTa.nl_clinical base model finetuned for span classification. For this model
 we used IOB-tagging. Using the IOB-tagging schema facilitates the aggregation of predictions
-over sequences. This specific model is trained on a batch of 240 span-labeled documents.
 ### Expected input and output
-The input should be a string with **Dutch** cardio clinical text.
-CardioNER.nl_128xtokenWindow is a muticlass span classification model.
-The classes that can be predicted are ['procedure,medication,diseasae,symptom'].
 #### Extracting span classification from CardioNER.nl_128xtokenWindow
-The following script converts a string of <512 tokens to a list of span predictions.
 ```python
 from transformers import pipeline
@@ -47,7 +46,7 @@ le_pipe = pipeline('ner',
 named_ents = le_pipe(SOME_TEXT)
 ```
-To process a string of arbitrary length you can split the string into sentences or paragraphs
 using e.g. pysbd or spacy(sentencizer) and iteratively parse the resulting list with the span-classification pipe.
 You can also use the `stride` option built into the transformers pipeline, although this is limited to non-overlapping strides, requires a FastTokenizer, and does not work for aggregation_strategy=None:
 ```python
@@ -56,7 +55,6 @@ named_ents = le_pipe(SOME_TEXT, stride=256)
 # Data description
 CardioCCC; manually labeled cardiology discharge letters; procedure, medication, disease, symptom

 This is a UMCU/CardioBERTa.nl_clinical base model finetuned for span classification. For this model
 we used IOB-tagging. Using the IOB-tagging schema facilitates the aggregation of predictions
+over sequences. This specific model is trained on a batch of about 500 span-labeled documents.
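
To illustrate why IOB tags make aggregation over a sequence straightforward, here is a minimal sketch that merges per-token tags into spans. The tag strings (`B-medication`, `I-disease`, ...) and the helper name `iob_to_spans` are assumptions for illustration only; the label names the model actually emits may differ, and the `transformers` pipeline performs its own grouping when an aggregation strategy is set.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Merge per-token IOB tags into (label, start, end) spans; end is exclusive."""
    spans = []
    label, start = None, 0
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # An I- tag with the same label continues the open span.
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        # A B- tag (or a stray I- tag with no open span) starts a new span.
        if prefix in ("B", "I") and label is None:
            label, start = name, i
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tag sequence for five tokens.
example = ["O", "B-medication", "I-medication", "O", "B-disease"]
print(iob_to_spans(example))  # [('medication', 1, 3), ('disease', 4, 5)]
```

The `B-` prefix is what lets two adjacent entities of the same class stay separate, which a plain per-token class label could not express.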
 ### Expected input and output
+The input should be a string with **Dutch** clinical text related to **cardiology**.
+CardioNER.nl_128xtokenWindow is a multiclass span classification model.
+The classes that can be predicted are ['procedure', 'medication', 'disease', 'symptom'].
 #### Extracting span classification from CardioNER.nl_128xtokenWindow
+The following script converts a string of <128 tokens to a list of span predictions.
 ```python
 from transformers import pipeline
 named_ents = le_pipe(SOME_TEXT)
 ```
+To process a string of *arbitrary length* you can split the string into sentences or paragraphs
 using e.g. pysbd or spacy(sentencizer) and iteratively parse the resulting list with the span-classification pipe.
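
The split-and-iterate approach can be sketched roughly as follows. This is not the authors' code: `split_sentences` is a naive regex stand-in for pysbd or spaCy's sentencizer (it will mis-split on abbreviations), `ner_long_text` is a hypothetical helper name, and `ner_fn` is assumed to behave like the pipeline above with an aggregation strategy set, i.e. it returns a list of dicts with integer `start` and `end` character offsets.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[Dict]:
    """Naive splitter: each sentence together with its character offset in `text`."""
    return [
        {"text": m.group(), "offset": m.start()}
        for m in re.finditer(r"[^.!?\n]+[.!?]?", text)
        if m.group().strip()
    ]


def ner_long_text(text: str, ner_fn: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run `ner_fn` per sentence and map span offsets back to the full text."""
    entities = []
    for chunk in split_sentences(text):
        for ent in ner_fn(chunk["text"]):
            ent = dict(ent)  # copy so the callable's own output is not mutated
            ent["start"] += chunk["offset"]
            ent["end"] += chunk["offset"]
            entities.append(ent)
    return entities
```

With the pipeline from the snippet above this would be called as `ner_long_text(SOME_TEXT, le_pipe)`. A single sentence can still exceed the 128-token window; the sketch does not handle that case.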
 You can also use the `stride` option built into the transformers pipeline, although this is limited to non-overlapping strides, requires a FastTokenizer, and does not work for aggregation_strategy=None:
 ```python
 # Data description
 CardioCCC; manually labeled cardiology discharge letters; procedure, medication, disease, symptom