UMCU
/

CardioNER.nl_128

UMCU committed on Jun 26, 2025

Commit bf1bed2 · verified · 1 Parent(s): 04e7273

Upload README.md with huggingface_hub

Files changed (1)

README.md +76 -0

README.md ADDED

	@@ -0,0 +1,76 @@

+---
+id: CardioNER.nl_128xtokenWindow
+name: CardioNER.nl_128xtokenWindow
+description: CardioBERTa.nl_clinical finetuned for multilabel NER task with tokenwindow
+  of 128
+license: gpl-3.0
+language: nl
+tags:
+- lexical semantic
+- span classification
+- science
+- biology
+- clinical ner
+- biomedical
+- ner,medical
+- bionlp
+base_model: UMCU/CardioBERTa.nl_clinical
+pipeline_tag: token-classification
+---
+# Model Card for CardioNER.nl_128xtokenWindow
+This is a UMCU/CardioBERTa.nl_clinical base model fine-tuned for span classification. For this model
+we used IOB tagging. The IOB tagging schema facilitates the aggregation of predictions
+over sequences. This specific model is trained on a batch of 240 span-labeled documents.
+### Expected input and output
+The input should be a string with **Dutch** cardio clinical text.
+CardioNER.nl_128xtokenWindow is a multiclass span classification model.
+The classes that can be predicted are ['procedure', 'medication', 'disease', 'symptom'].
+#### Extracting span classification from CardioNER.nl_128xtokenWindow
+The following script converts a string of <512 tokens to a list of span predictions.
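+```python
+from transformers import pipeline
+
+# Assumption: the Hub id of this repository; replace with a local path if needed.
+model = "UMCU/CardioNER.nl_128"
+le_pipe = pipeline('ner',
+                    model=model,
+                    tokenizer=model, aggregation_strategy="simple",
+                    device=-1)  # device=-1 runs on CPU
+named_ents = le_pipe(SOME_TEXT)  # SOME_TEXT: a string of Dutch clinical text
+```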
+To process a string of arbitrary length you can split the string into sentences or paragraphs
+using e.g. pysbd or spaCy (sentencizer) and iteratively parse the resulting list with the span-classification pipe.
+You can also use the striding built into the transformers pipeline, although this is limited to non-overlapping strides, requires a fast tokenizer, and does not work with aggregation_strategy=None:
+```python
+named_ents = le_pipe(SOME_TEXT, stride=256)
+```
+# Data description
+CardioCCC: manually labeled cardiology discharge letters, annotated with the classes procedure, medication, disease and symptom.
+# Acknowledgement
+This is part of the [DT4H project](https://www.datatools4heart.eu/).
+# DOI and reference
+For more details about training/eval and other scripts, see the CardioNER [GitHub repo](https://github.com/DataTools4Heart/CardioNER).
+For more information on the background, see DataTools4Heart on [Hugging Face](https://huggingface.co/DT4H) or the project [website](https://www.datatools4heart.eu/).
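
The README suggests splitting long text into sentences or paragraphs and running the span-classification pipe on each piece. The following is a minimal sketch of that idea, not the authors' implementation. The helper names `split_with_offsets` and `predict_long_text` are hypothetical, and the regex splitter is a naive stand-in for pysbd or spaCy's sentencizer. It assumes the pipe returns dictionaries with integer `start` and `end` character offsets, as the transformers token-classification pipeline does with a fast tokenizer and `aggregation_strategy="simple"`. The sketch shifts those offsets so they point into the full document rather than into the individual chunk.

```python
import re
from typing import Callable, Dict, List


def split_with_offsets(text: str) -> List[Dict]:
    """Naive splitter (hypothetical helper) that records each chunk's offset.

    Splits on whitespace that follows '.', '!' or '?', and on newlines.
    A real setup would more likely use pysbd or spaCy's sentencizer.
    """
    chunks = []
    start = 0
    for match in re.finditer(r"(?<=[.!?])\s+|\n+", text):
        end = match.start()
        if text[start:end].strip():
            chunks.append({"text": text[start:end], "start": start})
        start = match.end()
    # Keep whatever follows the last separator.
    if text[start:].strip():
        chunks.append({"text": text[start:], "start": start})
    return chunks


def predict_long_text(text: str, ner_fn: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run ner_fn per chunk and map span offsets back to the full text.

    ner_fn is any callable that takes a string and returns a list of dicts
    with integer 'start' and 'end' keys (assumption, see the note above).
    """
    entities = []
    for chunk in split_with_offsets(text):
        for ent in ner_fn(chunk["text"]):
            shifted = dict(ent)  # copy so the original prediction is untouched
            shifted["start"] = ent["start"] + chunk["start"]
            shifted["end"] = ent["end"] + chunk["start"]
            entities.append(shifted)
    return entities
```

With the pipeline from the README, a call would look like `predict_long_text(SOME_TEXT, le_pipe)`. Splitting on sentence boundaries keeps each chunk short, but a span that crosses a boundary will be cut, so paragraph-level splitting may be preferable when the chunks still fit the model's input window.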