UMCU committed on
Commit
bf1bed2
·
verified ·
1 Parent(s): 04e7273

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
1
+ ---
2
+ id: CardioNER.nl_128xtokenWindow
3
+ name: CardioNER.nl_128xtokenWindow
4
+ description: CardioBERTa.nl_clinical finetuned for multilabel NER task with tokenwindow
5
+ of 128
6
+ license: gpl-3.0
7
+ language: nl
8
+ tags:
9
+ - lexical semantic
10
+ - span classification
11
+ - science
12
+ - biology
13
+ - clinical ner
14
+ - biomedical
15
+ - ner,medical
16
+ - bionlp
17
+ base_model: UMCU/CardioBERTa.nl_clinical
18
+ pipeline_tag: token-classification
19
+ ---
20
+
21
+ # Model Card for CardioNER.nl_128xtokenWindow
22
+
23
+
24
+
25
+
26
+ This is a UMCU/CardioBERTa.nl_clinical base model fine-tuned for span classification. The model
27
+ uses IOB tagging; the IOB tagging schema facilitates the aggregation of token-level predictions
28
+ over sequences. This specific model was trained on a batch of 240 span-labeled documents.
29
+
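To illustrate why IOB tags make aggregation straightforward, here is a minimal sketch that merges per-token IOB tags into labeled spans. The function name `iob_to_spans`, the example tokens, and the tag spellings (`B-disease`, `B-medication`) are hypothetical and chosen for illustration; this is not the aggregation code used by this model or by the transformers pipeline.

```python
def iob_to_spans(tokens, tags):
    """Merge per-token IOB tags into (label, text) spans."""
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # Close any open span, then open a new one.
            # A stray "I-" tag without a matching "B-" is treated leniently as a start.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open span.
            current_tokens.append(token)
        else:
            # "O" tag: close any open span.
            if current_tokens:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical example input
tokens = ["Patient", "kreeg", "acuut", "myocardinfarct", "en", "metoprolol"]
tags = ["O", "O", "B-disease", "I-disease", "O", "B-medication"]
spans = iob_to_spans(tokens, tags)
```

Because every span starts with a `B-` tag, adjacent entities of the same class stay separate, which a plain per-token class label could not guarantee.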
30
+ ### Expected input and output
31
+ The input should be a string of **Dutch** clinical cardiology text.
32
+
33
+ CardioNER.nl_128xtokenWindow is a multiclass span classification model.
34
+ The classes that can be predicted are ['procedure', 'medication', 'disease', 'symptom'].
35
+
36
+ #### Extracting span classification from CardioNER.nl_128xtokenWindow
37
+
38
+ The following script converts a string of <512 tokens to a list of span predictions.
39
+ ```python
40
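+ from transformers import pipeline
41
+ model = "UMCU/CardioNER.nl_128xtokenWindow"  # assumed Hub repo id; adjust if it differs
42
+ le_pipe = pipeline('ner',
43
+                    model=model,
44
+                    tokenizer=model, aggregation_strategy="simple",
45
+                    device=-1)  # -1 runs on CPU
46
+
47
+ named_ents = le_pipe(SOME_TEXT)  # SOME_TEXT: your Dutch clinical text (placeholder)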
48
+ ```
49
+
50
+ To process a string of arbitrary length, you can split it into sentences or paragraphs
51
+ using e.g. pysbd or spaCy (sentencizer) and run the span-classification pipe over the resulting list.
52
+ You can also use the striding built into the transformers pipeline, where `stride` sets the number of tokens by which consecutive chunks overlap. This requires a fast tokenizer and does not work with `aggregation_strategy="none"`:
53
+ ```python
54
+ named_ents = le_pipe(SOME_TEXT, stride=256)
55
+ ```
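The sentence-by-sentence approach mentioned above can be sketched as follows. This is a minimal illustration, not the project's own code: `split_sentences` and `predict_long_text` are hypothetical helper names, and the regex splitter is a naive stand-in for pysbd or spaCy, which handle abbreviations and clinical shorthand far better. The sketch assumes the pipe returns dicts with `start` and `end` character offsets, as the transformers pipeline does with `aggregation_strategy="simple"`.

```python
import re


def split_sentences(text):
    """Naive splitter: yield (start_offset, sentence) pairs.

    Stand-in for pysbd or spaCy's sentencizer; it splits on ., ! and ?
    and on newlines, and keeps each sentence's offset in the full text.
    """
    for match in re.finditer(r"[^.!?\n]+[.!?]*", text):
        sentence = match.group()
        if sentence.strip():
            yield match.start(), sentence


def predict_long_text(ner_pipe, text):
    """Run ner_pipe on each sentence and shift spans to document offsets."""
    entities = []
    for offset, sentence in split_sentences(text):
        for ent in ner_pipe(sentence):
            shifted = dict(ent)  # copy so the pipe's output is not mutated
            shifted["start"] = ent["start"] + offset
            shifted["end"] = ent["end"] + offset
            entities.append(shifted)
    return entities
```

With the pipeline defined earlier, this would be called as `predict_long_text(le_pipe, SOME_TEXT)`. Shifting the offsets keeps every span aligned with the original document, so `text[ent["start"]:ent["end"]]` still returns the entity text.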
56
+
57
+
58
+
59
+
60
+ # Data description
61
+
62
+ CardioCCC: manually labeled cardiology discharge letters, annotated with procedure, medication, disease, and symptom spans.
63
+
64
+
65
+ # Acknowledgement
66
+
67
+ This is part of the [DT4H project](https://www.datatools4heart.eu/).
68
+
69
+ # DOI and reference
70
+
71
+
72
+
73
+ For more details about training/eval and other scripts, see the CardioNER [GitHub repo](https://github.com/DataTools4Heart/CardioNER).
74
+ For more information on the background, see the DataTools4Heart [Hugging Face page](https://huggingface.co/DT4H) or [website](https://www.datatools4heart.eu/).
75
+
76
+