UMCU committed on
Commit 9e78623 · verified · 1 Parent(s): bf1bed2

Update README.md

Files changed (1)
  1. README.md +6 -8
README.md CHANGED
@@ -22,20 +22,19 @@ pipeline_tag: token-classification
 
 
 
-
  This a UMCU/CardioBERTa.nl_clinical base model finetuned for span classification. For this model
  we used IOB-tagging. Using the IOB-tagging schema facilitates the aggregation of predictions
- over sequences. This specific model is trained on a batch of 240 span-labeled documents.
+ over sequences. This specific model is trained on a batch of about 500 span-labeled documents.
 
  ### Expected input and output
- The input should be a string with **Dutch** cardio clinical text.
+ The input should be a string with **Dutch** clinical text related to **cardiology**
 
- CardioNER.nl_128xtokenWindow is a muticlass span classification model.
- The classes that can be predicted are ['procedure,medication,diseasae,symptom'].
+ CardioNER.nl_128xtokenWindow is a multiclass span classification model.
+ The classes that can be predicted are ['procedure,medication,disease,symptom'].
 
  #### Extracting span classification from CardioNER.nl_128xtokenWindow
 
- The following script converts a string of <512 tokens to a list of span predictions.
+ The following script converts a string of <128 tokens to a list of span predictions.
  ```python
  from transformers import pipeline
 
@@ -47,7 +46,7 @@ le_pipe = pipeline('ner',
  named_ents = le_pipe(SOME_TEXT)
  ```
 
- To process a string of arbitrary length you can split the string into sentences or paragraphs
+ To process a string of *arbitrary length* you can split the string into sentences or paragraphs
  using e.g. pysbd or spacy(sentencizer) and iteratively parse the list of with the span-classification pipe.
  You can also use the strider built in the transformer pipeline, although this is limited to non-overlapping strides plus it requires a FastTokenizer and it does not work for aggregation_strategy=None;
  ```python
@@ -56,7 +55,6 @@ named_ents = le_pipe(SOME_TEXT, stride=256)
 
 
 
-
  # Data description
 
  CardioCCC; manually labeled cardiology discharge letters; procedure, medication, disease, symptom
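
Note on the README's advice for strings of arbitrary length: the split-and-iterate approach it describes could look roughly like the sketch below. This is an illustration under stated assumptions, not code from the repository. The helper names `split_with_offsets` and `predict_long_text` are hypothetical, the regex splitter is a naive stand-in for pysbd or the spaCy sentencizer (it will break on abbreviations and decimals), and each prediction is assumed to be a dict with integer `start` and `end` character offsets, as the `transformers` token-classification pipeline returns when an aggregation strategy is set.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple


def split_with_offsets(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (start_offset, chunk) pairs.

    Naive stand-in for a real sentence splitter such as pysbd or spaCy's
    sentencizer: a chunk is a run of text up to and including ., ! or ?.
    """
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        chunk = match.group()
        if chunk.strip():  # skip whitespace-only chunks
            yield match.start(), chunk


def predict_long_text(text: str, ner: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run `ner` on each chunk and shift offsets back to document coordinates."""
    spans: List[Dict] = []
    for offset, chunk in split_with_offsets(text):
        for ent in ner(chunk):
            ent = dict(ent)          # copy so the callable's output is not mutated
            ent["start"] += offset   # chunk-relative -> document-relative
            ent["end"] += offset
            spans.append(ent)
    return spans
```

With the pipeline from the README this would be called as `predict_long_text(SOME_TEXT, le_pipe)`. Chunks still need to stay under the model's token window, and entities that straddle a chunk boundary are missed by this approach.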