Update README.md
README.md

This is a UMCU/CardioBERTa.nl_clinical base model fine-tuned for span classification. For this model
we used IOB tagging, in which each token is labeled as the beginning (B) of a span, inside (I) a span, or outside (O) any span.
This schema facilitates the aggregation of token-level predictions into spans over sequences.
This specific model is trained on a batch of about 500 span-labeled documents.

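To make that aggregation concrete, the following minimal sketch merges a list of per-token IOB tags into `(label, start, end)` token spans. The function name `iob_to_spans` and label strings such as `B-medication` are hypothetical: the model's actual label names live in its configuration and are not listed here. In practice the transformers `pipeline` performs a comparable grouping itself when `aggregation_strategy` is set to something other than `"none"`, so this only illustrates the idea.

```python
from typing import List, Optional, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Merge per-token IOB tags into (label, start, end) spans; `end` is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None  # label of the currently open span, if any
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any span: close the open span, if there is one
            if label is not None:
                spans.append((label, start, i))
            label = None
        elif tag.startswith("B-") or tag[2:] != label:
            # A new span begins. A stray I- tag whose label differs from the
            # open span (or with no open span) is treated like a B- tag.
            if label is not None:
                spans.append((label, start, i))
            label, start = tag[2:], i
        # Otherwise the tag is I-<label> and simply extends the open span
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical tag sequence for a seven-token sentence
print(iob_to_spans(["O", "B-medication", "I-medication", "O", "B-symptom", "B-disease", "I-disease"]))
# [('medication', 1, 3), ('symptom', 4, 5), ('disease', 5, 7)]
```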
### Expected input and output

The input should be a string of **Dutch** clinical text related to **cardiology**.

CardioNER.nl_128xtokenWindow is a multiclass span classification model.
The classes that can be predicted are `procedure`, `medication`, `disease` and `symptom`.

#### Extracting span classification from CardioNER.nl_128xtokenWindow

The following script converts a string of fewer than 128 tokens into a list of span predictions.
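```python
from transformers import pipeline

# The lines that create the pipeline are collapsed in this diff view;
# they begin with: le_pipe = pipeline('ner',
# (the remaining arguments are not shown)

# SOME_TEXT is a placeholder for the Dutch input string
named_ents = le_pipe(SOME_TEXT)
```
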
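The script above expects fewer than 128 tokens. One way to handle longer text is to run the pipe on each sentence separately and shift every predicted span's character offsets back into the original string, as in the minimal sketch below. The function `ner_per_sentence` is hypothetical, and its regular-expression splitter is only a crude stand-in for a proper sentence segmenter such as pysbd or the spaCy sentencizer. It assumes each prediction is a dict with integer `start` and `end` character offsets, as the transformers NER pipeline typically returns.

```python
import re
from typing import Callable, Dict, List


def ner_per_sentence(text: str, pipe: Callable[[str], List[Dict]]) -> List[Dict]:
    """Run `pipe` on each sentence of `text` and map span offsets back to `text`."""
    results: List[Dict] = []
    # Naive splitter: a run of non-terminal characters plus any trailing
    # '.', '!' or '?'. It stands in for pysbd or a spaCy sentencizer and will
    # wrongly split on abbreviations and decimals such as "2.5 mg".
    for match in re.finditer(r"[^.!?]+[.!?]*", text):
        sentence = match.group()
        if not sentence.strip():
            continue  # skip whitespace-only fragments
        for ent in pipe(sentence):
            shifted = dict(ent)  # copy so the pipe's own output is not mutated
            shifted["start"] += match.start()
            shifted["end"] += match.start()
            results.append(shifted)
    return results
```

With the pipeline from the previous example this would be called as `ner_per_sentence(SOME_TEXT, le_pipe)`; a sentence that is itself longer than 128 tokens would still need to be split further.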
To process a string of *arbitrary length*, you can split it into sentences or paragraphs
using e.g. pysbd or the spaCy sentencizer, and iteratively pass each chunk to the span-classification pipe.
You can also use the striding built into the transformers pipeline, although this is limited to non-overlapping strides, requires a fast tokenizer, and does not work with `aggregation_strategy=None`:
```python
# Let the pipeline split the input into windows itself (requires a fast tokenizer)
named_ents = le_pipe(SOME_TEXT, stride=256)
```

| 58 |
# Data description
|
| 59 |
|
| 60 |
CardioCCC; manually labeled cardiology discharge letters; procedure, medication, disease, symptom
|