update README.md
README.md
CHANGED
|
@@ -2,6 +2,7 @@
|
|
| 2 |
language: nl
|
| 3 |
license: mit
|
| 4 |
pipeline_tag: text-classification
|
| 5 |
---
|
| 6 |
|
| 7 |
# A-PROOF ICF-domains Classification
|
|
@@ -15,27 +16,85 @@ The model can detect 9 domains, which were chosen due to their relevance to reco
|
|
| 15 |
|
| 16 |
ICF code | Domain | name in repo
|
| 17 |
---|---|---
|
| 18 |
-
b1300 | Energy level | ENR
|
| 19 |
-
b140 | Attention functions | ATT
|
| 20 |
-
b152 | Emotional functions | STM
|
| 21 |
b440 | Respiration functions | ADM
|
| 22 |
b455 | Exercise tolerance functions | INS
|
| 23 |
b530 | Weight maintenance functions | MBW
|
| 24 |
-
|
| 25 |
-
d550 | Eating | ETN
|
| 26 |
-
d840-d859 | Work and employment | BER
|
| 27 |
|
| 28 |
## Intended uses and limitations
|
| 29 |
-
|
| 30 |
|
| 31 |
## Training data
|
| 32 |
-
|
| 33 |
|
| 34 |
## Training procedure
|
| 35 |
-
|
| 36 |
|
| 37 |
## Evaluation results
|
| 38 |
-
|
| 39 |
|
| 40 |
## Authors and references
|
| 41 |
TBD
|
|
|

language: nl
license: mit
pipeline_tag: text-classification
inference: false
---

# A-PROOF ICF-domains Classification

ICF code | Domain | Name in repo
---|---|---
b440 | Respiration functions | ADM
b140 | Attention functions | ATT
d840-d859 | Work and employment | BER
b1300 | Energy level | ENR
d550 | Eating | ETN
d450 | Walking | FAC
b455 | Exercise tolerance functions | INS
b530 | Weight maintenance functions | MBW
b152 | Emotional functions | STM

## Intended uses and limitations
- The model was fine-tuned (trained, validated and tested) on medical records from the Amsterdam UMC (the two academic medical centers of Amsterdam). It may perform differently on text from other hospitals or from non-hospital sources (e.g. GP records).
- The model was fine-tuned with the [Simple Transformers](https://simpletransformers.ai/) library. This library is built on Transformers, but the model cannot be used directly with the Transformers `pipeline` and classes; doing so produces incorrect outputs. For this reason, the inference API on this page is disabled.

## How to use
To generate predictions with the model, use the [Simple Transformers](https://simpletransformers.ai/) library:
```
from simpletransformers.classification import MultiLabelClassificationModel

# Load the fine-tuned model; use_cuda=False runs inference on the CPU.
model = MultiLabelClassificationModel(
    'roberta',
    'CLTL/icf-domains',
    use_cuda=False,
)

# predict() takes a list of sentences and returns binary labels and raw scores.
example = 'Nu sinds 5-6 dagen progressieve benauwdheidsklachten (bij korte stukken lopen al kortademig), terwijl dit eerder niet zo was.'
predictions, raw_outputs = model.predict([example])
```
The predictions look like this:
```
[[1, 0, 0, 0, 0, 1, 1, 0, 0]]
```
The positions in each prediction vector correspond to the following labels:
```
[ADM, ATT, BER, ENR, ETN, FAC, INS, MBW, STM]
```
In other words, this prediction assigns the labels ADM, FAC and INS to the example sentence.

The raw outputs look like this:
```
[[0.51907885 0.00268032 0.0030862 0.03066113 0.00616694 0.64720929
  0.67348498 0.0118863 0.0046311 ]]
```
For this model, the threshold at which the prediction for a label flips from 0 to 1 is **0.5**.
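
The label order and the 0.5 threshold described above are enough to turn raw scores into label names. The following is a minimal sketch in plain Python, not part of the original model card: the helper name `decode_labels` is hypothetical, and it assumes that a score at or above the threshold counts as a positive prediction.

```python
# Label order as documented in this model card.
LABELS = ["ADM", "ATT", "BER", "ENR", "ETN", "FAC", "INS", "MBW", "STM"]
THRESHOLD = 0.5  # threshold stated in this model card


def decode_labels(scores, labels=LABELS, threshold=THRESHOLD):
    """Return the names of the labels whose score reaches the threshold.

    Hypothetical helper; assumes score >= threshold means "label assigned".
    """
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    return [label for label, score in zip(labels, scores) if score >= threshold]


# Raw scores copied from the example output above.
raw_outputs = [[0.51907885, 0.00268032, 0.0030862, 0.03066113, 0.00616694,
                0.64720929, 0.67348498, 0.0118863, 0.0046311]]

decoded = [decode_labels(row) for row in raw_outputs]
print(decoded)  # [['ADM', 'FAC', 'INS']]
```

The same helper also works on the binary `predictions`, since 1 is above the threshold and 0 is below it.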

## Training data
- The training data consists of Dutch clinical notes from medical records of the Amsterdam UMC. Due to privacy constraints, the data cannot be released.
- The annotation guidelines used for the project can be found [here](https://github.com/cltl/a-proof-zonmw/tree/main/resources/annotation_guidelines).

## Training procedure
The default training parameters of Simple Transformers were used, including:
- Optimizer: AdamW
- Learning rate: 4e-5
- Num train epochs: 1
- Train batch size: 8
- Threshold: 0.5
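
For reference, the values listed above can be written as a Simple Transformers-style argument dictionary. This is only a sketch: the variable name `train_args` is illustrative, the key names are assumed to match the Simple Transformers model arguments, and the original training script is not shown here.

```python
# Training settings listed in this model card, as a plain dictionary.
# Assumption: the keys mirror the Simple Transformers argument names.
train_args = {
    "optimizer": "AdamW",
    "learning_rate": 4e-5,
    "num_train_epochs": 1,
    "train_batch_size": 8,
    "threshold": 0.5,
}

for name, value in train_args.items():
    print(f"{name}: {value}")
```

A dictionary like this could be supplied through the `args` parameter when constructing a `MultiLabelClassificationModel`.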

## Evaluation results
The evaluation is done at the sentence level (the classification unit) and at the note level (the aggregated unit, which is the meaningful one for healthcare professionals).

### Sentence-level
| | ADM | ATT | BER | ENR | ETN | FAC | INS | MBW | STM
|---|---|---|---|---|---|---|---|---|---
| precision | 0.98 | 0.98 | 0.56 | 0.96 | 0.92 | 0.84 | 0.89 | 0.79 | 0.70
| recall | 0.49 | 0.41 | 0.29 | 0.57 | 0.49 | 0.71 | 0.26 | 0.62 | 0.75
| F1-score | 0.66 | 0.58 | 0.35 | 0.72 | 0.63 | 0.76 | 0.41 | 0.70 | 0.72
| support | 775 | 39 | 54 | 160 | 382 | 253 | 287 | 125 | 181
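
The F1-score is the harmonic mean of precision and recall. As a quick illustration, the sketch below recomputes it for the ATT column from the rounded values in the table. Because the inputs are rounded, recomputed values for other columns will not always match the reported ones exactly. The function name `f1_score` is illustrative and is not taken from the project code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Sentence-level ATT column: precision 0.98, recall 0.41.
att_f1 = f1_score(0.98, 0.41)
print(round(att_f1, 2))  # 0.58
```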

### Note-level
| | ADM | ATT | BER | ENR | ETN | FAC | INS | MBW | STM
|---|---|---|---|---|---|---|---|---|---
| precision | 1.0 | 1.0 | 0.66 | 0.96 | 0.95 | 0.84 | 0.95 | 0.87 | 0.80
| recall | 0.89 | 0.56 | 0.44 | 0.70 | 0.72 | 0.89 | 0.46 | 0.87 | 0.87
| F1-score | 0.94 | 0.71 | 0.50 | 0.81 | 0.82 | 0.86 | 0.61 | 0.87 | 0.84
| support | 231 | 27 | 34 | 92 | 165 | 95 | 116 | 64 | 94
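
The model card does not spell out how sentence-level predictions are aggregated into a note-level prediction. One plausible rule, shown here purely as an assumption, is that a note receives a label if at least one of its sentences receives it. The function name `aggregate_note` and the example predictions are hypothetical.

```python
def aggregate_note(sentence_predictions):
    """Combine per-sentence multi-label vectors into one note-level vector.

    Assumed rule (not confirmed by the model card): a label is assigned to
    the note if any sentence in the note has that label.
    """
    return [int(any(column)) for column in zip(*sentence_predictions)]


# Hypothetical predictions for a note with three sentences,
# in the order [ADM, ATT, BER, ENR, ETN, FAC, INS, MBW, STM].
note_sentences = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

note_prediction = aggregate_note(note_sentences)
print(note_prediction)  # [1, 0, 0, 0, 0, 1, 1, 0, 0]
```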

## Authors and references
### Authors
Jenia Kim, Piek Vossen

### References
TBD