CLTL committed on commit 38fbd00 (verified) · 1 parent: f97ce29

Update README.md

Files changed (1): README.md (+99 −5)
---
language: nl
license: mit
pipeline_tag: text-classification
inference: false
---

# A-PROOF ICF-domains Classification

## Description

A fine-tuned multi-label classification model that detects 17 [WHO-ICF](https://www.who.int/standards/classifications/international-classification-of-functioning-disability-and-health) domains in Dutch clinical text.
The model is based on a pre-trained Dutch medical language model ([link to be added]()), a RoBERTa model trained from scratch on clinical notes of the Amsterdam UMC.
## ICF domains

The model can detect 17 categories, which were chosen for their relevance to recovery from COVID-19:

ICF code | Domain | Name in repo
---|---|---
b1300 | Energy level | ENR
b140 | Attention functions | ATT
b152 | Emotional functions | STM
b440 | Respiration functions | ADM
b455 | Exercise tolerance functions | INS
b530 | Weight maintenance functions | MBW
d450 | Walking | FAC
d550 | Eating | ETN
d840-d859 | Work and employment | BER
b280 | Sensations of pain | SOP
b134 | Sleep functions | SLP
d760 | Family relationships | FML
b164 | Higher-level cognitive functions | HLC
d465 | Moving around using equipment | MAE
d410 | Changing basic body position | CBP
b230 | Hearing functions | HRN
d240 | Handling stress and other psychological demands | HSP

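For programmatic use, the table above can be captured as a small lookup table. This mapping is only an illustrative sketch transcribed from the table; it is not shipped with the model.

```python
# Sketch: repo label name -> (ICF code, domain), transcribed from the table above.
ICF_DOMAINS = {
    "ENR": ("b1300", "Energy level"),
    "ATT": ("b140", "Attention functions"),
    "STM": ("b152", "Emotional functions"),
    "ADM": ("b440", "Respiration functions"),
    "INS": ("b455", "Exercise tolerance functions"),
    "MBW": ("b530", "Weight maintenance functions"),
    "FAC": ("d450", "Walking"),
    "ETN": ("d550", "Eating"),
    "BER": ("d840-d859", "Work and employment"),
    "SOP": ("b280", "Sensations of pain"),
    "SLP": ("b134", "Sleep functions"),
    "FML": ("d760", "Family relationships"),
    "HLC": ("b164", "Higher-level cognitive functions"),
    "MAE": ("d465", "Moving around using equipment"),
    "CBP": ("d410", "Changing basic body position"),
    "HRN": ("b230", "Hearing functions"),
    "HSP": ("d240", "Handling stress and other psychological demands"),
}

print(len(ICF_DOMAINS))  # 17 domains
```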
## Intended uses and limitations

- The model was fine-tuned (trained, validated and tested) on medical records from the Amsterdam UMC (the two academic medical centers of Amsterdam). It might perform differently on text from a different hospital or from non-hospital sources (e.g. GP records).
- The model was fine-tuned with the [Simple Transformers](https://simpletransformers.ai/) library. This library is based on Transformers, but the model cannot be used directly with the Transformers `pipeline` and classes; doing so would generate incorrect outputs. For this reason, the hosted inference API on this page is disabled.

## How to use

To generate predictions with the model, use the [Simple Transformers](https://simpletransformers.ai/) library:

```python
from simpletransformers.classification import MultiLabelClassificationModel

model = MultiLabelClassificationModel(
    'roberta',
    'CLTL/icf-domains',
    use_cuda=False,
)

# "For 5-6 days now, progressive shortness-of-breath complaints (already
# short of breath when walking short distances), which was not the case before."
example = 'Nu sinds 5-6 dagen progressieve benauwdheidsklachten (bij korte stukken lopen al kortademig), terwijl dit eerder niet zo was.'
predictions, raw_outputs = model.predict([example])
```
The predictions look like this:

```
[[0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
```

The indices of the multi-label correspond to:

```
[ENR-b1300, ATT-b140, STM-b152, ADM-b440, INS-b455, MBW-b530, FAC-d450, ETN-d550, BER-d840-d859, SOP-b280, SLP-b134, FML-d760, HLC-b164, MAE-d465, CBP-d410, HRN-b230, HSP-d240]
```

In other words, the prediction above corresponds to assigning the labels ADM, INS and FAC to the example sentence.
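The index-to-label mapping can be applied with a few lines of plain Python. This is an illustrative sketch (not part of the library); the `LABELS` list follows the index order given above.

```python
# Sketch: turn a 0/1 prediction vector into label names.
# LABELS follows the index order of the multi-label output.
LABELS = [
    "ENR", "ATT", "STM", "ADM", "INS", "MBW", "FAC", "ETN", "BER",
    "SOP", "SLP", "FML", "HLC", "MAE", "CBP", "HRN", "HSP",
]

def decode(prediction):
    """Return the names of the labels set to 1 in a prediction vector."""
    return [name for name, flag in zip(LABELS, prediction) if flag == 1]

print(decode([0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
# prints ['ADM', 'INS', 'FAC']
```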

The raw outputs look like this:

```
[[0.51907885 0.00268032 0.0030862  0.03066113 0.00616694 0.64720929
  0.67348498 0.0118863  0.0046311 ]]
```

For this model, the threshold at which the prediction for a label flips from 0 to 1 is **0.5**.
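The thresholding step can be sketched in plain Python. This is only an illustration, assuming each raw output is a per-label score in the same order as the index list above:

```python
# Sketch: scores at or above the threshold become 1, the rest 0.
THRESHOLD = 0.5

def apply_threshold(raw_scores, threshold=THRESHOLD):
    return [1 if score >= threshold else 0 for score in raw_scores]

# Hypothetical per-label scores, for illustration only.
print(apply_threshold([0.519, 0.003, 0.031, 0.647, 0.012]))
# prints [1, 0, 0, 1, 0]
```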
## Training data

- The training data consists of clinical notes from medical records (in Dutch) of the Amsterdam UMC. Due to privacy constraints, the data cannot be released.
- The annotation guidelines used for the project can be found [here](https://github.com/cltl/a-proof-zonmw/tree/main/resources/annotation_guidelines).

## Training procedure

The default training parameters of Simple Transformers were used, including:

- Optimizer: AdamW
- Learning rate: 4e-5
- Number of training epochs: 1
- Train batch size: 8
- Threshold: 0.5

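As a hypothetical sketch, the defaults above correspond to a Simple Transformers args dict along these lines (the field names follow the library's model args; `"AdamW"` is the library's default optimizer):

```python
# Sketch only: the Simple Transformers defaults listed above as an args dict.
model_args = {
    "optimizer": "AdamW",
    "learning_rate": 4e-5,
    "num_train_epochs": 1,
    "train_batch_size": 8,
    "threshold": 0.5,
}
print(model_args)
```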
## Authors and references

### Authors

Jenia Kim, Piek Vossen

### References

Kim, Jenia, Stella Verkijk, Edwin Geleijn, Marieke van der Leeden, Carel Meskers, Caroline Meskers, Sabina van der Veen, Piek Vossen, and Guy Widdershoven. "Modeling Dutch medical texts for detecting functional categories and levels of COVID-19 patients." In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 4577-4585. 2022.

```bibtex
@inproceedings{kim2022modeling,
  title={Modeling Dutch medical texts for detecting functional categories and levels of COVID-19 patients},
  author={Kim, Jenia and Verkijk, Stella and Geleijn, Edwin and van der Leeden, Marieke and Meskers, Carel and Meskers, Caroline and van der Veen, Sabina and Vossen, Piek and Widdershoven, Guy},
  booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages={4577--4585},
  year={2022}
}
```