# RadReportX
### Model description
RadReportX is a Llama3.1-8B-Instruct model fine-tuned on synthetic data. The model supports two tasks. The first is open-ended: detecting the phrases in a radiology report that correspond to ICD-10 codes, with no restriction on the underlying disease. The second is detecting diseases in a radiology report from a set of 13 candidates: [Atelectasis, Cardiomegaly, Consolidation, Edema, Enlarged Cardiomediastinum, Fracture, Lung Lesion, Lung Opacity, Pleural Effusion, Pleural Other, Pneumonia, Pneumothorax, Support Devices]. When none of the candidate diseases are present, the model outputs 'Normal'.
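The two tasks above can be sketched as prompt construction plus output parsing. This is an illustrative assumption, not the exact prompt format used during fine-tuning: the wording of `build_icd10_prompt`, `build_candidate_prompt`, and the `parse_candidate_output` helper are hypothetical, and the resulting prompts would be fed to the model through any standard chat/generation API.

```python
# Hypothetical sketch of the two RadReportX tasks: the prompt wording and the
# parsing helper are illustrative assumptions, not the fine-tuning format.

# The 13 candidate diseases for the closed-set task (from the model description).
CANDIDATES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "Pleural Effusion", "Pleural Other", "Pneumonia", "Pneumothorax",
    "Support Devices",
]

def build_icd10_prompt(report: str) -> str:
    """Task 1: open-ended detection of phrases that correspond to ICD-10 codes."""
    return ("Identify the phrases in the following radiology report that "
            f"correspond to ICD-10 codes.\n\nReport:\n{report}")

def build_candidate_prompt(report: str) -> str:
    """Task 2: closed-set detection among the 13 candidate diseases."""
    return ("Which of the following diseases are present in the report? "
            f"Candidates: {', '.join(CANDIDATES)}. "
            f"Answer 'Normal' if none apply.\n\nReport:\n{report}")

def parse_candidate_output(text: str) -> list[str]:
    """Map the model's free-text answer onto the candidate labels.

    Returns an empty list for the 'Normal' answer (no candidate disease found).
    """
    if text.strip().lower() == "normal":
        return []
    lowered = text.lower()
    return [c for c in CANDIDATES if c.lower() in lowered]
```

For example, `parse_candidate_output("Pneumonia and Pleural Effusion")` returns the matched labels in candidate order, while `parse_candidate_output("Normal")` returns an empty list.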
### Training set and training process
There are two sources of training data. The first is generated by GPT-4o. The second comes from the MIMIC-CXR dataset (https://arxiv.org/pdf/1901.07042), with labels extracted by the NegBio algorithm. Training was conducted with the torchtune framework (https://github.com/pytorch/torchtune). For details, please refer to our paper listed below.