Commit 0b28f00
1 Parent(s): 9853dfd
update with new training params and perf
README.md CHANGED
@@ -51,15 +51,14 @@ The training was run on a NVIDIA DGX Station with 4XTesla V100 GPUs.
 
 Training code is available at https://github.com/source-data/soda-roberta
 
-- Command: `python -m tokcl.train NER --num_train_epochs=3.5`
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-nlp NER
-- Training with
-- Evaluating on
 - Training on 15 features: O, I-SMALL_MOLECULE, B-SMALL_MOLECULE, I-GENEPROD, B-GENEPROD, I-SUBCELLULAR, B-SUBCELLULAR, I-CELL, B-CELL, I-TISSUE, B-TISSUE, I-ORGANISM, B-ORGANISM, I-EXP_ASSAY, B-EXP_ASSAY
-- Epochs:
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `learning_rate`: 0.0001
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
@@ -69,20 +68,22 @@ Training code is available at https://github.com/source-data/soda-roberta
 
 ## Eval results
 
-Testing on
 
 ```
                 precision    recall  f1-score   support
 
-          CELL       0.
-     EXP_ASSAY       0.
-      GENEPROD       0.
-      ORGANISM       0.
-SMALL_MOLECULE       0.
-   SUBCELLULAR       0.
-        TISSUE       0.
-
-     micro avg       0.
-     macro avg       0.
-  weighted avg       0.
 ```
 
 Training code is available at https://github.com/source-data/soda-roberta
 
 - Tokenizer vocab size: 50265
 - Training data: EMBO/sd-nlp NER
+- Training with 48771 examples.
+- Evaluating on 13801 examples.
 - Training on 15 features: O, I-SMALL_MOLECULE, B-SMALL_MOLECULE, I-GENEPROD, B-GENEPROD, I-SUBCELLULAR, B-SUBCELLULAR, I-CELL, B-CELL, I-TISSUE, B-TISSUE, I-ORGANISM, B-ORGANISM, I-EXP_ASSAY, B-EXP_ASSAY
+- Epochs: 0.6
+- `per_device_train_batch_size`: 16
+- `per_device_eval_batch_size`: 16
 - `learning_rate`: 0.0001
 - `weight_decay`: 0.0
 - `adam_beta1`: 0.9
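The added values allow a rough estimate of how many optimizer steps 0.6 epochs correspond to. The sketch below is a back-of-the-envelope calculation, not part of the training code. It assumes that all 4 GPUs of the DGX Station were used in data-parallel mode and that no gradient accumulation was applied; the README states neither. The function name `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_examples, per_device_batch_size, num_devices,
                    epochs, grad_accum=1):
    """Estimate the effective batch size and the number of optimizer steps.

    num_devices and grad_accum are assumptions; the README does not state them.
    """
    effective_batch = per_device_batch_size * num_devices * grad_accum
    # The last, partially filled batch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = math.ceil(steps_per_epoch * epochs)
    return effective_batch, steps_per_epoch, total_steps


# Values from the README: 48771 training examples, batch size 16, 0.6 epochs.
# Assumed: 4 devices (the 4 V100 GPUs), no gradient accumulation.
effective_batch, steps_per_epoch, total_steps = optimizer_steps(
    num_examples=48771, per_device_batch_size=16, num_devices=4, epochs=0.6
)
print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"steps for 0.6 epochs: {total_steps}")
```

Under these assumptions the effective batch size is 64, one epoch takes 763 steps, and 0.6 epochs take 458 steps. A different device count or gradient accumulation setting changes these numbers proportionally.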
 
 ## Eval results
 
+Testing on 7178 examples of test set with `sklearn.metrics`:
 
 ```
                 precision    recall  f1-score   support
 
+          CELL       0.69      0.81      0.74      5245
+     EXP_ASSAY       0.56      0.57      0.56     10067
+      GENEPROD       0.77      0.89      0.82     23587
+      ORGANISM       0.72      0.82      0.77      3623
+SMALL_MOLECULE       0.70      0.80      0.75      6187
+   SUBCELLULAR       0.65      0.72      0.69      3700
+        TISSUE       0.62      0.73      0.67      3207
+
+     micro avg       0.70      0.79      0.74     55616
+     macro avg       0.67      0.77      0.72     55616
+  weighted avg       0.70      0.79      0.74     55616
+
+{'test_loss': 0.1830928772687912, 'test_accuracy_score': 0.9334821000160841, 'test_precision': 0.6987463009514112, 'test_recall': 0.789682825086306, 'test_f1': 0.7414366506288511, 'test_runtime': 61.0547, 'test_samples_per_second': 117.567, 'test_steps_per_second': 1.851}
 ```
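The reported numbers can be cross-checked against each other with a few lines of arithmetic. The sketch below is an illustrative consistency check, not the evaluation code from the repository. It recomputes the micro F1 as the harmonic mean of precision and recall, the support-weighted F1 from the rounded per-class rows, and the throughput from the runtime. All values are copied from the report above; the helper name `f1` and the variable names are hypothetical.

```python
# Aggregate metrics copied from the test output above.
metrics = {
    "test_precision": 0.6987463009514112,
    "test_recall": 0.789682825086306,
    "test_f1": 0.7414366506288511,
    "test_runtime": 61.0547,
    "test_samples_per_second": 117.567,
    "test_steps_per_second": 1.851,
}

# Per-class (f1-score, support) pairs copied from the report, rounded as shown.
per_class = {
    "CELL": (0.74, 5245),
    "EXP_ASSAY": (0.56, 10067),
    "GENEPROD": (0.82, 23587),
    "ORGANISM": (0.77, 3623),
    "SMALL_MOLECULE": (0.75, 6187),
    "SUBCELLULAR": (0.69, 3700),
    "TISSUE": (0.67, 3207),
}
num_test_examples = 7178


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


micro_f1 = f1(metrics["test_precision"], metrics["test_recall"])
total_support = sum(support for _, support in per_class.values())
weighted_f1 = sum(score * support for score, support in per_class.values()) / total_support
samples_per_second = num_test_examples / metrics["test_runtime"]
eval_steps = round(metrics["test_steps_per_second"] * metrics["test_runtime"])

print(f"micro F1 from P and R: {micro_f1:.4f} (reported {metrics['test_f1']:.4f})")
print(f"total support:         {total_support}")
print(f"weighted F1 (rounded): {weighted_f1:.2f}")
print(f"samples per second:    {samples_per_second:.3f}")
print(f"evaluation steps:      {eval_steps}")
```

The recomputed values agree with the report: the supports sum to 55616, and the micro and weighted F1 both round to 0.74. The runtime implies about 113 evaluation steps for 7178 examples, which would be consistent with an effective evaluation batch of 64 (16 per device on 4 devices), although the README does not state the device count used for testing.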