add hyperparameters
README.md
CHANGED
|
@@ -106,7 +106,11 @@ The model was trained locally on a single-node with one 16GB Nvidia T4 using
|
|
| 106 |
|
| 107 |
#### Training Hyperparameters
|
| 108 |
|
| 109 |
-
- **
| 110 |
|
| 111 |
|
| 112 |
## Evaluation / Metrics
|
|
|
|
| 106 |
|
| 107 |
#### Training Hyperparameters
|
| 108 |
|
| 109 |
+
- **Precision:** We use FP32 precision, inherited from the original "DistilBERT/distilbert-base-uncased" model.
|
| 110 |
+
- **Optimizer:** AdamW
|
| 111 |
+
- **Learning Rate:** We use a linear learning rate scheduler with an initial learning rate of 5e-5.
|
| 112 |
+
- **Batch Size:** 32
|
| 113 |
+
- **Number of Training Steps:** 2877 steps over the course of 3 epochs
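
The step count and the linear schedule listed above can be checked with a little arithmetic. The following is a minimal sketch, not the training script itself. It assumes no warmup, decay to zero at the final step, no gradient accumulation, and that a partial final batch is kept rather than dropped; none of these are stated in the list. The helper `linear_lr` and the constant names are hypothetical.

```python
# Values taken from the hyperparameter list above.
INITIAL_LR = 5e-5
BATCH_SIZE = 32
TOTAL_STEPS = 2877
NUM_EPOCHS = 3


def linear_lr(step: int,
              initial_lr: float = INITIAL_LR,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer updates.

    Assumption: no warmup, linear decay from initial_lr down to zero.
    """
    if not 0 <= step <= total_steps:
        raise ValueError("step must be between 0 and total_steps")
    return initial_lr * (total_steps - step) / total_steps


# 2877 steps split evenly over 3 epochs.
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS

# Assumption: the last batch of an epoch may be partial, so the training
# set size is only bounded, not determined, by the step count.
min_examples = (steps_per_epoch - 1) * BATCH_SIZE + 1
max_examples = steps_per_epoch * BATCH_SIZE

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("training set size between", min_examples, "and", max_examples)
    for step in (0, steps_per_epoch, 2 * steps_per_epoch, TOTAL_STEPS):
        print(step, linear_lr(step))
```

Under these assumptions, each epoch takes 959 steps, which implies a training set of roughly 30.7k examples, and the learning rate falls by one third of its initial value per epoch.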
|
| 114 |
|
| 115 |
|
| 116 |
## Evaluation / Metrics
|