Update README.md
README.md
CHANGED
|
@@ -30,14 +30,25 @@ tokenizer.decode(output[0], skip_special_tokens=True, clean_up_tokenization_spac
|
|
| 30 |
|----------------------|-------------------------------------------------------------------------------------------------|
|
| 31 |
| Dataset | WMT14-de-en |
|
| 32 |
| Translation Pairs | 4.5M (83M tokens total) |
|
| 33 |
-
| Epochs |
|
| 34 |
| Batch Size | 16 |
|
| 35 |
| Accumulation Batch | 8 |
|
| 36 |
| Effective Batch Size | 128 (16 * 8) |
|
| 37 |
| Training Script | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py) |
|
| 38 |
| Optimiser | Adam (learning rate = 0.0001) |
|
| 39 |
| Loss Type | Cross Entropy |
|
| 40 |
-
| Final Test Loss | 1.
|
| 41 |
| GPU. | RTX 4070 (12GB) |
|
| 42 |
|
| 43 |
|
|
|
|
| 30 |
|----------------------|-------------------------------------------------------------------------------------------------|
|
| 31 |
| Dataset | WMT14-de-en |
|
| 32 |
| Translation Pairs | 4.5M (83M tokens total) |
|
| 33 |
+
| Epochs | 24 |
|
| 34 |
| Batch Size | 16 |
|
| 35 |
| Accumulation Batch | 8 |
|
| 36 |
| Effective Batch Size | 128 (16 * 8) |
|
| 37 |
| Training Script | [train.py](https://github.com/ubaada/scratch-transformer/blob/main/train.py) |
|
| 38 |
| Optimiser | Adam (learning rate = 0.0001) |
|
| 39 |
| Loss Type | Cross Entropy |
|
| 40 |
+
| Final Test Loss | 1.87 |
|
| 41 |
| GPU. | RTX 4070 (12GB) |
|
| 42 |
|
| 43 |
+
<p align="center" style="width:500px;">
|
| 44 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/62a7d1e152aa8695f9209345/0p4eEHiYFaeaibjk_Rf1y.png" />
|
| 45 |
+
</p>
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
## Results
|
| 49 |
+
|
| 50 |
+
<p align="center" style="width:500px;">
|
| 51 |
+
<img src="https://cdn-uploads.huggingface.co/production/uploads/62a7d1e152aa8695f9209345/Gip1Ox-M1_z3qdafGGh3-.png" />
|
| 52 |
+
</p>
|
| 53 |
+
|
| 54 |
|
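
As a rough sanity check on the batch figures in the table, the sketch below recomputes the effective batch size and estimates how many optimizer updates 24 epochs would imply. It assumes the pair count is exactly 4,500,000 (the README gives the rounded figure "4.5M") and that a final partial accumulation group still triggers an update. The linked train.py may handle either point differently, and the function names here are hypothetical.

```python
import math

# Values copied from the training table in the README.
TRANSLATION_PAIRS = 4_500_000  # assumption: "4.5M" taken as an exact count
BATCH_SIZE = 16
ACCUMULATION_STEPS = 8
EPOCHS = 24


def effective_batch_size(batch_size: int, accumulation_steps: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return batch_size * accumulation_steps


def optimizer_steps_per_epoch(num_pairs: int, batch_size: int,
                              accumulation_steps: int) -> int:
    """Optimizer updates per epoch under gradient accumulation.

    Assumption: a trailing partial batch and a trailing partial
    accumulation group are both kept, hence the two ceil() calls.
    """
    micro_batches = math.ceil(num_pairs / batch_size)
    return math.ceil(micro_batches / accumulation_steps)


effective = effective_batch_size(BATCH_SIZE, ACCUMULATION_STEPS)
steps_per_epoch = optimizer_steps_per_epoch(
    TRANSLATION_PAIRS, BATCH_SIZE, ACCUMULATION_STEPS
)
total_steps = steps_per_epoch * EPOCHS

print(f"effective batch size: {effective}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"optimizer steps over {EPOCHS} epochs: {total_steps}")
```

These step counts are estimates derived from the rounded pair count, not figures reported in the README.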