---
license: apache-2.0
---
|
|
|
|
|
Contains files for a Transformer model that answers 6-digit subtraction questions (e.g. 123450-345670=-0222220) with very low loss (~1e-8).
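A minimal sketch of the question/answer string format, assuming the answer is written as a sign followed by a zero-padded 7-digit magnitude (the exact padding and the use of a `+` prefix for non-negative answers are assumptions, not confirmed by the model files):

```python
def make_example(a: int, b: int) -> str:
    """Format a 6-digit subtraction problem as a training string.

    Assumed format: two zero-padded 6-digit operands, '=', then a
    sign character and a zero-padded 7-digit answer magnitude.
    """
    diff = a - b
    sign = "-" if diff < 0 else "+"
    return f"{a:06d}-{b:06d}={sign}{abs(diff):07d}"

print(make_example(123450, 345670))  # 123450-345670=-0222220
```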
|
|
|
|
|
The subtraction model has 3 layers, 4 attention heads, d_model = 510, and d_head = 170.
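Given these hyperparameters, the per-layer attention parameter count can be sanity-checked with a short script. Note that d_head here is not d_model / n_heads (4 × 170 = 680 ≠ 510), which some libraries permit; the helper below is a sketch, not code from the model's repository:

```python
from dataclasses import dataclass

@dataclass
class Config:
    n_layers: int
    n_heads: int
    d_model: int
    d_head: int

    def attn_params_per_layer(self) -> int:
        # W_Q, W_K, W_V each map d_model -> n_heads * d_head, and
        # W_O maps n_heads * d_head back to d_model (biases ignored),
        # giving 4 weight matrices of d_model * n_heads * d_head entries.
        return 4 * self.d_model * self.n_heads * self.d_head

cfg = Config(n_layers=3, n_heads=4, d_model=510, d_head=170)
print(cfg.attn_params_per_layer())  # 4 * 510 * 4 * 170 = 1387200
```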
|
|
The subtraction model was initialised from a very-low-loss addition model (2 layers, 3 attention heads, 9e-9 loss) before being trained for 45K epochs.
|
|
|
|
|
The Colab notebook used to train the model is here: https://github.com/apartresearch/Verified_addition/blob/main/assets/Accurate_Math_Train.ipynb
|
|
|
|
|
The Colab notebook used to analyse the model is here: https://github.com/apartresearch/Verified_addition/blob/main/assets/Accurate_Math_Analyse.ipynb
|
|
|