---
license: apache-2.0
---
# 🧮 NanoCalc-1M

**NanoCalc-1M** is an ultra-compact, character-level Seq2Seq Transformer based on the T5 architecture, trained specifically to perform arithmetic operations with high precision.

* **Architecture:** T5-based Encoder-Decoder
* **Parameters:** 0.99M
* **Precision:** Mixed Precision (BF16/FP16)
* **Vocab:** Character-level (0-9, +, -, *, /, =)
* **Training Data:** 2,000,000 synthetic samples (3-digit arithmetic)
* **Max Input Length:** 20 tokens
* **Performance:** ~97% accuracy on 4-operation math (validation set)
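The vocabulary and training-data bullets above imply a simple generator. A minimal sketch of how such 3-digit samples could be produced (the card ships no generation script; `make_sample` and the integer-division convention are our assumptions, the latter inferred from the example session, where `12/68` is scored as `0`):

```python
import random

def make_sample(rng: random.Random) -> tuple[str, str]:
    """One synthetic task: two operands of up to 3 digits, one of + - * /."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    op = rng.choice("+-*/")
    if op == "/":
        b = max(b, 1)  # avoid division by zero
    # Integer division is assumed here, matching the example session
    # where 12/68 is scored as 0.
    result = {"+": a + b, "-": a - b, "*": a * b, "/": a // b}[op]
    return f"{a}{op}{b}", str(result)

# Build a small demo batch; the real dataset used 2,000,000 samples.
rng = random.Random(0)
samples = [make_sample(rng) for _ in range(5)]
print(samples)
```

Every generated character stays inside the card's stated vocabulary, and the longest possible task (`999*999`) is 7 characters, comfortably under the 20-token input limit.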
## Performance Chart

| Epoch | Training Loss | Val Accuracy | Status |
| :--- | :--- | :--- | :--- |
| 1 | 1.1420 | 54.89% | 🔴 Learnt format |
| 2 | 0.3931 | 78.79% | 🟡 Learnt digits |
| 5 | 0.1638 | 91.91% | 🟢 Learning subtleties |
| 9 | 0.1051 | 97.15% | 🔵 High precision |
| **10** | **0.1004** | **97.73%** | 🚀 **Near perfect** |
## How to use
Download `model.pt` and `use.py`, then run `use.py` with Python 3. The model is small enough to run on virtually any device.
## Examples

```
Model loaded (Accuracy: 97.73% from epoch 10)

--- Mini Math Model interactive ---
Enter an arithmetic task (e.g. 15*15) or type 'exit' to quit.

Task > 0*567
Model: 0 | Correct: 0 ✅

Task > 999+999
Model: 1998 | Correct: 1998 ✅

Task > 1/1
Model: 1 | Correct: 1 ✅

Task > 1684*8787
Model: 6398 | Correct: 14797308 ❌

Task > 124*598
Model: 2452 | Correct: 74152 ❌

Task > 12/68
Model: 4 | Correct: 0 ❌

Task > 123*123
Model: 499 | Correct: 15129 ❌

Task > 47*5
Model: 235 | Correct: 235 ✅

Task > 456+125
Model: 581 | Correct: 581 ✅

Task > 957-234
Model: 723 | Correct: 723 ✅

Task > 120-7650
Model: -550 | Correct: -7530 ❌

Task > 450-750
Model: -300 | Correct: -300 ✅

Task > 453-97
Model: 356 | Correct: 356 ✅

Task > 129-462
Model: -333 | Correct: -333 ✅

Task > 8*8
Model: 64 | Correct: 64 ✅

Task > 54*54
Model: 2916 | Correct: 2916 ✅

Task > 102*78
Model: 748 | Correct: 7956 ❌

Task > 74*9
Model: 666 | Correct: 666 ✅

Task > 103-34
Model: 69 | Correct: 69 ✅
```
## Overall accuracy
After 10 epochs of training, overall accuracy is ~97% on tasks whose operands have at most 3 digits each, such as `74*9` or `103-34`.
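For reference, the "Correct" answers in the example session follow integer arithmetic (`12/68` is scored as `0`). A sketch of a scoring helper consistent with that convention (`reference_answer` and `is_correct` are our illustration, not code shipped with the model):

```python
def reference_answer(task: str) -> int:
    """Expected answer for a task like '74*9'; '/' is integer division,
    matching the session above where 12/68 is scored as 0."""
    for sym in "+*/":  # subtraction is handled last (see below)
        if sym in task:
            a, b = task.split(sym)
            return {"+": int(a) + int(b),
                    "*": int(a) * int(b),
                    "/": int(a) // int(b)}[sym]
    # Split on the last '-' so a leading minus sign would not confuse
    # the parser, even though operands here are non-negative.
    a, b = task.rsplit("-", 1)
    return int(a) - int(b)

def is_correct(task: str, model_output: str) -> bool:
    return model_output == str(reference_answer(task))

print(is_correct("74*9", "666"), is_correct("102*78", "748"))  # True False
```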
## Limitations
The model can't handle:
- Tasks with operands of more than 3 digits, like `3984-125`
- Multiplication tasks with numbers above 99, like `293*21`
- Complex, multi-step expressions
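Given those limits, callers may want to reject out-of-range input before querying the model. A hypothetical pre-check (the regex and thresholds simply mirror the limitations above; nothing like this ships with `use.py`):

```python
import re

# Tasks the card reports as reliable: two operands of at most 3 digits,
# and at most 2 digits each when multiplying.
TASK_RE = re.compile(r"^(\d{1,3})([+\-*/])(\d{1,3})$")

def in_supported_range(task: str) -> bool:
    m = TASK_RE.match(task)
    if not m:
        return False  # malformed, >3 digits, or a multi-step expression
    a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
    return not (op == "*" and (a > 99 or b > 99))

print(in_supported_range("74*9"), in_supported_range("293*21"))  # True False
```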
## Training
We trained for 10 epochs (~20 minutes on Kaggle's 2× T4 GPUs) with 2 million randomly generated training samples.
## Final thoughts
We may release an improved version that can solve far more complex tasks and much more... stay tuned!