Commit cd861b4
Parent(s): a76994c
Add training steps info (10k)
README.md CHANGED
@@ -60,6 +60,7 @@ attn = (q @ k.T) / sqrt(d)
 - **Experts**: 4 (1 active per token)
 - **Vocabulary**: 100K tokens
 - **Context**: 2048 tokens
+- **Training steps**: 10,000
 
 ## Installation
 