Update README.md
README.md
CHANGED
@@ -100,7 +100,6 @@ We follow their training recipe and release our version of Mamba-7B.
 | Optimizer | AdamW |
 | Learning rate | 3e-4 |
 | LR cooldown end | 1e-5 |
-| QK-norm | False |
 | Warmup steps | 2000 |
 | Z-loss | 1e-4 |
 | Batch size | 2M |
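
For readers who want to see how the schedule rows of this table fit together, here is a minimal sketch of a learning-rate schedule built from the listed values (peak 3e-4, cooldown end 1e-5, 2000 warmup steps). The table does not state the warmup or cooldown shape or the total step count, so the linear warmup, the cosine cooldown, the `total_steps` parameter, and the name `lr_at` are all assumptions made for illustration, not the recipe's actual implementation.

```python
import math

PEAK_LR = 3e-4        # "Learning rate" row
FINAL_LR = 1e-5       # "LR cooldown end" row
WARMUP_STEPS = 2000   # "Warmup steps" row


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a 0-indexed step.

    Assumed shape: linear warmup to PEAK_LR, then cosine cooldown to
    FINAL_LR. total_steps is hypothetical; the table does not give it.
    """
    if step < WARMUP_STEPS:
        # Linear ramp that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the cooldown phase completed, clamped to [0, 1].
    span = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / span)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine
```

Under these assumptions the rate climbs to 3e-4 by step 1999, starts the cooldown from that peak at step 2000, and settles at 1e-5 once `step` reaches `total_steps`.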