Update README.md
#3
by ivas-tri - opened
README.md
CHANGED
```diff
@@ -93,7 +93,7 @@ We follow their training recipe and release our version of Mamba-7B.
 
 ## Training Details
 - Mamba-7B was trained using AWS SageMaker on 128 H100 80GB GPUs.
-- Training began in March 2024 and lasted
+- Training began in March 2024 and lasted three weeks.
 | **Hyperparameter** | **Value** |
 |--------------------|------------|
 | Precision | `bfloat16` |
```