Add files using upload-large-folder tool
- README.md +4 -4
- model.safetensors +1 -1
README.md
CHANGED
@@ -211,13 +211,13 @@ The current public checkpoint was produced through staged pretraining:
 4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.

-Current long-running stage-
+Current long-running stage-3 settings:

 | Field | Value |
 |---|---|
 | Hardware | 8 x NVIDIA H200 |
-| Data | `
-| Tokens in current stage dataset |
+| Data | `local_terminal_conversations_ctx9k_resp6k_v1` |
+| Tokens in current stage dataset | 9.39B |
 | Global batch | 180,224 tokens |
 | Local token slots/GPU | 22,528 |
 | Context | 4,096 |
@@ -228,7 +228,7 @@ Current long-running stage-2 settings:

 The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.

-The full HRM
+The stage-2 full/no-cap HRM continuation has completed and produced a final epoch checkpoint. The public artifact is now being updated through the stage-3 local-terminal continuation while the remaining `stage4 -> stage1b -> stage2b -> stage3b -> stage4b` chain continues in the background.

 ## Intended Use

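The staged-continuation note in the README hunk says that `resume_step_offset` and `total_steps_override` keep the learning-rate schedule on one long curve across data stages. A minimal sketch of how that could work is below. The README does not state the schedule shape, so the linear warmup plus cosine decay, the function name `lr_at`, the `stage_steps` parameter, and the 200,000-step total are all assumptions made for illustration; only the two parameter names and the `step_85000` resume point come from the README.

```python
import math


def lr_at(local_step, *, resume_step_offset=0, total_steps_override=None,
          stage_steps=1000, base_lr=1e-4, warmup_steps=100, min_ratio=0.1):
    """Learning rate for a step counted from the start of the current stage.

    Hypothetical schedule (linear warmup, then cosine decay); the real run
    may use a different shape.
    """
    # Shift the stage-local step onto the timeline of the whole run.
    global_step = local_step + resume_step_offset
    # Decay over the intended full run length rather than this stage's length.
    total_steps = stage_steps if total_steps_override is None else total_steps_override

    if global_step < warmup_steps:
        return base_lr * (global_step + 1) / warmup_steps

    decay_span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (global_step - warmup_steps) / decay_span)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)


# Without the offset, a new stage would restart in warmup.
restarted = lr_at(0)

# With the offset and override, step 0 of the new stage continues the long run.
# 85_000 mirrors `stage1 step_85000`; 200_000 is an assumed total.
continued = lr_at(0, resume_step_offset=85_000, total_steps_override=200_000)
single_run = lr_at(85_000, stage_steps=200_000)
```

Under these assumptions `continued` equals `single_run`, which is the property the README describes: the resumed stage sees the same learning rate an uninterrupted run would have had at that step.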
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ab2a074971c08d2fa860ca9f7fca08ecf7e308a396621d7f95f905f53ada295f
 size 2768259784
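The `model.safetensors` change above touches only the Git LFS pointer: the `oid` changes while the `size` stays the same. A minimal sketch for checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling; the pointer text is copied from the new side of the diff.

```python
import hashlib
from pathlib import Path

# New pointer contents, as shown in the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ab2a074971c08d2fa860ca9f7fca08ecf7e308a396621d7f95f905f53ada295f
size 2768259784
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file's size and hash match the pointer."""
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a multi-GB file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local copy would then indicate whether it is the checkpoint this commit uploaded; because the size is unchanged from the previous revision, only the hash distinguishes the two.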