gyung committed
Commit 6e4d5ef · verified · 1 Parent(s): 5dae8f6

Add files using upload-large-folder tool

Files changed (2)
  1. README.md +4 -4
  2. model.safetensors +1 -1
README.md CHANGED
@@ -211,13 +211,13 @@ The current public checkpoint was produced through staged pretraining:
  4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
  5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
 
- Current long-running stage-2 settings:
+ Current long-running stage-3 settings:
 
  | Field | Value |
  |---|---|
  | Hardware | 8 x NVIDIA H200 |
- | Data | `koterm_hrm_cleaned_full_nocap_v1` |
- | Tokens in current stage dataset | 14.55B |
+ | Data | `local_terminal_conversations_ctx9k_resp6k_v1` |
+ | Tokens in current stage dataset | 9.39B |
  | Global batch | 180,224 tokens |
  | Local token slots/GPU | 22,528 |
  | Context | 4,096 |
@@ -228,7 +228,7 @@ Current long-running stage-2 settings:
 
  The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
 
- The full HRM 328G cleaned corpus is being retokenized with the new 131K tokenizer. That full no-cap retokenization is intended to support a larger 40B+ token training continuation, instead of stopping at the 14.55B fast-cap stage.
+ The stage-2 full/no-cap HRM continuation has completed and produced a final epoch checkpoint. The public artifact is now being updated through the stage-3 local-terminal continuation while the remaining `stage4 -> stage1b -> stage2b -> stage3b -> stage4b` chain continues in the background.
 
  ## Intended Use
 
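Reviewer note on the staged-continuation paragraph in the README hunk: the sketch below illustrates one way a step offset and a total-step override could keep a learning-rate schedule continuous across data stages. It is not the repository's training code. The warmup-plus-cosine shape, the function name `lr_at`, and every numeric default (including the 300,000-step total) are assumptions made for illustration; only the parameter names `resume_step_offset` and `total_steps_override` and the `step_85000` resume point come from the README.

```python
import math


def lr_at(local_step, *, resume_step_offset=0, total_steps_override=None,
          stage_steps=50_000, base_lr=1e-3, warmup_steps=100, min_lr_ratio=0.1):
    """Hypothetical warmup + cosine schedule indexed by a global step.

    local_step counts from 0 inside the current data stage.
    resume_step_offset shifts it onto the timeline of the whole run.
    total_steps_override replaces the per-stage length as the decay horizon.
    """
    global_step = resume_step_offset + local_step
    total = total_steps_override if total_steps_override is not None else stage_steps
    if global_step < warmup_steps:
        # Linear warmup, only reached at the very start of the whole run.
        return base_lr * (global_step + 1) / warmup_steps
    progress = min(1.0, (global_step - warmup_steps) / max(1, total - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


# Assumed total run length, for illustration only.
TOTAL = 300_000

# Stage 2 starts at local step 0 but resumes from stage1 step_85000.
continued = lr_at(0, resume_step_offset=85_000, total_steps_override=TOTAL)
# The same point evaluated directly on the global timeline.
uninterrupted = lr_at(85_000, total_steps_override=TOTAL)
# What a naive per-stage restart would produce: warmup begins again.
reset = lr_at(0)
```

Under these assumptions, `continued` equals `uninterrupted`, while `reset` falls back to the first warmup value, which is the behavior the README says the two settings are meant to avoid.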
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b58990a81bc865eba09890f6f53bd2080d1e5b901e647e80b0864ee0bd6e7b2d
+ oid sha256:ab2a074971c08d2fa860ca9f7fca08ecf7e308a396621d7f95f905f53ada295f
  size 2768259784
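
Reviewer note on the `model.safetensors` hunk: only the Git LFS pointer changed, so the diff shows a new `oid` with an identical `size`. The sketch below parses the two pointers from this commit and compares them. The helper name `parse_lfs_pointer` is hypothetical; the pointer text is copied from the hunk above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b58990a81bc865eba09890f6f53bd2080d1e5b901e647e80b0864ee0bd6e7b2d
size 2768259784
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ab2a074971c08d2fa860ca9f7fca08ecf7e308a396621d7f95f905f53ada295f
size 2768259784
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

weights_changed = old["digest"] != new["digest"]  # file content differs
same_size = old["size"] == new["size"]            # byte length is unchanged
```

A changed hash with an unchanged byte size is what one would expect when the tensor names, shapes, and dtypes stay fixed and only the weight values are replaced, though the pointer alone cannot confirm that.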