LLM-OS-Models/KoHRM-Text-1.4B


gyung committed about 18 hours ago

Commit 6e4d5ef · verified · 1 parent: 5dae8f6

Add files using upload-large-folder tool

Files changed (2)

README.md +4 -4
model.safetensors +1 -1

README.md CHANGED

@@ -211,13 +211,13 @@ The current public checkpoint was produced through staged pretraining:
 4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
-Current long-running stage-2 settings:
 | Field | Value |
 |---|---|
 | Hardware | 8 x NVIDIA H200 |
-| Data | `koterm_hrm_cleaned_full_nocap_v1` |
-| Tokens in current stage dataset | 14.55B |
 | Global batch | 180,224 tokens |
 | Local token slots/GPU | 22,528 |
 | Context | 4,096 |
@@ -228,7 +228,7 @@ Current long-running stage-2 settings:
 The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
-The full HRM 328G cleaned corpus is being retokenized with the new 131K tokenizer. That full no-cap retokenization is intended to support a larger 40B+ token training continuation, instead of stopping at the 14.55B fast-cap stage.
 ## Intended Use

 4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
+Current long-running stage-3 settings:
 | Field | Value |
 |---|---|
 | Hardware | 8 x NVIDIA H200 |
+| Data | `local_terminal_conversations_ctx9k_resp6k_v1` |
+| Tokens in current stage dataset | 9.39B |
 | Global batch | 180,224 tokens |
 | Local token slots/GPU | 22,528 |
 | Context | 4,096 |
 The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
+The stage-2 full/no-cap HRM continuation has completed and produced a final epoch checkpoint. The public artifact is now being updated through the stage-3 local-terminal continuation while the remaining `stage4 -> stage1b -> stage2b -> stage3b -> stage4b` chain continues in the background.
 ## Intended Use
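
The staged-continuation note in this hunk can be illustrated with a small sketch. This is not the project's training code: the cosine-with-warmup shape, the function name `lr_at`, and the values of `peak_lr`, `min_lr`, `warmup_steps`, and the 300,000-step total are all assumptions chosen for illustration. Only the parameter names `resume_step_offset` and `total_steps_override` and the `stage1 step_85000` resume point come from the README.

```python
import math


def lr_at(local_step, *, resume_step_offset, total_steps_override,
          peak_lr=3e-4, min_lr=3e-5, warmup_steps=2000):
    """Learning rate for a stage-local step on a schedule that spans the whole run.

    Hypothetical helper: the schedule shape and default values are assumptions,
    not taken from the KoHRM training configuration.
    """
    # Position in the full run, not in the current data stage.
    step = resume_step_offset + local_step

    # Linear warmup at the very start of the full run only.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Cosine decay measured against the overridden total, so a new stage
    # continues the curve instead of restarting it.
    decay_steps = max(1, total_steps_override - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    total = 300_000  # assumed length of the full run, for illustration only

    # Resuming at step 85,000: the first step of the new stage sits mid-decay.
    resumed = lr_at(0, resume_step_offset=85_000, total_steps_override=total)

    # Without the offset, the same stage-local step would fall back into warmup.
    reset = lr_at(0, resume_step_offset=0, total_steps_override=total)

    print(f"resumed lr: {resumed:.3e}")
    print(f"reset lr:   {reset:.3e}")
```

Under these assumptions, a stage that starts with a nonzero offset picks up the learning rate where the previous stage left off, whereas an offset of zero would replay warmup at each data stage.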

model.safetensors CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b58990a81bc865eba09890f6f53bd2080d1e5b901e647e80b0864ee0bd6e7b2d
 size 2768259784

 version https://git-lfs.github.com/spec/v1
+oid sha256:ab2a074971c08d2fa860ca9f7fca08ecf7e308a396621d7f95f905f53ada295f
 size 2768259784
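
A downloaded `model.safetensors` can be checked against the new pointer with a short script. The sketch below assumes the three-line pointer layout shown in this diff (`version`, `oid`, `size`). The helper names `parse_lfs_pointer` and `matches_pointer` and the local file path are hypothetical.

```python
import hashlib

# New-side pointer content from this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ab2a074971c08d2fa860ca9f7fca08ecf7e308a396621d7f95f905f53ada295f
size 2768259784
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its fields (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash in `pointer`."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        # Stream in chunks so a multi-gigabyte file is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    # "model.safetensors" is an assumed local path to the downloaded file.
    print(matches_pointer("model.safetensors", pointer))
```

The `size` field is identical on both sides of the diff (2,768,259,784 bytes), so a size check alone cannot tell the old checkpoint from the new one. Only the SHA-256 digest can.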