LLM-OS-Models/KoHRM-Text-1.4B

@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:
 It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
-Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_120000`, converted with EMA weights to `safetensors`. Training is still in progress from this run.
 ## Model Details
@@ -208,7 +208,7 @@ The current public checkpoint was produced through staged pretraining:
 1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
 2. Continue once more on the same available mix as `stage0b`.
 3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
-4. Convert `stage2 step_120000` EMA weights to `safetensors` and upload to the main model repo.
 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
 Current long-running stage-2 settings:
@@ -223,8 +223,8 @@ Current long-running stage-2 settings:
 | Context | 4,096 |
 | LR | 2.2e-4 |
 | LR warmup | 2,000 steps |
-| Checkpoint interval | 5,000 steps |
-| Current public export | `step_25000`, EMA, safetensors |
 The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.

 It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
+Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_160000`, converted with EMA weights to `safetensors`. Training is still in progress: the stage-2 run continues on the same full/no-cap HRM corpus toward its end-of-stage checkpoint.
 ## Model Details
 1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
 2. Continue once more on the same available mix as `stage0b`.
 3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
+4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
 Current long-running stage-2 settings:
 | Context | 4,096 |
 | LR | 2.2e-4 |
 | LR warmup | 2,000 steps |
+| Checkpoint interval | 10,000 steps |
+| Current public export | `stage2 step_160000`, EMA, safetensors |
 The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
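
The effect of `resume_step_offset` and `total_steps_override` described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the project's training code: the function name `lr_at`, the linear warmup, the cosine decay shape, and `min_ratio` are hypothetical. Only the peak LR (2.2e-4) and the warmup length (2,000 steps) come from the settings table.

```python
import math


def lr_at(local_step, *, stage_total_steps, peak_lr=2.2e-4, warmup_steps=2_000,
          resume_step_offset=0, total_steps_override=None, min_ratio=0.1):
    """Hypothetical schedule: linear warmup, then cosine decay (assumed shape)."""
    # Position on the global schedule, not on the current data stage.
    step = resume_step_offset + local_step
    # Length of the schedule: the override replaces the stage-local length.
    total = stage_total_steps if total_steps_override is None else total_steps_override

    if step < warmup_steps:
        # Linear warmup toward the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps

    # Fraction of the post-warmup schedule that has elapsed, clamped to 1.
    progress = min(1.0, (step - warmup_steps) / max(1, total - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)


# Illustrative numbers only: a stage resumed at global step 85,000 of a
# hypothetical 500,000-step run does not re-enter warmup.
resumed = lr_at(0, stage_total_steps=100_000,
                resume_step_offset=85_000, total_steps_override=500_000)
restarted = lr_at(0, stage_total_steps=100_000)
print(resumed, restarted)
```

Without the offset, the first step of a new stage would land back at the start of warmup. With the offset and override, it lands at the same point on the curve that an uninterrupted run would have reached.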

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:27fbd40942654580e230a7bc20cbd7c1019cc571f2c42d54a3e6985a21cd545d
 size 2768259784

 version https://git-lfs.github.com/spec/v1
+oid sha256:f132e1db6fe5bad4cb12bbf077e90abe77745fc5e9905df1fe47fa71e0c2cf52
 size 2768259784
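
In the pointer diff above, only the `oid` changes while `size` stays at 2768259784 bytes. A Git LFS pointer records the SHA-256 of the file contents and the byte size, so a downloaded `model.safetensors` can be checked against it. The following is a minimal sketch using only the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of any Git LFS or Hugging Face API.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, value = line.split(" ", 1)
        fields[key] = value
    algo, digest = fields["oid"].split(":", 1)  # e.g. "sha256:<hex digest>"
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash in `pointer`."""
    # Cheap check first: a size mismatch avoids hashing a multi-GB file.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Stream in chunks so the whole file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

A caller would pass the three pointer lines as `text` and the local path of the downloaded weights as `path`. A `False` result means the local file is not the export that the pointer describes.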