gyung committed
Commit e4fdd9a · verified · 1 Parent(s): db12311

Add files using upload-large-folder tool

Files changed (2)
  1. README.md +4 -4
  2. model.safetensors +1 -1
README.md CHANGED
@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:
 
  It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
 
- Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_120000`, converted with EMA weights to `safetensors`. Training is still in progress from this run.
+ Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_160000`, converted with EMA weights to `safetensors`. Training is still in progress; the stage-2 run is continuing toward its end-of-stage checkpoint from the same full/no-cap HRM corpus.
 
  ## Model Details
 
@@ -208,7 +208,7 @@ The current public checkpoint was produced through staged pretraining:
  1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
  2. Continue once more on the same available mix as `stage0b`.
  3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
- 4. Convert `stage2 step_120000` EMA weights to `safetensors` and upload to the main model repo.
+ 4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
  5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
 
  Current long-running stage-2 settings:
@@ -223,8 +223,8 @@ Current long-running stage-2 settings:
  | Context | 4,096 |
  | LR | 2.2e-4 |
  | LR warmup | 2,000 steps |
- | Checkpoint interval | 5,000 steps |
- | Current public export | `step_25000`, EMA, safetensors |
+ | Checkpoint interval | 10,000 steps |
+ | Current public export | `stage2 step_160000`, EMA, safetensors |
 
  The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
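
The staged-continuation note in the README can be illustrated with a small sketch. It is not the project's training code: the cosine decay shape, the `min_ratio` floor, and the 300,000-step total are assumptions made for illustration, and the way `resume_step_offset` and `total_steps_override` enter the formula is a guess at their meaning. Only the 2.2e-4 peak LR, the 2,000-step warmup, and the `step_85000` resume point come from the README.

```python
import math


def lr_at(local_step, *, base_lr=2.2e-4, warmup_steps=2_000,
          resume_step_offset=0, total_steps_override=300_000,
          min_ratio=0.1):
    """Learning rate for a step counted from the start of the current stage.

    Assumed semantics: resume_step_offset shifts the stage-local step onto the
    global schedule, and total_steps_override fixes the schedule length.
    """
    global_step = resume_step_offset + local_step
    if global_step < warmup_steps:
        # Linear warmup up to base_lr.
        return base_lr * (global_step + 1) / warmup_steps
    # Cosine decay (assumed shape) from base_lr down to min_ratio * base_lr.
    span = max(1, total_steps_override - warmup_steps)
    progress = min(1.0, (global_step - warmup_steps) / span)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)


# Resuming a new stage at global step 85,000 continues the schedule;
# without the offset the new stage would start over in warmup.
continued = lr_at(0, resume_step_offset=85_000)
restarted = lr_at(0)
```

Under these assumptions, `continued` equals the value the schedule would have reached at step 85,000 of an uninterrupted run, while `restarted` falls back to the first warmup step.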
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:27fbd40942654580e230a7bc20cbd7c1019cc571f2c42d54a3e6985a21cd545d
+ oid sha256:f132e1db6fe5bad4cb12bbf077e90abe77745fc5e9905df1fe47fa71e0c2cf52
  size 2768259784
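
In this pointer diff only the `oid` changes while `size` stays at 2768259784 bytes. In a Git LFS pointer, `oid` is the SHA-256 of the stored file's contents and `size` is its length in bytes, so a downloaded `model.safetensors` can be checked against the new pointer. The following is a minimal sketch of that check; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Hugging Face or Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, value = line.split(" ", 1)
            fields[key] = value
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a multi-gigabyte file is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f132e1db6fe5bad4cb12bbf077e90abe77745fc5e9905df1fe47fa71e0c2cf52\n"
    "size 2768259784\n"
)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download would then confirm that the file is the export referenced by this commit rather than the previous one.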