gyung committed on
Commit
db12311
·
verified ·
1 Parent(s): 862683a

Add files using upload-large-folder tool

Files changed (2)
  1. README.md +8 -6
  2. model.safetensors +1 -1
README.md CHANGED
@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:
43
 
44
  It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
45
 
46
- Current public artifact: `stage1` HRM fast-cap checkpoint at `step_25000`, converted with EMA weights to `safetensors`. Training is still in progress.
47
 
48
  ## Model Details
49
 
@@ -117,7 +117,8 @@ from simple_inference_engine import inference_load_checkpoint, inference_generat
117
 
118
  ckpt = inference_load_checkpoint(
119
  ckpt_path="/path/to/KoHRM-Text-1.4B-stage1-hrm-fastcap-gbs180",
120
- ckpt_epoch=25000,
 
121
  ckpt_use_ema=True,
122
  device="cuda",
123
  )
@@ -156,7 +157,7 @@ Completed and prepared datasets:
156
  | SWE-ZERO + GLM pilot mix | 251.2M | 990M | included in stage-0 mix |
157
  | Korean legal SFT/task data | 83.1M | 336M | included in stage-0 mix |
158
  | ToolBench train tool-call data | 127.0M | 500M | included in stage-0 mix |
159
- | HRM cleaned fast-cap stage-1 | 14.55B | 148G | current stage-1 |
160
  | Korean statutes/local ordinances raw full | 308.9M | 1.2G | prepared for later stages |
161
  | Korean administrative rules + precedents raw full | 271.7M | 1.1G | prepared for later stages |
162
  | Korean legal/admin full task data | 629.0M | 2.5G | uploaded to prepared dataset repo |
@@ -207,14 +208,15 @@ The current public checkpoint was produced through staged pretraining:
207
  1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
208
  2. Continue once more on the same available mix as `stage0b`.
209
  3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
210
- 4. Convert `stage1 step_25000` EMA weights to `safetensors` and upload to the main model repo.
 
211
 
212
- Current long-running stage-1 settings:
213
 
214
  | Field | Value |
215
  |---|---|
216
  | Hardware | 8 x NVIDIA H200 |
217
- | Data | `koterm_hrm_cleaned_fastcap_stage1_v1` |
218
  | Tokens in current stage dataset | 14.55B |
219
  | Global batch | 180,224 tokens |
220
  | Local token slots/GPU | 22,528 |
 
43
 
44
  It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
45
 
46
+ Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_120000`, converted with EMA weights to `safetensors`. Training for this run is still in progress.
47
 
48
  ## Model Details
49
 
 
117
 
118
  ckpt = inference_load_checkpoint(
119
  ckpt_path="/path/to/KoHRM-Text-1.4B-stage1-hrm-fastcap-gbs180",
120
+ ckpt_epoch=None,
121
+ ckpt_step=85000,
122
  ckpt_use_ema=True,
123
  device="cuda",
124
  )
 
157
  | SWE-ZERO + GLM pilot mix | 251.2M | 990M | included in stage-0 mix |
158
  | Korean legal SFT/task data | 83.1M | 336M | included in stage-0 mix |
159
  | ToolBench train tool-call data | 127.0M | 500M | included in stage-0 mix |
160
+ | HRM cleaned fast-cap stage-1 | 14.55B | 148G | completed through latest saved `step_85000` |
161
  | Korean statutes/local ordinances raw full | 308.9M | 1.2G | prepared for later stages |
162
  | Korean administrative rules + precedents raw full | 271.7M | 1.1G | prepared for later stages |
163
  | Korean legal/admin full task data | 629.0M | 2.5G | uploaded to prepared dataset repo |
 
208
  1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
209
  2. Continue once more on the same available mix as `stage0b`.
210
  3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
211
+ 4. Convert `stage2 step_120000` EMA weights to `safetensors` and upload to the main model repo.
212
+ 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
213
 
214
+ Current long-running stage-2 settings:
215
 
216
  | Field | Value |
217
  |---|---|
218
  | Hardware | 8 x NVIDIA H200 |
219
+ | Data | `koterm_hrm_cleaned_full_nocap_v1` |
220
  | Tokens in current stage dataset | 14.55B |
221
  | Global batch | 180,224 tokens |
222
  | Local token slots/GPU | 22,528 |
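
Review note: the batch figures in the settings table agree with each other, since 8 GPUs times 22,528 local token slots gives the stated global batch of 180,224 tokens. The sketch below checks that arithmetic and estimates how many optimizer steps one pass over a 14.55B-token dataset would take. It assumes one optimizer step consumes exactly one global batch (no gradient accumulation) and treats the rounded 14.55B figure as exact. The function names are illustrative and do not come from the training code.

```python
import math

# Values copied from the settings table in the README diff.
NUM_GPUS = 8
LOCAL_TOKEN_SLOTS_PER_GPU = 22_528
GLOBAL_BATCH_TOKENS = 180_224
DATASET_TOKENS = 14.55e9  # rounded figure from the table, treated as exact here


def global_batch_tokens(num_gpus: int, local_slots: int) -> int:
    """Tokens consumed per step if every GPU fills all of its local slots."""
    return num_gpus * local_slots


def steps_per_pass(dataset_tokens: float, batch_tokens: int) -> int:
    """Steps needed to see the dataset once (assumes no gradient accumulation)."""
    return math.ceil(dataset_tokens / batch_tokens)


if __name__ == "__main__":
    computed = global_batch_tokens(NUM_GPUS, LOCAL_TOKEN_SLOTS_PER_GPU)
    print("global batch matches table:", computed == GLOBAL_BATCH_TOKENS)
    print("approx. steps per pass:", steps_per_pass(DATASET_TOKENS, GLOBAL_BATCH_TOKENS))
```

Under these assumptions one pass is on the order of 80.7k steps, which gives a rough scale for reading the `step_85000` and `step_120000` checkpoint labels.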
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:46d4e7e588bf66dc44aa036c7ed4542428d7832de0ae330288486e0c19370043
3
  size 2768259784
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:27fbd40942654580e230a7bc20cbd7c1019cc571f2c42d54a3e6985a21cd545d
3
  size 2768259784
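
Review note: the pointer diff changes only the `oid`; the size stays at 2768259784 bytes, so file size alone cannot distinguish the old export from the new one. A minimal sketch of checking a downloaded file against the new pointer follows. It uses only the Python standard library; the helper names and the local path `model.safetensors` are assumptions for illustration, not part of the repository.

```python
import hashlib
from pathlib import Path

# New Git LFS pointer contents, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:27fbd40942654580e230a7bc20cbd7c1019cc571f2c42d54a3e6985a21cd545d
size 2768259784
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def file_matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and hash both match the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so a multi-GB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    local_file = Path("model.safetensors")  # hypothetical download location
    if local_file.exists():
        print("matches new pointer:", file_matches_pointer(local_file, pointer))
    else:
        print("expected sha256:", pointer["digest"])
```

A mismatch on a file of the right size would suggest a cached copy of the previous export rather than a corrupted download.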