Add files using upload-large-folder tool
- README.md +11 -7
- model.safetensors +1 -1
README.md
CHANGED
@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:

 It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.

-Current public artifact:
+Current public artifact target: latest converted EMA checkpoint from the ongoing staged run. As of 2026-05-26 17:17 KST, `stage1b-hrm-fastcap-repeat` has produced and uploaded `step_240000`. Training is still in progress; this is an intermediate checkpoint, not the final aligned model.

 ## Model Details

@@ -157,7 +157,9 @@ Completed and prepared datasets:
 | SWE-ZERO + GLM pilot mix | 251.2M | 990M | included in stage-0 mix |
 | Korean legal SFT/task data | 83.1M | 336M | included in stage-0 mix |
 | ToolBench train tool-call data | 127.0M | 500M | included in stage-0 mix |
-| HRM cleaned fast-cap stage-1 | 14.55B | 148G | completed
+| HRM cleaned fast-cap stage-1/stage1b | 14.55B | 148G | stage1 completed; active stage1b repeat |
+| HRM cleaned full/no-cap stage2 | 14.55B | 633G | completed stage2 |
+| HRM cleaned full/no-cap extra stage2b | 14.55B | 637G | scheduled after stage1b |
 | Korean statutes/local ordinances raw full | 308.9M | 1.2G | prepared for later stages |
 | Korean administrative rules + precedents raw full | 271.7M | 1.1G | prepared for later stages |
 | Korean legal/admin full task data | 629.0M | 2.5G | uploaded to prepared dataset repo |
@@ -210,25 +212,27 @@ The current public checkpoint was produced through staged pretraining:
 3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
 4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
 5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
+6. Continue through `stage3-local-terminal` and `stage4-korean-tool-finance`.
+7. Continue through `stage1b -> stage2b -> stage3b -> stage4b` using actual checkpoint `global_step` metadata for handoff.

-Current long-running
+Current long-running stage1b settings:

 | Field | Value |
 |---|---|
 | Hardware | 8 x NVIDIA H200 |
-| Data | `
-| Tokens in current stage dataset |
+| Data | `koterm_hrm_cleaned_fastcap_stage1_v1` |
+| Tokens in current stage dataset | 14.55B |
 | Global batch | 180,224 tokens |
 | Local token slots/GPU | 22,528 |
 | Context | 4,096 |
 | LR | 2.2e-4 |
 | LR warmup | 2,000 steps |
 | Checkpoint interval | 10,000 steps |
-| Current public export | `
+| Current public export | `stage1b step_240000`, EMA, safetensors |

 The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.

-The stage-2 full/no-cap HRM continuation
+The stage-2 full/no-cap HRM continuation, stage-3 local-terminal continuation, and stage-4 Korean/tool/finance continuation have completed. The active run is `stage1b-hrm-fastcap-repeat`; the remaining `stage2b -> stage3b -> stage4b` chain is handled by a handoff watcher that reads the actual `epoch_1_info.json` `global_step` from each completed checkpoint before starting the next stage.

 ## Intended Use

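The README describes the staged handoff in two parts. A watcher reads `global_step` from each completed checkpoint's `epoch_1_info.json`. Separately, `resume_step_offset` and `total_steps_override` keep the learning-rate schedule continuous across data stages. The sketch below shows how those pieces could fit together. It is not the project's training code.

These details are assumptions made for illustration:

- the location and layout of `epoch_1_info.json`
- the function names
- the schedule shape (linear warmup, then cosine decay)
- the 400,000-step total in the demo

Only the field and parameter names, the peak LR, the warmup length, and the batch figures come from the README. The global batch of 180,224 tokens equals 8 GPUs x 22,528 local token slots.

```python
import json
import math
from pathlib import Path

# Values copied from the stage1b settings table in the README diff.
PEAK_LR = 2.2e-4
WARMUP_STEPS = 2_000
GLOBAL_BATCH_TOKENS = 180_224  # 8 GPUs x 22,528 local token slots per GPU


def read_global_step(checkpoint_dir: Path) -> int:
    """Return the global_step recorded in a completed checkpoint.

    Assumption: epoch_1_info.json sits directly inside the checkpoint
    directory and stores a top-level integer "global_step" field.
    """
    info = json.loads((checkpoint_dir / "epoch_1_info.json").read_text())
    return int(info["global_step"])


def lr_at(local_step: int, resume_step_offset: int, total_steps_override: int) -> float:
    """Learning rate for a step inside the current stage.

    Hypothetical schedule shape: linear warmup to PEAK_LR, then cosine decay
    to zero at total_steps_override. The schedule is evaluated at
    local_step + resume_step_offset, so a new stage picks up the curve where
    the previous one stopped instead of warming up again.
    """
    step = local_step + resume_step_offset
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    decay_steps = max(1, total_steps_override - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / decay_steps)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))


def steps_for_tokens(num_tokens: int) -> int:
    """Optimizer steps needed for one pass over num_tokens at the global batch size."""
    return math.ceil(num_tokens / GLOBAL_BATCH_TOKENS)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp)
        # Fake metadata standing in for a finished stage's checkpoint.
        (ckpt / "epoch_1_info.json").write_text(json.dumps({"global_step": 240_000}))
        offset = read_global_step(ckpt)
        # The first step of the next stage continues the schedule at step 240,000.
        print(offset, lr_at(0, offset, 400_000))
```

Reading the offset from the finished checkpoint, rather than from a planned step count, means the next stage starts from wherever the previous one actually stopped. In this sketch, `lr_at(0, offset, total)` equals the rate the previous stage would have used at step `offset`.

At this batch size, one pass over a 14.55B-token stage dataset is about 80.7k optimizer steps. The README's token count is rounded, so this is approximate.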
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d50c5619d5f71c2a92797715a661d90739870eec98418ef61d8cf6012435a2ca
 size 2768259784
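The `model.safetensors` change only replaces the Git LFS pointer. The `oid` line now names the sha256 digest of the new weights, while `size` stays at 2,768,259,784 bytes. A downloaded copy can be checked against that pointer with a small stdlib-only sketch like the one below.

This is not a Hugging Face or Git LFS tool. The function names are hypothetical, and the parser assumes the pointer format of `key value` lines with the `version`, `oid`, and `size` keys shown above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer into its key/value fields (one "key value" pair per line)."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def verify_against_pointer(blob_path: Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file matches the pointer's byte size and sha256 oid."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, so a truncated download fails without hashing 2.7 GB.
    if blob_path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with blob_path.open("rb") as fh:
        # Hash in fixed-size chunks to keep memory use flat for large weight files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For example, call `verify_against_pointer(Path("model.safetensors"), pointer_text)` with the three new pointer lines above as `pointer_text`. The path is a placeholder for wherever the weights were downloaded. The call should return `True` only when both the size and the digest match this commit's export.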