Add files using upload-large-folder tool
- README.md +8 -6
- model.safetensors +1 -1
README.md
CHANGED
@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:

It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.

-Current public artifact: `

## Model Details

@@ -117,7 +117,8 @@ from simple_inference_engine import inference_load_checkpoint, inference_generat

ckpt = inference_load_checkpoint(
    ckpt_path="/path/to/KoHRM-Text-1.4B-stage1-hrm-fastcap-gbs180",
-    ckpt_epoch=
    ckpt_use_ema=True,
    device="cuda",
)
@@ -156,7 +157,7 @@ Completed and prepared datasets:
| SWE-ZERO + GLM pilot mix | 251.2M | 990M | included in stage-0 mix |
| Korean legal SFT/task data | 83.1M | 336M | included in stage-0 mix |
| ToolBench train tool-call data | 127.0M | 500M | included in stage-0 mix |
-| HRM cleaned fast-cap stage-1 | 14.55B | 148G |
| Korean statutes/local ordinances raw full | 308.9M | 1.2G | prepared for later stages |
| Korean administrative rules + precedents raw full | 271.7M | 1.1G | prepared for later stages |
| Korean legal/admin full task data | 629.0M | 2.5G | uploaded to prepared dataset repo |
@@ -207,14 +208,15 @@ The current public checkpoint was produced through staged pretraining:
1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
2. Continue once more on the same available mix as `stage0b`.
3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
-4. Convert `

-Current long-running stage-

| Field | Value |
|---|---|
| Hardware | 8 x NVIDIA H200 |
-| Data | `
| Tokens in current stage dataset | 14.55B |
| Global batch | 180,224 tokens |
| Local token slots/GPU | 22,528 |
@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:

It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.

+Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_120000`, converted with EMA weights to `safetensors`. Training is still in progress from this run.

## Model Details

@@ -117,7 +117,8 @@ from simple_inference_engine import inference_load_checkpoint, inference_generat

ckpt = inference_load_checkpoint(
    ckpt_path="/path/to/KoHRM-Text-1.4B-stage1-hrm-fastcap-gbs180",
+    ckpt_epoch=None,
+    ckpt_step=85000,
    ckpt_use_ema=True,
    device="cuda",
)
@@ -156,7 +157,7 @@ Completed and prepared datasets:
| SWE-ZERO + GLM pilot mix | 251.2M | 990M | included in stage-0 mix |
| Korean legal SFT/task data | 83.1M | 336M | included in stage-0 mix |
| ToolBench train tool-call data | 127.0M | 500M | included in stage-0 mix |
+| HRM cleaned fast-cap stage-1 | 14.55B | 148G | completed through latest saved `step_85000` |
| Korean statutes/local ordinances raw full | 308.9M | 1.2G | prepared for later stages |
| Korean administrative rules + precedents raw full | 271.7M | 1.1G | prepared for later stages |
| Korean legal/admin full task data | 629.0M | 2.5G | uploaded to prepared dataset repo |
@@ -207,14 +208,15 @@ The current public checkpoint was produced through staged pretraining:
1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
2. Continue once more on the same available mix as `stage0b`.
3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
+4. Convert `stage2 step_120000` EMA weights to `safetensors` and upload to the main model repo.
+5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.

+Current long-running stage-2 settings:

| Field | Value |
|---|---|
| Hardware | 8 x NVIDIA H200 |
+| Data | `koterm_hrm_cleaned_full_nocap_v1` |
| Tokens in current stage dataset | 14.55B |
| Global batch | 180,224 tokens |
| Local token slots/GPU | 22,528 |
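
The batch figures in the settings table are consistent with each other: 8 GPUs times 22,528 local token slots gives the 180,224-token global batch. The sketch below checks that arithmetic and estimates step counts from it. It assumes that every optimizer step consumes exactly one full global batch, with no gradient accumulation and no empty slots; the README does not confirm this. The function names are hypothetical and are not part of the training code.

```python
# Figures copied from the README tables in this diff.
NUM_GPUS = 8
LOCAL_TOKEN_SLOTS_PER_GPU = 22_528
GLOBAL_BATCH_TOKENS = 180_224
DATASET_TOKENS = 14.55e9  # rounded value as printed in the table


def global_batch_tokens(num_gpus: int, local_slots: int) -> int:
    """Tokens per step if each GPU fills all of its local token slots."""
    return num_gpus * local_slots


def steps_per_pass(dataset_tokens: float, batch_tokens: int) -> float:
    """Approximate steps needed to see the dataset once (assumption: one batch per step)."""
    return dataset_tokens / batch_tokens


def tokens_seen(step: int, batch_tokens: int) -> int:
    """Upper bound on tokens consumed after `step` steps (assumption: full batches)."""
    return step * batch_tokens


if __name__ == "__main__":
    print(global_batch_tokens(NUM_GPUS, LOCAL_TOKEN_SLOTS_PER_GPU) == GLOBAL_BATCH_TOKENS)
    print(round(steps_per_pass(DATASET_TOKENS, GLOBAL_BATCH_TOKENS)))
    print(tokens_seen(85_000, GLOBAL_BATCH_TOKENS))
```

Under these assumptions, `step_85000` corresponds to roughly 15.3B token slots, slightly more than one pass over a 14.55B-token dataset. Treat this as an estimate only, since padding or packing details could change the real count.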
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
size 2768259784
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
+oid sha256:27fbd40942654580e230a7bc20cbd7c1019cc571f2c42d54a3e6985a21cd545d
size 2768259784
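
The `model.safetensors` entry in the repository is a Git LFS pointer: a small text file that records the SHA-256 digest and byte size of the real weights file. A downloaded copy can be checked against those two values. The sketch below uses only the Python standard library; the helper names `parse_lfs_pointer` and `file_matches_pointer` are hypothetical, and the local file path is an assumption to be replaced with wherever the file was actually downloaded.

```python
import hashlib
import os

# Pointer contents as shown on the new side of the diff.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:27fbd40942654580e230a7bc20cbd7c1019cc571f2c42d54a3e6985a21cd545d
size 2768259784
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest named by the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    local_path = "model.safetensors"  # assumption: adjust to the real download location
    if os.path.exists(local_path):
        print(file_matches_pointer(local_path, pointer))
```

Comparing the size before hashing is a design choice: a partial download fails immediately instead of after reading about 2.8 GB.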