Add files using upload-large-folder tool
- README.md +4 -4
- model.safetensors +1 -1
README.md
CHANGED
@@ -43,7 +43,7 @@ The main model repository is intended to expose the latest model-only artifact:
It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
-Current public artifact: `stage2` HRM full/no-cap checkpoint at `
## Model Details
@@ -208,7 +208,7 @@ The current public checkpoint was produced through staged pretraining:
1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
2. Continue once more on the same available mix as `stage0b`.
3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
-4. Convert
5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
Current long-running stage-2 settings:
@@ -223,8 +223,8 @@ Current long-running stage-2 settings:
| Context | 4,096 |
| LR | 2.2e-4 |
| LR warmup | 2,000 steps |
-| Checkpoint interval |
-| Current public export | `
The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
It is not intended to keep every training checkpoint as visible model files. Intermediate FSDP2 `.distcp` checkpoints are large resume artifacts and are kept separately in `LLM-OS-Models/KoHRM-Text-1.4B-raw-checkpoints` when needed. The main repo may still have normal Hugging Face git history, but the current file tree should be treated as the latest public model export.
+Current public artifact: `stage2` HRM full/no-cap checkpoint at `step_160000`, converted with EMA weights to `safetensors`. Training is still in progress; the stage-2 run is continuing toward its end-of-stage checkpoint from the same full/no-cap HRM corpus.
## Model Details
1. Train `stage-0` on `koterm_pretrain_mix_v1` with 711.3M tokens.
2. Continue once more on the same available mix as `stage0b`.
3. Continue to `stage-1` on HRM cleaned fast-cap data with 14.55B tokens.
+4. Convert intermediate EMA weights to `safetensors` and upload to the main model repo for public inspection.
5. Continue from `stage1 step_85000` into `stage2` on full/no-cap HRM cleaned data.
Current long-running stage-2 settings:
| Context | 4,096 |
| LR | 2.2e-4 |
| LR warmup | 2,000 steps |
+| Checkpoint interval | 10,000 steps |
+| Current public export | `stage2 step_160000`, EMA, safetensors |
The run uses staged continuation. The checkpoint carries model, optimizer, EMA, and recurrent carry state forward. `resume_step_offset` and `total_steps_override` are used so the learning-rate schedule follows the intended longer pretraining run rather than resetting at every data stage.
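The paragraph above names `resume_step_offset` and `total_steps_override` but does not give the schedule's shape after warmup. The sketch below shows one way such a continuation could work, assuming a linear warmup followed by cosine decay, both evaluated on the global step. The cosine shape, the `min_lr_ratio` floor and the `lr_at` helper are hypothetical; only the 2.2e-4 peak and the 2,000-step warmup come from the settings table.

```python
import math


def lr_at(
    local_step: int,
    resume_step_offset: int,
    total_steps_override: int,
    peak_lr: float = 2.2e-4,     # "LR" row of the stage-2 table
    warmup_steps: int = 2_000,   # "LR warmup" row of the stage-2 table
    min_lr_ratio: float = 0.1,   # hypothetical floor, not stated in the README
) -> float:
    """Learning rate for one step of a resumed stage (sketch).

    Assumes linear warmup then cosine decay over total_steps_override,
    computed on the global step so a new data stage continues the curve
    instead of restarting it. The decay shape is an assumption.
    """
    # Position on the long pretraining run, not within the current stage.
    step = resume_step_offset + local_step
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup budget consumed, clamped to [0, 1].
    span = max(1, total_steps_override - warmup_steps)
    progress = min(max((step - warmup_steps) / span, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)
```

Under these assumptions, resuming a stage at local step 0 with `resume_step_offset=N` gives the same rate as an uninterrupted run at step N. Resetting the counter per stage instead would replay warmup and restart the decay at every data stage, which is what the README says the offsets are meant to avoid.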
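The README says the public file was converted "with EMA weights" and that checkpoints carry EMA state alongside the model and optimizer, but it does not describe the update. A common form keeps a shadow copy of each parameter and blends it toward the live value after every optimizer step. The sketch below shows that rule in plain Python over lists rather than framework tensors; the `ema_update` helper and its `decay` default are hypothetical.

```python
from typing import Dict, List


def ema_update(
    ema: Dict[str, List[float]],
    params: Dict[str, List[float]],
    decay: float = 0.999,  # hypothetical; the README does not give a decay
) -> None:
    """Move each EMA ("shadow") value a small step toward the live weight, in place."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    for name, values in params.items():
        shadow = ema[name]
        for i, value in enumerate(values):
            # shadow <- decay * shadow + (1 - decay) * value
            shadow[i] = decay * shadow[i] + (1.0 - decay) * value
```

In a setup like this, the export writes the shadow weights rather than the live parameters; with PyTorch tensors that step would typically be `safetensors.torch.save_file(ema_state_dict, "model.safetensors")`. Keeping the EMA state in the resume checkpoint matters because re-initialising it from the live weights after a restart would discard the accumulated average.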
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
size 2768259784
version https://git-lfs.github.com/spec/v1
+oid sha256:f132e1db6fe5bad4cb12bbf077e90abe77745fc5e9905df1fe47fa71e0c2cf52
size 2768259784
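The `model.safetensors` entry in this commit is a Git LFS pointer: `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes (2,768,259,784 here). Only the `oid` changed, so a downloaded file can be checked against the new pointer. A minimal sketch, assuming the pointer text and the downloaded file are both available locally; `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helpers, not part of `git-lfs` or `huggingface_hub`.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each non-empty pointer line into key and value at the first space."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_against_pointer(blob_path: str, pointer_text: str) -> bool:
    """Return True if the file's size and SHA-256 match the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    path = Path(blob_path)
    # Cheap check first: a truncated download fails here without hashing.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so a multi-GB file is never fully in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

Checking the size before hashing rejects incomplete downloads quickly, which matters for a file of roughly 2.8 GB.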