KoHRM-Text-1.4B FullSFT LFM25 Terminal ToolBench Epoch2
This is an experimental second-epoch full-SFT checkpoint for the KoHRM-Text 1.4B PrefixLM runtime.
It is a fine-tuned version of LLM-OS-Models/KoHRM-Text-1.4B. It continues from the Epoch1 LFM25/ToolBench full-SFT checkpoint and is intended to test whether an additional pass improves TB2-lite terminal next-action behavior.
Base Model
- Base model: LLM-OS-Models/KoHRM-Text-1.4B
- Relation: full fine-tune (base_model_relation: finetune)
- Parent checkpoint: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch1
- Export format: single-file model.safetensors plus tokenizer/config files
- Runtime: local KoHRM/HRM-Text PrefixLM runtime
Training
- Dataset: kohrm_sft_lfm25_terminal_toolbench_full_v1
- Source style: LFM2.5 terminal/tool successful data plus ToolBench terminal turns, reprocessed into KoHRM PrefixLM targets
- Context length: 8192
- Approximate training tokens per full pass: 1.51B
- Training type: full SFT, not LoRA
- Epochs: 2 total on this SFT dataset (epoch2 continues from epoch1)
- Epoch2 GPUs: 8 x H200
- Epoch2 global batch size: 180224 tokens
- Learning rate: 2e-5
- Epoch2 checkpoint: /home/work/.data/hrm_text_checkpoints/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2-from-epoch1-gbs180k-8gpu/fsdp2_epoch_1
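The token and batch figures above imply a rough optimizer step count per pass. The following is a minimal sketch of that arithmetic only. It assumes every step consumes exactly one global batch of tokens and that the 1.51B figure is the total token count seen in one pass; the actual step count of the run is not stated here and may differ with packing or padding.

```python
# Back-of-the-envelope arithmetic from the figures listed above.
# Assumption: each optimizer step consumes exactly one global batch of tokens.
tokens_per_pass = 1.51e9        # approximate training tokens per full pass
global_batch_tokens = 180_224   # Epoch2 global batch size in tokens
context_length = 8192           # context length in tokens

# How many full-length sequences fit in one global batch.
sequences_per_batch = global_batch_tokens / context_length

# Approximate optimizer steps needed for one pass over the dataset.
steps_per_pass = round(tokens_per_pass / global_batch_tokens)

print(f"sequences per global batch: {sequences_per_batch}")  # 22.0
print(f"approx. steps per pass: {steps_per_pass}")           # roughly 8.4k
```

Because the token count is itself approximate, the step estimate should be read as an order-of-magnitude check rather than the exact schedule length.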
Evaluation
TB2-lite full replay evaluation completed on 2026-06-06 KST.
Result JSON:
tb2_lite/results/20260606T_kohrm_lfm25_epoch2_eval_sdpa8_b16/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2-sdpa8-b16-merged.json
| Checkpoint | Steps | Score | Cmd F1 | Precision | Recall | First Cmd | Valid JSON | Avg Pred Cmds | Status |
|---|---|---|---|---|---|---|---|---|---|
| epoch1 full replay | 303/303 | 38.56 | 0.3856 | 0.4262 | 0.4341 | 37.0% | 55.1% | 27.33 | completed |
| epoch2 full replay | 303/303 | 45.90 | 0.4590 | 0.5031 | 0.5098 | 44.9% | 68.3% | 25.16 | completed |
Score = 100 * avg_command_f1.
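To make the score formula concrete, here is a minimal sketch of how a per-step command F1 could be averaged into a score. The set-based exact-match comparison, the function names `command_f1` and `score`, and the toy steps are all assumptions for illustration; the matching rules actually used by the TB2-lite evaluator are not described in this card.

```python
def command_f1(predicted, reference):
    """Set-based precision, recall, and F1 over command strings for one step.

    Assumption: commands match only on exact string equality.
    """
    pred, ref = set(predicted), set(reference)
    if not pred or not ref:
        return 0.0, 0.0, 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def score(steps):
    """Score = 100 * mean per-step command F1."""
    f1_values = [command_f1(pred, ref)[2] for pred, ref in steps]
    return 100.0 * sum(f1_values) / len(f1_values)


# Toy (predicted, reference) pairs, not taken from the benchmark.
toy_steps = [
    (["ls -la", "cat README.md"], ["ls -la"]),  # one extra predicted command
    (["pytest -q"], ["pytest -q"]),             # exact match
    (["pip install x"], ["make test"]),         # no overlap
]
print(round(score(toy_steps), 2))
```

Under this definition F1 is averaged per step, so the averaged F1 need not equal the harmonic mean of the averaged precision and recall columns.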
Result Analysis
Epoch2 is a large gain over Epoch1:
- Score: 38.56 -> 45.90 (+7.34)
- Precision: 0.4262 -> 0.5031
- Recall: 0.4341 -> 0.5098
- First command exact: 37.0% -> 44.9%
- Valid JSON: 55.1% -> 68.3%
The improvement is not just JSON repair. Command precision, command recall, first-action selection, and JSON validity all improved together. That indicates the second pass made the terminal next-action distribution itself closer to the TB2-lite references.
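The per-metric changes can be recomputed directly from the table values. This is a small sketch that only does the subtraction; the dictionary keys are illustrative labels, not field names from the result JSON.

```python
# Metric values copied from the evaluation table (epoch1, epoch2).
# Keys are illustrative labels, not the evaluator's JSON field names.
metrics = {
    "score": (38.56, 45.90),
    "precision": (0.4262, 0.5031),
    "recall": (0.4341, 0.5098),
    "first_cmd_pct": (37.0, 44.9),
    "valid_json_pct": (55.1, 68.3),
}

# Absolute change and relative change (as a percentage of the epoch1 value).
deltas = {
    name: (after - before, 100.0 * (after - before) / before)
    for name, (before, after) in metrics.items()
}

for name, (absolute, relative) in deltas.items():
    print(f"{name}: {absolute:+.4f} absolute, {relative:+.1f}% relative")
```

Every metric moves in the same direction, which is the basis for reading the gain as more than JSON repair.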
Compared with other local runs:
- KoHRM-Text-1.4B direct base: 11.48
- Best KoHRM LoRA: 29.11
- KoHRM Top2 full SFT Epoch1: 31.59
- KoHRM LFM25 full SFT Epoch1: 38.56
- KoHRM LFM25 full SFT Epoch2: 45.90
- Qwen3.5-2B fast-continue fullconv: 44.79
- LFM2.5-8B-A1B ToolBench SFT Epoch2: 50.48
- LFM2.5-8B-A1B ToolBench SFT Epoch1: 52.30
Strong areas in the Epoch2 run:
- data_querying: 0.6881 average command F1
- data_science: 0.4901
- debugging: 0.4857
- math: 0.4845
- software_engineering: 0.4770
- file_operations: 0.4710
Remaining weak areas:
- swe: 0.3590
- data_processing: 0.4017
- dependency_management: 0.4025
- security: 0.4220
- model_training: 0.4283
The main remaining gap to LFM2.5 is first-action accuracy and late-step command coverage. Epoch2 bucket F1 is 0.5458 early, 0.4533 mid, and 0.3910 late. The model is much better than Epoch1, but late repair/verification steps are still weaker than early exploration steps.
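A bucketed F1 of this kind can be reproduced from per-step results once each step is assigned to a position bucket. The sketch below assumes buckets are equal thirds of each trajectory by zero-based step index; the evaluator's real bucket boundaries are not given in this card, and `bucket_for`, `bucket_means`, and the toy records are hypothetical.

```python
from collections import defaultdict


def bucket_for(step_index, total_steps):
    """Assign a zero-based step to early/mid/late.

    Assumption: buckets are equal thirds of the trajectory length.
    Integer arithmetic avoids floating-point boundary issues.
    """
    if 3 * step_index < total_steps:
        return "early"
    if 3 * step_index < 2 * total_steps:
        return "mid"
    return "late"


def bucket_means(records):
    """Average command F1 per bucket from (step_index, total_steps, f1) tuples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for step_index, total_steps, f1 in records:
        bucket = bucket_for(step_index, total_steps)
        sums[bucket] += f1
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}


# Toy per-step records, not benchmark data.
toy_records = [
    (0, 3, 0.6), (1, 3, 0.4), (2, 3, 0.2),
    (0, 6, 0.8), (5, 6, 0.4),
]
print(bucket_means(toy_records))
```

With the reported values, the late bucket (0.3910) sits about 0.15 F1 below the early bucket (0.5458), which is the gap the paragraph above describes.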
Usage
Use the local HRM-Text PrefixLM evaluator/runtime. Example evaluation command:
python tb2_lite/scripts/replay_eval_hrm_text.py \
--model /path/to/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2 \
--model-short KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2 \
--eval-path tb2_lite/data/replay_full.jsonl \
--output-dir tb2_lite/results/kohrm_lfm25_epoch2_eval \
--local-hrm-export \
--base-ckpt-path /path/to/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2-from-epoch1-gbs180k-8gpu \
--max-model-len 8192 \
--max-tokens 1024 \
--condition synth,cot \
--batch-size 16
This is not a standard Hugging Face AutoModelForCausalLM chat-model export yet. It currently requires the local KoHRM/HRM-Text PrefixLM runtime.
For final leaderboard context and completed scores, use the root project README.md as the source of truth.