KoHRM-Text-1.4B FullSFT Top2 Terminal Tool Merge Epoch1
This is an experimental full-SFT checkpoint for the KoHRM-Text 1.4B PrefixLM runtime.
Base Model
- Base model: LLM-OS-Models/KoHRM-Text-1.4B
- Relation: full fine-tune (base_model_relation: finetune)
- This repository contains full fine-tuned KoHRM-Text weights exported as model.safetensors.
It is a fine-tuned version of LLM-OS-Models/KoHRM-Text-1.4B. Training resumed from the stage4d KoHRM checkpoint on a merged terminal/tool dataset built from the current top LFM2.5 terminal SFT runs. The goal is to move KoHRM from generic PrefixLM generation toward TB2-lite terminal next-action JSON outputs.
Training
- Base model: LLM-OS-Models/KoHRM-Text-1.4B
- Dataset: kohrm_sft_top2_terminal_tool_raw8192_v1
- Context length: 8192
- Approximate training tokens: 245M
- Training type: full SFT, not LoRA
- Epochs: 1
- GPUs: 4 x H200
- Global batch size: 90112 tokens
- Learning rate: 2e-5
- Export format: single-file model.safetensors plus tokenizer/config files
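The figures in the list above imply a rough step count for the single epoch. The following is a back-of-the-envelope sketch, assuming every optimizer step consumes one full global batch and that the approximate 245M figure is the total for the epoch; the real step count may differ depending on packing and padding.

```python
# Values copied from the training list above.
approx_training_tokens = 245_000_000  # "245M" is an approximate figure
global_batch_tokens = 90_112
context_length = 8_192

# How many full-length sequences fit into one global batch.
sequences_per_step = global_batch_tokens / context_length

# Rough number of optimizer steps in one epoch (assumption: full batches only).
approx_optimizer_steps = approx_training_tokens / global_batch_tokens

print(f"full-length sequences per global batch: {sequences_per_step:g}")
print(f"approximate optimizer steps in one epoch: {approx_optimizer_steps:.0f}")
```

Under these assumptions the global batch corresponds to 11 sequences at the full 8192-token context, and the epoch to roughly 2.7k optimizer steps.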
Evaluation
TB2-lite full replay evaluation completed on 2026-06-05 KST.
| Checkpoint | Steps | Score | Cmd F1 | Precision | Recall | First Cmd | Valid JSON |
|---|---|---|---|---|---|---|---|
| 303/303 full replay | 303 | 31.59 | 0.3159 | 0.3859 | 0.3415 | 24.8% | 73.3% |
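For readers unfamiliar with the metric columns, here is a minimal sketch of how a Valid JSON rate and a per-step command precision/recall/F1 could be computed. This is an illustration under stated assumptions, not the TB2-lite scoring code: the helper names and the set-based command matching are hypothetical. In the tables on this card, Score equals 100 times Cmd F1.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the model output parses as JSON (assumed validity check)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def command_prf(predicted, reference):
    """Set-based precision, recall, and F1 over command strings (hypothetical matching rule)."""
    pred, ref = set(predicted), set(reference)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def valid_json_rate(outputs):
    """Fraction of outputs that parse as JSON."""
    return sum(is_valid_json(o) for o in outputs) / len(outputs) if outputs else 0.0
```

If the metrics are averaged per step, the reported Cmd F1 need not equal the F1 computed from the averaged precision and recall, which would be consistent with the numbers in the table.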
This final score is above the best completed KoHRM LoRA result (29.11, +2.48) and just below LLM-OS-Models/Ouro-1.4B-Thinking-Terminal-SFT (31.74, -0.15). The main gain over the LoRA runs is better command recall and JSON stability after moving the base weights directly with full SFT. The remaining gap to the stronger Qwen/LFM terminal SFT models is mostly command coverage and first-action accuracy.
The earlier KoHRM stage4d results, with the Ouro reference model for comparison, were:
| Model | Score | Cmd F1 | Precision | Recall | First Cmd | Valid JSON |
|---|---|---|---|---|---|---|
| KoHRM-Text-1.4B-stage4d direct | 11.48 | 0.1148 | 0.1995 | 0.0961 | 5.9% | 38.9% |
| stage4d + terminal-tool-core-r64 LoRA | 29.11 | 0.2911 | 0.3988 | 0.2768 | 22.1% | 63.4% |
| LLM-OS-Models/Ouro-1.4B-Thinking-Terminal-SFT | 31.74 | 0.3174 | 0.4062 | 0.3410 | 24.8% | 63.7% |
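The differences between this checkpoint and the earlier results can be recomputed directly from the table values. The short sketch below copies the Score, Recall, and Valid JSON columns from this card; the dictionary keys are shortened labels chosen for illustration.

```python
# Rows copied from the evaluation tables on this card.
baselines = {
    "stage4d direct": {"score": 11.48, "recall": 0.0961, "valid_json": 38.9},
    "stage4d + LoRA r64": {"score": 29.11, "recall": 0.2768, "valid_json": 63.4},
    "Ouro-1.4B-Thinking-Terminal-SFT": {"score": 31.74, "recall": 0.3410, "valid_json": 63.7},
}
full_sft = {"score": 31.59, "recall": 0.3415, "valid_json": 73.3}


def deltas(baseline):
    """Full-SFT value minus baseline value for each shared metric."""
    return {key: round(full_sft[key] - baseline[key], 4) for key in full_sft}


for name, row in baselines.items():
    print(name, deltas(row))
```

The Score differences reproduce the +2.48 over the LoRA run and the -0.15 relative to Ouro, while Valid JSON is higher than in all three comparison rows.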
This full-SFT checkpoint is the best completed KoHRM full-weight result in this repository at upload time, but it should still be treated as experimental because the local HRM-Text PrefixLM runtime is slower than the vLLM chat-model path used by most leaderboard entries.
Usage Notes
This is not a standard Hugging Face AutoModelForCausalLM export yet. It is a KoHRM/HRM-Text PrefixLM checkpoint and currently requires the local HRM-Text runtime.
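Because the checkpoint cannot be loaded through AutoModelForCausalLM, it can still be useful to inspect the exported model.safetensors without the HRM-Text runtime. The sketch below reads only the file header, relying on the safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names are illustrative, and the tensor names in this export are not documented here, so none are assumed.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def summarize(header):
    """Map each tensor name to (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


# Example usage (path is a placeholder):
# header = read_safetensors_header("/path/to/model.safetensors")
# for name, (dtype, shape) in summarize(header).items():
#     print(name, dtype, shape)
```

This only lists tensor names, dtypes, and shapes; running the model still requires the local HRM-Text runtime.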
The local evaluation path used for this export is:
python tb2_lite/scripts/replay_eval_hrm_text.py \
--model /path/to/KoHRM-Text-1.4B-fullsft-top2-terminal-tool-merge-epoch1 \
--model-short KoHRM-Text-1.4B-fullsft-top2-terminal-tool-merge-epoch1 \
--eval-path tb2_lite/data/replay_full.jsonl \
--output-dir tb2_lite/results/kohrm_fullsft_top2_export \
--local-hrm-export \
--base-ckpt-path /path/to/KoHRM-Text-1.4B-fullsft-top2-terminal-tool-merge-gbs180k-4gpu \
--max-model-len 4096 \
--max-tokens 1024 \
--batch-size 16
For now, use the repository README benchmark table as the source of truth for completed scores.