KoHRM-Text-1.4B FullSFT LFM25 Terminal ToolBench Epoch2

This is an experimental second-epoch full-SFT checkpoint for the KoHRM-Text 1.4B PrefixLM runtime.

It is a fine-tuned version of LLM-OS-Models/KoHRM-Text-1.4B. It continues from the Epoch1 LFM25/ToolBench full-SFT checkpoint and is intended to test whether an additional pass improves TB2-lite terminal next-action behavior.

Base Model

Base model: LLM-OS-Models/KoHRM-Text-1.4B
Relation: full fine-tune (base_model_relation: finetune)
Parent checkpoint: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-LFM25-Terminal-ToolBench-Epoch1
Export format: single-file model.safetensors plus tokenizer/config files
Runtime: local KoHRM/HRM-Text PrefixLM runtime

Training

Dataset: kohrm_sft_lfm25_terminal_toolbench_full_v1
Source style: successful LFM2.5 terminal/tool data plus ToolBench terminal turns, reprocessed into KoHRM PrefixLM targets
Context length: 8192
Approximate training tokens per full pass: 1.51B
Training type: full SFT, not LoRA
Epochs: 2 total on this SFT dataset (epoch2 continues from epoch1)
Epoch2 GPUs: 8 x H200
Epoch2 global batch size: 180224 tokens
Learning rate: 2e-5
Epoch2 checkpoint: /home/work/.data/hrm_text_checkpoints/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2-from-epoch1-gbs180k-8gpu/fsdp2_epoch_1
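
The batch figures above imply a few derived quantities. The sketch below only does arithmetic on the listed values; it assumes one optimizer step per global batch, and the variable names are illustrative rather than taken from the training code.

```python
# Values copied from the Training list above.
context_length = 8192          # tokens per full-length sequence
global_batch_tokens = 180_224  # tokens per global batch
num_gpus = 8
tokens_per_pass = 1.51e9       # approximate, as stated above

# Derived quantities (arithmetic only, not read from training logs).
sequences_per_step = global_batch_tokens / context_length
tokens_per_gpu_per_step = global_batch_tokens / num_gpus
approx_steps_per_pass = tokens_per_pass / global_batch_tokens

print(f"full-length sequences per global batch: {sequences_per_step:g}")
print(f"tokens per GPU per global batch: {tokens_per_gpu_per_step:g}")
print(f"approx. steps per full pass: {approx_steps_per_pass:.0f}")
```

Under that assumption, one global batch holds the equivalent of 22 full-length sequences, and a full pass is roughly 8,400 steps.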

Evaluation

TB2-lite full replay evaluation completed on 2026-06-06 KST.

Result JSON:

tb2_lite/results/20260606T_kohrm_lfm25_epoch2_eval_sdpa8_b16/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2-sdpa8-b16-merged.json

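| Checkpoint | Steps | Score | Cmd F1 | Precision | Recall | First Cmd | Valid JSON | Avg Pred Cmds | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `epoch1 full replay` | 303/303 | 38.56 | 0.3856 | 0.4262 | 0.4341 | 37.0% | 55.1% | 27.33 | completed |
| `epoch2 full replay` | 303/303 | 45.90 | 0.4590 | 0.5031 | 0.5098 | 44.9% | 68.3% | 25.16 | completed |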

Score = 100 * avg_command_f1.
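
As a rough illustration of how a score of this form can be computed, the sketch below averages a per-step command F1 and multiplies by 100. The matching rule (exact string match with multiset overlap) and the function names `command_prf` and `replay_score` are assumptions made for this example; the actual evaluator may normalize or match commands differently.

```python
from collections import Counter

def command_prf(predicted, reference):
    """Precision, recall, and F1 for one step.

    Assumption: commands match only as exact strings, counted as a multiset.
    """
    pred, ref = Counter(predicted), Counter(reference)
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

def replay_score(steps):
    """Score = 100 * mean per-step F1 over (predicted, reference) pairs."""
    f1_values = [command_prf(pred, ref)[2] for pred, ref in steps]
    return 100 * sum(f1_values) / len(f1_values)

# Toy steps invented for illustration; not TB2-lite data.
steps = [
    (["ls", "cat a.txt"], ["ls"]),          # P=0.5, R=1.0
    (["pytest"], ["pytest", "git diff"]),   # P=1.0, R=0.5
    (["echo hi"], ["make"]),                # no overlap
]
score = replay_score(steps)
print(f"{score:.2f}")
```

In this toy case mean precision and mean recall are both 0.5 while mean F1 is about 0.444. Averaging F1 per step can therefore land below both averaged precision and averaged recall, which is consistent with the Cmd F1 column in the table sitting below the Precision and Recall columns.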

Result Analysis

Epoch2 is a large gain over Epoch1 on every quality metric:

Score: 38.56 -> 45.90 (+7.34)
Precision: 0.4262 -> 0.5031 (+0.0769)
Recall: 0.4341 -> 0.5098 (+0.0757)
First command exact: 37.0% -> 44.9% (+7.9 points)
Valid JSON: 55.1% -> 68.3% (+13.2 points)

The improvement is not just JSON repair. Command precision, command recall, first-action selection, and JSON validity all improved together. That indicates the second pass made the terminal next-action distribution itself closer to the TB2-lite references.

Compared with other local runs:

KoHRM-Text-1.4B direct base: 11.48
Best KoHRM LoRA: 29.11
KoHRM Top2 full SFT Epoch1: 31.59
KoHRM LFM25 full SFT Epoch1: 38.56
KoHRM LFM25 full SFT Epoch2: 45.90
Qwen3.5-2B fast-continue fullconv: 44.79
LFM2.5-8B-A1B ToolBench SFT Epoch2: 50.48
LFM2.5-8B-A1B ToolBench SFT Epoch1: 52.30
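
To read the list above relative to this checkpoint, a small sketch like the following ranks the runs and prints each score gap. It only rearranges the scores already listed; the dictionary keys are the labels used above.

```python
# Scores copied from the comparison list above.
runs = {
    "KoHRM-Text-1.4B direct base": 11.48,
    "Best KoHRM LoRA": 29.11,
    "KoHRM Top2 full SFT Epoch1": 31.59,
    "KoHRM LFM25 full SFT Epoch1": 38.56,
    "KoHRM LFM25 full SFT Epoch2": 45.90,
    "Qwen3.5-2B fast-continue fullconv": 44.79,
    "LFM2.5-8B-A1B ToolBench SFT Epoch2": 50.48,
    "LFM2.5-8B-A1B ToolBench SFT Epoch1": 52.30,
}
this_run = "KoHRM LFM25 full SFT Epoch2"

# Gap of each run relative to this checkpoint (positive = ahead of it).
gaps = {name: round(score - runs[this_run], 2) for name, score in runs.items()}

for name, score in sorted(runs.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:6.2f}  {gaps[name]:+6.2f}  {name}")
```

On these numbers the checkpoint is 1.11 points ahead of the Qwen3.5-2B run and 4.58 to 6.40 points behind the two LFM2.5-8B-A1B runs.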

Strong areas in the Epoch2 run:

data_querying: 0.6881 average command F1
data_science: 0.4901
debugging: 0.4857
math: 0.4845
software_engineering: 0.4770
file_operations: 0.4710

Remaining weak areas:

swe: 0.3590
data_processing: 0.4017
dependency_management: 0.4025
security: 0.4220
model_training: 0.4283

The main remaining gaps to LFM2.5 are first-action accuracy and late-step command coverage. By step position, Epoch2 command F1 is 0.5458 for early steps, 0.4533 for mid steps, and 0.3910 for late steps. The model is much better than Epoch1, but late repair/verification steps remain weaker than early exploration steps.
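
A breakdown like the early/mid/late figures can be produced by grouping per-step F1 values by position within each task. How the evaluator actually assigns steps to buckets is not stated here; the sketch below assumes a split into thirds by relative step position, and `position_bucket` and `bucket_means` are hypothetical helper names.

```python
from collections import defaultdict

def position_bucket(step_index, total_steps):
    """Map a 0-based step index to 'early', 'mid', or 'late'.

    Assumption: buckets are equal thirds of the task's step count.
    Integer arithmetic avoids floating-point edge cases at the boundaries.
    """
    if 3 * step_index < total_steps:
        return "early"
    if 3 * step_index < 2 * total_steps:
        return "mid"
    return "late"

def bucket_means(records):
    """Mean F1 per bucket for records of (step_index, total_steps, f1)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for step_index, total_steps, f1 in records:
        bucket = position_bucket(step_index, total_steps)
        sums[bucket] += f1
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sums}

# Toy records invented for illustration; not TB2-lite data.
records = [(0, 6, 0.8), (1, 6, 0.6), (2, 6, 0.5),
           (3, 6, 0.5), (4, 6, 0.4), (5, 6, 0.2)]
means = bucket_means(records)
print(means)
```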

Usage

Use the local HRM-Text PrefixLM evaluator/runtime. Example evaluation command:

python tb2_lite/scripts/replay_eval_hrm_text.py \
  --model /path/to/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2 \
  --model-short KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2 \
  --eval-path tb2_lite/data/replay_full.jsonl \
  --output-dir tb2_lite/results/kohrm_lfm25_epoch2_eval \
  --local-hrm-export \
  --base-ckpt-path /path/to/KoHRM-Text-1.4B-fullsft-lfm25-terminal-toolbench-epoch2-from-epoch1-gbs180k-8gpu \
  --max-model-len 8192 \
  --max-tokens 1024 \
  --condition synth,cot \
  --batch-size 16

This is not a standard Hugging Face AutoModelForCausalLM chat-model export yet. It currently requires the local KoHRM/HRM-Text PrefixLM runtime.
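
The exported model.safetensors can still be inspected without the runtime, because the safetensors format begins with an 8-byte little-endian header length followed by a JSON header describing each tensor. The standard-library sketch below reads only that header and counts parameters per dtype; the function names are illustrative, and the path in the usage comment is a placeholder.

```python
import json
import struct
from math import prod

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len).decode("utf-8"))

def count_parameters_by_dtype(header):
    """Sum element counts per dtype, skipping the optional __metadata__ entry."""
    counts = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + prod(info["shape"])
    return counts

# Usage (placeholder path):
# header = read_safetensors_header("/path/to/export/model.safetensors")
# print(count_parameters_by_dtype(header))
```

This checks tensor names, shapes, and dtypes only. Running the model still needs the local KoHRM/HRM-Text PrefixLM runtime.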

For final leaderboard context and completed scores, use the root project README.md as the source of truth.
