HRM-Text-1B-code — a code expert (SFT)

Full-parameter SFT of sapientinc/HRM-Text-1B for Python code generation, trained in the model's synth,cot (reasoning) condition lane. It takes a base that essentially couldn't code (HumanEval 1.2%) and teaches it to code from just ~25k instruction→code SFT examples.

Built as the second expert in a skill-composition experiment (can an HRM tool expert + code expert merge into one model?). Full writeup + code: https://github.com/jasoncarreira/hrm-text-agent. Companions: hrm-text-agent (tools), hrm-text-agent-v2 (tools, scaled).

Scores (pass@1)

| Bench | Base | This model |
|---|---|---|
| HumanEval | 1.2% (2/164) | 11.0% (18/164) |
| MBPP | 2.3% (6/257) | 16.7% (43/257) |
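
As a quick sanity check, the percentages above follow from the solved/total counts. The sketch below assumes pass@1 is computed from a single sample per problem, so it reduces to the fraction of problems solved; the helper name `pass_at_1` is illustrative and not part of the repo.

```python
def pass_at_1(solved: int, total: int) -> float:
    """Percentage of problems solved, assuming one sample per problem."""
    return 100.0 * solved / total


# (solved, total) counts copied from the scores reported above
results = {
    "HumanEval": {"base": (2, 164), "sft": (18, 164)},
    "MBPP": {"base": (6, 257), "sft": (43, 257)},
}

for bench, rows in results.items():
    base = pass_at_1(*rows["base"])
    sft = pass_at_1(*rows["sft"])
    # Report the absolute gain in percentage points, not a relative ratio
    print(f"{bench}: base {base:.1f}% -> SFT {sft:.1f}% (+{sft - base:.1f} points)")
```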

Honest positioning: as a standalone code model this is entry-level. It is roughly StarCoderBase-1B tier (15% HumanEval) and well below purpose-built small code models (DeepSeek-Coder-1.3B ~35%, Qwen2.5-Coder-1.5B ~40%+, Phi-1 ~50%). Those models were pretrained on hundreds of billions of code tokens, whereas this one learned code from **~25k SFT examples on a non-code reasoning base**. The result is therefore about sample efficiency, not absolute code SOTA, and the recurrent reasoning base plausibly helps with code's structured nature. (pass@1 was measured with the repo's eval_code.py instruct harness, which can slightly under-measure relative to a model's native eval.)

Training

  • full-parameter SFT (sapientinc cfg_sft recipe: lr 3e-5, cosine to 10%, AdamW(0.9, 0.95) wd 0.1, 3 epochs, max_len 2048, bf16)
  • synth,cot condition (<|quad_end|><|object_ref_end|>), deliberately a different lane from the tool expert's direct lane, for the composition experiment
  • data: ~25k instruction→code examples from CodeFeedback-Filtered-Instruction
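
To make the schedule in the recipe concrete, here is a minimal sketch of a cosine decay from the peak learning rate down to 10% of it. It reads "cosine to 10%" as "10% of the peak", and it assumes no warmup, since none is listed above. The function name and the step counts are hypothetical; the authoritative settings are in the sapientinc cfg_sft recipe.

```python
import math

PEAK_LR = 3e-5          # lr from the recipe above
FINAL_FRACTION = 0.10   # "cosine to 10%", read here as 10% of the peak (assumption)


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              final_fraction: float = FINAL_FRACTION) -> float:
    """Cosine decay from peak_lr to final_fraction * peak_lr, with no warmup."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 to 0
    floor = final_fraction * peak_lr
    return floor + (peak_lr - floor) * cosine


# total_steps=1000 is an arbitrary illustrative value, not the real run length
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, 1000):.2e}")
```

In PyTorch, the optimizer settings listed above would map onto `torch.optim.AdamW(params, lr=3e-5, betas=(0.9, 0.95), weight_decay=0.1)`.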

Usage

HRM-Text is a PrefixLM with a conditioning scheme — generate in the synth,cot lane with token_type_ids=1 over the prompt. Use the repo harness rather than a bare .generate():

python eval_code.py --bench humaneval --model jasoncarreira/hrm-text-code
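
For intuition about why a bare .generate() is not enough, the sketch below shows how a prefix mask of this kind is typically laid out in a PrefixLM: prompt positions attend to each other bidirectionally, and generated positions are causal. It only illustrates the idea. The helper names and the use of 0 for generated positions are assumptions, not the repo's API; the real prompt formatting and generation loop live in eval_code.py.

```python
def build_token_type_ids(prompt_len: int, generated_len: int) -> list[int]:
    """1 over every prompt (prefix) position; 0 over generated positions (assumed)."""
    return [1] * prompt_len + [0] * generated_len


def prefix_lm_mask(token_type_ids: list[int]) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Generic PrefixLM rule: any position may see the whole prefix,
    and everything outside the prefix is causal (j <= i).
    """
    n = len(token_type_ids)
    return [
        [j <= i or token_type_ids[j] == 1 for j in range(n)]
        for i in range(n)
    ]


ids = build_token_type_ids(prompt_len=3, generated_len=2)
for row in prefix_lm_mask(ids):
    # "x" marks an allowed attention edge, "." a masked one
    print("".join("x" if allowed else "." for allowed in row))
```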

Note on composition

The merge experiment found this code expert and the tool expert do not compose in merged weights — a hard tool-XOR-code trade at every coefficient (tools work only at full tool-weight, where code dies; weaken tools at all and they collapse while code recovers). So for a multi-skill HRM agent the path is model-routing between separate experts, not weight-merging. Details in the repo README.
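
For readers unfamiliar with weight merging, the sketch below shows one common form: per-parameter linear interpolation between two checkpoints, controlled by a single coefficient. This is an assumption about the general shape of such a sweep, not the repo's exact merge method. The function name and the toy weights are hypothetical.

```python
def merge_state_dicts(tool_weights, code_weights, tool_coeff):
    """Linear interpolation: tool_coeff * tool + (1 - tool_coeff) * code."""
    if tool_weights.keys() != code_weights.keys():
        raise ValueError("both experts must share the same parameter names")
    return {
        name: [
            tool_coeff * t + (1.0 - tool_coeff) * c
            for t, c in zip(tool_weights[name], code_weights[name])
        ]
        for name in tool_weights
    }


# Toy stand-ins for flattened parameter tensors (not real model weights)
tool = {"layer.weight": [1.0, 2.0]}
code = {"layer.weight": [3.0, 6.0]}

# Sweep the coefficient from pure code expert (0.0) to pure tool expert (1.0)
for coeff in (0.0, 0.5, 1.0):
    print(coeff, merge_state_dicts(tool, code, coeff))
```

In this form, a coefficient of 1.0 reproduces the tool expert and 0.0 reproduces the code expert. The finding above is that no intermediate value kept both skills.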

License & lineage

Base is Apache-2.0; the training data (CodeAlpaca / CodeFeedback lineage) is best treated as non-commercial / research. Verify source licenses for your use case.

🤖 Built with Claude Code.
