---
license: cc-by-nc-4.0
base_model: sapientinc/HRM-Text-1B
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- code
- code-generation
- hrm
- hierarchical-reasoning
- prefix-lm
---

# HRM-Text-1B-code: a code expert (SFT)

Full-parameter SFT of [`sapientinc/HRM-Text-1B`](https://huggingface.co/sapientinc/HRM-Text-1B) for **Python code generation**, trained in the model's **`synth,cot` (reasoning) condition** lane. It takes a base that essentially couldn't code (HumanEval 1.2%) and teaches it to code from just **~25k** instruction→code SFT examples.

Built as the second expert in a **skill-composition experiment** (can an HRM tool expert + code expert *merge* into one model?). Full writeup + code: **https://github.com/jasoncarreira/hrm-text-agent**. Companions: [`hrm-text-agent`](https://huggingface.co/jasoncarreira/hrm-text-agent) (tools), [`hrm-text-agent-v2`](https://huggingface.co/jasoncarreira/hrm-text-agent-v2) (tools, scaled).

## Scores (pass@1)

| Bench | Base | **This model** |
|---|---|---|
| HumanEval | 1.2% (2/164) | **11.0% (18/164)** |
| MBPP | 2.3% (6/257) | **16.7% (43/257)** |

**Honest positioning:** as a standalone code model this is **entry-level**: roughly StarCoderBase-1B tier (~15% HE), and well below purpose-built small code models (DeepSeek-Coder-1.3B ~35%, Qwen2.5-Coder-1.5B ~40%+, Phi-1 ~50%). But those were **pretrained on hundreds of billions of code tokens**; this learned code from **~25k SFT examples on a non-code reasoning base**, so the result is about **sample efficiency**, not absolute code SOTA, and plausibly the recurrent reasoning base helps with code's structured nature. (pass@1 measured with the repo's `eval_code.py` instruct harness, which can slightly *under*-measure vs a model's native eval.)

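
The percentages in the Scores table follow directly from the solved/total counts. A minimal sketch of that arithmetic, assuming pass@1 here means one sample per problem so the score is simply the fraction of problems solved (the `scores` dict and `pass_at_1` helper are illustrative names, not part of the repo):

```python
# (passed, total) counts copied from the Scores table above.
scores = {
    "HumanEval": {"base": (2, 164), "sft": (18, 164)},
    "MBPP": {"base": (6, 257), "sft": (43, 257)},
}


def pass_at_1(passed: int, total: int) -> float:
    """Percentage of problems solved, assuming one sample per problem."""
    return 100.0 * passed / total


for bench, runs in scores.items():
    base = pass_at_1(*runs["base"])
    sft = pass_at_1(*runs["sft"])
    # Report the absolute gain in percentage points, not a relative ratio.
    print(f"{bench}: base {base:.1f}% -> SFT {sft:.1f}% (+{sft - base:.1f} points)")
```
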

## Training

- full-parameter SFT (sapientinc `cfg_sft` recipe: lr 3e-5, cosine to 10%, AdamW(0.9, 0.95) wd 0.1, 3 epochs, `max_len` 2048, bf16)
- **`synth,cot` condition** (`<|quad_end|><|object_ref_end|>`): deliberately a *different lane* than the tool expert's `direct`, for the composition experiment
- **data:** ~25k instruction→code examples from [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) + [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k), length-filtered to fit 2048

## Usage

HRM-Text is a PrefixLM with a conditioning scheme: generate in the `synth,cot` lane with `token_type_ids=1` over the prompt. Use the repo harness rather than a bare `.generate()`:

```bash
python eval_code.py --bench humaneval --model jasoncarreira/hrm-text-code
```

## Note on composition

The merge experiment found this code expert and the tool expert **do not compose** in merged weights: a hard tool-XOR-code trade at every coefficient (tools work only at full tool-weight, where code dies; weaken tools at all and they collapse while code recovers). So for a multi-skill HRM agent the path is **model-routing** between separate experts, not weight-merging. Details in the repo README.

## License & lineage

Base is Apache-2.0; the training data (CodeAlpaca / CodeFeedback lineage) is best treated as **non-commercial / research**. Verify source licenses for your use case.

🤖 Built with Claude Code.
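
For readers unfamiliar with the PrefixLM convention mentioned under Usage, here is a small pure-Python sketch of what "`token_type_ids=1` over the prompt" usually implies. It assumes the generic PrefixLM layout: prompt positions are marked 1 and attend to each other bidirectionally, while later positions are marked 0 (an assumption) and attend causally. The helper name `prefix_lm_inputs` is hypothetical, and the sketch does not cover how the `synth,cot` condition tokens are placed; the repo harness remains the reference for that.

```python
def prefix_lm_inputs(prompt_len: int, total_len: int):
    """Build token_type_ids and an attention mask for a generic PrefixLM layout.

    Assumption: 1 marks prompt (prefix) tokens, 0 marks generated tokens.
    mask[q][k] is True when query position q may attend to key position k.
    """
    if not 0 <= prompt_len <= total_len:
        raise ValueError("prompt_len must be between 0 and total_len")

    token_type_ids = [1 if i < prompt_len else 0 for i in range(total_len)]

    mask = [
        # Every position sees the whole prefix; beyond it, attention is causal.
        [(k < prompt_len) or (k <= q) for k in range(total_len)]
        for q in range(total_len)
    ]
    return token_type_ids, mask


# Example: a 3-token prompt followed by 2 generated tokens.
ids, mask = prefix_lm_inputs(prompt_len=3, total_len=5)
print(ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

A bare `.generate()` call that omits this prompt marking would likely run the model outside the layout described under Usage, which may be one reason the card points to the harness.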