license: cc-by-nc-4.0
base_model: sapientinc/HRM-Text-1B
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- code
- code-generation
- hrm
- hierarchical-reasoning
- prefix-lm
HRM-Text-1B-code — a code expert (SFT)
Full-parameter SFT of sapientinc/HRM-Text-1B for
Python code generation, trained in the model's synth,cot (reasoning) condition lane. It takes
a base that essentially couldn't code (HumanEval 1.2%) and teaches it to code from just ~25k
instruction→code SFT examples.
Built as the second expert in a skill-composition experiment (can an HRM tool expert + code expert
merge into one model?). Full writeup + code: https://github.com/jasoncarreira/hrm-text-agent.
Companions: hrm-text-agent (tools),
hrm-text-agent-v2 (tools, scaled).
Scores (pass@1)
| Bench | Base | This model |
|---|---|---|
| HumanEval | 1.2% (2/164) | 11.0% (18/164) |
| MBPP | 2.3% (6/257) | 16.7% (43/257) |
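The percentages follow directly from the pass counts. A quick check, using only the counts in the table above (the helper name `pass_at_1` is illustrative and not part of the repo):

```python
def pass_at_1(passed: int, total: int) -> float:
    """Return pass@1 as a percentage, rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


# (passed, total) counts copied from the table above
scores = {
    "HumanEval": {"base": (2, 164), "this_model": (18, 164)},
    "MBPP": {"base": (6, 257), "this_model": (43, 257)},
}

for bench, runs in scores.items():
    for name, (passed, total) in runs.items():
        print(f"{bench:10s} {name:10s} {pass_at_1(passed, total):5.1f}%")
```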
Honest positioning: as a standalone code model this is entry-level, roughly StarCoderBase-1B
tier (~15% HumanEval) and well below purpose-built small code models (DeepSeek-Coder-1.3B ~35%,
Qwen2.5-Coder-1.5B ~40%+, Phi-1 ~50%). But those were pretrained on billions to trillions of code
tokens; this learned code from **25k SFT examples on a non-code reasoning base**, so the result is
about sample efficiency, not absolute code SOTA. Plausibly the recurrent reasoning base helps
with code's structured nature. (pass@1 was measured with the repo's eval_code.py instruct harness,
which can slightly under-measure relative to a model's native eval.)
Training
- full-parameter SFT (sapientinc `cfg_sft` recipe: lr 3e-5, cosine to 10%, AdamW(0.9, 0.95) wd 0.1, 3 epochs, `max_len` 2048, bf16)
- `synth,cot` condition (`<|quad_end|><|object_ref_end|>`), deliberately a different lane than the tool expert's `direct`, for the composition experiment
- data: ~25k instruction→code examples from CodeFeedback-Filtered-Instruction + CodeAlpaca-20k, length-filtered to fit 2048
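As an illustration of the "cosine to 10%" schedule in the recipe above, here is a minimal sketch. It assumes "10%" means the learning rate decays to 10% of its 3e-5 peak and that there is no warmup; neither is stated in this card, and `cosine_lr` and `total_steps` are hypothetical names rather than the repo's.

```python
import math

PEAK_LR = 3e-5        # from the recipe above
FINAL_FRACTION = 0.1  # assumption: "cosine to 10%" = decay to 10% of the peak LR


def cosine_lr(step: int, total_steps: int) -> float:
    """Cosine decay from PEAK_LR to FINAL_FRACTION * PEAK_LR (no warmup assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end hold the final LR.
    progress = min(max(step / total_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (FINAL_FRACTION + (1.0 - FINAL_FRACTION) * cosine)


# Hypothetical run length, purely for illustration
for step in (0, 500, 1000):
    print(step, f"{cosine_lr(step, total_steps=1000):.2e}")
```

Under these assumptions the rate starts at 3e-5, passes 1.65e-5 at the midpoint, and ends at 3e-6.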
Usage
HRM-Text is a PrefixLM with a conditioning scheme — generate in the synth,cot lane with
token_type_ids=1 over the prompt. Use the repo harness rather than a bare .generate():
python eval_code.py --bench humaneval --model jasoncarreira/hrm-text-code
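To make the `token_type_ids=1` convention concrete, the sketch below builds the type ids and a PrefixLM-style attention mask in plain Python, with no model required. It assumes that type 0 marks generated tokens and that type-1 (prompt) positions attend to each other bidirectionally, as is usual for a PrefixLM. The card states neither, both function names are hypothetical, and the repo harness remains the reference for real generation.

```python
def build_token_type_ids(prompt_len: int, total_len: int) -> list[int]:
    """1 over the prompt (prefix), 0 over the continuation (assumed convention)."""
    if not 0 <= prompt_len <= total_len:
        raise ValueError("prompt_len must be between 0 and total_len")
    return [1] * prompt_len + [0] * (total_len - prompt_len)


def prefix_lm_mask(token_type_ids: list[int]) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Prefix positions see the whole prefix; all positions see earlier ones.
    """
    n = len(token_type_ids)
    return [
        [j <= i or (token_type_ids[i] == 1 and token_type_ids[j] == 1) for j in range(n)]
        for i in range(n)
    ]


type_ids = build_token_type_ids(prompt_len=3, total_len=5)
for row in prefix_lm_mask(type_ids):
    print("".join("x" if allowed else "." for allowed in row))
```

In the printed grid the first three rows are identical (the prompt is fully visible to itself), while the last two rows are causal.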
Note on composition
The merge experiment found this code expert and the tool expert do not compose in merged weights — a hard tool-XOR-code trade at every coefficient (tools work only at full tool-weight, where code dies; weaken tools at all and they collapse while code recovers). So for a multi-skill HRM agent the path is model-routing between separate experts, not weight-merging. Details in the repo README.
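For readers unfamiliar with merging "at a coefficient", here is a minimal sketch of linear weight interpolation between two experts. This is an assumption about the general technique, not the repo's actual merge code, which may use a different scheme. `merge_weights` and the toy weight dictionaries are hypothetical.

```python
def merge_weights(
    tool: dict[str, list[float]],
    code: dict[str, list[float]],
    alpha: float,
) -> dict[str, list[float]]:
    """Linear interpolation: alpha=1.0 gives the tool expert, 0.0 the code expert."""
    if tool.keys() != code.keys():
        raise ValueError("experts must share the same parameter names")
    return {
        name: [alpha * t + (1.0 - alpha) * c for t, c in zip(tool[name], code[name])]
        for name in tool
    }


# Toy parameters standing in for real checkpoints
tool_expert = {"layer.weight": [1.0, 2.0]}
code_expert = {"layer.weight": [3.0, 4.0]}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, merge_weights(tool_expert, code_expert, alpha))
```

Sweeping `alpha` this way is the kind of experiment the note describes. Its reported outcome is that no intermediate value preserved both skills.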
License & lineage
Base is Apache-2.0; the training data (CodeAlpaca / CodeFeedback lineage) is best treated as non-commercial / research. Verify source licenses for your use case.
🤖 Built with Claude Code.