---
license: cc-by-nc-4.0
base_model: sapientinc/HRM-Text-1B
library_name: transformers
pipeline_tag: text-generation
language:
- en
tags:
- code
- code-generation
- hrm
- hierarchical-reasoning
- prefix-lm
---

# HRM-Text-1B-code: a code expert (SFT)

Full-parameter SFT of [`sapientinc/HRM-Text-1B`](https://huggingface.co/sapientinc/HRM-Text-1B) for
Python code generation, trained in the model's `synth,cot` (reasoning) condition lane. It takes
a base that essentially couldn't code (HumanEval 1.2%) and teaches it to code (HumanEval 11.0%)
from just ~25k instruction→code SFT examples.

Built as the second expert in a skill-composition experiment: can an HRM tool expert and a code
expert be merged into one model? Full writeup and code: https://github.com/jasoncarreira/hrm-text-agent.
Companions: [`hrm-text-agent`](https://huggingface.co/jasoncarreira/hrm-text-agent) (tools),
[`hrm-text-agent-v2`](https://huggingface.co/jasoncarreira/hrm-text-agent-v2) (tools, scaled).

## Scores (pass@1)

| Bench | Base | This model |
|---|---|---|
| HumanEval | 1.2% (2/164) | 11.0% (18/164) |
| MBPP | 2.3% (6/257) | 16.7% (43/257) |

Honest positioning: as a standalone code model this is entry-level. It is in the neighborhood of
StarCoderBase-1B (~15% HE), though a few points below it on HumanEval, and well below purpose-built
small code models (DeepSeek-Coder-1.3B ~35%, Qwen2.5-Coder-1.5B ~40%+, Phi-1 ~50%). But those were
**pretrained on billions to trillions of code tokens; this learned code from ~25k SFT examples on a
non-code reasoning base**, so the result is about sample efficiency, not absolute code SOTA.
Plausibly the recurrent reasoning base helps with code's structured nature. (pass@1 was measured with
the repo's `eval_code.py` instruct harness, which can slightly under-measure relative to a model's
native eval setup.)
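
For readers who want to check the arithmetic, the percentages in the table can be recomputed from the solved/total counts. This is a minimal sketch that assumes pass@1 here means one sample per problem, so the score is simply solved divided by total; the sampling settings of `eval_code.py` are not stated in this card, and the names `RESULTS` and `pass_at_1` are illustrative only.

```python
# Solved/total counts copied from the score table above.
RESULTS = {
    "HumanEval": {"base": (2, 164), "this_model": (18, 164)},
    "MBPP": {"base": (6, 257), "this_model": (43, 257)},
}


def pass_at_1(solved: int, total: int) -> float:
    """Percentage of problems solved, assuming one sample per problem."""
    return 100.0 * solved / total


for bench, rows in RESULTS.items():
    base = pass_at_1(*rows["base"])
    tuned = pass_at_1(*rows["this_model"])
    print(f"{bench}: base {base:.1f}% -> this model {tuned:.1f}% (+{tuned - base:.1f} points)")
```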

## Training
- full-parameter SFT (sapientinc `cfg_sft` recipe: lr 3e-5, cosine decay to 10% of peak,
  AdamW with betas (0.9, 0.95) and weight decay 0.1, 3 epochs, `max_len` 2048, bf16)
- `synth,cot` condition (`<|quad_end|><|object_ref_end|>`), deliberately a different lane than
  the tool expert's `direct`, for the composition experiment
- data: ~25k instruction→code examples from
  [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction)
  + [CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k), length-filtered to fit 2048 tokens
## Usage
HRM-Text is a PrefixLM with a conditioning scheme: generate in the `synth,cot` lane, with
`token_type_ids=1` over the prompt tokens. Use the harness from the GitHub repo linked above
rather than a bare `.generate()`:
```bash
python eval_code.py --bench humaneval --model jasoncarreira/hrm-text-code
```
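
As background on what `token_type_ids=1` over the prompt usually implies in a PrefixLM, the sketch below builds the generic prefix-LM attention pattern: prompt positions see each other in both directions, and later positions attend causally. It is an assumption-laden illustration of the general convention, not the repo harness or the model's own masking code, and it does not show how the `synth,cot` condition tokens are added. `prefix_lm_inputs` is a hypothetical helper name.

```python
def prefix_lm_inputs(prompt_len: int, total_len: int):
    """Return (token_type_ids, mask) for a generic PrefixLM layout.

    token_type_ids is 1 on prompt (prefix) positions and 0 afterwards.
    mask[i][j] is True when position i may attend to position j.
    """
    token_type_ids = [1 if i < prompt_len else 0 for i in range(total_len)]
    mask = [
        # Causal attention everywhere, plus full visibility of prefix tokens.
        [j <= i or token_type_ids[j] == 1 for j in range(total_len)]
        for i in range(total_len)
    ]
    return token_type_ids, mask


token_type_ids, mask = prefix_lm_inputs(prompt_len=3, total_len=5)
print(token_type_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```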

## Note on composition
The merge experiment found that this code expert and the tool expert do not compose in merged
weights: it is a hard tool-XOR-code trade at every merge coefficient. Tools work only at full tool
weight, where code ability is lost; reduce the tool weight at all and tool use collapses while code
recovers. So for a multi-skill HRM agent the path is model routing between separate experts, not
weight merging. Details in the repo README.
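
For a sense of what sweeping a merge coefficient can look like, here is a toy sketch of linear weight interpolation between two experts. The card does not say which merge method the experiment used, so linear interpolation is an assumption, and `interpolate_weights`, `tool_expert`, and `code_expert` are hypothetical stand-ins for real checkpoints.

```python
def interpolate_weights(tool, code, tool_coeff):
    """Blend two same-shaped weight dicts: tool_coeff * tool + (1 - tool_coeff) * code."""
    if tool.keys() != code.keys():
        raise ValueError("experts must share parameter names")
    return {
        name: [tool_coeff * t + (1.0 - tool_coeff) * c for t, c in zip(tool[name], code[name])]
        for name in tool
    }


# Toy stand-ins for real checkpoints (hypothetical names and values).
tool_expert = {"layer.weight": [1.0, 3.0]}
code_expert = {"layer.weight": [3.0, 1.0]}

# A coefficient of 1.0 is "full tool weight"; 0.0 is the pure code expert.
for coeff in (0.0, 0.5, 1.0):
    print(coeff, interpolate_weights(tool_expert, code_expert, coeff))
```

In a real sweep each merged checkpoint would then be scored on both the tool and the code benchmarks, which is where the trade described above shows up.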

## License & lineage
Base is Apache-2.0; the training data (CodeAlpaca / CodeFeedback lineage) is best treated as
non-commercial / research, and this model is released under `cc-by-nc-4.0`. Verify source licenses
for your use case.

🤖 Built with Claude Code.