HRM Checkpoint β€” Sudoku Full

Checkpoint from training the Hierarchical Reasoning Model (HRM) on the full Sudoku Extreme dataset, following a setup similar to sapientinc/HRM-checkpoint-sudoku-extreme.

```shell
python3 ./pretrain.py \
  data_path=data/sudoku-extreme-full \
  epochs=100 \
  eval_interval=100 \
  lr_min_ratio=0.1 \
  global_batch_size=1152 \
  lr=3e-4 \
  puzzle_emb_lr=3e-4 \
  weight_decay=0.1 \
  puzzle_emb_weight_decay=0.1 \
  arch.loss.loss_type=softmax_cross_entropy \
  arch.L_cycles=8 \
  arch.halt_max_steps=8 \
  arch.pos_encodings=learned
```
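For context, the `lr` and `lr_min_ratio` settings above imply a learning rate that decays from the peak down to 10% of it. The sketch below is my own minimal stand-alone version of that idea, assuming a plain cosine decay with warmup omitted — it is an illustration, not the exact schedule code from pretrain.py.

```python
import math

def cosine_lr(step: int, total_steps: int,
              lr: float = 3e-4, lr_min_ratio: float = 0.1) -> float:
    """Cosine decay from lr down to lr * lr_min_ratio (warmup omitted)."""
    progress = step / total_steps
    lr_min = lr * lr_min_ratio
    return lr_min + 0.5 * (lr - lr_min) * (1.0 + math.cos(math.pi * progress))

# Starts at the peak rate, bottoms out at lr_min_ratio * lr:
print(cosine_lr(0, 10_000))       # 3e-4
print(cosine_lr(10_000, 10_000))  # 3e-5
```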

I tried to mimic the file structure in sapientinc/HRM-checkpoint-sudoku-extreme, but I figured I'd add some extra stats:

It has the output that evaluate.py typically produces across several max_steps settings, in an easier-to-read JSON format: evaluate-Sudoku-extreme-full.json.

I also ran it in a loop where I whittled the set down to only the puzzles that were still unsolved. You can see my method in run_subset.py. It produces stats.json; that's what I'm graphing below.
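In outline, that loop looks like the sketch below. This is a minimal stand-in assuming a hypothetical `solve(puzzle, max_steps=...)` callable that returns True on success — the real run_subset.py drives evaluate.py and differs in the details.

```python
def whittle_down(puzzles, solve, max_rounds=23):
    """Re-attempt only the still-unsolved puzzles each round,
    doubling the step budget (1, 2, 4, ...) until none remain."""
    stats = []
    unsolved = list(puzzles)
    total = len(puzzles)
    steps = 1
    for _ in range(max_rounds):
        unsolved = [p for p in unsolved if not solve(p, max_steps=steps)]
        stats.append({"steps": steps, "total": total,
                      "solved": total - len(unsolved),
                      "unsolved": len(unsolved)})
        if not unsolved:
            break
        steps *= 2
    return stats
```

With a toy solver where each "puzzle" is just the step budget it needs (`solve = lambda p, max_steps: max_steps >= p`), the returned stats shrink the unsolved set round by round, just like the table below.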

And here's a graph of that data, somewhat like Figure 5c in the Hierarchical Reasoning Model paper:

_My_ Figure 5c: Inference Time Scaling

(I should note that although the graph shows exact accuracy hitting 100% at Mmax=1024, it isn't quite 100%. It's 99.9605%, which corresponds to 422,619 correct out of 422,786 total Sudokus — that is, 167 unsolved.)

Perhaps it would be useful to see the results as a table:

| Steps | Total | Solved | Solved % | Unsolved | Unsolved % |
|------:|------:|-------:|---------:|---------:|-----------:|
| 0 | 422,786 | 0 | 0.000% | 422,786 | 100.000% |
| 1 | 422,786 | 262,006 | 61.971% | 160,780 | 38.029% |
| 2 | 422,786 | 373,996 | 88.460% | 48,790 | 11.540% |
| 4 | 422,786 | 399,675 | 94.534% | 23,111 | 5.466% |
| 8 | 422,786 | 411,387 | 97.304% | 11,399 | 2.696% |
| 16 | 422,786 | 417,326 | 98.709% | 5,460 | 1.291% |
| 32 | 422,786 | 420,155 | 99.378% | 2,631 | 0.622% |
| 64 | 422,786 | 421,523 | 99.701% | 1,263 | 0.299% |
| 128 | 422,786 | 422,111 | 99.840% | 675 | 0.160% |
| 256 | 422,786 | 422,412 | 99.912% | 374 | 0.088% |
| 512 | 422,786 | 422,555 | 99.945% | 231 | 0.055% |
| 1024 | 422,786 | 422,619 | 99.961% | 167 | 0.039% |
| 2048 | 422,786 | 422,654 | 99.969% | 132 | 0.031% |
| 4096 | 422,786 | 422,679 | 99.975% | 107 | 0.025% |
| 8192 | 422,786 | 422,690 | 99.977% | 96 | 0.023% |
| 16384 | 422,786 | 422,702 | 99.980% | 84 | 0.020% |
| 32768 | 422,786 | 422,715 | 99.983% | 71 | 0.017% |
| 65536 | 422,786 | 422,718 | 99.984% | 68 | 0.016% |
| 131072 | 422,786 | 422,724 | 99.985% | 62 | 0.015% |
| 262144 | 422,786 | 422,728 | 99.986% | 58 | 0.014% |
| 524288 | 422,786 | 422,732 | 99.987% | 54 | 0.013% |
| 1048576 | 422,786 | 422,734 | 99.988% | 52 | 0.012% |
| 2097152 | 422,786 | 422,739 | 99.989% | 47 | 0.011% |
| 4194304 | 422,786 | 422,741 | 99.989% | 45 | 0.011% |
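The percentages are just solved/total, rounded to three decimals; a quick sanity check against a few rows of the table:

```python
TOTAL = 422_786  # total Sudokus, from the table
rows = [  # (steps, solved) pairs taken from the table
    (1, 262_006),
    (1_024, 422_619),
    (4_194_304, 422_741),
]

for steps, solved in rows:
    pct = 100.0 * solved / TOTAL
    print(f"{steps:>9,} steps: {pct:.3f}% solved, {TOTAL - solved:,} unsolved")
```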

πŸ‘‰ Browse the Sudoku puzzles solved by HRM (grouped by required inference steps) here: Sudoku Puzzle Collection

Usage

You should be able to run it like this:

```shell
HRM_LOCATION="/tmp/hrm" # Or wherever
CHECKPOINT_LOCATION="/tmp/HRM-checkpoint-sudoku-full" # Or wherever, of course.

git clone https://github.com/sapientinc/HRM "${HRM_LOCATION}"
# Running this requires a bunch of configuration. Sapient has their own
# README.md, etc., but I've made a Docker image that you might be able to
# use as a guide as well. I'll link it below.

git clone https://huggingface.co/bnsh/HRM-checkpoint-sudoku-full/ "${CHECKPOINT_LOCATION}"

cd "${HRM_LOCATION}"
python3 ./evaluate.py checkpoint="${CHECKPOINT_LOCATION}/checkpoint" data_path=data/sudoku-extreme-full/
```

And, here's that Docker image I mentioned: bnsh/hrm-docker (setup and usage guide).

Training Details

- Hardware: NVIDIA A10G
- Runtime: ≈ 9 days, 3 hours, 34 minutes, 25 seconds (13174m 24.845s)
- Parameters: ~27.3M
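The runtime conversion is easy to double-check: 13174 minutes and 24.845 seconds works out to just over nine days.

```python
total_seconds = 13174 * 60 + 24.845  # the raw runtime from above
days, rem = divmod(total_seconds, 86_400)
hours, rem = divmod(rem, 3_600)
minutes, seconds = divmod(rem, 60)
print(f"{int(days)}d {int(hours)}h {int(minutes)}m {seconds:.3f}s")
# 9d 3h 34m 24.845s
```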

Final Metrics

| Metric | Value |
|---|---|
| Train Accuracy | 0.98701 |
| Train Exact Accuracy | 0.96367 |
| Train LM Loss | 0.27213 |
| Train Q Continue Loss | 0.13321 |
| Train Q Halt Accuracy | 1.0 |
| Train Q Halt Loss | 0.00632 |
| Train Steps | 1.90995 |
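As I understand it, accuracy is per-cell while exact accuracy requires the whole grid to be correct, which is why the two numbers differ. Here's a toy illustration of that distinction (my own sketch, not HRM's actual metric code):

```python
def cell_and_exact_accuracy(preds, truths):
    """preds/truths: lists of 81-digit tuples, one per puzzle."""
    correct_cells = sum(p == t
                        for pred, truth in zip(preds, truths)
                        for p, t in zip(pred, truth))
    total_cells = sum(len(truth) for truth in truths)
    exact = sum(pred == truth for pred, truth in zip(preds, truths))
    return correct_cells / total_cells, exact / len(truths)

truths = [(0,) * 81] * 4
preds = [(1,) + (0,) * 80] + [(0,) * 81] * 3  # one wrong cell in one puzzle
cell_acc, exact_acc = cell_and_exact_accuracy(preds, truths)
print(cell_acc, exact_acc)  # 323/324 cells right, but only 3/4 grids exact
```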

Run History (ASCII plots)

num_params ▁
train/accuracy β–‚β–β–‚β–β–β–ƒβ–„β–„β–„β–„β–…β–…β–…β–†β–…β–†β–‡β–…β–‡β–†β–‡β–†β–‡β–†β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/count β–β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/exact_accuracy β–β–β–‚β–‚β–ƒβ–„β–„β–…β–…β–…β–…β–†β–†β–†β–†β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–‡β–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/lm_loss β–ˆβ–ˆβ–‡β–‡β–‡β–‡β–†β–†β–†β–†β–…β–…β–…β–…β–…β–…β–…β–…β–…β–„β–„β–„β–…β–„β–„β–„β–ƒβ–„β–„β–„β–ƒβ–ƒβ–ƒβ–‚β–β–‚β–‚β–β–β–
train/lr β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–†β–†β–†β–†β–†β–†β–†β–…β–„β–„β–„β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–β–β–β–β–β–β–β–
train/q_continue_loss β–β–„β–ƒβ–ˆβ–ƒβ–…β–…β–†β–…β–…β–…β–†β–†β–†β–…β–†β–…β–…β–†β–†β–…β–…β–…β–„β–…β–…β–„β–…β–…β–„β–„β–„β–„β–ƒβ–ƒβ–„β–ƒβ–ƒβ–ƒβ–‚
train/q_halt_accuracy β–ˆβ–‚β–ˆβ–ˆβ–β–„β–ˆβ–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/q_halt_loss β–β–ƒβ–‡β–β–„β–†β–„β–†β–‚β–„β–…β–„β–‡β–„β–‡β–„β–„β–ƒβ–†β–…β–‡β–ƒβ–‚β–†β–ˆβ–‡β–†β–ˆβ–…β–„β–„β–†β–†β–…β–„β–†β–‡β–…β–‡β–†
train/steps β–ˆβ–‡β–‡β–ˆβ–†β–…β–…β–…β–„β–‡β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–ƒβ–‚β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–ˆβ–β–β–

Reference

Hierarchical Reasoning Model (HRM), arXiv:2506.21734
