HRM Checkpoint β€” Sudoku Full

Checkpoint from training the Hierarchical Reasoning Model (HRM) on the full Sudoku Extreme dataset, following a setup similar to sapientinc/HRM-checkpoint-sudoku-extreme.

```shell
python3 ./pretrain.py \
  data_path=data/sudoku-extreme-full \
  epochs=100 \
  eval_interval=100 \
  lr_min_ratio=0.1 \
  global_batch_size=1152 \
  lr=3e-4 \
  puzzle_emb_lr=3e-4 \
  weight_decay=0.1 \
  puzzle_emb_weight_decay=0.1 \
  arch.loss.loss_type=softmax_cross_entropy \
  arch.L_cycles=8 \
  arch.halt_max_steps=8 \
  arch.pos_encodings=learned
```
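For context, the `lr` and `lr_min_ratio` settings above imply a learning rate that decays from the peak down to 10% of it. The sketch below is my own minimal stand-alone version of that idea, assuming a plain cosine decay with warmup omitted — it is an illustration, not the exact schedule code from pretrain.py.

```python
import math

def cosine_lr(step: int, total_steps: int,
              lr: float = 3e-4, lr_min_ratio: float = 0.1) -> float:
    """Cosine decay from lr down to lr * lr_min_ratio (warmup omitted)."""
    progress = step / total_steps
    lr_min = lr * lr_min_ratio
    return lr_min + 0.5 * (lr - lr_min) * (1.0 + math.cos(math.pi * progress))

# Starts at the peak rate, bottoms out at lr_min_ratio * lr:
print(cosine_lr(0, 10_000))       # 3e-4
print(cosine_lr(10_000, 10_000))  # 3e-5
```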

I tried to mimic the file structure in sapientinc/HRM-checkpoint-sudoku-extreme, but I figured I'd add some extra stats:

It has the output that evaluate.py typically produces across several max_steps settings, in an easier-to-read JSON format: evaluate-Sudoku-extreme-full.json.

I also ran it in a loop where I whittled the set down to only the puzzles that were still unsolved. You can see my method in run_subset.py. It produces stats.json; that's what I'm graphing below.
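In outline, that loop looks like the sketch below. This is a minimal stand-in assuming a hypothetical `solve(puzzle, max_steps=...)` callable that returns True on success — the real run_subset.py drives evaluate.py and differs in the details.

```python
def whittle_down(puzzles, solve, max_rounds=23):
    """Re-attempt only the still-unsolved puzzles each round,
    doubling the step budget (1, 2, 4, ...) until none remain."""
    stats = []
    unsolved = list(puzzles)
    total = len(puzzles)
    steps = 1
    for _ in range(max_rounds):
        unsolved = [p for p in unsolved if not solve(p, max_steps=steps)]
        stats.append({"steps": steps, "total": total,
                      "solved": total - len(unsolved),
                      "unsolved": len(unsolved)})
        if not unsolved:
            break
        steps *= 2
    return stats
```

With a toy solver where each "puzzle" is just the step budget it needs (`solve = lambda p, max_steps: max_steps >= p`), the returned stats shrink the unsolved set round by round, just like the table below.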

And here's a graph of that data, somewhat like Figure 5c in the Hierarchical Reasoning Model paper:

_My_ Figure 5c: Inference Time Scaling

(I should note that although the graph shows exact accuracy hitting 100% at Mmax=1024, it isn't quite 100%. It's 99.9605%, which corresponds to 422,619 correct out of 422,786 total Sudokus — that is, 167 unsolved.)

Perhaps it would be useful to see the results as a table:

| Steps | Total | Solved | Solved % | Unsolved | Unsolved % |
|------:|------:|-------:|---------:|---------:|-----------:|
| 0 | 422,786 | 0 | 0.000% | 422,786 | 100.000% |
| 1 | 422,786 | 262,006 | 61.971% | 160,780 | 38.029% |
| 2 | 422,786 | 373,996 | 88.460% | 48,790 | 11.540% |
| 4 | 422,786 | 399,675 | 94.534% | 23,111 | 5.466% |
| 8 | 422,786 | 411,387 | 97.304% | 11,399 | 2.696% |
| 16 | 422,786 | 417,326 | 98.709% | 5,460 | 1.291% |
| 32 | 422,786 | 420,155 | 99.378% | 2,631 | 0.622% |
| 64 | 422,786 | 421,523 | 99.701% | 1,263 | 0.299% |
| 128 | 422,786 | 422,111 | 99.840% | 675 | 0.160% |
| 256 | 422,786 | 422,412 | 99.912% | 374 | 0.088% |
| 512 | 422,786 | 422,555 | 99.945% | 231 | 0.055% |
| 1024 | 422,786 | 422,619 | 99.961% | 167 | 0.039% |
| 2048 | 422,786 | 422,654 | 99.969% | 132 | 0.031% |
| 4096 | 422,786 | 422,679 | 99.975% | 107 | 0.025% |
| 8192 | 422,786 | 422,690 | 99.977% | 96 | 0.023% |
| 16384 | 422,786 | 422,702 | 99.980% | 84 | 0.020% |
| 32768 | 422,786 | 422,715 | 99.983% | 71 | 0.017% |
| 65536 | 422,786 | 422,718 | 99.984% | 68 | 0.016% |
| 131072 | 422,786 | 422,724 | 99.985% | 62 | 0.015% |
| 262144 | 422,786 | 422,728 | 99.986% | 58 | 0.014% |
| 524288 | 422,786 | 422,732 | 99.987% | 54 | 0.013% |
| 1048576 | 422,786 | 422,734 | 99.988% | 52 | 0.012% |
| 2097152 | 422,786 | 422,739 | 99.989% | 47 | 0.011% |
| 4194304 | 422,786 | 422,741 | 99.989% | 45 | 0.011% |
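The percentages are just solved/total, rounded to three decimals; a quick sanity check against a few rows of the table:

```python
TOTAL = 422_786  # total Sudokus, from the table
rows = [  # (steps, solved) pairs taken from the table
    (1, 262_006),
    (1_024, 422_619),
    (4_194_304, 422_741),
]

for steps, solved in rows:
    pct = 100.0 * solved / TOTAL
    print(f"{steps:>9,} steps: {pct:.3f}% solved, {TOTAL - solved:,} unsolved")
```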

πŸ‘‰ Browse the Sudoku puzzles solved by HRM (grouped by required inference steps) here: Sudoku Puzzle Collection

Usage

You should be able to run it like this:

```shell
HRM_LOCATION="/tmp/hrm" # Or wherever
CHECKPOINT_LOCATION="/tmp/HRM-checkpoint-sudoku-full" # Or wherever, of course.

git clone https://github.com/sapientinc/HRM "${HRM_LOCATION}"
# Running this requires a bunch of configuration. Sapient has their own
# README.md, etc., but I've made a Docker image that you might be able to
# use as a guide as well. I'll link it below.

git clone https://huggingface.co/bnsh/HRM-checkpoint-sudoku-full/ "${CHECKPOINT_LOCATION}"

cd "${HRM_LOCATION}"
python3 ./evaluate.py checkpoint="${CHECKPOINT_LOCATION}/checkpoint" data_path=data/sudoku-extreme-full/
```

And, here's that Docker image I mentioned: bnsh/hrm-docker (setup and usage guide).

Training Details

- Hardware: NVIDIA A10G
- Runtime: ≈ 9 days, 3 hours, 34 minutes, 25 seconds (13174m 24.845s)
- Parameters: ~27.3M
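The runtime conversion is easy to double-check: 13174 minutes and 24.845 seconds works out to just over nine days.

```python
total_seconds = 13174 * 60 + 24.845  # the raw runtime from above
days, rem = divmod(total_seconds, 86_400)
hours, rem = divmod(rem, 3_600)
minutes, seconds = divmod(rem, 60)
print(f"{int(days)}d {int(hours)}h {int(minutes)}m {seconds:.3f}s")
# 9d 3h 34m 24.845s
```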

Final Metrics

| Metric | Value |
|---|---|
| Train Accuracy | 0.98701 |
| Train Exact Accuracy | 0.96367 |
| Train LM Loss | 0.27213 |
| Train Q Continue Loss | 0.13321 |
| Train Q Halt Accuracy | 1.0 |
| Train Q Halt Loss | 0.00632 |
| Train Steps | 1.90995 |
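As I understand it, accuracy is per-cell while exact accuracy requires the whole grid to be correct, which is why the two numbers differ. Here's a toy illustration of that distinction (my own sketch, not HRM's actual metric code):

```python
def cell_and_exact_accuracy(preds, truths):
    """preds/truths: lists of 81-digit tuples, one per puzzle."""
    correct_cells = sum(p == t
                        for pred, truth in zip(preds, truths)
                        for p, t in zip(pred, truth))
    total_cells = sum(len(truth) for truth in truths)
    exact = sum(pred == truth for pred, truth in zip(preds, truths))
    return correct_cells / total_cells, exact / len(truths)

truths = [(0,) * 81] * 4
preds = [(1,) + (0,) * 80] + [(0,) * 81] * 3  # one wrong cell in one puzzle
cell_acc, exact_acc = cell_and_exact_accuracy(preds, truths)
print(cell_acc, exact_acc)  # 323/324 cells right, but only 3/4 grids exact
```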

Run History (ASCII plots)

num_params ▁
train/accuracy β–‚β–β–‚β–β–β–ƒβ–„β–„β–„β–„β–…β–…β–…β–†β–…β–†β–‡β–…β–‡β–†β–‡β–†β–‡β–†β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/count β–β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/exact_accuracy β–β–β–‚β–‚β–ƒβ–„β–„β–…β–…β–…β–…β–†β–†β–†β–†β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–‡β–ˆβ–‡β–ˆβ–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/lm_loss β–ˆβ–ˆβ–‡β–‡β–‡β–‡β–†β–†β–†β–†β–…β–…β–…β–…β–…β–…β–…β–…β–…β–„β–„β–„β–…β–„β–„β–„β–ƒβ–„β–„β–„β–ƒβ–ƒβ–ƒβ–‚β–β–‚β–‚β–β–β–
train/lr β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–‡β–†β–†β–†β–†β–†β–†β–†β–…β–„β–„β–„β–„β–„β–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–β–β–β–β–β–β–β–
train/q_continue_loss β–β–„β–ƒβ–ˆβ–ƒβ–…β–…β–†β–…β–…β–…β–†β–†β–†β–…β–†β–…β–…β–†β–†β–…β–…β–…β–„β–…β–…β–„β–…β–…β–„β–„β–„β–„β–ƒβ–ƒβ–„β–ƒβ–ƒβ–ƒβ–‚
train/q_halt_accuracy β–ˆβ–‚β–ˆβ–ˆβ–β–„β–ˆβ–…β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
train/q_halt_loss β–β–ƒβ–‡β–β–„β–†β–„β–†β–‚β–„β–…β–„β–‡β–„β–‡β–„β–„β–ƒβ–†β–…β–‡β–ƒβ–‚β–†β–ˆβ–‡β–†β–ˆβ–…β–„β–„β–†β–†β–…β–„β–†β–‡β–…β–‡β–†
train/steps β–ˆβ–‡β–‡β–ˆβ–†β–…β–…β–…β–„β–‡β–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–ƒβ–‚β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–ˆβ–β–β–

Reference

Hierarchical Reasoning Model (HRM), arXiv:2506.21734
