---
license: apache-2.0
tags:
- hierarchical-reasoning-model
- sudoku
- puzzle-solving
- recursive-reasoning
- adaptive-computation
pretty_name: HRM Sudoku-Extreme Checkpoints
---

# HRM Sudoku-Extreme Checkpoints

Trained checkpoints for reproducing **Hierarchical Reasoning Model (HRM)** results on the Sudoku-Extreme benchmark. All models were trained on a single NVIDIA GH200 GPU.

## Models

### Original HRM (`sudoku-extreme/original-hrm/`)

- **Run name:** liberal-bee
- **Architecture:** HierarchicalReasoningModel_ACTV1 (~27M parameters)
- **Dataset:** sudoku-extreme-1k-aug-1000 (vanilla; 1,000 puzzles, 1,000× augmented)
- **Training:** 20,000 epochs, lr=7e-5, batch=384, 1 GPU
- **Test exact accuracy:** 53% (paper: 55% ± 2%)
- **Checkpoints:** 20 checkpoints, from step 2604 to step 52080

### Augmented HRM (`sudoku-extreme/augmented-hrm/`)

- **Run name:** hopeful-quetzal
- **Architecture:** HierarchicalReasoningModel_ACTV1 (~27M parameters)
- **Dataset:** sudoku-extreme-1k-aug-1000-hint (with easier puzzles mixed in)
- **Training:** 40,000 epochs, lr=1e-4, batch=768, 1 GPU
- **Peak single-checkpoint test accuracy:** 54.2% (paper: 59.9%)
- **Ensemble accuracy (10 checkpoints + 9 permutations, 1000 samples):** 90.5% (paper: 96.9%)
- **Checkpoints:** 40 checkpoints, from step 1302 to step 52080

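The "9 permutations" in the ensemble evaluation refer to digit-relabeling test-time augmentation: any bijection of the digits 1–9 maps a valid Sudoku to another valid Sudoku, so each puzzle can be presented to the model under several relabelings and the predictions mapped back. A minimal sketch of the idea (an illustration, not the repo's exact implementation):

```python
import random

def permute_digits(grid, perm):
    """Relabel digits 1-9 via the bijection `perm`; 0 (empty cell) stays 0."""
    return [[perm[v] if v else 0 for v in row] for row in grid]

def invert(perm):
    """Inverse of a bijection given as a dict."""
    return {v: k for k, v in perm.items()}

# A random bijection on 1..9
digits = list(range(1, 10))
perm = dict(zip(digits, random.sample(digits, 9)))

grid = [[5, 3, 0], [6, 0, 0], [0, 9, 8]]  # a 3x3 corner of a puzzle, for brevity
relabeled = permute_digits(grid, perm)

# Applying the inverse permutation recovers the original grid exactly
assert permute_digits(relabeled, invert(perm)) == grid
```

Because relabeling preserves validity, a model's prediction on a relabeled puzzle can be mapped back through the inverse permutation and compared against the original solution.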
## Source Papers

- **Original HRM:** [Hierarchical Reasoning Model](https://arxiv.org/abs/2506.21734) (Wang et al., 2025)
- **Augmented HRM:** [Are Your Reasoning Models Reasoning or Guessing?](https://arxiv.org/abs/2601.10679) (Ren & Liu, 2026)

## How to Evaluate

### Original HRM — Single Checkpoint

Train and check `eval/exact_accuracy` in W&B, as described in the [HRM repo](https://github.com/sapientinc/HRM).

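`eval/exact_accuracy` gives credit only when every cell of the predicted solution matches the target; a single wrong cell makes the whole puzzle count as incorrect. A sketch of that metric (an assumption about the definition, not the repo's code):

```python
import numpy as np

def exact_accuracy(preds, targets):
    """Fraction of puzzles whose full predicted grid matches the target.

    preds, targets: integer arrays of shape (num_puzzles, 81),
    one flattened 9x9 solution per row.
    """
    preds, targets = np.asarray(preds), np.asarray(targets)
    # all(axis=1): a puzzle counts only if every cell matches
    return float((preds == targets).all(axis=1).mean())

# Two puzzles: one solved exactly, one with a single wrong cell
targets = np.tile(np.arange(81) % 9 + 1, (2, 1))
preds = targets.copy()
preds[1, 0] = 9  # corrupt one cell of the second puzzle
print(exact_accuracy(preds, targets))  # 0.5
```

This all-or-nothing scoring is why the reported accuracies are far below per-cell accuracy.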
### Augmented HRM — Ensemble (10 checkpoints + 9 permutations)

Using [batch_inference.py](https://github.com/renrua52/hrm-mechanistic-analysis):

```bash
# Snapshot evaluation (1000 test samples)
python batch_inference.py \
    --checkpoints "step_40362,step_41664,...,step_52080" \
    --permutes 9 --num_batch 10 --batch_size 100

# Full evaluation (422,786 test samples)
python batch_inference.py \
    --checkpoints "step_40362,step_41664,...,step_52080" \
    --permutes 9
```
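
The ensemble combines the predictions of all (checkpoint, permutation) pairs per puzzle. The actual aggregation lives in `batch_inference.py`; as a rough sketch, majority voting over whole candidate solution grids looks like this (an assumption about the scheme, not the script's logic):

```python
from collections import Counter

def ensemble_vote(candidate_grids):
    """Majority vote over full solution grids from (checkpoint, permutation) pairs.

    candidate_grids: list of solution grids, each a hashable tuple of 81 ints.
    Voting on whole grids (rather than per cell) guarantees the winner is a
    grid that some ensemble member actually produced.
    """
    counts = Counter(candidate_grids)
    grid, _ = counts.most_common(1)[0]
    return grid

# Three members agree, one disagrees -> the majority grid wins
a = tuple([1] * 81)
b = tuple([2] * 81)
print(ensemble_vote([a, a, b, a]) == a)  # True
```

Ensembling helps here because different checkpoints and permutations fail on different puzzles, so a correct solution often wins the vote even when individual members guess wrong.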

## Reproduction Notes
|
| 60 |
+
|
| 61 |
+
- All models trained on a single NVIDIA GH200 GPU (102GB VRAM). The papers used 8 GPUs.
|
| 62 |
+
- The Original HRM result (53%) falls within the paper's stated ±2% variance for small-sample learning.
|
| 63 |
+
- The Augmented HRM gap (90.5% vs 96.9%) is attributed to single-GPU vs multi-GPU training dynamics.
|
| 64 |
+
- Optimizer: adam-atan2-pytorch v0.2.8
|