YAML Metadata
Warning:
empty or missing yaml metadata in repo card
(https://huggingface.co/docs/hub/model-cards#model-card-metadata)
PoT Sudoku 78.9% - Pre-trained Checkpoint
This repository contains a pre-trained HybridPoHHRMSolver model achieving 78.9% grid accuracy on Sudoku-Extreme.
Model Details
| Metric | Value |
|---|---|
| Grid Accuracy | 78.9% |
| Cell Accuracy | 97.8% |
| Parameters | 20,799,516 |
| Architecture | HybridPoHHRMSolver |
| Controller | CausalDepthTransformerRouter |
Configuration
config = {
"d_model": 512,
"n_heads": 8,
"H_layers": 2,
"L_layers": 2,
"d_ff": 2048,
"H_cycles": 2,
"L_cycles": 6,
"T": 4,
"dropout": 0.039,
"hrm_grad_style": True,
"halt_max_steps": 2,
"controller_type": "transformer",
"controller_kwargs": {
"n_ctrl_layers": 2,
"n_ctrl_heads": 4,
"d_ctrl": 256,
"max_depth": 32,
"token_conditioned": True,
},
"injection_mode": "none",
"vocab_size": 10,
"num_puzzles": 1,
"puzzle_emb_dim": 512,
}
Usage
import torch
from huggingface_hub import hf_hub_download
from src.pot.models.sudoku_solver import HybridPoHHRMSolver
# Download checkpoint
checkpoint_path = hf_hub_download("Eran92/pot-sudoku-78", "best_model.pt")
# Create model with exact config
model = HybridPoHHRMSolver(
d_model=512,
n_heads=8,
H_layers=2,
L_layers=2,
d_ff=2048,
H_cycles=2,
L_cycles=6,
hrm_grad_style=True,
halt_max_steps=2,
controller_type="transformer",
controller_kwargs={
"n_ctrl_layers": 2,
"n_ctrl_heads": 4,
"d_ctrl": 256,
"max_depth": 32,
"token_conditioned": True,
},
injection_mode="none",
)
# Load weights
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()
# Solve a puzzle
puzzle = torch.tensor([[5, 3, 0, 0, 7, 0, 0, 0, 0,
6, 0, 0, 1, 9, 5, 0, 0, 0,
0, 9, 8, 0, 0, 0, 0, 6, 0,
8, 0, 0, 0, 6, 0, 0, 0, 3,
4, 0, 0, 8, 0, 3, 0, 0, 1,
7, 0, 0, 0, 2, 0, 0, 0, 6,
0, 6, 0, 0, 0, 0, 2, 8, 0,
0, 0, 0, 4, 1, 9, 0, 0, 5,
0, 0, 0, 0, 8, 0, 0, 7, 9]])
solution = model.solve(puzzle)
print(solution.reshape(9, 9))
Training
Trained on Sudoku-Extreme (10k puzzles) with:
- Batch size: 768
- Learning rate: 3.7e-4
- Epochs: 2001
- On-the-fly augmentation (digit permutation, transpose, shuffling)
Links
- Downloads last month
- 4
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
🙋
Ask for provider support