---
license: cc-by-nc-4.0
tags:
- sudoku
- reasoning
- pytorch
- rhan
---

# PotatoAGI (RHAN-Sudoku)

This is the official weight repository for the **Recurrent Hybrid Attention Network (RHAN)** trained on Sudoku.

It uses a **Universal Linear Attention** mechanism combined with **Recursive Memory** and was trained using **Adversarial Erasure**.

## Stats

- **Parameters:** ~150k
- **Architecture:** 12-Loop Recurrent CNN + Linear Attention
- **Accuracy:** 99% Cell Accuracy / 90%+ Perfect Solve Rate
- **License:** CC BY-NC 4.0 (Non-Commercial Research Use Only)

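The two accuracy figures above presumably measure different things: cell accuracy over individual cells, and perfect solve rate over whole boards. The following is a minimal sketch of how such metrics could be computed, assuming predictions and ground-truth solutions are flat sequences of 81 digits. The helper name `score_boards` is hypothetical and not part of this repository, and the reported numbers may have been computed differently (for example, over empty cells only).

```python
def score_boards(predictions, solutions):
    """Return (cell_accuracy, perfect_solve_rate) for paired flat boards.

    Hypothetical helper: each board is a sequence of 81 digits.
    """
    if len(predictions) != len(solutions) or not predictions:
        raise ValueError("need equally sized, non-empty lists of boards")

    correct_cells = 0
    total_cells = 0
    perfect_boards = 0
    for pred, truth in zip(predictions, solutions):
        # Count the cells where the prediction matches the ground truth.
        matches = sum(p == t for p, t in zip(pred, truth))
        correct_cells += matches
        total_cells += len(truth)
        # A board only counts as solved when every cell matches.
        perfect_boards += int(matches == len(truth))

    return correct_cells / total_cells, perfect_boards / len(solutions)


# Two toy boards: the first is fully correct, the second has one wrong cell.
truth = [1] * 81
almost = [1] * 80 + [2]
cell_acc, solve_rate = score_boards([truth, almost], [truth, truth])
print(f"cell accuracy: {cell_acc:.4f}, perfect solve rate: {solve_rate:.2f}")
```

In this toy case a single wrong cell barely moves cell accuracy but halves the solve rate, which is why the two figures can differ noticeably.
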
## Files in this Repository

- `model.py`: Model architecture (`UniversalPotato`)
- `model.safetensors`: Trained weights
- `local_test_sudoku.py`: Dataset-based local evaluation
- `README.md`

## Usage

### 1️⃣ Install dependencies

```bash
pip install torch safetensors
```

Python ≥ 3.10 recommended.

### 2️⃣ Load the model and weights

```python
import torch
from safetensors.torch import load_file
from model import UniversalPotato, HIDDEN_DIM

device = "cuda" if torch.cuda.is_available() else "cpu"

model = UniversalPotato().to(device)
model.load_state_dict(load_file("model.safetensors"), strict=True)
model.eval()
```

### 3️⃣ Run inference on a single Sudoku puzzle

Sudoku grids are represented as a flat tensor of length 81, with 0 indicating empty cells.

```python
# Example puzzle (0 = empty)
puzzle = [
    5,3,0,0,7,0,0,0,0,
    6,0,0,1,9,5,0,0,0,
    0,9,8,0,0,0,0,6,0,
    8,0,0,0,6,0,0,0,3,
    4,0,0,8,0,3,0,0,1,
    7,0,0,0,2,0,0,0,6,
    0,6,0,0,0,0,2,8,0,
    0,0,0,4,1,9,0,0,5,
    0,0,0,0,8,0,0,7,9,
]

clues = torch.tensor(puzzle, dtype=torch.long).unsqueeze(0).to(device)
board = clues.clone()
memory = torch.zeros(1, HIDDEN_DIM, 9, 9, device=device)

with torch.no_grad():
    for _ in range(24):  # reasoning steps
        logits, memory = model(
            clues=clues,
            current_board=board,
            memory=memory,
            blindfold=False,
        )
        board = logits.argmax(dim=-1)

solution = board.view(9, 9).cpu()
print(solution)
```

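Because the model's output is a prediction rather than a guaranteed solution, it can be useful to check a result against the Sudoku rules. The following is a minimal sketch in plain Python, assuming the board is a flat, row-major list of 81 digits in the range 1 to 9. The helper names `is_valid_solution` and `respects_clues` are hypothetical and not part of this repository.

```python
def is_valid_solution(cells):
    """Check a flat, row-major list of 81 digits against the Sudoku rules."""
    if len(cells) != 81:
        return False
    digits = set(range(1, 10))
    rows = [cells[r * 9:(r + 1) * 9] for r in range(9)]
    cols = [cells[c::9] for c in range(9)]
    boxes = [
        [cells[(br + r) * 9 + (bc + c)] for r in range(3) for c in range(3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain each digit exactly once.
    return all(set(unit) == digits for unit in rows + cols + boxes)


def respects_clues(cells, clues):
    """Check that every non-zero clue is unchanged in the proposed solution."""
    return all(k == 0 or k == v for k, v in zip(clues, cells))


# A known-valid grid built from a shifting pattern, used only as demo input.
demo = [(3 * (r % 3) + r // 3 + c) % 9 + 1 for r in range(9) for c in range(9)]
print(is_valid_solution(demo))
```

With the inference snippet above, `board.view(-1).tolist()` and `puzzle` could be passed to these helpers. The check assumes predicted values are the digits 1 to 9; if the class indices produced by `model.py` are offset, shift them first.
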
### 4️⃣ Dataset-based evaluation

To evaluate the model on a real Sudoku dataset:

- Download `sudoku.csv` from Kaggle: 👉 https://www.kaggle.com/datasets/rohanrao/sudoku
- Place it in the repository root

Run:

```bash
python local_test_sudoku.py
```

This script:

- runs multi-step inference
- compares predictions against ground truth
- reports solve success rate

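For readers who want to feed the dataset into their own loop, here is a rough sketch of reading puzzle and solution pairs. It assumes the CSV has a header with `puzzle` and `solution` columns, each an 81-character digit string with `0` for empty cells; check the downloaded file before relying on this. The helpers `parse_board` and `read_puzzles` are hypothetical and do not describe how `local_test_sudoku.py` is implemented.

```python
import csv
import io


def parse_board(text):
    """Turn an 81-character digit string into a flat list of ints."""
    if len(text) != 81 or not text.isdigit():
        raise ValueError("expected 81 digits")
    return [int(ch) for ch in text]


def read_puzzles(handle, limit=None):
    """Yield (puzzle, solution) pairs from an open CSV file.

    Assumes columns named `puzzle` and `solution`; adjust if the file differs.
    """
    for i, row in enumerate(csv.DictReader(handle)):
        if limit is not None and i >= limit:
            break
        yield parse_board(row["puzzle"]), parse_board(row["solution"])


# Demo with an in-memory file; a real run would use open("sudoku.csv", newline="").
sample = "puzzle,solution\n" + "0" * 81 + "," + "1" * 81 + "\n"
pairs = list(read_puzzles(io.StringIO(sample)))
print(len(pairs), pairs[0][0][:9], pairs[0][1][:9])
```

The `limit` argument is there because the full dataset is large, so evaluating on a subset first is usually more practical.
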
## Notes

- This model does not use Hugging Face Transformers
- `model.py` is the authoritative architecture definition
- Inference requires multiple recurrent steps for best results
- Designed for reasoning research, not commercial deployment

## License

This project is released under CC BY-NC 4.0.

You may:

- use
- modify
- redistribute

for non-commercial research purposes only, with attribution.

Commercial use is not permitted.