nathanael-fijalkow committed
Commit cb44915 · 1 Parent(s): eda3ee4

First version ready with webhook and deterministic eval
.gitignore CHANGED
@@ -9,6 +9,8 @@ dist/
  build/
  *.egg
 
+ .github/
+
  # Virtual environments
  .venv/
  venv/
README.md CHANGED
@@ -1,6 +1,6 @@
  ---
  title: Chess Challenge Arena
- emoji: ♟️
+ emoji: chess_pawn
  colorFrom: gray
  colorTo: yellow
  sdk: gradio
@@ -16,9 +16,3 @@ short_description: Play Chess like a Honey Bee
  This Space hosts the evaluation arena for the LLM Chess Challenge.
 
  **Chess Challenge Template**: https://github.com/nathanael-fijalkow/ChessChallengeTemplate
-
- ## Features
-
- - **Interactive Demo**: Test any submitted model against Stockfish
- - **Leaderboard**: See rankings of all submitted models
- - **Statistics**: View detailed performance metrics
TEMPLATE_README.md CHANGED
@@ -1,15 +1,16 @@
  # Chess Challenge
 
- Train a 1M parameter LLM to play chess!
 
  ## Objective
 
  Design and train a transformer-based language model to predict chess moves. Your model must:
 
  1. **Stay under 1M parameters** - This is the hard constraint!
- 2. **Use a custom tokenizer** - Design an efficient move-level tokenizer
- 3. **Play legal chess** - The model should learn to generate valid moves
- 4. **Beat Stockfish** - Your ELO will be measured against Stockfish Level 1
 
  ## Dataset
 
@@ -17,7 +18,7 @@ We use the Lichess dataset: [`dlouapre/lichess_2025-01_1M`](https://huggingface.
 
  The dataset uses an extended UCI notation:
  - `W`/`B` prefix for White/Black
- - Piece letter: `P`=Pawn, `N`=Knight, `B`=Bishop, `R`=Rook, `Q`=Queen, `K`=King
  - Source and destination squares (e.g., `e2e4`)
  - Special suffixes: `(x)`=capture, `(+)`=check, `(+*)`=checkmate, `(o)`/`(O)`=castling
 
@@ -26,127 +27,468 @@ Example game:
  WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x) ...
  ```
 
- ## Quick Start
 
- ### Train a Model
 
- ```bash
- # Basic training
- python -m src.train \
-     --output_dir ./my_model \
-     --num_train_epochs 3 \
-     --per_device_train_batch_size 32
  ```
 
- ### Evaluate Your Model
 
- Evaluation happens in two phases:
 
- ```bash
- # Phase 1: Legal Move Evaluation (quick sanity check)
- python -m src.evaluate \
-     --model_path ./my_model \
-     --mode legal \
-     --n_positions 500
-
- # Phase 2: Win Rate Evaluation (full games against Stockfish)
- python -m src.evaluate \
-     --model_path ./my_model \
-     --mode winrate \
-     --n_games 100 \
-     --stockfish_level 1
-
- # Or run both phases:
- python -m src.evaluate \
-     --model_path ./my_model \
-     --mode both
  ```
 
- ## Parameter Budget
 
- Use the utility function to check your budget:
 
  ```python
- from src import ChessConfig, print_parameter_budget
 
- config = ChessConfig(
-     vocab_size=1200,
      n_embd=128,
      n_layer=4,
      n_head=4,
  )
- print_parameter_budget(config)
  ```
 
- ### Pro Tips
 
- 1. **Weight Tying**: The default config ties the embedding and output layer weights, saving ~154k parameters
- 2. **Vocabulary Size**: Keep it small! ~1200 tokens covers all moves
- 3. **Depth vs Width**: With limited parameters, experiment with shallow-but-wide vs deep-but-narrow
 
- ## Customization
 
- ### Custom Tokenizer
 
- The template provides a move-level tokenizer that builds vocabulary from the actual dataset.
- Feel free to try different approaches!
 
- ### Custom Architecture
 
- Modify the model in `src/model.py`:
 
  ```python
- from src import ChessConfig, ChessForCausalLM
-
- # Customize configuration
- config = ChessConfig(
-     vocab_size=1200,
-     n_embd=128,    # Try 96, 128, or 192
-     n_layer=4,     # Try 3, 4, or 6
-     n_head=4,      # Try 4 or 8
-     n_inner=384,   # Feed-forward dimension (default: 3*n_embd)
-     dropout=0.1,
-     tie_weights=True,
- )
 
- model = ChessForCausalLM(config)
  ```
 
- ## Evaluation Metrics
 
- ### Phase 1: Legal Move Evaluation
 
- Tests if your model generates valid chess moves:
 
- | Metric | Description |
- |--------|-------------|
- | **Legal Rate (1st try)** | % of legal moves on first attempt |
- | **Legal Rate (with retry)** | % of legal moves within 3 attempts |
 
- > **Target**: >90% legal rate before proceeding to Phase 2
 
- ### Phase 2: Win Rate Evaluation
 
- Full games against Stockfish to measure playing strength:
 
  | Metric | Description |
  |--------|-------------|
- | **Win Rate** | % of games won against Stockfish |
- | **ELO Rating** | Estimated rating based on game results |
- | **Avg Game Length** | Average number of moves per game |
- | **Illegal Move Rate** | % of illegal moves during games |
 
- ## Submission
 
- 1. Train your model
- 2. Log in to Hugging Face: `hf auth login`
- 3. Submit your model using the submission script:
 
- ```bash
- python submit.py --model_path ./my_model/final_model --model_name your-model-name
- ```
 
- The script will:
- - Upload your model to the LLM-course organization
- - Include your HF username in the model card for tracking
  # Chess Challenge
 
+ Train a transformer with less than 1M parameters to play legal chess moves!
 
  ## Objective
 
  Design and train a transformer-based language model to predict chess moves. Your model must:
 
  1. **Stay under 1M parameters** - This is the hard constraint!
+ 2. **Create a custom tokenizer** - Design your own move-level tokenizer
+ 3. **Create a custom model architecture** - Build your own transformer
+ 4. **Play legal chess** - The model should learn to generate valid moves
+ 5. **Do NOT use python-chess to filter moves** - The model must generate legal moves on its own
 
  ## Dataset
 
  The dataset uses an extended UCI notation:
  - `W`/`B` prefix for White/Black
+ - Piece letter: `P`=Pawn, `N`=Knight, `B`=Bishop, `R`=Rook, `Q`=Queen, `K`=King
  - Source and destination squares (e.g., `e2e4`)
  - Special suffixes: `(x)`=capture, `(+)`=check, `(+*)`=checkmate, `(o)`/`(O)`=castling
 
  WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x) ...
  ```
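Each move token in this notation has a fixed layout, so it can be unpacked with plain string slicing. A minimal sketch (the `parse_move` helper and its field names are illustrative, not part of the challenge code):

```python
def parse_move(token: str) -> dict:
    """Unpack an extended-UCI move token like 'WBb5c6(x)' into its fields."""
    color = token[0]            # 'W' or 'B'
    piece = token[1]            # 'P', 'N', 'B', 'R', 'Q', 'K'
    source = token[2:4]         # e.g. 'b5'
    dest = token[4:6]           # e.g. 'c6'
    suffix = token[6:] or None  # '(x)', '(+)', '(+*)', '(o)', '(O)', or None

    return {"color": color, "piece": piece, "source": source,
            "dest": dest, "suffix": suffix}

print(parse_move("WBb5c6(x)"))
# {'color': 'W', 'piece': 'B', 'source': 'b5', 'dest': 'c6', 'suffix': '(x)'}
```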
 
+ ---
+
+ ## Building Your Solution
+
+ You need to create **from scratch**:
+
+ 1. A custom tokenizer class
+ 2. A custom model architecture
+ 3. A training script
+ 4. A script that saves everything in the correct format
+
+ A complete working example is available in `example_solution/` - use it as a reference, but build your own!
+
+ ---
+
+ ## Step 1: Create a Custom Tokenizer
+
+ Your tokenizer must inherit from `PreTrainedTokenizer` and implement the required methods.
+
+ ### Required Files
+
+ Create a file called `tokenizer.py` with your tokenizer class:
+
+ ```python
+ import json
+ import os
+ from typing import Dict, List, Optional
+
+ from transformers import PreTrainedTokenizer
+
+
+ class MyChessTokenizer(PreTrainedTokenizer):
+     """Custom tokenizer for chess moves."""
+
+     # Tell HuggingFace which files to save/load
+     vocab_files_names = {"vocab_file": "vocab.json"}
+
+     def __init__(
+         self,
+         vocab_file: Optional[str] = None,
+         **kwargs,
+     ):
+         # Define special tokens
+         self.pad_token = "[PAD]"
+         self.bos_token = "[BOS]"
+         self.eos_token = "[EOS]"
+         self.unk_token = "[UNK]"
+
+         # Load or create the vocabulary
+         if vocab_file is not None:
+             with open(vocab_file, "r") as f:
+                 self._vocab = json.load(f)
+         else:
+             # Create a default vocab with the special tokens only
+             self._vocab = {
+                 "[PAD]": 0,
+                 "[BOS]": 1,
+                 "[EOS]": 2,
+                 "[UNK]": 3,
+             }
+
+         self._ids_to_tokens = {v: k for k, v in self._vocab.items()}
+
+         # Call the parent init AFTER setting up the vocab
+         super().__init__(
+             pad_token=self.pad_token,
+             bos_token=self.bos_token,
+             eos_token=self.eos_token,
+             unk_token=self.unk_token,
+             **kwargs,
+         )
+
+     @property
+     def vocab_size(self) -> int:
+         return len(self._vocab)
+
+     def get_vocab(self) -> Dict[str, int]:
+         return self._vocab.copy()
+
+     def _tokenize(self, text: str) -> List[str]:
+         """Split text into tokens (moves are space-separated)."""
+         return text.strip().split()
+
+     def _convert_token_to_id(self, token: str) -> int:
+         return self._vocab.get(token, self._vocab.get(self.unk_token, 0))
+
+     def _convert_id_to_token(self, index: int) -> str:
+         return self._ids_to_tokens.get(index, self.unk_token)
+
+     def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None):
+         """Save the vocabulary to a JSON file."""
+         vocab_file = os.path.join(
+             save_directory,
+             (filename_prefix + "-" if filename_prefix else "") + "vocab.json",
+         )
+         with open(vocab_file, "w") as f:
+             json.dump(self._vocab, f, indent=2)
+         return (vocab_file,)
+ ```
 
+ ### Building the Vocabulary
 
+ You need to build a vocabulary. It can be written by hand or inferred from the dataset:
+
+ ```python
+ import json
+
+ from datasets import load_dataset
+
+ # Load the dataset
+ dataset = load_dataset("dlouapre/lichess_2025-01_1M", split="train")
+
+ # Collect all unique moves
+ vocab = {"[PAD]": 0, "[BOS]": 1, "[EOS]": 2, "[UNK]": 3}
+ for game in dataset:
+     moves = game["text"].split()
+     for move in moves:
+         if move not in vocab:
+             vocab[move] = len(vocab)
+
+ print(f"Vocabulary size: {len(vocab)}")
+
+ # Save the vocabulary
+ with open("vocab.json", "w") as f:
+     json.dump(vocab, f, indent=2)
+ ```
 
+ ---
 
+ ## Step 2: Create a Custom Model
+
+ Your model must inherit from `PreTrainedModel` and use a config that inherits from `PretrainedConfig`.
+
+ ### Required Files
+
+ Create a file called `model.py` with your model class:
 
+ ```python
+ import torch
+ import torch.nn as nn
+ from transformers import PretrainedConfig, PreTrainedModel
+ from transformers.modeling_outputs import CausalLMOutputWithPast
+
+
+ class MyChessConfig(PretrainedConfig):
+     """Configuration for the chess model."""
+
+     model_type = "my_chess_model"
+
+     def __init__(
+         self,
+         vocab_size: int = 1500,
+         n_embd: int = 128,
+         n_layer: int = 4,
+         n_head: int = 4,
+         n_ctx: int = 256,
+         dropout: float = 0.1,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+         self.vocab_size = vocab_size
+         self.n_embd = n_embd
+         self.n_layer = n_layer
+         self.n_head = n_head
+         self.n_ctx = n_ctx
+         self.dropout = dropout
+
+
+ class MyChessModel(PreTrainedModel):
+     """A simple transformer for chess move prediction."""
+
+     config_class = MyChessConfig
+
+     def __init__(self, config: MyChessConfig):
+         super().__init__(config)
+
+         # Token and position embeddings
+         self.token_emb = nn.Embedding(config.vocab_size, config.n_embd)
+         self.pos_emb = nn.Embedding(config.n_ctx, config.n_embd)
+         self.dropout = nn.Dropout(config.dropout)
+
+         # Transformer layers
+         encoder_layer = nn.TransformerEncoderLayer(
+             d_model=config.n_embd,
+             nhead=config.n_head,
+             dim_feedforward=config.n_embd * 4,
+             dropout=config.dropout,
+             batch_first=True,
+         )
+         self.transformer = nn.TransformerEncoder(encoder_layer, config.n_layer)
+
+         # Output head
+         self.ln_f = nn.LayerNorm(config.n_embd)
+         self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
+
+         # Weight tying (saves parameters!)
+         self.lm_head.weight = self.token_emb.weight
+
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids,
+         attention_mask=None,
+         labels=None,
+         **kwargs,
+     ):
+         batch_size, seq_len = input_ids.shape
+         device = input_ids.device
+
+         # Embeddings
+         positions = torch.arange(seq_len, device=device).unsqueeze(0)
+         x = self.token_emb(input_ids) + self.pos_emb(positions)
+         x = self.dropout(x)
+
+         # Causal mask for autoregressive generation
+         causal_mask = torch.triu(
+             torch.ones(seq_len, seq_len, device=device) * float("-inf"),
+             diagonal=1,
+         )
+
+         # Transformer
+         x = self.transformer(x, mask=causal_mask)
+         x = self.ln_f(x)
+         logits = self.lm_head(x)
+
+         # Compute the loss if labels are provided
+         loss = None
+         if labels is not None:
+             shift_logits = logits[..., :-1, :].contiguous()
+             shift_labels = labels[..., 1:].contiguous()
+             loss = nn.functional.cross_entropy(
+                 shift_logits.view(-1, self.config.vocab_size),
+                 shift_labels.view(-1),
+                 ignore_index=-100,
+             )
+
+         return CausalLMOutputWithPast(loss=loss, logits=logits)
+
+     def prepare_inputs_for_generation(self, input_ids, **kwargs):
+         return {"input_ids": input_ids}
+ ```
+
+ ### Parameter Budget Tips
+
+ With 1M parameters, you need to be careful:
+
+ | Component | Formula | Example (128 dim, 1500 vocab) |
+ |-----------|---------|-------------------------------|
+ | Token embeddings | vocab_size x n_embd | 1500 x 128 = 192,000 |
+ | Position embeddings | n_ctx x n_embd | 256 x 128 = 32,768 |
+ | Transformer layer | ~12 x n_embd^2 (attention + 4x feed-forward) | ~196,608 per layer |
+ | LM head | 0 (with weight tying) | 0 |
+
+ **Key savings:**
+ - **Weight tying**: Share the token embeddings with the output layer (saves vocab_size x n_embd parameters)
+ - **Smaller vocabulary**: Only include moves that appear in the training data
+ - **Fewer layers**: 4-6 layers is often enough
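As a sanity check, the budget can be reproduced with a few lines of arithmetic. This is a rough estimate for an architecture like the one sketched above (it ignores biases and LayerNorm parameters):

```python
def parameter_budget(vocab_size=1500, n_embd=128, n_layer=4, n_ctx=256, tie_weights=True):
    """Rough parameter count for a tied-embedding transformer (ignores biases/LayerNorm)."""
    token_emb = vocab_size * n_embd
    pos_emb = n_ctx * n_embd
    # Per layer: 4*d^2 for attention (Q, K, V, output) + 8*d^2 for a 4x feed-forward
    per_layer = 12 * n_embd ** 2
    lm_head = 0 if tie_weights else vocab_size * n_embd
    return token_emb + pos_emb + n_layer * per_layer + lm_head

print(f"{parameter_budget():,}")  # 1,011,200
```

With the defaults (1500 vocab, 128 dim, 4 layers, 256 context) the rough total is 1,011,200 parameters, slightly over budget even with weight tying, which is why trimming the vocabulary or dropping a layer matters.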
+
+ ---
+
+ ## Step 3: Train Your Model
+
+ Create a training script:
+
+ ```python
+ from datasets import load_dataset
+ from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments
+
+ from model import MyChessConfig, MyChessModel
+ from tokenizer import MyChessTokenizer
+
+ # Load the tokenizer with your vocabulary
+ tokenizer = MyChessTokenizer(vocab_file="vocab.json")
+
+ # Create the model
+ config = MyChessConfig(
+     vocab_size=tokenizer.vocab_size,
+     n_embd=128,
+     n_layer=4,
+     n_head=4,
+ )
+ model = MyChessModel(config)
+
+ # Check the parameter count
+ n_params = sum(p.numel() for p in model.parameters())
+ print(f"Parameters: {n_params:,}")
+ assert n_params < 1_000_000, f"Model too large: {n_params:,} > 1M"
+
+ # Load and tokenize the dataset
+ dataset = load_dataset("dlouapre/lichess_2025-01_1M", split="train")
+
+ def tokenize_function(examples):
+     return tokenizer(
+         examples["text"],
+         truncation=True,
+         max_length=256,
+         padding="max_length",
+     )
+
+ tokenized_dataset = dataset.map(tokenize_function, batched=True)
+
+ # The collator copies input_ids into labels (with padding masked out),
+ # which the Trainer needs to compute the causal LM loss
+ data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
+
+ # Training
+ training_args = TrainingArguments(
+     output_dir="./my_model",
+     num_train_epochs=3,
+     per_device_train_batch_size=32,
+     learning_rate=5e-4,
+     save_steps=1000,
+     logging_steps=100,
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=tokenized_dataset,
+     data_collator=data_collator,
+ )
+
+ trainer.train()
+
+ # Save the final model
+ model.save_pretrained("./my_model/final")
+ tokenizer.save_pretrained("./my_model/final")
+ ```
 
+ ---
 
+ ## Step 4: Prepare for Submission
 
+ Your model directory must contain these files:
 
+ ```
+ my_model/
+     config.json              # Model configuration
+     model.safetensors        # Model weights
+     tokenizer_config.json    # Tokenizer configuration
+     vocab.json               # Vocabulary
+     model.py                 # Your model class
+     tokenizer.py             # Your tokenizer class
+ ```
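Before running the submission script, it can be worth verifying that the directory matches this layout. A small sketch (the `check_submission_dir` helper is illustrative, not part of the challenge tooling):

```python
import os

REQUIRED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "vocab.json",
    "model.py",
    "tokenizer.py",
]

def check_submission_dir(path: str) -> list:
    """Return the list of required files missing from `path`."""
    return [f for f in REQUIRED_FILES if not os.path.isfile(os.path.join(path, f))]

missing = check_submission_dir("./my_model/final")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files present - ready to submit!")
```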
 
+ ### Adding auto_map for Remote Loading
 
+ The `auto_map` field tells HuggingFace how to load your custom classes with `trust_remote_code=True`.
+
+ **In config.json**, add:
+
+ ```json
+ {
+     "auto_map": {
+         "AutoConfig": "model.MyChessConfig",
+         "AutoModelForCausalLM": "model.MyChessModel"
+     },
+     ...
+ }
+ ```
+
+ **In tokenizer_config.json**, add:
+
+ ```json
+ {
+     "auto_map": {
+         "AutoTokenizer": "tokenizer.MyChessTokenizer"
+     },
+     ...
+ }
+ ```
 
+ You can do this programmatically:
 
+ ```python
+ import shutil
+
+ # Register the custom classes for auto loading
+ model.config.auto_map = {
+     "AutoConfig": "model.MyChessConfig",
+     "AutoModelForCausalLM": "model.MyChessModel",
+ }
+ tokenizer.register_for_auto_class("AutoTokenizer")
+
+ # Save
+ model.save_pretrained("./my_model/final")
+ tokenizer.save_pretrained("./my_model/final")
+
+ # Copy your Python files alongside the weights
+ shutil.copy("model.py", "./my_model/final/model.py")
+ shutil.copy("tokenizer.py", "./my_model/final/tokenizer.py")
+ ```
+
+ ---
+
+ ## Local Evaluation (Optional but Recommended)
 
+ Before submitting, you can evaluate your model locally to check its performance. Since the evaluation is **fully deterministic** (fixed seed, deterministic opponent engine), you will get exactly the same results locally as on the HuggingFace Space after submission.
+
+ ```bash
+ python -m src --model ./my_model/final
+ ```
 
+ This runs the same evaluation procedure as the online leaderboard:
+ - 500 moves against the deterministic opponent
+ - The same random seed (42)
+ - The same move generation parameters
 
+ Use this to iterate quickly on your model before pushing to HuggingFace!
 
+ ---
 
+ ## Step 5: Submit
 
+ ```bash
+ python submit.py --model_path ./my_model/final --model_name my-chess-model
+ ```
+
+ The script will:
+ 1. Validate that all required files are present
+ 2. Check that `auto_map` is configured
+ 3. Count parameters and warn if over 1M
+ 4. Log you into HuggingFace (if needed)
+ 5. Upload the model to the LLM-course organization
+
+ ---
+
+ ## Evaluation
+
+ After submission, go to the [Chess Challenge Arena](https://huggingface.co/spaces/LLM-course/Chess1MChallenge) to run evaluation.
+
+ ### Evaluation Procedure
 
+ 1. **Parameter Check**: The model must have < 1M parameters
+ 2. **Security Check**: The code is scanned for illegal python-chess usage
+ 3. **Game Play**: 500 moves against a deterministic opponent engine
+ 4. **Move Generation**: 3 retries allowed per move (greedy on the 1st try, then sampling)
+ 5. **Scoring**: Legal move rate (first try and with retries)
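The greedy-then-sampling retry scheme in step 4 can be sketched as follows. This is a simplified illustration with a stand-in `propose_move` function in place of a real model and legality check; it is not the arena's actual code:

```python
import random

def propose_move(greedy: bool, rng: random.Random) -> str:
    """Stand-in for the model: greedy decoding or temperature sampling."""
    if greedy:
        return "WPe2e5"                       # greedy pick - illegal in this toy position
    return rng.choice(["WPe2e4", "WNg1f3"])   # sampled picks - legal alternatives

def generate_legal_move(is_legal, max_attempts: int = 3, seed: int = 42):
    """Try greedy decoding first, then fall back to sampling on retries."""
    rng = random.Random(seed)
    for attempt in range(max_attempts):
        move = propose_move(greedy=(attempt == 0), rng=rng)
        if is_legal(move):
            return move, attempt + 1  # the move and the number of attempts used
    return None, max_attempts         # no legal move found: counted as illegal

move, attempts = generate_legal_move(lambda m: m != "WPe2e5")
print(move, attempts)
```

A move that succeeds on the first (greedy) attempt counts toward the "1st try" rate; one that needs sampling retries only counts toward the "with retries" rate.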
 
+ ### Scoring
 
  | Metric | Description |
  |--------|-------------|
+ | **Legal Rate (1st try)** | % of moves legal on first attempt |
+ | **Legal Rate (with retries)** | % of moves legal within 3 attempts |
 
+ **Target**: >90% legal rate = excellent performance
 
+ ---
 
+ ## Example Solution
 
+ A complete working example is in `example_solution/`:
 
+ - `model.py` - Full transformer implementation
+ - `tokenizer.py` - Complete tokenizer class
+ - `train.py` - Training script with data loading
+ - `data.py` - Dataset utilities
+
+ Use it as a reference to understand the expected format and structure.
+
+ ---
+
+ ## Rules
+
+ 1. **< 1M parameters** - Hard limit, checked automatically
+ 2. **No python-chess for move filtering** - The model must generate legal moves on its own
+ 3. **Custom architecture required** - Must include model.py and tokenizer.py
+ 4. **Use the submission script** - Required for leaderboard tracking
+
+ Good luck!
app.py CHANGED
@@ -1,21 +1,26 @@
  """
- Play Chess like a Honey Bee
 
  This Gradio app provides:
- 1. Interactive demo to test models
- 2. Leaderboard of submitted models
- 3. Live game visualization
 
- Instructions:
  The goal is to train a language model to play chess, under a strict constraint:
  less than 1M parameters! This is approximately the number of neurons of a honey bee.
 
  Leaderboard data is stored in a private HuggingFace dataset for persistence.
  """
 
  import io
  import os
  import sys
  from datetime import datetime
  from pathlib import Path
  from typing import Optional
@@ -28,38 +33,138 @@ ORGANIZATION = os.environ.get("HF_ORGANIZATION", "LLM-course")
  LEADERBOARD_DATASET = os.environ.get("LEADERBOARD_DATASET", f"{ORGANIZATION}/chess-challenge-leaderboard")
  LEADERBOARD_FILENAME = "leaderboard.csv"
  HF_TOKEN = os.environ.get("HF_TOKEN")  # Required for private dataset access
-
- # Evaluation settings
- EVAL_SEED = 42
- EVAL_N_POSITIONS = 500
-
- STOCKFISH_LEVELS = {
-     "Beginner (Level 0)": 0,
-     "Easy (Level 1)": 1,
-     "Medium (Level 3)": 3,
-     "Hard (Level 5)": 5,
- }
 
  # CSV columns for the leaderboard
  LEADERBOARD_COLUMNS = [
      "model_id",
      "user_id",
-     "legal_rate",
      "legal_rate_first_try",
-     # "elo",
-     # "win_rate",
-     # "draw_rate",
-     # "games_played",
      "last_updated",
  ]
 
 
  def load_leaderboard() -> list:
      """Load leaderboard from private HuggingFace dataset."""
      try:
          from huggingface_hub import hf_hub_download
 
-         # Download the CSV file from the dataset
          csv_path = hf_hub_download(
              repo_id=LEADERBOARD_DATASET,
              filename=LEADERBOARD_FILENAME,
@@ -72,7 +177,6 @@ def load_leaderboard() -> list:
 
      except Exception as e:
          print(f"Could not load leaderboard from dataset: {e}")
-         # Return empty list if dataset doesn't exist yet
          return []
 
 
@@ -81,7 +185,6 @@ def save_leaderboard(data: list):
      try:
          from huggingface_hub import HfApi
 
-         # Convert to DataFrame
          df = pd.DataFrame(data, columns=LEADERBOARD_COLUMNS)
 
          # Fill missing columns with defaults
@@ -89,7 +192,6 @@ def save_leaderboard(data: list):
              if col not in df.columns:
                  df[col] = None
 
-         # Reorder columns
          df = df[LEADERBOARD_COLUMNS]
 
          # Convert to CSV bytes
@@ -118,25 +220,21 @@ def get_available_models() -> list:
      try:
          from huggingface_hub import list_models
 
-         # Get all chess models sorted by newest first
          models = list(list_models(author=ORGANIZATION, sort="lastModified", direction=-1))
          chess_models = [m for m in models if "chess" in m.id.lower()]
 
-         # Keep only the latest model per user (based on model name pattern: chess-<username>-*)
          seen_users = set()
          filtered_models = []
          for m in chess_models:
-             # Extract username from model id (format: LLM-course/chess-<username>-<modelname>)
-             model_name = m.id.split("/")[-1]  # e.g., "chess-johndoe-mymodel"
              parts = model_name.split("-")
              if len(parts) >= 2:
-                 # Username is after "chess-"
                  username = parts[1] if parts[0] == "chess" else None
                  if username and username not in seen_users:
                      seen_users.add(username)
                      filtered_models.append(m.id)
              else:
-                 # If pattern doesn't match, include the model anyway
                  filtered_models.append(m.id)
 
          return filtered_models if filtered_models else ["No models available"]
@@ -145,21 +243,55 @@
          return ["No models available"]
 
 
  def format_leaderboard_html(data: list) -> str:
      """Format leaderboard data as HTML table."""
      if not data:
          return "<p>No models evaluated yet. Be the first to submit!</p>"
 
-     # Keep only the best entry per user
      best_per_user = {}
      for entry in data:
          user_id = entry.get("user_id", "unknown")
-         legal_rate = entry.get("legal_rate", 0)
-         if user_id not in best_per_user or legal_rate > best_per_user[user_id].get("legal_rate", 0):
              best_per_user[user_id] = entry
 
-     # Sort by legal_rate
-     sorted_data = sorted(best_per_user.values(), key=lambda x: x.get("legal_rate", 0), reverse=True)
 
      html = """
      <style>
@@ -193,11 +325,10 @@ def format_leaderboard_html(data: list) -> str:
                  <th>Rank</th>
                  <th>User</th>
                  <th>Model</th>
-                 <th>Legal Rate</th>
                  <th>Legal Rate (1st try)</th>
-                 <!-- <th>ELO</th> -->
-                 <!-- <th>Win Rate</th> -->
-                 <!-- <th>Games</th> -->
                  <th>Last Updated</th>
              </tr>
          </thead>
@@ -211,7 +342,7 @@ def format_leaderboard_html(data: list) -> str:
          model_url = f"https://huggingface.co/{entry['model_id']}"
 
          # Color code legal rate
-         legal_rate = entry.get('legal_rate', 0)
          if legal_rate >= 0.9:
              legal_class = "legal-good"
          elif legal_rate >= 0.7:
@@ -219,19 +350,21 @@ def format_leaderboard_html(data: list) -> str:
          else:
              legal_class = "legal-bad"
 
-         legal_rate_first_try = entry.get('legal_rate_first_try', 0)
          user_id = entry.get('user_id', 'unknown')
          user_url = f"https://huggingface.co/{user_id}"
 
          html += f"""
          <tr>
              <td class="{rank_class}">{rank_display}</td>
              <td><a href="{user_url}" target="_blank" class="model-link">{user_id}</a></td>
              <td><a href="{model_url}" target="_blank" class="model-link">{entry['model_id'].split('/')[-1]}</a></td>
              <td class="{legal_class}">{legal_rate*100:.1f}%</td>
-             <td>{legal_rate_first_try*100:.1f}%</td>
-             <!-- <td><strong>{entry.get('elo', 'N/A')}</strong></td> -->
-             <!-- <td>{entry.get('win_rate', 0)*100:.1f}%</td> -->
-             <!-- <td>{entry.get('games_played', 0)}</td> -->
              <td>{entry.get('last_updated', 'N/A')}</td>
          </tr>
          """
@@ -240,377 +373,160 @@ def format_leaderboard_html(data: list) -> str:
240
  return html
241
 
242
 
243
- def render_board_svg(fen: str = "startpos") -> str:
244
- """Render a chess board as SVG."""
245
- try:
246
- import chess
247
- import chess.svg
248
-
249
- if fen == "startpos":
250
- board = chess.Board()
251
- else:
252
- board = chess.Board(fen)
253
-
254
- return chess.svg.board(board, size=400)
255
- except ImportError:
256
- return "<p>Install python-chess to see the board</p>"
257
-
258
 
259
- def play_move(
260
  model_id: str,
261
- current_fen: str,
262
- move_history: str,
263
- temperature: float,
264
- ) -> tuple:
265
- """Play a move with the selected model."""
 
 
 
 
 
 
 
266
  try:
267
- import chess
268
- import torch
269
- import sys
270
  sys.path.insert(0, str(Path(__file__).parent))
271
 
272
- from src.evaluate import load_model_from_hub
273
-
274
- # Load model using the same method as evaluation
275
- model, tokenizer = load_model_from_hub(model_id)
276
- model.eval()
277
-
278
- # Setup board
279
- board = chess.Board(current_fen) if current_fen != "startpos" else chess.Board()
280
-
281
- # Tokenize history
282
- if move_history:
283
- inputs = tokenizer(move_history, return_tensors="pt")
284
- else:
285
- inputs = tokenizer(tokenizer.bos_token, return_tensors="pt")
286
 
287
- # Generate move
288
- with torch.no_grad():
289
- outputs = model(**inputs)
290
- logits = outputs.logits[:, -1, :] / temperature
291
- probs = torch.softmax(logits, dim=-1)
292
- next_token = torch.multinomial(probs, num_samples=1)
293
 
294
- move_token = tokenizer.decode(next_token[0])
 
295
 
296
- # Parse move
297
- if len(move_token) >= 6:
298
- uci_move = move_token[2:4] + move_token[4:6]
299
- try:
300
- move = chess.Move.from_uci(uci_move)
301
- if move in board.legal_moves:
302
- board.push(move)
303
- new_history = f"{move_history} {move_token}".strip()
304
- return (
305
- render_board_svg(board.fen()),
306
- board.fen(),
307
- new_history,
308
- f"Model played: {move_token} ({uci_move})",
309
- )
310
- except:
311
- pass
312
-
313
- return (
314
- render_board_svg(current_fen if current_fen != "startpos" else None),
315
- current_fen,
316
- move_history,
317
- f"Model generated illegal move: {move_token}",
318
- )
319
 
320
- except Exception as e:
321
- return (
322
- render_board_svg(),
323
- "startpos",
324
- "",
325
- f"Error: {str(e)}",
326
  )
327
-
328
-
329
- def get_model_submitter(model_id: str) -> Optional[str]:
330
- """Extract the submitter's username from the model's README on HuggingFace.
331
-
332
- Returns None if the submitter cannot be determined.
333
- """
334
- try:
335
- from huggingface_hub import hf_hub_download
336
- import re
337
 
338
- # Download the README.md from the model repo
339
- readme_path = hf_hub_download(
340
- repo_id=model_id,
341
- filename="README.md",
342
- token=HF_TOKEN,
343
- )
344
 
345
- with open(readme_path, "r") as f:
346
- readme_content = f.read()
347
 
348
- # Look for the pattern: **Submitted by**: [username](https://huggingface.co/username)
349
- match = re.search(r'\*\*Submitted by\*\*:\s*\[([^\]]+)\]', readme_content)
350
- if match:
351
- return match.group(1)
352
 
353
- # Fallback: try to get from model info
354
- from huggingface_hub import model_info
355
- info = model_info(model_id, token=HF_TOKEN)
356
- if info.author:
357
- return info.author
358
-
359
- except Exception as e:
360
- print(f"Could not extract submitter from model: {e}")
361
-
362
- return None
363
 
 
 
364
 
365
-def evaluate_legal_moves(
-    model_id: str,
-    progress: gr.Progress = gr.Progress(),
-) -> str:
-    """Evaluate a model's legal move generation."""
-    try:
-        import sys
-        import io
-        from contextlib import redirect_stdout
-
-        sys.path.insert(0, str(Path(__file__).parent))
-
-        from src.evaluate import ChessEvaluator, load_model_from_hub
-
-        progress(0, desc="Loading model...")
-
-        # Capture tokenizer debug info
-        debug_output = io.StringIO()
-        with redirect_stdout(debug_output):
-            model, tokenizer = load_model_from_hub(model_id, verbose=True)
-        tokenizer_info = debug_output.getvalue()
 
-        progress(0.1, desc="Setting up evaluator...")
-        evaluator = ChessEvaluator(
-            model=model,
-            tokenizer=tokenizer,
-            stockfish_level=1,  # Not used for legal move eval
-        )
 
-        progress(0.2, desc=f"Testing {EVAL_N_POSITIONS} positions...")
-        results = evaluator.evaluate_legal_moves(
-            n_positions=EVAL_N_POSITIONS,
-            verbose=False,
-            seed=EVAL_SEED,
-        )
 
-        # Extract user_id from model's README (submitted by field)
         user_id = get_model_submitter(model_id)
         if user_id is None:
-            return f"""## Evaluation Failed
 
 Could not determine the submitter for model `{model_id}`.
 
 Please ensure your model was submitted using the official submission script (`submit.py`),
 which adds the required metadata to the README.md file.
 """
 
-        # Update leaderboard - only one entry per user, keep the best
         leaderboard = load_leaderboard()
 
-        # Find existing entry for this user (not model - one entry per user)
         user_entry = next((e for e in leaderboard if e.get("user_id") == user_id), None)
 
-        new_legal_rate = results.get("legal_rate_with_retry", 0)
-        new_legal_rate_first_try = results.get("legal_rate_first_try", 0)
 
         if user_entry is None:
-            # New user - add to leaderboard
-            entry = {
-                "model_id": model_id,
-                "user_id": user_id,
-                "legal_rate": new_legal_rate,
-                "legal_rate_first_try": new_legal_rate_first_try,
-                "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M"),
-            }
-            leaderboard.append(entry)
             save_leaderboard(leaderboard)
             update_message = "New entry added to leaderboard!"
         else:
-            # Existing user - only update if this submission is better
-            old_legal_rate = user_entry.get("legal_rate", 0)
-            old_model = user_entry.get("model_id", "unknown")
-            if new_legal_rate > old_legal_rate:
-                user_entry.update({
-                    "model_id": model_id,  # Update to new model if better
-                    "legal_rate": new_legal_rate,
-                    "legal_rate_first_try": new_legal_rate_first_try,
-                    "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M"),
-                })
                 save_leaderboard(leaderboard)
-                if old_model != model_id:
-                    update_message = f"🎉 Improved! New best model for {user_id}: {old_legal_rate*100:.1f}% → {new_legal_rate*100:.1f}%"
-                else:
-                    update_message = f"🎉 Improved! Previous: {old_legal_rate*100:.1f}% → New: {new_legal_rate*100:.1f}%"
             else:
-                update_message = f"ℹ️ No improvement. Your best: {old_legal_rate*100:.1f}% (model: {old_model.split('/')[-1]}), This run: {new_legal_rate*100:.1f}%"
 
-        progress(1.0, desc="Done!")
 
-        # Format tokenizer info for display
-        tokenizer_debug = tokenizer_info.strip().replace("  ", "- ")
 
-        return f"""
-## Legal Move Evaluation for {model_id.split('/')[-1]}
 
-| Metric | Value |
-|--------|-------|
-| **Positions Tested** | {results['total_positions']} |
-| **Legal (1st try)** | {results['legal_first_try']} ({results['legal_rate_first_try']*100:.1f}%) |
-| **Legal (with retries)** | {results['legal_first_try'] + results['legal_with_retry']} ({results['legal_rate_with_retry']*100:.1f}%) |
-| **Always Illegal** | {results['illegal_all_retries']} ({results['illegal_rate']*100:.1f}%) |
 
-### Tokenizer Info
-```
-{tokenizer_debug}
-```
 
 ### Leaderboard Update
 {update_message}
 
-### Interpretation
-- **>90% legal rate**: Great! Model has learned chess rules well.
-- **70-90% legal rate**: Decent, but room for improvement.
-- **<70% legal rate**: Model struggles with legal move generation.
 """
 
     except Exception as e:
-        return f"Evaluation failed: {str(e)}"
-
-
-# def evaluate_winrate(
-#     model_id: str,
-#     stockfish_level: str,
-#     n_games: int,
-#     progress: gr.Progress = gr.Progress(),
-# ) -> str:
-#     """Evaluate a model's win rate against Stockfish."""
-#     try:
-#         import sys
-#         sys.path.insert(0, str(Path(__file__).parent))
-#
-#         from src.evaluate import ChessEvaluator, load_model_from_hub
-#
-#         progress(0, desc="Loading model...")
-#         model, tokenizer = load_model_from_hub(model_id)
-#
-#         progress(0.1, desc="Setting up Stockfish...")
-#         level = STOCKFISH_LEVELS.get(stockfish_level, 1)
-#         evaluator = ChessEvaluator(
-#             model=model,
-#             tokenizer=tokenizer,
-#             stockfish_level=level,
-#         )
-#
-#         progress(0.2, desc=f"Playing {n_games} games...")
-#         results = evaluator.evaluate(n_games=n_games, verbose=False)
-#
-#         # Update leaderboard
-#         leaderboard = load_leaderboard()
-#         entry = next((e for e in leaderboard if e["model_id"] == model_id), None)
-#         if entry is None:
-#             entry = {"model_id": model_id}
-#             leaderboard.append(entry)
-#
-#         entry.update({
-#             "elo": results.get("estimated_elo", 1000),
-#             "win_rate": results.get("win_rate", 0),
-#             "games_played": entry.get("games_played", 0) + n_games,
-#             "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M"),
-#         })
-#
-#         save_leaderboard(leaderboard)
-#         progress(1.0, desc="Done!")
-#
-#         return f"""
-# ## Win Rate Evaluation for {model_id.split('/')[-1]}
-#
-# | Metric | Value |
-# |--------|-------|
-# | **Estimated ELO** | {results.get('estimated_elo', 'N/A'):.0f} |
-# | **Win Rate** | {results.get('win_rate', 0)*100:.1f}% |
-# | **Draw Rate** | {results.get('draw_rate', 0)*100:.1f}% |
-# | **Loss Rate** | {results.get('loss_rate', 0)*100:.1f}% |
-# | **Avg Game Length** | {results.get('avg_game_length', 0):.1f} moves |
-# | **Illegal Move Rate** | {results.get('illegal_move_rate', 0)*100:.2f}% |
-#
-# Games played: {n_games} against Stockfish {stockfish_level}
-# """
-#
-#     except Exception as e:
-#         return f"Evaluation failed: {str(e)}"
-
-
-# def evaluate_model(
-#     model_id: str,
-#     stockfish_level: str,
-#     n_games: int,
-#     progress: gr.Progress = gr.Progress(),
-# ) -> str:
-#     """Evaluate a model against Stockfish."""
-#     try:
-#         # Import evaluation code
-#         import sys
-#         sys.path.insert(0, str(Path(__file__).parent))
-#
-#         from src.evaluate import ChessEvaluator, load_model_from_hub
-#
-#         progress(0, desc="Loading model...")
-#         model, tokenizer = load_model_from_hub(model_id)
-#
-#         progress(0.1, desc="Setting up Stockfish...")
-#         level = STOCKFISH_LEVELS.get(stockfish_level, 1)
-#         evaluator = ChessEvaluator(
-#             model=model,
-#             tokenizer=tokenizer,
-#             stockfish_level=level,
-#         )
-#
-#         progress(0.2, desc=f"Playing {n_games} games...")
-#         results = evaluator.evaluate(n_games=n_games, verbose=False)
-#
-#         # Update leaderboard
-#         leaderboard = load_leaderboard()
-#
-#         # Find or create entry
-#         entry = next((e for e in leaderboard if e["model_id"] == model_id), None)
-#         if entry is None:
-#             entry = {"model_id": model_id}
-#             leaderboard.append(entry)
-#
-#         entry.update({
-#             "elo": results.get("estimated_elo", 1000),
-#             "win_rate": results.get("win_rate", 0),
-#             "games_played": entry.get("games_played", 0) + n_games,
-#             "illegal_rate": results.get("illegal_move_rate", 0),
-#             "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M"),
-#         })
-#
-#         save_leaderboard(leaderboard)
-#
-#         progress(1.0, desc="Done!")
-#
-#         return f"""
-# ## Evaluation Results for {model_id.split('/')[-1]}
-#
-# | Metric | Value |
-# |--------|-------|
-# | **Estimated ELO** | {results.get('estimated_elo', 'N/A'):.0f} |
-# | **Win Rate** | {results.get('win_rate', 0)*100:.1f}% |
-# | **Draw Rate** | {results.get('draw_rate', 0)*100:.1f}% |
-# | **Loss Rate** | {results.get('loss_rate', 0)*100:.1f}% |
-# | **Avg Game Length** | {results.get('avg_game_length', 0):.1f} moves |
-# | **Illegal Move Rate** | {results.get('illegal_move_rate', 0)*100:.2f}% |
-#
-# Games played: {n_games} against Stockfish {stockfish_level}
-# """
-#
-#     except Exception as e:
-#         return f"Evaluation failed: {str(e)}"
-
-
 
 def refresh_leaderboard() -> str:
@@ -618,7 +534,10 @@ def refresh_leaderboard() -> str:
     return format_leaderboard_html(load_leaderboard())
 
 
-# Build Gradio Interface
 with gr.Blocks(
     title="Play Chess like a Honey Bee",
     theme=gr.themes.Soft(),
@@ -633,178 +552,208 @@ with gr.Blocks(
     """)
 
     with gr.Tabs():
-        # Submission Guide Tab
-        with gr.TabItem("How to Submit"):
            gr.Markdown(f"""
            ### Submitting Your Model
 
-           The goal is to create a chess-playing language model with **under 1 million parameters**, which is roughly the number of neurons in a honey bee's brain.
-           At this scale, efficiency and clever architecture choices are key! We are not targetting superhuman performance, but rather exploring how well small models can learn the rules of chess, the goal being (only) to play **legal moves**.
 
-           0. **Clone this repository**:
            ```bash
            git clone https://huggingface.co/spaces/LLM-course/Chess1MChallenge
            ```
-           and check the `TEMPLATE_README.md` for detailed instructions.
-
-           1. **Train your model**
 
-           2. **Push to Hugging Face Hub** using the `submit.py` script provided in the template to make sure that your model is registered correctly.
 
-           3. **Verify your submission** by checking the model page on Hugging Face
 
-           4. **Run evaluations**:
            ### Requirements
 
            - Model must be under **1M parameters**
-           - Model must use the `ChessConfig` and `ChessForCausalLM` classes
            - Include the tokenizer with your submission
 
            ### Tips for Better Performance
 
            - Experiment with different architectures (layers, heads, dimensions)
            - Try weight tying to save parameters
            """)
 
-        # Interactive Demo Tab (commented out for now)
-        # with gr.TabItem("🎮 Interactive Demo"):
-        #     gr.Markdown("### Test a Model")
-        #
-        #     with gr.Row():
-        #         with gr.Column(scale=1):
-        #             with gr.Row():
-        #                 model_dropdown = gr.Dropdown(
-        #                     choices=get_available_models(),
-        #                     label="Select Model",
-        #                     value=None,
-        #                     scale=4,
-        #                 )
-        #                 refresh_models_btn = gr.Button("🔄", scale=1)
-        #             temperature_slider = gr.Slider(
-        #                 minimum=0.1,
-        #                 maximum=2.0,
-        #                 value=0.7,
-        #                 step=0.1,
-        #                 label="Temperature",
-        #             )
-        #
-        #             with gr.Row():
-        #                 play_btn = gr.Button("Model Move", variant="primary")
-        #                 reset_btn = gr.Button("Reset")
-        #
-        #             status_text = gr.Textbox(label="Status", interactive=False)
-        #
-        #         with gr.Column(scale=1):
-        #             board_display = gr.HTML(value=render_board_svg())
-        #
-        #     # Hidden state
-        #     current_fen = gr.State("startpos")
-        #     move_history = gr.State("")
-        #
-        #     def refresh_models():
-        #         return gr.update(choices=get_available_models())
-        #
-        #     refresh_models_btn.click(
-        #         refresh_models,
-        #         outputs=[model_dropdown],
-        #     )
-        #
-        #     play_btn.click(
-        #         play_move,
-        #         inputs=[model_dropdown, current_fen, move_history, temperature_slider],
-        #         outputs=[board_display, current_fen, move_history, status_text],
-        #     )
-        #
-        #     def reset_game():
-        #         return render_board_svg(), "startpos", "", "Game reset!"
-        #
-        #     reset_btn.click(
-        #         reset_game,
-        #         outputs=[board_display, current_fen, move_history, status_text],
-        #     )
-
-        # Legal Move Evaluation Tab
-        with gr.TabItem("Legal Move Eval"):
            gr.Markdown("""
-           ### Phase 1: Legal Move Evaluation
 
-           Test if your model can generate **legal chess moves** in random positions.
-
-           - Tests the model on random board positions
-           - Measures how often it generates legal moves
            """)
 
            with gr.Row():
-               legal_model = gr.Dropdown(
                    choices=get_available_models(),
                    label="Model to Evaluate",
                )
-               refresh_legal_models_btn = gr.Button("🔄", scale=0, min_width=40)
 
-           def refresh_legal_models():
                return gr.update(choices=get_available_models())
 
-           refresh_legal_models_btn.click(
-               refresh_legal_models,
-               outputs=[legal_model],
            )
 
-           legal_btn = gr.Button("Run Legal Move Evaluation", variant="primary")
-           legal_results = gr.Markdown()
 
-           legal_btn.click(
-               evaluate_legal_moves,
-               inputs=[legal_model],
-               outputs=legal_results,
            )
 
-        # Win Rate Evaluation Tab (commented out for now)
-        # with gr.TabItem("🏆 Win Rate Eval"):
-        #     gr.Markdown("""
-        #     ### Phase 2: Win Rate Evaluation
-        #
-        #     Play full games against Stockfish and measure win rate.
-        #     This evaluation computes your model's **ELO rating**.
-        #
-        #     - Plays complete games against Stockfish
-        #     - Measures win/draw/loss rates
-        #     - Estimates ELO rating
-        #     """)
-        #
-        #     with gr.Row():
-        #         eval_model = gr.Dropdown(
-        #             choices=get_available_models(),
-        #             label="Model to Evaluate",
-        #         )
-        #         eval_level = gr.Dropdown(
-        #             choices=list(STOCKFISH_LEVELS.keys()),
-        #             value="Easy (Level 1)",
-        #             label="Stockfish Level",
-        #         )
-        #         eval_games = gr.Slider(
-        #             minimum=10,
-        #             maximum=100,
-        #             value=50,
-        #             step=10,
-        #             label="Number of Games",
-        #         )
-        #
-        #     eval_btn = gr.Button("Run Win Rate Evaluation", variant="primary")
-        #     eval_results = gr.Markdown()
-        #
-        #     eval_btn.click(
-        #         evaluate_winrate,
-        #         inputs=[eval_model, eval_level, eval_games],
-        #         outputs=eval_results,
-        #     )
-
-        # Leaderboard Tab (moved to the end)
-        with gr.TabItem("🏆 Leaderboard"):
            gr.Markdown("### Current Rankings")
            leaderboard_html = gr.HTML(value=format_leaderboard_html(load_leaderboard()))
            refresh_btn = gr.Button("Refresh Leaderboard")
            refresh_btn.click(refresh_leaderboard, outputs=leaderboard_html)
 
 
 if __name__ == "__main__":
     demo.launch(server_name="0.0.0.0", server_port=7860)
 """
+Play Chess like a Honey Bee - Chess Challenge Arena
 
 This Gradio app provides:
+1. Leaderboard of submitted models
+2. Model evaluation interface
+3. Submission guide
+4. Webhook endpoint for automatic evaluation
 
 The goal is to train a language model to play chess, under a strict constraint:
 less than 1M parameters! This is approximately the number of neurons of a honey bee.
 
 Leaderboard data is stored in a private HuggingFace dataset for persistence.
 """
 
+import hashlib
+import hmac
 import io
+import json
 import os
+import queue
 import sys
+import threading
 from datetime import datetime
 from pathlib import Path
 from typing import Optional
 
 LEADERBOARD_DATASET = os.environ.get("LEADERBOARD_DATASET", f"{ORGANIZATION}/chess-challenge-leaderboard")
 LEADERBOARD_FILENAME = "leaderboard.csv"
 HF_TOKEN = os.environ.get("HF_TOKEN")  # Required for private dataset access
+WEBHOOK_SECRET = os.environ.get("WEBHOOK_SECRET", "")  # HMAC secret; signature checks are skipped if unset
 
 # CSV columns for the leaderboard
 LEADERBOARD_COLUMNS = [
     "model_id",
     "user_id",
+    "n_parameters",
     "legal_rate_first_try",
+    "legal_rate_with_retry",
+    "games_played",
     "last_updated",
 ]
 
 
+# =============================================================================
+# Webhook Queue and Worker
+# =============================================================================
+
+eval_queue = queue.Queue()
+eval_status = {}  # Track status of queued evaluations
+eval_lock = threading.Lock()
+
+
+def evaluation_worker():
+    """Background worker that processes evaluation queue."""
+    while True:
+        try:
+            model_id = eval_queue.get()
+
+            with eval_lock:
+                eval_status[model_id] = "running"
+
+            print(f"[Webhook Worker] Starting evaluation for: {model_id}")
+
+            try:
+                sys.path.insert(0, str(Path(__file__).parent))
+                from src.evaluate import (
+                    ChessEvaluator,
+                    load_model_and_tokenizer,
+                    post_discussion_summary,
+                )
+
+                # Load and evaluate
+                model, tokenizer, _ = load_model_and_tokenizer(model_id, verbose=True)
+                evaluator = ChessEvaluator(model=model, tokenizer=tokenizer, model_path=model_id)
+                result = evaluator.evaluate(verbose=True)
+
+                # Update leaderboard if evaluation succeeded
+                if result.passed_param_check and result.passed_pychess_check and not result.error_message:
+                    user_id = get_model_submitter(model_id)
+                    if user_id:
+                        leaderboard = load_leaderboard()
+                        user_entry = next((e for e in leaderboard if e.get("user_id") == user_id), None)
+
+                        new_entry = {
+                            "model_id": model_id,
+                            "user_id": user_id,
+                            "n_parameters": result.n_parameters,
+                            "legal_rate_first_try": result.legal_rate_first_try,
+                            "legal_rate_with_retry": result.legal_rate_with_retry,
+                            "games_played": result.games_played,
+                            "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M"),
+                        }
+
+                        if user_entry is None:
+                            leaderboard.append(new_entry)
+                            save_leaderboard(leaderboard)
+                            print(f"[Webhook Worker] Added {model_id} to leaderboard")
+                        elif result.legal_rate_with_retry > user_entry.get("legal_rate_with_retry", 0):
+                            user_entry.update(new_entry)
+                            save_leaderboard(leaderboard)
+                            print(f"[Webhook Worker] Updated {model_id} on leaderboard (improvement)")
+                        else:
+                            print(f"[Webhook Worker] {model_id} - no improvement, not updating leaderboard")
+
+                        # Post results to model discussion
+                        if HF_TOKEN:
+                            try:
+                                post_discussion_summary(model_id, result, HF_TOKEN)
+                                print(f"[Webhook Worker] Posted results to {model_id} discussion")
+                            except Exception as e:
+                                print(f"[Webhook Worker] Failed to post discussion: {e}")
+                    else:
+                        print(f"[Webhook Worker] Could not determine submitter for {model_id}")
+                else:
+                    print(f"[Webhook Worker] Evaluation failed for {model_id}: {result.error_message}")
+
+                with eval_lock:
+                    eval_status[model_id] = "completed"
+
+            except Exception as e:
+                print(f"[Webhook Worker] Error evaluating {model_id}: {e}")
+                with eval_lock:
+                    eval_status[model_id] = f"error: {str(e)}"
+
+        except Exception as e:
+            print(f"[Webhook Worker] Queue error: {e}")
+        finally:
+            eval_queue.task_done()
+
+
+# Start the background worker thread
+worker_thread = threading.Thread(target=evaluation_worker, daemon=True)
+worker_thread.start()
+print("[Webhook] Evaluation worker started")
+
+
+def is_chess_model(model_id: str) -> bool:
+    """Check if a model ID looks like a chess challenge submission."""
+    if not model_id.startswith(f"{ORGANIZATION}/"):
+        return False
+    model_name = model_id.split("/")[-1].lower()
+    return "chess" in model_name
+
+
+def verify_webhook_signature(body: bytes, signature: str) -> bool:
+    """Verify the webhook signature using HMAC-SHA256."""
+    if not WEBHOOK_SECRET:
+        return True  # Skip verification if no secret configured
+    expected = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256).hexdigest()
+    return hmac.compare_digest(signature or "", expected)
+
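The HMAC check added above can be exercised in isolation. A minimal sketch (the secret and payload below are made-up stand-ins for the `WEBHOOK_SECRET` env var and the raw webhook request body): the sender computes a SHA-256 HMAC of the body, and the receiver recomputes it and compares in constant time, so any tampering with the body invalidates the signature.

```python
import hashlib
import hmac

secret = "example-secret"  # stand-in for the WEBHOOK_SECRET env var
body = b'{"repo": {"name": "LLM-course/chess-alice"}}'  # hypothetical payload

# Sender side: hex digest of the raw request body.
signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

# Receiver side: recompute and compare in constant time.
expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
assert hmac.compare_digest(signature, expected)

# A modified body no longer matches the original signature.
tampered = hmac.new(secret.encode(), body + b"x", hashlib.sha256).hexdigest()
assert not hmac.compare_digest(signature, tampered)
```

Note that `hmac.compare_digest` is used instead of `==` to avoid leaking information through timing differences.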
+
+# =============================================================================
+# Leaderboard Management
+# =============================================================================
+
 def load_leaderboard() -> list:
     """Load leaderboard from private HuggingFace dataset."""
     try:
         from huggingface_hub import hf_hub_download
 
         csv_path = hf_hub_download(
             repo_id=LEADERBOARD_DATASET,
             filename=LEADERBOARD_FILENAME,
     except Exception as e:
         print(f"Could not load leaderboard from dataset: {e}")
         return []
 
     try:
         from huggingface_hub import HfApi
 
         df = pd.DataFrame(data, columns=LEADERBOARD_COLUMNS)
 
         # Fill missing columns with defaults
             if col not in df.columns:
                 df[col] = None
 
         df = df[LEADERBOARD_COLUMNS]
 
         # Convert to CSV bytes
     try:
         from huggingface_hub import list_models
 
         models = list(list_models(author=ORGANIZATION, sort="lastModified", direction=-1))
         chess_models = [m for m in models if "chess" in m.id.lower()]
 
+        # Keep only the latest model per user
         seen_users = set()
         filtered_models = []
         for m in chess_models:
+            model_name = m.id.split("/")[-1]
             parts = model_name.split("-")
             if len(parts) >= 2:
                 username = parts[1] if parts[0] == "chess" else None
                 if username and username not in seen_users:
                     seen_users.add(username)
                     filtered_models.append(m.id)
             else:
                 filtered_models.append(m.id)
 
         return filtered_models if filtered_models else ["No models available"]
         return ["No models available"]
 
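The per-user filtering above relies on the `chess-<username>-...` naming that `submit.py` produces. A small standalone sketch of that parsing, with toy repo names and hypothetical usernames (since the models list comes sorted newest-first, keeping the first hit per user keeps the latest model):

```python
# Toy repo names, assumed sorted newest-first as list_models returns them.
names = ["chess-alice-v2", "chess-bob-small", "chess-alice-v1", "mychessnet"]

seen_users = set()
filtered = []
for name in names:
    parts = name.split("-")
    if len(parts) >= 2:
        # Only names of the form chess-<username>-... carry a username.
        username = parts[1] if parts[0] == "chess" else None
        if username and username not in seen_users:
            seen_users.add(username)   # first hit per user = latest model
            filtered.append(name)
    else:
        filtered.append(name)          # unrecognized naming: keep as-is

print(filtered)  # → ['chess-alice-v2', 'chess-bob-small', 'mychessnet']
```

Note that `chess-alice-v1` is dropped because a newer model for `alice` was already seen.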
+def get_model_submitter(model_id: str) -> Optional[str]:
+    """Extract the submitter's username from the model's README on HuggingFace."""
+    try:
+        from huggingface_hub import hf_hub_download
+        import re
+
+        readme_path = hf_hub_download(
+            repo_id=model_id,
+            filename="README.md",
+            token=HF_TOKEN,
+        )
+
+        with open(readme_path, "r") as f:
+            readme_content = f.read()
+
+        match = re.search(r'\*\*Submitted by\*\*:\s*\[([^\]]+)\]', readme_content)
+        if match:
+            return match.group(1)
+
+        from huggingface_hub import model_info
+        info = model_info(model_id, token=HF_TOKEN)
+        if info.author:
+            return info.author
+
+    except Exception as e:
+        print(f"Could not extract submitter from model: {e}")
+
+    return None
+
+
276
+ # =============================================================================
277
+ # Leaderboard Formatting
278
+ # =============================================================================
279
+
280
  def format_leaderboard_html(data: list) -> str:
281
  """Format leaderboard data as HTML table."""
282
  if not data:
283
  return "<p>No models evaluated yet. Be the first to submit!</p>"
284
 
285
+ # Keep only the best entry per user (by legal_rate_with_retry)
286
  best_per_user = {}
287
  for entry in data:
288
  user_id = entry.get("user_id", "unknown")
289
+ legal_rate = entry.get("legal_rate_with_retry", 0)
290
+ if user_id not in best_per_user or legal_rate > best_per_user[user_id].get("legal_rate_with_retry", 0):
291
  best_per_user[user_id] = entry
292
 
293
+ # Sort by legal_rate_with_retry
294
+ sorted_data = sorted(best_per_user.values(), key=lambda x: x.get("legal_rate_with_retry", 0), reverse=True)
295
 
296
  html = """
297
  <style>
 
325
  <th>Rank</th>
326
  <th>User</th>
327
  <th>Model</th>
328
+ <th>Parameters</th>
329
  <th>Legal Rate (1st try)</th>
330
+ <th>Legal Rate (with retries)</th>
331
+ <th>Games</th>
 
332
  <th>Last Updated</th>
333
  </tr>
334
  </thead>
 
342
  model_url = f"https://huggingface.co/{entry['model_id']}"
343
 
344
  # Color code legal rate
345
+ legal_rate = entry.get('legal_rate_with_retry', 0)
346
  if legal_rate >= 0.9:
347
  legal_class = "legal-good"
348
  elif legal_rate >= 0.7:
 
350
  else:
351
  legal_class = "legal-bad"
352
 
 
353
  user_id = entry.get('user_id', 'unknown')
354
  user_url = f"https://huggingface.co/{user_id}"
355
+ n_params = entry.get('n_parameters', 0)
356
+ legal_rate_first = entry.get('legal_rate_first_try', 0)
357
+ games = entry.get('games_played', 0)
358
+
359
  html += f"""
360
  <tr>
361
  <td class="{rank_class}">{rank_display}</td>
362
  <td><a href="{user_url}" target="_blank" class="model-link">{user_id}</a></td>
363
  <td><a href="{model_url}" target="_blank" class="model-link">{entry['model_id'].split('/')[-1]}</a></td>
364
+ <td>{n_params:,}</td>
365
+ <td>{legal_rate_first*100:.1f}%</td>
366
  <td class="{legal_class}">{legal_rate*100:.1f}%</td>
367
+ <td>{games}</td>
 
 
 
368
  <td>{entry.get('last_updated', 'N/A')}</td>
369
  </tr>
370
  """
 
373
  return html
374
 
375
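The best-entry-per-user deduplication and ranking used for the leaderboard can be sketched on toy data (usernames and rates below are made up): each user keeps only their highest `legal_rate_with_retry`, and the survivors are sorted descending.

```python
# Toy leaderboard entries (hypothetical users and rates).
data = [
    {"user_id": "alice", "legal_rate_with_retry": 0.82},
    {"user_id": "bob",   "legal_rate_with_retry": 0.91},
    {"user_id": "alice", "legal_rate_with_retry": 0.95},
]

# Keep only the best entry per user.
best_per_user = {}
for entry in data:
    user_id = entry.get("user_id", "unknown")
    rate = entry.get("legal_rate_with_retry", 0)
    if user_id not in best_per_user or rate > best_per_user[user_id]["legal_rate_with_retry"]:
        best_per_user[user_id] = entry

# Rank the survivors by legal rate, best first.
ranked = sorted(best_per_user.values(), key=lambda e: e["legal_rate_with_retry"], reverse=True)
print([(e["user_id"], e["legal_rate_with_retry"]) for e in ranked])
# → [('alice', 0.95), ('bob', 0.91)]
```

Alice's weaker 0.82 run is discarded, so resubmitting a worse model can never lower a user's standing.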
 
+# =============================================================================
+# Evaluation Functions
+# =============================================================================
+
+def run_evaluation(
     model_id: str,
+    progress: gr.Progress = gr.Progress(),
+) -> str:
+    """
+    Run evaluation on a model and update the leaderboard.
+
+    Evaluation procedure:
+    1. Check if model has < 1M parameters
+    2. Check if model uses python-chess illegally
+    3. Play 500 moves against opponent engine (restart after 25 moves)
+    4. Track legal move rates
+    5. Update leaderboard and post discussion
+    """
     try:
         sys.path.insert(0, str(Path(__file__).parent))
 
+        from src.evaluate import (
+            ChessEvaluator,
+            load_model_and_tokenizer,
+            post_discussion_summary,
+        )
 
+        progress(0, desc="Loading model...")
 
+        # Load model
+        model, tokenizer, _ = load_model_and_tokenizer(model_id, verbose=True)
 
+        progress(0.1, desc="Setting up evaluator...")
 
+        # Create evaluator
+        evaluator = ChessEvaluator(
+            model=model,
+            tokenizer=tokenizer,
+            model_path=model_id,
         )
 
+        progress(0.2, desc="Running evaluation (500 moves)...")
 
+        # Run evaluation
+        result = evaluator.evaluate(verbose=True)
 
+        progress(0.9, desc="Updating leaderboard...")
 
+        # Check if evaluation was successful
+        if not result.passed_param_check:
+            return f"""## Evaluation Failed
 
+**Model**: `{model_id}`
+**Parameters**: {result.n_parameters:,}
 
+Model exceeds the **1M parameter limit**. Please reduce model size and resubmit.
+"""
 
+        if not result.passed_pychess_check:
+            return f"""## Evaluation Failed
+
+**Model**: `{model_id}`
+
+Model illegally uses python-chess for move filtering: {result.error_message}
+
+This is not allowed. The model must generate moves without access to legal move lists.
+"""
 
+        if result.error_message:
+            return f"""## Evaluation Error
+
+**Model**: `{model_id}`
+
+An error occurred during evaluation: {result.error_message}
+"""
 
+        # Get submitter info
         user_id = get_model_submitter(model_id)
         if user_id is None:
+            return f"""## Evaluation Issue
 
 Could not determine the submitter for model `{model_id}`.
 
 Please ensure your model was submitted using the official submission script (`submit.py`),
 which adds the required metadata to the README.md file.
+
+**Evaluation Results** (not saved to leaderboard):
+{result.summary()}
 """
 
+        # Update leaderboard
         leaderboard = load_leaderboard()
 
+        # Find existing entry for this user
         user_entry = next((e for e in leaderboard if e.get("user_id") == user_id), None)
 
+        new_entry = {
+            "model_id": model_id,
+            "user_id": user_id,
+            "n_parameters": result.n_parameters,
+            "legal_rate_first_try": result.legal_rate_first_try,
+            "legal_rate_with_retry": result.legal_rate_with_retry,
+            "games_played": result.games_played,
+            "last_updated": datetime.now().strftime("%Y-%m-%d %H:%M"),
+        }
 
         if user_entry is None:
+            leaderboard.append(new_entry)
             save_leaderboard(leaderboard)
             update_message = "New entry added to leaderboard!"
         else:
+            old_rate = user_entry.get("legal_rate_with_retry", 0)
+            if result.legal_rate_with_retry > old_rate:
+                user_entry.update(new_entry)
                 save_leaderboard(leaderboard)
+                update_message = f"Improved! {old_rate*100:.1f}% -> {result.legal_rate_with_retry*100:.1f}%"
             else:
+                update_message = f"No improvement. Best: {old_rate*100:.1f}%, This run: {result.legal_rate_with_retry*100:.1f}%"
 
+        # Post discussion to model page
+        if HF_TOKEN:
+            try:
+                post_discussion_summary(model_id, result, HF_TOKEN)
+                discussion_message = "Results posted to model page"
+            except Exception as e:
+                discussion_message = f"Could not post to model page: {e}"
+        else:
+            discussion_message = "No HF_TOKEN - results not posted to model page"
 
+        progress(1.0, desc="Done!")
 
+        return f"""## Evaluation Complete
 
+{result.summary()}
 
+---
 
 ### Leaderboard Update
 {update_message}
 
+### Model Page Discussion
+{discussion_message}
 """
 
     except Exception as e:
+        import traceback
+        return f"""## Evaluation Failed
+
+An unexpected error occurred:
+
+```
+{traceback.format_exc()}
+```
+"""
 
 
 def refresh_leaderboard() -> str:
     return format_leaderboard_html(load_leaderboard())
 
 
+# =============================================================================
+# Gradio Interface
+# =============================================================================
+
 with gr.Blocks(
     title="Play Chess like a Honey Bee",
     theme=gr.themes.Soft(),
     """)
 
     with gr.Tabs():
556
+ with gr.TabItem("📖 How to Submit"):
557
  gr.Markdown(f"""
558
  ### Submitting Your Model
559
 
560
+ The goal is to create a chess-playing language model with **under 1 million parameters**,
561
+ which is roughly the number of neurons in a honey bee's brain.
562
+
563
+ At this scale, efficiency and clever architecture choices are key! We are not targeting
564
+ superhuman performance, but rather exploring how well small models can learn the rules
565
+ of chess. The goal is to play **legal moves**.
566
+
567
+ ---
568
 
569
+ ### Getting Started
570
+
571
+ 1. **Clone this repository**:
572
  ```bash
573
  git clone https://huggingface.co/spaces/LLM-course/Chess1MChallenge
574
  ```
575
+
576
+ 2. **Check the example solution** in the `example_solution/` folder for reference
577
+
578
+ 3. **Train your model** using the provided training script or your own approach
579
+
580
+ 4. **Submit using the official script**:
581
+ ```bash
582
+ python submit.py --model_path ./my_model --model_name my-chess-model
583
+ ```
584
+
585
+ 5. **Run evaluation** on this page to see your results on the leaderboard
586
+
587
+ ---
588
+
589
+ ### Evaluation Procedure
590
+
591
+ Your model will be evaluated as follows:
592
+
593
+ 1. **Parameter check**: Must have < 1M parameters
594
+ 2. **Security check**: The model must not use python-chess to filter legal moves
595
+ 3. **Game play**: 500 moves against the opponent engine (games restart every 25 moves)
596
+ 4. **Move generation**: 3 retries allowed per move, greedy decoding
597
+ 5. **Scoring**: Legal move rate (first try and with retries)
598
 
599
+ The evaluation is **fully deterministic** (seeded randomness, deterministic opponent).
600
 
601
+ ---
602
 
 
603
  ### Requirements
604
 
605
  - Model must be under **1M parameters**
606
+ - Model must use the `ChessConfig` and `ChessForCausalLM` classes (or compatible)
607
  - Include the tokenizer with your submission
608
+ - **Do not** use python-chess to filter moves during generation
609
 
610
  ### Tips for Better Performance
611
 
612
  - Experiment with different architectures (layers, heads, dimensions)
613
  - Try weight tying to save parameters
614
+ - Focus on learning the rules of chess, not just memorizing openings
615
+ - Check the `example_solution/` folder for ideas
616
  """)
617
 
618
+ # Evaluation Tab
619
+ with gr.TabItem("Evaluate Model"):
 
 
 
620
  gr.Markdown("""
621
+ ### Run Evaluation
622
 
623
+ Select a model to evaluate. The evaluation will:
624
+ - Check parameter count (< 1M required)
625
+ - Verify that the model does not use python-chess to filter moves
626
+ - Play 500 moves against the opponent engine
627
+ - Track legal move rates
628
+ - Update the leaderboard (if the score improves)
629
+ - Post results to the model's discussion page
630
  """)
631
 
632
  with gr.Row():
633
+ model_dropdown = gr.Dropdown(
634
  choices=get_available_models(),
635
  label="Model to Evaluate",
636
+ scale=4,
637
  )
638
+ refresh_models_btn = gr.Button("Refresh", scale=1, min_width=50)
639
 
640
+ def refresh_models():
641
  return gr.update(choices=get_available_models())
642
 
643
+ refresh_models_btn.click(
644
+ refresh_models,
645
+ outputs=[model_dropdown],
646
  )
647
 
648
+ eval_btn = gr.Button("Run Evaluation", variant="primary")
649
+ eval_results = gr.Markdown()
650
 
651
+ eval_btn.click(
652
+ run_evaluation,
653
+ inputs=[model_dropdown],
654
+ outputs=eval_results,
655
  )
656
 
657
+ # Leaderboard Tab
658
+ with gr.TabItem("Leaderboard"):
 
 
 
659
  gr.Markdown("### Current Rankings")
660
+ gr.Markdown("""
661
+ Rankings are based on **legal move rate (with retries)**.
662
+
663
+ - **Legal Rate (1st try)**: Percentage of moves that were legal on first attempt
664
+ - **Legal Rate (with retries)**: Percentage of moves that were legal within 3 attempts
665
+ """)
666
+
667
  leaderboard_html = gr.HTML(value=format_leaderboard_html(load_leaderboard()))
668
  refresh_btn = gr.Button("Refresh Leaderboard")
669
  refresh_btn.click(refresh_leaderboard, outputs=leaderboard_html)
670
 
671
 
672
+ # =============================================================================
673
+ # Webhook Endpoint (mounted on Gradio's FastAPI app)
674
+ # =============================================================================
675
+
676
+ from fastapi import Request
677
+ from fastapi.responses import JSONResponse
678
+
679
+ @demo.app.post("/webhook")
680
+ async def handle_webhook(request: Request):
681
+ """
682
+ Handle HuggingFace webhook events for automatic model evaluation.
683
+
684
+ Triggered on model creation and update events in the organization.
685
+ """
686
+ # Verify webhook signature
687
+ body = await request.body()
688
+ signature = request.headers.get("X-Webhook-Signature")
689
+
690
+ if not verify_webhook_signature(body, signature):
691
+ print("[Webhook] Invalid signature")
692
+ return JSONResponse({"error": "Invalid signature"}, status_code=401)
693
+
694
+ try:
695
+ payload = json.loads(body)
696
+ except json.JSONDecodeError:
697
+ return JSONResponse({"error": "Invalid JSON"}, status_code=400)
698
+
699
+ event = payload.get("event", {})
700
+ repo = payload.get("repo", {})
701
+
702
+ action = event.get("action")
703
+ scope = event.get("scope")
704
+ repo_type = repo.get("type")
705
+ repo_name = repo.get("name", "")
706
+
707
+ print(f"[Webhook] Received: action={action}, scope={scope}, type={repo_type}, repo={repo_name}")
708
+
709
+ # Only process model repos in our organization
710
+ if repo_type != "model":
711
+ return JSONResponse({"status": "ignored", "reason": "not a model"})
712
+
713
+ if not repo_name.startswith(f"{ORGANIZATION}/"):
714
+ return JSONResponse({"status": "ignored", "reason": "not in organization"})
715
+
716
+ # Only process create and update actions
717
+ if action not in ("create", "update"):
718
+ return JSONResponse({"status": "ignored", "reason": f"action {action} not handled"})
719
+
720
+ # Check if it looks like a chess model
721
+ if not is_chess_model(repo_name):
722
+ return JSONResponse({"status": "ignored", "reason": "not a chess model"})
723
+
724
+ # Check if already queued or running
725
+ with eval_lock:
726
+ current_status = eval_status.get(repo_name)
727
+ if current_status == "running":
728
+ return JSONResponse({"status": "ignored", "reason": "evaluation already running"})
729
+ if current_status == "queued":
730
+ return JSONResponse({"status": "ignored", "reason": "already in queue"})
731
+ eval_status[repo_name] = "queued"
732
+
733
+ # Queue the model for evaluation
734
+ eval_queue.put(repo_name)
735
+ queue_size = eval_queue.qsize()
736
+
737
+ print(f"[Webhook] Queued {repo_name} for evaluation (queue size: {queue_size})")
738
+
739
+ return JSONResponse({
740
+ "status": "queued",
741
+ "model_id": repo_name,
742
+ "queue_position": queue_size,
743
+ })
744
+
745
+
746
+ @demo.app.get("/webhook/status")
747
+ async def webhook_status():
748
+ """Get the current status of the evaluation queue."""
749
+ with eval_lock:
750
+ status_copy = dict(eval_status)
751
+
752
+ return JSONResponse({
753
+ "queue_size": eval_queue.qsize(),
754
+ "evaluations": status_copy,
755
+ })
756
+
757
+
758
  if __name__ == "__main__":
759
  demo.launch(server_name="0.0.0.0", server_port=7860)
example_solution/README.md ADDED
@@ -0,0 +1,109 @@
 
 
1
+ # Example Solution
2
+
3
+ This folder contains a complete reference implementation for the Chess Challenge.
4
+
5
+ **Use this to understand the expected format**: see how `model.py`, `tokenizer.py`, and the configuration files should be structured.
6
+
7
+ ## Files Included
8
+
9
+ | File | Description |
10
+ |------|-------------|
11
+ | `model.py` | Custom transformer architecture |
12
+ | `tokenizer.py` | Custom move-level tokenizer |
13
+ | `train.py` | Training script |
14
+ | `data.py` | Dataset utilities |
15
+ | `config.json` | Model configuration with auto_map |
16
+ | `model.safetensors` | Trained model weights |
17
+ | `vocab.json` | Tokenizer vocabulary |
18
+ | `tokenizer_config.json` | Tokenizer configuration with auto_map |
19
+ | `special_tokens_map.json` | Special token mappings |
20
+
21
+ ## Model Architecture
22
+
23
+ This example uses a small GPT-style transformer:
24
+
25
+ | Parameter | Value |
26
+ |-----------|-------|
27
+ | Embedding dim | 128 |
28
+ | Layers | 4 |
29
+ | Attention heads | 4 |
30
+ | Context length | 256 |
31
+ | Total parameters | ~910K |
32
+
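The ~910K figure can be sanity-checked with back-of-envelope arithmetic. The breakdown below assumes standard GPT-style blocks with bias terms and two LayerNorms per layer; the exact split depends on the implementation in `model.py`, so treat this as an estimate:

```python
# Rough parameter count for the example configuration
# (assumes GPT-style blocks with biases; exact numbers depend on model.py)
vocab_size, n_ctx, n_embd, n_inner, n_layer = 1682, 256, 128, 384, 4

embeddings = vocab_size * n_embd + n_ctx * n_embd   # token + positional tables
attn = 4 * (n_embd * n_embd + n_embd)               # q, k, v, and output projections
mlp = (n_embd * n_inner + n_inner) + (n_inner * n_embd + n_embd)
norms = 2 * 2 * n_embd                              # two LayerNorms (weight + bias)
per_layer = attn + mlp + norms

total = embeddings + n_layer * per_layer + 2 * n_embd  # plus a final LayerNorm
print(f"{total:,}")  # → 909,824
```

With weight tying enabled, the output head reuses the embedding matrix and adds no parameters, which is what keeps this configuration under the 1M budget.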
33
+ ## Training Details
34
+
35
+ The model was trained on the Lichess dataset with:
36
+ - 3 epochs
37
+ - Batch size 32
38
+ - Learning rate 5e-4
39
+ - Weight tying (embedding = output layer)
40
+
41
+ ## How to Use This Example
42
+
43
+ ### Load the model:
44
+
45
+ ```python
46
+ from transformers import AutoModelForCausalLM, AutoTokenizer
47
+
48
+ model = AutoModelForCausalLM.from_pretrained("./example_solution", trust_remote_code=True)
49
+ tokenizer = AutoTokenizer.from_pretrained("./example_solution", trust_remote_code=True)
50
+ ```
51
+
52
+ ### Generate a move:
53
+
54
+ ```python
55
+ import torch
56
+
57
+ # Game history in the format: WPe2e4 BPe7e5 WNg1f3 ...
58
+ history = "[BOS] WPe2e4 BPe7e5"
59
+
60
+ inputs = tokenizer(history, return_tensors="pt")
61
+ with torch.no_grad():
62
+ outputs = model(**inputs)
63
+ next_token = outputs.logits[0, -1].argmax()
64
+
65
+ predicted_move = tokenizer.decode([next_token])
66
+ print(f"Predicted move: {predicted_move}")
67
+ ```
68
+
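The decoded token follows the extended UCI format from the dataset: a color prefix (`W`/`B`), a piece letter, the from- and to-squares, and an optional parenthesized annotation. A small illustrative parser (not part of the template) for inspecting predictions:

```python
import re

# Matches tokens such as "WPe2e4", "BBb4c3(x+)", or "BKe8g8(o)".
# The annotation suffix is kept opaque here; the vocabulary uses e.g.
# x for captures, + for checks, Q for promotions, o/O for castling.
MOVE_RE = re.compile(r"^([WB])([KQRBNP])([a-h][1-8])([a-h][1-8])(?:\((.+)\))?$")

def parse_move_token(token):
    """Split a move token into (color, piece, from_sq, to_sq, annotation)."""
    match = MOVE_RE.match(token)
    if match is None:
        raise ValueError(f"not a move token: {token!r}")
    return match.groups()

print(parse_move_token("WPe2e4"))      # → ('W', 'P', 'e2', 'e4', None)
print(parse_move_token("BBb4c3(x+)"))  # → ('B', 'B', 'b4', 'c3', 'x+')
```

Special tokens like `[BOS]` and `[EOS]` fail this pattern by design, which makes the parser a cheap filter for non-move predictions.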
69
+ ## Evaluation
70
+
71
+ To evaluate this example:
72
+
73
+ ```bash
74
+ python -m src.evaluate --model_path ./example_solution
75
+ ```
76
+
77
+ ## Key Implementation Details
78
+
79
+ ### auto_map Configuration
80
+
81
+ The `config.json` contains:
82
+ ```json
83
+ {
84
+ "auto_map": {
85
+ "AutoConfig": "model.ChessConfig",
86
+ "AutoModelForCausalLM": "model.ChessForCausalLM"
87
+ }
88
+ }
89
+ ```
90
+
91
+ The `tokenizer_config.json` contains:
92
+ ```json
93
+ {
94
+ "auto_map": {
95
+ "AutoTokenizer": ["tokenizer.ChessTokenizer", null]
96
+ }
97
+ }
98
+ ```
99
+
100
+ Note: `AutoTokenizer` requires a list `[slow_class, fast_class]`, not a string!
101
+
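A quick way to catch this mistake before uploading is to validate the saved config directly. The helper below is hypothetical (not part of the template) and only checks the list shape:

```python
def check_auto_map(tokenizer_config: dict) -> None:
    """Raise if AutoTokenizer's auto_map entry is not a [slow, fast] pair."""
    entry = tokenizer_config.get("auto_map", {}).get("AutoTokenizer")
    if not (isinstance(entry, list) and len(entry) == 2):
        raise ValueError(
            "AutoTokenizer auto_map must be a [slow_class, fast_class] list, "
            f"got: {entry!r}"
        )

# A bare string here would make AutoTokenizer fail to resolve the class.
check_auto_map({"auto_map": {"AutoTokenizer": ["tokenizer.ChessTokenizer", None]}})
print("auto_map shape OK")
```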
102
+ ## Your Turn!
103
+
104
+ Use this as inspiration, but create your own solution! Ideas to explore:
105
+
106
+ 1. **Architecture changes**: Different number of layers, heads, or embedding dimensions
107
+ 2. **Training strategies**: Different learning rates, warmup schedules, or optimizers
108
+ 3. **Data augmentation**: Flip board colors, use different game phases
109
+ 4. **Tokenization**: Different move representation formats
example_solution/config.json ADDED
@@ -0,0 +1,24 @@
 
 
1
+ {
2
+ "architectures": [
3
+ "ChessForCausalLM"
4
+ ],
5
+ "auto_map": {
6
+ "AutoConfig": "model.ChessConfig",
7
+ "AutoModelForCausalLM": "model.ChessForCausalLM"
8
+ },
9
+ "bos_token_id": 1,
10
+ "dropout": 0.1,
11
+ "dtype": "float32",
12
+ "eos_token_id": 2,
13
+ "layer_norm_epsilon": 1e-05,
14
+ "model_type": "chess_transformer",
15
+ "n_ctx": 256,
16
+ "n_embd": 128,
17
+ "n_head": 4,
18
+ "n_inner": 384,
19
+ "n_layer": 4,
20
+ "pad_token_id": 0,
21
+ "tie_weights": true,
22
+ "transformers_version": "4.57.3",
23
+ "vocab_size": 1682
24
+ }
{src → example_solution}/data.py RENAMED
@@ -24,7 +24,7 @@ class ChessDataset(Dataset):
24
  The labels are shifted by one position for next-token prediction.
25
 
26
  Example:
27
- >>> from src.tokenizer import ChessTokenizer
28
  >>> tokenizer = ChessTokenizer.build_vocab_from_dataset()
29
  >>> dataset = ChessDataset(tokenizer, max_length=256)
30
  >>> sample = dataset[0]
 
24
  The labels are shifted by one position for next-token prediction.
25
 
26
  Example:
27
+ >>> from tokenizer import ChessTokenizer
28
  >>> tokenizer = ChessTokenizer.build_vocab_from_dataset()
29
  >>> dataset = ChessDataset(tokenizer, max_length=256)
30
  >>> sample = dataset[0]
{src → example_solution}/model.py RENAMED
File without changes
example_solution/special_tokens_map.json ADDED
@@ -0,0 +1,6 @@
 
 
1
+ {
2
+ "bos_token": "[BOS]",
3
+ "eos_token": "[EOS]",
4
+ "pad_token": "[PAD]",
5
+ "unk_token": "[UNK]"
6
+ }
{src → example_solution}/tokenizer.py RENAMED
File without changes
example_solution/tokenizer_config.json ADDED
@@ -0,0 +1,50 @@
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "[BOS]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "[EOS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "[UNK]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ }
35
+ },
36
+ "auto_map": {
37
+ "AutoTokenizer": [
38
+ "tokenizer.ChessTokenizer",
39
+ "tokenizer.ChessTokenizer"
40
+ ]
41
+ },
42
+ "bos_token": "[BOS]",
43
+ "clean_up_tokenization_spaces": false,
44
+ "eos_token": "[EOS]",
45
+ "extra_special_tokens": {},
46
+ "model_max_length": 1000000000000000019884624838656,
47
+ "pad_token": "[PAD]",
48
+ "tokenizer_class": "ChessTokenizer",
49
+ "unk_token": "[UNK]"
50
+ }
{src → example_solution}/train.py RENAMED
@@ -22,10 +22,16 @@ from transformers import (
22
  set_seed,
23
  )
24
 
25
- from src.data import ChessDataCollator, create_train_val_datasets
26
- from src.model import ChessConfig, ChessForCausalLM
27
- from src.tokenizer import ChessTokenizer
28
- from src.utils import count_parameters, print_parameter_budget
 
 
29
 
30
 
31
  def parse_args():
@@ -168,8 +174,13 @@ def main():
168
  eos_token_id=tokenizer.eos_token_id,
169
  )
170
 
171
- # Print parameter budget
172
- print_parameter_budget(config)
 
 
173
 
174
  # Create model
175
  print("\nCreating model...")
@@ -180,7 +191,7 @@ def main():
180
  if n_params > 1_000_000:
181
  print("WARNING: Model exceeds 1M parameter limit!")
182
  else:
183
- print("Model is within 1M parameter limit")
184
 
185
  # Load datasets
186
  print("\nLoading datasets...")
@@ -235,11 +246,44 @@ def main():
235
 
236
  # Save final model
237
  print("\nSaving final model...")
238
- trainer.save_model(os.path.join(args.output_dir, "final_model"))
239
- tokenizer.save_pretrained(os.path.join(args.output_dir, "final_model"))
 
 
240
 
241
  print("\nTraining complete!")
242
- print(f" Model saved to: {args.output_dir}/final_model")
 
243
 
244
 
245
  if __name__ == "__main__":
 
22
  set_seed,
23
  )
24
 
25
+ from data import ChessDataCollator, create_train_val_datasets
26
+ from model import ChessConfig, ChessForCausalLM
27
+ from tokenizer import ChessTokenizer
28
+
29
+
30
+ def count_parameters(model, trainable_only=True):
31
+ """Count the number of parameters in a model."""
32
+ if trainable_only:
33
+ return sum(p.numel() for p in model.parameters() if p.requires_grad)
34
+ return sum(p.numel() for p in model.parameters())
35
 
36
 
37
  def parse_args():
 
174
  eos_token_id=tokenizer.eos_token_id,
175
  )
176
 
177
+ # Print configuration
178
+ print(f"\nModel configuration:")
179
+ print(f" vocab_size: {config.vocab_size}")
180
+ print(f" n_embd: {config.n_embd}")
181
+ print(f" n_layer: {config.n_layer}")
182
+ print(f" n_head: {config.n_head}")
183
+ print(f" tie_weights: {config.tie_weights}")
184
 
185
  # Create model
186
  print("\nCreating model...")
 
191
  if n_params > 1_000_000:
192
  print("WARNING: Model exceeds 1M parameter limit!")
193
  else:
194
+ print("OK: Model is within 1M parameter limit")
195
 
196
  # Load datasets
197
  print("\nLoading datasets...")
 
246
 
247
  # Save final model
248
  print("\nSaving final model...")
249
+ final_model_dir = os.path.join(args.output_dir, "final_model")
250
+ trainer.save_model(final_model_dir)
251
+ tokenizer.save_pretrained(final_model_dir)
252
+
253
+ # Copy model.py and tokenizer.py for trust_remote_code loading
254
+ import shutil
255
+ import json
256
+ script_dir = Path(__file__).parent
257
+ shutil.copy(script_dir / "model.py", final_model_dir)
258
+ shutil.copy(script_dir / "tokenizer.py", final_model_dir)
259
+ print(" Copied model.py and tokenizer.py")
260
+
261
+ # Add auto_map to config.json for AutoModelForCausalLM
262
+ config_path = os.path.join(final_model_dir, "config.json")
263
+ with open(config_path) as f:
264
+ config_dict = json.load(f)
265
+ config_dict["auto_map"] = {
266
+ "AutoConfig": "model.ChessConfig",
267
+ "AutoModelForCausalLM": "model.ChessForCausalLM",
268
+ }
269
+ with open(config_path, "w") as f:
270
+ json.dump(config_dict, f, indent=2)
271
+ print(" Added auto_map to config.json")
272
+
273
+ # Add auto_map to tokenizer_config.json for AutoTokenizer
274
+ tokenizer_config_path = os.path.join(final_model_dir, "tokenizer_config.json")
275
+ with open(tokenizer_config_path) as f:
276
+ tokenizer_dict = json.load(f)
277
+ tokenizer_dict["auto_map"] = {
278
+ "AutoTokenizer": ["tokenizer.ChessTokenizer", None],
279
+ }
280
+ with open(tokenizer_config_path, "w") as f:
281
+ json.dump(tokenizer_dict, f, indent=2)
282
+ print(" Added auto_map to tokenizer_config.json")
283
 
284
  print("\nTraining complete!")
285
+ print(f" Model saved to: {final_model_dir}")
286
+ print(" Ready for submission with: python submit.py --model_path " + final_model_dir)
287
 
288
 
289
  if __name__ == "__main__":
example_solution/vocab.json ADDED
@@ -0,0 +1,1684 @@
 
 
1
+ {
2
+ "[PAD]": 0,
3
+ "[BOS]": 1,
4
+ "[EOS]": 2,
5
+ "[UNK]": 3,
6
+ "BBa5b6": 4,
7
+ "BBa6b7": 5,
8
+ "BBb4a5": 6,
9
+ "BBb4c3(x)": 7,
10
+ "BBb4c3(x+)": 8,
11
+ "BBb4c5": 9,
12
+ "BBb4d2(x)": 10,
13
+ "BBb4d2(x+)": 11,
14
+ "BBb4d6": 12,
15
+ "BBb4e7": 13,
16
+ "BBb6c7": 14,
17
+ "BBb6d4(x)": 15,
18
+ "BBb7a6": 16,
19
+ "BBb7c6": 17,
20
+ "BBb7c6(x)": 18,
21
+ "BBb7c8": 19,
22
+ "BBb7d5": 20,
23
+ "BBb7d5(x)": 21,
24
+ "BBb7e4(x)": 22,
25
+ "BBb7f3(x)": 23,
26
+ "BBb7g2(x)": 24,
27
+ "BBc5a7": 25,
28
+ "BBc5b4": 26,
29
+ "BBc5b4(+)": 27,
30
+ "BBc5b6": 28,
31
+ "BBc5d4": 29,
32
+ "BBc5d4(x)": 30,
33
+ "BBc5d6": 31,
34
+ "BBc5e3(x)": 32,
35
+ "BBc5e7": 33,
36
+ "BBc5f2(x+)": 34,
37
+ "BBc6b5": 35,
38
+ "BBc6d5": 36,
39
+ "BBc6d5(x)": 37,
40
+ "BBc6d7": 38,
41
+ "BBc6e4(x)": 39,
42
+ "BBc6f3(x)": 40,
43
+ "BBc8a6": 41,
44
+ "BBc8b7": 42,
45
+ "BBc8d7": 43,
46
+ "BBc8d7(x)": 44,
47
+ "BBc8e6": 45,
48
+ "BBc8e6(x)": 46,
49
+ "BBc8f5": 47,
50
+ "BBc8f5(x)": 48,
51
+ "BBc8g4": 49,
52
+ "BBc8g4(x)": 50,
53
+ "BBc8h3": 51,
54
+ "BBc8h3(x)": 52,
55
+ "BBd5e6": 53,
56
+ "BBd5f3(x)": 54,
57
+ "BBd6b4": 55,
58
+ "BBd6c5": 56,
59
+ "BBd6c5(x)": 57,
60
+ "BBd6c7": 58,
61
+ "BBd6e5": 59,
62
+ "BBd6e5(x)": 60,
63
+ "BBd6e7": 61,
64
+ "BBd6f4": 62,
65
+ "BBd6f4(x)": 63,
66
+ "BBd6g3(x)": 64,
67
+ "BBd6h2(x+)": 65,
68
+ "BBd7b5": 66,
69
+ "BBd7b5(x)": 67,
70
+ "BBd7c6": 68,
71
+ "BBd7c6(x)": 69,
72
+ "BBd7c8": 70,
73
+ "BBd7e6": 71,
74
+ "BBd7e8": 72,
75
+ "BBd7f5": 73,
76
+ "BBd7f5(x)": 74,
77
+ "BBd7g4": 75,
78
+ "BBe4f3(x)": 76,
79
+ "BBe4g6": 77,
80
+ "BBe5d6": 78,
81
+ "BBe5f6": 79,
82
+ "BBe5g7": 80,
83
+ "BBe6a2(x)": 81,
84
+ "BBe6b3(x)": 82,
85
+ "BBe6c4": 83,
86
+ "BBe6c4(x)": 84,
87
+ "BBe6d5": 85,
88
+ "BBe6d5(x)": 86,
89
+ "BBe6d7": 87,
90
+ "BBe6f5": 88,
91
+ "BBe6f5(x)": 89,
92
+ "BBe6f7": 90,
93
+ "BBe6g4": 91,
94
+ "BBe6h3(x)": 92,
95
+ "BBe7b4": 93,
96
+ "BBe7c5": 94,
97
+ "BBe7c5(x)": 95,
98
+ "BBe7d6": 96,
99
+ "BBe7d6(x)": 97,
100
+ "BBe7d8": 98,
101
+ "BBe7f6": 99,
102
+ "BBe7f6(x)": 100,
103
+ "BBe7f8": 101,
104
+ "BBe7g5": 102,
105
+ "BBe7g5(x)": 103,
106
+ "BBe7h4": 104,
107
+ "BBe7h4(x)": 105,
108
+ "BBf5c2(x)": 106,
109
+ "BBf5d3": 107,
110
+ "BBf5d3(x)": 108,
111
+ "BBf5d7": 109,
112
+ "BBf5e4": 110,
113
+ "BBf5e4(x)": 111,
114
+ "BBf5e6": 112,
115
+ "BBf5g4": 113,
116
+ "BBf5g6": 114,
117
+ "BBf6b2(x)": 115,
118
+ "BBf6c3(x)": 116,
119
+ "BBf6d4(x)": 117,
120
+ "BBf6e5": 118,
121
+ "BBf6e5(x)": 119,
122
+ "BBf6e7": 120,
123
+ "BBf6g5": 121,
124
+ "BBf6g7": 122,
125
+ "BBf8b4": 123,
126
+ "BBf8b4(+)": 124,
127
+ "BBf8c5": 125,
128
+ "BBf8c5(+)": 126,
129
+ "BBf8c5(x)": 127,
130
+ "BBf8d6": 128,
131
+ "BBf8d6(x)": 129,
132
+ "BBf8e7": 130,
133
+ "BBf8e7(x)": 131,
134
+ "BBf8g7": 132,
135
+ "BBf8h6": 133,
136
+ "BBg4d1(x)": 134,
137
+ "BBg4d7": 135,
138
+ "BBg4e2(x)": 136,
139
+ "BBg4e6": 137,
140
+ "BBg4f3(x)": 138,
141
+ "BBg4f5": 139,
142
+ "BBg4h5": 140,
143
+ "BBg5f6": 141,
144
+ "BBg6d3(x)": 142,
145
+ "BBg6e4(x)": 143,
146
+ "BBg6h7": 144,
147
+ "BBg7b2(x)": 145,
148
+ "BBg7c3(x)": 146,
149
+ "BBg7d4(x)": 147,
150
+ "BBg7e5": 148,
151
+ "BBg7e5(x)": 149,
152
+ "BBg7f6": 150,
153
+ "BBg7f6(x)": 151,
154
+ "BBg7f8": 152,
155
+ "BBg7h6": 153,
156
+ "BBg7h6(x)": 154,
157
+ "BBh3g2(x)": 155,
158
+ "BBh5f3(x)": 156,
159
+ "BBh5g6": 157,
160
+ "BBh6g7": 158,
161
+ "BKb6a5": 159,
162
+ "BKb6b5": 160,
163
+ "BKb6c5": 161,
164
+ "BKb6c6": 162,
165
+ "BKb6c7": 163,
166
+ "BKb7a6": 164,
167
+ "BKb7b6": 165,
168
+ "BKb7c6": 166,
169
+ "BKb7c7": 167,
170
+ "BKb8a7": 168,
171
+ "BKb8a8": 169,
172
+ "BKb8b7": 170,
173
+ "BKb8c7": 171,
174
+ "BKb8c8": 172,
175
+ "BKc5b4": 173,
176
+ "BKc5c4": 174,
177
+ "BKc5d4": 175,
178
+ "BKc5d6": 176,
179
+ "BKc6b5": 177,
180
+ "BKc6b6": 178,
181
+ "BKc6b7": 179,
182
+ "BKc6c5": 180,
183
+ "BKc6c7": 181,
184
+ "BKc6d5": 182,
185
+ "BKc6d6": 183,
186
+ "BKc6d7": 184,
187
+ "BKc7b6": 185,
188
+ "BKc7b7": 186,
189
+ "BKc7b8": 187,
190
+ "BKc7c6": 188,
191
+ "BKc7c8": 189,
192
+ "BKc7d6": 190,
193
+ "BKc7d7": 191,
194
+ "BKc7d8": 192,
195
+ "BKc8b7": 193,
196
+ "BKc8b8": 194,
197
+ "BKc8c7": 195,
198
+ "BKc8d7": 196,
199
+ "BKc8d8": 197,
200
+ "BKd4c3": 198,
201
+ "BKd5c4": 199,
202
+ "BKd5c5": 200,
203
+ "BKd5c6": 201,
204
+ "BKd5d4": 202,
205
+ "BKd5e4": 203,
206
+ "BKd5e6": 204,
207
+ "BKd6c5": 205,
208
+ "BKd6c6": 206,
209
+ "BKd6c7": 207,
210
+ "BKd6d5": 208,
211
+ "BKd6d7": 209,
212
+ "BKd6e5": 210,
213
+ "BKd6e6": 211,
214
+ "BKd6e7": 212,
215
+ "BKd7c6": 213,
216
+ "BKd7c7": 214,
217
+ "BKd7c8": 215,
218
+ "BKd7d6": 216,
219
+ "BKd7d8": 217,
220
+ "BKd7e6": 218,
221
+ "BKd7e7": 219,
222
+ "BKd7e8": 220,
223
+ "BKd8c7": 221,
224
+ "BKd8c8": 222,
225
+ "BKd8d7": 223,
226
+ "BKd8e7": 224,
227
+ "BKd8e8": 225,
228
+ "BKe4d3": 226,
229
+ "BKe5d4": 227,
230
+ "BKe5d5": 228,
231
+ "BKe5d6": 229,
232
+ "BKe5e4": 230,
233
+ "BKe5f4": 231,
234
+ "BKe5f5": 232,
235
+ "BKe5f6": 233,
236
+ "BKe6d5": 234,
237
+ "BKe6d6": 235,
238
+ "BKe6d7": 236,
239
+ "BKe6e5": 237,
240
+ "BKe6e7": 238,
241
+ "BKe6f5": 239,
242
+ "BKe6f6": 240,
243
+ "BKe6f7": 241,
244
+ "BKe7d6": 242,
245
+ "BKe7d7": 243,
246
+ "BKe7d8": 244,
247
+ "BKe7e6": 245,
248
+ "BKe7e8": 246,
249
+ "BKe7f6": 247,
250
+ "BKe7f7": 248,
251
+ "BKe7f8": 249,
252
+ "BKe8c8(O)": 250,
253
+ "BKe8d7": 251,
254
+ "BKe8d7(x)": 252,
255
+ "BKe8d8": 253,
256
+ "BKe8d8(x)": 254,
257
+ "BKe8e7": 255,
258
+ "BKe8e7(x)": 256,
259
+ "BKe8f7": 257,
260
+ "BKe8f7(x)": 258,
261
+ "BKe8f8": 259,
262
+ "BKe8g8(o)": 260,
263
+ "BKf4f3": 261,
264
+ "BKf5e4": 262,
265
+ "BKf5e5": 263,
266
+ "BKf5e6": 264,
267
+ "BKf5f4": 265,
268
+ "BKf5f6": 266,
269
+ "BKf5g4": 267,
270
+ "BKf5g5": 268,
271
+ "BKf5g6": 269,
272
+ "BKf6e5": 270,
273
+ "BKf6e6": 271,
274
+ "BKf6e7": 272,
275
+ "BKf6f5": 273,
276
+ "BKf6f7": 274,
277
+ "BKf6g5": 275,
278
+ "BKf6g6": 276,
279
+ "BKf6g7": 277,
280
+ "BKf7e6": 278,
281
+ "BKf7e7": 279,
282
+ "BKf7e8": 280,
283
+ "BKf7f6": 281,
284
+ "BKf7f8": 282,
285
+ "BKf7g6": 283,
286
+ "BKf7g7": 284,
287
+ "BKf7g8": 285,
288
+ "BKf8e7": 286,
289
+ "BKf8e8": 287,
290
+ "BKf8f7": 288,
291
+ "BKf8g7": 289,
292
+ "BKf8g8": 290,
293
+ "BKg5f4": 291,
294
+ "BKg5f5": 292,
295
+ "BKg5f6": 293,
296
+ "BKg5g4": 294,
297
+ "BKg5h4": 295,
298
+ "BKg6f5": 296,
299
+ "BKg6f6": 297,
300
+ "BKg6f7": 298,
301
+ "BKg6g5": 299,
302
+ "BKg6g7": 300,
303
+ "BKg6h5": 301,
304
+ "BKg6h6": 302,
305
+ "BKg6h7": 303,
306
+ "BKg7f6": 304,
307
+ "BKg7f7": 305,
308
+ "BKg7f8": 306,
309
+ "BKg7g6": 307,
310
+ "BKg7g8": 308,
311
+ "BKg7h6": 309,
312
+ "BKg7h7": 310,
313
+ "BKg7h8": 311,
314
+ "BKg8f7": 312,
315
+ "BKg8f7(x)": 313,
316
+ "BKg8f8": 314,
317
+ "BKg8f8(x)": 315,
318
+ "BKg8g7": 316,
319
+ "BKg8g7(x)": 317,
320
+ "BKg8h7": 318,
321
+ "BKg8h7(x)": 319,
322
+ "BKg8h8": 320,
323
+ "BKh5g4": 321,
324
+ "BKh5g6": 322,
325
+ "BKh5h4": 323,
326
+ "BKh6g5": 324,
327
+ "BKh6g6": 325,
328
+ "BKh6g7": 326,
329
+ "BKh6h5": 327,
330
+ "BKh6h7": 328,
331
+ "BKh7g6": 329,
332
+ "BKh7g7": 330,
333
+ "BKh7g8": 331,
334
+ "BKh7h6": 332,
335
+ "BKh7h8": 333,
336
+ "BKh8g7": 334,
337
+ "BKh8g8": 335,
338
+ "BKh8h7": 336,
339
+ "BNa5b3(x)": 337,
340
+ "BNa5c4": 338,
341
+ "BNa5c4(x)": 339,
342
+ "BNa5c6": 340,
343
+ "BNa6b4": 341,
344
+ "BNa6c5": 342,
345
+ "BNa6c7": 343,
346
+ "BNb4a6": 344,
347
+ "BNb4c2": 345,
348
+ "BNb4c2(x)": 346,
349
+ "BNb4c6": 347,
350
+ "BNb4d3": 348,
351
+ "BNb4d3(x)": 349,
352
+ "BNb4d5": 350,
353
+ "BNb6c4": 351,
354
+ "BNb6c4(x)": 352,
355
+ "BNb6d5": 353,
356
+ "BNb6d5(x)": 354,
357
+ "BNb6d7": 355,
358
+ "BNb8a6": 356,
359
+ "BNb8c6": 357,
360
+ "BNb8c6(x)": 358,
361
+ "BNb8d7": 359,
362
+ "BNb8d7(x)": 360,
363
+ "BNc2a1(x)": 361,
364
+ "BNc4b2(x)": 362,
365
+ "BNc4d6": 363,
366
+ "BNc4e5": 364,
367
+ "BNc5d3": 365,
368
+ "BNc5d3(x)": 366,
369
+ "BNc5d7": 367,
370
+ "BNc5e4": 368,
371
+ "BNc5e4(x)": 369,
372
+ "BNc5e6": 370,
373
+ "BNc6a5": 371,
374
+ "BNc6a7": 372,
375
+ "BNc6b4": 373,
376
+ "BNc6b4(x)": 374,
377
+ "BNc6b8": 375,
378
+ "BNc6d4": 376,
379
+ "BNc6d4(x)": 377,
380
+ "BNc6d8": 378,
381
+ "BNc6d8(x)": 379,
382
+ "BNc6e5": 380,
383
+ "BNc6e5(x)": 381,
384
+ "BNc6e7": 382,
385
+ "BNc6e7(x)": 383,
386
+ "BNd3b2(x)": 384,
387
+ "BNd4b3(x)": 385,
388
+ "BNd4c2(x)": 386,
389
+ "BNd4c6": 387,
390
+ "BNd4e2(+)": 388,
391
+ "BNd4e2(x+)": 389,
392
+ "BNd4e6": 390,
393
+ "BNd4f3(+)": 391,
394
+ "BNd4f3(x+)": 392,
395
+ "BNd4f5": 393,
396
+ "BNd5b4": 394,
397
+ "BNd5b6": 395,
398
+ "BNd5c3": 396,
399
+ "BNd5c3(x)": 397,
400
+ "BNd5e3": 398,
401
+ "BNd5e3(x)": 399,
402
+ "BNd5e7": 400,
403
+ "BNd5f4": 401,
404
+ "BNd5f4(x)": 402,
405
+ "BNd5f6": 403,
406
+ "BNd6e4": 404,
407
+ "BNd6f5": 405,
408
+ "BNd7b6": 406,
409
+ "BNd7b8": 407,
410
+ "BNd7c5": 408,
411
+ "BNd7c5(x)": 409,
412
+ "BNd7e5": 410,
413
+ "BNd7e5(x)": 411,
414
+ "BNd7f6": 412,
415
+ "BNd7f6(x)": 413,
416
+ "BNd7f8": 414,
417
+ "BNe3f1(x)": 415,
418
+ "BNe4c3": 416,
419
+ "BNe4c3(x)": 417,
420
+ "BNe4c5": 418,
421
+ "BNe4d2": 419,
422
+ "BNe4d2(x)": 420,
423
+ "BNe4d6": 421,
424
+ "BNe4f2(x)": 422,
425
+ "BNe4f6": 423,
426
+ "BNe4g3(x)": 424,
427
+ "BNe4g5": 425,
428
+ "BNe4g5(x)": 426,
429
+ "BNe5c4": 427,
430
+ "BNe5c4(x)": 428,
431
+ "BNe5c6": 429,
432
+ "BNe5d3": 430,
433
+ "BNe5d3(x)": 431,
434
+ "BNe5d7": 432,
435
+ "BNe5f3(+)": 433,
436
+ "BNe5f3(x+)": 434,
437
+ "BNe5g4": 435,
438
+ "BNe5g6": 436,
439
+ "BNe6d4": 437,
440
+ "BNe6f4": 438,
441
+ "BNe7c6": 439,
442
+ "BNe7c6(x)": 440,
443
+ "BNe7c8": 441,
444
+ "BNe7d5": 442,
445
+ "BNe7d5(x)": 443,
446
+ "BNe7f5": 444,
447
+ "BNe7f5(x)": 445,
448
+ "BNe7g6": 446,
449
+ "BNe8d6": 447,
450
+ "BNe8f6": 448,
451
+ "BNf4e2(+)": 449,
452
+ "BNf5d4": 450,
453
+ "BNf5d4(x)": 451,
454
+ "BNf5d6": 452,
455
+ "BNf5e3": 453,
456
+ "BNf5e3(x)": 454,
457
+ "BNf5e7": 455,
458
+ "BNf5h4": 456,
459
+ "BNf6d5": 457,
460
+ "BNf6d5(x)": 458,
461
+ "BNf6d7": 459,
462
+ "BNf6d7(x)": 460,
463
+ "BNf6e4": 461,
464
+ "BNf6e4(x)": 462,
465
+ "BNf6e8": 463,
466
+ "BNf6g4": 464,
467
+ "BNf6g4(x)": 465,
468
+ "BNf6g8": 466,
469
+ "BNf6h5": 467,
470
+ "BNf6h5(x)": 468,
471
+ "BNf6h7": 469,
472
+ "BNf8e6": 470,
473
+ "BNf8g6": 471,
474
+ "BNg4e3": 472,
475
+ "BNg4e3(x)": 473,
476
+ "BNg4e5": 474,
477
+ "BNg4e5(x)": 475,
478
+ "BNg4f2(x)": 476,
479
+ "BNg4f6": 477,
480
+ "BNg4h6": 478,
481
+ "BNg6e5": 479,
482
+ "BNg6e5(x)": 480,
483
+ "BNg6e7": 481,
484
+ "BNg6f4": 482,
485
+ "BNg6f4(x)": 483,
486
+ "BNg6h4": 484,
487
+ "BNg8e7": 485,
488
+ "BNg8f6": 486,
489
+ "BNg8f6(x)": 487,
490
+ "BNg8h6": 488,
491
+ "BNh5f4": 489,
492
+ "BNh5f4(x)": 490,
493
+ "BNh5f6": 491,
494
+ "BNh5g3(x)": 492,
495
+ "BNh6f5": 493,
496
+ "BNh6f7": 494,
497
+ "BNh6g4": 495,
498
+ "BNh7f6": 496,
499
+ "BNh7g5": 497,
500
+ "BPa2a1(Q)": 498,
501
+ "BPa3a2": 499,
502
+ "BPa4a3": 500,
503
+ "BPa4b3(x)": 501,
504
+ "BPa5a4": 502,
505
+ "BPa5b4(x)": 503,
506
+ "BPa6a5": 504,
507
+ "BPa6b5(x)": 505,
508
+ "BPa7a5": 506,
509
+ "BPa7a6": 507,
510
+ "BPa7b6(x)": 508,
511
+ "BPb2b1(Q)": 509,
512
+ "BPb3b2": 510,
513
+ "BPb4a3(x)": 511,
514
+ "BPb4b3": 512,
515
+ "BPb4c3(x)": 513,
516
+ "BPb5a4(x)": 514,
517
+ "BPb5b4": 515,
518
+ "BPb5c4(x)": 516,
519
+ "BPb6a5(x)": 517,
520
+ "BPb6b5": 518,
521
+ "BPb6c5(x)": 519,
522
+ "BPb7a6(x)": 520,
523
+ "BPb7b5": 521,
524
+ "BPb7b6": 522,
525
+ "BPb7c6(x)": 523,
526
+ "BPc2c1(Q)": 524,
527
+ "BPc3c2": 525,
528
+ "BPc4b3(x)": 526,
529
+ "BPc4c3": 527,
530
+ "BPc4d3(x)": 528,
531
+ "BPc5b4(x)": 529,
532
+ "BPc5c4": 530,
533
+ "BPc5d4(x)": 531,
534
+ "BPc6b5(x)": 532,
535
+ "BPc6c5": 533,
536
+ "BPc6d5(x)": 534,
537
+ "BPc7b6(x)": 535,
538
+ "BPc7c5": 536,
539
+ "BPc7c6": 537,
540
+ "BPc7d6(x)": 538,
541
+ "BPd3d2": 539,
542
+ "BPd4c3(x)": 540,
543
+ "BPd4d3": 541,
544
+ "BPd4e3(x)": 542,
545
+ "BPd5c4(x)": 543,
546
+ "BPd5d4": 544,
547
+ "BPd5e4(x)": 545,
548
+ "BPd6c5(x)": 546,
549
+ "BPd6d5": 547,
550
+ "BPd6e5(x)": 548,
551
+ "BPd7c6(x)": 549,
552
+ "BPd7d5": 550,
553
+ "BPd7d6": 551,
554
+ "BPe3e2": 552,
555
+ "BPe4d3(x)": 553,
556
+ "BPe4e3": 554,
557
+ "BPe4f3(x)": 555,
558
+ "BPe5d4(x)": 556,
559
+ "BPe5e4": 557,
560
+ "BPe5f4(x)": 558,
561
+ "BPe6d5(x)": 559,
562
+ "BPe6e5": 560,
563
+ "BPe6f5(x)": 561,
564
+ "BPe7d6(x)": 562,
565
+ "BPe7e5": 563,
566
+ "BPe7e6": 564,
567
+ "BPe7f6(x)": 565,
568
+ "BPf3f2": 566,
569
+ "BPf4e3(x)": 567,
570
+ "BPf4f3": 568,
571
+ "BPf4g3(x)": 569,
572
+ "BPf5e4(x)": 570,
573
+ "BPf5f4": 571,
574
+ "BPf5g4(x)": 572,
575
+ "BPf6e5(x)": 573,
576
+ "BPf6f5": 574,
577
+ "BPf6g5(x)": 575,
578
+ "BPf7e6(x)": 576,
579
+ "BPf7f5": 577,
580
+ "BPf7f6": 578,
581
+ "BPf7g6(x)": 579,
582
+ "BPg2g1(Q)": 580,
583
+ "BPg3g2": 581,
584
+ "BPg4f3(x)": 582,
585
+ "BPg4g3": 583,
586
+ "BPg4h3(x)": 584,
587
+ "BPg5f4(x)": 585,
588
+ "BPg5g4": 586,
589
+ "BPg5h4(x)": 587,
590
+ "BPg6f5(x)": 588,
591
+ "BPg6g5": 589,
592
+ "BPg6h5(x)": 590,
593
+ "BPg7f6(x)": 591,
594
+ "BPg7g5": 592,
595
+ "BPg7g6": 593,
596
+ "BPg7h6(x)": 594,
597
+ "BPh2h1(Q)": 595,
598
+ "BPh3h2": 596,
599
+ "BPh4g3(x)": 597,
600
+ "BPh4h3": 598,
601
+ "BPh5g4(x)": 599,
602
+ "BPh5h4": 600,
603
+ "BPh6g5(x)": 601,
604
+ "BPh6h5": 602,
605
+ "BPh7g6(x)": 603,
606
+ "BPh7h5": 604,
607
+ "BPh7h6": 605,
608
+ "BQa5b6": 606,
609
+ "BQa5c7": 607,
610
+ "BQa5d8": 608,
611
+ "BQb2a2(x)": 609,
612
+ "BQb4b2(x)": 610,
613
+ "BQb6a5": 611,
614
+ "BQb6b2(x)": 612,
615
+ "BQb6c6": 613,
616
+ "BQb6c7": 614,
617
+ "BQb6d4(x)": 615,
618
+ "BQb6d8": 616,
619
+ "BQc7a5": 617,
620
+ "BQc7b6": 618,
621
+ "BQc7b7": 619,
622
+ "BQc7c6": 620,
623
+ "BQc7d6": 621,
624
+ "BQc7d6(x)": 622,
625
+ "BQc7d7": 623,
626
+ "BQc7d8": 624,
627
+ "BQc7e5(x)": 625,
628
+ "BQc7e7": 626,
629
+ "BQd5a5": 627,
630
+ "BQd5d6": 628,
631
+ "BQd5d8": 629,
632
+ "BQd6c6": 630,
633
+ "BQd6c7": 631,
634
+ "BQd6d7": 632,
635
+ "BQd6e6": 633,
636
+ "BQd6e7": 634,
637
+ "BQd7c6": 635,
638
+ "BQd7c7": 636,
639
+ "BQd7d6": 637,
640
+ "BQd7e6": 638,
641
+ "BQd7e7": 639,
642
+ "BQd7f5": 640,
643
+ "BQd7g4": 641,
644
+ "BQd8a5": 642,
645
+ "BQd8a5(+)": 643,
646
+ "BQd8a8(x)": 644,
647
+ "BQd8b6": 645,
648
+ "BQd8b6(+)": 646,
649
+ "BQd8b8": 647,
650
+ "BQd8c7": 648,
651
+ "BQd8c8": 649,
652
+ "BQd8d1(x)": 650,
653
+ "BQd8d1(x+)": 651,
654
+ "BQd8d4": 652,
655
+ "BQd8d4(x)": 653,
656
+ "BQd8d5": 654,
657
+ "BQd8d5(x)": 655,
658
+ "BQd8d6": 656,
659
+ "BQd8d6(x)": 657,
660
+ "BQd8d7": 658,
661
+ "BQd8d7(x)": 659,
662
+ "BQd8e7": 660,
663
+ "BQd8e7(+)": 661,
664
+ "BQd8e7(x)": 662,
665
+ "BQd8e8": 663,
666
+ "BQd8f6": 664,
667
+ "BQd8f6(x)": 665,
668
+ "BQd8f8": 666,
669
+ "BQd8g5": 667,
670
+ "BQd8g5(x)": 668,
671
+ "BQd8h4": 669,
672
+ "BQd8h4(+)": 670,
673
+ "BQd8h4(x)": 671,
674
+ "BQe7c5": 672,
675
+ "BQe7c7": 673,
676
+ "BQe7d6": 674,
677
+ "BQe7d7": 675,
678
+ "BQe7d8": 676,
679
+ "BQe7e5": 677,
680
+ "BQe7e5(x)": 678,
681
+ "BQe7e6": 679,
682
+ "BQe7e6(x)": 680,
683
+ "BQe7f6": 681,
684
+ "BQe7f6(x)": 682,
685
+ "BQe7f7": 683,
686
+ "BQe7g5": 684,
687
+ "BQe7h4": 685,
688
+ "BQf6d8": 686,
689
+ "BQf6e5(x)": 687,
690
+ "BQf6e6": 688,
691
+ "BQf6e7": 689,
692
+ "BQf6f3(x)": 690,
693
+ "BQf6f5": 691,
694
+ "BQf6g5": 692,
695
+ "BQf6g6": 693,
696
+ "BQg5f6": 694,
697
+ "BQg5g6": 695,
698
+ "BQg6f6": 696,
699
+ "BRa2a1(+)": 697,
700
+ "BRa2b2": 698,
701
+ "BRa8a1(x)": 699,
702
+ "BRa8a2(x)": 700,
703
+ "BRa8a6": 701,
704
+ "BRa8a6(x)": 702,
705
+ "BRa8a7": 703,
706
+ "BRa8b8": 704,
707
+ "BRa8c8": 705,
708
+ "BRa8c8(x)": 706,
709
+ "BRa8d8": 707,
710
+ "BRa8d8(x)": 708,
711
+ "BRa8e8": 709,
712
+ "BRa8e8(x)": 710,
713
+ "BRa8f8": 711,
714
+ "BRa8f8(x)": 712,
715
+ "BRa8g8": 713,
716
+ "BRa8h8": 714,
717
+ "BRb2a2(x)": 715,
718
+ "BRb8a8": 716,
719
+ "BRb8b2": 717,
720
+ "BRb8b2(x)": 718,
721
+ "BRb8b6": 719,
722
+ "BRb8b7": 720,
723
+ "BRb8b7(x)": 721,
724
+ "BRb8c8": 722,
725
+ "BRb8d8": 723,
726
+ "BRb8e8": 724,
727
+ "BRb8f8": 725,
728
+ "BRc2a2(x)": 726,
729
+ "BRc2b2(x)": 727,
730
+ "BRc8a8": 728,
731
+ "BRc8b8": 729,
732
+ "BRc8c1(x)": 730,
733
+ "BRc8c1(x+)": 731,
734
+ "BRc8c2": 732,
735
+ "BRc8c2(x)": 733,
736
+ "BRc8c3": 734,
737
+ "BRc8c3(x)": 735,
738
+ "BRc8c4": 736,
739
+ "BRc8c4(x)": 737,
740
+ "BRc8c5": 738,
741
+ "BRc8c5(x)": 739,
742
+ "BRc8c6": 740,
743
+ "BRc8c6(x)": 741,
744
+ "BRc8c7": 742,
745
+ "BRc8c7(x)": 743,
746
+ "BRc8d8": 744,
747
+ "BRc8e8": 745,
748
+ "BRc8f8": 746,
749
+ "BRd7c7": 747,
750
+ "BRd7e7": 748,
751
+ "BRd8a8": 749,
752
+ "BRd8b8": 750,
753
+ "BRd8c8": 751,
754
+ "BRd8d1(+)": 752,
755
+ "BRd8d1(x)": 753,
756
+ "BRd8d1(x+)": 754,
757
+ "BRd8d2": 755,
758
+ "BRd8d2(x)": 756,
759
+ "BRd8d3": 757,
760
+ "BRd8d3(x)": 758,
761
+ "BRd8d4": 759,
762
+ "BRd8d4(x)": 760,
763
+ "BRd8d5": 761,
764
+ "BRd8d5(x)": 762,
765
+ "BRd8d6": 763,
766
+ "BRd8d6(x)": 764,
767
+ "BRd8d7": 765,
768
+ "BRd8d7(x)": 766,
769
+ "BRd8e8": 767,
770
+ "BRd8f8": 768,
771
+ "BRd8g8": 769,
772
+ "BRd8h8": 770,
773
+ "BRe7d7": 771,
774
+ "BRe8b8": 772,
775
+ "BRe8c8": 773,
776
+ "BRe8d8": 774,
777
+ "BRe8d8(x)": 775,
778
+ "BRe8e1(+)": 776,
779
+ "BRe8e1(x)": 777,
780
+ "BRe8e1(x+)": 778,
781
+ "BRe8e2": 779,
782
+ "BRe8e2(x)": 780,
783
+ "BRe8e3": 781,
784
+ "BRe8e3(x)": 782,
785
+ "BRe8e4": 783,
786
+ "BRe8e4(x)": 784,
787
+ "BRe8e5": 785,
788
+ "BRe8e5(x)": 786,
789
+ "BRe8e6": 787,
790
+ "BRe8e6(x)": 788,
791
+ "BRe8e7": 789,
792
+ "BRe8e7(x)": 790,
793
+ "BRe8f8": 791,
794
+ "BRe8g8": 792,
795
+ "BRf6g6": 793,
796
+ "BRf7e7": 794,
797
+ "BRf7f8": 795,
798
+ "BRf8a8": 796,
799
+ "BRf8a8(x)": 797,
800
+ "BRf8b8": 798,
801
+ "BRf8c8": 799,
802
+ "BRf8c8(x)": 800,
803
+ "BRf8d8": 801,
804
+ "BRf8d8(x)": 802,
805
+ "BRf8e8": 803,
806
+ "BRf8e8(+)": 804,
807
+ "BRf8e8(x)": 805,
808
+ "BRf8f1(x+)": 806,
809
+ "BRf8f2(x)": 807,
810
+ "BRf8f3(x)": 808,
811
+ "BRf8f4": 809,
812
+ "BRf8f4(x)": 810,
813
+ "BRf8f5": 811,
814
+ "BRf8f5(x)": 812,
815
+ "BRf8f6": 813,
816
+ "BRf8f6(x)": 814,
817
+ "BRf8f7": 815,
818
+ "BRf8f7(x)": 816,
819
+ "BRf8g8": 817,
820
+ "BRf8h8": 818,
821
+ "BRg8f8": 819,
822
+ "BRg8g6": 820,
823
+ "BRg8g7": 821,
824
+ "BRg8h8": 822,
825
+ "BRh8c8": 823,
826
+ "BRh8d8": 824,
827
+ "BRh8e8": 825,
828
+ "BRh8f8": 826,
829
+ "BRh8g8": 827,
830
+ "BRh8h6": 828,
831
+ "BRh8h7": 829,
832
+ "WBa3b2": 830,
833
+ "WBa4b3": 831,
834
+ "WBa4c2": 832,
835
+ "WBb2a3": 833,
836
+ "WBb2c1": 834,
837
+ "WBb2c3": 835,
838
+ "WBb2c3(x)": 836,
839
+ "WBb2d4": 837,
840
+ "WBb2d4(x)": 838,
841
+ "WBb2e5(x)": 839,
842
+ "WBb2f6(x)": 840,
843
+ "WBb2g7(x)": 841,
844
+ "WBb3a2": 842,
845
+ "WBb3c2": 843,
846
+ "WBb3d5": 844,
847
+ "WBb3d5(x)": 845,
848
+ "WBb3e6(x)": 846,
849
+ "WBb5a4": 847,
850
+ "WBb5c4": 848,
851
+ "WBb5c6(x)": 849,
852
+ "WBb5c6(x+)": 850,
853
+ "WBb5d3": 851,
854
+ "WBb5d7(x)": 852,
855
+ "WBb5d7(x+)": 853,
856
+ "WBb5e2": 854,
857
+ "WBc1a3": 855,
858
+ "WBc1b2": 856,
859
+ "WBc1d2": 857,
860
+ "WBc1d2(x)": 858,
861
+ "WBc1e3": 859,
862
+ "WBc1e3(x)": 860,
863
+ "WBc1f4": 861,
864
+ "WBc1f4(x)": 862,
865
+ "WBc1g5": 863,
866
+ "WBc1g5(x)": 864,
867
+ "WBc1h6": 865,
868
+ "WBc1h6(x)": 866,
869
+ "WBc2b3": 867,
870
+ "WBc2e4(x)": 868,
871
+ "WBc3d2": 869,
872
+ "WBc4a2": 870,
873
+ "WBc4b3": 871,
874
+ "WBc4b5": 872,
875
+ "WBc4b5(+)": 873,
876
+ "WBc4d3": 874,
877
+ "WBc4d5": 875,
878
+ "WBc4d5(x)": 876,
879
+ "WBc4e2": 877,
880
+ "WBc4e6(x)": 878,
881
+ "WBc4f7(x)": 879,
882
+ "WBc4f7(x+)": 880,
883
+ "WBd2b4": 881,
884
+ "WBd2b4(x)": 882,
885
+ "WBd2c1": 883,
886
+ "WBd2c3": 884,
887
+ "WBd2c3(x)": 885,
888
+ "WBd2e1": 886,
889
+ "WBd2e3": 887,
890
+ "WBd2f4": 888,
891
+ "WBd2f4(x)": 889,
892
+ "WBd2g5": 890,
893
+ "WBd3a6(x)": 891,
894
+ "WBd3b1": 892,
895
+ "WBd3b5": 893,
896
+ "WBd3b5(x)": 894,
897
+ "WBd3c2": 895,
898
+ "WBd3c4": 896,
899
+ "WBd3c4(x)": 897,
900
+ "WBd3e2": 898,
901
+ "WBd3e4": 899,
902
+ "WBd3e4(x)": 900,
903
+ "WBd3f1": 901,
904
+ "WBd3f5": 902,
905
+ "WBd3f5(x)": 903,
906
+ "WBd3g6(x)": 904,
907
+ "WBd3h7(x+)": 905,
908
+ "WBd4e3": 906,
909
+ "WBd4f6(x)": 907,
910
+ "WBd5b3": 908,
911
+ "WBe2b5": 909,
912
+ "WBe2c4": 910,
913
+ "WBe2c4(x)": 911,
914
+ "WBe2d1": 912,
915
+ "WBe2d3": 913,
916
+ "WBe2f1": 914,
917
+ "WBe2f3": 915,
918
+ "WBe2f3(x)": 916,
919
+ "WBe2g4": 917,
920
+ "WBe2g4(x)": 918,
921
+ "WBe2h5": 919,
922
+ "WBe2h5(x)": 920,
923
+ "WBe3a7(x)": 921,
924
+ "WBe3b6(x)": 922,
925
+ "WBe3c1": 923,
926
+ "WBe3c5": 924,
927
+ "WBe3c5(x)": 925,
928
+ "WBe3d2": 926,
929
+ "WBe3d4": 927,
930
+ "WBe3d4(x)": 928,
931
+ "WBe3f2": 929,
932
+ "WBe3f4": 930,
933
+ "WBe3f4(x)": 931,
934
+ "WBe3g5": 932,
935
+ "WBe3g5(x)": 933,
936
+ "WBe3h6": 934,
937
+ "WBe3h6(x)": 935,
938
+ "WBe4d3": 936,
939
+ "WBe4f3": 937,
940
+ "WBe5f6(x)": 938,
941
+ "WBe5g3": 939,
942
+ "WBf1b5": 940,
943
+ "WBf1b5(+)": 941,
944
+ "WBf1c4": 942,
945
+ "WBf1c4(x)": 943,
946
+ "WBf1d3": 944,
947
+ "WBf1d3(x)": 945,
948
+ "WBf1e2": 946,
949
+ "WBf1g2": 947,
950
+ "WBf1h3": 948,
951
+ "WBf3b7(x)": 949,
952
+ "WBf3c6(x)": 950,
953
+ "WBf3d5(x)": 951,
954
+ "WBf3e2": 952,
955
+ "WBf3e4": 953,
956
+ "WBf3e4(x)": 954,
957
+ "WBf3g2": 955,
958
+ "WBf3g4": 956,
959
+ "WBf4c7(x)": 957,
960
+ "WBf4d2": 958,
961
+ "WBf4d6": 959,
962
+ "WBf4d6(x)": 960,
963
+ "WBf4e3": 961,
964
+ "WBf4e5": 962,
965
+ "WBf4e5(x)": 963,
966
+ "WBf4g3": 964,
967
+ "WBf4g5": 965,
968
+ "WBf4h2": 966,
969
+ "WBf4h6": 967,
970
+ "WBg2b7(x)": 968,
971
+ "WBg2c6(x)": 969,
972
+ "WBg2d5(x)": 970,
973
+ "WBg2e4": 971,
974
+ "WBg2e4(x)": 972,
975
+ "WBg2f1": 973,
976
+ "WBg2f3": 974,
977
+ "WBg2f3(x)": 975,
978
+ "WBg2h3": 976,
979
+ "WBg3d6(x)": 977,
980
+ "WBg3e5": 978,
981
+ "WBg3e5(x)": 979,
982
+ "WBg3f2": 980,
983
+ "WBg3h2": 981,
984
+ "WBg3h4": 982,
985
+ "WBg4f3": 983,
986
+ "WBg5d2": 984,
987
+ "WBg5d8(x)": 985,
988
+ "WBg5e3": 986,
989
+ "WBg5e7(x)": 987,
990
+ "WBg5f4": 988,
991
+ "WBg5f6": 989,
992
+ "WBg5f6(x)": 990,
993
+ "WBg5h4": 991,
994
+ "WBg5h6": 992,
995
+ "WBh4e7(x)": 993,
996
+ "WBh4f6(x)": 994,
997
+ "WBh4g3": 995,
998
+ "WBh6f8(x)": 996,
999
+ "WBh6g5": 997,
1000
+ "WBh6g7(x)": 998,
1001
+ "WKb1a1": 999,
1002
+ "WKb1a2": 1000,
1003
+ "WKb1b2": 1001,
1004
+ "WKb1c1": 1002,
1005
+ "WKb1c2": 1003,
1006
+ "WKb2a3": 1004,
1007
+ "WKb2b3": 1005,
1008
+ "WKb2c2": 1006,
1009
+ "WKb2c3": 1007,
1010
+ "WKb3a4": 1008,
1011
+ "WKb3c2": 1009,
1012
+ "WKb3c4": 1010,
1013
+ "WKc1b1": 1011,
1014
+ "WKc1b2": 1012,
1015
+ "WKc1c2": 1013,
1016
+ "WKc1d1": 1014,
1017
+ "WKc1d2": 1015,
1018
+ "WKc2b1": 1016,
1019
+ "WKc2b2": 1017,
1020
+ "WKc2b3": 1018,
1021
+ "WKc2c3": 1019,
1022
+ "WKc2d2": 1020,
1023
+ "WKc2d3": 1021,
1024
+ "WKc3b2": 1022,
1025
+ "WKc3b3": 1023,
1026
+ "WKc3b4": 1024,
1027
+ "WKc3c4": 1025,
1028
+ "WKc3d2": 1026,
1029
+ "WKc3d3": 1027,
1030
+ "WKc3d4": 1028,
1031
+ "WKc4b5": 1029,
1032
+ "WKc4c5": 1030,
1033
+ "WKc4d5": 1031,
1034
+ "WKd1c1": 1032,
1035
+ "WKd1c2": 1033,
1036
+ "WKd1d2": 1034,
1037
+ "WKd1e1": 1035,
1038
+ "WKd1e2": 1036,
1039
+ "WKd2c1": 1037,
1040
+ "WKd2c2": 1038,
1041
+ "WKd2c3": 1039,
1042
+ "WKd2d1": 1040,
1043
+ "WKd2d3": 1041,
1044
+ "WKd2e1": 1042,
1045
+ "WKd2e2": 1043,
1046
+ "WKd2e3": 1044,
1047
+ "WKd3c2": 1045,
1048
+ "WKd3c3": 1046,
1049
+ "WKd3c4": 1047,
1050
+ "WKd3d2": 1048,
1051
+ "WKd3d4": 1049,
1052
+ "WKd3e2": 1050,
1053
+ "WKd3e3": 1051,
1054
+ "WKd3e4": 1052,
1055
+ "WKd4c3": 1053,
1056
+ "WKd4c4": 1054,
1057
+ "WKd4c5": 1055,
1058
+ "WKd4d5": 1056,
1059
+ "WKd4e3": 1057,
1060
+ "WKd4e4": 1058,
1061
+ "WKd4e5": 1059,
1062
+ "WKd5c6": 1060,
1063
+ "WKe1c1(O)": 1061,
1064
+ "WKe1d1": 1062,
1065
+ "WKe1d1(x)": 1063,
1066
+ "WKe1d2": 1064,
1067
+ "WKe1d2(x)": 1065,
1068
+ "WKe1e2": 1066,
1069
+ "WKe1e2(x)": 1067,
1070
+ "WKe1f1": 1068,
1071
+ "WKe1f2": 1069,
1072
+ "WKe1f2(x)": 1070,
1073
+ "WKe1g1(o)": 1071,
1074
+ "WKe2d1": 1072,
1075
+ "WKe2d2": 1073,
1076
+ "WKe2d3": 1074,
1077
+ "WKe2e1": 1075,
1078
+ "WKe2e3": 1076,
1079
+ "WKe2f1": 1077,
1080
+ "WKe2f2": 1078,
1081
+ "WKe2f3": 1079,
1082
+ "WKe3d2": 1080,
1083
+ "WKe3d3": 1081,
1084
+ "WKe3d4": 1082,
1085
+ "WKe3e2": 1083,
1086
+ "WKe3e4": 1084,
1087
+ "WKe3f2": 1085,
1088
+ "WKe3f3": 1086,
1089
+ "WKe3f4": 1087,
1090
+ "WKe4d3": 1088,
1091
+ "WKe4d4": 1089,
1092
+ "WKe4d5": 1090,
1093
+ "WKe4e3": 1091,
1094
+ "WKe4e5": 1092,
1095
+ "WKe4f3": 1093,
1096
+ "WKe4f4": 1094,
1097
+ "WKe4f5": 1095,
1098
+ "WKe5d6": 1096,
1099
+ "WKe5f6": 1097,
1100
+ "WKf1e1": 1098,
1101
+ "WKf1e2": 1099,
1102
+ "WKf1f2": 1100,
1103
+ "WKf1g1": 1101,
1104
+ "WKf1g2": 1102,
1105
+ "WKf2e1": 1103,
1106
+ "WKf2e2": 1104,
1107
+ "WKf2e3": 1105,
1108
+ "WKf2f1": 1106,
1109
+ "WKf2f3": 1107,
1110
+ "WKf2g1": 1108,
1111
+ "WKf2g2": 1109,
1112
+ "WKf2g3": 1110,
1113
+ "WKf3e2": 1111,
1114
+ "WKf3e3": 1112,
1115
+ "WKf3e4": 1113,
1116
+ "WKf3f2": 1114,
1117
+ "WKf3f4": 1115,
1118
+ "WKf3g2": 1116,
1119
+ "WKf3g3": 1117,
1120
+ "WKf3g4": 1118,
1121
+ "WKf4e3": 1119,
1122
+ "WKf4e4": 1120,
1123
+ "WKf4e5": 1121,
1124
+ "WKf4f3": 1122,
1125
+ "WKf4f5": 1123,
1126
+ "WKf4g3": 1124,
1127
+ "WKf4g4": 1125,
1128
+ "WKf4g5": 1126,
1129
+ "WKg1f1": 1127,
1130
+ "WKg1f1(x)": 1128,
1131
+ "WKg1f2": 1129,
1132
+ "WKg1f2(x)": 1130,
1133
+ "WKg1g2": 1131,
1134
+ "WKg1g2(x)": 1132,
1135
+ "WKg1h1": 1133,
1136
+ "WKg1h2": 1134,
1137
+ "WKg1h2(x)": 1135,
1138
+ "WKg2f1": 1136,
1139
+ "WKg2f2": 1137,
1140
+ "WKg2f3": 1138,
1141
+ "WKg2g1": 1139,
1142
+ "WKg2g3": 1140,
1143
+ "WKg2h1": 1141,
1144
+ "WKg2h2": 1142,
1145
+ "WKg2h3": 1143,
1146
+ "WKg3f2": 1144,
1147
+ "WKg3f3": 1145,
1148
+ "WKg3f4": 1146,
1149
+ "WKg3g2": 1147,
1150
+ "WKg3g4": 1148,
1151
+ "WKg3h2": 1149,
1152
+ "WKg3h3": 1150,
1153
+ "WKg3h4": 1151,
1154
+ "WKg4f3": 1152,
1155
+ "WKg4f4": 1153,
1156
+ "WKg4f5": 1154,
1157
+ "WKg4g3": 1155,
1158
+ "WKg4g5": 1156,
1159
+ "WKg4h3": 1157,
1160
+ "WKg4h5": 1158,
1161
+ "WKg5f6": 1159,
1162
+ "WKh1g1": 1160,
1163
+ "WKh1g2": 1161,
1164
+ "WKh1h2": 1162,
1165
+ "WKh2g1": 1163,
1166
+ "WKh2g2": 1164,
1167
+ "WKh2g3": 1165,
1168
+ "WKh2h1": 1166,
1169
+ "WKh2h3": 1167,
1170
+ "WKh3g2": 1168,
1171
+ "WKh3g3": 1169,
1172
+ "WKh3g4": 1170,
1173
+ "WKh3h2": 1171,
1174
+ "WKh3h4": 1172,
1175
+ "WKh4g3": 1173,
1176
+ "WKh4g5": 1174,
1177
+ "WKh4h5": 1175,
1178
+ "WNa3b5": 1176,
1179
+ "WNa3c2": 1177,
1180
+ "WNa3c4": 1178,
1181
+ "WNa4c3": 1179,
1182
+ "WNa4c5": 1180,
1183
+ "WNa4c5(x)": 1181,
1184
+ "WNb1a3": 1182,
1185
+ "WNb1c3": 1183,
1186
+ "WNb1c3(x)": 1184,
1187
+ "WNb1d2": 1185,
1188
+ "WNb1d2(x)": 1186,
1189
+ "WNb3c5": 1187,
1190
+ "WNb3d2": 1188,
1191
+ "WNb3d4": 1189,
1192
+ "WNb5a3": 1190,
1193
+ "WNb5c3": 1191,
1194
+ "WNb5c7": 1192,
1195
+ "WNb5d4": 1193,
1196
+ "WNb5d6": 1194,
1197
+ "WNb5d6(+)": 1195,
1198
+ "WNb5d6(x)": 1196,
1199
+ "WNc2e3": 1197,
1200
+ "WNc3a2": 1198,
1201
+ "WNc3a4": 1199,
1202
+ "WNc3b1": 1200,
1203
+ "WNc3b5": 1201,
1204
+ "WNc3b5(x)": 1202,
1205
+ "WNc3d1": 1203,
1206
+ "WNc3d1(x)": 1204,
1207
+ "WNc3d5": 1205,
1208
+ "WNc3d5(x)": 1206,
1209
+ "WNc3e2": 1207,
1210
+ "WNc3e2(x)": 1208,
1211
+ "WNc3e4": 1209,
1212
+ "WNc3e4(x)": 1210,
1213
+ "WNc4d2": 1211,
1214
+ "WNc4d6": 1212,
1215
+ "WNc4e3": 1213,
1216
+ "WNc4e5": 1214,
1217
+ "WNc4e5(x)": 1215,
1218
+ "WNc5d3": 1216,
1219
+ "WNc7a8(x)": 1217,
1220
+ "WNd1e3": 1218,
1221
+ "WNd2b1": 1219,
1222
+ "WNd2b3": 1220,
1223
+ "WNd2c4": 1221,
1224
+ "WNd2c4(x)": 1222,
1225
+ "WNd2e4": 1223,
1226
+ "WNd2e4(x)": 1224,
1227
+ "WNd2f1": 1225,
1228
+ "WNd2f3": 1226,
1229
+ "WNd2f3(x)": 1227,
1230
+ "WNd3e5": 1228,
1231
+ "WNd3f4": 1229,
1232
+ "WNd4b3": 1230,
1233
+ "WNd4b5": 1231,
1234
+ "WNd4c6": 1232,
1235
+ "WNd4c6(x)": 1233,
1236
+ "WNd4e2": 1234,
1237
+ "WNd4e6": 1235,
1238
+ "WNd4e6(x)": 1236,
1239
+ "WNd4f3": 1237,
1240
+ "WNd4f5": 1238,
1241
+ "WNd4f5(x)": 1239,
1242
+ "WNd5c3": 1240,
1243
+ "WNd5c7(x)": 1241,
1244
+ "WNd5e3": 1242,
1245
+ "WNd5e7(+)": 1243,
1246
+ "WNd5e7(x)": 1244,
1247
+ "WNd5e7(x+)": 1245,
1248
+ "WNd5f4": 1246,
1249
+ "WNd5f6(+)": 1247,
1250
+ "WNd5f6(x+)": 1248,
1251
+ "WNd6b7(x)": 1249,
1252
+ "WNe1f3": 1250,
1253
+ "WNe2c3": 1251,
1254
+ "WNe2c3(x)": 1252,
1255
+ "WNe2d4": 1253,
1256
+ "WNe2d4(x)": 1254,
1257
+ "WNe2f4": 1255,
1258
+ "WNe2f4(x)": 1256,
1259
+ "WNe2g3": 1257,
1260
+ "WNe3c4": 1258,
1261
+ "WNe3d5": 1259,
1262
+ "WNe3f5": 1260,
1263
+ "WNe3g4": 1261,
1264
+ "WNe4c3": 1262,
1265
+ "WNe4c5": 1263,
1266
+ "WNe4c5(x)": 1264,
1267
+ "WNe4d2": 1265,
1268
+ "WNe4d6": 1266,
1269
+ "WNe4d6(+)": 1267,
1270
+ "WNe4d6(x)": 1268,
1271
+ "WNe4f6(+)": 1269,
1272
+ "WNe4f6(x+)": 1270,
1273
+ "WNe4g3": 1271,
1274
+ "WNe4g5": 1272,
1275
+ "WNe5c4": 1273,
1276
+ "WNe5c4(x)": 1274,
1277
+ "WNe5c6": 1275,
1278
+ "WNe5c6(x)": 1276,
1279
+ "WNe5d3": 1277,
1280
+ "WNe5d7": 1278,
1281
+ "WNe5d7(x)": 1279,
1282
+ "WNe5f3": 1280,
1283
+ "WNe5f7(x)": 1281,
1284
+ "WNe5g4": 1282,
1285
+ "WNe5g4(x)": 1283,
1286
+ "WNe5g6": 1284,
1287
+ "WNe5g6(x)": 1285,
1288
+ "WNe6f8(x)": 1286,
1289
+ "WNf1e3": 1287,
1290
+ "WNf1g3": 1288,
1291
+ "WNf3d2": 1289,
1292
+ "WNf3d2(x)": 1290,
1293
+ "WNf3d4": 1291,
1294
+ "WNf3d4(x)": 1292,
1295
+ "WNf3e1": 1293,
1296
+ "WNf3e5": 1294,
1297
+ "WNf3e5(+)": 1295,
1298
+ "WNf3e5(x)": 1296,
1299
+ "WNf3g1": 1297,
1300
+ "WNf3g5": 1298,
1301
+ "WNf3g5(+)": 1299,
1302
+ "WNf3g5(x)": 1300,
1303
+ "WNf3h2": 1301,
1304
+ "WNf3h4": 1302,
1305
+ "WNf3h4(x)": 1303,
1306
+ "WNf4d3": 1304,
1307
+ "WNf4d5": 1305,
1308
+ "WNf4d5(x)": 1306,
1309
+ "WNf4e6(x)": 1307,
1310
+ "WNf4h5": 1308,
1311
+ "WNf5e3": 1309,
1312
+ "WNf5e7(+)": 1310,
1313
+ "WNf7h8(x)": 1311,
1314
+ "WNg1e2": 1312,
1315
+ "WNg1f3": 1313,
1316
+ "WNg1f3(x)": 1314,
1317
+ "WNg1h3": 1315,
1318
+ "WNg3e2": 1316,
1319
+ "WNg3e4": 1317,
1320
+ "WNg3e4(x)": 1318,
1321
+ "WNg3f5": 1319,
1322
+ "WNg3f5(x)": 1320,
1323
+ "WNg3h5": 1321,
1324
+ "WNg4e3": 1322,
1325
+ "WNg4e5": 1323,
1326
+ "WNg5e4": 1324,
1327
+ "WNg5e4(x)": 1325,
1328
+ "WNg5e6": 1326,
1329
+ "WNg5e6(x)": 1327,
1330
+ "WNg5f3": 1328,
1331
+ "WNg5f7(x)": 1329,
1332
+ "WNg5h3": 1330,
1333
+ "WNh2f3": 1331,
1334
+ "WNh2g4": 1332,
1335
+ "WNh3f2": 1333,
1336
+ "WNh3f4": 1334,
1337
+ "WNh3g5": 1335,
1338
+ "WNh4f3": 1336,
1339
+ "WNh4f5": 1337,
1340
+ "WNh4f5(x)": 1338,
1341
+ "WNh4g6(x)": 1339,
1342
+ "WPa2a3": 1340,
1343
+ "WPa2a4": 1341,
1344
+ "WPa2b3(x)": 1342,
1345
+ "WPa3a4": 1343,
1346
+ "WPa3b4(x)": 1344,
1347
+ "WPa4a5": 1345,
1348
+ "WPa4b5(x)": 1346,
1349
+ "WPa5a6": 1347,
1350
+ "WPa5b6(x)": 1348,
1351
+ "WPa6a7": 1349,
1352
+ "WPa7a8(Q)": 1350,
1353
+ "WPb2a3(x)": 1351,
1354
+ "WPb2b3": 1352,
1355
+ "WPb2b4": 1353,
1356
+ "WPb2c3(x)": 1354,
1357
+ "WPb3a4(x)": 1355,
1358
+ "WPb3b4": 1356,
1359
+ "WPb3c4(x)": 1357,
1360
+ "WPb4a5(x)": 1358,
1361
+ "WPb4b5": 1359,
1362
+ "WPb4c5(x)": 1360,
1363
+ "WPb5a6(x)": 1361,
1364
+ "WPb5b6": 1362,
1365
+ "WPb5c6(x)": 1363,
1366
+ "WPb6b7": 1364,
1367
+ "WPb7b8(Q)": 1365,
1368
+ "WPc2b3(x)": 1366,
1369
+ "WPc2c3": 1367,
1370
+ "WPc2c4": 1368,
1371
+ "WPc2d3(x)": 1369,
1372
+ "WPc3b4(x)": 1370,
1373
+ "WPc3c4": 1371,
1374
+ "WPc3d4(x)": 1372,
1375
+ "WPc4b5(x)": 1373,
1376
+ "WPc4c5": 1374,
1377
+ "WPc4d5(x)": 1375,
1378
+ "WPc5b6(x)": 1376,
1379
+ "WPc5c6": 1377,
1380
+ "WPc5d6(x)": 1378,
1381
+ "WPc6c7": 1379,
1382
+ "WPc7c8(Q)": 1380,
1383
+ "WPd2c3(x)": 1381,
1384
+ "WPd2d3": 1382,
1385
+ "WPd2d4": 1383,
1386
+ "WPd3c4(x)": 1384,
1387
+ "WPd3d4": 1385,
1388
+ "WPd3e4(x)": 1386,
1389
+ "WPd4c5(x)": 1387,
1390
+ "WPd4d5": 1388,
1391
+ "WPd4e5(x)": 1389,
1392
+ "WPd5c6(x)": 1390,
1393
+ "WPd5d6": 1391,
1394
+ "WPd5e6(x)": 1392,
1395
+ "WPd6d7": 1393,
1396
+ "WPd7d8(Q)": 1394,
1397
+ "WPe2e3": 1395,
1398
+ "WPe2e4": 1396,
1399
+ "WPe3d4(x)": 1397,
1400
+ "WPe3e4": 1398,
1401
+ "WPe3f4(x)": 1399,
1402
+ "WPe4d5(x)": 1400,
1403
+ "WPe4e5": 1401,
1404
+ "WPe4f5(x)": 1402,
1405
+ "WPe5d6(x)": 1403,
1406
+ "WPe5e6": 1404,
1407
+ "WPe5f6(x)": 1405,
1408
+ "WPe5f6(xE)": 1406,
1409
+ "WPe6e7": 1407,
1410
+ "WPe6f7(x+)": 1408,
1411
+ "WPf2e3(x)": 1409,
1412
+ "WPf2f3": 1410,
1413
+ "WPf2f4": 1411,
1414
+ "WPf2g3(x)": 1412,
1415
+ "WPf3e4(x)": 1413,
1416
+ "WPf3f4": 1414,
1417
+ "WPf3g4(x)": 1415,
1418
+ "WPf4e5(x)": 1416,
1419
+ "WPf4f5": 1417,
1420
+ "WPf4g5(x)": 1418,
1421
+ "WPf5e6(x)": 1419,
1422
+ "WPf5f6": 1420,
1423
+ "WPf5g6(x)": 1421,
1424
+ "WPf6f7": 1422,
1425
+ "WPg2f3(x)": 1423,
1426
+ "WPg2g3": 1424,
1427
+ "WPg2g4": 1425,
1428
+ "WPg2h3(x)": 1426,
1429
+ "WPg3f4(x)": 1427,
1430
+ "WPg3g4": 1428,
1431
+ "WPg3h4(x)": 1429,
1432
+ "WPg4f5(x)": 1430,
1433
+ "WPg4g5": 1431,
1434
+ "WPg4h5(x)": 1432,
1435
+ "WPg5f6(x)": 1433,
1436
+ "WPg5g6": 1434,
1437
+ "WPg5h6(x)": 1435,
1438
+ "WPg6g7": 1436,
1439
+ "WPg7g8(Q)": 1437,
1440
+ "WPh2g3(x)": 1438,
1441
+ "WPh2h3": 1439,
1442
+ "WPh2h4": 1440,
1443
+ "WPh3g4(x)": 1441,
1444
+ "WPh3h4": 1442,
1445
+ "WPh4g5(x)": 1443,
1446
+ "WPh4h5": 1444,
1447
+ "WPh5g6(x)": 1445,
1448
+ "WPh5h6": 1446,
1449
+ "WPh6h7": 1447,
1450
+ "WPh7h8(Q)": 1448,
1451
+ "WQa4b3": 1449,
1452
+ "WQa4c2": 1450,
1453
+ "WQb3b7(x)": 1451,
1454
+ "WQb3c2": 1452,
1455
+ "WQb3d1": 1453,
1456
+ "WQc2b3": 1454,
1457
+ "WQc2c3": 1455,
1458
+ "WQc2d2": 1456,
1459
+ "WQc2d3": 1457,
1460
+ "WQc2e2": 1458,
1461
+ "WQc2e4(x)": 1459,
1462
+ "WQd1a1(x)": 1460,
1463
+ "WQd1a4": 1461,
1464
+ "WQd1a4(+)": 1462,
1465
+ "WQd1b3": 1463,
1466
+ "WQd1c1": 1464,
1467
+ "WQd1c2": 1465,
1468
+ "WQd1d2": 1466,
1469
+ "WQd1d2(x)": 1467,
1470
+ "WQd1d3": 1468,
1471
+ "WQd1d3(x)": 1469,
1472
+ "WQd1d4": 1470,
1473
+ "WQd1d4(x)": 1471,
1474
+ "WQd1d5": 1472,
1475
+ "WQd1d5(x)": 1473,
1476
+ "WQd1d6(x)": 1474,
1477
+ "WQd1d8(x)": 1475,
1478
+ "WQd1d8(x+)": 1476,
1479
+ "WQd1e1": 1477,
1480
+ "WQd1e2": 1478,
1481
+ "WQd1e2(+)": 1479,
1482
+ "WQd1e2(x)": 1480,
1483
+ "WQd1f3": 1481,
1484
+ "WQd1f3(x)": 1482,
1485
+ "WQd1g4": 1483,
1486
+ "WQd1g4(x)": 1484,
1487
+ "WQd1h5": 1485,
1488
+ "WQd1h5(+)": 1486,
1489
+ "WQd1h5(x)": 1487,
1490
+ "WQd2c2": 1488,
1491
+ "WQd2c3": 1489,
1492
+ "WQd2d3": 1490,
1493
+ "WQd2e2": 1491,
1494
+ "WQd2e3": 1492,
1495
+ "WQd2e3(x)": 1493,
1496
+ "WQd2f2": 1494,
1497
+ "WQd2f4": 1495,
1498
+ "WQd2f4(x)": 1496,
1499
+ "WQd2g5": 1497,
1500
+ "WQd2h6(x)": 1498,
1501
+ "WQd3c2": 1499,
1502
+ "WQd3d2": 1500,
1503
+ "WQd3e2": 1501,
1504
+ "WQd3e3": 1502,
1505
+ "WQd3e4(x)": 1503,
1506
+ "WQd3f3": 1504,
1507
+ "WQd3g3": 1505,
1508
+ "WQd4d1": 1506,
1509
+ "WQd4d3": 1507,
1510
+ "WQd4e3": 1508,
1511
+ "WQe2c2": 1509,
1512
+ "WQe2c4": 1510,
1513
+ "WQe2c4(x)": 1511,
1514
+ "WQe2d1": 1512,
1515
+ "WQe2d2": 1513,
1516
+ "WQe2d3": 1514,
1517
+ "WQe2e3": 1515,
1518
+ "WQe2e3(x)": 1516,
1519
+ "WQe2e4": 1517,
1520
+ "WQe2e4(x)": 1518,
1521
+ "WQe2f2": 1519,
1522
+ "WQe2f3": 1520,
1523
+ "WQe2f3(x)": 1521,
1524
+ "WQe2g4": 1522,
1525
+ "WQe2h5": 1523,
1526
+ "WQe3e2": 1524,
1527
+ "WQe3f3": 1525,
1528
+ "WQe3g3": 1526,
1529
+ "WQf3b7(x)": 1527,
1530
+ "WQf3d1": 1528,
1531
+ "WQf3d3": 1529,
1532
+ "WQf3d5(x)": 1530,
1533
+ "WQf3e2": 1531,
1534
+ "WQf3e3": 1532,
1535
+ "WQf3e4(x)": 1533,
1536
+ "WQf3f4": 1534,
1537
+ "WQf3f6(x)": 1535,
1538
+ "WQf3g3": 1536,
1539
+ "WQf3g4": 1537,
1540
+ "WQf3h3": 1538,
1541
+ "WQf3h5": 1539,
1542
+ "WQg3f3": 1540,
1543
+ "WQg3g4": 1541,
1544
+ "WQg3h4": 1542,
1545
+ "WQg4f3": 1543,
1546
+ "WQg4g3": 1544,
1547
+ "WQh5f3": 1545,
1548
+ "WRa1a2": 1546,
1549
+ "WRa1a3": 1547,
1550
+ "WRa1a6(x)": 1548,
1551
+ "WRa1a7(x)": 1549,
1552
+ "WRa1a8(x)": 1550,
1553
+ "WRa1b1": 1551,
1554
+ "WRa1c1": 1552,
1555
+ "WRa1c1(x)": 1553,
1556
+ "WRa1d1": 1554,
1557
+ "WRa1d1(x)": 1555,
1558
+ "WRa1e1": 1556,
1559
+ "WRa1e1(x)": 1557,
1560
+ "WRa1f1": 1558,
1561
+ "WRa1f1(x)": 1559,
1562
+ "WRa1g1": 1560,
1563
+ "WRa1h1": 1561,
1564
+ "WRb1a1": 1562,
1565
+ "WRb1b2": 1563,
1566
+ "WRb1b2(x)": 1564,
1567
+ "WRb1b3": 1565,
1568
+ "WRb1b7": 1566,
1569
+ "WRb1b7(x)": 1567,
1570
+ "WRb1c1": 1568,
1571
+ "WRb1d1": 1569,
1572
+ "WRb1e1": 1570,
1573
+ "WRb1f1": 1571,
1574
+ "WRb7a7(x)": 1572,
1575
+ "WRc1a1": 1573,
1576
+ "WRc1b1": 1574,
1577
+ "WRc1c2": 1575,
1578
+ "WRc1c2(x)": 1576,
1579
+ "WRc1c3": 1577,
1580
+ "WRc1c3(x)": 1578,
1581
+ "WRc1c4(x)": 1579,
1582
+ "WRc1c5": 1580,
1583
+ "WRc1c5(x)": 1581,
1584
+ "WRc1c6(x)": 1582,
1585
+ "WRc1c7": 1583,
1586
+ "WRc1c7(x)": 1584,
1587
+ "WRc1c8(x)": 1585,
1588
+ "WRc1d1": 1586,
1589
+ "WRc1e1": 1587,
1590
+ "WRc1f1": 1588,
1591
+ "WRc7b7(x)": 1589,
1592
+ "WRd1a1": 1590,
1593
+ "WRd1b1": 1591,
1594
+ "WRd1c1": 1592,
1595
+ "WRd1d2": 1593,
1596
+ "WRd1d2(x)": 1594,
1597
+ "WRd1d3": 1595,
1598
+ "WRd1d3(x)": 1596,
1599
+ "WRd1d4": 1597,
1600
+ "WRd1d4(x)": 1598,
1601
+ "WRd1d5": 1599,
1602
+ "WRd1d5(x)": 1600,
1603
+ "WRd1d6": 1601,
1604
+ "WRd1d6(x)": 1602,
1605
+ "WRd1d7": 1603,
1606
+ "WRd1d7(+)": 1604,
1607
+ "WRd1d7(x)": 1605,
1608
+ "WRd1d8(+)": 1606,
1609
+ "WRd1d8(x)": 1607,
1610
+ "WRd1d8(x+)": 1608,
1611
+ "WRd1e1": 1609,
1612
+ "WRd1e1(x)": 1610,
1613
+ "WRd1f1": 1611,
1614
+ "WRd1g1": 1612,
1615
+ "WRd1h1": 1613,
1616
+ "WRd2e2": 1614,
1617
+ "WRd7b7(x)": 1615,
1618
+ "WRe1a1": 1616,
1619
+ "WRe1b1": 1617,
1620
+ "WRe1c1": 1618,
1621
+ "WRe1d1": 1619,
1622
+ "WRe1d1(x)": 1620,
1623
+ "WRe1e2": 1621,
1624
+ "WRe1e2(x)": 1622,
1625
+ "WRe1e3": 1623,
1626
+ "WRe1e3(x)": 1624,
1627
+ "WRe1e4": 1625,
1628
+ "WRe1e4(x)": 1626,
1629
+ "WRe1e5": 1627,
1630
+ "WRe1e5(x)": 1628,
1631
+ "WRe1e6": 1629,
1632
+ "WRe1e6(x)": 1630,
1633
+ "WRe1e7": 1631,
1634
+ "WRe1e7(x)": 1632,
1635
+ "WRe1e8(+)": 1633,
1636
+ "WRe1e8(x)": 1634,
1637
+ "WRe1e8(x+)": 1635,
1638
+ "WRe1f1": 1636,
1639
+ "WRe1g1": 1637,
1640
+ "WRe1h1": 1638,
1641
+ "WRe2d2": 1639,
1642
+ "WRe2e3": 1640,
1643
+ "WRe3f3": 1641,
1644
+ "WRe3g3": 1642,
1645
+ "WRf1a1": 1643,
1646
+ "WRf1a1(x)": 1644,
1647
+ "WRf1b1": 1645,
1648
+ "WRf1c1": 1646,
1649
+ "WRf1c1(x)": 1647,
1650
+ "WRf1d1": 1648,
1651
+ "WRf1d1(x)": 1649,
1652
+ "WRf1e1": 1650,
1653
+ "WRf1e1(+)": 1651,
1654
+ "WRf1e1(x)": 1652,
1655
+ "WRf1f2": 1653,
1656
+ "WRf1f2(x)": 1654,
1657
+ "WRf1f3": 1655,
1658
+ "WRf1f3(x)": 1656,
1659
+ "WRf1f4": 1657,
1660
+ "WRf1f4(x)": 1658,
1661
+ "WRf1f5(x)": 1659,
1662
+ "WRf1f6(x)": 1660,
1663
+ "WRf1f7": 1661,
1664
+ "WRf1f7(x)": 1662,
1665
+ "WRf1f8(x+)": 1663,
1666
+ "WRf1g1": 1664,
1667
+ "WRf1h1": 1665,
1668
+ "WRf2e2": 1666,
1669
+ "WRf3g3": 1667,
1670
+ "WRf3h3": 1668,
1671
+ "WRg1e1": 1669,
1672
+ "WRg1f1": 1670,
1673
+ "WRg1g2": 1671,
1674
+ "WRg1g3": 1672,
1675
+ "WRg1h1": 1673,
1676
+ "WRh1c1": 1674,
1677
+ "WRh1d1": 1675,
1678
+ "WRh1e1": 1676,
1679
+ "WRh1f1": 1677,
1680
+ "WRh1g1": 1678,
1681
+ "WRh1h2": 1679,
1682
+ "WRh1h3": 1680,
1683
+ "WRh1h5(x)": 1681
1684
+ }
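The mapping above is the tail of a move-level vocabulary: each extended-UCI move string (side + piece + from-square + to-square, with `(x)`, `(+)`, `(Q)` annotations) gets a single token ID. An illustrative sketch of encoding and decoding with such a dict (the four entries are copied from the diff; a real tokenizer would load the complete vocab file):

```python
# Illustrative sketch, not the project's tokenizer: a move-level vocab
# maps one extended-UCI move string to one token ID.
vocab = {
    "WPe2e4": 1396,
    "BPe7e5": 563,
    "WNg1f3": 1313,
    "WQd1h5(+)": 1486,
}
id_to_move = {i: m for m, i in vocab.items()}

def encode(moves):
    # One token per move; raises KeyError for out-of-vocabulary moves.
    return [vocab[m] for m in moves]

def decode(ids):
    return [id_to_move[i] for i in ids]

game = ["WPe2e4", "BPe7e5", "WNg1f3"]
assert decode(encode(game)) == game
```

A move-level vocabulary keeps sequences short (one token per ply) at the cost of a larger, position-dependent vocabulary.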
pyproject.toml CHANGED
@@ -23,7 +23,7 @@ classifiers = [
23
  ]
24
  dependencies = [
25
  "torch>=2.0.0",
26
- "transformers>=4.40.0",
27
  "accelerate>=0.26.0",
28
  "datasets>=2.14.0",
29
  "python-chess>=1.999",
@@ -39,12 +39,8 @@ dev = [
39
  "black>=23.0.0",
40
  "ruff>=0.1.0",
41
  ]
42
- eval = [
43
- "stockfish>=3.28.0",
44
- ]
45
 
46
  [project.scripts]
47
- chess-train = "src.train:main"
48
  chess-eval = "src.evaluate:main"
49
 
50
  [tool.setuptools.packages.find]
 
23
  ]
24
  dependencies = [
25
  "torch>=2.0.0",
26
+ "transformers>=4.40.0,<5.0.0",
27
  "accelerate>=0.26.0",
28
  "datasets>=2.14.0",
29
  "python-chess>=1.999",
 
39
  "black>=23.0.0",
40
  "ruff>=0.1.0",
41
  ]
 
 
 
42
 
43
  [project.scripts]
 
44
  chess-eval = "src.evaluate:main"
45
 
46
  [tool.setuptools.packages.find]
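The dependency change above pins `transformers` below 5.0.0. As a rough illustration of what a `>=4.40.0,<5.0.0` specifier accepts (naive version-tuple comparison; real resolvers implement full PEP 440 semantics, e.g. via the `packaging` library):

```python
# Toy sketch of the ">=4.40.0,<5.0.0" bound added above.
# Assumes plain dotted numeric versions; no pre-release handling.
def parse(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version, lower="4.40.0", upper="5.0.0"):
    return parse(lower) <= parse(version) < parse(upper)

assert satisfies("4.40.0")      # lower bound is inclusive
assert not satisfies("5.0.0")   # upper bound is exclusive
assert not satisfies("4.39.3")
```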
src/__init__.py CHANGED
@@ -1,22 +1,20 @@
1
- """Chess Challenge source module."""
2
 
3
- from .model import ChessConfig, ChessForCausalLM
4
- from .tokenizer import ChessTokenizer
5
-
6
- # Lazy import for evaluate to avoid RuntimeWarning when running as module
7
  def __getattr__(name):
8
  if name == "ChessEvaluator":
9
  from .evaluate import ChessEvaluator
10
  return ChessEvaluator
11
- if name == "load_model_from_hub":
12
- from .evaluate import load_model_from_hub
13
- return load_model_from_hub
 
 
 
14
  raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
15
 
16
  __all__ = [
17
- "ChessConfig",
18
- "ChessForCausalLM",
19
- "ChessTokenizer",
20
  "ChessEvaluator",
21
- "load_model_from_hub",
 
22
  ]
 
1
+ """Chess Challenge evaluation module."""
2
 
3
+ # Lazy imports to avoid circular dependencies
 
 
 
4
  def __getattr__(name):
5
  if name == "ChessEvaluator":
6
  from .evaluate import ChessEvaluator
7
  return ChessEvaluator
8
+ if name == "load_model_and_tokenizer":
9
+ from .evaluate import load_model_and_tokenizer
10
+ return load_model_and_tokenizer
11
+ if name == "count_parameters":
12
+ from .evaluate import count_parameters
13
+ return count_parameters
14
  raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
15
 
16
  __all__ = [
 
 
 
17
  "ChessEvaluator",
18
+ "load_model_and_tokenizer",
19
+ "count_parameters",
20
  ]
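The `__getattr__` defined in `src/__init__.py` above is the PEP 562 lazy-import pattern: heavy submodules are only imported when an attribute is first accessed, which avoids circular imports and import-time side effects. A minimal self-contained sketch of the mechanism, built as an in-memory module (`lazydemo` and `answer` are made-up names):

```python
# Sketch of PEP 562 module-level __getattr__: attribute lookups that
# miss the module's __dict__ fall through to __getattr__.
import sys
import types

mod = types.ModuleType("lazydemo")

def _module_getattr(name):
    if name == "answer":
        # The real package does `from .evaluate import ChessEvaluator` here.
        return 42
    raise AttributeError(f"module 'lazydemo' has no attribute {name!r}")

mod.__getattr__ = _module_getattr
sys.modules["lazydemo"] = mod

import lazydemo
assert lazydemo.answer == 42  # triggers _module_getattr lazily
```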
src/__main__.py ADDED
@@ -0,0 +1,44 @@
1
+ """
2
+ CLI entry point for running evaluation as a module.
3
+
4
+ Usage:
5
+ python -m src --model ./my_model/final
6
+ python -m src --model username/model-name
7
+ """
8
+
9
+ import argparse
10
+ import sys
11
+
12
+ from .evaluate import evaluate_model
13
+
14
+
15
+ def main():
16
+ parser = argparse.ArgumentParser(
17
+ description="Evaluate a chess model",
18
+ prog="python -m src",
19
+ )
20
+ parser.add_argument(
21
+ "--model",
22
+ "-m",
23
+ type=str,
24
+ required=True,
25
+ help="Path to model directory or HuggingFace model ID",
26
+ )
27
+ parser.add_argument(
28
+ "--quiet",
29
+ "-q",
30
+ action="store_true",
31
+ help="Suppress progress output",
32
+ )
33
+
34
+ args = parser.parse_args()
35
+
36
+ result = evaluate_model(args.model, verbose=not args.quiet)
37
+ print()
38
+ print(result.summary())
39
+
40
+ return 0
41
+
42
+
43
+ if __name__ == "__main__":
44
+ sys.exit(main())
src/evaluate.py CHANGED
@@ -1,838 +1,1002 @@
1
  """
2
  Evaluation script for the Chess Challenge.
3
 
4
- This script evaluates a trained chess model by playing games against
5
- Stockfish and computing ELO ratings.
 
 
 
 
 
6
  """
7
 
8
  from __future__ import annotations
9
 
10
  import argparse
 
 
11
  import random
12
  import re
13
- from dataclasses import dataclass
 
 
14
  from typing import List, Optional, Tuple
15
 
16
  import torch
17
 
18
 
19
  @dataclass
20
- class GameResult:
21
- """Result of a single game."""
22
- moves: List[str]
23
- result: str # "1-0", "0-1", or "1/2-1/2"
24
- model_color: str # "white" or "black"
25
- termination: str # "checkmate", "stalemate", "illegal_move", "max_moves", etc.
26
- illegal_move_count: int
27
-
28
-
29
- class ChessEvaluator:
30
- """
31
- Evaluator for chess models.
 
 
 
 
32
 
33
- This class handles playing games between a trained model and Stockfish,
34
- tracking results, and computing ELO ratings.
 
35
 
36
- Supports any tokenization format as long as the model generates valid
37
- chess squares (e.g., e2, e4). The evaluator extracts UCI moves by finding
38
- square patterns in the generated output.
 
39
  """
 
-     # Regex pattern to match chess squares
-     SQUARE_PATTERN = r'[a-h][1-8]'
-
-     def __init__(
-         self,
-         model,
-         tokenizer,
-         stockfish_path: Optional[str] = None,
-         stockfish_level: int = 1,
-         max_retries: int = 3,
-         device: str = "cuda" if torch.cuda.is_available() else "cpu",
-     ):
-         """
-         Initialize the evaluator.
-
-         Args:
-             model: The trained chess model.
-             tokenizer: The chess tokenizer.
-             stockfish_path: Path to Stockfish executable.
-             stockfish_level: Stockfish skill level (0-20).
-             max_retries: Maximum retries for illegal moves.
-             device: Device to run the model on.
-         """
-         self.model = model.to(device)
-         self.tokenizer = tokenizer
-         self.max_retries = max_retries
-         self.device = device
-
-         # Initialize Stockfish
-         try:
-             import chess
-             import chess.engine
-
-             self.chess = chess
-
-             if stockfish_path is None:
-                 # Try common paths
-                 import shutil
-                 stockfish_path = shutil.which("stockfish")
-
-             if stockfish_path:
-                 self.engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
-                 self.engine.configure({"Skill Level": stockfish_level})
-             else:
-                 print("WARNING: Stockfish not found. Install it for full evaluation.")
-                 self.engine = None
-
-         except ImportError:
-             raise ImportError(
-                 "python-chess is required for evaluation. "
-                 "Install it with: pip install python-chess"
-             )
-
-     def __del__(self):
-         """Clean up Stockfish engine."""
-         if hasattr(self, 'engine') and self.engine:
-             self.engine.quit()
-
-     def _detect_tokenizer_format(self) -> str:
-         """
-         Detect the tokenizer's expected move format by testing tokenization.
-
-         Tests various formats with a sample move and picks the one that
-         produces the fewest unknown tokens. This makes evaluation work
-         with any tokenizer format.
-
-         Supported formats:
-         - 'decomposed': "WP e2_f e4_t" (piece, from_suffix, to_suffix)
-         - 'standard': "WPe2e4" (combined with optional annotations)
-         - 'uci': "e2e4" (pure UCI notation)
-         - 'uci_spaced': "e2 e4" (UCI with space separator)
-
-         Returns:
-             The format string that best matches the tokenizer's vocabulary.
-         """
-         if hasattr(self, '_cached_format'):
-             return self._cached_format
-
-         # Sample move representations to test
-         test_formats = {
-             'decomposed': "WP e2_f e4_t",
-             'standard': "WPe2e4",
-             'uci': "e2e4",
-             'uci_spaced': "e2 e4",
-         }
-
-         unk_token_id = getattr(self.tokenizer, 'unk_token_id', None)
-         best_format = 'standard'
-         min_unk_count = float('inf')
-
-         for fmt, sample in test_formats.items():
-             try:
-                 tokens = self.tokenizer.encode(sample, add_special_tokens=False)
-                 # Count unknown tokens
-                 unk_count = tokens.count(unk_token_id) if unk_token_id is not None else 0
-                 # Also penalize if the entire thing became one UNK
-                 if len(tokens) == 1 and unk_count == 1:
-                     unk_count = 100  # Heavy penalty
-
-                 if unk_count < min_unk_count:
-                     min_unk_count = unk_count
-                     best_format = fmt
-             except Exception:
-                 continue
-
-         self._cached_format = best_format
-         return best_format
-
-     def _format_move(self, color: str, piece: str, from_sq: str, to_sq: str,
-                      promotion: str = None) -> str:
-         """
-         Format a single move according to the detected tokenizer format.
-
-         Args:
-             color: 'W' or 'B'
-             piece: Piece letter (P, N, B, R, Q, K)
-             from_sq: Source square (e.g., 'e2')
-             to_sq: Destination square (e.g., 'e4')
-             promotion: Promotion piece letter or None
-
-         Returns:
-             Formatted move string.
-         """
-         fmt = self._detect_tokenizer_format()
-
-         if fmt == 'decomposed':
-             move_str = f"{color}{piece} {from_sq}_f {to_sq}_t"
-         elif fmt == 'uci':
-             move_str = f"{from_sq}{to_sq}"
-             if promotion:
-                 move_str += promotion.lower()
-         elif fmt == 'uci_spaced':
-             move_str = f"{from_sq} {to_sq}"
-             if promotion:
-                 move_str += f" {promotion.lower()}"
-         else:  # standard
-             move_str = f"{color}{piece}{from_sq}{to_sq}"
-             if promotion:
-                 move_str += f"={promotion}"
-
-         return move_str
-
-     def _convert_board_to_moves(self, board) -> str:
-         """
-         Convert board move history to model input format.
-
-         Automatically detects the tokenizer's expected format and outputs
-         moves accordingly. Supports any tokenization strategy.
-         """
-         moves = []
-         temp_board = self.chess.Board()
-         fmt = self._detect_tokenizer_format()
-
-         for move in board.move_stack:
-             # Get piece and color
-             color = "W" if temp_board.turn == self.chess.WHITE else "B"
-             piece = temp_board.piece_at(move.from_square)
-             piece_letter = piece.symbol().upper() if piece else "P"
-
-             # Get squares
-             from_sq = self.chess.square_name(move.from_square)
-             to_sq = self.chess.square_name(move.to_square)
-
-             # Get promotion piece if any
-             promo = None
-             if move.promotion:
-                 promo = self.chess.piece_symbol(move.promotion).upper()
-
-             # Format based on detected tokenizer format
-             move_str = self._format_move(color, piece_letter, from_sq, to_sq, promo)
-
-             # For standard format, add annotations (capture, check, castling)
-             if fmt == 'standard':
-                 # Add capture suffix
-                 if temp_board.is_capture(move):
-                     move_str += "(x)"
-
-                 # Push move to check for check/checkmate
-                 temp_board.push(move)
-
-                 if temp_board.is_checkmate():
-                     if "(x)" in move_str:
-                         move_str = move_str.replace("(x)", "(x+*)")
-                     else:
-                         move_str += "(+*)"
-                 elif temp_board.is_check():
-                     if "(x)" in move_str:
-                         move_str = move_str.replace("(x)", "(x+)")
-                     else:
-                         move_str += "(+)"
-
-                 # Handle castling notation
-                 if piece_letter == "K":
-                     if abs(ord(from_sq[0]) - ord(to_sq[0])) > 1:
-                         if to_sq[0] == 'g':  # Kingside
-                             move_str = move_str.split("(")[0] + "(o)"
-                         else:  # Queenside
-                             move_str = move_str.split("(")[0] + "(O)"
              else:
-                 # For non-standard formats, just push the move
-                 temp_board.push(move)
-
-             moves.append(move_str)
-
-         return " ".join(moves)
-
-     def _is_separator_token(self, token_str: str) -> bool:
-         """
-         Check if a token represents a separator (whitespace, EOS, etc.).
-
-         This allows the evaluator to work with different tokenization strategies:
-         - Move-level tokenizers: each move is one token, no separators generated
-         - Character-level tokenizers: space character marks end of move
-         - BPE/subword tokenizers: may generate partial moves
-
-         Args:
-             token_str: The decoded token string.
-
-         Returns:
-             True if this token indicates end of a move.
-         """
-         # Check for EOS token
          if hasattr(self.tokenizer, 'eos_token') and token_str == self.tokenizer.eos_token:
              return True
-
-         # Check for whitespace (space, newline, etc.)
-         if token_str.strip() == "" and len(token_str) > 0:
-             return True
-
-         # Check if the token ends with whitespace (some tokenizers include trailing space)
-         if token_str != token_str.rstrip():
-             return True
-
-         return False
-
      def _extract_uci_move(self, text: str) -> Optional[str]:
          """
-         Extract a UCI move from generated text using pattern matching.
-
-         This generic method works with any tokenization format by finding
-         chess square patterns ([a-h][1-8]) in the output.
-
-         Supported formats include:
-         - Standard: "WPe2e4" -> "e2e4"
-         - Decomposed: "WP e2_f e4_t" -> "e2e4"
-         - Pure UCI: "e2e4" -> "e2e4"
-         - With separators: "e2-e4", "e2 e4" -> "e2e4"
-         - With promotion: "e7e8=Q", "e7e8q" -> "e7e8q"
-
-         Args:
-             text: The generated text containing a move.
-
-         Returns:
-             UCI move string (e.g., "e2e4", "e7e8q") or None if not found.
          """
-         if not text:
-             return None
-
-         # Find all squares in the text
-         squares = re.findall(self.SQUARE_PATTERN, text)

          if len(squares) < 2:
              return None

-         # Take the first two squares as from and to
          from_sq, to_sq = squares[0], squares[1]
          uci_move = from_sq + to_sq

-         # Check for promotion (letter after to_square)
-         # Look for patterns like "=Q", "=q", or just "q" after the to_square
-         to_sq_idx = text.find(to_sq)
-         if to_sq_idx != -1:
-             remaining = text[to_sq_idx + 2:to_sq_idx + 5]  # Check next few chars
          promo_match = re.search(r'[=]?([qrbnQRBN])', remaining)
          if promo_match:
              uci_move += promo_match.group(1).lower()

          return uci_move
-
-     def _has_complete_move(self, text: str) -> bool:
-         """
-         Check if the generated text contains a complete move.
-
-         A complete move has at least two valid chess squares.
-
-         Args:
-             text: The generated text so far.
-
-         Returns:
-             True if text contains at least two squares.
-         """
-         squares = re.findall(self.SQUARE_PATTERN, text)
-         return len(squares) >= 2
-
-     def _generate_move_tokens(
-         self,
          input_ids: torch.Tensor,
-         temperature: float = 0.7,
-         top_k: int = 10,
-         max_tokens: int = 20,
      ) -> str:
          """
-         Generate tokens until a complete move is detected or separator is hit.
-
-         This method is tokenizer-agnostic and stops when:
-         - A separator token (whitespace/EOS) is encountered
-         - Two chess squares have been generated (complete move)
-         - max_tokens limit is reached

          Args:
-             input_ids: The input token IDs.
-             temperature: Sampling temperature.
-             top_k: Top-k filtering parameter.
-             max_tokens: Maximum tokens to generate for a single move.

-         Returns:
-             The generated move string.
          """
          generated_tokens = []
          current_ids = input_ids.clone()
-         accumulated_text = ""

-         for _ in range(max_tokens):
-             with torch.no_grad():
                  outputs = self.model(input_ids=current_ids)
-                 logits = outputs.logits[:, -1, :] / temperature

-             # Apply top-k filtering
-             if top_k > 0:
-                 top_k_vals = torch.topk(logits, min(top_k, logits.size(-1)))
-                 indices_to_remove = logits < top_k_vals[0][..., -1, None]
-                 logits[indices_to_remove] = float("-inf")

-             # Sample
-             probs = torch.softmax(logits, dim=-1)
-             next_token = torch.multinomial(probs, num_samples=1)
-
-             # Decode the token
-             token_str = self.tokenizer.decode(next_token[0])
-
-             # Check if this is a separator token
-             if self._is_separator_token(token_str):
-                 # If we already have a complete move, stop
-                 if self._has_complete_move(accumulated_text):
-                     break
-                 # Otherwise, if it's EOS, we should also stop
-                 if hasattr(self.tokenizer, 'eos_token'):
-                     if token_str == self.tokenizer.eos_token:
-                         break
-                 # For whitespace separators, only stop if we have content
-                 if accumulated_text:
                      break
-
-             generated_tokens.append(next_token[0])
-             current_ids = torch.cat([current_ids, next_token], dim=-1)
-             accumulated_text += token_str
-
-             # Stop if we have a complete move (two squares found)
-             if self._has_complete_move(accumulated_text):
-                 # Check if this might be a promotion - peek for one more token
-                 # if the move is to rank 1 or 8
-                 squares = re.findall(self.SQUARE_PATTERN, accumulated_text)
-                 if len(squares) >= 2:
-                     to_sq = squares[1]
-                     if to_sq[1] in '18':  # Potential promotion
-                         # Allow one more iteration to capture promotion piece
-                         if len(generated_tokens) > 3:  # Already have enough
-                             break
-                 else:
-                     break

-         # Decode all generated tokens together
          if generated_tokens:
-             all_tokens = torch.cat(generated_tokens, dim=0)
-             move_str = self.tokenizer.decode(all_tokens, skip_special_tokens=True)
-             return move_str.strip()

          return ""
-
-     def _get_model_move(
          self,
-         board,
-         temperature: float = 0.7,
-         top_k: int = 10,
-     ) -> Tuple[Optional[str], int]:
          """
-         Get the model's next move prediction.
-
-         This method is tokenizer-agnostic. It generates tokens and extracts
-         UCI moves using pattern matching on chess squares.
-
-         Works with any tokenization format:
-         - Move-level: "WPe2e4" -> e2e4
-         - Decomposed: "WP e2_f e4_t" -> e2e4
-         - Pure UCI: "e2e4" -> e2e4
-         - Character-level: "e" "2" "e" "4" -> e2e4
-         - BPE/subword: "e2" "e4" -> e2e4

          Returns:
-             Tuple of (UCI move string, number of retries used).
          """
-         self.model.eval()
-
-         # Convert board to input format
-         moves_str = self._convert_board_to_moves(board)
-
-         # Add BOS token if no moves yet
-         if not moves_str:
-             input_text = self.tokenizer.bos_token
          else:
-             input_text = self.tokenizer.bos_token + " " + moves_str

-         # Tokenize
          inputs = self.tokenizer(
              input_text,
              return_tensors="pt",
              truncation=True,
-             max_length=self.model.config.n_ctx - 10,
          ).to(self.device)

          # Try to generate a legal move
-         for retry in range(self.max_retries):
-             # Generate tokens until we have a move
-             move_text = self._generate_move_tokens(
-                 inputs["input_ids"],
-                 temperature=temperature,
-                 top_k=top_k,
-             )

-             # Extract UCI move using generic pattern matching
              uci_move = self._extract_uci_move(move_text)

-             if uci_move:
-                 try:
-                     move = self.chess.Move.from_uci(uci_move)
-                     if move in board.legal_moves:
-                         return uci_move, retry
-                 except (ValueError, self.chess.InvalidMoveError):
-                     pass

-         return None, self.max_retries
-     def _get_stockfish_move(self, board, time_limit: float = 0.1) -> str:
-         """Get Stockfish's move."""
-         if self.engine is None:
-             raise RuntimeError("Stockfish engine not initialized")
-
-         result = self.engine.play(board, self.chess.engine.Limit(time=time_limit))
-         return result.move.uci()
-     def play_game(
-         self,
-         model_color: str = "white",
-         max_moves: int = 200,
-         temperature: float = 0.7,
-     ) -> GameResult:
          """
-         Play a single game between the model and Stockfish.

-         Args:
-             model_color: "white" or "black".
-             max_moves: Maximum number of moves before draw.
-             temperature: Sampling temperature for model.

-         Returns:
-             GameResult with the game details.
          """
-         board = self.chess.Board()
-         moves = []
-         illegal_move_count = 0

-         model_is_white = model_color == "white"

-         while not board.is_game_over() and len(moves) < max_moves:
-             is_model_turn = (board.turn == self.chess.WHITE) == model_is_white
-
-             if is_model_turn:
-                 # Model's turn
-                 uci_move, retries = self._get_model_move(board, temperature)
-                 illegal_move_count += retries
-
-                 if uci_move is None:
-                     # Model couldn't find a legal move
-                     return GameResult(
-                         moves=moves,
-                         result="0-1" if model_is_white else "1-0",
-                         model_color=model_color,
-                         termination="illegal_move",
-                         illegal_move_count=illegal_move_count + 1,
-                     )
-
-                 move = self.chess.Move.from_uci(uci_move)
-             else:
-                 # Stockfish's turn
-                 if self.engine:
-                     uci_move = self._get_stockfish_move(board)
-                     move = self.chess.Move.from_uci(uci_move)
-                 else:
-                     # Random move if no engine
-                     move = random.choice(list(board.legal_moves))
-
-             board.push(move)
-             moves.append(move.uci())

-         if board.is_checkmate():
-             if board.turn == self.chess.WHITE:
-                 result = "0-1"  # Black wins
-             else:
-                 result = "1-0"  # White wins
-             termination = "checkmate"
-         elif board.is_stalemate():
-             result = "1/2-1/2"
-             termination = "stalemate"
-         elif board.is_insufficient_material():
-             result = "1/2-1/2"
-             termination = "insufficient_material"
-         elif board.can_claim_draw():
-             result = "1/2-1/2"
-             termination = "draw_claim"
-         elif len(moves) >= max_moves:
-             result = "1/2-1/2"
-             termination = "max_moves"
          else:
-             result = "1/2-1/2"
-             termination = "unknown"
-
-         return GameResult(
-             moves=moves,
-             result=result,
-             model_color=model_color,
-             termination=termination,
-             illegal_move_count=illegal_move_count,
          )
-     def evaluate_legal_moves(
-         self,
-         n_positions: int = 1000,
-         temperature: float = 0.7,
-         verbose: bool = True,
-         seed: int = 42,
-     ) -> dict:
-         """
-         Evaluate the model's ability to generate legal moves.
-
-         This evaluation only checks if the model generates legal moves,
-         without playing full games. Useful as a first-pass evaluation.
-
-         Args:
-             n_positions: Number of positions to test.
-             temperature: Sampling temperature.
-             verbose: Whether to print progress.
-             seed: Random seed for reproducibility.
-
-         Returns:
-             Dictionary with legal move statistics.
          """
-         # Set random seed for reproducibility
-         random.seed(seed)
-         torch.manual_seed(seed)
-
-         results = {
-             "total_positions": 0,
-             "legal_first_try": 0,
-             "legal_with_retry": 0,
-             "illegal_all_retries": 0,
-             "positions": [],
-         }

-         # Generate random positions by playing random moves
-         for i in range(n_positions):
-             board = self.chess.Board()

-             # Play random number of moves (5-40) to get varied positions
-             n_random_moves = random.randint(5, 40)
-             for _ in range(n_random_moves):
-                 if board.is_game_over():
-                     break
-                 move = random.choice(list(board.legal_moves))
-                 board.push(move)

-             if board.is_game_over():
-                 continue  # Skip terminal positions

-             results["total_positions"] += 1

-             # Test model's move generation
-             uci_move, retries = self._get_model_move(board, temperature)

-             position_result = {
-                 "fen": board.fen(),
-                 "move_number": len(board.move_stack),
-                 "legal": uci_move is not None,
-                 "retries": retries,
-             }
-             results["positions"].append(position_result)

-             if uci_move is not None:
-                 if retries == 0:
-                     results["legal_first_try"] += 1
                  else:
-                     results["legal_with_retry"] += 1
-             else:
-                 results["illegal_all_retries"] += 1

-             if verbose and (i + 1) % 100 == 0:
-                 legal_rate = (results["legal_first_try"] + results["legal_with_retry"]) / results["total_positions"]
-                 print(f"  Positions: {i + 1}/{n_positions} | Legal rate: {legal_rate:.1%}")
-
-         # Calculate statistics
-         total = results["total_positions"]
-         if total > 0:
-             results["legal_rate_first_try"] = results["legal_first_try"] / total
-             results["legal_rate_with_retry"] = (results["legal_first_try"] + results["legal_with_retry"]) / total
-             results["illegal_rate"] = results["illegal_all_retries"] / total
-         else:
-             results["legal_rate_first_try"] = 0
-             results["legal_rate_with_retry"] = 0
-             results["illegal_rate"] = 1

-         return results
-     def evaluate(
          self,
-         n_games: int = 100,
-         temperature: float = 0.7,
-         verbose: bool = True,
-     ) -> dict:
-         """
-         Run a full win-rate evaluation of the model against Stockfish.

-         Args:
-             n_games: Number of games to play.
-             temperature: Sampling temperature.
-             verbose: Whether to print progress.

          Returns:
-             Dictionary with evaluation metrics.
          """
-         results = {
-             "wins": 0,
-             "losses": 0,
-             "draws": 0,
-             "illegal_moves": 0,
-             "total_moves": 0,
-             "games": [],
-         }

-         for i in range(n_games):
-             # Alternate colors
-             model_color = "white" if i % 2 == 0 else "black"
-
-             game = self.play_game(
-                 model_color=model_color,
-                 temperature=temperature,
              )

-             results["games"].append(game)
-             results["total_moves"] += len(game.moves)
-             results["illegal_moves"] += game.illegal_move_count
-
-             # Count result
-             if game.result == "1/2-1/2":
-                 results["draws"] += 1
-             elif (game.result == "1-0" and model_color == "white") or \
-                  (game.result == "0-1" and model_color == "black"):
-                 results["wins"] += 1
-             else:
-                 results["losses"] += 1

-             if verbose and (i + 1) % 10 == 0:
-                 print(f"  Games: {i + 1}/{n_games} | "
-                       f"W: {results['wins']} L: {results['losses']} D: {results['draws']}")
-
-         # Calculate statistics
-         total = results["wins"] + results["losses"] + results["draws"]
-         results["win_rate"] = results["wins"] / total if total > 0 else 0
-         results["draw_rate"] = results["draws"] / total if total > 0 else 0
-         results["loss_rate"] = results["losses"] / total if total > 0 else 0
-
-         total_attempts = results["total_moves"] + results["illegal_moves"]
-
-         # Average length counts both legal moves and illegal attempts so early illegal terminations
-         # don't show as near-zero length games.
-         results["avg_game_length"] = total_attempts / total if total > 0 else 0
-
-         # Illegal move rate: illegal attempts over total attempts
-         results["illegal_move_rate"] = results["illegal_moves"] / total_attempts if total_attempts > 0 else 0
-
-         # Estimate ELO (simplified)
-         # Stockfish Level 1 is approximately 1350 ELO
-         stockfish_elo = 1350
-         if results["win_rate"] > 0 or results["loss_rate"] > 0:
-             score = results["wins"] + 0.5 * results["draws"]
-             expected = total * 0.5  # Expected score against equal opponent

-             # Simple ELO estimation
-             if score > 0:
-                 win_ratio = score / total
-                 if win_ratio > 0 and win_ratio < 1:
-                     elo_diff = -400 * (1 - 2 * win_ratio) / (1 if win_ratio > 0.5 else -1)
-                     results["estimated_elo"] = stockfish_elo + elo_diff
-                 else:
-                     results["estimated_elo"] = stockfish_elo + (400 if win_ratio >= 1 else -400)
-             else:
-                 results["estimated_elo"] = stockfish_elo - 400
-         else:
-             results["estimated_elo"] = None
-
-         return results
 
 
- def load_model_from_hub(model_id: str, device: str = "auto", verbose: bool = True):
      """
-     Load a model from the Hugging Face Hub.

      Args:
-         model_id: Model ID on Hugging Face Hub.
-         device: Device to load the model on.
-         verbose: Whether to print debug info about loaded tokenizer.
-
-     Returns:
-         Tuple of (model, tokenizer).
      """
-     from transformers import AutoModelForCausalLM, AutoTokenizer
-
-     # Import to register custom classes
-     from src.model import ChessConfig, ChessForCausalLM
-     from src.tokenizer import ChessTokenizer
-
-     # Try AutoTokenizer with trust_remote_code first to load custom tokenizer.py from Hub
-     # Fall back to local ChessTokenizer if the model doesn't have a custom tokenizer
-     tokenizer_source = None
      try:
-         tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-         tokenizer_source = "AutoTokenizer (from Hub with trust_remote_code=True)"
      except Exception as e:
-         if verbose:
-             print(f"  AutoTokenizer failed: {e}")
-         tokenizer = ChessTokenizer.from_pretrained(model_id)
-         tokenizer_source = "ChessTokenizer (local class, vocab from Hub)"
-
-     model = AutoModelForCausalLM.from_pretrained(
-         model_id,
-         trust_remote_code=True,
-         device_map=device,
-     )
-
-     # Print debug info
-     if verbose:
-         print(f"  Tokenizer loaded via: {tokenizer_source}")
-         print(f"  Tokenizer class: {type(tokenizer).__name__}")
-         print(f"  Tokenizer vocab size: {tokenizer.vocab_size}")
-         # Check if tokenizer has custom attributes that might differ
-         if hasattr(tokenizer, '_vocab'):
-             print(f"  Tokenizer has _vocab attribute: yes ({len(tokenizer._vocab)} entries)")
-
-     return model, tokenizer

  def main():
      """Main evaluation function."""
-     parser = argparse.ArgumentParser(description="Evaluate a chess model")

      parser.add_argument(
          "--model_path", type=str, required=True,
-         help="Path to the model or Hugging Face model ID"
-     )
-     parser.add_argument(
-         "--mode", type=str, default="legal", choices=["legal", "winrate", "both"],
-         help="Evaluation mode: 'legal' for legal move rate, 'winrate' for games, 'both' for both"
-     )
-     parser.add_argument(
-         "--stockfish_path", type=str, default=None,
-         help="Path to Stockfish executable"
-     )
-     parser.add_argument(
-         "--stockfish_level", type=int, default=1,
-         help="Stockfish skill level (0-20)"
      )
      parser.add_argument(
-         "--n_positions", type=int, default=500,
-         help="Number of positions for legal move evaluation"
      )
      parser.add_argument(
-         "--seed", type=int, default=42,
-         help="Random seed for reproducibility"
      )
      parser.add_argument(
-         "--n_games", type=int, default=100,
-         help="Number of games to play for win rate evaluation"
-     )
-     parser.add_argument(
-         "--temperature", type=float, default=0.7,
-         help="Sampling temperature"
      )

      args = parser.parse_args()
@@ -840,95 +1004,76 @@ def main():
      print("=" * 60)
      print("CHESS CHALLENGE - EVALUATION")
      print("=" * 60)

-     # Load model
-     print(f"\nLoading model from: {args.model_path}")
-
-     import os
-     is_local_path = os.path.exists(args.model_path)

-     if is_local_path:
-         # Local path
-         from transformers import AutoModelForCausalLM
-         from src.tokenizer import ChessTokenizer
-         from src.model import ChessConfig, ChessForCausalLM
-
-         tokenizer = ChessTokenizer.from_pretrained(args.model_path)
-         model = AutoModelForCausalLM.from_pretrained(
-             args.model_path,
-             device_map="auto",
-         )
-     else:
-         # Assume Hugging Face model ID (or invalid path)
-         if args.model_path.startswith(".") or args.model_path.startswith("/"):
-             raise FileNotFoundError(
-                 f"Local model path not found: {args.model_path}\n"
-                 f"Please check that the path exists and contains model files."
-             )
-         model, tokenizer = load_model_from_hub(args.model_path)

      # Create evaluator
-     print(f"\nSetting up evaluator...")
      evaluator = ChessEvaluator(
          model=model,
          tokenizer=tokenizer,
-         stockfish_path=args.stockfish_path,
-         stockfish_level=args.stockfish_level,
      )

-     # Run legal move evaluation
-     if args.mode in ["legal", "both"]:
-         print(f"\n" + "=" * 60)
-         print("PHASE 1: LEGAL MOVE EVALUATION")
-         print("=" * 60)
-         print(f"Testing {args.n_positions} random positions...")
-
-         legal_results = evaluator.evaluate_legal_moves(
-             n_positions=args.n_positions,
-             temperature=args.temperature,
-             verbose=True,
-             seed=args.seed,
-         )
-
-         print("\n" + "-" * 40)
-         print("LEGAL MOVE RESULTS")
-         print("-" * 40)
-         print(f"  Positions tested: {legal_results['total_positions']}")
-         print(f"  Legal (1st try): {legal_results['legal_first_try']} ({legal_results['legal_rate_first_try']:.1%})")
-         print(f"  Legal (with retry): {legal_results['legal_first_try'] + legal_results['legal_with_retry']} ({legal_results['legal_rate_with_retry']:.1%})")
-         print(f"  Always illegal: {legal_results['illegal_all_retries']} ({legal_results['illegal_rate']:.1%})")
-
-     # Run win rate evaluation
-     if args.mode in ["winrate", "both"]:
-         print(f"\n" + "=" * 60)
-         print("PHASE 2: WIN RATE EVALUATION")
-         print("=" * 60)
-         print(f"Playing {args.n_games} games against Stockfish (Level {args.stockfish_level})...")
-
-         winrate_results = evaluator.evaluate(
-             n_games=args.n_games,
-             temperature=args.temperature,
-             verbose=True,
-         )
-
-         print("\n" + "-" * 40)
-         print("WIN RATE RESULTS")
-         print("-" * 40)
-         print(f"  Wins: {winrate_results['wins']}")
-         print(f"  Losses: {winrate_results['losses']}")
-         print(f"  Draws: {winrate_results['draws']}")
-         print(f"\n  Win Rate: {winrate_results['win_rate']:.1%}")
-         print(f"  Draw Rate: {winrate_results['draw_rate']:.1%}")
-         print(f"  Loss Rate: {winrate_results['loss_rate']:.1%}")
-         print(f"\n  Avg Game Length: {winrate_results['avg_game_length']:.1f} moves")
-         print(f"  Illegal Move Rate: {winrate_results['illegal_move_rate']:.2%}")
-
-         if winrate_results["estimated_elo"]:
-             print(f"\n  Estimated ELO: {winrate_results['estimated_elo']:.0f}")
-
-     print("\n" + "=" * 60)
      print("EVALUATION COMPLETE")
      print("=" * 60)

  if __name__ == "__main__":
 
1
  """
2
  Evaluation script for the Chess Challenge.
3
 
4
+ This script evaluates a trained chess model by:
5
+ 1. Checking if the model has < 1M parameters
6
+ 2. Verifying no illegal use of python-chess for move filtering
7
+ 3. Playing games against a deterministic engine (500 total moves, restarting after 25 moves)
8
+ 4. Tracking legal move rates (first try and with retries)
9
+
10
+ The evaluation is deterministic (greedy decoding, seeded random).
11
  """
12
 
13
  from __future__ import annotations
14
 
15
  import argparse
16
+ import ast
17
+ import os
18
  import random
19
  import re
20
+ import warnings
21
+ from dataclasses import dataclass, field
22
+ from pathlib import Path
23
  from typing import List, Optional, Tuple
24
 
25
  import torch
26
 
27
+ # Suppress HuggingFace warning about empty module names (harmless)
28
+ # This warning comes from transformers' dynamic_module_utils when loading custom code
29
+ import transformers.utils.logging as hf_logging
30
+ hf_logging.set_verbosity_error()
31
+
32
+
33
+ # =============================================================================
34
+ # Data Classes
35
+ # =============================================================================
36
 
  @dataclass
+ class EvaluationResult:
+     """Complete result of an evaluation run."""
+     model_id: str
+     n_parameters: int
+     passed_param_check: bool
+     passed_pychess_check: bool
+     total_moves: int
+     legal_moves_first_try: int
+     legal_moves_with_retry: int
+     games_played: int
+     moves_per_game: List[int] = field(default_factory=list)
+     error_message: Optional[str] = None
+
+     @property
+     def legal_rate_first_try(self) -> float:
+         return self.legal_moves_first_try / self.total_moves if self.total_moves > 0 else 0.0
 
+     @property
+     def legal_rate_with_retry(self) -> float:
+         return self.legal_moves_with_retry / self.total_moves if self.total_moves > 0 else 0.0
 
+     def to_dict(self) -> dict:
+         return {
+             "model_id": self.model_id,
+             "n_parameters": self.n_parameters,
+             "passed_param_check": self.passed_param_check,
+             "passed_pychess_check": self.passed_pychess_check,
+             "total_moves": self.total_moves,
+             "legal_moves_first_try": self.legal_moves_first_try,
+             "legal_moves_with_retry": self.legal_moves_with_retry,
+             "legal_rate_first_try": self.legal_rate_first_try,
+             "legal_rate_with_retry": self.legal_rate_with_retry,
+             "games_played": self.games_played,
+             "moves_per_game": self.moves_per_game,
+             "error_message": self.error_message,
+         }
+
+     def summary(self) -> str:
+         """Generate a human-readable summary for the model page discussion."""
+         lines = [
+             "## Evaluation Results",
+             "",
+             f"**Model**: `{self.model_id}`",
+             f"**Parameters**: {self.n_parameters:,} {'[PASS]' if self.passed_param_check else '[FAIL] (exceeds 1M limit)'}",
+             f"**Chess library check**: {'[PASS]' if self.passed_pychess_check else '[FAIL] (illegal use of python-chess)'}",
+             "",
+         ]
+
+         if not self.passed_param_check:
+             lines.append("**Evaluation not performed**: Model exceeds 1M parameter limit.")
+             return "\n".join(lines)
+
+         if not self.passed_pychess_check:
+             lines.append("**Evaluation not performed**: Model illegally uses python-chess for move filtering.")
+             return "\n".join(lines)
+
+         if self.error_message:
+             lines.append(f"**Evaluation error**: {self.error_message}")
+             return "\n".join(lines)
+
+         lines.extend([
+             "### Performance",
+             "",
+             "| Metric | Value |",
+             "|--------|-------|",
+             f"| Total moves played | {self.total_moves} |",
+             f"| Games played | {self.games_played} |",
+             f"| Legal moves (first try) | {self.legal_moves_first_try} ({self.legal_rate_first_try*100:.1f}%) |",
+             f"| Legal moves (with retries) | {self.legal_moves_with_retry} ({self.legal_rate_with_retry*100:.1f}%) |",
+             "",
+             "### Interpretation",
+             "",
+             "- **>90% legal rate**: Excellent! The model has learned chess rules well.",
+             "- **70-90% legal rate**: Good, but room for improvement.",
+             "- **<70% legal rate**: The model struggles with legal move generation.",
+         ])
+
+         return "\n".join(lines)
+
+
+ # =============================================================================
+ # Security Checks
+ # =============================================================================
+
+ def count_parameters(model) -> int:
+     """Count the total number of parameters in a model."""
+     return sum(p.numel() for p in model.parameters())
+
+
+ def check_pychess_usage(model_path: str) -> Tuple[bool, Optional[str]]:
      """
+     Check whether the model code illegally uses python-chess for move filtering.
 
+     Scans Python files in the model directory for patterns that suggest
+     using chess.Board.legal_moves or similar to filter model outputs.
 
+     Args:
+         model_path: Path to the model directory.
 
+     Returns:
+         Tuple of (passed_check, error_message).
+         passed_check is True if no illegal usage was detected.
+     """
+     forbidden_patterns = [
+         r'\.legal_moves',
+         r'board\.is_legal\s*\(',
+         r'move\s+in\s+.*legal',
+         r'filter.*legal',
+         r'legal.*filter',
+     ]
 
+     model_dir = Path(model_path)
+     if not model_dir.is_dir():
+         # If it's a HuggingFace model ID, we can't check local files here;
+         # the downloaded files are checked after loading.
+         return True, None
 
+     python_files = list(model_dir.glob("*.py"))
 
+     for py_file in python_files:
+         try:
+             content = py_file.read_text()
 
+             # For the template's model.py / tokenizer.py, only scan the
+             # generation-related methods for suspicious patterns.
+             if py_file.name in ["model.py", "tokenizer.py"]:
+                 tree = ast.parse(content)
 
+                 for node in ast.walk(tree):
+                     if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+                         if node.name in ["forward", "generate", "__call__", "get_move"]:
+                             func_code = ast.get_source_segment(content, node)
+                             if func_code:
+                                 for pattern in forbidden_patterns:
+                                     if re.search(pattern, func_code, re.IGNORECASE):
+                                         return False, f"Illegal chess library usage in {py_file.name}:{node.name}"
              else:
+                 # For other files, check the entire content
+                 for pattern in forbidden_patterns:
+                     if re.search(pattern, content, re.IGNORECASE):
+                         return False, f"Illegal chess library usage detected in {py_file.name}"
+
+         except Exception:
+             # If we can't parse the file, skip it
+             continue
+
+     return True, None
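The forbidden patterns above can be exercised in isolation to see what they flag. A minimal sketch, using the same regexes and a hypothetical helper name (`contains_forbidden` is not part of the evaluation script):

```python
import re

# Same forbidden patterns as in check_pychess_usage
FORBIDDEN_PATTERNS = [
    r'\.legal_moves',
    r'board\.is_legal\s*\(',
    r'move\s+in\s+.*legal',
    r'filter.*legal',
    r'legal.*filter',
]

def contains_forbidden(code: str) -> bool:
    """Return True if any forbidden pattern appears in the code snippet."""
    return any(re.search(p, code, re.IGNORECASE) for p in FORBIDDEN_PATTERNS)

print(contains_forbidden("moves = list(board.legal_moves)"))   # flagged
print(contains_forbidden("logits = model(input_ids).logits"))  # clean
```

Note that the patterns are deliberately broad (`filter.*legal` matches comments too), so submissions should avoid even mentioning legal-move filtering in scanned files.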
+
+
+ # =============================================================================
+ # Model Loading
+ # =============================================================================
+
+ REQUIRED_MODEL_FILES = [
+     "config.json",        # Model configuration
+     "model.safetensors",  # Model weights (or pytorch_model.bin)
+ ]
+
+ REQUIRED_TOKENIZER_FILES = [
+     "tokenizer_config.json",  # Tokenizer configuration
+     "vocab.json",             # Vocabulary file
+ ]
+
+
+ def validate_model_files(model_path: str) -> Tuple[bool, List[str]]:
+     """
+     Validate that a model directory contains all required files.
+
+     For local paths, checks that the model contains:
+     - Model architecture (config.json + weights)
+     - Tokenizer (tokenizer_config.json + vocab.json)
+
+     For HuggingFace Hub models, this is handled by the Hub.
+
+     Args:
+         model_path: Local path or HuggingFace model ID.
 
+     Returns:
+         Tuple of (is_valid, list of missing files).
+     """
+     is_local = os.path.exists(model_path)
 
+     if not is_local:
+         # HuggingFace Hub - validation happens during download
+         return True, []
+
+     model_dir = Path(model_path)
+     missing_files = []
+
+     # Check model files
+     has_safetensors = (model_dir / "model.safetensors").exists()
+     has_pytorch = (model_dir / "pytorch_model.bin").exists()
+     if not (has_safetensors or has_pytorch):
+         missing_files.append("model.safetensors (or pytorch_model.bin)")
+
+     if not (model_dir / "config.json").exists():
+         missing_files.append("config.json")
+
+     # Check tokenizer files
+     for fname in REQUIRED_TOKENIZER_FILES:
+         if not (model_dir / fname).exists():
+             missing_files.append(fname)
+
+     return len(missing_files) == 0, missing_files
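The file check can be reproduced standalone against a throwaway directory. A sketch using only the standard library; `missing_model_files` is a hypothetical helper mirroring the logic of `validate_model_files`:

```python
import tempfile
from pathlib import Path

REQUIRED_TOKENIZER_FILES = ["tokenizer_config.json", "vocab.json"]

def missing_model_files(model_dir: Path) -> list:
    """Return the list of required files missing from a model directory."""
    missing = []
    has_weights = (model_dir / "model.safetensors").exists() or (model_dir / "pytorch_model.bin").exists()
    if not has_weights:
        missing.append("model.safetensors (or pytorch_model.bin)")
    if not (model_dir / "config.json").exists():
        missing.append("config.json")
    for fname in REQUIRED_TOKENIZER_FILES:
        if not (model_dir / fname).exists():
            missing.append(fname)
    return missing

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "config.json").write_text("{}")
    (d / "model.safetensors").write_text("")
    print(missing_model_files(d))  # tokenizer files are still missing
```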
+
+
+ def load_model_and_tokenizer(
+     model_path: str,
+     device: str = "auto",
+     verbose: bool = True,
+ ) -> Tuple[any, any, str]:
+     """
+     Load a model and tokenizer from a local path or the HuggingFace Hub.
+
+     The model must contain all necessary files:
+     - config.json: Model configuration
+     - model.safetensors (or pytorch_model.bin): Model weights
+     - tokenizer_config.json: Tokenizer configuration
+     - vocab.json: Vocabulary file
+
+     Models are loaded with trust_remote_code=True to support custom architectures.
+
+     Args:
+         model_path: Local path or HuggingFace model ID.
+         device: Device to load the model on.
+         verbose: Whether to print debug info.
 
+     Returns:
+         Tuple of (model, tokenizer, source_description).
 
+     Raises:
+         FileNotFoundError: If required model files are missing.
+         RuntimeError: If the model or tokenizer cannot be loaded.
+     """
+     from transformers import AutoModelForCausalLM, AutoTokenizer
+
+     is_local = os.path.exists(model_path)
+
+     # Validate model files for local paths
+     is_valid, missing_files = validate_model_files(model_path)
+     if not is_valid:
+         raise FileNotFoundError(
+             f"Model is missing required files: {', '.join(missing_files)}\n"
+             f"Your model must contain:\n"
+             f"  - config.json (model configuration)\n"
+             f"  - model.safetensors or pytorch_model.bin (model weights)\n"
+             f"  - tokenizer_config.json (tokenizer configuration)\n"
+             f"  - vocab.json (vocabulary)\n"
+             f"See example_solution/ for a reference."
+         )
+
+     if verbose:
+         source = "local path" if is_local else "HuggingFace Hub"
+         print(f"Loading model from {source}: {model_path}")
+
+     # Load the tokenizer
+     load_kwargs = {"trust_remote_code": True}
+     if is_local:
+         load_kwargs["local_files_only"] = True
+
+     try:
+         tokenizer = AutoTokenizer.from_pretrained(model_path, **load_kwargs)
+     except Exception as e:
+         raise RuntimeError(
+             f"Failed to load tokenizer from {model_path}: {e}\n"
+             f"Make sure your model includes tokenizer files and a custom tokenizer class."
+         )
+
+     # Load the model
+     try:
+         model = AutoModelForCausalLM.from_pretrained(
+             model_path,
+             trust_remote_code=True,
+             device_map=device,
+             local_files_only=is_local,
+         )
+     except Exception as e:
+         raise RuntimeError(
+             f"Failed to load model from {model_path}: {e}\n"
+             f"Make sure your model includes config.json with auto_map and model weights."
+         )
+
+     if verbose:
+         print(f"  Tokenizer: {type(tokenizer).__name__} (vocab_size={tokenizer.vocab_size})")
+         print(f"  Model: {type(model).__name__}")
+         print(f"  Parameters: {count_parameters(model):,}")
+
+     return model, tokenizer, model_path
+
+
+ # =============================================================================
+ # Move Generation
+ # =============================================================================
+
+ class MoveGenerator:
+     """
+     Generates moves from a chess model using greedy decoding.
+
+     The generation process:
+     1. Tokenize the current game history
+     2. Generate tokens greedily until whitespace is produced
+     3. Extract the UCI move from the generated text
+     4. Retry up to max_retries times if the move is illegal
+     """
+
+     SQUARE_PATTERN = re.compile(r'[a-h][1-8]')
+
+     def __init__(
+         self,
+         model,
+         tokenizer,
+         device: str = "cuda" if torch.cuda.is_available() else "cpu",
+         max_retries: int = 3,
+         max_tokens_per_move: int = 20,
+     ):
+         self.model = model
+         self.tokenizer = tokenizer
+         self.device = device
+         self.max_retries = max_retries
+         self.max_tokens_per_move = max_tokens_per_move
 
+         # Move the model to the device and set it to eval mode
+         if hasattr(model, 'to'):
+             self.model = model.to(device)
+         self.model.eval()
+
+     def _is_whitespace_token(self, token_str: str) -> bool:
+         """Check if a token represents whitespace (the separator between moves)."""
+         if not token_str:
+             return False
+         # Check for EOS
          if hasattr(self.tokenizer, 'eos_token') and token_str == self.tokenizer.eos_token:
              return True
+         # Check for whitespace
+         return token_str.strip() == "" and len(token_str) > 0
+
      def _extract_uci_move(self, text: str) -> Optional[str]:
          """
+         Extract a UCI move from generated text.
 
+         Looks for two consecutive chess squares (e.g., e2e4) and handles
+         promotion by looking for q/r/b/n after the destination square.
          """
+         squares = self.SQUARE_PATTERN.findall(text)
 
          if len(squares) < 2:
              return None
 
          from_sq, to_sq = squares[0], squares[1]
          uci_move = from_sq + to_sq
 
+         # Check for a promotion piece just after the destination square
+         to_idx = text.find(to_sq)
+         if to_idx != -1:
+             remaining = text[to_idx + 2:to_idx + 5]
              promo_match = re.search(r'[=]?([qrbnQRBN])', remaining)
              if promo_match:
                  uci_move += promo_match.group(1).lower()
 
          return uci_move
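The extraction step can be sketched as a standalone function on hypothetical generated strings (same regexes and slicing as `_extract_uci_move` above):

```python
import re
from typing import Optional

SQUARE_PATTERN = re.compile(r'[a-h][1-8]')

def extract_uci_move(text: str) -> Optional[str]:
    """Find two consecutive squares, plus an optional promotion piece."""
    squares = SQUARE_PATTERN.findall(text)
    if len(squares) < 2:
        return None
    from_sq, to_sq = squares[0], squares[1]
    uci_move = from_sq + to_sq
    to_idx = text.find(to_sq)
    if to_idx != -1:
        # Look at the few characters after the destination square
        remaining = text[to_idx + 2:to_idx + 5]
        promo_match = re.search(r'[=]?([qrbnQRBN])', remaining)
        if promo_match:
            uci_move += promo_match.group(1).lower()
    return uci_move

print(extract_uci_move("WPe2e4"))    # e2e4
print(extract_uci_move("WPe7e8=Q"))  # e7e8q
```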
+
+     def _generate_until_whitespace(
+         self,
          input_ids: torch.Tensor,
+         temperature: float = 0.0,
      ) -> str:
          """
+         Generate tokens until whitespace is encountered.
 
          Args:
+             input_ids: Input token IDs.
+             temperature: Sampling temperature. 0.0 = greedy (argmax).
 
+         Uses greedy decoding (argmax) when temperature=0 for determinism,
+         and sampling when temperature>0 for retries.
          """
          generated_tokens = []
          current_ids = input_ids.clone()
 
+         with torch.no_grad():
+             for _ in range(self.max_tokens_per_move):
                  outputs = self.model(input_ids=current_ids)
+                 logits = outputs.logits[:, -1, :]
 
+                 if temperature == 0.0:
+                     # Greedy decoding: take the argmax
+                     next_token = logits.argmax(dim=-1, keepdim=True)
+                 else:
+                     # Sampling with temperature
+                     probs = torch.softmax(logits / temperature, dim=-1)
+                     next_token = torch.multinomial(probs, num_samples=1)
 
+                 # Decode the token
+                 token_str = self.tokenizer.decode(next_token[0])
+
+                 # Stop at a whitespace/separator token
+                 if self._is_whitespace_token(token_str):
                      break
+
+                 generated_tokens.append(next_token)
+                 current_ids = torch.cat([current_ids, next_token], dim=-1)
 
          if generated_tokens:
+             all_tokens = torch.cat(generated_tokens, dim=1)
+             return self.tokenizer.decode(all_tokens[0], skip_special_tokens=True)
 
          return ""
+
+     def get_move(
          self,
+         game_history: str,
+         legal_moves: set,
+     ) -> Tuple[Optional[str], bool]:
          """
+         Generate a move for the current position.
 
+         The first attempt uses greedy decoding (deterministic).
+         Retries use sampling with temperature (seeded for reproducibility).
 
+         Args:
+             game_history: Space-separated move history in the model's format.
+             legal_moves: Set of legal UCI moves for validation.
+
          Returns:
+             Tuple of (uci_move, was_first_try).
+             uci_move is None if all retries failed.
          """
+         # Prepare the input
+         if game_history:
+             input_text = self.tokenizer.bos_token + " " + game_history
          else:
+             input_text = self.tokenizer.bos_token
+
+         # Get the maximum context length
+         max_length = getattr(self.model.config, 'n_ctx', 512)
 
          inputs = self.tokenizer(
              input_text,
              return_tensors="pt",
              truncation=True,
+             max_length=max_length - self.max_tokens_per_move,
          ).to(self.device)
 
          # Try to generate a legal move
+         for attempt in range(self.max_retries):
+             # First attempt: greedy (temperature=0)
+             # Retries: sampling with increasing temperature
+             temperature = 0.0 if attempt == 0 else 0.5 + 0.25 * attempt
 
+             move_text = self._generate_until_whitespace(inputs["input_ids"], temperature)
              uci_move = self._extract_uci_move(move_text)
 
+             if uci_move and uci_move in legal_moves:
+                 return uci_move, (attempt == 0)
 
+         return None, False
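The retry schedule in `get_move` is deterministic on the first attempt and increasingly exploratory on retries. Spelled out for the default `max_retries=3`:

```python
def retry_temperature(attempt: int) -> float:
    """Temperature used for a given 0-based attempt index, as in get_move."""
    return 0.0 if attempt == 0 else 0.5 + 0.25 * attempt

schedule = [retry_temperature(a) for a in range(3)]
print(schedule)  # [0.0, 0.75, 1.0]
```

Since retries sample under a fixed global seed, the whole evaluation stays reproducible even when the greedy move is illegal.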
+
+
+ # =============================================================================
+ # Chess Game Handler (with built-in deterministic engine)
+ # =============================================================================
+
+ # Piece values for a simple material evaluation
+ PIECE_VALUES = {
+     'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900, 'K': 20000,
+     'p': -100, 'n': -320, 'b': -330, 'r': -500, 'q': -900, 'k': -20000,
+ }
+
+ # Piece-square table for positional evaluation (simplified)
+ PAWN_TABLE = [
+      0,  0,  0,  0,  0,  0,  0,  0,
+     50, 50, 50, 50, 50, 50, 50, 50,
+     10, 10, 20, 30, 30, 20, 10, 10,
+      5,  5, 10, 25, 25, 10,  5,  5,
+      0,  0,  0, 20, 20,  0,  0,  0,
+      5, -5,-10,  0,  0,-10, -5,  5,
+      5, 10, 10,-20,-20, 10, 10,  5,
+      0,  0,  0,  0,  0,  0,  0,  0,
+ ]
+
+
+ class SimpleEngine:
+     """
+     A simple deterministic chess engine using minimax with alpha-beta pruning.
 
+     This replaces Stockfish to ensure fully deterministic evaluation.
+     The engine is intentionally weak (shallow search) to be beatable.
+     """
 
+     def __init__(self, depth: int = 2):
+         self.depth = depth
+
+     def evaluate_board(self, board) -> int:
          """
+         Evaluate the board position.
 
+         Returns a score from white's perspective:
+         positive = white advantage, negative = black advantage.
+         """
+         if board.is_checkmate():
+             return -30000 if board.turn else 30000
+         if board.is_stalemate() or board.is_insufficient_material():
+             return 0
+
+         score = 0
+
+         # Material counting
+         for square in range(64):
+             piece = board.piece_at(square)
+             if piece:
+                 symbol = piece.symbol()
+                 score += PIECE_VALUES.get(symbol, 0)
+
+                 # Positional bonus for pawns
+                 if symbol == 'P':
+                     score += PAWN_TABLE[63 - square]  # Flip for white
+                 elif symbol == 'p':
+                     score -= PAWN_TABLE[square]
+
+         # Small bonus for mobility
+         if board.turn:  # White to move
+             score += len(list(board.legal_moves))
+         else:
+             score -= len(list(board.legal_moves))
 
+         return score
+
+     def minimax(self, board, depth: int, alpha: int, beta: int, maximizing: bool) -> Tuple[int, Optional[any]]:
          """
+         Minimax with alpha-beta pruning.
 
+         Returns (score, best_move).
+         """
+         if depth == 0 or board.is_game_over():
+             return self.evaluate_board(board), None
 
+         # Sort moves for better pruning (captures and checks first)
+         moves = list(board.legal_moves)
+         moves.sort(key=lambda m: (board.is_capture(m), board.gives_check(m)), reverse=True)
 
+         best_move = moves[0] if moves else None
+
+         if maximizing:
+             max_eval = -float('inf')
+             for move in moves:
+                 board.push(move)
+                 eval_score, _ = self.minimax(board, depth - 1, alpha, beta, False)
+                 board.pop()
+
+                 if eval_score > max_eval:
+                     max_eval = eval_score
+                     best_move = move
+                 alpha = max(alpha, eval_score)
+                 if beta <= alpha:
+                     break
+             return max_eval, best_move
          else:
+             min_eval = float('inf')
+             for move in moves:
+                 board.push(move)
+                 eval_score, _ = self.minimax(board, depth - 1, alpha, beta, True)
+                 board.pop()
+
+                 if eval_score < min_eval:
+                     min_eval = eval_score
+                     best_move = move
+                 beta = min(beta, eval_score)
+                 if beta <= alpha:
+                     break
+             return min_eval, best_move
+
+     def get_best_move(self, board) -> str:
+         """Get the best move for the current position."""
+         _, best_move = self.minimax(
+             board,
+             self.depth,
+             -float('inf'),
+             float('inf'),
+             board.turn  # True if white to move
          )
+         return best_move.uci() if best_move else None
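The material term of `evaluate_board` can be illustrated without python-chess by scoring the piece-placement field of a FEN string directly. A standalone sketch (same `PIECE_VALUES` as above; the positional and mobility terms are omitted):

```python
# Same piece values as in the evaluator (centipawns, signed by color)
PIECE_VALUES = {
    'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900, 'K': 20000,
    'p': -100, 'n': -320, 'b': -330, 'r': -500, 'q': -900, 'k': -20000,
}

def material_score(fen_placement: str) -> int:
    """Sum piece values over a FEN piece-placement field (digits = empty squares)."""
    return sum(PIECE_VALUES.get(ch, 0) for ch in fen_placement if ch.isalpha())

# Starting position: material is balanced
print(material_score("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # 0
# Same position with the black queen removed: white is up 900
print(material_score("rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"))  # 900
```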
+
+
+ class ChessGameHandler:
+     """
+     Handles chess game logic using python-chess.
 
+     This class is used ONLY by the evaluation framework, never by the model.
+     It manages the chess board state and uses a simple built-in engine
+     for deterministic opponent moves.
+     """
+
+     def __init__(self, engine_depth: int = 2):
+         import chess
 
+         self.chess = chess
+         self.board = chess.Board()
+         self.engine = SimpleEngine(depth=engine_depth)
+
+     def reset(self):
+         """Reset the board to the starting position."""
+         self.board = self.chess.Board()
+
+     def get_legal_moves_uci(self) -> set:
+         """Get the set of legal moves in UCI format."""
+         return {move.uci() for move in self.board.legal_moves}
+
+     def make_move(self, uci_move: str) -> bool:
+         """Make a move on the board. Returns True if successful."""
+         try:
+             move = self.chess.Move.from_uci(uci_move)
+             if move in self.board.legal_moves:
+                 self.board.push(move)
+                 return True
+         except (ValueError, self.chess.InvalidMoveError):
+             pass
+         return False
+
+     def get_opponent_move(self) -> str:
+         """Get the opponent engine's move for the current position.
 
+         Uses the built-in SimpleEngine for deterministic moves.
+         """
+         return self.engine.get_best_move(self.board)
+
+     def is_game_over(self) -> bool:
+         """Check if the game is over."""
+         return self.board.is_game_over()
+
+     def get_turn(self) -> str:
+         """Get whose turn it is ('white' or 'black')."""
+         return "white" if self.board.turn == self.chess.WHITE else "black"
+
+     def get_move_history_formatted(self) -> str:
+         """
+         Get the move history in the model's expected format.
 
+         Converts UCI moves to the format: WPe2e4, BNg8f6, etc.
          """
+         moves = []
+         temp_board = self.chess.Board()
 
+         for move in self.board.move_stack:
+             color = "W" if temp_board.turn == self.chess.WHITE else "B"
+             piece = temp_board.piece_at(move.from_square)
+             piece_letter = piece.symbol().upper() if piece else "P"
 
+             from_sq = self.chess.square_name(move.from_square)
+             to_sq = self.chess.square_name(move.to_square)
 
+             move_str = f"{color}{piece_letter}{from_sq}{to_sq}"
 
+             # Handle promotion
+             if move.promotion:
+                 promo_piece = self.chess.piece_symbol(move.promotion).upper()
+                 move_str += f"={promo_piece}"
 
+             # Handle capture
+             if temp_board.is_capture(move):
+                 move_str += "(x)"
 
+             temp_board.push(move)
 
+             # Handle check/checkmate
+             if temp_board.is_checkmate():
+                 if "(x)" in move_str:
+                     move_str = move_str.replace("(x)", "(x+*)")
+                 else:
+                     move_str += "(+*)"
+             elif temp_board.is_check():
+                 if "(x)" in move_str:
+                     move_str = move_str.replace("(x)", "(x+)")
+                 else:
+                     move_str += "(+)"
 
+             moves.append(move_str)
 
+         return " ".join(moves)
+
+     def close(self):
+         """Clean up resources (no-op for the built-in engine)."""
+         pass
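The token assembly in `get_move_history_formatted` can be sketched as a small pure function over pre-computed move attributes. `format_move` is a hypothetical helper (not part of the script) mirroring how the flags compose into the dataset's extended UCI notation:

```python
def format_move(color: str, piece: str, from_sq: str, to_sq: str,
                promotion: str = "", capture: bool = False,
                check: bool = False, checkmate: bool = False) -> str:
    """Build one token of the extended UCI notation, e.g. WPe2e4 or BNg8f6(x)."""
    move_str = f"{color}{piece}{from_sq}{to_sq}"
    if promotion:
        move_str += f"={promotion.upper()}"
    # Capture, check, and checkmate markers combine inside one parenthesis
    flags = ""
    if capture:
        flags += "x"
    if checkmate:
        flags += "+*"
    elif check:
        flags += "+"
    if flags:
        move_str += f"({flags})"
    return move_str

print(format_move("W", "P", "e2", "e4"))                                # WPe2e4
print(format_move("B", "N", "g8", "f6", capture=True))                  # BNg8f6(x)
print(format_move("W", "Q", "h5", "f7", capture=True, checkmate=True))  # WQh5f7(x+*)
```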
+
+
+ # =============================================================================
+ # Main Evaluator
+ # =============================================================================
+
+ class ChessEvaluator:
+     """
+     Main evaluator for the Chess Challenge.
+
+     Evaluation procedure:
+     1. Check that the model has at most 1M parameters
+     2. Check that the model doesn't use python-chess illegally
+     3. Play games against the deterministic engine:
+        - 500 total moves (model moves)
+        - Restart the game after 25 moves
+        - The model always plays white
+     4. Track legal move rates
+     """
 
+     TOTAL_MOVES = 500
+     MOVES_PER_GAME = 25
+     SEED = 42
+
+     def __init__(
          self,
+         model,
+         tokenizer,
+         model_path: str,
+         engine_depth: int = 2,
+         max_retries: int = 3,
+         device: str = "auto",
+         total_moves: int = None,     # Override TOTAL_MOVES for testing
+         moves_per_game: int = None,  # Override MOVES_PER_GAME for testing
+     ):
+         self.model = model
+         self.tokenizer = tokenizer
+         self.model_path = model_path
+         self.max_retries = max_retries
 
+         # Allow overriding the constants for testing
+         self.total_moves = total_moves if total_moves is not None else self.TOTAL_MOVES
+         self.moves_per_game = moves_per_game if moves_per_game is not None else self.MOVES_PER_GAME
+
+         # Determine the device
+         if device == "auto":
+             device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.device = device
+
+         # Initialize the move generator
+         self.move_generator = MoveGenerator(
+             model=model,
+             tokenizer=tokenizer,
+             device=device,
+             max_retries=max_retries,
+         )
+
+         # Initialize the game handler with the built-in deterministic engine
+         self.game_handler = ChessGameHandler(engine_depth=engine_depth)
+
+     def __del__(self):
+         if hasattr(self, 'game_handler'):
+             self.game_handler.close()
+
+     def evaluate(self, verbose: bool = True) -> EvaluationResult:
+         """
+         Run the complete evaluation procedure.
 
          Returns:
+             EvaluationResult with all metrics.
          """
+         # Set seeds for determinism
+         random.seed(self.SEED)
+         torch.manual_seed(self.SEED)
+         if torch.cuda.is_available():
+             torch.cuda.manual_seed_all(self.SEED)
 
+         # Count parameters
+         n_params = count_parameters(self.model)
+         passed_param_check = n_params <= 1_000_000
+
+         if verbose:
+             status = "[PASS]" if passed_param_check else "[FAIL]"
+             print(f"Parameter check: {n_params:,} parameters {status}")
+
+         if not passed_param_check:
+             return EvaluationResult(
+                 model_id=self.model_path,
+                 n_parameters=n_params,
+                 passed_param_check=False,
+                 passed_pychess_check=True,
+                 total_moves=0,
+                 legal_moves_first_try=0,
+                 legal_moves_with_retry=0,
+                 games_played=0,
+                 error_message="Model exceeds 1M parameter limit",
              )
+
+         # Check for illegal python-chess usage
+         passed_pychess, pychess_error = check_pychess_usage(self.model_path)
+
+         if verbose:
+             status = "[PASS]" if passed_pychess else "[FAIL]"
+             print(f"Python-chess check: {status}")
+
+         if not passed_pychess:
+             return EvaluationResult(
+                 model_id=self.model_path,
+                 n_parameters=n_params,
+                 passed_param_check=True,
+                 passed_pychess_check=False,
+                 total_moves=0,
+                 legal_moves_first_try=0,
+                 legal_moves_with_retry=0,
+                 games_played=0,
+                 error_message=pychess_error,
+             )
+
+         # Run the evaluation games
+         if verbose:
+             print("\nPlaying games against the opponent engine...")
+             print(f"  Total moves: {self.total_moves}")
+             print(f"  Moves per game: {self.moves_per_game}")
+
+         try:
+             result = self._play_evaluation_games(verbose=verbose)
+             result.passed_param_check = True
+             result.passed_pychess_check = True
+             result.n_parameters = n_params
+             return result
+         except Exception as e:
+             return EvaluationResult(
+                 model_id=self.model_path,
+                 n_parameters=n_params,
+                 passed_param_check=True,
+                 passed_pychess_check=True,
+                 total_moves=0,
+                 legal_moves_first_try=0,
+                 legal_moves_with_retry=0,
+                 games_played=0,
+                 error_message=str(e),
+             )
+
+     def _play_evaluation_games(self, verbose: bool = True) -> EvaluationResult:
+         """Play the evaluation games and collect statistics."""
+         total_model_moves = 0
+         legal_first_try = 0
+         legal_with_retry = 0
+         games_played = 0
+         moves_per_game = []
+
+         while total_model_moves < self.total_moves:
+             # Start a new game
+             self.game_handler.reset()
+             game_moves = 0
+             games_played += 1
 
+             while game_moves < self.moves_per_game and total_model_moves < self.total_moves:
+                 if self.game_handler.is_game_over():
+                     break
+
+                 turn = self.game_handler.get_turn()
+
+                 if turn == "white":
+                     # Model's turn
+                     legal_moves = self.game_handler.get_legal_moves_uci()
+                     history = self.game_handler.get_move_history_formatted()
+
+                     move, was_first_try = self.move_generator.get_move(history, legal_moves)
+
+                     total_model_moves += 1
+                     game_moves += 1
+
+                     if move:
+                         if was_first_try:
+                             legal_first_try += 1
+                         legal_with_retry += 1
+                         self.game_handler.make_move(move)
+                     else:
+                         # All retries failed - make a random legal move to continue.
+                         # Sort for determinism (set iteration order is not guaranteed).
+                         if legal_moves:
+                             sorted_moves = sorted(legal_moves)
+                             random_move = random.choice(sorted_moves)
+                             self.game_handler.make_move(random_move)
+                 else:
+                     # Opponent engine's turn
+                     opp_move = self.game_handler.get_opponent_move()
+                     self.game_handler.make_move(opp_move)
 
+             moves_per_game.append(game_moves)
 
+             if verbose and games_played % 5 == 0:
+                 rate = legal_with_retry / total_model_moves if total_model_moves > 0 else 0
+                 print(f"  Games: {games_played} | Moves: {total_model_moves}/{self.total_moves} | Legal rate: {rate:.1%}")
+
+         return EvaluationResult(
+             model_id=self.model_path,
+             n_parameters=0,  # Set by the caller
+             passed_param_check=True,
+             passed_pychess_check=True,
+             total_moves=total_model_moves,
+             legal_moves_first_try=legal_first_try,
+             legal_moves_with_retry=legal_with_retry,
+             games_played=games_played,
+             moves_per_game=moves_per_game,
+         )
+
 
+ # =============================================================================
+ # Hub Integration
+ # =============================================================================
 
+ def post_discussion_summary(model_id: str, result: EvaluationResult, token: Optional[str] = None):
      """
+     Post the evaluation summary as a discussion on the model's HuggingFace page.
 
      Args:
+         model_id: The HuggingFace model ID.
+         result: The evaluation result.
+         token: HuggingFace token with write access.
      """
      try:
+         from huggingface_hub import HfApi
+
+         api = HfApi(token=token)
+
+         # Create a discussion with the evaluation results
+         api.create_discussion(
+             repo_id=model_id,
+             title="🏆 Evaluation Results",
+             description=result.summary(),
+             repo_type="model",
+         )
+
+         print(f"Posted evaluation summary to {model_id}")
+
      except Exception as e:
+         print(f"Failed to post discussion: {e}")
 
 
+ # =============================================================================
+ # CLI
+ # =============================================================================
+
  def main():
      """Main evaluation function."""
+     parser = argparse.ArgumentParser(
+         description="Evaluate a chess model for the Chess Challenge",
+         formatter_class=argparse.RawDescriptionHelpFormatter,
+         epilog="""
+ Examples:
+   # Evaluate a local model
+   python -m src.evaluate --model_path ./my_model
+
+   # Evaluate a HuggingFace model
+   python -m src.evaluate --model_path LLM-course/chess-example
+
+   # Evaluate and post results to HuggingFace
+   python -m src.evaluate --model_path LLM-course/chess-example --post_results
+ """
+     )
 
      parser.add_argument(
          "--model_path", type=str, required=True,
+         help="Path to the model directory or HuggingFace model ID"
      )
      parser.add_argument(
+         "--engine_depth", type=int, default=2,
+         help="Opponent engine search depth (default: 2)"
      )
      parser.add_argument(
+         "--post_results", action="store_true",
+         help="Post results as a discussion on the model's HuggingFace page"
      )
      parser.add_argument(
+         "--hf_token", type=str, default=None,
+         help="HuggingFace token for posting results (uses the HF_TOKEN env var if not provided)"
      )
 
      args = parser.parse_args()
 
      print("=" * 60)
      print("CHESS CHALLENGE - EVALUATION")
      print("=" * 60)
+     print()
 
+     # Load the model and tokenizer
+     model, tokenizer, model_id = load_model_and_tokenizer(
+         args.model_path,
+         verbose=True,
+     )
 
+     print()
 
      # Create evaluator
      evaluator = ChessEvaluator(
          model=model,
          tokenizer=tokenizer,
+         model_path=args.model_path,
+         engine_depth=args.engine_depth,
      )
 
+     # Run the evaluation
+     result = evaluator.evaluate(verbose=True)
+
+     # Print the results
+     print()
+     print("=" * 60)
+     print("RESULTS")
+     print("=" * 60)
+     print()
+     print(result.summary())
+
+     # Post the results if requested
+     if args.post_results:
+         token = args.hf_token or os.environ.get("HF_TOKEN")
+         if token:
+             post_discussion_summary(model_id, result, token)
+         else:
+             print("\nWarning: No HuggingFace token provided. Cannot post results.")
+
+     print()
+     print("=" * 60)
      print("EVALUATION COMPLETE")
      print("=" * 60)
+
+     return result
+
+
+ def evaluate_model(model_path: str, verbose: bool = True) -> EvaluationResult:
+     """
+     Convenience function to evaluate a model from a path.
+
+     Args:
+         model_path: Path to the model directory (local or HuggingFace repo ID)
+         verbose: Whether to print progress
+
+     Returns:
+         EvaluationResult with all metrics
+
+     Example:
+         >>> from src.evaluate import evaluate_model
+         >>> results = evaluate_model("./my_model/final")
+         >>> print(results.summary())
+     """
+     model, tokenizer, model_id = load_model_and_tokenizer(model_path, verbose=verbose)
+
+     evaluator = ChessEvaluator(
+         model=model,
+         tokenizer=tokenizer,
+         model_path=model_path,
+     )
+
+     return evaluator.evaluate(verbose=verbose)
 
 
  if __name__ == "__main__":
src/utils.py DELETED
@@ -1,305 +0,0 @@
-"""
-Utility functions for the Chess Challenge.
-
-This module provides helper functions for:
-- Parameter counting and budget analysis
-- Model registration with Hugging Face
-- Move validation with python-chess
-"""
-
-from __future__ import annotations
-
-from typing import Dict, Optional, TYPE_CHECKING
-
-import torch.nn as nn
-
-if TYPE_CHECKING:
-    from src.model import ChessConfig
-
-
-def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
-    """
-    Count the number of parameters in a model.
-
-    Args:
-        model: The PyTorch model.
-        trainable_only: If True, only count trainable parameters.
-
-    Returns:
-        Total number of parameters.
-    """
-    if trainable_only:
-        return sum(p.numel() for p in model.parameters() if p.requires_grad)
-    return sum(p.numel() for p in model.parameters())
-
-
-def count_parameters_by_component(model: nn.Module) -> Dict[str, int]:
-    """
-    Count parameters broken down by model component.
-
-    Args:
-        model: The PyTorch model.
-
-    Returns:
-        Dictionary mapping component names to parameter counts.
-    """
-    counts = {}
-    for name, module in model.named_modules():
-        if len(list(module.children())) == 0:  # Leaf module
-            param_count = sum(p.numel() for p in module.parameters(recurse=False))
-            if param_count > 0:
-                counts[name] = param_count
-    return counts
-
-
-def estimate_parameters(config: "ChessConfig") -> Dict[str, int]:
-    """
-    Estimate the parameter count for a given configuration.
-
-    This is useful for planning your architecture before building the model.
-
-    Args:
-        config: Model configuration.
-
-    Returns:
-        Dictionary with estimated parameter counts by component.
-    """
-    V = config.vocab_size
-    d = config.n_embd
-    L = config.n_layer
-    n_ctx = config.n_ctx
-    n_inner = config.n_inner
-
-    estimates = {
-        "token_embeddings": V * d,
-        "position_embeddings": n_ctx * d,
-        "attention_qkv_per_layer": 3 * d * d,
-        "attention_proj_per_layer": d * d,
-        "ffn_per_layer": 2 * d * n_inner,
-        "layernorm_per_layer": 4 * d,  # 2 LayerNorms, each with weight and bias
-        "final_layernorm": 2 * d,
-    }
-
-    # Calculate totals
-    per_layer = (
-        estimates["attention_qkv_per_layer"] +
-        estimates["attention_proj_per_layer"] +
-        estimates["ffn_per_layer"] +
-        estimates["layernorm_per_layer"]
-    )
-
-    estimates["total_transformer_layers"] = L * per_layer
-
-    # LM head (tied with embeddings by default)
-    if config.tie_weights:
-        estimates["lm_head"] = 0
-        estimates["lm_head_note"] = "Tied with token embeddings"
-    else:
-        estimates["lm_head"] = V * d
-
-    # Grand total
-    estimates["total"] = (
-        estimates["token_embeddings"] +
-        estimates["position_embeddings"] +
-        estimates["total_transformer_layers"] +
-        estimates["final_layernorm"] +
-        estimates["lm_head"]
-    )
-
-    return estimates
-
-
-def print_parameter_budget(config: "ChessConfig", limit: int = 1_000_000) -> None:
-    """
-    Print a formatted parameter budget analysis.
-
-    Args:
-        config: Model configuration.
-        limit: Parameter limit to compare against.
-    """
-    estimates = estimate_parameters(config)
-
-    print("=" * 60)
-    print("PARAMETER BUDGET ANALYSIS")
-    print("=" * 60)
-    print(f"\nConfiguration:")
-    print(f"  vocab_size (V) = {config.vocab_size}")
-    print(f"  n_embd (d)     = {config.n_embd}")
-    print(f"  n_layer (L)    = {config.n_layer}")
-    print(f"  n_head         = {config.n_head}")
-    print(f"  n_ctx          = {config.n_ctx}")
-    print(f"  n_inner        = {config.n_inner}")
-    print(f"  tie_weights    = {config.tie_weights}")
-
-    print(f"\nParameter Breakdown:")
-    print(f"  Token Embeddings:     {estimates['token_embeddings']:>10,}")
-    print(f"  Position Embeddings:  {estimates['position_embeddings']:>10,}")
-    print(f"  Transformer Layers:   {estimates['total_transformer_layers']:>10,}")
-    print(f"  Final LayerNorm:      {estimates['final_layernorm']:>10,}")
-
-    if config.tie_weights:
-        print(f"  LM Head:              {'(tied)':>10}")
-    else:
-        print(f"  LM Head:              {estimates['lm_head']:>10,}")
-
-    print(f"  " + "-" * 30)
-    print(f"  TOTAL:                {estimates['total']:>10,}")
-
-    print(f"\nBudget Status:")
-    print(f"  Limit:    {limit:>10,}")
-    print(f"  Used:     {estimates['total']:>10,}")
-    print(f"  Remaining:{limit - estimates['total']:>10,}")
-
-    if estimates['total'] <= limit:
-        print(f"\n  Within budget! ({estimates['total'] / limit * 100:.1f}% used)")
-    else:
-        print(f"\n  OVER BUDGET by {estimates['total'] - limit:,} parameters!")
-
-    print("=" * 60)
-
-
-def validate_move_with_chess(move: str, board_fen: Optional[str] = None) -> bool:
-    """
-    Validate a move using python-chess.
-
-    This function converts the dataset's extended UCI format to standard UCI
-    and validates it against the current board state.
-
-    Args:
-        move: Move in extended UCI format (e.g., "WPe2e4", "BNg8f6(x)").
-        board_fen: FEN string of the current board state (optional).
-
-    Returns:
-        True if the move is legal, False otherwise.
-    """
-    try:
-        import chess
-    except ImportError:
-        raise ImportError("python-chess is required for move validation. "
-                          "Install it with: pip install python-chess")
-
-    # Parse the extended UCI format
-    # Format: [W|B][Piece][from_sq][to_sq][suffix]
-    # Example: WPe2e4, BNg8f6(x), WKe1g1(o)
-
-    if len(move) < 6:
-        return False
-
-    # Extract components
-    color = move[0]      # W or B
-    piece = move[1]      # P, N, B, R, Q, K
-    from_sq = move[2:4]  # e.g., "e2"
-    to_sq = move[4:6]    # e.g., "e4"
-
-    # Check for promotion
-    promotion = None
-    if "=" in move:
-        promo_idx = move.index("=")
-        promotion = move[promo_idx + 1].lower()
-
-    # Create board
-    board = chess.Board(board_fen) if board_fen else chess.Board()
-
-    # Build UCI move string
-    uci_move = from_sq + to_sq
-    if promotion:
-        uci_move += promotion
-
-    try:
-        move_obj = chess.Move.from_uci(uci_move)
-        return move_obj in board.legal_moves
-    except (ValueError, chess.InvalidMoveError):
-        return False
-
-
-def convert_extended_uci_to_uci(move: str) -> str:
-    """
-    Convert extended UCI format to standard UCI format.
-
-    Args:
-        move: Move in extended UCI format (e.g., "WPe2e4").
-
-    Returns:
-        Move in standard UCI format (e.g., "e2e4").
-    """
-    if len(move) < 6:
-        return move
-
-    # Extract squares
-    from_sq = move[2:4]
-    to_sq = move[4:6]
-
-    # Check for promotion
-    promotion = ""
-    if "=" in move:
-        promo_idx = move.index("=")
-        promotion = move[promo_idx + 1].lower()
-
-    return from_sq + to_sq + promotion
-
-
-def convert_uci_to_extended(
-    uci_move: str,
-    board_fen: str,
-) -> str:
-    """
-    Convert standard UCI format to extended UCI format.
-
-    Args:
-        uci_move: Move in standard UCI format (e.g., "e2e4").
-        board_fen: FEN string of the current board state.
-
-    Returns:
-        Move in extended UCI format (e.g., "WPe2e4").
-    """
-    try:
-        import chess
-    except ImportError:
-        raise ImportError("python-chess is required for move conversion.")
-
-    board = chess.Board(board_fen)
-    move = chess.Move.from_uci(uci_move)
-
-    # Get color
-    color = "W" if board.turn == chess.WHITE else "B"
-
-    # Get piece
-    piece = board.piece_at(move.from_square)
-    piece_letter = piece.symbol().upper() if piece else "P"
-
-    # Build extended UCI
-    from_sq = chess.square_name(move.from_square)
-    to_sq = chess.square_name(move.to_square)
-
-    result = f"{color}{piece_letter}{from_sq}{to_sq}"
-
-    # Add promotion
-    if move.promotion:
-        result += f"={chess.piece_symbol(move.promotion).upper()}"
-
-    # Add suffix for captures
-    if board.is_capture(move):
-        result += "(x)"
-
-    # Add suffix for check/checkmate
-    board.push(move)
-    if board.is_checkmate():
-        if "(x)" in result:
-            result = result.replace("(x)", "(x+*)")
-        else:
-            result += "(+*)"
-    elif board.is_check():
-        if "(x)" in result:
-            result = result.replace("(x)", "(x+)")
-        else:
-            result += "(+)"
-    board.pop()
-
-    # Handle castling notation
-    if board.is_castling(move):
-        if move.to_square in [chess.G1, chess.G8]:  # Kingside
-            result = result.replace("(x)", "").replace("(+)", "") + "(o)"
-        else:  # Queenside
-            result = result.replace("(x)", "").replace("(+)", "") + "(O)"
-
-    return result
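
The deleted `convert_extended_uci_to_uci` strips the `W`/`B` color, piece letter, and `(x)`/`(+)` suffixes by string slicing. As a standalone sketch (not part of the repo's code), the same parse can be expressed with a regex that also rejects malformed moves instead of passing them through:

```python
import re

# Extended-UCI move, e.g. "WPe2e4", "BNg8f6(x)", "WPe7e8=Q(+)":
# color, piece, from-square, to-square, optional "=<piece>" promotion,
# optional parenthesized suffix for capture/check/castling.
EXT_UCI = re.compile(
    r"^(?P<color>[WB])(?P<piece>[PNBRQK])"
    r"(?P<frm>[a-h][1-8])(?P<to>[a-h][1-8])"
    r"(?:=(?P<promo>[NBRQ]))?"
    r"(?P<suffix>\(.+\))?$"
)

def to_standard_uci(move: str) -> str:
    """Convert an extended-UCI move to standard UCI, raising on bad input."""
    m = EXT_UCI.match(move)
    if m is None:
        raise ValueError(f"not an extended-UCI move: {move!r}")
    promo = (m.group("promo") or "").lower()
    return m.group("frm") + m.group("to") + promo

print(to_standard_uci("WPe2e4"))       # e2e4
print(to_standard_uci("BNg8f6(x)"))    # g8f6
print(to_standard_uci("WPe7e8=Q(+)"))  # e7e8q
```

Unlike the slicing version, a non-matching string raises `ValueError` rather than returning a fragment, which is usually what you want when validating model output.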
submit.py CHANGED
@@ -2,23 +2,143 @@
 """
 Submission script for the Chess Challenge.
 
-This script pushes your trained model to the Hugging Face Hub under the
-LLM-course organization, with metadata tracking who submitted it.
 
 Usage:
-    python submit.py --model_path ./my_model/final_model --model_name my-chess-model
 """
 
 import argparse
 import os
-import tempfile
 from pathlib import Path
 
 
 def main():
-    parser = argparse.ArgumentParser(description="Submit your chess model to Hugging Face Hub")
     parser.add_argument(
-        "--model_path", type=str, default="./my_model/final_model",
         help="Path to your trained model directory"
     )
     parser.add_argument(
@@ -26,89 +146,95 @@ def main():
         help="Name for your model on the Hub (e.g., 'my-chess-model')"
     )
     args = parser.parse_args()
-
-    # Fixed organization
     organization = "LLM-course"
-
-    # Check model path exists
-    if not os.path.exists(args.model_path):
-        print(f"Error: Model path '{args.model_path}' does not exist.")
-        print("Train a model first with: python -m src.train --output_dir ./my_model")
-        return 1
-
-    # Import here to avoid slow startup
-    from huggingface_hub import HfApi, HfFolder, whoami
-    from transformers import AutoModelForCausalLM
-
-    # Ensure user is logged in and get their info
     print("=" * 60)
     print("CHESS CHALLENGE - MODEL SUBMISSION")
     print("=" * 60)
-
     try:
         user_info = whoami()
         username = user_info["name"]
-        print(f"\nLogged in as: {username}")
     except Exception:
-        print("\nYou need to log in to Hugging Face first.")
-        print("Run: huggingface-cli login")
-        return 1
-
-    # Import custom classes to register them
-    from src.model import ChessConfig, ChessForCausalLM
-    from src.tokenizer import ChessTokenizer
-
-    # Load model and tokenizer
-    print(f"\nLoading model from: {args.model_path}")
-    model = AutoModelForCausalLM.from_pretrained(args.model_path)
-    tokenizer = ChessTokenizer.from_pretrained(args.model_path)
-
-    # Count parameters
-    n_params = sum(p.numel() for p in model.parameters())
-    print(f"Model parameters: {n_params:,}")
-
-    if n_params > 1_000_000:
-        print(f"WARNING: Model exceeds 1M parameter limit ({n_params:,} params)")
-
-    # Prepare repo name
-    repo_id = f"{organization}/{args.model_name}"
-    print(f"\nSubmitting to: {repo_id}")
-
-    # Create a temporary directory to prepare submission
-    with tempfile.TemporaryDirectory() as tmp_dir:
-        tmp_path = Path(tmp_dir)
-
-        # Register tokenizer for AutoTokenizer so it can be loaded with trust_remote_code=True
-        # This adds the 'auto_map' field to tokenizer_config.json
-        tokenizer.register_for_auto_class("AutoTokenizer")
-
-        # Register model for AutoModelForCausalLM so custom architectures load correctly
-        # This adds the 'auto_map' field to config.json
-        model.config.auto_map = {
-            "AutoConfig": "model.ChessConfig",
-            "AutoModelForCausalLM": "model.ChessForCausalLM",
-        }
-
-        # Save model and tokenizer
-        model.save_pretrained(tmp_path)
-        tokenizer.save_pretrained(tmp_path)
 
-        # Copy tokenizer.py to allow loading with trust_remote_code=True
-        # This ensures the custom ChessTokenizer can be loaded from the Hub
-        import shutil
-        tokenizer_src = Path(__file__).parent / "src" / "tokenizer.py"
-        if tokenizer_src.exists():
-            shutil.copy(tokenizer_src, tmp_path / "tokenizer.py")
-            print("  Included tokenizer.py for remote loading")
 
-        # Copy model.py to allow loading custom model architectures with trust_remote_code=True
-        # This ensures students who modify the model architecture can load their models from the Hub
-        model_src = Path(__file__).parent / "src" / "model.py"
-        if model_src.exists():
-            shutil.copy(model_src, tmp_path / "model.py")
-            print("  Included model.py for remote loading")
-
-        # Create model card with submitter info
         model_card = f"""---
 library_name: transformers
 tags:
@@ -128,43 +254,47 @@ Chess model submitted to the LLM Course Chess Challenge.
 - **Parameters**: {n_params:,}
 - **Organization**: {organization}
 
-## Model Details
 
-- **Architecture**: Chess Transformer (GPT-style)
-- **Vocab size**: {tokenizer.vocab_size}
-- **Embedding dim**: {model.config.n_embd}
-- **Layers**: {model.config.n_layer}
-- **Heads**: {model.config.n_head}
-"""
-        (tmp_path / "README.md").write_text(model_card)
 
-        # Push to Hub
-        print("\nUploading to Hugging Face Hub...")
-        api = HfApi()
 
-        # Create repo if it doesn't exist
-        api.create_repo(
-            repo_id=repo_id,
-            exist_ok=True,
-        )
 
         # Upload all files
         api.upload_folder(
-            folder_path=tmp_path,
             repo_id=repo_id,
             commit_message=f"Chess Challenge submission by {username}",
         )
-
     print("\n" + "=" * 60)
     print("SUBMISSION COMPLETE!")
     print("=" * 60)
-    print(f"\nYour model is now available at:")
     print(f"  https://huggingface.co/{repo_id}")
     print(f"\nSubmitted by: {username}")
     print(f"Parameters: {n_params:,}")
-
     return 0
 
 
 if __name__ == "__main__":
-    exit(main())
 
 """
 Submission script for the Chess Challenge.
 
+This script validates and uploads your trained model to the Hugging Face Hub
+under the LLM-course organization.
+
+Your model directory must contain:
+- config.json: Model configuration with auto_map for custom architecture
+- model.safetensors (or pytorch_model.bin): Model weights
+- tokenizer_config.json: Tokenizer configuration with auto_map
+- vocab.json: Vocabulary file
+- model.py: Your custom model architecture (for trust_remote_code)
+- tokenizer.py: Your custom tokenizer (for trust_remote_code)
 
 Usage:
+    python submit.py --model_path ./my_model --model_name my-chess-model
 """
 
 import argparse
 import os
+import sys
 from pathlib import Path
 
 
+# Required files for a valid submission
+REQUIRED_FILES = {
+    "config.json": "Model configuration (must include auto_map)",
+    "tokenizer_config.json": "Tokenizer configuration (must include auto_map)",
+    "vocab.json": "Vocabulary file",
+    "model.py": "Custom model architecture (for trust_remote_code=True)",
+    "tokenizer.py": "Custom tokenizer class (for trust_remote_code=True)",
+}
+
+# At least one of these weight files must exist
+WEIGHT_FILES = ["model.safetensors", "pytorch_model.bin"]
+
+
+def validate_model_directory(model_path: Path) -> tuple[bool, list[str]]:
+    """
+    Validate that the model directory contains all required files.
+
+    Returns:
+        Tuple of (is_valid, list of error messages).
+    """
+    errors = []
+
+    # Check required files
+    for filename, description in REQUIRED_FILES.items():
+        if not (model_path / filename).exists():
+            errors.append(f"Missing {filename}: {description}")
+
+    # Check weight files (need at least one)
+    has_weights = any((model_path / f).exists() for f in WEIGHT_FILES)
+    if not has_weights:
+        errors.append(f"Missing model weights: need {' or '.join(WEIGHT_FILES)}")
+
+    return len(errors) == 0, errors
+
+
+def validate_auto_map(model_path: Path) -> tuple[bool, list[str]]:
+    """
+    Validate that config.json and tokenizer_config.json have auto_map fields.
+
+    Returns:
+        Tuple of (is_valid, list of error messages).
+    """
+    import json
+
+    errors = []
+
+    # Check config.json for auto_map
+    config_path = model_path / "config.json"
+    if config_path.exists():
+        with open(config_path) as f:
+            config = json.load(f)
+        if "auto_map" not in config:
+            errors.append(
+                "config.json missing 'auto_map' field. Add:\n"
+                '  "auto_map": {\n'
+                '    "AutoConfig": "model.YourConfig",\n'
+                '    "AutoModelForCausalLM": "model.YourModel"\n'
+                '  }'
+            )
+
+    # Check tokenizer_config.json for auto_map
+    tokenizer_config_path = model_path / "tokenizer_config.json"
+    if tokenizer_config_path.exists():
+        with open(tokenizer_config_path) as f:
+            tokenizer_config = json.load(f)
+        if "auto_map" not in tokenizer_config:
+            errors.append(
+                "tokenizer_config.json missing 'auto_map' field. Add:\n"
+                '  "auto_map": {\n'
+                '    "AutoTokenizer": ["tokenizer.YourTokenizer", null]\n'
+                '  }\n'
+                'Note: AutoTokenizer value must be a list [slow_class, fast_class].'
+            )
+        elif "AutoTokenizer" in tokenizer_config.get("auto_map", {}):
+            auto_tok = tokenizer_config["auto_map"]["AutoTokenizer"]
+            if isinstance(auto_tok, str):
+                errors.append(
+                    "tokenizer_config.json auto_map.AutoTokenizer must be a list, not a string.\n"
+                    'Change from: "AutoTokenizer": "tokenizer.YourTokenizer"\n'
+                    'To:          "AutoTokenizer": ["tokenizer.YourTokenizer", null]'
+                )
+
+    return len(errors) == 0, errors
+
+
+def count_parameters(model_path: Path) -> int:
+    """Count parameters in the model."""
+    from transformers import AutoModelForCausalLM
+
+    model = AutoModelForCausalLM.from_pretrained(
+        model_path,
+        trust_remote_code=True,
+        local_files_only=True,
+    )
+    return sum(p.numel() for p in model.parameters())
+
+
 def main():
+    parser = argparse.ArgumentParser(
+        description="Submit your chess model to the Hugging Face Hub",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Required files in your model directory:
+  - config.json            Model configuration with auto_map
+  - model.safetensors      Model weights (or pytorch_model.bin)
+  - tokenizer_config.json  Tokenizer configuration with auto_map
+  - vocab.json             Vocabulary file
+  - model.py               Custom model architecture
+  - tokenizer.py           Custom tokenizer class
+
+Example:
+  python submit.py --model_path ./my_model --model_name my-chess-model
+        """
+    )
     parser.add_argument(
+        "--model_path", type=str, required=True,
         help="Path to your trained model directory"
     )
     parser.add_argument(
         help="Name for your model on the Hub (e.g., 'my-chess-model')"
     )
     args = parser.parse_args()
+
+    model_path = Path(args.model_path)
     organization = "LLM-course"
+
     print("=" * 60)
     print("CHESS CHALLENGE - MODEL SUBMISSION")
     print("=" * 60)
+
+    # Check model path exists
+    if not model_path.exists():
+        print(f"\nError: Model path '{model_path}' does not exist.")
+        return 1
+
+    # Validate required files
+    print("\n[1/5] Checking required files...")
+    is_valid, errors = validate_model_directory(model_path)
+    if not is_valid:
+        print("\nError: Model directory is incomplete:")
+        for error in errors:
+            print(f"  - {error}")
+        print("\nSee example_solution/ for a complete example.")
+        return 1
+    print("  All required files present.")
+
+    # Validate auto_map fields
+    print("\n[2/5] Validating auto_map configuration...")
+    is_valid, errors = validate_auto_map(model_path)
+    if not is_valid:
+        print("\nError: Configuration files need auto_map:")
+        for error in errors:
+            print(f"  - {error}")
+        return 1
+    print("  auto_map configuration valid.")
+
+    # Count parameters
+    print("\n[3/5] Counting parameters...")
+    try:
+        n_params = count_parameters(model_path)
+        print(f"  Parameters: {n_params:,}")
+        if n_params > 1_000_000:
+            print(f"\n  WARNING: Model exceeds 1M parameter limit!")
+            print(f"  Your model has {n_params:,} parameters.")
+            print(f"  It will fail the evaluation parameter check.")
+    except Exception as e:
+        print(f"\nError: Could not load model to count parameters: {e}")
+        return 1
+
+    # Hugging Face login
+    print("\n[4/5] Checking Hugging Face authentication...")
+    try:
+        from huggingface_hub import HfApi, whoami
+    except ImportError:
+        print("\nError: huggingface_hub not installed.")
+        print("Install with: pip install huggingface_hub")
+        return 1
+
     try:
         user_info = whoami()
         username = user_info["name"]
+        print(f"  Logged in as: {username}")
     except Exception:
+        print("\n  Not logged in. Starting login process...")
+        print("  You need a Hugging Face account and access token.")
+        print("  Get your token at: https://huggingface.co/settings/tokens")
+        print()
 
+        # Interactive login
+        from huggingface_hub import login
+        try:
+            login()
+            user_info = whoami()
+            username = user_info["name"]
+            print(f"\n  Successfully logged in as: {username}")
+        except Exception as e:
+            print(f"\nError: Login failed: {e}")
+            return 1
+
+    # Upload to Hub
+    print("\n[5/5] Uploading to Hugging Face Hub...")
+    repo_id = f"{organization}/{args.model_name}"
+    print(f"  Repository: {repo_id}")
+
+    api = HfApi()
+
+    try:
+        # Create repo if it doesn't exist
+        api.create_repo(repo_id=repo_id, exist_ok=True)
 
+        # Create a model card
         model_card = f"""---
 library_name: transformers
 tags:
 - **Parameters**: {n_params:,}
 - **Organization**: {organization}
 
+## Usage
 
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
+model = AutoModelForCausalLM.from_pretrained("{repo_id}", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("{repo_id}", trust_remote_code=True)
+```
 
+## Evaluation
 
+This model is evaluated at the [Chess Challenge Arena](https://huggingface.co/spaces/LLM-course/Chess1MChallenge).
+"""
+
+        # Write model card
+        readme_path = model_path / "README.md"
+        readme_path.write_text(model_card)
+
         # Upload all files
         api.upload_folder(
+            folder_path=model_path,
             repo_id=repo_id,
             commit_message=f"Chess Challenge submission by {username}",
         )
+
+    except Exception as e:
+        print(f"\nError: Upload failed: {e}")
+        return 1
+
     print("\n" + "=" * 60)
     print("SUBMISSION COMPLETE!")
     print("=" * 60)
+    print(f"\nYour model is available at:")
     print(f"  https://huggingface.co/{repo_id}")
     print(f"\nSubmitted by: {username}")
     print(f"Parameters: {n_params:,}")
+    print(f"\nNext step: Go to the Chess Challenge Arena to run evaluation:")
+    print(f"  https://huggingface.co/spaces/LLM-course/Chess1MChallenge")
+
    return 0
 
 
 if __name__ == "__main__":
+    sys.exit(main())
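
The `auto_map` validation added above reduces to a key lookup in the JSON configs. A minimal sketch of that check against a synthetic `config.json` (the `ChessConfig`/`ChessForCausalLM` targets follow the template's `src/model.py`, as the earlier version of this script wrote them; the `model_type` value is illustrative):

```python
import json
import tempfile
from pathlib import Path

def has_auto_map(config_path: Path) -> bool:
    """True if the config file declares an auto_map for trust_remote_code loading."""
    with open(config_path) as f:
        return "auto_map" in json.load(f)

# A config.json shaped the way submit.py expects.
cfg = {
    "model_type": "chess",
    "auto_map": {
        "AutoConfig": "model.ChessConfig",
        "AutoModelForCausalLM": "model.ChessForCausalLM",
    },
}
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "config.json"
    path.write_text(json.dumps(cfg, indent=2))
    print(has_auto_map(path))  # True
```

Without `auto_map`, `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)` cannot locate the custom classes in the uploaded `model.py`, which is why the submission script treats it as a hard error.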
uv.lock ADDED
The diff for this file is too large to render. See raw diff