Commit 8a7719b · Parent: d7e086f

Integrated the template, made evaluation more robust to different tokenizers
Files changed:

- .gitignore (+56, -0)
- TEMPLATE_README.md (+152, -0)
- app.py (+7, -1)
- pyproject.toml (+59, -0)
- src/__init__.py (+10, -1)
- src/data.py (+253, -0)
- src/evaluate.py (+270, -132)
- src/train.py (+250, -0)
- src/utils.py (+305, -0)
- submit.py (+144, -0)
.gitignore
ADDED

```
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
*.egg

# Virtual environments
.venv/
venv/
env/
ENV/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store

# Model outputs and training artifacts
my_model/
checkpoints/
runs/
wandb/
*.pth
*.pt
*.safetensors
*.bin

# Dataset caches
.cache/
*.arrow
*.parquet

# Jupyter
.ipynb_checkpoints/
*.ipynb

# Testing
.pytest_cache/
.coverage
htmlcov/

# Logs
*.log
logs/

# Environment variables
.env
.env.local
```
TEMPLATE_README.md
ADDED

# Chess Challenge

Train a 1M parameter LLM to play chess!

## Objective

Design and train a transformer-based language model to predict chess moves. Your model must:

1. **Stay under 1M parameters** - This is the hard constraint!
2. **Use a custom tokenizer** - Design an efficient move-level tokenizer
3. **Play legal chess** - The model should learn to generate valid moves
4. **Beat Stockfish** - Your ELO will be measured against Stockfish Level 1

## Dataset

We use the Lichess dataset: [`dlouapre/lichess_2025-01_1M`](https://huggingface.co/datasets/dlouapre/lichess_2025-01_1M)

The dataset uses an extended UCI notation:

- `W`/`B` prefix for White/Black
- Piece letter: `P`=Pawn, `N`=Knight, `B`=Bishop, `R`=Rook, `Q`=Queen, `K`=King
- Source and destination squares (e.g., `e2e4`)
- Special suffixes: `(x)`=capture, `(+)`=check, `(+*)`=checkmate, `(o)`/`(O)`=castling

Example game:

```
WPe2e4 BPe7e5 WNg1f3 BNb8c6 WBf1b5 BPa7a6 WBb5c6(x) BPd7c6(x) ...
```
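The notation is regular enough that a single regular expression can split a token into its fields. A minimal sketch (the pattern and group names are illustrative, not part of the template, and promotion markers are omitted for brevity):

```python
import re

# One token of the extended UCI notation: color, piece, source square,
# destination square, plus optional annotations such as (x), (+), (+*), (o).
MOVE_RE = re.compile(
    r"^(?P<color>[WB])(?P<piece>[PNBRQK])"
    r"(?P<src>[a-h][1-8])(?P<dst>[a-h][1-8])"
    r"(?P<suffix>(\([^)]*\))*)$"
)

def parse_move(token: str) -> dict:
    """Split one move token into its named components."""
    m = MOVE_RE.match(token)
    if m is None:
        raise ValueError(f"not a valid move token: {token!r}")
    return m.groupdict()

print(parse_move("WBb5c6(x)"))
# {'color': 'W', 'piece': 'B', 'src': 'b5', 'dst': 'c6', 'suffix': '(x)'}
```

A parser like this is one way to sanity-check a custom tokenizer's vocabulary against the raw dataset.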
## Quick Start

### Train a Model

```bash
# Basic training
python -m src.train \
    --output_dir ./my_model \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32
```

### Evaluate Your Model

Evaluation happens in two phases:

```bash
# Phase 1: Legal Move Evaluation (quick sanity check)
python -m src.evaluate \
    --model_path ./my_model \
    --mode legal \
    --n_positions 500

# Phase 2: Win Rate Evaluation (full games against Stockfish)
python -m src.evaluate \
    --model_path ./my_model \
    --mode winrate \
    --n_games 100 \
    --stockfish_level 1

# Or run both phases:
python -m src.evaluate \
    --model_path ./my_model \
    --mode both
```
## Parameter Budget

Use the utility function to check your budget:

```python
from src import ChessConfig, print_parameter_budget

config = ChessConfig(
    vocab_size=1200,
    n_embd=128,
    n_layer=4,
    n_head=4,
)
print_parameter_budget(config)
```
### Pro Tips

1. **Weight Tying**: The default config ties the embedding and output layer weights, saving ~154k parameters
2. **Vocabulary Size**: Keep it small! ~1200 tokens covers all moves
3. **Depth vs Width**: With limited parameters, experiment with shallow-but-wide vs deep-but-narrow
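The ~154k figure for weight tying is simply `vocab_size × n_embd` = 1200 × 128 = 153,600 shared parameters. A rough, self-contained budget check (a sketch of a GPT-style layer accounting; the template's `print_parameter_budget` may count slightly differently):

```python
def rough_param_count(vocab_size, n_embd, n_layer, n_head,
                      n_inner=None, n_positions=256, tie_weights=True):
    """Approximate parameter count for a small GPT-style decoder.

    n_head does not affect the count: q/k/v have the same total size
    regardless of how they are split across heads.
    """
    n_inner = n_inner or 3 * n_embd
    embed = vocab_size * n_embd + n_positions * n_embd  # token + position tables
    attn = 4 * n_embd * n_embd + 4 * n_embd             # q/k/v/out weights + biases
    mlp = 2 * n_embd * n_inner + n_inner + n_embd       # up/down projections + biases
    norms = 2 * (2 * n_embd)                            # two LayerNorms per block
    lm_head = 0 if tie_weights else vocab_size * n_embd
    return embed + n_layer * (attn + mlp + norms) + 2 * n_embd + lm_head

tied = rough_param_count(1200, 128, 4, 4)
untied = rough_param_count(1200, 128, 4, 4, tie_weights=False)
print(f"{tied:,} params tied; tying saves {untied - tied:,}")
```

With the default shapes this lands comfortably under the 1M budget, and the tied/untied difference reproduces the ~154k saving.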
## Customization

### Custom Tokenizer

The template provides a move-level tokenizer that builds its vocabulary from the actual dataset.
Feel free to try different approaches!

### Custom Architecture

Modify the model in `src/model.py`:

```python
from src import ChessConfig, ChessForCausalLM

# Customize configuration
config = ChessConfig(
    vocab_size=1200,
    n_embd=128,      # Try 96, 128, or 192
    n_layer=4,       # Try 3, 4, or 6
    n_head=4,        # Try 4 or 8
    n_inner=384,     # Feed-forward dimension (default: 3*n_embd)
    dropout=0.1,
    tie_weights=True,
)

model = ChessForCausalLM(config)
```
## Evaluation Metrics

### Phase 1: Legal Move Evaluation

Tests if your model generates valid chess moves:

| Metric | Description |
|--------|-------------|
| **Legal Rate (1st try)** | % of legal moves on first attempt |
| **Legal Rate (with retry)** | % of legal moves within 3 attempts |

> **Target**: >90% legal rate before proceeding to Phase 2

### Phase 2: Win Rate Evaluation

Full games against Stockfish to measure playing strength:

| Metric | Description |
|--------|-------------|
| **Win Rate** | % of games won against Stockfish |
| **ELO Rating** | Estimated rating based on game results |
| **Avg Game Length** | Average number of moves per game |
| **Illegal Move Rate** | % of illegal moves during games |
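An ELO estimate of this kind is typically obtained by inverting the expected-score formula against an opponent of known strength. A sketch (the 800 reference rating for Stockfish Level 1 is an assumption for illustration, not a value from the template):

```python
import math

def elo_estimate(score: float, opponent_elo: float = 800.0) -> float:
    """Invert the ELO expected-score model E = 1 / (1 + 10**((Ro - Rp) / 400)).

    `score` is (wins + 0.5 * draws) / games; clamped to avoid infinities
    at 0% or 100%.
    """
    score = min(max(score, 0.01), 0.99)
    return opponent_elo + 400.0 * math.log10(score / (1.0 - score))

print(round(elo_estimate(0.5)))   # an even score implies the opponent's rating: 800
```

A 50% score maps exactly to the opponent's rating, and each additional 400 points of estimated difference corresponds to 10:1 odds.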
## Submission

1. Train your model
2. Log in to Hugging Face: `hf auth login`
3. Submit your model using the submission script:

```bash
python submit.py --model_path ./my_model/final_model --model_name your-model-name
```

The script will:

- Upload your model to the LLM-course organization
- Include your HF username in the model card for tracking
app.py
CHANGED

````diff
@@ -591,7 +591,13 @@ with gr.Blocks(
 The goal is to create a chess-playing language model with **under 1 million parameters**, which is roughly the number of neurons in a honey bee's brain.
 At this scale, efficiency and clever architecture choices are key! We are not targetting superhuman performance, but rather exploring how well small models can learn the rules of chess, the goal being (only) to play **legal moves**.
 
-
+0. **Clone this repository**:
+```bash
+git clone https://huggingface.co/spaces/LLM-course/Chess1MChallenge
+```
+and check the `TEMPLATE_README.md` for detailed instructions.
+
+1. **Train your model**
 
 2. **Push to Hugging Face Hub** using the `submit.py` script provided in the template to make sure that your model is registered correctly.
 
````
pyproject.toml
ADDED

```toml
[build-system]
requires = ["setuptools>=61.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "chess-challenge"
version = "0.1.0"
description = "LLM Chess Challenge - Train a 1M parameter model to play chess"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.10"
authors = [
    {name = "Nathanaël Fijalkow", email = "nathanael.fijalkow@gmail.com"}
]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Education",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
dependencies = [
    "torch>=2.0.0",
    "transformers>=4.40.0",
    "accelerate>=0.26.0",
    "datasets>=2.14.0",
    "python-chess>=1.999",
    "huggingface-hub>=0.20.0",
    "tqdm>=4.65.0",
    "numpy>=1.24.0",
    "wandb>=0.15.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
]
eval = [
    "stockfish>=3.28.0",
]

[project.scripts]
chess-train = "src.train:main"
chess-eval = "src.evaluate:main"

[tool.setuptools.packages.find]
where = ["src"]

[tool.black]
line-length = 100
target-version = ["py310"]

[tool.ruff]
line-length = 100
select = ["E", "F", "I", "W"]
```
src/__init__.py
CHANGED

```diff
@@ -2,7 +2,16 @@
 
 from .model import ChessConfig, ChessForCausalLM
 from .tokenizer import ChessTokenizer
-
+
+# Lazy import for evaluate to avoid RuntimeWarning when running as module
+def __getattr__(name):
+    if name == "ChessEvaluator":
+        from .evaluate import ChessEvaluator
+        return ChessEvaluator
+    if name == "load_model_from_hub":
+        from .evaluate import load_model_from_hub
+        return load_model_from_hub
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
 
 __all__ = [
     "ChessConfig",
```
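The `__getattr__` hook added here is module-level attribute lookup (PEP 562): `from src import ChessEvaluator` only imports `src.evaluate` at access time, so `python -m src.evaluate` does not eagerly re-import the module it is running. A standalone illustration of the mechanism with toy names (not the template's own code):

```python
# A module that defers a "heavy" import until first attribute access (PEP 562).
import sys
import types

mod = types.ModuleType("lazymod")

def _getattr(name):
    if name == "sqrt":            # pretend `math` were expensive to import
        import math
        return math.sqrt
    raise AttributeError(f"module 'lazymod' has no attribute {name!r}")

# Placing __getattr__ in the module's namespace activates the PEP 562 hook.
mod.__getattr__ = _getattr
sys.modules["lazymod"] = mod

import lazymod
print(lazymod.sqrt(9.0))   # 3.0 - `math` was imported only at this access
```

In a real package the same effect comes from defining `__getattr__` at the top level of `__init__.py`, exactly as the diff above does.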
src/data.py
ADDED

```python
"""
Data loading utilities for the Chess Challenge.

This module provides functions to load and process chess game data
from the Lichess dataset on Hugging Face.
"""

from __future__ import annotations

from typing import Dict, Iterator, List, Optional

import torch
from torch.utils.data import Dataset


class ChessDataset(Dataset):
    """
    PyTorch Dataset for chess games.

    This dataset loads games from a Hugging Face dataset and prepares
    them for language modeling training.

    Each game is tokenized and truncated/padded to max_length.
    The labels are shifted by one position for next-token prediction.

    Example:
        >>> from src.tokenizer import ChessTokenizer
        >>> tokenizer = ChessTokenizer.build_vocab_from_dataset()
        >>> dataset = ChessDataset(tokenizer, max_length=256)
        >>> sample = dataset[0]
        >>> print(sample["input_ids"].shape)  # (256,)
    """

    def __init__(
        self,
        tokenizer,
        dataset_name: str = "dlouapre/lichess_2025-01_1M",
        split: str = "train",
        column: str = "text",
        max_length: int = 256,
        max_samples: Optional[int] = None,
    ):
        """
        Initialize the chess dataset.

        Args:
            tokenizer: The chess tokenizer to use.
            dataset_name: Name of the dataset on Hugging Face Hub.
            split: Dataset split to use.
            column: Column containing the game strings.
            max_length: Maximum sequence length.
            max_samples: Maximum number of samples to load.
        """
        from datasets import load_dataset

        self.tokenizer = tokenizer
        self.max_length = max_length
        self.column = column

        # Load dataset
        dataset = load_dataset(dataset_name, split=split)

        if max_samples is not None:
            dataset = dataset.select(range(min(max_samples, len(dataset))))

        self.data = dataset

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        game = self.data[idx][self.column]

        # Prepend BOS token for proper language modeling
        game_with_bos = self.tokenizer.bos_token + " " + game

        # Tokenize
        encoding = self.tokenizer(
            game_with_bos,
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )

        # Squeeze batch dimension
        input_ids = encoding["input_ids"].squeeze(0)
        attention_mask = encoding["attention_mask"].squeeze(0)

        # Labels are the same as input_ids (model will shift internally)
        labels = input_ids.clone()

        # Set padding tokens to -100 to ignore in loss
        labels[attention_mask == 0] = -100

        return {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "labels": labels,
        }


class ChessDataCollator:
    """
    Data collator for chess games.

    This collator pads sequences to the same length within a batch
    and creates the appropriate attention masks.
    """

    def __init__(self, tokenizer, max_length: int = 256):
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __call__(self, features: List[Dict]) -> Dict[str, torch.Tensor]:
        # Stack tensors
        input_ids = torch.stack([f["input_ids"] for f in features])
        attention_mask = torch.stack([f["attention_mask"] for f in features])
        labels = torch.stack([f["labels"] for f in features])

        return {
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "labels": labels,
        }


def create_train_val_datasets(
    tokenizer,
    dataset_name: str = "dlouapre/lichess_2025-01_1M",
    max_length: int = 256,
    train_samples: Optional[int] = None,
    val_samples: int = 5000,
    val_ratio: float = 0.05,
):
    """
    Create training and validation datasets.

    Args:
        tokenizer: The chess tokenizer.
        dataset_name: Name of the dataset.
        max_length: Maximum sequence length.
        train_samples: Maximum training samples (None for all).
        val_samples: Number of validation samples.
        val_ratio: Ratio of validation samples (used if train_samples is None).

    Returns:
        Tuple of (train_dataset, val_dataset).
    """
    from datasets import load_dataset

    # Load full dataset
    full_dataset = load_dataset(dataset_name, split="train")

    # Determine split sizes
    total = len(full_dataset)

    if train_samples is not None:
        n_train = min(train_samples, total - val_samples)
    else:
        n_train = int(total * (1 - val_ratio))

    n_val = min(val_samples, total - n_train)

    # Split dataset
    train_data = full_dataset.select(range(n_train))
    val_data = full_dataset.select(range(n_train, n_train + n_val))

    # Create dataset objects
    train_dataset = ChessDataset(
        tokenizer=tokenizer,
        dataset_name=dataset_name,
        max_length=max_length,
    )
    train_dataset.data = train_data

    val_dataset = ChessDataset(
        tokenizer=tokenizer,
        dataset_name=dataset_name,
        max_length=max_length,
    )
    val_dataset.data = val_data

    return train_dataset, val_dataset


def stream_games(
    dataset_name: str = "dlouapre/lichess_2025-01_1M",
    split: str = "train",
    column: str = "text",
) -> Iterator[str]:
    """
    Stream games from the dataset for memory-efficient processing.

    Args:
        dataset_name: Name of the dataset on Hugging Face Hub.
        split: Dataset split to use.
        column: Column containing the game strings.

    Yields:
        Game strings one at a time.
    """
    from datasets import load_dataset

    dataset = load_dataset(dataset_name, split=split, streaming=True)

    for example in dataset:
        yield example[column]


def analyze_dataset_statistics(
    dataset_name: str = "dlouapre/lichess_2025-01_1M",
    max_samples: int = 10000,
) -> Dict:
    """
    Analyze statistics of the chess dataset.

    Args:
        dataset_name: Name of the dataset.
        max_samples: Maximum number of samples to analyze.

    Returns:
        Dictionary containing dataset statistics.
    """
    from collections import Counter
    from datasets import load_dataset

    dataset = load_dataset(dataset_name, split="train")
    dataset = dataset.select(range(min(max_samples, len(dataset))))

    game_lengths = []
    move_counts = Counter()
    opening_moves = Counter()

    for example in dataset:
        moves = example["text"].strip().split()
        game_lengths.append(len(moves))
        move_counts.update(moves)

        # Track common openings (first 4 moves)
        if len(moves) >= 4:
            opening = " ".join(moves[:4])
            opening_moves[opening] += 1

    return {
        "total_games": len(dataset),
        "avg_game_length": sum(game_lengths) / len(game_lengths),
        "min_game_length": min(game_lengths),
        "max_game_length": max(game_lengths),
        "unique_moves": len(move_counts),
        "most_common_moves": move_counts.most_common(20),
        "most_common_openings": opening_moves.most_common(10),
    }
```
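The `-100` sentinel written into `labels` in `__getitem__` is the default `ignore_index` of PyTorch's cross-entropy loss, so padded positions contribute nothing to the training loss. The masking rule in plain Python (names and shapes are illustrative, mirroring the dataset code without torch):

```python
# Mirror of the ChessDataset labeling rule: labels copy input_ids, but
# positions masked out by attention_mask become -100 so a loss with
# ignore_index=-100 skips them.
def make_labels(input_ids, attention_mask, ignore_index=-100):
    return [
        tok if mask == 1 else ignore_index
        for tok, mask in zip(input_ids, attention_mask)
    ]

input_ids      = [1, 57, 312, 88, 0, 0]   # last two positions are [PAD]
attention_mask = [1,  1,   1,  1, 0, 0]
print(make_labels(input_ids, attention_mask))
# [1, 57, 312, 88, -100, -100]
```

The tensor version in `ChessDataset` does exactly this with `labels[attention_mask == 0] = -100`.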
src/evaluate.py
CHANGED

```diff
@@ -9,6 +9,7 @@ from __future__ import annotations
 
 import argparse
 import random
+
 from dataclasses import dataclass
 from typing import List, Optional, Tuple
 
@@ -23,16 +24,23 @@ class GameResult:
     model_color: str  # "white" or "black"
     termination: str  # "checkmate", "stalemate", "illegal_move", "max_moves", etc.
     illegal_move_count: int
-
-
 class ChessEvaluator:
     """
     Evaluator for chess models.
 
     This class handles playing games between a trained model and Stockfish,
     tracking results, and computing ELO ratings.
     """
 
     def __init__(
         self,
         model,
@@ -88,10 +96,100 @@ class ChessEvaluator:
         if hasattr(self, 'engine') and self.engine:
             self.engine.quit()
 
     def _convert_board_to_moves(self, board) -> str:
-        """
         moves = []
         temp_board = self.chess.Board()
 
         for move in board.move_stack:
             # Get piece and color
@@ -103,29 +201,44 @@ class ChessEvaluator:
             from_sq = self.chess.square_name(move.from_square)
             to_sq = self.chess.square_name(move.to_square)
 
-
-
-            # Add promotion
             if move.promotion:
-
-
-            # Add capture suffix
-            if temp_board.is_capture(move):
-                move_str += "(x)"
 
-            #
-
-            if temp_board.is_checkmate():
-                move_str = move_str.replace("(x)", "(x+*)") if "(x)" in move_str else move_str + "(+*)"
-            elif temp_board.is_check():
-                move_str = move_str.replace("(x)", "(x+)") if "(x)" in move_str else move_str + "(+)"
 
-            #
-            if
-
 
             moves.append(move_str)
 
@@ -160,6 +273,65 @@ class ChessEvaluator:
 
         return False
 
     def _generate_move_tokens(
         self,
         input_ids: torch.Tensor,
@@ -168,11 +340,12 @@ class ChessEvaluator:
         max_tokens: int = 20,
     ) -> str:
         """
-        Generate tokens until a
 
-        This method
-
 
         Args:
             input_ids: The input token IDs.
@@ -181,10 +354,11 @@ class ChessEvaluator:
            max_tokens: Maximum tokens to generate for a single move.
 
         Returns:
-            The generated move string
         """
         generated_tokens = []
         current_ids = input_ids.clone()
 
         for _ in range(max_tokens):
             with torch.no_grad():
```
```diff
@@ -193,31 +367,47 @@ class ChessEvaluator:
 
                 # Apply top-k filtering
                 if top_k > 0:
-
-                    indices_to_remove = logits <
                     logits[indices_to_remove] = float("-inf")
 
                 # Sample
                 probs = torch.softmax(logits, dim=-1)
-                next_token = torch.multinomial(probs, num_samples=1)
 
                 # Decode the token
                 token_str = self.tokenizer.decode(next_token[0])
 
                 # Check if this is a separator token
                 if self._is_separator_token(token_str):
 
-
                 current_ids = torch.cat([current_ids, next_token], dim=-1)
 
-                #
 
             # Decode all generated tokens together
             if generated_tokens:
```
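The top-k step reworked in this hunk keeps only the k largest logits, sets the rest to negative infinity, and samples from the temperature-scaled softmax over what remains. The same logic in plain Python (a toy sketch with list logits; the template's version operates on torch tensors and its exact cutoff expression is not recoverable here):

```python
import math
import random

def sample_top_k(logits, k=3, temperature=0.7, rng=random.Random(0)):
    """Temperature-scaled softmax sampling restricted to the top-k logits."""
    cutoff = sorted(logits, reverse=True)[k - 1]           # k-th largest logit
    filtered = [x if x >= cutoff else float("-inf") for x in logits]
    scaled = [x / temperature for x in filtered]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]               # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

idx = sample_top_k([2.0, 0.1, -1.0, 1.5, 0.3], k=2)
print(idx)   # only indices 0 or 3 can ever be drawn
```

Masked positions get probability exactly zero because `exp(-inf)` is 0.0, which is the same effect the tensor code achieves with `float("-inf")` before `torch.softmax`.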
```diff
@@ -236,11 +426,15 @@ class ChessEvaluator:
         """
         Get the model's next move prediction.
 
-        This method
-
 
         Returns:
             Tuple of (UCI move string, number of retries used).
@@ -257,32 +451,26 @@ class ChessEvaluator:
         input_text = self.tokenizer.bos_token + " " + moves_str
 
         # Tokenize
-        max_len = getattr(self.model.config, 'n_ctx', None) or getattr(self.model.config, 'max_position_embeddings', 256)
         inputs = self.tokenizer(
             input_text,
             return_tensors="pt",
             truncation=True,
-            max_length=
         ).to(self.device)
 
         # Try to generate a legal move
         for retry in range(self.max_retries):
-            # Generate tokens until
                 inputs["input_ids"],
                 temperature=temperature,
                 top_k=top_k,
             )
 
-            # Handle promotion
-            if "=" in move_token:
-                promo_idx = move_token.index("=")
-                uci_move += move_token[promo_idx + 1].lower()
-
             try:
                 move = self.chess.Move.from_uci(uci_move)
                 if move in board.legal_moves:
```
```diff
@@ -390,7 +578,6 @@ class ChessEvaluator:
         n_positions: int = 1000,
         temperature: float = 0.7,
         verbose: bool = True,
-        seed: int = 42,
     ) -> dict:
         """
         Evaluate the model's ability to generate legal moves.
@@ -402,14 +589,10 @@ class ChessEvaluator:
             n_positions: Number of positions to test.
             temperature: Sampling temperature.
             verbose: Whether to print progress.
-            seed: Random seed for reproducibility.
 
         Returns:
             Dictionary with legal move statistics.
         """
-        # Set seed for deterministic evaluation
-        random.seed(seed)
-
         results = {
             "total_positions": 0,
             "legal_first_try": 0,
```
```diff
@@ -572,73 +755,24 @@ def load_model_from_hub(model_id: str, device: str = "auto"):
     Returns:
         Tuple of (model, tokenizer).
     """
-    import
-    from huggingface_hub import hf_hub_download
-    from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
 
-    # Import custom classes
-
-
-        from src.tokenizer import ChessTokenizer
-    except ImportError:
-        from .model import ChessConfig, ChessForCausalLM
-        from .tokenizer import ChessTokenizer
 
-    #
     try:
-
-    except
-
-        try:
-            AutoModelForCausalLM.register(ChessConfig, ChessForCausalLM)
-        except ValueError:
-            pass
-
-    print(f"Loading model {model_id}...")
-
-    # Download and load config manually to avoid transformers auto-detection issues
-    config_path = hf_hub_download(repo_id=model_id, filename="config.json")
-    with open(config_path, "r") as f:
-        config_dict = json.load(f)
-
-    # Remove fields that are not in ChessConfig to avoid unexpected kwargs
-    config_dict.pop("model_type", None)
-    config_dict.pop("architectures", None)
-    config_dict.pop("transformers_version", None)
-    config_dict.pop("dtype", None)
-    config_dict.pop("torch_dtype", None)
-
-    config = ChessConfig(**config_dict)
 
-    model = ChessForCausalLM.from_pretrained(
         model_id,
-
         device_map=device,
     )
 
-    # Load tokenizer - try to find vocab.json, else build default
-    try:
-        tokenizer = ChessTokenizer.from_pretrained(model_id)
-    except Exception as e:
-        print(f"ChessTokenizer.from_pretrained failed: {e}")
-        try:
-            tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
-        except Exception as e2:
-            print(f"AutoTokenizer also failed: {e2}")
-            print("Creating default tokenizer with vocab_size from config...")
-            # Create a minimal tokenizer with just the vocab size
-            tokenizer = ChessTokenizer()
-            # Ensure vocab size matches model
-            if hasattr(config, 'vocab_size'):
-                # Build a placeholder vocab of the right size
-                tokenizer._vocab = {f"[MOVE_{i}]": i for i in range(config.vocab_size)}
-                tokenizer._vocab["[PAD]"] = 0
-                tokenizer._vocab["[BOS]"] = 1
-                tokenizer._vocab["[EOS]"] = 2
-                tokenizer._vocab["[UNK]"] = 3
-                tokenizer._ids_to_tokens = {v: k for k, v in tokenizer._vocab.items()}
-
     return model, tokenizer
```
@@ -684,21 +818,25 @@ def main():
     # Load model
     print(f"\nLoading model from: {args.model_path}")

     # Local path
     from transformers import AutoModelForCausalLM
-    try:
-        from src.tokenizer import ChessTokenizer
-        from src.model import ChessConfig, ChessForCausalLM
-    except ImportError:
-        from .tokenizer import ChessTokenizer
-        from .model import ChessConfig, ChessForCausalLM

     tokenizer = ChessTokenizer.from_pretrained(args.model_path)
     model = AutoModelForCausalLM.from_pretrained(args.model_path)

     # Create evaluator
     print(f"\nSetting up evaluator...")
@@ +9 @@

 import argparse
 import random
+import re
 from dataclasses import dataclass
 from typing import List, Optional, Tuple

@@ +24 @@
     model_color: str  # "white" or "black"
     termination: str  # "checkmate", "stalemate", "illegal_move", "max_moves", etc.
     illegal_move_count: int
+
+
 class ChessEvaluator:
     """
     Evaluator for chess models.

     This class handles playing games between a trained model and Stockfish,
     tracking results, and computing ELO ratings.
+
+    Supports any tokenization format as long as the model generates valid
+    chess squares (e.g., e2, e4). The evaluator extracts UCI moves by finding
+    square patterns in the generated output.
     """

+    # Regex pattern to match chess squares
+    SQUARE_PATTERN = r'[a-h][1-8]'
+
     def __init__(
         self,
         model,
|
| 97 |
self.engine.quit()
|
| 98 |
|
| 99 |
+
def _detect_tokenizer_format(self) -> str:
|
| 100 |
+
"""
|
| 101 |
+
Detect the tokenizer's expected move format by testing tokenization.
|
| 102 |
+
|
| 103 |
+
Tests various formats with a sample move and picks the one that
|
| 104 |
+
produces the fewest unknown tokens. This makes evaluation work
|
| 105 |
+
with any tokenizer format.
|
| 106 |
+
|
| 107 |
+
Supported formats:
|
| 108 |
+
- 'decomposed': "WP e2_f e4_t" (piece, from_suffix, to_suffix)
|
| 109 |
+
- 'standard': "WPe2e4" (combined with optional annotations)
|
| 110 |
+
- 'uci': "e2e4" (pure UCI notation)
|
| 111 |
+
- 'uci_spaced': "e2 e4" (UCI with space separator)
|
| 112 |
+
|
| 113 |
+
Returns:
|
| 114 |
+
The format string that best matches the tokenizer's vocabulary.
|
| 115 |
+
"""
|
| 116 |
+
if hasattr(self, '_cached_format'):
|
| 117 |
+
return self._cached_format
|
| 118 |
+
|
| 119 |
+
# Sample move representations to test
|
| 120 |
+
test_formats = {
|
| 121 |
+
'decomposed': "WP e2_f e4_t",
|
| 122 |
+
'standard': "WPe2e4",
|
| 123 |
+
'uci': "e2e4",
|
| 124 |
+
'uci_spaced': "e2 e4",
|
| 125 |
+
}
|
| 126 |
+
|
| 127 |
+
unk_token_id = getattr(self.tokenizer, 'unk_token_id', None)
|
| 128 |
+
best_format = 'standard'
|
| 129 |
+
min_unk_count = float('inf')
|
| 130 |
+
|
| 131 |
+
for fmt, sample in test_formats.items():
|
| 132 |
+
try:
|
| 133 |
+
tokens = self.tokenizer.encode(sample, add_special_tokens=False)
|
| 134 |
+
# Count unknown tokens
|
| 135 |
+
unk_count = tokens.count(unk_token_id) if unk_token_id is not None else 0
|
| 136 |
+
# Also penalize if the entire thing became one UNK
|
| 137 |
+
if len(tokens) == 1 and unk_count == 1:
|
| 138 |
+
unk_count = 100 # Heavy penalty
|
| 139 |
+
|
| 140 |
+
if unk_count < min_unk_count:
|
| 141 |
+
min_unk_count = unk_count
|
| 142 |
+
best_format = fmt
|
| 143 |
+
except Exception:
|
| 144 |
+
continue
|
| 145 |
+
|
| 146 |
+
self._cached_format = best_format
|
| 147 |
+
return best_format
|
| 148 |
+
|
| 149 |
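The UNK-counting heuristic above can be exercised standalone. The sketch below mirrors the selection logic with a hypothetical toy `encode` function (not the repo's tokenizer): encode each candidate sample and keep the format that produces the fewest unknown-token ids.

```python
def detect_format(encode, unk_id):
    """Pick the move format whose sample encodes with the fewest UNK tokens."""
    samples = {
        "decomposed": "WP e2_f e4_t",
        "standard": "WPe2e4",
        "uci": "e2e4",
        "uci_spaced": "e2 e4",
    }
    best_fmt, min_unk = "standard", float("inf")
    for fmt, text in samples.items():
        ids = encode(text)
        unk = ids.count(unk_id)
        if len(ids) == 1 and unk == 1:
            unk = 100  # whole sample collapsed into a single UNK: heavy penalty
        if unk < min_unk:
            min_unk, best_fmt = unk, fmt
    return best_fmt

# Toy whitespace tokenizer that only knows the token "e2e4" (id 5); 0 is UNK.
vocab = {"e2e4": 5}
encode = lambda text: [vocab.get(tok, 0) for tok in text.split()]

print(detect_format(encode, 0))  # uci
```

With this vocabulary, only the pure-UCI sample encodes without unknowns, so `'uci'` wins; a tokenizer trained on `"WPe2e4"`-style tokens would instead select `'standard'`.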
+
def _format_move(self, color: str, piece: str, from_sq: str, to_sq: str,
|
| 150 |
+
promotion: str = None) -> str:
|
| 151 |
+
"""
|
| 152 |
+
Format a single move according to the detected tokenizer format.
|
| 153 |
+
|
| 154 |
+
Args:
|
| 155 |
+
color: 'W' or 'B'
|
| 156 |
+
piece: Piece letter (P, N, B, R, Q, K)
|
| 157 |
+
from_sq: Source square (e.g., 'e2')
|
| 158 |
+
to_sq: Destination square (e.g., 'e4')
|
| 159 |
+
promotion: Promotion piece letter or None
|
| 160 |
+
|
| 161 |
+
Returns:
|
| 162 |
+
Formatted move string.
|
| 163 |
+
"""
|
| 164 |
+
fmt = self._detect_tokenizer_format()
|
| 165 |
+
|
| 166 |
+
if fmt == 'decomposed':
|
| 167 |
+
move_str = f"{color}{piece} {from_sq}_f {to_sq}_t"
|
| 168 |
+
elif fmt == 'uci':
|
| 169 |
+
move_str = f"{from_sq}{to_sq}"
|
| 170 |
+
if promotion:
|
| 171 |
+
move_str += promotion.lower()
|
| 172 |
+
elif fmt == 'uci_spaced':
|
| 173 |
+
move_str = f"{from_sq} {to_sq}"
|
| 174 |
+
if promotion:
|
| 175 |
+
move_str += f" {promotion.lower()}"
|
| 176 |
+
else: # standard
|
| 177 |
+
move_str = f"{color}{piece}{from_sq}{to_sq}"
|
| 178 |
+
if promotion:
|
| 179 |
+
move_str += f"={promotion}"
|
| 180 |
+
|
| 181 |
+
return move_str
|
| 182 |
+
|
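As a quick check of the four output shapes, here is a dependency-free version of the same dispatch (`format_move` is a standalone stand-in for the method above):

```python
def format_move(fmt, color, piece, from_sq, to_sq, promotion=None):
    """Render one move in the given format (stand-in for _format_move)."""
    if fmt == "decomposed":
        return f"{color}{piece} {from_sq}_f {to_sq}_t"
    if fmt == "uci":
        return f"{from_sq}{to_sq}" + (promotion.lower() if promotion else "")
    if fmt == "uci_spaced":
        return f"{from_sq} {to_sq}" + (f" {promotion.lower()}" if promotion else "")
    # standard: combined token, promotion spelled with '='
    return f"{color}{piece}{from_sq}{to_sq}" + (f"={promotion}" if promotion else "")

print(format_move("standard", "W", "P", "e7", "e8", "Q"))  # WPe7e8=Q
print(format_move("decomposed", "W", "P", "e2", "e4"))     # WP e2_f e4_t
print(format_move("uci", "W", "P", "a7", "a8", "Q"))       # a7a8q
```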
     def _convert_board_to_moves(self, board) -> str:
+        """
+        Convert board move history to model input format.
+
+        Automatically detects the tokenizer's expected format and outputs
+        moves accordingly. Supports any tokenization strategy.
+        """
         moves = []
         temp_board = self.chess.Board()
+        fmt = self._detect_tokenizer_format()

         for move in board.move_stack:
             # Get piece and color
@@ +201 @@
             from_sq = self.chess.square_name(move.from_square)
             to_sq = self.chess.square_name(move.to_square)

+            # Get promotion piece if any
+            promo = None
             if move.promotion:
+                promo = self.chess.piece_symbol(move.promotion).upper()

+            # Format based on detected tokenizer format
+            move_str = self._format_move(color, piece_letter, from_sq, to_sq, promo)

+            # For standard format, add annotations (capture, check, castling)
+            if fmt == 'standard':
+                # Add capture suffix
+                if temp_board.is_capture(move):
+                    move_str += "(x)"
+
+                # Push move to check for check/checkmate
+                temp_board.push(move)
+
+                if temp_board.is_checkmate():
+                    if "(x)" in move_str:
+                        move_str = move_str.replace("(x)", "(x+*)")
+                    else:
+                        move_str += "(+*)"
+                elif temp_board.is_check():
+                    if "(x)" in move_str:
+                        move_str = move_str.replace("(x)", "(x+)")
+                    else:
+                        move_str += "(+)"
+
+                # Handle castling notation
+                if piece_letter == "K":
+                    if abs(ord(from_sq[0]) - ord(to_sq[0])) > 1:
+                        if to_sq[0] == 'g':  # Kingside
+                            move_str = move_str.split("(")[0] + "(o)"
+                        else:  # Queenside
+                            move_str = move_str.split("(")[0] + "(O)"
+            else:
+                # For non-standard formats, just push the move
+                temp_board.push(move)

             moves.append(move_str)

@@ +273 @@

         return False
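The standard-format suffix rules above — `(x)` for capture, `(+)` for check, `(+*)` for mate, merged into `(x+)`/`(x+*)` when a capture also gives check or mate — reduce to a small string transform. A standalone sketch, independent of python-chess (the boolean flags stand in for `temp_board.is_capture`/`is_check`/`is_checkmate`):

```python
def annotate(move_str, is_capture, is_check, is_mate):
    """Apply standard-format suffixes: (x) capture, (+) check, (+*) mate."""
    if is_capture:
        move_str += "(x)"
    if is_mate:
        move_str = move_str.replace("(x)", "(x+*)") if "(x)" in move_str else move_str + "(+*)"
    elif is_check:
        move_str = move_str.replace("(x)", "(x+)") if "(x)" in move_str else move_str + "(+)"
    return move_str

print(annotate("WQd1h5", is_capture=True, is_check=True, is_mate=False))  # WQd1h5(x+)
print(annotate("WQh5f7", is_capture=True, is_check=False, is_mate=True))  # WQh5f7(x+*)
```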
+    def _extract_uci_move(self, text: str) -> Optional[str]:
+        """
+        Extract a UCI move from generated text using pattern matching.
+
+        This generic method works with any tokenization format by finding
+        chess square patterns ([a-h][1-8]) in the output.
+
+        Supported formats include:
+        - Standard: "WPe2e4" -> "e2e4"
+        - Decomposed: "WP e2_f e4_t" -> "e2e4"
+        - Pure UCI: "e2e4" -> "e2e4"
+        - With separators: "e2-e4", "e2 e4" -> "e2e4"
+        - With promotion: "e7e8=Q", "e7e8q" -> "e7e8q"
+
+        Args:
+            text: The generated text containing a move.
+
+        Returns:
+            UCI move string (e.g., "e2e4", "e7e8q") or None if not found.
+        """
+        if not text:
+            return None
+
+        # Find all squares in the text
+        squares = re.findall(self.SQUARE_PATTERN, text)
+
+        if len(squares) < 2:
+            return None
+
+        # Take the first two squares as from and to
+        from_sq, to_sq = squares[0], squares[1]
+        uci_move = from_sq + to_sq
+
+        # Check for promotion (letter after to_square)
+        # Look for patterns like "=Q", "=q", or just "q" after the to_square
+        to_sq_idx = text.find(to_sq)
+        if to_sq_idx != -1:
+            remaining = text[to_sq_idx + 2:to_sq_idx + 5]  # Check next few chars
+            promo_match = re.search(r'[=]?([qrbnQRBN])', remaining)
+            if promo_match:
+                uci_move += promo_match.group(1).lower()
+
+        return uci_move
+
+    def _has_complete_move(self, text: str) -> bool:
+        """
+        Check if the generated text contains a complete move.
+
+        A complete move has at least two valid chess squares.
+
+        Args:
+            text: The generated text so far.
+
+        Returns:
+            True if text contains at least two squares.
+        """
+        squares = re.findall(self.SQUARE_PATTERN, text)
+        return len(squares) >= 2
+
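The extraction logic is plain `re`; a standalone version of the same steps (module-level `SQUARE` mirrors `SQUARE_PATTERN`, and the promotion lookup copies the method's "scan a few characters past the to-square" approach):

```python
import re

SQUARE = r"[a-h][1-8]"  # mirrors ChessEvaluator.SQUARE_PATTERN

def extract_uci(text):
    """First two squares give from/to; a q/r/b/n right after the to-square is a promotion."""
    squares = re.findall(SQUARE, text or "")
    if len(squares) < 2:
        return None
    uci = squares[0] + squares[1]
    idx = text.find(squares[1])
    promo = re.search(r"=?([qrbnQRBN])", text[idx + 2 : idx + 5])
    if promo:
        uci += promo.group(1).lower()
    return uci

print(extract_uci("WP e2_f e4_t"))  # e2e4
print(extract_uci("e7e8=Q"))        # e7e8q
print(extract_uci("e2"))            # None
```

Note that, like the method above, this takes the *first* occurrence of the to-square string when hunting for a promotion letter, which is a heuristic rather than a full parser.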
     def _generate_move_tokens(
         self,
         input_ids: torch.Tensor,
@@ +340 @@
         max_tokens: int = 20,
     ) -> str:
         """
+        Generate tokens until a complete move is detected or separator is hit.

+        This method is tokenizer-agnostic and stops when:
+        - A separator token (whitespace/EOS) is encountered
+        - Two chess squares have been generated (complete move)
+        - max_tokens limit is reached

         Args:
             input_ids: The input token IDs.
@@ +354 @@
             max_tokens: Maximum tokens to generate for a single move.

         Returns:
+            The generated move string.
         """
         generated_tokens = []
         current_ids = input_ids.clone()
+        accumulated_text = ""

         for _ in range(max_tokens):
             with torch.no_grad():
@@ +367 @@

                 # Apply top-k filtering
                 if top_k > 0:
+                    top_k_vals = torch.topk(logits, min(top_k, logits.size(-1)))
+                    indices_to_remove = logits < top_k_vals[0][..., -1, None]
                     logits[indices_to_remove] = float("-inf")

                 # Sample
                 probs = torch.softmax(logits, dim=-1)
+                next_token = torch.multinomial(probs, num_samples=1)

                 # Decode the token
                 token_str = self.tokenizer.decode(next_token[0])

                 # Check if this is a separator token
                 if self._is_separator_token(token_str):
+                    # If we already have a complete move, stop
+                    if self._has_complete_move(accumulated_text):
+                        break
+                    # Otherwise, if it's EOS, we should also stop
+                    if hasattr(self.tokenizer, 'eos_token'):
+                        if token_str == self.tokenizer.eos_token:
+                            break
+                    # For whitespace separators, only stop if we have content
+                    if accumulated_text:
+                        break

+                generated_tokens.append(next_token[0])
                 current_ids = torch.cat([current_ids, next_token], dim=-1)
+                accumulated_text += token_str

+                # Stop if we have a complete move (two squares found)
+                if self._has_complete_move(accumulated_text):
+                    # Check if this might be a promotion - peek for one more token
+                    # if the move is to rank 1 or 8
+                    squares = re.findall(self.SQUARE_PATTERN, accumulated_text)
+                    if len(squares) >= 2:
+                        to_sq = squares[1]
+                        if to_sq[1] in '18':  # Potential promotion
+                            # Allow one more iteration to capture promotion piece
+                            if len(generated_tokens) > 3:  # Already have enough
+                                break
+                        else:
+                            break

         # Decode all generated tokens together
         if generated_tokens:
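The top-k masking step above keeps the k largest logits and sets the rest to `-inf` before the softmax, so only the top-k tokens receive probability mass. It can be sanity-checked without torch; a pure-Python sketch (ties at the threshold survive, matching the `<` comparison in the torch version):

```python
def top_k_filter(logits, k):
    """Keep the k largest logits; set the rest to -inf (pre-softmax mask)."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else float("-inf") for x in logits]

print(top_k_filter([2.0, 0.5, 1.0, -1.0], k=2))  # [2.0, -inf, 1.0, -inf]
```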
@@ +426 @@
         """
         Get the model's next move prediction.

+        This method is tokenizer-agnostic. It generates tokens and extracts
+        UCI moves using pattern matching on chess squares.
+
+        Works with any tokenization format:
+        - Move-level: "WPe2e4" -> e2e4
+        - Decomposed: "WP e2_f e4_t" -> e2e4
+        - Pure UCI: "e2e4" -> e2e4
+        - Character-level: "e" "2" "e" "4" -> e2e4
+        - BPE/subword: "e2" "e4" -> e2e4

         Returns:
             Tuple of (UCI move string, number of retries used).
@@ +451 @@
         input_text = self.tokenizer.bos_token + " " + moves_str

         # Tokenize
         inputs = self.tokenizer(
             input_text,
             return_tensors="pt",
             truncation=True,
+            max_length=self.model.config.n_ctx - 10,
         ).to(self.device)

         # Try to generate a legal move
         for retry in range(self.max_retries):
+            # Generate tokens until we have a move
+            move_text = self._generate_move_tokens(
                 inputs["input_ids"],
                 temperature=temperature,
                 top_k=top_k,
             )

+            # Extract UCI move using generic pattern matching
+            uci_move = self._extract_uci_move(move_text)
+
+            if uci_move:
                 try:
                     move = self.chess.Move.from_uci(uci_move)
                     if move in board.legal_moves:
@@ +578 @@
         n_positions: int = 1000,
         temperature: float = 0.7,
         verbose: bool = True,
     ) -> dict:
         """
         Evaluate the model's ability to generate legal moves.
@@ +589 @@
             n_positions: Number of positions to test.
             temperature: Sampling temperature.
             verbose: Whether to print progress.

         Returns:
             Dictionary with legal move statistics.
         """
         results = {
             "total_positions": 0,
             "legal_first_try": 0,
@@ +755 @@
     Returns:
         Tuple of (model, tokenizer).
     """
+    from transformers import AutoModelForCausalLM, AutoTokenizer

+    # Import to register custom classes
+    from src.model import ChessConfig, ChessForCausalLM
+    from src.tokenizer import ChessTokenizer

+    # Try loading with custom tokenizer first, fall back to AutoTokenizer
     try:
+        tokenizer = ChessTokenizer.from_pretrained(model_id)
+    except Exception:
+        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

+    model = AutoModelForCausalLM.from_pretrained(
         model_id,
+        trust_remote_code=True,
         device_map=device,
     )

     return model, tokenizer
@@ +818 @@
     # Load model
     print(f"\nLoading model from: {args.model_path}")

+    import os
+    is_local_path = os.path.exists(args.model_path)
+
+    if is_local_path:
         # Local path
         from transformers import AutoModelForCausalLM
+        from src.tokenizer import ChessTokenizer
+        from src.model import ChessConfig, ChessForCausalLM

         tokenizer = ChessTokenizer.from_pretrained(args.model_path)
         model = AutoModelForCausalLM.from_pretrained(args.model_path)
+    else:
+        # Assume Hugging Face model ID (or invalid path)
+        if args.model_path.startswith(".") or args.model_path.startswith("/"):
+            raise FileNotFoundError(
+                f"Local model path not found: {args.model_path}\n"
+                f"Please check that the path exists and contains model files."
+            )
+        model, tokenizer = load_model_from_hub(args.model_path)

     # Create evaluator
     print(f"\nSetting up evaluator...")
src/train.py ADDED
@@ -0,0 +1,250 @@
+"""
+Training script for the Chess Challenge.
+
+This script provides a complete training pipeline using the Hugging Face Trainer.
+Students can modify this script to experiment with different training strategies.
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import warnings
+from pathlib import Path
+
+# Suppress warnings from third-party libraries (multiprocess has Python 3.14 compat issues)
+warnings.filterwarnings("ignore", message="'return' in a 'finally' block")
+
+import torch
+from transformers import (
+    Trainer,
+    TrainingArguments,
+    set_seed,
+)
+
+from src.data import ChessDataCollator, create_train_val_datasets
+from src.model import ChessConfig, ChessForCausalLM
+from src.tokenizer import ChessTokenizer
+from src.utils import count_parameters, print_parameter_budget
+
+
+def parse_args():
+    """Parse command line arguments."""
+    parser = argparse.ArgumentParser(
+        description="Train a chess-playing language model"
+    )
+
+    # Model arguments
+    parser.add_argument(
+        "--vocab_size", type=int, default=1200,
+        help="Vocabulary size"
+    )
+    parser.add_argument(
+        "--n_embd", type=int, default=128,
+        help="Embedding dimension"
+    )
+    parser.add_argument(
+        "--n_layer", type=int, default=4,
+        help="Number of transformer layers"
+    )
+    parser.add_argument(
+        "--n_head", type=int, default=4,
+        help="Number of attention heads"
+    )
+    parser.add_argument(
+        "--n_ctx", type=int, default=256,
+        help="Maximum context length"
+    )
+    parser.add_argument(
+        "--n_inner", type=int, default=None,
+        help="Feed-forward inner dimension (default: 4 * n_embd)"
+    )
+    parser.add_argument(
+        "--dropout", type=float, default=0.1,
+        help="Dropout probability"
+    )
+    parser.add_argument(
+        "--no_tie_weights", action="store_true",
+        help="Disable weight tying between embedding and output layers"
+    )
+
+    # Data arguments
+    parser.add_argument(
+        "--dataset_name", type=str, default="dlouapre/lichess_2025-01_1M",
+        help="Name of the dataset on Hugging Face Hub"
+    )
+    parser.add_argument(
+        "--max_train_samples", type=int, default=None,
+        help="Maximum number of training samples"
+    )
+    parser.add_argument(
+        "--val_samples", type=int, default=5000,
+        help="Number of validation samples"
+    )
+
+    # Training arguments
+    parser.add_argument(
+        "--output_dir", type=str, default="./output",
+        help="Output directory for model and logs"
+    )
+    parser.add_argument(
+        "--num_train_epochs", type=int, default=3,
+        help="Number of training epochs"
+    )
+    parser.add_argument(
+        "--per_device_train_batch_size", type=int, default=32,
+        help="Training batch size per device"
+    )
+    parser.add_argument(
+        "--per_device_eval_batch_size", type=int, default=64,
+        help="Evaluation batch size per device"
+    )
+    parser.add_argument(
+        "--learning_rate", type=float, default=5e-4,
+        help="Learning rate"
+    )
+    parser.add_argument(
+        "--weight_decay", type=float, default=0.01,
+        help="Weight decay"
+    )
+    parser.add_argument(
+        "--warmup_ratio", type=float, default=0.1,
+        help="Warmup ratio"
+    )
+    parser.add_argument(
+        "--seed", type=int, default=42,
+        help="Random seed"
+    )
+
+    # Logging arguments
+    parser.add_argument(
+        "--logging_steps", type=int, default=100,
+        help="Logging frequency"
+    )
+    parser.add_argument(
+        "--eval_steps", type=int, default=500,
+        help="Evaluation frequency"
+    )
+    parser.add_argument(
+        "--save_steps", type=int, default=1000,
+        help="Checkpoint saving frequency"
+    )
+
+    return parser.parse_args()
+
+
+def main():
+    """Main training function."""
+    args = parse_args()
+
+    # Set seed for reproducibility
+    set_seed(args.seed)
+
+    print("=" * 60)
+    print("CHESS CHALLENGE - TRAINING")
+    print("=" * 60)
+
+    # Build tokenizer from dataset
+    print("\nBuilding tokenizer from dataset...")
+    tokenizer = ChessTokenizer.build_vocab_from_dataset(
+        dataset_name=args.dataset_name,
+        min_frequency=500,  # Only keep moves that appear at least 500 times
+        max_samples=100000,  # Use 100k games to build vocabulary
+    )
+    print(f"  Vocabulary size: {tokenizer.vocab_size}")
+
+    # Use the vocab size from tokenizer (override args if provided)
+    actual_vocab_size = tokenizer.vocab_size
+
+    # Create model configuration
+    print("\nCreating model configuration...")
+    config = ChessConfig(
+        vocab_size=actual_vocab_size,
+        n_embd=args.n_embd,
+        n_layer=args.n_layer,
+        n_head=args.n_head,
+        n_ctx=args.n_ctx,
+        n_inner=args.n_inner,
+        dropout=args.dropout,
+        tie_weights=not args.no_tie_weights,
+        pad_token_id=tokenizer.pad_token_id,
+        bos_token_id=tokenizer.bos_token_id,
+        eos_token_id=tokenizer.eos_token_id,
+    )
+
+    # Print parameter budget
+    print_parameter_budget(config)
+
+    # Create model
+    print("\nCreating model...")
+    model = ChessForCausalLM(config)
+    n_params = count_parameters(model)
+    print(f"  Total parameters: {n_params:,}")
+
+    if n_params > 1_000_000:
+        print("WARNING: Model exceeds 1M parameter limit!")
+    else:
+        print("✓ Model is within 1M parameter limit")
+
+    # Load datasets
+    print("\nLoading datasets...")
+    train_dataset, val_dataset = create_train_val_datasets(
+        tokenizer=tokenizer,
+        dataset_name=args.dataset_name,
+        max_length=args.n_ctx,
+        train_samples=args.max_train_samples,
+        val_samples=args.val_samples,
+    )
+    print(f"  Training samples: {len(train_dataset):,}")
+    print(f"  Validation samples: {len(val_dataset):,}")
+
+    # Create data collator
+    data_collator = ChessDataCollator(tokenizer, max_length=args.n_ctx)
+
+    # Training arguments
+    training_args = TrainingArguments(
+        output_dir=args.output_dir,
+        num_train_epochs=args.num_train_epochs,
+        per_device_train_batch_size=args.per_device_train_batch_size,
+        per_device_eval_batch_size=args.per_device_eval_batch_size,
+        learning_rate=args.learning_rate,
+        weight_decay=args.weight_decay,
+        warmup_ratio=args.warmup_ratio,
+        logging_dir=os.path.join(args.output_dir, "logs"),
+        logging_steps=args.logging_steps,
+        eval_strategy="epoch",
+        save_strategy="epoch",
+        save_total_limit=3,
+        load_best_model_at_end=True,
+        metric_for_best_model="eval_loss",
+        greater_is_better=False,
+        seed=args.seed,
+        bf16=torch.cuda.is_available() and torch.cuda.is_bf16_supported(),
+        report_to=["none"],
+    )
+
+    # Create trainer
+    trainer = Trainer(
+        model=model,
+        args=training_args,
+        train_dataset=train_dataset,
+        eval_dataset=val_dataset,
+        data_collator=data_collator,
+        tokenizer=tokenizer,
+    )
+
+    # Train
+    print("\nStarting training...")
+    trainer.train()
+
+    # Save final model
+    print("\nSaving final model...")
+    trainer.save_model(os.path.join(args.output_dir, "final_model"))
+    tokenizer.save_pretrained(os.path.join(args.output_dir, "final_model"))
+
+    print("\nTraining complete!")
+    print(f"  Model saved to: {args.output_dir}/final_model")
+
+
+if __name__ == "__main__":
+    main()
src/utils.py ADDED
@@ -0,0 +1,305 @@
"""
Utility functions for the Chess Challenge.

This module provides helper functions for:
- Parameter counting and budget analysis
- Model registration with Hugging Face
- Move validation with python-chess
"""

from __future__ import annotations

from typing import Dict, Optional, TYPE_CHECKING

import torch.nn as nn

if TYPE_CHECKING:
    from src.model import ChessConfig


def count_parameters(model: nn.Module, trainable_only: bool = True) -> int:
    """
    Count the number of parameters in a model.

    Args:
        model: The PyTorch model.
        trainable_only: If True, only count trainable parameters.

    Returns:
        Total number of parameters.
    """
    if trainable_only:
        return sum(p.numel() for p in model.parameters() if p.requires_grad)
    return sum(p.numel() for p in model.parameters())


def count_parameters_by_component(model: nn.Module) -> Dict[str, int]:
    """
    Count parameters broken down by model component.

    Args:
        model: The PyTorch model.

    Returns:
        Dictionary mapping component names to parameter counts.
    """
    counts = {}
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # Leaf module
            param_count = sum(p.numel() for p in module.parameters(recurse=False))
            if param_count > 0:
                counts[name] = param_count
    return counts


def estimate_parameters(config: "ChessConfig") -> Dict[str, int]:
    """
    Estimate the parameter count for a given configuration.

    This is useful for planning your architecture before building the model.
    Bias terms in the attention/FFN projections are not included, so this
    slightly undercounts architectures that use them.

    Args:
        config: Model configuration.

    Returns:
        Dictionary with estimated parameter counts by component.
    """
    V = config.vocab_size
    d = config.n_embd
    L = config.n_layer
    n_ctx = config.n_ctx
    n_inner = config.n_inner

    estimates = {
        "token_embeddings": V * d,
        "position_embeddings": n_ctx * d,
        "attention_qkv_per_layer": 3 * d * d,
        "attention_proj_per_layer": d * d,
        "ffn_per_layer": 2 * d * n_inner,
        "layernorm_per_layer": 4 * d,  # 2 LayerNorms, each with weight and bias
        "final_layernorm": 2 * d,
    }

    # Calculate totals
    per_layer = (
        estimates["attention_qkv_per_layer"]
        + estimates["attention_proj_per_layer"]
        + estimates["ffn_per_layer"]
        + estimates["layernorm_per_layer"]
    )

    estimates["total_transformer_layers"] = L * per_layer

    # LM head (tied with embeddings by default)
    if config.tie_weights:
        estimates["lm_head"] = 0  # tied with token embeddings
    else:
        estimates["lm_head"] = V * d

    # Grand total
    estimates["total"] = (
        estimates["token_embeddings"]
        + estimates["position_embeddings"]
        + estimates["total_transformer_layers"]
        + estimates["final_layernorm"]
        + estimates["lm_head"]
    )

    return estimates
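The estimate can be sanity-checked by hand. A minimal sketch using the same terms as `estimate_parameters`; all configuration values below are hypothetical examples, not defaults from this repo:

```python
# Back-of-envelope version of the estimate above (hypothetical values).
V, d, L, n_ctx, n_inner = 64, 128, 4, 512, 512

# qkv + output projection + FFN + two LayerNorms, per transformer layer
per_layer = 3 * d * d + d * d + 2 * d * n_inner + 4 * d

# embeddings + layers + final LayerNorm; a tied LM head adds nothing
total = V * d + n_ctx * d + L * per_layer + 2 * d

print(per_layer)  # 197120
print(total)      # 862464 -> comfortably under a 1M-parameter budget
```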
def print_parameter_budget(config: "ChessConfig", limit: int = 1_000_000) -> None:
    """
    Print a formatted parameter budget analysis.

    Args:
        config: Model configuration.
        limit: Parameter limit to compare against.
    """
    estimates = estimate_parameters(config)

    print("=" * 60)
    print("PARAMETER BUDGET ANALYSIS")
    print("=" * 60)
    print("\nConfiguration:")
    print(f"  vocab_size (V) = {config.vocab_size}")
    print(f"  n_embd     (d) = {config.n_embd}")
    print(f"  n_layer    (L) = {config.n_layer}")
    print(f"  n_head         = {config.n_head}")
    print(f"  n_ctx          = {config.n_ctx}")
    print(f"  n_inner        = {config.n_inner}")
    print(f"  tie_weights    = {config.tie_weights}")

    print("\nParameter Breakdown:")
    print(f"  Token Embeddings:    {estimates['token_embeddings']:>10,}")
    print(f"  Position Embeddings: {estimates['position_embeddings']:>10,}")
    print(f"  Transformer Layers:  {estimates['total_transformer_layers']:>10,}")
    print(f"  Final LayerNorm:     {estimates['final_layernorm']:>10,}")

    if config.tie_weights:
        print(f"  LM Head:             {'(tied)':>10}")
    else:
        print(f"  LM Head:             {estimates['lm_head']:>10,}")

    print("  " + "-" * 30)
    print(f"  TOTAL:               {estimates['total']:>10,}")

    print("\nBudget Status:")
    print(f"  Limit:     {limit:>10,}")
    print(f"  Used:      {estimates['total']:>10,}")
    print(f"  Remaining: {limit - estimates['total']:>10,}")

    if estimates["total"] <= limit:
        print(f"\n  Within budget! ({estimates['total'] / limit * 100:.1f}% used)")
    else:
        print(f"\n  OVER BUDGET by {estimates['total'] - limit:,} parameters!")

    print("=" * 60)


def validate_move_with_chess(move: str, board_fen: Optional[str] = None) -> bool:
    """
    Validate a move using python-chess.

    This function converts the dataset's extended UCI format to standard UCI
    and validates it against the current board state.

    Args:
        move: Move in extended UCI format (e.g., "WPe2e4", "BNg8f6(x)").
        board_fen: FEN string of the current board state (optional).

    Returns:
        True if the move is legal, False otherwise.
    """
    try:
        import chess
    except ImportError:
        raise ImportError(
            "python-chess is required for move validation. "
            "Install it with: pip install python-chess"
        )

    # Parse the extended UCI format
    # Format: [W|B][Piece][from_sq][to_sq][suffix]
    # Example: WPe2e4, BNg8f6(x), WKe1g1(o)

    if len(move) < 6:
        return False

    # Extract components
    color = move[0]      # W or B
    piece = move[1]      # P, N, B, R, Q, K
    from_sq = move[2:4]  # e.g., "e2"
    to_sq = move[4:6]    # e.g., "e4"

    # Check for promotion
    promotion = None
    if "=" in move:
        promo_idx = move.index("=")
        promotion = move[promo_idx + 1].lower()

    # Create board
    board = chess.Board(board_fen) if board_fen else chess.Board()

    # Build UCI move string
    uci_move = from_sq + to_sq
    if promotion:
        uci_move += promotion

    try:
        move_obj = chess.Move.from_uci(uci_move)
        return move_obj in board.legal_moves
    except (ValueError, chess.InvalidMoveError):
        return False


def convert_extended_uci_to_uci(move: str) -> str:
    """
    Convert extended UCI format to standard UCI format.

    Args:
        move: Move in extended UCI format (e.g., "WPe2e4").

    Returns:
        Move in standard UCI format (e.g., "e2e4").
    """
    if len(move) < 6:
        return move

    # Extract squares
    from_sq = move[2:4]
    to_sq = move[4:6]

    # Check for promotion
    promotion = ""
    if "=" in move:
        promo_idx = move.index("=")
        promotion = move[promo_idx + 1].lower()

    return from_sq + to_sq + promotion


def convert_uci_to_extended(
    uci_move: str,
    board_fen: str,
) -> str:
    """
    Convert standard UCI format to extended UCI format.

    Args:
        uci_move: Move in standard UCI format (e.g., "e2e4").
        board_fen: FEN string of the current board state.

    Returns:
        Move in extended UCI format (e.g., "WPe2e4").
    """
    try:
        import chess
    except ImportError:
        raise ImportError("python-chess is required for move conversion.")

    board = chess.Board(board_fen)
    move = chess.Move.from_uci(uci_move)

    # Get color
    color = "W" if board.turn == chess.WHITE else "B"

    # Get piece
    piece = board.piece_at(move.from_square)
    piece_letter = piece.symbol().upper() if piece else "P"

    # Build extended UCI
    from_sq = chess.square_name(move.from_square)
    to_sq = chess.square_name(move.to_square)

    result = f"{color}{piece_letter}{from_sq}{to_sq}"

    # Add promotion
    if move.promotion:
        result += f"={chess.piece_symbol(move.promotion).upper()}"

    # Add suffix for captures
    if board.is_capture(move):
        result += "(x)"

    # Add suffix for check/checkmate
    board.push(move)
    if board.is_checkmate():
        if "(x)" in result:
            result = result.replace("(x)", "(x+*)")
        else:
            result += "(+*)"
    elif board.is_check():
        if "(x)" in result:
            result = result.replace("(x)", "(x+)")
        else:
            result += "(+)"
    board.pop()

    # Handle castling notation
    if board.is_castling(move):
        if move.to_square in [chess.G1, chess.G8]:  # Kingside
            result = result.replace("(x)", "").replace("(+)", "") + "(o)"
        else:  # Queenside
            result = result.replace("(x)", "").replace("(+)", "") + "(O)"

    return result
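The slicing performed by `convert_extended_uci_to_uci` can be illustrated without a board. A minimal standalone sketch mirroring the function above (the helper name `to_uci` is hypothetical):

```python
def to_uci(move: str) -> str:
    # Squares live at fixed offsets: [W|B][Piece][from][to][suffix]
    promotion = ""
    if "=" in move:
        promotion = move[move.index("=") + 1].lower()
    return move[2:4] + move[4:6] + promotion

print(to_uci("WPe2e4"))       # e2e4
print(to_uci("BNg8f6(x)"))    # g8f6  (capture suffix dropped)
print(to_uci("WPe7e8=Q(+)"))  # e7e8q (promotion piece lowercased)
```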
submit.py ADDED
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Submission script for the Chess Challenge.

This script pushes your trained model to the Hugging Face Hub under the
LLM-course organization, with metadata tracking who submitted it.

Usage:
    python submit.py --model_path ./my_model/final_model --model_name my-chess-model
"""

import argparse
import os
import tempfile
from pathlib import Path


def main():
    parser = argparse.ArgumentParser(description="Submit your chess model to Hugging Face Hub")
    parser.add_argument(
        "--model_path", type=str, default="./my_model/final_model",
        help="Path to your trained model directory"
    )
    parser.add_argument(
        "--model_name", type=str, required=True,
        help="Name for your model on the Hub (e.g., 'my-chess-model')"
    )
    args = parser.parse_args()

    # Fixed organization
    organization = "LLM-course"

    # Check model path exists
    if not os.path.exists(args.model_path):
        print(f"Error: Model path '{args.model_path}' does not exist.")
        print("Train a model first with: python -m src.train --output_dir ./my_model")
        return 1

    # Import here to avoid slow startup
    from huggingface_hub import HfApi, whoami
    from transformers import AutoModelForCausalLM

    # Ensure user is logged in and get their info
    print("=" * 60)
    print("CHESS CHALLENGE - MODEL SUBMISSION")
    print("=" * 60)

    try:
        user_info = whoami()
        username = user_info["name"]
        print(f"\nLogged in as: {username}")
    except Exception:
        print("\nYou need to log in to Hugging Face first.")
        print("Run: huggingface-cli login")
        return 1

    # Import custom classes to register them
    from src.model import ChessConfig, ChessForCausalLM
    from src.tokenizer import ChessTokenizer

    # Load model and tokenizer
    print(f"\nLoading model from: {args.model_path}")
    model = AutoModelForCausalLM.from_pretrained(args.model_path)
    tokenizer = ChessTokenizer.from_pretrained(args.model_path)

    # Count parameters
    n_params = sum(p.numel() for p in model.parameters())
    print(f"Model parameters: {n_params:,}")

    if n_params > 1_000_000:
        print(f"WARNING: Model exceeds 1M parameter limit ({n_params:,} params)")

    # Prepare repo name
    repo_id = f"{organization}/{args.model_name}"
    print(f"\nSubmitting to: {repo_id}")

    # Create a temporary directory to prepare the submission
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir)

        # Save model and tokenizer
        model.save_pretrained(tmp_path)
        tokenizer.save_pretrained(tmp_path)

        # Create model card with submitter info
        model_card = f"""---
library_name: transformers
tags:
- chess
- llm-course
- chess-challenge
license: mit
---

# {args.model_name}

Chess model submitted to the LLM Course Chess Challenge.

## Submission Info

- **Submitted by**: [{username}](https://huggingface.co/{username})
- **Parameters**: {n_params:,}
- **Organization**: {organization}

## Model Details

- **Architecture**: Chess Transformer (GPT-style)
- **Vocab size**: {tokenizer.vocab_size}
- **Embedding dim**: {model.config.n_embd}
- **Layers**: {model.config.n_layer}
- **Heads**: {model.config.n_head}
"""
        (tmp_path / "README.md").write_text(model_card)

        # Push to Hub
        print("\nUploading to Hugging Face Hub...")
        api = HfApi()

        # Create repo if it doesn't exist
        api.create_repo(
            repo_id=repo_id,
            exist_ok=True,
        )

        # Upload all files
        api.upload_folder(
            folder_path=tmp_path,
            repo_id=repo_id,
            commit_message=f"Chess Challenge submission by {username}",
        )

    print("\n" + "=" * 60)
    print("SUBMISSION COMPLETE!")
    print("=" * 60)
    print("\nYour model is now available at:")
    print(f"  https://huggingface.co/{repo_id}")
    print(f"\nSubmitted by: {username}")
    print(f"Parameters: {n_params:,}")

    return 0


if __name__ == "__main__":
    exit(main())