---
language: en
license: mit
tags:
  - tic-tac-toe
  - xo
  - tiny-bert
  - educational
  - custom-tokenizer
metrics:
  - accuracy
model-index:
  - name: FemtoXO
    results: []
---

# FemtoXO 🎮 – Tiny Transformer for Tic-Tac-Toe

**FemtoXO** is an ultra-small Transformer model (BERT-based) trained to play the game of Tic-Tac-Toe (XO) as player **X**.  
It was built **entirely from scratch** – including a custom tokenizer and training pipeline – as an educational project to demonstrate how to create and train a language model for a structured game using the Hugging Face ecosystem.

## Model Details

- **Model type:** BERT for sequence classification (9 classes: board positions 0–8)
- **Size:**  
  - Hidden size: `64`  
  - Layers: `2`  
  - Attention heads: `2`  
  - Intermediate size: `128`  
  - **Total parameters:** `~90k` (truly femto-scale!)
- **Tokenizer:** Custom character-level tokenizer with special tokens (`<pad>`, `<eos>`, `<unk>`). Vocabulary consists of `.`, `X`, `O` and digits `0–9`.
- **Input:** A string of 9 characters representing the board (`.` = empty, `X` = model, `O` = opponent)  
  Example: `X..O.....`
- **Output:** Logits over the 9 positions; the legal move with the highest logit is chosen (occupied cells are masked out).
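
As a rough check on the size figure above, the parameter count of a `BertForSequenceClassification` model can be tallied by hand. The sketch below takes hidden size 64, 2 layers, intermediate size 128 and 9 labels from this card. The vocabulary size (16 = 13 symbols + 3 special tokens) and `max_position_embeddings` are assumptions, since the card does not list them. The number of attention heads does not appear because heads split the hidden dimension rather than adding weights.

```python
def bert_classifier_params(vocab_size: int, hidden: int, layers: int,
                           intermediate: int, num_labels: int,
                           max_positions: int, type_vocab_size: int = 2) -> int:
    """Count the trainable parameters of a BERT sequence classifier."""
    h, i = hidden, intermediate
    # Embeddings: word, position and token-type tables plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_positions + type_vocab_size) * h + 2 * h
    # One encoder layer: Q, K, V and attention-output projections (weights + biases),
    # two LayerNorms, and the feed-forward up/down projections.
    per_layer = 4 * (h * h + h) + 2 * (2 * h) + (h * i + i) + (i * h + h)
    pooler = h * h + h                       # dense layer applied to the first token
    classifier = h * num_labels + num_labels  # hidden -> 9 board positions
    return embeddings + layers * per_layer + pooler + classifier


# Hypothetical settings: vocab size 16 and two candidate max_position_embeddings values.
for max_positions in (16, 512):
    total = bert_classifier_params(vocab_size=16, hidden=64, layers=2,
                                   intermediate=128, num_labels=9,
                                   max_positions=max_positions)
    print(max_positions, total)
```

On the actual checkpoint, `sum(p.numel() for p in model.parameters())` returns the exact total. The position-embedding table alone contributes `max_position_embeddings × 64` parameters, so that single setting moves the count noticeably.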

## Intended Use

This model is **purely educational**. It illustrates:
- How to create a custom tokenizer and a Transformer from scratch using `transformers` and `tokenizers`.
- How to generate synthetic training data and set up a full training loop.
- How to wrap the trained model in an interactive game-playing program.

You can play against the model using the provided `play.py` script.

## Training Data

- **Dataset:** 10,000 randomly generated Tic-Tac-Toe games (≈90,000 board–move pairs).  
  For each game, we recorded every board state before X's move and the chosen move.
- **Preprocessing:** Board states were tokenized with the custom tokenizer and padded to length 12.
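
The data generation and preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's `train.py`. It assumes that X always moves first, that position `i` refers to character `i` of the board string (as in the masking code under How to Use), and that an `<eos>` token is appended before padding to 12. The token-id ordering is hypothetical, since the card does not list the ids.

```python
import random

EMPTY, X, O = ".", "X", "O"
# The 8 winning lines of a 3x3 board, with cells indexed 0-8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

# Hypothetical token ids: the card lists these tokens but not their order.
SPECIALS = ["<pad>", "<eos>", "<unk>"]
SYMBOLS = [EMPTY, X, O] + [str(d) for d in range(10)]
VOCAB = {tok: idx for idx, tok in enumerate(SPECIALS + SYMBOLS)}
MAX_LEN = 12


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != EMPTY and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_game_pairs(rng):
    """Play one random game (X first) and record (board, move) before each X move."""
    board = [EMPTY] * 9
    pairs = []
    player = X
    while winner(board) is None and EMPTY in board:
        move = rng.choice([i for i, ch in enumerate(board) if ch == EMPTY])
        if player == X:
            pairs.append(("".join(board), move))
        board[move] = player
        player = O if player == X else X
    return pairs


def encode(board):
    """Character-level encoding: one id per cell, then <eos>, then <pad> up to MAX_LEN."""
    ids = [VOCAB.get(ch, VOCAB["<unk>"]) for ch in board] + [VOCAB["<eos>"]]
    ids = ids[:MAX_LEN]
    mask = [1] * len(ids) + [0] * (MAX_LEN - len(ids))
    ids += [VOCAB["<pad>"]] * (MAX_LEN - len(ids))
    return ids, mask


def build_dataset(num_games, seed=0):
    """Generate random games and turn every X decision into a labelled example."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_games):
        for board, move in random_game_pairs(rng):
            ids, mask = encode(board)
            rows.append({"input_ids": ids, "attention_mask": mask, "label": move})
    return rows


rows = build_dataset(10_000)
print(rows[0])  # the empty starting board and X's first (random) move
```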

## Training Procedure

- **Framework:** Hugging Face `transformers` + `datasets` + `tokenizers`
- **Hardware:** CPU (or any GPU – training is extremely fast)
- **Hyperparameters:**
  - Epochs: 5
  - Batch size: 64
  - Optimizer: AdamW (default)
  - Learning rate schedule: linear decay (default)
- **Metrics:** Accuracy on a held-out 10% validation split.
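
The 10% hold-out and the accuracy metric can be sketched with NumPy. `train_val_split` is a hypothetical helper (with `datasets`, `Dataset.train_test_split(test_size=0.1)` serves the same purpose), and `compute_metrics` follows the shape of the callback that `Trainer` accepts, which receives predictions and labels as a pair. Accuracy here is measured on the raw argmax without masking occupied cells; the card does not say whether evaluation masks them.

```python
import numpy as np


def train_val_split(rows, val_fraction=0.1, seed=0):
    """Shuffle the examples and hold out `val_fraction` of them for validation."""
    order = np.random.default_rng(seed).permutation(len(rows))
    n_val = int(len(rows) * val_fraction)
    val = [rows[i] for i in order[:n_val]]
    train = [rows[i] for i in order[n_val:]]
    return train, val


def compute_metrics(eval_pred):
    """Accuracy of the unmasked argmax over the 9 position logits."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}


# Toy check with fake logits for two boards (labels are cell indices 0-8).
fake_logits = np.zeros((2, 9))
fake_logits[0, 4] = 1.0   # predicts the centre cell
fake_logits[1, 0] = 1.0   # predicts the top-left corner
print(compute_metrics((fake_logits, np.array([4, 8]))))
```

In a `Trainer` setup, the hyperparameters listed above would be set through `num_train_epochs=5` and `per_device_train_batch_size=64` in `TrainingArguments`; AdamW and linear learning-rate decay are already its defaults.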

## How to Use

```python
from transformers import BertForSequenceClassification, PreTrainedTokenizerFast
import torch

model = BertForSequenceClassification.from_pretrained("abdelkader-dev/FemtoXO")
tokenizer = PreTrainedTokenizerFast.from_pretrained("abdelkader-dev/FemtoXO")

board = "X..O....."  # X to move
inputs = tokenizer(board, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze()

# Mask occupied cells
for i, ch in enumerate(board):
    if ch != '.':
        logits[i] = -float('inf')

move = torch.argmax(logits).item()
print(f"Model plays at position {move}")
```

## Full Game Loop

The `src/` directory contains the complete training and playing scripts:

- `train.py` – Generate data, train, and save the model.
- `play.py` – Interactive game against the model.
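
A minimal game loop in the spirit of `play.py` is sketched below. It is not the project's script: both sides are plain callables that take the board string and return a cell index, so the model (wrapped around the masked-argmax code under How to Use) or a human can be plugged in. The helper names and the row-major 3x3 layout used by `render` are assumptions.

```python
import random
from typing import Callable, List, Optional, Tuple

# The 8 winning lines of a 3x3 board, with cells indexed 0-8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

Policy = Callable[[str], int]  # board string -> chosen cell index 0-8


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board: str) -> List[int]:
    return [i for i, ch in enumerate(board) if ch == "."]


def play_game(x_policy: Policy, o_policy: Policy) -> Tuple[str, Optional[str]]:
    """Alternate moves (X first) until someone wins or the board is full."""
    board, player = "." * 9, "X"
    while winner(board) is None and legal_moves(board):
        move = (x_policy if player == "X" else o_policy)(board)
        if move not in legal_moves(board):
            raise ValueError(f"{player} chose an illegal cell: {move}")
        board = board[:move] + player + board[move + 1:]
        player = "O" if player == "X" else "X"
    return board, winner(board)


def render(board: str) -> str:
    """Show the board as three rows (assumes row-major cell order)."""
    return "\n".join(" ".join(board[r * 3:r * 3 + 3]) for r in range(3))


# Stand-ins: a random player on both sides. In an interactive version, X would be
# the model and O the human, e.g. lambda b: int(input("Your move (0-8): ")).
rng = random.Random(0)
random_policy: Policy = lambda b: rng.choice(legal_moves(b))
final, result = play_game(random_policy, random_policy)
print(render(final))
print("Winner:", result or "draw")
```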

## Limitations & Biases

- **Random play data:** The training data comes from random games, so the model plays at a novice level. It does not learn the optimal (minimax) strategy.
- **Small capacity:** With only ~90k parameters, it may miss some patterns.
- **Single task:** It only handles Tic-Tac-Toe boards and does not generalize to other games.
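
For contrast with the random-play labels, the optimal (minimax) strategy mentioned above fits in a few lines for a game this small. This solver is not part of FemtoXO; it is a hypothetical extension that could, for example, replace the random choice of X's move when generating training labels.

```python
from functools import lru_cache
from typing import Optional

# The 8 winning lines of a 3x3 board, with cells indexed 0-8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def other(player: str) -> str:
    return "O" if player == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Value of `board` with `player` to move, from X's side: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    values = [minimax(board[:i] + player + board[i + 1:], other(player))
              for i, ch in enumerate(board) if ch == "."]
    # X maximizes the value, O minimizes it.
    return max(values) if player == "X" else min(values)


def best_move(board: str, player: str = "X") -> int:
    """Pick the cell whose resulting position is best for `player` under perfect play."""
    def value(i: int) -> int:
        return minimax(board[:i] + player + board[i + 1:], other(player))
    moves = [i for i, ch in enumerate(board) if ch == "."]
    return max(moves, key=value) if player == "X" else min(moves, key=value)


print(minimax("." * 9, "X"))   # perfect play from the empty board is a draw (0)
print(best_move("XO..O...X"))  # X must block O's line 1-4-7 by playing 7
```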

## Repository Structure

```txt
OX_Model/
├── src/
│   ├── model.py           # Model definition
│   ├── tokenizer.py       # Tokenizer definition
│   ├── train.py           # Training pipeline
│   ├── play.py            # Interactive game
│   └── requirements.txt
├── ox_model/              # Trained model files (config, weights, etc.)
└── xo_tokenizer/          # Tokenizer files
```

## Citation

If you find this educational project useful, feel free to cite it:

```bibtex
@misc{FemtoXO,
  author = {Abdelkader Hazerchi},
  title = {FemtoXO: A Tiny Transformer for Tic-Tac-Toe},
  year = {2025},
  howpublished = {\url{https://huggingface.co/abdelkader-dev/FemtoXO}},
}
```

## Acknowledgements

Built with ❤️ using the Hugging Face ecosystem (`transformers`, `tokenizers`, `datasets`) and PyTorch.