---
license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---

# TinyLLM: Character-Level Math Solver

## Model Description

**TinyLLM** is a highly compact, character-level **causal language model** (a standard Transformer decoder) trained specifically to solve single-digit math problems.

This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.

### Key Features

* **Architecture:** Causal Transformer decoder.
* **Task:** Character-level text generation (autoregressive).
* **Input/Output:** Solves problems formatted as `N op N` and generates the answer, e.g., `4 + 5 = 9<EOS>` (see the sketch below).
* **Custom Code Required:** This is a custom PyTorch model and requires its custom code (`model.py`, `tokenizer.py`) to load.
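
For illustration, the sequences the model sees look like the examples below. This is a sketch of the assumed format; the authoritative construction, including how the `<EOS>` token is appended, lives in `tokenizer.py` and `dataset.py`.

```python
# Illustrative problem strings (assumed format; see tokenizer.py for the
# authoritative construction and <EOS> handling).
examples = [
    "4 + 5 = 9",  # addition
    "8 - 3 = 5",  # subtraction
    "3 * 3 = 9",  # multiplication
    "8 / 2 = 4",  # non-remainder division
]
```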

---

## How to Use (Inference)

To load and run this custom model, download the entire repository and use the provided custom code: the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` in **`tokenizer.py`**.

### 1. Installation

First, ensure you have the required libraries installed:

```bash
pip install torch huggingface-hub
```

### 2. Load the Model

Then download the repository and load the model and tokenizer with the custom classes:

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: REPLACE with your repository ID
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import custom classes
# The downloaded path must be added to sys.path to allow custom imports
sys.path.append(local_path)
from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Set up and load the model
def load_tiny_llm():
    # In this minimal case, we hardcode the known config values
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64, n_head=4, n_layer=4, dropout=0.1,
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```
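
Once loaded, you can generate answers. The sketch below continues from the snippet above and assumes the `CharacterTokenizer` exposes `encode`/`decode` methods and that `TinyLLM` provides a nanoGPT-style `generate` method; verify the exact signatures against `model.py` and `custom_run.py`.

```python
# Minimal generation sketch. ASSUMPTIONS: tokenizer.encode/decode and
# model.generate(idx, max_new_tokens) exist; check model.py to confirm.
prompt = "4 + 5 = "

# Encode the prompt into a (1, T) tensor of character IDs
idx = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long, device=DEVICE)

# Autoregressively generate a few new characters (the answer digit + <EOS>)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=4)

# Decode back to text; the output should read "4 + 5 = 9..."
print(tokenizer.decode(out[0].tolist()))
```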

## Training Details

### Architecture Configuration

The `TinyLLM` is configured with the following parameters, derived from the `config.json` and `model.py` files:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | Size of the character vocabulary. |
| **`block_size`** | `14` | Maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |
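
Rather than hardcoding these values as in the loading example above, you can read them from `config.json`. A sketch, continuing from the loading snippet and assuming the JSON keys mirror the parameter names in the table (verify against the actual `config.json`):

```python
import json
import os

# Build the model from config.json instead of hardcoded values.
# ASSUMPTION: the JSON keys match the table above; check config.json.
with open(os.path.join(local_path, "config.json")) as f:
    cfg = json.load(f)

model = TinyLLM(
    vocab_size=cfg["vocab_size"],
    block_size=cfg["block_size"],
    n_embed=cfg["n_embed"],
    n_head=cfg["n_head"],
    n_layer=cfg["n_layer"],
    dropout=cfg["dropout"],
).to(DEVICE)
```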

### Training Hyperparameters (from `train.py`)

| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |
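
For reference, the core of a training loop with these hyperparameters looks roughly like the sketch below; the authoritative version is `train.py`. The `MathDataset` constructor arguments and the nanoGPT-style `(logits, loss)` return value of the model are assumptions here.

```python
from torch.utils.data import DataLoader
from dataset import MathDataset  # provided in this repository

# Rough training-loop sketch; see train.py for the real script.
# ASSUMPTIONS: MathDataset(raw_data, tokenizer, block_size) yields (x, y)
# pairs, and model(x, y) returns (logits, loss) nanoGPT-style.
train_set = MathDataset(raw_data, tokenizer, block_size=14)
loader = DataLoader(train_set, batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

model.train()
for epoch in range(100):
    for x, y in loader:
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits, loss = model(x, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```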

### Dataset

The model was trained on an **exhaustive set of all single-digit math problems** (addition, subtraction, multiplication, and non-remainder division) whose result is also a single digit (0-9). The **`dataset.py`** file implements the input/target sequence shift used for language-model training, illustrated below.
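
The shift pairs each character with the one that follows it, so the model learns to predict the next character at every position. A minimal illustration (the actual implementation lives in `dataset.py`):

```python
# Next-character prediction: the target is the input shifted left by one,
# so at every position the model predicts the following character.
text = "4 + 5 = 9"
x = text[:-1]  # inputs:  "4 + 5 = "
y = text[1:]   # targets: " + 5 = 9"
for inp, tgt in zip(x, y):
    print(f"given {inp!r} -> predict {tgt!r}")
```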

---

## Repository Files

This flat repository contains all the source code needed for complete reproducibility.

| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core Logic:** Custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core Logic:** Custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and sequence-shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |