---
license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---
# TinyLLM: Character-Level Math Solver
## Model Description
**TinyLLM** is a highly compact, character-level **Causal Language Model** (based on the standard Transformer decoder architecture) trained specifically to solve single-digit math problems.
This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.
### Key Features
* **Architecture:** Causal Transformer Decoder.
* **Task:** Character-level text generation (autoregressive).
* **Input/Output:** Solves problems formatted as `N op N` and generates the answer, e.g., `4 + 5 = 9<EOS>`.
* **Custom Code Required:** This is a custom PyTorch model; loading it requires the repository's own code (`model.py`, `tokenizer.py`).
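To make "character-level" concrete, here is a minimal sketch of how a string such as `4 + 5 = 9<EOS>` could be turned into token ids. `ToyCharTokenizer` is a hypothetical stand-in written for illustration; it is not the repository's `CharacterTokenizer`, whose vocabulary and special tokens may differ.

```python
class ToyCharTokenizer:
    """Hypothetical stand-in, NOT the repository's CharacterTokenizer."""

    EOS = "<EOS>"

    def __init__(self, texts):
        # Vocabulary: every distinct character seen in the data, plus one EOS token.
        chars = sorted({ch for text in texts for ch in text})
        self.itos = chars + [self.EOS]
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    def encode(self, text, add_eos=False):
        # One id per character; <EOS> is appended as a single extra token.
        ids = [self.stoi[ch] for ch in text]
        if add_eos:
            ids.append(self.stoi[self.EOS])
        return ids

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


tok = ToyCharTokenizer(["4 + 5 = 9", "8 / 2 = 4"])
ids = tok.encode("4 + 5 = 9", add_eos=True)
print(len(ids))         # 10: nine characters plus <EOS>
print(tok.decode(ids))  # 4 + 5 = 9<EOS>
```

Each digit, operator, space, and `=` is its own token, so the model predicts the answer one character at a time.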
---
## How to Use (Inference)
To load and run the model, download the entire repository and use the custom code it provides: the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` class defined in **`tokenizer.py`**.
### 1. Installation
First, ensure you have the required libraries installed:
```bash
pip install torch huggingface-hub
```
### 2. Loading the Model
Next, download the repository and load the model and tokenizer:
```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: REPLACE with your repository ID
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import Custom Classes
# The downloaded path must be added to sys.path to allow custom imports
sys.path.append(local_path)
from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data


# 4. Setup and Load Model
def load_tiny_llm():
    # In this minimal case, we hardcode the known config values
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64, n_head=4, n_layer=4, dropout=0.1
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()  # disable dropout for inference

    # Initialize the tokenizer from the same generated data used for training
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)
    return model, tokenizer


# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```
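The "generation logic" itself is left to the reader, so here is a minimal, framework-free sketch of greedy autoregressive decoding. Everything in it is an assumption made for illustration: `greedy_generate`, `logits_fn`, and the toy vocabulary are hypothetical names, and `toy_logits_fn` is a scripted stand-in rather than the trained model. The prompt format the real model expects (for example, whether a trailing space follows `=`) should be checked against the repository's run script.

```python
def greedy_generate(logits_fn, encode, decode, prompt, eos_id,
                    block_size=14, max_new_tokens=8):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    logits_fn(ids) must return one score per vocabulary entry for the NEXT token.
    """
    ids = encode(prompt)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # never feed more than the context window
        logits = logits_fn(context)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:  # stop once the model emits end-of-sequence
            break
    return decode(ids)


# Toy vocabulary and codec (hypothetical; the real tokenizer may differ).
vocab = list("0123456789+-*/= ") + ["<EOS>"]
stoi = {tok: i for i, tok in enumerate(vocab)}


def encode(text):
    return [stoi[ch] for ch in text]


def decode(ids):
    return "".join(vocab[i] for i in ids)


def toy_logits_fn(context):
    # Stand-in for a trained model: scripted to continue "4 + 5 =" with " 9<EOS>".
    text = decode(context)
    if text.endswith("= 9"):
        target = "<EOS>"
    elif text.endswith("="):
        target = " "
    else:
        target = "9"
    return [1.0 if tok == target else 0.0 for tok in vocab]


result = greedy_generate(toy_logits_fn, encode, decode, "4 + 5 =", eos_id=stoi["<EOS>"])
print(result)  # 4 + 5 = 9<EOS>
```

With the real model, `logits_fn` would wrap a forward pass under `torch.no_grad()` and return the scores for the last position. The exact return type of `TinyLLM.forward` is not documented here, so check `model.py` before wiring it in.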
## Training Details
### Architecture Configuration
`TinyLLM` is configured with the following parameters, taken from the `config.json` and `model.py` files:
| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | The size of the character vocabulary. |
| **`block_size`** | `14` | The maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |
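Rather than hardcoding these values, they could be collected in a small config object. The sketch below is one way to do that; `TinyLLMConfig` is a hypothetical helper that is not part of the repository, and it assumes that the keys in `config.json` match the parameter names in the table above and that the model uses standard multi-head attention (the embedding split evenly across heads).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyLLMConfig:  # hypothetical helper, not part of the repository
    vocab_size: int = 22
    block_size: int = 14
    n_embed: int = 64
    n_head: int = 4
    n_layer: int = 4
    dropout: float = 0.1

    def __post_init__(self):
        # Assumption: standard multi-head attention splits n_embed evenly.
        if self.n_embed % self.n_head != 0:
            raise ValueError("n_embed must be divisible by n_head")

    @property
    def head_dim(self):
        return self.n_embed // self.n_head

    @classmethod
    def from_json(cls, text):
        # Assumes JSON keys match the parameter names; unknown keys are ignored.
        data = json.loads(text)
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)


cfg = TinyLLMConfig.from_json(
    '{"vocab_size": 22, "block_size": 14, "n_embed": 64, "n_head": 4}'
)
print(cfg.head_dim)  # 16
```

Because the field names mirror the keyword arguments used when constructing `TinyLLM`, `dataclasses.asdict(cfg)` could be unpacked directly into the constructor, provided the signature in `model.py` matches.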
### Training Hyperparameters (from `train.py`)
| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |
### Dataset
The model was trained on an **exhaustive set of all single-digit math problems** (addition, subtraction, multiplication, and exact division with no remainder) where the result is also a single digit (0-9). The **`dataset.py`** file contains the input/target sequence shift used for next-token (language modeling) training.
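As an illustration of what such a dataset and shift could look like, the sketch below enumerates every problem that meets the stated constraints and builds one shifted input/target pair. `enumerate_problems` and `shift_pair` are hypothetical names; the repository's `generate_v1_data` and `MathDataset` may use a different string format, padding, or special tokens.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def enumerate_problems():
    """All `a op b = c` strings with single-digit a and b, and c in 0..9."""
    problems = []
    for a in range(10):
        for b in range(10):
            for sym, fn in OPS.items():
                c = fn(a, b)
                if 0 <= c <= 9:  # keep only single-digit, non-negative results
                    problems.append(f"{a} {sym} {b} = {c}")
            # Division: skip divide-by-zero and anything with a remainder.
            if b != 0 and a % b == 0:
                problems.append(f"{a} / {b} = {a // b}")
    return problems


def shift_pair(tokens):
    """Next-token prediction: the target at position t is the input at t + 1."""
    return tokens[:-1], tokens[1:]


problems = enumerate_problems()
print(len(problems))  # 184 under these rules

sample = list("4 + 5 = 9") + ["<EOS>"]
x, y = shift_pair(sample)
print(x)  # ['4', ' ', '+', ' ', '5', ' ', '=', ' ', '9']
print(y)  # [' ', '+', ' ', '5', ' ', '=', ' ', '9', '<EOS>']
```

The model sees `x` and is trained to predict `y` at every position, so the final target is the `<EOS>` token that ends the answer.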
---
## Repository Files
This flat repository contains all the source code needed to reproduce the model.
| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core Logic:** Custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core Logic:** Custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and sequence shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |