---

license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---

# TinyLLM: Character-Level Math Solver

## Model Description

**TinyLLM** is a highly compact, character-level **Causal Language Model** (based on the standard Transformer decoder architecture) trained specifically to solve single-digit math problems.

This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.

### Key Features
* **Architecture:** Causal Transformer Decoder.
* **Task:** Character-level text generation (autoregressive); see the tokenization sketch after this list.
* **Input/Output:** Solves problems formatted as `N op N =` and generates the answer character by character, e.g., `4 + 5 = 9<EOS>`.
* **Custom Code Required:** This is a custom PyTorch model and requires custom code (`model.py`, `tokenizer.py`) to be loaded by users.
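
For readers new to character-level modeling, here is a minimal, self-contained sketch of what character-level tokenization means. It is illustrative only and is not the `CharacterTokenizer` shipped in `tokenizer.py`.

```python
# Conceptual sketch only -- the real tokenizer is CharacterTokenizer in tokenizer.py.
text = "4 + 5 = 9"

chars = sorted(set(text))                        # vocabulary = every distinct character seen
stoi = {ch: i for i, ch in enumerate(chars)}     # char -> integer id
itos = {i: ch for ch, i in stoi.items()}         # integer id -> char

ids = [stoi[ch] for ch in text]                  # encode: one id per character
print(ids)
print("".join(itos[i] for i in ids))             # decode: recovers "4 + 5 = 9"
```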

---
## How to Use (Inference)

To load and run this custom model, users must download the entire repository structure and use the provided custom code, specifically the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` in **`tokenizer.py`**.

### 1. Installation

First, ensure you have the required libraries installed:
```bash
pip install torch huggingface-hub
```

### 2. Load the Model and Tokenizer

Download the repository, add it to `sys.path` so the custom classes can be imported, and load the trained weights:

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: REPLACE with your repository ID
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import the custom classes
# The downloaded path must be added to sys.path to allow custom imports
sys.path.append(local_path)
from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Set up and load the model
def load_tiny_llm():
    # In this minimal case, the known config values are hardcoded
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64, n_head=4, n_layer=4, dropout=0.1
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer from the same data used at training time
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```



## Training Details

### Architecture Configuration

The `TinyLLM` is configured with the following parameters, derived from the `config.json` and `model.py` files:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | The size of the character vocabulary. |
| **`block_size`** | `14` | The maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |
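
These values imply a model on the order of a couple of hundred thousand parameters. The back-of-the-envelope estimate below assumes a standard GPT-style block (biases, two LayerNorms, 4x feed-forward expansion, untied LM head); the authoritative definition is in `model.py`, so treat the exact number as approximate.

```python
# Rough parameter count under an assumed GPT-style block structure (see caveats above).
vocab_size, block_size, n_embed, n_layer = 22, 14, 64, 4

embeddings = vocab_size * n_embed + block_size * n_embed            # token + positional tables
attn = 4 * (n_embed * n_embed + n_embed)                            # q, k, v, output projections
ffn = (n_embed * 4 * n_embed + 4 * n_embed) + (4 * n_embed * n_embed + n_embed)  # two linear layers
norms = 2 * 2 * n_embed                                             # two LayerNorms per block
per_block = attn + ffn + norms
head = 2 * n_embed + n_embed * vocab_size + vocab_size              # final LayerNorm + LM head

total = embeddings + n_layer * per_block + head
print(f"~{total:,} parameters")                                     # roughly 2e5 under these assumptions
```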



### Training Hyperparameters (from `train.py`)

| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |
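
For orientation, a minimal training loop consistent with these hyperparameters might look like the following sketch. It is not `train.py`: the `MathDataset` constructor arguments and the model's output shape are assumptions, and `raw_data`, `tokenizer`, and `model` are taken from the loading snippet above.

```python
# Illustrative sketch; the real training script is train.py.
import torch
from torch.utils.data import DataLoader
from dataset import MathDataset                   # constructor arguments below are assumed

BATCH_SIZE, LEARNING_RATE, EPOCHS = 32, 1e-3, 100
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

dataset = MathDataset(raw_data, tokenizer, block_size=14)           # assumed signature
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

model.train()
for epoch in range(EPOCHS):
    for x, y in loader:                           # y is x shifted left by one character
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits = model(x)                         # assumed shape: (batch, seq_len, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```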



### Dataset



The model was trained on an **exhaustive set of all single-digit math problems** (addition, subtraction, multiplication, and remainder-free division) where the result is also a single digit (0-9). The **`dataset.py`** file implements the one-character sequence shift used to build input/target pairs for language-model training. A sketch of how such a problem set could be enumerated is shown below.
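
The snippet below enumerates such a problem set for illustration; the actual generator is `generate_v1_data()` in `tokenizer.py`, and its exact string format (spacing, `<EOS>` handling) may differ.

```python
# Illustrative enumeration; the real data comes from generate_v1_data() in tokenizer.py.
def enumerate_single_digit_problems():
    problems = []
    for a in range(10):
        for b in range(10):
            for op in "+-*/":
                if op == "+":
                    result = a + b
                elif op == "-":
                    result = a - b
                elif op == "*":
                    result = a * b
                else:
                    if b == 0 or a % b != 0:      # skip division by zero and remainders
                        continue
                    result = a // b
                if 0 <= result <= 9:              # keep only single-digit answers
                    problems.append(f"{a} {op} {b} = {result}")
    return problems

print(len(enumerate_single_digit_problems()))     # size of the exhaustive problem set
```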



---



## Repository Files

This flat repository contains all the source code needed for complete reproducibility.

| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core Logic:** Custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core Logic:** Custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and sequence shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |
| **`README.md`** | This model card and documentation. |