---
license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---

# TinyLLM: Character-Level Math Solver

## Model Description

**TinyLLM** is a highly compact, character-level **causal language model** (a standard Transformer decoder) trained specifically to solve single-digit math problems.

This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.

### Key Features

* **Architecture:** Causal Transformer decoder.
* **Task:** Character-level text generation (autoregressive).
* **Input/Output:** Solves problems formatted as `N op N` and generates the answer, e.g., `4 + 5 = 9<EOS>` (see the sketch below).
* **Custom Code Required:** This is a custom PyTorch model and requires its custom code (`model.py`, `tokenizer.py`) to load.
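
For illustration, the sequences the model sees look like the examples below. This is a sketch of the assumed format; the authoritative construction, including how the `<EOS>` token is appended, lives in `tokenizer.py` and `dataset.py`.

```python
# Illustrative problem strings (assumed format; see tokenizer.py for the
# authoritative construction and <EOS> handling).
examples = [
    "4 + 5 = 9",  # addition
    "8 - 3 = 5",  # subtraction
    "3 * 3 = 9",  # multiplication
    "8 / 2 = 4",  # non-remainder division
]
```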

---

## How to Use (Inference)

To load and run this custom model, download the entire repository and use the provided custom code: the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` in **`tokenizer.py`**.

### 1. Installation

First, ensure you have the required libraries installed:

```bash
pip install torch huggingface-hub
```

### 2. Load the Model

Then download the repository and load the model and tokenizer with the custom classes:

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: REPLACE with your repository ID
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import custom classes
# The downloaded path must be added to sys.path to allow custom imports
sys.path.append(local_path)
from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Set up and load the model
def load_tiny_llm():
    # In this minimal case, we hardcode the known config values
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64, n_head=4, n_layer=4, dropout=0.1,
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```
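
Once loaded, you can generate answers. The sketch below continues from the snippet above and assumes the `CharacterTokenizer` exposes `encode`/`decode` methods and that `TinyLLM` provides a nanoGPT-style `generate` method; verify the exact signatures against `model.py` and `custom_run.py`.

```python
# Minimal generation sketch. ASSUMPTIONS: tokenizer.encode/decode and
# model.generate(idx, max_new_tokens) exist; check model.py to confirm.
prompt = "4 + 5 = "

# Encode the prompt into a (1, T) tensor of character IDs
idx = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long, device=DEVICE)

# Autoregressively generate a few new characters (the answer digit + <EOS>)
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=4)

# Decode back to text; the output should read "4 + 5 = 9..."
print(tokenizer.decode(out[0].tolist()))
```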

## Training Details

### Architecture Configuration

The `TinyLLM` is configured with the following parameters, derived from the `config.json` and `model.py` files:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | Size of the character vocabulary. |
| **`block_size`** | `14` | Maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |
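
Rather than hardcoding these values as in the loading example above, you can read them from `config.json`. A sketch, continuing from the loading snippet and assuming the JSON keys mirror the parameter names in the table (verify against the actual `config.json`):

```python
import json
import os

# Build the model from config.json instead of hardcoded values.
# ASSUMPTION: the JSON keys match the table above; check config.json.
with open(os.path.join(local_path, "config.json")) as f:
    cfg = json.load(f)

model = TinyLLM(
    vocab_size=cfg["vocab_size"],
    block_size=cfg["block_size"],
    n_embed=cfg["n_embed"],
    n_head=cfg["n_head"],
    n_layer=cfg["n_layer"],
    dropout=cfg["dropout"],
).to(DEVICE)
```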

### Training Hyperparameters (from `train.py`)

| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |
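
For reference, the core of a training loop with these hyperparameters looks roughly like the sketch below; the authoritative version is `train.py`. The `MathDataset` constructor arguments and the nanoGPT-style `(logits, loss)` return value of the model are assumptions here.

```python
from torch.utils.data import DataLoader
from dataset import MathDataset  # provided in this repository

# Rough training-loop sketch; see train.py for the real script.
# ASSUMPTIONS: MathDataset(raw_data, tokenizer, block_size) yields (x, y)
# pairs, and model(x, y) returns (logits, loss) nanoGPT-style.
train_set = MathDataset(raw_data, tokenizer, block_size=14)
loader = DataLoader(train_set, batch_size=32, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

model.train()
for epoch in range(100):
    for x, y in loader:
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits, loss = model(x, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```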

### Dataset

The model was trained on an **exhaustive set of all single-digit math problems** (addition, subtraction, multiplication, and non-remainder division) whose result is also a single digit (0-9). The **`dataset.py`** file implements the input/target sequence shift used for language-model training, illustrated below.
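
The shift pairs each character with the one that follows it, so the model learns to predict the next character at every position. A minimal illustration (the actual implementation lives in `dataset.py`):

```python
# Next-character prediction: the target is the input shifted left by one,
# so at every position the model predicts the following character.
text = "4 + 5 = 9"
x = text[:-1]  # inputs:  "4 + 5 = "
y = text[1:]   # targets: " + 5 = 9"
for inp, tgt in zip(x, y):
    print(f"given {inp!r} -> predict {tgt!r}")
```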

---

## Repository Files

This flat repository contains all the source code needed for complete reproducibility.

| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core Logic:** Custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core Logic:** Custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and sequence-shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |