---

license: mit
language: en
tags:
- llm
- pytorch
- custom-model
- causal-lm
- character-level
- math
- tiny-model
model_type: tiny-causal-llm
datasets:
- custom
pipeline_tag: text-generation
---

# TinyLLM: Character-Level Math Solver

## Model Description

**TinyLLM** is a highly compact, character-level **Causal Language Model** (based on the standard Transformer decoder architecture) trained specifically to solve single-digit math problems.

This model serves as a minimalist, educational example of how a standard LLM architecture can be trained from scratch on a very small, custom dataset.

### Key Features
* **Architecture:** Causal Transformer Decoder.
* **Task:** Character-level text generation (autoregressive); see the tokenization sketch after this list.
* **Input/Output:** Solves problems formatted as `N op N =` and generates the answer character by character, e.g., `4 + 5 = 9<EOS>`.
* **Custom Code Required:** This is a custom PyTorch model and requires custom code (`model.py`, `tokenizer.py`) to be loaded by users.
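
For readers new to character-level modeling, here is a minimal, self-contained sketch of what character-level tokenization means. It is illustrative only and is not the `CharacterTokenizer` shipped in `tokenizer.py`.

```python
# Conceptual sketch only -- the real tokenizer is CharacterTokenizer in tokenizer.py.
text = "4 + 5 = 9"

chars = sorted(set(text))                        # vocabulary = every distinct character seen
stoi = {ch: i for i, ch in enumerate(chars)}     # char -> integer id
itos = {i: ch for ch, i in stoi.items()}         # integer id -> char

ids = [stoi[ch] for ch in text]                  # encode: one id per character
print(ids)
print("".join(itos[i] for i in ids))             # decode: recovers "4 + 5 = 9"
```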

---
## How to Use (Inference)

To load and run this custom model, users must download the entire repository structure and use the provided custom code, specifically the `TinyLLM` class defined in **`model.py`** and the `CharacterTokenizer` in **`tokenizer.py`**.

### 1. Installation

First, ensure you have the required libraries installed:
```bash
pip install torch huggingface-hub
```

### 2. Load the Model and Tokenizer

Download the repository, add it to `sys.path` so the custom classes can be imported, and load the trained weights:

```python
from huggingface_hub import snapshot_download
import torch
import os
import sys

# 1. Configuration: REPLACE with your repository ID
MODEL_ID = "anujbhatt4ai/tiny-math-llm"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# 2. Download all files (code and weights)
local_path = snapshot_download(repo_id=MODEL_ID)

# 3. Import the custom classes
# The downloaded path must be added to sys.path to allow custom imports
sys.path.append(local_path)
from model import TinyLLM
from tokenizer import CharacterTokenizer, generate_v1_data

# 4. Set up and load the model
def load_tiny_llm():
    # In this minimal case, the known config values are hardcoded
    vocab_size = 22
    block_size = 14

    # Initialize the model with the exact trained parameters
    model = TinyLLM(
        vocab_size=vocab_size,
        block_size=block_size,
        n_embed=64, n_head=4, n_layer=4, dropout=0.1
    ).to(DEVICE)

    # Load the trained weights
    weights_path = os.path.join(local_path, "pytorch_model.bin")
    model.load_state_dict(torch.load(weights_path, map_location=DEVICE))
    model.eval()

    # Initialize the tokenizer from the same data used at training time
    raw_data = generate_v1_data()
    tokenizer = CharacterTokenizer(raw_data)

    return model, tokenizer

# Use the loaded model and tokenizer in your own generation logic
model, tokenizer = load_tiny_llm()
print("Model loaded and ready for math inference!")
```



## Training Details

### Architecture Configuration

The `TinyLLM` is configured with the following parameters, derived from the `config.json` and `model.py` files:

| Parameter | Value | Description |
| :--- | :--- | :--- |
| **`vocab_size`** | `22` | The size of the character vocabulary. |
| **`block_size`** | `14` | The maximum sequence length (context window). |
| **`n_embed`** | `64` | Embedding dimension. |
| **`n_head`** | `4` | Number of attention heads. |
| **`n_layer`** | `4` | Number of Transformer decoder blocks. |
| **`dropout`** | `0.1` | Dropout rate. |
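
These values imply a model on the order of a couple of hundred thousand parameters. The back-of-the-envelope estimate below assumes a standard GPT-style block (biases, two LayerNorms, 4x feed-forward expansion, untied LM head); the authoritative definition is in `model.py`, so treat the exact number as approximate.

```python
# Rough parameter count under an assumed GPT-style block structure (see caveats above).
vocab_size, block_size, n_embed, n_layer = 22, 14, 64, 4

embeddings = vocab_size * n_embed + block_size * n_embed            # token + positional tables
attn = 4 * (n_embed * n_embed + n_embed)                            # q, k, v, output projections
ffn = (n_embed * 4 * n_embed + 4 * n_embed) + (4 * n_embed * n_embed + n_embed)  # two linear layers
norms = 2 * 2 * n_embed                                             # two LayerNorms per block
per_block = attn + ffn + norms
head = 2 * n_embed + n_embed * vocab_size + vocab_size              # final LayerNorm + LM head

total = embeddings + n_layer * per_block + head
print(f"~{total:,} parameters")                                     # roughly 2e5 under these assumptions
```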



### Training Hyperparameters (from `train.py`)

| Parameter | Value |
| :--- | :--- |
| **`BATCH_SIZE`** | `32` |
| **`LEARNING_RATE`** | `1e-3` (AdamW) |
| **`EPOCHS`** | `100` |
| **`DEVICE`** | `cuda` if available, else `cpu` |
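
For orientation, a minimal training loop consistent with these hyperparameters might look like the following sketch. It is not `train.py`: the `MathDataset` constructor arguments and the model's output shape are assumptions, and `raw_data`, `tokenizer`, and `model` are taken from the loading snippet above.

```python
# Illustrative sketch; the real training script is train.py.
import torch
from torch.utils.data import DataLoader
from dataset import MathDataset                   # constructor arguments below are assumed

BATCH_SIZE, LEARNING_RATE, EPOCHS = 32, 1e-3, 100
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

dataset = MathDataset(raw_data, tokenizer, block_size=14)           # assumed signature
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

model.train()
for epoch in range(EPOCHS):
    for x, y in loader:                           # y is x shifted left by one character
        x, y = x.to(DEVICE), y.to(DEVICE)
        logits = model(x)                         # assumed shape: (batch, seq_len, vocab_size)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1)
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```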



### Dataset



The model was trained on an **exhaustive set of all single-digit math problems** (addition, subtraction, multiplication, and remainder-free division) where the result is also a single digit (0-9). The **`dataset.py`** file implements the one-character sequence shift used to build input/target pairs for language-model training. A sketch of how such a problem set could be enumerated is shown below.
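
The snippet below enumerates such a problem set for illustration; the actual generator is `generate_v1_data()` in `tokenizer.py`, and its exact string format (spacing, `<EOS>` handling) may differ.

```python
# Illustrative enumeration; the real data comes from generate_v1_data() in tokenizer.py.
def enumerate_single_digit_problems():
    problems = []
    for a in range(10):
        for b in range(10):
            for op in "+-*/":
                if op == "+":
                    result = a + b
                elif op == "-":
                    result = a - b
                elif op == "*":
                    result = a * b
                else:
                    if b == 0 or a % b != 0:      # skip division by zero and remainders
                        continue
                    result = a // b
                if 0 <= result <= 9:              # keep only single-digit answers
                    problems.append(f"{a} {op} {b} = {result}")
    return problems

print(len(enumerate_single_digit_problems()))     # size of the exhaustive problem set
```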



---



## Repository Files

This flat repository contains all the source code needed for complete reproducibility.

| File Name | Description |
| :--- | :--- |
| **`pytorch_model.bin`** | The trained model weights. |
| **`config.json`** | Model configuration/hyperparameters. |
| **`model.py`** | **Core Logic:** Custom `TinyLLM` architecture definition. |
| **`tokenizer.py`** | **Core Logic:** Custom `CharacterTokenizer` and data generator. |
| **`dataset.py`** | Defines the `MathDataset` class and sequence shift logic. |
| **`train.py`** | The complete training script and final hyperparameters. |
| **`custom_run.py`** (or `run.py`) | Example script demonstrating how to use the model for generation. |
| **`README.md`** | This model card and documentation. |
| **`README.md`** | This model card and documentation. |