---
license: mit
tags:
  - gpt2
  - causal-lm
  - pytorch
  - transformer
  - from-scratch
  - instruction-tuned
  - educational
  - small-llm
  - pocketGPT
language:
  - en
datasets:
  - custom-ml-corpus
  - custom-instruction-data
model_creator: your-name
paper: ""
---



# pocketGPT-27M — A Custom 27M Parameter GPT Model Trained From Scratch

**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:

- A **24k Byte-Level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- **~165M-token pretraining corpus**
- **~4.5M-token instruction tuning dataset**

This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.

---

## Model Highlights

### Architecture

| Component      | Value      |
|----------------|------------|
| Layers         | 10         |
| Hidden size    | 384        |
| Attention heads| 6          |
| FFN size       | 1536       |
| Vocab size     | 24,000     |
| Context length | 384        |
| Parameters     | ~27–35M (depending on tokenizer) |
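
The ~27M figure can be sanity-checked from the table with a quick back-of-the-envelope count (assuming tied input/output embeddings and ignoring biases and LayerNorm parameters):

```python
# Rough weight count for the architecture above.
vocab, d, layers, ffn, ctx = 24_000, 384, 10, 1536, 384

embed = vocab * d          # token embeddings (tied with the LM head)
pos   = ctx * d            # learned position embeddings
attn  = 4 * d * d          # Q, K, V, and output projections per layer
mlp   = 2 * d * ffn        # FFN up- and down-projections per layer

total = embed + pos + layers * (attn + mlp)
print(f"{total / 1e6:.1f}M")  # → 27.1M
```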

---

## Training Overview

### Pretraining
- **Objective:** Causal Language Modeling  
- **Dataset:** ~165M tokens of ML/AI literature  
- **Hardware:** NVIDIA T4 (Kaggle)  
- **Precision:** FP16  
- **Epochs:** 3  
- **Optimizer:** AdamW  
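
A minimal configuration sketch of this setup, using the architecture from the table above; the learning rate is an illustrative assumption, not a value from the training run:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Architecture as listed above; random init, i.e. trained from scratch.
config = GPT2Config(vocab_size=24_000, n_positions=384, n_embd=384,
                    n_layer=10, n_head=6)
model = GPT2LMHeadModel(config)

# AdamW, as stated above; the learning rate here is a placeholder.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
```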

### Instruction Finetuning
- **Dataset:** ~4.5M tokens  
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>`  
- **Purpose:** Improve conversational and Q&A ability  
- **Result:** Evaluation loss decreased consistently (no overfitting observed)
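
In code, building a training example in this format looks roughly like the sketch below; the exact whitespace around `Response:` is an assumption, since the card elides it with `...`:

```python
BOS, EOS = "<|bos|>", "<|eos|>"

def format_example(instruction: str, response: str) -> str:
    # Wrap an instruction/response pair in the finetuning template.
    return f"{BOS}Instruction: {instruction}\nResponse: {response}{EOS}"

print(format_example("What is backpropagation?", "An algorithm that ..."))
```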

---

## Intended Use

- Educational LLMs  
- Lightweight research models  
- Offline/local small-scale chatbots  
- ML students exploring LLM training  

**Not intended for production or safety-critical use.**

---

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

# A token is only needed if the repo is private; in that case, set the
# HF_TOKEN environment variable or run `huggingface-cli login` first.
model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model.eval()

def ask(prompt):
    # Match the instruction-tuning format used during finetuning.
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"

    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,  # the model's full context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            # Fall back to EOS if no dedicated pad token is set.
            pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
        )

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an Artificial Neural Network?")
```