---
license: mit
tags:
- gpt2
- causal-lm
- pytorch
- transformer
- from-scratch
- instruction-tuned
- educational
- small-llm
- pocketGPT
language:
- en
datasets:
- custom-ml-corpus
- custom-instruction-data
model_creator: your-name
paper: ""
---

# pocketGPT-27M — A Custom 27M Parameter GPT Model Trained From Scratch

**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:

- A **24k byte-level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- A **~165M-token pretraining corpus**
- A **~4.5M-token instruction-tuning dataset**

This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.

---

## Model Highlights

### Architecture

| Component       | Value  |
|-----------------|--------|
| Layers          | 10     |
| Hidden size     | 384    |
| Attention heads | 6      |
| FFN size        | 1536   |
| Vocab size      | 24,000 |
| Context length  | 384    |
| Parameters      | ~27–35M (depending on tokenizer) |

---

## Training Overview

### Pretraining

- **Objective:** Causal language modeling
- **Dataset:** ~165M tokens of ML/AI literature
- **Hardware:** NVIDIA T4 (Kaggle)
- **Precision:** FP16
- **Epochs:** 3
- **Optimizer:** AdamW

### Instruction Finetuning

- **Dataset:** ~4.5M tokens
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>`
- **Purpose:** Improve conversational and Q&A ability
- **Result:** Evaluation loss decreased consistently (no overfitting observed)

---

## Intended Use

- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training

**Not intended for production or safety-critical use.**

---

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

# Only needed if the repository is private; prefer `huggingface-cli login`
# over hard-coding a token in source:
# os.environ["HF_TOKEN"] = "..."

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")

def ask(prompt):
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"
    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,  # matches the model's context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an artificial neural network?")
```
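The architecture table above maps directly onto a stock `GPT2Config`. The sketch below (hyperparameters taken from the table; this is not the author's actual training code) instantiates the model from scratch and checks the parameter count:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hyperparameters from the architecture table above.
config = GPT2Config(
    vocab_size=24_000,
    n_positions=384,   # context length
    n_embd=384,        # hidden size
    n_layer=10,
    n_head=6,
    n_inner=1536,      # FFN size
)
model = GPT2LMHeadModel(config)  # randomly initialized, no pretrained weights

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~27M with this 24k vocab
```

With the LM head weight-tied to the token embedding (the `transformers` default), this configuration lands at roughly 27M parameters, the low end of the range quoted in the table.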
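The 24k byte-level BPE tokenizer can be reproduced in outline with the `tokenizers` library. This is a minimal sketch on a toy inline corpus; the actual training corpus and script are not part of this card, and the `<|pad|>` special-token name is an assumption (only `<|bos|>` and `<|eos|>` appear in the instruction format above):

```python
from tokenizers import ByteLevelBPETokenizer

# Toy corpus for illustration only; the real tokenizer saw ~165M tokens.
corpus = [
    "Gradient descent updates model parameters.",
    "Attention layers mix information across tokens.",
] * 100

tok = ByteLevelBPETokenizer()
tok.train_from_iterator(
    corpus,
    vocab_size=24_000,  # upper bound; a toy corpus yields far fewer merges
    special_tokens=["<|bos|>", "<|eos|>", "<|pad|>"],
)

ids = tok.encode("Attention layers").ids
print(tok.decode(ids))  # byte-level BPE round-trips the input text
```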
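The pretraining setup above (causal LM objective with AdamW) reduces to a standard training step: feed token ids and pass the same ids as labels, and the model shifts them internally to compute next-token cross-entropy. A minimal sketch with a downsized model and random data (the learning rate and batch shape are illustrative assumptions, not the run's actual settings):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Two layers instead of ten to keep the sketch cheap; other dims as in the card.
config = GPT2Config(vocab_size=24_000, n_positions=384,
                    n_embd=384, n_layer=2, n_head=6, n_inner=1536)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is an assumption

batch = torch.randint(0, 24_000, (2, 64))   # fake token ids: batch of 2, length 64
out = model(input_ids=batch, labels=batch)  # labels shifted internally -> CLM loss
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(out.loss.item())  # near ln(24000) ≈ 10.1 at random init
```

In the actual run this loop would additionally be wrapped in FP16 autocast on the T4; `torch.cuda.amp.autocast` with a `GradScaler` is the usual way to add that.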