pocketGPT-27M: A Custom 27M-Parameter GPT Model Trained From Scratch

pocketGPT-27M is a fully custom GPT-style language model, trained entirely from scratch using:

  • A 24k byte-level BPE tokenizer (see the training sketch below)
  • A Transformer architecture (10 layers, 6 attention heads, 384-d hidden size)
  • A 384-token context window
  • A ~165M-token pretraining corpus
  • A ~4.5M-token instruction-tuning dataset

This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.
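
The 24k byte-level BPE tokenizer could be reproduced along these lines with the Hugging Face tokenizers library. This is a minimal sketch, not the exact recipe used for this model: the corpus file name, minimum frequency, and pad token are illustrative assumptions (only <|bos|> and <|eos|> appear in the documented format).

import os
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer with a 24k vocabulary.
bpe_tokenizer = ByteLevelBPETokenizer()
bpe_tokenizer.train(
    files=["corpus.txt"],  # hypothetical plain-text pretraining corpus
    vocab_size=24_000,
    min_frequency=2,       # assumption
    special_tokens=["<|bos|>", "<|eos|>", "<|pad|>"],  # pad token is an assumption
)

os.makedirs("tokenizer", exist_ok=True)
bpe_tokenizer.save_model("tokenizer")  # writes vocab.json and merges.txt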


Model Highlights

Architecture

Component        Value
Layers           10
Hidden size      384
Attention heads  6
FFN size         1536
Vocab size       24,000
Context length   384
Parameters       ~27M (27.1M in the released F32 safetensors checkpoint)
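
As a sanity check, a GPT-2-style configuration with these dimensions can be instantiated in transformers and its parameter count verified. Dropout and initialization settings below are the library defaults, which are assumptions rather than the confirmed training settings.

from transformers import GPT2Config, GPT2LMHeadModel

# Configuration mirroring the table above.
config = GPT2Config(
    vocab_size=24_000,
    n_positions=384,  # context length
    n_embd=384,       # hidden size
    n_layer=10,       # transformer layers
    n_head=6,         # attention heads
    n_inner=1536,     # feed-forward size
)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~27M with tied input/output embeddings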

Training Overview

Pretraining

  • Objective: causal language modeling
  • Dataset: ~165M tokens of ML/AI literature
  • Hardware: NVIDIA T4 (Kaggle)
  • Precision: FP16 mixed precision
  • Epochs: 3
  • Optimizer: AdamW
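
A comparable pretraining run could be expressed with the transformers Trainer as sketched below. The batch size and learning rate are assumptions chosen for a single T4; model comes from the configuration sketch above, while hf_tokenizer and tokenized_dataset are placeholders for the tokenizer and the pre-tokenized corpus, which are not part of this card.

from transformers import (
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

args = TrainingArguments(
    output_dir="pocketgpt-pretrain",
    num_train_epochs=3,              # as reported above
    per_device_train_batch_size=32,  # assumption; sized for a 16 GB T4
    learning_rate=3e-4,              # assumption
    fp16=True,                       # FP16 mixed precision, as reported
    optim="adamw_torch",             # AdamW
)

trainer = Trainer(
    model=model,                      # model from the configuration sketch above
    args=args,
    train_dataset=tokenized_dataset,  # placeholder: pre-tokenized 384-token chunks
    # mlm=False gives the causal-LM objective (labels are shifted inside the model).
    data_collator=DataCollatorForLanguageModeling(hf_tokenizer, mlm=False),
)
trainer.train()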

Instruction Finetuning

  • Dataset: ~4.5M tokens
  • Format: <|bos|>Instruction: ... Response: ... <|eos|>
  • Purpose: improve conversational and Q&A ability
  • Result: evaluation loss decreased consistently, with no overfitting observed
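
Concretely, each instruction/response pair would be flattened into that template before tokenization. The helper below is an illustrative sketch; the function name and example strings are assumptions, and the exact whitespace around the special tokens may differ from the original pipeline.

def format_example(instruction: str, response: str) -> str:
    # Mirror the prompt template the model sees at inference time.
    return f"<|bos|>Instruction: {instruction}\nResponse: {response}<|eos|>"

print(format_example("Define overfitting.", "Overfitting occurs when ..."))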

Intended Use

  • Educational LLMs
  • Lightweight research models
  • Offline/local small-scale chatbots
  • ML students exploring LLM training

Not intended for production or safety-critical use.


Usage Example

from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch
import os

os.environ["HUGGINGFACE_HUB_TOKEN"] = "Your Tokens"

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")

def ask(prompt):
    # Wrap the user prompt in the instruction format used during fine-tuning.
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"

    inputs = tokenizer.encode(formatted, return_tensors="pt")
    inputs = inputs.to(model.device)

    # Sample up to the model's 384-token context window.
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an artificial neural network?")
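
Because do_sample=True, answers vary from run to run; lowering temperature (or setting do_sample=False for greedy decoding) makes output more deterministic. Note that max_length counts the prompt tokens as well, so long prompts leave fewer tokens for the response.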