pocketGPT-27M: A Custom 27M-Parameter GPT Model Trained From Scratch

pocketGPT-27M is a fully custom GPT-style language model, trained entirely from scratch using:

  • A 24k byte-level BPE tokenizer (see the training sketch below)
  • A Transformer architecture (10 layers, 6 attention heads, 384-d hidden size)
  • A 384-token context window
  • A ~165M-token pretraining corpus
  • A ~4.5M-token instruction-tuning dataset

This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.
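
The 24k byte-level BPE tokenizer could be reproduced along these lines with the Hugging Face tokenizers library. This is a minimal sketch, not the exact recipe used for this model: the corpus file name, minimum frequency, and pad token are illustrative assumptions (only <|bos|> and <|eos|> appear in the documented format).

import os
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer with a 24k vocabulary.
bpe_tokenizer = ByteLevelBPETokenizer()
bpe_tokenizer.train(
    files=["corpus.txt"],  # hypothetical plain-text pretraining corpus
    vocab_size=24_000,
    min_frequency=2,       # assumption
    special_tokens=["<|bos|>", "<|eos|>", "<|pad|>"],  # pad token is an assumption
)

os.makedirs("tokenizer", exist_ok=True)
bpe_tokenizer.save_model("tokenizer")  # writes vocab.json and merges.txt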


Model Highlights

Architecture

Component        Value
Layers           10
Hidden size      384
Attention heads  6
FFN size         1536
Vocab size       24,000
Context length   384
Parameters       ~27M (27.1M in the released F32 safetensors checkpoint)
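
As a sanity check, a GPT-2-style configuration with these dimensions can be instantiated in transformers and its parameter count verified. Dropout and initialization settings below are the library defaults, which are assumptions rather than the confirmed training settings.

from transformers import GPT2Config, GPT2LMHeadModel

# Configuration mirroring the table above.
config = GPT2Config(
    vocab_size=24_000,
    n_positions=384,  # context length
    n_embd=384,       # hidden size
    n_layer=10,       # transformer layers
    n_head=6,         # attention heads
    n_inner=1536,     # feed-forward size
)
model = GPT2LMHeadModel(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~27M with tied input/output embeddings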

Training Overview

Pretraining

  • Objective: causal language modeling
  • Dataset: ~165M tokens of ML/AI literature
  • Hardware: NVIDIA T4 (Kaggle)
  • Precision: FP16 mixed precision
  • Epochs: 3
  • Optimizer: AdamW
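
A comparable pretraining run could be expressed with the transformers Trainer as sketched below. The batch size and learning rate are assumptions chosen for a single T4; model comes from the configuration sketch above, while hf_tokenizer and tokenized_dataset are placeholders for the tokenizer and the pre-tokenized corpus, which are not part of this card.

from transformers import (
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

args = TrainingArguments(
    output_dir="pocketgpt-pretrain",
    num_train_epochs=3,              # as reported above
    per_device_train_batch_size=32,  # assumption; sized for a 16 GB T4
    learning_rate=3e-4,              # assumption
    fp16=True,                       # FP16 mixed precision, as reported
    optim="adamw_torch",             # AdamW
)

trainer = Trainer(
    model=model,                      # model from the configuration sketch above
    args=args,
    train_dataset=tokenized_dataset,  # placeholder: pre-tokenized 384-token chunks
    # mlm=False gives the causal-LM objective (labels are shifted inside the model).
    data_collator=DataCollatorForLanguageModeling(hf_tokenizer, mlm=False),
)
trainer.train()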

Instruction Finetuning

  • Dataset: ~4.5M tokens
  • Format: <|bos|>Instruction: ... Response: ... <|eos|>
  • Purpose: improve conversational and Q&A ability
  • Result: evaluation loss decreased consistently, with no overfitting observed
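
Concretely, each instruction/response pair would be flattened into that template before tokenization. The helper below is an illustrative sketch; the function name and example strings are assumptions, and the exact whitespace around the special tokens may differ from the original pipeline.

def format_example(instruction: str, response: str) -> str:
    # Mirror the prompt template the model sees at inference time.
    return f"<|bos|>Instruction: {instruction}\nResponse: {response}<|eos|>"

print(format_example("Define overfitting.", "Overfitting occurs when ..."))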

Intended Use

  • Educational LLMs
  • Lightweight research models
  • Offline/local small-scale chatbots
  • ML students exploring LLM training

Not intended for production or safety-critical use.


Usage Example

from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch
import os

os.environ["HUGGINGFACE_HUB_TOKEN"] = "Your Tokens"

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")

def ask(prompt):
    # Wrap the user prompt in the instruction format used during fine-tuning.
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"

    inputs = tokenizer.encode(formatted, return_tensors="pt")
    inputs = inputs.to(model.device)

    # Sample up to the model's 384-token context window.
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an artificial neural network?")
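
Because do_sample=True, answers vary from run to run; lowering temperature (or setting do_sample=False for greedy decoding) makes output more deterministic. Note that max_length counts the prompt tokens as well, so long prompts leave fewer tokens for the response.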