# pocketGPT-27M: A Custom 27M-Parameter GPT Model Trained From Scratch
pocketGPT-27M is a fully custom GPT-style language model, trained entirely from scratch using:
- A 24k Byte-Level BPE tokenizer
- A Transformer architecture (10 layers, 6 heads, 384-d hidden size)
- A 384-token context window
- ~165M-token pretraining corpus
- ~4.5M-token instruction tuning dataset
This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.
## Model Highlights
### Architecture
| Component | Value |
|---|---|
| Layers | 10 |
| Hidden size | 384 |
| Attention heads | 6 |
| FFN size | 1536 |
| Vocab size | 24,000 |
| Context length | 384 |
| Parameters | ~27–35M |
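The parameter count follows directly from the table above. A back-of-envelope sketch, assuming a GPT-2-style layout with biases and the token-embedding matrix tied to the LM head (untying it adds the ~9.2M-parameter head back, which accounts for the upper end of the range):

```python
# Rough parameter count for the configuration listed in the table.
vocab, d, layers, ffn, ctx = 24_000, 384, 10, 1536, 384

emb = vocab * d                     # token embeddings (tied with LM head)
pos = ctx * d                       # learned positional embeddings
attn = 4 * (d * d + d)              # Q, K, V and output projections (+ biases)
mlp = (d * ffn + ffn) + (ffn * d + d)  # two FFN projections (+ biases)
ln = 2 * (2 * d)                    # two LayerNorms per block (weight + bias)
block = attn + mlp + ln

total = emb + pos + layers * block + 2 * d  # + final LayerNorm
print(f"{total:,}")                 # ≈ 27.1M with weight tying
```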
## Training Overview
### Pretraining
- Objective: Causal Language Modeling
- Dataset: ~165M tokens of ML/AI literature
- Hardware: NVIDIA T4 (Kaggle)
- Precision: FP16
- Epochs: 3
- Optimizer: AdamW
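The causal-LM objective above means each position is trained to predict the next token. A minimal, framework-agnostic sketch of how inputs and labels are aligned for that objective (the token IDs here are made up for illustration):

```python
# Causal language modeling: position i predicts token i+1.
token_ids = [101, 7, 42, 9, 102]  # hypothetical tokenized sequence

inputs = token_ids[:-1]   # model sees tokens 0 .. n-2
labels = token_ids[1:]    # and is trained to predict tokens 1 .. n-1

for x, y in zip(inputs, labels):
    print(f"given {x} -> predict {y}")
```

In practice libraries such as `transformers` perform this shift internally when `labels` are passed to the model, so the dataset only needs raw token sequences.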
### Instruction Finetuning
- Dataset: ~4.5M tokens
- Format: `<|bos|>Instruction: ... Response: ... <|eos|>`
- Purpose: Improve conversational and Q&A ability
- Result: Evaluation loss decreased consistently (no overfitting observed)
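A minimal sketch of building one training sample from the template above. The special-token strings come from the format listed in the card; the newline placement between the instruction and response is an assumption based on the usage example further down:

```python
# Hypothetical helper that renders one instruction-tuning sample.
def format_example(instruction: str, response: str) -> str:
    return f"<|bos|>Instruction: {instruction}\nResponse: {response}<|eos|>"

sample = format_example(
    "What is overfitting?",
    "When a model memorizes training data instead of generalizing.",
)
print(sample)
```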
## Intended Use
- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training
Not intended for production or safety-critical use.
## Usage Example
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch
import os

# Only needed if the repo is private or gated; replace with your own token.
os.environ["HUGGINGFACE_HUB_TOKEN"] = "your_hf_token"

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")

def ask(prompt):
    # Match the instruction-tuning template used during finetuning.
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"
    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,          # matches the model's context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an artificial neural network?")
```