---
license: mit
tags:
- gpt2
- causal-lm
- pytorch
- transformer
- from-scratch
- instruction-tuned
- educational
- small-llm
- pocketGPT
language:
- en
datasets:
- custom-ml-corpus
- custom-instruction-data
model_creator: your-name
paper: ""
---

# pocketGPT-27M — A Custom 27M Parameter GPT Model Trained From Scratch

**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:

- A **24k byte-level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- A **~165M-token pretraining corpus**
- A **~4.5M-token instruction-tuning dataset**

This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.

---

## Model Highlights

### Architecture

| Component       | Value  |
|-----------------|--------|
| Layers          | 10     |
| Hidden size     | 384    |
| Attention heads | 6      |
| FFN size        | 1536   |
| Vocab size      | 24,000 |
| Context length  | 384    |
| Parameters      | ~27–35M (depending on tokenizer) |

---

## Training Overview

### Pretraining

- **Objective:** Causal language modeling
- **Dataset:** ~165M tokens of ML/AI literature
- **Hardware:** NVIDIA T4 (Kaggle)
- **Precision:** FP16
- **Epochs:** 3
- **Optimizer:** AdamW

### Instruction Finetuning

- **Dataset:** ~4.5M tokens
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>`
- **Purpose:** Improve conversational and Q&A ability
- **Result:** Evaluation loss decreased consistently (no overfitting observed)

---

## Intended Use

- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training

**Not intended for production or safety-critical use.**

---

## Usage Example

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

# Only needed if the repository is private; prefer `huggingface-cli login`
# over hard-coding a token in source:
# os.environ["HF_TOKEN"] = "..."

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")

def ask(prompt):
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"
    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,  # matches the model's context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an artificial neural network?")
```
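The architecture table above maps directly onto a stock `GPT2Config`. The sketch below (hyperparameters taken from the table; this is not the author's actual training code) instantiates the model from scratch and checks the parameter count:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hyperparameters from the architecture table above.
config = GPT2Config(
    vocab_size=24_000,
    n_positions=384,   # context length
    n_embd=384,        # hidden size
    n_layer=10,
    n_head=6,
    n_inner=1536,      # FFN size
)
model = GPT2LMHeadModel(config)  # randomly initialized, no pretrained weights

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~27M with this 24k vocab
```

With the LM head weight-tied to the token embedding (the `transformers` default), this configuration lands at roughly 27M parameters, the low end of the range quoted in the table.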
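The 24k byte-level BPE tokenizer can be reproduced in outline with the `tokenizers` library. This is a minimal sketch on a toy inline corpus; the actual training corpus and script are not part of this card, and the `<|pad|>` special-token name is an assumption (only `<|bos|>` and `<|eos|>` appear in the instruction format above):

```python
from tokenizers import ByteLevelBPETokenizer

# Toy corpus for illustration only; the real tokenizer saw ~165M tokens.
corpus = [
    "Gradient descent updates model parameters.",
    "Attention layers mix information across tokens.",
] * 100

tok = ByteLevelBPETokenizer()
tok.train_from_iterator(
    corpus,
    vocab_size=24_000,  # upper bound; a toy corpus yields far fewer merges
    special_tokens=["<|bos|>", "<|eos|>", "<|pad|>"],
)

ids = tok.encode("Attention layers").ids
print(tok.decode(ids))  # byte-level BPE round-trips the input text
```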
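The pretraining setup above (causal LM objective with AdamW) reduces to a standard training step: feed token ids and pass the same ids as labels, and the model shifts them internally to compute next-token cross-entropy. A minimal sketch with a downsized model and random data (the learning rate and batch shape are illustrative assumptions, not the run's actual settings):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Two layers instead of ten to keep the sketch cheap; other dims as in the card.
config = GPT2Config(vocab_size=24_000, n_positions=384,
                    n_embd=384, n_layer=2, n_head=6, n_inner=1536)
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is an assumption

batch = torch.randint(0, 24_000, (2, 64))   # fake token ids: batch of 2, length 64
out = model(input_ids=batch, labels=batch)  # labels shifted internally -> CLM loss
out.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(out.loss.item())  # near ln(24000) ≈ 10.1 at random init
```

In the actual run this loop would additionally be wrapped in FP16 autocast on the T4; `torch.cuda.amp.autocast` with a `GradScaler` is the usual way to add that.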