---
license: mit
tags:
- gpt2
- causal-lm
- pytorch
- transformer
- from-scratch
- instruction-tuned
- educational
- small-llm
- pocketGPT
language:
- en
datasets:
- custom-ml-corpus
- custom-instruction-data
model_creator: Amogh1221
paper: ""
---
# pocketGPT-27M — A Custom 27M Parameter GPT Model Trained From Scratch
**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:
- A **24k Byte-Level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- **~165M-token pretraining corpus**
- **~4.5M-token instruction tuning dataset**
This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.
---
## Model Highlights
### Architecture
| Component | Value |
|----------------|------------|
| Layers | 10 |
| Hidden size | 384 |
| Attention heads| 6 |
| FFN size | 1536 |
| Vocab size | 24,000 |
| Context length | 384 |
| Parameters | ~27–35M (depending on tokenizer) |
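The parameter count in the table can be roughly reproduced from the architecture config. The sketch below assumes a standard GPT-2-style block (learned position embeddings, two LayerNorms per block, tied input/output embeddings); the `gpt2_param_count` helper is illustrative, not part of the released code.

```python
def gpt2_param_count(n_layer=10, d_model=384, d_ffn=1536,
                     vocab=24_000, ctx=384):
    """Approximate parameter count for a GPT-2-style decoder."""
    emb = vocab * d_model + ctx * d_model       # token + position embeddings
    attn = 4 * (d_model * d_model + d_model)    # Q, K, V, output projections
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    ln = 2 * 2 * d_model                        # two LayerNorms per block
    final_ln = 2 * d_model
    return emb + n_layer * (attn + ffn + ln) + final_ln

print(f"{gpt2_param_count() / 1e6:.1f}M")  # prints "27.1M"
```

With an untied output head, the count grows by another `vocab * d_model` (~9.2M), which explains the ~27–35M range.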
---
## Training Overview
### Pretraining
- **Objective:** Causal Language Modeling
- **Dataset:** ~165M tokens of ML/AI literature
- **Hardware:** NVIDIA T4 (Kaggle)
- **Precision:** FP16
- **Epochs:** 3
- **Optimizer:** AdamW
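The causal-LM objective above means the model predicts each next token from all previous ones. A minimal sketch of the loss computation (shapes and values are illustrative; the actual training loop is not published):

```python
import torch
import torch.nn.functional as F

vocab_size = 24_000
logits = torch.randn(2, 8, vocab_size)            # (batch, seq, vocab) from the model
tokens = torch.randint(0, vocab_size, (2, 8))     # input token ids

# Shift by one: position t predicts token t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),               # targets are the next tokens
)
```

AdamW then updates the weights on this loss, with FP16 mixed precision to fit the T4's memory.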
### Instruction Finetuning
- **Dataset:** ~4.5M tokens
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>`
- **Purpose:** Improve conversational and Q&A ability
- **Result:** Evaluation loss decreased consistently (no overfitting observed)
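Assembling an instruction-tuning example in the template above might look like this (field names and the sample pair are illustrative, not taken from the actual dataset):

```python
def format_example(instruction: str, response: str) -> str:
    """Wrap an instruction/response pair in the finetuning template."""
    return f"<|bos|>Instruction: {instruction}\nResponse: {response}<|eos|>"

sample = format_example(
    "What is backpropagation?",
    "An algorithm that computes gradients via the chain rule.",
)
```

At inference time, the prompt is formatted the same way but cut off after `Response:`, so generation continues from there.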
---
## Intended Use
- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training
**Not intended for production or safety-critical use.**
---
## Usage Example
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

# For private repos, authenticate first (e.g. `huggingface-cli login`);
# avoid hardcoding access tokens in code.
model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model.eval()

def ask(prompt):
    # Use the same template as instruction finetuning
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"
    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,  # model's context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an Artificial Neural Network?")
```