---
license: mit
tags:
- gpt2
- causal-lm
- pytorch
- transformer
- from-scratch
- instruction-tuned
- educational
- small-llm
- pocketGPT
language:
- en
datasets:
- custom-ml-corpus
- custom-instruction-data
model_creator: your-name
paper: ""
---

# pocketGPT-27M: A Custom 27M Parameter GPT Model Trained From Scratch

**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:

- A **24k byte-level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- A **~165M-token pretraining corpus**
- An **~4.5M-token instruction-tuning dataset**

This project demonstrates how a compact GPT model can be designed, trained, and deployed end to end without relying on any pretrained weights.

---

## Model Highlights

### Architecture

| Component       | Value                            |
|-----------------|----------------------------------|
| Layers          | 10                               |
| Hidden size     | 384                              |
| Attention heads | 6                                |
| FFN size        | 1536                             |
| Vocab size      | 24,000                           |
| Context length  | 384                              |
| Parameters      | ~27–35M (depending on tokenizer) |
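
These hyperparameters map directly onto a stock `GPT2Config`. The snippet below is a minimal sketch, assuming the standard `transformers` GPT-2 implementation with tied input/output embeddings, and it reproduces the ~27M figure:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Values from the table above; the exact training config may differ in detail.
config = GPT2Config(
    vocab_size=24_000,  # 24k byte-level BPE vocabulary
    n_positions=384,    # context length
    n_embd=384,         # hidden size
    n_layer=10,         # transformer blocks
    n_head=6,           # attention heads
    n_inner=1536,       # FFN size (4x hidden)
)

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
# Roughly: 9.2M tied embeddings (24000*384) + 0.15M positions (384*384)
# + 10 blocks * ~1.78M each (attention + FFN) ≈ 27M total.
```

The 27–35M spread in the table reflects where the embedding matrix ends up: a larger vocabulary, or untied input/output embeddings, pushes the count toward the upper end.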

---

## Training Overview

### Pretraining

- **Objective:** Causal Language Modeling
- **Dataset:** ~165M tokens of ML/AI literature
- **Hardware:** NVIDIA T4 (Kaggle)
- **Precision:** FP16
- **Epochs:** 3
- **Optimizer:** AdamW (see the sketch below for how this setup fits together)
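
The actual training script is not published with this card, so the following is only a minimal sketch of how the setup above could be wired with the `transformers` `Trainer` (which uses AdamW by default). It reuses `config` from the architecture snippet; the corpus, batch size, and learning rate here are placeholders, not values from the real run:

```python
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model = GPT2LMHeadModel(config)  # random init; `config` from the snippet above

# Placeholder corpus; the real run used ~165M tokens of ML/AI literature.
raw = Dataset.from_dict({"text": ["A neural network is a function approximator."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=384)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="pocketgpt-pretrain",
    num_train_epochs=3,              # matches the card
    fp16=True,                       # matches the card (needs a GPU, e.g. a T4)
    per_device_train_batch_size=8,   # assumption, not from the card
    learning_rate=3e-4,              # assumption, not from the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    # mlm=False selects the causal-LM objective described above.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```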

### Instruction Finetuning

- **Dataset:** ~4.5M tokens
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>` (illustrated below)
- **Purpose:** Improve conversational and Q&A ability
- **Result:** Evaluation loss decreased consistently, with no overfitting observed
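
As an illustration, a single finetuning example could be rendered into this template with a helper like the one below (`format_example` is a hypothetical name, not code from the actual pipeline):

```python
def format_example(instruction: str, response: str) -> str:
    # Same template as above; <|bos|> and <|eos|> are the tokenizer's special
    # tokens, and the Usage Example prompts with the same "Instruction:" prefix.
    return f"<|bos|>Instruction: {instruction}\nResponse: {response}<|eos|>"

print(format_example(
    "What is overfitting?",
    "Overfitting occurs when a model memorizes its training data "
    "instead of learning patterns that generalize.",
))
```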

---

## Intended Use

- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training

**Not intended for production or safety-critical use.**

---

## Usage Example

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Only needed if the repository is private; public downloads need no token:
# import os
# os.environ["HUGGINGFACE_HUB_TOKEN"] = "Your Tokens"

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model.eval()

def ask(prompt):
    # Wrap the prompt in the same template used during instruction finetuning.
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"

    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,  # prompt + response capped at the context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an Artificial Neural Network?")
```
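
With `do_sample=True`, `top_p=0.9`, and `temperature=0.8`, generations vary between runs; lowering the temperature (or passing `do_sample=False` for greedy decoding) gives more deterministic answers. Note that `max_length=384` counts the prompt tokens too, so long prompts leave less room for the response.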
|