---
license: mit
tags:
- gpt2
- causal-lm
- pytorch
- transformer
- from-scratch
- instruction-tuned
- educational
- small-llm
- pocketGPT
language:
- en
datasets:
- custom-ml-corpus
- custom-instruction-data
model_creator: your-name
paper: ""
---
# pocketGPT-27M — A Custom 27M Parameter GPT Model Trained From Scratch
**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:
- A **24k Byte-Level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- **~165M-token pretraining corpus**
- **~4.5M-token instruction tuning dataset**
This project demonstrates how a compact GPT model can be designed, trained, and deployed end-to-end without relying on any pretrained weights.
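The tokenizer above is a byte-level BPE: text is first mapped to its raw UTF-8 bytes (256 base tokens), then the most frequent adjacent pair is repeatedly merged into a new token until the vocabulary reaches 24k. A minimal, illustrative sketch of a single merge step (not the actual training code for this model):

```python
from collections import Counter

def most_frequent_pair(ids):
    # Count adjacent token pairs and return the most common one
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # Replace every occurrence of `pair` with the new token id
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "low lower lowest"
ids = list(text.encode("utf-8"))   # byte-level: start from raw UTF-8 bytes
pair = most_frequent_pair(ids)     # here: (108, 111), i.e. "lo"
ids = merge(ids, pair, 256)        # first new id after the 256 base bytes
```

Repeating this loop ~23,744 times yields the 24k-entry vocabulary.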
---
## Model Highlights
### Architecture
| Component | Value |
|----------------|------------|
| Layers | 10 |
| Hidden size | 384 |
| Attention heads| 6 |
| FFN size | 1536 |
| Vocab size | 24,000 |
| Context length | 384 |
| Parameters | ~27–35M (depending on tokenizer) |
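The ~27M figure can be checked directly from the table. A back-of-the-envelope count, assuming GPT-2-style weight tying between the input embedding and the output head (untied embeddings would add another ~9M):

```python
# Hyperparameters from the architecture table above
n_layer, d_model, d_ffn = 10, 384, 1536
vocab, n_ctx = 24_000, 384

embed = vocab * d_model + n_ctx * d_model       # token + position embeddings
attn  = 4 * d_model * d_model + 4 * d_model     # q/k/v/out projections + biases
ffn   = 2 * d_model * d_ffn + d_ffn + d_model   # up/down projections + biases
lns   = 2 * 2 * d_model                         # two LayerNorms per block
total = embed + n_layer * (attn + ffn + lns) + 2 * d_model  # + final LayerNorm

print(f"{total / 1e6:.1f}M parameters")  # → 27.1M
```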
---
## Training Overview
### Pretraining
- **Objective:** Causal Language Modeling
- **Dataset:** ~165M tokens of ML/AI literature
- **Hardware:** NVIDIA T4 (Kaggle)
- **Precision:** FP16
- **Epochs:** 3
- **Optimizer:** AdamW
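The FP16/AdamW combination above can be sketched as a single training step. This is an illustrative stand-in, not the actual training script: the tiny `Linear` model and the hyperparameters (`lr`, `weight_decay`, batch size) are assumptions, and FP16 autocast with gradient scaling engages only when a GPU is present.

```python
import torch
import torch.nn.functional as F

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = torch.nn.Linear(384, 24_000).to(device)       # stand-in for the GPT
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # loss scaling for FP16

x = torch.randn(8, 384, device=device)                # fake batch of hidden states
y = torch.randint(0, 24_000, (8,), device=device)     # fake next-token targets

with torch.autocast(device_type=device, enabled=use_cuda):  # FP16 on GPU only
    loss = F.cross_entropy(model(x), y)

scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
scaler.step(opt)
scaler.update()
opt.zero_grad(set_to_none=True)
```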
### Instruction Finetuning
- **Dataset:** ~4.5M tokens
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>`
- **Purpose:** Improve conversational and Q&A ability
- **Result:** Evaluation loss decreased consistently (no overfitting observed)
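Each finetuning example is serialized into the template above before tokenization. A minimal sketch (the helper name and sample text are illustrative):

```python
BOS, EOS = "<|bos|>", "<|eos|>"

def format_example(instruction: str, response: str) -> str:
    # Hypothetical helper mirroring the card's instruction template
    return f"{BOS}Instruction: {instruction}\nResponse: {response}{EOS}"

sample = format_example(
    "What is backpropagation?",
    "An algorithm that computes gradients of the loss w.r.t. model weights.",
)
print(sample)
```

At inference time the same template is used, truncated after `Response:`, so the model completes the answer.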
---
## Intended Use
- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training
**Not intended for production or safety-critical use.**
---
## Usage Example
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

# If the repo were private, set the HUGGINGFACE_HUB_TOKEN environment
# variable in your shell instead of hardcoding a token in source code.
model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model.eval()

def ask(prompt):
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"
    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an Artificial Neural Network?")
```