---
license: mit
tags:
- gpt2
- causal-lm
- pytorch
- transformer
- from-scratch
- instruction-tuned
- educational
- small-llm
- pocketGPT
language:
- en
datasets:
- custom-ml-corpus
- custom-instruction-data
model_creator: your-name
paper: ""
---

# pocketGPT-27M: A Custom 27M Parameter GPT Model Trained From Scratch

**pocketGPT-27M** is a fully custom GPT-style language model, trained entirely from scratch using:

- A **24k byte-level BPE tokenizer**
- A **Transformer architecture** (10 layers, 6 heads, 384-d hidden size)
- A **384-token context window**
- A **~165M-token pretraining corpus**
- An **~4.5M-token instruction-tuning dataset**

This project demonstrates how a compact GPT model can be designed, trained, and deployed end to end without relying on any pretrained weights.

---

## Model Highlights

### Architecture

| Component       | Value                            |
|-----------------|----------------------------------|
| Layers          | 10                               |
| Hidden size     | 384                              |
| Attention heads | 6                                |
| FFN size        | 1536                             |
| Vocab size      | 24,000                           |
| Context length  | 384                              |
| Parameters      | ~27–35M (depending on tokenizer) |
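
These hyperparameters map directly onto a stock `GPT2Config`. The snippet below is a minimal sketch, assuming the standard `transformers` GPT-2 implementation with tied input/output embeddings, and it reproduces the ~27M figure:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Values from the table above; the exact training config may differ in detail.
config = GPT2Config(
    vocab_size=24_000,  # 24k byte-level BPE vocabulary
    n_positions=384,    # context length
    n_embd=384,         # hidden size
    n_layer=10,         # transformer blocks
    n_head=6,           # attention heads
    n_inner=1536,       # FFN size (4x hidden)
)

model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
# Roughly: 9.2M tied embeddings (24000*384) + 0.15M positions (384*384)
# + 10 blocks * ~1.78M each (attention + FFN) ≈ 27M total.
```

The 27–35M spread in the table reflects where the embedding matrix ends up: a larger vocabulary, or untied input/output embeddings, pushes the count toward the upper end.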

---

## Training Overview

### Pretraining

- **Objective:** Causal Language Modeling
- **Dataset:** ~165M tokens of ML/AI literature
- **Hardware:** NVIDIA T4 (Kaggle)
- **Precision:** FP16
- **Epochs:** 3
- **Optimizer:** AdamW (see the sketch below for how this setup fits together)
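
The actual training script is not published with this card, so the following is only a minimal sketch of how the setup above could be wired with the `transformers` `Trainer` (which uses AdamW by default). It reuses `config` from the architecture snippet; the corpus, batch size, and learning rate here are placeholders, not values from the real run:

```python
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model = GPT2LMHeadModel(config)  # random init; `config` from the snippet above

# Placeholder corpus; the real run used ~165M tokens of ML/AI literature.
raw = Dataset.from_dict({"text": ["A neural network is a function approximator."]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=384)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="pocketgpt-pretrain",
    num_train_epochs=3,              # matches the card
    fp16=True,                       # matches the card (needs a GPU, e.g. a T4)
    per_device_train_batch_size=8,   # assumption, not from the card
    learning_rate=3e-4,              # assumption, not from the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    # mlm=False selects the causal-LM objective described above.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```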

### Instruction Finetuning

- **Dataset:** ~4.5M tokens
- **Format:** `<|bos|>Instruction: ... Response: ... <|eos|>` (illustrated below)
- **Purpose:** Improve conversational and Q&A ability
- **Result:** Evaluation loss decreased consistently, with no overfitting observed
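
As an illustration, a single finetuning example could be rendered into this template with a helper like the one below (`format_example` is a hypothetical name, not code from the actual pipeline):

```python
def format_example(instruction: str, response: str) -> str:
    # Same template as above; <|bos|> and <|eos|> are the tokenizer's special
    # tokens, and the Usage Example prompts with the same "Instruction:" prefix.
    return f"<|bos|>Instruction: {instruction}\nResponse: {response}<|eos|>"

print(format_example(
    "What is overfitting?",
    "Overfitting occurs when a model memorizes its training data "
    "instead of learning patterns that generalize.",
))
```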

---

## Intended Use

- Educational LLMs
- Lightweight research models
- Offline/local small-scale chatbots
- ML students exploring LLM training

**Not intended for production or safety-critical use.**

---

## Usage Example

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Only needed if the repository is private; public downloads need no token:
# import os
# os.environ["HUGGINGFACE_HUB_TOKEN"] = "Your Tokens"

model = GPT2LMHeadModel.from_pretrained("Amogh1221/PocketGPT_27M")
tokenizer = GPT2TokenizerFast.from_pretrained("Amogh1221/PocketGPT_27M")
model.eval()

def ask(prompt):
    # Wrap the prompt in the same template used during instruction finetuning.
    formatted = f"<|bos|>Instruction: {prompt}\nResponse:"

    inputs = tokenizer.encode(formatted, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_length=384,  # prompt + response capped at the context window
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

ask("What is an Artificial Neural Network?")
```
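
With `do_sample=True`, `top_p=0.9`, and `temperature=0.8`, generations vary between runs; lowering the temperature (or passing `do_sample=False` for greedy decoding) gives more deterministic answers. Note that `max_length=384` counts the prompt tokens too, so long prompts leave less room for the response.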
|