# MiniGPT Shakespeare

A lightweight GPT-style transformer trained from scratch on Shakespeare text and integrated with Hugging Face.
## Features

- Custom decoder-only transformer
- Built entirely in PyTorch
- Hugging Face compatible (`AutoModel`)
- Byte-level BPE tokenizer
- Supports text generation with sampling
## Model Details

- Architecture: Decoder-only Transformer (GPT-style)
- Layers: 6
- Heads: 4
- Embedding size: 256
- Vocab size: 1000
- Max sequence length: 256
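The hyperparameters above can be sketched as a minimal decoder-only model in PyTorch. This is an illustrative reconstruction, not the repository's actual module code: the class name, use of `nn.TransformerEncoder`, and the 4x feed-forward width are assumptions.

```python
import torch
import torch.nn as nn


class MiniGPT(nn.Module):
    """Minimal GPT-style decoder-only model matching the listed hyperparameters."""

    def __init__(self, vocab_size=1000, d_model=256, n_heads=4,
                 n_layers=6, max_seq_len=256):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_seq_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        _, t = input_ids.shape
        pos = torch.arange(t, device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(pos)
        # Causal mask: each position may only attend to itself and earlier positions
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=input_ids.device), diagonal=1
        )
        x = self.encoder(x, mask=mask)
        # Return logits in a dict, matching the output format the
        # generation code below expects
        return {"logits": self.lm_head(x)}
```

A self-attention layer with a causal mask is equivalent to a decoder block without cross-attention, which is why an encoder stack plus a triangular mask suffices here.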
## Usage

### Load Model

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "flamingo44333/mini-gpt-shakespeare",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "flamingo44333/mini-gpt-shakespeare",
    trust_remote_code=True,
)
```
### Generate Text

```python
import torch
import torch.nn.functional as F


def generate(
    model,
    tokenizer,
    prompt,
    max_new_tokens=100,
    temperature=0.5,
    top_k=40,
    device="cuda" if torch.cuda.is_available() else "cpu",
):
    model.eval()
    model.to(device)

    # Encode (match training behavior)
    input_ids = tokenizer.encode(prompt, add_special_tokens=False)
    input_ids = torch.tensor(input_ids, dtype=torch.long).unsqueeze(0).to(device)

    # Handle DataParallel safely
    model_to_use = model.module if hasattr(model, "module") else model

    with torch.no_grad():
        for _ in range(max_new_tokens):
            # Crop the context to the model's maximum sequence length
            input_crop = input_ids[:, -model_to_use.config.max_seq_len:]
            out = model_to_use(input_crop)
            logits = out["logits"]

            # Never sample padding or unknown tokens
            pad_id = tokenizer.pad_token_id
            unk_id = tokenizer.unk_token_id
            if pad_id is not None:
                logits[:, -1, pad_id] = float("-inf")
            if unk_id is not None:
                logits[:, -1, unk_id] = float("-inf")

            # Temperature scaling on the last position's logits
            logits = logits[:, -1, :] / temperature

            if top_k is not None:
                # Restrict sampling to the k most likely tokens
                values, indices = torch.topk(logits, top_k)
                probs = F.softmax(values, dim=-1)
                next_token = indices.gather(-1, torch.multinomial(probs, 1))
            else:
                probs = F.softmax(logits, dim=-1)
                next_token = torch.multinomial(probs, 1)

            input_ids = torch.cat([input_ids, next_token], dim=1)

    return tokenizer.decode(
        input_ids[0].tolist(),
        clean_up_tokenization_spaces=False,
    )
```
## Notes

- No KV cache (generation is slower than production LLMs)
- Uses top-k sampling for stability
- Requires `trust_remote_code=True` to load the custom model
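The top-k sampling step used in `generate` can be demonstrated in isolation. This is a standalone sketch of the same technique; the helper name and the toy logits are made up for illustration.

```python
import torch
import torch.nn.functional as F


def sample_top_k(logits, k, temperature=1.0):
    """Sample one token id, restricted to the k highest-scoring logits."""
    logits = logits / temperature
    values, indices = torch.topk(logits, k)   # keep only the top-k scores
    probs = F.softmax(values, dim=-1)         # renormalize over those k
    choice = torch.multinomial(probs, 1)      # sample within the top-k
    return indices.gather(-1, choice)         # map back to vocabulary ids


# Toy example: with k=3, only tokens 3, 0, and 1 (the three largest
# logits) can ever be drawn, no matter how sampling falls out.
logits = torch.tensor([[2.0, 1.5, -1.0, 4.0, 0.0]])
token = sample_top_k(logits, k=3)
```

Truncating to the top k tokens is what gives the stability noted above: low-probability tail tokens, which tend to derail small models, can never be sampled.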
## Future Improvements

- Add `.generate()` API support
- Implement KV caching
- Add top-p (nucleus) sampling
- Train on larger datasets
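Top-p (nucleus) sampling, listed above as a planned improvement, could be sketched like this. This is an assumption about how it would be implemented, not code from the repository.

```python
import torch
import torch.nn.functional as F


def sample_top_p(logits, p=0.9, temperature=1.0):
    """Sample one token id from the smallest set of tokens whose
    cumulative probability exceeds p (nucleus sampling)."""
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_indices = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)

    # Zero out tokens once the running total has already passed p;
    # subtracting each token's own probability first guarantees the
    # single most likely token is always kept.
    mask = cumulative - sorted_probs > p
    sorted_probs[mask] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)

    choice = torch.multinomial(sorted_probs, 1)
    return sorted_indices.gather(-1, choice)


# Toy example: token 0 alone carries ~97% of the mass, so with p=0.9
# the nucleus contains only token 0 and it is always the sample.
logits = torch.tensor([[5.0, 1.0, 0.5, -2.0]])
token = sample_top_p(logits, p=0.9)
```

Unlike top-k, the nucleus adapts its size: when the model is confident it shrinks to a few tokens, and when the distribution is flat it widens, which is why it is a common upgrade over fixed-k truncation.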
## Purpose

This project was built to understand:

- Transformer internals
- Attention mechanisms
- Hugging Face model integration
## Author

Praful Srinivasan