
# Mini GPT1 Clone

This is a decoder-only transformer model (GPT-1-style) trained from scratch using PyTorch.

## Model Details

- **Architecture:** Decoder-only Transformer
- **Layers:** 6
- **Embedding size:** 512
- **Attention heads:** 8
- **Feed-forward dim:** 2048
- **Sequence length:** 256
- **Vocab size:** 35,000
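For scale, a quick back-of-the-envelope parameter count from the numbers above. This is a sketch assuming learned positional embeddings, a weight-tied output head, and ignoring biases and layer norms; the card does not specify any of these details.

```python
# Rough parameter count for the listed configuration.
# Assumptions (not stated in the card): learned positional embeddings,
# output head tied to the input embedding, biases/layer norms ignored.
vocab_size = 35_000
d_model = 512
n_layers = 6
d_ff = 2048
seq_len = 256

token_emb = vocab_size * d_model   # input embedding (shared with output head)
pos_emb = seq_len * d_model        # learned positional embedding
attn = 4 * d_model * d_model       # Q, K, V, and output projections
ffn = 2 * d_model * d_ff           # two feed-forward linear layers
per_layer = attn + ffn

total = token_emb + pos_emb + n_layers * per_layer
print(f"~{total / 1e6:.1f}M parameters")  # ~36.9M
```

Untying the output head from the input embedding would add another ~17.9M parameters.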

## Tokenizer

Trained using `ByteLevelBPETokenizer` from the Hugging Face `tokenizers` library.
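A minimal sketch of how such a tokenizer can be trained with the `tokenizers` library. The tiny in-memory corpus and the special-token list below are illustrative assumptions; the card does not say what data or special tokens were actually used.

```python
from tokenizers import ByteLevelBPETokenizer

# Stand-in corpus -- the actual training data is not specified in the card.
corpus = [
    "Once upon a time, there was a tiny transformer.",
    "It read text one byte-level token at a time.",
]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    corpus,
    vocab_size=35_000,  # matches the model's vocab size; a tiny corpus yields fewer merges
    min_frequency=1,
    special_tokens=["<s>", "</s>", "<pad>", "<unk>"],  # assumed, not from the card
)

# Byte-level BPE is lossless: encoding then decoding round-trips the text.
ids = tokenizer.encode("Once upon a time,").ids
print(tokenizer.decode(ids))
```

On real data, `tokenizer.save_model(...)` then writes the vocab and merges files that the inference example below loads.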

## Inference Example

```python
from transformers import PreTrainedTokenizerFast, AutoModelForCausalLM
import torch

tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer/tokenizer.json")
model = AutoModelForCausalLM.from_pretrained("dilip025/mini-gpt1")
model.eval()

prompt = "Once upon a time,"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    outputs = model.generate(input_ids, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## License

MIT