# Mini GPT-1 Clone

A decoder-only transformer model (GPT-1-style) trained from scratch using PyTorch.
## Model Details

- **Architecture**: Decoder-only Transformer
- **Layers**: 6
- **Embedding Size**: 512
- **Attention Heads**: 8
- **Feedforward Dim**: 2048
- **Sequence Length**: 256
- **Vocab Size**: 35,000
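
The repository's actual model code isn't shown in this README; the snippet below is a minimal sketch of how the hyperparameters above could map onto a PyTorch module. It uses `nn.TransformerEncoderLayer` with a causal mask, a common way to build a GPT-style decoder since the blocks have self-attention only and no cross-attention. The class name, dropout value, and learned positional embeddings are assumptions, not taken from this repository.

```python
import torch
import torch.nn as nn

class MiniGPT(nn.Module):
    """Illustrative sketch only -- not the repository's actual implementation."""

    def __init__(self, vocab_size=35_000, seq_len=256, d_model=512,
                 n_heads=8, n_layers=6, d_ff=2048, dropout=0.1):  # dropout assumed
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(seq_len, d_model)  # learned positions, as in GPT-1
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
            dropout=dropout, batch_first=True,
        )
        # An "encoder" layer plus a causal mask behaves like a GPT decoder block:
        # self-attention only, no cross-attention.
        self.blocks = nn.TransformerEncoder(block, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Upper-triangular -inf mask so each position attends only to the past.
        causal_mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        x = self.blocks(x, mask=causal_mask)
        return self.lm_head(x)  # (batch, seq, vocab) logits

model = MiniGPT()
logits = model(torch.randint(0, 35_000, (1, 16)))  # -> torch.Size([1, 16, 35000])
```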
## Tokenizer

Trained using `ByteLevelBPETokenizer` from the Hugging Face `tokenizers` library.
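
The training corpus and flags aren't given in this README; the snippet below is a minimal sketch of training such a tokenizer. The corpus path, `min_frequency`, and special tokens are assumptions. Only `vocab_size=35_000` comes from the model details above, and the output path matches the file loaded in the inference example.

```python
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["data/corpus.txt"],  # assumed path; not specified in this README
    vocab_size=35_000,          # matches the model's vocab size
    min_frequency=2,            # assumed frequency cutoff
    special_tokens=["<s>", "</s>", "<unk>", "<pad>"],  # assumed special tokens
)
tokenizer.save("tokenizer/tokenizer.json")  # file loaded by the inference example
```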
## Inference Example

```python
from transformers import PreTrainedTokenizerFast, AutoModelForCausalLM
import torch

# Load the tokenizer from the exported tokenizer.json and the model from the Hub.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer/tokenizer.json")
model = AutoModelForCausalLM.from_pretrained("dilip025/mini-gpt1")
model.eval()

prompt = "Once upon a time,"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Greedy decoding up to 50 tokens total (prompt included in the count).
with torch.no_grad():
    outputs = model.generate(input_ids, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
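
Note that `max_length` counts the prompt tokens; to control only the continuation length, pass `max_new_tokens` instead, and add `do_sample=True` with a `temperature` for more varied output.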
## License

MIT