---
license: mit
tags:
- text-generation
- transformer
- tiny-shakespeare
- decoder-only
model-index:
- name: tiny_shakespeare_transformer
  results: []
---

# tiny_shakespeare_transformer

A small Transformer decoder model trained from scratch on the Tiny Shakespeare dataset.

## Training details
- Dataset: Tiny Shakespeare
- Epochs: 5
- Learning rate: 0.0003
- Batch size: 32
- Block size: 128
- Optimizer: AdamW
- Loss function: CrossEntropyLoss
- Dropout rate: 0.1
- Embedding dimension: 256
- Number of layers: 6
- Number of attention heads: 8

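For reference, the hyperparameters above can be gathered into a single configuration object, as in this minimal sketch (the `TrainConfig` class and its field names are illustrative, not the project's actual code):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Values copied from the training details above; the class itself is illustrative.
    epochs: int = 5
    learning_rate: float = 3e-4
    batch_size: int = 32
    block_size: int = 128   # maximum context length in tokens
    dropout: float = 0.1
    n_embd: int = 256       # embedding dimension
    n_layer: int = 6        # number of decoder layers
    n_head: int = 8         # attention heads per layer
```
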
## Usage
Load the model and tokenizer with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")

# Encode a prompt and generate a continuation
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Model Architecture
This model is a decoder-only Transformer optimized for text generation.
It was trained on the Tiny Shakespeare dataset to generate Shakespeare-like text.

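For readers who want to see the shape of such a model, here is a minimal PyTorch sketch of a decoder-only Transformer using the hyperparameters listed above. The `TinyDecoder` class is an illustrative reimplementation, not the exact code behind this checkpoint:

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Illustrative decoder-only Transformer: token + positional embeddings,
    a stack of causally masked self-attention blocks, and an LM head."""

    def __init__(self, vocab_size, n_embd=256, n_layer=6, n_head=8,
                 block_size=128, dropout=0.1):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        # An encoder layer plus a causal mask acts as a GPT-style decoder block
        layer = nn.TransformerEncoderLayer(
            d_model=n_embd, nhead=n_head, dim_feedforward=4 * n_embd,
            dropout=dropout, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_layer)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx):
        # idx: (batch, time) token ids, with time <= block_size
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: each position may attend only to itself and the past
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)  # (batch, time, vocab_size) logits
```

Using `nn.TransformerEncoderLayer` with a causal mask is a common shortcut for GPT-style decoders, since it avoids the cross-attention machinery of `nn.TransformerDecoderLayer` that only an encoder-decoder model needs.
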
## Training Process
- Training ran for 5 epochs.
- The model was optimized with AdamW at a learning rate of 0.0003.
- A dropout rate of 0.1 was applied during training to reduce overfitting.

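A training loop matching the settings above might look roughly like the following sketch. It reuses the hypothetical `TinyDecoder` from the architecture section; `get_batch`, the vocabulary size, the dummy corpus tensor, and the steps-per-epoch count are all placeholders, not details taken from this model's training run:

```python
import torch
import torch.nn as nn

vocab_size = 65                                  # placeholder; set from the tokenizer
data = torch.randint(0, vocab_size, (10_000,))   # stand-in for the tokenized corpus

def get_batch(batch_size=32, block_size=128):
    # Sample random contiguous blocks; targets are inputs shifted by one token
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    return x, y

model = TinyDecoder(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for _ in range(200):                         # placeholder steps per epoch
        xb, yb = get_batch()
        logits = model(xb)                       # (batch, time, vocab_size)
        # CrossEntropyLoss expects (N, C) logits against (N,) targets
        loss = criterion(logits.reshape(-1, vocab_size), yb.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
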
## License
This model is released under the MIT License.