---
license: mit
tags:
- text-generation
- transformer
- tiny-shakespeare
- decoder-only
model-index:
- name: tiny_shakespeare_transformer
  results: []
---

# tiny_shakespeare_transformer

A small Transformer Decoder model trained from scratch on the Tiny Shakespeare dataset.

## Training details
- Dataset: Tiny Shakespeare
- Epochs: 5
- Learning Rate: 0.0003
- Batch Size: 32
- Block Size: 128
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
- Dropout Rate: 0.1
- Embedding Dimension: 256
- Number of Layers: 6
- Number of Attention Heads: 8
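
Taken together, these settings describe a standard cross-entropy training step. The sketch below uses a tiny stand-in model and random tokens (the real model and data pipeline are not included in this card); only the optimizer, loss, and shape wiring mirror the hyperparameters above:

```python
import torch
import torch.nn as nn

# Minimal sketch of one training step with the listed hyperparameters.
# The stand-in model and random tokens are placeholders; VOCAB_SIZE=65
# assumes a character-level tokenizer over Tiny Shakespeare.
VOCAB_SIZE, BLOCK_SIZE, BATCH_SIZE = 65, 128, 32

model = nn.Sequential(nn.Embedding(VOCAB_SIZE, 256), nn.Linear(256, VOCAB_SIZE))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

xb = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, BLOCK_SIZE))  # input tokens
yb = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, BLOCK_SIZE))  # next-token targets

logits = model(xb)                                        # (B, T, vocab)
loss = loss_fn(logits.view(-1, VOCAB_SIZE), yb.view(-1))  # flatten B*T positions
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice this step would run over the full dataset for the 5 epochs listed above.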

## Usage
To use this model, load it with the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")

# Encode input text
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

## Model Architecture
This model uses a decoder-only Transformer architecture for autoregressive text generation.
It was trained from scratch on the Tiny Shakespeare dataset to produce Shakespeare-style text.
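
The hyperparameters listed above map onto a standard GPT-style stack. The following is a hypothetical reconstruction, not the shipped implementation; the vocabulary size of 65 is an assumption based on a character-level Tiny Shakespeare tokenizer:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the described architecture (256-dim embeddings,
# 6 layers, 8 heads, block size 128, dropout 0.1). VOCAB_SIZE=65 is an
# assumption; the actual implementation ships with the checkpoint.
EMBED_DIM, N_LAYERS, N_HEADS = 256, 6, 8
BLOCK_SIZE, DROPOUT, VOCAB_SIZE = 128, 0.1, 65

class TinyShakespeareDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.pos_emb = nn.Embedding(BLOCK_SIZE, EMBED_DIM)
        block = nn.TransformerEncoderLayer(
            d_model=EMBED_DIM, nhead=N_HEADS, dim_feedforward=4 * EMBED_DIM,
            dropout=DROPOUT, batch_first=True)
        # A TransformerEncoder with a causal mask is the usual way to build
        # a decoder-only stack in plain PyTorch (no cross-attention needed).
        self.blocks = nn.TransformerEncoder(block, num_layers=N_LAYERS)
        self.lm_head = nn.Linear(EMBED_DIM, VOCAB_SIZE)

    def forward(self, idx):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        x = self.blocks(x, mask=causal)
        return self.lm_head(x)  # (batch, T, VOCAB_SIZE) logits
```

Each position attends only to earlier positions via the causal mask, which is what makes the model usable for left-to-right generation.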

## Training Process
- Training was performed for 5 epochs.
- The AdamW optimizer was used with a learning rate of 0.0003.
- A dropout rate of 0.1 was applied during training to reduce overfitting.

## License
This model is released under the MIT License.