Initial commit with model and tokenizer (README.md)

A small Transformer Decoder model trained from scratch on the Tiny Shakespeare dataset.
- Dataset: Tiny Shakespeare
- Epochs: 5
- Learning Rate: 0.0003
- Batch Size: 64
- Block Size: 128
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
- Dropout Rate: 0.1
- Embedding Dimension: 256
- Number of Layers: 4
- Number of Attention Heads: 4
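The hyperparameters above can be gathered into a single config object. The sketch below is illustrative only (the `TrainingConfig` name is not part of this repository); it assumes the standard multi-head attention layout where the embedding dimension is split evenly across heads.

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    """Hyperparameters from the model card (class and field names are illustrative)."""
    dataset: str = "Tiny Shakespeare"
    epochs: int = 5
    learning_rate: float = 3e-4
    batch_size: int = 64
    block_size: int = 128   # maximum context length in tokens
    dropout: float = 0.1
    embed_dim: int = 256
    num_layers: int = 4
    num_heads: int = 4

    @property
    def head_dim(self) -> int:
        # Standard multi-head attention splits the embedding evenly across heads.
        return self.embed_dim // self.num_heads

config = TrainingConfig()
print(config.head_dim)  # 256 // 4 = 64
```

With these values each attention head operates on a 64-dimensional slice of the 256-dimensional embedding.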
## Usage

To use this model, load it with the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")

# Encode input text
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```
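With its default settings, `model.generate` performs greedy decoding: at each step it appends the highest-scoring next token. A framework-free sketch of that loop, using a toy scoring function in place of a real model (both `greedy_generate` and `toy_scores` are hypothetical names, not part of this repository):

```python
def greedy_generate(next_token_scores, prompt, max_new_tokens=5, eos=None):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)   # maps token id -> score
        best = max(scores, key=scores.get)   # argmax over candidate tokens
        tokens.append(best)
        if best == eos:
            break
    return tokens

# Toy "model": always favors the token after the last one, wrapping at 3.
toy_scores = lambda toks: {t: (1.0 if t == (toks[-1] + 1) % 3 else 0.0)
                           for t in range(3)}
print(greedy_generate(toy_scores, [0], max_new_tokens=4))  # → [0, 1, 2, 0, 1]
```

Sampling strategies (e.g. passing `do_sample=True`, `temperature`, or `top_k` to `generate`) replace the argmax with a draw from the score distribution.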

## Model Architecture

This model is a Transformer Decoder-based architecture, optimized for text generation. It was trained on the Tiny Shakespeare dataset to generate Shakespeare-like text.
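The defining feature of a decoder-only Transformer is causal self-attention: each position may attend only to itself and earlier positions, so the model never sees future tokens during training. A minimal, framework-free sketch of that mask (the `causal_mask` helper name is hypothetical):

```python
def causal_mask(block_size: int) -> list[list[bool]]:
    """Lower-triangular mask: position i may attend to positions 0..i."""
    return [[col <= row for col in range(block_size)]
            for row in range(block_size)]

# Row i has i + 1 allowed entries: token i sees tokens 0..i only.
for row in causal_mask(4):
    print(["x" if allowed else "." for allowed in row])
```

In a real implementation this mask is applied to the attention scores (disallowed positions are set to negative infinity before the softmax) over a context of up to 128 tokens, matching the block size above.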
## Training Process

- Training was performed for 5 epochs.
- The model uses the AdamW optimizer with a learning rate of 0.0003.
- The dropout rate during training was set to 0.1 to reduce overfitting.
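For reference, AdamW differs from plain Adam by applying weight decay directly to the parameter ("decoupled") rather than folding it into the gradient. A scalar, framework-free sketch of a single update step with the learning rate above (the default betas, epsilon, and weight decay shown here are assumptions; the actual training code is not part of this card):

```python
import math

def adamw_step(param, grad, m, v, t, lr=3e-4,
               beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One AdamW update on a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    param -= lr * weight_decay * param        # decoupled weight decay
    return param, m, v

p, m, v = 1.0, 0.0, 0.0
p, m, v = adamw_step(p, grad=0.5, m=m, v=v, t=1)
```

In practice the framework's optimizer applies this update elementwise to every parameter tensor.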
## License

This model is released under the MIT License.