NataliaH committed
Commit c978fc1 (verified) · 1 Parent(s): ac63822

Initial commit with model and tokenizer

Files changed (1):
  1. README.md +23 -2
README.md CHANGED
@@ -18,18 +18,39 @@ A small Transformer Decoder model trained from scratch on the Tiny Shakespeare dataset.
 - Dataset: Tiny Shakespeare
 - Epochs: 5
 - Learning Rate: 0.0003
+- Batch Size: 64
+- Block Size: 128
 - Optimizer: AdamW
-- Loss: CrossEntropyLoss
-- wandb project: [Link to wandb project](https://wandb.ai/honcharova-de-hannover/LanguageModel_Project?nw=nwuserhoncharovade)
+- Loss Function: CrossEntropyLoss
+- Dropout Rate: 0.1
+- Embedding Dimension: 256
+- Number of Layers: 4
+- Number of Attention Heads: 4
 
 ## Usage
+To use this model, load it with the following code:
+
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
+# Load the model and tokenizer
 model = AutoModelForCausalLM.from_pretrained("NataliaH/tiny_shakespeare_transformer")
 tokenizer = AutoTokenizer.from_pretrained("NataliaH/tiny_shakespeare_transformer")
 
+# Encode input text
 inputs = tokenizer("Once upon a time", return_tensors="pt")
 outputs = model.generate(**inputs)
 print(tokenizer.decode(outputs[0]))
 ```
+
+## Model Architecture
+This model is a Transformer Decoder-based architecture optimized for text generation.
+It was trained on the Tiny Shakespeare dataset to generate Shakespeare-like text.
+
+## Training Process
+- Training was performed for 5 epochs.
+- The model uses the AdamW optimizer with a learning rate of 0.0003.
+- The dropout rate during training was set to 0.1 to reduce overfitting.
+
+## License
+This model is released under the MIT License.
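The hyperparameters introduced in this commit pin down the model's shape: embedding dimension 256, 4 layers, 4 attention heads, block size 128, dropout 0.1. The diff does not show the repository's actual model class, so as a rough sketch only, these values would map onto a GPT-2-style decoder-only config like this:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical mapping of the README's hyperparameters onto a GPT-2-style
# decoder-only config; the real architecture in this repo may differ.
config = GPT2Config(
    n_embd=256,       # Embedding Dimension
    n_layer=4,        # Number of Layers
    n_head=4,         # Number of Attention Heads
    n_positions=128,  # Block Size (maximum context length)
    resid_pdrop=0.1,  # Dropout Rate
    embd_pdrop=0.1,
    attn_pdrop=0.1,
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```

The parameter count depends mostly on the tokenizer's vocabulary size, which the diff does not specify, so treat the printed number as a property of this sketch rather than of the released checkpoint.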
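The Training Process section names the full recipe: AdamW at a learning rate of 0.0003, cross-entropy loss, 5 epochs, batch size 64, block size 128. A minimal next-token training loop consistent with those settings could look like the sketch below; `loader` is a hypothetical DataLoader yielding batches of token ids with shape (64, 128), not something taken from the repo:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # 0.0003
model.train()
for epoch in range(5):  # Epochs: 5
    for batch in loader:  # hypothetical: LongTensor of token ids, shape (64, 128)
        logits = model(batch).logits
        # Next-token prediction: each position's logits are scored
        # against the token one step to the right.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            batch[:, 1:].reshape(-1),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```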
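One note on the Usage snippet: `model.generate(**inputs)` with library defaults decodes greedily and stops after a short continuation (the default `max_length` is 20 tokens). For Shakespeare-style text, sampling usually reads better; the parameter values here are illustrative, not tuned for this checkpoint:

```python
# Continues the Usage snippet above; sampling settings are illustrative.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,  # length of the generated continuation
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.8,
    top_k=50,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```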