Commit ·
d8d5217
1
Parent(s): 0f5b3b5
Update README.md
README.md
CHANGED
|
@@ -7,6 +7,49 @@ metrics:
|
|
| 7 |
library_name: adapter-transformers
|
| 8 |
pipeline_tag: text-generation
|
| 9 |
---
| 10 |
|
| 11 |
Model Description:
|
| 12 |
This model is fine-tuned on a corpus of Shakespeare's works to generate text in Shakespearean language.
|
|
|
|
| 7 |
library_name: adapter-transformers
|
| 8 |
pipeline_tag: text-generation
|
| 9 |
---
|
| 10 |
+
List of texts included in the corpus:
|
| 11 |
+
|
| 12 |
+
Beautiful Stories of Shakespeare
|
| 13 |
+
As you Like It
|
| 14 |
+
Hamlet
|
| 15 |
+
Julius Caesar
|
| 16 |
+
King Lear II
|
| 17 |
+
King Richard II
|
| 18 |
+
Macbeth
|
| 19 |
+
Midnight Summer Dream
|
| 20 |
+
Othello
|
| 21 |
+
Shakespeare Roman Play
|
| 22 |
+
Shakespearean Text
|
| 23 |
+
Sonnets
|
| 24 |
+
Taming of the Shrew
|
| 25 |
+
The Tempest
|
| 26 |
+
Tragedy of Romeo Juliet
|
| 27 |
+
|
| 28 |
+
How many tokens are in each text and the total number of tokens in the corpus:
|
| 29 |
+
|
| 30 |
+
Beautiful Stories of Shakespeare 62,537
|
| 31 |
+
As you Like It 38,772
|
| 32 |
+
Hamlet 30,512
|
| 33 |
+
Julius Caesar 30,915
|
| 34 |
+
King Lear II 25,031
|
| 35 |
+
King Richard II 26,029
|
| 36 |
+
Macbeth 197,328
|
| 37 |
+
Midnight Summer Dream 248,172
|
| 38 |
+
Othello 197,328
|
| 39 |
+
Shakespeare Roman Play 24,582
|
| 40 |
+
Shakespearean Text 32,453
|
| 41 |
+
Sonnets 35,675
|
| 42 |
+
Taming of the Shrew 36,794
|
| 43 |
+
The Tempest 30,700
|
| 44 |
+
Tragedy of Romeo Juliet 37,623
|
| 45 |
+
Total: 1,054,451
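
As a quick sanity check on the figures above, the per-text counts can be summed and compared with the stated total. This is only a sketch: the dictionary simply restates the numbers listed in this README, and the name `token_counts` is chosen here for illustration.

```python
# Per-text token counts, copied from the list above.
token_counts = {
    "Beautiful Stories of Shakespeare": 62_537,
    "As you Like It": 38_772,
    "Hamlet": 30_512,
    "Julius Caesar": 30_915,
    "King Lear II": 25_031,
    "King Richard II": 26_029,
    "Macbeth": 197_328,
    "Midnight Summer Dream": 248_172,
    "Othello": 197_328,
    "Shakespeare Roman Play": 24_582,
    "Shakespearean Text": 32_453,
    "Sonnets": 35_675,
    "Taming of the Shrew": 36_794,
    "The Tempest": 30_700,
    "Tragedy of Romeo Juliet": 37_623,
}

# Sum the individual counts to reproduce the corpus total.
total = sum(token_counts.values())
print(f"{len(token_counts)} texts, {total:,} tokens")  # 15 texts, 1,054,451 tokens
```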
|
| 46 |
+
|
| 47 |
+
How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg (https://www.gutenberg.org/), a library of over 60,000 free ebooks. I collected 15 Shakespeare texts and combined them into one text file. The goal was a dataset of over a million tokens on which a model could be fine-tuned to generate text in Shakespeare's style.
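
The combining step could look roughly like the sketch below. It assumes the downloaded books are stored as UTF-8 `.txt` files in a single folder; the function name `combine_texts` and the folder layout are hypothetical, not taken from the actual project.

```python
from pathlib import Path


def combine_texts(source_dir, output_path):
    """Concatenate every .txt file in source_dir into one corpus file.

    Returns the number of files that were combined.
    """
    source_dir = Path(source_dir)
    output_path = Path(output_path)
    parts = []
    # Sort the paths so the corpus is built in a reproducible order.
    for path in sorted(source_dir.glob("*.txt")):
        # Skip the output file in case it lives in the same folder.
        if path.resolve() == output_path.resolve():
            continue
        parts.append(path.read_text(encoding="utf-8").strip())
    # Separate the books with a blank line.
    output_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(parts)
```

For example, `combine_texts("books", "corpus.txt")` would write all books from a hypothetical `books/` folder into `corpus.txt`.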
|
| 48 |
+
|
| 49 |
+
How the text was pre-processed or tokenized:
|
| 50 |
+
|
| 51 |
+
Values of hyperparameters used during fine-tuning and generation: max_length=768, tokenizer=GPT2, batch size=2, top_p=0.95, output max_length=200
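
To illustrate what `top_p=0.95` means, here is a minimal standard-library sketch of nucleus (top-p) sampling. It is not the library's implementation, and the toy next-token distribution and the function names are made up for this example.

```python
import random


def top_p_filter(probs, top_p=0.95):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_top_p(probs, top_p=0.95, rng=random):
    """Draw one token from the renormalized top-p nucleus."""
    nucleus = top_p_filter(probs, top_p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]


# Hypothetical next-token distribution, for illustration only.
toy_probs = {"thou": 0.5, "thee": 0.3, "thy": 0.16, "zounds": 0.04}
nucleus = top_p_filter(toy_probs, top_p=0.95)
print(sorted(nucleus))  # ['thee', 'thou', 'thy']
```

With this toy distribution the least likely token falls outside the 0.95 nucleus, so it can never be sampled; lowering `top_p` shrinks the candidate set further.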
|
| 52 |
+
|
| 53 |
|
| 54 |
Model Description:
|
| 55 |
This model is fine-tuned on a corpus of Shakespeare's works to generate text in Shakespearean language.
|