Commit 9f5ea86 · 1 Parent(s): d8d5217
Update README.md
README.md
CHANGED
@@ -10,20 +10,20 @@ pipeline_tag: text-generation
 List of texts included in the corpus:
 
 Beautiful Stories of Shakespeare
-As you Like It
-Hamlet
-Julius Ceaser
-King Lear II
-King Richard II
-Macbeth
-Midnight Summer Dream
-Othello
-Shakespeare Roman Play
-Shakespearean Text
-Sonnets
-Taming of the
-The Tempest
-Tragedy of Romeo Juliet
+As you Like It,
+Hamlet,
+Julius Ceaser,
+King Lear II,
+King Richard II,
+Macbeth,
+Midnight Summer Dream,
+Othello,
+Shakespeare Roman Play,
+Shakespearean Text,
+Sonnets,
+Taming of the Shrew,
+The Tempest,
+Tragedy of Romeo Juliet,
 
 How many tokens are in each text and the total number of tokens in the corpus:
 
@@ -46,11 +46,10 @@ Total: 1,054,451
 
 How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg https://www.gutenberg.org/ which is a library of over 60,000 free ebooks. I collected 15 books of Shakespeare and combined them to one text file. The corpus was collected to create a dataset of over a million characters/ tokens so a model could be fine tuned as per Shakespeare's work and generate text according to it.
 
-How the text was pre-processed or tokenized:
+How the text was pre-processed or tokenized: text was preprocessed by removing all empty spaces. the text was combined into a single line and broken down into paragraphs a combined into one text file that was used for training and valiating the model.
 
 Values of hyperparameters used during fine tuning: max_length=768 tokenizer=GPT2 Batch Size=2 top_p=0.95 output max_length=200
 
-
 Model Description:
 this model is finetuned on a corpus of Shakespeare's work to generate text in Shakespearean language.
 
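The pre-processing line added in this commit (remove empty space, collapse the text to a single line, break it into paragraphs, combine everything into one file) could be sketched roughly as follows. This is not the author's script: the names `preprocess_text` and `combine_texts`, and the choice to form paragraphs by splitting every fixed number of words, are assumptions made for illustration, since the README does not say how paragraphs were formed.

```python
import re


def preprocess_text(raw: str, words_per_paragraph: int = 100) -> str:
    """Collapse whitespace, then re-split the text into fixed-size paragraphs.

    Hypothetical helper: splitting every `words_per_paragraph` words is an
    assumption, not something the README specifies.
    """
    # Drop empty lines and squeeze every run of whitespace into one space,
    # which leaves the whole text on a single line.
    single_line = re.sub(r"\s+", " ", raw).strip()
    if not single_line:
        return ""

    words = single_line.split(" ")
    paragraphs = [
        " ".join(words[i:i + words_per_paragraph])
        for i in range(0, len(words), words_per_paragraph)
    ]
    # One paragraph per line in the output.
    return "\n".join(paragraphs)


def combine_texts(texts: list[str], words_per_paragraph: int = 100) -> str:
    """Pre-process each source text and join the results into one corpus string."""
    cleaned = (preprocess_text(t, words_per_paragraph) for t in texts)
    # Skip sources that are empty after cleaning.
    return "\n".join(c for c in cleaned if c)
```

Under these assumptions, passing the contents of the 15 downloaded books to `combine_texts` and writing the result to disk would give a single text file of the kind the README describes for training and validation.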