Commit aebd536
1 Parent(s): b5f728a
Update README.md
README.md CHANGED
@@ -23,25 +23,25 @@ Shakespearean Text,
 Sonnets,
 Taming of the Shrew,
 The Tempest,
-Tragedy of Romeo Juliet
+Tragedy of Romeo Juliet
 
 How many tokens are in each text and the total number of tokens in the corpus:
 
-Beautiful Stories of Shakespeare 62,537
-As you Like It 38,772
-Hamlet 30,512
-Julius Caesar 30,915
-King Lear II 25,031
-King Richard II 26,029
-Macbeth 197,328
-Midnight Summer Dream 248,172
-Othello 197,328
-Shakespeare Roman Play 24,582
-Shakespearean Text 32,453
-Sonnets 35,675
-Taming of the Shrew 36,794
-The Tempest 30,700
-Tragedy of Romeo Juliet 37,623
+Beautiful Stories of Shakespeare 62,537;
+As you Like It 38,772;
+Hamlet 30,512;
+Julius Caesar 30,915;
+King Lear II 25,031;
+King Richard II 26,029;
+Macbeth 197,328;
+Midnight Summer Dream 248,172;
+Othello 197,328;
+Shakespeare Roman Play 24,582;
+Shakespearean Text 32,453;
+Sonnets 35,675;
+Taming of the Shrew 36,794;
+The Tempest 30,700;
+Tragedy of Romeo Juliet 37,623;
 Total: 1,054,451
 
 How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg https://www.gutenberg.org/ which is a library of over 60,000 free ebooks. I collected 15 books of Shakespeare and combined them to one text file. The corpus was collected to create a dataset of over a million characters/ tokens so a model could be fine tuned as per Shakespeare's work and generate text according to it.
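
As a quick consistency check on the numbers in this diff, the sketch below parses the per-text lines from the new side of the README and sums them. `README_LINES` and `parse_counts` are illustrative names, not part of the repository. It only checks the arithmetic of the listed counts and does not reproduce however the tokens were originally counted.

```python
import re

# Per-text counts copied from the new side of the README diff.
README_LINES = """\
Beautiful Stories of Shakespeare 62,537;
As you Like It 38,772;
Hamlet 30,512;
Julius Caesar 30,915;
King Lear II 25,031;
King Richard II 26,029;
Macbeth 197,328;
Midnight Summer Dream 248,172;
Othello 197,328;
Shakespeare Roman Play 24,582;
Shakespearean Text 32,453;
Sonnets 35,675;
Taming of the Shrew 36,794;
The Tempest 30,700;
Tragedy of Romeo Juliet 37,623;
"""


def parse_counts(text):
    """Return {title: count} for lines shaped like 'Title 12,345;'.

    Hypothetical helper: the trailing semicolon is optional so the
    same function also accepts the old side of the diff.
    """
    counts = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(.+?)\s+([\d,]+);?\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2).replace(",", ""))
    return counts


counts = parse_counts(README_LINES)
total = sum(counts.values())
print(f"{len(counts)} texts, {total:,} tokens")
```

The 15 parsed entries sum to the README's stated total of 1,054,451. Listing the counts this way also makes it easy to see that Macbeth and Othello carry the same value (197,328), which may be worth confirming against the source files.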