Israhassan committed on
Commit
aebd536
·
1 Parent(s): b5f728a

Update README.md

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -23,25 +23,25 @@ Shakespearean Text,
  Sonnets,
  Taming of the Shrew,
  The Tempest,
- Tragedy of Romeo Juliet,
+ Tragedy of Romeo Juliet
 
  How many tokens are in each text and the total number of tokens in the corpus:
 
- Beautiful Stories of Shakespeare 62,537
- As you Like It 38,772
- Hamlet 30,512
- Julius Caesar 30,915
- King Lear II 25,031
- King Richard II 26,029
- Macbeth 197,328
- Midnight Summer Dream 248,172
- Othello 197,328
- Shakespeare Roman Play 24,582
- Shakespearean Text 32,453
- Sonnets 35,675
- Taming of the Shrew 36,794
- The Tempest 30,700
- Tragedy of Romeo Juliet 37,623
+ Beautiful Stories of Shakespeare 62,537;
+ As you Like It 38,772;
+ Hamlet 30,512;
+ Julius Caesar 30,915;
+ King Lear II 25,031;
+ King Richard II 26,029;
+ Macbeth 197,328;
+ Midnight Summer Dream 248,172;
+ Othello 197,328;
+ Shakespeare Roman Play 24,582;
+ Shakespearean Text 32,453;
+ Sonnets 35,675;
+ Taming of the Shrew 36,794;
+ The Tempest 30,700;
+ Tragedy of Romeo Juliet 37,623;
  Total: 1,054,451
 
  How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg https://www.gutenberg.org/ which is a library of over 60,000 free ebooks. I collected 15 books of Shakespeare and combined them to one text file. The corpus was collected to create a dataset of over a million characters/ tokens so a model could be fine tuned as per Shakespeare's work and generate text according to it.
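
The per-text counts and the total in the README can be cross-checked with a short script. The sketch below is illustrative only: the README does not say how tokens were counted, so whitespace splitting is an assumption, and `count_tokens`, `corpus_token_counts`, and `README_COUNTS` are hypothetical names that do not come from the repository. The numbers in `README_COUNTS` are copied from the list above.

```python
from pathlib import Path


def count_tokens(text: str) -> int:
    """Count whitespace-delimited tokens (assumed tokenization, not confirmed by the README)."""
    return len(text.split())


def corpus_token_counts(paths):
    """Return ({file name: token count}, total) for the given text files."""
    counts = {}
    for p in paths:
        path = Path(p)
        counts[path.name] = count_tokens(path.read_text(encoding="utf-8"))
    return counts, sum(counts.values())


# Per-text counts exactly as listed in the README.
README_COUNTS = {
    "Beautiful Stories of Shakespeare": 62_537,
    "As you Like It": 38_772,
    "Hamlet": 30_512,
    "Julius Caesar": 30_915,
    "King Lear II": 25_031,
    "King Richard II": 26_029,
    "Macbeth": 197_328,
    "Midnight Summer Dream": 248_172,
    "Othello": 197_328,
    "Shakespeare Roman Play": 24_582,
    "Shakespearean Text": 32_453,
    "Sonnets": 35_675,
    "Taming of the Shrew": 36_794,
    "The Tempest": 30_700,
    "Tragedy of Romeo Juliet": 37_623,
}

# Sum of the listed counts, to compare against the README's "Total" line.
README_TOTAL = sum(README_COUNTS.values())
```

Calling `corpus_token_counts` on the downloaded Project Gutenberg text files would give counts to compare against the README. With a different tokenizer the figures would differ, so a mismatch would not by itself indicate an error in the README.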