Israhassan/Shakespeare

Text Generation

Israhassan committed on Feb 23, 2023

Commit aebd536 · 1 Parent(s): b5f728a

Update README.md

Files changed (1)

README.md +16 -16

@@ -23,25 +23,25 @@ Shakespearean Text,
 Sonnets,
 Taming of the Shrew,
 The Tempest,
-Tragedy of Romeo Juliet,
 How many tokens are in each text and the total number of tokens in the corpus:
-Beautiful Stories of Shakespeare   62,537
-As you Like It      	           38,772
-Hamlet                	           30,512
-Julius Caesar                      30,915
-King Lear II           	           25,031
-King Richard II        	           26,029
-Macbeth             	           197,328
-Midnight Summer Dream  	           248,172
-Othello             	           197,328
-Shakespeare Roman Play 	           24,582
-Shakespearean Text     	           32,453
-Sonnets               	           35,675
-Taming of the Shrew    	           36,794
-The Tempest            	           30,700
-Tragedy of Romeo Juliet	           37,623
 Total:                             1,054,451
 How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg https://www.gutenberg.org/ which is a library of over 60,000 free ebooks. I collected 15 books of Shakespeare and combined them to one text file. The corpus was collected to create a dataset of over a million characters/ tokens so a model could be fine tuned as per Shakespeare's work and generate text according to it.

 Sonnets,
 Taming of the Shrew,
 The Tempest,
+Tragedy of Romeo Juliet
 How many tokens are in each text and the total number of tokens in the corpus:
+Beautiful Stories of Shakespeare   62,537;
+As you Like It      	           38,772;
+Hamlet                	           30,512;
+Julius Caesar                      30,915;
+King Lear II           	           25,031;
+King Richard II        	           26,029;
+Macbeth             	           197,328;
+Midnight Summer Dream  	           248,172;
+Othello             	           197,328;
+Shakespeare Roman Play 	           24,582;
+Shakespearean Text     	           32,453;
+Sonnets               	           35,675;
+Taming of the Shrew    	           36,794;
+The Tempest            	           30,700;
+Tragedy of Romeo Juliet	           37,623;
 Total:                             1,054,451
 How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg https://www.gutenberg.org/ which is a library of over 60,000 free ebooks. I collected 15 books of Shakespeare and combined them to one text file. The corpus was collected to create a dataset of over a million characters/ tokens so a model could be fine tuned as per Shakespeare's work and generate text according to it.
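
The stated total can be checked against the per-text counts listed above. The following is a minimal sketch, not part of the commit: the counts are copied from the added lines of the diff, and the names `LISTED_COUNTS`, `LISTED_TOTAL`, and `parse_counts` are hypothetical helpers introduced only for illustration.

```python
import re

# Per-text counts copied from the "+" side of the diff (leading "+" dropped).
LISTED_COUNTS = """\
Beautiful Stories of Shakespeare   62,537;
As you Like It                     38,772;
Hamlet                             30,512;
Julius Caesar                      30,915;
King Lear II                       25,031;
King Richard II                    26,029;
Macbeth                            197,328;
Midnight Summer Dream              248,172;
Othello                            197,328;
Shakespeare Roman Play             24,582;
Shakespearean Text                 32,453;
Sonnets                            35,675;
Taming of the Shrew                36,794;
The Tempest                        30,700;
Tragedy of Romeo Juliet            37,623;
"""

# Total as stated in the README.
LISTED_TOTAL = 1_054_451


def parse_counts(block):
    """Map each title to its integer count (hypothetical helper)."""
    counts = {}
    for line in block.splitlines():
        # Title, whitespace, a comma-grouped number, optional trailing ";".
        match = re.match(r"^(.*?)\s+([\d,]+);?\s*$", line)
        if match:
            title = match.group(1).strip()
            counts[title] = int(match.group(2).replace(",", ""))
    return counts


counts = parse_counts(LISTED_COUNTS)
computed_total = sum(counts.values())
print(len(counts), computed_total, computed_total == LISTED_TOTAL)
```

Under these assumptions the 15 listed counts sum to the stated total of 1,054,451. Macbeth and Othello carry the same listed count (197,328), which may be worth re-checking against the source files.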