Israhassan/Shakespeare · Text Generation

Israhassan committed on Feb 23, 2023

Commit d8d5217 · 1 Parent(s): 0f5b3b5

Update README.md

Files changed (1)

README.md +43 -0

@@ -7,6 +7,49 @@ metrics:
 library_name: adapter-transformers
 pipeline_tag: text-generation
 ---
+List of texts included in the corpus:
+Beautiful Stories of Shakespeare
+As You Like It
+Hamlet
+Julius Caesar
+King Lear II
+King Richard II
+Macbeth
+A Midsummer Night's Dream
+Othello
+Shakespeare Roman Play
+Shakespearean Text
+Sonnets
+Taming of the Shrew
+The Tempest
+Tragedy of Romeo and Juliet
+How many tokens are in each text and the total number of tokens in the corpus:
+Beautiful Stories of Shakespeare     62,537
+As You Like It                       38,772
+Hamlet                               30,512
+Julius Caesar                        30,915
+King Lear II                         25,031
+King Richard II                      26,029
+Macbeth                             197,328
+A Midsummer Night's Dream           248,172
+Othello                             197,328
+Shakespeare Roman Play               24,582
+Shakespearean Text                   32,453
+Sonnets                              35,675
+Taming of the Shrew                  36,794
+The Tempest                          30,700
+Tragedy of Romeo and Juliet          37,623
+Total:                            1,054,451
+How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg (https://www.gutenberg.org/), a library of over 60,000 free ebooks. I collected 15 Shakespeare texts and combined them into one text file. The goal was a dataset of over a million tokens on which a model could be fine-tuned to generate text in Shakespeare's style.
+How the text was pre-processed or tokenized:
+Values of hyperparameters used during fine-tuning: max_length=768, tokenizer=GPT2, batch size=2, top_p=0.95, output max_length=200
 Model Description:
 This model is fine-tuned on a corpus of Shakespeare's works to generate text in Shakespearean language.
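
The README says the 15 texts were combined into one file and gives a token count for each, but it does not show how. Below is a minimal sketch of that step, not the author's actual script. The function names `count_tokens` and `build_corpus` and the directory layout are hypothetical. A "token" is assumed here to be a whitespace-separated word, since the README does not state how its counts were produced, so a GPT-2 tokenizer would give different numbers.

```python
from pathlib import Path


def count_tokens(text: str) -> int:
    # Assumption: a token is a whitespace-separated word. The README does
    # not say how its counts were computed; a GPT-2 tokenizer would differ.
    return len(text.split())


def build_corpus(src_dir: str, out_path: str) -> dict:
    """Concatenate every .txt file in src_dir into a single file at out_path.

    Returns a mapping of file name -> token count, plus a "Total" entry.
    """
    out_file = Path(out_path).resolve()
    counts = {}
    parts = []
    # sorted() fixes the order and lists the files before anything is written
    for path in sorted(Path(src_dir).glob("*.txt")):
        if path.resolve() == out_file:
            continue  # never fold an earlier output file back into the corpus
        text = path.read_text(encoding="utf-8")
        counts[path.name] = count_tokens(text)
        parts.append(text)
    out_file.write_text("\n".join(parts), encoding="utf-8")
    counts["Total"] = sum(counts.values())
    return counts
```

Calling `build_corpus("texts", "corpus.txt")` on a hypothetical `texts/` folder of downloaded Project Gutenberg files would write the merged corpus and return per-file counts that can be compared against the table in the README.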