Israhassan committed
Commit 9f5ea86 · 1 Parent(s): d8d5217

Update README.md

Files changed (1): README.md (+15 -16)
@@ -10,20 +10,20 @@ pipeline_tag: text-generation
 List of texts included in the corpus:
 
 Beautiful Stories of Shakespeare
-As you Like It
-Hamlet
-Julius Ceaser
-King Lear II
-King Richard II
-Macbeth
-Midnight Summer Dream
-Othello
-Shakespeare Roman Play
-Shakespearean Text
-Sonnets
-Taming of the SHrew
-The Tempest
-Tragedy of Romeo Juliet
+As You Like It,
+Hamlet,
+Julius Caesar,
+King Lear II,
+King Richard II,
+Macbeth,
+Midnight Summer Dream,
+Othello,
+Shakespeare Roman Play,
+Shakespearean Text,
+Sonnets,
+Taming of the Shrew,
+The Tempest,
+Tragedy of Romeo and Juliet,
 
 How many tokens are in each text and the total number of tokens in the corpus:
@@ -46,11 +46,10 @@ Total: 1,054,451
 
 How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg (https://www.gutenberg.org/), a library of over 60,000 free ebooks. I collected 15 of Shakespeare's books and combined them into one text file. The corpus was collected to create a dataset of over a million characters/tokens so a model could be fine-tuned on Shakespeare's work and generate text in that style.
 
-How the text was pre-processed or tokenized:
+How the text was pre-processed or tokenized: The text was preprocessed by removing all empty spaces; it was combined into a single line, broken down into paragraphs, and merged into one text file used for training and validating the model.
 
 Values of hyperparameters used during fine-tuning: max_length=768, tokenizer=GPT2, batch size=2, top_p=0.95, output max_length=200
 
-
 Model Description:
 This model is fine-tuned on a corpus of Shakespeare's work to generate text in Shakespearean language.
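The preprocessing step added in this commit (dropping empty lines and merging the books into one training file) could look roughly like the sketch below. The file names and the `merge_corpus` helper are hypothetical illustrations, not part of the repository:

```python
from pathlib import Path

def merge_corpus(paths, out_path="shakespeare_corpus.txt"):
    """Drop empty lines from each book, collapse each book into one
    block of text, and concatenate all books into a single file."""
    merged = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        # Remove empty/whitespace-only lines, then rejoin the rest.
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        merged.append(" ".join(lines))
    Path(out_path).write_text("\n\n".join(merged), encoding="utf-8")
    return out_path
```

Splitting the merged file back into the max_length=768 training windows would be a separate step handled by the tokenizer at fine-tuning time.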
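The generation hyperparameters listed in the card (top_p=0.95, output max_length=200) could be applied at inference time along these lines. This is a minimal sketch, assuming a transformers-style checkpoint; the `model_name` default is a placeholder, since the card does not state the repo id:

```python
# Generation settings reported in the model card's hyperparameter list.
GEN_KWARGS = {"do_sample": True, "top_p": 0.95, "max_length": 200}

def generate_shakespeare(prompt, model_name="gpt2"):
    """Sample a Shakespearean continuation of `prompt`.

    `model_name` is a placeholder; substitute the actual
    fine-tuned checkpoint's repo id.
    """
    # Imported locally so GEN_KWARGS can be inspected without
    # pulling in the heavy transformers dependency.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        **GEN_KWARGS,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

With do_sample=True, top_p=0.95 keeps sampling restricted to the smallest set of tokens whose cumulative probability reaches 0.95, which matches the card's reported setting.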