| | --- |
| | language: |
| | - en |
| | metrics: |
| | - code_eval |
| | - accuracy |
| | library_name: adapter-transformers |
| | pipeline_tag: text-generation |
| | --- |
| | List of texts included in the corpus: |
| |
|
| | Beautiful Stories of Shakespeare, |
| | As You Like It, |
| | Hamlet, |
| | Julius Caesar, |
| | King Lear II, |
| | King Richard II, |
| | Macbeth, |
| | A Midsummer Night's Dream, |
| | Othello, |
| | Shakespeare Roman Play, |
| | Shakespearean Text, |
| | Sonnets, |
| | Taming of the Shrew, |
| | The Tempest, |
| | Tragedy of Romeo Juliet |
| |
|
| | How many tokens are in each text and the total number of tokens in the corpus: |
| |
|
| | Beautiful Stories of Shakespeare 62,537; |
| | As You Like It 38,772; |
| | Hamlet 30,512; |
| | Julius Caesar 30,915; |
| | King Lear II 25,031; |
| | King Richard II 26,029; |
| | Macbeth 197,328; |
| | A Midsummer Night's Dream 248,172; |
| | Othello 197,328; |
| | Shakespeare Roman Play 24,582; |
| | Shakespearean Text 32,453; |
| | Sonnets 35,675; |
| | Taming of the Shrew 36,794; |
| | The Tempest 30,700; |
| | Tragedy of Romeo Juliet 37,623; |
| | Total: 1,054,451 |
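The total can be checked against the per-text counts with a few lines of Python. This is only a sanity check on the numbers printed in this card, listed in the same order as above; it does not recount tokens from the source files, and the variable names are illustrative.

```python
# Per-text token counts, copied from this card in the order listed above.
token_counts = [
    62_537, 38_772, 30_512, 30_915, 25_031,
    26_029, 197_328, 248_172, 197_328, 24_582,
    32_453, 35_675, 36_794, 30_700, 37_623,
]

# Sum the 15 counts and compare with the stated corpus total.
total = sum(token_counts)
print(f"{len(token_counts)} texts, {total:,} tokens")  # prints: 15 texts, 1,054,451 tokens
```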
| |
|
| | How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg (https://www.gutenberg.org/), a library of over 60,000 free ebooks. I collected 15 books of Shakespeare and combined them into one text file. The goal was to create a dataset of over a million tokens so that a model could be fine-tuned on Shakespeare's work and generate text in that style. The corpus was created on 18 Feb 2023. |
| | How the text was pre-processed or tokenized: The text was preprocessed by removing all empty spaces. It was then combined into a single line, broken down into paragraphs, and combined into one text file that was used for training and validating the model. |
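A minimal sketch of what such a preprocessing step might look like is given here. The card does not state how paragraphs were delimited, so splitting on a fixed word count is an assumption, and the function name `preprocess` and the parameter `words_per_paragraph` are hypothetical.

```python
def preprocess(raw_text: str, words_per_paragraph: int = 100) -> list[str]:
    """Collapse all whitespace into a single line, then cut it into paragraphs.

    Assumption: paragraphs are fixed-size word chunks; the original
    splitting rule is not documented in this card.
    """
    # str.split() with no arguments drops empty lines and repeated spaces.
    words = raw_text.split()
    # Regroup the words into chunks of at most `words_per_paragraph` words.
    return [
        " ".join(words[i:i + words_per_paragraph])
        for i in range(0, len(words), words_per_paragraph)
    ]


paragraphs = preprocess("To be,\n\n  or not   to be", words_per_paragraph=3)
print(paragraphs)  # prints: ['To be, or', 'not to be']
```

The resulting paragraphs could then be written to one text file, one paragraph per line, before being split into training and validation sets.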
| |
|
| | Values of hyperparameters used during fine-tuning: max_length=768, tokenizer=GPT2, batch size=2, top_p=0.95, output max_length=200 |
| | |
| | Model Description: |
| | This model is fine-tuned on a corpus of Shakespeare's work to generate text in Shakespearean language. |
| | |
| | Intended uses & limitations: |
| | This model can be used to generate text in the Shakespearean language. |
| | |
| | How to use: |
| | This model can be downloaded from the Hugging Face Hub and run on Google Colab. |
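As a rough illustration, generation in a Colab notebook might look like the sketch below. It assumes the checkpoint loads with the standard `transformers` text-generation pipeline; the repository ID is not stated in this card, so `model_id` is a placeholder the reader must supply. `do_sample=True` is also an assumption, added because `top_p` only takes effect when sampling is enabled. The helper names are hypothetical.

```python
def generation_settings(max_length: int = 200, top_p: float = 0.95) -> dict:
    """Sampling settings taken from the hyperparameters listed in this card."""
    # do_sample=True is an assumption: top_p is ignored without sampling.
    return {"do_sample": True, "top_p": top_p, "max_length": max_length}


def generate(model_id: str, prompt: str = "The") -> str:
    """Generate text from `prompt` with the model stored under `model_id`."""
    # Imported here so the settings helper works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **generation_settings())
    return outputs[0]["generated_text"]


# Usage (replace the placeholder with the actual repository ID):
# print(generate("<model-repo-id>"))
```

The default prompt "The" matches the prompt reported in this card; `max_length` can be overridden through `generation_settings` if longer output is wanted.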
| | |
| | Training Data: |
| | This model is trained on a corpus of over a million tokens of Shakespeare's work, collected from 15 books of Shakespeare on Project Gutenberg (gutenberg.org). |
| | |
| | Training Procedure: |
| | This model was run on Google Colab using a GPU. Processing took about 15 to 20 minutes. |
| | To select a GPU, click Runtime, then Change Runtime Type. Select GPU and click Save. Then run the code in Colab. |
| | |
| | Variable and metrics: |
| | The prompt given to the model to start a sentence was "The", and max_length was set to 300. |
| |
|
| | Evaluation results: |
| | The text generation results of this model are above satisfactory. The model was able to generate reasonable text in Shakespearean language. |