Israhassan committed
Commit 9f5ea86 · 1 Parent(s): d8d5217

Update README.md

Files changed (1): README.md (+15 -16)
@@ -10,20 +10,20 @@ pipeline_tag: text-generation
 List of texts included in the corpus:
 
 Beautiful Stories of Shakespeare
-As you Like It
-Hamlet
-Julius Ceaser
-King Lear II
-King Richard II
-Macbeth
-Midnight Summer Dream
-Othello
-Shakespeare Roman Play
-Shakespearean Text
-Sonnets
-Taming of the SHrew
-The Tempest
-Tragedy of Romeo Juliet
+As You Like It,
+Hamlet,
+Julius Caesar,
+King Lear II,
+King Richard II,
+Macbeth,
+Midnight Summer Dream,
+Othello,
+Shakespeare Roman Play,
+Shakespearean Text,
+Sonnets,
+Taming of the Shrew,
+The Tempest,
+Tragedy of Romeo and Juliet,
 
 How many tokens are in each text and the total number of tokens in the corpus:
@@ -46,11 +46,10 @@ Total: 1,054,451
 
 How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg (https://www.gutenberg.org/), a library of over 60,000 free ebooks. I collected 15 of Shakespeare's books and combined them into one text file. The corpus was collected to create a dataset of over a million characters/tokens so a model could be fine-tuned on Shakespeare's work and generate text in that style.
 
-How the text was pre-processed or tokenized:
+How the text was pre-processed or tokenized: The text was preprocessed by removing all empty spaces; it was combined into a single line, broken down into paragraphs, and merged into one text file used for training and validating the model.
 
 Values of hyperparameters used during fine-tuning: max_length=768, tokenizer=GPT2, batch size=2, top_p=0.95, output max_length=200
 
-
 Model Description:
 This model is fine-tuned on a corpus of Shakespeare's work to generate text in Shakespearean language.
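The preprocessing step added in this commit (dropping empty lines and merging the books into one training file) could look roughly like the sketch below. The file names and the `merge_corpus` helper are hypothetical illustrations, not part of the repository:

```python
from pathlib import Path

def merge_corpus(paths, out_path="shakespeare_corpus.txt"):
    """Drop empty lines from each book, collapse each book into one
    block of text, and concatenate all books into a single file."""
    merged = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        # Remove empty/whitespace-only lines, then rejoin the rest.
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        merged.append(" ".join(lines))
    Path(out_path).write_text("\n\n".join(merged), encoding="utf-8")
    return out_path
```

Splitting the merged file back into the max_length=768 training windows would be a separate step handled by the tokenizer at fine-tuning time.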
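The generation hyperparameters listed in the card (top_p=0.95, output max_length=200) could be applied at inference time along these lines. This is a minimal sketch, assuming a transformers-style checkpoint; the `model_name` default is a placeholder, since the card does not state the repo id:

```python
# Generation settings reported in the model card's hyperparameter list.
GEN_KWARGS = {"do_sample": True, "top_p": 0.95, "max_length": 200}

def generate_shakespeare(prompt, model_name="gpt2"):
    """Sample a Shakespearean continuation of `prompt`.

    `model_name` is a placeholder; substitute the actual
    fine-tuned checkpoint's repo id.
    """
    # Imported locally so GEN_KWARGS can be inspected without
    # pulling in the heavy transformers dependency.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        pad_token_id=tokenizer.eos_token_id,
        **GEN_KWARGS,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

With do_sample=True, top_p=0.95 keeps sampling restricted to the smallest set of tokens whose cumulative probability reaches 0.95, which matches the card's reported setting.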