Commit d8d5217 by Israhassan
1 Parent(s): 0f5b3b5

Update README.md

Files changed (1)
  1. README.md +43 -0
README.md CHANGED
@@ -7,6 +7,49 @@ metrics:
  library_name: adapter-transformers
  pipeline_tag: text-generation
  ---
+ List of texts included in the corpus:
+
+ Beautiful Stories of Shakespeare
+ As You Like It
+ Hamlet
+ Julius Caesar
+ King Lear II
+ King Richard II
+ Macbeth
+ A Midsummer Night's Dream
+ Othello
+ Shakespeare Roman Play
+ Shakespearean Text
+ Sonnets
+ Taming of the Shrew
+ The Tempest
+ Tragedy of Romeo and Juliet
+
+ Number of tokens in each text and in the corpus as a whole:
+
+ Beautiful Stories of Shakespeare 62,537
+ As You Like It 38,772
+ Hamlet 30,512
+ Julius Caesar 30,915
+ King Lear II 25,031
+ King Richard II 26,029
+ Macbeth 197,328
+ A Midsummer Night's Dream 248,172
+ Othello 197,328
+ Shakespeare Roman Play 24,582
+ Shakespearean Text 32,453
+ Sonnets 35,675
+ Taming of the Shrew 36,794
+ The Tempest 30,700
+ Tragedy of Romeo and Juliet 37,623
+ Total: 1,054,451
+
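The reported total can be checked by summing the per-text counts. The following is a minimal sketch, assuming the counts are transcribed exactly from the list above; the names `token_counts` and `reported_total` are illustrative and not part of the commit.

```python
# Per-text token counts, transcribed from the list in the README.
token_counts = {
    "Beautiful Stories of Shakespeare": 62_537,
    "As You Like It": 38_772,
    "Hamlet": 30_512,
    "Julius Caesar": 30_915,
    "King Lear II": 25_031,
    "King Richard II": 26_029,
    "Macbeth": 197_328,
    "A Midsummer Night's Dream": 248_172,
    "Othello": 197_328,
    "Shakespeare Roman Play": 24_582,
    "Shakespearean Text": 32_453,
    "Sonnets": 35_675,
    "Taming of the Shrew": 36_794,
    "The Tempest": 30_700,
    "Tragedy of Romeo and Juliet": 37_623,
}

# Total as stated in the README.
reported_total = 1_054_451

# Sum the individual counts and compare against the stated total.
computed_total = sum(token_counts.values())
print(f"{len(token_counts)} texts, {computed_total:,} tokens")
# prints: 15 texts, 1,054,451 tokens
```

The per-text counts do add up to the stated total. Macbeth and Othello are listed with the identical count of 197,328, which may be worth double-checking against the source files.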
+ How, when, and why the corpus was collected: The corpus was collected from Project Gutenberg (https://www.gutenberg.org/), a library of over 60,000 free ebooks. I collected 15 Shakespeare texts and combined them into one text file. The aim was to build a dataset of over a million characters/tokens on which a model could be fine-tuned to generate text in the style of Shakespeare's work.
+
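The README does not show how the texts were combined. One way to do it is sketched below, assuming the downloaded texts are UTF-8 `.txt` files in a single directory. The function name `combine_texts` is hypothetical and not taken from the commit.

```python
from pathlib import Path


def combine_texts(source_dir: Path, output_file: Path) -> int:
    """Concatenate every .txt file in source_dir into output_file.

    Returns the number of files that were combined.
    """
    # Exclude the output file itself in case it lives inside source_dir.
    output_resolved = output_file.resolve()
    paths = sorted(
        p for p in source_dir.glob("*.txt") if p.resolve() != output_resolved
    )

    with output_file.open("w", encoding="utf-8") as out:
        for path in paths:
            # Trim surrounding whitespace and separate texts with a blank line.
            out.write(path.read_text(encoding="utf-8").strip())
            out.write("\n\n")

    return len(paths)
```

A call might look like `combine_texts(Path("gutenberg_texts"), Path("shakespeare_corpus.txt"))`, where both paths are placeholders. This sketch only concatenates; any cleanup of the source files would be a separate pre-processing step.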
+ How the text was pre-processed or tokenized:
+
+ Values of hyperparameters used during fine-tuning: max_length=768, tokenizer=GPT2, batch size=2, top_p=0.95, output max_length=200
+
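The listed values mix training-time and generation-time settings. The sketch below shows one plausible reading and is not the author's actual script. It assumes that `max_length=768` and the batch size apply to fine-tuning, that `top_p` and the output `max_length` apply to sampling, and that the Hugging Face `transformers` library is used with the base `gpt2` checkpoint as a stand-in for the fine-tuned model. The names `Settings` and `generate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Assumed to apply during fine-tuning (not used by generate() below).
    max_length: int = 768          # maximum tokens per training example
    batch_size: int = 2
    # Assumed to apply when sampling text from the model.
    top_p: float = 0.95            # nucleus sampling threshold
    output_max_length: int = 200   # maximum tokens in prompt plus continuation


def generate(prompt: str, settings: Settings = Settings(),
             model_name: str = "gpt2") -> str:
    """Sample a continuation of prompt using nucleus (top-p) sampling."""
    # Imported lazily so the settings can be inspected without transformers.
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        do_sample=True,                        # sample instead of greedy decoding
        top_p=settings.top_p,
        max_length=settings.output_max_length,
        pad_token_id=tokenizer.eos_token_id,   # GPT-2 defines no pad token
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Shall I compare thee")` would download the base GPT-2 weights on first use. Passing the fine-tuned model's repository ID as `model_name` would be needed to get Shakespearean output.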

  Model Description:
  This model is fine-tuned on a corpus of Shakespeare's works to generate text in Shakespearean language.