Update README.md
README.md
CHANGED
@@ -32,16 +32,14 @@ inference:
 min_length: 2
 max_length: 64
 length_penalty: 0.7
-no_repeat_ngram_size:
+no_repeat_ngram_size: 2
 do_sample: True
-top_p: 0.
-top_k:
+top_p: 0.95
+top_k: 30
 repetition_penalty: 2.1

 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->

 # distilgpt2-tiny-conversational

@@ -55,15 +53,16 @@ It achieves the following results on the evaluation set:

 ## Intended uses & limitations

-
+- [ai-msgbot](https://github.com/pszemraj/ai-msgbot)

 ## Training and evaluation data

-
+- [wizard of Wikipedia](https://parl.ai/projects/wizard_of_wikipedia/) parsed, from parlAI

 ## Training procedure

-- deepspeed
+- deepspeed + huggingface trainer, an example notebook is in [ai-msgbot](https://github.com/pszemraj/ai-msgbot)
+
 ### Training hyperparameters

 The following hyperparameters were used during training:
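
For readers who want to try the updated inference settings locally, the values on the new side of this change (`no_repeat_ngram_size: 2`, `top_p: 0.95`, `top_k: 30`, alongside the unchanged parameters) can be passed as generation keyword arguments. The following is only a minimal sketch, assuming the `transformers` library is installed; the helper name `generate_reply` and the `model_id` argument are hypothetical and not part of the commit, and the full repository ID of the model is not stated here, so it has to be supplied by the caller.

```python
# Generation settings copied from the new side of the diff.
GENERATION_KWARGS = {
    "min_length": 2,
    "max_length": 64,
    "length_penalty": 0.7,
    "no_repeat_ngram_size": 2,
    "do_sample": True,
    "top_p": 0.95,
    "top_k": 30,
    "repetition_penalty": 2.1,
}


def generate_reply(prompt: str, model_id: str) -> str:
    """Hypothetical helper: generate text with the settings above.

    `model_id` is an assumption; pass the full Hub ID or a local path
    of the distilgpt2-tiny-conversational checkpoint.
    """
    # Imported lazily so the settings dict can be inspected without
    # having transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    outputs = generator(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]
```

Calling `generate_reply("Hello, how are you?", model_id)` would download the checkpoint on first use. Because `do_sample` is enabled, the output varies between runs.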