Fill-Mask
Transformers
PyTorch
English
roberta
exbert
anchit committed
Commit 1950bf8 · 1 Parent(s): ed6e760

Update README.md

Files changed (1)
  1. README.md +1 -8
README.md CHANGED
@@ -43,13 +43,6 @@ Note that this model is primarily aimed at being fine-tuned on tasks that use the
 to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
 generation you should look at a model like GPT2.
 
-### Pretraining
-
-The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The
-optimizer used is Adam with a learning rate of 6e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and
-\\(\epsilon = 1e-6\\), a weight decay of 0.01, learning rate warmup for 24,000 steps and linear decay of the learning
-rate after.
-
 ## Evaluation results
 
 When fine-tuned on downstream tasks, this model achieves the following results:
@@ -86,5 +79,5 @@ Glue test results:
 ```
 
 <a href="https://huggingface.co/facebook/muppet-roberta-base">
-\t<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
+\\t<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
 </a>
 
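The pretraining recipe deleted in this commit is concrete enough to express in code. Below is a minimal PyTorch sketch of that optimizer and schedule, assuming `torch.optim.AdamW` as a stand-in for the paragraph's "Adam with a weight decay of 0.01" and using the `get_linear_schedule_with_warmup` helper from `transformers`; the 1024-GPU setup and 8K-sequence batching are omitted, and the single `batch` below is an illustrative stand-in.

```python
import torch
from transformers import (
    AutoTokenizer,
    RobertaForMaskedLM,
    get_linear_schedule_with_warmup,
)

# Hyperparameters quoted in the removed "Pretraining" paragraph.
TOTAL_STEPS = 500_000   # 500K updates
WARMUP_STEPS = 24_000   # linear warmup, then linear decay to 0
PEAK_LR = 6e-4

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Adam with beta1=0.9, beta2=0.98, eps=1e-6; decoupled weight decay
# (AdamW) is an assumption, the paragraph only says "Adam".
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=PEAK_LR,
    betas=(0.9, 0.98),
    eps=1e-6,
    weight_decay=0.01,
)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=TOTAL_STEPS
)

# One illustrative update. Real pretraining feeds 8K-sequence batches at
# length 512 and masks ~15% of tokens (unmasked positions get label -100).
batch = tokenizer("The capital of France is <mask>.", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```

Under this schedule the learning rate climbs linearly to 6e-4 over the first 24,000 steps and then decays linearly to zero at step 500K.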