Model was pretrained using a standard MLM objective on a large text corpus including the sources listed under Training Data.
## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")
model = AutoModel.from_pretrained("deepvk/roberta-base")

text = "Привет, мир!"  # illustrative input; any Russian text works
inputs = tokenizer(text, return_tensors="pt")
predictions = model(**inputs)
```
### Training Data

~500 GB of raw text in total, a mix of the following sources: Wikipedia, books, Twitter comments, Pikabu, Proza.ru, film subtitles, news websites, and a social corpus.
### Training Procedure

#### Training Hyperparameters

| Argument           | Value                |
|--------------------|----------------------|
| Training regime    | fp16 mixed precision |
| Training framework | Fairseq              |
| Optimizer          | Adam                 |
| Adam betas         | 0.9, 0.98            |
| Adam eps           | 1e-6                 |
| Num training steps | 500k                 |

The model was trained on 8×A100 GPUs for ~22 days.
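The Adam settings in the table can be sketched as a single bias-corrected update on one scalar parameter. The learning rate, toy weight, and gradient below are illustrative; the card only specifies the betas and eps:

```python
# One bias-corrected Adam step using the betas/eps from the table above.
beta1, beta2, eps = 0.9, 0.98, 1e-6   # from the hyperparameter table
lr = 6e-4                              # illustrative; not stated in the card
w, grad = 1.0, 0.5                     # toy parameter and gradient

m = (1 - beta1) * grad                 # first-moment estimate (m0 = 0)
v = (1 - beta2) * grad ** 2            # second-moment estimate (v0 = 0)
m_hat = m / (1 - beta1 ** 1)           # bias correction at step t = 1
v_hat = v / (1 - beta2 ** 1)
w -= lr * m_hat / (v_hat ** 0.5 + eps)
print(w)                               # slightly below 1.0
```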
Standard RoBERTa-base parameters:

| Argument                | Value  |
|-------------------------|--------|
| Activation function     | gelu   |
| Attention dropout       | 0.1    |
| Dropout                 | 0.1    |
| Encoder attention heads | 12     |
| Encoder embed dim       | 768    |
| Encoder FFN embed dim   | 3,072  |
| Encoder layers          | 12     |
| Max positions           | 512    |
| Vocab size              | 50,266 |
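A back-of-the-envelope parameter count from these architecture values (a sketch that ignores layer norms and the MLM head, so it slightly undercounts):

```python
# Rough parameter count from the architecture table above.
vocab, max_pos, d, ffn, layers = 50_266, 512, 768, 3_072, 12

embed = (vocab + max_pos) * d                 # token + position embeddings
attn = 4 * (d * d + d)                        # Q, K, V, output projections
ffn_params = (d * ffn + ffn) + (ffn * d + d)  # two FFN linear layers
total = embed + layers * (attn + ffn_params)
print(f"{total / 1e6:.1f}M parameters")       # prints 124.0M parameters
```

This lands at the ~125M parameters commonly cited for RoBERTa-base.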
## Evaluation

Results on the Russian SuperGLUE dev set. The best result across base-size models is in bold.

| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Overall |
|-------|-----|-------|--------|-------|-------|------|---------|---------|
| [vk-roberta-base](https://huggingface.co/deepvk/roberta-base) | 0.46 | 0.56 | 0.679 | 0.769 | 0.960 | 0.569 | 0.658 | 0.665 |
| [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill) | 0.433 | 0.56 | 0.625 | 0.59 | 0.943 | 0.569 | 0.726 | 0.635 |
| [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base) | 0.450 | **0.61** | **0.722** | 0.704 | 0.948 | 0.578 | **0.76** | **0.682** |
| [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased) | 0.467 | 0.57 | 0.587 | 0.704 | 0.953 | **0.583** | 0.737 | 0.657 |
| [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base) | **0.491** | **0.61** | 0.663 | 0.769 | **0.962** | 0.574 | 0.678 | 0.678 |
| [sber-roberta-large](https://huggingface.co/ai-forever/ruRoberta-large) | 0.463 | 0.61 | 0.775 | 0.886 | 0.946 | 0.564 | 0.761 | 0.715 |