Fixed hyperlinks
#26 by 1Bananas1 - opened

README.md CHANGED
```diff
@@ -12,7 +12,7 @@ datasets:
 
 This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-uncased). It was
 introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found
-[here](https://github.com/huggingface/transformers/tree/main/
+[here](https://github.com/huggingface/transformers-research-projects/tree/main/distillation). This model is uncased: it does
 not make a difference between english and English.
 
 ## Model description
@@ -187,7 +187,7 @@ The details of the masking procedure for each sentence are the following:
 ### Pretraining
 
 The model was trained on 8 16 GB V100 for 90 hours. See the
-[training code](https://github.com/huggingface/transformers/tree/main/
+[training code](https://github.com/huggingface/transformers-research-projects/tree/main/distillation) for all hyperparameters
 details.
 
 ## Evaluation results
```
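The model card's statement that the model is uncased ("does not make a difference between english and English") can be illustrated with a minimal sketch. The normalization below is an assumption for illustration only, mimicking uncased BERT-style preprocessing (lowercasing plus accent stripping), not the actual DistilBERT tokenizer:

```python
# Minimal sketch of what "uncased" means: text is lowercased (and accents
# stripped) before tokenization, so "english" and "English" normalize to the
# same string. Illustrative only; not the real DistilBERT tokenizer.
import unicodedata

def uncased_normalize(text: str) -> str:
    """Lowercase and strip combining accent marks, BERT-uncased style."""
    text = text.lower()
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

print(uncased_normalize("English") == uncased_normalize("english"))  # True
```

With the real tokenizer, the same effect shows up as identical token IDs for differently cased inputs.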