Fixed hyperlinks
#26 by 1Bananas1 - opened

README.md CHANGED
```diff
@@ -12,7 +12,7 @@ datasets:
 
 This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-uncased). It was
 introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found
-[here](https://github.com/huggingface/transformers/tree/main/
+[here](https://github.com/huggingface/transformers-research-projects/tree/main/distillation). This model is uncased: it does
 not make a difference between english and English.
 
 ## Model description
@@ -187,7 +187,7 @@ The details of the masking procedure for each sentence are the following:
 ### Pretraining
 
 The model was trained on 8 16 GB V100 for 90 hours. See the
-[training code](https://github.com/huggingface/transformers/tree/main/
+[training code](https://github.com/huggingface/transformers-research-projects/tree/main/distillation) for all hyperparameters
 details.
 
 ## Evaluation results
```
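The model card's statement that the model is uncased ("does not make a difference between english and English") can be illustrated with a minimal sketch. The normalization below is an assumption for illustration only, mimicking uncased BERT-style preprocessing (lowercasing plus accent stripping), not the actual DistilBERT tokenizer:

```python
# Minimal sketch of what "uncased" means: text is lowercased (and accents
# stripped) before tokenization, so "english" and "English" normalize to the
# same string. Illustrative only; not the real DistilBERT tokenizer.
import unicodedata

def uncased_normalize(text: str) -> str:
    """Lowercase and strip combining accent marks, BERT-uncased style."""
    text = text.lower()
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

print(uncased_normalize("English") == uncased_normalize("english"))  # True
```

With the real tokenizer, the same effect shows up as identical token IDs for differently cased inputs.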