Update README.md
README.md
CHANGED
@@ -54,7 +54,7 @@ widget:
 
 This model is a distilled version of [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2). It follows the same training procedure as [DistilBERT](https://arxiv.org/abs/1910.01108), using the implementation of Knowledge Distillation from the paper's [official repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).
 
-The resulting architecture consists of 6 layers, 768 dimensional embeddings and 12 attention heads
+The resulting architecture consists of 6 layers, 768 dimensional embeddings and 12 attention heads. This adds up to a total of 82M parameters, which is considerably less than the 125M of standard RoBERTa-base models. This makes the model lighter and faster than the original.
 
 We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the teacher model, as well as the training and evaluation data.
 
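
Reviewer note: the 82M and 125M figures in the added line can be sanity-checked with a back-of-the-envelope count. The sketch below is only an estimate under assumptions that do not appear in this diff: a vocabulary of 50,262 tokens, a 3072-wide feed-forward layer, 514 position embeddings and a single token-type embedding (typical RoBERTa-base settings). The function name `roberta_param_count` is hypothetical, and the count leaves out the pooler and the language-modeling head.

```python
def roberta_param_count(num_layers, hidden=768, ffn=3072,
                        vocab=50262, max_pos=514, type_vocab=1):
    """Rough parameter count for a RoBERTa-style encoder.

    All defaults except hidden=768 are assumptions, not values from the diff.
    The number of attention heads does not change the count, because the
    heads only split the same hidden-by-hidden projection matrices.
    """
    layer_norm = 2 * hidden  # scale and bias

    # Token, position and token-type embeddings, plus the embedding LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    # Query, key, value and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers of the feed-forward block (weights + biases), plus LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    return embeddings + num_layers * (attention + feed_forward)


student = roberta_param_count(num_layers=6)   # distilled model
teacher = roberta_param_count(num_layers=12)  # RoBERTa-base sized model

print(f"student: {student / 1e6:.1f}M parameters")
print(f"teacher: {teacher / 1e6:.1f}M parameters")
print(f"ratio:   {student / teacher:.2f}")
```

Under these assumptions the estimate lands close to the figures quoted in the README. Halving the layer count does not halve the model, because the embedding matrix is shared by both sizes and accounts for a large share of the student's parameters.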