mapama247 committed · Commit 4b2d453 · Parent(s): 69aa348

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -54,7 +54,7 @@ widget:
 
 This model is a distilled version of [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2). It follows the same training procedure as [DistilBERT](https://arxiv.org/abs/1910.01108), using the implementation of Knowledge Distillation from [the paper's repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).
 
-The model has 6 layers, 768-dimensional embeddings and 12 attention heads, totaling 82M parameters (compared to the 125M parameters of standard RoBERTa-base models). On average, it is twice as fast as its teacher.
+The resulting architecture consists of 6 layers, 768-dimensional embeddings and 12 attention heads, totaling 82M parameters (compared to the 125M parameters of standard RoBERTa-base models). On average, it is twice as fast as its teacher.
 
 We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the teacher model, as well as the training and evaluation data.
 
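
The changed line quotes two concrete figures (82M parameters for the student vs. 125M for the teacher). A quick way to sanity-check them is to load both models with the `transformers` library and count their weights. This is a minimal sketch: the commit does not state the distilled model's repo id, so the student identifier below is a placeholder assumption that should be replaced with this repository's actual model id.

```python
# Sketch: verify the parameter counts quoted in the diff.
# The teacher id comes from the model card; the student id is a
# PLACEHOLDER (not stated in this commit) -- substitute the real one.
from transformers import AutoModel

teacher = AutoModel.from_pretrained("projecte-aina/roberta-base-ca-v2")
student = AutoModel.from_pretrained("your-org/your-distilled-model")  # placeholder id

def count_params(model) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Expected: roughly 125M for the teacher, roughly 82M for the 6-layer student.
print(f"teacher: {count_params(teacher) / 1e6:.0f}M parameters")
print(f"student: {count_params(student) / 1e6:.0f}M parameters")
```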