mapama247/DistilBERTa

mapama247 committed on Dec 28, 2022

Commit 0b9dd8c · 1 Parent(s): 4b2d453

Update README.md

Files changed (1)

README.md +1 -1

@@ -52,7 +52,7 @@ widget:
 ## Model description
-This model is a distilled version of [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2). It follows the same training procedure as [DistilBERT](https://arxiv.org/abs/1910.01108), using the implementation of Knowledge Distillation from [the paper's repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).
+This model is a distilled version of [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2). It follows the same training procedure as [DistilBERT](https://arxiv.org/abs/1910.01108), using the implementation of Knowledge Distillation from the paper's [official repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).
 The resulting architecture consists of 6 layers, 768 dimensional embeddings and 12 attention heads, totalizing 82M parameters (compared to 125M parameters of standrard RoBERTa-base models). On average, it is twice as fast as its teacher.