mapama247 committed
Commit 1faac78 · 1 Parent(s): 938eb16

Update README.md

Files changed (1)
  1. README.md +13 -15
README.md CHANGED
@@ -29,6 +29,8 @@ widget:
 - [How to use](#how-to-use)
 - [Limitations and bias](#limitations-and-bias)
 - [Training](#training)
+  - [Training data](#training-data)
+  - [Training procedure](#training-procedure)
 - [Evaluation](#evaluation)
 - [Additional information](#additional-information)
 - [Authors](#authors)
@@ -36,7 +38,7 @@ widget:
 - [Copyright](#copyright)
 - [Licensing information](#licensing-information)
 - [Funding](#funding)
-- [Citation Information](#citation-information)
+- [Citation information](#citation-information)
 - [Disclaimer](#disclaimer)

 </details>
@@ -56,16 +58,6 @@ The model has 6 layers, 768 dimension and 12 heads, totalizing 82M parameters (c

 We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the training and evaluation data.

-**About Knowledge Distiallation**
-
-It is a technique used to shrink networks to a reasonable size while minimizing the loss in performance.
-
-The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
-
-So, in a “teacher-student learning” setup, a small student model is trained to mimic the behavior of a larger teacher model.
-
-As an example, the distilled version of BERT has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates in lower inference time and the ability to run in commodity hardware.
-
 ## Intended uses and limitations

 This model is ready-to-use only for masked language modeling (MLM) to perform the Fill-Mask task. However, it is intended to be fine-tuned on non-generative downstream tasks such as Question Answering, Text Classification, or Named Entity Recognition.
@@ -109,7 +101,13 @@ The training corpus consists of several corpora gathered from web crawling and p

 ### Training procedure

-[TODO]
+This model has been trained using a technique known as Knowledge Distillation, which shrinks networks to a reasonable size while minimizing the loss in performance.
+
+The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
+
+In a “teacher-student learning” setup, a small student model is trained to mimic the behavior of a larger teacher model.
+
+As an example, the distilled version of BERT has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates into lower inference time and the ability to run on commodity hardware.

 ## Evaluation

@@ -129,7 +127,7 @@ This model has been fine-tuned on the downstream tasks of the Catalan Language U

 <sup>1</sup> : Trained on CatalanQA, tested on XQuAD-ca.

-## Additional Information
+## Additional information

 ### Authors

@@ -143,7 +141,7 @@ For further information, send an email to [aina@bsc.es](aina@bsc.es).

 Copyright by the Text Mining Unit at Barcelona Supercomputing Center.

-### Licensing Information
+### Licensing information

 This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

@@ -151,7 +149,7 @@ This work is licensed under a [Apache License, Version 2.0](https://www.apache.o

 This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).

-### Citation Information
+### Citation information

 ```bibtex
 [TODO: add bibtext citation here]
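
---

The "Intended uses and limitations" text in the diff above says the model is ready to use for the Fill-Mask task. For reference, here is a minimal usage sketch with the `transformers` library; the model identifier below is a placeholder assumption (this commit page does not state the repository's Hub id), so substitute the actual one:

```python
# Fill-Mask usage sketch. MODEL_ID is a hypothetical placeholder --
# replace it with this repository's actual Hugging Face Hub id.
from transformers import pipeline

MODEL_ID = "projecte-aina/distilroberta-base-ca-v2"  # assumed id, for illustration

unmasker = pipeline("fill-mask", model=MODEL_ID)
# RoBERTa-style tokenizers use <mask> as the mask token.
for pred in unmasker("El català és una <mask> romànica."):
    print(pred["token_str"], round(pred["score"], 3))
```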
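The "Training procedure" text added in the diff describes teacher-student distillation only in prose. As a rough illustration, here is a minimal sketch of the classic combined soft-target/hard-target loss from Hinton et al. (2015); the temperature `T` and interpolation weight `alpha` are illustrative assumptions, not this model's actual training hyperparameters, and DistilBERT-style training also adds an MLM loss and a cosine embedding loss that are omitted here:

```python
# Minimal distillation-loss sketch (Hinton et al., 2015), for illustration only.
# `T` and `alpha` are assumed values, not this model's training hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student's temperature-smoothed distribution
    # towards the teacher's (the "mimic the teacher" term).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across T
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits: batch of 4, "vocabulary" of 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In the setup described by this model card, the teacher would presumably be the referenced projecte-aina/roberta-base-ca-v2 checkpoint and the student this 6-layer, 82M-parameter model.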