mapama247 committed
Commit 1faac78 · 1 Parent(s): 938eb16

Update README.md

Files changed (1)
  1. README.md +13 -15
README.md CHANGED
@@ -29,6 +29,8 @@ widget:
 - [How to use](#how-to-use)
 - [Limitations and bias](#limitations-and-bias)
 - [Training](#training)
+  - [Training data](#training-data)
+  - [Training procedure](#training-procedure)
 - [Evaluation](#evaluation)
 - [Additional information](#additional-information)
 - [Authors](#authors)
@@ -36,7 +38,7 @@ widget:
 - [Copyright](#copyright)
 - [Licensing information](#licensing-information)
 - [Funding](#funding)
-- [Citation Information](#citation-information)
+- [Citation information](#citation-information)
 - [Disclaimer](#disclaimer)

 </details>
@@ -56,16 +58,6 @@ The model has 6 layers, 768 dimension and 12 heads, totalizing 82M parameters (c

 We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the training and evaluation data.

-**About Knowledge Distiallation**
-
-It is a technique used to shrink networks to a reasonable size while minimizing the loss in performance.
-
-The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
-
-So, in a “teacher-student learning” setup, a small student model is trained to mimic the behavior of a larger teacher model.
-
-As an example, the distilled version of BERT has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates in lower inference time and the ability to run in commodity hardware.
-
 ## Intended uses and limitations

 This model is ready-to-use only for masked language modeling (MLM) to perform the Fill-Mask task. However, it is intended to be fine-tuned on non-generative downstream tasks such as Question Answering, Text Classification, or Named Entity Recognition.
@@ -109,7 +101,13 @@ The training corpus consists of several corpora gathered from web crawling and p

 ### Training procedure

-[TODO]
+This model has been trained using a technique known as Knowledge Distillation, which shrinks networks to a reasonable size while minimizing the loss in performance.
+
+The main idea is to distill a large language model (the teacher) into a more lightweight, energy-efficient, and production-friendly model (the student).
+
+In a “teacher-student learning” setup, a small student model is trained to mimic the behavior of a larger teacher model.
+
+As an example, the distilled version of BERT has 40% fewer parameters and runs 60% faster while preserving 97% of BERT's performance on the GLUE benchmark. This translates into lower inference time and the ability to run on commodity hardware.

 ## Evaluation

@@ -129,7 +127,7 @@ This model has been fine-tuned on the downstream tasks of the Catalan Language U

 <sup>1</sup> : Trained on CatalanQA, tested on XQuAD-ca.

-## Additional Information
+## Additional information

 ### Authors

@@ -143,7 +141,7 @@ For further information, send an email to [aina@bsc.es](aina@bsc.es).

 Copyright by the Text Mining Unit at Barcelona Supercomputing Center.

-### Licensing Information
+### Licensing information

 This work is licensed under a [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).

@@ -151,7 +149,7 @@ This work is licensed under a [Apache License, Version 2.0](https://www.apache.o

 This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).

-### Citation Information
+### Citation information

 ```bibtex
 [TODO: add bibtext citation here]
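
---

The "Intended uses and limitations" text in the diff above says the model is ready to use for the Fill-Mask task. For reference, here is a minimal usage sketch with the `transformers` library; the model identifier below is a placeholder assumption (this commit page does not state the repository's Hub id), so substitute the actual one:

```python
# Fill-Mask usage sketch. MODEL_ID is a hypothetical placeholder --
# replace it with this repository's actual Hugging Face Hub id.
from transformers import pipeline

MODEL_ID = "projecte-aina/distilroberta-base-ca-v2"  # assumed id, for illustration

unmasker = pipeline("fill-mask", model=MODEL_ID)
# RoBERTa-style tokenizers use <mask> as the mask token.
for pred in unmasker("El català és una <mask> romànica."):
    print(pred["token_str"], round(pred["score"], 3))
```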
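The "Training procedure" text added in the diff describes teacher-student distillation only in prose. As a rough illustration, here is a minimal sketch of the classic combined soft-target/hard-target loss from Hinton et al. (2015); the temperature `T` and interpolation weight `alpha` are illustrative assumptions, not this model's actual training hyperparameters, and DistilBERT-style training also adds an MLM loss and a cosine embedding loss that are omitted here:

```python
# Minimal distillation-loss sketch (Hinton et al., 2015), for illustration only.
# `T` and `alpha` are assumed values, not this model's training hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student's temperature-smoothed distribution
    # towards the teacher's (the "mimic the teacher" term).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across T
    # Hard targets: standard cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits: batch of 4, "vocabulary" of 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

In the setup described by this model card, the teacher would presumably be the referenced projecte-aina/roberta-base-ca-v2 checkpoint and the student this 6-layer, 82M-parameter model.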