mapama247 committed
Commit 9f57e6c · 1 Parent(s): 1faac78

Update README.md

Files changed (1)
  1. README.md +26 -4

README.md CHANGED
@@ -32,6 +32,9 @@ widget:
 - [Training data](#training-data)
 - [Training procedure](#training-procedure)
 - [Evaluation](#evaluation)
+  - [Variable and metrics](#variable-and-metrics)
+  - [Evaluation benchmark](#evaluation-benchmark)
+  - [Evaluation results](#evaluation-results)
 - [Additional information](#additional-information)
 - [Authors](#authors)
 - [Contact information](#contact-information)
@@ -56,7 +59,7 @@ This model is a distilled version of [projecte-aina/roberta-base-ca-v2](https://
 
 The model has 6 layers, a hidden dimension of 768, and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base). On average, it is twice as fast as its teacher.
 
-We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the training and evaluation data.
+We encourage users of this model to check out the [projecte-aina/roberta-base-ca-v2](https://huggingface.co/projecte-aina/roberta-base-ca-v2) model card to learn more details about the teacher model, as well as the training and evaluation data.
 
 ## Intended uses and limitations
 
@@ -81,7 +84,8 @@ At the time of submission, no measures have been taken to estimate the bias embe
 ### Training data
 
 The training corpus consists of several corpora gathered from web crawling and public corpora.
-
+<details>
+<summary>Click to expand</summary>
 | Corpus                  | Size in GB |
 |-------------------------|------------|
 | Catalan Crawling        | 13.00      |
@@ -98,6 +102,7 @@ The training corpus consists of several corpora gathered from web crawling and p
 | Nació Digital           | 0.42       |
 | Vilaweb                 | 0.06       |
 | Tweets                  | 0.02       |
+</details>
 
 ### Training procedure
 
@@ -115,13 +120,30 @@ As an example, the distilled version of BERT has 40% fewer parameters and runs 6
 
 [TODO]
 
+### Evaluation benchmark
+
+This model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB).
+
+Here are the train/dev/test splits of each dataset:
+
+| Dataset   | Task | Total   | Train   | Dev    | Test   |
+|:----------|:-----|:--------|:--------|:-------|:-------|
+| Ancora    | NER  | 13,581  | 10,628  | 1,427  | 1,526  |
+| Ancora    | POS  | 16,678  | 13,123  | 1,709  | 1,846  |
+| STS-ca    | STS  | 3,073   | 2,073   | 500    | 500    |
+| TeCla     | TC   | 137,775 | 110,203 | 13,786 | 13,786 |
+| TE-ca     | TE   | 21,163  | 16,930  | 2,116  | 2,117  |
+| VilaQuAD  | QA   | 6,282   | 3,882   | 1,200  | 1,200  |
+| ViquiQuAD | QA   | 14,239  | 11,255  | 1,492  | 1,429  |
+| CatalanQA | QA   | 21,427  | 17,135  | 2,157  | 2,135  |
+
 ### Evaluation results
 
-This model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB). This is how it compares to the teacher model when fine-tuned on the same downstream tasks:
+This is how it compares to the teacher model when fine-tuned on the same downstream tasks:
 
 | Task | NER (F1) | POS (F1) | STS-ca (Comb) | TeCla (Acc.) | TEca (Acc.) | VilaQuAD (F1/EM) | ViquiQuAD (F1/EM) | CatalanQA (F1/EM) | XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------|:-------------:| -----:|:------|:------|:-------|:------|:----|:----|:----|
-| RoBERTa-large-ca-v2 | 89.82 | 99.02 | 83.41 | 75.46 | 83.61 | 89.34/75.50 | 89.20/75.77 | 90.72/79.06 | 73.79/55.34 |
+| RoBERTa-large-ca-v2 | 89.82 | 99.02 | 83.41 | 75.46 | 83.61 | 89.34/75.50 | 89.20/75.77 | 90.72/79.06 | 73.79/55.34 |
 | RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 87.74/72.58 | 88.72/75.91 | 89.50/76.63 | 73.64/55.42 |
 | DistilRoBERTa-base-ca-v2 | xx.xx | xx.xx | xx.xx | xx.xx | xx.xx | xx.xx/xx.xx | xx.xx/xx.xx | xx.xx/xx.xx | xx.xx/xx.xx |
 
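The size claim in the diff above (6 layers, hidden dimension 768, ~82M parameters for the student vs. ~125M for the teacher) can be sanity-checked analytically. The sketch below is a rough estimate, not taken from the model card: it assumes the standard RoBERTa-base configuration (50,265-token vocabulary, 514 position embeddings, 3,072-dimensional feed-forward layers) and counts only the encoder, ignoring the LM head and pooler.

```python
def roberta_param_count(layers, hidden=768, vocab=50265, max_pos=514,
                        intermediate=None):
    """Approximate encoder parameter count for a RoBERTa-style model.

    Default sizes are the conventional RoBERTa-base values (an assumption
    here); the LM head and pooler are deliberately excluded.
    """
    if intermediate is None:
        intermediate = 4 * hidden  # conventional FFN expansion factor
    # Embeddings: word + position + token-type tables, plus one LayerNorm.
    embeddings = (vocab + max_pos + 1) * hidden + 2 * hidden
    # Self-attention: Q, K, V, and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward network: up- and down-projections (weights + biases).
    ffn = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two LayerNorms per encoder layer (scale + shift each).
    layer_norms = 2 * 2 * hidden
    per_layer = attention + ffn + layer_norms
    return embeddings + layers * per_layer

student = roberta_param_count(layers=6)    # 6-layer distilled student
teacher = roberta_param_count(layers=12)   # 12-layer RoBERTa-base teacher
print(f"student: {student / 1e6:.0f}M, teacher: {teacher / 1e6:.0f}M")
```

Under these assumptions the student lands at roughly 82M parameters and the teacher at roughly 124M; the card's 125M figure presumably also counts the LM head, which this encoder-only tally omits.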