Update README.md
README.md (CHANGED)
```diff
@@ -32,7 +32,6 @@ widget:
 - [Training data](#training-data)
 - [Training procedure](#training-procedure)
 - [Evaluation](#evaluation)
-  - [Variable and metrics](#variable-and-metrics)
   - [Evaluation benchmark](#evaluation-benchmark)
   - [Evaluation results](#evaluation-results)
 - [Additional information](#additional-information)
@@ -114,16 +113,11 @@ As an example, the distilled version of BERT has 40% fewer parameters and runs 6
 
 ## Evaluation
 
-### Variable and metrics
-
-[TODO]
-
 ### Evaluation benchmark
 
 This model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB).
 
 Here are the train/dev/test splits of each dataset:
-
 | Dataset | Task | Total | Train | Dev | Test |
 |:--|:--|:--|:--|:--|:--|
 | Ancora | NER |13,581 | 10,628 | 1,427 | 1,526 |
@@ -138,7 +132,6 @@ Here are the train/dev/test splits of each dataset:
 ### Evaluation results
 
 This is how it compares to the teacher model when fine-tuned on the same downstream tasks:
-
 | Model \ Task| NER (F1) | POS (F1) | STS-ca (Comb) | TeCla (Acc.) | TEca (Acc.) | VilaQuAD (F1/EM)| ViquiQuAD (F1/EM) | CatalanQA (F1/EM) | XQuAD-ca <sup>1</sup> (F1/EM) |
 | ------------|:-------------:| -----:|:------|:------|:-------|:------|:----|:----|:----|
 | RoBERTa-large-ca-v2 | 89.82 | 99.02 | 83.41 | 75.46 | 83.61 | 89.34/75.50 | 89.20/75.77 | 90.72/79.06 | 73.79/55.34 |
```
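The evaluation section this diff cleans up compares the distilled student against its teacher, RoBERTa-large-ca-v2, on fine-tuned CLUB tasks. As a minimal sketch of how one of these fine-tuned checkpoints can be queried with the Hugging Face `transformers` pipeline API (the model id below is an assumed placeholder for illustration, not one named in this diff):

```python
# Minimal sketch: querying a CLUB-style fine-tuned NER checkpoint with the
# Hugging Face `transformers` pipeline API.
# NOTE: "projecte-aina/roberta-base-ca-v2-cased-ner" is an assumed,
# illustrative model id; substitute the checkpoint actually linked
# from the model card.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="projecte-aina/roberta-base-ca-v2-cased-ner",
    aggregation_strategy="simple",  # merge sub-word pieces into whole entities
)

for entity in ner("La Generalitat de Catalunya té la seu a Barcelona."):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```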