Update README.md
- text: "Catalunya és una referència en <mask> a nivell europeu."
---

# DistilBerta-base

## Table of Contents
<details>
</details>

## Overview

- **Architecture:** DistilRoBERTa
- **Language:** Catalan
- **Task:** Fill-Mask
- **Data:** Crawling

## Model description

## How to use

```python
from transformers import pipeline
[TODO: Add minimal code here]
```
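Until the snippet is filled in, a minimal fill-mask sketch could look like the following. Note that `distilroberta-base-ca-v2` is a placeholder identifier, not the confirmed Hub id of this model; replace it with the repository's actual identifier.

```python
from transformers import pipeline

# NOTE: "distilroberta-base-ca-v2" is a placeholder identifier; substitute
# this model's actual Hugging Face Hub id before running.
def top_predictions(text, model_id="distilroberta-base-ca-v2", k=5):
    """Return the top-k (token, score) fill-mask predictions for `text`."""
    unmasker = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in unmasker(text, top_k=k)]

# Example call (downloads the model weights on first use):
# top_predictions("Catalunya és una referència en <mask> a nivell europeu.")
```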

### Evaluation results

This model has been fine-tuned on the downstream tasks of the Catalan Language Understanding Evaluation benchmark (CLUB).

| Task | NER (F1) | POS (F1) | STS-ca (Comb) | TeCla (Acc.) | TEca (Acc.) | VilaQuAD (F1/EM) | ViquiQuAD (F1/EM) | CatalanQA (F1/EM) | XQuAD-ca <sup>1</sup> (F1/EM) |
| ------------ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| RoBERTa-large-ca-v2 | 89.82 | 99.02 | 83.41 | 75.46 | 83.61 | 89.34/75.50 | 89.20/75.77 | 90.72/79.06 | 73.79/55.34 |
| RoBERTa-base-ca-v2 | 89.29 | 98.96 | 79.07 | 74.26 | 83.14 | 87.74/72.58 | 88.72/75.91 | 89.50/76.63 | 73.64/55.42 |
| DistilRoBERTa-base-ca-v2 | xx.xx | xx.xx | xx.xx | xx.xx | xx.xx | xx.xx/xx.xx | xx.xx/xx.xx | xx.xx/xx.xx | xx.xx/xx.xx |

<sup>1</sup>: Trained on CatalanQA, tested on XQuAD-ca.
## Additional Information

### Authors

The Text Mining Unit (TeMU) from the Barcelona Supercomputing Center ([bsc-temu@bsc.es](mailto:bsc-temu@bsc.es)).

### Contact information

For further information, send an email to [aina@bsc.es](mailto:aina@bsc.es).

## Copyright