Commit fdd8ca4
1 Parent(s): c57b9f4

Update README.md
README.md CHANGED
@@ -120,14 +120,14 @@ that has been created along with the model.

 It contains the following tasks and their related datasets:

-1.
+1. Named Entity Recognition (NER)

-Catalan
+**[AnCora Catalan 2.0.0](https://zenodo.org/record/4762031#.YKaFjqGxWUk)**: extracted named entities from the original [Ancora](https://doi.org/10.5281/zenodo.4762030) version, filtering out some unconventional ones, like book titles, and transcribed them into a standard CONLL-IOB format.

-
+
+2. Part-of-Speech Tagging (POS)

-
-filtering out some unconventional ones, like book titles, and transcribed them into a standard CONLL-IOB format
+Catalan-Ancora: from the [Universal Dependencies treebank](https://github.com/UniversalDependencies/UD_Catalan-AnCora) of the well-known Ancora corpus.

 3. Text Classification (TC)

@@ -135,7 +135,7 @@ It contains the following tasks and their related datasets:

 4. Textual Entailment (TE)

-**[
+**[TECa](https://huggingface.co/datasets/projecte-aina/teca)**: consisting of 21,163 pairs of premises and hypotheses, annotated according to the inference relation they have (implication, contradiction, or neutral), extracted from the [Catalan Textual Corpus](https://huggingface.co/datasets/projecte-aina/catalan_textual_corpus).

 5. Semantic Textual Similarity (STS)

@@ -159,7 +159,7 @@ Here are the train/dev/test splits of the datasets:
 | POS (Ancora)| 16,678 | 13,123 | 1,709 | 1,846 |
 | STS | 3,073 | 2,073 | 500 | 500 |
 | TC (TeCla) | 137,775 | 110,203 | 13,786 | 13,786|
-| TE (
+| TE (TECa) | 21,163 | 16,930 | 2,116 | 2,117
 | QA (VilaQuAD) | 6,282 | 3,882 | 1,200 | 1,200 |
 | QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |
 | QA (CatalanQA) | 21,427 | 17,135 | 2,157 | 2,135 |
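
When reviewing a change to the splits table, a quick arithmetic check can confirm that each row is internally consistent. The sketch below copies the rows visible in this diff and compares the stated total with the sum of the three splits. The table's header row is outside the hunk, so the column order (total, train, dev, test) is an assumption, and the names `SPLITS` and `find_mismatches` are hypothetical helpers, not part of the repository.

```python
# Rows copied from the splits table shown in this diff.
# Assumed column order (header row is not in the hunk): total, train, dev, test.
SPLITS = {
    "POS (Ancora)": (16_678, 13_123, 1_709, 1_846),
    "STS": (3_073, 2_073, 500, 500),
    "TC (TeCla)": (137_775, 110_203, 13_786, 13_786),
    "TE (TECa)": (21_163, 16_930, 2_116, 2_117),
    "QA (VilaQuAD)": (6_282, 3_882, 1_200, 1_200),
    "QA (ViquiQuAD)": (14_239, 11_255, 1_492, 1_429),
    "QA (CatalanQA)": (21_427, 17_135, 2_157, 2_135),
}


def find_mismatches(splits):
    """Return {name: (stated_total, summed_total)} for rows that do not add up."""
    mismatches = {}
    for name, (total, train, dev, test) in splits.items():
        summed = train + dev + test
        if summed != total:
            mismatches[name] = (total, summed)
    return mismatches


if __name__ == "__main__":
    for name, (stated, summed) in find_mismatches(SPLITS).items():
        print(f"{name}: stated total {stated:,}, splits sum to {summed:,}")
```

Under the assumed column order, the new TE (TECa) row adds up to its stated total, and the only row flagged is the unchanged QA (ViquiQuAD) context line, whose splits sum to 14,176 against a stated 14,239. The check cannot tell which of the four figures is off, so that row would need to be verified against the dataset itself.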