Update README.md
README.md
CHANGED
@@ -2,19 +2,7 @@

 UniversalCEFR is a large-scale, multilingual, multidimensional dataset comprising texts annotated according to the [CEFR (Common European Framework of Reference)](https://www.coe.int/en/web/common-european-framework-reference-languages/level-descriptions). The collection comprises a total of 505,807 CEFR-labeled texts in 13 languages, as listed below:

-- English (en)
-- Spanish (es)
-- German (de)
-- Dutch (nl)
-- Czech (cs)
-- Italian (it)
-- French (fr)
-- Estonian (et)
-- Portuguese (pt)
-- Arabic (ar)
-- Hindi (hi)
-- Russian (ru)
-- Welsh (cy)
+English (en), Spanish (es), German (de), Dutch (nl), Czech (cs), Italian (it), French (fr), Estonian (et), Portuguese (pt), Arabic (ar), Hindi (hi), Russian (ru), Welsh (cy)

 ## UniversalCEFR Data Format / Schema
 To ensure interoperability, transformation, and machine readability, we adopted a **standardised JSON format** for each CEFR-labeled text. Its fields include the source dataset, language, granularity (document, paragraph, sentence, discourse), production category (learner or reference), and license.
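As a rough illustration of the schema paragraph in the diff above, here is a minimal Python sketch of what one record and a basic check might look like. The README excerpt names the kinds of metadata but not the JSON keys, so every key name here (`source_name`, `lang`, `format`, `category`, `license`, `cefr_level`, `text`) and the sample values are assumptions made for this example. Only the allowed values for granularity and production category come from the text.

```python
import json

# Hypothetical key names: the README lists the kinds of metadata, not the keys.
REQUIRED_FIELDS = {
    "source_name", "lang", "format", "category", "license", "cefr_level", "text",
}
# These two value sets are taken from the schema description.
GRANULARITIES = {"document", "paragraph", "sentence", "discourse"}
CATEGORIES = {"learner", "reference"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    missing = sorted(REQUIRED_FIELDS - record.keys())
    problems = [f"missing field: {name}" for name in missing]
    if record.get("format") not in GRANULARITIES:
        problems.append(f"unknown granularity: {record.get('format')!r}")
    if record.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {record.get('category')!r}")
    return problems


# Made-up sample record, for illustration only.
sample = json.loads("""
{
  "source_name": "example-corpus",
  "lang": "en",
  "format": "sentence",
  "category": "learner",
  "license": "CC-BY-4.0",
  "cefr_level": "B1",
  "text": "I have lived in this city for three years."
}
""")

print(validate_record(sample))
```

A record that passes returns an empty list. Anything else returns one human-readable message per problem, which keeps the check easy to run over a whole file of records.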