Update README.md
README.md
CHANGED
@@ -28,4 +28,15 @@ To ensure interoperability, transformation, and machine readability, adopted **s
| `category` | The classification of the text in terms of who created the material. The recognized categories are `reference` for texts created by experts, teachers, and language learning professionals, and `learner` for texts written by language learners and students. |
| `cefr_level` | The CEFR level associated with the text. The six recognized CEFR levels are the following: [`A1`, `A2`, `B1`, `B2`, `C1`, `C2`]. A small fraction (<1%) of the texts in UniversalCEFR are unlabelled, carry plus signs (e.g., `A1+`), or give only a band with no level indicator (e.g., `A`, `B`). |
| `license` | The licensing information associated with the text (e.g., `CC-BY-NC-SA` or `Unknown` if not stated). |
-| `text` | The actual content of the text itself.
+| `text` | The actual content of the text itself.
+
+## Accessing UniversalCEFR
+
+If you're interested in a specific dataset or group of datasets from UniversalCEFR, you can access the transformed, standardised versions through the UniversalCEFR Hugging Face organisation: https://huggingface.co/UniversalCEFR
+
+If you use any of the datasets indexed in UniversalCEFR, **please cite the original dataset papers** associated with them. You can find these in the data directory above.
+
+Note that a few datasets in UniversalCEFR (`EFCAMDAT`, `APA-LHA`, `BEA Shared Task 2019 Write and Improve`, and `DEPlain`) are not directly available from the UniversalCEFR Hugging Face organisation, because users must first agree to their Terms of Use before using them for non-commercial research. Once you have done so, you can use the preprocessing Python scripts in the `universal-cefr-experiments` repository to transform the raw versions into the UniversalCEFR format.
+
+### Contact
+For questions, concerns, clarifications, and issues, please contact [Joseph Marvin Imperial](https://www.josephimperial.com/) (jmri20@bath.ac.uk).
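
As an aside on the `cefr_level` field described in the README table: because a small share of labels fall outside the six recognized levels, downstream code may want to normalise them before use. The sketch below is one possible approach, not part of UniversalCEFR itself. The function name `normalize_cefr`, the choice to fold plus-sign labels into their base level, the treatment of empty strings as unlabelled, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional

# The six CEFR levels listed in the README's `cefr_level` description.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def normalize_cefr(label: Optional[str]) -> Optional[str]:
    """Map a raw `cefr_level` value to one of the six levels, or None.

    Assumptions (not stated in the README):
    - plus-sign labels such as "A1+" are folded into their base level;
    - band-only labels such as "A" or "B" are treated as unusable;
    - unlabelled texts appear as None or an empty string.
    """
    if label is None:
        return None
    cleaned = label.strip().upper().rstrip("+")
    return cleaned if cleaned in CEFR_LEVELS else None


# Made-up rows that only mimic the documented fields.
rows = [
    {"text": "Hello, my name is Ana.", "cefr_level": "A1"},
    {"text": "I like to read books.", "cefr_level": "A1+"},
    {"text": "Yesterday we went out.", "cefr_level": "B"},
    {"text": "No label here.", "cefr_level": ""},
]

# Count texts per normalised level; None collects the unusable labels.
counts = Counter(normalize_cefr(row["cefr_level"]) for row in rows)
print(counts)
```

Whether plus-sign labels should be folded down, rounded up, or dropped depends on the task, so the folding rule here is only a default worth revisiting.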