josephimperial committed on
Commit
46be90e
·
verified ·
1 Parent(s): 70e7a70

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -28,4 +28,15 @@ To ensure interoperability, transformation, and machine readability, adopted **s
28
  | `category` | The classification of the text in terms of who created the material. The recognized categories are `reference` for texts created by experts, teachers, and language learning professionals and `learner` for texts written by language learners and students. |
29
  | `cefr_level` | The CEFR level associated with the text. The six recognized CEFR levels are [`A1`, `A2`, `B1`, `B2`, `C1`, `C2`]. A small fraction (<1%) of the texts in UniversalCEFR are unlabelled, carry a plus sign (e.g., `A1+`), or are labelled with a band only and no numeric sublevel (e.g., `A`, `B`). |
30
  | `license` | The licensing information associated with the text (e.g., `CC-BY-NC-SA` or `Unknown` if not stated). |
31
- | `text` | The actual content of the text itself.
31
+ | `text` | The actual content of the text itself.
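
Because the `cefr_level` row above notes that a small share of labels fall outside the six standard levels, downstream code may want to normalise them first. The following is a minimal sketch, not part of the README: the helper name `normalize_cefr_level` is hypothetical, and mapping plus-sign labels such as `A1+` to their base level is an assumption (the README does not prescribe any handling).

```python
from typing import Optional

# The six levels listed in the README's `cefr_level` row.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def normalize_cefr_level(raw: Optional[str]) -> Optional[str]:
    """Return one of the six CEFR levels, or None if the label is unusable.

    Hypothetical helper. Assumption: a trailing plus sign is dropped,
    so `A1+` is treated as `A1`. Band-only labels (`A`, `B`) and
    missing labels are returned as None so the caller can decide
    whether to skip or keep those rows.
    """
    if raw is None:
        return None
    label = raw.strip().upper().rstrip("+")
    return label if label in CEFR_LEVELS else None
```

Returning `None` instead of raising keeps the choice of dropping or keeping the affected rows with the caller.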
32
+
33
+ ## Accessing UniversalCEFR
34
+
35
+ If you're interested in a specific dataset or group of datasets from UniversalCEFR, you can access their transformed, standardised versions through the UniversalCEFR Hugging Face organization: https://huggingface.co/UniversalCEFR
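
As a rough illustration of working with one of these standardised datasets, the sketch below loads rows with the Hugging Face `datasets` library and filters them on the fields described in the table. It rests on several assumptions: the helper names are hypothetical, the caller must supply a real repository id from the organization page, and the `"train"` split default may not match every dataset.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def load_universal_cefr(repo_id: str, split: str = "train") -> List[Row]:
    """Load one dataset from the Hub as a list of row dicts.

    Hypothetical helper. `repo_id` is supplied by the caller, and the
    "train" default is an assumption about how the data is split.
    """
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    return list(load_dataset(repo_id, split=split))


def filter_rows(
    rows: Iterable[Row],
    category: Optional[str] = None,
    cefr_level: Optional[str] = None,
) -> List[Row]:
    """Keep rows matching the given `category` and/or `cefr_level`."""
    kept = []
    for row in rows:
        if category is not None and row.get("category") != category:
            continue
        if cefr_level is not None and row.get("cefr_level") != cefr_level:
            continue
        kept.append(row)
    return kept


def level_counts(rows: Iterable[Row]) -> Counter:
    """Count rows per `cefr_level` value, including non-standard labels."""
    return Counter(row.get("cefr_level") for row in rows)
```

With rows in hand, something like `filter_rows(rows, category="learner", cefr_level="B1")` would select learner-written B1 texts, and `level_counts(rows)` gives a quick view of any non-standard labels.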
36
+
37
+ If you use any of the datasets indexed in UniversalCEFR, **please cite the original dataset papers** associated with them. You can find them in the data directory above.
38
+
39
+ Note that a few datasets in UniversalCEFR (`EFCAMDAT`, `APA-LHA`, `BEA Shared Task 2019 Write and Improve`, and `DEPlain`) are not directly available from the UniversalCEFR Hugging Face organization because they require users to agree to their Terms of Use before using them for non-commercial research. Once you've done this, you can use the preprocessing Python scripts in the `universal-cefr-experiments` repository to transform the raw versions into the UniversalCEFR format.
40
+
41
+ ### Contact
42
+ For questions, concerns, clarifications, and issues, please contact [Joseph Marvin Imperial](https://www.josephimperial.com/) (jmri20@bath.ac.uk).