Bugpie committed on
Commit 0c8cf28 · 1 parent: 22a3072

Update README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -13,6 +13,20 @@ CamemBERT is a state-of-the-art language model for French based on the RoBERTa m

  The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER) and natural language inference (NLI).

+ ## Limitations and bias
+
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
+
+ This model was pretrained on a subcorpus of the OSCAR multilingual corpus. Some of the limitations and risks associated with the OSCAR dataset, which are further detailed in the [OSCAR dataset card](https://huggingface.co/datasets/oscar), include the following:
+
+ > The quality of some OSCAR sub-corpora might be lower than expected, specifically for the lowest-resource languages.
+
+ > Constructed from Common Crawl, Personal and sensitive information might be present.
+
+ ## Training data
+
+ OSCAR or Open Super-large Crawled Aggregated coRpus is a multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the Ungoliant architecture.
+
  ## How to use

  -**Filling masks using pipeline**
@@ -42,18 +56,4 @@ The model developers evaluated CamemBERT using four different downstream tasks f
  'token': 1654,
  'token_str': 'parfait',
  'sequence': 'Le camembert est parfait :)'}]
- ```
-
- ## Limitations and bias
-
- Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
-
- This model was pretrained on a subcorpus of the OSCAR multilingual corpus. Some of the limitations and risks associated with the OSCAR dataset, which are further detailed in the [OSCAR dataset card](https://huggingface.co/datasets/oscar), include the following:
-
- > The quality of some OSCAR sub-corpora might be lower than expected, specifically for the lowest-resource languages.
-
- > Constructed from Common Crawl, Personal and sensitive information might be present.
-
- ## Training data
-
- OSCAR or Open Super-large Crawled Aggregated coRpus is a multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the Ungoliant architecture.
+ ```
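The second hunk shows only the tail of the README's fill-mask output: a list of dictionaries with `token`, `token_str` and `sequence` keys. The transformers fill-mask pipeline also returns a `score` key for each candidate. The following is a minimal sketch, assuming that output shape, of ranking the candidates. The helper name `top_predictions` is hypothetical and is not part of the README or of the transformers API.

```python
from typing import Any, Dict, List, Tuple

# Assumed shape of one fill-mask candidate: the 'token', 'token_str' and
# 'sequence' keys seen in the README output, plus the pipeline's 'score' key.
Candidate = Dict[str, Any]


def top_predictions(results: List[Candidate], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token_str, score) pairs.

    Hypothetical helper. The fill-mask pipeline normally returns candidates
    already sorted by descending score; sorting again keeps the helper
    correct for input in any order.
    """
    ranked = sorted(results, key=lambda c: c["score"], reverse=True)
    return [(c["token_str"], c["score"]) for c in ranked[:k]]
```

With transformers installed, `results` would come from something like `pipeline("fill-mask", model="camembert-base")("Le camembert est <mask> :)")`. The model id and the prompt are assumptions here, because the first hunk's context does not show the README's actual call.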