Update README.md
README.md
CHANGED
|
@@ -13,6 +13,20 @@ CamemBERT is a state-of-the-art language model for French based on the RoBERTa m
|
|
| 13 |
|
| 14 |
The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER) and natural language inference (NLI).
|
| 15 |
|
| 16 |
## How to use
|
| 17 |
|
| 18 |
-**Filling masks using pipeline**
|
|
@@ -42,18 +56,4 @@ The model developers evaluated CamemBERT using four different downstream tasks f
|
|
| 42 |
'token': 1654,
|
| 43 |
'token_str': 'parfait',
|
| 44 |
'sequence': 'Le camembert est parfait :)'}]
|
| 45 |
-
```
|
| 46 |
-
|
| 47 |
-
## Limitations and bias
|
| 48 |
-
|
| 49 |
-
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
|
| 50 |
-
|
| 51 |
-
This model was pretrained on a subcorpus of the OSCAR multilingual corpus. Some of the limitations and risks associated with the OSCAR dataset, which are further detailed in the [OSCAR dataset card](https://huggingface.co/datasets/oscar), include the following:
|
| 52 |
-
|
| 53 |
-
> The quality of some OSCAR sub-corpora might be lower than expected, specifically for the lowest-resource languages.
|
| 54 |
-
|
| 55 |
-
> Constructed from Common Crawl, Personal and sensitive information might be present.
|
| 56 |
-
|
| 57 |
-
## Training data
|
| 58 |
-
|
| 59 |
-
OSCAR or Open Super-large Crawled Aggregated coRpus is a multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the Ungoliant architecture.
|
|
|
|
| 13 |
|
| 14 |
The model developers evaluated CamemBERT using four different downstream tasks for French: part-of-speech (POS) tagging, dependency parsing, named entity recognition (NER) and natural language inference (NLI).
|
| 15 |
|
| 16 |
+
## Limitations and bias
|
| 17 |
+
|
| 18 |
+
Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).
|
| 19 |
+
|
| 20 |
+
This model was pretrained on a subcorpus of the OSCAR multilingual corpus. Some of the limitations and risks associated with the OSCAR dataset, which are further detailed in the [OSCAR dataset card](https://huggingface.co/datasets/oscar), include the following:
|
| 21 |
+
|
| 22 |
+
> The quality of some OSCAR sub-corpora might be lower than expected, specifically for the lowest-resource languages.
|
| 23 |
+
|
| 24 |
+
> Constructed from Common Crawl, Personal and sensitive information might be present.
|
| 25 |
+
|
| 26 |
+
## Training data
|
| 27 |
+
|
| 28 |
+
OSCAR or Open Super-large Crawled Aggregated coRpus is a multilingual corpus obtained by language classification and filtering of the Common Crawl corpus using the Ungoliant architecture.
|
| 29 |
+
|
| 30 |
## How to use
|
| 31 |
|
| 32 |
-**Filling masks using pipeline**
|
|
|
|
| 56 |
'token': 1654,
|
| 57 |
'token_str': 'parfait',
|
| 58 |
'sequence': 'Le camembert est parfait :)'}]
|
| 59 |
+
```
|