Commit ca39186
Parent(s): fe3d5c1
Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ the art or competitive performance to predominant approaches.*
 
 ## Intended uses & limitations
 
-
+The model is intended to be fine-tuned on a downstream task.
 See the [model hub](https://huggingface.co/models?filter=data2vec-text) to look for fine-tuned versions on a task that
 interests you.
 
@@ -44,105 +44,6 @@ Note that this model is primarily aimed at being fine-tuned on tasks that use th
 to make decisions, such as sequence classification, token classification or question answering. For tasks such as text
 generation you should look at model like GPT2.
 
-### How to use
-
-You can use this model directly with a pipeline for masked language modeling:
-
-```python
->>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='facebook/data2vec-text-base')
->>> unmasker("Hello I'm a <mask> model.")
-
-[{'sequence': "<s>Hello I'm a male model.</s>",
-  'score': 0.3306540250778198,
-  'token': 2943,
-  'token_str': 'Ġmale'},
- {'sequence': "<s>Hello I'm a female model.</s>",
-  'score': 0.04655390977859497,
-  'token': 2182,
-  'token_str': 'Ġfemale'},
- {'sequence': "<s>Hello I'm a professional model.</s>",
-  'score': 0.04232972860336304,
-  'token': 2038,
-  'token_str': 'Ġprofessional'},
- {'sequence': "<s>Hello I'm a fashion model.</s>",
-  'score': 0.037216778844594955,
-  'token': 2734,
-  'token_str': 'Ġfashion'},
- {'sequence': "<s>Hello I'm a Russian model.</s>",
-  'score': 0.03253649175167084,
-  'token': 1083,
-  'token_str': 'ĠRussian'}]
-```
-
-Here is how to use this model to get the features of a given text in PyTorch:
-
-```python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained('facebook/data2vec-text-base')
-model = AutoModel.from_pretrained('facebook/data2vec-text-base')
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
-```
-
-### Limitations and bias
-
-The training data used for this model contains a lot of unfiltered content from the internet, which is far from
-neutral. Therefore, the model can have biased predictions:
-
-```python
->>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='facebook/data2vec-text-base')
->>> unmasker("The man worked as a <mask>.")
-
-[{'sequence': '<s>The man worked as a mechanic.</s>',
-  'score': 0.08702439814805984,
-  'token': 25682,
-  'token_str': 'Ġmechanic'},
- {'sequence': '<s>The man worked as a waiter.</s>',
-  'score': 0.0819653645157814,
-  'token': 38233,
-  'token_str': 'Ġwaiter'},
- {'sequence': '<s>The man worked as a butcher.</s>',
-  'score': 0.073323555290699,
-  'token': 32364,
-  'token_str': 'Ġbutcher'},
- {'sequence': '<s>The man worked as a miner.</s>',
-  'score': 0.046322137117385864,
-  'token': 18678,
-  'token_str': 'Ġminer'},
- {'sequence': '<s>The man worked as a guard.</s>',
-  'score': 0.040150221437215805,
-  'token': 2510,
-  'token_str': 'Ġguard'}]
-
->>> unmasker("The Black woman worked as a <mask>.")
-
-[{'sequence': '<s>The Black woman worked as a waitress.</s>',
-  'score': 0.22177888453006744,
-  'token': 35698,
-  'token_str': 'Ġwaitress'},
- {'sequence': '<s>The Black woman worked as a prostitute.</s>',
-  'score': 0.19288744032382965,
-  'token': 36289,
-  'token_str': 'Ġprostitute'},
- {'sequence': '<s>The Black woman worked as a maid.</s>',
-  'score': 0.06498628109693527,
-  'token': 29754,
-  'token_str': 'Ġmaid'},
- {'sequence': '<s>The Black woman worked as a secretary.</s>',
-  'score': 0.05375480651855469,
-  'token': 2971,
-  'token_str': 'Ġsecretary'},
- {'sequence': '<s>The Black woman worked as a nurse.</s>',
-  'score': 0.05245552211999893,
-  'token': 9008,
-  'token_str': 'Ġnurse'}]
-```
-
-This bias will also affect all fine-tuned versions of this model.
-
 ## Training data
 
 The RoBERTa model was pretrained on the reunion of five datasets:
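Both the old and the new wording of this section steer users toward fine-tuning the checkpoint on a downstream task such as sequence classification. As a rough illustration of what that involves, here is a minimal fine-tuning sketch using the transformers Trainer API; the IMDB dataset, the two-label head, and the hyperparameters are illustrative assumptions, not something the model card prescribes.

```python
# Minimal fine-tuning sketch, assuming the `transformers` and `datasets`
# libraries are installed. Dataset choice and hyperparameters are
# illustrative assumptions only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/data2vec-text-base")
# The pretrained encoder is loaded with a freshly initialized
# two-label classification head on top.
model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/data2vec-text-base", num_labels=2
)

dataset = load_dataset("imdb")

def tokenize(batch):
    # Truncate to the model's maximum sequence length; padding is applied
    # per batch by the Trainer's default data collator.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="data2vec-text-imdb", num_train_epochs=1),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

For token classification or question answering, the other tasks the README names, the same pattern applies with AutoModelForTokenClassification or AutoModelForQuestionAnswering in place of the sequence-classification head.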