This model was released on 2020-10-23 and added to Hugging Face Transformers on 2020-11-27.
BARThez
BARThez is a BART model designed for French language tasks. Unlike existing French BERT models, which are encoder-only, BARThez pretrains both the encoder and the decoder, so it can generate text as well as encode it. A multilingual variant, mBARThez, is also available; it was obtained by continuing the pretraining of multilingual BART on a French corpus.
You can find all of the original BARThez checkpoints under the BARThez collection.
This model was contributed by moussakam. Refer to the BART docs for more usage examples.
The examples below demonstrate how to predict the <mask> token with [Pipeline], [AutoModel], and from the command line.
# Fill-mask with the Pipeline API
import torch
from transformers import pipeline
# Use a distinct name so the imported pipeline() function is not shadowed.
fill_mask = pipeline(
    task="fill-mask",
    model="moussaKam/barthez",
    dtype=torch.float16,
    device=0
)
fill_mask("Les plantes produisent <mask> grâce à un processus appelé photosynthèse.")
# Fill-mask with AutoModelForMaskedLM
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "moussaKam/barthez",
)
model = AutoModelForMaskedLM.from_pretrained(
    "moussaKam/barthez",
    dtype=torch.float16,
    device_map="auto",
)
inputs = tokenizer("Les plantes produisent <mask> grâce à un processus appelé photosynthèse.", return_tensors="pt").to(model.device)
# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    predictions = outputs.logits
# Locate the <mask> position and take the highest-scoring vocabulary entry there.
masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
echo -e "Les plantes produisent <mask> grâce à un processus appelé photosynthèse." | transformers run --task fill-mask --model moussaKam/barthez --device 0
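The [AutoModel] example above keeps only the single highest-scoring token at the mask position. As a rough illustration of that decoding step, the dependency-free sketch below ranks the top candidates from a toy logits table. The vocabulary, token ids, logit values, and the helper names `softmax` and `top_k_for_mask` are all invented for illustration; they are not BARThez's real vocabulary or part of the Transformers API.

```python
import math

# Toy stand-ins, invented for illustration; not BARThez's real vocabulary or ids.
vocab = ["<s>", "</s>", "<mask>", "Les", "plantes", "oxygène", "énergie"]
mask_token_id = vocab.index("<mask>")

# One sequence of token ids containing a single mask position.
input_ids = [0, 3, 4, 2, 1]

# Toy logits: one row of vocabulary scores per input position.
logits = [[0.0] * len(vocab) for _ in input_ids]
logits[3] = [0.1, 0.0, -1.0, 0.2, 0.3, 2.5, 1.5]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_for_mask(input_ids, logits, mask_token_id, k=2):
    """Return the k most probable (token, probability) pairs at the first mask position."""
    masked_index = input_ids.index(mask_token_id)
    probs = softmax(logits[masked_index])
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]


predictions = top_k_for_mask(input_ids, logits, mask_token_id)
for token, prob in predictions:
    print(f"{token}: {prob:.3f}")
```

With a real model, the same ranking would be applied to the row of `outputs.logits` at the mask index, and the token ids would be decoded with the tokenizer rather than looked up in a list.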
BarthezTokenizer
[[autodoc]] BarthezTokenizer
BarthezTokenizerFast
[[autodoc]] BarthezTokenizerFast