How to use with the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="moussaKam/barthez")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("moussaKam/barthez")
model = AutoModelForSeq2SeqLM.from_pretrained("moussaKam/barthez")
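
The two snippets above only load the model. As a minimal sketch of what a call might look like, the block below wraps each loading style in a small function. It assumes `transformers` and PyTorch are installed (the tokenizer probably also needs `sentencepiece`). The helper names `with_mask`, `top_mask_fills`, and `reconstruct`, and the `[MASK]` placeholder, are hypothetical and introduced here for illustration; the real mask token is read from the tokenizer rather than hard-coded.

```python
MODEL_NAME = "moussaKam/barthez"


def with_mask(template: str, mask_token: str, slot: str = "[MASK]") -> str:
    """Swap the illustrative [MASK] placeholder for the tokenizer's real mask token."""
    return template.replace(slot, mask_token)


def top_mask_fills(template: str, top_k: int = 3):
    """Return (token, score) pairs from the fill-mask pipeline for one [MASK] slot."""
    from transformers import pipeline  # imported lazily: loading is heavy

    pipe = pipeline("fill-mask", model=MODEL_NAME)
    text = with_mask(template, pipe.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in pipe(text, top_k=top_k)]


def reconstruct(template: str, max_new_tokens: int = 32) -> str:
    """Let the encoder-decoder regenerate a sentence containing one masked slot."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    text = with_mask(template, tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        num_beams=4,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For instance, `top_mask_fills("Paris est la [MASK] de la France.")` would download the checkpoint and return candidate tokens with their scores. The results depend on the checkpoint, so no output is shown here.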

A French sequence-to-sequence pretrained model based on BART.
BARThez is pretrained by learning to reconstruct a corrupted input sentence, using a 66 GB corpus of raw French text.
Unlike existing BERT-based French language models such as CamemBERT and FlauBERT, BARThez is particularly well suited to generative tasks (such as abstractive summarization), because both its encoder and its decoder are pretrained.
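
As a rough illustration of the denoising objective described above, the sketch below corrupts a tokenized sentence by replacing random spans with a single mask symbol, in the style of BART text infilling. This is an assumption made for illustration only: the text here does not specify the exact noising scheme, and `corrupt`, `MASK`, `mask_ratio`, and `max_span` are hypothetical names, not part of the BARThez code.

```python
import random

MASK = "<mask>"  # assumed mask symbol, for illustration only


def corrupt(tokens, mask_ratio=0.3, max_span=3, seed=0):
    """Replace random contiguous spans of tokens with one MASK each."""
    rng = random.Random(seed)
    budget = int(round(len(tokens) * mask_ratio))  # max number of tokens to hide
    out = []
    i = 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            # Hide a span of 1..max_span tokens, limited by budget and sentence end.
            span = min(rng.randint(1, max_span), budget, len(tokens) - i)
            out.append(MASK)
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out


tokens = "le chat dort sur le canapé du salon".split()
noisy = corrupt(tokens, seed=1)
# A training pair would be (noisy input, original tokens as the target).
print(" ".join(noisy), "->", " ".join(tokens))
```

Because a whole span collapses into one mask, the model has to infer how many tokens are missing as well as which ones, which is what makes this kind of objective a natural fit for a generative encoder-decoder.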

In addition to BARThez, which is pretrained from scratch, we continued the pretraining of a multilingual BART (mBART), which boosted its performance on both discriminative and generative tasks. We call the French-adapted version mBARThez.

| Model | Architecture | #layers | #params |
|---|---|---|---|
| BARThez | BASE | 12 | 165M |
| mBARThez | LARGE | 24 | 458M |
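
To sanity-check the sizes in the table against a loaded checkpoint, a small helper along the following lines could be used. It assumes a PyTorch-style model object exposing `parameters()`. The names `count_parameters` and `human_readable` are hypothetical, and the exact count may not match the rounded figures in the table.

```python
def count_parameters(model, trainable_only: bool = False) -> int:
    """Sum element counts over model.parameters() (PyTorch nn.Module interface)."""
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def human_readable(n: int) -> str:
    """Format a raw parameter count in millions, rounded to the nearest million."""
    return f"{round(n / 1e6)}M"
```

With the `model` object from the loading snippet, `human_readable(count_parameters(model))` gives a figure to compare against the #params column.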

paper: https://arxiv.org/abs/2010.12321
github: https://github.com/moussaKam/BARThez

@article{eddine2020barthez,
  title={BARThez: a Skilled Pretrained French Sequence-to-Sequence Model},
  author={Eddine, Moussa Kamal and Tixier, Antoine J-P and Vazirgiannis, Michalis},
  journal={arXiv preprint arXiv:2010.12321},
  year={2020}
}