<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http:

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# 🤗 Transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models.
Using pretrained models can reduce your compute costs and your carbon footprint, and it saves the
time required to train a model from scratch. The models can be used for a variety of tasks:

… scanned documents, video classification, and visual question answering.

Our library supports seamless integration between three of the most popular deep learning libraries:
[PyTorch](https:
Train your model in three lines of code in one framework, and load it for inference in another.
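
The following is a rough sketch of the save-and-reload workflow. It assumes `torch` and `transformers` are installed. It builds a tiny, randomly initialized BERT whose `BertConfig` sizes are arbitrary and chosen only so the example runs offline without downloading a checkpoint. It then saves the model with `save_pretrained` and reloads it with `from_pretrained`. Loading the same PyTorch checkpoint into TensorFlow appears only as a comment, because that step additionally assumes TensorFlow is installed and that your 🤗 Transformers release still ships the `TF*` model classes.

```python
import tempfile

import torch
from transformers import BertConfig, BertModel

# Arbitrary tiny sizes so the example runs quickly and offline (no Hub download).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
pt_model = BertModel(config)  # randomly initialized PyTorch model

with tempfile.TemporaryDirectory() as save_dir:
    # Writes the configuration and the PyTorch weights to save_dir.
    pt_model.save_pretrained(save_dir)

    # Reload the checkpoint from disk.
    reloaded = BertModel.from_pretrained(save_dir)

    # Assumption: with TensorFlow installed (and a release that still provides
    # TF classes), the same PyTorch checkpoint could be loaded into TensorFlow:
    #   from transformers import TFBertModel
    #   tf_model = TFBertModel.from_pretrained(save_dir, from_pt=True)

# Every saved tensor should come back unchanged.
original = pt_model.state_dict()
restored = reloaded.state_dict()
same_weights = original.keys() == restored.keys() and all(
    torch.equal(original[name], restored[name]) for name in original
)
print(same_weights)  # True
```

In practice you would call `from_pretrained` with a Hub checkpoint name instead of a freshly initialized config. The local round trip is used here only to keep the sketch self-contained.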

Each 🤗 Transformers architecture is defined in a standalone Python module, so it can be easily customized for research and experiments.
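
As an illustration, the BERT architecture lives in the `transformers.models.bert.modeling_bert` module, and its size is set by a configuration object. The minimal sketch below assumes `torch` and `transformers` are installed. It builds a deliberately small BERT from a custom `BertConfig`, using arbitrary values rather than a recommended setup, and runs a forward pass on made-up token IDs.

```python
import torch
from transformers import BertConfig, BertModel

# The model class is defined in its own module inside the library.
print(BertModel.__module__)  # transformers.models.bert.modeling_bert

# Customize the architecture through the config: fewer, narrower layers.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertModel(config)  # random weights; nothing is downloaded
model.eval()

# A batch of one sequence with four arbitrary token IDs (all < vocab_size).
input_ids = torch.tensor([[1, 5, 7, 2]])
with torch.no_grad():
    outputs = model(input_ids=input_ids)

print(len(model.encoder.layer))         # 2
print(outputs.last_hidden_state.shape)  # torch.Size([1, 4, 32])
```

The layers for a model are kept together in one module. Experimenting therefore usually means reading, copying, or subclassing code in a single file.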

## If you are looking for support from the Hugging Face team

<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="width: 100%; max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);"></img>
</a>

## Contents

The documentation is organized into five sections:

- **GET STARTED** contains a quick tour and installation instructions to get you up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to start learning about our library. This section will help you develop the basic
  skills you need to use 🤗 Transformers.
- **HOW-TO GUIDES** show you how to achieve a specific goal, such as fine-tuning a pretrained model for language
  modeling, or writing a custom model head.
- **CONCEPTUAL GUIDES** provide more discussion and explanation of the underlying concepts and ideas behind the models,
  tasks, and the design philosophy of 🤗 Transformers.
- **API** describes how each class and function works, grouped into:

  - **MAIN CLASSES** for the classes that expose the library's most important APIs.
  - **MODELS** for the classes and functions related to each model implemented in the library.
  - **INTERNAL HELPERS** for the classes and functions used internally.

The library currently contains PyTorch, TensorFlow, and JAX implementations, pretrained model weights, usage scripts, and conversion utilities for the following models:

### Current models

<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->

1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https:
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https:
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https:
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https:
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https:
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https:
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https:
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https:
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https:
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https:
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https:
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https:
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https:
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https:
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https:
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https:
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https:
1. **[ConvNeXTV2](model_doc/convnextv2)** (from Facebook AI) released with the paper [ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders](https:
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https:
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https:
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https:
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https:
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https:
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https:
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https:
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https:
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https:
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https:
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https:
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https:
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https:
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https:
1. **[DPT](model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https:
1. **[EfficientNet](model_doc/efficientnet)** (from Google Research) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https:
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https:
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https:
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https:
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https:
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https:
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https:
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https:
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https:
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https:
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https:
1. **[GPTSAN-japanese](model_doc/gptsan-japanese)** released in the repository [tanreinama/GPTSAN](https:
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https:
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https:
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https:
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https:
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https:
1. **[LayoutXLM](model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https:
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https:
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https:
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https:
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https:
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https:
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https:
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http:
1. **[Mask2Former](model_doc/mask2former)** (from FAIR and UIUC) released with the paper [Masked-attention Mask Transformer for Universal Image Segmentation](https:
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https:
1. **[MBart](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https:
1. **[MBart-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https:
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https:
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https:
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https:
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https:
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https:
1. **[OneFormer](model_doc/oneformer)** (from SHI Labs) released with the paper [OneFormer: One Transformer to Rule Universal Image Segmentation](https:
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https:
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https:
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https:
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https:
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https:
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https:
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https:
1. **[REALM](model_doc/realm)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https:
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https:
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https:
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https:
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https:
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https:
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https:
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https:
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https:
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https:
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https:
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https:
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https:
1. **[SqueezeBert](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https:
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https:
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https:
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https:
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https:
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https:
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https:
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https:
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https:
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https:
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https:
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https:
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https:
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https:
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https:
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https:
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https:
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https:
1. **[XGLM](model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https:
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https:
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https:
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https:
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https:
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https:
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https:
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https:
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https:

### Supported frameworks

The table below shows the library's current support for each of these models. It indicates whether a model has a
Python tokenizer (called "slow") or a tokenizer backed by the 🤗 Tokenizers library (called "fast"), and whether it is
supported in JAX (via Flax), PyTorch, and/or TensorFlow.
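
The slow/fast distinction can be observed directly. The sketch below assumes a 4.x release of 🤗 Transformers, where `BertTokenizer` (pure Python) and `BertTokenizerFast` (backed by 🤗 Tokenizers) are separate classes. It uses a tiny hypothetical vocabulary written to a temporary file, so nothing is downloaded. Both tokenizers produce the same IDs here. Character offsets via `return_offsets_mapping=True` are available only from the fast tokenizer.

```python
import os
import tempfile

from transformers import BertTokenizer, BertTokenizerFast

# Hypothetical toy vocabulary: BERT special tokens plus two words.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_file = os.path.join(tmp_dir, "vocab.txt")
    with open(vocab_file, "w", encoding="utf-8") as f:
        f.write("\n".join(vocab) + "\n")

    slow = BertTokenizer(vocab_file=vocab_file)      # "slow": implemented in Python
    fast = BertTokenizerFast(vocab_file=vocab_file)  # "fast": backed by 🤗 Tokenizers

print(slow.is_fast, fast.is_fast)  # False True

# [CLS] hello world [SEP] -> IDs follow the vocabulary order above.
slow_ids = slow("hello world")["input_ids"]
fast_ids = fast("hello world")["input_ids"]
print(slow_ids, fast_ids)  # [2, 5, 6, 3] [2, 5, 6, 3]

# Character spans for each token (fast tokenizers only); special tokens get (0, 0).
offsets = fast("hello world", return_offsets_mapping=True)["offset_mapping"]
print(offsets)  # [(0, 0), (0, 5), (6, 11), (0, 0)]
```

With real checkpoints, `AutoTokenizer.from_pretrained(...)` returns the fast variant by default when one exists. Passing `use_fast=False` selects the Python implementation.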

<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->

| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax support |
|:---------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNext | ❌ | ❌ | ✅ | ✅ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecVision | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ✅ | ✅ | ✅ | ❌ |
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
| Realm | ✅ | ✅ | ✅ | ❌ | ❌ |
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| RegNet | ❌ | ❌ | ✅ | ❌ | ❌ |
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ResNet | ❌ | ❌ | ✅ | ❌ | ✅ |
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ |
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| Swin | ❌ | ❌ | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
| TAPEX | ✅ | ✅ | ✅ | ✅ | ✅ |
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ |
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
| XGLM | ✅ | ✅ | ✅ | ❌ | ✅ |
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |

<!-- End table-->