# BERT

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=bert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/bert-base-uncased">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>

## Overview

The BERT model was proposed in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a
bidirectional transformer pretrained with a combination of the masked language modeling and next sentence
prediction objectives on a large corpus comprising the Toronto Book Corpus and Wikipedia.

The abstract from the paper is the following:

*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*

*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*

Tips:

- BERT is a model with absolute position embeddings, so it is usually advised to pad the inputs on the right rather than
  the left.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
  efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
- BERT corrupts the inputs by random masking: during pretraining, a given percentage of tokens (usually 15%) is masked by:

    * a special mask token with probability 0.8
    * a random token different from the one masked with probability 0.1
    * the same token with probability 0.1

- The model must predict the original tokens at the masked positions, but it also has a second objective: the inputs are two sentences A and B (with a separation token in between). With probability 50%, the sentences are consecutive in the corpus; in the remaining 50%, they are not related. The model has to predict whether the sentences are consecutive or not. Both objectives are illustrated in the sketch after this list.

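As a quick illustration of the two pretraining heads, the sketch below runs MLM and NSP inference with the `bert-base-uncased` checkpoint. The checkpoint name and the example sentences are illustrative choices, not prescribed by the paper.

```python
# Minimal sketch of BERT's two pretraining objectives (bert-base-uncased is
# an illustrative checkpoint; any BERT checkpoint from the Hub works).
import torch
from transformers import BertForMaskedLM, BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# 1. Masked language modeling: predict the token hidden behind [MASK].
mlm_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = mlm_model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # expected to be something like "paris"

# 2. Next sentence prediction: label 0 means sentence B plausibly follows
# sentence A, label 1 means the two sentences are unrelated.
nsp_model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
pair = tokenizer("The sky was overcast all morning.", "It started raining around noon.", return_tensors="pt")
with torch.no_grad():
    nsp_logits = nsp_model(**pair).logits
print(nsp_logits.argmax(dim=-1).item())  # 0 -> consecutive, 1 -> unrelated
```
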
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/bert).

## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-classification"/>

- A blog post on [BERT Text Classification in a different language](https://www.philschmid.de/bert-text-classification-in-a-different-language).
- A notebook for [Finetuning BERT (and friends) for multi-label text classification](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/BERT/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb).
- A notebook on how to [Finetune BERT for multi-label classification using PyTorch](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb). 🌎
- A notebook on how to [warm-start an EncoderDecoder model with BERT for summarization](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb).
- [`BertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
- [`TFBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
- [`FlaxBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
- [Text classification task guide](../tasks/sequence_classification)

<PipelineTag pipeline="token-classification"/>

- A blog post on how to use [Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition](https://www.philschmid.de/huggingface-transformers-keras-tf).
- A notebook for [Finetuning BERT for named-entity recognition](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Custom_Named_Entity_Recognition_with_BERT_only_first_wordpiece.ipynb) that assigns each word's label only to its first wordpiece during tokenization. To propagate the label of the word to all wordpieces, see this [version](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BERT/Custom_Named_Entity_Recognition_with_BERT.ipynb) of the notebook instead.
- [`BertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
- [`TFBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
- [`FlaxBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Token classification task guide](../tasks/token_classification)

<PipelineTag pipeline="fill-mask"/>

- [`BertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
- [`TFBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
- [`FlaxBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Masked language modeling task guide](../tasks/masked_language_modeling)

<PipelineTag pipeline="question-answering"/>

- [`BertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
- [`TFBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
- [`FlaxBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering).
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Question answering task guide](../tasks/question_answering)

**Multiple choice**

- [`BertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
- [`TFBertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
- [Multiple choice task guide](../tasks/multiple_choice)

⚡️ **Inference**

- A blog post on how to [Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia](https://huggingface.co/blog/bert-inferentia-sagemaker).
- A blog post on how to [Accelerate BERT inference with DeepSpeed-Inference on GPUs](https://www.philschmid.de/bert-deepspeed-inference).

⚙️ **Pretraining**

- A blog post on [Pre-Training BERT with Hugging Face Transformers and Habana Gaudi](https://www.philschmid.de/pre-training-bert-habana).

🚀 **Deploy**

- A blog post on how to [Convert Transformers to ONNX with Hugging Face Optimum](https://www.philschmid.de/convert-transformers-to-onnx).
- A blog post on how to [Setup Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS](https://www.philschmid.de/getting-started-habana-gaudi#conclusion).
- A blog post on [Autoscaling BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module](https://www.philschmid.de/terraform-huggingface-amazon-sagemaker-advanced).
- A blog post on [Serverless BERT with HuggingFace, AWS Lambda, and Docker](https://www.philschmid.de/serverless-bert-with-huggingface-aws-lambda-docker).
- A blog post on [Hugging Face Transformers BERT fine-tuning using Amazon SageMaker and Training Compiler](https://www.philschmid.de/huggingface-amazon-sagemaker-training-compiler).
- A blog post on [Task-specific knowledge distillation for BERT using Transformers & Amazon SageMaker](https://www.philschmid.de/knowledge-distillation-bert-transformers).

## BertConfig

[[autodoc]] BertConfig
    - all

## BertTokenizer

[[autodoc]] BertTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary
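
For orientation, here is a small sketch of what the special-token helpers listed above produce when encoding a sentence pair; the example sentences are illustrative.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# A single sentence becomes [CLS] ... [SEP]; a pair becomes [CLS] A [SEP] B [SEP],
# and token_type_ids mark the segment (0 for sentence A, 1 for sentence B).
encoding = tokenizer("First sentence.", "Second sentence.")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
# ['[CLS]', 'first', 'sentence', '.', '[SEP]', 'second', 'sentence', '.', '[SEP]']
print(encoding["token_type_ids"])
# [0, 0, 0, 0, 0, 1, 1, 1, 1]
```
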
## BertTokenizerFast

[[autodoc]] BertTokenizerFast

## TFBertTokenizer

[[autodoc]] TFBertTokenizer

## Bert specific outputs

[[autodoc]] models.bert.modeling_bert.BertForPreTrainingOutput

[[autodoc]] models.bert.modeling_tf_bert.TFBertForPreTrainingOutput

[[autodoc]] models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput

## BertModel

[[autodoc]] BertModel
    - forward

## BertForPreTraining

[[autodoc]] BertForPreTraining
    - forward

## BertLMHeadModel

[[autodoc]] BertLMHeadModel
    - forward

## BertForMaskedLM

[[autodoc]] BertForMaskedLM
    - forward

## BertForNextSentencePrediction

[[autodoc]] BertForNextSentencePrediction
    - forward

## BertForSequenceClassification

[[autodoc]] BertForSequenceClassification
    - forward

## BertForMultipleChoice

[[autodoc]] BertForMultipleChoice
    - forward

## BertForTokenClassification

[[autodoc]] BertForTokenClassification
    - forward

## BertForQuestionAnswering

[[autodoc]] BertForQuestionAnswering
    - forward

## TFBertModel

[[autodoc]] TFBertModel
    - call

## TFBertForPreTraining

[[autodoc]] TFBertForPreTraining
    - call

## TFBertLMHeadModel

[[autodoc]] TFBertLMHeadModel
    - call

## TFBertForMaskedLM

[[autodoc]] TFBertForMaskedLM
    - call

## TFBertForNextSentencePrediction

[[autodoc]] TFBertForNextSentencePrediction
    - call

## TFBertForSequenceClassification

[[autodoc]] TFBertForSequenceClassification
    - call

## TFBertForMultipleChoice

[[autodoc]] TFBertForMultipleChoice
    - call

## TFBertForTokenClassification

[[autodoc]] TFBertForTokenClassification
    - call

## TFBertForQuestionAnswering

[[autodoc]] TFBertForQuestionAnswering
    - call

## FlaxBertModel

[[autodoc]] FlaxBertModel
    - __call__

## FlaxBertForPreTraining

[[autodoc]] FlaxBertForPreTraining
    - __call__

## FlaxBertForCausalLM

[[autodoc]] FlaxBertForCausalLM
    - __call__

## FlaxBertForMaskedLM

[[autodoc]] FlaxBertForMaskedLM
    - __call__

## FlaxBertForNextSentencePrediction

[[autodoc]] FlaxBertForNextSentencePrediction
    - __call__

## FlaxBertForSequenceClassification

[[autodoc]] FlaxBertForSequenceClassification
    - __call__

## FlaxBertForMultipleChoice

[[autodoc]] FlaxBertForMultipleChoice
    - __call__

## FlaxBertForTokenClassification

[[autodoc]] FlaxBertForTokenClassification
    - __call__

## FlaxBertForQuestionAnswering

[[autodoc]] FlaxBertForQuestionAnswering
    - __call__