<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BERT

<div class="flex flex-wrap space-x-1">
<a href="https://huggingface.co/models?filter=bert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
</a>
<a href="https://huggingface.co/spaces/docs-demos/bert-base-uncased">
<img alt="Spaces" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue">
</a>
</div>
## Overview

The BERT model was proposed in [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a
bidirectional transformer pretrained using a combination of the masked language modeling objective and next sentence
prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia.

The abstract from the paper is the following:
*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*

*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*
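As a quick orientation before the tips, here is a minimal sketch of encoding text with a pretrained BERT checkpoint (this assumes the `bert-base-uncased` checkpoint; any BERT checkpoint on the Hub works the same way, and the input sentences are arbitrary):

```python
import torch
from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# padding=True pads on the right, which is what BERT's absolute position embeddings expect
inputs = tokenizer(
    ["Hello world!", "BERT encodes text bidirectionally."],
    padding=True,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per token: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```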
Tips:

- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
  the left.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
  efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
- BERT corrupts the inputs with random masking: during pretraining, a given percentage of tokens (usually 15%) is masked by:

    * a special mask token with probability 0.8
    * a random token different from the one masked with probability 0.1
    * the same (unchanged) token with probability 0.1

  The model must then predict the original tokens at the corrupted positions (see the masking sketch after these tips).
- The model also has a second pretraining objective: the inputs are two sentences A and B (with a separation token in between). With probability 50%, the sentences are consecutive in the corpus; in the remaining 50% of cases they are unrelated, and the model has to predict whether the sentences are consecutive or not (see the NSP example below).
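As a rough illustration of the 80/10/10 masking scheme above, here is a minimal sketch using [`DataCollatorForLanguageModeling`], which applies this corruption when preparing MLM batches (the example sentence is arbitrary, and which tokens get selected is random on each call):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Of the positions selected for corruption (mlm_probability of the tokens),
# 80% become [MASK], 10% become a random token, and 10% are left unchanged.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

encoding = tokenizer("The capital of France is Paris.")
batch = collator([encoding])

print(batch["input_ids"])  # corrupted ids: some tokens may be replaced by [MASK] or a random token
print(batch["labels"])     # original ids at the selected positions, -100 (ignored by the loss) elsewhere
```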
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/bert).
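The NSP objective can be probed directly with [`BertForNextSentencePrediction`]; below is a minimal sketch (the sentence pair is an arbitrary example):

```python
import torch
from transformers import AutoTokenizer, BertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

sentence_a = "The sky is overcast today."
sentence_b = "It will probably rain this afternoon."
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")  # inserts [SEP] between the sentences

with torch.no_grad():
    logits = model(**inputs).logits

# index 0 = "sentence B follows sentence A", index 1 = "sentence B is a random sentence"
print(logits.softmax(dim=-1))
```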
## Resources

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-classification"/>

- A blog post on [BERT Text Classification in a different language](https://www.philschmid.de/bert-text-classification-in-a-different-language).
- A notebook for [Finetuning BERT (and friends) for multi-label text classification](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/BERT/Fine_tuning_BERT_(and_friends)_for_multi_label_text_classification.ipynb).
- A notebook on how to [Finetune BERT for multi-label classification using PyTorch](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb). 🌎
- A notebook on how to [warm-start an EncoderDecoder model with BERT for summarization](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb).
- [`BertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
- [`TFBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
- [`FlaxBertForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
- [Text classification task guide](../tasks/sequence_classification)
| <PipelineTag pipeline="token-classification"/> | |
| - A blog post on how to use [Hugging Face Transformers with Keras: Fine-tune a non-English BERT for Named Entity Recognition](https://www.philschmid.de/huggingface-transformers-keras-tf). | |
| - A notebook for [Finetuning BERT for named-entity recognition](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/Custom_Named_Entity_Recognition_with_BERT_only_first_wordpiece.ipynb) using only the first wordpiece of each word in the word label during tokenization. To propagate the label of the word to all wordpieces, see this [version](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BERT/Custom_Named_Entity_Recognition_with_BERT.ipynb) of the notebook instead. | |
- [`BertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
- [`TFBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
- [`FlaxBertForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Token classification task guide](../tasks/token_classification)

<PipelineTag pipeline="fill-mask"/>

- [`BertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
- [`TFBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
- [`FlaxBertForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Masked language modeling task guide](../tasks/masked_language_modeling)

<PipelineTag pipeline="question-answering"/>

- [`BertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
- [`TFBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
- [`FlaxBertForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering).
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Question answering task guide](../tasks/question_answering)

**Multiple choice**

- [`BertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
- [`TFBertForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
- [Multiple choice task guide](../tasks/multiple_choice)

⚡️ **Inference**

- A blog post on how to [Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia](https://huggingface.co/blog/bert-inferentia-sagemaker).
- A blog post on how to [Accelerate BERT inference with DeepSpeed-Inference on GPUs](https://www.philschmid.de/bert-deepspeed-inference).

⚙️ **Pretraining**

- A blog post on [Pre-Training BERT with Hugging Face Transformers and Habana Gaudi](https://www.philschmid.de/pre-training-bert-habana).

🚀 **Deploy**

- A blog post on how to [Convert Transformers to ONNX with Hugging Face Optimum](https://www.philschmid.de/convert-transformers-to-onnx).
- A blog post on how to [Setup Deep Learning environment for Hugging Face Transformers with Habana Gaudi on AWS](https://www.philschmid.de/getting-started-habana-gaudi#conclusion).
- A blog post on [Autoscaling BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module](https://www.philschmid.de/terraform-huggingface-amazon-sagemaker-advanced).
- A blog post on [Serverless BERT with HuggingFace, AWS Lambda, and Docker](https://www.philschmid.de/serverless-bert-with-huggingface-aws-lambda-docker).
- A blog post on [Hugging Face Transformers BERT fine-tuning using Amazon SageMaker and Training Compiler](https://www.philschmid.de/huggingface-amazon-sagemaker-training-compiler).
- A blog post on [Task-specific knowledge distillation for BERT using Transformers & Amazon SageMaker](https://www.philschmid.de/knowledge-distillation-bert-transformers).
## BertConfig

[[autodoc]] BertConfig
    - all

## BertTokenizer

[[autodoc]] BertTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## BertTokenizerFast

[[autodoc]] BertTokenizerFast

## TFBertTokenizer

[[autodoc]] TFBertTokenizer

## Bert specific outputs

[[autodoc]] models.bert.modeling_bert.BertForPreTrainingOutput

[[autodoc]] models.bert.modeling_tf_bert.TFBertForPreTrainingOutput

[[autodoc]] models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput

## BertModel

[[autodoc]] BertModel
    - forward

## BertForPreTraining

[[autodoc]] BertForPreTraining
    - forward

## BertLMHeadModel

[[autodoc]] BertLMHeadModel
    - forward

## BertForMaskedLM

[[autodoc]] BertForMaskedLM
    - forward

## BertForNextSentencePrediction

[[autodoc]] BertForNextSentencePrediction
    - forward

## BertForSequenceClassification

[[autodoc]] BertForSequenceClassification
    - forward

## BertForMultipleChoice

[[autodoc]] BertForMultipleChoice
    - forward

## BertForTokenClassification

[[autodoc]] BertForTokenClassification
    - forward

## BertForQuestionAnswering

[[autodoc]] BertForQuestionAnswering
    - forward

## TFBertModel

[[autodoc]] TFBertModel
    - call

## TFBertForPreTraining

[[autodoc]] TFBertForPreTraining
    - call
## TFBertLMHeadModel

[[autodoc]] TFBertLMHeadModel
    - call
## TFBertForMaskedLM

[[autodoc]] TFBertForMaskedLM
    - call

## TFBertForNextSentencePrediction

[[autodoc]] TFBertForNextSentencePrediction
    - call

## TFBertForSequenceClassification

[[autodoc]] TFBertForSequenceClassification
    - call

## TFBertForMultipleChoice

[[autodoc]] TFBertForMultipleChoice
    - call

## TFBertForTokenClassification

[[autodoc]] TFBertForTokenClassification
    - call

## TFBertForQuestionAnswering

[[autodoc]] TFBertForQuestionAnswering
    - call

## FlaxBertModel

[[autodoc]] FlaxBertModel
    - __call__

## FlaxBertForPreTraining

[[autodoc]] FlaxBertForPreTraining
    - __call__

## FlaxBertForCausalLM

[[autodoc]] FlaxBertForCausalLM
    - __call__

## FlaxBertForMaskedLM

[[autodoc]] FlaxBertForMaskedLM
    - __call__

## FlaxBertForNextSentencePrediction

[[autodoc]] FlaxBertForNextSentencePrediction
    - __call__

## FlaxBertForSequenceClassification

[[autodoc]] FlaxBertForSequenceClassification
    - __call__

## FlaxBertForMultipleChoice

[[autodoc]] FlaxBertForMultipleChoice
    - __call__

## FlaxBertForTokenClassification

[[autodoc]] FlaxBertForTokenClassification
    - __call__

## FlaxBertForQuestionAnswering

[[autodoc]] FlaxBertForQuestionAnswering
    - __call__