Update README

Complete the different sections in the README file.
README.md
CHANGED
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# BERT

This model is a pre-trained version of [BERT](https://huggingface.co/bert-base-uncased) on the [WikiText](https://huggingface.co/datasets/wikitext)
language modeling dataset for educational purposes (see the [Training BERT from Scratch series on Medium](https://medium.com/p/b048682c795f)).
You cannot use it for any production purposes whatsoever.

It achieves the following results on the evaluation set:
- Loss: 7.9307
- Masked Language Modeling (Masked LM) Accuracy: 0.1485
- Next Sentence Prediction (NSP) Accuracy: 0.7891
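
You can still try the checkpoint with the `fill-mask` pipeline. The sketch below makes two assumptions: the model id is a placeholder for this repository's actual id, and, given the Masked LM accuracy above, the predictions will mostly be noise.

```python
from transformers import pipeline

# Placeholder repo id -- substitute this repository's actual model id.
fill_mask = pipeline("fill-mask", model="your-username/bert-wikitext")

# BERT's tokenizer marks the blank with the literal [MASK] token.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.4f}")
```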

## Model description

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a revolutionary Natural Language Processing (NLP) model developed
by Google in 2018. Its introduction marked a significant advancement in the field, setting new state-of-the-art benchmarks across various NLP tasks.
Many regard its release as the ImageNet moment of NLP.

BERT is pre-trained on a massive amount of text with one goal: to learn what language is and what context means within a document.
As a result, the pre-trained model can be fine-tuned for specific tasks such as question answering or sentiment analysis.

## Intended uses & limitations

This repository contains the model trained for 20 epochs on the WikiText dataset. Please note that the model is not suitable for production use
and will not provide accurate predictions for Masked Language Modeling tasks.

## Training and evaluation data

The model was trained for 20 epochs on the [WikiText](https://huggingface.co/datasets/wikitext) language modeling dataset using the
`wikitext-2-raw-v1` subset.
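
For reference, the exact training corpus can be loaded with the `datasets` library:

```python
from datasets import load_dataset

# The wikitext-2-raw-v1 subset ships with train/validation/test splits.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset["train"][10]["text"])
```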

## Training procedure

Training BERT is usually divided into two distinct phases. The first, known as "pre-training," familiarizes the model
with language structure and the contextual significance of words. The second, termed "fine-tuning," adapts the model to specific downstream tasks.

The model available in this repository has only undergone the pre-training phase.
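
The sketch below shows the two pre-training heads on a freshly initialized BERT. It assumes the checkpoint follows the standard `BertForPreTraining` layout from `transformers`, which couples a Masked LM head with an NSP head; the sentence pair is arbitrary.

```python
import torch
from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

# Assumption: the standard BertForPreTraining architecture (MLM head + NSP head).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForPreTraining(BertConfig())  # randomly initialized, for illustration only

# NSP consumes sentence pairs; MLM predicts tokens hidden behind [MASK].
inputs = tokenizer("The cat sat on the mat.", "It purred quietly.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.prediction_logits.shape)        # MLM head: (batch, seq_len, vocab_size)
print(outputs.seq_relationship_logits.shape)  # NSP head: (batch, 2)
```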

### Training hyperparameters

The following hyperparameters were used during training:
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
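
As a rough guide, these settings map onto `transformers.TrainingArguments` as sketched below; the output directory is hypothetical, and the learning rate is omitted because it is not recorded in this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-wikitext-pretraining",  # hypothetical directory name
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    seed=42,
    adam_beta1=0.9,                          # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
)
```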

### Training results

The table below illustrates the model's training progress across the 20 epochs.

| Training Loss | Epoch | Step | Validation Loss | Masked LM Accuracy | NSP Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:------------------:|:------------:|
| 7.9726        | 1.0   | 564  | 7.5680          | 0.1142             | 0.5          |
| 7.5085        | 2.0   | 1128 | 7.4155          | 0.1329             | 0.5557       |