dimpo committed on
Commit 5091563 · 1 Parent(s): 38a3192

Update README


Complete the different sections in the README file.

Files changed (1)
  1. README.md +26 -9
README.md CHANGED
@@ -18,28 +18,43 @@ pipeline_tag: fill-mask
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # bert-pretrained-wikitext-2-raw-v1

- This model is a fine-tuned version of [](https://huggingface.co/) on the None dataset.
  It achieves the following results on the evaluation set:
  - Loss: 7.9307
- - Masked ml accuracy: 0.1485
- - Nsp accuracy: 0.7891

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -47,13 +62,15 @@ The following hyperparameters were used during training:
  - train_batch_size: 16
  - eval_batch_size: 32
  - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 20

  ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Masked ml accuracy | Nsp accuracy |
  |:-------------:|:-----:|:-----:|:---------------:|:------------------:|:------------:|
  | 7.9726 | 1.0 | 564 | 7.5680 | 0.1142 | 0.5 |
  | 7.5085 | 2.0 | 1128 | 7.4155 | 0.1329 | 0.5557 |
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

+ # BERT
+
+ This model is a pre-trained version of [BERT](https://huggingface.co/bert-base-uncased) on the [WikiText](https://huggingface.co/datasets/wikitext)
+ language modeling dataset for educational purposes (see the [Training BERT from Scratch series on Medium](https://medium.com/p/b048682c795f)).
+ Do not use it for any production purpose whatsoever.
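+
+ As a quick smoke test, the checkpoint can be queried with the `fill-mask` pipeline. This is a minimal sketch; the repo id
+ `dimpo/bert-pretrained-wikitext-2-raw-v1` is an assumption based on this repository's name, not a confirmed identifier.
+
+ ```python
+ from transformers import pipeline
+
+ # Hypothetical repo id -- replace with this repository's actual id.
+ unmasker = pipeline("fill-mask", model="dimpo/bert-pretrained-wikitext-2-raw-v1")
+
+ # Expect weak completions: the model was pre-trained only on WikiText-2.
+ print(unmasker("The capital of France is [MASK]."))
+ ```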

  It achieves the following results on the evaluation set:
  - Loss: 7.9307
+ - Masked Language Modeling (Masked LM) Accuracy: 0.1485
+ - Next Sentence Prediction (NSP) Accuracy: 0.7891

  ## Model description

+ BERT, which stands for Bidirectional Encoder Representations from Transformers, is a revolutionary Natural Language Processing (NLP) model developed
+ by Google in 2018. Its introduction marked a significant advancement in the field, setting new state-of-the-art benchmarks across various NLP tasks.
+ Many regard its release as the ImageNet moment for the field.
+
+ BERT is pre-trained on a massive amount of data with one goal: to learn what language is and what context means within a document.
+ As a result, the pre-trained model can be fine-tuned for specific downstream tasks such as question answering or sentiment analysis.
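+
+ A minimal sketch of that fine-tuning pattern, assuming the hypothetical repo id used above and a binary sentiment task:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ # Reuse the pre-trained encoder and attach a fresh classification head.
+ model_id = "dimpo/bert-pretrained-wikitext-2-raw-v1"  # hypothetical repo id
+ model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ # The new head is randomly initialized; train it on labeled task data.
+ ```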

  ## Intended uses & limitations

+ This repository contains the model trained for 20 epochs on the WikiText dataset. Please note that the model is not suitable for production use
+ and will not provide accurate predictions for Masked Language Modeling tasks.

  ## Training and evaluation data

+ The model was trained for 20 epochs on the [WikiText](https://huggingface.co/datasets/wikitext) language modeling dataset using the
+ `wikitext-2-raw-v1` subset.
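+
+ The same data can be loaded with the `datasets` library; a small sketch:
+
+ ```python
+ from datasets import load_dataset
+
+ # Raw (untokenized) WikiText-2, with train/validation/test splits.
+ dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
+ print(dataset["train"][10]["text"])
+ ```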

  ## Training procedure

+ We usually divide the training of BERT into two distinct phases. The first phase, known as "pre-training," familiarizes the model
+ with language structure and the contextual significance of words. The second phase, termed "fine-tuning," adapts the model to specific downstream tasks.
+
+ The model available in this repository has only undergone the pre-training phase.
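+
+ Pre-training optimizes two objectives jointly: Masked LM and NSP. A sketch of that setup with `transformers` follows; the
+ bert-base-sized default config is an assumption, not a detail taken from this repository:
+
+ ```python
+ from transformers import BertConfig, BertForPreTraining, BertTokenizerFast
+
+ config = BertConfig()  # assumed bert-base-uncased sized architecture
+ model = BertForPreTraining(config)  # two heads: Masked LM + NSP
+
+ tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
+ inputs = tokenizer("Sentence A.", "Sentence B.", return_tensors="pt")
+ outputs = model(**inputs)
+ # outputs.prediction_logits       -> Masked LM head
+ # outputs.seq_relationship_logits -> NSP head
+ ```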
+
  ### Training hyperparameters

  The following hyperparameters were used during training:

  - train_batch_size: 16
  - eval_batch_size: 32
  - seed: 42
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - num_epochs: 20
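+
+ As a rough sketch, these settings correspond to the following optimizer and scheduler setup. The learning rate and warmup
+ are not visible in this excerpt, so the values below are placeholders, not figures taken from the model card:
+
+ ```python
+ import torch
+ from transformers import get_linear_schedule_with_warmup
+
+ # `model` is the BertForPreTraining instance from the sketch above.
+ optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,  # lr is a placeholder
+                              betas=(0.9, 0.999), eps=1e-08)
+ steps_per_epoch = 564                       # from the results table below
+ num_training_steps = steps_per_epoch * 20   # 20 epochs
+ scheduler = get_linear_schedule_with_warmup(
+     optimizer,
+     num_warmup_steps=0,                     # warmup assumed 0 (Trainer default)
+     num_training_steps=num_training_steps,
+ )
+ ```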

  ### Training results

+ The table below illustrates the model's training progress across the 20 epochs.
+
+ | Training Loss | Epoch | Step | Validation Loss | Masked LM Accuracy | NSP Accuracy |
  |:-------------:|:-----:|:-----:|:---------------:|:------------------:|:------------:|
  | 7.9726 | 1.0 | 564 | 7.5680 | 0.1142 | 0.5 |
  | 7.5085 | 2.0 | 1128 | 7.4155 | 0.1329 | 0.5557 |