Typo corrections
README.md (CHANGED)

- wikipedia
---

# FrALBERT Base

Pretrained model on the French language using a masked language modeling (MLM) objective. It was introduced in
[this paper](https://arxiv.org/abs/1909.11942) and first released in

## Model description

FrALBERT is a transformers model pretrained on 4 GB of French Wikipedia in a self-supervised fashion. This means it
was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of
publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it
was pretrained with two objectives:

recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like
GPT, which internally masks the future tokens. It allows the model to learn a bidirectional representation of the
sentence.
- Sentence Ordering Prediction (SOP): FrALBERT uses a pretraining loss based on predicting the ordering of two consecutive segments of text.
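The SOP objective above can be illustrated in plain Python. The following is a minimal sketch of how ordered/swapped segment pairs might be built from a document; the function name and the toy sentences are made up for the example and are not part of any FrALBERT codebase:

```python
import random

def make_sop_pairs(sentences, rng):
    """Build SOP training pairs from consecutive segments:
    label 1 if the two segments keep their original order,
    label 0 if they were swapped."""
    pairs = []
    for a, b in zip(sentences, sentences[1:]):
        if rng.random() < 0.5:
            pairs.append(((a, b), 1))  # original order
        else:
            pairs.append(((b, a), 0))  # swapped order
    return pairs

doc = ["Premier segment.", "Deuxième segment.", "Troisième segment."]
pairs = make_sop_pairs(doc, random.Random(0))
```

The model is then trained to recover the binary label from the pair, which forces it to learn discourse-level coherence rather than the topic cues that NSP (BERT's next-sentence prediction) can exploit.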

This way, the model learns an inner representation of the French language that can then be used to extract features
useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard
classifier using the features produced by the FrALBERT model as inputs.

FrALBERT is particular in that it shares its layers across its Transformer. Therefore, all layers have the same weights. Using repeating layers results in a small memory footprint; however, the computational cost remains similar to that of a BERT-like architecture with the same number of hidden layers, as it has to iterate through the same number of (repeating) layers.
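The effect of this layer sharing can be sketched numerically. The sketch below assumes an ALBERT-base-like configuration (hidden size 768, 12 layers) and reduces each layer to a single weight matrix, which is a simplification of the real Transformer block:

```python
import numpy as np

hidden, num_layers = 768, 12  # ALBERT-base-like configuration (assumption)

# A BERT-like stack stores one weight matrix per layer;
# an ALBERT/FrALBERT-like stack stores a single shared matrix.
unshared_params = num_layers * hidden * hidden
shared_params = hidden * hidden

# Forward pass with the shared layer: one matrix, applied num_layers times.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=(hidden, hidden))  # the single shared layer
x = rng.normal(size=(hidden,))
for _ in range(num_layers):
    x = np.tanh(x @ w)  # compute cost grows with depth; parameter count does not

print(f"shared: {shared_params:,} params, unshared: {unshared_params:,} params")
```

The loop makes the trade-off explicit: memory is that of a single layer, while the forward pass still performs one matrix product per (repeated) layer.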

This is the first version of the base model.

## Training data

The FrALBERT model was pretrained on 4 GB of [French Wikipedia](https://fr.wikipedia.org/wiki/French_Wikipedia) (excluding lists, tables and
headers).

## Training procedure

### Training

The FrALBERT training procedure follows the BERT setup.

The details of the masking procedure for each sentence are the following:
- 15% of the tokens are masked.
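The masking step can be sketched as follows. This is a simplified illustration: the function name and sample tokens are invented for the example, and the full BERT/ALBERT recipe additionally keeps some selected tokens unchanged or replaces them with random tokens rather than always inserting the mask:

```python
import random

def mask_tokens(tokens, rng, mask_rate=0.15, mask_token="[MASK]"):
    """Replace ~15% of the token positions with the mask token.
    Simplified: the full BERT/ALBERT recipe also keeps some selected
    tokens unchanged or swaps them for random tokens."""
    n = max(1, round(mask_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n)  # without replacement
    masked = list(tokens)
    for i in positions:
        masked[i] = mask_token
    return masked, sorted(positions)

tokens = "le modèle est entraîné sur le wikipédia francophone".split()
masked, positions = mask_tokens(tokens, random.Random(0))
```

During pretraining the model only receives the masked sequence and is trained to predict the original tokens at the masked positions.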