facebook/opt-1.3b

4 fixes

#33

by icognito - opened Nov 10, 2024

base: refs/heads/main ← from: refs/pr/33

README.md +4 -4

@@ -37,7 +37,7 @@ To quote the first two paragraphs of the [official paper](https://arxiv.org/abs/
 ## Model description
 OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective.
-OPT belongs to the same family of decoder-only models like [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modedling objective.
+OPT belongs to the same family of decoder-only models like [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modelling objective.
 For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read
 the [official paper](https://arxiv.org/abs/2205.01068).
@@ -128,14 +128,14 @@ dataset that was used in RoBERTa (Liu et al., 2019b)
 The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally
 to each dataset’s size in the pretraining corpus.
-The dataset might contains offensive content as parts of the dataset are a subset of
+The dataset might contain offensive content as parts of the dataset are a subset of
 public Common Crawl data, along with a subset of public Reddit data, which could contain sentences
 that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.
 ### Collection process
-The dataset was collected form internet, and went through classic data processing algorithms  and
-re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or
+The dataset was collected form the internet, and went through classic data processing algorithms  and
+reformatting practices, including removing repetitive/non-informative text like *Chapter One* or
 *This ebook by Project Gutenberg.*
 ## Training procedure