facebook/opt-125m

@@ -77,8 +77,8 @@ unfiltered content from the internet, which is far from neutral the model is str
 > Like other large language models for which the diversity (or lack thereof) of training
 > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
-> of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
-> hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
+> of bias and safety. OPT-175M can also have quality issues in terms of generation diversity and
+> hallucination. In general, OPT-175M is not immune from the plethora of issues that plague modern
 > large language models.
 This bias will also affect all fine-tuned versions of this model.
@@ -118,7 +118,7 @@ re-formatting practices, including removing repetitive/non-informative text like
 The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
 vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
-The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
+The 175M model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
 ### BibTeX entry and citation info