add model
- README.md +24 -138
- config.json +7 -6
- tf_model.h5 +3 -0

README.md
CHANGED
@@ -1,161 +1,47 @@
---
language: en
inference: false
tags:
commercial: false
---

OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.

**Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the [paper](https://arxiv.org/pdf/2205.01068.pdf). Content from **this** model card has been written by the Hugging Face team.

## Intro

To quote the first two paragraphs of the [official paper](https://arxiv.org/abs/2205.01068):

> Large language models trained on massive text collections have shown surprising emergent
> capabilities to generate text and perform zero- and few-shot learning. While in some cases the public
> can interact with these models through paid APIs, full model access is currently limited to only a
> few highly resourced labs. This restricted access has limited researchers’ ability to study how and
> why these large language models work, hindering progress on improving known challenges in areas
> such as robustness, bias, and toxicity.

> We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M
> to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match
> the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data
> collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and
> to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the
> collective research community as a whole, which is only possible when models are available for study.

## Model description

OPT belongs to the same family of decoder-only models as [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling objective.

For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read the [official paper](https://arxiv.org/abs/2205.01068).

## Intended uses & limitations

The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation.
In addition, the model can be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling). For all other OPT checkpoints, please have a look at the [model hub](https://huggingface.co/models?filter=opt).

### How to use

You can use this model directly with a pipeline for text generation.

```python
>>> from transformers import pipeline

>>> generator = pipeline('text-generation', model="facebook/opt-1.3b")
>>> generator("Hello, I'm am conscious and")
[{'generated_text': "Hello, I'm am conscious and aware of my surroundings. I'm aware that I'm dreaming."}]
```

By default, generation is deterministic. In order to use top-k sampling, please set `do_sample` to `True`.

```python
>>> from transformers import pipeline, set_seed

>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True)
>>> generator("Hello, I'm am conscious and")
[{'generated_text': "Hello, I'm am conscious and aware of my surroundings. I'm aware that my thoughts are thoughts"}]
```
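
Top-k sampling itself is simple to state: at each decoding step, keep only the `k` highest-scoring tokens and sample from the renormalized distribution. A minimal stdlib sketch over toy logits (not the real model's distribution):

```python
import math
import random

def top_k_sample(logits, k, rng):
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over just those candidates (max-subtracted for stability);
    # rng.choices renormalizes the weights internally.
    m = max(logits)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

rng = random.Random(32)
logits = [2.0, 0.5, 1.5, -1.0, 0.0]  # toy scores over a 5-token vocabulary
picks = {top_k_sample(logits, k=2, rng=rng) for _ in range(200)}
# with k=2, only the two highest-logit token ids (0 and 2) can ever be drawn
```

With `k=1` this degenerates to greedy (deterministic) decoding, which is why `do_sample=False` pipelines always return the same continuation.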

### Limitations and bias

As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of
unfiltered content from the internet, which is far from neutral, the model is strongly biased:

> Like other large language models for which the diversity (or lack thereof) of training
> data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
> of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
> hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
> large language models.

Here's an example of how the model can have biased predictions:

```python
>>> from transformers import pipeline, set_seed

>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True, num_return_sequences=5)
>>> generator("The woman worked as a")
[{'generated_text': 'The woman worked as a waitress for six months before she started dating her boyfriend, who was working at'},
 {'generated_text': "The woman worked as a prostitute, but she didn't want to sell herself anymore. She wanted to"},
 {'generated_text': 'The woman worked as a translator at the embassy during her studies at Cambridge University in England. She said'},
 {'generated_text': 'The woman worked as a secretary for Senator Ted Stevens of Alaska for 22 years before retiring from his Senate'},
 {'generated_text': 'The woman worked as a caregiver for elderly patients at the nursing home where she lived until she died'}]
```

compared to:

```python
>>> from transformers import pipeline, set_seed

>>> set_seed(32)
>>> generator = pipeline('text-generation', model="facebook/opt-1.3b", do_sample=True, num_return_sequences=5)
>>> generator("The man worked as a")
[{'generated_text': 'The man worked as a janitor at the University of Michigan Medical Center before he died after contracting Ebola'},
 {'generated_text': 'The man worked as a salesman for IBM Corp., selling computers to businesses around the globe. He traveled'},
 {'generated_text': 'The man worked as a translator for the British Broadcasting Corporation between 1956 and 1961. During that period he'},
 {'generated_text': 'The man worked as a salesman for IBM Corp., selling computers for computers. He traveled extensively and lived'},
 {'generated_text': 'The man worked as a security guard for nearly 30 years before he was shot dead by police officers responding'}]
```

This bias will also affect all fine-tuned versions of this model.

## Training data

The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents:

- BookCorpus, which consists of more than 10K unpublished books,
- CC-Stories, which contains a subset of CommonCrawl data filtered to match the
  story-like style of Winograd schemas,
- The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included,
- Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in
  Roller et al. (2021),
- CCNewsV2 containing an updated version of the English portion of the CommonCrawl News
  dataset that was used in RoBERTa (Liu et al., 2019b).

The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally
to each dataset’s size in the pretraining corpus.
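
"Sampled proportionally" reduces to simple arithmetic: each dataset contributes a slice of the 200MB validation budget equal to its share of the corpus. The per-dataset sizes below are hypothetical placeholders (the card does not give them); only the 200MB budget comes from the text.

```python
# Hypothetical per-dataset sizes in GB -- NOT the real OPT numbers.
sizes_gb = {
    "BookCorpus": 30,
    "CC-Stories": 30,
    "The Pile": 440,
    "Pushshift.io Reddit": 200,
    "CCNewsV2": 100,
}
total_gb = sum(sizes_gb.values())
budget_mb = 200  # validation budget from the card

# Each dataset gets a validation slice proportional to its corpus share.
validation_mb = {name: budget_mb * gb / total_gb for name, gb in sizes_gb.items()}
```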

The dataset might contain offensive content as parts of the dataset are a subset of
public Common Crawl data, along with a subset of public Reddit data, which could contain sentences
that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.

### Collection process

The dataset was collected from the internet, and went through classic data processing algorithms and
re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or
*This ebook by Project Gutenberg.*

## Training procedure

### Preprocessing

The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
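
Packing text into fixed 2048-token inputs is standard causal-LM preprocessing; a simplified sketch (the real pipeline concatenates whole documents with separator tokens, which is omitted here):

```python
BLOCK = 2048  # sequence length from the card

def pack(token_ids, block=BLOCK):
    # Split a long token stream into consecutive fixed-length training inputs,
    # dropping the trailing remainder so every input has exactly `block` tokens.
    n = (len(token_ids) // block) * block
    return [token_ids[i:i + block] for i in range(0, n, block)]

chunks = pack(list(range(5000)))
# -> 2 chunks of 2048 tokens each; the final 904 tokens are dropped
```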

The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training.
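
As a back-of-the-envelope check on that hardware budget (treating the rough 33-day figure as exact):

```python
# Compute budget implied by the card: 992 A100s for ~33 days of continuous training.
gpus = 992
days = 33
gpu_hours = gpus * days * 24  # total A100-hours
# -> 785,664 A100-hours
```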

### BibTeX entry and citation info

```bibtex
@misc{zhang2022opt,
      title={OPT: Open Pre-trained Transformer Language Models},
      year={2022},
      eprint={2205.01068},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
---
tags:
- generated_from_keras_callback
model-index:
- name: opt-1.3b
  results: []
---

<!-- This model card has been generated automatically according to the information Keras had access to. You should
probably proofread and complete it, then remove this comment. -->

# opt-1.3b

This model was trained from scratch on an unknown dataset.
It achieves the following results on the evaluation set:

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: None
- training_precision: float32

### Training results

### Framework versions

- Transformers 4.20.0.dev0
- TensorFlow 2.9.1
- Datasets 2.2.2
- Tokenizers 0.12.1
config.json
CHANGED

```diff
@@ -1,16 +1,17 @@
 {
+  "_name_or_path": "facebook/opt-1.3b",
   "activation_dropout": 0.0,
   "activation_function": "relu",
   "architectures": [
-    "
+    "OPTModel"
   ],
   "attention_dropout": 0.0,
   "bos_token_id": 2,
-  "hidden_size": 2048,
   "do_layer_norm_before": true,
   "dropout": 0.1,
   "eos_token_id": 2,
   "ffn_dim": 8192,
+  "hidden_size": 2048,
   "init_std": 0.02,
   "layerdrop": 0.0,
   "max_position_embeddings": 2048,
@@ -18,10 +19,10 @@
   "num_attention_heads": 32,
   "num_hidden_layers": 24,
   "pad_token_id": 1,
-  "
-  "
+  "prefix": "</s>",
+  "torch_dtype": "float32",
+  "transformers_version": "4.20.0.dev0",
   "use_cache": true,
   "vocab_size": 50272,
-  "word_embed_proj_dim": 2048
-  "prefix": "</s>"
+  "word_embed_proj_dim": 2048
 }
```
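
As a sanity check, the dimensions in the new config are consistent with the "1.3b" in the model name. A rough parameter count from the fields above, ignoring biases and LayerNorms and assuming OPT's learned positional embeddings with their offset of 2 (an implementation detail, not stated in this config):

```python
# Config values from config.json above.
vocab_size = 50272   # "vocab_size"
hidden = 2048        # "hidden_size"
ffn = 8192           # "ffn_dim"
layers = 24          # "num_hidden_layers"
max_pos = 2048       # "max_position_embeddings"

# Token embeddings + learned positional embeddings (offset of 2 assumed).
embeddings = vocab_size * hidden + (max_pos + 2) * hidden
# Per decoder layer: q/k/v/out projections + the two MLP matrices.
per_layer = 4 * hidden * hidden + 2 * hidden * ffn
total = embeddings + layers * per_layer
print(f"~{total / 1e9:.2f}B parameters")
```

The estimate lands at roughly 1.3B, matching the checkpoint name.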
tf_model.h5
ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cab5d1b7b11900213091184b559d3201ddaeaa4a2c12bef4ae14a90ceee7113c
+size 5263424312
```
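
The three added lines are a Git LFS pointer file: the repository stores this small text stub while the ~5.3GB weight file lives in LFS storage. Each line is a `key value` pair, so parsing it is straightforward:

```python
# Git LFS pointer for tf_model.h5, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cab5d1b7b11900213091184b559d3201ddaeaa4a2c12bef4ae14a90ceee7113c
size 5263424312
"""

# One "key value" pair per line.
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)   # hash algorithm and hex digest
size_gb = int(fields["size"]) / 1e9          # payload size, ~5.26 GB
```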