Update README.md #1
by ArthurZ (HF Staff), opened

README.md CHANGED
@@ -104,6 +104,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -127,6 +128,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -148,6 +150,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(0)
 
 outputs = model.generate(input_ids)
 print(tokenizer.decode(outputs[0]))
+>>> <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
 ```
 
 </details>
@@ -180,7 +183,7 @@ More information needed.
 
 ## Sensitive Use:
 
->
+> SwitchTransformers should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
 
 # Training Details
 
@@ -193,7 +196,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
 
-> These models are based on pretrained
+> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
 
 The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
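For reviewers who want to check the added `>>>` lines against real output, here is a minimal, self-contained sketch of the snippet the three code hunks tail onto. The diff context shows only the `generate`/`decode` end of the example, so the checkpoint name (`google/switch-base-8`) and the sentinel-masked input sentence below are assumptions, not part of this PR; the class names come from the `transformers` SwitchTransformers integration.

```python
# Minimal sketch; the checkpoint name and input text are assumed, since the
# diff context only shows the generate/decode tail of the README example.
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/switch-base-8")
model = SwitchTransformersForConditionalGeneration.from_pretrained("google/switch-base-8")

# T5-style masked language modeling prompt: <extra_id_n> sentinels mark the spans to fill.
input_text = (
    "A <extra_id_0> walks into a bar and orders a <extra_id_1> "
    "with <extra_id_2> pinch of <extra_id_3>."
)
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
# Per the lines added in this PR, the decoded output should look like:
# <pad> <extra_id_0> man<extra_id_1> beer<extra_id_2> a<extra_id_3> salt<extra_id_4>.</s>
```

The `.to(0)` in the hunk headers indicates the GPU variants of the README snippet move `input_ids` onto device 0 before calling `generate`; the sketch above keeps everything on CPU for simplicity.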