keras/opt_2.7b_en

@@ -1,14 +1,164 @@
 ---
 library_name: keras-hub
 ---
-This is a [`OPT` model](https://keras.io/api/keras_hub/models/opt) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
-Model config:
-* **vocabulary_size:** 50272
-* **num_layers:** 32
-* **num_heads:** 32
-* **hidden_dim:** 2560
-* **intermediate_dim:** 10240
-* **dropout:** 0.1
-* **max_sequence_length:** 2048
-This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.

+### Model Overview
+An OPT decoder network.
+This class implements a Transformer-based decoder model as described in
+["OPT: Open Pre-trained Transformer Language Models"](https://arxiv.org/abs/2205.01068).
+The default constructor gives a fully customizable, randomly initialized OPT
+model with any number of layers, heads, and embedding dimensions. To load
+preset architectures and weights, use the `from_preset()` constructor.
+Disclaimer: Pre-trained models are provided on an "as is" basis, without
+warranties or conditions of any kind. The underlying model is provided by a
+third party and subject to a separate license, available
+[here](https://github.com/facebookresearch/fairseq/).
+__Arguments__
+- __vocabulary_size__: int. The size of the token vocabulary.
+- __num_layers__: int. The number of transformer decoder layers.
+- __num_heads__: int. The number of attention heads in each transformer
+    decoder layer. `hidden_dim` must be divisible by `num_heads`.
+- __hidden_dim__: int. The hidden size of the transformer decoder layers.
+- __intermediate_dim__: int. The output dimension of the first Dense layer in
+    a two-layer feedforward network for each transformer decoder layer.
+- __dropout__: float. Dropout probability for the Transformer decoder.
+- __max_sequence_length__: int. The maximum sequence length that this decoder
+    can consume. If `None`, the value is taken from the input sequence
+    length. This determines the shape of the positional embedding variable.
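The divisibility constraint and the size implied by these arguments can be checked with plain arithmetic. The sketch below is not part of KerasHub: `check_head_dim` and `estimate_parameters` are hypothetical helpers, the configuration values are the ones listed in the auto-generated card that this change replaces, and the count assumes biased dense layers, two layer norms per decoder layer, one final layer norm, and an output projection tied to the token embeddings.

```python
# Hypothetical helpers (not KerasHub API) for reasoning about the arguments above.

def check_head_dim(num_heads, hidden_dim):
    """Return the per-head dimension, or raise if the sizes are incompatible."""
    if hidden_dim % num_heads != 0:
        raise ValueError(
            f"hidden_dim={hidden_dim} is not divisible by num_heads={num_heads}"
        )
    return hidden_dim // num_heads


def estimate_parameters(
    vocabulary_size, num_layers, hidden_dim, intermediate_dim, max_sequence_length
):
    """Rough parameter count under the assumptions stated in the lead-in."""
    token_embeddings = vocabulary_size * hidden_dim
    position_embeddings = max_sequence_length * hidden_dim
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden_dim * hidden_dim + hidden_dim)
    # Two dense layers: hidden -> intermediate -> hidden, each with a bias.
    feedforward = (hidden_dim * intermediate_dim + intermediate_dim) + (
        intermediate_dim * hidden_dim + hidden_dim
    )
    # Two layer norms per decoder layer, each with a scale and an offset.
    layer_norms = 2 * 2 * hidden_dim
    per_layer = attention + feedforward + layer_norms
    final_layer_norm = 2 * hidden_dim
    return (
        token_embeddings
        + position_embeddings
        + num_layers * per_layer
        + final_layer_norm
    )


# Values from the previously auto-generated model config.
config = {
    "vocabulary_size": 50272,
    "num_layers": 32,
    "num_heads": 32,
    "hidden_dim": 2560,
    "intermediate_dim": 10240,
    "max_sequence_length": 2048,
}

head_dim = check_head_dim(config["num_heads"], config["hidden_dim"])
approx_params = estimate_parameters(
    config["vocabulary_size"],
    config["num_layers"],
    config["hidden_dim"],
    config["intermediate_dim"],
    config["max_sequence_length"],
)
print(head_dim)
print(f"{approx_params / 1e9:.2f}B")
```

Under these assumptions the estimate lands close to the 2.7B in the preset name. The true count depends on implementation details that this sketch does not model.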
+### Example Usage
+```python
+import keras
+import keras_hub
+import numpy as np
+```
+Use `generate()` to do text generation.
+```python
+opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_2.7b_en")
+opt_lm.generate("I want to say", max_length=30)
+# Generate with batched prompts.
+opt_lm.generate(["This is a", "Where are you"], max_length=30)
+```
+Compile the `generate()` function with a custom sampler.
+```python
+opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_2.7b_en")
+opt_lm.compile(sampler="greedy")
+opt_lm.generate("I want to say", max_length=30)
+opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
+opt_lm.generate("I want to say", max_length=30)
+```
+Use `generate()` without preprocessing.
+```python
+# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
+# Use `"padding_mask"` to indicate values that should not be overridden.
+prompt = {
+    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
+}
+opt_lm = keras_hub.models.OPTCausalLM.from_preset(
+    "opt_2.7b_en",
+    preprocessor=None,
+)
+opt_lm.generate(prompt)
+```
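The prompt dictionary above can also be built programmatically. The following is a minimal sketch using plain Python lists; `pad_prompt` is a hypothetical helper rather than a KerasHub function, and `pad_id=0` simply mirrors the zeros used in the example above.

```python
def pad_prompt(token_ids, sequence_length, pad_id=0):
    """Right-pad `token_ids` and build the matching padding mask.

    Mask value 1 marks prompt tokens that generation should keep;
    mask value 0 marks positions that generation is free to fill.
    """
    if len(token_ids) > sequence_length:
        raise ValueError("prompt is longer than sequence_length")
    num_padding = sequence_length - len(token_ids)
    return {
        "token_ids": list(token_ids) + [pad_id] * num_padding,
        "padding_mask": [1] * len(token_ids) + [0] * num_padding,
    }


prompt_row = pad_prompt([5338, 318], sequence_length=5)
print(prompt_row["token_ids"])     # [5338, 318, 0, 0, 0]
print(prompt_row["padding_mask"])  # [1, 1, 0, 0, 0]
```

To reproduce the batched input from the example, each list would be repeated and wrapped in `np.array`, for instance `np.array([prompt_row["token_ids"]] * 2)`.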
+Call `fit()` on a single batch.
+```python
+features = ["The quick brown fox jumped.", "I forgot my homework."]
+opt_lm = keras_hub.models.OPTCausalLM.from_preset("opt_2.7b_en")
+opt_lm.fit(x=features, batch_size=2)
+```
+Call `fit()` without preprocessing.
+```python
+x = {
+    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
+}
+y = np.array([[2, 3, 4, 5, 0]] * 2)
+sw = np.array([[1, 1, 1, 1, 1]] * 2)
+opt_lm = keras_hub.models.OPTCausalLM.from_preset(
+    "opt_2.7b_en",
+    preprocessor=None,
+)
+opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
+```
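In the example above, `y` is `x["token_ids"]` shifted left by one position, which is the usual next-token prediction setup. The sketch below shows one way to derive such a triple from a single padded sequence. `make_causal_lm_example` is a hypothetical helper, and taking the sample weights from the shifted padding mask is an assumption of this sketch, not a statement about how the example above was produced.

```python
def make_causal_lm_example(token_ids, padding_mask):
    """Build (x, y, sample_weight) for next-token prediction.

    Inputs drop the last position, labels drop the first, and the
    weights follow the labels so padded labels contribute no loss.
    """
    x = {
        "token_ids": token_ids[:-1],
        "padding_mask": padding_mask[:-1],
    }
    y = token_ids[1:]
    sample_weight = padding_mask[1:]
    return x, y, sample_weight


# One sequence of five real tokens followed by one padding token.
x, y, sw = make_causal_lm_example(
    token_ids=[1, 2, 3, 4, 5, 0],
    padding_mask=[1, 1, 1, 1, 1, 0],
)
print(x["token_ids"])  # [1, 2, 3, 4, 5]
print(y)               # [2, 3, 4, 5, 0]
print(sw)              # [1, 1, 1, 1, 0]
```

Here the last weight is 0, so the padded label is ignored, whereas the example above weights every position equally. As in the example, each list would be repeated and wrapped in `np.array` to form a batch before calling `fit()`.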
+### Example Usage with Hugging Face URI
+```python
+import keras
+import keras_hub
+import numpy as np
+```
+Use `generate()` to do text generation.
+```python
+opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_2.7b_en")
+opt_lm.generate("I want to say", max_length=30)
+# Generate with batched prompts.
+opt_lm.generate(["This is a", "Where are you"], max_length=30)
+```
+Compile the `generate()` function with a custom sampler.
+```python
+opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_2.7b_en")
+opt_lm.compile(sampler="greedy")
+opt_lm.generate("I want to say", max_length=30)
+opt_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=2))
+opt_lm.generate("I want to say", max_length=30)
+```
+Use `generate()` without preprocessing.
+```python
+# Prompt the model with `5338, 318` (the token ids for `"Who is"`).
+# Use `"padding_mask"` to indicate values that should not be overridden.
+prompt = {
+    "token_ids": np.array([[5338, 318, 0, 0, 0]] * 2),
+    "padding_mask": np.array([[1, 1, 0, 0, 0]] * 2),
+}
+opt_lm = keras_hub.models.OPTCausalLM.from_preset(
+    "hf://keras/opt_2.7b_en",
+    preprocessor=None,
+)
+opt_lm.generate(prompt)
+```
+Call `fit()` on a single batch.
+```python
+features = ["The quick brown fox jumped.", "I forgot my homework."]
+opt_lm = keras_hub.models.OPTCausalLM.from_preset("hf://keras/opt_2.7b_en")
+opt_lm.fit(x=features, batch_size=2)
+```
+Call `fit()` without preprocessing.
+```python
+x = {
+    "token_ids": np.array([[1, 2, 3, 4, 5]] * 2),
+    "padding_mask": np.array([[1, 1, 1, 1, 1]] * 2),
+}
+y = np.array([[2, 3, 4, 5, 0]] * 2)
+sw = np.array([[1, 1, 1, 1, 1]] * 2)
+opt_lm = keras_hub.models.OPTCausalLM.from_preset(
+    "hf://keras/opt_2.7b_en",
+    preprocessor=None,
+)
+opt_lm.fit(x=x, y=y, sample_weight=sw, batch_size=2)
+```