Update README.md with new model card content
README.md CHANGED
### Model Overview

A RoBERTa encoder network.

This network implements a bi-directional Transformer-based encoder as
described in ["RoBERTa: A Robustly Optimized BERT Pretraining Approach"](https://arxiv.org/abs/1907.11692).
It includes the embedding lookups and transformer layers, but does not
include the masked language model head used during pretraining.

The default constructor gives a fully customizable, randomly initialized
RoBERTa encoder with any number of layers, heads, and embedding
dimensions. To load preset architectures and weights, use the `from_preset()`
constructor; see the sketch after the argument list below.

Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind. The underlying model is provided by a
third party and subject to a separate license, available
[here](https://github.com/facebookresearch/fairseq).
__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of transformer layers.
- __num_heads__: int. The number of attention heads for each transformer.
  The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The size of the transformer encoding layer.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
  a two-layer feedforward network for each transformer.
- __dropout__: float. Dropout probability for the Transformer encoder.
- __max_sequence_length__: int. The maximum sequence length this encoder can
  consume. The sequence length of the input must not exceed
  `max_sequence_length`. This determines the variable shape for positional
  embeddings.
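For reference, both construction paths look like the following. This is a
minimal sketch: the hyperparameter values are illustrative (they match the
common RoBERTa-base configuration), not necessarily this checkpoint's
configuration.

```python
import keras_nlp

# Fully customizable, randomly initialized encoder (illustrative values).
backbone = keras_nlp.models.RobertaBackbone(
    vocabulary_size=50265,
    num_layers=12,
    num_heads=12,
    hidden_dim=768,
    intermediate_dim=3072,
    dropout=0.1,
    max_sequence_length=512,
)

# Preset architecture and weights, using the slug from this card.
backbone = keras_nlp.models.RobertaBackbone.from_preset("${VARIATION_SLUG}")
```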
### Example Usage

```python
import keras
import keras_nlp
import numpy as np
```
Raw string data.

```python
features = ["The quick brown fox jumped.", "I forgot my homework."]
labels = [0, 3]

# Pretrained classifier.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "${VARIATION_SLUG}",
    num_classes=4,
)
classifier.fit(x=features, y=labels, batch_size=2)
classifier.predict(x=features, batch_size=2)

# Re-compile (e.g., with a new learning rate).
classifier.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(5e-5),
    jit_compile=True,
)
# Access backbone programmatically (e.g., to change `trainable`).
classifier.backbone.trainable = False
# Fit again.
classifier.fit(x=features, y=labels, batch_size=2)
```
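The `predict()` call above returns one score per class for each input. A small
follow-up sketch, assuming the default configuration in which the classifier
outputs logits:

```python
# Scores have shape (num_examples, num_classes); with logits output,
# argmax recovers the predicted label ids.
scores = classifier.predict(x=features, batch_size=2)
predicted_labels = np.argmax(scores, axis=-1)
```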
Preprocessed integer data.

```python
features = {
    "token_ids": np.ones(shape=(2, 12), dtype="int32"),
    "padding_mask": np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]] * 2),
}
labels = [0, 3]

# Pretrained classifier without preprocessing.
classifier = keras_nlp.models.RobertaClassifier.from_preset(
    "${VARIATION_SLUG}",
    num_classes=4,
    preprocessor=None,
)
classifier.fit(x=features, y=labels, batch_size=2)
```
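Inputs in this format can also be produced from raw strings with the matching
preprocessor. A sketch, assuming the same preset also provides a
`RobertaPreprocessor`:

```python
# Tokenize and pad raw strings into "token_ids" and "padding_mask".
preprocessor = keras_nlp.models.RobertaPreprocessor.from_preset(
    "${VARIATION_SLUG}",
    sequence_length=12,
)
features = preprocessor(["The quick brown fox jumped.", "I forgot my homework."])
```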