prasadsachin committed · Commit 462bef7 (verified) · 1 Parent(s): d92fb1e

Update README.md with new model card content

Files changed (1):
  1. README.md +42 -24

README.md CHANGED
@@ -2,27 +2,45 @@
  library_name: keras-hub
  pipeline_tag: text-generation
  ---
- This is a [`GptOss` model](https://keras.io/api/keras_hub/models/gpt_oss) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- This model is related to a `CausalLM` task.
-
- Model config:
- * **name:** gpt_oss_backbone
- * **trainable:** True
- * **dtype:** {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}
- * **vocabulary_size:** 201088
- * **num_layers:** 24
- * **num_query_heads:** 64
- * **hidden_dim:** 2880
- * **intermediate_dim:** 2880
- * **num_experts:** 32
- * **top_k:** 4
- * **rope_max_wavelength:** 150000
- * **rope_scaling_factor:** 32.0
- * **num_key_value_heads:** 8
- * **sliding_window:** 128
- * **layer_norm_epsilon:** 1e-05
- * **dropout:** 0
- * **output_router_logits:** False
- * **head_dim:** 64
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
+ ## Model Overview
+
+ OpenAI's gpt-oss family marks a significant shift toward open-source development for the organization, making advanced AI models more accessible. Released under the permissive Apache 2.0 license, these models are designed for strong reasoning, agentic capabilities, and versatile real-world applications. The family includes two text-only variants, a 21-billion and a 117-billion parameter model, trained on a dataset focused on STEM, coding, and general knowledge. This release empowers developers and researchers to run, customize, and build upon these models on their own infrastructure, ensuring data privacy and control.
+
+ The gpt-oss models are built on an efficient Transformer architecture that leverages a Mixture-of-Experts (MoE) design. This allows the models to have a large total number of parameters while activating only a fraction of them for any given token, which significantly reduces computational cost and memory requirements during inference. Both models support a context length of up to 128,000 tokens and use techniques such as grouped multi-query attention and Rotary Positional Embeddings (RoPE) for improved efficiency. A key feature is their native quantization, which allows even the large model to run on a single high-end GPU and the smaller model to operate on consumer-grade hardware.
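The top-k routing at the heart of the MoE design can be illustrated with a toy sketch: a router scores every expert for each token, keeps only the `top_k` highest-scoring experts, and renormalizes their scores with a softmax. The values below mirror the `num_experts: 32` and `top_k: 4` entries from the old model config; this is a hypothetical NumPy illustration of the routing step only, not the model's actual implementation.

```python
import numpy as np

def moe_route(router_logits, top_k=4):
    """Pick the top_k experts per token and softmax-renormalize their scores."""
    # Indices of the top_k largest logits for each token (argsort is ascending,
    # so the last top_k columns are the winners).
    top_idx = np.argsort(router_logits, axis=-1)[:, -top_k:]
    top_logits = np.take_along_axis(router_logits, top_idx, axis=-1)
    # Numerically stable softmax over the selected logits only.
    e = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return top_idx, weights

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 32))  # router scores: 2 tokens, 32 experts
idx, w = moe_route(logits)        # idx: (2, 4) expert ids, w: (2, 4) weights
```

With 4 of 32 experts selected per token, only those 4 expert MLPs run in the forward pass, which is where the inference-time savings of the MoE design come from.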
+
+ Designed for practical deployment, the gpt-oss models offer features aimed at usability and trust. A unique capability is the adjustable reasoning effort, which lets users toggle between low, medium, and high settings to balance performance against latency. The models also provide full access to their chain-of-thought reasoning process, which aids in debugging and in understanding the model's outputs. With built-in support for tool use such as web browsing and code execution, these models are well suited to building sophisticated AI agents and customized applications for a wide range of specialized tasks.
+
+ For more details, please refer to the GPT OSS [Blog](https://openai.com/open-models/) and [GitHub](https://github.com/openai/gpt-oss).
+
+ Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
+
+ ## Links
+
+ * GPT OSS Quickstart Notebook (coming soon)
+ * [GPT OSS API Documentation](https://keras.io/keras_hub/api/models/gpt_oss/)
+ * [GPT OSS Model Card](https://huggingface.co/openai/gpt-oss-20b)
+ * [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+ * [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+
+ ## Installation
+
+ Keras and KerasHub can be installed with:
+
+ ```shell
+ pip install -U -q keras-hub
+ pip install -U -q keras
+ ```
+
+ JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
+
+ ## Available GPT OSS Presets
+
+ The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
+
+ | Preset | Parameters | Description |
+ |--------|------------|-------------|
+ | `gpt_oss_20b_en` | 20B | 21 billion total parameters, 3.6 billion active parameters, 128k context length; de-quantized from MXFP4. |
+ | `gpt_oss_120b_en` | 120B | 117 billion total parameters, 5.1 billion active parameters, 128k context length; de-quantized from MXFP4. |
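The presets above can be exercised with the standard KerasHub workflow. The following is a minimal sketch, assuming a `GptOssCausalLM` class is exposed under `keras_hub.models` (see the GPT OSS API documentation linked above) and that the preset names match the table; weights are downloaded on first use, and the 20B preset still requires substantial accelerator memory.

```python
import os

# Pick a backend before importing Keras (JAX, TensorFlow, and PyTorch all work).
os.environ["KERAS_BACKEND"] = "jax"

import keras_hub

# Load a preset from the table above; this downloads the weights on first use.
gpt_oss_lm = keras_hub.models.GptOssCausalLM.from_preset("gpt_oss_20b_en")

# Single-prompt generation.
print(gpt_oss_lm.generate("What is Keras?", max_length=128))

# Batched generation over several prompts is also supported.
print(gpt_oss_lm.generate(
    ["What is Keras?", "Explain Mixture-of-Experts briefly."],
    max_length=128,
))
```

The `from_preset` / `generate` pattern shown here is the same one KerasHub uses for its other causal language models, so sampler configuration and fine-tuning follow the KerasHub Beginner Guide linked above.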