prasadsachin committed · Commit 462bef7 (verified) · 1 Parent(s): d92fb1e

Update README.md with new model card content

Files changed (1):
  1. README.md +42 -24

README.md CHANGED
@@ -2,27 +2,45 @@
  library_name: keras-hub
  pipeline_tag: text-generation
  ---
- This is a [`GptOss` model](https://keras.io/api/keras_hub/models/gpt_oss) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- This model is related to a `CausalLM` task.
-
- Model config:
- * **name:** gpt_oss_backbone
- * **trainable:** True
- * **dtype:** {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}
- * **vocabulary_size:** 201088
- * **num_layers:** 24
- * **num_query_heads:** 64
- * **hidden_dim:** 2880
- * **intermediate_dim:** 2880
- * **num_experts:** 32
- * **top_k:** 4
- * **rope_max_wavelength:** 150000
- * **rope_scaling_factor:** 32.0
- * **num_key_value_heads:** 8
- * **sliding_window:** 128
- * **layer_norm_epsilon:** 1e-05
- * **dropout:** 0
- * **output_router_logits:** False
- * **head_dim:** 64
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
+ ## Model Overview
+
+ OpenAI's gpt-oss family marks a significant shift toward open-source development for the organization, making advanced AI models more accessible. Released under the permissive Apache 2.0 license, these models are designed for strong reasoning, agentic capabilities, and versatile real-world applications. The family includes two text-only variants, a 21-billion and a 117-billion parameter model, trained on a dataset focused on STEM, coding, and general knowledge. This release empowers developers and researchers to run, customize, and build upon these models on their own infrastructure, ensuring data privacy and control.
+
+ The gpt-oss models are built on an efficient Transformer architecture that leverages a Mixture-of-Experts (MoE) design. This allows the models to have a large total number of parameters while activating only a fraction of them for any given token, which significantly reduces computational cost and memory requirements during inference. Both models support a context length of up to 128,000 tokens and use techniques such as grouped multi-query attention and Rotary Positional Embeddings (RoPE) for improved efficiency. A key feature is their native quantization, which allows even the large model to run on a single high-end GPU and the smaller model to operate on consumer-grade hardware.
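The top-k routing at the heart of the MoE design can be illustrated with a toy sketch: a router scores every expert for each token, keeps only the `top_k` highest-scoring experts, and renormalizes their scores with a softmax. The values below mirror the `num_experts: 32` and `top_k: 4` entries from the old model config; this is a hypothetical NumPy illustration of the routing step only, not the model's actual implementation.

```python
import numpy as np

def moe_route(router_logits, top_k=4):
    """Pick the top_k experts per token and softmax-renormalize their scores."""
    # Indices of the top_k largest logits for each token (argsort is ascending,
    # so the last top_k columns are the winners).
    top_idx = np.argsort(router_logits, axis=-1)[:, -top_k:]
    top_logits = np.take_along_axis(router_logits, top_idx, axis=-1)
    # Numerically stable softmax over the selected logits only.
    e = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return top_idx, weights

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 32))  # router scores: 2 tokens, 32 experts
idx, w = moe_route(logits)        # idx: (2, 4) expert ids, w: (2, 4) weights
```

With 4 of 32 experts selected per token, only those 4 expert MLPs run in the forward pass, which is where the inference-time savings of the MoE design come from.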
+
+ Designed for practical deployment, the gpt-oss models offer features aimed at usability and trust. A unique capability is the adjustable reasoning effort, which lets users toggle between low, medium, and high settings to balance performance against latency. The models also provide full access to their chain-of-thought reasoning process, which aids in debugging and in understanding the model's outputs. With built-in support for tool use such as web browsing and code execution, these models are well suited to building sophisticated AI agents and customized applications for a wide range of specialized tasks.
+
+ For more details, please refer to the GPT OSS [Blog](https://openai.com/open-models/) and [GitHub](https://github.com/openai/gpt-oss).
+
+ Weights are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
+
+ ## Links
+
+ * GPT OSS Quickstart Notebook (coming soon)
+ * [GPT OSS API Documentation](https://keras.io/keras_hub/api/models/gpt_oss/)
+ * [GPT OSS Model Card](https://huggingface.co/openai/gpt-oss-20b)
+ * [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+ * [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+
+ ## Installation
+
+ Keras and KerasHub can be installed with:
+
+ ```shell
+ pip install -U -q keras-hub
+ pip install -U -q keras
+ ```
+
+ JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
+
+ ## Available GPT OSS Presets
+
+ The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
+
+ | Preset | Parameters | Description |
+ |--------|------------|-------------|
+ | `gpt_oss_20b_en` | 20B | 21 billion total parameters, 3.6 billion active parameters, 128k context length; de-quantized from MXFP4. |
+ | `gpt_oss_120b_en` | 120B | 117 billion total parameters, 5.1 billion active parameters, 128k context length; de-quantized from MXFP4. |
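The presets above can be exercised with the standard KerasHub workflow. The following is a minimal sketch, assuming a `GptOssCausalLM` class is exposed under `keras_hub.models` (see the GPT OSS API documentation linked above) and that the preset names match the table; weights are downloaded on first use, and the 20B preset still requires substantial accelerator memory.

```python
import os

# Pick a backend before importing Keras (JAX, TensorFlow, and PyTorch all work).
os.environ["KERAS_BACKEND"] = "jax"

import keras_hub

# Load a preset from the table above; this downloads the weights on first use.
gpt_oss_lm = keras_hub.models.GptOssCausalLM.from_preset("gpt_oss_20b_en")

# Single-prompt generation.
print(gpt_oss_lm.generate("What is Keras?", max_length=128))

# Batched generation over several prompts is also supported.
print(gpt_oss_lm.generate(
    ["What is Keras?", "Explain Mixture-of-Experts briefly."],
    max_length=128,
))
```

The `from_preset` / `generate` pattern shown here is the same one KerasHub uses for its other causal language models, so sampler configuration and fine-tuning follow the KerasHub Beginner Guide linked above.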