keras/qwen3_coder_instruct_30b_a3b_en

Text Generation

KerasHub


prasadsachin committed on Feb 25

Commit 68b458d (verified) · 1 parent: 2f57005

Update README.md with model card content


Files changed (1)

README.md +71 -29

README.md CHANGED

@@ -2,32 +2,74 @@
 library_name: keras-hub
 pipeline_tag: text-generation
 ---
-This is a [`Qwen3Moe` model](https://keras.io/api/keras_hub/models/qwen3_moe) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
-This model is related to a `CausalLM` task.
-Model config:
-* **name:** qwen3_moe_backbone
-* **trainable:** True
-* **dtype:** {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}
-* **vocabulary_size:** 151936
-* **num_layers:** 48
-* **num_query_heads:** 32
-* **head_dim:** 128
-* **hidden_dim:** 2048
-* **intermediate_dim:** 6144
-* **moe_intermediate_dim:** 768
-* **rope_max_wavelength:** 10000000
-* **num_key_value_heads:** 4
-* **rope_scaling_factor:** 1.0
-* **layer_norm_epsilon:** 1e-06
-* **dropout:** 0
-* **tie_word_embeddings:** False
-* **sliding_window_size:** None
-* **num_experts:** 128
-* **top_k:** 8
-* **norm_top_k_prob:** True
-* **decoder_sparse_step:** 1
-* **mlp_only_layers:** []
-* **router_aux_loss_coefficient:** 0.001
-This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.

+## Model Overview
+### Model Summary
+Qwen is the series of large language models and large multimodal models developed by the Qwen Team at Alibaba Group. Both the language and multimodal models are pretrained on large-scale multilingual and multimodal data and post-trained on high-quality data to align with human preferences. Qwen models are capable of natural language understanding, text generation, vision understanding, audio understanding, tool use, role play, acting as an AI agent, and more.
+Qwen3-Coder delivers strong performance and efficiency, with the following key enhancements:
+* **Significant Performance** among open models on **Agentic Coding, Agentic Browser-Use**, and other foundational coding tasks.
+* **Long-context capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding.
+* **Agentic capabilities**, enabling precise integration with external tools and achieving leading performance among open-source models on complex agent-based tasks.
+For more details, please refer to the Qwen [Blog](https://qwenlm.github.io/blog/qwen3/), the KerasHub [GitHub](https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/qwen_moe) source, and the Qwen [Documentation](https://qwen.readthedocs.io/en/latest/).
+Weights are released under the [Apache 2.0 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE). Keras model code is released under the [Apache 2.0 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).
+## Links
+  * Qwen 3 Coder Quickstart Notebook (coming soon)
+  * [Qwen 3 Coder API Documentation](https://keras.io/keras_hub/api/models/qwen3_moe/)
+  * [Qwen 3 Coder Model Card](https://qwenlm.github.io/blog/qwen3/)
+  * [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
+  * [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)
+## Installation
+Keras and KerasHub can be installed with:
+```
+pip install -U -q keras-hub
+pip install -U -q keras
+```
+JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in other environments, see the [Keras Getting Started](https://keras.io/getting_started/) page.
+## Available Qwen 3 Coder Presets
+The following model checkpoint is provided by the Keras team. A full code example is available below.
+| Preset | Parameters | Description |
+|--------|------------|-------------|
+| `qwen3_coder_instruct_30b_a3b_en` | 30.5B total (3.3B active) | Code-specific Mixture-of-Experts (MoE) model with 48 layers, 32 query heads and 4 key/value heads, and 128 experts (8 active per token). |
+## Example Usage
+```python
+import keras_hub
+
+# Load the preset and use generate() for code generation.
+qwen_lm = keras_hub.models.Qwen3MoeCausalLM.from_preset("qwen3_coder_instruct_30b_a3b_en")
+qwen_lm.generate("Write a quicksort algorithm in Python.", max_length=512)
+```
+## Example Usage with Hugging Face URI
+```python
+import keras_hub
+
+# Load the preset from the Hugging Face Hub and use generate() for code generation.
+qwen_lm = keras_hub.models.Qwen3MoeCausalLM.from_preset("hf://keras/qwen3_coder_instruct_30b_a3b_en")
+qwen_lm.generate("Write a quicksort algorithm in Python.", max_length=512)
+```
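The preset description and the removed model config (`num_experts: 128`, `top_k: 8`, `norm_top_k_prob: True`) describe sparse expert routing. The NumPy sketch that follows illustrates what those settings mean. It assumes a softmax router whose top-k probabilities are renormalized to sum to 1 when `norm_top_k_prob` is true. It is a simplified illustration, not the KerasHub implementation; `route_tokens` is a hypothetical helper, and the random logits stand in for real router outputs.

```python
import numpy as np


def route_tokens(router_logits: np.ndarray, top_k: int = 8, norm_top_k_prob: bool = True):
    """Hypothetical helper: pick top_k experts per token.

    router_logits has shape (num_tokens, num_experts).
    Returns (expert_ids, gate_weights), each of shape (num_tokens, top_k).
    """
    # Softmax over the expert axis (subtract the max for numerical stability).
    shifted = router_logits - router_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Indices of the top_k largest probabilities per token, highest first.
    expert_ids = np.argsort(-probs, axis=-1)[:, :top_k]
    gate_weights = np.take_along_axis(probs, expert_ids, axis=-1)

    # Assumed meaning of norm_top_k_prob=True: rescale the selected weights to sum to 1.
    if norm_top_k_prob:
        gate_weights = gate_weights / gate_weights.sum(axis=-1, keepdims=True)
    return expert_ids, gate_weights


rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 128))  # 4 tokens, 128 experts (random stand-in values)
ids, weights = route_tokens(logits, top_k=8)
print(ids.shape, weights.shape)  # (4, 8) (4, 8)
print(weights.sum(axis=-1))      # each row sums to ~1.0
```

Only the selected experts run for each token, which is why only a fraction of the total parameters (3.3B of 30.5B, per the preset table) is active per token.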
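For a rough sense of what the 256K-token context implies for memory, the key/value cache size can be estimated from the config values listed in the diff (`num_layers: 48`, `num_key_value_heads: 4`, `head_dim: 128`). This sketch assumes batch size 1, a bfloat16 cache (2 bytes per value), and one key and one value vector per KV head per layer. It ignores framework overhead, and `kv_cache_bytes` is a hypothetical helper, not a KerasHub API.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int = 48, num_key_value_heads: int = 4,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Hypothetical estimate of KV-cache size in bytes for a single sequence.

    Assumption: bfloat16 cache (bytes_per_value=2), batch size 1.
    """
    # Factor 2: one key and one value vector per KV head, per layer, per token.
    per_token = 2 * num_layers * num_key_value_heads * head_dim * bytes_per_value
    return num_tokens * per_token


GiB = 1024 ** 3
print(kv_cache_bytes(1) / 1024)              # 96.0 (KiB per token)
print(kv_cache_bytes(256 * 1024) / GiB)      # 24.0 (GiB at 256K tokens)
print(kv_cache_bytes(1024 * 1024) / GiB)     # 96.0 (GiB at 1M tokens)
print(kv_cache_bytes(256 * 1024, num_key_value_heads=32) / GiB)  # 192.0 (GiB with 32 KV heads)
```

The last line shows the saving from grouped-query attention: caching 4 KV heads instead of 32 shrinks the cache by a factor of 8. With the float32 dtype from the original config, each figure doubles.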