---
library_name: keras-hub
pipeline_tag: text-generation
---
### Model Overview

Qwen is the large language model and large multimodal model series of the Qwen Team, Alibaba Group. Both the language models and the multimodal models are pretrained on large-scale multilingual and multimodal data and post-trained on quality data to align them with human preferences. Qwen is capable of natural language understanding, text generation, vision understanding, audio understanding, tool use, role play, acting as an AI agent, and more.

The latest version, Qwen3, has the following features:

* **Dense and Mixture-of-Experts (MoE) models**, with dense variants in 0.6B, 1.7B, 4B, 8B, 14B, and 32B sizes and MoE variants in 30B and 235B sizes.
* **Seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose chat) within a single model, ensuring optimal performance across various scenarios.
* **Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
* **Superior human preference alignment**, excelling in creative writing, role-playing, multi-turn dialogue, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
* **Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
* **Support for 100+ languages and dialects**, with strong capabilities for **multilingual instruction following** and **translation**.

For more details, please refer to the Qwen [Blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

Weights and Keras model code are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).

## Links

* [Qwen 3 Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/qwen3-quickstart-notebook)
* [Qwen 3 API Documentation](https://keras.io/keras_hub/api/models/qwen3/)
* [Qwen 3 Model Card](https://qwenlm.github.io/blog/qwen3/)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```shell
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
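Keras 3 picks its backend from the `KERAS_BACKEND` environment variable, which must be set before `keras` is first imported. A minimal sketch (choose whichever of the three backends you have installed):

```python
import os

# Must run before the first `import keras`; the backend cannot be
# switched later in the same process.
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow", "torch"
```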

## Available Qwen3 Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset | Layers | Parameters | Description |
|--------|--------|------------|-------------|
| `qwen3_0.6b_en` | 28 | 596M | Smallest model, optimized for efficiency |
| `qwen3_1.7b_en` | 28 | 1.72B | Lightweight model with a good balance |
| `qwen3_4b_en` | 36 | 4.02B | Medium model with improved reasoning |
| `qwen3_8b_en` | 36 | 8.19B | Large model with enhanced capabilities |
| `qwen3_14b_en` | 40 | 14.77B | High-performance model with advanced features |
| `qwen3_32b_en` | 64 | 32.76B | Largest model with state-of-the-art performance |

## Example Usage
```python
import keras
import keras_hub
import numpy as np
```
```python
# Load the pre-trained Qwen3 model
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_8b_en")

# Generate text from a prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)

# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
    print(f"Prompt: {prompt}")
    print(f"Response: {response}\n")
```
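Under the hood, `generate()` is an autoregressive loop: the model scores the next token, one token is chosen and appended, and the loop repeats until `max_length`. A toy sketch of the greedy variant, with a stand-in scoring function (`next_token_logits` is a hypothetical placeholder, not a KerasHub API):

```python
def generate_greedy(next_token_logits, prompt_ids, max_length):
    """Toy autoregressive decoding: repeatedly append the argmax token.

    `next_token_logits` is any callable mapping the running token list
    to a list of scores over the vocabulary (a stand-in for the model).
    """
    ids = list(prompt_ids)
    while len(ids) < max_length:
        scores = next_token_logits(ids)
        ids.append(max(range(len(scores)), key=scores.__getitem__))
    return ids

# Toy "model" over a 4-token vocabulary: always prefers the token
# that follows the last one, modulo 4.
toy = lambda ids: [1.0 if t == (ids[-1] + 1) % 4 else 0.0 for t in range(4)]
generate_greedy(toy, [0], max_length=5)  # -> [0, 1, 2, 3, 0]
```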

## Custom Sampling Strategies

```python
# Greedy sampling (default)
qwen3_lm.compile(sampler="greedy")
response = qwen3_lm.generate("Explain quantum computing", max_length=100)

# Top-k sampling
qwen3_lm.compile(sampler="top_k")
response = qwen3_lm.generate("Write a story about", max_length=80)

# Beam search
qwen3_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=4))
response = qwen3_lm.generate("The best way to learn programming is", max_length=60)
```
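For intuition, here is what a `top_k` sampler does, as a small self-contained sketch in plain Python (independent of KerasHub's actual implementation): only the k highest-scoring logits survive, they are softmax-normalized, and one of them is drawn at random.

```python
import math
import random

def top_k_sample(logits, k, rng=random.Random(0)):
    """Sample a token ID from the k highest-scoring logits."""
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the survivors (shifted by the max for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # Draw one surviving token ID in proportion to its weight.
    return rng.choices(top, weights=weights, k=1)[0]

logits = [0.1, 2.0, -1.0, 3.5, 0.7]
token = top_k_sample(logits, k=2)  # only IDs 3 and 1 can ever be chosen
```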

## Fine-tuning with LoRA

```python
# Enable LoRA for efficient fine-tuning
qwen3_lm.backbone.enable_lora(rank=8)

# Prepare training data
training_texts = [
    "The quick brown fox jumped over the lazy dog.",
    "Machine learning is a subset of artificial intelligence.",
    "Python is a popular programming language for data science.",
    "Deep learning models require large amounts of training data.",
    "Natural language processing helps computers understand human language.",
]

# Compile for training
qwen3_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(1e-4),
    metrics=["accuracy"],
)

# Fine-tune the model
qwen3_lm.fit(x=training_texts, batch_size=2, epochs=3)

# Generate with the fine-tuned model
response = qwen3_lm.generate("The importance of", max_length=50)
print(response)
```
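For intuition, `enable_lora(rank=8)` freezes each targeted weight matrix W and trains only a low-rank correction: the layer computes y = Wx + scale * B(Ax), where A and B are small rank-r matrices. A toy numeric sketch in plain Python (the `alpha`/`rank` scaling convention here is illustrative, not KerasHub's exact internals):

```python
def matvec(M, x):
    return [sum(w * v for w, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=2.0, rank=1):
    """Frozen dense layer plus a trainable low-rank update:
    y = W x + (alpha / rank) * B (A x)."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight (identity here)
A = [[1.0, 1.0]]              # rank-1 down-projection (1 x 2)
B = [[0.5], [0.5]]            # rank-1 up-projection (2 x 1)
lora_forward(W, A, B, [1.0, 2.0])  # -> [4.0, 5.0]
```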

## Custom Backbone Configuration

```python
# Create a custom Qwen3 backbone
backbone = keras_hub.models.Qwen3Backbone(
    vocabulary_size=151936,
    num_layers=12,  # Smaller model for faster training
    num_query_heads=16,
    num_key_value_heads=8,
    head_dim=128,
    hidden_dim=1024,
    intermediate_dim=2048,
    layer_norm_epsilon=1e-6,
    dropout=0.1,
    dtype="float32",
)

# Create the tokenizer first
tokenizer = keras_hub.models.Qwen3Tokenizer.from_preset("qwen3_8b_en")

# Create a preprocessor with the tokenizer
preprocessor = keras_hub.models.Qwen3CausalLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=512,
)

# Create the custom causal LM
custom_qwen3 = keras_hub.models.Qwen3CausalLM(
    backbone=backbone,
    preprocessor=preprocessor,
)

# Compile and train
custom_qwen3.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(1e-4),
)

# Training data
texts = ["Hello world", "How are you", "Machine learning"]
custom_qwen3.fit(x=texts, batch_size=2, epochs=1)
```
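The `num_query_heads=16`, `num_key_value_heads=8` pair above configures grouped-query attention: every two query heads share one key/value head, so the KV cache at generation time shrinks accordingly. A back-of-the-envelope sketch (value counts only; batch size and dtype omitted):

```python
def kv_cache_values(num_kv_heads, head_dim, num_layers, seq_len):
    # Per position, the cache stores one K and one V vector per
    # key/value head, in every layer.
    return 2 * num_layers * seq_len * num_kv_heads * head_dim

# Full multi-head attention: one K/V head per query head (16).
mha = kv_cache_values(num_kv_heads=16, head_dim=128, num_layers=12, seq_len=512)
# Grouped-query attention as configured above: 8 K/V heads.
gqa = kv_cache_values(num_kv_heads=8, head_dim=128, num_layers=12, seq_len=512)
print(mha // gqa)  # -> 2: the cache halves
```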

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```
```python
# Load the pre-trained Qwen3 model from the Hugging Face Hub
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("hf://keras/qwen3_8b_en")

# Generate text from a prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)

# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
    print(f"Prompt: {prompt}")
    print(f"Response: {response}\n")
```
|