Model Overview
Model Summary
Qwen is the series of large language models and large multimodal models developed by the Qwen Team, Alibaba Group. Both the language models and the multimodal models are pretrained on large-scale multilingual and multimodal data and post-trained on high-quality data to align with human preferences. Qwen is capable of natural language understanding, text generation, vision understanding, audio understanding, tool use, role play, acting as an AI agent, and more.
The latest version, Qwen3, has the following features:
Dense and Mixture-of-Experts (MoE) models, available in 0.6B, 1.7B, 4B, 8B, 14B, 30B, 32B, and 235B sizes
Seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose chat) within a single model, ensuring optimal performance across various scenarios.
Significant enhancement in reasoning capabilities, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.
For more details, please refer to Qwen Blog, GitHub, and Documentation.
Weights and Keras model code are released under the Apache 2.0 License.
Links
- Qwen 3 Quickstart Notebook
- Qwen 3 API Documentation
- Qwen 3 Model Card
- KerasHub Beginner Guide
- KerasHub Model Publishing Guide
Installation
Keras and KerasHub can be installed with:
pip install -U -q keras-hub
pip install -U -q keras
JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.
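Keras 3 selects its backend from the `KERAS_BACKEND` environment variable, read once at import time. A minimal sketch of picking a backend before importing Keras:

```python
import os

# Keras 3 reads KERAS_BACKEND when `keras` is first imported,
# so set it beforehand; valid values include "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"
# import keras  # would now initialize with the JAX backend
```

If the variable is unset, Keras falls back to its default backend, so setting it explicitly makes notebooks reproducible across environments.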
Available Qwen3 Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
| Preset | Layers | Parameters | Description |
|---|---|---|---|
| Qwen3-0.6B | 28 | 596M | Smallest model, optimized for efficiency |
| Qwen3-1.7B | 28 | 1.72B | Lightweight model with a good balance |
| Qwen3-4B | 36 | 4.02B | Medium model with improved reasoning |
| Qwen3-8B | 36 | 8.19B | Large model with enhanced capabilities |
| Qwen3-14B | 40 | 14.77B | High-performance model with advanced features |
| Qwen3-32B | 64 | 32.76B | Largest model with state-of-the-art performance |
Example Usage
import keras
import keras_hub
import numpy as np
# Load pre-trained Qwen3 model
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_4b_en")
# Generate text from prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)
# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
print(f"Prompt: {prompt}")
print(f"Response: {response}\n")
Custom Sampling Strategies
# Greedy sampling (default)
qwen3_lm.compile(sampler="greedy")
response = qwen3_lm.generate("Explain quantum computing", max_length=100)
# Top-k sampling
qwen3_lm.compile(sampler="top_k")
response = qwen3_lm.generate("Write a story about", max_length=80)
# Beam search
qwen3_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=4))
response = qwen3_lm.generate("The best way to learn programming is", max_length=60)
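To make the samplers above concrete, here is a stdlib-only sketch of the top-k idea, independent of KerasHub: keep only the k highest-logit token ids, renormalize with a softmax over that subset, and draw one id. The logits and ids are made up for illustration.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to that subset, then one weighted draw.
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return rng.choices(top, weights=[e / total for e in exps], k=1)[0]

logits = [2.0, 0.5, 1.5, -1.0, 0.0]
token = top_k_sample(logits, k=2)
# token is always index 0 or 2, the two highest-logit positions
```

Greedy sampling is the k=1 special case; beam search instead keeps several candidate sequences alive and scores them jointly.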
Fine-tuning with LoRA
# Enable LoRA for efficient fine-tuning
qwen3_lm.backbone.enable_lora(rank=8)
# Prepare training data
training_texts = [
"The quick brown fox jumped over the lazy dog.",
"Machine learning is a subset of artificial intelligence.",
"Python is a popular programming language for data science.",
"Deep learning models require large amounts of training data.",
"Natural language processing helps computers understand human language."
]
# Compile for training
qwen3_lm.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.Adam(1e-4),
metrics=["accuracy"]
)
# Fine-tune the model
qwen3_lm.fit(x=training_texts, batch_size=2, epochs=3)
# Generate with fine-tuned model
response = qwen3_lm.generate("The importance of", max_length=50)
print(response)
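Why `rank=8` is cheap: LoRA freezes each large weight matrix and trains only a low-rank update A·B. A back-of-the-envelope count, using a hypothetical 4096x4096 layer (Qwen3 layer shapes vary):

```python
# Trainable-parameter arithmetic for a rank-8 LoRA on a d x d weight.
d, rank = 4096, 8
full_update = d * d          # a dense delta-W: 16,777,216 params
lora_update = 2 * d * rank   # A (d x r) plus B (r x d): 65,536 params
ratio = full_update // lora_update  # 256x fewer trainable parameters
```

The frozen base weights still dominate memory, but the optimizer state and gradients only cover the small A and B factors.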
Custom Backbone Configuration
# Create custom Qwen3 backbone
backbone = keras_hub.models.Qwen3Backbone(
vocabulary_size=151936,
num_layers=12, # Smaller model for faster training
num_query_heads=16,
num_key_value_heads=8,
head_dim=128,
hidden_dim=1024,
intermediate_dim=2048,
layer_norm_epsilon=1e-6,
dropout=0.1,
dtype="float32"
)
# Create tokenizer first
tokenizer = keras_hub.models.Qwen3Tokenizer.from_preset("qwen3_4b_en")
# Create preprocessor with tokenizer
preprocessor = keras_hub.models.Qwen3CausalLMPreprocessor(
tokenizer=tokenizer,
sequence_length=512
)
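Under the hood, a causal-LM preprocessor turns each text into next-token prediction pairs: the input is the token sequence and the label is the same sequence shifted by one position (then padded or truncated to `sequence_length`). A sketch with made-up token ids:

```python
# Hypothetical token ids for one training text.
tokens = [101, 7, 42, 9, 102]
features = tokens[:-1]  # what the model sees
labels = tokens[1:]     # next-token targets, shifted by one
```

This shift is why plain strings are enough as `x` in `fit`: the preprocessor builds both features and labels from them.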
# Create custom causal LM
custom_qwen3 = keras_hub.models.Qwen3CausalLM(
backbone=backbone,
preprocessor=preprocessor
)
# Compile and train
custom_qwen3.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.Adam(1e-4)
)
# Training data
texts = ["Hello world", "How are you", "Machine learning"]
custom_qwen3.fit(x=texts, batch_size=2, epochs=1)
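For sizing intuition, a rough parameter estimate for the custom backbone above can be computed from its configuration. This is a sketch only: it ignores norms, biases, and the output head, and assumes a gated three-projection MLP.

```python
# Rough parameter count for the 12-layer custom backbone configured above.
vocab, hidden, layers = 151936, 1024, 12
q_heads, kv_heads, head_dim, inter = 16, 8, 128, 2048

embedding = vocab * hidden
# Attention: query + output projections, then key + value (grouped-query).
attn = 2 * hidden * q_heads * head_dim + 2 * hidden * kv_heads * head_dim
# Gated MLP: gate, up, and down projections.
mlp = 3 * hidden * inter
total = embedding + layers * (attn + mlp)
# total comes out near 0.3B parameters, dominated by the embedding table
```

Note how much of a small model's budget the 151,936-entry vocabulary consumes; deeper models amortize it better.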
Example Usage with Hugging Face URI
import keras
import keras_hub
import numpy as np
# Load pre-trained Qwen3 model
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("hf://keras/qwen3_4b_en")
# Generate text from prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)
# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
print(f"Prompt: {prompt}")
print(f"Response: {response}\n")
All other workflows (custom sampling, LoRA fine-tuning, custom backbone configuration) are identical to the examples above; only the preset string changes to the `hf://keras/...` URI, e.g. `keras_hub.models.Qwen3Tokenizer.from_preset("hf://keras/qwen3_4b_en")`.