Model Overview
Model Summary
Qwen is the series of large language models and large multimodal models developed by the Qwen Team, Alibaba Group. Both the language models and the multimodal models are pretrained on large-scale multilingual and multimodal data and post-trained on high-quality data to align with human preferences. Qwen is capable of natural language understanding, text generation, vision understanding, audio understanding, tool use, role play, acting as an AI agent, and more.
The latest version, Qwen3, has the following features:
Dense and Mixture-of-Experts (MoE) models, available in 0.6B, 1.7B, 4B, 8B, 14B, 30B, 32B, and 235B sizes
Seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose chat) within a single model, ensuring optimal performance across various scenarios.
Significant enhancement in reasoning capabilities, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
Support for 100+ languages and dialects, with strong capabilities for multilingual instruction following and translation.
For more details, please refer to Qwen Blog, GitHub, and Documentation.
Weights and Keras model code are released under the Apache 2.0 License.
Links
- Qwen 3 Quickstart Notebook
- Qwen 3 API Documentation
- Qwen 3 Model Card
- KerasHub Beginner Guide
- KerasHub Model Publishing Guide
Installation
Keras and KerasHub can be installed with:
pip install -U -q keras-hub
pip install -U -q keras
JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the Keras Getting Started page.
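Keras 3 selects its backend from the `KERAS_BACKEND` environment variable, read once at import time. A minimal sketch of picking a backend before importing Keras:

```python
import os

# Keras 3 reads KERAS_BACKEND when `keras` is first imported,
# so set it beforehand; valid values include "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"
# import keras  # would now initialize with the JAX backend
```

If the variable is unset, Keras falls back to its default backend, so setting it explicitly makes notebooks reproducible across environments.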
Available Qwen3 Presets
The following model checkpoints are provided by the Keras team. Full code examples for each are available below.
| Preset | Layers | Parameters | Description |
|---|---|---|---|
| Qwen3-0.6B | 28 | 596M | Smallest model, optimized for efficiency |
| Qwen3-1.7B | 28 | 1.72B | Lightweight model with a good balance |
| Qwen3-4B | 36 | 4.02B | Medium model with improved reasoning |
| Qwen3-8B | 36 | 8.19B | Large model with enhanced capabilities |
| Qwen3-14B | 40 | 14.77B | High-performance model with advanced features |
| Qwen3-32B | 64 | 32.76B | Largest model with state-of-the-art performance |
Example Usage
import keras
import keras_hub
import numpy as np
# Load pre-trained Qwen3 model
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_4b_en")
# Generate text from prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)
# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
print(f"Prompt: {prompt}")
print(f"Response: {response}\n")
Custom Sampling Strategies
# Greedy sampling (default)
qwen3_lm.compile(sampler="greedy")
response = qwen3_lm.generate("Explain quantum computing", max_length=100)
# Top-k sampling
qwen3_lm.compile(sampler="top_k")
response = qwen3_lm.generate("Write a story about", max_length=80)
# Beam search
qwen3_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=4))
response = qwen3_lm.generate("The best way to learn programming is", max_length=60)
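To make the samplers above concrete, here is a stdlib-only sketch of the top-k idea, independent of KerasHub: keep only the k highest-logit token ids, renormalize with a softmax over that subset, and draw one id. The logits and ids are made up for illustration.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to that subset, then one weighted draw.
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return rng.choices(top, weights=[e / total for e in exps], k=1)[0]

logits = [2.0, 0.5, 1.5, -1.0, 0.0]
token = top_k_sample(logits, k=2)
# token is always index 0 or 2, the two highest-logit positions
```

Greedy sampling is the k=1 special case; beam search instead keeps several candidate sequences alive and scores them jointly.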
Fine-tuning with LoRA
# Enable LoRA for efficient fine-tuning
qwen3_lm.backbone.enable_lora(rank=8)
# Prepare training data
training_texts = [
"The quick brown fox jumped over the lazy dog.",
"Machine learning is a subset of artificial intelligence.",
"Python is a popular programming language for data science.",
"Deep learning models require large amounts of training data.",
"Natural language processing helps computers understand human language."
]
# Compile for training
qwen3_lm.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.Adam(1e-4),
metrics=["accuracy"]
)
# Fine-tune the model
qwen3_lm.fit(x=training_texts, batch_size=2, epochs=3)
# Generate with fine-tuned model
response = qwen3_lm.generate("The importance of", max_length=50)
print(response)
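Why `rank=8` is cheap: LoRA freezes each large weight matrix and trains only a low-rank update A·B. A back-of-the-envelope count, using a hypothetical 4096x4096 layer (Qwen3 layer shapes vary):

```python
# Trainable-parameter arithmetic for a rank-8 LoRA on a d x d weight.
d, rank = 4096, 8
full_update = d * d          # a dense delta-W: 16,777,216 params
lora_update = 2 * d * rank   # A (d x r) plus B (r x d): 65,536 params
ratio = full_update // lora_update  # 256x fewer trainable parameters
```

The frozen base weights still dominate memory, but the optimizer state and gradients only cover the small A and B factors.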
Custom Backbone Configuration
# Create custom Qwen3 backbone
backbone = keras_hub.models.Qwen3Backbone(
vocabulary_size=151936,
num_layers=12, # Smaller model for faster training
num_query_heads=16,
num_key_value_heads=8,
head_dim=128,
hidden_dim=1024,
intermediate_dim=2048,
layer_norm_epsilon=1e-6,
dropout=0.1,
dtype="float32"
)
# Create tokenizer first
tokenizer = keras_hub.models.Qwen3Tokenizer.from_preset("qwen3_4b_en")
# Create preprocessor with tokenizer
preprocessor = keras_hub.models.Qwen3CausalLMPreprocessor(
tokenizer=tokenizer,
sequence_length=512
)
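Under the hood, a causal-LM preprocessor turns each text into next-token prediction pairs: the input is the token sequence and the label is the same sequence shifted by one position (then padded or truncated to `sequence_length`). A sketch with made-up token ids:

```python
# Hypothetical token ids for one training text.
tokens = [101, 7, 42, 9, 102]
features = tokens[:-1]  # what the model sees
labels = tokens[1:]     # next-token targets, shifted by one
```

This shift is why plain strings are enough as `x` in `fit`: the preprocessor builds both features and labels from them.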
# Create custom causal LM
custom_qwen3 = keras_hub.models.Qwen3CausalLM(
backbone=backbone,
preprocessor=preprocessor
)
# Compile and train
custom_qwen3.compile(
loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.Adam(1e-4)
)
# Training data
texts = ["Hello world", "How are you", "Machine learning"]
custom_qwen3.fit(x=texts, batch_size=2, epochs=1)
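For sizing intuition, a rough parameter estimate for the custom backbone above can be computed from its configuration. This is a sketch only: it ignores norms, biases, and the output head, and assumes a gated three-projection MLP.

```python
# Rough parameter count for the 12-layer custom backbone configured above.
vocab, hidden, layers = 151936, 1024, 12
q_heads, kv_heads, head_dim, inter = 16, 8, 128, 2048

embedding = vocab * hidden
# Attention: query + output projections, then key + value (grouped-query).
attn = 2 * hidden * q_heads * head_dim + 2 * hidden * kv_heads * head_dim
# Gated MLP: gate, up, and down projections.
mlp = 3 * hidden * inter
total = embedding + layers * (attn + mlp)
# total comes out near 0.3B parameters, dominated by the embedding table
```

Note how much of a small model's budget the 151,936-entry vocabulary consumes; deeper models amortize it better.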
Example Usage with Hugging Face URI
import keras
import keras_hub
import numpy as np
# Load pre-trained Qwen3 model
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("hf://keras/qwen3_4b_en")
# Generate text from prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)
# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
print(f"Prompt: {prompt}")
print(f"Response: {response}\n")
All other workflows (custom sampling, LoRA fine-tuning, custom backbone configuration) are identical to the examples above; only the preset string changes to the `hf://keras/...` URI, e.g. `keras_hub.models.Qwen3Tokenizer.from_preset("hf://keras/qwen3_4b_en")`.