prasadsachin committed on
Commit 6afb189 · verified · 1 parent: cac9c38

Update README.md with new model card content

Files changed (1): README.md (+291 −21)

README.md CHANGED
@@ -2,24 +2,294 @@
  library_name: keras-hub
  pipeline_tag: text-generation
  ---
- This is a [`Qwen3` model](https://keras.io/api/keras_hub/models/qwen3) uploaded using the KerasHub library and can be used with JAX, TensorFlow, and PyTorch backends.
- This model is related to a `CausalLM` task.
-
- Model config:
- * **name:** qwen3_backbone
- * **trainable:** True
- * **vocabulary_size:** 151936
- * **num_layers:** 36
- * **num_query_heads:** 32
- * **hidden_dim:** 4096
- * **head_dim:** 128
- * **intermediate_dim:** 12288
- * **rope_max_wavelength:** 1000000
- * **rope_scaling_factor:** 1.0
- * **num_key_value_heads:** 8
- * **layer_norm_epsilon:** 1e-06
- * **dropout:** 0.0
- * **tie_word_embeddings:** False
- * **sliding_window_size:** None
-
- This model card has been generated automatically and should be completed by the model author. See [Model Cards documentation](https://huggingface.co/docs/hub/model-cards) for more information.
---
library_name: keras-hub
pipeline_tag: text-generation
---

## Model Overview

Qwen is the series of large language models and large multimodal models from the Qwen Team at Alibaba Group. Both the language models and the multimodal models are pretrained on large-scale multilingual and multimodal data and post-trained on high-quality data to align them with human preferences. Qwen is capable of natural language understanding, text generation, vision understanding, audio understanding, tool use, role play, acting as an AI agent, and more.

The latest version, Qwen3, has the following features:

**Dense and Mixture-of-Experts (MoE) models**, available in 0.6B, 1.7B, 4B, 8B, 14B, 30B, 32B, and 235B sizes.

**Seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose chat) within a single model, ensuring optimal performance across a wide range of scenarios.

**Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.

**Superior human preference alignment**, excelling in creative writing, role play, multi-turn dialogue, and instruction following, delivering a more natural, engaging, and immersive conversational experience.

**Expertise in agent capabilities**, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models on complex agent-based tasks.

**Support for 100+ languages and dialects** with strong capabilities for **multilingual instruction following** and **translation**.

For more details, please refer to the Qwen [Blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/keras-team/keras-hub/tree/master/keras_hub/src/models/qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

Both the weights and the Keras model code are released under the [Apache 2 License](https://github.com/keras-team/keras-hub/blob/master/LICENSE).

## Links

* [Qwen 3 Quickstart Notebook](https://www.kaggle.com/code/laxmareddypatlolla/qwen3-quickstart-notebook)
* [Qwen 3 API Documentation](https://keras.io/keras_hub/api/models/qwen3/)
* [Qwen 3 Model Card](https://qwenlm.github.io/blog/qwen3/)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```shell
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and Torch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.
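Keras 3 selects its backend from the `KERAS_BACKEND` environment variable, which must be set before `keras` is first imported. A minimal sketch (the variable name is standard Keras 3 API; pick whichever backend you have installed):

```python
import os

# Choose the backend before the first `import keras`.
# Valid values are "jax", "tensorflow", and "torch".
os.environ["KERAS_BACKEND"] = "jax"

# Subsequent imports, e.g. `import keras` and `import keras_hub`,
# will now run on the JAX backend.
```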

## Available Qwen3 Presets

The following model checkpoints are provided by the Keras team. Full code examples for each are available below.

| Preset | Layers | Parameters | Description |
|--------|--------|------------|-------------|
| `Qwen3-0.6B` | 28 | 596M | Smallest model, optimized for efficiency |
| `Qwen3-1.7B` | 28 | 1.72B | Lightweight model with a good balance |
| `Qwen3-4B` | 36 | 4.02B | Medium model with improved reasoning |
| `Qwen3-8B` | 36 | 8.19B | Large model with enhanced capabilities |
| `Qwen3-14B` | 40 | 14.77B | High-performance model with advanced features |
| `Qwen3-32B` | 64 | 32.76B | Largest model with state-of-the-art performance |
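As a rough rule of thumb for choosing a preset, the weights alone need about two bytes per parameter in half precision. The sketch below is a back-of-the-envelope estimate only; it ignores activations, the KV cache, and framework overhead:

```python
def approx_weight_gb(num_params, bytes_per_param=2):
    """Approximate half-precision weight memory in GiB."""
    return num_params * bytes_per_param / 1024**3

# E.g. the 8B preset: roughly 15 GiB of weights in float16/bfloat16.
print(round(approx_weight_gb(8.19e9), 1))  # -> 15.3
```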

## Example Usage

```python
import keras
import keras_hub
import numpy as np
```

```python
# Load the pre-trained Qwen3 model
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("qwen3_8b_en")

# Generate text from a prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)

# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
    print(f"Prompt: {prompt}")
    print(f"Response: {response}\n")
```

## Custom Sampling Strategies

```python
# Greedy sampling (default)
qwen3_lm.compile(sampler="greedy")
response = qwen3_lm.generate("Explain quantum computing", max_length=100)

# Top-k sampling
qwen3_lm.compile(sampler="top_k")
response = qwen3_lm.generate("Write a story about", max_length=80)

# Beam search
qwen3_lm.compile(sampler=keras_hub.samplers.BeamSampler(num_beams=4))
response = qwen3_lm.generate("The best way to learn programming is", max_length=60)
```
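To make the sampler choices above concrete, here is a toy, KerasHub-independent sketch of what a single decoding step does under greedy and top-k sampling (beam search additionally keeps several candidate sequences alive):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    # Greedy decoding: always take the highest-scoring token id.
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k(logits, k, rng=random):
    # Top-k sampling: keep the k best tokens, renormalize, then sample.
    best = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in best])
    return rng.choices(best, weights=probs, k=1)[0]

logits = [0.1, 2.5, 0.3, 1.9]
print(greedy(logits))                # -> 1
print(top_k(logits, k=2) in {1, 3})  # -> True
```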

## Fine-tuning with LoRA

```python
# Enable LoRA for parameter-efficient fine-tuning
qwen3_lm.backbone.enable_lora(rank=8)

# Prepare training data
training_texts = [
    "The quick brown fox jumped over the lazy dog.",
    "Machine learning is a subset of artificial intelligence.",
    "Python is a popular programming language for data science.",
    "Deep learning models require large amounts of training data.",
    "Natural language processing helps computers understand human language.",
]

# Compile for training
qwen3_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(1e-4),
    metrics=["accuracy"],
)

# Fine-tune the model
qwen3_lm.fit(x=training_texts, batch_size=2, epochs=3)

# Generate with the fine-tuned model
response = qwen3_lm.generate("The importance of", max_length=50)
print(response)
```
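For intuition on why `enable_lora(rank=8)` is parameter-efficient: LoRA freezes each adapted weight matrix W and trains only a low-rank update A @ B, so the trainable count per matrix drops from d_in * d_out to rank * (d_in + d_out). A quick illustrative calculation (the 4096 width matches the 8B backbone's hidden size; which matrices are adapted is an internal detail):

```python
def trainable_params(d_in, d_out, rank):
    # Full fine-tuning updates every entry of W.
    full = d_in * d_out
    # LoRA trains only A (d_in x rank) and B (rank x d_out).
    lora = rank * (d_in + d_out)
    return full, lora

full, lora = trainable_params(4096, 4096, rank=8)
print(full)          # -> 16777216
print(lora)          # -> 65536
print(full // lora)  # -> 256, i.e. ~256x fewer trainable parameters
```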

## Custom Backbone Configuration

```python
# Create a custom, smaller Qwen3 backbone
backbone = keras_hub.models.Qwen3Backbone(
    vocabulary_size=151936,
    num_layers=12,  # Fewer layers for faster training
    num_query_heads=16,
    num_key_value_heads=8,
    head_dim=128,
    hidden_dim=1024,
    intermediate_dim=2048,
    layer_norm_epsilon=1e-6,
    dropout=0.1,
    dtype="float32",
)

# Create the tokenizer first
tokenizer = keras_hub.models.Qwen3Tokenizer.from_preset("qwen3_8b_en")

# Create a preprocessor with the tokenizer
preprocessor = keras_hub.models.Qwen3CausalLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=512,
)

# Create the custom causal LM
custom_qwen3 = keras_hub.models.Qwen3CausalLM(
    backbone=backbone,
    preprocessor=preprocessor,
)

# Compile and train
custom_qwen3.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(1e-4),
)

# Training data
texts = ["Hello world", "How are you", "Machine learning"]
custom_qwen3.fit(x=texts, batch_size=2, epochs=1)
```
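In the configuration above, `num_query_heads=16` with `num_key_value_heads=8` is grouped-query attention: pairs of query heads share one key/value head, which halves the KV cache relative to full multi-head attention. A back-of-the-envelope sketch of that saving (illustrative arithmetic only, not KerasHub internals):

```python
def kv_cache_values(num_kv_heads, head_dim, seq_len, num_layers):
    # Keys and values each store seq_len * head_dim entries
    # per KV head in every layer.
    return 2 * num_kv_heads * head_dim * seq_len * num_layers

mha = kv_cache_values(num_kv_heads=16, head_dim=128, seq_len=512, num_layers=12)
gqa = kv_cache_values(num_kv_heads=8, head_dim=128, seq_len=512, num_layers=12)
print(mha // gqa)  # -> 2: halving the KV heads halves the cache
```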

## Example Usage with Hugging Face URI

```python
import keras
import keras_hub
import numpy as np
```

```python
# Load the pre-trained Qwen3 model from the Hugging Face Hub
qwen3_lm = keras_hub.models.Qwen3CausalLM.from_preset("hf://keras/qwen3_8b_en")

# Generate text from a prompt
response = qwen3_lm.generate("I want to learn about", max_length=50)
print(response)

# Batch generation with multiple prompts
prompts = ["The future of AI is", "Machine learning helps us"]
responses = qwen3_lm.generate(prompts, max_length=30)
for prompt, response in zip(prompts, responses):
    print(f"Prompt: {prompt}")
    print(f"Response: {response}\n")
```

The sampling, LoRA fine-tuning, and custom-backbone workflows shown above apply unchanged to models loaded via `hf://` URIs (for example, `Qwen3Tokenizer.from_preset("hf://keras/qwen3_8b_en")`).