---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
---
# Qwen3-32B
## Model Overview
Qwen3-32B has the following features:
- Type: Causal Language Model
- Training Stage: Pretraining & Post-training
- Number of Parameters: 32.8B
- Number of Parameters (Non-Embedding): 31.2B
- Number of Layers: 64
- Number of Attention Heads (GQA): 64 for Q and 8 for KV
- Context Length: 32,768 tokens natively, extendable to 131,072 tokens with YaRN.
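The extended 131,072-token context is reached by enabling YaRN RoPE scaling rather than by a different checkpoint. A minimal sketch of the relevant `config.json` addition, assuming the `rope_scaling` convention supported by recent `transformers` releases (the `factor` of 4.0 here is simply 131,072 / 32,768):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Note that this static form of YaRN applies the same scaling factor regardless of input length, so it is best enabled only when prompts actually approach or exceed the native 32,768-token window.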