|
|
--- |
|
|
library_name: transformers |
|
|
license: apache-2.0 |
|
|
license_link: https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE |
|
|
pipeline_tag: text-generation |
|
|
base_model: |
|
|
- Qwen/Qwen3-14B-Base |
|
|
--- |
|
|
|
|
|
# Qwen3-14B |
|
|
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;"> |
|
|
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/> |
|
|
</a> |
|
|
|
|
|
## Qwen3-14B-Instruct Highlights |
|
|
|
|
|
OpenPipe/Qwen3-14B-Instruct is a finetuning-friendly instruct variant of Qwen3-14B. The Qwen3 release does not include a 14B Instruct (non-thinking) model; this fork introduces an updated chat template that makes Qwen3-14B non-thinking by default and highly compatible with OpenPipe and other finetuning frameworks.
|
|
|
|
|
The default Qwen3 chat template does not render `<think></think>` tags on previous assistant messages, which can lead to inconsistencies between training and generation. This version resolves that issue by adding empty `<think></think>` tags to all assistant turns in both the prompt and generation templates, ensuring a consistent message format during training and inference.
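To illustrate the behavior described above, here is a minimal sketch of how the updated template renders a conversation. The exact token layout (`<|im_start|>`/`<|im_end|>` markers, whitespace inside the think block) is an assumption based on the standard Qwen3 format; the authoritative definition is the Jinja template shipped in this repository's tokenizer config.

```python
# Hedged sketch (not the actual Jinja template): every assistant turn,
# including the generation prompt, carries an explicit empty <think></think>
# block, so the format seen in training matches the format at inference.
def render(messages, add_generation_prompt=True):
    out = []
    for m in messages:
        if m["role"] == "assistant":
            # Empty think block rendered on prior assistant turns too.
            out.append(
                f"<|im_start|>assistant\n<think>\n\n</think>\n\n{m['content']}<|im_end|>\n"
            )
        else:
            out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Generation prompt also opens with the empty think block,
        # which is what makes the model non-thinking by default.
        out.append("<|im_start|>assistant\n<think>\n\n</think>\n\n")
    return "".join(out)
```

In practice you would not call a helper like this directly; `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` with this repository's tokenizer produces the canonical rendering.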
|
|
|
|
|
The model retains the strong general capabilities of Qwen3-14B while providing a more finetuning-friendly chat template.
|
|
|
|
|
## Model Overview |
|
|
|
|
|
**Qwen3-14B** has the following features: |
|
|
- Type: Causal Language Models |
|
|
- Training Stage: Pretraining & Post-training |
|
|
- Number of Parameters: 14.8B |
|
|
- Number of Parameters (Non-Embedding): 13.2B
|
|
- Number of Layers: 40 |
|
|
- Number of Attention Heads (GQA): 40 for Q and 8 for KV |
|
|
- Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts). |
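Reaching the 131,072-token context mentioned above requires enabling YaRN RoPE scaling. The sketch below shows the `rope_scaling` fields as documented for Qwen3; treat the exact keys as an assumption and check them against your `transformers` version and the official Qwen3 long-context instructions before use.

```python
# Hedged sketch: the rope_scaling entry typically added to config.json
# (or passed as a generation/serving override) to enable YaRN.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

# The extended window is the native context multiplied by the scaling factor.
max_context = int(
    rope_scaling["original_max_position_embeddings"] * rope_scaling["factor"]
)
print(max_context)  # 131072
```

Note that static YaRN scaling applies the factor regardless of input length, which can degrade quality on short inputs, so it is usually enabled only when long contexts are actually needed.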
|
|
|
|
|
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/). |
|
|
|