Improve language tag

#1
by lbourdois - opened
Files changed (1)
  1. README.md +124 -112
README.md CHANGED
@@ -1,112 +1,124 @@
- ---
- license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
- language:
- - en
- pipeline_tag: text-generation
- base_model: Qwen/Qwen2.5-32B
- tags:
- - chat
- library_name: transformers
- ---
-
- # Qwen2.5-32B-Instruct
-
- ## Introduction
-
- Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
-
- - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
- - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- - **Long-context support** for up to 128K tokens, with generation of up to 8K tokens.
- - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
-
- **This repo contains the instruction-tuned 32B Qwen2.5 model**, which has the following features:
- - Type: Causal Language Models
- - Training Stage: Pretraining & Post-training
- - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- - Number of Parameters: 32.5B
- - Number of Parameters (Non-Embedding): 31.0B
- - Number of Layers: 64
- - Number of Attention Heads (GQA): 40 for Q and 8 for KV
- - Context Length: full 131,072 tokens, with generation of up to 8,192 tokens
- - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
-
- For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
-
- ## Requirements
-
- The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
-
- With `transformers<4.37.0`, you will encounter the following error:
- ```
- KeyError: 'qwen2'
- ```
-
- ## Quickstart
-
- The following code snippet uses `apply_chat_template` to show you how to load the tokenizer and model and how to generate content.
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "Qwen/Qwen2.5-32B-Instruct"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Give me a short introduction to large language models."
- messages = [
-     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=512
- )
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- ### Processing Long Texts
-
- The current `config.json` is set for a context length of up to 32,768 tokens.
- To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation that preserves performance on lengthy texts.
-
- For supported frameworks, you can add the following to `config.json` to enable YaRN:
- ```json
- {
-   ...,
-   "rope_scaling": {
-     "factor": 4.0,
-     "original_max_position_embeddings": 32768,
-     "type": "yarn"
-   }
- }
- ```
-
- For deployment, we recommend using vLLM.
- Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
- Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
- We advise adding the `rope_scaling` configuration only when processing long contexts is required.
-
- ## Evaluation & Performance
-
- Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5/).
-
- For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
-
+ ---
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ pipeline_tag: text-generation
+ base_model: Qwen/Qwen2.5-32B
+ tags:
+ - chat
+ library_name: transformers
+ ---
+
+ # Qwen2.5-32B-Instruct
+
+ ## Introduction
+
+ Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:
+
+ - Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
+ - Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
+ - **Long-context support** for up to 128K tokens, with generation of up to 8K tokens.
+ - **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
+
+ **This repo contains the instruction-tuned 32B Qwen2.5 model**, which has the following features:
+ - Type: Causal Language Models
+ - Training Stage: Pretraining & Post-training
+ - Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
+ - Number of Parameters: 32.5B
+ - Number of Parameters (Non-Embedding): 31.0B
+ - Number of Layers: 64
+ - Number of Attention Heads (GQA): 40 for Q and 8 for KV
+ - Context Length: full 131,072 tokens, with generation of up to 8,192 tokens
+ - Please refer to [this section](#processing-long-texts) for detailed instructions on how to deploy Qwen2.5 for handling long texts.
+
+ For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).
+
+ ## Requirements
+
+ The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
+
+ With `transformers<4.37.0`, you will encounter the following error:
+ ```
+ KeyError: 'qwen2'
+ ```
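+
+ If you want to confirm programmatically that the installed version is recent enough, a minimal check along these lines works (`packaging` is already a dependency of `transformers`):
+
+ ```python
+ import transformers
+ from packaging import version
+
+ # Qwen2 support landed in transformers 4.37.0, per the error above.
+ if version.parse(transformers.__version__) < version.parse("4.37.0"):
+     raise RuntimeError(
+         f"transformers {transformers.__version__} is too old for Qwen2.5; "
+         "please upgrade to >= 4.37.0."
+     )
+ ```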
+
+ ## Quickstart
+
+ The following code snippet uses `apply_chat_template` to show you how to load the tokenizer and model and how to generate content.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Qwen/Qwen2.5-32B-Instruct"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Give me a short introduction to large language models."
+ messages = [
+     {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512
+ )
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
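+
+ If you prefer to see tokens printed as they are generated, `transformers` also provides `TextStreamer`, which plugs directly into `generate`. A minimal sketch reusing `model`, `tokenizer`, and `model_inputs` from the snippet above:
+
+ ```python
+ from transformers import TextStreamer
+
+ # Prints decoded tokens to stdout as they are produced,
+ # skipping the prompt and special tokens.
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+ model.generate(
+     **model_inputs,
+     max_new_tokens=512,
+     streamer=streamer
+ )
+ ```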
96
+
97
+ ### Processing Long Texts
98
+
99
+ The current `config.json` is set for context length up to 32,768 tokens.
100
+ To handle extensive inputs exceeding 32,768 tokens, we utilize [YaRN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
101
+
102
+ For supported frameworks, you could add the following to `config.json` to enable YaRN:
103
+ ```json
104
+ {
105
+ ...,
106
+ "rope_scaling": {
107
+ "factor": 4.0,
108
+ "original_max_position_embeddings": 32768,
109
+ "type": "yarn"
110
+ }
111
+ }
112
+ ```
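+
+ If you would rather not edit `config.json` on disk, the same setting can also be applied at load time by modifying the loaded config. A minimal sketch (the on-disk edit above is the documented route; the `rope_scaling` keys must match it exactly):
+
+ ```python
+ from transformers import AutoConfig, AutoModelForCausalLM
+
+ model_name = "Qwen/Qwen2.5-32B-Instruct"
+
+ # Same YaRN settings as the JSON snippet above, set programmatically.
+ config = AutoConfig.from_pretrained(model_name)
+ config.rope_scaling = {
+     "factor": 4.0,
+     "original_max_position_embeddings": 32768,
+     "type": "yarn"
+ }
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     config=config,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ ```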
+
+ For deployment, we recommend using vLLM.
+ Please refer to our [Documentation](https://qwen.readthedocs.io/en/latest/deployment/vllm.html) for usage if you are not familiar with vLLM.
+ Presently, vLLM only supports static YaRN, which means the scaling factor remains constant regardless of input length, **potentially impacting performance on shorter texts**.
+ We advise adding the `rope_scaling` configuration only when processing long contexts is required.
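+
+ As a rough sketch of offline inference with vLLM (the sampling settings here are illustrative, and exact engine arguments can vary across vLLM versions; the linked documentation is authoritative):
+
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_name = "Qwen/Qwen2.5-32B-Instruct"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ llm = LLM(model=model_name)
+
+ # Build the chat-formatted prompt with the model's own template.
+ messages = [
+     {"role": "user", "content": "Give me a short introduction to large language models."}
+ ]
+ prompt = tokenizer.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+
+ outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=512))
+ print(outputs[0].outputs[0].text)
+ ```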
+
+ ## Evaluation & Performance
+
+ Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5/).
+
+ For requirements on GPU memory and the respective throughput, see results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).
+