jjw0126 committed on
Commit
fdc7bab
verified
1 Parent(s): 8c4b468

Upload folder using huggingface_hub

LICENSE.txt ADDED
@@ -0,0 +1,111 @@
1
+ LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
2
+ Llama 3.2 Version Release Date: September 25, 2024
3
+
4
+ “Agreement” means the terms and conditions for use, reproduction, distribution
5
+ and modification of the Llama Materials set forth herein.
6
+
7
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2
8
+ distributed by Meta at https://llama.meta.com/doc/overview.
9
+
10
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are
11
+ entering into this Agreement on such person or entity’s behalf), of the age required under
12
+ applicable laws, rules or regulations to provide legal consent and that has legal authority
13
+ to bind your employer or such other person or entity if you are entering in this Agreement
14
+ on their behalf.
15
+
16
+ “Llama 3.2” means the foundational large language models and software and algorithms, including
17
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
18
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
19
+ https://www.llama.com/llama-downloads.
20
+
21
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and
22
+ any portion thereof) made available under this Agreement.
23
+
24
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or,
25
+ if you are an entity, your principal place of business is in the EEA or Switzerland)
26
+ and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
27
+
28
+
29
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
30
+ you agree to be bound by this Agreement.
31
+
32
+
33
+ 1. License Rights and Redistribution.
34
+
35
+ a. Grant of Rights. You are granted a non-exclusive, worldwide,
36
+ non-transferable and royalty-free limited license under Meta’s intellectual property or other rights
37
+ owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works
38
+ of, and make modifications to the Llama Materials.
39
+
40
+ b. Redistribution and Use.
41
+
42
+ i. If you distribute or make available the Llama Materials (or any derivative works thereof),
43
+ or a product or service (including another AI model) that contains any of them, you shall (A) provide
44
+ a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama”
45
+ on a related website, user interface, blogpost, about page, or product documentation. If you use the
46
+ Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
47
+ otherwise improve an AI model, which is distributed or made available, you shall also include “Llama”
48
+ at the beginning of any such AI model name.
49
+
50
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
51
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
52
+
53
+ iii. You must retain in all copies of the Llama Materials that you distribute the
54
+ following attribution notice within a “Notice” text file distributed as a part of such copies:
55
+ “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms,
56
+ Inc. All Rights Reserved.”
57
+
58
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
59
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for
60
+ the Llama Materials (available at https://www.llama.com/llama3_2/use-policy), which is hereby
61
+ incorporated by reference into this Agreement.
62
+
63
+ 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users
64
+ of the products or services made available by or for Licensee, or Licensee’s affiliates,
65
+ is greater than 700 million monthly active users in the preceding calendar month, you must request
66
+ a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to
67
+ exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
68
+
69
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND
70
+ RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS
71
+ ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES
72
+ OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE
73
+ FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED
74
+ WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
75
+
76
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY,
77
+ WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT,
78
+ FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN
79
+ IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
80
+
81
+ 5. Intellectual Property.
82
+
83
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials,
84
+ neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates,
85
+ except as required for reasonable and customary use in describing and redistributing the Llama Materials or as
86
+ set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required
87
+ to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible
88
+ at https://about.meta.com/brand/resources/meta/company-brand/). All goodwill arising out of your use of the Mark
89
+ will inure to the benefit of Meta.
90
+
91
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any
92
+ derivative works and modifications of the Llama Materials that are made by you, as between you and Meta,
93
+ you are and will be the owner of such derivative works and modifications.
94
+
95
+ c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or
96
+ counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion
97
+ of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable
98
+ by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or
99
+ claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third
100
+ party arising out of or related to your use or distribution of the Llama Materials.
101
+
102
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access
103
+ to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms
104
+ and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this
105
+ Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3,
106
+ 4 and 7 shall survive the termination of this Agreement.
107
+
108
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of
109
+ California without regard to choice of law principles, and the UN Convention on Contracts for the International
110
+ Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of
111
+ any dispute arising out of this Agreement.
README.md ADDED
@@ -0,0 +1,473 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - de
5
+ - fr
6
+ - it
7
+ - pt
8
+ - hi
9
+ - es
10
+ - th
11
+ library_name: transformers
12
+ pipeline_tag: text-generation
13
+ tags:
14
+ - facebook
15
+ - meta
16
+ - pytorch
17
+ - llama
18
+ - llama-3
19
+ license: llama3.2
20
+ extra_gated_prompt: >-
21
+ ### LLAMA 3.2 COMMUNITY LICENSE AGREEMENT
22
+
23
+
24
+ Llama 3.2 Version Release Date: September 25, 2024
25
+
26
+
27
+ “Agreement” means the terms and conditions for use, reproduction, distribution
28
+ and modification of the Llama Materials set forth herein.
29
+
30
+
31
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.2
32
+ distributed by Meta at https://llama.meta.com/doc/overview.
33
+
34
+
35
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are
36
+ entering into this Agreement on such person or entity’s behalf), of the age required under
37
+ applicable laws, rules or regulations to provide legal consent and that has legal authority
38
+ to bind your employer or such other person or entity if you are entering in this Agreement
39
+ on their behalf.
40
+
41
+
42
+ “Llama 3.2” means the foundational large language models and software and algorithms, including
43
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
44
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
45
+ https://www.llama.com/llama-downloads.
46
+
47
+
48
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.2 and Documentation (and
49
+ any portion thereof) made available under this Agreement.
50
+
51
+
52
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or,
53
+ if you are an entity, your principal place of business is in the EEA or Switzerland)
54
+ and Meta Platforms, Inc. (if you are located outside of the EEA or Switzerland).
55
+
56
+
57
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
58
+ you agree to be bound by this Agreement.
59
+
60
+
61
+ 1. License Rights and Redistribution.
62
+
63
+ a. Grant of Rights. You are granted a non-exclusive, worldwide,
64
+ non-transferable and royalty-free limited license under Meta’s intellectual property or other rights
65
+ owned by Meta embodied in the Llama Materials to use, reproduce, distribute, copy, create derivative works
66
+ of, and make modifications to the Llama Materials.
67
+
68
+ b. Redistribution and Use.
69
+
70
+ i. If you distribute or make available the Llama Materials (or any derivative works thereof),
71
+ or a product or service (including another AI model) that contains any of them, you shall (A) provide
72
+ a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama”
73
+ on a related website, user interface, blogpost, about page, or product documentation. If you use the
74
+ Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
75
+ otherwise improve an AI model, which is distributed or made available, you shall also include “Llama”
76
+ at the beginning of any such AI model name.
77
+
78
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
79
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
80
+
81
+ iii. You must retain in all copies of the Llama Materials that you distribute the
82
+ following attribution notice within a “Notice” text file distributed as a part of such copies:
83
+ “Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms,
84
+ Inc. All Rights Reserved.”
85
+
86
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
87
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for
88
+ the Llama Materials (available at https://www.llama.com/llama3_2/use-policy), which is hereby
89
+ incorporated by reference into this Agreement.
90
+
91
+ 2. Additional Commercial Terms. If, on the Llama 3.2 version release date, the monthly active users
92
+ of the products or services made available by or for Licensee, or Licensee’s affiliates,
93
+ is greater than 700 million monthly active users in the preceding calendar month, you must request
94
+ a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to
95
+ exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
96
+
97
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY OUTPUT AND
98
+ RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, AND META DISCLAIMS
99
+ ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED, INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES
100
+ OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE
101
+ FOR DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND ASSUME ANY RISKS ASSOCIATED
102
+ WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND RESULTS.
103
+
104
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY,
105
+ WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING OUT OF THIS AGREEMENT,
106
+ FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN
107
+ IF META OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
108
+
109
+ 5. Intellectual Property.
110
+
111
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama Materials,
112
+ neither Meta nor Licensee may use any name or mark owned by or associated with the other or any of its affiliates,
113
+ except as required for reasonable and customary use in describing and redistributing the Llama Materials or as
114
+ set forth in this Section 5(a). Meta hereby grants you a license to use “Llama” (the “Mark”) solely as required
115
+ to comply with the last sentence of Section 1.b.i. You will comply with Meta’s brand guidelines (currently accessible
116
+ at https://about.meta.com/brand/resources/meta/company-brand/). All goodwill arising out of your use of the Mark
117
+ will inure to the benefit of Meta.
118
+
119
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with respect to any
120
+ derivative works and modifications of the Llama Materials that are made by you, as between you and Meta,
121
+ you are and will be the owner of such derivative works and modifications.
122
+
123
+ c. If you institute litigation or other proceedings against Meta or any entity (including a cross-claim or
124
+ counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.2 outputs or results, or any portion
125
+ of any of the foregoing, constitutes infringement of intellectual property or other rights owned or licensable
126
+ by you, then any licenses granted to you under this Agreement shall terminate as of the date such litigation or
127
+ claim is filed or instituted. You will indemnify and hold harmless Meta from and against any claim by any third
128
+ party arising out of or related to your use or distribution of the Llama Materials.
129
+
130
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this Agreement or access
131
+ to the Llama Materials and will continue in full force and effect until terminated in accordance with the terms
132
+ and conditions herein. Meta may terminate this Agreement if you are in breach of any term or condition of this
133
+ Agreement. Upon termination of this Agreement, you shall delete and cease use of the Llama Materials. Sections 3,
134
+ 4 and 7 shall survive the termination of this Agreement.
135
+
136
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of the State of
137
+ California without regard to choice of law principles, and the UN Convention on Contracts for the International
138
+ Sale of Goods does not apply to this Agreement. The courts of California shall have exclusive jurisdiction of
139
+ any dispute arising out of this Agreement.
140
+
141
+ ### Llama 3.2 Acceptable Use Policy
142
+
143
+ Meta is committed to promoting safe and fair use of its tools and features, including Llama 3.2.
144
+ If you access or use Llama 3.2, you agree to this Acceptable Use Policy (“**Policy**”).
145
+ The most recent copy of this policy can be found at
146
+ [https://www.llama.com/llama3_2/use-policy](https://www.llama.com/llama3_2/use-policy).
147
+
148
+ #### Prohibited Uses
149
+
150
+ We want everyone to use Llama 3.2 safely and responsibly. You agree you will not use, or allow others to use, Llama 3.2 to:
151
+
152
+ 1. Violate the law or others’ rights, including to:
153
+ 1. Engage in, promote, generate, contribute to, encourage, plan, incite, or further illegal or unlawful activity or content, such as:
154
+ 1. Violence or terrorism
155
+ 2. Exploitation or harm to children, including the solicitation, creation, acquisition, or dissemination of child exploitative content or failure to report Child Sexual Abuse Material
156
+ 3. Human trafficking, exploitation, and sexual violence
157
+ 4. The illegal distribution of information or materials to minors, including obscene materials, or failure to employ legally required age-gating in connection with such information or materials.
158
+ 5. Sexual solicitation
159
+ 6. Any other criminal activity
160
+ 1. Engage in, promote, incite, or facilitate the harassment, abuse, threatening, or bullying of individuals or groups of individuals
161
+ 2. Engage in, promote, incite, or facilitate discrimination or other unlawful or harmful conduct in the provision of employment, employment benefits, credit, housing, other economic benefits, or other essential goods and services
162
+ 3. Engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or related professional practices
163
+ 4. Collect, process, disclose, generate, or infer private or sensitive information about individuals, including information about individuals’ identity, health, or demographic information, unless you have obtained the right to do so in accordance with applicable law
164
+ 5. Engage in or facilitate any action or generate any content that infringes, misappropriates, or otherwise violates any third-party rights, including the outputs or results of any products or services using the Llama Materials
165
+ 6. Create, generate, or facilitate the creation of malicious code, malware, computer viruses or do anything else that could disable, overburden, interfere with or impair the proper working, integrity, operation or appearance of a website or computer system
166
+ 7. Engage in any action, or facilitate any action, to intentionally circumvent or remove usage restrictions or other safety measures, or to enable functionality disabled by Meta 
167
+ 2. Engage in, promote, incite, facilitate, or assist in the planning or development of activities that present a risk of death or bodily harm to individuals, including use of Llama 3.2 related to the following:
168
+ 8. Military, warfare, nuclear industries or applications, espionage, use for materials or activities that are subject to the International Traffic Arms Regulations (ITAR) maintained by the United States Department of State or to the U.S. Biological Weapons Anti-Terrorism Act of 1989 or the Chemical Weapons Convention Implementation Act of 1997
169
+ 9. Guns and illegal weapons (including weapon development)
170
+ 10. Illegal drugs and regulated/controlled substances
171
+ 11. Operation of critical infrastructure, transportation technologies, or heavy machinery
172
+ 12. Self-harm or harm to others, including suicide, cutting, and eating disorders
173
+ 13. Any content intended to incite or promote violence, abuse, or any infliction of bodily harm to an individual
174
+ 3. Intentionally deceive or mislead others, including use of Llama 3.2 related to the following:
175
+ 14. Generating, promoting, or furthering fraud or the creation or promotion of disinformation
176
+ 15. Generating, promoting, or furthering defamatory content, including the creation of defamatory statements, images, or other content
177
+ 16. Generating, promoting, or further distributing spam
178
+ 17. Impersonating another individual without consent, authorization, or legal right
179
+ 18. Representing that the use of Llama 3.2 or outputs are human-generated
180
+ 19. Generating or facilitating false online engagement, including fake reviews and other means of fake online engagement 
181
+ 4. Fail to appropriately disclose to end users any known dangers of your AI system
182
+ 5. Interact with third party tools, models, or software designed to generate unlawful content or engage in unlawful or harmful conduct and/or represent that the outputs of such tools, models, or software are associated with Meta or Llama 3.2
183
+
184
+
185
+ With respect to any multimodal models included in Llama 3.2, the rights granted under Section 1(a) of the Llama 3.2 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models.
186
+
187
+
188
+ Please report any violation of this Policy, software “bug,” or other problems that could lead to a violation of this Policy through one of the following means:
189
+
190
+
191
+ * Reporting issues with the model: [https://github.com/meta-llama/llama-models/issues](https://github.com/meta-llama/llama-models/issues)
192
+
193
+ * Reporting risky content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
194
+
195
+ * Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)
196
+
197
+ * Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama 3.2: LlamaUseReport@meta.com
198
+ extra_gated_fields:
199
+ First Name: text
200
+ Last Name: text
201
+ Date of birth: date_picker
202
+ Country: country
203
+ Affiliation: text
204
+ Job title:
205
+ type: select
206
+ options:
207
+ - Student
208
+ - Research Graduate
209
+ - AI researcher
210
+ - AI developer/engineer
211
+ - Reporter
212
+ - Other
213
+ geo: ip_location
214
+ By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
215
+ extra_gated_description: >-
216
+ The information you provide will be collected, stored, processed and shared in
217
+ accordance with the [Meta Privacy
218
+ Policy](https://www.facebook.com/privacy/policy/).
219
+ extra_gated_button_content: Submit
220
+ ---
221
+
222
+ ## Model Information
223
+
224
+ The Llama 3.2 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out). The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks. They outperform many of the available open-source and closed chat models on common industry benchmarks.
225
+
226
+ **Model Developer:** Meta
227
+
228
+ **Model Architecture:** Llama 3.2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
229
+
230
+ | | Training Data | Params | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count | Knowledge cutoff |
231
+ | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
232
+ | Llama 3.2 (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 128k | Yes | Yes | Up to 9T tokens | December 2023 |
233
+ | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | |
234
+ | Llama 3.2 Quantized (text only) | A new mix of publicly available online data. | 1B (1.23B) | Multilingual Text | Multilingual Text and code | 8k | Yes | Yes | Up to 9T tokens | December 2023 |
235
+ | | | 3B (3.21B) | Multilingual Text | Multilingual Text and code | | | | | |
236
+
237
+ **Supported Languages:** English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai are officially supported. Llama 3.2 has been trained on a broader collection of languages than these 8 supported languages. Developers may fine-tune Llama 3.2 models for languages beyond these supported languages, provided they comply with the Llama 3.2 Community License and the Acceptable Use Policy. Developers are always expected to ensure that their deployments, including those that involve additional languages, are completed safely and responsibly.
238
+
239
+ **Llama 3.2 Model Family:** Token counts refer to pretraining data only. All model versions use Grouped-Query Attention (GQA) for improved inference scalability.
240
+
241
+ **Model Release Date:** Sept 25, 2024
242
+
243
+ **Status:** This is a static model trained on an offline dataset. Future versions may be released that improve model capabilities and safety.
244
+
245
+ **License:** Use of Llama 3.2 is governed by the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE) (a custom, commercial license agreement).
246
+
247
+ **Feedback:** Instructions on how to provide feedback or comments on the model can be found in the Llama Models [README](https://github.com/meta-llama/llama-models/blob/main/README.md). For more technical information about generation parameters and recipes for how to use Llama 3.2 in applications, please go [here](https://github.com/meta-llama/llama-recipes).
248
+
249
+ ## Intended Use
250
+
251
+ **Intended Use Cases:** Llama 3.2 is intended for commercial and research use in multiple languages. Instruction-tuned text-only models are intended for assistant-like chat and agentic applications such as knowledge retrieval and summarization, mobile AI-powered writing assistants, and query and prompt rewriting. Pretrained models can be adapted for a variety of additional natural language generation tasks. Similarly, quantized models can be adapted for a variety of on-device use cases with limited compute resources.
252
+
253
+ **Out of Scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3.2 Community License. Use in languages beyond those explicitly referenced as supported in this model card.
254
+
255
+ ## How to use
256
+
257
+ This repository contains two versions of Llama-3.2-1B, for use with transformers and with the original `llama` codebase.
258
+
259
+ ### Use with transformers
260
+
261
+ Starting with transformers >= 4.43.0, you can run conversational inference using the Transformers pipeline abstraction or by leveraging the Auto classes with the generate() function.
262
+
263
+ Make sure to update your transformers installation via `pip install --upgrade transformers`.
264
+
265
+ ```python
266
+ import torch
267
+ from transformers import pipeline
268
+
269
+ model_id = "meta-llama/Llama-3.2-1B"
270
+
271
+ pipe = pipeline(
272
+ "text-generation",
273
+ model=model_id,
274
+ torch_dtype=torch.bfloat16,
275
+ device_map="auto"
276
+ )
277
+
278
+ pipe("The key to life is")
279
+ ```
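+
+ The Auto classes mentioned above can be used in the same way; the sketch below is a minimal example (the sampling parameters are illustrative assumptions, not Meta's recommended settings) that loads the checkpoint with `AutoModelForCausalLM` and calls `generate()` directly:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "meta-llama/Llama-3.2-1B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ # Tokenize a prompt and generate a continuation with the base (non-instruct) model.
+ inputs = tokenizer("The key to life is", return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.6, top_p=0.9)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```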
280
+
281
+ ### Use with `llama`
282
+
283
+ Please follow the instructions in the [repository](https://github.com/meta-llama/llama).
284
+
285
+ To download the original checkpoints, see the example command below using `huggingface-cli`:
286
+
287
+ ```
288
+ huggingface-cli download meta-llama/Llama-3.2-1B --include "original/*" --local-dir Llama-3.2-1B
289
+ ```
290
+
291
+ ## Hardware and Software
292
+
293
+ **Training Factors:** We used custom training libraries, Meta's custom built GPU cluster, and production infrastructure for pretraining. Fine-tuning, quantization, annotation, and evaluation were also performed on production infrastructure.
294
+
295
+ **Training Energy Use:** Training utilized a cumulative total of **916k** GPU hours of computation on H100-80GB (TDP of 700W) hardware, per the table below. Training time is the total GPU time required for training each model, and power consumption is the peak power capacity per GPU device used, adjusted for power usage efficiency.
296
+
297
+ **Training Greenhouse Gas Emissions:** Estimated total location-based greenhouse gas emissions were **240** tons CO2eq for training. Since 2020, Meta has maintained net zero greenhouse gas emissions in its global operations and matched 100% of its electricity use with renewable energy; therefore, the total market-based greenhouse gas emissions for training were 0 tons CO2eq.
298
+
299
+ | | Training Time (GPU hours) | Logit Generation Time (GPU Hours) | Training Power Consumption (W) | Training Location-Based Greenhouse Gas Emissions (tons CO2eq) | Training Market-Based Greenhouse Gas Emissions (tons CO2eq) |
300
+ | :---- | :---: | ----- | :---: | :---: | :---: |
301
+ | Llama 3.2 1B | 370k | \- | 700 | 107 | 0 |
302
+ | Llama 3.2 3B | 460k | \- | 700 | 133 | 0 |
303
+ | Llama 3.2 1B SpinQuant | 1.7 | 0 | 700 | *Negligible*\*\* | 0 |
304
+ | Llama 3.2 3B SpinQuant | 2.4 | 0 | 700 | *Negligible*\*\* | 0 |
305
+ | Llama 3.2 1B QLora | 1.3k | 0 | 700 | 0.381 | 0 |
306
+ | Llama 3.2 3B QLora | 1.6k | 0 | 700 | 0.461 | 0 |
307
+ | Total | 833k | 86k | | 240 | 0 |
308
+
309
+ \*\* The location-based CO2e emissions of Llama 3.2 1B SpinQuant and Llama 3.2 3B SpinQuant are less than 0.001 metric tonnes each. This is due to the minimal training GPU hours that are required.
310
+
311
+ The methodology used to determine training energy use and greenhouse gas emissions can be found [here](https://arxiv.org/pdf/2204.05149). Since Meta is openly releasing these models, the training energy use and greenhouse gas emissions will not be incurred by others.
312
+
313
+ ## Training Data
314
+
315
+ **Overview:** Llama 3.2 was pretrained on up to 9 trillion tokens of data from publicly available sources. For the 1B and 3B Llama 3.2 models, we incorporated logits from the Llama 3.1 8B and 70B models into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance. In post-training we used a similar recipe as Llama 3.1 and produced final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involved Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).
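+
+ As a rough illustration of the token-level distillation described above (a minimal sketch, not Meta's training code; the tensor shapes and temperature parameter are assumptions), the smaller model can be trained against the larger model's logits with a KL-divergence loss:
+
+ ```python
+ import torch.nn.functional as F
+
+ def distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
+     """KL divergence between teacher and student token distributions.
+
+     Both inputs have shape (batch, seq_len, vocab_size).
+     """
+     t = temperature
+     teacher_probs = F.softmax(teacher_logits / t, dim=-1)
+     student_log_probs = F.log_softmax(student_logits / t, dim=-1)
+     # KL(teacher || student); "batchmean" normalizes by the batch dimension.
+     return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)
+ ```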
316
+
317
+ **Data Freshness:** The pretraining data has a cutoff of December 2023\.
318
+
319
+ ## Quantization
320
+
321
+ ### Quantization Scheme
322
+
323
+ We designed the current quantization scheme with the [PyTorch ExecuTorch](https://github.com/pytorch/executorch) inference framework and Arm CPU backend in mind, taking into account metrics including model quality, prefill/decoding speed, and memory footprint. Our quantization scheme involves three parts:
324
+ - All linear layers in all transformer blocks are quantized to a 4-bit groupwise scheme (with a group size of 32) for weights and 8-bit per-token dynamic quantization for activations.
325
+ - The classification layer is quantized to 8-bit per-channel for weights, with 8-bit per-token dynamic quantization for activations.
326
+ - As with the classification layer, 8-bit per-channel quantization is used for the embedding layer.
327
+
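+ As an illustration of the weight scheme only (a toy sketch, not the ExecuTorch kernels; symmetric quantization and the [-8, 7] integer range are assumptions), the snippet below quantizes a 2-D weight to 4-bit values with one scale per group of 32 input channels:
+
+ ```python
+ import torch
+
+ def quantize_4bit_groupwise(weight: torch.Tensor, group_size: int = 32):
+     """Symmetric 4-bit groupwise quantization of a (out_features, in_features) weight."""
+     out_f, in_f = weight.shape
+     w = weight.reshape(out_f, in_f // group_size, group_size)
+     scales = w.abs().amax(dim=-1, keepdim=True) / 7.0   # one scale per group of 32
+     q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
+     return q.reshape(out_f, in_f), scales.squeeze(-1)
+
+ def dequantize_4bit_groupwise(q: torch.Tensor, scales: torch.Tensor, group_size: int = 32):
+     out_f, in_f = q.shape
+     wq = q.reshape(out_f, in_f // group_size, group_size).float()
+     return (wq * scales.unsqueeze(-1)).reshape(out_f, in_f)
+
+ w = torch.randn(64, 128)
+ q, s = quantize_4bit_groupwise(w)
+ print((w - dequantize_4bit_groupwise(q, s)).abs().max())  # reconstruction error
+ ```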
328
+
329
+ ### Quantization-Aware Training and LoRA
330
+
331
+ The quantization-aware training (QAT) with low-rank adaptation (LoRA) models went through only post-training stages, using the same data as the full-precision models. To initialize QAT, we utilize BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) and perform an additional full round of SFT training with QAT. We then freeze the backbone of the QAT model and perform another round of SFT with LoRA adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16. Because our approach is similar to QLoRA of Dettmers et al. (2023) (i.e., quantization followed by LoRA adapters), we refer to this method as QLoRA. Finally, we fine-tune the resulting model (both backbone and LoRA adaptors) using direct preference optimization (DPO).
332
+
333
+ ### SpinQuant
334
+
335
+ [SpinQuant](https://arxiv.org/abs/2405.16406) was applied, together with generative post-training quantization (GPTQ). For the SpinQuant rotation matrix fine-tuning, we optimized for 100 iterations, using 800 samples with sequence-length 2048 from the WikiText 2 dataset. For GPTQ, we used 128 samples from the same dataset with the same sequence-length.
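+
+ The rotational invariance that SpinQuant exploits can be shown with a toy example (a sketch of the underlying trick only, not the SpinQuant/GPTQ pipeline): folding an orthogonal matrix and its transpose into adjacent weight matrices leaves the network function unchanged, while the rotation can be chosen to suppress activation outliers before quantization.
+
+ ```python
+ import torch
+
+ torch.manual_seed(0)
+ d = 16
+ W1, W2 = torch.randn(d, d), torch.randn(d, d)
+ x = torch.randn(d)
+
+ # Random orthogonal matrix from a QR decomposition.
+ Q, _ = torch.linalg.qr(torch.randn(d, d))
+
+ baseline = W2 @ (W1 @ x)
+ # Fold Q into the weights: Q @ Q.T = I, so the output is unchanged.
+ rotated = (W2 @ Q) @ ((Q.T @ W1) @ x)
+
+ print(torch.allclose(baseline, rotated, atol=1e-4))  # True
+ ```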
336
+
337
+ ## Benchmarks \- English Text
338
+
339
+ In this section, we report the results for Llama 3.2 models on standard automatic benchmarks. For all these evaluations, we used our internal evaluations library.
340
+
341
+ ### Base Pretrained Models
342
+
343
+ | Category | Benchmark | \# Shots | Metric | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B |
344
+ | ----- | ----- | :---: | :---: | :---: | :---: | :---: |
345
+ | General | MMLU | 5 | macro\_avg/acc\_char | 32.2 | 58 | 66.7 |
346
+ | | AGIEval English | 3-5 | average/acc\_char | 23.3 | 39.2 | 47.8 |
347
+ | | ARC-Challenge | 25 | acc\_char | 32.8 | 69.1 | 79.7 |
348
+ | Reading comprehension | SQuAD | 1 | em | 49.2 | 67.7 | 77 |
349
+ | | QuAC (F1) | 1 | f1 | 37.9 | 42.9 | 44.9 |
350
+ | | DROP (F1) | 3 | f1 | 28.0 | 45.2 | 59.5 |
351
+ | Long Context | Needle in Haystack | 0 | em | 96.8 | 1 | 1 |
352
+
353
+ ### Instruction Tuned Models
354
+
355
+ | Capability | | Benchmark | \# Shots | Metric | Llama 3.2 1B bf16 | Llama 3.2 1B Vanilla PTQ\*\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B bf16 | Llama 3.2 3B Vanilla PTQ\*\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B |
356
+ | :---: | ----- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
357
+ | General | | MMLU | 5 | macro\_avg/acc | 49.3 | 43.3 | 47.3 | 49.0 | 63.4 | 60.5 | 62 | 62.4 | 69.4 |
358
+ | Re-writing | | Open-rewrite eval | 0 | micro\_avg/rougeL | 41.6 | 39.2 | 40.9 | 41.2 | 40.1 | 40.3 | 40.8 | 40.7 | 40.9 |
359
+ | Summarization | | TLDR9+ (test) | 1 | rougeL | 16.8 | 14.9 | 16.7 | 16.8 | 19.0 | 19.1 | 19.2 | 19.1 | 17.2 |
360
+ | Instruction following | | IFEval | 0 | Avg(Prompt/Instruction acc Loose/Strict) | 59.5 | 51.5 | 58.4 | 55.6 | 77.4 | 73.9 | 73.5 | 75.9 | 80.4 |
361
+ | Math | | GSM8K (CoT) | 8 | em\_maj1@1 | 44.4 | 33.1 | 40.6 | 46.5 | 77.7 | 72.9 | 75.7 | 77.9 | 84.5 |
362
+ | | | MATH (CoT) | 0 | final\_em | 30.6 | 20.5 | 25.3 | 31.0 | 48.0 | 44.2 | 45.3 | 49.2 | 51.9 |
363
+ | Reasoning | | ARC-C | 0 | acc | 59.4 | 54.3 | 57 | 60.7 | 78.6 | 75.6 | 77.6 | 77.6 | 83.4 |
364
+ | | | GPQA | 0 | acc | 27.2 | 25.9 | 26.3 | 25.9 | 32.8 | 32.8 | 31.7 | 33.9 | 32.8 |
365
+ | | | Hellaswag | 0 | acc | 41.2 | 38.1 | 41.3 | 41.5 | 69.8 | 66.3 | 68 | 66.3 | 78.7 |
366
+ | Tool Use | | BFCL V2 | 0 | acc | 25.7 | 14.3 | 15.9 | 23.7 | 67.0 | 53.4 | 60.1 | 63.5 | 67.1 |
367
+ | | | Nexus | 0 | macro\_avg/acc | 13.5 | 5.2 | 9.6 | 12.5 | 34.3 | 32.4 | 31.5 | 30.1 | 38.5 |
368
+ | Long Context | | InfiniteBench/En.QA | 0 | longbook\_qa/f1 | 20.3 | N/A | N/A | N/A | 19.8 | N/A | N/A | N/A | 27.3 |
369
+ | | | InfiniteBench/En.MC | 0 | longbook\_choice/acc | 38.0 | N/A | N/A | N/A | 63.3 | N/A | N/A | N/A | 72.2 |
370
+ | | | NIH/Multi-needle | 0 | recall | 75.0 | N/A | N/A | N/A | 84.7 | N/A | N/A | N/A | 98.8 |
371
+ | Multilingual | | MGSM (CoT) | 0 | em | 24.5 | 13.7 | 18.2 | 24.4 | 58.2 | 48.9 | 54.3 | 56.8 | 68.9 |
372
+
373
+ \*\*for comparison purposes only. Model not released.
374
+
375
+ ### Multilingual Benchmarks
376
+
377
+ | Category | Benchmark | Language | Llama 3.2 1B | Llama 3.2 1B Vanilla PTQ\*\* | Llama 3.2 1B Spin Quant | Llama 3.2 1B QLoRA | Llama 3.2 3B | Llama 3.2 3B Vanilla PTQ\*\* | Llama 3.2 3B Spin Quant | Llama 3.2 3B QLoRA | Llama 3.1 8B |
378
+ | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
379
+ | General | MMLU (5-shot, macro_avg/acc) | Portuguese | 39.8 | 34.9 | 38.9 | 40.2 | 54.5 | 50.9 | 53.3 | 53.4 | 62.1 |
380
+ | | | Spanish | 41.5 | 36.0 | 39.8 | 41.8 | 55.1 | 51.9 | 53.6 | 53.6 | 62.5 |
381
+ | | | Italian | 39.8 | 34.9 | 38.1 | 40.6 | 53.8 | 49.9 | 52.1 | 51.7 | 61.6 |
382
+ | | | German | 39.2 | 34.9 | 37.5 | 39.6 | 53.3 | 50.0 | 52.2 | 51.3 | 60.6 |
383
+ | | | French | 40.5 | 34.8 | 39.2 | 40.8 | 54.6 | 51.2 | 53.3 | 53.3 | 62.3 |
384
+ | | | Hindi | 33.5 | 30.0 | 32.1 | 34.0 | 43.3 | 40.4 | 42.0 | 42.1 | 50.9 |
385
+ | | | Thai | 34.7 | 31.2 | 32.4 | 34.9 | 44.5 | 41.3 | 44.0 | 42.2 | 50.3 |
386
+
387
+ \*\*for comparison purposes only. Model not released.
388
+
389
+ ## Inference time
390
+
391
+ In the table below, we compare the performance metrics of different quantization methods (SpinQuant and QAT \+ LoRA) with the BF16 baseline. The evaluation was done using the [ExecuTorch](https://github.com/pytorch/executorch) framework as the inference engine, with the Arm CPU backend, on an Android OnePlus 12 device.
392
+
393
+ | Category | Decode (tokens/sec) | Time-to-first-token (sec) | Prefill (tokens/sec) | Model size (PTE file size in MB) | Memory size (RSS in MB) |
394
+ | :---- | ----- | ----- | ----- | ----- | ----- |
395
+ | 1B BF16 (baseline) | 19.2 | 1.0 | 60.3 | 2358 | 3,185 |
396
+ | 1B SpinQuant | 50.2 (2.6x) | 0.3 (-76.9%) | 260.5 (4.3x) | 1083 (-54.1%) | 1,921 (-39.7%) |
397
+ | 1B QLoRA | 45.8 (2.4x) | 0.3 (-76.0%) | 252.0 (4.2x) | 1127 (-52.2%) | 2,255 (-29.2%) |
398
+ | 3B BF16 (baseline) | 7.6 | 3.0 | 21.2 | 6129 | 7,419 |
399
+ | 3B SpinQuant | 19.7 (2.6x) | 0.7 (-76.4%) | 89.7 (4.2x) | 2435 (-60.3%) | 3,726 (-49.8%) |
400
+ | 3B QLoRA | 18.5 (2.4x) | 0.7 (-76.1%) | 88.8 (4.2x) | 2529 (-58.7%) | 4,060 (-45.3%) |
401
+
402
+ (\*) The performance measurement is done using an adb binary-based approach.
403
+ (\*\*) It is measured on an Android OnePlus 12 device.
404
+ (\*\*\*) Time-to-first-token (TTFT) is measured with prompt length=64
405
+
406
+ *Footnote:*
407
+
408
+ - *Decode (tokens/second) measures how quickly the model keeps generating tokens. Higher is better.*
409
+ - *Time-to-first-token (TTFT) measures how fast the model generates the first token for a given prompt. Lower is better.*
410
+ - *Prefill is the inverse of TTFT (i.e., 1/TTFT) in tokens/second. Higher is better.*
411
+ - *Model size \- the size of the model, measured by the PTE file, a binary file format for ExecuTorch.*
412
+ - *RSS size \- memory usage measured as resident set size (RSS).*
413
+
414
+ ## Responsibility & Safety
415
+
416
+ As part of our Responsible release approach, we followed a three-pronged strategy to managing trust & safety risks:
417
+
418
+ 1. Enable developers to deploy helpful, safe and flexible experiences for their target audience and for the use cases supported by Llama
419
+ 2. Protect developers against adversarial users aiming to exploit Llama capabilities to potentially cause harm
420
+ 3. Provide protections for the community to help prevent the misuse of our models
421
+
422
+ ### Responsible Deployment
423
+
424
+ **Approach:** Llama is a foundational technology designed to be used in a variety of use cases. Examples of how Meta’s Llama models have been responsibly deployed can be found in our [Community Stories webpage](https://llama.meta.com/community-stories/). Our approach is to build the most helpful models, enabling the world to benefit from the technology’s power, by aligning our model safety for generic use cases and addressing a standard set of harms. Developers are then in the driver’s seat to tailor safety for their use cases, defining their own policies and deploying the models with the necessary safeguards in their Llama systems. Llama 3.2 was developed following the best practices outlined in our [Responsible Use Guide](https://llama.meta.com/responsible-use-guide/).
425
+
426
+ #### Llama 3.2 Instruct
427
+
428
+ **Objective:** Our main objectives for conducting safety fine-tuning are to provide the research community with a valuable resource for studying the robustness of safety fine-tuning, as well as to offer developers a readily available, safe, and powerful model for various applications to reduce the developer workload to deploy safe AI systems. We implemented the same set of safety mitigations as in Llama 3, and you can learn more about these in the Llama 3 [paper](https://ai.meta.com/research/publications/the-llama-3-herd-of-models/).
429
+
430
+ **Fine-Tuning Data:** We employ a multi-faceted approach to data collection, combining human-generated data from our vendors with synthetic data to mitigate potential safety risks. We’ve developed many large language model (LLM)-based classifiers that enable us to thoughtfully select high-quality prompts and responses, enhancing data quality control.
431
+
432
+ **Refusals and Tone:** Building on the work we started with Llama 3, we put a great emphasis on model refusals to benign prompts as well as refusal tone. We included both borderline and adversarial prompts in our safety data strategy, and modified our safety data responses to follow tone guidelines.
433
+
434
+ #### Llama 3.2 Systems
435
+
436
+ **Safety as a System:** Large language models, including Llama 3.2, **are not designed to be deployed in isolation** but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building agentic systems. Safeguards are key to achieve the right helpfulness-safety alignment as well as mitigating safety and security risks inherent to the system and any integration of the model or system with external tools. As part of our responsible release approach, we provide the community with [safeguards](https://llama.meta.com/trust-and-safety/) that developers should deploy with Llama models or other LLMs, including Llama Guard, Prompt Guard and Code Shield. All our [reference implementations](https://github.com/meta-llama/llama-agentic-system) demos contain these safeguards by default so developers can benefit from system-level safety out-of-the-box.
437
+
438
+ ### New Capabilities and Use Cases
439
+
440
+ **Technological Advancement:** Llama releases usually introduce new capabilities that require specific considerations in addition to the best practices that generally apply across all Generative AI use cases. For prior release capabilities also supported by Llama 3.2, see [Llama 3.1 Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md), as the same considerations apply here as well.
441
+
442
+ **Constrained Environments:** Llama 3.2 1B and 3B models are expected to be deployed in highly constrained environments, such as mobile devices. LLM Systems using smaller models will have a different alignment profile and safety/helpfulness tradeoff than more complex, larger systems. Developers should ensure the safety of their system meets the requirements of their use case. We recommend using lighter system safeguards for such use cases, like Llama Guard 3-1B or its mobile-optimized version.
443
+
444
+ ### Evaluations
445
+
446
+ **Scaled Evaluations:** We built dedicated, adversarial evaluation datasets and evaluated systems composed of Llama models and Purple Llama safeguards to filter input prompts and output responses. It is important to evaluate applications in context, and we recommend building a dedicated evaluation dataset for your use case.
447
+
448
+ **Red Teaming:** We conducted recurring red teaming exercises with the goal of discovering risks via adversarial prompting and we used the learnings to improve our benchmarks and safety tuning datasets. We partnered early with subject-matter experts in critical risk areas to understand the nature of these real-world harms and how such models may lead to unintended harm for society. Based on these conversations, we derived a set of adversarial goals for the red team to attempt to achieve, such as extracting harmful information or reprogramming the model to act in a potentially harmful capacity. The red team consisted of experts in cybersecurity, adversarial machine learning, responsible AI, and integrity in addition to multilingual content specialists with background in integrity issues in specific geographic markets.
449
+
450
+ ### Critical Risks
451
+
452
+ In addition to our safety work above, we took extra care on measuring and/or mitigating the following critical risk areas:
453
+
454
+ **1\. CBRNE (Chemical, Biological, Radiological, Nuclear, and Explosive Weapons):** Llama 3.2 1B and 3B models are smaller and less capable derivatives of Llama 3.1. For Llama 3.1 70B and 405B, to assess risks related to proliferation of chemical and biological weapons, we performed uplift testing designed to assess whether use of Llama 3.1 models could meaningfully increase the capabilities of malicious actors to plan or carry out attacks using these types of weapons and have determined that such testing also applies to the smaller 1B and 3B models.
455
+
456
+ **2\. Child Safety:** Child Safety risk assessments were conducted using a team of experts, to assess the model’s capability to produce outputs that could result in Child Safety risks and inform on any necessary and appropriate risk mitigations via fine tuning. We leveraged those expert red teaming sessions to expand the coverage of our evaluation benchmarks through Llama 3 model development. For Llama 3, we conducted new in-depth sessions using objective based methodologies to assess the model risks along multiple attack vectors including the additional languages Llama 3 is trained on. We also partnered with content specialists to perform red teaming exercises assessing potentially violating content while taking account of market specific nuances or experiences.
457
+
458
+ **3\. Cyber Attacks:** For Llama 3.1 405B, our cyber attack uplift study investigated whether LLMs can enhance human capabilities in hacking tasks, both in terms of skill level and speed.
459
+ Our attack automation study focused on evaluating the capabilities of LLMs when used as autonomous agents in cyber offensive operations, specifically in the context of ransomware attacks. This evaluation was distinct from previous studies that considered LLMs as interactive assistants. The primary objective was to assess whether these models could effectively function as independent agents in executing complex cyber-attacks without human intervention. Because Llama 3.2’s 1B and 3B models are smaller and less capable models than Llama 3.1 405B, we broadly believe that the testing conducted for the 405B model also applies to Llama 3.2 models.
460
+
461
+ ### Community
462
+
463
+ **Industry Partnerships:** Generative AI safety requires expertise and tooling, and we believe in the strength of the open community to accelerate its progress. We are active members of open consortiums, including the AI Alliance, Partnership on AI and MLCommons, actively contributing to safety standardization and transparency. We encourage the community to adopt taxonomies like the MLCommons Proof of Concept evaluation to facilitate collaboration and transparency on safety and content evaluations. Our Purple Llama tools are open sourced for the community to use and widely distributed across ecosystem partners including cloud service providers. We encourage community contributions to our [Github repository](https://github.com/meta-llama/PurpleLlama).
464
+
465
+ **Grants:** We also set up the [Llama Impact Grants](https://llama.meta.com/llama-impact-grants/) program to identify and support the most compelling applications of Meta’s Llama model for societal benefit across three categories: education, climate and open innovation. The 20 finalists from the hundreds of applications can be found [here](https://llama.meta.com/llama-impact-grants/#finalists).
466
+
467
+ **Reporting:** Finally, we put in place a set of resources including an [output reporting mechanism](https://developers.facebook.com/llama_output_feedback) and [bug bounty program](https://www.facebook.com/whitehat) to continuously improve the Llama technology with the help of the community.
468
+
469
+ ## Ethical Considerations and Limitations
470
+
471
+ **Values:** The core values of Llama 3.2 are openness, inclusivity and helpfulness. It is meant to serve everyone, and to work for a wide range of use cases. It is thus designed to be accessible to people across many different backgrounds, experiences and perspectives. Llama 3.2 addresses users and their needs as they are, without inserting unnecessary judgment or normativity, while reflecting the understanding that even content that may appear problematic in some cases can serve valuable purposes in others. It respects the dignity and autonomy of all users, especially in terms of the values of free thought and expression that power innovation and progress.
472
+
473
+ **Testing:** Llama 3.2 is a new technology, and like any new technology, there are risks associated with its use. Testing conducted to date has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 3.2’s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts. Therefore, before deploying any applications of Llama 3.2 models, developers should perform safety testing and tuning tailored to their specific applications of the model. Please refer to available resources including our [Responsible Use Guide](https://llama.meta.com/responsible-use-guide), [Trust and Safety](https://llama.meta.com/trust-and-safety/) solutions, and other [resources](https://llama.meta.com/docs/get-started/) to learn more about responsible development.
config.json ADDED
@@ -0,0 +1,35 @@
1
+ {
2
+ "architectures": [
3
+ "LlamaForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_llama.LlamaConfig",
9
+ "AutoModel": "modeling_llama.LlamaModel",
10
+ "AutoModelForCausalLM": "modeling_llama.LlamaForCausalLM"
11
+ },
12
+ "bos_token_id": 128000,
13
+ "eos_token_id": 128001,
14
+ "head_dim": 64,
15
+ "hidden_act": "silu",
16
+ "hidden_size": 1280,
17
+ "initializer_range": 0.02,
18
+ "intermediate_size": 3584,
19
+ "max_position_embeddings": 2048,
20
+ "mlp_bias": false,
21
+ "num_attention_heads": 20,
22
+ "num_hidden_layers": 54,
23
+ "num_query_heads": 5,
24
+ "query_dim": 64,
25
+ "num_key_heads": 1,
26
+ "key_dim": 64,
27
+ "num_value_heads": 1,
28
+ "value_dim": 128,
29
+ "pretraining_tp": 1,
30
+ "rms_norm_eps": 1e-05,
31
+ "tie_word_embeddings": true,
32
+ "torch_dtype": "bfloat16",
33
+ "use_cache": true,
34
+ "vocab_size": 128256
35
+ }
configuration.json ADDED
@@ -0,0 +1 @@
1
+ {}
configuration_llama.py ADDED
@@ -0,0 +1,206 @@
1
+ # coding=utf-8
2
+ # Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ """LLaMA model configuration"""
21
+
22
+ from transformers.configuration_utils import PretrainedConfig
23
+ from transformers.modeling_rope_utils import rope_config_validation
24
+
25
+
26
+ class LlamaConfig(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a [`LlamaModel`]. It is used to instantiate an LLaMA
29
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
30
+ defaults will yield a similar configuration to that of the LLaMA-7B.
31
+
32
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
33
+ documentation from [`PretrainedConfig`] for more information.
34
+
35
+
36
+ Args:
37
+ vocab_size (`int`, *optional*, defaults to 32000):
38
+ Vocabulary size of the LLaMA model. Defines the number of different tokens that can be represented by the
39
+ `inputs_ids` passed when calling [`LlamaModel`]
40
+ hidden_size (`int`, *optional*, defaults to 4096):
41
+ Dimension of the hidden representations.
42
+ intermediate_size (`int`, *optional*, defaults to 11008):
43
+ Dimension of the MLP representations.
44
+ num_hidden_layers (`int`, *optional*, defaults to 32):
45
+ Number of hidden layers in the Transformer decoder.
46
+ num_attention_heads (`int`, *optional*, defaults to 32):
47
+ Number of attention heads for each attention layer in the Transformer decoder.
48
+ num_key_value_heads (`int`, *optional*):
49
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
50
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
51
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
52
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
53
+ by meanpooling all the original heads within that group. For more details checkout [this
54
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
55
+ `num_attention_heads`.
56
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
57
+ The non-linear activation function (function or string) in the decoder.
58
+ max_position_embeddings (`int`, *optional*, defaults to 2048):
59
+ The maximum sequence length that this model might ever be used with. Llama 1 supports up to 2048 tokens,
60
+ Llama 2 up to 4096, CodeLlama up to 16384.
61
+ initializer_range (`float`, *optional*, defaults to 0.02):
62
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
63
+ rms_norm_eps (`float`, *optional*, defaults to 1e-06):
64
+ The epsilon used by the rms normalization layers.
65
+ use_cache (`bool`, *optional*, defaults to `True`):
66
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
67
+ relevant if `config.is_decoder=True`.
68
+ pad_token_id (`int`, *optional*):
69
+ Padding token id.
70
+ bos_token_id (`int`, *optional*, defaults to 1):
71
+ Beginning of stream token id.
72
+ eos_token_id (`int`, *optional*, defaults to 2):
73
+ End of stream token id.
74
+ pretraining_tp (`int`, *optional*, defaults to 1):
75
+ Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
76
+ document](https://huggingface.co/docs/transformers/main/perf_train_gpu_many#tensor-parallelism) to
77
+ understand more about it. This value is necessary to ensure exact reproducibility of the pretraining
78
+ results. Please refer to [this issue](https://github.com/pytorch/pytorch/issues/76232).
79
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
80
+ Whether to tie the input and output word embeddings
81
+ rope_theta (`float`, *optional*, defaults to 10000.0):
82
+ The base period of the RoPE embeddings.
83
+ rope_scaling (`Dict`, *optional*):
84
+ Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply a new rope type
85
+ and expect the model to work on a longer `max_position_embeddings`, we recommend updating this value
86
+ accordingly.
87
+ Expected contents:
88
+ `rope_type` (`str`):
89
+ The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
90
+ 'llama3'], with 'default' being the original RoPE implementation.
91
+ `factor` (`float`, *optional*):
92
+ Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
93
+ most scaling types, a `factor` of x will enable the model to handle sequences of length x *
94
+ original maximum pre-trained length.
95
+ `original_max_position_embeddings` (`int`, *optional*):
96
+ Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
97
+ pretraining.
98
+ `attention_factor` (`float`, *optional*):
99
+ Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
100
+ computation. If unspecified, it defaults to the value recommended by the implementation, using the
101
+ `factor` field to infer the suggested value.
102
+ `beta_fast` (`float`, *optional*):
103
+ Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
104
+ ramp function. If unspecified, it defaults to 32.
105
+ `beta_slow` (`float`, *optional*):
106
+ Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
107
+ ramp function. If unspecified, it defaults to 1.
108
+ `short_factor` (`List[float]`, *optional*):
109
+ Only used with 'longrope'. The scaling factor to be applied to short contexts (<
110
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
111
+ size divided by the number of attention heads divided by 2
112
+ `long_factor` (`List[float]`, *optional*):
113
+ Only used with 'longrope'. The scaling factor to be applied to long contexts (>
114
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
115
+ size divided by the number of attention heads divided by 2
116
+ `low_freq_factor` (`float`, *optional*):
117
+ Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE
118
+ `high_freq_factor` (`float`, *optional*):
119
+ Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE
120
+ attention_bias (`bool`, *optional*, defaults to `False`):
121
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
122
+ attention_dropout (`float`, *optional*, defaults to 0.0):
123
+ The dropout ratio for the attention probabilities.
124
+ mlp_bias (`bool`, *optional*, defaults to `False`):
125
+ Whether to use a bias in up_proj, down_proj and gate_proj layers in the MLP layers.
126
+ head_dim (`int`, *optional*):
127
+ The attention head dimension. If None, it will default to hidden_size // num_heads
128
+
129
+ ```python
130
+ >>> from transformers import LlamaModel, LlamaConfig
131
+
132
+ >>> # Initializing a LLaMA llama-7b style configuration
133
+ >>> configuration = LlamaConfig()
134
+
135
+ >>> # Initializing a model from the llama-7b style configuration
136
+ >>> model = LlamaModel(configuration)
137
+
138
+ >>> # Accessing the model configuration
139
+ >>> configuration = model.config
140
+ ```"""
141
+
142
+ model_type = "llama"
143
+ keys_to_ignore_at_inference = ["past_key_values"]
144
+
145
+ def __init__(
146
+ self,
147
+ vocab_size=32000,
148
+ hidden_size=4096,
149
+ intermediate_size=11008,
150
+ num_hidden_layers=32,
151
+ num_attention_heads=32,
152
+ num_key_value_heads=None,
153
+ hidden_act="silu",
154
+ max_position_embeddings=2048,
155
+ initializer_range=0.02,
156
+ rms_norm_eps=1e-6,
157
+ use_cache=True,
158
+ pad_token_id=None,
159
+ bos_token_id=1,
160
+ eos_token_id=2,
161
+ pretraining_tp=1,
162
+ tie_word_embeddings=False,
163
+ rope_theta=10000.0,
164
+ rope_scaling=None,
165
+ attention_bias=False,
166
+ attention_dropout=0.0,
167
+ mlp_bias=False,
168
+ head_dim=None,
169
+ **kwargs,
170
+ ):
171
+ self.vocab_size = vocab_size
172
+ self.max_position_embeddings = max_position_embeddings
173
+ self.hidden_size = hidden_size
174
+ self.intermediate_size = intermediate_size
175
+ self.num_hidden_layers = num_hidden_layers
176
+ self.num_attention_heads = num_attention_heads
177
+
178
+ # for backward compatibility
179
+ if num_key_value_heads is None:
180
+ num_key_value_heads = num_attention_heads
181
+
182
+ self.num_key_value_heads = num_key_value_heads
183
+ self.hidden_act = hidden_act
184
+ self.initializer_range = initializer_range
185
+ self.rms_norm_eps = rms_norm_eps
186
+ self.pretraining_tp = pretraining_tp
187
+ self.use_cache = use_cache
188
+ self.rope_theta = rope_theta
189
+ self.rope_scaling = rope_scaling
190
+ self.attention_bias = attention_bias
191
+ self.attention_dropout = attention_dropout
192
+ self.mlp_bias = mlp_bias
193
+ self.head_dim = head_dim if head_dim is not None else self.hidden_size // self.num_attention_heads
194
+ # Validate the correctness of rotary position embeddings parameters
195
+ # BC: if there is a 'type' field, copy it to 'rope_type'.
196
+ if self.rope_scaling is not None and "type" in self.rope_scaling:
197
+ self.rope_scaling["rope_type"] = self.rope_scaling["type"]
198
+ rope_config_validation(self)
199
+
200
+ super().__init__(
201
+ pad_token_id=pad_token_id,
202
+ bos_token_id=bos_token_id,
203
+ eos_token_id=eos_token_id,
204
+ tie_word_embeddings=tie_word_embeddings,
205
+ **kwargs,
206
+ )
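Note that the per-head layout fields read by `modeling_llama.py` below (`num_query_heads`, `num_key_heads`, `num_value_heads`, `query_dim`, `key_dim`, `value_dim`) are not declared in this `__init__`; they only reach the config through `**kwargs`, which `PretrainedConfig` stores as plain attributes. A minimal sketch of building such a config by hand, assuming the file is importable locally; only the values also visible in `config.json` above are real, the rest are illustrative:

```python
# Sketch only: extra head/dim fields are passed as kwargs and stored as attributes.
from configuration_llama import LlamaConfig

config = LlamaConfig(
    vocab_size=128256,          # from config.json above
    rms_norm_eps=1e-5,          # from config.json above
    tie_word_embeddings=True,   # from config.json above
    pretraining_tp=1,           # from config.json above
    # custom attention layout consumed by modeling_llama.LlamaAttention
    num_query_heads=16,         # illustrative
    num_key_heads=1,            # illustrative
    num_value_heads=1,          # from config.json above
    query_dim=128,              # illustrative
    key_dim=128,                # illustrative
    value_dim=128,              # from config.json above
)
print(config.num_value_heads, config.value_dim)  # 1 128
```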
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5e797f93862d6516bf371a57c377e5f1a46d68233c7eaf9405fd1f5d1b51cd36
3
+ size 2585960552
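The three lines above are a Git LFS pointer, not the checkpoint itself: the repo records only the object hash and size (about 2.6 GB), and the actual `model.safetensors` blob is resolved through LFS. A minimal sketch of downloading and peeking at the file with `huggingface_hub` and `safetensors`; the repo id below is a placeholder:

```python
# Sketch only: replace "user/repo" with this repository's actual id.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(repo_id="user/repo", filename="model.safetensors")
with safe_open(path, framework="pt", device="cpu") as f:
    for name in list(f.keys())[:5]:          # first few tensor names
        print(name, f.get_tensor(name).shape)
```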
modeling_llama.py ADDED
@@ -0,0 +1,1543 @@
1
+ # coding=utf-8
2
+ # Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
5
+ # and OPT implementations in this library. It has been modified from its
6
+ # original forms to accommodate minor architectural differences compared
7
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
8
+ #
9
+ # Licensed under the Apache License, Version 2.0 (the "License");
10
+ # you may not use this file except in compliance with the License.
11
+ # You may obtain a copy of the License at
12
+ #
13
+ # http://www.apache.org/licenses/LICENSE-2.0
14
+ #
15
+ # Unless required by applicable law or agreed to in writing, software
16
+ # distributed under the License is distributed on an "AS IS" BASIS,
17
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
18
+ # See the License for the specific language governing permissions and
19
+ # limitations under the License.
20
+ import math
21
+ from typing import List, Optional, Tuple, Union
22
+
23
+ import torch
24
+ import torch.nn.functional as F
25
+ import torch.utils.checkpoint
26
+ from torch import nn
27
+
28
+ from transformers.activations import ACT2FN
29
+ from transformers.cache_utils import Cache, DynamicCache, StaticCache
30
+ from transformers.generation import GenerationMixin
31
+ from transformers.modeling_attn_mask_utils import AttentionMaskConverter
32
+ from transformers.modeling_flash_attention_utils import _flash_attention_forward
33
+ from transformers.modeling_outputs import (
34
+ BaseModelOutputWithPast,
35
+ CausalLMOutputWithPast,
36
+ QuestionAnsweringModelOutput,
37
+ SequenceClassifierOutputWithPast,
38
+ TokenClassifierOutput,
39
+ )
40
+ from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS
41
+ from transformers.modeling_utils import PreTrainedModel
42
+ from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
43
+ from transformers.utils import (
44
+ add_code_sample_docstrings,
45
+ add_start_docstrings,
46
+ add_start_docstrings_to_model_forward,
47
+ is_flash_attn_greater_or_equal_2_10,
48
+ logging,
49
+ replace_return_docstrings,
50
+ )
51
+ from .configuration_llama import LlamaConfig
52
+
53
+
54
+ logger = logging.get_logger(__name__)
55
+
56
+ _CHECKPOINT_FOR_DOC = "meta-llama/Llama-2-7b-hf"
57
+ _CONFIG_FOR_DOC = "LlamaConfig"
58
+
59
+ def repeat_output(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
60
+ """
61
+ This is the equivalent of torch.repeat_interleave(x, dim=2, repeats=n_rep). The hidden states go from (batch,
62
+ seqlen, num_key_value_heads, head_dim) to (batch, seqlen, num_key_value_heads * n_rep, head_dim)
63
+ """
64
+ batch, slen, num_key_value_heads, head_dim = hidden_states.shape
65
+ if n_rep == 1:
66
+ return hidden_states
67
+ hidden_states = hidden_states[:, :, :, None, :].expand(
68
+ batch, slen, num_key_value_heads, n_rep, head_dim
69
+ )
70
+ return hidden_states.reshape(batch, slen, num_key_value_heads * n_rep, head_dim)
71
+
72
+
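As a quick check of the docstring above, the expand-and-reshape in `repeat_output` is equivalent to `torch.repeat_interleave` along the head axis (dim 2); the sizes below are illustrative:

```python
# Sketch: repeat_output's expand+reshape matches repeat_interleave over the head dim.
import torch

x = torch.randn(2, 5, 3, 4)                   # (batch, seqlen, heads, head_dim)
n_rep = 2
out = x[:, :, :, None, :].expand(2, 5, 3, n_rep, 4).reshape(2, 5, 3 * n_rep, 4)
print(torch.equal(out, torch.repeat_interleave(x, repeats=n_rep, dim=2)))  # True
```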
73
+ class LlamaRMSNorm(nn.Module):
74
+ def __init__(self, hidden_size, eps=1e-6):
75
+ """
76
+ LlamaRMSNorm is equivalent to T5LayerNorm
77
+ """
78
+ super().__init__()
79
+ self.weight = nn.Parameter(torch.ones(hidden_size))
80
+ self.variance_epsilon = eps
81
+
82
+ def forward(self, hidden_states):
83
+ input_dtype = hidden_states.dtype
84
+ hidden_states = hidden_states.to(torch.float32)
85
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
86
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
87
+ return self.weight * hidden_states.to(input_dtype)
88
+
89
+ def extra_repr(self):
90
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
91
+
92
+
93
+ ALL_LAYERNORM_LAYERS.append(LlamaRMSNorm)
94
+
95
+
96
+ class LlamaRotaryEmbedding(nn.Module):
97
+ def __init__(
98
+ self,
99
+ dim=None,
100
+ max_position_embeddings=2048,
101
+ base=10000,
102
+ device=None,
103
+ scaling_factor=1.0,
104
+ rope_type="default",
105
+ config: Optional[LlamaConfig] = None,
106
+ ):
107
+ super().__init__()
108
+ # TODO (joao): remove the `if` below, only used for BC
109
+ self.rope_kwargs = {}
110
+ if config is None:
111
+ logger.warning_once(
112
+ "`LlamaRotaryEmbedding` can now be fully parameterized by passing the model config through the "
113
+ "`config` argument. All other arguments will be removed in v4.46"
114
+ )
115
+ self.rope_kwargs = {
116
+ "rope_type": rope_type,
117
+ "factor": scaling_factor,
118
+ "dim": dim,
119
+ "base": base,
120
+ "max_position_embeddings": max_position_embeddings,
121
+ }
122
+ self.rope_type = rope_type
123
+ self.max_seq_len_cached = max_position_embeddings
124
+ self.original_max_seq_len = max_position_embeddings
125
+ else:
126
+ # BC: "rope_type" was originally "type"
127
+ if config.rope_scaling is not None:
128
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
129
+ else:
130
+ self.rope_type = "default"
131
+ self.max_seq_len_cached = config.max_position_embeddings
132
+ self.original_max_seq_len = config.max_position_embeddings
133
+
134
+ self.config = config
135
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
136
+
137
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device, **self.rope_kwargs)
138
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
139
+ self.original_inv_freq = self.inv_freq
140
+
141
+ def _dynamic_frequency_update(self, position_ids, device):
142
+ """
143
+ dynamic RoPE layers should recompute `inv_freq` in the following situations:
144
+ 1 - growing beyond the cached sequence length (allow scaling)
145
+ 2 - the current sequence length is in the original scale (avoid losing precision with small sequences)
146
+ """
147
+ seq_len = torch.max(position_ids) + 1
148
+ if seq_len > self.max_seq_len_cached: # growth
149
+ inv_freq, self.attention_scaling = self.rope_init_fn(
150
+ self.config, device, seq_len=seq_len, **self.rope_kwargs
151
+ )
152
+ self.register_buffer("inv_freq", inv_freq, persistent=False) # TODO joao: may break with compilation
153
+ self.max_seq_len_cached = seq_len
154
+
155
+ if seq_len < self.original_max_seq_len and self.max_seq_len_cached > self.original_max_seq_len: # reset
156
+ self.register_buffer("inv_freq", self.original_inv_freq, persistent=False)
157
+ self.max_seq_len_cached = self.original_max_seq_len
158
+
159
+ @torch.no_grad()
160
+ def forward(self, x, position_ids):
161
+ if "dynamic" in self.rope_type:
162
+ self._dynamic_frequency_update(position_ids, device=x.device)
163
+
164
+ # Core RoPE block
165
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
166
+ position_ids_expanded = position_ids[:, None, :].float()
167
+ # Force float32 (see https://github.com/huggingface/transformers/pull/29285)
168
+ device_type = x.device.type
169
+ device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
170
+ with torch.autocast(device_type=device_type, enabled=False):
171
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
172
+ emb = torch.cat((freqs, freqs), dim=-1)
173
+ cos = emb.cos()
174
+ sin = emb.sin()
175
+
176
+ # Advanced RoPE types (e.g. yarn) apply a post-processing scaling factor, equivalent to scaling attention
177
+ cos = cos * self.attention_scaling
178
+ sin = sin * self.attention_scaling
179
+
180
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
181
+
182
+
183
+ class LlamaLinearScalingRotaryEmbedding(LlamaRotaryEmbedding):
184
+ """LlamaRotaryEmbedding extended with linear scaling. Credits to the Reddit user /u/kaiokendev"""
185
+
186
+ def __init__(self, *args, **kwargs):
187
+ logger.warning_once(
188
+ "`LlamaLinearScalingRotaryEmbedding` is deprecated and will be removed in v4.46. Please use "
189
+ "`LlamaRotaryEmbedding`, which now also does linear scaling (simply pass the model config to __init__)."
190
+ )
191
+ kwargs["rope_type"] = "linear"
192
+ super().__init__(*args, **kwargs)
193
+
194
+
195
+ class LlamaDynamicNTKScalingRotaryEmbedding(LlamaRotaryEmbedding):
196
+ """LlamaRotaryEmbedding extended with Dynamic NTK scaling. Credits to the Reddit users /u/bloc97 and /u/emozilla"""
197
+
198
+ def __init__(self, *args, **kwargs):
199
+ logger.warning_once(
200
+ "`LlamaDynamicNTKScalingRotaryEmbedding` is deprecated and will be removed in v4.46. Please use "
201
+ "`LlamaRotaryEmbedding`, which now also does dynamic ntk scaling (simply pass the model config to "
202
+ "__init__)."
203
+ )
204
+ kwargs["rope_type"] = "dynamic"
205
+ super().__init__(*args, **kwargs)
206
+
207
+
208
+ def rotate_half(x):
209
+ """Rotates half the hidden dims of the input."""
210
+ x1 = x[..., : x.shape[-1] // 2]
211
+ x2 = x[..., x.shape[-1] // 2 :]
212
+ return torch.cat((-x2, x1), dim=-1)
213
+
214
+
215
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
216
+ """Applies Rotary Position Embedding to the query and key tensors.
217
+
218
+ Args:
219
+ q (`torch.Tensor`): The query tensor.
220
+ k (`torch.Tensor`): The key tensor.
221
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
222
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
223
+ position_ids (`torch.Tensor`, *optional*):
224
+ Deprecated and unused.
225
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
226
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
227
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
228
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
229
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
230
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
231
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
232
+ Returns:
233
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
234
+ """
235
+ cos = cos.unsqueeze(unsqueeze_dim)
236
+ sin = sin.unsqueeze(unsqueeze_dim)
237
+
238
+ q_embed = (q * cos) + (rotate_half(q) * sin)
239
+ k_embed = (k * cos) + (rotate_half(k) * sin)
240
+ return q_embed, k_embed
241
+
242
+
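To illustrate the broadcasting that the `apply_rotary_pos_emb` docstring describes, here is a small self-contained sketch; the cos/sin construction is a simplified stand-in for `LlamaRotaryEmbedding` with the default base 10000, and all sizes are illustrative:

```python
# Sketch: applying RoPE to (batch, heads, seq, head_dim) tensors with unsqueeze_dim=1.
import torch

b, h, s, d = 1, 2, 3, 8
q = torch.randn(b, h, s, d)

# Simplified cos/sin of shape (1, s, d), mimicking LlamaRotaryEmbedding(base=10000)
inv_freq = 1.0 / 10000 ** (torch.arange(0, d, 2) / d)
angles = torch.outer(torch.arange(s, dtype=torch.float), inv_freq)
emb = torch.cat((angles, angles), dim=-1)[None]
cos, sin = emb.cos(), emb.sin()

def rotate_half(x):
    x1, x2 = x[..., : x.shape[-1] // 2], x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

# unsqueeze_dim=1 lets (1, s, d) broadcast over the head axis of (b, h, s, d)
q_embed = q * cos.unsqueeze(1) + rotate_half(q) * sin.unsqueeze(1)
print(q_embed.shape)                                   # torch.Size([1, 2, 3, 8])
print(torch.allclose(q_embed.norm(dim=-1), q.norm(dim=-1), atol=1e-5))  # rotation preserves norms
```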
243
+ class LlamaMLP(nn.Module):
244
+ def __init__(self, config):
245
+ super().__init__()
246
+ self.config = config
247
+ self.hidden_size = config.hidden_size
248
+ self.intermediate_size = config.intermediate_size
249
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
250
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
251
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=config.mlp_bias)
252
+ self.act_fn = ACT2FN[config.hidden_act]
253
+
254
+ def forward(self, x):
255
+ if self.config.pretraining_tp > 1:
256
+ slice = self.intermediate_size // self.config.pretraining_tp
257
+ gate_proj_slices = self.gate_proj.weight.split(slice, dim=0)
258
+ up_proj_slices = self.up_proj.weight.split(slice, dim=0)
259
+ down_proj_slices = self.down_proj.weight.split(slice, dim=1)
260
+
261
+ gate_proj = torch.cat(
262
+ [F.linear(x, gate_proj_slices[i]) for i in range(self.config.pretraining_tp)], dim=-1
263
+ )
264
+ up_proj = torch.cat([F.linear(x, up_proj_slices[i]) for i in range(self.config.pretraining_tp)], dim=-1)
265
+
266
+ intermediate_states = (self.act_fn(gate_proj) * up_proj).split(slice, dim=2)
267
+ down_proj = [
268
+ F.linear(intermediate_states[i], down_proj_slices[i]) for i in range(self.config.pretraining_tp)
269
+ ]
270
+ down_proj = sum(down_proj)
271
+ else:
272
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
273
+
274
+ return down_proj
275
+
276
+
277
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
278
+ """
279
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
280
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
281
+ """
282
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
283
+ if n_rep == 1:
284
+ return hidden_states
285
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
286
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
287
+
288
+
289
+ class LlamaAttention(nn.Module):
290
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
291
+
292
+ def __init__(self, config: LlamaConfig, layer_idx: Optional[int] = None):
293
+ super().__init__()
294
+ self.config = config
295
+ self.layer_idx = layer_idx
296
+ if layer_idx is None:
297
+ logger.warning_once(
298
+ f"Instantiating {self.__class__.__name__} without passing a `layer_idx` is not recommended and will "
299
+ "lead to errors during the forward call if caching is used. Please make sure to provide a `layer_idx` "
300
+ "when creating this class."
301
+ )
302
+
303
+ self.attention_dropout = config.attention_dropout
304
+ self.hidden_size = config.hidden_size
305
+ self.num_heads = config.num_attention_heads
306
+ self.head_dim = getattr(config, "head_dim", self.hidden_size // self.num_heads)
307
+ self.num_query_heads = config.num_query_heads
308
+ self.num_key_heads = config.num_key_heads
309
+ self.num_value_heads = config.num_value_heads
310
+ self.query_dim = config.query_dim
311
+ self.key_dim = config.key_dim
312
+ self.value_dim = config.value_dim
313
+ self.num_query_groups = self.num_heads // self.num_query_heads
314
+ self.num_key_groups = self.num_query_heads // self.num_key_heads
315
+ self.num_value_groups = self.num_query_heads // self.num_value_heads
316
+ self.expend_rate = self.hidden_size // (self.num_query_heads * self.value_dim)
317
+ self.max_position_embeddings = config.max_position_embeddings
318
+ self.rope_theta = config.rope_theta
319
+ self.is_causal = True
320
+
321
+ self.q_proj = nn.Linear(self.hidden_size, self.num_query_heads * self.query_dim, bias=True)
322
+ self.k_proj = nn.Linear(self.hidden_size, self.num_key_heads * self.key_dim, bias=True)
323
+ self.v_proj = nn.Linear(self.hidden_size, self.num_value_heads * self.value_dim, bias=True)
324
+ self.o_proj = nn.Linear(self.num_heads * self.head_dim, self.hidden_size, bias=False)
325
+ self.g_proj = nn.Linear(self.hidden_size, self.hidden_size, bias=True)
326
+ self.v_b_proj = nn.Linear(self.num_query_heads * self.value_dim, self.expend_rate * self.value_dim, bias=False)
327
+
328
+ # TODO (joao): remove in v4.46 (RoPE is computed in the model, not in the decoder layers)
329
+ self.rotary_emb = LlamaRotaryEmbedding(config=self.config)
330
+
331
+ def forward(
332
+ self,
333
+ hidden_states: torch.Tensor,
334
+ attention_mask: Optional[torch.Tensor] = None,
335
+ position_ids: Optional[torch.LongTensor] = None,
336
+ past_key_value: Optional[Cache] = None,
337
+ output_attentions: bool = False,
338
+ use_cache: bool = False,
339
+ cache_position: Optional[torch.LongTensor] = None,
340
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # will become mandatory in v4.46
341
+ **kwargs,
342
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
343
+ bsz, q_len, _ = hidden_states.size()
344
+
345
+ query_states = self.q_proj(hidden_states)
346
+ key_states = self.k_proj(hidden_states)
347
+ value_states = self.v_proj(hidden_states)
348
+
349
+ # Flash attention requires the input to have the shape
350
+ # batch_size x seq_length x head_dim x hidden_dim
351
+ # therefore we just need to keep the original shape
352
+ query_states = query_states.view(bsz, q_len, self.num_query_heads, self.query_dim).transpose(1, 2)
353
+ key_states = key_states.view(bsz, q_len, self.num_key_heads, self.key_dim).transpose(1, 2)
354
+ value_states = value_states.view(bsz, q_len, self.num_value_heads, self.value_dim).transpose(1, 2)
355
+
356
+ if position_embeddings is None:
357
+ logger.warning_once(
358
+ "The attention layers in this model are transitioning from computing the RoPE embeddings internally "
359
+ "through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed "
360
+ "`position_embeddings` (Tuple of tensors, containing cos and sin). In v4.46 `position_ids` will be "
361
+ "removed and `position_embeddings` will be mandatory."
362
+ )
363
+ cos, sin = self.rotary_emb(value_states, position_ids)
364
+ else:
365
+ cos, sin = position_embeddings
366
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
367
+
368
+ if past_key_value is not None:
369
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
370
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
371
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
372
+
373
+ key_states = repeat_kv(key_states, self.num_key_groups)
374
+ value_states = repeat_kv(value_states, self.num_value_groups)
375
+
376
+ attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
377
+
378
+ if attention_mask is not None: # no matter the length, we just slice it
379
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
380
+ attn_weights = attn_weights + causal_mask
381
+
382
+ # upcast attention to fp32
383
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
384
+ attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout, training=self.training)
385
+ attn_output = torch.matmul(attn_weights, value_states)
386
+
387
+ # if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
388
+ # raise ValueError(
389
+ # f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
390
+ # f" {attn_output.size()}"
391
+ # )
392
+
393
+ attn_output = attn_output.transpose(1, 2).contiguous()
394
+
395
+ # attn_output = attn_output.reshape(bsz, q_len, -1)
396
+
397
+ # if self.config.pretraining_tp > 1:
398
+ # attn_output = attn_output.split(self.hidden_size // self.config.pretraining_tp, dim=2)
399
+ # o_proj_slices = self.o_proj.weight.split(self.hidden_size // self.config.pretraining_tp, dim=1)
400
+ # attn_output = sum([F.linear(attn_output[i], o_proj_slices[i]) for i in range(self.config.pretraining_tp)])
401
+ # else:
402
+ # attn_output = self.o_proj(attn_output)
403
+
404
+ attn_output_delta = torch.einsum(
405
+ 'bqhv,hvd->bqhd',
406
+ attn_output,
407
+ self.v_b_proj.weight.view(self.num_query_heads, self.value_dim, -1)
408
+ ).reshape(bsz, q_len, self.hidden_size)
409
+
410
+ attn_output = repeat_output(attn_output, self.expend_rate)
411
+
412
+ attn_output = attn_output.reshape(bsz, q_len, self.hidden_size) + attn_output_delta
413
+ attn_output = attn_output.contiguous()
414
+ attn_output = self.o_proj(F.sigmoid(self.g_proj(hidden_states)) * attn_output)
415
+
416
+ if not output_attentions:
417
+ attn_weights = None
418
+
419
+ return attn_output, attn_weights, past_key_value
420
+
421
+
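The forward pass above departs from stock Llama after the softmax: each head's output (width `value_dim`) is (1) mapped through `v_b_proj` via the einsum into a hidden-size delta, (2) tiled `expend_rate` times by `repeat_output` so it also spans `hidden_size`, and (3) the sum is gated by `sigmoid(g_proj(hidden_states))` before `o_proj`. A minimal shape-level sketch with illustrative sizes and random weights standing in for the real projections:

```python
# Shape-level sketch of the gated output path above; all sizes/weights illustrative.
import torch

bsz, q_len = 2, 5
hidden_size = 16                       # H * E * V below
num_query_heads = 2                    # H
value_dim = 4                          # V
expend_rate = hidden_size // (num_query_heads * value_dim)   # E = 2, as in __init__

# softmax(QK^T / sqrt(d)) @ V, already transposed to (bsz, q_len, H, V)
attn_output = torch.randn(bsz, q_len, num_query_heads, value_dim)
hidden_states = torch.randn(bsz, q_len, hidden_size)

# v_b_proj: Linear(H*V -> E*V); its weight (E*V, H*V) is viewed as (H, V, E*V)
v_b_weight = torch.randn(expend_rate * value_dim, num_query_heads * value_dim)
attn_output_delta = torch.einsum(
    "bqhv,hvd->bqhd", attn_output, v_b_weight.view(num_query_heads, value_dim, -1)
).reshape(bsz, q_len, hidden_size)

# repeat_output: tile each value head E times so the raw output also spans hidden_size
expanded = attn_output[:, :, :, None, :].expand(
    bsz, q_len, num_query_heads, expend_rate, value_dim
).reshape(bsz, q_len, hidden_size)

# sigmoid gate computed from the pre-attention hidden states (g_proj); o_proj follows in the model
g_weight = torch.randn(hidden_size, hidden_size)
gated = torch.sigmoid(hidden_states @ g_weight.T) * (expanded + attn_output_delta)
print(gated.shape)   # torch.Size([2, 5, 16])
```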
422
+ class LlamaFlashAttention2(LlamaAttention):
423
+ """
424
+ Llama flash attention module. This module inherits from `LlamaAttention` as the weights of the module stay
425
+ untouched. The only required change would be on the forward pass where it needs to correctly call the public API of
426
+ flash attention and deal with padding tokens in case the input contains any of them.
427
+ """
428
+
429
+ def __init__(self, *args, **kwargs):
430
+ super().__init__(*args, **kwargs)
431
+
432
+ # TODO: Should be removed once Flash Attention for RoCm is bumped to 2.1.
433
+ # flash_attn<2.1 generates a top-left aligned causal mask, while what is needed here is a bottom-right alignment, which was made the default for flash_attn>=2.1. This attribute is used to handle this difference. Reference: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.1.0.
434
+ # Beware that with flash_attn<2.1, using q_seqlen != k_seqlen (except for the case q_seqlen == 1) produces a wrong mask (top-left).
435
+ self._flash_attn_uses_top_left_mask = not is_flash_attn_greater_or_equal_2_10()
436
+
437
+ def forward(
438
+ self,
439
+ hidden_states: torch.Tensor,
440
+ attention_mask: Optional[torch.LongTensor] = None,
441
+ position_ids: Optional[torch.LongTensor] = None,
442
+ past_key_value: Optional[Cache] = None,
443
+ output_attentions: bool = False,
444
+ use_cache: bool = False,
445
+ cache_position: Optional[torch.LongTensor] = None,
446
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # will become mandatory in v4.46
447
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
448
+ if isinstance(past_key_value, StaticCache):
449
+ raise ValueError(
450
+ "`static` cache implementation is not compatible with `attn_implementation==flash_attention_2` "
451
+ "make sure to use `sdpa` in the meantime, and open an issue at https://github.com/huggingface/transformers"
452
+ )
453
+
454
+ output_attentions = False
455
+
456
+ bsz, q_len, _ = hidden_states.size()
457
+
458
+ query_states = self.q_proj(hidden_states)
459
+ key_states = self.k_proj(hidden_states)
460
+ value_states = self.v_proj(hidden_states)
461
+
462
+ # Flash attention requires the input to have the shape
463
+ # batch_size x seq_length x head_dim x hidden_dim
464
+ # therefore we just need to keep the original shape
465
+ query_states = query_states.view(bsz, q_len, self.num_query_heads, self.query_dim).transpose(1, 2)
466
+ key_states = key_states.view(bsz, q_len, self.num_key_heads, self.key_dim).transpose(1, 2)
467
+ value_states = value_states.view(bsz, q_len, self.num_value_heads, self.value_dim).transpose(1, 2)
468
+
469
+ if position_embeddings is None:
470
+ logger.warning_once(
471
+ "The attention layers in this model are transitioning from computing the RoPE embeddings internally "
472
+ "through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed "
473
+ "`position_embeddings` (Tuple of tensors, containing cos and sin). In v4.46 `position_ids` will be "
474
+ "removed and `position_embeddings` will be mandatory."
475
+ )
476
+ cos, sin = self.rotary_emb(value_states, position_ids)
477
+ else:
478
+ cos, sin = position_embeddings
479
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
480
+ query_states = F.pad(query_states, [0, self.value_dim - self.query_dim])
481
+ key_states = F.pad(key_states, [0, self.value_dim - self.key_dim])
482
+
483
+ if past_key_value is not None:
484
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
485
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
486
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
487
+
488
+ # value_states = repeat_kv(value_states, self.num_query_heads // self.num_value_heads)
489
+ # key_states = repeat_kv(key_states, self.num_query_heads // self.num_key_heads)
490
+ key_states = repeat_kv(key_states, self.num_key_groups)
491
+ value_states = repeat_kv(value_states, self.num_value_groups)
492
+
493
+ # TODO: These transpose are quite inefficient but Flash Attention requires the layout [batch_size, sequence_length, num_heads, head_dim]. We would need to refactor the KV cache
494
+ # to be able to avoid many of these transpose/reshape/view.
495
+ query_states = query_states.transpose(1, 2)
496
+ key_states = key_states.transpose(1, 2)
497
+ value_states = value_states.transpose(1, 2)
498
+
499
+ dropout_rate = self.attention_dropout if self.training else 0.0
500
+
501
+ # In PEFT, the layer norms are usually cast to float32 for training stability reasons,
502
+ # so the input hidden states may get silently cast to float32. Hence, we need to
503
+ # cast them back to the correct dtype just to be sure everything works as expected.
504
+ # This might slow down training & inference, so it is recommended not to cast the LayerNorms
505
+ # in fp32. (LlamaRMSNorm handles it correctly)
506
+
507
+ input_dtype = query_states.dtype
508
+ if input_dtype == torch.float32:
509
+ if torch.is_autocast_enabled():
510
+ target_dtype = torch.get_autocast_gpu_dtype()
511
+ # Handle the case where the model is quantized
512
+ elif hasattr(self.config, "_pre_quantization_dtype"):
513
+ target_dtype = self.config._pre_quantization_dtype
514
+ else:
515
+ target_dtype = self.q_proj.weight.dtype
516
+
517
+ logger.warning_once(
518
+ f"The input hidden states seem to be silently cast in float32; this might be related to"
519
+ f" the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in"
520
+ f" {target_dtype}."
521
+ )
522
+
523
+ query_states = query_states.to(target_dtype)
524
+ key_states = key_states.to(target_dtype)
525
+ value_states = value_states.to(target_dtype)
526
+
527
+ attn_output = _flash_attention_forward(
528
+ query_states,
529
+ key_states,
530
+ value_states,
531
+ attention_mask,
532
+ q_len,
533
+ position_ids=position_ids,
534
+ dropout=dropout_rate,
535
+ sliding_window=getattr(self, "sliding_window", None),
536
+ use_top_left_mask=self._flash_attn_uses_top_left_mask,
537
+ is_causal=self.is_causal,
538
+ softmax_scale=1/math.sqrt(self.query_dim),
539
+ )
540
+
541
+ attn_output_delta = torch.einsum(
542
+ 'bqhv,hvd->bqhd',
543
+ attn_output,
544
+ self.v_b_proj.weight.view(self.num_query_heads, self.value_dim, -1)
545
+ ).reshape(bsz, q_len, self.hidden_size)
546
+ attn_output = repeat_output(attn_output, self.expend_rate)
547
+
548
+ attn_output = attn_output.reshape(bsz, q_len, self.hidden_size) + attn_output_delta
549
+ attn_output = attn_output.contiguous()
550
+ attn_output = self.o_proj(F.sigmoid(self.g_proj(hidden_states)) * attn_output)
551
+
552
+ if not output_attentions:
553
+ attn_weights = None
554
+
555
+ return attn_output, attn_weights, past_key_value
556
+
557
+
558
+ class LlamaSdpaAttention(LlamaAttention):
559
+ """
560
+ Llama attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
561
+ `LlamaAttention` as the weights of the module stay untouched. The only changes are on the forward pass to adapt to
562
+ SDPA API.
563
+ """
564
+
565
+ # Adapted from LlamaAttention.forward
566
+ def forward(
567
+ self,
568
+ hidden_states: torch.Tensor,
569
+ attention_mask: Optional[torch.Tensor] = None,
570
+ position_ids: Optional[torch.LongTensor] = None,
571
+ past_key_value: Optional[Cache] = None,
572
+ output_attentions: bool = False,
573
+ use_cache: bool = False,
574
+ cache_position: Optional[torch.LongTensor] = None,
575
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # will become mandatory in v4.46
576
+ **kwargs,
577
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
578
+ if output_attentions:
579
+ # TODO: Improve this warning with e.g. `model.config.attn_implementation = "manual"` once this is implemented.
580
+ logger.warning_once(
581
+ "LlamaModel is using LlamaSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
582
+ 'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
583
+ )
584
+ return super().forward(
585
+ hidden_states=hidden_states,
586
+ attention_mask=attention_mask,
587
+ position_ids=position_ids,
588
+ past_key_value=past_key_value,
589
+ output_attentions=output_attentions,
590
+ use_cache=use_cache,
591
+ cache_position=cache_position,
592
+ position_embeddings=position_embeddings,
593
+ )
594
+
595
+ bsz, q_len, _ = hidden_states.size()
596
+
597
+ query_states = self.q_proj(hidden_states)
598
+ key_states = self.k_proj(hidden_states)
599
+ value_states = self.v_proj(hidden_states)
600
+
601
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
602
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
603
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
604
+
605
+ if position_embeddings is None:
606
+ logger.warning_once(
607
+ "The attention layers in this model are transitioning from computing the RoPE embeddings internally "
608
+ "through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed "
609
+ "`position_embeddings` (Tuple of tensors, containing cos and sin). In v4.46 `position_ids` will be "
610
+ "removed and `position_embeddings` will be mandatory."
611
+ )
612
+ cos, sin = self.rotary_emb(value_states, position_ids)
613
+ else:
614
+ cos, sin = position_embeddings
615
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
616
+
617
+ if past_key_value is not None:
618
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
619
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
620
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
621
+
622
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
623
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
624
+
625
+ causal_mask = attention_mask
626
+ if attention_mask is not None:
627
+ causal_mask = causal_mask[:, :, :, : key_states.shape[-2]]
628
+
629
+ # SDPA with memory-efficient backend is currently (torch==2.1.2) bugged with non-contiguous inputs with custom attn_mask,
630
+ # Reference: https://github.com/pytorch/pytorch/issues/112577.
631
+ if query_states.device.type == "cuda" and causal_mask is not None:
632
+ query_states = query_states.contiguous()
633
+ key_states = key_states.contiguous()
634
+ value_states = value_states.contiguous()
635
+
636
+ # We dispatch to SDPA's Flash Attention or Efficient kernels via this `is_causal` if statement instead of an inline conditional assignment
637
+ # in SDPA to support both torch.compile's dynamic shapes and full graph options. An inline conditional prevents dynamic shapes from compiling.
638
+ is_causal = True if causal_mask is None and q_len > 1 else False
639
+
640
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
641
+ query_states,
642
+ key_states,
643
+ value_states,
644
+ attn_mask=causal_mask,
645
+ dropout_p=self.attention_dropout if self.training else 0.0,
646
+ is_causal=is_causal,
647
+ )
648
+
649
+ attn_output = attn_output.transpose(1, 2).contiguous()
650
+ attn_output = attn_output.view(bsz, q_len, -1)
651
+
652
+ attn_output = self.o_proj(attn_output)
653
+
654
+ return attn_output, None, past_key_value
655
+
656
+
657
+ LLAMA_ATTENTION_CLASSES = {
658
+ "eager": LlamaAttention,
659
+ "flash_attention_2": LlamaFlashAttention2,
660
+ "sdpa": LlamaSdpaAttention,
661
+ }
662
+
663
+
664
+ class LlamaDecoderLayer(nn.Module):
665
+ def __init__(self, config: LlamaConfig, layer_idx: int):
666
+ super().__init__()
667
+ self.hidden_size = config.hidden_size
668
+
669
+ self.self_attn = LLAMA_ATTENTION_CLASSES[config._attn_implementation](config=config, layer_idx=layer_idx)
670
+
671
+ self.mlp = LlamaMLP(config)
672
+ self.input_layernorm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
673
+ self.post_attention_layernorm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
674
+
675
+ def forward(
676
+ self,
677
+ hidden_states: torch.Tensor,
678
+ attention_mask: Optional[torch.Tensor] = None,
679
+ position_ids: Optional[torch.LongTensor] = None,
680
+ past_key_value: Optional[Cache] = None,
681
+ output_attentions: Optional[bool] = False,
682
+ use_cache: Optional[bool] = False,
683
+ cache_position: Optional[torch.LongTensor] = None,
684
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # will become mandatory in v4.46
685
+ **kwargs,
686
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
687
+ """
688
+ Args:
689
+ hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
690
+ attention_mask (`torch.FloatTensor`, *optional*):
691
+ attention mask of size `(batch_size, sequence_length)` if flash attention is used or `(batch_size, 1,
692
+ query_sequence_length, key_sequence_length)` if default attention is used.
693
+ output_attentions (`bool`, *optional*):
694
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under
695
+ returned tensors for more detail.
696
+ use_cache (`bool`, *optional*):
697
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding
698
+ (see `past_key_values`).
699
+ past_key_value (`Tuple(torch.FloatTensor)`, *optional*): cached past key and value projection states
700
+ cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
701
+ Indices depicting the position of the input sequence tokens in the sequence
702
+ position_embeddings (`Tuple[torch.FloatTensor, torch.FloatTensor]`, *optional*):
703
+ Tuple containing the cosine and sine positional embeddings of shape `(batch_size, seq_len, head_dim)`,
704
+ with `head_dim` being the embedding dimension of each attention head.
705
+ kwargs (`dict`, *optional*):
706
+ Arbitrary kwargs to be ignored, used for FSDP and other methods that inject code
707
+ into the model
708
+ """
709
+ residual = hidden_states
710
+
711
+ hidden_states = self.input_layernorm(hidden_states)
712
+
713
+ # Self Attention
714
+ hidden_states, self_attn_weights, present_key_value = self.self_attn(
715
+ hidden_states=hidden_states,
716
+ attention_mask=attention_mask,
717
+ position_ids=position_ids,
718
+ past_key_value=past_key_value,
719
+ output_attentions=output_attentions,
720
+ use_cache=use_cache,
721
+ cache_position=cache_position,
722
+ position_embeddings=position_embeddings,
723
+ **kwargs,
724
+ )
725
+ hidden_states = residual + hidden_states
726
+
727
+ # Fully Connected
728
+ residual = hidden_states
729
+ hidden_states = self.post_attention_layernorm(hidden_states)
730
+ hidden_states = self.mlp(hidden_states)
731
+ hidden_states = residual + hidden_states
732
+
733
+ outputs = (hidden_states,)
734
+
735
+ if output_attentions:
736
+ outputs += (self_attn_weights,)
737
+
738
+ if use_cache:
739
+ outputs += (present_key_value,)
740
+
741
+ return outputs
742
+
743
+
744
+ LLAMA_START_DOCSTRING = r"""
745
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
746
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
747
+ etc.)
748
+
749
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
750
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
751
+ and behavior.
752
+
753
+ Parameters:
754
+ config ([`LlamaConfig`]):
755
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
756
+ load the weights associated with the model, only the configuration. Check out the
757
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
758
+ """
759
+
760
+
761
+ @add_start_docstrings(
762
+ "The bare LLaMA Model outputting raw hidden-states without any specific head on top.",
763
+ LLAMA_START_DOCSTRING,
764
+ )
765
+ class LlamaPreTrainedModel(PreTrainedModel):
766
+ config_class = LlamaConfig
767
+ base_model_prefix = "model"
768
+ supports_gradient_checkpointing = True
769
+ _no_split_modules = ["LlamaDecoderLayer"]
770
+ _skip_keys_device_placement = ["past_key_values"]
771
+ _supports_flash_attn_2 = True
772
+ _supports_sdpa = True
773
+ _supports_cache_class = True
774
+ _supports_quantized_cache = True
775
+ _supports_static_cache = True
776
+
777
+ def _init_weights(self, module):
778
+ std = self.config.initializer_range
779
+ if isinstance(module, nn.Linear):
780
+ module.weight.data.normal_(mean=0.0, std=std)
781
+ if module.bias is not None:
782
+ module.bias.data.zero_()
783
+ elif isinstance(module, nn.Embedding):
784
+ module.weight.data.normal_(mean=0.0, std=std)
785
+ if module.padding_idx is not None:
786
+ module.weight.data[module.padding_idx].zero_()
787
+
788
+
789
+ LLAMA_INPUTS_DOCSTRING = r"""
790
+ Args:
791
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
792
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
793
+ it.
794
+
795
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
796
+ [`PreTrainedTokenizer.__call__`] for details.
797
+
798
+ [What are input IDs?](../glossary#input-ids)
799
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
800
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
801
+
802
+ - 1 for tokens that are **not masked**,
803
+ - 0 for tokens that are **masked**.
804
+
805
+ [What are attention masks?](../glossary#attention-mask)
806
+
807
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
808
+ [`PreTrainedTokenizer.__call__`] for details.
809
+
810
+ If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
811
+ `past_key_values`).
812
+
813
+ If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
814
+ and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
815
+ information on the default strategy.
816
+
817
+ - 1 indicates the head is **not masked**,
818
+ - 0 indicates the head is **masked**.
819
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
820
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
821
+ config.n_positions - 1]`.
822
+
823
+ [What are position IDs?](../glossary#position-ids)
824
+ past_key_values (`Cache` or `tuple(tuple(torch.FloatTensor))`, *optional*):
825
+ Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
826
+ blocks) that can be used to speed up sequential decoding. This typically consists of the `past_key_values`
827
+ returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
828
+
829
+ Two formats are allowed:
830
+ - a [`~cache_utils.Cache`] instance, see our
831
+ [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache);
832
+ - Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of
833
+ shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`). This is also known as the legacy
834
+ cache format.
835
+
836
+ The model will output the same cache format that is fed as input. If no `past_key_values` are passed, the
837
+ legacy cache format will be returned.
838
+
839
+ If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
840
+ have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
841
+ of shape `(batch_size, sequence_length)`.
842
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
843
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
844
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
845
+ model's internal embedding lookup matrix.
846
+ use_cache (`bool`, *optional*):
847
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
848
+ `past_key_values`).
849
+ output_attentions (`bool`, *optional*):
850
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
851
+ tensors for more detail.
852
+ output_hidden_states (`bool`, *optional*):
853
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
854
+ more detail.
855
+ return_dict (`bool`, *optional*):
856
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
857
+ cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
858
+ Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
859
+ this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
860
+ the complete sequence length.
861
+ """
862
+
863
+
864
+ @add_start_docstrings(
865
+ "The bare LLaMA Model outputting raw hidden-states without any specific head on top.",
866
+ LLAMA_START_DOCSTRING,
867
+ )
868
+ class LlamaModel(LlamaPreTrainedModel):
869
+ """
870
+ Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`LlamaDecoderLayer`]
871
+
872
+ Args:
873
+ config: LlamaConfig
874
+ """
875
+
876
+ def __init__(self, config: LlamaConfig):
877
+ super().__init__(config)
878
+ self.padding_idx = config.pad_token_id
879
+ self.vocab_size = config.vocab_size
880
+
881
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
882
+ self.layers = nn.ModuleList(
883
+ [LlamaDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
884
+ )
885
+ self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
886
+ self.rotary_emb = LlamaRotaryEmbedding(config=config)
887
+ self.gradient_checkpointing = False
888
+
889
+ # Initialize weights and apply final processing
890
+ self.post_init()
891
+
892
+ def get_input_embeddings(self):
893
+ return self.embed_tokens
894
+
895
+ def set_input_embeddings(self, value):
896
+ self.embed_tokens = value
897
+
898
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
899
+ def forward(
900
+ self,
901
+ input_ids: torch.LongTensor = None,
902
+ attention_mask: Optional[torch.Tensor] = None,
903
+ position_ids: Optional[torch.LongTensor] = None,
904
+ past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
905
+ inputs_embeds: Optional[torch.FloatTensor] = None,
906
+ use_cache: Optional[bool] = None,
907
+ output_attentions: Optional[bool] = None,
908
+ output_hidden_states: Optional[bool] = None,
909
+ return_dict: Optional[bool] = None,
910
+ cache_position: Optional[torch.LongTensor] = None,
911
+ ) -> Union[Tuple, BaseModelOutputWithPast]:
912
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
913
+ output_hidden_states = (
914
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
915
+ )
916
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
917
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
918
+
919
+ if (input_ids is None) ^ (inputs_embeds is not None):
920
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
921
+
922
+ if self.gradient_checkpointing and self.training and use_cache:
923
+ logger.warning_once(
924
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`."
925
+ )
926
+ use_cache = False
927
+
928
+ if inputs_embeds is None:
929
+ inputs_embeds = self.embed_tokens(input_ids)
930
+
931
+ # kept for BC (non `Cache` `past_key_values` inputs)
932
+ return_legacy_cache = False
933
+ if use_cache and not isinstance(past_key_values, Cache):
934
+ return_legacy_cache = True
935
+ if past_key_values is None:
936
+ past_key_values = DynamicCache()
937
+ else:
938
+ past_key_values = DynamicCache.from_legacy_cache(past_key_values)
939
+ logger.warning_once(
940
+ "We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and "
941
+ "will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class "
942
+ "(https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)"
943
+ )
944
+
945
+ if cache_position is None:
946
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
947
+ cache_position = torch.arange(
948
+ past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device
949
+ )
950
+ if position_ids is None:
951
+ position_ids = cache_position.unsqueeze(0)
952
+
953
+ causal_mask = self._update_causal_mask(
954
+ attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
955
+ )
956
+ hidden_states = inputs_embeds
957
+
958
+ # create position embeddings to be shared across the decoder layers
959
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
960
+
961
+ # decoder layers
962
+ all_hidden_states = () if output_hidden_states else None
963
+ all_self_attns = () if output_attentions else None
964
+ next_decoder_cache = None
965
+
966
+ for decoder_layer in self.layers:
967
+ if output_hidden_states:
968
+ all_hidden_states += (hidden_states,)
969
+
970
+ if self.gradient_checkpointing and self.training:
971
+ layer_outputs = self._gradient_checkpointing_func(
972
+ decoder_layer.__call__,
973
+ hidden_states,
974
+ causal_mask,
975
+ position_ids,
976
+ past_key_values,
977
+ output_attentions,
978
+ use_cache,
979
+ cache_position,
980
+ position_embeddings,
981
+ )
982
+ else:
983
+ layer_outputs = decoder_layer(
984
+ hidden_states,
985
+ attention_mask=causal_mask,
986
+ position_ids=position_ids,
987
+ past_key_value=past_key_values,
988
+ output_attentions=output_attentions,
989
+ use_cache=use_cache,
990
+ cache_position=cache_position,
991
+ position_embeddings=position_embeddings,
992
+ )
993
+
994
+ hidden_states = layer_outputs[0]
995
+
996
+ if use_cache:
997
+ next_decoder_cache = layer_outputs[2 if output_attentions else 1]
998
+
999
+ if output_attentions:
1000
+ all_self_attns += (layer_outputs[1],)
1001
+
1002
+ hidden_states = self.norm(hidden_states)
1003
+
1004
+ # add hidden states from the last decoder layer
1005
+ if output_hidden_states:
1006
+ all_hidden_states += (hidden_states,)
1007
+
1008
+ next_cache = next_decoder_cache if use_cache else None
1009
+ if return_legacy_cache:
1010
+ next_cache = next_cache.to_legacy_cache()
1011
+
1012
+ if not return_dict:
1013
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
1014
+ return BaseModelOutputWithPast(
1015
+ last_hidden_state=hidden_states,
1016
+ past_key_values=next_cache,
1017
+ hidden_states=all_hidden_states,
1018
+ attentions=all_self_attns,
1019
+ )
1020
+
1021
+ def _update_causal_mask(
1022
+ self,
1023
+ attention_mask: torch.Tensor,
1024
+ input_tensor: torch.Tensor,
1025
+ cache_position: torch.Tensor,
1026
+ past_key_values: Cache,
1027
+ output_attentions: bool,
1028
+ ):
1029
+ if self.config._attn_implementation == "flash_attention_2":
1030
+ if attention_mask is not None and 0.0 in attention_mask:
1031
+ return attention_mask
1032
+ return None
1033
+
1034
+ # For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
1035
+ # order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
1036
+ # to infer the attention mask.
1037
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
1038
+ using_static_cache = isinstance(past_key_values, StaticCache)
1039
+
1040
+ # When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
1041
+ if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
1042
+ if AttentionMaskConverter._ignore_causal_mask_sdpa(
1043
+ attention_mask,
1044
+ inputs_embeds=input_tensor,
1045
+ past_key_values_length=past_seen_tokens,
1046
+ is_training=self.training,
1047
+ ):
1048
+ return None
1049
+
1050
+ dtype, device = input_tensor.dtype, input_tensor.device
1051
+ sequence_length = input_tensor.shape[1]
1052
+ if using_static_cache:
1053
+ target_length = past_key_values.get_max_cache_shape()
1054
+ else:
1055
+ target_length = (
1056
+ attention_mask.shape[-1]
1057
+ if isinstance(attention_mask, torch.Tensor)
1058
+ else past_seen_tokens + sequence_length + 1
1059
+ )
1060
+
1061
+ # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
1062
+ causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
1063
+ attention_mask,
1064
+ sequence_length=sequence_length,
1065
+ target_length=target_length,
1066
+ dtype=dtype,
1067
+ device=device,
1068
+ cache_position=cache_position,
1069
+ batch_size=input_tensor.shape[0],
1070
+ )
1071
+
1072
+ if (
1073
+ self.config._attn_implementation == "sdpa"
1074
+ and attention_mask is not None
1075
+ and attention_mask.device.type == "cuda"
1076
+ and not output_attentions
1077
+ ):
1078
+ # Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
1079
+ # using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
1080
+ # Details: https://github.com/pytorch/pytorch/issues/110213
1081
+ min_dtype = torch.finfo(dtype).min
1082
+ causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
1083
+
1084
+ return causal_mask
1085
+
1086
+ @staticmethod
1087
+ def _prepare_4d_causal_attention_mask_with_cache_position(
1088
+ attention_mask: torch.Tensor,
1089
+ sequence_length: int,
1090
+ target_length: int,
1091
+ dtype: torch.dtype,
1092
+ device: torch.device,
1093
+ cache_position: torch.Tensor,
1094
+ batch_size: int,
1095
+ **kwargs,
1096
+ ):
1097
+ """
1098
+ Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
1099
+ `(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
1100
+
1101
+ Args:
1102
+ attention_mask (`torch.Tensor`):
1103
+ A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
1104
+ `(batch_size, 1, query_length, key_value_length)`.
1105
+ sequence_length (`int`):
1106
+ The sequence length being processed.
1107
+ target_length (`int`):
1108
+ The target length: when generating with static cache, the mask should be as long as the static cache,
1109
+ to account for the 0 padding, the part of the cache that is not filled yet.
1110
+ dtype (`torch.dtype`):
1111
+ The dtype to use for the 4D attention mask.
1112
+ device (`torch.device`):
1113
+ The device to place the 4D attention mask on.
1114
+ cache_position (`torch.Tensor`):
1115
+ Indices depicting the position of the input sequence tokens in the sequence.
1116
+ batch_size (`int`):
1117
+ Batch size.
1118
+ """
1119
+ if attention_mask is not None and attention_mask.dim() == 4:
1120
+ # In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
1121
+ causal_mask = attention_mask
1122
+ else:
1123
+ min_dtype = torch.finfo(dtype).min
1124
+ causal_mask = torch.full(
1125
+ (sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=device
1126
+ )
1127
+ if sequence_length != 1:
1128
+ causal_mask = torch.triu(causal_mask, diagonal=1)
1129
+ causal_mask *= torch.arange(target_length, device=device) > cache_position.reshape(-1, 1)
1130
+ causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
1131
+ if attention_mask is not None:
1132
+ causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
1133
+ mask_length = attention_mask.shape[-1]
1134
+ padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :]
1135
+ padding_mask = padding_mask == 0
1136
+ causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
1137
+ padding_mask, min_dtype
1138
+ )
1139
+
1140
+ return causal_mask
1141
+
1142
+
1143
+ class LlamaForCausalLM(LlamaPreTrainedModel, GenerationMixin):
1144
+ _tied_weights_keys = ["lm_head.weight"]
1145
+
1146
+ def __init__(self, config):
1147
+ super().__init__(config)
1148
+ self.model = LlamaModel(config)
1149
+ self.vocab_size = config.vocab_size
1150
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
1151
+
1152
+ # Initialize weights and apply final processing
1153
+ self.post_init()
1154
+
1155
+ def get_input_embeddings(self):
1156
+ return self.model.embed_tokens
1157
+
1158
+ def set_input_embeddings(self, value):
1159
+ self.model.embed_tokens = value
1160
+
1161
+ def get_output_embeddings(self):
1162
+ return self.lm_head
1163
+
1164
+ def set_output_embeddings(self, new_embeddings):
1165
+ self.lm_head = new_embeddings
1166
+
1167
+ def set_decoder(self, decoder):
1168
+ self.model = decoder
1169
+
1170
+ def get_decoder(self):
1171
+ return self.model
1172
+
1173
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
1174
+ @replace_return_docstrings(output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC)
1175
+ def forward(
1176
+ self,
1177
+ input_ids: torch.LongTensor = None,
1178
+ attention_mask: Optional[torch.Tensor] = None,
1179
+ position_ids: Optional[torch.LongTensor] = None,
1180
+ past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
1181
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1182
+ labels: Optional[torch.LongTensor] = None,
1183
+ use_cache: Optional[bool] = None,
1184
+ output_attentions: Optional[bool] = None,
1185
+ output_hidden_states: Optional[bool] = None,
1186
+ return_dict: Optional[bool] = None,
1187
+ cache_position: Optional[torch.LongTensor] = None,
1188
+ num_logits_to_keep: int = 0,
1189
+ **loss_kwargs,
1190
+ ) -> Union[Tuple, CausalLMOutputWithPast]:
1191
+ r"""
1192
+ Args:
1193
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
1194
+ Labels for computing the language modeling loss. Indices should either be in `[0, ...,
1195
+ config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored
1196
+ (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`.
1197
+
1198
+ num_logits_to_keep (`int`, *optional*):
1199
+ Calculate logits for the last `num_logits_to_keep` tokens. If `0`, calculate logits for all
1200
+ `input_ids` (special case). Only last token logits are needed for generation, and calculating them only for that
1201
+ token can save memory, which becomes pretty significant for long sequences or large vocabulary size.
1202
+
1203
+ Returns:
1204
+
1205
+ Example:
1206
+
1207
+ ```python
1208
+ >>> from transformers import AutoTokenizer, LlamaForCausalLM
1209
+
1210
+ >>> model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
1211
+ >>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
1212
+
1213
+ >>> prompt = "Hey, are you conscious? Can you talk to me?"
1214
+ >>> inputs = tokenizer(prompt, return_tensors="pt")
1215
+
1216
+ >>> # Generate
1217
+ >>> generate_ids = model.generate(inputs.input_ids, max_length=30)
1218
+ >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
1219
+ "Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
1220
+ ```"""
1221
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
1222
+ output_hidden_states = (
1223
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
1224
+ )
1225
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1226
+
1227
+ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
1228
+ outputs = self.model(
1229
+ input_ids=input_ids,
1230
+ attention_mask=attention_mask,
1231
+ position_ids=position_ids,
1232
+ past_key_values=past_key_values,
1233
+ inputs_embeds=inputs_embeds,
1234
+ use_cache=use_cache,
1235
+ output_attentions=output_attentions,
1236
+ output_hidden_states=output_hidden_states,
1237
+ return_dict=return_dict,
1238
+ cache_position=cache_position,
1239
+ )
1240
+
1241
+ hidden_states = outputs[0]
1242
+ if self.config.pretraining_tp > 1:
1243
+ lm_head_slices = self.lm_head.weight.split(self.vocab_size // self.config.pretraining_tp, dim=0)
1244
+ logits = [F.linear(hidden_states, lm_head_slices[i]) for i in range(self.config.pretraining_tp)]
1245
+ logits = torch.cat(logits, dim=-1)
1246
+ else:
1247
+ # Only compute necessary logits, and do not upcast them to float if we are not computing the loss
1248
+ logits = self.lm_head(hidden_states[:, -num_logits_to_keep:, :])
1249
+
1250
+ loss = None
1251
+ if labels is not None:
1252
+ loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **loss_kwargs)
1253
+
1254
+ if not return_dict:
1255
+ output = (logits,) + outputs[1:]
1256
+ return (loss,) + output if loss is not None else output
1257
+
1258
+ return CausalLMOutputWithPast(
1259
+ loss=loss,
1260
+ logits=logits,
1261
+ past_key_values=outputs.past_key_values,
1262
+ hidden_states=outputs.hidden_states,
1263
+ attentions=outputs.attentions,
1264
+ )
1265
+
1266
+
1267
+ @add_start_docstrings(
1268
+ """
1269
+ The LLaMa Model transformer with a sequence classification head on top (linear layer).
1270
+
1271
+ [`LlamaForSequenceClassification`] uses the last token in order to do the classification, as other causal models
1272
+ (e.g. GPT-2) do.
1273
+
1274
+ Since it does classification on the last token, it needs to know the position of the last token. If a
1275
+ `pad_token_id` is defined in the configuration, it finds the last token that is not a padding token in each row. If
1276
+ no `pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot guess the
1277
+ padding tokens when `inputs_embeds` are passed instead of `input_ids`, it does the same (take the last value in
1278
+ each row of the batch).
1279
+ """,
1280
+ LLAMA_START_DOCSTRING,
1281
+ )
1282
+ class LlamaForSequenceClassification(LlamaPreTrainedModel):
1283
+ def __init__(self, config):
1284
+ super().__init__(config)
1285
+ self.num_labels = config.num_labels
1286
+ self.model = LlamaModel(config)
1287
+ self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
1288
+
1289
+ # Initialize weights and apply final processing
1290
+ self.post_init()
1291
+
1292
+ def get_input_embeddings(self):
1293
+ return self.model.embed_tokens
1294
+
1295
+ def set_input_embeddings(self, value):
1296
+ self.model.embed_tokens = value
1297
+
1298
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
1299
+ def forward(
1300
+ self,
1301
+ input_ids: Optional[torch.LongTensor] = None,
1302
+ attention_mask: Optional[torch.Tensor] = None,
1303
+ position_ids: Optional[torch.LongTensor] = None,
1304
+ past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
1305
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1306
+ labels: Optional[torch.LongTensor] = None,
1307
+ use_cache: Optional[bool] = None,
1308
+ output_attentions: Optional[bool] = None,
1309
+ output_hidden_states: Optional[bool] = None,
1310
+ return_dict: Optional[bool] = None,
1311
+ ) -> Union[Tuple, SequenceClassifierOutputWithPast]:
1312
+ r"""
1313
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1314
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1315
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
1316
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1317
+ """
1318
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1319
+
1320
+ transformer_outputs = self.model(
1321
+ input_ids,
1322
+ attention_mask=attention_mask,
1323
+ position_ids=position_ids,
1324
+ past_key_values=past_key_values,
1325
+ inputs_embeds=inputs_embeds,
1326
+ use_cache=use_cache,
1327
+ output_attentions=output_attentions,
1328
+ output_hidden_states=output_hidden_states,
1329
+ return_dict=return_dict,
1330
+ )
1331
+ hidden_states = transformer_outputs[0]
1332
+ logits = self.score(hidden_states)
1333
+
1334
+ if input_ids is not None:
1335
+ batch_size = input_ids.shape[0]
1336
+ else:
1337
+ batch_size = inputs_embeds.shape[0]
1338
+
1339
+ if self.config.pad_token_id is None and batch_size != 1:
1340
+ raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
1341
+ if self.config.pad_token_id is None:
1342
+ sequence_lengths = -1
1343
+ else:
1344
+ if input_ids is not None:
1345
+ # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
1346
+ sequence_lengths = torch.eq(input_ids, self.config.pad_token_id).int().argmax(-1) - 1
1347
+ sequence_lengths = sequence_lengths % input_ids.shape[-1]
1348
+ sequence_lengths = sequence_lengths.to(logits.device)
1349
+ else:
1350
+ sequence_lengths = -1
1351
+
1352
+ pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]
1353
+
1354
+ loss = None
1355
+ if labels is not None:
1356
+ loss = self.loss_function(logits=logits, labels=labels, pooled_logits=pooled_logits, config=self.config)
1357
+
1358
+ if not return_dict:
1359
+ output = (pooled_logits,) + transformer_outputs[1:]
1360
+ return ((loss,) + output) if loss is not None else output
1361
+
1362
+ return SequenceClassifierOutputWithPast(
1363
+ loss=loss,
1364
+ logits=pooled_logits,
1365
+ past_key_values=transformer_outputs.past_key_values,
1366
+ hidden_states=transformer_outputs.hidden_states,
1367
+ attentions=transformer_outputs.attentions,
1368
+ )
1369
+
1370
+
1371
+ @add_start_docstrings(
1372
+ """
1373
+ The Llama Model transformer with a span classification head on top for extractive question-answering tasks like
1374
+ SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`).
1375
+ """,
1376
+ LLAMA_START_DOCSTRING,
1377
+ )
1378
+ class LlamaForQuestionAnswering(LlamaPreTrainedModel):
1379
+ base_model_prefix = "transformer"
1380
+
1381
+ # Copied from transformers.models.bloom.modeling_bloom.BloomForQuestionAnswering.__init__ with Bloom->Llama
1382
+ def __init__(self, config):
1383
+ super().__init__(config)
1384
+ self.transformer = LlamaModel(config)
1385
+ self.qa_outputs = nn.Linear(config.hidden_size, 2)
1386
+
1387
+ # Initialize weights and apply final processing
1388
+ self.post_init()
1389
+
1390
+ def get_input_embeddings(self):
1391
+ return self.transformer.embed_tokens
1392
+
1393
+ def set_input_embeddings(self, value):
1394
+ self.transformer.embed_tokens = value
1395
+
1396
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
1397
+ def forward(
1398
+ self,
1399
+ input_ids: Optional[torch.LongTensor] = None,
1400
+ attention_mask: Optional[torch.FloatTensor] = None,
1401
+ position_ids: Optional[torch.LongTensor] = None,
1402
+ past_key_values: Optional[Union[Cache, List[torch.FloatTensor]]] = None,
1403
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1404
+ start_positions: Optional[torch.LongTensor] = None,
1405
+ end_positions: Optional[torch.LongTensor] = None,
1406
+ output_attentions: Optional[bool] = None,
1407
+ output_hidden_states: Optional[bool] = None,
1408
+ return_dict: Optional[bool] = None,
1409
+ **kwargs,
1410
+ ) -> Union[Tuple, QuestionAnsweringModelOutput]:
1411
+ r"""
1412
+ start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1413
+ Labels for position (index) of the start of the labelled span for computing the token classification loss.
1414
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
1415
+ are not taken into account for computing the loss.
1416
+ end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1417
+ Labels for position (index) of the end of the labelled span for computing the token classification loss.
1418
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
1419
+ are not taken into account for computing the loss.
1420
+ """
1421
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1422
+
1423
+ outputs = self.transformer(
1424
+ input_ids,
1425
+ attention_mask=attention_mask,
1426
+ position_ids=position_ids,
1427
+ past_key_values=past_key_values,
1428
+ inputs_embeds=inputs_embeds,
1429
+ output_attentions=output_attentions,
1430
+ output_hidden_states=output_hidden_states,
1431
+ return_dict=return_dict,
1432
+ )
1433
+
1434
+ sequence_output = outputs[0]
1435
+
1436
+ logits = self.qa_outputs(sequence_output)
1437
+ start_logits, end_logits = logits.split(1, dim=-1)
1438
+ start_logits = start_logits.squeeze(-1).contiguous()
1439
+ end_logits = end_logits.squeeze(-1).contiguous()
1440
+
1441
+ loss = None
1442
+ if start_positions is not None and end_positions is not None:
1443
+ loss = self.loss_function(start_logits, end_logits, start_positions, end_positions, **kwargs)
1444
+
1445
+ if not return_dict:
1446
+ output = (start_logits, end_logits) + outputs[2:]
1447
+ return ((loss,) + output) if loss is not None else output
1448
+
1449
+ return QuestionAnsweringModelOutput(
1450
+ loss=loss,
1451
+ start_logits=start_logits,
1452
+ end_logits=end_logits,
1453
+ hidden_states=outputs.hidden_states,
1454
+ attentions=outputs.attentions,
1455
+ )
1456
+
1457
+
1458
+ @add_start_docstrings(
1459
+ """
1460
+ The Llama Model transformer with a token classification head on top (a linear layer on top of the hidden-states
1461
+ output) e.g. for Named-Entity-Recognition (NER) tasks.
1462
+ """,
1463
+ LLAMA_START_DOCSTRING,
1464
+ )
1465
+ class LlamaForTokenClassification(LlamaPreTrainedModel):
1466
+ def __init__(self, config):
1467
+ super().__init__(config)
1468
+ self.num_labels = config.num_labels
1469
+ self.model = LlamaModel(config)
1470
+ if getattr(config, "classifier_dropout", None) is not None:
1471
+ classifier_dropout = config.classifier_dropout
1472
+ elif getattr(config, "hidden_dropout", None) is not None:
1473
+ classifier_dropout = config.hidden_dropout
1474
+ else:
1475
+ classifier_dropout = 0.1
1476
+ self.dropout = nn.Dropout(classifier_dropout)
1477
+ self.score = nn.Linear(config.hidden_size, config.num_labels)
1478
+
1479
+ # Initialize weights and apply final processing
1480
+ self.post_init()
1481
+
1482
+ def get_input_embeddings(self):
1483
+ return self.model.embed_tokens
1484
+
1485
+ def set_input_embeddings(self, value):
1486
+ self.model.embed_tokens = value
1487
+
1488
+ @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING)
1489
+ @add_code_sample_docstrings(
1490
+ checkpoint=_CHECKPOINT_FOR_DOC,
1491
+ output_type=TokenClassifierOutput,
1492
+ config_class=_CONFIG_FOR_DOC,
1493
+ )
1494
+ def forward(
1495
+ self,
1496
+ input_ids: Optional[torch.LongTensor] = None,
1497
+ attention_mask: Optional[torch.Tensor] = None,
1498
+ position_ids: Optional[torch.LongTensor] = None,
1499
+ past_key_values: Optional[List[torch.FloatTensor]] = None,
1500
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1501
+ labels: Optional[torch.LongTensor] = None,
1502
+ use_cache: Optional[bool] = None,
1503
+ output_attentions: Optional[bool] = None,
1504
+ output_hidden_states: Optional[bool] = None,
1505
+ return_dict: Optional[bool] = None,
1506
+ ) -> Union[Tuple, TokenClassifierOutput]:
1507
+ r"""
1508
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1509
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1510
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
1511
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1512
+ """
1513
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1514
+
1515
+ outputs = self.model(
1516
+ input_ids,
1517
+ attention_mask=attention_mask,
1518
+ position_ids=position_ids,
1519
+ past_key_values=past_key_values,
1520
+ inputs_embeds=inputs_embeds,
1521
+ use_cache=use_cache,
1522
+ output_attentions=output_attentions,
1523
+ output_hidden_states=output_hidden_states,
1524
+ return_dict=return_dict,
1525
+ )
1526
+ sequence_output = outputs[0]
1527
+ sequence_output = self.dropout(sequence_output)
1528
+ logits = self.score(sequence_output)
1529
+
1530
+ loss = None
1531
+ if labels is not None:
1532
+ loss = self.loss_function(logits, labels, self.config)
1533
+
1534
+ if not return_dict:
1535
+ output = (logits,) + outputs[2:]
1536
+ return ((loss,) + output) if loss is not None else output
1537
+
1538
+ return TokenClassifierOutput(
1539
+ loss=loss,
1540
+ logits=logits,
1541
+ hidden_states=outputs.hidden_states,
1542
+ attentions=outputs.attentions,
1543
+ )
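
A note on the sequence-classification head added above: `LlamaForSequenceClassification` pools the hidden state of the last non-padding token, and the modulo trick it uses is easy to miss inside the diff. The sketch below is not part of the uploaded file; the batch is hypothetical, and it assumes `<|finetune_right_pad_id|>` (id 128004, declared in the tokenizer config further down) is used as the padding id.

```python
import torch

# Hypothetical batch: <|begin_of_text|>, two text tokens, then right padding.
pad_token_id = 128004  # assumption: <|finetune_right_pad_id|> used as the pad token
input_ids = torch.tensor([[128000, 9906, 1917, 128004, 128004]])

# argmax(-1) finds the first padding position; subtracting 1 gives the last real
# token. The modulo keeps the index valid when a row contains no padding at all
# (argmax returns 0, so the index wraps around to the final position).
sequence_lengths = torch.eq(input_ids, pad_token_id).int().argmax(-1) - 1
sequence_lengths = sequence_lengths % input_ids.shape[-1]

print(sequence_lengths)  # tensor([2]) -> the last non-padding token per row
```

This is also why the class raises an error for batch sizes greater than 1 when no `pad_token_id` is defined: without it there is no way to tell where each sequence ends.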
special_tokens_map.json ADDED
@@ -0,0 +1,16 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|begin_of_text|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|end_of_text|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ }
16
+ }
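
The `special_tokens_map.json` above only declares the BOS and EOS strings; a quick way to check that they are wired into the tokenizer is sketched below (the checkpoint path is a placeholder for a local copy of this commit or the repository id).

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/this-checkpoint")  # placeholder path

print(tokenizer.bos_token)  # expected: <|begin_of_text|>
print(tokenizer.eos_token)  # expected: <|end_of_text|>

# The Llama 3.2 tokenizer prepends the BOS token when encoding, so the first id of
# an encoded string should match bos_token_id (128000 in tokenizer_config.json below).
ids = tokenizer("Hello world").input_ids
print(ids[0] == tokenizer.bos_token_id)  # expected: True
```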
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,2061 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<|reserved_special_token_0|>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|reserved_special_token_1|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|finetune_right_pad_id|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_2|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|eom_id|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<|python_tag|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "<|reserved_special_token_3|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<|reserved_special_token_4|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "<|reserved_special_token_5|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<|reserved_special_token_6|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "<|reserved_special_token_7|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<|reserved_special_token_8|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "<|reserved_special_token_9|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<|reserved_special_token_10|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "<|reserved_special_token_11|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<|reserved_special_token_12|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "<|reserved_special_token_13|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_14|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_15|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_16|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_17|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_18|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_19|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_20|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_21|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_22|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_23|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_24|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_25|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_26|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_27|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_28|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_29|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_30|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_31|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_32|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_33|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_34|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_35|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_36|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_37|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_38|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_39|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_40|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_41|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_42|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_43|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_44|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_45|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_46|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_47|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_48|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_49|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_50|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_51|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_52|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_53|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_54|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_55|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_56|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_57|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_58|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_59|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_60|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_61|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_62|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_63|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_64|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_65|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_66|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_67|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_68|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_69|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_70|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_71|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_72|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_73|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_74|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_75|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_76|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_77|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_78|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_79|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_80|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_81|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_82|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_83|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_84|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_85|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_86|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_87|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_88|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_89|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_90|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_91|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_92|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_93|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_94|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_95|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_96|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_97|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_98|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_99|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_100|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_101|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_102|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_103|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_104|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_105|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_106|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_107|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_108|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_109|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_110|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_111|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_112|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_113|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_114|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_115|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_116|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_117|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_118|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_119|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_120|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_121|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_122|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_123|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_124|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_125|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_126|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_127|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_128|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_129|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_130|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_131|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_132|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_133|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_134|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_135|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_136|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_137|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_138|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_139|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_140|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_141|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_142|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_143|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_144|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_145|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_146|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_147|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_148|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_149|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_150|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_151|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_152|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_153|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_154|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_155|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_156|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_157|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_158|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_159|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_160|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_161|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_162|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_163|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_164|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_165|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_166|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_167|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_168|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_169|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128178": {
1428
+ "content": "<|reserved_special_token_170|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128179": {
1436
+ "content": "<|reserved_special_token_171|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128180": {
1444
+ "content": "<|reserved_special_token_172|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128181": {
1452
+ "content": "<|reserved_special_token_173|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128182": {
1460
+ "content": "<|reserved_special_token_174|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128183": {
1468
+ "content": "<|reserved_special_token_175|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128184": {
1476
+ "content": "<|reserved_special_token_176|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128185": {
1484
+ "content": "<|reserved_special_token_177|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128186": {
1492
+ "content": "<|reserved_special_token_178|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128187": {
1500
+ "content": "<|reserved_special_token_179|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128188": {
1508
+ "content": "<|reserved_special_token_180|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128189": {
1516
+ "content": "<|reserved_special_token_181|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128190": {
1524
+ "content": "<|reserved_special_token_182|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128191": {
1532
+ "content": "<|reserved_special_token_183|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128192": {
1540
+ "content": "<|reserved_special_token_184|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128193": {
1548
+ "content": "<|reserved_special_token_185|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128194": {
1556
+ "content": "<|reserved_special_token_186|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128195": {
1564
+ "content": "<|reserved_special_token_187|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128196": {
1572
+ "content": "<|reserved_special_token_188|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128197": {
1580
+ "content": "<|reserved_special_token_189|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128198": {
1588
+ "content": "<|reserved_special_token_190|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128199": {
1596
+ "content": "<|reserved_special_token_191|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128200": {
1604
+ "content": "<|reserved_special_token_192|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128201": {
1612
+ "content": "<|reserved_special_token_193|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128202": {
1620
+ "content": "<|reserved_special_token_194|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128203": {
1628
+ "content": "<|reserved_special_token_195|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128204": {
1636
+ "content": "<|reserved_special_token_196|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128205": {
1644
+ "content": "<|reserved_special_token_197|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128206": {
1652
+ "content": "<|reserved_special_token_198|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128207": {
1660
+ "content": "<|reserved_special_token_199|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128208": {
1668
+ "content": "<|reserved_special_token_200|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128209": {
1676
+ "content": "<|reserved_special_token_201|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128210": {
1684
+ "content": "<|reserved_special_token_202|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128211": {
1692
+ "content": "<|reserved_special_token_203|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128212": {
1700
+ "content": "<|reserved_special_token_204|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128213": {
1708
+ "content": "<|reserved_special_token_205|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128214": {
1716
+ "content": "<|reserved_special_token_206|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128215": {
1724
+ "content": "<|reserved_special_token_207|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128216": {
1732
+ "content": "<|reserved_special_token_208|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128217": {
1740
+ "content": "<|reserved_special_token_209|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128218": {
1748
+ "content": "<|reserved_special_token_210|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128219": {
1756
+ "content": "<|reserved_special_token_211|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128220": {
1764
+ "content": "<|reserved_special_token_212|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128221": {
1772
+ "content": "<|reserved_special_token_213|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128222": {
1780
+ "content": "<|reserved_special_token_214|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128223": {
1788
+ "content": "<|reserved_special_token_215|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128224": {
1796
+ "content": "<|reserved_special_token_216|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128225": {
1804
+ "content": "<|reserved_special_token_217|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128226": {
1812
+ "content": "<|reserved_special_token_218|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128227": {
1820
+ "content": "<|reserved_special_token_219|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128228": {
1828
+ "content": "<|reserved_special_token_220|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128229": {
1836
+ "content": "<|reserved_special_token_221|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128230": {
1844
+ "content": "<|reserved_special_token_222|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128231": {
1852
+ "content": "<|reserved_special_token_223|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128232": {
1860
+ "content": "<|reserved_special_token_224|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128233": {
1868
+ "content": "<|reserved_special_token_225|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128234": {
1876
+ "content": "<|reserved_special_token_226|>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": true
1882
+ },
1883
+ "128235": {
1884
+ "content": "<|reserved_special_token_227|>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": true
1890
+ },
1891
+ "128236": {
1892
+ "content": "<|reserved_special_token_228|>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": true
1898
+ },
1899
+ "128237": {
1900
+ "content": "<|reserved_special_token_229|>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": true
1906
+ },
1907
+ "128238": {
1908
+ "content": "<|reserved_special_token_230|>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": true
1914
+ },
1915
+ "128239": {
1916
+ "content": "<|reserved_special_token_231|>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": true
1922
+ },
1923
+ "128240": {
1924
+ "content": "<|reserved_special_token_232|>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": true
1930
+ },
1931
+ "128241": {
1932
+ "content": "<|reserved_special_token_233|>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": true
1938
+ },
1939
+ "128242": {
1940
+ "content": "<|reserved_special_token_234|>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": true
1946
+ },
1947
+ "128243": {
1948
+ "content": "<|reserved_special_token_235|>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": true
1954
+ },
1955
+ "128244": {
1956
+ "content": "<|reserved_special_token_236|>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": true
1962
+ },
1963
+ "128245": {
1964
+ "content": "<|reserved_special_token_237|>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": true
1970
+ },
1971
+ "128246": {
1972
+ "content": "<|reserved_special_token_238|>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": true
1978
+ },
1979
+ "128247": {
1980
+ "content": "<|reserved_special_token_239|>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": true
1986
+ },
1987
+ "128248": {
1988
+ "content": "<|reserved_special_token_240|>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": true
1994
+ },
1995
+ "128249": {
1996
+ "content": "<|reserved_special_token_241|>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": true
2002
+ },
2003
+ "128250": {
2004
+ "content": "<|reserved_special_token_242|>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": true
2010
+ },
2011
+ "128251": {
2012
+ "content": "<|reserved_special_token_243|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128252": {
2020
+ "content": "<|reserved_special_token_244|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128253": {
2028
+ "content": "<|reserved_special_token_245|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128254": {
2036
+ "content": "<|reserved_special_token_246|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128255": {
2044
+ "content": "<|reserved_special_token_247|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ }
2051
+ },
2052
+ "bos_token": "<|begin_of_text|>",
2053
+ "clean_up_tokenization_spaces": true,
2054
+ "eos_token": "<|end_of_text|>",
2055
+ "model_input_names": [
2056
+ "input_ids",
2057
+ "attention_mask"
2058
+ ],
2059
+ "model_max_length": 131072,
2060
+ "tokenizer_class": "PreTrainedTokenizerFast"
2061
+ }
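
The diff above completes the tokenizer_config.json: it registers the remaining reserved special tokens (ids 128125–128255, all marked "special": true and not normalized) and sets the tokenizer-level defaults (bos/eos tokens, model_input_names, a 131072-token model_max_length, and PreTrainedTokenizerFast as the loader class). As a minimal sketch of how this config is consumed — assuming the repo also ships the companion tokenizer.json and that the local path used here is a hypothetical placeholder — the settings surface directly on the loaded tokenizer object:

    # Minimal sketch, not part of the committed files.
    # Assumptions: `transformers` is installed, the directory
    # "./llama-3.2-tokenizer" (hypothetical placeholder) contains this
    # tokenizer_config.json plus the matching tokenizer.json.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("./llama-3.2-tokenizer")

    # Values defined at the bottom of the config above:
    print(tok.bos_token)          # "<|begin_of_text|>"
    print(tok.eos_token)          # "<|end_of_text|>"
    print(tok.model_max_length)   # 131072
    print(tok.model_input_names)  # ["input_ids", "attention_mask"]

    # Reserved special tokens map to fixed ids (e.g. 128125 below) and,
    # because they are declared with "special": true, they are dropped
    # when decoding with skip_special_tokens=True.
    tid = tok.convert_tokens_to_ids("<|reserved_special_token_117|>")
    print(tid)                                          # 128125
    print(tok.decode([tid], skip_special_tokens=True))  # ""

The reserved token ids are placeholders kept free for future use; fine-tuning code can repurpose them (for example as extra control tokens) without resizing the embedding matrix, which is why they are enumerated in full here even though the base model never emits them.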