mlavkin committed on
Commit
9ff8b2f
·
verified ·
1 Parent(s): d205736

BF16 from MiniMaxAI/MiniMax-M2.7 (FP8 → BF16 blockwise dequant, exact)

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +4 -0
  2. LICENSE +18 -0
  3. README.md +95 -0
  4. chat_template.jinja +159 -0
  5. config.json +100 -0
  6. configuration_minimax_m2.py +200 -0
  7. generation_config.json +9 -0
  8. merges.txt +0 -0
  9. model-00000-of-00130.safetensors +3 -0
  10. model-00001-of-00130.safetensors +3 -0
  11. model-00002-of-00130.safetensors +3 -0
  12. model-00003-of-00130.safetensors +3 -0
  13. model-00004-of-00130.safetensors +3 -0
  14. model-00005-of-00130.safetensors +3 -0
  15. model-00006-of-00130.safetensors +3 -0
  16. model-00007-of-00130.safetensors +3 -0
  17. model-00008-of-00130.safetensors +3 -0
  18. model-00009-of-00130.safetensors +3 -0
  19. model-00010-of-00130.safetensors +3 -0
  20. model-00011-of-00130.safetensors +3 -0
  21. model-00012-of-00130.safetensors +3 -0
  22. model-00013-of-00130.safetensors +3 -0
  23. model-00014-of-00130.safetensors +3 -0
  24. model-00015-of-00130.safetensors +3 -0
  25. model-00016-of-00130.safetensors +3 -0
  26. model-00017-of-00130.safetensors +3 -0
  27. model-00018-of-00130.safetensors +3 -0
  28. model-00019-of-00130.safetensors +3 -0
  29. model-00020-of-00130.safetensors +3 -0
  30. model-00021-of-00130.safetensors +3 -0
  31. model-00022-of-00130.safetensors +3 -0
  32. model-00023-of-00130.safetensors +3 -0
  33. model-00024-of-00130.safetensors +3 -0
  34. model-00025-of-00130.safetensors +3 -0
  35. model-00026-of-00130.safetensors +3 -0
  36. model-00027-of-00130.safetensors +3 -0
  37. model-00028-of-00130.safetensors +3 -0
  38. model-00029-of-00130.safetensors +3 -0
  39. model-00030-of-00130.safetensors +3 -0
  40. model-00031-of-00130.safetensors +3 -0
  41. model-00032-of-00130.safetensors +3 -0
  42. model-00033-of-00130.safetensors +3 -0
  43. model-00034-of-00130.safetensors +3 -0
  44. model-00035-of-00130.safetensors +3 -0
  45. model-00036-of-00130.safetensors +3 -0
  46. model-00037-of-00130.safetensors +3 -0
  47. model-00038-of-00130.safetensors +3 -0
  48. model-00039-of-00130.safetensors +3 -0
  49. model-00040-of-00130.safetensors +3 -0
  50. model-00041-of-00130.safetensors +3 -0
.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ figures/agent_harness.png filter=lfs diff=lfs merge=lfs -text
37
+ figures/agent_teams.gif filter=lfs diff=lfs merge=lfs -text
38
+ figures/banner.png filter=lfs diff=lfs merge=lfs -text
39
+ figures/mle_bench.png filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,18 @@
1
+ NON-COMMERCIAL LICENSE
2
+ Non-commercial use permitted based on MIT-style terms; commercial use requires prior written authorization.
3
+ Copyright (c) 2026 MiniMax
4
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software for non-commercial purposes, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or provide copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
5
+ 1. The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6
+ 2. If the Software (or any derivative works thereof) is used for any Commercial Use, you shall prominently display "Built with MiniMax M2.7" on a related website, user interface, blogpost, about page or product documentation.
7
+ 3. Any Commercial Use of the Software or any derivative work thereof is prohibited without obtaining a separate, prior written authorization from MiniMax. To request such authorization, please contact api@minimax.io with the subject line "M2.7 licensing".
8
+ 4. "Commercial Use" means any use of the Software or any derivative work thereof that is primarily intended for commercial advantage or monetary compensation, which includes, without limitation: (i) offering products or services to third parties for a fee, which utilize, incorporate, or rely on the Software or its derivatives, (ii) the commercial use of APIs provided by or for the Software or its derivatives, including to support or enable commercial products, services, or operations, whether in a cloud-based, hosted, or other similar environment, and (iii) the deployment or provision of the Software or its derivatives that have been subjected to post-training, fine-tuning, instruction-tuning, or any other form of modification, for any commercial purpose.
9
+ 5. Permitted Free Uses. The following uses are expressly permitted free of charge: (a) personal use, including self-hosted deployment for coding, development of applications, agents, tools, integrations, research, experimentation, or other personal purposes; (b) use by non-profit organizations, academic institutions, and researchers for non-commercial research or educational purposes; (c) modification of the Software solely for the uses described in (a) or (b) above.
10
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
11
+
12
+ Appendix: Prohibited Uses
13
+ You agree you will not use, or allow others to use, the Software or any derivatives of the Software to:
14
+ 1. Generate or disseminate content prohibited by applicable laws or regulations.
15
+ 2. Assist with, engage in or otherwise support any military purpose.
16
+ 3. Exploit, harm, or attempt to exploit or harm minors.
17
+ 4. Generate or disseminate false or misleading information with the intent to cause harm.
18
+ 5. Promote discrimination, hate speech, or harmful behavior against individuals or groups based on race or ethnic origin, religion, disability, age, nationality and national origin, veteran status, sexual orientation, gender or gender identity, caste, immigration status, or any other characteristic that is associated with systemic discrimination or marginalization.
README.md ADDED
@@ -0,0 +1,95 @@
1
+ ---
2
+ license: other
3
+ license_name: minimax-license
4
+ license_link: https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE
5
+ base_model: MiniMaxAI/MiniMax-M2.7
6
+ base_model_relation: quantized
7
+ library_name: transformers
8
+ pipeline_tag: text-generation
9
+ language:
10
+ - en
11
+ - zh
12
+ - ru
13
+ tags:
14
+ - minimax
15
+ - minimax-m2
16
+ - moe
17
+ - mixture-of-experts
18
+ - bf16
19
+ - dequantized
20
+ ---
21
+
22
+ # MiniMax-M2.7 — BF16 (dequantized from FP8)
23
+
24
+ Plain `bfloat16` weights of [MiniMaxAI/MiniMax-M2.7](https://huggingface.co/MiniMaxAI/MiniMax-M2.7),
25
+ reconstructed from the upstream block-FP8 (E4M3, 128×128 blocks) checkpoint
26
+ via shard-by-shard blockwise dequantization. **No calibration, no rounding loss
27
+ beyond the original FP8→BF16 cast** — every block is materialized exactly:
28
+
29
+ ```
30
+ bf16_block = (fp8_block.float() * scale_fp32).bfloat16()
31
+ ```
32
+
33
+ ## Why this exists
34
+
35
+ `MiniMaxAI/MiniMax-M2.7` ships natively in FP8. On Ampere and earlier
36
+ (e.g. RTX A5000) FP8 tensor cores don't exist and inference engines have
37
+ to emulate FP8 through FP16 — paying double bandwidth without the speed
38
+ benefit. For further offline quantization (AWQ, GPTQ, RTN INT8, …) you
39
+ need plain BF16 weights anyway: `transformers + torch_dtype=bfloat16`
40
+ won't materialize the attention projections under the FP8 quant config,
41
+ which trips up `llmcompressor`'s GPTQ tracer.
42
+
43
+ This repo is the missing intermediate: **upstream MiniMax-M2.7 weights in
44
+ plain BF16 safetensors**, ready to be fed into any standard quantization
45
+ pipeline.
46
+
47
+ ## Contents
48
+
49
+ - 47 shards `model-NNNNN-of-00047.safetensors`
50
+ - rebuilt `model.safetensors.index.json` (no `*.weight_scale_inv` entries)
51
+ - `config.json` with the upstream `quantization_config` stripped
52
+ - tokenizer + custom modeling `.py` files copied verbatim from the FP8 source
53
+
54
+ Total ≈ **458 GB**.
55
+
56
+ ## Provenance
57
+
58
+ Produced on a single 48 GB GPU pod (~30 minutes wall time) using a
59
+ ~150-line script — see
60
+ [`dequant_fp8_blockwise.py`](https://github.com/operationrange/zonatelecom-agent/blob/main/scripts/quant/dequant_fp8_blockwise.py).
61
+
62
+ Process per shard:
63
+
64
+ 1. open `model-XXXXX-of-00130.safetensors` from the FP8 source
65
+ 2. for each `*.weight` (FP8 e4m3fn): look up `*.weight_scale_inv` (FP32, 128×128)
66
+ 3. broadcast scale to weight shape, multiply, cast to BF16
67
+ 4. drop the scale tensor
68
+ 5. write `model-NNNNN-of-00047.safetensors` (5 GB shards)
69
+
70
+ Other tensors (embeddings, layer norms, MoE routers/gates that were already
71
+ unquantized in the upstream config's `modules_to_not_convert`) are passed
72
+ through with a BF16 cast.
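The per-shard process above can be sketched in plain Python. This is a minimal illustration, not the linked `dequant_fp8_blockwise.py`: the names `to_bf16` and `dequant_blockwise` are hypothetical, the FP8 values are assumed to be already decoded to ordinary floats, and BF16 rounding is emulated with `struct` (round-to-nearest-even, no NaN/inf handling). A tiny block size stands in for the 128×128 blocks.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even).

    Sketch only: NaN/inf and float32 overflow are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias so that dropping the low 16 bits rounds ties to an even mantissa.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def dequant_blockwise(weight, scale_inv, block=128):
    """Multiply each (block x block) tile of `weight` by its scale, then cast.

    weight:    list of rows holding the decoded FP8 values (assumed input form)
    scale_inv: one scale per tile, indexed [row // block][col // block]
    """
    out = []
    for r, row in enumerate(weight):
        tile_scales = scale_inv[r // block]
        out.append([to_bf16(v * tile_scales[c // block]) for c, v in enumerate(row)])
    return out


# Toy example: a 4x4 weight with 2x2 tiles instead of 128x128.
weight = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]
scale_inv = [[0.5, 2.0], [1.0, 4.0]]
dequantized = dequant_blockwise(weight, scale_inv, block=2)
```

Indexing the scale grid with integer division is equivalent to broadcasting each scale over its tile, which is how step 3 is worded; a tensor library would typically do the broadcast explicitly instead of looping.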
73
+
74
+ ## Quick load
75
+
76
+ ```python
77
+ from transformers import AutoModelForCausalLM, AutoTokenizer
78
+
79
+ m = AutoModelForCausalLM.from_pretrained(
80
+ "operationrange/MiniMax-M2.7-BF16",
81
+ torch_dtype="bfloat16",
82
+ device_map="auto",
83
+ trust_remote_code=True,
84
+ )
85
+ tok = AutoTokenizer.from_pretrained("operationrange/MiniMax-M2.7-BF16", trust_remote_code=True)
86
+ ```
87
+
88
+ Inference at full BF16 needs ≥ ~470 GB combined GPU+CPU memory, so this
89
+ checkpoint is mostly intended as a starting point for further compression
90
+ (AWQ-INT4, GPTQ-INT8, etc.) rather than direct serving.
91
+
92
+ ## License
93
+
94
+ Inherits the [MiniMax-M2 license](https://huggingface.co/MiniMaxAI/MiniMax-M2.7/blob/main/LICENSE) from the
95
+ upstream model. No weights were modified — only the storage format.
chat_template.jinja ADDED
@@ -0,0 +1,159 @@
1
+ {# ----------‑‑‑ special token variables ‑‑‑---------- #}
2
+ {%- set toolcall_begin_token = '<minimax:tool_call>' -%}
3
+ {%- set toolcall_end_token = '</minimax:tool_call>' -%}
4
+ {#- Tool Rendering Functions ============================================== -#}
5
+ {%- macro render_tool_namespace(namespace_name, tool_list) -%}
6
+ {%- for tool in tool_list -%}
7
+ <tool>{{ tool.function | tojson(ensure_ascii=False) }}</tool>
8
+ {% endfor -%}
9
+ {%- endmacro -%}
10
+ {%- macro visible_text(content) -%}
11
+ {%- if content is string -%}
12
+ {{ content }}
13
+ {%- elif content is iterable and content is not mapping -%}
14
+ {%- for item in content -%}
15
+ {%- if item is mapping and item.type == 'text' -%}
16
+ {{- item.text }}
17
+ {%- elif item is string -%}
18
+ {{- item }}
19
+ {%- endif -%}
20
+ {%- endfor -%}
21
+ {%- else -%}
22
+ {{- content }}
23
+ {%- endif -%}
24
+ {%- endmacro -%}
25
+ {#- System Message Construction ============================================ -#}
26
+ {%- macro build_system_message(system_message) -%}
27
+ {%- if system_message and system_message.content -%}
28
+ {{- visible_text(system_message.content) }}
29
+ {%- else -%}
30
+ {%- if model_identity is not defined -%}
31
+ {%- set model_identity = "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax." -%}
32
+ {%- endif -%}
33
+ {{- model_identity }}
34
+ {%- endif -%}
35
+
36
+ {#- Handle current_date -#}
37
+ {%- if system_message and system_message.current_date -%}
38
+ {{- '\n' ~ 'Current date: ' + system_message.current_date }}
39
+ {%- endif -%}
40
+ {#- Handle current_location -#}
41
+ {%- if system_message and system_message.current_location -%}
42
+ {{- '\n' ~ 'Current location: ' + system_message.current_location }}
43
+ {%- endif -%}
44
+ {%- endmacro -%}
45
+ {#- Main Template Logic ================================================= -#}
46
+ {#- Extract system message (only first message if it's system) -#}
47
+ {%- set system_message = none -%}
48
+ {%- set conversation_messages = messages -%}
49
+ {%- if messages and messages[0].role == "system" -%}
50
+ {%- set system_message = messages[0] -%}
51
+ {%- set conversation_messages = messages[1:] -%}
52
+ {%- endif -%}
53
+ {#- Get the last user message turn, for interleaved thinking -#}
54
+ {%- set ns = namespace(last_user_index=-1) %}
55
+ {% for m in conversation_messages %}
56
+ {%- if m.role == 'user' %}
57
+ {% set ns.last_user_index = loop.index0 -%}
58
+ {%- endif %}
59
+ {%- endfor %}
60
+ {#- Render system message -#}
61
+ {{- ']~!b[' ~ ']~b]system' ~ '\n' }}
62
+ {{- build_system_message(system_message) }}
63
+ {#- Render tools if available -#}
64
+ {%- if tools -%}
65
+ {{- '\n\n' ~ '# Tools' ~ '\n' ~ 'You may call one or more tools to assist with the user query.\nHere are the tools available in JSONSchema format:' ~ '\n' }}
66
+ {{- '\n' ~ '<tools>' ~ '\n' }}
67
+ {{- render_tool_namespace("functions", tools) }}
68
+ {{- '</tools>' ~ '\n\n' }}
69
+ {{- 'When making tool calls, use XML format to invoke tools and pass parameters:' ~ '\n' }}
70
+ {{- '\n' ~ toolcall_begin_token }}
71
+ <invoke name="tool-name-1">
72
+ <parameter name="param-key-1">param-value-1</parameter>
73
+ <parameter name="param-key-2">param-value-2</parameter>
74
+ ...
75
+ </invoke>
76
+ {{- '\n' ~ toolcall_end_token }}
77
+ {%- endif -%}
78
+ {{- '[e~[\n' }}
79
+
80
+ {#- Render messages -#}
81
+ {%- set last_tool_call = namespace(name=none) -%}
82
+ {%- for message in conversation_messages -%}
83
+ {%- if message.role == 'assistant' -%}
84
+ {#- Only render reasoning_content if no user message follows -#}
85
+ {{- ']~b]ai' ~ '\n' }}
86
+
87
+ {%- set reasoning_content = '' %}
88
+ {%- set content = visible_text(message.content) %}
89
+ {%- if message.reasoning_content is string %}
90
+ {%- set reasoning_content = message.reasoning_content %}
91
+ {%- else %}
92
+ {%- if '</think>' in content %}
93
+ {%- set reasoning_content = content.split('</think>')[0].strip('\n').split('<think>')[-1].strip('\n') %}
94
+ {%- set content = content.split('</think>')[-1].strip('\n') %}
95
+ {%- endif %}
96
+ {%- endif %}
97
+ {%- if reasoning_content and loop.index0 > ns.last_user_index -%}
98
+ {{- '<think>' ~ '\n' ~ reasoning_content ~ '\n' ~ '</think>' ~ '\n\n' }}
99
+ {%- endif -%}
100
+ {%- if content -%}
101
+ {{- content }}
102
+ {%- endif -%}
103
+ {%- if message.tool_calls -%}
104
+ {{- '\n' ~ toolcall_begin_token ~ '\n' }}
105
+
106
+ {%- for tool_call in message.tool_calls -%}
107
+ {%- if tool_call.function %}
108
+ {%- set tool_call = tool_call.function %}
109
+ {%- endif %}
110
+ {{- '<invoke name="' + tool_call.name + '">' }}
111
+ {% set _args = tool_call.arguments %}
112
+ {%- for k, v in _args.items() %}
113
+ {{- '<parameter name="' + k + '">' }}
114
+ {{- v | tojson(ensure_ascii=False) if v is not string else v }}
115
+ {{- '</parameter>' }}
116
+ {% endfor %}
117
+ {{- '</invoke>' ~ '\n' }}
118
+ {%- endfor -%}
119
+
120
+ {{- toolcall_end_token}}
121
+ {%- set last_tool_call.name = message.tool_calls[-1].name -%}
122
+ {%- else -%}
123
+ {%- set last_tool_call.name = none -%}
124
+ {%- endif -%}
125
+ {{- '[e~[' ~ '\n' }}
126
+
127
+ {%- elif message.role == 'tool' -%}
128
+ {%- if last_tool_call.name is none -%}
129
+ {{- raise_exception("Message has tool role, but there was no previous assistant message with a tool call!") }}
130
+ {%- endif -%}
131
+ {%- if loop.first or (conversation_messages[loop.index0 - 1].role != 'tool') -%}
132
+ {{- ']~b]tool' }}
133
+ {%- endif -%}
134
+ {%- if message.content is string -%}
135
+ {{- '\n<response>' }}
136
+ {{- message.content }}
137
+ {{- '</response>' }}
138
+ {%- else -%}
139
+ {%- for tr in message.content -%}
140
+ {{- '\n<response>' }}
141
+ {{- tr.output if tr.output is defined else (tr.text if tr.type == 'text' and tr.text is defined else tr) }}
142
+ {{- '\n</response>' }}
143
+ {%- endfor -%}
144
+ {%- endif -%}
145
+ {%- if loop.last or (conversation_messages[loop.index0 + 1].role != 'tool') -%}
146
+ {{- '[e~[\n' -}}
147
+ {%- endif -%}
148
+
149
+ {%- elif message.role == 'user' -%}
150
+ {{- ']~b]user' ~ '\n' }}
151
+ {{- visible_text(message.content) }}
152
+ {{- '[e~[' ~ '\n' }}
153
+ {%- endif -%}
154
+ {%- endfor -%}
155
+
156
+ {#- Generation prompt -#}
157
+ {%- if add_generation_prompt -%}
158
+ {{- ']~b]ai' ~ '\n' ~ '<think>' ~ '\n' }}
159
+ {%- endif -%}
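To make the template's output easier to picture, here is a rough Python sketch of two pieces of it: the string produced for the simplest case (optional system message, one user turn, generation prompt, no tools) and the `<think>` splitting applied to assistant content. The constant and function names are hypothetical, and the sketch assumes that whitespace control in the template strips everything between tags; the real rendering should be checked with the tokenizer's chat template support.

```python
# Marker strings copied from the template; the constant names are made up here.
PREFIX = "]~!b["
ROLE_MARK = "]~b]"
END_MARK = "[e~["
DEFAULT_SYSTEM = (
    "You are a helpful assistant. Your name is MiniMax-M2.7 and is built by MiniMax."
)


def render_simple_prompt(user_text, system_text=None):
    """Approximate the rendered prompt for system + one user turn, no tools."""
    system = system_text or DEFAULT_SYSTEM
    return (
        f"{PREFIX}{ROLE_MARK}system\n{system}{END_MARK}\n"
        f"{ROLE_MARK}user\n{user_text}{END_MARK}\n"
        f"{ROLE_MARK}ai\n<think>\n"
    )


def split_reasoning(content):
    """Mirror the template's split of assistant content around </think>."""
    if "</think>" not in content:
        return "", content
    reasoning = (
        content.split("</think>")[0].strip("\n").split("<think>")[-1].strip("\n")
    )
    answer = content.split("</think>")[-1].strip("\n")
    return reasoning, answer


prompt = render_simple_prompt("Hello")
reasoning, answer = split_reasoning("<think>\nplan the reply\n</think>\n\nHi there")
```

The template only re-emits the reasoning part for assistant turns that come after the last user message, so earlier turns keep just the answer text.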
config.json ADDED
@@ -0,0 +1,100 @@
1
+ {
2
+ "architectures": [
3
+ "MiniMaxM2ForCausalLM"
4
+ ],
5
+ "attn_type_list": [
6
+ 1,
7
+ 1,
8
+ 1,
9
+ 1,
10
+ 1,
11
+ 1,
12
+ 1,
13
+ 1,
14
+ 1,
15
+ 1,
16
+ 1,
17
+ 1,
18
+ 1,
19
+ 1,
20
+ 1,
21
+ 1,
22
+ 1,
23
+ 1,
24
+ 1,
25
+ 1,
26
+ 1,
27
+ 1,
28
+ 1,
29
+ 1,
30
+ 1,
31
+ 1,
32
+ 1,
33
+ 1,
34
+ 1,
35
+ 1,
36
+ 1,
37
+ 1,
38
+ 1,
39
+ 1,
40
+ 1,
41
+ 1,
42
+ 1,
43
+ 1,
44
+ 1,
45
+ 1,
46
+ 1,
47
+ 1,
48
+ 1,
49
+ 1,
50
+ 1,
51
+ 1,
52
+ 1,
53
+ 1,
54
+ 1,
55
+ 1,
56
+ 1,
57
+ 1,
58
+ 1,
59
+ 1,
60
+ 1,
61
+ 1,
62
+ 1,
63
+ 1,
64
+ 1,
65
+ 1,
66
+ 1,
67
+ 1
68
+ ],
69
+ "auto_map": {
70
+ "AutoConfig": "configuration_minimax_m2.MiniMaxM2Config",
71
+ "AutoModelForCausalLM": "modeling_minimax_m2.MiniMaxM2ForCausalLM"
72
+ },
73
+ "dtype": "bfloat16",
74
+ "head_dim": 128,
75
+ "hidden_act": "silu",
76
+ "hidden_size": 3072,
77
+ "intermediate_size": 1536,
78
+ "max_position_embeddings": 204800,
79
+ "model_type": "minimax_m2",
80
+ "mtp_transformer_layers": 1,
81
+ "num_attention_heads": 48,
82
+ "num_experts_per_tok": 8,
83
+ "num_hidden_layers": 62,
84
+ "num_key_value_heads": 8,
85
+ "num_local_experts": 256,
86
+ "num_mtp_modules": 3,
87
+ "qk_norm_type": "per_layer",
88
+ "rms_norm_eps": 1e-06,
89
+ "rope_theta": 5000000,
90
+ "rotary_dim": 64,
91
+ "scoring_func": "sigmoid",
92
+ "shared_intermediate_size": 0,
93
+ "tie_word_embeddings": false,
94
+ "transformers_version": "4.46.1",
95
+ "use_cache": true,
96
+ "use_mtp": true,
97
+ "use_qk_norm": true,
98
+ "use_routing_bias": true,
99
+ "vocab_size": 200064
100
+ }
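As a sanity check on the README's size figure, the config values above allow a back-of-the-envelope parameter count. The sketch below is an estimate under stated assumptions: three projection matrices per expert (the `w1`, `w2`, `w3` names in the tensor-parallel plan), separate input and output embeddings because `tie_word_embeddings` is false, and no accounting for norms, biases, routing bias, or MTP modules. The function name `estimate_params` is hypothetical.

```python
# Values copied from config.json above.
cfg = {
    "hidden_size": 3072,
    "intermediate_size": 1536,
    "num_hidden_layers": 62,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "num_local_experts": 256,
    "vocab_size": 200064,
}


def estimate_params(cfg):
    """Rough weight count; ignores norms, biases and MTP modules (assumption)."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    attention = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections
    experts = cfg["num_local_experts"] * 3 * h * cfg["intermediate_size"]
    router = h * cfg["num_local_experts"]
    per_layer = attention + experts + router
    embeddings = 2 * cfg["vocab_size"] * h  # untied input and output embeddings
    return cfg["num_hidden_layers"] * per_layer + embeddings


total_params = estimate_params(cfg)
bf16_gigabytes = total_params * 2 / 1e9  # 2 bytes per BF16 value
print(f"{total_params / 1e9:.1f}B parameters, about {bf16_gigabytes:.0f} GB in BF16")
```

Under these assumptions the estimate comes out at roughly 229 billion parameters and about 457 GB in BF16, which is consistent with the README's "≈ 458 GB". Almost all of it sits in the expert matrices.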
configuration_minimax_m2.py ADDED
@@ -0,0 +1,200 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/minimax_m2/modular_minimax_m2.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_minimax_m2.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 the HuggingFace Team. All rights reserved.
9
+ #
10
+ # Licensed under the Apache License, Version 2.0 (the "License");
11
+ # you may not use this file except in compliance with the License.
12
+ # You may obtain a copy of the License at
13
+ #
14
+ # http://www.apache.org/licenses/LICENSE-2.0
15
+ #
16
+ # Unless required by applicable law or agreed to in writing, software
17
+ # distributed under the License is distributed on an "AS IS" BASIS,
18
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19
+ # See the License for the specific language governing permissions and
20
+ # limitations under the License.
21
+
22
+
23
+ from transformers.configuration_utils import PretrainedConfig
24
+
25
+
26
+ class MiniMaxM2Config(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a [`MiniMaxM2Model`]. It is used to instantiate a
29
+ MiniMaxM2 model according to the specified arguments, defining the model architecture. Instantiating a configuration
30
+ with the defaults will yield a similar configuration to that of the MiniMaxM2-7B-v0.1 or MiniMaxM2-7B-Instruct-v0.1.
31
+
32
+ [minimax_m2ai/MiniMaxM2-8x7B](https://huggingface.co/minimax_m2ai/MiniMaxM2-8x7B)
33
+ [minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1](https://huggingface.co/minimax_m2ai/MiniMaxM2-7B-Instruct-v0.1)
34
+
35
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
36
+ documentation from [`PretrainedConfig`] for more information.
37
+
38
+
39
+ Args:
40
+ vocab_size (`int`, *optional*, defaults to 32000):
41
+ Vocabulary size of the MiniMaxM2 model. Defines the number of different tokens that can be represented by the
42
+ `inputs_ids` passed when calling [`MiniMaxM2Model`]
43
+ hidden_size (`int`, *optional*, defaults to 4096):
44
+ Dimension of the hidden representations.
45
+ intermediate_size (`int`, *optional*, defaults to 14336):
46
+ Dimension of the MLP representations.
47
+ num_hidden_layers (`int`, *optional*, defaults to 32):
48
+ Number of hidden layers in the Transformer encoder.
49
+ num_attention_heads (`int`, *optional*, defaults to 32):
50
+ Number of attention heads for each attention layer in the Transformer encoder.
51
+ num_key_value_heads (`int`, *optional*, defaults to 8):
52
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
53
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
54
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
55
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
56
+ by meanpooling all the original heads within that group. For more details, check out [this
57
+ paper](https://huggingface.co/papers/2305.13245). If it is not specified, will default to `8`.
58
+ head_dim (`int`, *optional*, defaults to `hidden_size // num_attention_heads`):
59
+ The attention head dimension.
60
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
61
+ The non-linear activation function (function or string) in the decoder.
62
+ max_position_embeddings (`int`, *optional*, defaults to `4096*32`):
63
+ The maximum sequence length that this model might ever be used with. MiniMaxM2's sliding window attention
64
+ allows sequence of up to 4096*32 tokens.
65
+ initializer_range (`float`, *optional*, defaults to 0.02):
66
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
67
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
68
+ The epsilon used by the rms normalization layers.
69
+ use_cache (`bool`, *optional*, defaults to `True`):
70
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
71
+ relevant if `config.is_decoder=True`.
72
+ pad_token_id (`int`, *optional*):
73
+ The id of the padding token.
74
+ bos_token_id (`int`, *optional*, defaults to 1):
75
+ The id of the "beginning-of-sequence" token.
76
+ eos_token_id (`int`, *optional*, defaults to 2):
77
+ The id of the "end-of-sequence" token.
78
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
79
+ Whether the model's input and output word embeddings should be tied.
80
+ rope_theta (`float`, *optional*, defaults to 1000000.0):
81
+ The base period of the RoPE embeddings.
82
+ sliding_window (`int`, *optional*):
83
+ Sliding window attention window size. If not specified, will default to `4096`.
84
+ attention_dropout (`float`, *optional*, defaults to 0.0):
85
+ The dropout ratio for the attention probabilities.
86
+ num_experts_per_tok (`int`, *optional*, defaults to 2):
87
+ The number of experts to route per-token, can be also interpreted as the `top-k` routing
88
+ parameter
89
+ num_local_experts (`int`, *optional*, defaults to 8):
90
+ Number of experts per Sparse MLP layer.
91
+ output_router_logits (`bool`, *optional*, defaults to `False`):
92
+ Whether or not the router logits should be returned by the model. Enabling this will also
93
+ allow the model to output the auxiliary loss. See [here]() for more details
94
+ router_aux_loss_coef (`float`, *optional*, defaults to 0.001):
95
+ The aux loss factor for the total loss.
96
+ router_jitter_noise (`float`, *optional*, defaults to 0.0):
97
+ Amount of noise to add to the router.
98
+
99
+ ```python
100
+ >>> from transformers import MiniMaxM2Model, MiniMaxM2Config
101
+
102
+ >>> # Initializing a MiniMaxM2 7B style configuration
103
+ >>> configuration = MiniMaxM2Config()
104
+
105
+ >>> # Initializing a model from the MiniMaxM2 7B style configuration
106
+ >>> model = MiniMaxM2Model(configuration)
107
+
108
+ >>> # Accessing the model configuration
109
+ >>> configuration = model.config
110
+ ```"""
111
+
112
+ model_type = "minimax_m2"
113
+ keys_to_ignore_at_inference = ["past_key_values"]
114
+ base_model_tp_plan = {
115
+ "layers.*.self_attn.q_proj": "colwise",
116
+ "layers.*.self_attn.k_proj": "colwise",
117
+ "layers.*.self_attn.v_proj": "colwise",
118
+ "layers.*.self_attn.o_proj": "rowwise",
119
+ "layers.*.block_sparse_moe.gate": "colwise_rep", # we need to replicate here to correctly route experts
120
+ "layers.*.block_sparse_moe.experts.*.w1": "colwise",
121
+ "layers.*.block_sparse_moe.experts.*.w2": "rowwise",
122
+ "layers.*.block_sparse_moe.experts.*.w3": "colwise",
123
+ }
124
+ base_model_pp_plan = {
125
+ "embed_tokens": (["input_ids"], ["inputs_embeds"]),
126
+ "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
127
+ "norm": (["hidden_states"], ["hidden_states"]),
128
+ }
129
+
130
+ def __init__(
131
+ self,
132
+ vocab_size=32000,
133
+ hidden_size=4096,
134
+ intermediate_size=14336,
135
+ num_hidden_layers=32,
136
+ num_attention_heads=32,
137
+ num_key_value_heads=8,
138
+ head_dim=None,
139
+ hidden_act="silu",
140
+ max_position_embeddings=4096 * 32,
141
+ initializer_range=0.02,
142
+ rms_norm_eps=1e-5,
143
+ use_cache=True,
144
+ pad_token_id=None,
145
+ bos_token_id=1,
146
+ eos_token_id=2,
147
+ tie_word_embeddings=False,
148
+ rope_theta=1e6,
149
+ sliding_window=None,
150
+ attention_dropout=0.0,
151
+ num_experts_per_tok=2,
152
+ num_local_experts=8,
153
+ output_router_logits=False,
154
+ router_aux_loss_coef=0.001,
155
+ router_jitter_noise=0.0,
156
+ **kwargs,
157
+ ):
158
+ self.vocab_size = vocab_size
159
+ self.max_position_embeddings = max_position_embeddings
160
+ self.hidden_size = hidden_size
161
+ self.intermediate_size = intermediate_size
162
+ self.num_hidden_layers = num_hidden_layers
163
+ self.num_attention_heads = num_attention_heads
164
+ self.sliding_window = sliding_window
165
+
166
+ # for backward compatibility
167
+ if num_key_value_heads is None:
168
+ num_key_value_heads = num_attention_heads
169
+
170
+ self.num_key_value_heads = num_key_value_heads
171
+ self.hidden_act = hidden_act
172
+ self.initializer_range = initializer_range
173
+ self.rms_norm_eps = rms_norm_eps
174
+ self.use_cache = use_cache
175
+ self.rope_theta = rope_theta
176
+ self.attention_dropout = attention_dropout
177
+ self.head_dim = head_dim
178
+
179
+ self.num_experts_per_tok = num_experts_per_tok
180
+ self.num_local_experts = num_local_experts
181
+ self.output_router_logits = output_router_logits
182
+ self.router_aux_loss_coef = router_aux_loss_coef
183
+ self.router_jitter_noise = router_jitter_noise
184
+
185
+ self.use_qk_norm = kwargs.pop("use_qk_norm", False)
186
+ self.rotary_dim = kwargs.pop("rotary_dim", self.head_dim)
187
+ self.partial_rotary_factor = kwargs.pop("partial_rotary_factor", 1)
188
+ if self.head_dim is not None:
189
+ self.partial_rotary_factor = self.rotary_dim / self.head_dim
190
+
191
+ super().__init__(
192
+ pad_token_id=pad_token_id,
193
+ bos_token_id=bos_token_id,
194
+ eos_token_id=eos_token_id,
195
+ tie_word_embeddings=tie_word_embeddings,
196
+ **kwargs,
197
+ )
198
+
199
+
200
+ __all__ = ["MiniMaxM2Config"]
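The grouped-query attention and partial-rotary settings described in this class can be made concrete with the values from `config.json`. The helper below is a small sketch with a hypothetical name (`attention_summary`); the KV-cache figure additionally assumes that keys and values are cached in BF16 for every layer, which the source does not state.

```python
def attention_summary(
    num_attention_heads,
    num_key_value_heads,
    head_dim,
    rotary_dim,
    num_hidden_layers,
    bytes_per_value=2,  # BF16 cache, an assumption
):
    """Derive a few attention quantities from config values."""
    return {
        # GQA: how many query heads share one key/value head.
        "queries_per_kv_head": num_attention_heads // num_key_value_heads,
        # Same ratio the config class computes when head_dim is set.
        "partial_rotary_factor": rotary_dim / head_dim,
        # Keys and values for all layers, per generated or prompt token.
        "kv_cache_bytes_per_token": 2
        * num_hidden_layers
        * num_key_value_heads
        * head_dim
        * bytes_per_value,
    }


summary = attention_summary(
    num_attention_heads=48,
    num_key_value_heads=8,
    head_dim=128,
    rotary_dim=64,
    num_hidden_layers=62,
)
```

With 48 query heads and 8 key/value heads the model falls in the GQA case of the `num_key_value_heads` docstring, and `rotary_dim=64` against `head_dim=128` means rotary embeddings cover half of each head.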
generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "bos_token_id": 200019,
3
+ "do_sample": true,
4
+ "eos_token_id": 200020,
5
+ "temperature": 1.0,
6
+ "top_p": 0.95,
7
+ "top_k": 40,
8
+ "transformers_version": "4.46.1"
9
+ }
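The sampling defaults above (`temperature` 1.0, `top_k` 40, `top_p` 0.95) combine as successive filters on the next-token distribution. The pure-Python sketch below illustrates the idea with hypothetical helper names; it assumes the order temperature, then top-k, then top-p, and is not the library's implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose renormalized cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


# Toy distribution over four tokens.
toy = filter_top_k_top_p([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.8)
defaults = filter_top_k_top_p(softmax([2.0, 1.0, 0.5, -1.0]), top_k=40, top_p=0.95)
```

A token would then be drawn from the returned dictionary in proportion to its renormalized probability.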
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00000-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a6bdb3acc9a40f4048287376148f2653091c0320ff4b8676be493661bab51331
3
+ size 6150780488
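Each `.safetensors` entry in this diff is a Git LFS pointer (version, SHA-256 object id, byte size) rather than the weights themselves. A downloaded shard can be checked against its pointer along the following lines; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the in-memory check is only suitable for small inputs (a multi-gigabyte shard would need chunked hashing).

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer) -> bool:
    """Check size and SHA-256 of in-memory data against a parsed pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer text of the first shard, with the diff's "+ " prefixes removed.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a6bdb3acc9a40f4048287376148f2653091c0320ff4b8676be493661bab51331\n"
    "size 6150780488\n"
)
```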
model-00001-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7419c4c4cfc49bc42ccea34b18e6e10868ab809566b23db27d00c9a90972f62e
3
+ size 2415952592
model-00002-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9921692ddfbe03c5fd2a64cbddb3056278152b8b04f2027806be55565d0f0100
3
+ size 4921586920
model-00003-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:15c20829dab95331eaca22d73bf982c27af5f6f4ee296487f693089f14c9381c
3
+ size 2415952592
model-00004-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f1fbb4aab1709e6157757c429fe81af89044472e614011cb0126c0768fa07b9f
3
+ size 4921586920
model-00005-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f5226e66474a373786d302a0d5476fb22ec1dd06d646662b5489d91375193354
3
+ size 2415952592
model-00006-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c19b00b01c504b5ed7c54734cd2674f5991b6ccede6e3dabf4c5e6470d0b26fc
3
+ size 4921586920
model-00007-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c92796f9ceb818b0b99e2555745b601a4ea675911d3771d95378c707536e854a
3
+ size 2415952592
model-00008-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:adff9f393f6cb79fa5f63137def18e47f45e8c7e8963ca484f4f6fe8a740aa3f
3
+ size 4921586920
model-00009-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2f941c207dea0029b5a67a1ec9daddf1032604729dedfce18ea034d95860f92a
3
+ size 2415952592
model-00010-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82d20209f1a39f7144d4d5d9377c0184d2b8bbfa17f4811338283a38c2ea67dc
3
+ size 4921586920
model-00011-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cdf9c912b0adf7ae3caab22d47fedcfd5f8a8c37d233de4c080b4b09985be742
+ size 2415952592
model-00012-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:242514dff28a5ab8a593aba0607512b6a94e53f2ba6f7badfbef91ccd2e7a52d
+ size 4921586920
model-00013-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c4874abdcba687dd6d59d48199f19abc4ef871b28eaf89acc1e9bf2bbb4bdab
+ size 2415952592
model-00014-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d7bb0fa02b6ca6f34807f086a2fdac2bdc51cda7e9b47f51b4c4afd83370832
+ size 4921586920
model-00015-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6bc35bc417b8800dff5e6df21c38e6c3a92d276be5b6bee5c8683489515d2032
+ size 2415952592
model-00016-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49d86930ef278d5fc6d253ff5201cb76401e2220ed26fca7a39b32c61c5a5506
+ size 4921586920
model-00017-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab165541d9f168197328fdc4bad7972349711c89ba329f119d51c233cd1c4f36
+ size 2415952592
model-00018-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d25162dd8a5208962373b69d4f2967fe2d72c464a74483f59494c7830c744df
+ size 4921586920
model-00019-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:07bd87d28dd83a66b38cf8e607a09f0a2d1898e190fee637f0fed0a85ad76988
+ size 2415952592
model-00020-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e399a3ff34e4223c23d7e5ab3447d34c456cfa46adf049a5951a260a5c328295
+ size 4921587440
model-00021-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef24d6b9c20d15edc626bbc346f46d4276444e92cfae0523f8ea0c8fb2b852a5
+ size 2415952848
model-00022-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3c01102007cbb381892cdb1ffddac26c56a7aa1dd2758377445e71f2f1f449dc
+ size 4921587440
model-00023-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:185fea902110d1d4392b8ad29e94bf1f5bb3fbc973d0e3f283197801c64e8a3b
+ size 2415952848
model-00024-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a803530f9b01189ba67449bd5c80f929e8d8411519b17f3be7cd0b2569d21fc5
+ size 4921587440
model-00025-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45e1f62799cfe403472a2ed9ab3d734aaa912e38badeb6376f84a74ad49b63f8
+ size 2415952848
model-00026-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d3ef7f65f08065bb6c58bc62a2d767286c502d42101c1dc5f5985ae3cd24865
+ size 4921587440
model-00027-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:223fa0f2a0cff7a5d0011487d1d2addd104c890efcbf04621e8ba5e6b428c810
+ size 2415952848
model-00028-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51b8cad4f2f48dc1fa726c70b5293caf7dacb4dfc99e2e055e2f7487025efba2
+ size 4921587440
model-00029-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:885fc83f7126cb72e078d4870c1af193577c8c4e45c2e56b54f4f9117fd5d792
+ size 2415952848
model-00030-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:297adb4bda27dfbd4e6c64badf21cc5e85213abac09613b2f6f9aa2dd5f7a957
+ size 4921587440
model-00031-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:232b50813d0fa718d01ef56a9be8888eed79e15e91b9bf978a16a930667b0159
+ size 2415952848
model-00032-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30f4ebd8e4b1df38e9a69fdcce8216f7cef502874f74242c0286ff75ea69c9ed
+ size 4921587440
model-00033-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e9519844cb2a92a7c99790518d01436e074efcd7f878a7d05b40212df7e850c
+ size 2415952848
model-00034-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d7d247b20113672e2cf658b28463b5160fa25718b7ff439a5ff1400aa1d66d6b
+ size 4921587440
model-00035-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c72881de05143dbc7059548c566faada73311c1f6171d0a783472b062e0f9a14
+ size 2415952848
model-00036-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed64e1421ab8625efd597875c290f09fd15a125f8645acbeae06c4308a961f92
+ size 4921587440
model-00037-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71cb120d31a1a65985b708fa97f4d0b8b1a0b7316d883148f40efb028d7a597c
+ size 2415952848
model-00038-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:56789a80ac8970477a229d6674061c453fcbbc7c3226a88346ee38605597d516
+ size 4921587440
model-00039-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fadfd40a31aff82f7f89c1d56571da39abce28bdbcb1ba54832fec68033645e2
+ size 2415952848
model-00040-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:061abc7e57ae12d7c5b0c20c859837bad230e24c6d2c6217f5d8f62d48a04f8e
+ size 4921587440
model-00041-of-00130.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a8f4e03827310b4a7ede4cfb48fe2d139f5d3370d0c1e4af8e1a48f6a1c9852
+ size 2415952848