mconcat committed on
Commit 13be8a1 · verified · 1 Parent(s): 02e0322

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,119 @@
+ ---
+ library_name: tensorrt_llm
+ base_model: arcee-ai/Trinity-Large-TrueBase
+ tags:
+ - nvidia
+ - nvfp4
+ - fp4
+ - quantized
+ - tensorrt-llm
+ - modelopt
+ - mixture-of-experts
+ - moe
+ - blackwell
+ license: other
+ license_name: same-as-base-model
+ license_link: https://huggingface.co/arcee-ai/Trinity-Large-TrueBase
+ ---
+
+ # Trinity-Large-TrueBase-NVFP4
+
+ NVFP4-quantized version of [arcee-ai/Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase) for deployment on NVIDIA Blackwell GPUs via TensorRT-LLM.
+
+ ## Model Details
+
+ | | |
+ |---|---|
+ | **Base model** | [arcee-ai/Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase) |
+ | **Architecture** | AfmoeForCausalLM (Mixture-of-Experts) |
+ | **Parameters** | 398B total |
+ | **Layers** | 60 (6 dense + 54 MoE) |
+ | **Experts** | 256 per MoE layer, 4 active per token, 1 shared expert |
+ | **Hidden size** | 3072 |
+ | **MoE intermediate size** | 3072 per expert |
+ | **Dense intermediate size** | 12,288 |
+ | **Attention** | 48 heads, 8 KV heads (GQA), sliding window (4096) + full attention every 4 layers |
+ | **Context length** | 8,192 tokens |
+ | **Vocabulary** | 200,192 tokens |
+
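The parameter count in the table above can be sanity-checked from the MoE geometry alone. A rough back-of-the-envelope sketch (assuming three weight matrices per expert for the SwiGLU MLP, and ignoring shared experts, attention, embeddings, and the dense layers):

```python
hidden = 3072            # hidden size
moe_intermediate = 3072  # MoE intermediate size per expert
experts = 256            # experts per MoE layer
moe_layers = 54          # MoE layers (60 total - 6 dense)

# Three projections per expert (gate_proj, up_proj, down_proj),
# each hidden x moe_intermediate.
params_per_expert = 3 * hidden * moe_intermediate
expert_params = params_per_expert * experts * moe_layers
print(f"{expert_params / 1e9:.0f}B")  # ~391B of the 398B total sits in expert weights
```

The remainder (attention, embeddings, dense MLPs, shared experts) accounts for only a few billion parameters, which is why the MLP-only quantization recipe below captures nearly all of the compression opportunity.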
+ ## Quantization
+
+ | | |
+ |---|---|
+ | **Method** | NVFP4 (4-bit floating point) |
+ | **Tool** | [NVIDIA ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer) 0.41.0 |
+ | **Group size** | 16 |
+ | **Calibration** | 512 samples (Korean, Code, Creative Writing, English), max_seq_length=512 |
+ | **Quantized layers** | MLP/expert weights only (`gate_proj`, `up_proj`, `down_proj` in dense and MoE layers) |
+ | **BF16 layers** | Attention (Q/K/V/O projections), embeddings, router gates, shared experts, layer norms, lm_head |
+ | **Source precision** | BF16 |
+
+ ### Compression
+
+ | Format | Size |
+ |--------|------|
+ | BF16 (original) | 796 GB |
+ | **NVFP4 (this model)** | **216 GB** |
+
+ 3.7x compression.
+
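The ~3.7x figure is in line with NVFP4's storage cost. A rough estimate, assuming a 4-bit weight plus one 8-bit scale per group of 16 elements (the end-to-end ratio additionally depends on which tensors stay in BF16, second-level scales, and file metadata):

```python
bf16_bytes = 2.0                   # bytes per weight in BF16
nvfp4_bytes = 4 / 8 + 1 / 16       # 4-bit weight + one 8-bit scale per 16-element group
ideal_ratio = bf16_bytes / nvfp4_bytes
print(f"{ideal_ratio:.2f}x")       # ~3.56x for a purely NVFP4-quantized tensor
print(f"{796 / 216:.2f}x")         # ~3.69x observed end-to-end
```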
+ ## Intended Use
+
+ This checkpoint is intended for deployment on NVIDIA Blackwell (SM100) GPUs using TensorRT-LLM's NVFP4 inference path. The NVFP4 format requires Blackwell's 5th-generation Tensor Cores for native FP4 execution.
+
+ ### Loading with TensorRT-LLM
+
+ ```bash
+ # Convert to TensorRT-LLM engine
+ trtllm-build \
+     --checkpoint_dir ./Trinity-Large-TrueBase-NVFP4 \
+     --output_dir ./engine \
+     --gemm_plugin auto
+ ```
+
+ ## Quantization Recipe
+
+ Following NVIDIA's MLP-only quantization strategy (similar to the [DeepSeek-R1 NVFP4 recipe](https://developer.nvidia.com/blog/nvidia-publishes-the-first-deepseek-r1-nvfp4-quantized-model/)):
+
+ - Only MLP/expert weights (`gate_proj`, `up_proj`, `down_proj`) are quantized to FP4
+ - All attention projections remain in BF16 to preserve quality
+ - Router gates (`mlp.router`) remain in BF16
+ - Embeddings and lm_head remain in BF16
+ - The default `*mlp.gate.*` exclusion was removed because Trinity uses `mlp.gate_proj` as a standard MLP projection (not a routing gate)
+
+ ### Calibration Data
+
+ | Domain | Samples | Dataset |
+ |--------|---------|---------|
+ | Korean | 128 | [heegyu/open-korean-instructions](https://huggingface.co/datasets/heegyu/open-korean-instructions) |
+ | Code | 128 | [m-a-p/CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) |
+ | Creative Writing | 128 | [Gryphe/ChatGPT-4o-Writing-Prompts](https://huggingface.co/datasets/Gryphe/ChatGPT-4o-Writing-Prompts) |
+ | General English | 128 | [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) |
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `model-00001-of-00005.safetensors` ... `model-00005-of-00005.safetensors` | Quantized model weights (5 shards: four of ~50 GB plus a final ~32 GB shard) |
+ | `model.safetensors.index.json` | Weight shard index |
+ | `config.json` | Model configuration with `quantization_config` |
+ | `hf_quant_config.json` | ModelOpt quantization metadata (consumed by TensorRT-LLM) |
+ | `generation_config.json` | Generation configuration |
+ | `tokenizer.json` | Tokenizer |
+ | `tokenizer_config.json` | Tokenizer configuration |
+ | `chat_template.jinja` | Chat template |
+
+ ## Hardware
+
+ Quantization was performed on 8x NVIDIA A100-SXM4-80GB GPUs with ~1.8 TiB of system RAM. Total quantization time was approximately 9 hours, dominated by calibration forward passes. Quantization itself does not require Blackwell hardware; only inference with native FP4 execution does.
+
+ ## Limitations
+
+ - Requires NVIDIA Blackwell GPUs (SM100) for native NVFP4 inference via TensorRT-LLM
+ - Quality may differ from the original BF16 model, particularly on tasks sensitive to numerical precision
+ - Calibration was bilingual (Korean + English) plus code; other languages may see slightly higher degradation
+ - Only the MLP/expert layers are quantized; the KV cache is not quantized
+
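Because the KV cache stays unquantized, its BF16 footprint at full context can be estimated from the attention geometry (8 KV heads from the Model Details table, head_dim 128 from `config.json`, 60 layers, 8,192 tokens):

```python
layers, kv_heads, head_dim = 60, 8, 128
seq_len, bytes_per_elem = 8192, 2  # full context, BF16

# K and V tensors for every layer, every KV head, every position.
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem
print(f"{kv_bytes / 2**30:.1f} GiB per sequence")  # ~1.9 GiB
```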
+ ## License
+
+ Same license as the base model [arcee-ai/Trinity-Large-TrueBase](https://huggingface.co/arcee-ai/Trinity-Large-TrueBase).
chat_template.jinja ADDED
@@ -0,0 +1 @@
+ {{ bos_token }}{% for message in messages %}{{ message['content'] }}{% endfor %}
config.json ADDED
@@ -0,0 +1,259 @@
+ {
+   "architectures": [
+     "AfmoeForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "auto_map": {
+     "AutoConfig": "configuration_afmoe.AfmoeConfig",
+     "AutoModel": "modeling_afmoe.AfmoeModel",
+     "AutoModelForCausalLM": "modeling_afmoe.AfmoeForCausalLM"
+   },
+   "bos_token_id": null,
+   "dtype": "bfloat16",
+   "eos_token_id": null,
+   "global_attn_every_n_layers": 4,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 3072,
+   "initializer_range": 0.02,
+   "intermediate_size": 12288,
+   "layer_types": [
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "sliding_attention",
+     "full_attention"
+   ],
+   "load_balance_coeff": 5e-05,
+   "max_position_embeddings": 8192,
+   "model_type": "afmoe",
+   "moe_intermediate_size": 3072,
+   "mup_enabled": true,
+   "n_group": 1,
+   "num_attention_heads": 48,
+   "num_dense_layers": 6,
+   "num_expert_groups": 1,
+   "num_experts": 256,
+   "num_experts_per_tok": 4,
+   "num_hidden_layers": 60,
+   "num_key_value_heads": 8,
+   "num_limited_groups": 1,
+   "num_shared_experts": 1,
+   "pad_token_id": null,
+   "rms_norm_eps": 1e-05,
+   "rope_parameters": {
+     "rope_theta": 10000.0,
+     "rope_type": "default"
+   },
+   "rope_theta": 10000,
+   "route_norm": true,
+   "route_scale": 2.448,
+   "score_func": "sigmoid",
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "topk_group": 1,
+   "transformers_version": "5.1.0",
+   "use_cache": true,
+   "use_grouped_mm": true,
+   "vocab_size": 200192,
+   "quantization_config": {
+     "config_groups": {
+       "group_0": {
+         "input_activations": {
+           "dynamic": false,
+           "num_bits": 4,
+           "type": "float",
+           "group_size": 16
+         },
+         "weights": {
+           "dynamic": false,
+           "num_bits": 4,
+           "type": "float",
+           "group_size": 16
+         },
+         "targets": [
+           "Linear"
+         ]
+       }
+     },
+     "ignore": [
+       "lm_head",
+       "model.layers.0.self_attn*",
+       "model.layers.1.self_attn*",
+       "model.layers.10.mlp.router*",
+       "model.layers.10.self_attn*",
+       "model.layers.11.mlp.router*",
+       "model.layers.11.self_attn*",
+       "model.layers.12.mlp.router*",
+       "model.layers.12.self_attn*",
+       "model.layers.13.mlp.router*",
+       "model.layers.13.self_attn*",
+       "model.layers.14.mlp.router*",
+       "model.layers.14.self_attn*",
+       "model.layers.15.mlp.router*",
+       "model.layers.15.self_attn*",
+       "model.layers.16.mlp.router*",
+       "model.layers.16.self_attn*",
+       "model.layers.17.mlp.router*",
+       "model.layers.17.self_attn*",
+       "model.layers.18.mlp.router*",
+       "model.layers.18.self_attn*",
+       "model.layers.19.mlp.router*",
+       "model.layers.19.self_attn*",
+       "model.layers.2.self_attn*",
+       "model.layers.20.mlp.router*",
+       "model.layers.20.self_attn*",
+       "model.layers.21.mlp.router*",
+       "model.layers.21.self_attn*",
+       "model.layers.22.mlp.router*",
+       "model.layers.22.self_attn*",
+       "model.layers.23.mlp.router*",
+       "model.layers.23.self_attn*",
+       "model.layers.24.mlp.router*",
+       "model.layers.24.self_attn*",
+       "model.layers.25.mlp.router*",
+       "model.layers.25.self_attn*",
+       "model.layers.26.mlp.router*",
+       "model.layers.26.self_attn*",
+       "model.layers.27.mlp.router*",
+       "model.layers.27.self_attn*",
+       "model.layers.28.mlp.router*",
+       "model.layers.28.self_attn*",
+       "model.layers.29.mlp.router*",
+       "model.layers.29.self_attn*",
+       "model.layers.3.self_attn*",
+       "model.layers.30.mlp.router*",
+       "model.layers.30.self_attn*",
+       "model.layers.31.mlp.router*",
+       "model.layers.31.self_attn*",
+       "model.layers.32.mlp.router*",
+       "model.layers.32.self_attn*",
+       "model.layers.33.mlp.router*",
+       "model.layers.33.self_attn*",
+       "model.layers.34.mlp.router*",
+       "model.layers.34.self_attn*",
+       "model.layers.35.mlp.router*",
+       "model.layers.35.self_attn*",
+       "model.layers.36.mlp.router*",
+       "model.layers.36.self_attn*",
+       "model.layers.37.mlp.router*",
+       "model.layers.37.self_attn*",
+       "model.layers.38.mlp.router*",
+       "model.layers.38.self_attn*",
+       "model.layers.39.mlp.router*",
+       "model.layers.39.self_attn*",
+       "model.layers.4.self_attn*",
+       "model.layers.40.mlp.router*",
+       "model.layers.40.self_attn*",
+       "model.layers.41.mlp.router*",
+       "model.layers.41.self_attn*",
+       "model.layers.42.mlp.router*",
+       "model.layers.42.self_attn*",
+       "model.layers.43.mlp.router*",
+       "model.layers.43.self_attn*",
+       "model.layers.44.mlp.router*",
+       "model.layers.44.self_attn*",
+       "model.layers.45.mlp.router*",
+       "model.layers.45.self_attn*",
+       "model.layers.46.mlp.router*",
+       "model.layers.46.self_attn*",
+       "model.layers.47.mlp.router*",
+       "model.layers.47.self_attn*",
+       "model.layers.48.mlp.router*",
+       "model.layers.48.self_attn*",
+       "model.layers.49.mlp.router*",
+       "model.layers.49.self_attn*",
+       "model.layers.5.self_attn*",
+       "model.layers.50.mlp.router*",
+       "model.layers.50.self_attn*",
+       "model.layers.51.mlp.router*",
+       "model.layers.51.self_attn*",
+       "model.layers.52.mlp.router*",
+       "model.layers.52.self_attn*",
+       "model.layers.53.mlp.router*",
+       "model.layers.53.self_attn*",
+       "model.layers.54.mlp.router*",
+       "model.layers.54.self_attn*",
+       "model.layers.55.mlp.router*",
+       "model.layers.55.self_attn*",
+       "model.layers.56.mlp.router*",
+       "model.layers.56.self_attn*",
+       "model.layers.57.mlp.router*",
+       "model.layers.57.self_attn*",
+       "model.layers.58.mlp.router*",
+       "model.layers.58.self_attn*",
+       "model.layers.59.mlp.router*",
+       "model.layers.59.self_attn*",
+       "model.layers.6.mlp.router*",
+       "model.layers.6.self_attn*",
+       "model.layers.7.mlp.router*",
+       "model.layers.7.self_attn*",
+       "model.layers.8.mlp.router*",
+       "model.layers.8.self_attn*",
+       "model.layers.9.mlp.router*",
+       "model.layers.9.self_attn*"
+     ],
+     "quant_algo": "NVFP4",
+     "producer": {
+       "name": "modelopt",
+       "version": "0.41.0"
+     },
+     "quant_method": "modelopt"
+   }
+ }
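The `layer_types` list above follows directly from `global_attn_every_n_layers: 4`: every fourth layer uses full attention and the rest use the 4096-token sliding window. A sketch reproducing it:

```python
num_layers = 60
every_n = 4  # global_attn_every_n_layers

# Layers 3, 7, 11, ... (0-indexed) get full attention; all others slide.
layer_types = [
    "full_attention" if (i + 1) % every_n == 0 else "sliding_attention"
    for i in range(num_layers)
]
print(layer_types.count("full_attention"))  # 15 full-attention layers
print(layer_types[:4])
# ['sliding_attention', 'sliding_attention', 'sliding_attention', 'full_attention']
```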
generation_config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "_from_model_config": true,
+   "transformers_version": "5.1.0",
+   "use_cache": true
+ }
hf_quant_config.json ADDED
@@ -0,0 +1,128 @@
+ {
+   "producer": {
+     "name": "modelopt",
+     "version": "0.41.0"
+   },
+   "quantization": {
+     "quant_algo": "NVFP4",
+     "kv_cache_quant_algo": null,
+     "group_size": 16,
+     "exclude_modules": [
+       "lm_head",
+       "model.layers.0.self_attn*",
+       "model.layers.1.self_attn*",
+       "model.layers.10.mlp.router*",
+       "model.layers.10.self_attn*",
+       "model.layers.11.mlp.router*",
+       "model.layers.11.self_attn*",
+       "model.layers.12.mlp.router*",
+       "model.layers.12.self_attn*",
+       "model.layers.13.mlp.router*",
+       "model.layers.13.self_attn*",
+       "model.layers.14.mlp.router*",
+       "model.layers.14.self_attn*",
+       "model.layers.15.mlp.router*",
+       "model.layers.15.self_attn*",
+       "model.layers.16.mlp.router*",
+       "model.layers.16.self_attn*",
+       "model.layers.17.mlp.router*",
+       "model.layers.17.self_attn*",
+       "model.layers.18.mlp.router*",
+       "model.layers.18.self_attn*",
+       "model.layers.19.mlp.router*",
+       "model.layers.19.self_attn*",
+       "model.layers.2.self_attn*",
+       "model.layers.20.mlp.router*",
+       "model.layers.20.self_attn*",
+       "model.layers.21.mlp.router*",
+       "model.layers.21.self_attn*",
+       "model.layers.22.mlp.router*",
+       "model.layers.22.self_attn*",
+       "model.layers.23.mlp.router*",
+       "model.layers.23.self_attn*",
+       "model.layers.24.mlp.router*",
+       "model.layers.24.self_attn*",
+       "model.layers.25.mlp.router*",
+       "model.layers.25.self_attn*",
+       "model.layers.26.mlp.router*",
+       "model.layers.26.self_attn*",
+       "model.layers.27.mlp.router*",
+       "model.layers.27.self_attn*",
+       "model.layers.28.mlp.router*",
+       "model.layers.28.self_attn*",
+       "model.layers.29.mlp.router*",
+       "model.layers.29.self_attn*",
+       "model.layers.3.self_attn*",
+       "model.layers.30.mlp.router*",
+       "model.layers.30.self_attn*",
+       "model.layers.31.mlp.router*",
+       "model.layers.31.self_attn*",
+       "model.layers.32.mlp.router*",
+       "model.layers.32.self_attn*",
+       "model.layers.33.mlp.router*",
+       "model.layers.33.self_attn*",
+       "model.layers.34.mlp.router*",
+       "model.layers.34.self_attn*",
+       "model.layers.35.mlp.router*",
+       "model.layers.35.self_attn*",
+       "model.layers.36.mlp.router*",
+       "model.layers.36.self_attn*",
+       "model.layers.37.mlp.router*",
+       "model.layers.37.self_attn*",
+       "model.layers.38.mlp.router*",
+       "model.layers.38.self_attn*",
+       "model.layers.39.mlp.router*",
+       "model.layers.39.self_attn*",
+       "model.layers.4.self_attn*",
+       "model.layers.40.mlp.router*",
+       "model.layers.40.self_attn*",
+       "model.layers.41.mlp.router*",
+       "model.layers.41.self_attn*",
+       "model.layers.42.mlp.router*",
+       "model.layers.42.self_attn*",
+       "model.layers.43.mlp.router*",
+       "model.layers.43.self_attn*",
+       "model.layers.44.mlp.router*",
+       "model.layers.44.self_attn*",
+       "model.layers.45.mlp.router*",
+       "model.layers.45.self_attn*",
+       "model.layers.46.mlp.router*",
+       "model.layers.46.self_attn*",
+       "model.layers.47.mlp.router*",
+       "model.layers.47.self_attn*",
+       "model.layers.48.mlp.router*",
+       "model.layers.48.self_attn*",
+       "model.layers.49.mlp.router*",
+       "model.layers.49.self_attn*",
+       "model.layers.5.self_attn*",
+       "model.layers.50.mlp.router*",
+       "model.layers.50.self_attn*",
+       "model.layers.51.mlp.router*",
+       "model.layers.51.self_attn*",
+       "model.layers.52.mlp.router*",
+       "model.layers.52.self_attn*",
+       "model.layers.53.mlp.router*",
+       "model.layers.53.self_attn*",
+       "model.layers.54.mlp.router*",
+       "model.layers.54.self_attn*",
+       "model.layers.55.mlp.router*",
+       "model.layers.55.self_attn*",
+       "model.layers.56.mlp.router*",
+       "model.layers.56.self_attn*",
+       "model.layers.57.mlp.router*",
+       "model.layers.57.self_attn*",
+       "model.layers.58.mlp.router*",
+       "model.layers.58.self_attn*",
+       "model.layers.59.mlp.router*",
+       "model.layers.59.self_attn*",
+       "model.layers.6.mlp.router*",
+       "model.layers.6.self_attn*",
+       "model.layers.7.mlp.router*",
+       "model.layers.7.self_attn*",
+       "model.layers.8.mlp.router*",
+       "model.layers.8.self_attn*",
+       "model.layers.9.mlp.router*",
+       "model.layers.9.self_attn*"
+     ]
+   }
+ }
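The `exclude_modules` entries are shell-style wildcard patterns. A minimal sketch of how a consumer might check whether a given weight falls under the NVFP4 path (the module names here are illustrative, and the actual matching logic inside TensorRT-LLM/ModelOpt may differ; this shows the pattern semantics only):

```python
from fnmatch import fnmatch

# Abbreviated subset of the exclude_modules list above.
exclude = ["lm_head", "model.layers.0.self_attn*", "model.layers.6.mlp.router*"]

def is_quantized(name: str) -> bool:
    """A module is FP4-quantized unless some exclude pattern matches it."""
    return not any(fnmatch(name, pat) for pat in exclude)

print(is_quantized("model.layers.6.mlp.experts.0.gate_proj"))  # True  (FP4)
print(is_quantized("model.layers.0.self_attn.q_proj"))         # False (BF16)
print(is_quantized("lm_head"))                                 # False (BF16)
```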
model-00001-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd2ac4af3cda1f3dc4e5943a89d6720d9644eaf1f0b94752e107c8203b7a150b
+ size 49979822160
model-00002-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0df8931d99bb0207433d38d4ca3d56d6976244f922a54b1599659122186de2b
+ size 50001038716
model-00003-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0bbf5110d67df6249999e39118a96a099839b8a39567360e1e3194b951a84448
+ size 50004196600
model-00004-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4e3f7df7b68ccc5267de3230fefbc360dab50b0ba72903a9f5c74cdb59099b2a
+ size 50000068080
model-00005-of-00005.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6afe686a0fb46699d9c9ad09d4c762ef6cd669d55466334235414df0a06496e3
+ size 31524413620
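Summing the five LFS shard sizes above recovers the checkpoint size quoted in the README (the "216 GB" figure reads as GiB):

```python
shard_sizes = [
    49_979_822_160,  # model-00001-of-00005.safetensors
    50_001_038_716,  # model-00002-of-00005.safetensors
    50_004_196_600,  # model-00003-of-00005.safetensors
    50_000_068_080,  # model-00004-of-00005.safetensors
    31_524_413_620,  # model-00005-of-00005.safetensors
]
total = sum(shard_sizes)
print(f"{total / 2**30:.1f} GiB")  # ~215.6 GiB, consistent with the README's ~216 GB
```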
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a626457320b5a5245cb6e8e4113dc4c4ff697c1feeae9334e6ab0432d0f2073
+ size 15989867
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:afb39058c41984d943eda4ccaeababb686cf6f75e6dc08a653074de4d39ce038
+ size 14615153
tokenizer_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "add_prefix_space": null,
+   "backend": "tokenizers",
+   "bos_token": "<|begin_of_text|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "is_local": false,
+   "model_max_length": 65536,
+   "pad_token": "<|pad|>",
+   "tokenizer_class": "TokenizersBackend",
+   "use_default_system_prompt": false
+ }