ywchiu committed · Commit 3796359 · 1 Parent(s): 28eb6d5

Add files using upload-large-folder tool

README.md CHANGED
@@ -12,7 +12,6 @@ tags:
12
  - gemma-4
13
  - vllm
14
  - fp8
15
- - fp8-dynamic
16
  - compressed-tensors
17
  - quantization
18
  - h200
@@ -20,7 +19,6 @@ tags:
20
  - mixture-of-experts
21
  - moe
22
  - inference
23
- - production-ready
24
  - largitdata
25
  quantized_by: largitdata-inc
26
  base_model:
@@ -28,107 +26,108 @@ base_model:
28
  model_type: gemma4
29
  ---
30
 
31
- # Gemma 4 26B-A4B IT FP8 Dynamic Norouter
32
 
33
- **Production-ready offline FP8 checkpoint for vLLM: 47% less VRAM, 80% more concurrency vs BF16.**
34
 
35
- We searched for a usable offline FP8 checkpoint of Gemma 4 26B-A4B-it but couldn't find one that worked cleanly with vLLM. So we vibe-coded our own and are sharing it with the community.
36
 
37
- This repository hosts an offline FP8 checkpoint derived from [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it) for vLLM serving. No on-the-fly quantization needed at startup.
 
38
 
39
- Published by [**Largitdata Inc.**](https://www.largitdata.com/)
40
 
41
- > **Note:** This is a derived operational checkpoint, not an official Google release. The original model's license terms, safety guidance, and documentation remain authoritative.
42
 
43
- 📖 [中文說明 / Chinese Version](#中文說明)
44
 
45
- ---
 
 
 
 
 
 
 
 
 
 
 
46
 
47
- ## Model Details
48
 
49
- - **Base model:** [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it)
50
- - **Derived format:** offline FP8 checkpoint for vLLM
51
- - **Quantization tool:** [`llmcompressor`](https://github.com/vllm-project/llm-compressor)
52
- - **Quantization method:** `FP8_DYNAMIC`
53
- - **Calibration data:** None required (dynamic quantization)
54
- - **Excluded weights:**
55
- - `norm`-class 1D tensors — excluded to avoid `expected 2D linear weight` validation errors during quantization
56
- - `re:.*router\.proj$` — MoE router weights excluded to maintain compatibility with the Gemma4 vLLM loading path
57
- - **Output directory name:** `gemma-4-26B-A4B-it-FP8-DYNAMIC-NOROUTER`
58
- - **Primary serving target:** `vllm/vllm-openai:gemma4`
59
- - **Organization:** [Largitdata Inc.](https://www.largitdata.com/)
60
 
61
- ## Test Environment
62
 
63
- - **GPU:** `NVIDIA H200 NVL` (`143 GB VRAM`)
64
- - **Runtime:** `vllm/vllm-openai:gemma4`
65
- - **KV cache dtype:** `fp8`
66
- - **`max_model_len`:** `32768`
67
- - **`gpu_memory_utilization`:** `0.55`
68
 
69
- Observed vLLM startup characteristics:
70
 
71
- - model weight loading: `15.76 s`
72
- - model loading total: `16.88 s`
73
- - `torch.compile`: `56.98 s`
74
- - engine init: `102.17 s`
75
- - total time to `/v1/models` ready: about `153 s`
76
 
77
- Observed runtime capacity:
 
 
 
78
 
 
 
 
 
79
  - `max_num_batched_tokens = 8192`
 
 
 
 
80
  - available KV cache memory: `46.37 GiB`
81
  - GPU KV cache size: `405,184 tokens`
82
  - maximum concurrency at `32,768` tokens/request: `38.87x`
83
 
84
- ## Serving Capacity Comparison
85
 
86
- | Metric | FP8 Dynamic Norouter | BF16 Baseline |
87
- |---|---|---|
88
- | Model loading memory | **25.75 GiB** | 48.5 GiB |
89
- | GPU KV cache size | **405,184 tokens** | 225,376 tokens |
90
- | Max concurrency @ 32K tokens/req | **38.87x** | 21.62x |
91
- | VRAM savings | **47% less** | — |
92
- | KV cache gain | **80% more** | — |
93
 
94
- ## Basic Benchmark
95
 
96
- Single-request warm benchmark against the OpenAI-compatible vLLM endpoint:
97
 
98
- - prompt tokens: `38`
99
- - completion tokens: `256`
100
- - temperature: `0`
101
-
102
- | Metric | FP8 Dynamic Norouter | BF16 Baseline |
103
- |---|---|---|
104
- | Avg end-to-end latency | 1.629 s | **1.536 s** |
105
- | Avg completion throughput | 157.19 tok/s | **166.62 tok/s** |
106
- | Avg total throughput | 180.53 tok/s | **191.36 tok/s** |
107
-
108
- These numbers are single-request warm-path measurements, not multi-client throughput tests. In production multi-client scenarios, the FP8 variant's larger KV cache is expected to provide superior aggregate throughput.
109
-
110
- **BF16 is ~6% faster on single-request latency, but the FP8 variant uses 47% less VRAM and provides 80% more KV cache capacity.** For production environments serving multiple concurrent users, the FP8 variant offers a better trade-off.
111
 
112
- ### Accuracy Evaluation
113
 
114
- Formal accuracy benchmarks (MMLU, MT-Bench, etc.) have not yet been conducted on this FP8 checkpoint. Based on prior community findings with FP8 dynamic quantization on similar architectures, accuracy degradation is typically negligible (<0.5% on MMLU). Community contributions with benchmark results are welcome — please open a discussion or PR.
 
 
115
 
116
  ## Usage
117
 
118
- Example vLLM launch:
119
 
120
  ```bash
121
  docker run -d \
122
- --name vllm-gemma4-26b-fp8-norouter \
123
  --restart unless-stopped \
124
  --ipc=host \
125
  --shm-size 16G \
126
  --gpus all \
127
- -v /models \
128
  -p 8001:8000 \
129
  -e NVIDIA_VISIBLE_DEVICES=0 \
130
- vllm/vllm-openai:gemma4 \
131
- --model /models/gemma-4-26B-A4B-it-FP8-DYNAMIC-NOROUTER \
132
  --trust-remote-code \
133
  --kv-cache-dtype fp8 \
134
  --gpu-memory-utilization 0.55 \
@@ -141,21 +140,10 @@ docker run -d \
141
 
142
  ## Known Limitations
143
 
144
- - Single-request latency is ~6% higher than BF16 due to FP8 dequantization overhead.
145
- - No formal accuracy benchmarks (MMLU, MT-Bench, etc.) have been run yet. Community contributions are welcome.
146
- - Only tested on NVIDIA H200 NVL. Other GPUs (A100, H100) may require adjusting `gpu-memory-utilization`.
147
- - MoE router weights (`router.proj`) and `norm`-class 1D tensors are excluded from quantization for vLLM compatibility. No routing degradation has been observed, but systematic evaluation has not been performed.
148
-
149
- ## Intended Use
150
-
151
- This artifact is intended for:
152
-
153
- - Operational vLLM deployment on H200-class hardware
154
- - Reproducible offline FP8 serving experiments
155
- - Environments where startup-time on-the-fly quantization is undesirable
156
- - Production inference with higher concurrency requirements
157
-
158
- This artifact is not intended to replace the original base model documentation, safety guidance, or license terms.
159
 
160
  ## License
161
 
@@ -163,15 +151,13 @@ This repository contains a derived checkpoint based on [`google/gemma-4-26B-A4B-
163
 
164
  ## Citation
165
 
166
- If you use this artifact, please cite both the derived checkpoint and the upstream base model.
167
-
168
  ```bibtex
169
- @misc{largitdata_gemma4_26b_a4b_it_fp8_dynamic_norouter_2026,
170
- title = {Gemma 4 26B-A4B IT FP8 Dynamic Norouter},
171
  author = {David Chiu},
172
  year = {2026},
173
- howpublished = {\url{https://huggingface.co/largitdata-inc/gemma-4-26b-a4b-it-fp8-dynamic-norouter}},
174
- note = {Derived offline FP8 checkpoint from google/gemma-4-26B-A4B-it for vLLM serving, published by Largitdata Inc. \url{https://www.largitdata.com/}}
175
  }
176
 
177
  @misc{google_gemma4_26b_a4b_it,
@@ -182,139 +168,27 @@ If you use this artifact, please cite both the derived checkpoint and the upstre
182
  }
183
  ```
184
 
185
- ## Disclaimer
186
-
187
- Users are responsible for verifying license compatibility, downstream serving behavior, numerical quality, and safety characteristics for their own environment.
188
-
189
  ---
190
 
191
  ## 中文說明
192
 
193
- **給 vLLM 的離線 FP8 checkpoint:比 BF16 省 47% VRAM,平行處理能力多 80%。**
194
-
195
- 我們在網路上找了一輪,沒有找到堪用的 Gemma 4 26B 離線 FP8 版本,索性自己 vibe coding 做了一版,貢獻給社群。
196
-
197
- 這個 Repo 提供從 [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it) 衍生出的離線 `FP8` checkpoint,讓 `vLLM` 可以直接載入服務,不需要在啟動時執行 on-the-fly 量化。
198
-
199
- 由 [**Largitdata Inc.**](https://www.largitdata.com/) 發佈。
200
-
201
- > **注意:** 這是衍生的操作用 checkpoint,並非 Google 官方發佈。原始模型的授權條款、安全指引與文件仍以官方為準。
202
-
203
- ### 模型細節
204
-
205
- - **基底模型:** [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it)
206
- - **格式:** 離線 FP8 checkpoint,供 vLLM 使用
207
- - **量化工具:** [`llmcompressor`](https://github.com/vllm-project/llm-compressor)
208
- - **量化方式:** `FP8_DYNAMIC`
209
- - **校準資料:** 不需要(動態量化)
210
- - **排除的權重:**
211
- - `norm` 類一維 tensor — 避免量化驗證時產生 `expected 2D linear weight` 類錯誤
212
- - `re:.*router\.proj$` — MoE router 權重,維持與 Gemma4 vLLM 載入路徑的相容性
213
- - **主要部署目標:** `vllm/vllm-openai:gemma4`
214
-
215
- ### 測試環境
216
-
217
- - **GPU:** `NVIDIA H200 NVL`(`143 GB VRAM`)
218
- - **Runtime:** `vllm/vllm-openai:gemma4`
219
- - **KV cache dtype:** `fp8`
220
- - **`max_model_len`:** `32768`
221
- - **`gpu_memory_utilization`:** `0.55`
222
-
223
- 啟動實測數據:
224
-
225
- - 模型權重載入:`15.76 s`
226
- - 模型載入總計:`16.88 s`
227
- - `torch.compile`:`56.98 s`
228
- - 引擎初始化:`102.17 s`
229
- - `/v1/models` 就緒總時間:約 `153 s`
230
-
231
- 執行期容量:
232
-
233
- - `max_num_batched_tokens = 8192`
234
- - 可用 KV cache 記憶體:`46.37 GiB`
235
- - GPU KV cache 大小:`405,184 tokens`
236
- - 最大平行處理量(`32,768` tokens/request):`38.87x`
237
-
238
- ### 服務容量比較
239
-
240
- | 指標 | FP8 Dynamic Norouter | BF16 原版 |
241
- |---|---|---|
242
- | 模型載入記憶體 | **25.75 GiB** | 48.5 GiB |
243
- | GPU KV cache 大小 | **405,184 tokens** | 225,376 tokens |
244
- | 最大平行處理量 @ 32K tokens/req | **38.87x** | 21.62x |
245
- | VRAM 節省 | **47%** | — |
246
- | KV cache 增加 | **80%** | — |
247
-
248
- ### 基礎效能測試
249
-
250
- 單請求暖機測試(OpenAI 相容 vLLM endpoint):
251
-
252
- - prompt tokens:`38`
253
- - completion tokens:`256`
254
- - temperature:`0`
255
-
256
- | 指標 | FP8 Dynamic Norouter | BF16 原版 |
257
- |---|---|---|
258
- | 平均端到端延遲 | 1.629 s | **1.536 s** |
259
- | 平均 completion 吞吐量 | 157.19 tok/s | **166.62 tok/s** |
260
- | 平均總吞吐量 | 180.53 tok/s | **191.36 tok/s** |
261
-
262
- 以上為單請求暖機路徑測量值,非多用戶吞吐量測試。在生產環境多用戶場景下,FP8 版本更大的 KV cache 預期能提供更好的整體吞吐量。
263
-
264
- **結論**:`BF16` 單請求略快(約 6%),但 FP8 版本 VRAM 用量減少 47%,可用 KV cache 增加 80%。需要同時服務多用戶的生產環境,FP8 版本更具優勢。
265
-
266
- ### 精度評估
267
-
268
- 尚未對此 FP8 checkpoint 進行正式精度 benchmark(MMLU、MT-Bench 等)。根據社群先前在類似架構上使用 FP8 動態量化的經驗,精度下降通常可忽略(MMLU < 0.5%)。歡迎社群貢獻 benchmark 結果,請開 discussion 或提交 PR。
269
-
270
- ### 使用方式
271
-
272
- vLLM 啟動範例:
273
-
274
- ```bash
275
- docker run -d \
276
- --name vllm-gemma4-26b-fp8-norouter \
277
- --restart unless-stopped \
278
- --ipc=host \
279
- --shm-size 16G \
280
- --gpus all \
281
- -v /models \
282
- -p 8001:8000 \
283
- -e NVIDIA_VISIBLE_DEVICES=0 \
284
- vllm/vllm-openai:gemma4 \
285
- --model /models/gemma-4-26B-A4B-it-FP8-DYNAMIC-NOROUTER \
286
- --trust-remote-code \
287
- --kv-cache-dtype fp8 \
288
- --gpu-memory-utilization 0.55 \
289
- --max-model-len 32768 \
290
- --enable-auto-tool-choice \
291
- --tool-call-parser gemma4 \
292
- --host 0.0.0.0 \
293
- --port 8000
294
- ```
295
-
296
- ### 已知限制
297
-
298
- - 單請求延遲比 BF16 高約 6%,主因為 FP8 dequantization 的額外開銷
299
- - 尚未進行 MMLU / MT-Bench 等精度 benchmark(歡迎社群補充)
300
- - 僅在 H200 NVL 上實測,其他 GPU(如 A100、H100)可能需要調整 `gpu-memory-utilization`
301
- - MoE router 權重(`router.proj`)與 `norm` 類一維 tensor 被排除在量化範圍外以維持 vLLM 相容性,目前未觀察到分流品質下降,但尚無系統性評估
302
-
303
- ### 使用場景
304
-
305
- 此 checkpoint 適用於:
306
-
307
- - 在 H200 等級硬體上以 vLLM 進行生產部署
308
- - 可重現的離線 FP8 服務實驗
309
- - 不希望在啟動時執行 on-the-fly 量化的環境
310
- - 需要更高平行處理能力的生產推論場景
311
 
312
- 此 checkpoint 不取代原始基底模型的文件、安全指引或授權條款。
 
313
 
314
- ### 授權
315
 
316
- 此 Repo 包含基於 [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it) 的衍生 checkpoint,使用須遵守 [Gemma 使用條款](https://ai.google.dev/gemma/terms)。
317
 
318
- ### 免責聲明
319
 
320
- 使用者需自行驗證授權相容性、下游服務行為、數值品質與安全特性。
 
12
  - gemma-4
13
  - vllm
14
  - fp8
 
15
  - compressed-tensors
16
  - quantization
17
  - h200
 
19
  - mixture-of-experts
20
  - moe
21
  - inference
 
22
  - largitdata
23
  quantized_by: largitdata-inc
24
  base_model:
 
26
  model_type: gemma4
27
  ---
28
 
29
+ # Gemma 4 26B-A4B IT FP8
30
 
31
+ Packed-expert offline FP8 checkpoint for [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it), built for vLLM serving on H200-class GPUs.
32
 
33
+ This artifact is the final production checkpoint we derived after patching both:
34
 
35
+ - `llmcompressor model_free_ptq`, so packed MoE experts are actually quantized to `FP8`
36
+ - `vLLM Gemma4` loader, so expert `weight_scale` tensors can be loaded correctly
37
 
38
+ Published by [Largitdata Inc.](https://www.largitdata.com/).
39
 
40
+ > This is a derived operational checkpoint, not an official Google release. The upstream model card, license terms, and safety guidance remain authoritative.
41
 
42
+ ## Model Details
43
 
44
+ - Base model: [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it)
45
+ - Format: offline `FP8` checkpoint for vLLM
46
+ - Quantization tool: [`llmcompressor`](https://github.com/vllm-project/llm-compressor)
47
+ - Quantization method: `FP8_DYNAMIC`
48
+ - Packed expert quantization: enabled
49
+ - Excluded weights:
50
+ - `norm`-class 1D tensors
51
+ - `router.proj`
52
+ - Final checkpoint size: about `26 GB`
53
+ - Weight shards:
54
+ - `model-00001-of-00002.safetensors`: about `25 GB`
55
+ - `model-00002-of-00002.safetensors`: about `817 MB`
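The two exclusion rules above can be sketched as a small predicate. This is an illustrative sketch only: the tensor names below are hypothetical, and the `re:.*router\.proj$` regex is interpreted in the llmcompressor-style `re:`-prefix convention.

```python
import re

# Mirrors this checkpoint's exclusion rules: MoE router projections
# (the "re:.*router\.proj$" pattern) and norm-class 1D tensors, which
# are not valid 2D linear weights for FP8 quantization.
ROUTER_RE = re.compile(r".*router\.proj$")

def keep_in_high_precision(name: str, ndim: int) -> bool:
    """Return True if the tensor is excluded from FP8 quantization."""
    if ndim == 1:  # norm-class 1D tensors
        return True
    return bool(ROUTER_RE.fullmatch(name))

# Illustrative tensor names, not actual checkpoint keys.
assert keep_in_high_precision("model.layers.0.router.proj", 2)
assert keep_in_high_precision("model.layers.0.input_layernorm.weight", 1)
assert not keep_in_high_precision("model.layers.0.mlp.down_proj.weight", 2)
```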
56
 
57
+ ## Important Compatibility Note
58
 
59
+ This checkpoint requires a patched Gemma4 loader in vLLM to load packed-expert `weight_scale` tensors correctly.
60
 
61
+ The production image we used is:
62
 
63
+ - `vllm-gemma4:packed-expert-loader-v1`
 
 
 
 
64
 
65
+ If you use an unpatched upstream vLLM image, loading may fail with errors similar to:
66
+
67
+ ```text
68
+ KeyError: 'layers.0.moe.experts.0.down_proj.weight_scale'
69
+ ```
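One way to catch this before serving is to scan the checkpoint's `model.safetensors.index.json` for expert weights that have no companion scale entry. The helper below is a hedged sketch: the exact scale-key naming depends on the patched loader, so it checks two common suffixes, and the checkpoint path in the usage comment is illustrative.

```python
import json
from pathlib import Path

def experts_missing_scales(weight_map: dict) -> list:
    """Return packed-expert weight names with no companion scale tensor.

    Assumes scale keys use either a ".weight_scale" or "_scale" suffix;
    the real naming depends on the patched Gemma4 loader.
    """
    keys = set(weight_map)
    missing = []
    for name in keys:
        if ".experts." in name and not name.endswith("_scale"):
            if f"{name}.weight_scale" not in keys and f"{name}_scale" not in keys:
                missing.append(name)
    return sorted(missing)

# Usage against a downloaded checkpoint directory (path is illustrative):
# index = json.loads(Path("/models/gemma-4-26B-A4B-it-FP8/"
#                         "model.safetensors.index.json").read_text())
# print(experts_missing_scales(index["weight_map"]))
```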
70
 
71
+ ## Tested Environment
72
 
73
+ - GPU: `NVIDIA H200 NVL` (`143 GB VRAM`)
74
+ - Runtime image: `vllm-gemma4:packed-expert-loader-v1`
75
+ - KV cache dtype: `fp8`
76
+ - Final production serving path: `/models/gemma-4-26B-A4B-it-FP8`
77
 
78
+ ### Production Configuration
79
+
80
+ - `gpu_memory_utilization = 0.55`
81
+ - `max_model_len = 32768`
82
  - `max_num_batched_tokens = 8192`
83
+
84
+ Observed startup and capacity:
85
+
86
+ - model loading memory: `25.75 GiB`
87
  - available KV cache memory: `46.37 GiB`
88
  - GPU KV cache size: `405,184 tokens`
89
  - maximum concurrency at `32,768` tokens/request: `38.87x`
90
 
91
+ Observed warm single-request benchmark:
92
 
93
+ - `~1k` prompt: `156.50 tok/s`
94
+ - `~8k` prompt: `136.57 tok/s`
95
 
96
+ ### Apples-to-Apples Comparison at `gpu_memory_utilization = 0.75`
97
 
98
+ Same H200, same `max_model_len = 32768`, same benchmark method, same single-request setting:
99
 
100
+ | Metric | FP8 | BF16 |
101
+ | --- | ---: | ---: |
102
+ | Model loading memory | **25.75 GiB** | 48.5 GiB |
103
+ | Available KV cache memory | **74.33 GiB** | 51.59 GiB |
104
+ | GPU KV cache size | **649,504 tokens** | 225,376 tokens |
105
+ | Max concurrency @ `32k` | **62.31x** | 21.62x |
106
+ | `~1k` prompt decode throughput | 156.28 tok/s | **161.07 tok/s** |
107
+ | `~8k` prompt decode throughput | 136.32 tok/s | **138.01 tok/s** |
108
 
109
+ Takeaway:
110
 
111
+ - `BF16` is still slightly faster on single-request decode speed
112
+ - `FP8` cuts model memory sharply and converts most of that headroom into KV cache
113
+ - for concurrency-oriented serving, the FP8 checkpoint is the better trade-off
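The "converts headroom into KV cache" claim can be checked directly from the table figures: at the same `gpu_memory_utilization`, the model-memory savings and the KV-cache gain are nearly identical.

```python
# Figures taken from the comparison table above, in GiB.
FP8_MODEL, BF16_MODEL = 25.75, 48.5
FP8_KV, BF16_KV = 74.33, 51.59

model_savings = BF16_MODEL - FP8_MODEL   # GiB freed by FP8 weights
kv_gain = FP8_KV - BF16_KV               # GiB of extra KV cache

# Almost every GiB saved on weights reappears as KV cache headroom.
assert abs(model_savings - kv_gain) < 0.1
print(f"weights saved: {model_savings:.2f} GiB, KV gained: {kv_gain:.2f} GiB")
```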
114
 
115
  ## Usage
116
 
117
+ Example launch command:
118
 
119
  ```bash
120
  docker run -d \
121
+ --name vllm-gemma4-26b-fp8 \
122
  --restart unless-stopped \
123
  --ipc=host \
124
  --shm-size 16G \
125
  --gpus all \
126
+ -v /models:/models \
127
  -p 8001:8000 \
128
  -e NVIDIA_VISIBLE_DEVICES=0 \
129
+ vllm-gemma4:packed-expert-loader-v1 \
130
+ --model /models/gemma-4-26B-A4B-it-FP8 \
131
  --trust-remote-code \
132
  --kv-cache-dtype fp8 \
133
  --gpu-memory-utilization 0.55 \
 
140
 
141
  ## Known Limitations
142
 
143
+ - Requires patched vLLM loader support for packed-expert `weight_scale`
144
+ - `BF16` remains slightly faster for single-request decode throughput
145
+ - No formal benchmark suite such as MMLU or MT-Bench has been run yet
146
+ - Tested on `NVIDIA H200 NVL`; other GPUs may need different settings
147
 
148
  ## License
149
 
 
151
 
152
  ## Citation
153
 
 
 
154
  ```bibtex
155
+ @misc{largitdata_gemma4_26b_a4b_it_fp8_2026,
156
+ title = {Gemma 4 26B-A4B IT FP8},
157
  author = {David Chiu},
158
  year = {2026},
159
+ howpublished = {\url{https://huggingface.co/LargitData/gemma-4-26b-a4b-it-fp8}},
160
+ note = {Derived offline FP8 packed-expert checkpoint from google/gemma-4-26B-A4B-it for patched vLLM serving}
161
  }
162
 
163
  @misc{google_gemma4_26b_a4b_it,
 
168
  }
169
  ```
170
 
 
 
 
 
171
  ---
172
 
173
  ## 中文說明
174
 
175
+ 這個 Repo 提供從 [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it) 衍生出的最終版離線 `FP8` checkpoint。這一版不是早期的 `Dynamic Norouter` 中間產物,而是:
176
 
177
+ - `llmcompressor` 已補成可量化 packed MoE experts
178
+ - `vLLM Gemma4` loader 已補成可讀取 expert `weight_scale`
179
 
180
+ ### 重點
181
 
182
+ - 最終 checkpoint 大小:約 `26 GB`
183
+ - 模型載入顯存:約 `25.75 GiB`
184
+ - `gpu_memory_utilization=0.55` 時:
185
+ - KV cache:`46.37 GiB`
186
+ - GPU KV cache:`405,184 tokens`
187
+ - `32k` concurrency:`38.87x`
188
+ - `gpu_memory_utilization=0.75` 時,和 BF16 同條件相比:
189
+ - `FP8` 單請求速度略慢一點
190
+ - 但 KV cache 明顯更大,`32k` concurrency 由 `21.62x` 提升到 `62.31x`
191
 
192
+ ### 使用限制
193
 
194
+ 這份 checkpoint 需要 patched `vLLM` loader。若直接使用未修補的 upstream `vLLM`,可能會在載入時遇到 expert `weight_scale` 相關錯誤。
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:aeb3c810035ce85853a1be7dd348f2f73d1417959d623a882e51522de7b1fdf1
3
+ size 26305626460
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e5efcdd24651d5061d49178ba88b86dac518729d25e84f1879d15570852806c6
3
+ size 856521512
model.safetensors.index.json CHANGED
@@ -1,12 +1,14 @@
1
  {
2
  "metadata": {
3
- "total_size": 49967520412
4
  },
5
  "weight_map": {
6
  "model.embed_vision.embedding_projection.weight": "model-00001-of-00002.safetensors",
7
  "model.language_model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
  "model.language_model.layers.0.experts.down_proj": "model-00001-of-00002.safetensors",
 
9
  "model.language_model.layers.0.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
10
  "model.language_model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
11
  "model.language_model.layers.0.layer_scalar": "model-00001-of-00002.safetensors",
12
  "model.language_model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -35,7 +37,9 @@
35
  "model.language_model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
36
  "model.language_model.layers.0.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
37
  "model.language_model.layers.1.experts.down_proj": "model-00001-of-00002.safetensors",
 
38
  "model.language_model.layers.1.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
39
  "model.language_model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
40
  "model.language_model.layers.1.layer_scalar": "model-00001-of-00002.safetensors",
41
  "model.language_model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -64,7 +68,9 @@
64
  "model.language_model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
65
  "model.language_model.layers.1.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
66
  "model.language_model.layers.10.experts.down_proj": "model-00001-of-00002.safetensors",
 
67
  "model.language_model.layers.10.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
68
  "model.language_model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
69
  "model.language_model.layers.10.layer_scalar": "model-00001-of-00002.safetensors",
70
  "model.language_model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -93,7 +99,9 @@
93
  "model.language_model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
94
  "model.language_model.layers.10.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
95
  "model.language_model.layers.11.experts.down_proj": "model-00001-of-00002.safetensors",
 
96
  "model.language_model.layers.11.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
97
  "model.language_model.layers.11.input_layernorm.weight": "model-00002-of-00002.safetensors",
98
  "model.language_model.layers.11.layer_scalar": "model-00001-of-00002.safetensors",
99
  "model.language_model.layers.11.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -120,7 +128,9 @@
120
  "model.language_model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
121
  "model.language_model.layers.11.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
122
  "model.language_model.layers.12.experts.down_proj": "model-00001-of-00002.safetensors",
 
123
  "model.language_model.layers.12.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
124
  "model.language_model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
125
  "model.language_model.layers.12.layer_scalar": "model-00001-of-00002.safetensors",
126
  "model.language_model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -149,7 +159,9 @@
149
  "model.language_model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
150
  "model.language_model.layers.12.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
151
  "model.language_model.layers.13.experts.down_proj": "model-00001-of-00002.safetensors",
 
152
  "model.language_model.layers.13.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
153
  "model.language_model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
154
  "model.language_model.layers.13.layer_scalar": "model-00001-of-00002.safetensors",
155
  "model.language_model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -178,7 +190,9 @@
178
  "model.language_model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
179
  "model.language_model.layers.13.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
180
  "model.language_model.layers.14.experts.down_proj": "model-00001-of-00002.safetensors",
 
181
  "model.language_model.layers.14.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
182
  "model.language_model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
183
  "model.language_model.layers.14.layer_scalar": "model-00001-of-00002.safetensors",
184
  "model.language_model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -207,7 +221,9 @@
207
  "model.language_model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
208
  "model.language_model.layers.14.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
209
  "model.language_model.layers.15.experts.down_proj": "model-00001-of-00002.safetensors",
 
210
  "model.language_model.layers.15.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
211
  "model.language_model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
212
  "model.language_model.layers.15.layer_scalar": "model-00001-of-00002.safetensors",
213
  "model.language_model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -236,7 +252,9 @@
236
  "model.language_model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
237
  "model.language_model.layers.15.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
238
  "model.language_model.layers.16.experts.down_proj": "model-00001-of-00002.safetensors",
 
239
  "model.language_model.layers.16.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
240
  "model.language_model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
241
  "model.language_model.layers.16.layer_scalar": "model-00001-of-00002.safetensors",
242
  "model.language_model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -265,7 +283,9 @@
265
  "model.language_model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
266
  "model.language_model.layers.16.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
267
  "model.language_model.layers.17.experts.down_proj": "model-00002-of-00002.safetensors",
 
268
  "model.language_model.layers.17.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
269
  "model.language_model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
270
  "model.language_model.layers.17.layer_scalar": "model-00001-of-00002.safetensors",
271
  "model.language_model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -292,7 +312,9 @@
292
  "model.language_model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
293
  "model.language_model.layers.17.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
294
  "model.language_model.layers.18.experts.down_proj": "model-00001-of-00002.safetensors",
 
295
  "model.language_model.layers.18.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
296
  "model.language_model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
297
  "model.language_model.layers.18.layer_scalar": "model-00001-of-00002.safetensors",
298
  "model.language_model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -321,7 +343,9 @@
321
  "model.language_model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
322
  "model.language_model.layers.18.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
323
  "model.language_model.layers.19.experts.down_proj": "model-00001-of-00002.safetensors",
 
324
  "model.language_model.layers.19.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
325
  "model.language_model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
326
  "model.language_model.layers.19.layer_scalar": "model-00001-of-00002.safetensors",
327
  "model.language_model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -350,7 +374,9 @@
350
  "model.language_model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
351
  "model.language_model.layers.19.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
352
  "model.language_model.layers.2.experts.down_proj": "model-00001-of-00002.safetensors",
 
353
  "model.language_model.layers.2.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
354
  "model.language_model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
355
  "model.language_model.layers.2.layer_scalar": "model-00001-of-00002.safetensors",
356
  "model.language_model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -379,7 +405,9 @@
379
  "model.language_model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
380
  "model.language_model.layers.2.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
381
  "model.language_model.layers.20.experts.down_proj": "model-00001-of-00002.safetensors",
 
382
  "model.language_model.layers.20.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
383
  "model.language_model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
384
  "model.language_model.layers.20.layer_scalar": "model-00001-of-00002.safetensors",
385
  "model.language_model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -408,7 +436,9 @@
408
  "model.language_model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
409
  "model.language_model.layers.20.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
410
  "model.language_model.layers.21.experts.down_proj": "model-00001-of-00002.safetensors",
 
411
  "model.language_model.layers.21.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
412
  "model.language_model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
413
  "model.language_model.layers.21.layer_scalar": "model-00001-of-00002.safetensors",
414
  "model.language_model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -437,7 +467,9 @@
437
  "model.language_model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
438
  "model.language_model.layers.21.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
439
  "model.language_model.layers.22.experts.down_proj": "model-00001-of-00002.safetensors",
 
440
  "model.language_model.layers.22.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
441
  "model.language_model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
442
  "model.language_model.layers.22.layer_scalar": "model-00001-of-00002.safetensors",
443
  "model.language_model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -466,7 +498,9 @@
466
  "model.language_model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
467
  "model.language_model.layers.22.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
468
  "model.language_model.layers.23.experts.down_proj": "model-00002-of-00002.safetensors",
 
469
  "model.language_model.layers.23.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
470
  "model.language_model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
471
  "model.language_model.layers.23.layer_scalar": "model-00001-of-00002.safetensors",
472
  "model.language_model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -493,7 +527,9 @@
493
  "model.language_model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
494
  "model.language_model.layers.23.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
495
  "model.language_model.layers.24.experts.down_proj": "model-00001-of-00002.safetensors",
 
496
  "model.language_model.layers.24.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
497
  "model.language_model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
498
  "model.language_model.layers.24.layer_scalar": "model-00001-of-00002.safetensors",
499
  "model.language_model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -522,7 +558,9 @@
522
  "model.language_model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
523
  "model.language_model.layers.24.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
524
  "model.language_model.layers.25.experts.down_proj": "model-00001-of-00002.safetensors",
 
525
  "model.language_model.layers.25.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
526
  "model.language_model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
527
  "model.language_model.layers.25.layer_scalar": "model-00001-of-00002.safetensors",
528
  "model.language_model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -551,7 +589,9 @@
551
  "model.language_model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
552
  "model.language_model.layers.25.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
553
  "model.language_model.layers.26.experts.down_proj": "model-00001-of-00002.safetensors",
 
554
  "model.language_model.layers.26.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
555
  "model.language_model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
556
  "model.language_model.layers.26.layer_scalar": "model-00001-of-00002.safetensors",
557
  "model.language_model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -580,7 +620,9 @@
580
  "model.language_model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
581
  "model.language_model.layers.26.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
582
  "model.language_model.layers.27.experts.down_proj": "model-00001-of-00002.safetensors",
 
583
  "model.language_model.layers.27.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
584
  "model.language_model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
585
  "model.language_model.layers.27.layer_scalar": "model-00001-of-00002.safetensors",
586
  "model.language_model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -609,7 +651,9 @@
609
  "model.language_model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
610
  "model.language_model.layers.27.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
611
  "model.language_model.layers.28.experts.down_proj": "model-00001-of-00002.safetensors",
 
612
  "model.language_model.layers.28.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
613
  "model.language_model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
614
  "model.language_model.layers.28.layer_scalar": "model-00001-of-00002.safetensors",
615
  "model.language_model.layers.28.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -638,7 +682,9 @@
638
  "model.language_model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
639
  "model.language_model.layers.28.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
640
  "model.language_model.layers.29.experts.down_proj": "model-00002-of-00002.safetensors",
 
641
  "model.language_model.layers.29.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
642
  "model.language_model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
643
  "model.language_model.layers.29.layer_scalar": "model-00001-of-00002.safetensors",
644
  "model.language_model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -665,7 +711,9 @@
665
  "model.language_model.layers.29.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
666
  "model.language_model.layers.29.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
667
  "model.language_model.layers.3.experts.down_proj": "model-00001-of-00002.safetensors",
 
668
  "model.language_model.layers.3.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
669
  "model.language_model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
670
  "model.language_model.layers.3.layer_scalar": "model-00001-of-00002.safetensors",
671
  "model.language_model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -694,7 +742,9 @@
694
  "model.language_model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
695
  "model.language_model.layers.3.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
696
  "model.language_model.layers.4.experts.down_proj": "model-00001-of-00002.safetensors",
 
697
  "model.language_model.layers.4.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
698
  "model.language_model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
699
  "model.language_model.layers.4.layer_scalar": "model-00001-of-00002.safetensors",
700
  "model.language_model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -723,7 +773,9 @@
723
  "model.language_model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
724
  "model.language_model.layers.4.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
725
  "model.language_model.layers.5.experts.down_proj": "model-00001-of-00002.safetensors",
 
726
  "model.language_model.layers.5.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
727
  "model.language_model.layers.5.input_layernorm.weight": "model-00002-of-00002.safetensors",
728
  "model.language_model.layers.5.layer_scalar": "model-00001-of-00002.safetensors",
729
  "model.language_model.layers.5.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -750,7 +802,9 @@
750
  "model.language_model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
751
  "model.language_model.layers.5.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
752
  "model.language_model.layers.6.experts.down_proj": "model-00001-of-00002.safetensors",
 
753
  "model.language_model.layers.6.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
754
  "model.language_model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
755
  "model.language_model.layers.6.layer_scalar": "model-00001-of-00002.safetensors",
756
  "model.language_model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -779,7 +833,9 @@
779
  "model.language_model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
780
  "model.language_model.layers.6.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
781
  "model.language_model.layers.7.experts.down_proj": "model-00001-of-00002.safetensors",
 
782
  "model.language_model.layers.7.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
783
  "model.language_model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
784
  "model.language_model.layers.7.layer_scalar": "model-00001-of-00002.safetensors",
785
  "model.language_model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -808,7 +864,9 @@
808
  "model.language_model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
809
  "model.language_model.layers.7.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
810
  "model.language_model.layers.8.experts.down_proj": "model-00001-of-00002.safetensors",
 
811
  "model.language_model.layers.8.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
812
  "model.language_model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
813
  "model.language_model.layers.8.layer_scalar": "model-00001-of-00002.safetensors",
814
  "model.language_model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -837,7 +895,9 @@
837
  "model.language_model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
838
  "model.language_model.layers.8.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
839
  "model.language_model.layers.9.experts.down_proj": "model-00001-of-00002.safetensors",
 
840
  "model.language_model.layers.9.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 
841
  "model.language_model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
842
  "model.language_model.layers.9.layer_scalar": "model-00001-of-00002.safetensors",
843
  "model.language_model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
1
  {
2
  "metadata": {
3
+ "total_size": 27161975452
4
  },
5
  "weight_map": {
6
  "model.embed_vision.embedding_projection.weight": "model-00001-of-00002.safetensors",
7
  "model.language_model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
  "model.language_model.layers.0.experts.down_proj": "model-00001-of-00002.safetensors",
9
+ "model.language_model.layers.0.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
10
  "model.language_model.layers.0.experts.gate_up_proj": "model-00001-of-00002.safetensors",
11
+ "model.language_model.layers.0.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
12
  "model.language_model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
13
  "model.language_model.layers.0.layer_scalar": "model-00001-of-00002.safetensors",
14
  "model.language_model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
37
  "model.language_model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
38
  "model.language_model.layers.0.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
39
  "model.language_model.layers.1.experts.down_proj": "model-00001-of-00002.safetensors",
40
+ "model.language_model.layers.1.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
41
  "model.language_model.layers.1.experts.gate_up_proj": "model-00001-of-00002.safetensors",
42
+ "model.language_model.layers.1.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
43
  "model.language_model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
44
  "model.language_model.layers.1.layer_scalar": "model-00001-of-00002.safetensors",
45
  "model.language_model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
68
  "model.language_model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
69
  "model.language_model.layers.1.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
70
  "model.language_model.layers.10.experts.down_proj": "model-00001-of-00002.safetensors",
71
+ "model.language_model.layers.10.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
72
  "model.language_model.layers.10.experts.gate_up_proj": "model-00001-of-00002.safetensors",
73
+ "model.language_model.layers.10.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
74
  "model.language_model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
75
  "model.language_model.layers.10.layer_scalar": "model-00001-of-00002.safetensors",
76
  "model.language_model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
99
  "model.language_model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
100
  "model.language_model.layers.10.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
101
  "model.language_model.layers.11.experts.down_proj": "model-00001-of-00002.safetensors",
102
+ "model.language_model.layers.11.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
103
  "model.language_model.layers.11.experts.gate_up_proj": "model-00001-of-00002.safetensors",
104
+ "model.language_model.layers.11.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
105
  "model.language_model.layers.11.input_layernorm.weight": "model-00002-of-00002.safetensors",
106
  "model.language_model.layers.11.layer_scalar": "model-00001-of-00002.safetensors",
107
  "model.language_model.layers.11.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
 
128
  "model.language_model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
129
  "model.language_model.layers.11.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
130
  "model.language_model.layers.12.experts.down_proj": "model-00001-of-00002.safetensors",
131
+ "model.language_model.layers.12.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
132
  "model.language_model.layers.12.experts.gate_up_proj": "model-00001-of-00002.safetensors",
133
+ "model.language_model.layers.12.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
134
  "model.language_model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
135
  "model.language_model.layers.12.layer_scalar": "model-00001-of-00002.safetensors",
136
  "model.language_model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
159
  "model.language_model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
160
  "model.language_model.layers.12.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
161
  "model.language_model.layers.13.experts.down_proj": "model-00001-of-00002.safetensors",
162
+ "model.language_model.layers.13.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
163
  "model.language_model.layers.13.experts.gate_up_proj": "model-00001-of-00002.safetensors",
164
+ "model.language_model.layers.13.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
165
  "model.language_model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
166
  "model.language_model.layers.13.layer_scalar": "model-00001-of-00002.safetensors",
167
  "model.language_model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
190
  "model.language_model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
191
  "model.language_model.layers.13.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
192
  "model.language_model.layers.14.experts.down_proj": "model-00001-of-00002.safetensors",
193
+ "model.language_model.layers.14.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
194
  "model.language_model.layers.14.experts.gate_up_proj": "model-00001-of-00002.safetensors",
195
+ "model.language_model.layers.14.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
196
  "model.language_model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
197
  "model.language_model.layers.14.layer_scalar": "model-00001-of-00002.safetensors",
198
  "model.language_model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
221
  "model.language_model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
222
  "model.language_model.layers.14.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
223
  "model.language_model.layers.15.experts.down_proj": "model-00001-of-00002.safetensors",
224
+ "model.language_model.layers.15.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
225
  "model.language_model.layers.15.experts.gate_up_proj": "model-00001-of-00002.safetensors",
226
+ "model.language_model.layers.15.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
227
  "model.language_model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
228
  "model.language_model.layers.15.layer_scalar": "model-00001-of-00002.safetensors",
229
  "model.language_model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
252
  "model.language_model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
253
  "model.language_model.layers.15.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
254
  "model.language_model.layers.16.experts.down_proj": "model-00001-of-00002.safetensors",
255
+ "model.language_model.layers.16.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
256
  "model.language_model.layers.16.experts.gate_up_proj": "model-00001-of-00002.safetensors",
257
+ "model.language_model.layers.16.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
258
  "model.language_model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
259
  "model.language_model.layers.16.layer_scalar": "model-00001-of-00002.safetensors",
260
  "model.language_model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
283
  "model.language_model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
284
  "model.language_model.layers.16.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
285
  "model.language_model.layers.17.experts.down_proj": "model-00002-of-00002.safetensors",
286
+ "model.language_model.layers.17.experts.down_proj.weight_scale": "model-00002-of-00002.safetensors",
287
  "model.language_model.layers.17.experts.gate_up_proj": "model-00001-of-00002.safetensors",
288
+ "model.language_model.layers.17.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
289
  "model.language_model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
290
  "model.language_model.layers.17.layer_scalar": "model-00001-of-00002.safetensors",
291
  "model.language_model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
 
312
  "model.language_model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
313
  "model.language_model.layers.17.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
314
  "model.language_model.layers.18.experts.down_proj": "model-00001-of-00002.safetensors",
315
+ "model.language_model.layers.18.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
316
  "model.language_model.layers.18.experts.gate_up_proj": "model-00001-of-00002.safetensors",
317
+ "model.language_model.layers.18.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
318
  "model.language_model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
319
  "model.language_model.layers.18.layer_scalar": "model-00001-of-00002.safetensors",
320
  "model.language_model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
343
  "model.language_model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
344
  "model.language_model.layers.18.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
345
  "model.language_model.layers.19.experts.down_proj": "model-00001-of-00002.safetensors",
346
+ "model.language_model.layers.19.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
347
  "model.language_model.layers.19.experts.gate_up_proj": "model-00001-of-00002.safetensors",
348
+ "model.language_model.layers.19.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
349
  "model.language_model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
350
  "model.language_model.layers.19.layer_scalar": "model-00001-of-00002.safetensors",
351
  "model.language_model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
374
  "model.language_model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
375
  "model.language_model.layers.19.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
376
  "model.language_model.layers.2.experts.down_proj": "model-00001-of-00002.safetensors",
377
+ "model.language_model.layers.2.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
378
  "model.language_model.layers.2.experts.gate_up_proj": "model-00001-of-00002.safetensors",
379
+ "model.language_model.layers.2.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
380
  "model.language_model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
381
  "model.language_model.layers.2.layer_scalar": "model-00001-of-00002.safetensors",
382
  "model.language_model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
405
  "model.language_model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
406
  "model.language_model.layers.2.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
407
  "model.language_model.layers.20.experts.down_proj": "model-00001-of-00002.safetensors",
408
+ "model.language_model.layers.20.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
409
  "model.language_model.layers.20.experts.gate_up_proj": "model-00001-of-00002.safetensors",
410
+ "model.language_model.layers.20.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
411
  "model.language_model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
412
  "model.language_model.layers.20.layer_scalar": "model-00001-of-00002.safetensors",
413
  "model.language_model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
436
  "model.language_model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
437
  "model.language_model.layers.20.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
438
  "model.language_model.layers.21.experts.down_proj": "model-00001-of-00002.safetensors",
439
+ "model.language_model.layers.21.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
440
  "model.language_model.layers.21.experts.gate_up_proj": "model-00001-of-00002.safetensors",
441
+ "model.language_model.layers.21.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
442
  "model.language_model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
443
  "model.language_model.layers.21.layer_scalar": "model-00001-of-00002.safetensors",
444
  "model.language_model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
467
  "model.language_model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
468
  "model.language_model.layers.21.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
469
  "model.language_model.layers.22.experts.down_proj": "model-00001-of-00002.safetensors",
470
+ "model.language_model.layers.22.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
471
  "model.language_model.layers.22.experts.gate_up_proj": "model-00001-of-00002.safetensors",
472
+ "model.language_model.layers.22.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
473
  "model.language_model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
474
  "model.language_model.layers.22.layer_scalar": "model-00001-of-00002.safetensors",
475
  "model.language_model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
498
  "model.language_model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
499
  "model.language_model.layers.22.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
500
  "model.language_model.layers.23.experts.down_proj": "model-00002-of-00002.safetensors",
501
+ "model.language_model.layers.23.experts.down_proj.weight_scale": "model-00002-of-00002.safetensors",
502
  "model.language_model.layers.23.experts.gate_up_proj": "model-00001-of-00002.safetensors",
503
+ "model.language_model.layers.23.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
504
  "model.language_model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
505
  "model.language_model.layers.23.layer_scalar": "model-00001-of-00002.safetensors",
506
  "model.language_model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
 
527
  "model.language_model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
528
  "model.language_model.layers.23.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
529
  "model.language_model.layers.24.experts.down_proj": "model-00001-of-00002.safetensors",
530
+ "model.language_model.layers.24.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
531
  "model.language_model.layers.24.experts.gate_up_proj": "model-00001-of-00002.safetensors",
532
+ "model.language_model.layers.24.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
533
  "model.language_model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
534
  "model.language_model.layers.24.layer_scalar": "model-00001-of-00002.safetensors",
535
  "model.language_model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
558
  "model.language_model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
559
  "model.language_model.layers.24.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
560
  "model.language_model.layers.25.experts.down_proj": "model-00001-of-00002.safetensors",
561
+ "model.language_model.layers.25.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
562
  "model.language_model.layers.25.experts.gate_up_proj": "model-00001-of-00002.safetensors",
563
+ "model.language_model.layers.25.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
564
  "model.language_model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
565
  "model.language_model.layers.25.layer_scalar": "model-00001-of-00002.safetensors",
566
  "model.language_model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
589
  "model.language_model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
590
  "model.language_model.layers.25.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
591
  "model.language_model.layers.26.experts.down_proj": "model-00001-of-00002.safetensors",
592
+ "model.language_model.layers.26.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
593
  "model.language_model.layers.26.experts.gate_up_proj": "model-00001-of-00002.safetensors",
594
+ "model.language_model.layers.26.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
595
  "model.language_model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
596
  "model.language_model.layers.26.layer_scalar": "model-00001-of-00002.safetensors",
597
  "model.language_model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
620
  "model.language_model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
621
  "model.language_model.layers.26.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
622
  "model.language_model.layers.27.experts.down_proj": "model-00001-of-00002.safetensors",
623
+ "model.language_model.layers.27.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
624
  "model.language_model.layers.27.experts.gate_up_proj": "model-00001-of-00002.safetensors",
625
+ "model.language_model.layers.27.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
626
  "model.language_model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
627
  "model.language_model.layers.27.layer_scalar": "model-00001-of-00002.safetensors",
628
  "model.language_model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
651
  "model.language_model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
652
  "model.language_model.layers.27.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
653
  "model.language_model.layers.28.experts.down_proj": "model-00001-of-00002.safetensors",
654
+ "model.language_model.layers.28.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
655
  "model.language_model.layers.28.experts.gate_up_proj": "model-00001-of-00002.safetensors",
656
+ "model.language_model.layers.28.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
657
  "model.language_model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
658
  "model.language_model.layers.28.layer_scalar": "model-00001-of-00002.safetensors",
659
  "model.language_model.layers.28.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
682
  "model.language_model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
683
  "model.language_model.layers.28.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
684
  "model.language_model.layers.29.experts.down_proj": "model-00002-of-00002.safetensors",
685
+ "model.language_model.layers.29.experts.down_proj.weight_scale": "model-00002-of-00002.safetensors",
686
  "model.language_model.layers.29.experts.gate_up_proj": "model-00001-of-00002.safetensors",
687
+ "model.language_model.layers.29.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
688
  "model.language_model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
689
  "model.language_model.layers.29.layer_scalar": "model-00001-of-00002.safetensors",
690
  "model.language_model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
 
711
  "model.language_model.layers.29.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
712
  "model.language_model.layers.29.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
713
  "model.language_model.layers.3.experts.down_proj": "model-00001-of-00002.safetensors",
714
+ "model.language_model.layers.3.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
715
  "model.language_model.layers.3.experts.gate_up_proj": "model-00001-of-00002.safetensors",
716
+ "model.language_model.layers.3.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
717
  "model.language_model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
718
  "model.language_model.layers.3.layer_scalar": "model-00001-of-00002.safetensors",
719
  "model.language_model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
742
  "model.language_model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
743
  "model.language_model.layers.3.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
744
  "model.language_model.layers.4.experts.down_proj": "model-00001-of-00002.safetensors",
745
+ "model.language_model.layers.4.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
746
  "model.language_model.layers.4.experts.gate_up_proj": "model-00001-of-00002.safetensors",
747
+ "model.language_model.layers.4.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
748
  "model.language_model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
749
  "model.language_model.layers.4.layer_scalar": "model-00001-of-00002.safetensors",
750
  "model.language_model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
773
  "model.language_model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
774
  "model.language_model.layers.4.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
775
  "model.language_model.layers.5.experts.down_proj": "model-00001-of-00002.safetensors",
776
+ "model.language_model.layers.5.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
777
  "model.language_model.layers.5.experts.gate_up_proj": "model-00001-of-00002.safetensors",
778
+ "model.language_model.layers.5.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
779
  "model.language_model.layers.5.input_layernorm.weight": "model-00002-of-00002.safetensors",
780
  "model.language_model.layers.5.layer_scalar": "model-00001-of-00002.safetensors",
781
  "model.language_model.layers.5.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
 
802
  "model.language_model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
803
  "model.language_model.layers.5.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
804
  "model.language_model.layers.6.experts.down_proj": "model-00001-of-00002.safetensors",
805
+ "model.language_model.layers.6.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
806
  "model.language_model.layers.6.experts.gate_up_proj": "model-00001-of-00002.safetensors",
807
+ "model.language_model.layers.6.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
808
  "model.language_model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
809
  "model.language_model.layers.6.layer_scalar": "model-00001-of-00002.safetensors",
810
  "model.language_model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
  "model.language_model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
  "model.language_model.layers.6.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.7.experts.down_proj": "model-00001-of-00002.safetensors",
+ "model.language_model.layers.7.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.7.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+ "model.language_model.layers.7.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
  "model.language_model.layers.7.layer_scalar": "model-00001-of-00002.safetensors",
  "model.language_model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
  "model.language_model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
  "model.language_model.layers.7.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.8.experts.down_proj": "model-00001-of-00002.safetensors",
+ "model.language_model.layers.8.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.8.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+ "model.language_model.layers.8.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
  "model.language_model.layers.8.layer_scalar": "model-00001-of-00002.safetensors",
  "model.language_model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 
  "model.language_model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
  "model.language_model.layers.8.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.9.experts.down_proj": "model-00001-of-00002.safetensors",
+ "model.language_model.layers.9.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.9.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+ "model.language_model.layers.9.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
  "model.language_model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
  "model.language_model.layers.9.layer_scalar": "model-00001-of-00002.safetensors",
  "model.language_model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",