Add files using upload-large-folder tool

- README.md +87 -213
- model-00001-of-00002.safetensors +3 -0
- model-00002-of-00002.safetensors +3 -0
- model.safetensors.index.json +61 -1

README.md CHANGED
@@ -12,7 +12,6 @@ tags:
 - gemma-4
 - vllm
 - fp8
-- fp8-dynamic
 - compressed-tensors
 - quantization
 - h200
@@ -20,7 +19,6 @@ tags:
 - mixture-of-experts
 - moe
 - inference
-- production-ready
 - largitdata
 quantized_by: largitdata-inc
 base_model:
@@ -28,107 +26,108 @@ base_model:
 model_type: gemma4
 ---

-# Gemma 4 26B-A4B IT FP8

-

-

-

-Published by [

->

-

----

-##

-
-- **Derived format:** offline FP8 checkpoint for vLLM
-- **Quantization tool:** [`llmcompressor`](https://github.com/vllm-project/llm-compressor)
-- **Quantization method:** `FP8_DYNAMIC`
-- **Calibration data:** None required (dynamic quantization)
-- **Excluded weights:**
-  - `norm`-class 1D tensors — excluded to avoid `expected 2D linear weight` validation errors during quantization
-  - `re:.*router\.proj$` — MoE router weights excluded to maintain compatibility with the Gemma4 vLLM loading path
-- **Output directory name:** `gemma-4-26B-A4B-it-FP8-DYNAMIC-NOROUTER`
-- **Primary serving target:** `vllm/vllm-openai:gemma4`
-- **Organization:** [Largitdata Inc.](https://www.largitdata.com/)

-

-
-- **Runtime:** `vllm/vllm-openai:gemma4`
-- **KV cache dtype:** `fp8`
-- **`max_model_len`:** `32768`
-- **`gpu_memory_utilization`:** `0.55`

-

-
-- model loading total: `16.88 s`
-- `torch.compile`: `56.98 s`
-- engine init: `102.17 s`
-- total time to `/v1/models` ready: about `153 s`

-
-
 - `max_num_batched_tokens = 8192`
 - available KV cache memory: `46.37 GiB`
 - GPU KV cache size: `405,184 tokens`
 - maximum concurrency at `32,768` tokens/request: `38.87x`

-

-
-
-| Model loading memory | **25.75 GiB** | 48.5 GiB |
-| GPU KV cache size | **405,184 tokens** | 225,376 tokens |
-| Max concurrency @ 32K tokens/req | **38.87x** | 21.62x |
-| VRAM savings | **47% less** | — |
-| KV cache gain | **80% more** | — |

-##

-

-
-
-
-
-
-
-
-
-| Avg total throughput | 180.53 tok/s | **191.36 tok/s** |
-
-These numbers are single-request warm-path measurements, not multi-client throughput tests. In production multi-client scenarios, the FP8 variant's larger KV cache is expected to provide superior aggregate throughput.
-
-**BF16 is ~6% faster on single-request latency, but the FP8 variant uses 47% less VRAM and provides 80% more KV cache capacity.** For production environments serving multiple concurrent users, the FP8 variant offers a better trade-off.

-

-

 ## Usage

-Example

 ```bash
 docker run -d \
---name vllm-gemma4-26b-fp8
 --restart unless-stopped \
 --ipc=host \
 --shm-size 16G \
 --gpus all \
--v /models \
 -p 8001:8000 \
 -e NVIDIA_VISIBLE_DEVICES=0 \
-vllm
---model /models/gemma-4-26B-A4B-it-FP8
 --trust-remote-code \
 --kv-cache-dtype fp8 \
 --gpu-memory-utilization 0.55 \
@@ -141,21 +140,10 @@ docker run -d \

 ## Known Limitations

--
--
--
--
-
-## Intended Use
-
-This artifact is intended for:
-
-- Operational vLLM deployment on H200-class hardware
-- Reproducible offline FP8 serving experiments
-- Environments where startup-time on-the-fly quantization is undesirable
-- Production inference with higher concurrency requirements
-
-This artifact is not intended to replace the original base model documentation, safety guidance, or license terms.

 ## License
@@ -163,15 +151,13 @@ This repository contains a derived checkpoint based on [`google/gemma-4-26B-A4B-

 ## Citation

-If you use this artifact, please cite both the derived checkpoint and the upstream base model.
-
 ```bibtex
-@misc{
-title = {Gemma 4 26B-A4B IT FP8
 author = {David Chiu},
 year = {2026},
-howpublished = {\url{https://huggingface.co/
-note = {Derived offline FP8 checkpoint from google/gemma-4-26B-A4B-it for vLLM serving
 }

 @misc{google_gemma4_26b_a4b_it,
@@ -182,139 +168,27 @@ If you use this artifact, please cite both the derived checkpoint and the upstream base model.
 }
 ```

-## Disclaimer
-
-Users are responsible for verifying license compatibility, downstream serving behavior, numerical quality, and safety characteristics for their own environment.
-
 ---

 ## 中文說明

-
-
-We looked around the web and could not find a usable offline FP8 build of Gemma 4 26B, so we vibe-coded one ourselves and are contributing it to the community.
-
-This repo provides an offline `FP8` checkpoint derived from [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it) that `vLLM` can load and serve directly, with no on-the-fly quantization at startup.
-
-Published by [**Largitdata Inc.**](https://www.largitdata.com/).
-
-> **Note:** This is a derived operational checkpoint, not an official Google release. The original model's license terms, safety guidance, and documentation remain authoritative.
-
-### Model Details
-
-- **Base model:** [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it)
-- **Format:** offline FP8 checkpoint for vLLM
-- **Quantization tool:** [`llmcompressor`](https://github.com/vllm-project/llm-compressor)
-- **Quantization method:** `FP8_DYNAMIC`
-- **Calibration data:** none required (dynamic quantization)
-- **Excluded weights:**
-  - `norm`-class 1D tensors — avoids `expected 2D linear weight` errors during quantization validation
-  - `re:.*router\.proj$` — MoE router weights, kept for compatibility with the Gemma4 vLLM loading path
-- **Primary serving target:** `vllm/vllm-openai:gemma4`
-
-### Tested Environment
-
-- **GPU:** `NVIDIA H200 NVL` (`143 GB VRAM`)
-- **Runtime:** `vllm/vllm-openai:gemma4`
-- **KV cache dtype:** `fp8`
-- **`max_model_len`:** `32768`
-- **`gpu_memory_utilization`:** `0.55`
-
-Measured startup numbers:
-
-- model weights loading: `15.76 s`
-- model loading total: `16.88 s`
-- `torch.compile`: `56.98 s`
-- engine init: `102.17 s`
-- total time to `/v1/models` ready: about `153 s`
-
-Runtime capacity:
-
-- `max_num_batched_tokens = 8192`
-- available KV cache memory: `46.37 GiB`
-- GPU KV cache size: `405,184 tokens`
-- maximum concurrency (`32,768` tokens/request): `38.87x`
-
-### Serving Capacity Comparison
-
-| Metric | FP8 Dynamic Norouter | BF16 original |
-|---|---|---|
-| Model loading memory | **25.75 GiB** | 48.5 GiB |
-| GPU KV cache size | **405,184 tokens** | 225,376 tokens |
-| Max concurrency @ 32K tokens/req | **38.87x** | 21.62x |
-| VRAM savings | **47%** | — |
-| KV cache gain | **80%** | — |
-
-### Basic Performance Test
-
-Warm single-request test (OpenAI-compatible vLLM endpoint):
-
-- prompt tokens: `38`
-- completion tokens: `256`
-- temperature: `0`
-
-| Metric | FP8 Dynamic Norouter | BF16 original |
-|---|---|---|
-| Avg end-to-end latency | 1.629 s | **1.536 s** |
-| Avg completion throughput | 157.19 tok/s | **166.62 tok/s** |
-| Avg total throughput | 180.53 tok/s | **191.36 tok/s** |
-
-These are warm single-request path measurements, not multi-client throughput tests. In production multi-client scenarios, the FP8 variant's larger KV cache is expected to deliver better aggregate throughput.
-
-**Bottom line:** `BF16` is slightly faster per request (about 6%), but the FP8 variant uses 47% less VRAM and has 80% more usable KV cache. For production serving of many concurrent users, the FP8 variant is the better trade-off.
-
-### Accuracy Evaluation
-
-No formal accuracy benchmarks (MMLU, MT-Bench, etc.) have been run on this FP8 checkpoint yet. Based on prior community experience with FP8 dynamic quantization on similar architectures, the accuracy drop is usually negligible (MMLU < 0.5%). Community benchmark contributions are welcome via a discussion or PR.
-
-### Usage
-
-Example vLLM launch:
-
-```bash
-docker run -d \
---name vllm-gemma4-26b-fp8-norouter \
---restart unless-stopped \
---ipc=host \
---shm-size 16G \
---gpus all \
--v /models \
--p 8001:8000 \
--e NVIDIA_VISIBLE_DEVICES=0 \
-vllm/vllm-openai:gemma4 \
---model /models/gemma-4-26B-A4B-it-FP8-DYNAMIC-NOROUTER \
---trust-remote-code \
---kv-cache-dtype fp8 \
---gpu-memory-utilization 0.55 \
---max-model-len 32768 \
---enable-auto-tool-choice \
---tool-call-parser gemma4 \
---host 0.0.0.0 \
---port 8000
-```
-
-### Known Limitations
-
-- Single-request latency is about 6% higher than BF16, mainly from FP8 dequantization overhead
-- No accuracy benchmarks (MMLU / MT-Bench) yet (community contributions welcome)
-- Only tested on H200 NVL; other GPUs (e.g. A100, H100) may need a different `gpu-memory-utilization`
-- MoE router weights (`router.proj`) and `norm`-class 1D tensors are excluded from quantization to preserve vLLM compatibility; no routing-quality degradation has been observed, but there is no systematic evaluation yet
-
-### Intended Use
-
-This checkpoint is suitable for:
-
-- Production deployment with vLLM on H200-class hardware
-- Reproducible offline FP8 serving experiments
-- Environments where on-the-fly quantization at startup is undesirable
-- Production inference with higher concurrency requirements

-

-###

-

-###

-使用
+# Gemma 4 26B-A4B IT FP8

+Packed-expert offline FP8 checkpoint for [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it), built for vLLM serving on H200-class GPUs.

+This artifact is the final production checkpoint we derived after patching both:

+- `llmcompressor` `model_free_ptq`, so packed MoE experts are actually quantized to `FP8`
+- the `vLLM` Gemma4 loader, so expert `weight_scale` tensors can be loaded correctly

+Published by [Largitdata Inc.](https://www.largitdata.com/).

+> This is a derived operational checkpoint, not an official Google release. The upstream model card, license terms, and safety guidance remain authoritative.

+## Model Details

+- Base model: [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it)
+- Format: offline `FP8` checkpoint for vLLM
+- Quantization tool: [`llmcompressor`](https://github.com/vllm-project/llm-compressor)
+- Quantization method: `FP8_DYNAMIC`
+- Packed expert quantization: enabled
+- Excluded weights:
+  - `norm`-class 1D tensors
+  - `router.proj`
+- Final checkpoint size: about `26 GB`
+- Weight shards:
+  - `model-00001-of-00002.safetensors`: about `25 GB`
+  - `model-00002-of-00002.safetensors`: about `817 MB`
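A recipe of the kind these details describe would look roughly like the following `llmcompressor`-style YAML sketch. The stage layout and field names follow `llmcompressor`'s documented recipe format and are assumptions for illustration only; the actual patched `model_free_ptq` invocation used to build this checkpoint is not shown in this card:

```yaml
# Hypothetical sketch, not the exact recipe used for this checkpoint.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: FP8_DYNAMIC
      ignore:
        - "re:.*norm.*"          # 1D norm tensors stay in the original dtype
        - "re:.*router\\.proj$"  # MoE router weights stay unquantized
```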

+## Important Compatibility Note

+This checkpoint requires a patched Gemma4 loader in vLLM to load packed-expert `weight_scale` tensors correctly.

+The production image we used is:

+- `vllm-gemma4:packed-expert-loader-v1`

+If you use an unpatched upstream vLLM image, loading may fail with errors similar to:
+
+```text
+KeyError: 'layers.0.moe.experts.0.down_proj.weight_scale'
+```
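One way to check up front whether a checkpoint carries the `weight_scale` tensors the patched loader expects is to scan its `model.safetensors.index.json`. A minimal standard-library sketch; the tiny inline index is illustrative, not the real two-shard file:

```python
import json

def weight_scale_keys(index: dict) -> list[str]:
    """Return the weight_map keys that carry a weight_scale tensor."""
    return sorted(k for k in index["weight_map"] if k.endswith("weight_scale"))

# In practice: index = json.load(open("model.safetensors.index.json"))
# Tiny illustrative index standing in for the real two-shard file:
index = {
    "weight_map": {
        "model.language_model.layers.0.experts.down_proj": "model-00001-of-00002.safetensors",
        "model.language_model.layers.0.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
    }
}
print(weight_scale_keys(index))
```

An empty result on a supposedly FP8 checkpoint would signal that the scales are missing rather than merely unloadable.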

+## Tested Environment

+- GPU: `NVIDIA H200 NVL` (`143 GB VRAM`)
+- Runtime image: `vllm-gemma4:packed-expert-loader-v1`
+- KV cache dtype: `fp8`
+- Final production serving path: `/models/gemma-4-26B-A4B-it-FP8`

+### Production Configuration

+- `gpu_memory_utilization = 0.55`
+- `max_model_len = 32768`
 - `max_num_batched_tokens = 8192`

+Observed startup and capacity:

+- model loading memory: `25.75 GiB`
 - available KV cache memory: `46.37 GiB`
 - GPU KV cache size: `405,184 tokens`
 - maximum concurrency at `32,768` tokens/request: `38.87x`

+Observed warm single-request benchmark:

+- `~1k` prompt: `156.50 tok/s`
+- `~8k` prompt: `136.57 tok/s`

+### Apples-to-Apples Comparison at `gpu_memory_utilization = 0.75`

+Same H200, same `max_model_len = 32768`, same benchmark method, same single-request setting:

+| Metric | FP8 | BF16 |
+| --- | ---: | ---: |
+| Model loading memory | **25.75 GiB** | 48.5 GiB |
+| Available KV cache memory | **74.33 GiB** | 51.59 GiB |
+| GPU KV cache size | **649,504 tokens** | 225,376 tokens |
+| Max concurrency @ `32k` | **62.31x** | 21.62x |
+| `~1k` prompt decode throughput | 156.28 tok/s | **161.07 tok/s** |
+| `~8k` prompt decode throughput | 136.32 tok/s | **138.01 tok/s** |

+Takeaway:

+- `BF16` is still slightly faster on single-request decode speed
+- `FP8` cuts model memory sharply and converts most of that headroom into KV cache
+- for concurrency-oriented serving, the FP8 checkpoint is the better trade-off
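The headroom arithmetic behind this comparison can be checked directly from the table's numbers:

```python
# Numbers from the comparison table (H200, max_model_len = 32768)
fp8_load_gib, bf16_load_gib = 25.75, 48.5
fp8_kv_tokens, bf16_kv_tokens = 649_504, 225_376

# FP8 loads roughly 47% less weight memory than BF16
savings = 1 - fp8_load_gib / bf16_load_gib
print(f"weight memory savings: {savings:.1%}")  # ~46.9%

# ...and holds roughly 2.9x as many KV cache tokens
kv_ratio = fp8_kv_tokens / bf16_kv_tokens
print(f"KV capacity ratio: {kv_ratio:.2f}x")    # ~2.88x

# The table's concurrency ratio tracks the KV token ratio
print(f"concurrency ratio: {62.31 / 21.62:.2f}x")  # ~2.88x
```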

 ## Usage

+Example launch command:

 ```bash
 docker run -d \
+--name vllm-gemma4-26b-fp8 \
 --restart unless-stopped \
 --ipc=host \
 --shm-size 16G \
 --gpus all \
+-v /models:/models \
 -p 8001:8000 \
 -e NVIDIA_VISIBLE_DEVICES=0 \
+vllm-gemma4:packed-expert-loader-v1 \
+--model /models/gemma-4-26B-A4B-it-FP8 \
 --trust-remote-code \
 --kv-cache-dtype fp8 \
 --gpu-memory-utilization 0.55 \

 ## Known Limitations

+- Requires patched vLLM loader support for packed-expert `weight_scale`
+- `BF16` remains slightly faster for single-request decode throughput
+- No formal benchmark suite such as MMLU or MT-Bench has been run yet
+- Tested on `NVIDIA H200 NVL`; other GPUs may need different settings

 ## License

 ## Citation

 ```bibtex
+@misc{largitdata_gemma4_26b_a4b_it_fp8_2026,
+title = {Gemma 4 26B-A4B IT FP8},
 author = {David Chiu},
 year = {2026},
+howpublished = {\url{https://huggingface.co/LargitData/gemma-4-26b-a4b-it-fp8}},
+note = {Derived offline FP8 packed-expert checkpoint from google/gemma-4-26B-A4B-it for patched vLLM serving}
 }

 @misc{google_gemma4_26b_a4b_it,
 }
 ```

 ---

 ## 中文說明

+This repo provides the final offline `FP8` checkpoint derived from [`google/gemma-4-26B-A4B-it`](https://huggingface.co/google/gemma-4-26B-A4B-it). It is not the earlier `Dynamic Norouter` intermediate build; instead:

+- `llmcompressor` has been patched so packed MoE experts can be quantized
+- the `vLLM Gemma4` loader has been patched so expert `weight_scale` tensors can be read

+### Highlights

+- Final checkpoint size: about `26 GB`
+- Model loading VRAM: about `25.75 GiB`
+- At `gpu_memory_utilization=0.55`:
+  - KV cache: `46.37 GiB`
+  - GPU KV cache: `405,184 tokens`
+  - `32k` concurrency: `38.87x`
+- At `gpu_memory_utilization=0.75`, versus BF16 under identical conditions:
+  - `FP8` is slightly slower per single request
+  - but the KV cache is much larger, and `32k` concurrency rises from `21.62x` to `62.31x`

+### Usage Restrictions

+This checkpoint requires a patched `vLLM` loader. Loading it with an unpatched upstream `vLLM` may fail at load time with expert `weight_scale` errors.
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb3c810035ce85853a1be7dd348f2f73d1417959d623a882e51522de7b1fdf1
+size 26305626460
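The entry above is a Git LFS pointer file, not the tensor data itself. A minimal parser for the three-line pointer format (an illustrative helper, not part of this repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a git-lfs pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:aeb3c810035ce85853a1be7dd348f2f73d1417959d623a882e51522de7b1fdf1
size 26305626460
"""
info = parse_lfs_pointer(pointer)
print(info["size"] / 1e9)  # ~26.3 GB of tensor data behind this pointer
```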
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e5efcdd24651d5061d49178ba88b86dac518729d25e84f1879d15570852806c6
+size 856521512
model.safetensors.index.json CHANGED
@@ -1,12 +1,14 @@
 {
 "metadata": {
-"total_size":
 },
 "weight_map": {
 "model.embed_vision.embedding_projection.weight": "model-00001-of-00002.safetensors",
 "model.language_model.embed_tokens.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.0.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.0.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.0.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -35,7 +37,9 @@
 "model.language_model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.0.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.1.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.1.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.1.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -64,7 +68,9 @@
 "model.language_model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.1.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.10.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.10.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.10.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -93,7 +99,9 @@
 "model.language_model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.10.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.11.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.11.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.11.input_layernorm.weight": "model-00002-of-00002.safetensors",
 "model.language_model.layers.11.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.11.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -120,7 +128,9 @@
 "model.language_model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.11.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.12.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.12.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.12.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -149,7 +159,9 @@
 "model.language_model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.12.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.13.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.13.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.13.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -178,7 +190,9 @@
 "model.language_model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.13.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.14.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.14.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.14.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -207,7 +221,9 @@
 "model.language_model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.14.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.15.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.15.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.15.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -236,7 +252,9 @@
 "model.language_model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.15.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.16.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.16.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.16.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -265,7 +283,9 @@
 "model.language_model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.16.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.17.experts.down_proj": "model-00002-of-00002.safetensors",
 "model.language_model.layers.17.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
 "model.language_model.layers.17.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -292,7 +312,9 @@
 "model.language_model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.17.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.18.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.18.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.18.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -321,7 +343,9 @@
 "model.language_model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.18.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.19.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.19.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.19.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -350,7 +374,9 @@
 "model.language_model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.19.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.2.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.2.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.2.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -379,7 +405,9 @@
 "model.language_model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.2.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.20.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.20.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.20.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -408,7 +436,9 @@
 "model.language_model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.20.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.21.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.21.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.21.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -437,7 +467,9 @@
 "model.language_model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.21.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.22.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.22.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.22.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -466,7 +498,9 @@
 "model.language_model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.22.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.23.experts.down_proj": "model-00002-of-00002.safetensors",
 "model.language_model.layers.23.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
 "model.language_model.layers.23.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
@@ -493,7 +527,9 @@
 "model.language_model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.23.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.24.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.24.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.24.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -522,7 +558,9 @@
 "model.language_model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.24.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.25.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.25.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.25.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -551,7 +589,9 @@
 "model.language_model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.25.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
 "model.language_model.layers.26.experts.down_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.26.experts.gate_up_proj": "model-00001-of-00002.safetensors",
 "model.language_model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.26.layer_scalar": "model-00001-of-00002.safetensors",
 "model.language_model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -580,7 +620,9 @@
 "model.language_model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
 "model.language_model.layers.26.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 582 |
"model.language_model.layers.27.experts.down_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 583 |
"model.language_model.layers.27.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 584 |
"model.language_model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 585 |
"model.language_model.layers.27.layer_scalar": "model-00001-of-00002.safetensors",
|
| 586 |
"model.language_model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
@@ -609,7 +651,9 @@
|
|
| 609 |
"model.language_model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 610 |
"model.language_model.layers.27.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 611 |
"model.language_model.layers.28.experts.down_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 612 |
"model.language_model.layers.28.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 613 |
"model.language_model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 614 |
"model.language_model.layers.28.layer_scalar": "model-00001-of-00002.safetensors",
|
| 615 |
"model.language_model.layers.28.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
@@ -638,7 +682,9 @@
|
|
| 638 |
"model.language_model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 639 |
"model.language_model.layers.28.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 640 |
"model.language_model.layers.29.experts.down_proj": "model-00002-of-00002.safetensors",
|
|
|
|
| 641 |
"model.language_model.layers.29.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 642 |
"model.language_model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 643 |
"model.language_model.layers.29.layer_scalar": "model-00001-of-00002.safetensors",
|
| 644 |
"model.language_model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
|
@@ -665,7 +711,9 @@
|
|
| 665 |
"model.language_model.layers.29.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
|
| 666 |
"model.language_model.layers.29.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 667 |
"model.language_model.layers.3.experts.down_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 668 |
"model.language_model.layers.3.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 669 |
"model.language_model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 670 |
"model.language_model.layers.3.layer_scalar": "model-00001-of-00002.safetensors",
|
| 671 |
"model.language_model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
@@ -694,7 +742,9 @@
|
|
| 694 |
"model.language_model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 695 |
"model.language_model.layers.3.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 696 |
"model.language_model.layers.4.experts.down_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 697 |
"model.language_model.layers.4.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 698 |
"model.language_model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 699 |
"model.language_model.layers.4.layer_scalar": "model-00001-of-00002.safetensors",
|
| 700 |
"model.language_model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
@@ -723,7 +773,9 @@
|
|
| 723 |
"model.language_model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 724 |
"model.language_model.layers.4.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 725 |
"model.language_model.layers.5.experts.down_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 726 |
"model.language_model.layers.5.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
|
|
|
| 727 |
"model.language_model.layers.5.input_layernorm.weight": "model-00002-of-00002.safetensors",
|
| 728 |
"model.language_model.layers.5.layer_scalar": "model-00001-of-00002.safetensors",
|
| 729 |
"model.language_model.layers.5.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
|
|
@@ -750,7 +802,9 @@
     "model.language_model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.5.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.6.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.6.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -779,7 +833,9 @@
     "model.language_model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.7.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.7.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.7.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.7.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.7.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -808,7 +864,9 @@
     "model.language_model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.7.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.8.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.8.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.8.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.8.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.8.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
@@ -837,7 +895,9 @@
     "model.language_model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.8.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.9.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.9.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.9.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.9.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.9.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
 {
   "metadata": {
+    "total_size": 27161975452
   },
   "weight_map": {
     "model.embed_vision.embedding_projection.weight": "model-00001-of-00002.safetensors",
     "model.language_model.embed_tokens.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.0.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.0.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.0.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.0.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.0.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.0.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.1.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.1.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.1.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.1.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.1.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.1.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.10.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.10.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.10.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.10.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.10.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.10.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.11.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.11.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.11.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.11.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.11.input_layernorm.weight": "model-00002-of-00002.safetensors",
     "model.language_model.layers.11.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.11.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
...
     "model.language_model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.11.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.12.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.12.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.12.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.12.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.12.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.12.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.13.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.13.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.13.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.13.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.13.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.13.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.14.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.14.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.14.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.14.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.14.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.14.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.15.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.15.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.15.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.15.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.15.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.15.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.16.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.16.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.16.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.16.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.16.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.16.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.17.experts.down_proj": "model-00002-of-00002.safetensors",
+    "model.language_model.layers.17.experts.down_proj.weight_scale": "model-00002-of-00002.safetensors",
     "model.language_model.layers.17.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.17.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
     "model.language_model.layers.17.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
...
     "model.language_model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.17.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.18.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.18.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.18.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.18.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.18.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.18.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.19.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.19.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.19.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.19.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.19.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.19.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.2.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.2.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.2.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.2.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.2.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.2.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.20.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.20.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.20.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.20.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.20.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.20.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.21.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.21.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.21.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.21.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.21.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.21.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.22.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.22.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.22.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.22.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.22.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.22.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.23.experts.down_proj": "model-00002-of-00002.safetensors",
+    "model.language_model.layers.23.experts.down_proj.weight_scale": "model-00002-of-00002.safetensors",
     "model.language_model.layers.23.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.23.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
     "model.language_model.layers.23.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
...
     "model.language_model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.23.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.24.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.24.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.24.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.24.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.24.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.24.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.24.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.24.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.25.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.25.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.25.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.25.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.25.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.25.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.25.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.25.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.25.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.26.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.26.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.26.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.26.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.26.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.26.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.26.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.26.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.26.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.27.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.27.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.27.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.27.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.27.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.27.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.27.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.27.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.27.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.28.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.28.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.28.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.28.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.28.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.28.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.28.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.28.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.28.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.29.experts.down_proj": "model-00002-of-00002.safetensors",
+    "model.language_model.layers.29.experts.down_proj.weight_scale": "model-00002-of-00002.safetensors",
     "model.language_model.layers.29.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.29.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
     "model.language_model.layers.29.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
...
     "model.language_model.layers.29.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.29.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.3.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.3.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.3.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.3.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.3.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.3.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.4.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.4.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.4.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.4.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.4.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
...
     "model.language_model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.4.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.5.experts.down_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.5.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.5.experts.gate_up_proj": "model-00001-of-00002.safetensors",
+    "model.language_model.layers.5.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.5.input_layernorm.weight": "model-00002-of-00002.safetensors",
     "model.language_model.layers.5.layer_scalar": "model-00001-of-00002.safetensors",
     "model.language_model.layers.5.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
...
     "model.language_model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
     "model.language_model.layers.5.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
     "model.language_model.layers.6.experts.down_proj": "model-00001-of-00002.safetensors",
|
| 805 |
+
"model.language_model.layers.6.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 806 |
"model.language_model.layers.6.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
| 807 |
+
"model.language_model.layers.6.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 808 |
"model.language_model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 809 |
"model.language_model.layers.6.layer_scalar": "model-00001-of-00002.safetensors",
|
| 810 |
"model.language_model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
|
|
| 833 |
"model.language_model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 834 |
"model.language_model.layers.6.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 835 |
"model.language_model.layers.7.experts.down_proj": "model-00001-of-00002.safetensors",
|
| 836 |
+
"model.language_model.layers.7.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 837 |
"model.language_model.layers.7.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
| 838 |
+
"model.language_model.layers.7.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 839 |
"model.language_model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 840 |
"model.language_model.layers.7.layer_scalar": "model-00001-of-00002.safetensors",
|
| 841 |
"model.language_model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
|
|
| 864 |
"model.language_model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 865 |
"model.language_model.layers.7.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 866 |
"model.language_model.layers.8.experts.down_proj": "model-00001-of-00002.safetensors",
|
| 867 |
+
"model.language_model.layers.8.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 868 |
"model.language_model.layers.8.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
| 869 |
+
"model.language_model.layers.8.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 870 |
"model.language_model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 871 |
"model.language_model.layers.8.layer_scalar": "model-00001-of-00002.safetensors",
|
| 872 |
"model.language_model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|
|
|
|
| 895 |
"model.language_model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
|
| 896 |
"model.language_model.layers.8.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 897 |
"model.language_model.layers.9.experts.down_proj": "model-00001-of-00002.safetensors",
|
| 898 |
+
"model.language_model.layers.9.experts.down_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 899 |
"model.language_model.layers.9.experts.gate_up_proj": "model-00001-of-00002.safetensors",
|
| 900 |
+
"model.language_model.layers.9.experts.gate_up_proj.weight_scale": "model-00001-of-00002.safetensors",
|
| 901 |
"model.language_model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
|
| 902 |
"model.language_model.layers.9.layer_scalar": "model-00001-of-00002.safetensors",
|
| 903 |
"model.language_model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
|