Add files using upload-large-folder tool
- .gitattributes +2 -0
- README.md +96 -0
- chat_template.jinja +86 -0
- config.json +276 -0
- generation_config.json +74 -0
- hf_quant_config.json +121 -0
- input_scales.safetensors +3 -0
- model-00003-of-00082.safetensors +3 -0
- model-00005-of-00082.safetensors +3 -0
- model-00010-of-00082.safetensors +3 -0
- model-00011-of-00082.safetensors +3 -0
- model-00012-of-00082.safetensors +3 -0
- model-00014-of-00082.safetensors +3 -0
- model-00015-of-00082.safetensors +3 -0
- model-00016-of-00082.safetensors +3 -0
- model-00017-of-00082.safetensors +3 -0
- model-00020-of-00082.safetensors +3 -0
- model-00022-of-00082.safetensors +3 -0
- model-00023-of-00082.safetensors +3 -0
- model-00024-of-00082.safetensors +3 -0
- model-00026-of-00082.safetensors +3 -0
- model-00027-of-00082.safetensors +3 -0
- model-00028-of-00082.safetensors +3 -0
- model-00030-of-00082.safetensors +3 -0
- model-00031-of-00082.safetensors +3 -0
- model-00033-of-00082.safetensors +3 -0
- model-00035-of-00082.safetensors +3 -0
- model-00042-of-00082.safetensors +3 -0
- model-00043-of-00082.safetensors +3 -0
- model-00045-of-00082.safetensors +3 -0
- model-00046-of-00082.safetensors +3 -0
- model-00048-of-00082.safetensors +3 -0
- model-00049-of-00082.safetensors +3 -0
- model-00050-of-00082.safetensors +3 -0
- model-00058-of-00082.safetensors +3 -0
- model-00059-of-00082.safetensors +3 -0
- model-00062-of-00082.safetensors +3 -0
- model-00065-of-00082.safetensors +3 -0
- model-00066-of-00082.safetensors +3 -0
- model-00067-of-00082.safetensors +3 -0
- model-00068-of-00082.safetensors +3 -0
- model-00070-of-00082.safetensors +3 -0
- model-00074-of-00082.safetensors +3 -0
- model-00077-of-00082.safetensors +3 -0
- model-00079-of-00082.safetensors +3 -0
- model-00080-of-00082.safetensors +3 -0
- model-00081-of-00082.safetensors +3 -0
- model-00082-of-00082.safetensors +3 -0
- model.safetensors.index.json +3 -0
- tokenizer.json +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,96 @@
---
base_model:
- zai-org/GLM-5
license: mit
tags:
- mtp
- speculative-decoding
---

## Model Description
**GLM-5-NVFP4-MTP** is an NVFP4-quantized version of [zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5) with **Multi-Token Prediction (MTP) weights restored**, enabling speculative decoding with vLLM and SGLang.

This is based on [lukealonso/GLM-5-NVFP4](https://huggingface.co/lukealonso/GLM-5-NVFP4) — a 744B-parameter Mixture-of-Experts language model with 40B active parameters, 256 experts per MoE layer (8 activated per token), and DeepSeek Sparse Attention (DSA).

Quantized directly from the full BF16 checkpoint ([zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5)), *not the FP8 release*, to NVFP4 (4-bit with blockwise FP8 scales per 16 elements) using [NVIDIA Model Optimizer](https://github.com/NVIDIA/Model-Optimizer).

### MTP Layer Addition

The original [lukealonso/GLM-5-NVFP4](https://huggingface.co/lukealonso/GLM-5-NVFP4) declares `"num_nextn_predict_layers": 1` in its config but ships without the MTP layer weights (layer 78). This repo fixes that by extracting the MTP layer directly from the original BF16 model ([zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5)).

**What was done:**
- Extracted all 791 tensors for `model.layers.78.*` from the BF16 model (shards 271–274 of 282) and saved them as `mtp.safetensors` (~19 GB, full BF16 precision)
- Updated `model.safetensors.index.json` to include the 791 layer 78 → `mtp.safetensors` mappings
- Added `model.layers.78.*` glob patterns to `quantization_config.ignore` in both `config.json` and `hf_quant_config.json` so the NVFP4 dequantizer skips the MTP layer
- All other weights (layers 0–77, embeddings, lm_head) remain unchanged from the original NVFP4 quantization
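The index update described above amounts to pointing every layer-78 key at the new standalone shard. A minimal sketch (toy entries; the real `model.safetensors.index.json` maps 791 layer-78 tensors):

```python
import json
import re

def remap_mtp_entries(weight_map: dict, mtp_file: str = "mtp.safetensors") -> dict:
    """Point every layer-78 (MTP) tensor at the standalone BF16 shard."""
    pat = re.compile(r"^model\.layers\.78\.")
    return {k: (mtp_file if pat.match(k) else v) for k, v in weight_map.items()}

# Toy stand-ins for real index entries.
index = {
    "weight_map": {
        "model.layers.77.mlp.experts.0.up_proj.weight": "model-00070-of-00082.safetensors",
        "model.layers.78.eh_proj.weight": "model-00271-of-00282.safetensors",
        "model.layers.78.enorm.weight": "model-00271-of-00282.safetensors",
    }
}
index["weight_map"] = remap_mtp_entries(index["weight_map"])
print(json.dumps(index["weight_map"], indent=2))
```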
The MTP layer contains:
- Special MTP components: `eh_proj`, `enorm`, `hnorm`, `shared_head.norm`
- A complete transformer block: self-attention + MoE MLP (256 experts)

### What's quantized

Only the MoE expert MLP layers (gate, up, and down projections) for layers 0–77 are quantized to NVFP4. Attention layers are left in BF16. The MTP layer (layer 78) is entirely in BF16. Since the expert weights constitute the vast majority of model parameters in an MoE architecture, this still yields significant memory savings.

Calibration uses natural top-k routing rather than forcing all experts to activate, so each expert's quantization scales reflect the token distributions it actually sees during inference. To compensate, calibration was run on a much larger number of samples than typical to ensure broad expert coverage through natural routing alone.

### Calibration dataset

Three calibration passes were run:

1. **Coding pass** — Agentic coding samples (tool calling, multi-turn code generation, function calling) with English and Chinese system prompts.
2. **Broad pass** — Large-scale diverse samples drawn from WildChat and LMSYS-Chat covering real user conversations across a wide range of topics and languages.
3. **Deep pass** — Long-context samples (>8K tokens) from coding and diverse sources to exercise deep-sequence expert activation patterns.

Merged via element-wise max across all calibration runs.
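The element-wise max merge can be sketched as follows (toy arrays standing in for one expert's input-scale tensor from each pass; the real scales ship in `input_scales.safetensors`):

```python
import numpy as np

# Toy stand-ins for one expert's input-scale tensor from each calibration pass.
coding_pass = np.array([0.8, 1.2, 0.5])
broad_pass  = np.array([1.0, 0.9, 0.7])
deep_pass   = np.array([0.6, 1.1, 0.9])

# Element-wise max keeps the widest range any pass observed, so no pass's
# activations fall outside the quantization range.
merged = np.maximum.reduce([coding_pass, broad_pass, deep_pass])
print(merged)  # [1.  1.2 0.9]
```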

### How to Run

NVFP4 requires Blackwell GPUs (RTX 5090, RTX Pro 6000, B100, B200, etc.). Even quantized, this is a huge model — tested on 8x RTX Pro 6000 Blackwell (96 GB each, 768 GB total).

If you experience NCCL hangs with P2P, make sure you have `iommu=pt` (and `amd_iommu=pt` on AMD platforms) in your kernel command line.

#### SGLang (with MTP speculative decoding)

```bash
export NCCL_IB_DISABLE=1
export NCCL_P2P_LEVEL=PHB
export NCCL_ALLOC_P2P_NET_LL_BUFFERS=1
export NCCL_MIN_NCHANNELS=8
export OMP_NUM_THREADS=8
export SAFETENSORS_FAST_GPU=1

python3 -m sglang.launch_server \
  --model festr2/GLM-5-NVFP4-MTP \
  --served-model-name glm-5 \
  --reasoning-parser glm45 \
  --tool-call-parser glm47 \
  --trust-remote-code \
  --tp 8 \
  --mem-fraction-static 0.95 \
  --max-running-requests 8 \
  --kv-cache-dtype fp8_e4m3 \
  --quantization modelopt_fp4 \
  --attention-backend flashinfer \
  --moe-runner-backend flashinfer_cutlass \
  --disable-custom-all-reduce \
  --enable-flashinfer-allreduce-fusion \
  --speculative-algorithm mtp \
  --num-speculative-tokens 1 \
  --host 0.0.0.0 \
  --port 8000
```

#### vLLM (with MTP speculative decoding)

```bash
vllm serve festr2/GLM-5-NVFP4-MTP \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
```

### Credits

- Original model: [zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5)
- NVFP4 quantization: [lukealonso/GLM-5-NVFP4](https://huggingface.co/lukealonso/GLM-5-NVFP4)
- MTP layer restoration: extracted from the original BF16 weights
chat_template.jinja
ADDED
@@ -0,0 +1,86 @@
[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson(ensure_ascii=False) }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
{%- if m.role == 'user' %}
{% set ns.last_user_index = loop.index0 -%}
{%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' }}
{%- endif %}
{{- '<tool_response>' }}
{{- m.content }}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
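The `<tool_call>` wire format the template emits for assistant tool calls can be reproduced with a short helper (a sketch; the template additionally JSON-encodes non-string argument values, which is omitted here):

```python
def format_tool_call(name: str, arguments: dict) -> str:
    """Build the <tool_call> payload in the XML shape the chat template emits."""
    parts = [f"<tool_call>{name}"]
    for key, value in arguments.items():
        parts.append(f"<arg_key>{key}</arg_key><arg_value>{value}</arg_value>")
    parts.append("</tool_call>")
    return "".join(parts)

print(format_tool_call("get_weather", {"city": "Prague", "unit": "celsius"}))
# <tool_call>get_weather<arg_key>city</arg_key><arg_value>Prague</arg_value><arg_key>unit</arg_key><arg_value>celsius</arg_value></tool_call>
```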
config.json
ADDED
@@ -0,0 +1,276 @@
{
  "vocab_size": 154880,
  "max_position_embeddings": 202752,
  "hidden_size": 6144,
  "intermediate_size": 12288,
  "num_hidden_layers": 78,
  "mlp_layer_types": [
    "dense",
    "dense",
    "dense",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse"
  ],
  "moe_intermediate_size": 2048,
  "num_attention_heads": 64,
  "n_shared_experts": 1,
  "n_routed_experts": 256,
  "routed_scaling_factor": 2.5,
  "kv_lora_rank": 512,
  "q_lora_rank": 2048,
  "qk_rope_head_dim": 64,
  "v_head_dim": 256,
  "qk_nope_head_dim": 192,
  "qk_head_dim": 256,
  "head_dim": 64,
  "n_group": 1,
  "topk_group": 1,
  "num_experts_per_tok": 8,
  "norm_topk_prob": true,
  "rope_interleave": true,
  "num_key_value_heads": 64,
  "hidden_act": "silu",
  "initializer_range": 0.02,
  "index_topk": 2048,
  "rms_norm_eps": 1e-05,
  "use_cache": true,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "rope_parameters": {
    "rope_theta": 1000000,
    "rope_type": "default"
  },
  "pad_token_id": 154820,
  "bos_token_id": 0,
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "tie_word_embeddings": false,
  "return_dict": true,
  "output_hidden_states": false,
  "dtype": "bfloat16",
  "chunk_size_feed_forward": 0,
  "is_encoder_decoder": false,
  "architectures": [
    "GlmMoeDsaForCausalLM"
  ],
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "problem_type": null,
  "_name_or_path": "zai-org/GLM-5",
  "transformers_version": "5.2.0.dev0",
  "ep_size": 1,
  "first_k_dense_replace": 3,
  "index_head_dim": 128,
  "index_n_heads": 32,
  "indexer_rope_interleave": true,
  "moe_layer_freq": 1,
  "model_type": "glm_moe_dsa",
  "num_nextn_predict_layers": 1,
  "pretraining_tp": 1,
  "scoring_func": "sigmoid",
  "topk_method": "noaux_tc",
  "output_attentions": false,
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": {
          "dynamic": false,
          "num_bits": 4,
          "type": "float",
          "group_size": 16
        },
        "weights": {
          "dynamic": false,
          "num_bits": 4,
          "type": "float",
          "group_size": 16
        },
        "targets": [
          "Linear"
        ]
      }
    },
    "ignore": [
      "lm_head",
      "model.layers.0.self_attn*",
      "model.layers.1.self_attn*",
      "model.layers.10.self_attn*",
      "model.layers.11.self_attn*",
      "model.layers.12.self_attn*",
      "model.layers.13.self_attn*",
      "model.layers.14.self_attn*",
      "model.layers.15.self_attn*",
      "model.layers.16.self_attn*",
      "model.layers.17.self_attn*",
      "model.layers.18.self_attn*",
      "model.layers.19.self_attn*",
      "model.layers.2.self_attn*",
      "model.layers.20.self_attn*",
      "model.layers.21.self_attn*",
      "model.layers.22.self_attn*",
      "model.layers.23.self_attn*",
      "model.layers.24.self_attn*",
      "model.layers.25.self_attn*",
      "model.layers.26.self_attn*",
      "model.layers.27.self_attn*",
      "model.layers.28.self_attn*",
      "model.layers.29.self_attn*",
      "model.layers.3.self_attn*",
      "model.layers.30.self_attn*",
      "model.layers.31.self_attn*",
      "model.layers.32.self_attn*",
      "model.layers.33.self_attn*",
      "model.layers.34.self_attn*",
      "model.layers.35.self_attn*",
      "model.layers.36.self_attn*",
      "model.layers.37.self_attn*",
      "model.layers.38.self_attn*",
      "model.layers.39.self_attn*",
      "model.layers.4.self_attn*",
      "model.layers.40.self_attn*",
      "model.layers.41.self_attn*",
      "model.layers.42.self_attn*",
      "model.layers.43.self_attn*",
      "model.layers.44.self_attn*",
      "model.layers.45.self_attn*",
      "model.layers.46.self_attn*",
      "model.layers.47.self_attn*",
      "model.layers.48.self_attn*",
      "model.layers.49.self_attn*",
      "model.layers.5.self_attn*",
      "model.layers.50.self_attn*",
      "model.layers.51.self_attn*",
      "model.layers.52.self_attn*",
      "model.layers.53.self_attn*",
      "model.layers.54.self_attn*",
      "model.layers.55.self_attn*",
      "model.layers.56.self_attn*",
      "model.layers.57.self_attn*",
      "model.layers.58.self_attn*",
      "model.layers.59.self_attn*",
      "model.layers.6.self_attn*",
      "model.layers.60.self_attn*",
      "model.layers.61.self_attn*",
      "model.layers.62.self_attn*",
      "model.layers.63.self_attn*",
      "model.layers.64.self_attn*",
      "model.layers.65.self_attn*",
      "model.layers.66.self_attn*",
      "model.layers.67.self_attn*",
      "model.layers.68.self_attn*",
      "model.layers.69.self_attn*",
      "model.layers.7.self_attn*",
      "model.layers.70.self_attn*",
      "model.layers.71.self_attn*",
      "model.layers.72.self_attn*",
      "model.layers.73.self_attn*",
      "model.layers.74.self_attn*",
      "model.layers.75.self_attn*",
      "model.layers.76.self_attn*",
      "model.layers.77.self_attn*",
      "model.layers.78.eh_proj*",
      "model.layers.78.enorm*",
      "model.layers.78.hnorm*",
      "model.layers.78.input_layernorm*",
      "model.layers.78.mlp*",
      "model.layers.78.post_attention_layernorm*",
      "model.layers.78.self_attn*",
      "model.layers.78.shared_head*",
      "model.layers.8.self_attn*",
      "model.layers.9.self_attn*"
    ],
    "quant_algo": "NVFP4",
    "kv_cache_scheme": {
      "dynamic": false,
      "num_bits": 8,
      "type": "float"
    },
    "producer": {
      "name": "modelopt",
      "version": "0.39.0.dev290+gf9d9a71de.d20260214"
    },
    "quant_method": "modelopt"
  }
}
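The `ignore` list uses glob-style patterns, so a module stays in BF16 if any pattern matches its name. A minimal sketch of how such patterns are typically matched (abridged pattern list; actual loader behavior may differ):

```python
import fnmatch

# Patterns copied from quantization_config.ignore (abridged).
ignore = [
    "lm_head",
    "model.layers.0.self_attn*",
    "model.layers.78.mlp*",
    "model.layers.78.self_attn*",
    "model.layers.78.shared_head*",
]

def is_ignored(module_name: str) -> bool:
    """True if the module should stay in BF16 (skipped by the NVFP4 path)."""
    return any(fnmatch.fnmatch(module_name, pat) for pat in ignore)

assert is_ignored("model.layers.78.mlp.experts.0.up_proj")       # MTP layer: BF16
assert is_ignored("model.layers.0.self_attn.q_proj")             # attention: BF16
assert not is_ignored("model.layers.5.mlp.experts.3.down_proj")  # expert MLP: NVFP4
```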
generation_config.json
ADDED
@@ -0,0 +1,74 @@
{
  "max_length": null,
  "max_new_tokens": null,
  "min_length": null,
  "min_new_tokens": null,
  "early_stopping": null,
  "max_time": null,
  "stop_strings": null,
  "do_sample": null,
  "num_beams": null,
  "use_cache": true,
  "cache_implementation": null,
  "cache_config": null,
  "temperature": null,
  "top_k": null,
  "top_p": null,
  "min_p": null,
  "top_h": null,
  "typical_p": null,
  "epsilon_cutoff": null,
  "eta_cutoff": null,
  "repetition_penalty": null,
  "encoder_repetition_penalty": null,
  "length_penalty": null,
  "no_repeat_ngram_size": null,
  "bad_words_ids": null,
  "renormalize_logits": null,
  "forced_bos_token_id": null,
  "forced_eos_token_id": null,
  "remove_invalid_values": null,
  "exponential_decay_length_penalty": null,
  "suppress_tokens": null,
  "begin_suppress_tokens": null,
  "sequence_bias": null,
  "token_healing": null,
  "guidance_scale": null,
  "watermarking_config": null,
  "num_return_sequences": null,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_scores": null,
  "output_logits": null,
  "return_dict_in_generate": null,
  "pad_token_id": 154820,
  "bos_token_id": 0,
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "encoder_no_repeat_ngram_size": null,
  "decoder_start_token_id": null,
  "is_assistant": null,
  "num_assistant_tokens": null,
  "num_assistant_tokens_schedule": null,
  "assistant_confidence_threshold": null,
  "prompt_lookup_num_tokens": null,
  "max_matching_ngram_size": null,
  "assistant_early_exit": null,
  "assistant_lookbehind": null,
  "target_lookbehind": null,
  "compile_config": null,
  "disable_compile": null,
  "low_memory": null,
  "penalty_alpha": null,
  "dola_layers": null,
  "diversity_penalty": null,
  "num_beam_groups": null,
  "constraints": null,
  "force_words_ids": null,
  "prefill_chunk_size": null,
  "_from_model_config": true,
  "transformers_version": "5.2.0.dev0"
}
hf_quant_config.json
ADDED
|
@@ -0,0 +1,121 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"config_groups": {
|
| 3 |
+
"group_0": {
|
| 4 |
+
"input_activations": {
|
| 5 |
+
"dynamic": false,
|
| 6 |
+
"num_bits": 4,
|
| 7 |
+
"type": "float",
|
| 8 |
+
"group_size": 16
|
| 9 |
+
},
|
| 10 |
+
"weights": {
|
| 11 |
+
"dynamic": false,
|
| 12 |
+
"num_bits": 4,
|
| 13 |
+
"type": "float",
|
| 14 |
+
"group_size": 16
|
| 15 |
+
},
|
| 16 |
+
"targets": [
|
| 17 |
+
"Linear"
|
| 18 |
+
]
|
| 19 |
+
}
|
| 20 |
+
},
|
| 21 |
+
"ignore": [
|
| 22 |
+
"lm_head",
|
| 23 |
+
"model.layers.0.self_attn*",
|
| 24 |
+
"model.layers.1.self_attn*",
|
| 25 |
+
"model.layers.10.self_attn*",
|
| 26 |
+
"model.layers.11.self_attn*",
|
| 27 |
+
"model.layers.12.self_attn*",
|
| 28 |
+
"model.layers.13.self_attn*",
|
| 29 |
+
"model.layers.14.self_attn*",
|
| 30 |
+
"model.layers.15.self_attn*",
|
| 31 |
+
"model.layers.16.self_attn*",
|
| 32 |
+
"model.layers.17.self_attn*",
|
| 33 |
+
"model.layers.18.self_attn*",
|
| 34 |
+
"model.layers.19.self_attn*",
|
| 35 |
+
"model.layers.2.self_attn*",
|
| 36 |
+
"model.layers.20.self_attn*",
|
| 37 |
+
"model.layers.21.self_attn*",
|
| 38 |
+
"model.layers.22.self_attn*",
|
| 39 |
+
"model.layers.23.self_attn*",
|
| 40 |
+
"model.layers.24.self_attn*",
|
| 41 |
+
"model.layers.25.self_attn*",
|
| 42 |
+
"model.layers.26.self_attn*",
|
| 43 |
+
"model.layers.27.self_attn*",
|
| 44 |
+
"model.layers.28.self_attn*",
|
| 45 |
+
"model.layers.29.self_attn*",
|
| 46 |
+
"model.layers.3.self_attn*",
|
| 47 |
+
"model.layers.30.self_attn*",
|
| 48 |
+
"model.layers.31.self_attn*",
|
| 49 |
+
"model.layers.32.self_attn*",
|
| 50 |
+
"model.layers.33.self_attn*",
|
| 51 |
+
"model.layers.34.self_attn*",
|
| 52 |
+
"model.layers.35.self_attn*",
|
| 53 |
+
"model.layers.36.self_attn*",
|
| 54 |
+
"model.layers.37.self_attn*",
|
| 55 |
+
"model.layers.38.self_attn*",
|
| 56 |
+
"model.layers.39.self_attn*",
|
| 57 |
+
"model.layers.4.self_attn*",
|
| 58 |
+
"model.layers.40.self_attn*",
|
| 59 |
+
"model.layers.41.self_attn*",
|
| 60 |
+
"model.layers.42.self_attn*",
|
| 61 |
+
"model.layers.43.self_attn*",
|
| 62 |
+
"model.layers.44.self_attn*",
|
| 63 |
+
"model.layers.45.self_attn*",
|
| 64 |
+
"model.layers.46.self_attn*",
|
| 65 |
+
"model.layers.47.self_attn*",
|
| 66 |
+
"model.layers.48.self_attn*",
|
| 67 |
+
"model.layers.49.self_attn*",
|
| 68 |
+
"model.layers.5.self_attn*",
|
| 69 |
+
"model.layers.50.self_attn*",
|
| 70 |
+
"model.layers.51.self_attn*",
|
| 71 |
+
"model.layers.52.self_attn*",
|
| 72 |
+
"model.layers.53.self_attn*",
|
| 73 |
+
"model.layers.54.self_attn*",
|
| 74 |
+
"model.layers.55.self_attn*",
|
| 75 |
+
"model.layers.56.self_attn*",
|
| 76 |
+
"model.layers.57.self_attn*",
|
| 77 |
+
"model.layers.58.self_attn*",
|
| 78 |
+
"model.layers.59.self_attn*",
|
| 79 |
+
"model.layers.6.self_attn*",
|
| 80 |
+
"model.layers.60.self_attn*",
|
| 81 |
+
"model.layers.61.self_attn*",
|
| 82 |
+
"model.layers.62.self_attn*",
|
| 83 |
+
"model.layers.63.self_attn*",
|
| 84 |
+
"model.layers.64.self_attn*",
|
| 85 |
+
"model.layers.65.self_attn*",
|
| 86 |
+
"model.layers.66.self_attn*",
|
| 87 |
+
"model.layers.67.self_attn*",
|
| 88 |
+
"model.layers.68.self_attn*",
|
| 89 |
+
"model.layers.69.self_attn*",
|
| 90 |
+
"model.layers.7.self_attn*",
|
| 91 |
+
"model.layers.70.self_attn*",
|
| 92 |
+
"model.layers.71.self_attn*",
|
| 93 |
+
"model.layers.72.self_attn*",
|
| 94 |
+
"model.layers.73.self_attn*",
|
| 95 |
+
"model.layers.74.self_attn*",
|
| 96 |
+
"model.layers.75.self_attn*",
|
| 97 |
+
"model.layers.76.self_attn*",
|
| 98 |
+
"model.layers.77.self_attn*",
|
| 99 |
+
"model.layers.78.eh_proj*",
|
| 100 |
+
"model.layers.78.enorm*",
|
| 101 |
+
"model.layers.78.hnorm*",
|
| 102 |
+
"model.layers.78.input_layernorm*",
|
| 103 |
+
"model.layers.78.mlp*",
|
| 104 |
+
"model.layers.78.post_attention_layernorm*",
|
| 105 |
+
"model.layers.78.self_attn*",
|
| 106 |
+
"model.layers.78.shared_head*",
|
| 107 |
+
"model.layers.8.self_attn*",
|
| 108 |
+
"model.layers.9.self_attn*"
|
| 109 |
+
],
|
| 110 |
+
"quant_algo": "NVFP4",
|
| 111 |
+
"kv_cache_scheme": {
|
| 112 |
+
"dynamic": false,
|
| 113 |
+
"num_bits": 8,
|
| 114 |
+
"type": "float"
|
| 115 |
+
},
|
| 116 |
+
"producer": {
|
| 117 |
+
"name": "modelopt",
|
| 118 |
+
"version": "0.39.0.dev290+gf9d9a71de.d20260214"
|
| 119 |
+
},
|
| 120 |
+
"quant_method": "modelopt"
|
| 121 |
+
}
|
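The `exclude_modules` entries above are shell-style wildcards that keep the listed submodules (e.g. every `self_attn` block and all of layer 78) out of NVFP4 quantization. A minimal sketch of how such patterns can be matched against module names — the truncated in-line config dict and the `is_excluded` helper are illustrative assumptions, not modelopt API:

```python
import fnmatch

# Illustrative fragment mirroring hf_quant_config.json above;
# the exclude list is truncated for brevity (assumption, not the full file).
quant_config = {
    "quant_algo": "NVFP4",
    "exclude_modules": [
        "model.layers.50.self_attn*",
        "model.layers.78.mlp*",
    ],
    "kv_cache_scheme": {"dynamic": False, "num_bits": 8, "type": "float"},
}

def is_excluded(module_name: str, patterns) -> bool:
    """Return True if a module name matches any exclude wildcard."""
    return any(fnmatch.fnmatch(module_name, p) for p in patterns)

patterns = quant_config["exclude_modules"]
print(is_excluded("model.layers.50.self_attn.q_proj", patterns))  # True
print(is_excluded("model.layers.50.mlp.gate_proj", patterns))     # False
```

With this scheme, excluded projections stay in higher precision while the KV cache is still quantized to 8-bit float per `kv_cache_scheme`.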
input_scales.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3dc72ddf5e0a470c46bad790020c5bd48b0c75c3f12afaa688a9ed23bef30656
+size 6700728

model-00003-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:31a4f8568520c2243295633e5652272d86356102b0f842457a7c852a81168bd5
+size 5370445012

model-00005-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1debaedd04b77f1a42bd61c2f565b3087c84299969bbf1b6fd585cdb7907f265
+size 5370445220

model-00010-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:07e0a19471da0515e042f386aa0d5cf23d768209ae04dd3d75f074b09fcf006d
+size 5369253496

model-00011-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa750340797b11cbbc56dfc6f41c8a1277459c559c3a36847f78896422c722c6
+size 5373591496

model-00012-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e44e99405d81bd4e6410668c6e40426d64bfd87ba5b0fbbf9384c95ebc6e984a
+size 5370447164

model-00014-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9552d2c38b93b44624663b6250077f0fe3460eeb44c2270fa01e6991ff1b0432
+size 5370447100

model-00015-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7569d4539b3e236654dc28ebeba18761066236f6d960e58257374c4c0207cf79
+size 5370447156

model-00016-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9c4aad9b9b1f1a786ae5a5bca7730b63c05f9afa21d2b562b88f2b5f5c50fb58
+size 5370447156

model-00017-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3370803228952f375f3bd9e958ffc3605269cac200a836678bb01cb7014bdea5
+size 5370447212

model-00020-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bf2bad486df10c4f91d21a0e7f4724c019cd7bdd948b2ed28c3b19927b9ac496
+size 5370446780
model-00022-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f92d9c2d9994a97c023aa92e24eff9866778a28abf832182b14d077bdade509d
+size 5370446972

model-00023-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:70abd450ad1bdd6f8ed13dbb34916a89d20992f06f1ab5c80e156a2d59a70223
+size 5369540220

model-00024-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:da3b2a3034106ad4824468ec20d382e74f68fa36eed974b1faa23aeeacb3941b
+size 5373305076

model-00026-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1620bc371c7107d320d48e1e00b085fbb9b954f36e4be387f78961298fd305a2
+size 5370447108

model-00027-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b92b91e2db32518d6556e77a1eaea14d00cdf7d4e59423a297cd3f4646b2c72c
+size 5370447132

model-00028-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:da6f18b3b4d8eb97ccd55faba99bd7ce4cdd5661ca4610f62a483682052cf0ad
+size 5370447148

model-00030-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2992a773fb12e3fc5fdfaef7122234561e51d12ad178b2ba6e81436c331c2f97
+size 5370447172

model-00031-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5af20934e85a52715bdff6c39f6c5bc534cd53bd57ada772be8909e634851989
+size 5370447340

model-00033-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:82754607522ae6ef8019e4b929e1a7ba433775382e6c3aede342572c5b3a1204
+size 5370446780

model-00035-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:527ffea841f460c78cd1ef414ab638b7a4da8a7224cac3a301c315e04d27a7b9
+size 5370446956
model-00042-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:388b591778853ce156a7ba5dd950e4bc5c080e9fecc4d64b0ae7a03ba2d6a557
+size 5370447156

model-00043-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eeebcebacaafbc9d7790ef2b0053c5cd0078393cebd0b7903672a5caa0e08270
+size 5370447156

model-00045-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3c702dfcf632c7fa67eb71bab94894d7e8879599e91652b948ac6e0e93b65270
+size 5370447364

model-00046-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:130ef0e80ad96d9a6ec0841f53af68abc5a39fd5dd2d086b9513528c593e3115
+size 5370446820

model-00048-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:589265f505d8ecf84e848782e4b6cdcfe3653a38a3bc6d642ed77caa00d89777
+size 5370446860

model-00049-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:be50e922b74c33334cdec93a10ff4172d6508bada39ced8e085a220dadc8e1df
+size 5370446972

model-00050-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8b908325b29e44afcab5f8cf228108da0457d8901a96eec89e0947157c9ce61c
+size 5369253496

model-00058-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:298e5b03b88cc7453754489118e850ae5e28e658d9a82e66e360a799c6227a2d
+size 5370447340

model-00059-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bed696bcf98029fd4f8c945743a5887d5faa24a52f66c03dd65b9056378abecd
+size 5370446980

model-00062-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6269d6956ab08eb64fdae9110e9efcd074d1b3fd33cbc038bb225c478454d4be
+size 5370446972
model-00065-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:272a0de8407ba91a31340797285f793b7800b872dffb66256d5f725bac4eb67f
+size 5370447188

model-00066-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24be4bab71a064a462d2ac9a5c87eb669c42f258fe6256f266e26545f510fc5f
+size 5370447108

model-00067-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:673c901ca97241d24d87cdd436802ac68522f5aa2eb510db7100111414d5ebb5
+size 5370447132

model-00068-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1f5d4b951d93fc6cc47fe1205cf71bf706c311eed668c812b9c1e3a4cf74ef29
+size 5370447148

model-00070-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:42ddf71da81e1cf7d5332c5a0d5abc8b0c92e58c991330cd478f348278930f52
+size 5370447172

model-00074-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:38d0c5e6ca04e97aa0b94dcc0e9b0a78aa8668cfa195f56f2327d293c6f941de
+size 5370446780

model-00077-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7136d2f01d39a2a38a4d3eda9c7c8be45667a9682e2401fe6856b0f23ce159da
+size 5369121124

model-00079-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cfb19bdc09c4042909a359b229626d48020673187b2ed22ed4b9201d358a224a
+size 5370447132

model-00080-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:03c89b0cee28f2df86ae0f040c1f2b98961c2fed148d316f2b768064df8ca100
+size 5370447132

model-00081-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e2cb8eb8160a3757cc2ec9cd676653e78f57674c229897624d55b56797d93d28
+size 5370447116

model-00082-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2c972e6fad24d956a14a33aedae33e19bc02d57105888af0d4a6fa40cea34076
+size 5671884736

model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d6b05371bcc7dd3383aa6064edbe9886498aeb397979d6e25eaa4888cd4f3f77
+size 21814298

tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
+size 20217442
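Each three-line stub committed above is a git-LFS pointer: the actual shard lives in LFS storage, and the pointer records its `sha256` and byte size. A hedged sketch (the helper names are illustrative, not part of git-lfs tooling) of parsing a pointer and verifying a downloaded blob against it:

```python
import hashlib

def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-LFS pointer file (version / oid / size lines)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}

def verify_blob(path: str, pointer: dict) -> bool:
    """Check a local file's size and sha256 against an LFS pointer."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["oid"]

# The tokenizer.json pointer from the diff above:
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
size 20217442
"""
print(parse_lfs_pointer(pointer_text)["size"])  # 20217442
```

This is the same check `git lfs` performs implicitly; doing it by hand is mainly useful after downloading shards out-of-band.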