festr2 committed on
Commit
9ccaedd
·
verified ·
1 Parent(s): 72ed061

Add files using upload-large-folder tool

This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full change set.

Files changed (50)
  1. .gitattributes +2 -0
  2. README.md +96 -0
  3. chat_template.jinja +86 -0
  4. config.json +276 -0
  5. generation_config.json +74 -0
  6. hf_quant_config.json +121 -0
  7. input_scales.safetensors +3 -0
  8. model-00003-of-00082.safetensors +3 -0
  9. model-00005-of-00082.safetensors +3 -0
  10. model-00010-of-00082.safetensors +3 -0
  11. model-00011-of-00082.safetensors +3 -0
  12. model-00012-of-00082.safetensors +3 -0
  13. model-00014-of-00082.safetensors +3 -0
  14. model-00015-of-00082.safetensors +3 -0
  15. model-00016-of-00082.safetensors +3 -0
  16. model-00017-of-00082.safetensors +3 -0
  17. model-00020-of-00082.safetensors +3 -0
  18. model-00022-of-00082.safetensors +3 -0
  19. model-00023-of-00082.safetensors +3 -0
  20. model-00024-of-00082.safetensors +3 -0
  21. model-00026-of-00082.safetensors +3 -0
  22. model-00027-of-00082.safetensors +3 -0
  23. model-00028-of-00082.safetensors +3 -0
  24. model-00030-of-00082.safetensors +3 -0
  25. model-00031-of-00082.safetensors +3 -0
  26. model-00033-of-00082.safetensors +3 -0
  27. model-00035-of-00082.safetensors +3 -0
  28. model-00042-of-00082.safetensors +3 -0
  29. model-00043-of-00082.safetensors +3 -0
  30. model-00045-of-00082.safetensors +3 -0
  31. model-00046-of-00082.safetensors +3 -0
  32. model-00048-of-00082.safetensors +3 -0
  33. model-00049-of-00082.safetensors +3 -0
  34. model-00050-of-00082.safetensors +3 -0
  35. model-00058-of-00082.safetensors +3 -0
  36. model-00059-of-00082.safetensors +3 -0
  37. model-00062-of-00082.safetensors +3 -0
  38. model-00065-of-00082.safetensors +3 -0
  39. model-00066-of-00082.safetensors +3 -0
  40. model-00067-of-00082.safetensors +3 -0
  41. model-00068-of-00082.safetensors +3 -0
  42. model-00070-of-00082.safetensors +3 -0
  43. model-00074-of-00082.safetensors +3 -0
  44. model-00077-of-00082.safetensors +3 -0
  45. model-00079-of-00082.safetensors +3 -0
  46. model-00080-of-00082.safetensors +3 -0
  47. model-00081-of-00082.safetensors +3 -0
  48. model-00082-of-00082.safetensors +3 -0
  49. model.safetensors.index.json +3 -0
  50. tokenizer.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,96 @@
---
base_model:
- zai-org/GLM-5
license: mit
tags:
- mtp
- speculative-decoding
---

## Model Description

**GLM-5-NVFP4-MTP** is an NVFP4-quantized version of [zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5) with **Multi-Token Prediction (MTP) weights restored**, enabling speculative decoding with vLLM and SGLang.

This is based on [lukealonso/GLM-5-NVFP4](https://huggingface.co/lukealonso/GLM-5-NVFP4) — a 744B-parameter Mixture-of-Experts language model with 40B active parameters, 256 experts per MoE layer (8 activated per token), and DeepSeek Sparse Attention (DSA).

Quantized directly from the full BF16 checkpoint ([zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5)), *not the FP8 release*, to NVFP4 (4-bit with blockwise FP8 scales per 16 elements) using [NVIDIA Model Optimizer](https://github.com/NVIDIA/Model-Optimizer).

### MTP Layer Addition

The original [lukealonso/GLM-5-NVFP4](https://huggingface.co/lukealonso/GLM-5-NVFP4) declares `"num_nextn_predict_layers": 1` in its config but ships without the MTP layer weights (layer 78). This repo fixes that by extracting the MTP layer directly from the original BF16 model ([zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5)).

**What was done:**
- Extracted all 791 tensors for `model.layers.78.*` from the BF16 model (shards 271–274 of 282) and saved them as `mtp.safetensors` (~19 GB, full BF16 precision)
- Updated `model.safetensors.index.json` to map all 791 layer-78 tensors to `mtp.safetensors`
- Added `model.layers.78.*` glob patterns to `quantization_config.ignore` in both `config.json` and `hf_quant_config.json` so the NVFP4 dequantizer skips the MTP layer
- All other weights (layers 0–77, embeddings, lm_head) remain unchanged from the original NVFP4 quantization
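The index rewrite step above can be sketched as follows. This is an illustrative helper, not the exact script used for this repo; the toy `index` dict stands in for the real `model.safetensors.index.json`, and the source-shard filename is hypothetical:

```python
import json

def route_mtp_keys(index: dict, mtp_file: str = "mtp.safetensors") -> dict:
    """Point every layer-78 (MTP) tensor at the dedicated BF16 MTP shard."""
    weight_map = dict(index["weight_map"])  # shallow copy, original untouched
    for name in weight_map:
        if name.startswith("model.layers.78."):
            weight_map[name] = mtp_file
    return {**index, "weight_map": weight_map}

# Toy stand-in for model.safetensors.index.json (shard names hypothetical)
index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.layers.77.mlp.experts.0.gate_proj.weight": "model-00077-of-00082.safetensors",
        "model.layers.78.eh_proj.weight": "model-00271-of-00282.safetensors",
        "model.layers.78.enorm.weight": "model-00271-of-00282.safetensors",
    },
}
patched = route_mtp_keys(index)
print(json.dumps(patched["weight_map"], indent=2))
```

Only the layer-78 entries are remapped; every other tensor keeps its original shard assignment.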

The MTP layer contains:
- Special MTP components: `eh_proj`, `enorm`, `hnorm`, `shared_head.norm`
- A complete transformer block: self-attention + MoE MLP (256 experts)

### What's quantized

Only the MoE expert MLP layers (gate, up, and down projections) in layers 0–77 are quantized to NVFP4. Attention layers are kept in BF16, and the MTP layer (layer 78) is entirely BF16. Because expert weights make up the vast majority of parameters in an MoE architecture, this still yields significant memory savings.

Calibration uses natural top-k routing rather than forcing all experts to activate, so each expert's quantization scales reflect the token distributions it actually sees during inference. To compensate, calibration was run on far more samples than is typical, ensuring broad expert coverage through natural routing alone.
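Rough arithmetic for the NVFP4 footprint (standard NVFP4 layout assumed, ignoring the small per-tensor global scale): 4 bits per value plus one 8-bit FP8 scale shared by each 16-element block.

```python
# NVFP4 storage cost: 4-bit values + one FP8 (8-bit) scale per block of 16 elements
bits_per_weight = 4 + 8 / 16          # 4.5 bits amortized per weight
bf16_bits = 16
compression = bf16_bits / bits_per_weight
print(f"{bits_per_weight} bits/weight, {compression:.2f}x smaller than BF16")
```

So the quantized expert weights take roughly 3.6x less memory than their BF16 originals, which is where most of the savings for this model come from.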

### Calibration dataset

Three calibration passes were run:

1. **Coding pass** — Agentic coding samples (tool calling, multi-turn code generation, function calling) with English and Chinese system prompts.
2. **Broad pass** — Large-scale diverse samples drawn from WildChat and LMSYS-Chat, covering real user conversations across a wide range of topics and languages.
3. **Deep pass** — Long-context samples (>8K tokens) from coding and diverse sources, exercising deep-sequence expert activation patterns.

The resulting scales were merged via element-wise max across all three calibration runs.
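The element-wise max merge can be sketched like this. For simplicity the example uses one scalar amax per tensor; the real ModelOpt scales are blockwise tensors, where the same idea applies element-wise (e.g. `torch.maximum`). Names and values here are purely illustrative:

```python
def merge_amax(runs: list[dict]) -> dict:
    """Merge per-tensor amax values from several calibration runs by taking the max."""
    merged: dict = {}
    for run in runs:
        for name, amax in run.items():
            merged[name] = max(merged.get(name, float("-inf")), amax)
    return merged

# Hypothetical amax observations from the three calibration passes
coding = {"experts.0.gate_proj": 1.8, "experts.1.gate_proj": 0.9}
broad  = {"experts.0.gate_proj": 1.2, "experts.1.gate_proj": 2.4}
deep   = {"experts.0.gate_proj": 2.1, "experts.1.gate_proj": 1.0}

merged = merge_amax([coding, broad, deep])
print(merged)
```

Taking the max keeps each quantization range wide enough to cover the largest activation any pass observed, so no single pass's distribution clips another's.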

### How to Run

NVFP4 requires Blackwell GPUs (RTX 5090, RTX Pro 6000, B100, B200, etc.). Even quantized, this is a huge model — tested on 8x RTX Pro 6000 Blackwell (96 GB each, 768 GB total).

If you experience NCCL hangs with P2P, make sure you have `iommu=pt` (and `amd_iommu=pt` on AMD platforms) in your kernel command line.
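A quick pre-flight check for that kernel parameter (an illustrative snippet, not part of this repo; it only reads `/proc/cmdline`):

```shell
# Check whether the kernel was booted with iommu=pt
if grep -qw 'iommu=pt' /proc/cmdline 2>/dev/null; then
    IOMMU_STATUS="present"
else
    IOMMU_STATUS="missing"
fi
echo "iommu=pt: ${IOMMU_STATUS}"
```

If it reports `missing`, add the parameter to your bootloader configuration (e.g. the kernel command line in GRUB) and reboot before multi-GPU runs.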

#### SGLang (with MTP speculative decoding)

```bash
export NCCL_IB_DISABLE=1
export NCCL_P2P_LEVEL=PHB
export NCCL_ALLOC_P2P_NET_LL_BUFFERS=1
export NCCL_MIN_NCHANNELS=8
export OMP_NUM_THREADS=8
export SAFETENSORS_FAST_GPU=1

python3 -m sglang.launch_server \
  --model festr2/GLM-5-NVFP4-MTP \
  --served-model-name glm-5 \
  --reasoning-parser glm45 \
  --tool-call-parser glm47 \
  --trust-remote-code \
  --tp 8 \
  --mem-fraction-static 0.95 \
  --max-running-requests 8 \
  --kv-cache-dtype fp8_e4m3 \
  --quantization modelopt_fp4 \
  --attention-backend flashinfer \
  --moe-runner-backend flashinfer_cutlass \
  --disable-custom-all-reduce \
  --enable-flashinfer-allreduce-fusion \
  --speculative-algorithm mtp \
  --num-speculative-tokens 1 \
  --host 0.0.0.0 \
  --port 8000
```

#### vLLM (with MTP speculative decoding)

```bash
vllm serve festr2/GLM-5-NVFP4-MTP \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}'
```

### Credits

- Original model: [zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5)
- NVFP4 quantization: [lukealonso/GLM-5-NVFP4](https://huggingface.co/lukealonso/GLM-5-NVFP4)
- MTP layer restoration: layer 78 extracted from the original BF16 weights
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson(ensure_ascii=False) }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
{%- if m.role == 'user' %}
{% set ns.last_user_index = loop.index0 -%}
{%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' }}
{%- endif %}
{{- '<tool_response>' }}
{{- m.content }}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
config.json ADDED
@@ -0,0 +1,276 @@
{
  "vocab_size": 154880,
  "max_position_embeddings": 202752,
  "hidden_size": 6144,
  "intermediate_size": 12288,
  "num_hidden_layers": 78,
  "mlp_layer_types": [
    "dense",
    "dense",
    "dense",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse",
    "sparse"
  ],
  "moe_intermediate_size": 2048,
  "num_attention_heads": 64,
  "n_shared_experts": 1,
  "n_routed_experts": 256,
  "routed_scaling_factor": 2.5,
  "kv_lora_rank": 512,
  "q_lora_rank": 2048,
  "qk_rope_head_dim": 64,
  "v_head_dim": 256,
  "qk_nope_head_dim": 192,
  "qk_head_dim": 256,
  "head_dim": 64,
  "n_group": 1,
  "topk_group": 1,
  "num_experts_per_tok": 8,
  "norm_topk_prob": true,
  "rope_interleave": true,
  "num_key_value_heads": 64,
  "hidden_act": "silu",
  "initializer_range": 0.02,
  "index_topk": 2048,
  "rms_norm_eps": 1e-05,
  "use_cache": true,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "rope_parameters": {
    "rope_theta": 1000000,
    "rope_type": "default"
  },
  "pad_token_id": 154820,
  "bos_token_id": 0,
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "tie_word_embeddings": false,
  "return_dict": true,
  "output_hidden_states": false,
  "dtype": "bfloat16",
  "chunk_size_feed_forward": 0,
  "is_encoder_decoder": false,
  "architectures": [
    "GlmMoeDsaForCausalLM"
  ],
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "problem_type": null,
  "_name_or_path": "zai-org/GLM-5",
  "transformers_version": "5.2.0.dev0",
  "ep_size": 1,
  "first_k_dense_replace": 3,
  "index_head_dim": 128,
  "index_n_heads": 32,
  "indexer_rope_interleave": true,
  "moe_layer_freq": 1,
  "model_type": "glm_moe_dsa",
  "num_nextn_predict_layers": 1,
  "pretraining_tp": 1,
  "scoring_func": "sigmoid",
  "topk_method": "noaux_tc",
  "output_attentions": false,
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": {
          "dynamic": false,
          "num_bits": 4,
          "type": "float",
          "group_size": 16
        },
        "weights": {
          "dynamic": false,
          "num_bits": 4,
          "type": "float",
          "group_size": 16
        },
        "targets": [
          "Linear"
        ]
      }
    },
    "ignore": [
      "lm_head",
      "model.layers.0.self_attn*",
      "model.layers.1.self_attn*",
      "model.layers.10.self_attn*",
      "model.layers.11.self_attn*",
      "model.layers.12.self_attn*",
      "model.layers.13.self_attn*",
      "model.layers.14.self_attn*",
      "model.layers.15.self_attn*",
      "model.layers.16.self_attn*",
      "model.layers.17.self_attn*",
      "model.layers.18.self_attn*",
      "model.layers.19.self_attn*",
      "model.layers.2.self_attn*",
      "model.layers.20.self_attn*",
      "model.layers.21.self_attn*",
      "model.layers.22.self_attn*",
      "model.layers.23.self_attn*",
      "model.layers.24.self_attn*",
      "model.layers.25.self_attn*",
      "model.layers.26.self_attn*",
      "model.layers.27.self_attn*",
      "model.layers.28.self_attn*",
      "model.layers.29.self_attn*",
      "model.layers.3.self_attn*",
      "model.layers.30.self_attn*",
      "model.layers.31.self_attn*",
      "model.layers.32.self_attn*",
      "model.layers.33.self_attn*",
      "model.layers.34.self_attn*",
      "model.layers.35.self_attn*",
      "model.layers.36.self_attn*",
      "model.layers.37.self_attn*",
      "model.layers.38.self_attn*",
      "model.layers.39.self_attn*",
      "model.layers.4.self_attn*",
      "model.layers.40.self_attn*",
      "model.layers.41.self_attn*",
      "model.layers.42.self_attn*",
      "model.layers.43.self_attn*",
      "model.layers.44.self_attn*",
      "model.layers.45.self_attn*",
      "model.layers.46.self_attn*",
      "model.layers.47.self_attn*",
      "model.layers.48.self_attn*",
      "model.layers.49.self_attn*",
      "model.layers.5.self_attn*",
      "model.layers.50.self_attn*",
      "model.layers.51.self_attn*",
      "model.layers.52.self_attn*",
      "model.layers.53.self_attn*",
      "model.layers.54.self_attn*",
      "model.layers.55.self_attn*",
      "model.layers.56.self_attn*",
      "model.layers.57.self_attn*",
      "model.layers.58.self_attn*",
      "model.layers.59.self_attn*",
      "model.layers.6.self_attn*",
      "model.layers.60.self_attn*",
      "model.layers.61.self_attn*",
      "model.layers.62.self_attn*",
      "model.layers.63.self_attn*",
      "model.layers.64.self_attn*",
      "model.layers.65.self_attn*",
      "model.layers.66.self_attn*",
      "model.layers.67.self_attn*",
      "model.layers.68.self_attn*",
      "model.layers.69.self_attn*",
      "model.layers.7.self_attn*",
      "model.layers.70.self_attn*",
      "model.layers.71.self_attn*",
      "model.layers.72.self_attn*",
      "model.layers.73.self_attn*",
      "model.layers.74.self_attn*",
      "model.layers.75.self_attn*",
      "model.layers.76.self_attn*",
      "model.layers.77.self_attn*",
      "model.layers.78.eh_proj*",
      "model.layers.78.enorm*",
      "model.layers.78.hnorm*",
      "model.layers.78.input_layernorm*",
      "model.layers.78.mlp*",
      "model.layers.78.post_attention_layernorm*",
      "model.layers.78.self_attn*",
      "model.layers.78.shared_head*",
      "model.layers.8.self_attn*",
      "model.layers.9.self_attn*"
    ],
    "quant_algo": "NVFP4",
    "kv_cache_scheme": {
      "dynamic": false,
      "num_bits": 8,
      "type": "float"
    },
    "producer": {
      "name": "modelopt",
      "version": "0.39.0.dev290+gf9d9a71de.d20260214"
    },
    "quant_method": "modelopt"
  }
}
generation_config.json ADDED
@@ -0,0 +1,74 @@
{
  "max_length": null,
  "max_new_tokens": null,
  "min_length": null,
  "min_new_tokens": null,
  "early_stopping": null,
  "max_time": null,
  "stop_strings": null,
  "do_sample": null,
  "num_beams": null,
  "use_cache": true,
  "cache_implementation": null,
  "cache_config": null,
  "temperature": null,
  "top_k": null,
  "top_p": null,
  "min_p": null,
  "top_h": null,
  "typical_p": null,
  "epsilon_cutoff": null,
  "eta_cutoff": null,
  "repetition_penalty": null,
  "encoder_repetition_penalty": null,
  "length_penalty": null,
  "no_repeat_ngram_size": null,
  "bad_words_ids": null,
  "renormalize_logits": null,
  "forced_bos_token_id": null,
  "forced_eos_token_id": null,
  "remove_invalid_values": null,
  "exponential_decay_length_penalty": null,
  "suppress_tokens": null,
  "begin_suppress_tokens": null,
  "sequence_bias": null,
  "token_healing": null,
  "guidance_scale": null,
  "watermarking_config": null,
  "num_return_sequences": null,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_scores": null,
  "output_logits": null,
  "return_dict_in_generate": null,
  "pad_token_id": 154820,
  "bos_token_id": 0,
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "encoder_no_repeat_ngram_size": null,
  "decoder_start_token_id": null,
  "is_assistant": null,
  "num_assistant_tokens": null,
  "num_assistant_tokens_schedule": null,
  "assistant_confidence_threshold": null,
  "prompt_lookup_num_tokens": null,
  "max_matching_ngram_size": null,
  "assistant_early_exit": null,
  "assistant_lookbehind": null,
  "target_lookbehind": null,
  "compile_config": null,
  "disable_compile": null,
  "low_memory": null,
  "penalty_alpha": null,
  "dola_layers": null,
  "diversity_penalty": null,
  "num_beam_groups": null,
  "constraints": null,
  "force_words_ids": null,
  "prefill_chunk_size": null,
  "_from_model_config": true,
  "transformers_version": "5.2.0.dev0"
}
hf_quant_config.json ADDED
@@ -0,0 +1,121 @@
{
  "config_groups": {
    "group_0": {
      "input_activations": {
        "dynamic": false,
        "num_bits": 4,
        "type": "float",
        "group_size": 16
      },
      "weights": {
        "dynamic": false,
        "num_bits": 4,
        "type": "float",
        "group_size": 16
      },
      "targets": [
        "Linear"
      ]
    }
  },
  "ignore": [
    "lm_head",
    "model.layers.0.self_attn*",
    "model.layers.1.self_attn*",
    "model.layers.10.self_attn*",
    "model.layers.11.self_attn*",
    "model.layers.12.self_attn*",
    "model.layers.13.self_attn*",
    "model.layers.14.self_attn*",
    "model.layers.15.self_attn*",
    "model.layers.16.self_attn*",
    "model.layers.17.self_attn*",
    "model.layers.18.self_attn*",
    "model.layers.19.self_attn*",
    "model.layers.2.self_attn*",
    "model.layers.20.self_attn*",
    "model.layers.21.self_attn*",
    "model.layers.22.self_attn*",
    "model.layers.23.self_attn*",
    "model.layers.24.self_attn*",
    "model.layers.25.self_attn*",
    "model.layers.26.self_attn*",
    "model.layers.27.self_attn*",
    "model.layers.28.self_attn*",
    "model.layers.29.self_attn*",
    "model.layers.3.self_attn*",
    "model.layers.30.self_attn*",
    "model.layers.31.self_attn*",
    "model.layers.32.self_attn*",
    "model.layers.33.self_attn*",
    "model.layers.34.self_attn*",
    "model.layers.35.self_attn*",
    "model.layers.36.self_attn*",
    "model.layers.37.self_attn*",
    "model.layers.38.self_attn*",
    "model.layers.39.self_attn*",
    "model.layers.4.self_attn*",
    "model.layers.40.self_attn*",
    "model.layers.41.self_attn*",
    "model.layers.42.self_attn*",
    "model.layers.43.self_attn*",
    "model.layers.44.self_attn*",
    "model.layers.45.self_attn*",
    "model.layers.46.self_attn*",
    "model.layers.47.self_attn*",
    "model.layers.48.self_attn*",
    "model.layers.49.self_attn*",
    "model.layers.5.self_attn*",
    "model.layers.50.self_attn*",
    "model.layers.51.self_attn*",
    "model.layers.52.self_attn*",
    "model.layers.53.self_attn*",
    "model.layers.54.self_attn*",
    "model.layers.55.self_attn*",
    "model.layers.56.self_attn*",
    "model.layers.57.self_attn*",
    "model.layers.58.self_attn*",
    "model.layers.59.self_attn*",
    "model.layers.6.self_attn*",
    "model.layers.60.self_attn*",
    "model.layers.61.self_attn*",
    "model.layers.62.self_attn*",
    "model.layers.63.self_attn*",
    "model.layers.64.self_attn*",
    "model.layers.65.self_attn*",
    "model.layers.66.self_attn*",
    "model.layers.67.self_attn*",
    "model.layers.68.self_attn*",
    "model.layers.69.self_attn*",
    "model.layers.7.self_attn*",
    "model.layers.70.self_attn*",
    "model.layers.71.self_attn*",
    "model.layers.72.self_attn*",
    "model.layers.73.self_attn*",
    "model.layers.74.self_attn*",
    "model.layers.75.self_attn*",
    "model.layers.76.self_attn*",
    "model.layers.77.self_attn*",
    "model.layers.78.eh_proj*",
    "model.layers.78.enorm*",
    "model.layers.78.hnorm*",
    "model.layers.78.input_layernorm*",
    "model.layers.78.mlp*",
    "model.layers.78.post_attention_layernorm*",
    "model.layers.78.self_attn*",
    "model.layers.78.shared_head*",
    "model.layers.8.self_attn*",
    "model.layers.9.self_attn*"
  ],
  "quant_algo": "NVFP4",
  "kv_cache_scheme": {
    "dynamic": false,
    "num_bits": 8,
    "type": "float"
  },
  "producer": {
    "name": "modelopt",
    "version": "0.39.0.dev290+gf9d9a71de.d20260214"
  },
  "quant_method": "modelopt"
}
input_scales.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3dc72ddf5e0a470c46bad790020c5bd48b0c75c3f12afaa688a9ed23bef30656
size 6700728
model-00003-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31a4f8568520c2243295633e5652272d86356102b0f842457a7c852a81168bd5
size 5370445012
model-00005-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1debaedd04b77f1a42bd61c2f565b3087c84299969bbf1b6fd585cdb7907f265
size 5370445220
model-00010-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07e0a19471da0515e042f386aa0d5cf23d768209ae04dd3d75f074b09fcf006d
size 5369253496
model-00011-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa750340797b11cbbc56dfc6f41c8a1277459c559c3a36847f78896422c722c6
size 5373591496
model-00012-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e44e99405d81bd4e6410668c6e40426d64bfd87ba5b0fbbf9384c95ebc6e984a
size 5370447164
model-00014-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9552d2c38b93b44624663b6250077f0fe3460eeb44c2270fa01e6991ff1b0432
size 5370447100
model-00015-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7569d4539b3e236654dc28ebeba18761066236f6d960e58257374c4c0207cf79
size 5370447156
model-00016-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c4aad9b9b1f1a786ae5a5bca7730b63c05f9afa21d2b562b88f2b5f5c50fb58
size 5370447156
model-00017-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3370803228952f375f3bd9e958ffc3605269cac200a836678bb01cb7014bdea5
size 5370447212
model-00020-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bf2bad486df10c4f91d21a0e7f4724c019cd7bdd948b2ed28c3b19927b9ac496
size 5370446780
model-00022-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f92d9c2d9994a97c023aa92e24eff9866778a28abf832182b14d077bdade509d
size 5370446972
model-00023-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:70abd450ad1bdd6f8ed13dbb34916a89d20992f06f1ab5c80e156a2d59a70223
size 5369540220
model-00024-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da3b2a3034106ad4824468ec20d382e74f68fa36eed974b1faa23aeeacb3941b
size 5373305076
model-00026-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1620bc371c7107d320d48e1e00b085fbb9b954f36e4be387f78961298fd305a2
size 5370447108
model-00027-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b92b91e2db32518d6556e77a1eaea14d00cdf7d4e59423a297cd3f4646b2c72c
size 5370447132
model-00028-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:da6f18b3b4d8eb97ccd55faba99bd7ce4cdd5661ca4610f62a483682052cf0ad
size 5370447148
model-00030-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2992a773fb12e3fc5fdfaef7122234561e51d12ad178b2ba6e81436c331c2f97
size 5370447172
model-00031-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5af20934e85a52715bdff6c39f6c5bc534cd53bd57ada772be8909e634851989
size 5370447340
model-00033-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:82754607522ae6ef8019e4b929e1a7ba433775382e6c3aede342572c5b3a1204
size 5370446780
model-00035-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:527ffea841f460c78cd1ef414ab638b7a4da8a7224cac3a301c315e04d27a7b9
size 5370446956
model-00042-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:388b591778853ce156a7ba5dd950e4bc5c080e9fecc4d64b0ae7a03ba2d6a557
size 5370447156
model-00043-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eeebcebacaafbc9d7790ef2b0053c5cd0078393cebd0b7903672a5caa0e08270
size 5370447156
model-00045-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3c702dfcf632c7fa67eb71bab94894d7e8879599e91652b948ac6e0e93b65270
size 5370447364
model-00046-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:130ef0e80ad96d9a6ec0841f53af68abc5a39fd5dd2d086b9513528c593e3115
size 5370446820
model-00048-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:589265f505d8ecf84e848782e4b6cdcfe3653a38a3bc6d642ed77caa00d89777
size 5370446860
model-00049-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:be50e922b74c33334cdec93a10ff4172d6508bada39ced8e085a220dadc8e1df
size 5370446972
model-00050-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8b908325b29e44afcab5f8cf228108da0457d8901a96eec89e0947157c9ce61c
size 5369253496
model-00058-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:298e5b03b88cc7453754489118e850ae5e28e658d9a82e66e360a799c6227a2d
size 5370447340
model-00059-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bed696bcf98029fd4f8c945743a5887d5faa24a52f66c03dd65b9056378abecd
size 5370446980
model-00062-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6269d6956ab08eb64fdae9110e9efcd074d1b3fd33cbc038bb225c478454d4be
size 5370446972
model-00065-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:272a0de8407ba91a31340797285f793b7800b872dffb66256d5f725bac4eb67f
size 5370447188
model-00066-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24be4bab71a064a462d2ac9a5c87eb669c42f258fe6256f266e26545f510fc5f
size 5370447108
model-00067-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:673c901ca97241d24d87cdd436802ac68522f5aa2eb510db7100111414d5ebb5
size 5370447132
model-00068-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1f5d4b951d93fc6cc47fe1205cf71bf706c311eed668c812b9c1e3a4cf74ef29
size 5370447148
model-00070-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:42ddf71da81e1cf7d5332c5a0d5abc8b0c92e58c991330cd478f348278930f52
size 5370447172
model-00074-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:38d0c5e6ca04e97aa0b94dcc0e9b0a78aa8668cfa195f56f2327d293c6f941de
size 5370446780
model-00077-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7136d2f01d39a2a38a4d3eda9c7c8be45667a9682e2401fe6856b0f23ce159da
size 5369121124
model-00079-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cfb19bdc09c4042909a359b229626d48020673187b2ed22ed4b9201d358a224a
size 5370447132
model-00080-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:03c89b0cee28f2df86ae0f040c1f2b98961c2fed148d316f2b768064df8ca100
size 5370447132
model-00081-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e2cb8eb8160a3757cc2ec9cd676653e78f57674c229897624d55b56797d93d28
size 5370447116
model-00082-of-00082.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2c972e6fad24d956a14a33aedae33e19bc02d57105888af0d4a6fa40cea34076
size 5671884736
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d6b05371bcc7dd3383aa6064edbe9886498aeb397979d6e25eaa4888cd4f3f77
size 21814298
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
size 20217442