kleinpanic93 committed on
Commit
d196e2c
·
verified ·
1 Parent(s): d30664e

NVFP4 quantization of Qwen3-Coder-30B-A3B-Instruct via spark-maker v3

README.md CHANGED
@@ -5,88 +5,60 @@ tags:
5
  - qwen3
6
  - moe
7
  - nvfp4
8
- - 4-bit
9
  - quantized
10
  - nvidia-modelopt
11
- - blackwell
12
- - dgx-spark
13
  - coding
14
- - code-generation
15
- - mixture-of-experts
16
  model_type: qwen3_moe
17
  quantized_by: kleinpanic93
18
  pipeline_tag: text-generation
19
  library_name: transformers
20
- inference: false
21
  ---
22
 
23
- <div align="center">
24
-
25
- # 🧠 Qwen3-Coder-30B-A3B-Instruct — NVFP4
26
-
27
- **4-bit quantization of Qwen's 30B Mixture-of-Experts coding model**
28
- **Optimized for NVIDIA Blackwell (GB10 / GB200 / B200)**
29
 
30
- [![Base Model](https://img.shields.io/badge/Base-Qwen3--Coder--30B--A3B-blue)](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
31
- [![Quantization](https://img.shields.io/badge/Quant-NVFP4_(4--bit)-green)](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
32
- [![Hardware](https://img.shields.io/badge/Hardware-DGX_Spark_GB10-76b900)](https://www.nvidia.com/en-us/products/workstations/dgx-spark/)
33
- [![License](https://img.shields.io/badge/License-Apache_2.0-orange)](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE)
34
-
35
- </div>
36
-
37
- ---
38
 
39
- ## Overview
40
 
41
- NVFP4 post-training quantization of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) — a 30 billion parameter Mixture-of-Experts model specialized for code generation, with 3B parameters active per token across 128 experts per layer.
42
-
43
- Quantized on an **NVIDIA DGX Spark** (Blackwell GB10, 128 GB unified memory) using NVIDIA ModelOpt with 512 calibration samples. All MoE routing/gate layers are preserved in original precision to maintain expert selection quality.
44
-
45
- ## Specifications
46
-
47
- <table>
48
- <tr><td><b>Architecture</b></td><td>Qwen3MoeForCausalLM Mixture-of-Experts</td></tr>
49
- <tr><td><b>Parameters</b></td><td>30B total · 3B active per token</td></tr>
50
- <tr><td><b>Experts</b></td><td>128 per layer · 48 layers</td></tr>
51
- <tr><td><b>Quantization</b></td><td>NVFP4 4-bit NVIDIA floating point (weights + activations)</td></tr>
52
- <tr><td><b>KV Cache</b></td><td>FP8 quantized</td></tr>
53
- <tr><td><b>Block Size</b></td><td>16 (weights and activations)</td></tr>
54
- <tr><td><b>Preserved Layers</b></td><td><code>lm_head</code> + 48 MoE gate/router layers (full precision)</td></tr>
55
- <tr><td><b>Original Precision</b></td><td>BF16</td></tr>
56
- <tr><td><b>Model Size</b></td><td>~57 GB</td></tr>
57
- <tr><td><b>Context Length</b></td><td>Up to 131,072 tokens</td></tr>
58
- </table>
59
 
60
  ## Quantization Details
61
 
62
- | | |
63
- |---|---|
64
- | **Tool** | [NVIDIA ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer) 0.41.0 |
65
- | **Config** | `NVFP4_DEFAULT_CFG` |
66
- | **Calibration** | 512 samples (synthetic) |
67
- | **Export** | `save_pretrained` + manual `quantization_config` injection |
68
- | **Quantization Time** | 7 minutes on DGX Spark GB10 |
69
- | **Hardware** | NVIDIA DGX Spark — Blackwell GB10, 128 GB unified memory |
70
-
71
- > **Note on export method:** ModelOpt 0.41.0's native HF checkpoint exporter does not yet support `Qwen3MoeExperts` in its allowlist (only `Llama4TextExperts` and `GptOssExperts`). The quantization math is identical — calibration and weight conversion run through ModelOpt's standard NVFP4 pipeline. The checkpoint is serialized via HuggingFace `save_pretrained()` with a manually constructed `quantization_config` matching NVIDIA's schema. NVIDIA's own pre-quantized MoE models (e.g., Qwen3-Next-80B-A3B-NVFP4) use an internal dev build of ModelOpt that includes this export path.
72
 
73
  ## Usage
74
 
75
- ### vLLM (Recommended)
76
 
77
  ```bash
78
  vllm serve kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4 \
79
  --quantization modelopt \
80
  --trust-remote-code \
81
- --max-model-len 32768 \
82
- --gpu-memory-utilization 0.90
83
  ```
84
 
85
- ### Transformers
86
 
87
  ```python
88
  from transformers import AutoModelForCausalLM, AutoTokenizer
89
- import torch
90
 
91
  model = AutoModelForCausalLM.from_pretrained(
92
  "kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4",
@@ -96,77 +68,40 @@ model = AutoModelForCausalLM.from_pretrained(
96
  tokenizer = AutoTokenizer.from_pretrained(
97
  "kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4"
98
  )
99
-
100
- messages = [{"role": "user", "content": "Write a Python async web scraper with rate limiting."}]
101
- text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
102
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
103
-
104
- with torch.no_grad():
105
- output = model.generate(**inputs, max_new_tokens=2048, temperature=0.7)
106
-
107
- print(tokenizer.decode(output[0], skip_special_tokens=True))
108
- ```
109
-
110
- ### OpenAI-Compatible API (via vLLM)
111
-
112
- ```python
113
- from openai import OpenAI
114
-
115
- client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
116
- response = client.chat.completions.create(
117
- model="kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4",
118
- messages=[{"role": "user", "content": "Implement a B-tree in Rust."}],
119
- max_tokens=4096,
120
- )
121
- print(response.choices[0].message.content)
122
  ```
123
 
124
  ## Hardware Requirements
125
 
126
- | Configuration | Status |
127
- |--------------|--------|
128
- | NVIDIA DGX Spark (GB10, 128 GB UMA) | ✅ Tested — primary target |
129
- | NVIDIA GB200 / B200 (Blackwell HBM) | ✅ Should work |
130
- | NVIDIA H100 / A100 (80 GB HBM) | ⚠️ Tight — 57 GB model + KV cache may exceed 80 GB at long contexts |
131
- | NVIDIA L40S (48 GB) | ❌ Insufficient VRAM |
132
 
133
  ## Provenance
134
 
135
- This model was quantized using [spark-maker](https://github.com/kleinpanic), a quantization and fine-tuning toolkit for the NVIDIA DGX Spark platform.
136
-
137
  ```json
138
  {
139
  "source_model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
140
  "quantization": "NVFP4",
141
  "tool": "nvidia-modelopt 0.41.0",
142
  "export_method": "save_pretrained_manual",
143
- "calibration_samples": 512,
144
- "calibration_dataset": "synthetic-random",
145
- "hardware": "NVIDIA DGX Spark GB10 (Blackwell)",
146
- "quantization_time_seconds": 472
147
  }
148
  ```
149
 
150
  ## Limitations
151
 
152
- - **Synthetic calibration data**: Quantized with random token sequences because the container runs in offline mode. Real calibration data (C4, RedPajama, The Stack) would improve quantization quality, particularly for code-heavy workloads. Re-quantization with domain-specific calibration data is recommended for production use.
153
- - **Export path**: Uses `save_pretrained` serialization rather than ModelOpt's native checkpoint exporter. Functionally equivalent, but compatibility with all NVFP4-aware inference backends should be verified.
154
- - **MoE routing preserved**: All 48 gate/router layers remain in original BF16 precision by design — quantizing these would degrade expert selection quality.
155
 
156
- ## Citation
157
 
158
- ```bibtex
159
- @misc{kleinpanic2026qwen3codernvfp4,
160
- title={Qwen3-Coder-30B-A3B-Instruct-NVFP4},
161
- author={kleinpanic},
162
- year={2026},
163
- url={https://huggingface.co/kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4},
164
- note={NVFP4 quantization via NVIDIA ModelOpt on DGX Spark GB10}
165
- }
166
- ```
167
 
168
  ## Acknowledgments
169
 
170
- - **[Qwen Team](https://huggingface.co/Qwen)** Base model architecture and training
171
- - **[NVIDIA ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer)** Quantization toolkit
172
- - **[NVIDIA DGX Spark](https://www.nvidia.com/en-us/products/workstations/dgx-spark/)** — Hardware platform
 
5
  - qwen3
6
  - moe
7
  - nvfp4
 
8
  - quantized
9
  - nvidia-modelopt
 
 
10
  - coding
11
+ - dgx-spark
 
12
  model_type: qwen3_moe
13
  quantized_by: kleinpanic93
14
  pipeline_tag: text-generation
15
  library_name: transformers
 
16
  ---
17
 
18
+ # Qwen3-Coder-30B-A3B-Instruct-NVFP4
 
19
 
20
+ NVFP4 (4-bit floating point) quantization of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct), optimized for NVIDIA Blackwell GPUs.
 
21
 
22
+ ## Model Details
23
 
24
+ | Property | Value |
25
+ |----------|-------|
26
+ | **Base Model** | Qwen/Qwen3-Coder-30B-A3B-Instruct |
27
+ | **Architecture** | Qwen3MoeForCausalLM (Mixture-of-Experts) |
28
+ | **Total Parameters** | 30B (3B active per token) |
29
+ | **Experts** | 128 per layer |
30
+ | **Quantization** | NVFP4 (4-bit NV floating point) |
31
+ | **KV Cache** | FP8 (8-bit float) |
32
+ | **Original Precision** | BF16 |
33
+ | **Quantized Size** | ~57 GB |
34
+ | **Quantization Tool** | NVIDIA ModelOpt 0.41.0 |
35
+ | **Calibration** | 512 samples (synthetic) |
36
+ | **Hardware** | NVIDIA DGX Spark GB10 (Blackwell) |
 
37
 
38
  ## Quantization Details
39
 
40
+ - **Method:** Post-training quantization via `nvidia-modelopt` with `NVFP4_DEFAULT_CFG`
41
+ - **Weights:** 4-bit NV floating point, group size 16
42
+ - **Activations:** 4-bit NV floating point, group size 16
43
+ - **KV Cache:** FP8 quantized for reduced memory during inference
44
+ - **Excluded layers:** `lm_head` and all MoE router/gate layers (48 total) — these remain in original precision to preserve routing quality
45
+ - **Export method:** HuggingFace `save_pretrained` with manual `quantization_config` injection (ModelOpt 0.41.0 native export does not yet support `Qwen3MoeExperts`)
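The block scheme in these bullets can be sketched in a few lines. This is a toy illustration only, assuming the published NVFP4 layout (FP4 E2M1 magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6} with one shared scale per 16-element group); it is not ModelOpt's actual kernel.

```python
# Toy NVFP4-style block quantization: each group of 16 values shares
# one scale chosen so the group's max magnitude maps to 6 (the largest
# FP4 E2M1 value), then every element rounds to the nearest FP4 point.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize one group of 16 floats to FP4 magnitudes + a shared scale."""
    assert len(block) == 16
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0
    quantized = []
    for x in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        quantized.append(mag if x >= 0 else -mag)
    return quantized, scale

def dequantize_block(quantized, scale):
    return [q * scale for q in quantized]

block = [0.1 * i for i in range(16)]   # 0.0 .. 1.5
q, s = quantize_block(block)
restored = dequantize_block(q, s)
```

Deployed NVFP4 additionally stores the group scales themselves in FP8 with a per-tensor scale; the sketch keeps plain floats to show only the rounding behaviour.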
 
 
47
  ## Usage
48
 
49
+ ### With vLLM (Recommended)
50
 
51
  ```bash
52
  vllm serve kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4 \
53
  --quantization modelopt \
54
  --trust-remote-code \
55
+ --max-model-len 32768
 
56
  ```
57
 
58
+ ### With Transformers
59
 
60
  ```python
61
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
62
 
63
  model = AutoModelForCausalLM.from_pretrained(
64
  "kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4",
 
68
  tokenizer = AutoTokenizer.from_pretrained(
69
  "kleinpanic93/Qwen3-Coder-30B-A3B-Instruct-NVFP4"
70
  )

messages = [{"role": "user", "content": "Write a Python async web scraper with rate limiting."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=2048, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
71
  ```
72
 
73
  ## Hardware Requirements
74
 
75
+ - **Minimum VRAM:** ~57 GB for the weights (unified or dedicated memory), plus KV-cache headroom at long contexts
76
+ - **Tested on:** NVIDIA DGX Spark (GB10, 128 GB unified memory)
77
+ - **Recommended:** NVIDIA Blackwell GPUs (GB10, GB200, B200)
 
 
 
78
 
79
  ## Provenance
80
 
 
 
81
  ```json
82
  {
83
  "source_model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
84
  "quantization": "NVFP4",
85
  "tool": "nvidia-modelopt 0.41.0",
86
  "export_method": "save_pretrained_manual",
87
+ "calib_size": 512,
88
+ "calib_dataset": "synthetic-random",
89
+ "hardware": "NVIDIA GB10 (Blackwell)",
90
+ "elapsed_sec": 472
91
  }
92
  ```
93
 
94
  ## Limitations
95
 
96
+ - This quantization uses **synthetic calibration data** (random tokens) because the container runs in offline mode. Production-grade quantization with real calibration data (e.g., C4, RedPajama) may yield slightly better quality.
97
+ - The export uses the `save_pretrained` fallback rather than ModelOpt's native HF checkpoint exporter, since `Qwen3MoeExperts` is not yet in ModelOpt 0.41.0's export allowlist. The quantization math is identical; only the serialization path differs.
98
+ - MoE gate/router layers are preserved in original precision by design.
99
 
100
+ ## License
101
 
102
+ This model inherits the [Apache 2.0 license](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/LICENSE) from the base Qwen3-Coder-30B-A3B-Instruct model.
 
103
 
104
  ## Acknowledgments
105
 
106
+ - [Qwen Team](https://huggingface.co/Qwen) for the base model
107
+ - [NVIDIA](https://github.com/NVIDIA/TensorRT-Model-Optimizer) for the ModelOpt quantization toolkit
 
chat_template.jinja ADDED
@@ -0,0 +1,117 @@
1
+ {% macro render_extra_keys(json_dict, handled_keys) %}
2
+ {%- if json_dict is mapping %}
3
+ {%- for json_key in json_dict if json_key not in handled_keys %}
4
+ {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) %}
5
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
6
+ {%- else %}
7
+ {{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
8
+ {%- endif %}
9
+ {%- endfor %}
10
+ {%- endif %}
11
+ {% endmacro %}
12
+
13
+ {%- if messages[0]["role"] == "system" %}
14
+ {%- set system_message = messages[0]["content"] %}
15
+ {%- set loop_messages = messages[1:] %}
16
+ {%- else %}
17
+ {%- set loop_messages = messages %}
18
+ {%- endif %}
19
+
20
+ {%- if not tools is defined %}
21
+ {%- set tools = [] %}
22
+ {%- endif %}
23
+
24
+ {%- if system_message is defined %}
25
+ {{- "<|im_start|>system\n" + system_message }}
26
+ {%- else %}
27
+ {%- if tools is iterable and tools | length > 0 %}
28
+ {{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
29
+ {%- endif %}
30
+ {%- endif %}
31
+ {%- if tools is iterable and tools | length > 0 %}
32
+ {{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
33
+ {{- "<tools>" }}
34
+ {%- for tool in tools %}
35
+ {%- if tool.function is defined %}
36
+ {%- set tool = tool.function %}
37
+ {%- endif %}
38
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
39
+ {%- if tool.description is defined %}
40
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
41
+ {%- endif %}
42
+ {{- '\n<parameters>' }}
43
+ {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
44
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
45
+ {{- '\n<parameter>' }}
46
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
47
+ {%- if param_fields.type is defined %}
48
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
49
+ {%- endif %}
50
+ {%- if param_fields.description is defined %}
51
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
52
+ {%- endif %}
53
+ {%- set handled_keys = ['name', 'type', 'description'] %}
54
+ {{- render_extra_keys(param_fields, handled_keys) }}
55
+ {{- '\n</parameter>' }}
56
+ {%- endfor %}
57
+ {%- endif %}
58
+ {% set handled_keys = ['type', 'properties'] %}
59
+ {{- render_extra_keys(tool.parameters, handled_keys) }}
60
+ {{- '\n</parameters>' }}
61
+ {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
62
+ {{- render_extra_keys(tool, handled_keys) }}
63
+ {{- '\n</function>' }}
64
+ {%- endfor %}
65
+ {{- "\n</tools>" }}
66
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
67
+ {%- endif %}
68
+ {%- if system_message is defined %}
69
+ {{- '<|im_end|>\n' }}
70
+ {%- else %}
71
+ {%- if tools is iterable and tools | length > 0 %}
72
+ {{- '<|im_end|>\n' }}
73
+ {%- endif %}
74
+ {%- endif %}
75
+ {%- for message in loop_messages %}
76
+ {%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
77
+ {{- '<|im_start|>' + message.role }}
78
+ {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
79
+ {{- '\n' + message.content | trim + '\n' }}
80
+ {%- endif %}
81
+ {%- for tool_call in message.tool_calls %}
82
+ {%- if tool_call.function is defined %}
83
+ {%- set tool_call = tool_call.function %}
84
+ {%- endif %}
85
+ {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
86
+ {%- if tool_call.arguments is defined %}
87
+ {%- for args_name, args_value in tool_call.arguments|items %}
88
+ {{- '<parameter=' + args_name + '>\n' }}
89
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
90
+ {{- args_value }}
91
+ {{- '\n</parameter>\n' }}
92
+ {%- endfor %}
93
+ {%- endif %}
94
+ {{- '</function>\n</tool_call>' }}
95
+ {%- endfor %}
96
+ {{- '<|im_end|>\n' }}
97
+ {%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
98
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
99
+ {%- elif message.role == "tool" %}
100
+ {%- if loop.previtem and loop.previtem.role != "tool" %}
101
+ {{- '<|im_start|>user\n' }}
102
+ {%- endif %}
103
+ {{- '<tool_response>\n' }}
104
+ {{- message.content }}
105
+ {{- '\n</tool_response>\n' }}
106
+ {%- if not loop.last and loop.nextitem.role != "tool" %}
107
+ {{- '<|im_end|>\n' }}
108
+ {%- elif loop.last %}
109
+ {{- '<|im_end|>\n' }}
110
+ {%- endif %}
111
+ {%- else %}
112
+ {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
113
+ {%- endif %}
114
+ {%- endfor %}
115
+ {%- if add_generation_prompt %}
116
+ {{- '<|im_start|>assistant\n' }}
117
+ {%- endif %}
config.json ADDED
@@ -0,0 +1,124 @@
1
+ {
2
+ "architectures": [
3
+ "Qwen3MoeForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 151643,
8
+ "decoder_sparse_step": 1,
9
+ "dtype": "bfloat16",
10
+ "eos_token_id": 151645,
11
+ "head_dim": 128,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 2048,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 6144,
16
+ "max_position_embeddings": 262144,
17
+ "max_window_layers": 48,
18
+ "mlp_only_layers": [],
19
+ "model_type": "qwen3_moe",
20
+ "moe_intermediate_size": 768,
21
+ "norm_topk_prob": true,
22
+ "num_attention_heads": 32,
23
+ "num_experts_per_tok": 8,
24
+ "num_hidden_layers": 48,
25
+ "num_key_value_heads": 4,
26
+ "num_local_experts": 128,
27
+ "output_router_logits": false,
28
+ "pad_token_id": null,
29
+ "rms_norm_eps": 1e-06,
30
+ "rope_parameters": {
31
+ "rope_theta": 10000000,
32
+ "rope_type": "default"
33
+ },
34
+ "router_aux_loss_coef": 0.001,
35
+ "sliding_window": null,
36
+ "tie_word_embeddings": false,
37
+ "transformers_version": "5.2.0",
38
+ "use_cache": true,
39
+ "use_sliding_window": false,
40
+ "vocab_size": 151936,
41
+ "quantization_config": {
42
+ "config_groups": {
43
+ "group_0": {
44
+ "input_activations": {
45
+ "dynamic": false,
46
+ "num_bits": 4,
47
+ "type": "float",
48
+ "group_size": 16
49
+ },
50
+ "weights": {
51
+ "dynamic": false,
52
+ "num_bits": 4,
53
+ "type": "float",
54
+ "group_size": 16
55
+ },
56
+ "targets": [
57
+ "Linear"
58
+ ]
59
+ }
60
+ },
61
+ "ignore": [
62
+ "lm_head",
63
+ "model.layers.0.mlp.gate",
64
+ "model.layers.1.mlp.gate",
65
+ "model.layers.10.mlp.gate",
66
+ "model.layers.11.mlp.gate",
67
+ "model.layers.12.mlp.gate",
68
+ "model.layers.13.mlp.gate",
69
+ "model.layers.14.mlp.gate",
70
+ "model.layers.15.mlp.gate",
71
+ "model.layers.16.mlp.gate",
72
+ "model.layers.17.mlp.gate",
73
+ "model.layers.18.mlp.gate",
74
+ "model.layers.19.mlp.gate",
75
+ "model.layers.2.mlp.gate",
76
+ "model.layers.20.mlp.gate",
77
+ "model.layers.21.mlp.gate",
78
+ "model.layers.22.mlp.gate",
79
+ "model.layers.23.mlp.gate",
80
+ "model.layers.24.mlp.gate",
81
+ "model.layers.25.mlp.gate",
82
+ "model.layers.26.mlp.gate",
83
+ "model.layers.27.mlp.gate",
84
+ "model.layers.28.mlp.gate",
85
+ "model.layers.29.mlp.gate",
86
+ "model.layers.3.mlp.gate",
87
+ "model.layers.30.mlp.gate",
88
+ "model.layers.31.mlp.gate",
89
+ "model.layers.32.mlp.gate",
90
+ "model.layers.33.mlp.gate",
91
+ "model.layers.34.mlp.gate",
92
+ "model.layers.35.mlp.gate",
93
+ "model.layers.36.mlp.gate",
94
+ "model.layers.37.mlp.gate",
95
+ "model.layers.38.mlp.gate",
96
+ "model.layers.39.mlp.gate",
97
+ "model.layers.4.mlp.gate",
98
+ "model.layers.40.mlp.gate",
99
+ "model.layers.41.mlp.gate",
100
+ "model.layers.42.mlp.gate",
101
+ "model.layers.43.mlp.gate",
102
+ "model.layers.44.mlp.gate",
103
+ "model.layers.45.mlp.gate",
104
+ "model.layers.46.mlp.gate",
105
+ "model.layers.47.mlp.gate",
106
+ "model.layers.5.mlp.gate",
107
+ "model.layers.6.mlp.gate",
108
+ "model.layers.7.mlp.gate",
109
+ "model.layers.8.mlp.gate",
110
+ "model.layers.9.mlp.gate"
111
+ ],
112
+ "quant_algo": "NVFP4",
113
+ "kv_cache_scheme": {
114
+ "dynamic": false,
115
+ "num_bits": 8,
116
+ "type": "float"
117
+ },
118
+ "producer": {
119
+ "name": "modelopt",
120
+ "version": "spark-maker-v3"
121
+ },
122
+ "quant_method": "modelopt"
123
+ }
124
+ }
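The `ignore` list in the `quantization_config` above follows a mechanical pattern: `lm_head` plus the `mlp.gate` router module of each of the 48 decoder layers. A short sketch reconstructs it:

```python
# Rebuild config.json's "ignore" list: lm_head plus the MoE router
# ("mlp.gate") of every decoder layer.
num_hidden_layers = 48  # "num_hidden_layers" in config.json

ignore = ["lm_head"] + [
    f"model.layers.{i}.mlp.gate" for i in range(num_hidden_layers)
]
print(len(ignore))  # 49 modules left in original precision
```

config.json lists the same 49 entries in lexicographic order; the set of excluded modules is identical.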
generation_config.json ADDED
@@ -0,0 +1,13 @@
1
+ {
2
+ "do_sample": true,
3
+ "eos_token_id": [
4
+ 151645,
5
+ 151643
6
+ ],
7
+ "pad_token_id": 151643,
8
+ "repetition_penalty": 1.05,
9
+ "temperature": 0.7,
10
+ "top_k": 20,
11
+ "top_p": 0.8,
12
+ "transformers_version": "5.2.0"
13
+ }
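The sampling defaults above interact: logits are divided by `temperature` (0.7), the `top_k` (20) most probable tokens are kept, and then only the smallest nucleus reaching `top_p` (0.8) cumulative probability survives. A standalone sketch of that filtering (illustrative only, not the transformers implementation):

```python
import math

def filter_logits(logits, top_k=20, top_p=0.8, temperature=0.7):
    """Return the token ids that survive top-k then top-p (nucleus) filtering."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                  # stable softmax
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # top-k: keep the k most probable token ids
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # top-p: keep the smallest prefix whose cumulative probability >= top_p
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

# A toy 6-token vocabulary with one dominant logit: the nucleus
# collapses to a single candidate.
print(filter_logits([5.0, 1.0, 1.0, 0.5, 0.2, 0.1]))
```

A sharply peaked distribution collapses to one surviving candidate, while a flat one keeps many; that is the intended effect of the 0.8 nucleus cutoff.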
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0a18eec4121f8a8b2995c10d70a143156a44c0ef3e8492278c00a4cfd83faa2c
3
+ size 4983536328
model-00002-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:12bc6f12aee5c3d3e687e08a2a1db718a65a9748bf453d7a0c6e7504faafd9e3
3
+ size 4985162232
model-00003-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9952f65e81af77e1a32ae9c7b6b3bf2636c655177d66a8b9826ad90aace63375
3
+ size 4985162624
model-00004-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5d5582d3e191ef8b7218f13096f758d5d5f648471564556e7bc2cb1bd9522bf2
3
+ size 4985163840
model-00005-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:90ce43163ec9f20758fc580c1341c4c33fd4881f5727e28ce44e8687087abb48
3
+ size 4985163840
model-00006-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b3868292be9a82e9981bf9f135af80393bf266eecfc02aadc95ad61481053b94
3
+ size 4985163840
model-00007-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:37f09a9120da23b618182c14559a9938efeca6cf7e30f7005b362aed7de32a50
3
+ size 4985163840
model-00008-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2a86027d8cf9cca275245be7a50ac9737af34d393fe88706fb423b3bc0cc78a7
3
+ size 4985163840
model-00009-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f36d603c3c2331ad6734af98dae4acf0482a440e37dd46f20ebcd46c8b35b822
3
+ size 4985163840
model-00010-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a20cbb84a72429236a85764af1c507f454cf867a32f3a24e2fb58ef4dfa3fff6
3
+ size 4985163840
model-00011-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eacf05ef40177d0a204c65bb1c2f2a05c33640fc10a12ee8e167663df1b67a6c
3
+ size 4985163840
model-00012-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dcdb40014518a76790006917c070a371280a6440dd2165911eac25b0377f2d4e
3
+ size 4985163840
model-00013-of-00013.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:64ae03adf7887f50a8358c85eba191f42005bf577faecccbd376dfff88d1e98f
3
+ size 1246290440
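The thirteen LFS pointers above determine the on-disk footprint; summing their `size` fields reproduces the ~57 GB figure quoted in the model card:

```python
# Shard sizes (bytes) copied from the safetensors LFS pointers above.
shard_sizes = [
    4983536328, 4985162232, 4985162624,
    *[4985163840] * 9,   # shards 4-12 are identical in size
    1246290440,          # final shard
]

total = sum(shard_sizes)
print(f"{total} bytes = {total / 2**30:.1f} GiB")
```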
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
spark_quantizer_provenance.json ADDED
@@ -0,0 +1,11 @@
1
+ {
2
+ "source_model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
3
+ "quantization": "NVFP4",
4
+ "tool": "nvidia-modelopt",
5
+ "export_method": "save_pretrained_manual",
6
+ "calib_size": 512,
7
+ "calib_dataset": "synthetic-random",
8
+ "hardware": "NVIDIA GB10 (Blackwell)",
9
+ "offload_used": true,
10
+ "elapsed_sec": 472
11
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "backend": "tokenizers",
4
+ "bos_token": null,
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "<|im_end|>",
7
+ "errors": "replace",
8
+ "extra_special_tokens": [
9
+ "<|im_start|>",
10
+ "<|im_end|>",
11
+ "<|object_ref_start|>",
12
+ "<|object_ref_end|>",
13
+ "<|box_start|>",
14
+ "<|box_end|>",
15
+ "<|quad_start|>",
16
+ "<|quad_end|>",
17
+ "<|vision_start|>",
18
+ "<|vision_end|>",
19
+ "<|vision_pad|>",
20
+ "<|image_pad|>",
21
+ "<|video_pad|>"
22
+ ],
23
+ "is_local": false,
24
+ "model_max_length": 1048576,
25
+ "pad_token": "<|endoftext|>",
26
+ "split_special_tokens": false,
27
+ "tokenizer_class": "Qwen2Tokenizer",
28
+ "unk_token": null
29
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff