Instructions to use mlx-community/LongCat-Flash-Lite-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

MLX

How to use mlx-community/LongCat-Flash-Lite-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/LongCat-Flash-Lite-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Transformers

How to use mlx-community/LongCat-Flash-Lite-4bit with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mlx-community/LongCat-Flash-Lite-4bit", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mlx-community/LongCat-Flash-Lite-4bit", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use mlx-community/LongCat-Flash-Lite-4bit with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mlx-community/LongCat-Flash-Lite-4bit"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlx-community/LongCat-Flash-Lite-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/mlx-community/LongCat-Flash-Lite-4bit

SGLang

How to use mlx-community/LongCat-Flash-Lite-4bit with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mlx-community/LongCat-Flash-Lite-4bit" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlx-community/LongCat-Flash-Lite-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mlx-community/LongCat-Flash-Lite-4bit" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mlx-community/LongCat-Flash-Lite-4bit",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Pi

How to use mlx-community/LongCat-Flash-Lite-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/LongCat-Flash-Lite-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "mlx-community/LongCat-Flash-Lite-4bit"
        }
      ]
    }
  }
}
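If `~/.pi/agent/models.json` already lists other providers, the entry can be merged in rather than pasted over the file. The following is a minimal sketch, not part of Pi itself: the helper name `add_provider` is hypothetical, and the field names are copied from the JSON above.

```python
import json
from pathlib import Path


def add_provider(models_path, provider, base_url, model_id):
    """Merge one provider entry into a Pi-style models.json (hypothetical helper)."""
    path = Path(models_path).expanduser()
    # Start from the existing file if there is one, so other providers survive.
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", {})
    providers[provider] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (writes to the path given in the instructions above):
# add_provider("~/.pi/agent/models.json", "mlx-lm",
#              "http://localhost:8080/v1",
#              "mlx-community/LongCat-Flash-Lite-4bit")
```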

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use mlx-community/LongCat-Flash-Lite-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/LongCat-Flash-Lite-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/LongCat-Flash-Lite-4bit

Run Hermes

hermes

MLX LM

How to use mlx-community/LongCat-Flash-Lite-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/LongCat-Flash-Lite-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "mlx-community/LongCat-Flash-Lite-4bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "mlx-community/LongCat-Flash-Lite-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
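The same request can be sent from Python using only the standard library. This is a rough sketch under two assumptions: the server listens on port 8080 (the port used in the Pi and Hermes configurations above; pass whichever port your server actually uses), and the response follows the usual OpenAI chat-completions shape. The function names are illustrative.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60.0):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the standard response layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]


# With the server running:
# print(chat("http://localhost:8080/v1",
#            "mlx-community/LongCat-Flash-Lite-4bit", "Hello"))
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent.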

Docker Model Runner
How to use mlx-community/LongCat-Flash-Lite-4bit with Docker Model Runner:
```
docker model run hf.co/mlx-community/LongCat-Flash-Lite-4bit
```

kernelpool committed on Jan 29

Commit ea039fc (verified) · 1 Parent(s): 1cd6799

Add files using upload-large-folder tool

Files changed (18)

README.md +37 -0
chat_template.jinja +81 -0
config.json +173 -0
configuration_longcat_ngram.py +216 -0
generation_config.json +7 -0
model-00001-of-00008.safetensors +3 -0
model-00002-of-00008.safetensors +3 -0
model-00003-of-00008.safetensors +3 -0
model-00004-of-00008.safetensors +3 -0
model-00005-of-00008.safetensors +3 -0
model-00006-of-00008.safetensors +3 -0
model-00007-of-00008.safetensors +3 -0
model-00008-of-00008.safetensors +3 -0
model.safetensors.index.json +1053 -0
modeling_longcat_ngram.py +338 -0
parse_model_response.py +236 -0
tokenizer.json +0 -0
tokenizer_config.json +14 -0

README.md ADDED

	@@ -0,0 +1,37 @@

+---
+license: mit
+library_name: mlx
+pipeline_tag: text-generation
+tags:
+- transformers
+- mlx
+base_model: meituan-longcat/LongCat-Flash-Lite
+---
+# mlx-community/LongCat-Flash-Lite-4bit
+This model [mlx-community/LongCat-Flash-Lite-4bit](https://huggingface.co/mlx-community/LongCat-Flash-Lite-4bit) was
+converted to MLX format from [meituan-longcat/LongCat-Flash-Lite](https://huggingface.co/meituan-longcat/LongCat-Flash-Lite)
+using mlx-lm version **0.30.5**.
+## Use with mlx
+```bash
+pip install mlx-lm
+```
+```python
+from mlx_lm import load, generate
+model, tokenizer = load("mlx-community/LongCat-Flash-Lite-4bit")
+prompt = "hello"
+if tokenizer.chat_template is not None:
+    messages = [{"role": "user", "content": prompt}]
+    prompt = tokenizer.apply_chat_template(
+        messages, add_generation_prompt=True, return_dict=False,
+    )
+response = generate(model, tokenizer, prompt=prompt, verbose=True)
+```

chat_template.jinja ADDED

	@@ -0,0 +1,81 @@

+{%- set tool_choice = tool_choice | default('auto') %}
+{%- set ns = namespace(tool_types = [], last_query_index = -1) %}
+{%- if tools and tool_choice != 'none' %}
+    {{- "<longcat_tool_declare>\n"-}}
+    {{- "# Tools\n" }}
+    {{- "You have access to the following tools:\n\n" }}
+    {%- for tool in tools %}
+        {%- if tool.type not in ns.tool_types %}
+            {%- set ns.tool_types = ns.tool_types + [tool.type] %}
+            {{- "## Tool namespace: " ~ tool.type ~ "\n\n" }}
+        {%- endif %}
+        {%- if tool.type == 'code_interpreter' %}
+            {%- set tool = {"type":"code_interpreter","function":{"name":"code_interpreter_preview","description":"The code will be executed in a stateful Jupyter notebook sandbox environment, only supports local computation, data processing, and file operations.\nCode sandbox environment (network isolated) Any external network requests or online API calls are prohibited.\nIf online functionality is needed, please use other permitted tools.\nCode will respond with the output of the execution or time out after 60.0 seconds. ","parameters":{"type":"object","properties":{"language":{"type":"string","description":"The programming language of the code to be executed. Available values: python (Default), java, go, js, ts, c, c++."},"code":{"type":"string","description":"Python code to be executed must not include the following:\n- Importing network libraries such as requests, httplib, etc.\n- Any form of HTTP requests.\n- External API calls.\n- Network port operations. Example: ```python\nimport pandas as pd\npd.DataFrame({'A':[1,2]})\n```"},"timeout":{"type":"number","description":"The maximum execution time of the code, in seconds. Default is 60.0."}}},"required":["code"]}} %}
+        {%- endif %}
+        {{- "### Tool name: " + tool.function.name + "\n" }}
+        {{- "Description: " + tool.function.description + "\n\n" }}
+        {{- "InputSchema: " + tool.function.parameters | tojson(ensure_ascii=False) + "\n\n" }}
+    {%- endfor %}
+    {{- '**Note**: For each function call, output the function name and arguments within the following XML format:\n<longcat_tool_call>{function-name}\n<longcat_arg_key>{arg-key-1}</longcat_arg_key>\n<longcat_arg_value>{arg-value-1}</longcat_arg_value>\n<longcat_arg_key>{arg-key-2}</longcat_arg_key>\n<longcat_arg_value>{arg-value-2}</longcat_arg_value>\n...\n</longcat_tool_call>\n' }}
+    {{- "</longcat_tool_declare>"-}}
+    {%- for idx in range(messages|length - 1) %}
+        {%- set msg = messages[idx] %}
+        {%- if msg.role == 'assistant' and not msg.tool_calls %}
+            {%- set ns.last_query_index = idx %}
+        {%- endif %}
+    {%- endfor%}
+{%- endif %}
+{%- for msg in messages %}
+    {%- if msg.role == "system" %}
+        {{- "<longcat_system>" + msg.content }}
+    {%- elif msg.role == "user" %}
+        {{- "<longcat_user>" }}
+        {%- if msg["files"] %}
+            {{- '<longcat_files>\n' ~ msg.files | tojson(indent=2) ~ '\n</longcat_files>' }}
+        {%- endif %}
+        {{- msg.content }}
+    {%- elif msg.role == "assistant" %}
+        {{- "<longcat_assistant>" }}
+        {%- if enable_thinking == true and msg.reasoning_content and ns.tool_types != [] and loop.index0 > ns.last_query_index %}
+            {{- "\n<longcat_think>\n" ~ msg.reasoning_content ~ "\n</longcat_think>\n" }}
+        {%- endif %}
+        {%- if msg.content%}
+            {{- msg.content }}
+        {%- endif %}
+        {%- if msg.tool_calls %}
+            {%- for tool_call in msg.tool_calls -%}
+                {{- "<longcat_tool_call>" ~ tool_call.function.name ~ "\n" -}}
+                {% set _args = tool_call.function.arguments %}
+                {% for k, v in _args.items() %}
+                    {{- "<longcat_arg_key>" ~ k ~ "</longcat_arg_key>\n" -}}
+                    {{- "<longcat_arg_value>" ~ (v if v is string else v | tojson(ensure_ascii=False)) ~ "</longcat_arg_value>\n" -}}
+                {% endfor %}
+                {{- "</longcat_tool_call>\n" }}
+            {%- endfor %}
+        {%- endif %}
+        {{- "</longcat_s>" -}}
+    {%- elif msg.role == "tool" %}
+        {%- if messages[loop.index0 - 1].role != "tool"%}
+            {{- "<longcat_user>" -}}
+        {%- endif %}
+        {{- "<longcat_tool_response>" ~ msg.content ~ "</longcat_tool_response>"-}}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {%- if enable_thinking == true %}
+        {{- " /think_on" }}
+        {%- if thinking_budget %}
+            {%- if thinking_budget < 1024 %}
+                {%- set thinking_budget = 1024 %}
+            {%- endif%}
+            {{- "\nthinking_budget: < " ~ thinking_budget ~ "."}}
+        {%- endif %}
+        {{- " <longcat_assistant><longcat_think>\n"}}
+    {%- elif enable_thinking == false %}
+        {{- " /think_off <longcat_assistant><longcat_think>\n\n</longcat_think>\n" }}
+    {%- else %}
+        {{- "<longcat_assistant>" }}
+    {%- endif %}
+{%- endif %}
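The template above asks the model to emit tool calls as a `<longcat_tool_call>` block containing the function name followed by `<longcat_arg_key>` / `<longcat_arg_value>` pairs. A minimal sketch of reading that format back is given below. It is not the `parse_model_response.py` shipped in this commit (whose contents are not shown here); the function name and the choice to JSON-decode values are assumptions made for illustration.

```python
import json
import re

# One tool call: everything between the opening and closing tags.
TOOL_CALL_RE = re.compile(
    r"<longcat_tool_call>(.*?)</longcat_tool_call>", re.DOTALL
)
# One key/value pair inside a tool call.
ARG_RE = re.compile(
    r"<longcat_arg_key>(.*?)</longcat_arg_key>\s*"
    r"<longcat_arg_value>(.*?)</longcat_arg_value>",
    re.DOTALL,
)


def parse_tool_calls(text):
    """Extract tool calls from model output (illustrative helper)."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        # The function name is the first line of the block.
        name, _, rest = body.partition("\n")
        arguments = {}
        for key, raw in ARG_RE.findall(rest):
            # The template JSON-encodes non-string values, so try JSON first
            # and fall back to the raw string. Note the ambiguity: a string
            # argument such as "3" would come back as the integer 3.
            try:
                arguments[key] = json.loads(raw)
            except json.JSONDecodeError:
                arguments[key] = raw
        calls.append({"name": name.strip(), "arguments": arguments})
    return calls


sample = (
    "<longcat_tool_call>get_weather\n"
    "<longcat_arg_key>city</longcat_arg_key>\n"
    "<longcat_arg_value>Paris</longcat_arg_value>\n"
    "<longcat_arg_key>days</longcat_arg_key>\n"
    "<longcat_arg_value>3</longcat_arg_value>\n"
    "</longcat_tool_call>\n"
)
print(parse_tool_calls(sample))
```

The tool name `get_weather` and its arguments are made up for the example. A production parser would likely consult each tool's declared schema to decide whether a value should stay a string.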

config.json ADDED

	@@ -0,0 +1,173 @@

+{
+    "architectures": [
+        "LongcatFlashNgramForCausalLM"
+    ],
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "auto_map": {
+        "AutoConfig": "configuration_longcat_ngram.LongcatFlashNgramConfig",
+        "AutoModel": "modeling_longcat_ngram.LongcatFlashNgramModel",
+        "AutoModelForCausalLM": "modeling_longcat_ngram.LongcatFlashNgramForCausalLM"
+    },
+    "bos_token_id": 1,
+    "emb_neighbor_num": 4,
+    "emb_split_num": 4,
+    "eos_token_id": 2,
+    "expert_ffn_hidden_size": 1024,
+    "ffn_hidden_size": 6144,
+    "hidden_size": 3072,
+    "kv_lora_rank": 512,
+    "max_position_embeddings": 327680,
+    "mla_scale_kv_lora": true,
+    "mla_scale_q_lora": true,
+    "model_type": "longcat_flash_ngram",
+    "moe_topk": 12,
+    "n_routed_experts": 256,
+    "ngram_vocab_size_ratio": 78,
+    "num_attention_heads": 32,
+    "num_layers": 14,
+    "q_lora_rank": 1536,
+    "qk_nope_head_dim": 128,
+    "qk_rope_head_dim": 64,
+    "quantization": {
+        "group_size": 64,
+        "bits": 4,
+        "mode": "affine",
+        "model.layers.0.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.1.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.2.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.3.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.4.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.5.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.6.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.7.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.8.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.9.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.10.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.11.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.12.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.13.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        }
+    },
+    "quantization_config": {
+        "group_size": 64,
+        "bits": 4,
+        "mode": "affine",
+        "model.layers.0.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.1.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.2.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.3.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.4.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.5.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.6.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.7.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.8.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.9.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.10.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.11.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.12.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        },
+        "model.layers.13.mlp.router.classifier": {
+            "group_size": 64,
+            "bits": 8
+        }
+    },
+    "rms_norm_eps": 1e-05,
+    "rope_scaling": {
+        "original_max_position_embeddings": 32768,
+        "rope_type": "yarn",
+        "factor": 10,
+        "beta_fast": 32,
+        "beta_slow": 1,
+        "mscale": 1,
+        "mscale_all_dim": 1
+    },
+    "rope_theta": 5000000.0,
+    "routed_scaling_factor": 6.0,
+    "torch_dtype": "bfloat16",
+    "transformers_version": "4.57.6",
+    "use_cache": true,
+    "v_head_dim": 128,
+    "vocab_size": 131072,
+    "zero_expert_num": 128,
+    "zero_expert_type": "identity"
+}
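Two of the values in this config can be cross-checked against each other. The sketch below copies the relevant fields by hand and assumes that the YaRN-extended context is the original context length times `factor`, and that the query/key head dimension is `qk_nope_head_dim + qk_rope_head_dim` (the rule stated in the configuration class docstring). The helper names are illustrative.

```python
# Values copied from the config.json shown above.
config = {
    "max_position_embeddings": 327680,
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
    "rope_scaling": {
        "original_max_position_embeddings": 32768,
        "rope_type": "yarn",
        "factor": 10,
    },
}


def yarn_extended_context(rope_scaling):
    """Original context length multiplied by the YaRN scaling factor."""
    return int(
        rope_scaling["original_max_position_embeddings"] * rope_scaling["factor"]
    )


def qk_head_dim(cfg):
    """Total query/key head size: non-positional part plus RoPE part."""
    return cfg["qk_nope_head_dim"] + cfg["qk_rope_head_dim"]


print(yarn_extended_context(config["rope_scaling"]))  # 327680
print(qk_head_dim(config))  # 192
```

The first result matches `max_position_embeddings`, which suggests the advertised context length comes from the 10x YaRN extension of a 32768-token base.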

configuration_longcat_ngram.py ADDED

	@@ -0,0 +1,216 @@

+from transformers.models.longcat_flash import LongcatFlashConfig
+class LongcatFlashNgramConfig(LongcatFlashConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`LongcatFlashNgramModel`]. It is used to instantiate
+    a LongCat Flash model with N-gram enhanced embeddings according to the specified arguments, defining the model architecture.
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+    Args:
+        vocab_size (`int`, *optional*, defaults to 131072):
+            Vocabulary size of the LongCat Flash model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`LongcatFlashNgramModel`]
+        hidden_size (`int`, *optional*, defaults to 6144):
+            Dimension of the hidden representations.
+        num_hidden_layers (`int`, *optional*, defaults to 56):
+            Number of hidden layers in the Transformer decoder.
+        num_layers (`int`, *optional*, defaults to 28):
+            Number of layers, each with 2 sublayers.
+        num_attention_heads (`int`, *optional*, defaults to 64):
+            Number of attention heads for each attention layer in the Transformer decoder.
+        num_key_value_heads (`int`, *optional*):
+            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
+            `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
+            converting from a multi-head checkpoint to a GQA checkpoint, each group key and value head should be
+            constructed by meanpooling all the original heads within that group. For more details checkout [this
+            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
+            `num_attention_heads`.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to 131072):
+            The maximum sequence length that this model might ever be used with. Typically set this to something large
+            just in case (e.g., 512 or 1024 or 2048).
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-05):
+            The epsilon value used by the RMS normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            Padding token id.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            Beginning of stream token id.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            End of stream token id.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether to tie input and output embeddings.
+        rope_theta (`float`, *optional*, defaults to 10000000.0):
+            The base period of the RoPE embeddings.
+        rope_scaling (`Dict`, *optional*):
+            Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
+            strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
+            `{"type": strategy name, "factor": scaling factor}`.
+        attention_bias (`bool`, *optional*, defaults to `False`):
+            Whether to use a bias in the query, key, value and output projection layers during self-attention.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+        ffn_hidden_size (`int`, *optional*, defaults to 12288):
+            Dimension of the MLP representations.
+        q_lora_rank (`int`, *optional*, defaults to 1536):
+            The rank of the query LoRA projection in MLA (Multi-head Latent Attention).
+        kv_lora_rank (`int`, *optional*, defaults to 512):
+            The rank of the key-value LoRA projection in MLA.
+        qk_nope_head_dim (`int`, *optional*, defaults to 128):
+            The dimension of the non-position encoding part of query/key heads.
+        qk_rope_head_dim (`int`, *optional*, defaults to 64):
+            The dimension of the RoPE part of query/key heads.
+        head_dim (`int`, *optional*, defaults to 64):
+            Standard dimension of qk heads, unused except for CI.
+        v_head_dim (`int`, *optional*, defaults to 128):
+            The dimension of value heads.
+        qk_head_dim (`int`, *optional*):
+            The total dimension of query/key heads. If not specified, set to `qk_nope_head_dim + qk_rope_head_dim`.
+        moe_topk (`int`, *optional*, defaults to 12):
+            Number of experts to route to for each token in the MoE layer.
+        n_routed_experts (`int`, *optional*, defaults to 512):
+            Number of routed experts in the MoE layer.
+        zero_expert_num (`int`, *optional*, defaults to 256):
+            Number of zero experts (identity function) to add to the expert pool.
+        expert_ffn_hidden_size (`int`, *optional*, defaults to 2048):
+            Hidden size of individual expert FFN layers.
+        routed_scaling_factor (`float`, *optional*, defaults to 6.0):
+            Scaling factor applied to the routing weights.
+        emb_neighbor_num (`int`, *optional*):
+            Maximum N-gram length for N-gram embeddings. This parameter determines the context window size for N-gram computation. Higher values capture
+            longer-range lexical patterns but increase memory usage.
+        emb_split_num (`int`, *optional*):
+            Number of hash functions (or splits) to use for N-gram embeddings. Multiple hash functions help improve the quality of N-gram representations.
+        ngram_vocab_size_ratio (`float`, *optional*):
+            Ratio multiplier for N-gram vocabulary size relative to the base vocabulary size. The N-gram vocabulary
+            size is calculated as `vocab_size * ngram_vocab_size_ratio`.
+    Example:
+    ```python
+    >>> from transformers import LongcatFlashNgramModel, LongcatFlashNgramConfig
+    >>> # Initializing a LongCat Flash N-gram style configuration
+    >>> configuration = LongcatFlashNgramConfig(
+    ...     emb_neighbor_num=3,
+    ...     emb_split_num=4,
+    ...     ngram_vocab_size_ratio=1.5
+    ... )
+    >>> # Initializing a model from the configuration
+    >>> model = LongcatFlashNgramModel(configuration)
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+    model_type = "longcat_flash_ngram"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    base_model_tp_plan = {
+        "layers.*.self_attn.*.q_b_proj": "colwise",
+        "layers.*.self_attn.*.kv_b_proj": "colwise",
+        "layers.*.self_attn.*.o_proj": "rowwise",
+        "layers.*.mlps.*.gate_proj": "colwise",
+        "layers.*.mlps.*.up_proj": "colwise",
+        "layers.*.mlps.*.down_proj": "rowwise",
+        "layers.*.mlp.experts.*.gate_proj": "colwise",
+        "layers.*.mlp.experts.*.up_proj": "colwise",
+        "layers.*.mlp.experts.*.down_proj": "rowwise",
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+    def __init__(
+        self,
+        vocab_size=131072,
+        hidden_size=6144,
+        num_hidden_layers=56,
+        num_layers=28,
+        num_attention_heads=64,
+        num_key_value_heads=None,
+        hidden_act="silu",
+        max_position_embeddings=131072,
+        initializer_range=0.02,
+        rms_norm_eps=1e-5,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        tie_word_embeddings=False,
+        rope_theta=10000000.0,
+        rope_scaling=None,
+        attention_bias=False,
+        attention_dropout=0.0,
+        ffn_hidden_size=12288,
+        q_lora_rank=1536,
+        kv_lora_rank=512,
+        qk_nope_head_dim=128,
+        qk_rope_head_dim=64,
+        head_dim=64,
+        v_head_dim=128,
+        qk_head_dim=None,
+        moe_topk=12,
+        n_routed_experts=512,
+        zero_expert_num=256,
+        expert_ffn_hidden_size=2048,
+        routed_scaling_factor=6.0,
+        emb_neighbor_num=None,
+        emb_split_num=None,
+        ngram_vocab_size_ratio=None,
+        **kwargs,
+    ):
+        # N-gram embedding specific parameters
+        self.emb_neighbor_num = emb_neighbor_num
+        self.emb_split_num = emb_split_num
+        self.ngram_vocab_size_ratio = ngram_vocab_size_ratio
+        super().__init__(
+            vocab_size=vocab_size,
+            hidden_size=hidden_size,
+            num_hidden_layers=num_hidden_layers,
+            num_layers=num_layers,
+            num_attention_heads=num_attention_heads,
+            num_key_value_heads=num_key_value_heads,
+            hidden_act=hidden_act,
+            max_position_embeddings=max_position_embeddings,
+            initializer_range=initializer_range,
+            rms_norm_eps=rms_norm_eps,
+            use_cache=use_cache,
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            rope_theta=rope_theta,
+            rope_scaling=rope_scaling,
+            attention_bias=attention_bias,
+            attention_dropout=attention_dropout,
+            ffn_hidden_size=ffn_hidden_size,
+            q_lora_rank=q_lora_rank,
+            kv_lora_rank=kv_lora_rank,
+            qk_nope_head_dim=qk_nope_head_dim,
+            qk_rope_head_dim=qk_rope_head_dim,
+            head_dim=head_dim,
+            v_head_dim=v_head_dim,
+            qk_head_dim=qk_head_dim,
+            moe_topk=moe_topk,
+            n_routed_experts=n_routed_experts,
+            zero_expert_num=zero_expert_num,
+            expert_ffn_hidden_size=expert_ffn_hidden_size,
+            routed_scaling_factor=routed_scaling_factor,
+            **kwargs,
+        )
+__all__ = ["LongcatFlashNgramConfig"]
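The docstring above gives the N-gram vocabulary size as `vocab_size * ngram_vocab_size_ratio`. As a small worked example, assuming that formula is applied as written (the modeling code, not shown here, may adjust it), the values from this commit's `config.json` give:

```python
from dataclasses import dataclass


@dataclass
class NgramSettings:
    """Stand-in for the three N-gram fields plus vocab_size (illustrative only)."""
    vocab_size: int
    emb_neighbor_num: int
    emb_split_num: int
    ngram_vocab_size_ratio: float

    def ngram_vocab_size(self) -> int:
        # Formula quoted from the configuration docstring.
        return int(self.vocab_size * self.ngram_vocab_size_ratio)


# Values taken from config.json in this commit.
settings = NgramSettings(
    vocab_size=131072,
    emb_neighbor_num=4,
    emb_split_num=4,
    ngram_vocab_size_ratio=78,
)
print(settings.ngram_vocab_size())  # 10223616
```

`NgramSettings` is a hypothetical stand-in so the arithmetic runs without downloading the model's remote code.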

generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 3,
+  "transformers_version": "4.55.0"
+}

model-00001-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:712e891edf8cdb7ce4fdae86a2fb62b5da3bbb9b25914e3a2cf1379941874ef5
+size 4643097317

model-00002-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d7ca2240289eb3285d1e79759d2be511fdf7829b6712fa8795f8e411cfde7d62
+size 4416607121

model-00003-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:583d349b2c08dad77e9e9b20edf853948d3cad33ef2e6e7a5fb036c18c8913f7
+size 4416609707

model-00004-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:72d24e8271a1ed4d1fd12cd153174e9f68f0261755f2bdc334ba1cbd82536109
+size 5327895389

model-00005-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3c14cd0aba0e1edffeafadd547ccf29fc0a07d09e2140c80c31e2ff8aa1d705
+size 5364815506

model-00006-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ccf365e928c643b1c5cd0c5c67d3b7d0ebd4200d1e9b75e220e5c1b0ecbc2d05
+size 5362036433

model-00007-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e743cb169a58afe6a44dda1dad6e8678b561ef8ed637f8fba61b3b51612fdc0a
+size 5341306407

model-00008-of-00008.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e78319eab47ad7d5e59ed0adf2aec8fdca53a7515e34bd31f20b8bbb86337a2b
+size 3702753648
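The eight Git LFS pointers above can be summed to estimate the download size, and combined with the `total_size` and `total_parameters` fields of `model.safetensors.index.json` from this same commit to estimate the effective bits per weight. This is a rough sketch: the 4.5-bit nominal figure assumes each group of 64 weights stores 4-bit values plus a 16-bit scale and a 16-bit bias, which is an assumption about the storage layout rather than something this commit states.

```python
# Sizes in bytes, copied from the eight LFS pointer files above.
shard_sizes = [
    4643097317, 4416607121, 4416609707, 5327895389,
    5364815506, 5362036433, 5341306407, 3702753648,
]

# Copied from the "metadata" block of model.safetensors.index.json.
INDEX_TOTAL_SIZE = 38574996736
INDEX_TOTAL_PARAMETERS = 68562459648

files_total = sum(shard_sizes)
# Bytes in the files that the index does not count as tensor data
# (presumably the per-file safetensors headers).
overhead = files_total - INDEX_TOTAL_SIZE

# Effective bits per parameter as stored on disk.
bits_per_weight = INDEX_TOTAL_SIZE * 8 / INDEX_TOTAL_PARAMETERS

# Nominal cost of 4-bit affine quantization with group size 64,
# assuming a 16-bit scale and a 16-bit bias per group.
nominal_bits = 4 + (16 + 16) / 64

print(f"download size: {files_total} bytes ({files_total / 2**30:.1f} GiB)")
print(f"non-tensor overhead: {overhead} bytes")
print(f"effective bits per weight: {bits_per_weight:.3f}")
print(f"nominal bits per weight: {nominal_bits}")
```

The effective figure comes out just above the nominal 4.5, which is consistent with a few tensors (the 8-bit router classifiers and the unquantized norms) being stored at higher precision.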

model.safetensors.index.json ADDED

	@@ -0,0 +1,1053 @@

+{
+    "metadata": {
+        "total_size": 38574996736,
+        "total_parameters": 68562459648
+    },
+    "weight_map": {
+        "lm_head.biases": "model-00008-of-00008.safetensors",
+        "lm_head.scales": "model-00008-of-00008.safetensors",
+        "lm_head.weight": "model-00008-of-00008.safetensors",
+        "model.layers.0.input_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.input_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.router.classifier.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.router.classifier.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.router.classifier.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.router.e_score_correction_bias": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.gate_proj.biases": "model-00004-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.gate_proj.scales": "model-00004-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.gate_proj.weight": "model-00004-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.up_proj.biases": "model-00004-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.up_proj.scales": "model-00004-of-00008.safetensors",
+        "model.layers.0.mlp.switch_mlp.up_proj.weight": "model-00004-of-00008.safetensors",
+        "model.layers.0.mlps.0.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.0.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.mlps.1.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.post_attention_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.post_attention_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.0.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.0.self_attn.1.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.input_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.input_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.router.classifier.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.router.classifier.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.router.classifier.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.router.e_score_correction_bias": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlp.switch_mlp.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.0.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.mlps.1.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.post_attention_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.post_attention_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.0.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.1.self_attn.1.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.10.input_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.input_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.router.classifier.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.router.classifier.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.router.classifier.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.router.e_score_correction_bias": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlp.switch_mlp.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.0.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.mlps.1.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.post_attention_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.post_attention_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_a_proj_with_mqa.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_a_proj_with_mqa.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_a_proj_with_mqa.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.kv_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.o_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.o_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.o_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_a_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_a_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_a_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.0.q_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_a_proj_with_mqa.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_a_proj_with_mqa.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_a_proj_with_mqa.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.kv_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.o_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.o_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.o_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_a_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_a_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_a_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.10.self_attn.1.q_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.11.input_layernorm.0.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.input_layernorm.1.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.router.classifier.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.router.classifier.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.router.classifier.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.router.e_score_correction_bias": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.11.mlp.switch_mlp.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.11.mlps.0.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.0.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.mlps.1.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.post_attention_layernorm.0.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.post_attention_layernorm.1.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_a_proj_with_mqa.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_a_proj_with_mqa.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_a_proj_with_mqa.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.kv_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.o_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.o_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.o_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_a_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_a_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_a_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.0.q_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_a_proj_with_mqa.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_a_proj_with_mqa.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_a_proj_with_mqa.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.kv_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.o_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.o_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.o_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_a_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_a_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_a_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.11.self_attn.1.q_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.input_layernorm.0.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.input_layernorm.1.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.router.classifier.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.router.classifier.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.router.classifier.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.router.e_score_correction_bias": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlp.switch_mlp.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.0.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.mlps.1.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.post_attention_layernorm.0.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.post_attention_layernorm.1.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_a_proj_with_mqa.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_a_proj_with_mqa.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_a_proj_with_mqa.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.kv_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.o_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.o_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.o_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_a_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_a_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_a_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.0.q_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_a_proj_with_mqa.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_a_proj_with_mqa.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_a_proj_with_mqa.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.kv_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.o_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.o_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.o_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_a_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_a_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_a_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.12.self_attn.1.q_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.input_layernorm.0.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.input_layernorm.1.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.router.classifier.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.router.classifier.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.router.classifier.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.router.e_score_correction_bias": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlp.switch_mlp.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.0.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.down_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.down_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.down_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.gate_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.gate_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.gate_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.up_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.up_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.mlps.1.up_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.post_attention_layernorm.0.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.post_attention_layernorm.1.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_a_proj_with_mqa.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_a_proj_with_mqa.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_a_proj_with_mqa.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.kv_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.o_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.o_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.o_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_a_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_a_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_a_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.0.q_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_a_proj_with_mqa.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_a_proj_with_mqa.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_a_proj_with_mqa.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.kv_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.o_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.o_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.o_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_a_layernorm.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_a_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_a_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_a_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_b_proj.biases": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_b_proj.scales": "model-00008-of-00008.safetensors",
+        "model.layers.13.self_attn.1.q_b_proj.weight": "model-00008-of-00008.safetensors",
+        "model.layers.2.input_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.input_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.router.classifier.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.router.classifier.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.router.classifier.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.router.e_score_correction_bias": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlp.switch_mlp.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.0.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.mlps.1.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.post_attention_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.post_attention_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.0.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.2.self_attn.1.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.input_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.input_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.router.classifier.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.router.classifier.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.router.classifier.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.router.e_score_correction_bias": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlp.switch_mlp.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.0.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.down_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.down_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.down_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.gate_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.up_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.up_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.mlps.1.up_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.post_attention_layernorm.0.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.post_attention_layernorm.1.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.0.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_a_proj_with_mqa.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_a_proj_with_mqa.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_a_proj_with_mqa.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.kv_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.o_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.o_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.o_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_a_layernorm.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_a_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_a_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_a_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_b_proj.biases": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_b_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.3.self_attn.1.q_b_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.4.input_layernorm.0.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.input_layernorm.1.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.router.classifier.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.router.classifier.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.router.classifier.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.router.e_score_correction_bias": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.gate_proj.scales": "model-00005-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.gate_proj.weight": "model-00005-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlp.switch_mlp.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.0.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.mlps.1.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.post_attention_layernorm.0.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.post_attention_layernorm.1.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.0.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.4.self_attn.1.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.input_layernorm.0.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.input_layernorm.1.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.router.classifier.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.router.classifier.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.router.classifier.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.router.e_score_correction_bias": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlp.switch_mlp.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.0.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.mlps.1.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.post_attention_layernorm.0.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.post_attention_layernorm.1.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.0.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.5.self_attn.1.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.input_layernorm.0.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.input_layernorm.1.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.router.classifier.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.router.classifier.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.router.classifier.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.router.e_score_correction_bias": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlp.switch_mlp.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.0.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.mlps.1.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.post_attention_layernorm.0.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.post_attention_layernorm.1.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.0.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.6.self_attn.1.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.input_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.input_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlp.router.classifier.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.router.classifier.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.router.classifier.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.router.e_score_correction_bias": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.down_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.down_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.down_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.up_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.up_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlp.switch_mlp.up_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlps.0.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.0.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.0.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.0.gate_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlps.0.gate_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlps.0.gate_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.mlps.0.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.0.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.0.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.7.mlps.1.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.post_attention_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.post_attention_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.0.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_a_proj_with_mqa.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_a_proj_with_mqa.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_a_proj_with_mqa.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.kv_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.o_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.o_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.o_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_a_layernorm.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_a_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_a_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_a_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_b_proj.biases": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_b_proj.scales": "model-00006-of-00008.safetensors",
+        "model.layers.7.self_attn.1.q_b_proj.weight": "model-00006-of-00008.safetensors",
+        "model.layers.8.input_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.input_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.router.classifier.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.router.classifier.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.router.classifier.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.router.e_score_correction_bias": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlp.switch_mlp.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.0.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.mlps.1.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.post_attention_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.post_attention_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_a_proj_with_mqa.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_a_proj_with_mqa.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_a_proj_with_mqa.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.kv_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.o_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.o_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.o_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_a_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_a_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_a_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.0.q_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_a_proj_with_mqa.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_a_proj_with_mqa.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_a_proj_with_mqa.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.kv_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.o_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.o_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.o_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_a_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_a_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_a_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.8.self_attn.1.q_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.input_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.input_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.router.classifier.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.router.classifier.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.router.classifier.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.router.e_score_correction_bias": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlp.switch_mlp.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.0.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.down_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.down_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.down_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.gate_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.gate_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.gate_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.up_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.up_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.mlps.1.up_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.post_attention_layernorm.0.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.post_attention_layernorm.1.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_a_proj_with_mqa.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_a_proj_with_mqa.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_a_proj_with_mqa.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.kv_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.o_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.o_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.o_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_a_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_a_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_a_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.0.q_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_a_proj_with_mqa.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_a_proj_with_mqa.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_a_proj_with_mqa.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.kv_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.o_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.o_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.o_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_a_layernorm.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_a_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_a_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_a_proj.weight": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_b_proj.biases": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_b_proj.scales": "model-00007-of-00008.safetensors",
+        "model.layers.9.self_attn.1.q_b_proj.weight": "model-00007-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.0.biases": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.0.scales": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.0.weight": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.1.biases": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.1.scales": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.1.weight": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.10.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.10.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.10.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.11.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.11.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.11.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.2.biases": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.2.scales": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.2.weight": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.3.biases": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.3.scales": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.3.weight": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.4.biases": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.4.scales": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.4.weight": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.5.biases": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.5.scales": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.5.weight": "model-00002-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.6.biases": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.6.scales": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.6.weight": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.7.biases": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.7.scales": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.7.weight": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.8.biases": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.8.scales": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.8.weight": "model-00003-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.9.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.9.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.embedders.9.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.0.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.0.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.0.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.1.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.1.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.1.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.10.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.10.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.10.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.11.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.11.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.11.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.2.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.2.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.2.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.3.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.3.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.3.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.4.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.4.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.4.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.5.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.5.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.5.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.6.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.6.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.6.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.7.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.7.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.7.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.8.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.8.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.8.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.9.biases": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.9.scales": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.post_projs.9.weight": "model-00004-of-00008.safetensors",
+        "model.ngram_embeddings.word_embeddings.biases": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.word_embeddings.scales": "model-00001-of-00008.safetensors",
+        "model.ngram_embeddings.word_embeddings.weight": "model-00001-of-00008.safetensors",
+        "model.norm.weight": "model-00008-of-00008.safetensors"
+    }
+}
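
Note on the index above: one layer's tensors can span two shards. For example, `model.layers.7.input_layernorm.0.weight` is in shard 7 while `model.layers.7.mlp.router.classifier.weight` is in shard 6, so a loader has to look up each tensor individually rather than assume one shard per layer. The following is a minimal sketch of inverting the map to see which tensors each shard holds. It assumes the usual Hugging Face layout in which the entries sit under a top-level `weight_map` key (that key is not visible in this part of the diff). The helper name `group_tensors_by_shard` is hypothetical, and the inlined excerpt stands in for the real index file.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_tensors_by_shard(index: Dict) -> Dict[str, List[str]]:
    """Invert a safetensors index: shard file name -> sorted tensor names.

    Assumes the index stores its entries under a "weight_map" key.
    """
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort shards and names so the result is deterministic.
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


# A small excerpt of the map above, inlined so the sketch runs on its own.
example_index = json.loads("""
{
    "weight_map": {
        "model.layers.7.input_layernorm.0.weight": "model-00007-of-00008.safetensors",
        "model.layers.7.mlp.router.classifier.weight": "model-00006-of-00008.safetensors",
        "model.layers.7.mlps.0.gate_proj.weight": "model-00006-of-00008.safetensors",
        "model.norm.weight": "model-00008-of-00008.safetensors"
    }
}
""")

by_shard = group_tensors_by_shard(example_index)
for shard, names in by_shard.items():
    print(shard, len(names))
```

To run this against the real checkpoint, replace the inlined excerpt with `json.load` on the downloaded index file.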

modeling_longcat_ngram.py ADDED

	@@ -0,0 +1,338 @@

+# -*- coding: utf-8 -*-
+# Copyright (c) 2025 Meituan
+# This code is licensed under the MIT License, for details, see the ./LICENSE file.
+from typing import Optional, Tuple, Dict, List
+import torch
+from torch import nn
+from transformers.cache_utils import Cache, DynamicCache
+from transformers.masking_utils import create_causal_mask
+from transformers.modeling_outputs import BaseModelOutputWithPast
+from transformers.processing_utils import Unpack
+from transformers.utils import auto_docstring, logging
+from transformers.models.longcat_flash.modeling_longcat_flash import (
+    LongcatFlashForCausalLM,
+    LongcatFlashModel,
+    LongcatFlashRMSNorm,
+    LongcatFlashRotaryEmbedding,
+    LongcatFlashDecoderLayer,
+    LongcatFlashPreTrainedModel,
+)
+from .configuration_longcat_ngram import LongcatFlashNgramConfig
+logger = logging.get_logger(__name__)
+@auto_docstring
+class LongcatFlashNgramPreTrainedModel(LongcatFlashPreTrainedModel):
+    pass
+class NgramCache(DynamicCache):
+    """
+    Extended DynamicCache for storing N-gram context alongside KV cache.
+    """
+    def __init__(self, config=None):
+        super().__init__()
+        self.ngram_context = None
+        # Keep only n-1 tokens (minimum needed for N-gram computation)
+        self.max_context_len = config.emb_neighbor_num - 1
+    def update_ngram_context(self, new_tokens: torch.Tensor) -> None:
+        """
+        Update N-gram context with window management.
+        Args:
+            new_tokens: New tokens to append, shape (batch_size, seq_len)
+        """
+        if self.ngram_context is None:
+            self.ngram_context = new_tokens.clone()
+        else:
+            self.ngram_context = torch.cat([self.ngram_context, new_tokens], dim=-1)
+        # Truncate to maintain constant memory footprint
+        if self.ngram_context.size(-1) > self.max_context_len:
+            self.ngram_context = self.ngram_context[..., -self.max_context_len:]
+    def reorder_cache(self, beam_idx: torch.LongTensor) -> "Cache":
+        """Reorder cache for beam search."""
+        # Reorder parent's KV cache
+        super().reorder_cache(beam_idx)
+        # Reorder N-gram context
+        if self.ngram_context is not None:
+            self.ngram_context = self.ngram_context.index_select(0, beam_idx.to(self.ngram_context.device))
+        return self
+class NgramEmbedding(nn.Module):
+    """
+    Computes embeddings enriched with N-gram features without maintaining internal state.
+    """
+    def __init__(self, config, base_embeddings):
+        super().__init__()
+        self.config = config
+        self.word_embeddings = base_embeddings
+        self.m = config.ngram_vocab_size_ratio * config.vocab_size
+        self.k = config.emb_split_num
+        self.n = config.emb_neighbor_num
+        self._init_ngram_embeddings()
+        self._vocab_mods_cache = None
+    def _init_ngram_embeddings(self) -> None:
+        """Initialize N-gram embedding and projection layers."""
+        num_embedders = self.k * (self.n - 1)
+        emb_dim = self.config.hidden_size // num_embedders
+        embedders = []
+        post_projs = []
+        for i in range(num_embedders):
+            vocab_size = int(self.m + i * 2 + 1)
+            emb = nn.Embedding(vocab_size, emb_dim, padding_idx=self.config.pad_token_id)
+            proj = nn.Linear(emb_dim, self.config.hidden_size, bias=False)
+            embedders.append(emb)
+            post_projs.append(proj)
+        self.embedders = nn.ModuleList(embedders)
+        self.post_projs = nn.ModuleList(post_projs)
+    def _shift_right_ignore_eos(self, tensor: torch.Tensor, n: int, eos_token_id: int = 2) -> torch.Tensor:
+        """Shift tensor right by n positions, resetting at EOS tokens."""
+        batch_size, seq_len = tensor.shape
+        result = torch.zeros_like(tensor)
+        eos_mask = (tensor == eos_token_id)
+        for i in range(batch_size):
+            eos_positions = eos_mask[i].nonzero(as_tuple=True)[0]
+            prev_idx = 0
+            for eos_idx in eos_positions:
+                end_idx = eos_idx.item() + 1
+                if end_idx - prev_idx > n:
+                    result[i, prev_idx+n:end_idx] = tensor[i, prev_idx:end_idx-n]
+                prev_idx = end_idx
+            if prev_idx < seq_len and seq_len - prev_idx > n:
+                result[i, prev_idx+n:seq_len] = tensor[i, prev_idx:seq_len-n]
+        return result
+    def _precompute_vocab_mods(self) -> Dict[Tuple[int, int], List[int]]:
+        """Precompute modular arithmetic values for vocabulary."""
+        if self._vocab_mods_cache is not None:
+            return self._vocab_mods_cache
+        vocab_mods = {}
+        vocab_size = self.config.vocab_size
+        for i in range(2, self.n + 1):
+            for j in range(self.k):
+                index = (i - 2) * self.k + j
+                emb_vocab_dim = int(self.m + index * 2 + 1)
+                mods = []
+                power_mod = 1
+                for _ in range(i - 1):
+                    power_mod = (power_mod * vocab_size) % emb_vocab_dim
+                    mods.append(power_mod)
+                vocab_mods[(i, j)] = mods
+        self._vocab_mods_cache = vocab_mods
+        return vocab_mods
+    def _get_ngram_ids(
+        self,
+        input_ids: torch.Tensor,
+        shifted_ids: Dict[int, torch.Tensor],
+        vocab_mods: List[int],
+        ngram: int
+    ) -> torch.Tensor:
+        """Compute N-gram hash IDs using polynomial rolling hash."""
+        ngram_ids = input_ids.clone()
+        for k in range(2, ngram + 1):
+            ngram_ids = ngram_ids + shifted_ids[k] * vocab_mods[k - 2]
+        return ngram_ids
+    def forward(
+        self,
+        input_ids: torch.Tensor,
+        ngram_context: Optional[torch.Tensor] = None
+    ) -> torch.Tensor:
+        """
+        Stateless forward pass.
+        Args:
+            input_ids: Current input token IDs of shape (batch_size, seq_len)
+            ngram_context: Optional historical context of shape (batch_size, context_len)
+        Returns:
+            Embedding tensor of shape (batch_size, seq_len, hidden_size)
+        """
+        seq_len = input_ids.size(-1)
+        # Determine complete context
+        if ngram_context is not None:
+            context = torch.cat([ngram_context[..., -(self.n-1):], input_ids], dim=-1)
+        else:
+            context = input_ids
+        # Base word embeddings
+        device = self.word_embeddings.weight.device
+        x = self.word_embeddings(input_ids.to(device)).clone()
+        # Precompute modular values
+        vocab_mods = self._precompute_vocab_mods()
+        # Compute shifted IDs
+        shifted_ids = {}
+        for i in range(2, self.n + 1):
+            shifted_ids[i] = self._shift_right_ignore_eos(
+                context, i - 1, eos_token_id=self.config.eos_token_id
+            )
+        # Add N-gram embeddings
+        for i in range(2, self.n + 1):
+            for j in range(self.k):
+                index = (i - 2) * self.k + j
+                emb_vocab_dim = int(self.m + index * 2 + 1)
+                ngram_ids = self._get_ngram_ids(context, shifted_ids, vocab_mods[(i, j)], ngram=i)
+                new_ids = (ngram_ids % emb_vocab_dim)[..., -seq_len:]
+                embedder_device = self.embedders[index].weight.device
+                x_ngram = self.embedders[index](new_ids.to(embedder_device))
+                proj_device = self.post_projs[index].weight.device
+                x_proj = self.post_projs[index](x_ngram.to(proj_device))
+                x = x + x_proj.to(x.device)
+        # Normalize
+        x = x / (1 + self.k * (self.n - 1))
+        return x
+class LongcatFlashNgramModel(LongcatFlashModel):
+    """LongcatFlash model with N-gram enhanced embeddings."""
+    _keys_to_ignore_on_load_unexpected = [r"model\.mtp.*"]
+    config_class = LongcatFlashNgramConfig
+    def __init__(self, config):
+        super().__init__(config)
+        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
+        self.ngram_embeddings = NgramEmbedding(config, self.embed_tokens)
+        self.layers = nn.ModuleList(
+            [LongcatFlashDecoderLayer(config, layer_idx) for layer_idx in range(config.num_layers)]
+        )
+        self.head_dim = config.head_dim
+        self.config.num_hidden_layers = 2 * config.num_layers
+        self.norm = LongcatFlashRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
+        self.rotary_emb = LongcatFlashRotaryEmbedding(config=config)
+        self.gradient_checkpointing = False
+        self.post_init()
+    def forward(
+        self,
+        input_ids: Optional[torch.LongTensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        position_ids: Optional[torch.LongTensor] = None,
+        past_key_values: Optional[Cache] = None,
+        inputs_embeds: Optional[torch.FloatTensor] = None,
+        cache_position: Optional[torch.LongTensor] = None,
+        use_cache: Optional[bool] = None,
+        **kwargs
+    ) -> BaseModelOutputWithPast:
+        if (input_ids is None) ^ (inputs_embeds is not None):
+            raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
+        # Extract N-gram context if available
+        ngram_context = None
+        if isinstance(past_key_values, NgramCache) and past_key_values.ngram_context is not None:
+            ngram_context = past_key_values.ngram_context
+        if inputs_embeds is None:
+            inputs_embeds = self.ngram_embeddings(input_ids, ngram_context=ngram_context)
+        # Initialize NgramCache if needed
+        if use_cache and past_key_values is None:
+            past_key_values = NgramCache(config=self.config)
+        # Update N-gram context (skipped when only inputs_embeds is given, since no token IDs exist)
+        if use_cache and input_ids is not None and isinstance(past_key_values, NgramCache):
+            past_key_values.update_ngram_context(input_ids)
+        # Prepare cache position
+        if cache_position is None:
+            past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
+            cache_position = torch.arange(
+                inputs_embeds.shape[1], device=inputs_embeds.device
+            ) + past_seen_tokens
+        if position_ids is None:
+            position_ids = cache_position.unsqueeze(0)
+        # Create causal mask
+        causal_mask = create_causal_mask(
+            config=self.config,
+            input_embeds=inputs_embeds,
+            attention_mask=attention_mask,
+            cache_position=cache_position,
+            past_key_values=past_key_values,
+            position_ids=position_ids,
+        )
+        # Forward through decoder layers
+        hidden_states = inputs_embeds
+        position_embeddings = self.rotary_emb(hidden_states, position_ids)
+        for decoder_layer in self.layers[: self.config.num_layers]:
+            hidden_states = decoder_layer(
+                hidden_states,
+                attention_mask=causal_mask,
+                position_ids=position_ids,
+                past_key_values=past_key_values,
+                cache_position=cache_position,
+                position_embeddings=position_embeddings,
+                **kwargs,
+            )
+        hidden_states = self.norm(hidden_states)
+        return BaseModelOutputWithPast(
+            last_hidden_state=hidden_states,
+            past_key_values=past_key_values,
+            hidden_states=None,
+            attentions=None,
+        )
+class LongcatFlashNgramForCausalLM(LongcatFlashForCausalLM):
+    """LongcatFlash model for causal language modeling with N-gram embeddings."""
+    _keys_to_ignore_on_load_unexpected = [r"model\.mtp.*"]
+    config_class = LongcatFlashNgramConfig
+    def __init__(self, config):
+        super().__init__(config)
+        self.model = LongcatFlashNgramModel(config)
+    @torch.no_grad()
+    def generate(self, inputs=None, generation_config=None, **kwargs):
+        """Override to ensure NgramCache is used."""
+        if "past_key_values" not in kwargs or kwargs["past_key_values"] is None:
+            kwargs["past_key_values"] = NgramCache(config=self.config)
+        return super().generate(inputs=inputs, generation_config=generation_config, **kwargs)
+__all__ = ["LongcatFlashNgramPreTrainedModel", "LongcatFlashNgramModel", "LongcatFlashNgramForCausalLM"]
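
The hashing performed by `_shift_right_ignore_eos` and `_get_ngram_ids` above can be sketched in plain Python without tensors. This is a minimal illustration, not the model's implementation: the helper names `shift_right_with_reset` and `ngram_hash_ids`, the toy vocabulary size, and the toy table size are all assumptions made for this example. In the diff, the table size for each embedder is `int(self.m + index * 2 + 1)`.

```python
from typing import List


def shift_right_with_reset(ids: List[int], n: int, eos_id: int) -> List[int]:
    """Shift ids right by n positions, restarting after every EOS token.

    Slots that would look past the start of a segment are filled with 0.
    """
    out = [0] * len(ids)
    start = 0
    for pos, tok in enumerate(ids):
        # A segment ends at each EOS (inclusive) or at the end of the sequence.
        if tok == eos_id or pos == len(ids) - 1:
            end = pos + 1
            for t in range(start + n, end):
                out[t] = ids[t - n]
            start = end
    return out


def ngram_hash_ids(
    ids: List[int], order: int, vocab_size: int, table_size: int, eos_id: int
) -> List[int]:
    """Hash each position together with its (order - 1) predecessors.

    hash[t] = (ids[t] + sum_d ids[t - d] * (vocab_size ** d % table_size)) % table_size
    """
    hashed = list(ids)
    power = 1
    for d in range(1, order):
        power = (power * vocab_size) % table_size
        shifted = shift_right_with_reset(ids, d, eos_id)
        hashed = [h + s * power for h, s in zip(hashed, shifted)]
    return [h % table_size for h in hashed]


# Toy values (assumptions): vocabulary of 10 tokens, EOS id 2, table of 101 slots.
tokens = [5, 7, 2, 9, 4]
bigram_ids = ngram_hash_ids(tokens, order=2, vocab_size=10, table_size=101, eos_id=2)
trigram_ids = ngram_hash_ids(tokens, order=3, vocab_size=10, table_size=101, eos_id=2)
print(bigram_ids)   # [5, 57, 72, 9, 94]
print(trigram_ids)  # [5, 57, 67, 9, 94]
```

The token after the EOS (`9`) hashes to itself in both cases because its predecessors belong to the previous segment and are zeroed out, which mirrors the reset behaviour described in the `_shift_right_ignore_eos` docstring.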

parse_model_response.py ADDED

	@@ -0,0 +1,236 @@

+import re
+import json
+import uuid
+def parse_arguments(json_value):
+    """
+    Attempt to parse a string as JSON
+    Args:
+        json_value: String to parse
+    Returns:
+        tuple: (parsed_value, is_valid_json)
+    """
+    try:
+        parsed_value = json.loads(json_value)
+        return parsed_value, True
+    except (ValueError, TypeError):
+        return json_value, False
+def get_argument_type(func_name: str, arg_key: str, defined_tools: list):
+    """
+    Get the type definition of a tool parameter
+    Args:
+        func_name: Name of the function/tool
+        arg_key: Parameter key name
+        defined_tools: List of tool definitions
+    Returns:
+        str or None: Type of the parameter ('string', 'object', 'array', 'integer', 'number', 'boolean')
+    """
+    name2tool = {tool["name"]: tool for tool in defined_tools}
+    if func_name not in name2tool:
+        return None
+    tool = name2tool[func_name]
+    if "parameters" not in tool or "properties" not in tool["parameters"]:
+        return None
+    if arg_key not in tool["parameters"]["properties"]:
+        return None
+    return tool["parameters"]["properties"][arg_key].get("type")
+def parse_model_response(response: str, defined_tools: list = None):
+    """
+    Parse model response to extract reasoning_content, content, and tool_calls
+    Args:
+        response: Raw response text from the model
+        defined_tools: Optional list of tool definitions (defaults to no tools)
+    Returns:
+        dict: Message containing role, reasoning_content (optional), content (optional),
+              and tool_calls (optional)
+    """
+    text = response
+    reasoning_content = None
+    content = None
+    tool_calls = []
+    formatted_tools = []
+    for tool in defined_tools or []:
+        if "function" in tool:
+            formatted_tools.append(tool['function'])
+        else:
+            formatted_tools.append(tool)
+    if '</longcat_think>' in text:
+        text = text.replace('<longcat_think>', '')
+        thinking_end = text.find('</longcat_think>')
+        reasoning_content = text[: thinking_end].strip()
+        text = text[thinking_end + len('</longcat_think>'):].lstrip()
+    assert '<longcat_think>' not in text, "Unclosed <longcat_think> tag found in remaining text"
+    assert '</longcat_think>' not in text, "Unexpected </longcat_think> tag found without opening tag"
+    if '<longcat_tool_call>' in text:
+        index = text.find('<longcat_tool_call>')
+        content = text[:index]
+        text = text[index:].strip()
+    else:
+        content = text
+        text = ""
+    open_tags = text.count('<longcat_tool_call>')
+    close_tags = text.count('</longcat_tool_call>')
+    assert open_tags == close_tags, \
+        f"Mismatched tool_call tags: {open_tags} opening tags, {close_tags} closing tags"
+    tool_call_strs = re.findall(
+        r'<longcat_tool_call>(.*?)</longcat_tool_call>',
+        text,
+        re.DOTALL
+    )
+    for call in tool_call_strs:
+        func_name_match = re.match(r'([^\n<]+)', call.strip())
+        assert func_name_match, f"Missing function name in tool call: {call[:100]}"
+        func_name = func_name_match.group(1).strip()
+        assert func_name, "Empty function name in tool call"
+        # Verify argument tags are properly paired
+        arg_key_count = call.count('<longcat_arg_key>')
+        arg_key_close_count = call.count('</longcat_arg_key>')
+        arg_value_count = call.count('<longcat_arg_value>')
+        arg_value_close_count = call.count('</longcat_arg_value>')
+        assert arg_key_count == arg_key_close_count, \
+            f"Mismatched arg_key tags in function {func_name}: {arg_key_count} opening, {arg_key_close_count} closing"
+        assert arg_value_count == arg_value_close_count, \
+            f"Mismatched arg_value tags in function {func_name}: {arg_value_count} opening, {arg_value_close_count} closing"
+        assert arg_key_count == arg_value_count, \
+            f"Mismatched arg_key and arg_value count in function {func_name}: {arg_key_count} keys, {arg_value_count} values"
+        pairs = re.findall(
+            r'<longcat_arg_key>(.*?)</longcat_arg_key>\s*<longcat_arg_value>(.*?)</longcat_arg_value>',
+            call,
+            re.DOTALL
+        )
+        assert len(pairs) == arg_key_count, \
+            f"Failed to parse all arguments in function {func_name}: expected {arg_key_count}, got {len(pairs)}"
+        arguments = {}
+        for arg_key, arg_value in pairs:
+            arg_key = arg_key.strip()
+            arg_value = arg_value.strip()
+            assert arg_key, f"Empty argument key in function {func_name}"
+            assert arg_key not in arguments, \
+                f"Duplicate argument key '{arg_key}' in function {func_name}"
+            arg_type = get_argument_type(func_name, arg_key, formatted_tools)
+            if arg_type and arg_type != 'string':
+                parsed_value, is_good_json = parse_arguments(arg_value)
+                arg_value = parsed_value
+            arguments[arg_key] = arg_value
+        tool_calls.append({
+            'id': "tool-call-" + str(uuid.uuid4()),
+            'type': "function",
+            'function': {
+                'name': func_name,
+                'arguments': arguments
+            }
+        })
+    message = {'role': 'assistant'}
+    if reasoning_content:
+        message['reasoning_content'] = reasoning_content
+    message['content'] = content
+    if tool_calls:
+        message['tool_calls'] = tool_calls
+    return message
+if __name__=="__main__":
+    from transformers import AutoModelForCausalLM, AutoTokenizer
+    model_name = "meituan-longcat/LongCat-Flash-Lite"
+    model = AutoModelForCausalLM.from_pretrained(
+        model_name,
+        dtype="auto",
+        device_map="auto",
+        trust_remote_code=True
+    )
+    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+    messages = [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Give me a brief introduction to large language models."}
+    ]
+    input_ids = tokenizer.apply_chat_template(
+        messages,
+        add_generation_prompt=True,
+        return_tensors="pt"
+    ).to(model.device)
+    generated_ids = model.generate(inputs=input_ids, max_new_tokens=256)
+    output_ids = generated_ids[0][len(input_ids[0]):].tolist()
+    response = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
+    print("Example 1: sample response.")
+    print("\nRaw response:")
+    print(response)
+    print("\nParsed result:")
+    parsed_message = parse_model_response(response)
+    print(json.dumps(parsed_message, indent=2, ensure_ascii=False))
+    tools = [
+        {
+            "type": "function",
+            "function": {
+                "name": "func_add",
+                "description": "Calculate the sum of two numbers",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "x1": {"type": "number", "description": "The first addend"},
+                        "x2": {"type": "number", "description": "The second addend"}
+                    },
+                    "required": ["x1", "x2"]
+                }
+            }
+        }
+    ]
+    messages = [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Please tell me what is $$125679 + 234519$$?"},
+        # {
+        #     "role": "assistant",
+        #     "content": "I'll calculate the sum of 125679 and 234519 for you.",
+        #     "tool_calls": [{"type": "function", "function": {"name": "func_add", "arguments": {"x1": 125679, "x2": 234519}}}]
+        # },
+        # {"role": "tool", "name": "func_add", "content": '{"ans": 360198}'}
+    ]
+    input_ids = tokenizer.apply_chat_template(
+        messages,
+        tools=tools,
+        add_generation_prompt=True,
+        return_tensors="pt"
+    ).to(model.device)
+    generated_ids = model.generate(inputs=input_ids, max_new_tokens=256)
+    output_ids = generated_ids[0][len(input_ids[0]):].tolist()
+    response = tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n")
+    print("Example 2: tool call response.")
+    print("\nRaw response:")
+    print(response)
+    print("\nParsed result:")
+    parsed_message = parse_model_response(response, tools)
+    print(json.dumps(parsed_message, indent=2, ensure_ascii=False))
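
The `__main__` block above needs the model weights to run. The tag layout that `parse_model_response` expects can also be exercised offline on a hand-written string. The sketch below is a simplified stand-in, not the function from the diff: `extract_tool_calls` and `SAMPLE` are hypothetical names, the sample text is constructed from the regular expressions in the file rather than captured from the model, and every argument value is passed through `json.loads` (the original decodes only arguments whose declared schema type is not `string`).

```python
import json
import re

# Hand-written sample (assumption): built to match the tags the parser looks for.
SAMPLE = (
    "<longcat_think>Need to add two numbers.</longcat_think>\n"
    "I'll call the adder.\n"
    "<longcat_tool_call>func_add\n"
    "<longcat_arg_key>x1</longcat_arg_key>\n"
    "<longcat_arg_value>125679</longcat_arg_value>\n"
    "<longcat_arg_key>x2</longcat_arg_key>\n"
    "<longcat_arg_value>234519</longcat_arg_value>\n"
    "</longcat_tool_call>"
)


def extract_tool_calls(text: str) -> list:
    """Return [{'name': ..., 'arguments': {...}}] for each tool-call block."""
    calls = []
    bodies = re.findall(
        r"<longcat_tool_call>(.*?)</longcat_tool_call>", text, re.DOTALL
    )
    for body in bodies:
        # The function name is the first line of the block, before any tag.
        name = re.match(r"([^\n<]+)", body.strip()).group(1).strip()
        pairs = re.findall(
            r"<longcat_arg_key>(.*?)</longcat_arg_key>\s*"
            r"<longcat_arg_value>(.*?)</longcat_arg_value>",
            body,
            re.DOTALL,
        )
        arguments = {}
        for key, value in pairs:
            try:
                arguments[key.strip()] = json.loads(value.strip())
            except ValueError:
                # Not valid JSON: keep the raw string.
                arguments[key.strip()] = value.strip()
        calls.append({"name": name, "arguments": arguments})
    return calls


calls = extract_tool_calls(SAMPLE)
print(json.dumps(calls, indent=2))
```

For the sample above, `calls` holds a single entry named `func_add` whose arguments are the integers `125679` and `234519`.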

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": "<longcat_s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</longcat_s>",
+  "is_local": true,
+  "model_max_length": 131072,
+  "pad_token": "<longcat_pad>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "TokenizersBackend",
+  "tool_parser_type": "longcat",
+  "unk_token": "<longcat_unk>"
+}