casperhansen committed on May 22, 2025

Commit

b3e3307

verified

1 Parent(s): 0c3d5ac

Add files using upload-large-folder tool

Files changed (19)

.gitattributes +1 -0
README.md +220 -0
config.json +26 -0
convert.py +430 -0
generation_config.json +6 -0
model-00001-of-00010.safetensors +3 -0
model-00002-of-00010.safetensors +3 -0
model-00003-of-00010.safetensors +3 -0
model-00004-of-00010.safetensors +3 -0
model-00005-of-00010.safetensors +3 -0
model-00006-of-00010.safetensors +3 -0
model-00007-of-00010.safetensors +3 -0
model-00008-of-00010.safetensors +3 -0
model-00009-of-00010.safetensors +3 -0
model-00010-of-00010.safetensors +3 -0
model.safetensors.index.json +370 -0
special_tokens_map.json +0 -0
tokenizer.json +3 -0
tokenizer_config.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,220 @@

+---
+language:
+- en
+- fr
+- de
+- es
+- pt
+- it
+- ja
+- ko
+- ru
+- zh
+- ar
+- fa
+- id
+- ms
+- ne
+- pl
+- ro
+- sr
+- sv
+- tr
+- uk
+- vi
+- hi
+- bn
+license: apache-2.0
+library_name: vllm
+inference: false
+---
+# Model Card for Mistral-Small-3.1-24B-Base-2503 (TEXT ONLY)
+This is the text-only variant of [mistralai/Mistral-Small-3.1-24B-Base-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).
+This also serves as the base model for [mistralai/Devstral-Small-2505](https://huggingface.co/mistralai/Devstral-Small-2505), which had no official base model released.
+Features:
+- Text-only, no multimodality.
+- 128k context length.
+
+How was a text-only model achieved? The vision encoder was removed and the model architecture was converted from mistral3 to mistral. The tokenizer was not modified.
+## Reproduced eval
+Serve with vLLM:
+```
+vllm serve casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only
+```
+The reproduced results can be seen below.
+| Model                              | MMLU (0-shot)   |
+|------------------------------------|-----------------|
+| Small 3.1 24B Base (Text Only)     | 77.25% ± 0.33%  |
+| Small 3.1 24B Base (Multimodal)    | 77.34% ± 0.33%  |
+### Original Multimodal: Full MMLU (Reproduced)
+```
+lm_eval --model local-completions \
+  --model_args "base_url=http://localhost:8000/v1/completions,model=mistralai/Mistral-Small-3.1-24B-Base-2503" \
+  --tasks mmlu \
+  --batch_size 128
+```
+|                 Tasks                 |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
+|mmlu                                   |      2|none  |      |acc   |↑  |0.7734|±  |0.0033|
+| - humanities                          |      2|none  |      |acc   |↑  |0.6820|±  |0.0062|
+|  - formal_logic                       |      1|none  |     0|acc   |↑  |0.5714|±  |0.0443|
+|  - high_school_european_history       |      1|none  |     0|acc   |↑  |0.8303|±  |0.0293|
+|  - high_school_us_history             |      1|none  |     0|acc   |↑  |0.9363|±  |0.0171|
+|  - high_school_world_history          |      1|none  |     0|acc   |↑  |0.9241|±  |0.0172|
+|  - international_law                  |      1|none  |     0|acc   |↑  |0.9091|±  |0.0262|
+|  - jurisprudence                      |      1|none  |     0|acc   |↑  |0.8148|±  |0.0376|
+|  - logical_fallacies                  |      1|none  |     0|acc   |↑  |0.8589|±  |0.0274|
+|  - moral_disputes                     |      1|none  |     0|acc   |↑  |0.8208|±  |0.0206|
+|  - moral_scenarios                    |      1|none  |     0|acc   |↑  |0.3844|±  |0.0163|
+|  - philosophy                         |      1|none  |     0|acc   |↑  |0.8296|±  |0.0214|
+|  - prehistory                         |      1|none  |     0|acc   |↑  |0.8704|±  |0.0187|
+|  - professional_law                   |      1|none  |     0|acc   |↑  |0.6095|±  |0.0125|
+|  - world_religions                    |      1|none  |     0|acc   |↑  |0.8713|±  |0.0257|
+| - other                               |      2|none  |      |acc   |↑  |0.8317|±  |0.0064|
+|  - business_ethics                    |      1|none  |     0|acc   |↑  |0.8200|±  |0.0386|
+|  - clinical_knowledge                 |      1|none  |     0|acc   |↑  |0.8679|±  |0.0208|
+|  - college_medicine                   |      1|none  |     0|acc   |↑  |0.7803|±  |0.0316|
+|  - global_facts                       |      1|none  |     0|acc   |↑  |0.6600|±  |0.0476|
+|  - human_aging                        |      1|none  |     0|acc   |↑  |0.7982|±  |0.0269|
+|  - management                         |      1|none  |     0|acc   |↑  |0.9029|±  |0.0293|
+|  - marketing                          |      1|none  |     0|acc   |↑  |0.9359|±  |0.0160|
+|  - medical_genetics                   |      1|none  |     0|acc   |↑  |0.8900|±  |0.0314|
+|  - miscellaneous                      |      1|none  |     0|acc   |↑  |0.9183|±  |0.0098|
+|  - nutrition                          |      1|none  |     0|acc   |↑  |0.8791|±  |0.0187|
+|  - professional_accounting            |      1|none  |     0|acc   |↑  |0.6277|±  |0.0288|
+|  - professional_medicine              |      1|none  |     0|acc   |↑  |0.8603|±  |0.0211|
+|  - virology                           |      1|none  |     0|acc   |↑  |0.5602|±  |0.0386|
+| - social sciences                     |      2|none  |      |acc   |↑  |0.8736|±  |0.0059|
+|  - econometrics                       |      1|none  |     0|acc   |↑  |0.6491|±  |0.0449|
+|  - high_school_geography              |      1|none  |     0|acc   |↑  |0.8990|±  |0.0215|
+|  - high_school_government_and_politics|      1|none  |     0|acc   |↑  |0.9637|±  |0.0135|
+|  - high_school_macroeconomics         |      1|none  |     0|acc   |↑  |0.8103|±  |0.0199|
+|  - high_school_microeconomics         |      1|none  |     0|acc   |↑  |0.9034|±  |0.0192|
+|  - high_school_psychology             |      1|none  |     0|acc   |↑  |0.9358|±  |0.0105|
+|  - human_sexuality                    |      1|none  |     0|acc   |↑  |0.8855|±  |0.0279|
+|  - professional_psychology            |      1|none  |     0|acc   |↑  |0.8578|±  |0.0141|
+|  - public_relations                   |      1|none  |     0|acc   |↑  |0.7909|±  |0.0390|
+|  - security_studies                   |      1|none  |     0|acc   |↑  |0.8327|±  |0.0239|
+|  - sociology                          |      1|none  |     0|acc   |↑  |0.9154|±  |0.0197|
+|  - us_foreign_policy                  |      1|none  |     0|acc   |↑  |0.9300|±  |0.0256|
+| - stem                                |      2|none  |      |acc   |↑  |0.7545|±  |0.0073|
+|  - abstract_algebra                   |      1|none  |     0|acc   |↑  |0.4600|±  |0.0501|
+|  - anatomy                            |      1|none  |     0|acc   |↑  |0.8148|±  |0.0336|
+|  - astronomy                          |      1|none  |     0|acc   |↑  |0.9211|±  |0.0219|
+|  - college_biology                    |      1|none  |     0|acc   |↑  |0.9444|±  |0.0192|
+|  - college_chemistry                  |      1|none  |     0|acc   |↑  |0.5700|±  |0.0498|
+|  - college_computer_science           |      1|none  |     0|acc   |↑  |0.7100|±  |0.0456|
+|  - college_mathematics                |      1|none  |     0|acc   |↑  |0.6200|±  |0.0488|
+|  - college_physics                    |      1|none  |     0|acc   |↑  |0.6569|±  |0.0472|
+|  - computer_security                  |      1|none  |     0|acc   |↑  |0.8300|±  |0.0378|
+|  - conceptual_physics                 |      1|none  |     0|acc   |↑  |0.8170|±  |0.0253|
+|  - electrical_engineering             |      1|none  |     0|acc   |↑  |0.7931|±  |0.0338|
+|  - elementary_mathematics             |      1|none  |     0|acc   |↑  |0.7910|±  |0.0209|
+|  - high_school_biology                |      1|none  |     0|acc   |↑  |0.9323|±  |0.0143|
+|  - high_school_chemistry              |      1|none  |     0|acc   |↑  |0.7586|±  |0.0301|
+|  - high_school_computer_science       |      1|none  |     0|acc   |↑  |0.8900|±  |0.0314|
+|  - high_school_mathematics            |      1|none  |     0|acc   |↑  |0.5185|±  |0.0305|
+|  - high_school_physics                |      1|none  |     0|acc   |↑  |0.6291|±  |0.0394|
+|  - high_school_statistics             |      1|none  |     0|acc   |↑  |0.7593|±  |0.0292|
+|  - machine_learning                   |      1|none  |     0|acc   |↑  |0.6250|±  |0.0460|
+
+|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|------------------|------:|------|------|------|---|-----:|---|-----:|
+|mmlu              |      2|none  |      |acc   |↑  |0.7734|±  |0.0033|
+| - humanities     |      2|none  |      |acc   |↑  |0.6820|±  |0.0062|
+| - other          |      2|none  |      |acc   |↑  |0.8317|±  |0.0064|
+| - social sciences|      2|none  |      |acc   |↑  |0.8736|±  |0.0059|
+| - stem           |      2|none  |      |acc   |↑  |0.7545|±  |0.0073|
+### Text Only: Full MMLU
+```
+lm_eval --model local-completions \
+  --model_args "base_url=http://localhost:8000/v1/completions,model=casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only" \
+  --tasks mmlu \
+  --batch_size 128
+```
+|                 Tasks                 |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|---------------------------------------|------:|------|-----:|------|---|-----:|---|-----:|
+|mmlu                                   |      2|none  |      |acc   |↑  |0.7725|±  |0.0033|
+| - humanities                          |      2|none  |      |acc   |↑  |0.6793|±  |0.0062|
+|  - formal_logic                       |      1|none  |     0|acc   |↑  |0.5397|±  |0.0446|
+|  - high_school_european_history       |      1|none  |     0|acc   |↑  |0.8364|±  |0.0289|
+|  - high_school_us_history             |      1|none  |     0|acc   |↑  |0.9363|±  |0.0171|
+|  - high_school_world_history          |      1|none  |     0|acc   |↑  |0.9198|±  |0.0177|
+|  - international_law                  |      1|none  |     0|acc   |↑  |0.9008|±  |0.0273|
+|  - jurisprudence                      |      1|none  |     0|acc   |↑  |0.8148|±  |0.0376|
+|  - logical_fallacies                  |      1|none  |     0|acc   |↑  |0.8405|±  |0.0288|
+|  - moral_disputes                     |      1|none  |     0|acc   |↑  |0.8237|±  |0.0205|
+|  - moral_scenarios                    |      1|none  |     0|acc   |↑  |0.3765|±  |0.0162|
+|  - philosophy                         |      1|none  |     0|acc   |↑  |0.8264|±  |0.0215|
+|  - prehistory                         |      1|none  |     0|acc   |↑  |0.8704|±  |0.0187|
+|  - professional_law                   |      1|none  |     0|acc   |↑  |0.6108|±  |0.0125|
+|  - world_religions                    |      1|none  |     0|acc   |↑  |0.8713|±  |0.0257|
+| - other                               |      2|none  |      |acc   |↑  |0.8339|±  |0.0064|
+|  - business_ethics                    |      1|none  |     0|acc   |↑  |0.8300|±  |0.0378|
+|  - clinical_knowledge                 |      1|none  |     0|acc   |↑  |0.8679|±  |0.0208|
+|  - college_medicine                   |      1|none  |     0|acc   |↑  |0.7746|±  |0.0319|
+|  - global_facts                       |      1|none  |     0|acc   |↑  |0.6800|±  |0.0469|
+|  - human_aging                        |      1|none  |     0|acc   |↑  |0.8027|±  |0.0267|
+|  - management                         |      1|none  |     0|acc   |↑  |0.9029|±  |0.0293|
+|  - marketing                          |      1|none  |     0|acc   |↑  |0.9402|±  |0.0155|
+|  - medical_genetics                   |      1|none  |     0|acc   |↑  |0.8900|±  |0.0314|
+|  - miscellaneous                      |      1|none  |     0|acc   |↑  |0.9208|±  |0.0097|
+|  - nutrition                          |      1|none  |     0|acc   |↑  |0.8791|±  |0.0187|
+|  - professional_accounting            |      1|none  |     0|acc   |↑  |0.6312|±  |0.0288|
+|  - professional_medicine              |      1|none  |     0|acc   |↑  |0.8603|±  |0.0211|
+|  - virology                           |      1|none  |     0|acc   |↑  |0.5602|±  |0.0386|
+| - social sciences                     |      2|none  |      |acc   |↑  |0.8739|±  |0.0059|
+|  - econometrics                       |      1|none  |     0|acc   |↑  |0.6667|±  |0.0443|
+|  - high_school_geography              |      1|none  |     0|acc   |↑  |0.8939|±  |0.0219|
+|  - high_school_government_and_politics|      1|none  |     0|acc   |↑  |0.9585|±  |0.0144|
+|  - high_school_macroeconomics         |      1|none  |     0|acc   |↑  |0.8103|±  |0.0199|
+|  - high_school_microeconomics         |      1|none  |     0|acc   |↑  |0.9076|±  |0.0188|
+|  - high_school_psychology             |      1|none  |     0|acc   |↑  |0.9358|±  |0.0105|
+|  - human_sexuality                    |      1|none  |     0|acc   |↑  |0.8855|±  |0.0279|
+|  - professional_psychology            |      1|none  |     0|acc   |↑  |0.8578|±  |0.0141|
+|  - public_relations                   |      1|none  |     0|acc   |↑  |0.7909|±  |0.0390|
+|  - security_studies                   |      1|none  |     0|acc   |↑  |0.8327|±  |0.0239|
+|  - sociology                          |      1|none  |     0|acc   |↑  |0.9104|±  |0.0202|
+|  - us_foreign_policy                  |      1|none  |     0|acc   |↑  |0.9400|±  |0.0239|
+| - stem                                |      2|none  |      |acc   |↑  |0.7520|±  |0.0073|
+|  - abstract_algebra                   |      1|none  |     0|acc   |↑  |0.4500|±  |0.0500|
+|  - anatomy                            |      1|none  |     0|acc   |↑  |0.8296|±  |0.0325|
+|  - astronomy                          |      1|none  |     0|acc   |↑  |0.9211|±  |0.0219|
+|  - college_biology                    |      1|none  |     0|acc   |↑  |0.9444|±  |0.0192|
+|  - college_chemistry                  |      1|none  |     0|acc   |↑  |0.5600|±  |0.0499|
+|  - college_computer_science           |      1|none  |     0|acc   |↑  |0.7100|±  |0.0456|
+|  - college_mathematics                |      1|none  |     0|acc   |↑  |0.6200|±  |0.0488|
+|  - college_physics                    |      1|none  |     0|acc   |↑  |0.6569|±  |0.0472|
+|  - computer_security                  |      1|none  |     0|acc   |↑  |0.8300|±  |0.0378|
+|  - conceptual_physics                 |      1|none  |     0|acc   |↑  |0.8213|±  |0.0250|
+|  - electrical_engineering             |      1|none  |     0|acc   |↑  |0.7862|±  |0.0342|
+|  - elementary_mathematics             |      1|none  |     0|acc   |↑  |0.7804|±  |0.0213|
+|  - high_school_biology                |      1|none  |     0|acc   |↑  |0.9290|±  |0.0146|
+|  - high_school_chemistry              |      1|none  |     0|acc   |↑  |0.7488|±  |0.0305|
+|  - high_school_computer_science       |      1|none  |     0|acc   |↑  |0.8900|±  |0.0314|
+|  - high_school_mathematics            |      1|none  |     0|acc   |↑  |0.5222|±  |0.0305|
+|  - high_school_physics                |      1|none  |     0|acc   |↑  |0.6225|±  |0.0396|
+|  - high_school_statistics             |      1|none  |     0|acc   |↑  |0.7500|±  |0.0295|
+|  - machine_learning                   |      1|none  |     0|acc   |↑  |0.6339|±  |0.0457|
+
+|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|------------------|------:|------|------|------|---|-----:|---|-----:|
+|mmlu              |      2|none  |      |acc   |↑  |0.7725|±  |0.0033|
+| - humanities     |      2|none  |      |acc   |↑  |0.6793|±  |0.0062|
+| - other          |      2|none  |      |acc   |↑  |0.8339|±  |0.0064|
+| - social sciences|      2|none  |      |acc   |↑  |0.8739|±  |0.0059|
+| - stem           |      2|none  |      |acc   |↑  |0.7520|±  |0.0073|
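
The two aggregate MMLU scores in the README above differ by less than their reported standard errors. A minimal sketch of that comparison follows. It treats the two runs as independent, which is a simplifying assumption: both runs answer the same questions, so a paired test would be the stricter check. The function name `z_score` is introduced here for illustration only.

```python
import math

# Aggregate MMLU accuracy and standard error, copied from the two tables above.
multimodal_acc, multimodal_se = 0.7734, 0.0033
text_only_acc, text_only_se = 0.7725, 0.0033


def z_score(acc_a, se_a, acc_b, se_b):
    """z statistic for the gap between two accuracies.

    Assumption: the two estimates are independent, so their variances add.
    """
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


gap = multimodal_acc - text_only_acc
z = z_score(multimodal_acc, multimodal_se, text_only_acc, text_only_se)
print(f"gap = {gap * 100:.2f} points, z = {z:.2f}")
```

Under that assumption the gap of roughly 0.09 points corresponds to a z statistic well below 1, which is consistent with the README's claim that removing the vision encoder left text performance essentially unchanged.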

config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": [
+    "MistralForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 32768,
+  "max_position_embeddings": 131072,
+  "model_type": "mistral",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 8,
+  "rms_norm_eps": 1e-05,
+  "rope_theta": 1000000000.0,
+  "sliding_window": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.51.3",
+  "use_cache": true,
+  "vocab_size": 131072
+}
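
As a rough sanity check on the "24B" label, the parameter count can be estimated from the config values above. This is a sketch under stated assumptions, not a reading of the actual checkpoint: it assumes bias-free q/k/v/o projections, a gate/up/down MLP, two RMSNorm weights per layer, and one final norm. The helper `count_parameters` is a name chosen for this example.

```python
# Values copied from the config.json in this commit.
cfg = {
    "hidden_size": 5120,
    "intermediate_size": 32768,
    "num_hidden_layers": 40,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 131072,
    "tie_word_embeddings": False,
}


def count_parameters(cfg):
    """Estimate the parameter count, assuming bias-free linear layers."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]               # gate, up, down projections
    norms = 2 * h                                        # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_parameters(cfg)
print(f"{total:,} parameters, about {total * 2 / 1e9:.1f} GB in bfloat16")
```

The estimate lands a little under 24 billion parameters, which at two bytes per bfloat16 weight is in the same range as ten shards of roughly 4.78 GB each.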

convert.py ADDED

	@@ -0,0 +1,430 @@

+#!/usr/bin/env python3
+"""
+Mistral Model Transformer
+This script transforms Mistral-Small-3.1-24B-Base-2503 into a text-only model by:
+1. Removing multimodality features
+2. Removing the vision encoder
+3. Changing the architecture from "mistral3" to "mistral"
+4. Ensuring weight mapping structure matches Devstral-Small-2505 exactly
+Usage:
+    python convert.py --input-model mistralai/Mistral-Small-3.1-24B-Base-2503 --output-path ./mistral-small-text-only --reference-model mistralai/Devstral-Small-2505
+Note:
+    This script requires significant disk space to download and process the full model.
+"""
+import argparse
+import json
+import os
+import shutil
+from pathlib import Path
+import logging
+from huggingface_hub import snapshot_download, hf_hub_download
+from safetensors.torch import load_file, save_file
+from transformers import AutoConfig, AutoModelForCausalLM
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+logger = logging.getLogger(__name__)
+def parse_args():
+    parser = argparse.ArgumentParser(description="Transform Mistral model to text-only version")
+    parser.add_argument(
+        "--input-model",
+        type=str,
+        default="mistralai/Mistral-Small-3.1-24B-Base-2503",
+        help="Path or HF repo id of the input model"
+    )
+    parser.add_argument(
+        "--output-path",
+        type=str,
+        required=True,
+        help="Path to save the transformed model"
+    )
+    parser.add_argument(
+        "--cache-dir",
+        type=str,
+        default=None,
+        help="Cache directory for downloading models"
+    )
+    parser.add_argument(
+        "--reference-model",
+        type=str,
+        default="mistralai/Devstral-Small-2505",
+        help="Path or HF repo id of the reference model for weight mapping"
+    )
+    return parser.parse_args()
+def transform_config(config_path, output_path, reference_config=None):
+    """
+    Transform the model config by:
+    1. Changing model_type from "mistral3" to "mistral"
+    2. Removing vision_config
+    3. Removing multimodal parameters
+    4. Updating architectures to match Devstral exactly
+    5. Ensuring all parameters match Devstral's config exactly
+    """
+    logger.info(f"Transforming config at {config_path}")
+    with open(config_path, "r") as f:
+        config = json.load(f)
+    if reference_config:
+        logger.info("Using reference config as template")
+        new_config = reference_config.copy()
+        text_config = config.get("text_config", config)
+        for key, value in text_config.items():
+            if key not in new_config and key != "model_type":
+                new_config[key] = value
+                logger.info(f"Added parameter from original config: {key}")
+    else:
+        logger.info("No reference config available, using basic transformation")
+        new_config = config.copy()
+        # Change model_type from mistral3 to mistral
+        if new_config.get("model_type") == "mistral3":
+            new_config["model_type"] = "mistral"
+            logger.info("Changed model_type from 'mistral3' to 'mistral'")
+        # Update architectures to use MistralForCausalLM
+        if "architectures" in new_config:
+            new_config["architectures"] = ["MistralForCausalLM"]
+            logger.info("Changed architecture to 'MistralForCausalLM'")
+        # Remove vision_config
+        if "vision_config" in new_config:
+            del new_config["vision_config"]
+            logger.info("Removed vision_config")
+        # Remove multimodal-related parameters
+        multimodal_params = [
+            "image_token_index",
+            "multimodal_projector_bias",
+            "projector_hidden_act",
+            "spatial_merge_size",
+            "vision_tower_layer_list",
+            "vision_feature_layer"
+        ]
+        for param in multimodal_params:
+            if param in new_config:
+                del new_config[param]
+                logger.info(f"Removed multimodal parameter: {param}")
+        if "text_config" in new_config:
+            text_config = new_config.pop("text_config")
+            for key, value in text_config.items():
+                if key != "model_type":  # Don't overwrite the model_type
+                    new_config[key] = value
+            logger.info("Moved text_config parameters to top level")
+        if "bos_token_id" not in new_config:
+            new_config["bos_token_id"] = 1
+            logger.info("Added bos_token_id: 1")
+        if "eos_token_id" not in new_config:
+            new_config["eos_token_id"] = 2
+            logger.info("Added eos_token_id: 2")
+        if "tie_word_embeddings" not in new_config:
+            new_config["tie_word_embeddings"] = False
+            logger.info("Added tie_word_embeddings: false")
+        new_config["transformers_version"] = "4.51.3"
+        logger.info("Updated transformers_version to 4.51.3")
+    os_output_path = Path(output_path) / "config.json"
+    with open(os_output_path, "w") as f:
+        json.dump(new_config, f, indent=2)
+    logger.info(f"Saved transformed config to {os_output_path}")
+    return new_config
+def is_vision_weight(weight_name):
+    """Check if a weight is related to vision functionality"""
+    vision_patterns = ["vision_tower", "multi_modal_projector"]
+    return any(pattern in weight_name for pattern in vision_patterns)
+def transform_weights(model_path, output_path, safetensors_index_path, reference_weight_map=None):
+    """
+    Transform model weights by:
+    1. Loading the weight map from safetensors index
+    2. Filtering out vision-related weights
+    3. Removing the "language_model." prefix from weight names
+    4. Ensuring the exact same partitioning as Devstral
+    5. Saving the filtered weights to the output path
+    """
+    logger.info(f"Transforming weights using index at {safetensors_index_path}")
+    with open(safetensors_index_path, "r") as f:
+        index_data = json.load(f)
+    original_weight_map = index_data.get("weight_map", {})
+    # Count vision and non-vision weights
+    vision_weights = [name for name in original_weight_map if is_vision_weight(name)]
+    non_vision_weights = [name for name in original_weight_map if not is_vision_weight(name)]
+    logger.info(f"Found {len(vision_weights)} vision-related weights to remove")
+    logger.info(f"Found {len(non_vision_weights)} non-vision weights to keep")
+    # Create a mapping from original weight names to Devstral-style weight names
+    weight_name_mapping = {}
+    for original_name in non_vision_weights:
+        if original_name.startswith("language_model."):
+            new_name = original_name[len("language_model."):]
+            weight_name_mapping[original_name] = new_name
+        else:
+            weight_name_mapping[original_name] = original_name
+    logger.info(f"Created mapping for {len(weight_name_mapping)} weight names")
+    new_weight_map = {}
+    if reference_weight_map and "weight_map" in reference_weight_map:
+        devstral_weight_map = reference_weight_map["weight_map"]
+        logger.info(f"Using Devstral reference weight map with {len(devstral_weight_map)} entries")
+        for original_name, new_name in weight_name_mapping.items():
+            if new_name in devstral_weight_map:
+                new_weight_map[new_name] = devstral_weight_map[new_name]
+            else:
+                logger.warning(f"Weight {new_name} not found in Devstral reference map")
+    else:
+        logger.warning("No Devstral reference map available, using original partitioning")
+        for original_name, new_name in weight_name_mapping.items():
+            new_weight_map[new_name] = original_weight_map[original_name]
+    # Group weights by their safetensor file for the actual transformation
+    file_to_weights = {}
+    for new_name, file_name in new_weight_map.items():
+        if file_name not in file_to_weights:
+            file_to_weights[file_name] = []
+        original_names = [orig for orig, new in weight_name_mapping.items() if new == new_name]
+        if original_names:
+            file_to_weights[file_name].append((original_names[0], new_name))
+    os.makedirs(Path(output_path), exist_ok=True)
+    # Process each safetensor file
+    for file_name, weight_pairs in file_to_weights.items():
+        logger.info(f"Processing {file_name} with {len(weight_pairs)} weights")
+        tensors_to_save = {}
+        for original_name, new_name in weight_pairs:
+            original_file = original_weight_map.get(original_name)
+            if not original_file:
+                logger.warning(f"Original file not found for weight {original_name}")
+                continue
+            input_file_path = Path(model_path) / original_file
+            if not input_file_path.exists():
+                logger.warning(f"File {input_file_path} does not exist, skipping")
+                continue
+            try:
+                original_tensors = load_file(input_file_path)
+                if original_name in original_tensors:
+                    tensors_to_save[new_name] = original_tensors[original_name]
+                else:
+                    logger.warning(f"Weight {original_name} not found in {original_file}")
+            except Exception as e:
+                logger.error(f"Error loading {original_file}: {e}")
+        if tensors_to_save:
+            output_file_path = Path(output_path) / file_name
+            try:
+                save_file(tensors_to_save, output_file_path)
+                logger.info(f"Saved {len(tensors_to_save)} weights to {file_name}")
+            except Exception as e:
+                logger.error(f"Error saving {file_name}: {e}")
+    # Save the new safetensors index
+    new_index = {
+        "metadata": {"total_size": reference_weight_map.get("metadata", {}).get("total_size", 0)}
+                  if reference_weight_map else index_data.get("metadata", {}),
+        "weight_map": new_weight_map
+    }
+    output_index_path = Path(output_path) / "model.safetensors.index.json"
+    with open(output_index_path, "w") as f:
+        json.dump(new_index, f, indent=2)
+    logger.info(f"Saved transformed safetensors index to {output_index_path}")
+def copy_additional_files(model_path, output_path):
+    """Copy additional model files like tokenizer, generation config, etc."""
+    additional_files = [
+        "tokenizer.json",
+        "tokenizer_config.json",
+        "special_tokens_map.json",
+        "generation_config.json"
+    ]
+    for filename in additional_files:
+        src_path = Path(model_path) / filename
+        if src_path.exists():
+            dst_path = Path(output_path) / filename
+            shutil.copy(src_path, dst_path)
+            logger.info(f"Copied {filename} to output directory")
+        else:
+            logger.warning(f"File {filename} not found in model directory")
+def download_minimal_files(repo_id, output_dir, cache_dir=None):
+    """Download only the necessary files for transformation without the full model"""
+    logger.info(f"Downloading minimal files from {repo_id}")
+    # List of files to download
+    files_to_download = [
+        "config.json",
+        "model.safetensors.index.json",
+        "tokenizer_config.json",
+        "special_tokens_map.json",
+        "generation_config.json"
+    ]
+    downloaded_files = {}
+    for filename in files_to_download:
+        try:
+            file_path = hf_hub_download(
+                repo_id=repo_id,
+                filename=filename,
+                cache_dir=cache_dir,
+                local_files_only=False
+            )
+            downloaded_files[filename] = file_path
+            logger.info(f"Downloaded {filename} to {file_path}")
+        except Exception as e:
+            logger.warning(f"Failed to download {filename}: {e}")
+    return downloaded_files
+def download_reference_weight_map(reference_model, cache_dir=None):
+    """Download reference model's weight map to use as a reference"""
+    logger.info(f"Downloading reference weight map from {reference_model}")
+    try:
+        file_path = hf_hub_download(
+            repo_id=reference_model,
+            filename="model.safetensors.index.json",
+            cache_dir=cache_dir,
+            local_files_only=False
+        )
+        with open(file_path, "r") as f:
+            reference_map = json.load(f)
+        logger.info(f"Successfully loaded reference weight map with {len(reference_map.get('weight_map', {}))} weights")
+        return reference_map
+    except Exception as e:
+        logger.error(f"Failed to download reference weight map: {e}")
+        return None
+def download_reference_config(reference_model, cache_dir=None):
+    """Download reference model's config.json to use as a reference"""
+    logger.info(f"Downloading reference config from {reference_model}")
+    try:
+        file_path = hf_hub_download(
+            repo_id=reference_model,
+            filename="config.json",
+            cache_dir=cache_dir,
+            local_files_only=False
+        )
+        with open(file_path, "r") as f:
+            reference_config = json.load(f)
+        logger.info("Successfully loaded reference config")
+        return reference_config
+    except Exception as e:
+        logger.error(f"Failed to download reference config: {e}")
+        return None
+def verify_model(output_path):
+    """Verify that the transformed model can be loaded without errors"""
+    logger.info(f"Verifying transformed model at {output_path}")
+    try:
+        config = AutoConfig.from_pretrained(output_path)
+        logger.info(f"Successfully loaded config with model_type={config.model_type}")
+        # Attempt to load just the model architecture (without weights)
+        # This verifies the configuration is valid
+        AutoModelForCausalLM.from_config(config)
+        logger.info("Successfully loaded model architecture from config")
+        return True
+    except Exception as e:
+        logger.error(f"Error verifying model: {e}")
+        return False
+def main():
+    args = parse_args()
+    input_model = args.input_model
+    output_path = args.output_path
+    cache_dir = args.cache_dir
+    reference_model = args.reference_model
+    # Download reference weight map and config
+    reference_weight_map = download_reference_weight_map(reference_model, cache_dir)
+    if not reference_weight_map:
+        logger.warning("Could not download reference weight map. The weight partitioning may not match exactly.")
+    reference_config = download_reference_config(reference_model, cache_dir)
+    if not reference_config:
+        logger.warning("Could not download reference config. The config may not match exactly.")
+    # Create output directory
+    os.makedirs(output_path, exist_ok=True)
+    # Download the full model
+    if not os.path.exists(input_model) or not os.path.isdir(input_model):
+        logger.info(f"Downloading model from {input_model}")
+        try:
+            model_path = snapshot_download(
+                repo_id=input_model,
+                cache_dir=cache_dir,
+                local_files_only=False,
+                ignore_patterns=["*consolidated*"]
+            )
+        except Exception as e:
+            logger.error(f"Error downloading model: {e}")
+            return
+    else:
+        model_path = input_model
+    logger.info(f"Model path: {model_path}")
+    # Transform config
+    config_path = os.path.join(model_path, "config.json")
+    transform_config(config_path, output_path, reference_config)
+    # Transform weights
+    safetensors_index_path = os.path.join(model_path, "model.safetensors.index.json")
+    transform_weights(
+        model_path,
+        output_path,
+        safetensors_index_path,
+        reference_weight_map=reference_weight_map
+    )
+    # Copy additional files
+    copy_additional_files(model_path, output_path)
+    # Verify the transformed model
+    success = verify_model(output_path)
+    if success:
+        logger.info(f"Successfully transformed model to {output_path}")
+    else:
+        logger.error("Failed to transform model properly")
+if __name__ == "__main__":
+    main()

generation_config.json ADDED Viewed

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "transformers_version": "4.50.0.dev0"
+}

model-00001-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d2a76fd51ca4d1842da1814eb6793722d583bda92679b92025924ec7a859cc70
+size 4781571704

model-00002-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87df53abceb14a4f8aa758554c681e6184f2f5e3eabe2e0e74e2356c524b733c
+size 4781592752

model-00003-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e532e79e000bf988b96beaead472ba4184a1eb4c4d8b5e054ee4889337c42061
+size 4781592768

model-00004-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1193aef9a3d4b739cceb8381190a2ebf9a80f3122e67559c27873bd8ce432be4
+size 4886471568

model-00005-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c8dd57b2263c028892bbf0d1a932f05b127ea3bf3d0f652fdd145641c9ec6031
+size 4781592792

model-00006-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:690684d541a1acd8daaff26857cc5938e1da3431865ebcb894cdcb60e41f3ef5
+size 4781592784

model-00007-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:70b3262c332eb7735bf5316f577ef267178bbbf80485e9ded6a6278d7ee0dc1b
+size 4886471568

model-00008-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d716948f528122126056ad58dd14d1487d8f9fce29e9f6986bf5146fe568087e
+size 4781592792

model-00009-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:05a38f3aad6f204a1af9d31d69eddaeb4fb26257aef912040c0c4bb5387900e3
+size 4781592784

model-00010-of-00010.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:51d899b348647d9d6762d0bf1b2205b6b4f1bf3590c4c4c1619b7acf682d17a8
+size 3900777040
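
Each `.safetensors` entry above is a Git LFS pointer rather than the weights themselves: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer, assuming the three-line layout shown in this diff; `parse_lfs_pointer` is a hypothetical helper and not part of `convert.py`:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for model-00010-of-00010.safetensors, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:51d899b348647d9d6762d0bf1b2205b6b4f1bf3590c4c4c1619b7acf682d17a8\n"
    "size 3900777040\n"
)
print(pointer["algo"], pointer["size"])
```

The parsed `size` and `digest` could then be compared against a downloaded shard to confirm it arrived intact.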

model.safetensors.index.json ADDED Viewed

	@@ -0,0 +1,370 @@

+{
+  "metadata": {
+    "total_size": 47144806400
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00010-of-00010.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.34.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.input_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.37.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.38.input_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.input_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+    "model.norm.weight": "model-00010-of-00010.safetensors"
+  }
+}
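
The `weight_map` above assigns every tensor name to exactly one shard file, and a single layer can straddle two shards (layer 11 is split between shards 3 and 4, for example). A minimal sketch of summarizing such an index, using a small excerpt of the map and the shard sizes from the LFS pointers in this diff. `tensors_per_shard` is a hypothetical helper, and the reading that `total_size` counts tensor bytes only, while each shard file also carries its own header, is an assumption about how the index was written:

```python
import json
from collections import Counter


def tensors_per_shard(index: dict) -> Counter:
    """Count how many tensors the index assigns to each shard file."""
    return Counter(index["weight_map"].values())


# Excerpt of model.safetensors.index.json (the full map is in the diff above).
index = json.loads("""
{
  "metadata": {"total_size": 47144806400},
  "weight_map": {
    "model.layers.11.input_layernorm.weight": "model-00004-of-00010.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
    "model.norm.weight": "model-00010-of-00010.safetensors"
  }
}
""")

# Shard file sizes in bytes, taken from the ten LFS pointers in this commit.
shard_sizes = [
    4781571704, 4781592752, 4781592768, 4886471568, 4781592792,
    4781592784, 4886471568, 4781592792, 4781592784, 3900777040,
]

counts = tensors_per_shard(index)
files_total = sum(shard_sizes)
overhead = files_total - index["metadata"]["total_size"]

for shard, n in sorted(counts.items()):
    print(f"{shard}: {n} tensors")
print(f"shard files: {files_total} bytes, index total_size: "
      f"{index['metadata']['total_size']} bytes, difference: {overhead} bytes")
```

For this commit the ten shard files add up to 47,144,848,552 bytes, which is 42,152 bytes more than `total_size`. Under the assumption above, that gap would be the combined size of the per-file headers.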

special_tokens_map.json ADDED Viewed

The diff for this file is too large to render. See raw diff

tokenizer.json ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949
+size 17078037

tokenizer_config.json ADDED Viewed

The diff for this file is too large to render. See raw diff