ehartford committed
Commit 7cb2f27 · verified · 1 Parent(s): 04048d8

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +1 -0
  2. README.md +152 -0
  3. chat_template.jinja +141 -0
  4. config.json +80 -0
  5. convert.py +259 -0
  6. generation_config.json +14 -0
  7. model-00002-of-00041.safetensors +3 -0
  8. model-00003-of-00041.safetensors +3 -0
  9. model-00004-of-00041.safetensors +3 -0
  10. model-00005-of-00041.safetensors +3 -0
  11. model-00006-of-00041.safetensors +3 -0
  12. model-00007-of-00041.safetensors +3 -0
  13. model-00008-of-00041.safetensors +3 -0
  14. model-00009-of-00041.safetensors +3 -0
  15. model-00010-of-00041.safetensors +3 -0
  16. model-00011-of-00041.safetensors +3 -0
  17. model-00012-of-00041.safetensors +3 -0
  18. model-00013-of-00041.safetensors +3 -0
  19. model-00014-of-00041.safetensors +3 -0
  20. model-00015-of-00041.safetensors +3 -0
  21. model-00016-of-00041.safetensors +3 -0
  22. model-00017-of-00041.safetensors +3 -0
  23. model-00018-of-00041.safetensors +3 -0
  24. model-00019-of-00041.safetensors +3 -0
  25. model-00020-of-00041.safetensors +3 -0
  26. model-00021-of-00041.safetensors +3 -0
  27. model-00022-of-00041.safetensors +3 -0
  28. model-00023-of-00041.safetensors +3 -0
  29. model-00024-of-00041.safetensors +3 -0
  30. model-00025-of-00041.safetensors +3 -0
  31. model-00026-of-00041.safetensors +3 -0
  32. model-00027-of-00041.safetensors +3 -0
  33. model-00028-of-00041.safetensors +3 -0
  34. model-00029-of-00041.safetensors +3 -0
  35. model-00030-of-00041.safetensors +3 -0
  36. model-00031-of-00041.safetensors +3 -0
  37. model-00032-of-00041.safetensors +3 -0
  38. model-00033-of-00041.safetensors +3 -0
  39. model-00034-of-00041.safetensors +3 -0
  40. model-00035-of-00041.safetensors +3 -0
  41. model-00036-of-00041.safetensors +3 -0
  42. model-00037-of-00041.safetensors +3 -0
  43. model-00039-of-00041.safetensors +3 -0
  44. model-00040-of-00041.safetensors +3 -0
  45. model-00041-of-00041.safetensors +3 -0
  46. model.safetensors.index.json +0 -0
  47. preprocessor_config.json +11 -0
  48. requirements.txt +3 -0
  49. tokenizer.json +3 -0
  50. tokenizer_config.json +327 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,152 @@
+ # INTELLECT-3-V
+
+ A vision-language model created by grafting the language model weights from [INTELLECT-3](https://huggingface.co/PrimeIntellect/INTELLECT-3) into the [GLM-4.6V](https://huggingface.co/THUDM/GLM-4.6V) architecture.
+
+ ## Motivation
+
+ INTELLECT-3 is a strong open-source language model, but it lacks vision capabilities. GLM-4.6V is a vision-language model with an identical language model architecture. By replacing GLM-4.6V's language model weights with INTELLECT-3's weights while preserving the vision encoder and projection layers, we create a vision-language model powered by INTELLECT-3.
+
+ ## Architecture
+
+ Both models share the same language model backbone:
+ - 46 transformer layers (layer 0 is a dense MLP; layers 1-45 are MoE)
+ - 4096 hidden dimension
+ - 128 routed experts + 1 shared expert per MoE layer
+ - Grouped Query Attention (12288-dim q_proj, 1024-dim k/v_proj)
+ - 151552 vocabulary size
+ - BF16 weights
+
+ GLM-4.6V additionally includes:
+ - 24-layer vision transformer (1536 hidden dim)
+ - Visual merger projecting vision features to the LLM hidden dimension
+ - Downsampling convolution for spatial compression
+
+ ## What Was Grafted
+
+ The following weights were copied from INTELLECT-3 to GLM-4.6V:
+
+ | INTELLECT-3 | GLM-4.6V |
+ |-------------|----------|
+ | `model.layers.*` | `model.language_model.layers.*` |
+ | `model.norm.weight` | `model.language_model.norm.weight` |
+
+ ## What Was Preserved (from GLM-4.6V)
+
+ - `model.language_model.embed_tokens.weight` — kept to maintain vision token compatibility
+ - `lm_head.weight` — kept aligned with embed_tokens
+ - `model.visual.*` — entire vision encoder and merger preserved
+
+ ## Rationale
+
+ **Why replace the final norm?** The RMSNorm after the last transformer layer is tightly coupled to the layer outputs it normalizes. INTELLECT-3's norm was trained end-to-end with its layers and learned to normalize their specific output distribution.
+
+ **Why keep embed_tokens?** The vision merger projects visual features into the same embedding space as text tokens. Replacing embed_tokens could break the alignment between text and vision embeddings. Additionally, lm_head is often tied or co-trained with embed_tokens.
+
+ **Why not replace lm_head?** Same reasoning — keeping lm_head and embed_tokens together maintains their learned relationship.
+
+ ## Known Limitations
+
+ 1. **Embedding space mismatch**: INTELLECT-3's layers learned representations in a potentially different embedding space than GLM-4.6V's. This may cause some degradation in both language and vision-language performance.
+
+ 2. **Vision-language alignment**: The visual merger was trained to project into GLM-4.6V's representation space. INTELLECT-3 may have learned different internal representations, potentially affecting vision-language tasks.
+
+ 3. **Tokenizer compatibility**: While both models have the same vocabulary size (151552), verify tokenizer compatibility for your use case.
+
+ ## Creation Script
+
+ The model was created with the grafting script, included in this repository as `convert.py`:
+
+ ```bash
+ python convert.py \
+     --intellect3 ~/models/INTELLECT-3 \
+     --glm ~/models/GLM-4.6V \
+     --output ~/models/INTELLECT-3-V
+ ```
+
+ ## Source Model Architectures
+
+ ### INTELLECT-3
+
+ ```
+ lm_head.weight,[151552,4096],BF16
+ model.embed_tokens.weight,[151552,4096],BF16
+ model.layers.0.mlp.down_proj.weight,[4096,10944],BF16
+ model.layers.0.mlp.gate_proj.weight,[10944,4096],BF16
+ model.layers.0.mlp.up_proj.weight,[10944,4096],BF16
+ model.layers.[0-45].input_layernorm.weight,[4096],BF16
+ model.layers.[0-45].post_attention_layernorm.weight,[4096],BF16
+ model.layers.[0-45].self_attn.k_proj.bias,[1024],BF16
+ model.layers.[0-45].self_attn.k_proj.weight,[1024,4096],BF16
+ model.layers.[0-45].self_attn.o_proj.weight,[4096,12288],BF16
+ model.layers.[0-45].self_attn.q_proj.bias,[12288],BF16
+ model.layers.[0-45].self_attn.q_proj.weight,[12288,4096],BF16
+ model.layers.[0-45].self_attn.v_proj.bias,[1024],BF16
+ model.layers.[0-45].self_attn.v_proj.weight,[1024,4096],BF16
+ model.layers.[1-45].mlp.experts.[0-127].down_proj.weight,[4096,1408],BF16
+ model.layers.[1-45].mlp.experts.[0-127].gate_proj.weight,[1408,4096],BF16
+ model.layers.[1-45].mlp.experts.[0-127].up_proj.weight,[1408,4096],BF16
+ model.layers.[1-45].mlp.gate.e_score_correction_bias,[128],F32
+ model.layers.[1-45].mlp.gate.weight,[128,4096],BF16
+ model.layers.[1-45].mlp.shared_experts.down_proj.weight,[4096,1408],BF16
+ model.layers.[1-45].mlp.shared_experts.gate_proj.weight,[1408,4096],BF16
+ model.layers.[1-45].mlp.shared_experts.up_proj.weight,[1408,4096],BF16
+ model.norm.weight,[4096],BF16
+ ```
+
+ ### GLM-4.6V
+
+ ```
+ lm_head.weight,[151552,4096],BF16
+ model.language_model.embed_tokens.weight,[151552,4096],BF16
+ model.language_model.layers.0.mlp.down_proj.weight,[4096,10944],BF16
+ model.language_model.layers.0.mlp.gate_proj.weight,[10944,4096],BF16
+ model.language_model.layers.0.mlp.up_proj.weight,[10944,4096],BF16
+ model.language_model.layers.[0-45].input_layernorm.weight,[4096],BF16
+ model.language_model.layers.[0-45].post_attention_layernorm.weight,[4096],BF16
+ model.language_model.layers.[0-45].self_attn.k_proj.bias,[1024],BF16
+ model.language_model.layers.[0-45].self_attn.k_proj.weight,[1024,4096],BF16
+ model.language_model.layers.[0-45].self_attn.o_proj.weight,[4096,12288],BF16
+ model.language_model.layers.[0-45].self_attn.q_proj.bias,[12288],BF16
+ model.language_model.layers.[0-45].self_attn.q_proj.weight,[12288,4096],BF16
+ model.language_model.layers.[0-45].self_attn.v_proj.bias,[1024],BF16
+ model.language_model.layers.[0-45].self_attn.v_proj.weight,[1024,4096],BF16
+ model.language_model.layers.[1-45].mlp.experts.[0-127].down_proj.weight,[4096,1408],BF16
+ model.language_model.layers.[1-45].mlp.experts.[0-127].gate_proj.weight,[1408,4096],BF16
+ model.language_model.layers.[1-45].mlp.experts.[0-127].up_proj.weight,[1408,4096],BF16
+ model.language_model.layers.[1-45].mlp.gate.e_score_correction_bias,[128],F32
+ model.language_model.layers.[1-45].mlp.gate.weight,[128,4096],BF16
+ model.language_model.layers.[1-45].mlp.shared_experts.down_proj.weight,[4096,1408],BF16
+ model.language_model.layers.[1-45].mlp.shared_experts.gate_proj.weight,[1408,4096],BF16
+ model.language_model.layers.[1-45].mlp.shared_experts.up_proj.weight,[1408,4096],BF16
+ model.language_model.norm.weight,[4096],BF16
+ model.visual.blocks.[0-23].attn.proj.weight,[1536,1536],BF16
+ model.visual.blocks.[0-23].attn.qkv.weight,[4608,1536],BF16
+ model.visual.blocks.[0-23].mlp.down_proj.weight,[1536,4096],BF16
+ model.visual.blocks.[0-23].mlp.gate_proj.weight,[4096,1536],BF16
+ model.visual.blocks.[0-23].mlp.up_proj.weight,[4096,1536],BF16
+ model.visual.blocks.[0-23].norm[1-2].weight,[1536],BF16
+ model.visual.downsample.bias,[4096],BF16
+ model.visual.downsample.weight,[4096,1536,2,2],BF16
+ model.visual.embeddings.position_embedding.weight,[576,1536],BF16
+ model.visual.merger.down_proj.weight,[4096,10944],BF16
+ model.visual.merger.gate_proj.weight,[10944,4096],BF16
+ model.visual.merger.post_projection_norm.bias,[4096],BF16
+ model.visual.merger.post_projection_norm.weight,[4096],BF16
+ model.visual.merger.proj.weight,[4096,4096],BF16
+ model.visual.merger.up_proj.weight,[10944,4096],BF16
+ model.visual.patch_embed.proj.bias,[1536],BF16
+ model.visual.patch_embed.proj.weight,[1536,3,2,14,14],BF16
+ model.visual.post_conv_layernorm.weight,[1536],BF16
+ model.visual.post_layernorm.weight,[1536],BF16
+ ```
+
+ ## License
+
+ Please refer to the licenses of the source models:
+ - [INTELLECT-3 License](https://huggingface.co/PrimeIntellect/INTELLECT-3)
+ - [GLM-4.6V License](https://huggingface.co/THUDM/GLM-4.6V)
+
+ ## Acknowledgments
+
+ - [Prime Intellect](https://www.primeintellect.ai/) for INTELLECT-3
+ - [THUDM](https://github.com/THUDM) for GLM-4.6V
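
For readers who want to try the grafted checkpoint, here is a minimal inference sketch. It is my addition, not part of the commit: it assumes a local copy at `~/models/INTELLECT-3-V`, a transformers build with `glm4v_moe` support (`Glm4vMoeForConditionalGeneration` is the class named in `config.json` below), and a placeholder image URL.

```python
# Minimal inference sketch (assumes transformers >= 5.0.0rc0 with glm4v_moe
# support, per the transformers_version declared in config.json).
import os
import torch
from transformers import AutoProcessor, Glm4vMoeForConditionalGeneration

path = os.path.expanduser("~/models/INTELLECT-3-V")  # local path from the README
processor = AutoProcessor.from_pretrained(path)
model = Glm4vMoeForConditionalGeneration.from_pretrained(
    path, dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/cat.png"},  # placeholder URL
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens after the prompt.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```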
chat_template.jinja ADDED
@@ -0,0 +1,141 @@
+ [gMASK]<sop>
+ {%- if tools -%}
+ <|system|>
+ # Tools
+
+ You may call one or more functions to assist with the user query.
+
+ You are provided with function signatures within <tools></tools> XML tags:
+ <tools>
+ {% for tool in tools %}
+ {{ tool | tojson(ensure_ascii=False) }}
+ {% endfor %}
+ </tools>
+
+ For each function call, output the function name and arguments within the following XML format:
+ <tool_call>{function-name}
+ <arg_key>{arg-key-1}</arg_key>
+ <arg_value>{arg-value-1}</arg_value>
+ <arg_key>{arg-key-2}</arg_key>
+ <arg_value>{arg-value-2}</arg_value>
+ ...
+ </tool_call>{%- endif -%}
+ {%- macro visible_text(content) -%}
+ {%- if content is string -%}
+ {{- content }}
+ {%- elif content is iterable and content is not mapping -%}
+ {%- for item in content -%}
+ {%- if item is mapping and item.type == 'text' -%}
+ {{- item.text }}
+ {%- elif item is mapping and (item.type == 'image' or 'image' in item) -%}
+ <|begin_of_image|><|image|><|end_of_image|>
+ {%- elif item is mapping and (item.type == 'video' or 'video' in item) -%}
+ <|begin_of_video|><|video|><|end_of_video|>
+ {%- elif item is string -%}
+ {{- item }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- else -%}
+ {{- content }}
+ {%- endif -%}
+ {%- endmacro -%}
+ {%- set ns = namespace(last_user_index=-1) %}
+ {%- for m in messages %}
+ {%- if m.role == 'user' %}
+ {% set ns.last_user_index = loop.index0 -%}
+ {%- endif %}
+ {%- endfor %}
+ {% for m in messages %}
+ {%- if m.role == 'user' -%}<|user|>
+ {% if m.content is string %}
+ {{ m.content }}
+ {%- else %}
+ {%- for item in m.content %}
+ {% if item.type == 'video' or 'video' in item %}
+ <|begin_of_video|><|video|><|end_of_video|>{% elif item.type == 'image' or 'image' in item %}
+ <|begin_of_image|><|image|><|end_of_image|>{% elif item.type == 'text' %}
+ {{ item.text }}
+ {%- endif %}
+ {%- endfor %}
+ {%- endif %}
+ {{- '/nothink' if (enable_thinking is defined and not enable_thinking and not visible_text(m.content).endswith("/nothink")) else '' -}}
+ {%- elif m.role == 'assistant' -%}
+ <|assistant|>
+ {%- set reasoning_content = '' %}
+ {%- set content = visible_text(m.content) %}
+ {%- if m.reasoning_content is string %}
+ {%- set reasoning_content = m.reasoning_content %}
+ {%- else %}
+ {%- if '</think>' in content %}
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+ {%- endif %}
+ {%- endif %}
+ {%- if loop.index0 > ns.last_user_index and reasoning_content -%}
+ {{ '\n<think>' + reasoning_content.strip() + '</think>'}}
+ {%- else -%}
+ {{ '\n<think></think>' }}
+ {%- endif -%}
+ {%- if content.strip() -%}
+ {{ '\n' + content.strip() }}
+ {%- endif -%}
+ {% if m.tool_calls %}
+ {% for tc in m.tool_calls %}
+ {%- if tc.function %}
+ {%- set tc = tc.function %}
+ {%- endif %}
+ {{ '\n<tool_call>' + tc.name }}
+ {% set _args = tc.arguments %}
+ {% for k, v in _args.items() %}
+ <arg_key>{{ k }}</arg_key>
+ <arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>
+ {% endfor %}
+ </tool_call>{% endfor %}
+ {% endif %}
+ {%- elif m.role == 'tool' -%}
+ {%- if m.content is string -%}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|observation|>' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {{- m.content }}
+ {{- '\n</tool_response>' }}
+ {% elif m.content is iterable and m.content is not mapping %}
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+ {{- '<|observation|>' }}
+ {%- endif %}
+ {{- '\n<tool_response>\n' }}
+ {%- for tr in m.content -%}
+ {%- if tr is mapping and tr.type is defined -%}
+ {%- set t = tr.type | lower -%}
+ {%- if t == 'text' and tr.text is defined -%}
+ {{ tr.text }}
+ {%- elif t in ['image', 'image_url'] -%}
+ <|begin_of_image|><|image|><|end_of_image|>
+ {%- elif t in ['video', 'video_url'] -%}
+ <|begin_of_video|><|video|><|end_of_video|>
+ {%- else -%}
+ {{ tr | tojson(ensure_ascii=False) }}
+ {%- endif -%}
+ {%- else -%}
+ {{ tr.output if tr.output is defined else tr }}
+ {%- endif -%}
+ {%- endfor -%}
+ {{- '\n</tool_response>' }}
+ {%- else -%}
+ <|observation|>{% for tr in m.content %}
+
+ <tool_response>
+ {{ tr.output if tr.output is defined else tr }}
+ </tool_response>{% endfor -%}
+ {% endif -%}
+ {# ====== end of tool-response logic ====== #}
+ {%- elif m.role == 'system' -%}
+ <|system|>
+ {{ visible_text(m.content) }}
+ {%- endif -%}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ <|assistant|>
+ {{'<think></think>\n' if (enable_thinking is defined and not enable_thinking) else ''}}
+ {%- endif -%}
config.json ADDED
@@ -0,0 +1,80 @@
+ {
+   "architectures": [
+     "Glm4vMoeForConditionalGeneration"
+   ],
+   "model_type": "glm4v_moe",
+   "text_config": {
+     "attention_bias": true,
+     "attention_dropout": 0.0,
+     "dtype": "bfloat16",
+     "eos_token_id": [
+       151329,
+       151336,
+       151338
+     ],
+     "first_k_dense_replace": 1,
+     "head_dim": 128,
+     "hidden_act": "silu",
+     "hidden_size": 4096,
+     "initializer_range": 0.02,
+     "intermediate_size": 10944,
+     "max_position_embeddings": 131072,
+     "model_type": "glm4v_moe_text",
+     "moe_intermediate_size": 1408,
+     "n_group": 1,
+     "n_routed_experts": 128,
+     "n_shared_experts": 1,
+     "norm_topk_prob": true,
+     "num_attention_heads": 96,
+     "num_experts_per_tok": 8,
+     "num_hidden_layers": 46,
+     "num_key_value_heads": 8,
+     "num_nextn_predict_layers": 0,
+     "pad_token_id": 151329,
+     "partial_rotary_factor": 0.5,
+     "qk_layernorm": false,
+     "rms_norm_eps": 1e-05,
+     "rope_parameters": {
+       "mrope_section": [
+         8,
+         12,
+         12
+       ],
+       "partial_rotary_factor": 0.5,
+       "rope_theta": 500000,
+       "rope_type": "default"
+     },
+     "routed_scaling_factor": 1.0,
+     "topk_group": 1,
+     "use_cache": true,
+     "use_qk_norm": false,
+     "vocab_size": 151552
+   },
+   "tie_word_embeddings": false,
+   "transformers_version": "5.0.0rc0",
+   "image_start_token_id": 151339,
+   "image_end_token_id": 151340,
+   "video_start_token_id": 151341,
+   "video_end_token_id": 151342,
+   "image_token_id": 151363,
+   "video_token_id": 151364,
+   "vision_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "depth": 24,
+     "hidden_act": "silu",
+     "hidden_dropout_prob": 0.0,
+     "hidden_size": 1536,
+     "image_size": 336,
+     "in_channels": 3,
+     "initializer_range": 0.02,
+     "intermediate_size": 10944,
+     "model_type": "glm4v_moe_vision",
+     "num_heads": 12,
+     "out_hidden_size": 4096,
+     "patch_size": 14,
+     "rms_norm_eps": 1e-05,
+     "spatial_merge_size": 2,
+     "temporal_patch_size": 2
+   }
+ }
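
As a quick consistency check (my arithmetic, not repo code): the attention projection shapes in the README's tensor listing follow directly from this config.

```python
# Derive the q/k/v projection widths from config.json and compare them with
# the shapes in the README's tensor listing.
num_attention_heads = 96
num_key_value_heads = 8
head_dim = 128
hidden_size = 4096

q_out = num_attention_heads * head_dim   # 96 * 128 = 12288 -> q_proj.weight [12288, 4096]
kv_out = num_key_value_heads * head_dim  # 8 * 128 = 1024  -> k/v_proj.weight [1024, 4096]
assert (q_out, kv_out, hidden_size) == (12288, 1024, 4096)
```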
convert.py ADDED
@@ -0,0 +1,259 @@
+ #!/usr/bin/env python3
+ """
+ Graft INTELLECT-3 language model weights into GLM-4.6V vision-language model.
+
+ This script:
+ 1. Loads both models into CPU memory
+ 2. Copies model.layers.* from INTELLECT-3 to model.language_model.layers.* in GLM-4.6V
+ 3. Copies model.norm.weight from INTELLECT-3 to model.language_model.norm.weight in GLM-4.6V
+ 4. Saves the resulting model to a new directory
+
+ Does NOT touch:
+ - model.language_model.embed_tokens (needed for vision token compatibility)
+ - lm_head (kept aligned with embed_tokens)
+ - model.visual.* (vision encoder preserved)
+ """
+
+ import os
+ import argparse
+ import json
+ import shutil
+ from pathlib import Path
+ from safetensors import safe_open
+ from safetensors.torch import save_file
+ import torch
+ from tqdm import tqdm
+
+
+ def get_safetensor_files(model_dir: Path) -> list[Path]:
+     """Get all safetensor files in a model directory."""
+     files = sorted(model_dir.glob("*.safetensors"))
+     if not files:
+         raise FileNotFoundError(f"No safetensor files found in {model_dir}")
+     return files
+
+
+ def load_state_dict_from_safetensors(model_dir: Path) -> dict[str, torch.Tensor]:
+     """Load all tensors from safetensor files into a state dict."""
+     state_dict = {}
+     files = get_safetensor_files(model_dir)
+
+     for f in tqdm(files, desc=f"Loading {model_dir.name}"):
+         with safe_open(f, framework="pt", device="cpu") as st:
+             for key in st.keys():
+                 state_dict[key] = st.get_tensor(key)
+
+     return state_dict
+
+
+ def graft_weights(
+     intellect3_state: dict[str, torch.Tensor],
+     glm_state: dict[str, torch.Tensor]
+ ) -> dict[str, torch.Tensor]:
+     """
+     Graft INTELLECT-3 weights into GLM-4.6V state dict.
+
+     Mapping:
+     - model.layers.* -> model.language_model.layers.*
+     - model.norm.weight -> model.language_model.norm.weight
+     """
+     grafted_state = dict(glm_state)  # shallow copy
+
+     grafted_count = 0
+     skipped_keys = []
+
+     for intellect_key, tensor in tqdm(intellect3_state.items(), desc="Grafting weights"):
+         # Skip embed_tokens and lm_head from INTELLECT-3
+         if "embed_tokens" in intellect_key or "lm_head" in intellect_key:
+             skipped_keys.append(intellect_key)
+             continue
+
+         # Map model.layers.* -> model.language_model.layers.*
+         if intellect_key.startswith("model.layers."):
+             glm_key = intellect_key.replace("model.layers.", "model.language_model.layers.")
+         # Map model.norm.weight -> model.language_model.norm.weight
+         elif intellect_key == "model.norm.weight":
+             glm_key = "model.language_model.norm.weight"
+         else:
+             skipped_keys.append(intellect_key)
+             continue
+
+         # Verify the key exists in GLM and shapes match
+         if glm_key not in grafted_state:
+             print(f"WARNING: {glm_key} not found in GLM-4.6V state dict!")
+             continue
+
+         if grafted_state[glm_key].shape != tensor.shape:
+             print(f"WARNING: Shape mismatch for {glm_key}:")
+             print(f"  INTELLECT-3: {tensor.shape}")
+             print(f"  GLM-4.6V: {grafted_state[glm_key].shape}")
+             continue
+
+         grafted_state[glm_key] = tensor
+         grafted_count += 1
+
+     print(f"\nGrafted {grafted_count} tensors from INTELLECT-3")
+     print(f"Skipped {len(skipped_keys)} tensors: {skipped_keys[:5]}{'...' if len(skipped_keys) > 5 else ''}")
+
+     return grafted_state
+
+
+ def save_state_dict_to_safetensors(
+     state_dict: dict[str, torch.Tensor],
+     output_dir: Path,
+     max_shard_size: int = 5 * 1024 ** 3  # 5GB default
+ ):
+     """Save state dict to sharded safetensor files."""
+     output_dir.mkdir(parents=True, exist_ok=True)
+
+     # Calculate total size and plan shards
+     tensors_by_size = [(k, v, v.numel() * v.element_size()) for k, v in state_dict.items()]
+     total_size = sum(size for _, _, size in tensors_by_size)
+
+     print(f"\nTotal model size: {total_size / 1024**3:.2f} GB")
+
+     # Create shards
+     shards = []
+     current_shard = {}
+     current_size = 0
+
+     for key, tensor, size in tensors_by_size:
+         if current_size + size > max_shard_size and current_shard:
+             shards.append(current_shard)
+             current_shard = {}
+             current_size = 0
+
+         current_shard[key] = tensor
+         current_size += size
+
+     if current_shard:
+         shards.append(current_shard)
+
+     print(f"Saving to {len(shards)} shard(s)...")
+
+     # Save shards and build index
+     weight_map = {}
+
+     for i, shard in enumerate(tqdm(shards, desc="Saving shards")):
+         if len(shards) == 1:
+             filename = "model.safetensors"
+         else:
+             filename = f"model-{i+1:05d}-of-{len(shards):05d}.safetensors"
+
+         filepath = output_dir / filename
+         save_file(shard, filepath)
+
+         for key in shard.keys():
+             weight_map[key] = filename
+
+     # Save index if sharded
+     if len(shards) > 1:
+         index = {
+             "metadata": {"total_size": total_size},
+             "weight_map": weight_map
+         }
+         with open(output_dir / "model.safetensors.index.json", "w") as f:
+             json.dump(index, f, indent=2)
+
+     return weight_map
+
+
+ def copy_config_files(src_dir: Path, dst_dir: Path):
+     """Copy config files from source to destination."""
+     config_files = [
+         "config.json",
+         "tokenizer.json",
+         "tokenizer_config.json",
+         "special_tokens_map.json",
+         "generation_config.json",
+         "preprocessor_config.json",
+         "chat_template.json",
+     ]
+
+     for filename in config_files:
+         src_file = src_dir / filename
+         if src_file.exists():
+             shutil.copy2(src_file, dst_dir / filename)
+             print(f"Copied {filename}")
+
+
+ def main():
+     parser = argparse.ArgumentParser(
+         description="Graft INTELLECT-3 weights into GLM-4.6V"
+     )
+     parser.add_argument(
+         "--intellect3",
+         type=Path,
+         default=Path.home() / "models" / "INTELLECT-3",
+         help="Path to INTELLECT-3 model directory"
+     )
+     parser.add_argument(
+         "--glm",
+         type=Path,
+         default=Path.home() / "models" / "GLM-4.6V",
+         help="Path to GLM-4.6V model directory"
+     )
+     parser.add_argument(
+         "--output",
+         type=Path,
+         default=Path.home() / "models" / "INTELLECT-3-V",
+         help="Path to output directory"
+     )
+     parser.add_argument(
+         "--shard-size",
+         type=int,
+         default=5,
+         help="Maximum shard size in GB (default: 5)"
+     )
+
+     args = parser.parse_args()
+
+     print("=" * 60)
+     print("INTELLECT-3 -> GLM-4.6V Weight Grafting")
+     print("=" * 60)
+     print(f"INTELLECT-3 source: {args.intellect3}")
+     print(f"GLM-4.6V source: {args.glm}")
+     print(f"Output directory: {args.output}")
+     print("=" * 60)
+
+     # Verify source directories exist
+     if not args.intellect3.exists():
+         raise FileNotFoundError(f"INTELLECT-3 directory not found: {args.intellect3}")
+     if not args.glm.exists():
+         raise FileNotFoundError(f"GLM-4.6V directory not found: {args.glm}")
+
+     # Load both models
+     print("\nStep 1: Loading models into CPU memory...")
+     intellect3_state = load_state_dict_from_safetensors(args.intellect3)
+     glm_state = load_state_dict_from_safetensors(args.glm)
+
+     print(f"\nINTELLECT-3 tensors: {len(intellect3_state)}")
+     print(f"GLM-4.6V tensors: {len(glm_state)}")
+
+     # Graft weights
+     print("\nStep 2: Grafting INTELLECT-3 weights into GLM-4.6V...")
+     grafted_state = graft_weights(intellect3_state, glm_state)
+
+     # Free memory from source models
+     del intellect3_state
+     del glm_state
+
+     # Save grafted model
+     print("\nStep 3: Saving grafted model...")
+     save_state_dict_to_safetensors(
+         grafted_state,
+         args.output,
+         max_shard_size=args.shard_size * 1024 ** 3
+     )
+
+     # Copy config files from GLM-4.6V (since we're keeping its architecture)
+     print("\nStep 4: Copying config files from GLM-4.6V...")
+     copy_config_files(args.glm, args.output)
+
+     print("\n" + "=" * 60)
+     print("Done! Grafted model saved to:", args.output)
+     print("=" * 60)
+
+
+ if __name__ == "__main__":
+     main()
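
After running the script, a cheap spot check is to read one grafted tensor back from the output shards and confirm its shape and dtype against the listings above. A sketch (my addition, not part of convert.py; it uses the README's default output path, and the expected shape comes from the README's tensor listing):

```python
import json
from pathlib import Path
from safetensors import safe_open

out_dir = Path.home() / "models" / "INTELLECT-3-V"
index = json.loads((out_dir / "model.safetensors.index.json").read_text())

# Locate the shard holding one grafted layer weight via the index's weight_map.
key = "model.language_model.layers.0.mlp.down_proj.weight"
with safe_open(out_dir / index["weight_map"][key], framework="pt", device="cpu") as st:
    t = st.get_tensor(key)

print(key, tuple(t.shape), t.dtype)  # expect (4096, 10944) torch.bfloat16
```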
generation_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "_from_model_config": true,
+   "do_sample": true,
+   "eos_token_id": [
+     151329,
+     151336,
+     151338
+   ],
+   "pad_token_id": 151329,
+   "top_p": 0.6,
+   "temperature": 0.8,
+   "top_k": 2,
+   "transformers_version": "5.0.0rc0"
+ }
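
These sampling defaults (note the unusually tight `top_k` of 2) are loaded automatically by `generate()`. A small sketch of inspecting and overriding them (the local path is an assumption carried over from the README):

```python
import os
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained(os.path.expanduser("~/models/INTELLECT-3-V"))
print(gen_cfg.do_sample, gen_cfg.temperature, gen_cfg.top_p, gen_cfg.top_k)  # True 0.8 0.6 2

gen_cfg.top_k = 50  # example override; pass generation_config=gen_cfg to model.generate()
```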
model-00002-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:04e16916549c3f2c5beaaa8ce76c48a73260af7e0a019ea96adbbe3ffca2923b
+ size 5363575312
model-00003-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:294d4ade39bb96ef966cc241c190b17bff37210a1dd6ea41a53620c598214309
+ size 5363619592
model-00004-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:070a9a69f9cceeb0514e07e565c0bca22ef02412b5ea67083470df9d9f862f02
+ size 5363575224
model-00005-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c6c7badf80f149045fd76946a03c4e152b1a9490606bde43b0e478249752e41
+ size 5363575224
model-00006-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43ae1a55c7a4f08899c926f2dc69ca09cbec5c0ee5066d043eb8df36a5b6d544
+ size 5363575224
model-00007-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9d57281997daedd2cd8bfb64bef13d01fc48fa69d8ae627d35d0dece2803704a
+ size 5363575272
model-00008-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:acb579063d3e0aef416c935469aeeb4ef71121bbddf65c75bcfd04b5186b902d
+ size 5363566960
model-00009-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5537541ed849d1aec5a03ba3f6fbd412420ecc123baf6407392ebf9ec79a1778
+ size 5363583992
model-00010-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:910065ed4f943bde57e99b614995a025d22083f9dba9e74a78f5b0b08cf613a5
+ size 5363620024
model-00011-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38d068184dedf2e75febd4500f7dbac3c304da64a916a5504325055bbb0db36b
+ size 5363575680
model-00012-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:62e8f7bb30f6f67beea4a9c046f2757ea9b6063a3e43c9751be12300caf4acac
+ size 5363575680
model-00013-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b05c10ffd6afdce59491acbae09d8d6276210aca5aef1f6e7a54bcc48aa15750
+ size 5363575688
model-00014-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a92ae88cb4cec2c5edae04e7a8ba515892131b9294e212d3e5297c6e7e6a3e56
+ size 5363575728
model-00015-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6fd86d501d81c538c6215c096318a4d0813ac149eacf6b8dc32384ed3a6580fc
+ size 5363575736
model-00016-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8e53b380fdb8442cf5199a8d099fbf633f7cc201030c4cbbec7a24e76636687
+ size 5361489896
model-00017-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7dd04e94143cc3ab4959e7be4e99b725dcd6dfb6be448d5a71827ede6434e7ce
+ size 5365705936
model-00018-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:34921251b4117d709f9b870def609c06a0ee19a947591326740846f3e395c29a
+ size 5363575680
model-00019-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfac271a321ffdf71f18339d68ca9a963b561e66482939d27037b037b6f591e7
+ size 5363575680
model-00020-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc665d4223264593cdfb236f9983d35cbfb8d22b75659cfd70cf812392fa1560
+ size 5363575696
model-00021-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e24a876ca2551d8f26bcbde6ab69eb0912b7d381c3d7032c44c85780c49c3ce8
+ size 5363575736
model-00022-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2fe3b04644e856b30b50fd58f821458166b65807962d8d8afbd9789b063b96c8
+ size 5363575736
model-00023-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:294316579802e473bc9067b5d63ff3e3c3336b2601fba86b0fcbc32a61dfca6d
+ size 5280748552
model-00024-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:abbe67f9fe5867bd046d7952f035ea4a317695427ed09732da863a49f10f120f
+ size 5365705928
model-00025-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c550e04a34e8996ff55eb04e5f07131dbeeeda8b6ce8c261262b3bf010ac6ee
+ size 5363575680
model-00026-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51242ebda76d87c6b6554e89646efe52d5378c9c4151ccaa5682b241114f892a
+ size 5363575680
model-00027-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61ba10028421f788dd6bda5f923a7070a14da5ca94829e851eff5eec8e63d750
+ size 5363575696
model-00028-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2465c61939b44de2a1c01e813784b1f5ed13174a31482e9ef27173d7c956cdf0
+ size 5363575728
model-00029-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae8a9fe785fe86f299c3fd9a54c708eb3f7bb26c63f9ad15761776d23d6cf3c9
+ size 5363575736
model-00030-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8f3d581024b5782391e7a848ab952e663a59ce93c62df55f9209e58e270b214
+ size 5361478264
model-00031-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4ab10f73490a4d676b90eb06ea7b402fbbc8145c9cc3de9f2b2170b5b8a4739
+ size 5365717520
model-00032-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0a222975edd317613a379b812f92d05ee00bb414af3fda0437ab0474ddc1c617
+ size 5363575680
model-00033-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f02049b5ce95890500ce992b9cd75282e89e5ad276d5c1f6999336d4794eabf
+ size 5363575680
model-00034-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:088a6843e5334436c9f85679a029b4b471f6f1b85238175b59bce1d27bf43cdb
+ size 5363575704
model-00035-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:194df5d2ce09178c02104cc3a23823b739c458b7b9504b389461ce45f6f661e3
+ size 5363575728
model-00036-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c5dee2d30f392e6ebe05c3569d12f3dacf3bf3b71c2e0c0ab130a1d1fe1e36e
+ size 5363575736
model-00037-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:220dd8e4c825621a54251220ffa87d02080cd4e0955b29dcc61f050643778f38
+ size 5309059384
model-00039-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:15e9c3446f8a3d34e9f7112e72532fa026d52738376269862db7c93ab5e1eb76
+ size 5363567360
model-00040-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c482478e5d4f5e0e4d77ba4cb4f0ba75a749c94e526dee1ef6c1e5281ebcf22
+ size 5360945920
model-00041-of-00041.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ea489fae1616921703724208cf7c6b9aa2ef4eae0e0acc06f1e456f5cdd8646
+ size 1023744840
model.safetensors.index.json ADDED
The diff for this file is too large to render.
preprocessor_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "size": {"shortest_edge": 12544, "longest_edge": 9633792},
+   "do_rescale": true,
+   "patch_size": 14,
+   "temporal_patch_size": 2,
+   "merge_size": 2,
+   "image_mean": [0.48145466, 0.4578275, 0.40821073],
+   "image_std": [0.26862954, 0.26130258, 0.27577711],
+   "image_processor_type": "Glm46VImageProcessor",
+   "processor_class": "Glm46VProcessor"
+ }
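
If the `size` bounds here are total pixel areas (the Qwen2-VL-style convention these processors typically follow; this is my reading, not stated in the file), the visual token count per image works out as below.

```python
# Visual-token arithmetic from this config (assumption: shortest_edge /
# longest_edge bound the total pixel area, Qwen2-VL style).
patch_size, merge_size = 14, 2

def image_tokens(height: int, width: int) -> int:
    # 14x14 patches, then a 2x2 spatial merge in the downsampling convolution.
    return (height // patch_size) * (width // patch_size) // (merge_size ** 2)

print(image_tokens(336, 336))  # 24 * 24 = 576 patches -> 576 / 4 = 144 tokens
# 12544 = 112 * 112 is the minimum area; the 576-patch grid for the 336x336
# image_size matches the visual position_embedding shape [576, 1536] in the
# README's tensor listing.
```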
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ torch
+ safetensors
+ tqdm
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9340665016419c825c4bdabbcc9acc43b7ca2c68ce142724afa829abb1be5efd
+ size 19970699
tokenizer_config.json ADDED
@@ -0,0 +1,327 @@
+ {
+   "added_tokens_decoder": {
+     "151329": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151330": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151331": {
+       "content": "[gMASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151332": {
+       "content": "[sMASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151333": {
+       "content": "<sop>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151334": {
+       "content": "<eop>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151335": {
+       "content": "<|system|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151336": {
+       "content": "<|user|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151337": {
+       "content": "<|assistant|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151338": {
+       "content": "<|observation|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151339": {
+       "content": "<|begin_of_image|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151340": {
+       "content": "<|end_of_image|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151341": {
+       "content": "<|begin_of_video|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151342": {
+       "content": "<|end_of_video|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151343": {
+       "content": "<|begin_of_audio|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151344": {
+       "content": "<|end_of_audio|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151345": {
+       "content": "<|begin_of_transcription|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151346": {
+       "content": "<|end_of_transcription|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151347": {
+       "content": "<|code_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151348": {
+       "content": "<|code_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151349": {
+       "content": "<|code_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151350": {
+       "content": "<think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151351": {
+       "content": "</think>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151352": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151353": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151354": {
+       "content": "<tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151355": {
+       "content": "</tool_response>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151356": {
+       "content": "<arg_key>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151357": {
+       "content": "</arg_key>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151358": {
+       "content": "<arg_value>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151359": {
+       "content": "</arg_value>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151360": {
+       "content": "/nothink",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151361": {
+       "content": "<|begin_of_box|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151362": {
+       "content": "<|end_of_box|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151363": {
+       "content": "<|image|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151364": {
+       "content": "<|video|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|endoftext|>",
+     "[MASK]",
+     "[gMASK]",
+     "[sMASK]",
+     "<sop>",
+     "<eop>",
+     "<|system|>",
+     "<|user|>",
+     "<|assistant|>",
+     "<|observation|>",
+     "<|begin_of_image|>",
+     "<|end_of_image|>",
+     "<|begin_of_video|>",
+     "<|end_of_video|>",
+     "<|begin_of_audio|>",
+     "<|end_of_audio|>",
+     "<|image|>",
+     "<|video|>",
+     "<|begin_of_transcription|>",
+     "<|end_of_transcription|>",
+     "<|code_prefix|>",
+     "<|code_middle|>",
+     "<|code_suffix|>",
+     "/nothink"
+   ],
+   "clean_up_tokenization_spaces": false,
+   "do_lower_case": false,
+   "eos_token": "<|endoftext|>",
+   "extra_special_tokens": {},
+   "model_max_length": 128000,
+   "pad_token": "<|endoftext|>",
+   "padding_side": "left",
+   "remove_space": false,
+   "tokenizer_class": "PreTrainedTokenizer"
+ }
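
Finally, the multimodal token ids declared in `config.json` can be cross-checked against this tokenizer config. A sketch (my addition; expected ids are taken verbatim from `config.json` and the `added_tokens_decoder` above, and the local path is again an assumption):

```python
import os
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(os.path.expanduser("~/models/INTELLECT-3-V"))

# Ids from config.json (image/video tokens) and added_tokens_decoder above.
expected = {
    "<|endoftext|>": 151329,
    "<|begin_of_image|>": 151339,
    "<|end_of_image|>": 151340,
    "<|image|>": 151363,
    "<|video|>": 151364,
}
for token, token_id in expected.items():
    assert tok.convert_tokens_to_ids(token) == token_id, token
print("token ids match config.json")
```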