How to use Reza2kn/Cosmos3-Nano-MLX-4bit with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Cosmos3-Nano-MLX-4bit Reza2kn/Cosmos3-Nano-MLX-4bit
Upload folder using huggingface_hub
- .gitattributes +9 -0
- README.md +66 -0
- mlx_cosmos3.py +193 -0
- mlx_pipeline.py +113 -0
- model_index.json +28 -0
- samples/anime.png +3 -0
- samples/barista.png +3 -0
- samples/city.png +3 -0
- samples/food.png +3 -0
- samples/panda.png +3 -0
- samples/portrait.png +3 -0
- samples/t2v_f0.png +0 -0
- samples/t2v_f16.png +3 -0
- samples/t2v_f8.png +3 -0
- samples/t2v_waves.mp4 +0 -0
- scheduler/scheduler_config.json +33 -0
- sound_tokenizer/config.json +64 -0
- sound_tokenizer/diffusion_pytorch_model.safetensors +3 -0
- text_tokenizer/added_tokens.json +28 -0
- text_tokenizer/chat_template.jinja +120 -0
- text_tokenizer/merges.txt +0 -0
- text_tokenizer/special_tokens_map.json +31 -0
- text_tokenizer/tokenizer.json +3 -0
- text_tokenizer/tokenizer_config.json +239 -0
- text_tokenizer/vocab.json +0 -0
- transformer/mlx_quant_config.json +474 -0
- transformer/model-00001-of-00007.safetensors +3 -0
- transformer/model-00002-of-00007.safetensors +3 -0
- transformer/model-00003-of-00007.safetensors +3 -0
- transformer/model-00004-of-00007.safetensors +3 -0
- transformer/model-00005-of-00007.safetensors +3 -0
- transformer/model-00006-of-00007.safetensors +3 -0
- transformer/model-00007-of-00007.safetensors +3 -0
- vae/config.json +129 -0
- vae/diffusion_pytorch_model.safetensors +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+samples/anime.png filter=lfs diff=lfs merge=lfs -text
+samples/barista.png filter=lfs diff=lfs merge=lfs -text
+samples/city.png filter=lfs diff=lfs merge=lfs -text
+samples/food.png filter=lfs diff=lfs merge=lfs -text
+samples/panda.png filter=lfs diff=lfs merge=lfs -text
+samples/portrait.png filter=lfs diff=lfs merge=lfs -text
+samples/t2v_f16.png filter=lfs diff=lfs merge=lfs -text
+samples/t2v_f8.png filter=lfs diff=lfs merge=lfs -text
+text_tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,66 @@
---
license: other
license_name: openmdw-1.1
license_link: https://openmdw.ai/license/1-1/
base_model: nvidia/Cosmos3-Nano
base_model_relation: quantized
library_name: mlx
pipeline_tag: text-to-image
tags: [cosmos, cosmos3, mlx, apple-silicon, 4-bit, quantization, text-to-image]
---

# Cosmos3-Nano: MLX 4-bit (Apple Silicon)

A **4-bit MLX** build of [`nvidia/Cosmos3-Nano`](https://huggingface.co/nvidia/Cosmos3-Nano) that
**runs on Apple Silicon**: not just quantized weights, but a working text2image model. The custom
Cosmos3 omni-MoT diffusion transformer was ported to MLX from scratch (`mlx-vlm` has no support for
this architecture), and every block was validated against the reference torch implementation.

> Derivative of `nvidia/Cosmos3-Nano`. © NVIDIA. Distributed under **OpenMDW-1.1** (license and NVIDIA
> copyright/origin notices retained). Not affiliated with, nor endorsed by, NVIDIA.

## Highlights
- **Transformer: 30.3 GB bf16 → 12.1 GB MLX 4-bit** (468 attention and MLP linears quantized with group size 64; embeddings, norms, and lm_head kept in bf16).
- **~11 GB peak memory**, so it fits on a 16 GB Mac. About 12 s for a 256² image on an M2 Ultra; higher resolutions take longer.
- **Validated** against torch module by module: primitives match to ~1e-6, a full decoder layer to ~1e-3 (bf16), and patchify is bit-exact.

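As a back-of-envelope check on the sizes above: with affine group quantization, each 4-bit weight also carries a share of one scale and one bias per group of 64. Assuming both are stored as 16-bit values (an assumption; the README does not say), the effective cost is 4 + 32/64 = 4.5 bits per weight. The sketch below solves for the fraction of the original bf16 bytes that would have to stay unquantized to land on the reported 12.1 GB. It is arithmetic on the published figures, not a measurement.

```python
def effective_bits(bits: int = 4, group_size: int = 64, scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits stored per weight: the packed code plus its share of the per-group scale and bias (assumed 16-bit)."""
    return bits + (scale_bits + bias_bits) / group_size


def unquantized_fraction(bf16_gb: float, quant_gb: float, bits_per_weight: float) -> float:
    """Solve f * B + (1 - f) * B * r == Q for f, where r = bits_per_weight / 16 is the shrink factor."""
    r = bits_per_weight / 16.0
    return (quant_gb - bf16_gb * r) / (bf16_gb * (1.0 - r))


bpw = effective_bits()                        # bits per quantized weight under the assumption above
all_quantized_gb = 30.3 * bpw / 16.0          # size if every bf16 weight were quantized
f = unquantized_fraction(30.3, 12.1, bpw)     # share of bf16 bytes that must stay unquantized

print(f"{bpw} bits/weight, {all_quantized_gb:.2f} GB if fully quantized, ~{f:.0%} of bytes kept in bf16")
```

A residual of roughly one sixth of the bytes is consistent with (though not proof of) the embeddings, norms, and lm_head being kept in bf16; the ratio is unit-free, so it holds whether the README's GB are decimal or binary.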
## Usage
```python
import sys
import torch
from huggingface_hub import snapshot_download
from diffusers import Cosmos3OmniPipeline, AutoencoderKLWan, UniPCMultistepScheduler
from diffusers.models.autoencoders.autoencoder_cosmos3_audio import Cosmos3AVAEAudioTokenizer
from transformers import AutoTokenizer

repo = snapshot_download("Reza2kn/Cosmos3-Nano-MLX-4bit")
sys.path.insert(0, repo)  # mlx_pipeline.py and mlx_cosmos3.py are included in this repo
import mlx_pipeline
from mlx_pipeline import MLXCosmos3Transformer

# MLXCosmos3Transformer reads transformer/config.json from mlx_pipeline.NANO, which is hard-coded to the
# author's local copy of nvidia/Cosmos3-Nano; this repo does not ship that file, so fetch it from the base model.
mlx_pipeline.NANO = snapshot_download("nvidia/Cosmos3-Nano", allow_patterns=["transformer/config.json"])

vae = AutoencoderKLWan.from_pretrained(repo, subfolder="vae", torch_dtype=torch.float32).eval()
sched = UniPCMultistepScheduler.from_pretrained(repo, subfolder="scheduler")
tok = AutoTokenizer.from_pretrained(repo, subfolder="text_tokenizer")
st = Cosmos3AVAEAudioTokenizer.from_pretrained(repo, subfolder="sound_tokenizer", torch_dtype=torch.float32).eval()
pipe = Cosmos3OmniPipeline(transformer=MLXCosmos3Transformer(repo + "/transformer"),
                           text_tokenizer=tok, vae=vae, scheduler=sched, sound_tokenizer=st,
                           enable_safety_checker=False)
img = pipe("A red panda astronaut floating in a nebula", num_frames=1,
           height=384, width=384, num_inference_steps=24).video[0][0]
img.save("out.png")
```
**Requires:** `mlx`, `diffusers` (git main / ≥0.39 for Cosmos3), `transformers`, `huggingface_hub`, and `torch` (VAE/scheduler only).
The heavy 16B transformer runs in MLX on the GPU; the small VAE, scheduler, and tokenizers run in torch.

## Quality (honest)
Same profile as any 4-bit build: **clean on typical content** (portraits, scenes, objects, food;
see `samples/`), but **4-bit defects appear on hard anatomy**, e.g. fused or mangled **hands**
(`samples/barista.png`) and broken limbs in complex poses (`samples/anime.png`). PickScore (mean
**21.42**, vs ~21.8 for the CUDA builds) does **not** reliably catch these, so inspect the hard cases by eye.
Use FP8/BF16 if you need hands and complex anatomy to hold up.

## Status / honesty
- **text2image: working** (`samples/*.png`), with the 4-bit anatomy caveats above.
- **text2video: working** (`samples/t2v_waves.mp4`, `num_frames>1`).
- **image2video / audio:** not implemented yet (image-conditioning and sound paths).
- Quantization is 4-bit weight-only: near-original on typical content, with the usual 4-bit wobble on the
  hardest cases (dense hands, on-image text).

## How it was built
`mlx_cosmos3.py` holds the validated MLX modules; `mlx_pipeline.py` is a torch wrapper that routes the transformer
forward pass to MLX while reusing the torch tokenizer, UniPC scheduler, VAE, and CFG. Weights were quantized with
`mx.quantize` (group size 64, 4-bit), streamed shard by shard.
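
The README says the weights were quantized with `mx.quantize` (group size 64, 4-bit), and `linear()` in `mlx_cosmos3.py` consumes them as `(wq, scales, biases)` tuples. The NumPy sketch below illustrates the general idea of affine group quantization: every contiguous group of 64 input weights gets one scale and one bias, and each weight is stored as an integer code in 0..15. It is a plain min/max variant written for illustration only; MLX packs the codes into uint32 words and its exact rounding may differ, so treat this as a conceptual model rather than a re-implementation of `mx.quantize`. The helper names are hypothetical.

```python
import numpy as np


def quantize_groups(w: np.ndarray, group_size: int = 64, bits: int = 4):
    """Min/max affine quantization of each row of w, in contiguous groups along the input axis."""
    out_dim, in_dim = w.shape
    assert in_dim % group_size == 0, "in_dim must be a multiple of group_size"
    g = w.reshape(out_dim, in_dim // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    levels = 2**bits - 1                                  # codes run from 0 to 15 for 4-bit
    scales = (w_max - w_min) / levels
    scales = np.where(scales == 0, 1.0, scales)           # constant group: any nonzero scale works
    codes = np.clip(np.round((g - w_min) / scales), 0, levels).astype(np.uint8)
    return codes, scales, w_min                           # w_min serves as the per-group bias


def dequantize_groups(codes, scales, biases):
    """w_hat = code * scale + bias, with the groups flattened back into rows."""
    g = codes.astype(np.float32) * scales + biases
    return g.reshape(g.shape[0], -1)


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 256)).astype(np.float32)     # [out, in], like a torch Linear weight
x = rng.standard_normal((3, 256)).astype(np.float32)

codes, scales, biases = quantize_groups(w)
w_hat = dequantize_groups(codes, scales, biases)

# Round-to-nearest keeps each weight within half a step (scale / 2) of its original value.
step = np.repeat(scales[..., 0], 64, axis=1)             # per-weight scale, shape [8, 256]
max_err_in_steps = (np.abs(w - w_hat) / step).max()

# A quantized linear computes x @ w_hat.T, the same [out, in] layout that linear() assumes.
y_ref, y_q = x @ w.T, x @ w_hat.T
rel_err = np.linalg.norm(y_ref - y_q) / np.linalg.norm(y_ref)
print(codes.max(), max_err_in_steps, rel_err)
```

The per-weight error is bounded by half a step, but it accumulates across the input dimension in a matmul, which is one plausible reason small 4-bit errors surface on the hardest content described under Quality.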
mlx_cosmos3.py
ADDED
@@ -0,0 +1,193 @@
"""MLX port of the Cosmos3-Nano omni transformer. Built module-by-module, each validated
against the torch reference (validate_primitives.py). Runs the MLX 4-bit weights produced by
mlx_quant.py. WIP — primitives + attention first, then full transformer + pipeline glue."""
import mlx.core as mx
import mlx.nn as nn
QGROUP = 64
QBITS = 4  # set by loader from mlx_quant_config.json


def rms_norm(x, weight, eps):
    # matches diffusers RMSNorm: variance in float32, scale, then * weight
    xf = x.astype(mx.float32)
    var = mx.mean(xf * xf, axis=-1, keepdims=True)
    xf = xf * mx.rsqrt(var + eps)
    return (weight * xf.astype(x.dtype)) if weight is not None else xf.astype(x.dtype)


def silu(x):
    return x * mx.sigmoid(x)


def swiglu_mlp(x, gate_w, up_w, down_w):
    # down(silu(gate(x)) * up(x)); weights are [out,in] (torch Linear) -> x @ w.T
    g = silu(x @ gate_w.T)
    u = x @ up_w.T
    return (g * u) @ down_w.T


def rotate_half(x):
    half = x.shape[-1] // 2
    return mx.concatenate([-x[..., half:], x[..., :half]], axis=-1)


def apply_rope(x, cos, sin):
    # x: [N, heads, head_dim]; cos/sin: [N, head_dim] -> unsqueeze head axis
    cos = mx.expand_dims(cos, 1)
    sin = mx.expand_dims(sin, 1)
    return x * cos + rotate_half(x) * sin


class RoPE3D:
    """Cosmos3VLTextRotaryEmbedding: interleaved 3D mRoPE."""
    def __init__(self, head_dim, rope_theta, rope_axes_dim):
        self.inv_freq = 1.0 / (rope_theta ** (mx.arange(0, head_dim, 2).astype(mx.float32) / head_dim))
        self.rope_axes_dim = rope_axes_dim  # e.g. [24,20,20]

    def _interleave(self, freqs):
        # freqs: [3, N, head_dim//2] -> [N, head_dim//2] interleaving H,W into T grid
        freqs_t = freqs[0]
        for dim, offset in ((1, 1), (2, 2)):  # (axis idx, start offset)
            length = self.rope_axes_dim[dim] * 3
            idx = mx.arange(offset, length, 3)
            # assign freqs_t[..., idx] = freqs[dim][..., idx]
            sel = freqs[dim][..., idx]
            freqs_t[..., idx] = sel
        return freqs_t

    def __call__(self, position_ids):
        # position_ids: [3, N]
        pid = position_ids.astype(mx.float32)  # [3, N]
        inv = self.inv_freq[None, :, None]  # [1, d/2, 1]
        inv = mx.broadcast_to(inv, (3, inv.shape[1], 1))  # [3, d/2, 1]
        pe = pid[:, None, :]  # [3, 1, N]
        freqs = mx.transpose(inv @ pe, (0, 2, 1))  # [3, N, d/2]
        freqs = self._interleave(freqs)  # [N, d/2]
        emb = mx.concatenate([freqs, freqs], axis=-1)  # [N, d]
        return mx.cos(emb), mx.sin(emb)


def gqa_attention(q, k, v, n_heads, n_kv_heads, causal):
    # q:[N,H,D] k,v:[M,Hkv,D]. expand kv groups, scaled-dot-product.
    N, H, D = q.shape
    M = k.shape[0]
    rep = n_heads // n_kv_heads
    k = mx.repeat(k, rep, axis=1)  # [M, H, D]
    v = mx.repeat(v, rep, axis=1)
    q = mx.transpose(q, (1, 0, 2))  # [H, N, D]
    k = mx.transpose(k, (1, 0, 2))  # [H, M, D]
    v = mx.transpose(v, (1, 0, 2))
    scale = 1.0 / (D ** 0.5)
    scores = (q @ mx.transpose(k, (0, 2, 1))) * scale  # [H, N, M]
    if causal:
        mask = mx.triu(mx.full((N, M), -1e9, dtype=scores.dtype), k=1)
        scores = scores + mask
    w = mx.softmax(scores.astype(mx.float32), axis=-1).astype(v.dtype)
    out = w @ v  # [H, N, D]
    return mx.transpose(out, (1, 0, 2))  # [N, H, D]


def dual_attention(und_seq, gen_seq, rope, w, n_heads, n_kv_heads, head_dim, eps):
    """Cosmos3AttnProcessor in MLX: und=causal self-attn, gen=full attn over [und+gen] kv."""
    cos_u, sin_u, cos_g, sin_g = rope
    q_u = (und_seq @ w['to_q'].T).reshape(-1, n_heads, head_dim)
    k_u = (und_seq @ w['to_k'].T).reshape(-1, n_kv_heads, head_dim)
    v_u = (und_seq @ w['to_v'].T).reshape(-1, n_kv_heads, head_dim)
    q_g = (gen_seq @ w['add_q_proj'].T).reshape(-1, n_heads, head_dim)
    k_g = (gen_seq @ w['add_k_proj'].T).reshape(-1, n_kv_heads, head_dim)
    v_g = (gen_seq @ w['add_v_proj'].T).reshape(-1, n_kv_heads, head_dim)
    q_u = rms_norm(q_u, w['norm_q'], eps); k_u = rms_norm(k_u, w['norm_k'], eps)
    q_g = rms_norm(q_g, w['norm_added_q'], eps); k_g = rms_norm(k_g, w['norm_added_k'], eps)
    q_u = apply_rope(q_u, cos_u, sin_u); k_u = apply_rope(k_u, cos_u, sin_u)
    q_g = apply_rope(q_g, cos_g, sin_g); k_g = apply_rope(k_g, cos_g, sin_g)
    causal_out = gqa_attention(q_u, k_u, v_u, n_heads, n_kv_heads, causal=True).reshape(-1, n_heads * head_dim)
    all_k = mx.concatenate([k_u, k_g], axis=0); all_v = mx.concatenate([v_u, v_g], axis=0)
    full_out = gqa_attention(q_g, all_k, all_v, n_heads, n_kv_heads, causal=False).reshape(-1, n_heads * head_dim)
    return causal_out @ w['to_out'].T, full_out @ w['to_add_out'].T


# ---- timestep embedding (diffusers Timesteps + TimestepEmbedding) ----
def get_timestep_embedding(timesteps, dim=256, max_period=10000, downscale_freq_shift=0.0):
    half = dim // 2
    exponent = -mx.log(mx.array(float(max_period))) * mx.arange(half).astype(mx.float32)
    exponent = exponent / (half - downscale_freq_shift)
    emb = mx.exp(exponent)
    emb = timesteps.astype(mx.float32)[:, None] * emb[None, :]
    # flip_sin_to_cos=True -> [cos, sin]
    return mx.concatenate([mx.cos(emb), mx.sin(emb)], axis=-1)


def timestep_embedder(t_emb, l1_w, l1_b, l2_w, l2_b):
    h = silu(t_emb @ l1_w.T + l1_b)
    return h @ l2_w.T + l2_b


# ---- linear that accepts bf16 weight (mx array) or quantized tuple (wq, scales, biases) ----
def linear(x, w, bias=None, group_size=None, bits=None):
    if isinstance(w, tuple):
        wq, scales, biases = w
        out = mx.quantized_matmul(x, wq, scales, biases, transpose=True,
                                  group_size=group_size or QGROUP, bits=bits or QBITS)
    else:
        out = x @ w.T
    return out + bias if bias is not None else out


def decoder_layer(und, gen, rope, P, cfg):
    """One Cosmos3VLTextMoTDecoderLayer in MLX. P = dict of this layer's params (mx arrays or
    quantized tuples). cfg = (n_heads, n_kv, head_dim, eps)."""
    NH, NKV, HD, EPS = cfg
    und_n = rms_norm(und, P["input_layernorm.weight"], EPS)
    gen_n = rms_norm(gen, P["input_layernorm_moe_gen.weight"], EPS)
    cos_u, sin_u, cos_g, sin_g = rope

    def proj(seq, name, nh):
        return linear(seq, P[name]).reshape(-1, nh, HD)
    q_u = proj(und_n, "self_attn.to_q.weight", NH); k_u = proj(und_n, "self_attn.to_k.weight", NKV); v_u = proj(und_n, "self_attn.to_v.weight", NKV)
    q_g = proj(gen_n, "self_attn.add_q_proj.weight", NH); k_g = proj(gen_n, "self_attn.add_k_proj.weight", NKV); v_g = proj(gen_n, "self_attn.add_v_proj.weight", NKV)
    q_u = rms_norm(q_u, P["self_attn.norm_q.weight"], EPS); k_u = rms_norm(k_u, P["self_attn.norm_k.weight"], EPS)
    q_g = rms_norm(q_g, P["self_attn.norm_added_q.weight"], EPS); k_g = rms_norm(k_g, P["self_attn.norm_added_k.weight"], EPS)
    q_u = apply_rope(q_u, cos_u, sin_u); k_u = apply_rope(k_u, cos_u, sin_u)
    q_g = apply_rope(q_g, cos_g, sin_g); k_g = apply_rope(k_g, cos_g, sin_g)
    co = gqa_attention(q_u, k_u, v_u, NH, NKV, True).reshape(-1, NH * HD)
    ak = mx.concatenate([k_u, k_g], axis=0); av = mx.concatenate([v_u, v_g], axis=0)
    fo = gqa_attention(q_g, ak, av, NH, NKV, False).reshape(-1, NH * HD)
    und = und + linear(co, P["self_attn.to_out.weight"])
    gen = gen + linear(fo, P["self_attn.to_add_out.weight"])
    und_m = rms_norm(und, P["post_attention_layernorm.weight"], EPS)
    gen_m = rms_norm(gen, P["post_attention_layernorm_moe_gen.weight"], EPS)
    und = und + linear(silu(linear(und_m, P["mlp.gate_proj.weight"])) * linear(und_m, P["mlp.up_proj.weight"]), P["mlp.down_proj.weight"])
    gen = gen + linear(silu(linear(gen_m, P["mlp_moe_gen.gate_proj.weight"])) * linear(gen_m, P["mlp_moe_gen.up_proj.weight"]), P["mlp_moe_gen.down_proj.weight"])
    return und, gen


# ---- patchify / pack / unpatchify (pure-tensor glue; matches torch methods) ----
def patchify_pack(latent, p, C):
    """latent [C,T,H,W] -> packed [num_patches, p*p*C], (T, hpat, wpat)."""
    _, T, H, W = latent.shape
    Hp = ((H + p - 1) // p) * p; Wp = ((W + p - 1) // p) * p
    if Hp != H or Wp != W:
        pad = mx.zeros((C, T, Hp, Wp), dtype=latent.dtype)
        pad[:, :, :H, :W] = latent; latent = pad
    hpat, wpat = Hp // p, Wp // p
    latent = latent.reshape(C, T, hpat, p, wpat, p)
    latent = mx.einsum("cthpwq->thwpqc", latent).reshape(-1, p * p * C)
    return latent, (T, hpat, wpat)


def unpatchify(packed, token_shape, orig_hw, p, C):
    """packed [num_patches, p*p*C] -> latent [C, T, H, W]."""
    T, hpat, wpat = token_shape
    H, W = orig_hw
    x = packed.reshape(T, hpat, wpat, p, p, C)
    x = mx.einsum("thwpqc->cthpwq", x).reshape(C, T, hpat * p, wpat * p)
    return x[:, :, :H, :W]


def scatter_timestep_single(tokens, t_embed, n_noisy_tokens):
    """t2i / all-noisy single-item case: add the (broadcast) timestep embed to the first
    n_noisy_tokens rows. General multi-frame scatter handled in the pipeline layer."""
    if t_embed.ndim == 1:
        t_embed = mx.broadcast_to(t_embed[None, :], (n_noisy_tokens, tokens.shape[1]))
    tokens[:n_noisy_tokens] = tokens[:n_noisy_tokens] + t_embed
    return tokens
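
`patchify_pack` zero-pads H and W up to a multiple of the patch size, folds each p×p patch of all C channels into one token, and `unpatchify` inverts this and crops the padding. The NumPy mirror below reuses the same reshapes and einsum strings (NumPy's `einsum` accepts them unchanged), so the round trip can be checked without MLX or a Mac. It is an illustrative re-statement, not the `validate_primitives.py` script the module docstring mentions; the `_np` function names are hypothetical.

```python
import numpy as np


def patchify_pack_np(latent: np.ndarray, p: int):
    """[C, T, H, W] -> (tokens [T*hpat*wpat, p*p*C], (T, hpat, wpat)); zero-pads H, W to multiples of p."""
    C, T, H, W = latent.shape
    Hp, Wp = ((H + p - 1) // p) * p, ((W + p - 1) // p) * p
    padded = np.zeros((C, T, Hp, Wp), dtype=latent.dtype)
    padded[:, :, :H, :W] = latent
    hpat, wpat = Hp // p, Wp // p
    x = padded.reshape(C, T, hpat, p, wpat, p)
    x = np.einsum("cthpwq->thwpqc", x).reshape(-1, p * p * C)
    return x, (T, hpat, wpat)


def unpatchify_np(packed: np.ndarray, token_shape, orig_hw, p: int, C: int):
    """Inverse of patchify_pack_np: tokens -> [C, T, H, W] with the padding cropped off."""
    T, hpat, wpat = token_shape
    H, W = orig_hw
    x = packed.reshape(T, hpat, wpat, p, p, C)
    x = np.einsum("thwpqc->cthpwq", x).reshape(C, T, hpat * p, wpat * p)
    return x[:, :, :H, :W]


# C=48 and p=2 mirror C_LAT and P_PATCH in mlx_pipeline.py; the odd 5x7 spatial size forces padding.
latent = np.random.default_rng(0).standard_normal((48, 2, 5, 7)).astype(np.float32)
packed, token_shape = patchify_pack_np(latent, p=2)
restored = unpatchify_np(packed, token_shape, latent.shape[-2:], p=2, C=48)
print(packed.shape, token_shape, np.array_equal(restored, latent))
```

Both directions are pure reshapes and axis permutations, so the round trip is exact, which is consistent with the README's note that patchify validated bit-exact.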
mlx_pipeline.py
ADDED
@@ -0,0 +1,113 @@
"""End-to-end text2image with the MLX 4-bit transformer + torch pipeline orchestration.
A torch nn.Module wrapper routes the transformer forward to MLX; everything else (tokenizer,
UniPC scheduler, CFG, VAE decode) stays in torch (small, fits RAM). The 33GB torch transformer
is never loaded."""
import glob, json, os, sys, time
import numpy as np, torch
import mlx.core as mx
from types import SimpleNamespace
HERE = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, HERE)  # mlx_cosmos3.py ships next to this file
import mlx_cosmos3 as M

# NANO is the author's local copy of nvidia/Cosmos3-Nano; change it for your machine.
# MLXCosmos3Transformer reads NANO/transformer/config.json, and the __main__ demo loads
# the VAE, scheduler, and tokenizers from it.
NANO = "/Users/studio/cosmos_mlx/models/Cosmos3-Nano"
EXPORT = os.path.join(HERE, "transformer")  # this repo's MLX 4-bit shards + mlx_quant_config.json
HID, HD, NH, NKV, NL, EPS = 4096, 128, 32, 8, 36, 1e-6
P_PATCH, C_LAT, AXES, THETA, TS_SCALE = 2, 48, [24, 20, 20], 5e6, 0.001

def _t2m(t):  # torch -> mlx
    return mx.array(t.detach().to(torch.float32).cpu().numpy())
def _m2t(a, dtype=torch.bfloat16):  # mlx -> torch
    return torch.from_numpy(np.array(a.astype(mx.float32))).to(dtype)

class MLXCosmos3Transformer(torch.nn.Module):
    def __init__(self, export_dir):
        super().__init__()
        self.W = {}
        for f in sorted(glob.glob(export_dir + "/*.safetensors")):
            self.W.update(mx.load(f))
        cfgd = json.load(open(NANO + "/transformer/config.json"))
        cfgd = {k: v for k, v in cfgd.items() if not k.startswith("_")}
        self.config = SimpleNamespace(**cfgd)  # real config -> all fields the pipeline reads
        qc = json.load(open(export_dir + "/mlx_quant_config.json"))
        M.QGROUP, M.QBITS = qc.get("group_size", 64), qc.get("bits", 4)  # 4 or 8 bit
        self._dbg = True
        self._dtype = torch.bfloat16
    @property
    def dtype(self): return self._dtype
    @property
    def device(self): return torch.device("cpu")
    def to(self, *a, **k): return self
    def eval(self): return self

    def _lp(self, i):
        # params of layer i; quantized weights become (wq, scales, biases) tuples
        pre = f"layers.{i}."; P = {}
        for k in self.W:
            if not k.startswith(pre) or k.endswith(".scales") or k.endswith(".biases"): continue
            n = k[len(pre):]
            P[n] = (self.W[k], self.W[k + ".scales"], self.W[k + ".biases"]) if k + ".scales" in self.W else self.W[k]
        return P
    def _gv(self, n):
        # single (possibly quantized) weight by full name
        return (self.W[n], self.W[n + ".scales"], self.W[n + ".biases"]) if n + ".scales" in self.W else self.W[n]

    @torch.no_grad()
    def forward(self, input_ids, text_indexes, position_ids, und_len, sequence_length,
                vision_tokens, vision_token_shapes, vision_sequence_indexes, vision_mse_loss_indexes,
                vision_timesteps, vision_noisy_frame_indexes, **sound_kw):
        W = self.W
        ii = mx.array(input_ids.cpu().numpy().astype(np.int32))
        ti = mx.array(text_indexes.cpu().numpy().astype(np.int32))
        vsi = mx.array(vision_sequence_indexes.cpu().numpy().astype(np.int32))
        vmi = mx.array(vision_mse_loss_indexes.cpu().numpy().astype(np.int32))
        pid = mx.array(position_ids.cpu().numpy().astype(np.int32))
        latent = _t2m(vision_tokens[0]).reshape(C_LAT, *vision_tokens[0].shape[-3:])  # [C,T,H,W]
        H, Wd = int(latent.shape[-2]), int(latent.shape[-1])
        if getattr(self, "_dbg", False):
            print(f"[wrapper] vision_tokens[0].shape={tuple(vision_tokens[0].shape)} latent T={latent.shape[1]} "
                  f"seq_len={sequence_length} und_len={und_len} mse_idx={vision_mse_loss_indexes.shape} "
                  f"token_shapes={vision_token_shapes} noisy={[ (x.tolist() if hasattr(x,'tolist') else x) for x in vision_noisy_frame_indexes]}", flush=True)
            self._dbg = False
        tstep = float(vision_timesteps[0].item())

        # text embeddings and patchified, timestep-conditioned vision tokens share one sequence
        emb = W["embed_tokens.weight"][ii]
        hidden = mx.zeros((sequence_length, HID), dtype=emb.dtype)
        hidden[ti] = emb
        packed, shape = M.patchify_pack(latent, P_PATCH, C_LAT)
        packed = M.linear(packed.astype(emb.dtype), self._gv("proj_in.weight"), W["proj_in.bias"])
        te = M.get_timestep_embedding(mx.array([tstep * TS_SCALE]))
        te = M.timestep_embedder(te, W["time_embedder.linear_1.weight"], W["time_embedder.linear_1.bias"],
                                 W["time_embedder.linear_2.weight"], W["time_embedder.linear_2.bias"])[0].astype(emb.dtype)
        packed = M.scatter_timestep_single(packed, te, packed.shape[0])  # t2i: all vision tokens noisy
        hidden[vsi] = packed
        cos, sin = M.RoPE3D(HD, THETA, AXES)(pid)
        cos = cos.astype(emb.dtype); sin = sin.astype(emb.dtype)
        # split into understanding (text) and generation (vision) streams for the MoT layers
        und, gen = hidden[:und_len], hidden[und_len:]
        rope = (cos[:und_len], sin[:und_len], cos[und_len:], sin[und_len:])
        for i in range(NL):
            und, gen = M.decoder_layer(und, gen, rope, self._lp(i), (NH, NKV, HD, EPS)); mx.eval(und, gen)
        und = M.rms_norm(und, W["norm.weight"], EPS); gen = M.rms_norm(gen, W["norm_moe_gen.weight"], EPS)
        last = mx.concatenate([und, gen], axis=0)
        preds = M.linear(last[vmi], self._gv("proj_out.weight"), W["proj_out.bias"])
        out = M.unpatchify(preds, shape, (H, Wd), P_PATCH, C_LAT); mx.eval(out)
        return [_m2t(out, vision_tokens[0].dtype).unsqueeze(0)], None


if __name__ == "__main__":
    from diffusers import Cosmos3OmniPipeline, AutoencoderKLWan, UniPCMultistepScheduler
    from diffusers.models.autoencoders.autoencoder_cosmos3_audio import Cosmos3AVAEAudioTokenizer
    from transformers import AutoTokenizer
    dev = "cpu"
    print("loading components (no torch transformer)...")
    vae = AutoencoderKLWan.from_pretrained(NANO, subfolder="vae", torch_dtype=torch.float32).to(dev).eval()
    sched = UniPCMultistepScheduler.from_pretrained(NANO, subfolder="scheduler")
    tok = AutoTokenizer.from_pretrained(NANO, subfolder="text_tokenizer")
    st = Cosmos3AVAEAudioTokenizer.from_pretrained(NANO, subfolder="sound_tokenizer", torch_dtype=torch.float32).to(dev).eval()
    tf = MLXCosmos3Transformer(EXPORT)
    pipe = Cosmos3OmniPipeline(transformer=tf, text_tokenizer=tok, vae=vae, scheduler=sched,
                               sound_tokenizer=st, enable_safety_checker=False)
    print("generating (MLX 4-bit transformer)...")
    t0 = time.time()
    out = pipe(prompt="A red panda astronaut floating in a nebula, highly detailed", num_frames=1,
               height=256, width=256, num_inference_steps=20, generator=torch.Generator().manual_seed(1))
    img = out.video[0][0] if isinstance(out.video[0], list) else out.video[0]
    img.save("mlx_t2i.png")
    print(f"GENERATED in {time.time()-t0:.0f}s -> mlx_t2i.png ({img.size})")
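
The wrapper builds its rotary tables with `HD=128` and `AXES=[24, 20, 20]`. In `RoPE3D._interleave`, all 64 frequency slots start out as the temporal frequencies; then every third slot from index 1 is overwritten with height frequencies and every third slot from index 2 with width frequencies, each over the first `3 * AXES[axis]` slots. The plain-Python sketch below (no MLX; `mrope_axis_owner` is a hypothetical helper) recomputes which position axis ends up owning each slot, to show that the split comes out to exactly 24/20/20.

```python
def mrope_axis_owner(head_dim: int, axes_dim: list[int]) -> list[str]:
    """Label each of the head_dim // 2 frequency slots with the axis (T/H/W) it ends up using,
    following the (axis, offset) loop in RoPE3D._interleave."""
    owner = ["T"] * (head_dim // 2)                      # freqs_t starts as the temporal frequencies
    for dim, offset, name in ((1, 1, "H"), (2, 2, "W")):
        for i in range(offset, axes_dim[dim] * 3, 3):    # same indices as mx.arange(offset, length, 3)
            owner[i] = name
    return owner


owner = mrope_axis_owner(128, [24, 20, 20])
counts = {axis: owner.count(axis) for axis in "THW"}
print("".join(owner[:12]), counts)
```

The first 60 slots cycle T, H, W and the last four stay temporal, which is where the extra four temporal slots in `[24, 20, 20]` come from.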
model_index.json
ADDED
@@ -0,0 +1,28 @@
{
  "_class_name": "Cosmos3OmniDiffusersPipeline",
  "_diffusers_version": "0.37.1",
  "scheduler": [
    "diffusers",
    "UniPCMultistepScheduler"
  ],
  "text_tokenizer": [
    "transformers",
    "Qwen2TokenizerFast"
  ],
  "transformer": [
    "diffusers",
    "Cosmos3OmniTransformer"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKLWan"
  ],
  "vision_encoder": [
    "transformers",
    "Qwen3VLVisionModel"
  ],
  "sound_tokenizer": [
    "diffusers",
    "Cosmos3AVAEAudioTokenizer"
  ]
}
samples/anime.png
ADDED (Git LFS)
samples/barista.png
ADDED (Git LFS)
samples/city.png
ADDED (Git LFS)
samples/food.png
ADDED (Git LFS)
samples/panda.png
ADDED (Git LFS)
samples/portrait.png
ADDED (Git LFS)
samples/t2v_f0.png
ADDED
samples/t2v_f16.png
ADDED (Git LFS)
samples/t2v_f8.png
ADDED (Git LFS)
samples/t2v_waves.mp4
ADDED
Binary file (98.1 kB).
scheduler/scheduler_config.json
ADDED
@@ -0,0 +1,33 @@
{
  "_class_name": "UniPCMultistepScheduler",
  "_diffusers_version": "0.37.1",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "disable_corrector": [],
  "dynamic_thresholding_ratio": 0.995,
  "final_sigmas_type": "zero",
  "flow_shift": 1.0,
  "lower_order_final": true,
  "num_train_timesteps": 1000,
  "predict_x0": true,
  "prediction_type": "flow_prediction",
  "rescale_betas_zero_snr": false,
  "sample_max_value": 1.0,
  "shift_terminal": null,
  "sigma_max": 200.0,
  "sigma_min": 0.147,
  "solver_order": 2,
  "solver_p": null,
  "solver_type": "bh2",
  "steps_offset": 0,
  "thresholding": false,
  "time_shift_type": "exponential",
  "timestep_spacing": "linspace",
  "trained_betas": null,
  "use_beta_sigmas": false,
  "use_dynamic_shifting": false,
  "use_exponential_sigmas": false,
  "use_flow_sigmas": true,
  "use_karras_sigmas": true
}
sound_tokenizer/config.json
ADDED
@@ -0,0 +1,64 @@
{
  "model_type": "autoencoder_v2",
  "sampling_rate": 48000,
  "stereo": true,
  "use_wav_as_input": true,
  "normalize_volume": true,
  "hop_size": 1920,
  "input_channels": 1,
  "enc_type": "spec_convnext",
  "enc_dim": 192,
  "enc_intermediate_dim": 768,
  "enc_num_layers": 12,
  "enc_num_blocks": 2,
  "enc_n_fft": 64,
  "enc_hop_length": 16,
  "enc_latent_dim": 128,
  "enc_c_mults": [
    1,
    2,
    4
  ],
  "enc_strides": [
    4,
    5,
    6
  ],
  "enc_identity_init": false,
  "enc_use_snake": true,
  "dec_type": "oobleck",
  "dec_dim": 320,
  "dec_c_mults": [
    1,
    2,
    4,
    8,
    16
  ],
  "dec_strides": [
    2,
    4,
    5,
    6,
    8
  ],
  "dec_use_snake": true,
  "dec_final_tanh": false,
  "dec_out_channels": 2,
  "dec_anti_aliasing": false,
  "dec_use_nearest_upsample": false,
  "dec_use_tanh_at_final": false,
  "bottleneck_type": "vae",
  "bottleneck": {
    "type": "vae"
  },
  "activation": "snakebeta",
  "snake_logscale": true,
  "anti_aliasing": false,
  "use_cuda_kernel": false,
  "causal": false,
  "padding_mode": "zeros",
  "vocoder_input_dim": 64,
  "latent_mean": null,
  "latent_std": null
}
|
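In this config, `hop_size` (1920 samples at 48 kHz) matches two other products:

- the product of `dec_strides` (2 × 4 × 5 × 6 × 8);
- `enc_hop_length` times the product of `enc_strides` (16 × 4 × 5 × 6).

The sketch below checks that arithmetic and derives a latent rate of 25 frames per second. It assumes, without confirmation from the model code, that the encoder's STFT hop and strided stages compound multiplicatively and that the decoder upsamples by the product of its strides. It ignores padding at clip boundaries.

```python
from math import prod

# Values copied from sound_tokenizer/config.json.
SAMPLING_RATE = 48000
HOP_SIZE = 1920
ENC_HOP_LENGTH = 16
ENC_STRIDES = [4, 5, 6]
DEC_STRIDES = [2, 4, 5, 6, 8]


def hop_report() -> dict:
    # Assumption: encoder STFT hop and strided stages multiply together,
    # and the decoder upsamples by the product of dec_strides.
    return {
        "encoder_samples_per_latent": ENC_HOP_LENGTH * prod(ENC_STRIDES),
        "decoder_samples_per_latent": prod(DEC_STRIDES),
        "latent_frames_per_second": SAMPLING_RATE / HOP_SIZE,
    }


def latent_frames(num_samples: int) -> int:
    # Ignores any padding the real encoder may apply to a partial final frame.
    return num_samples // HOP_SIZE


report = hop_report()
print(report)
print(latent_frames(10 * SAMPLING_RATE))  # latent frames for 10 s of audio
```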
sound_tokenizer/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9d4c61cde38acfb0cad9048a140c3533750277a8462b19dc08450d9fe1ad9879
size 1892409600
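The safetensors entry is a Git LFS pointer, not the weights themselves. It holds three `key value` lines: the spec version, the SHA-256 object id, and the size in bytes (1,892,409,600, about 1.89 GB). A small stdlib-only parser, offered as a sketch:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ('key value' per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:9d4c61cde38acfb0cad9048a140c3533750277a8462b19dc08450d9fe1ad9879
size 1892409600
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 1e9:.2f} GB")  # 1.89 GB
```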
text_tokenizer/added_tokens.json
ADDED
@@ -0,0 +1,28 @@
{
"</think>": 151668,
"</tool_call>": 151658,
"</tool_response>": 151666,
"<think>": 151667,
"<tool_call>": 151657,
"<tool_response>": 151665,
"<|box_end|>": 151649,
"<|box_start|>": 151648,
"<|endoftext|>": 151643,
"<|file_sep|>": 151664,
"<|fim_middle|>": 151660,
"<|fim_pad|>": 151662,
"<|fim_prefix|>": 151659,
"<|fim_suffix|>": 151661,
"<|im_end|>": 151645,
"<|im_start|>": 151644,
"<|image_pad|>": 151655,
"<|object_ref_end|>": 151647,
"<|object_ref_start|>": 151646,
"<|quad_end|>": 151651,
"<|quad_start|>": 151650,
"<|repo_name|>": 151663,
"<|video_pad|>": 151656,
"<|vision_end|>": 151653,
"<|vision_pad|>": 151654,
"<|vision_start|>": 151652
}
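added_tokens.json maps each of the 26 added tokens to an id, and the ids run contiguously from 151643 to 151668. Below is a sketch that inverts the mapping and reports gaps or duplicate ids. The helper names are hypothetical. The local path assumes the `--local-dir Cosmos3-Nano-MLX-4bit` download command shown at the top of the page.

```python
import json
from pathlib import Path


def invert(token_to_id: dict) -> dict:
    """Turn {token: id} into {id: token}, refusing duplicate ids."""
    id_to_token = {}
    for token, idx in token_to_id.items():
        if idx in id_to_token:
            raise ValueError(f"id {idx} used by {id_to_token[idx]!r} and {token!r}")
        id_to_token[idx] = token
    return id_to_token


def missing_ids(id_to_token: dict) -> list:
    """Ids between the smallest and largest added id that no token uses."""
    lo, hi = min(id_to_token), max(id_to_token)
    return [i for i in range(lo, hi + 1) if i not in id_to_token]


if __name__ == "__main__":
    path = Path("Cosmos3-Nano-MLX-4bit/text_tokenizer/added_tokens.json")
    if path.exists():
        tokens = invert(json.loads(path.read_text(encoding="utf-8")))
        print(len(tokens), missing_ids(tokens))
```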
text_tokenizer/chat_template.jinja
ADDED
@@ -0,0 +1,120 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{%- if messages[0].content is string %}
{{- messages[0].content }}
{%- else %}
{%- for content in messages[0].content %}
{%- if 'text' in content %}
{{- content.text }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].content is string %}
{{- messages[0].content }}
{%- else %}
{%- for content in messages[0].content %}
{%- if 'text' in content %}
{{- content.text }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set image_count = namespace(value=0) %}
{%- set video_count = namespace(value=0) %}
{%- for message in messages %}
{%- if message.role == "user" %}
{{- '<|im_start|>' + message.role + '\n' }}
{%- if message.content is string %}
{{- message.content }}
{%- else %}
{%- for content in message.content %}
{%- if content.type == 'image' or 'image' in content or 'image_url' in content %}
{%- set image_count.value = image_count.value + 1 %}
{%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}
<|vision_start|><|image_pad|><|vision_end|>
{%- elif content.type == 'video' or 'video' in content %}
{%- set video_count.value = video_count.value + 1 %}
{%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}
<|vision_start|><|video_pad|><|vision_end|>
{%- elif 'text' in content %}
{{- content.text }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "assistant" %}
{{- '<|im_start|>' + message.role + '\n' }}
{%- if message.content is string %}
{{- message.content }}
{%- else %}
{%- for content_item in message.content %}
{%- if 'text' in content_item %}
{{- content_item.text }}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and message.content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{%- if message.content is string %}
{{- message.content }}
{%- else %}
{%- for content in message.content %}
{%- if content.type == 'image' or 'image' in content or 'image_url' in content %}
{%- set image_count.value = image_count.value + 1 %}
{%- if add_vision_id %}Picture {{ image_count.value }}: {% endif -%}
<|vision_start|><|image_pad|><|vision_end|>
{%- elif content.type == 'video' or 'video' in content %}
{%- set video_count.value = video_count.value + 1 %}
{%- if add_vision_id %}Video {{ video_count.value }}: {% endif -%}
<|vision_start|><|video_pad|><|vision_end|>
{%- elif 'text' in content %}
{{- content.text }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
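In the common case (no `tools`, string-only content), the template reduces to plain ChatML:

- an optional system turn taken from `messages[0]`;
- one `<|im_start|>role ... <|im_end|>` block per user or assistant turn;
- a trailing `<|im_start|>assistant\n` when `add_generation_prompt` is set.

No default system prompt is injected. System messages after the first are skipped by the main loop. The plain-Python re-implementation below only illustrates that path. It is not part of the repository, and it deliberately rejects tool turns, tool calls, and multimodal content lists rather than guessing at them.

```python
def render_chatml(messages: list, add_generation_prompt: bool = True) -> str:
    """Mimic chat_template.jinja for the no-tools, text-only case (illustrative only)."""
    parts = []
    if messages and messages[0]["role"] == "system":
        if not isinstance(messages[0]["content"], str):
            raise NotImplementedError("list-valued system content is not covered")
        parts.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            continue  # the template emits only messages[0] as a system turn
        if role not in ("user", "assistant") or msg.get("tool_calls"):
            raise NotImplementedError(f"{role!r} turns with tools are not covered by this sketch")
        if not isinstance(content, str):
            raise NotImplementedError("content lists (text/image/video parts) are not covered")
        parts.append("<|im_start|>" + role + "\n" + content + "<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi"},
]))
```

With transformers installed, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` renders the real template. That is the better reference when checking prompts.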
text_tokenizer/merges.txt
ADDED
The diff for this file is too large to render.
text_tokenizer/special_tokens_map.json
ADDED
@@ -0,0 +1,31 @@
{
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"eos_token": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
},
"pad_token": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false
}
}
text_tokenizer/tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
size 11422654
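The LFS pointer records both the SHA-256 and the byte size, so a downloaded tokenizer.json (11,422,654 bytes) can be checked against it. Below is a stdlib sketch that compares the size first and then streams the file in 1 MiB chunks. The local path assumes the `--local-dir` download command shown at the top of the page.

```python
import hashlib
from pathlib import Path

# Values copied from the text_tokenizer/tokenizer.json LFS pointer.
TOKENIZER_SHA256 = "aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4"
TOKENIZER_SIZE = 11422654


def verify_lfs_object(path: Path, expected_sha256: str, expected_size: int,
                      chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != expected_size:
        return False  # cheap check before hashing
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


if __name__ == "__main__":
    local = Path("Cosmos3-Nano-MLX-4bit/text_tokenizer/tokenizer.json")
    if local.exists():
        print(verify_lfs_object(local, TOKENIZER_SHA256, TOKENIZER_SIZE))
```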
text_tokenizer/tokenizer_config.json
ADDED
@@ -0,0 +1,239 @@
{
"add_bos_token": false,
"add_prefix_space": false,
"added_tokens_decoder": {
"151643": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151644": {
"content": "<|im_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151645": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151646": {
"content": "<|object_ref_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151647": {
"content": "<|object_ref_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151648": {
"content": "<|box_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151649": {
"content": "<|box_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151650": {
"content": "<|quad_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151651": {
"content": "<|quad_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151652": {
"content": "<|vision_start|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151653": {
"content": "<|vision_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151654": {
"content": "<|vision_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151655": {
"content": "<|image_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151656": {
"content": "<|video_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"151657": {
"content": "<tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151658": {
"content": "</tool_call>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151659": {
"content": "<|fim_prefix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151660": {
"content": "<|fim_middle|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151661": {
"content": "<|fim_suffix|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151662": {
"content": "<|fim_pad|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151663": {
"content": "<|repo_name|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151664": {
"content": "<|file_sep|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151665": {
"content": "<tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151666": {
"content": "</tool_response>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151667": {
"content": "<think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
},
"151668": {
"content": "</think>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": false
}
},
"additional_special_tokens": [
"<|im_start|>",
"<|im_end|>",
"<|object_ref_start|>",
"<|object_ref_end|>",
"<|box_start|>",
"<|box_end|>",
"<|quad_start|>",
"<|quad_end|>",
"<|vision_start|>",
"<|vision_end|>",
"<|vision_pad|>",
"<|image_pad|>",
"<|video_pad|>"
],
"bos_token": null,
"clean_up_tokenization_spaces": false,
"eos_token": "<|im_end|>",
"errors": "replace",
"extra_special_tokens": {},
"model_max_length": 262144,
"pad_token": "<|endoftext|>",
"split_special_tokens": false,
"tokenizer_class": "Qwen2Tokenizer",
"unk_token": null
}
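In tokenizer_config.json, `eos_token` and `pad_token` are given as token text, and their ids live in `added_tokens_decoder` under string keys. In this file:

- `<|im_end|>` (eos) is 151645;
- `<|endoftext|>` (pad) is 151643;
- ids 151643 to 151656 are flagged `"special": true`;
- ids 151657 to 151668 (tool-call, FIM, repo/file and think markers) are not.

A sketch of resolving those values programmatically; the helper names are hypothetical.

```python
import json
from pathlib import Path


def token_id(config: dict, content: str) -> int:
    """Find the id whose added_tokens_decoder entry has this content."""
    for key, entry in config["added_tokens_decoder"].items():
        if entry["content"] == content:
            return int(key)  # JSON object keys are strings such as "151645"
    raise KeyError(content)


def split_by_special(config: dict) -> tuple:
    """Return (special_ids, non_special_ids) according to each entry's "special" flag."""
    special, plain = [], []
    for key, entry in config["added_tokens_decoder"].items():
        (special if entry["special"] else plain).append(int(key))
    return sorted(special), sorted(plain)


if __name__ == "__main__":
    path = Path("Cosmos3-Nano-MLX-4bit/text_tokenizer/tokenizer_config.json")
    if path.exists():
        cfg = json.loads(path.read_text(encoding="utf-8"))
        print(token_id(cfg, cfg["eos_token"]), token_id(cfg, cfg["pad_token"]))
```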
text_tokenizer/vocab.json
ADDED
The diff for this file is too large to render.
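transformer/mlx_quant_config.json, listed next, gives the quantization parameters (4 bits, group size 64) and the names of the quantized weights. The list is not sorted; for example, `layers.4.mlp_moe_gen.down_proj.weight` appears after the `layers.11` attention weights. It is therefore safest to treat it as a set.

The sketch below turns the list into a `(path, module)` predicate over module paths and estimates storage per quantized matrix. It assumes MLX-style affine quantization with one 16-bit scale and one 16-bit bias per group, which works out to 4.5 bits per weight. It also assumes that module paths in the loaded model match these weight names minus `.weight`. The file does not confirm that prefix, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_quant_config(root: Path) -> dict:
    return json.loads((root / "transformer" / "mlx_quant_config.json").read_text(encoding="utf-8"))


def quantized_module_paths(quant_config: dict) -> set:
    """'layers.0.mlp.up_proj.weight' -> 'layers.0.mlp.up_proj'; list order is irrelevant."""
    suffix = ".weight"
    return {
        name[: -len(suffix)] if name.endswith(suffix) else name
        for name in quant_config["quantized"]
    }


def make_class_predicate(quant_config: dict):
    """Build a (path, module) -> bool callback from the quantized-weight list."""
    paths = quantized_module_paths(quant_config)

    def predicate(path: str, module) -> bool:
        return path in paths

    return predicate


def packed_bytes(num_weights: int, bits: int, group_size: int, scale_bytes: int = 2) -> int:
    """Packed weights plus one scale and one bias per group (assumed 16-bit each)."""
    groups = -(-num_weights // group_size)  # ceiling division
    return num_weights * bits // 8 + groups * 2 * scale_bytes
```

For a hypothetical 4096 × 4096 projection, `packed_bytes(4096 * 4096, bits=4, group_size=64)` gives 9,437,184 bytes, compared with 33,554,432 bytes in float16. If the module paths do line up, the predicate could be passed as `class_predicate` to `mlx.nn.quantize`, together with `group_size` and `bits` from the same file. Whether `mlx_cosmos3.py` loads the weights this way is not shown here.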
transformer/mlx_quant_config.json
ADDED
@@ -0,0 +1,474 @@
{
"group_size": 64,
"bits": 4,
"quantized": [
"layers.0.mlp.down_proj.weight",
"layers.0.mlp.gate_proj.weight",
"layers.0.mlp.up_proj.weight",
"layers.0.mlp_moe_gen.down_proj.weight",
"layers.0.mlp_moe_gen.gate_proj.weight",
"layers.0.mlp_moe_gen.up_proj.weight",
"layers.0.self_attn.add_k_proj.weight",
"layers.0.self_attn.add_q_proj.weight",
"layers.0.self_attn.add_v_proj.weight",
"layers.0.self_attn.to_k.weight",
"layers.0.self_attn.to_out.weight",
"layers.0.self_attn.to_q.weight",
"layers.0.self_attn.to_v.weight",
"layers.1.mlp.down_proj.weight",
"layers.1.mlp.gate_proj.weight",
"layers.1.mlp.up_proj.weight",
"layers.1.mlp_moe_gen.down_proj.weight",
"layers.1.mlp_moe_gen.gate_proj.weight",
"layers.1.mlp_moe_gen.up_proj.weight",
"layers.1.self_attn.add_k_proj.weight",
"layers.1.self_attn.add_q_proj.weight",
"layers.1.self_attn.add_v_proj.weight",
"layers.1.self_attn.to_k.weight",
"layers.1.self_attn.to_out.weight",
"layers.1.self_attn.to_q.weight",
"layers.1.self_attn.to_v.weight",
"layers.2.mlp.down_proj.weight",
"layers.2.mlp.gate_proj.weight",
"layers.2.mlp.up_proj.weight",
"layers.2.mlp_moe_gen.down_proj.weight",
"layers.2.mlp_moe_gen.gate_proj.weight",
"layers.2.mlp_moe_gen.up_proj.weight",
"layers.2.self_attn.add_k_proj.weight",
"layers.2.self_attn.add_q_proj.weight",
"layers.2.self_attn.add_v_proj.weight",
"layers.2.self_attn.to_k.weight",
"layers.2.self_attn.to_out.weight",
"layers.2.self_attn.to_q.weight",
"layers.2.self_attn.to_v.weight",
"layers.3.mlp.down_proj.weight",
"layers.3.mlp.gate_proj.weight",
"layers.3.mlp.up_proj.weight",
"layers.3.mlp_moe_gen.down_proj.weight",
"layers.3.mlp_moe_gen.gate_proj.weight",
"layers.3.mlp_moe_gen.up_proj.weight",
"layers.3.self_attn.add_k_proj.weight",
"layers.3.self_attn.add_q_proj.weight",
"layers.3.self_attn.add_v_proj.weight",
"layers.3.self_attn.to_k.weight",
"layers.3.self_attn.to_out.weight",
"layers.3.self_attn.to_q.weight",
"layers.3.self_attn.to_v.weight",
"layers.4.mlp.down_proj.weight",
"layers.4.mlp.gate_proj.weight",
"layers.4.mlp.up_proj.weight",
"layers.4.mlp_moe_gen.gate_proj.weight",
"layers.4.self_attn.add_k_proj.weight",
"layers.4.self_attn.add_q_proj.weight",
"layers.4.self_attn.add_v_proj.weight",
"layers.4.self_attn.to_k.weight",
"layers.4.self_attn.to_out.weight",
"layers.4.self_attn.to_q.weight",
"layers.4.self_attn.to_v.weight",
"layers.10.mlp.down_proj.weight",
"layers.10.mlp.gate_proj.weight",
"layers.10.mlp.up_proj.weight",
"layers.10.mlp_moe_gen.down_proj.weight",
"layers.10.mlp_moe_gen.gate_proj.weight",
"layers.10.mlp_moe_gen.up_proj.weight",
"layers.10.self_attn.add_k_proj.weight",
"layers.10.self_attn.add_q_proj.weight",
"layers.10.self_attn.add_v_proj.weight",
"layers.10.self_attn.to_k.weight",
"layers.10.self_attn.to_out.weight",
"layers.10.self_attn.to_q.weight",
"layers.10.self_attn.to_v.weight",
"layers.11.self_attn.add_k_proj.weight",
"layers.11.self_attn.add_q_proj.weight",
"layers.11.self_attn.add_v_proj.weight",
"layers.11.self_attn.to_k.weight",
"layers.11.self_attn.to_out.weight",
"layers.11.self_attn.to_q.weight",
"layers.11.self_attn.to_v.weight",
"layers.4.mlp_moe_gen.down_proj.weight",
"layers.4.mlp_moe_gen.up_proj.weight",
"layers.5.mlp.down_proj.weight",
"layers.5.mlp.gate_proj.weight",
"layers.5.mlp.up_proj.weight",
"layers.5.mlp_moe_gen.down_proj.weight",
"layers.5.mlp_moe_gen.gate_proj.weight",
"layers.5.mlp_moe_gen.up_proj.weight",
"layers.5.self_attn.add_k_proj.weight",
"layers.5.self_attn.add_q_proj.weight",
"layers.5.self_attn.add_v_proj.weight",
"layers.5.self_attn.to_k.weight",
"layers.5.self_attn.to_out.weight",
"layers.5.self_attn.to_q.weight",
"layers.5.self_attn.to_v.weight",
"layers.6.mlp.down_proj.weight",
"layers.6.mlp.gate_proj.weight",
"layers.6.mlp.up_proj.weight",
"layers.6.mlp_moe_gen.down_proj.weight",
"layers.6.mlp_moe_gen.gate_proj.weight",
"layers.6.mlp_moe_gen.up_proj.weight",
"layers.6.self_attn.add_k_proj.weight",
"layers.6.self_attn.add_q_proj.weight",
"layers.6.self_attn.add_v_proj.weight",
"layers.6.self_attn.to_k.weight",
"layers.6.self_attn.to_out.weight",
"layers.6.self_attn.to_q.weight",
"layers.6.self_attn.to_v.weight",
"layers.7.mlp.down_proj.weight",
"layers.7.mlp.gate_proj.weight",
"layers.7.mlp.up_proj.weight",
"layers.7.mlp_moe_gen.down_proj.weight",
"layers.7.mlp_moe_gen.gate_proj.weight",
"layers.7.mlp_moe_gen.up_proj.weight",
"layers.7.self_attn.add_k_proj.weight",
"layers.7.self_attn.add_q_proj.weight",
"layers.7.self_attn.add_v_proj.weight",
"layers.7.self_attn.to_k.weight",
"layers.7.self_attn.to_out.weight",
"layers.7.self_attn.to_q.weight",
"layers.7.self_attn.to_v.weight",
"layers.8.mlp.down_proj.weight",
"layers.8.mlp.gate_proj.weight",
"layers.8.mlp.up_proj.weight",
"layers.8.mlp_moe_gen.down_proj.weight",
"layers.8.mlp_moe_gen.gate_proj.weight",
"layers.8.mlp_moe_gen.up_proj.weight",
"layers.8.self_attn.add_k_proj.weight",
"layers.8.self_attn.add_q_proj.weight",
"layers.8.self_attn.add_v_proj.weight",
"layers.8.self_attn.to_k.weight",
"layers.8.self_attn.to_out.weight",
"layers.8.self_attn.to_q.weight",
"layers.8.self_attn.to_v.weight",
"layers.9.mlp.down_proj.weight",
"layers.9.mlp.gate_proj.weight",
"layers.9.mlp.up_proj.weight",
"layers.9.mlp_moe_gen.down_proj.weight",
"layers.9.mlp_moe_gen.gate_proj.weight",
"layers.9.mlp_moe_gen.up_proj.weight",
"layers.9.self_attn.add_k_proj.weight",
"layers.9.self_attn.add_q_proj.weight",
"layers.9.self_attn.add_v_proj.weight",
"layers.9.self_attn.to_k.weight",
"layers.9.self_attn.to_out.weight",
"layers.9.self_attn.to_q.weight",
"layers.9.self_attn.to_v.weight",
"layers.11.mlp.down_proj.weight",
"layers.11.mlp.gate_proj.weight",
"layers.11.mlp.up_proj.weight",
"layers.11.mlp_moe_gen.down_proj.weight",
"layers.11.mlp_moe_gen.gate_proj.weight",
"layers.11.mlp_moe_gen.up_proj.weight",
"layers.12.mlp.down_proj.weight",
"layers.12.mlp.gate_proj.weight",
"layers.12.mlp.up_proj.weight",
"layers.12.mlp_moe_gen.down_proj.weight",
"layers.12.mlp_moe_gen.gate_proj.weight",
"layers.12.mlp_moe_gen.up_proj.weight",
"layers.12.self_attn.add_k_proj.weight",
"layers.12.self_attn.add_q_proj.weight",
"layers.12.self_attn.add_v_proj.weight",
"layers.12.self_attn.to_k.weight",
"layers.12.self_attn.to_out.weight",
"layers.12.self_attn.to_q.weight",
"layers.12.self_attn.to_v.weight",
"layers.13.mlp.down_proj.weight",
"layers.13.mlp.gate_proj.weight",
"layers.13.mlp.up_proj.weight",
"layers.13.mlp_moe_gen.down_proj.weight",
"layers.13.mlp_moe_gen.gate_proj.weight",
"layers.13.mlp_moe_gen.up_proj.weight",
"layers.13.self_attn.add_k_proj.weight",
"layers.13.self_attn.add_q_proj.weight",
"layers.13.self_attn.add_v_proj.weight",
"layers.13.self_attn.to_k.weight",
"layers.13.self_attn.to_out.weight",
"layers.13.self_attn.to_q.weight",
"layers.13.self_attn.to_v.weight",
"layers.14.mlp.down_proj.weight",
"layers.14.mlp.gate_proj.weight",
"layers.14.mlp.up_proj.weight",
"layers.14.mlp_moe_gen.down_proj.weight",
"layers.14.mlp_moe_gen.gate_proj.weight",
"layers.14.mlp_moe_gen.up_proj.weight",
"layers.14.self_attn.add_k_proj.weight",
"layers.14.self_attn.add_q_proj.weight",
"layers.14.self_attn.add_v_proj.weight",
"layers.14.self_attn.to_k.weight",
"layers.14.self_attn.to_out.weight",
"layers.14.self_attn.to_q.weight",
"layers.14.self_attn.to_v.weight",
"layers.15.mlp.down_proj.weight",
"layers.15.mlp.gate_proj.weight",
"layers.15.mlp.up_proj.weight",
"layers.15.mlp_moe_gen.down_proj.weight",
"layers.15.mlp_moe_gen.gate_proj.weight",
"layers.15.mlp_moe_gen.up_proj.weight",
"layers.15.self_attn.add_k_proj.weight",
"layers.15.self_attn.add_q_proj.weight",
"layers.15.self_attn.add_v_proj.weight",
"layers.15.self_attn.to_k.weight",
"layers.15.self_attn.to_out.weight",
"layers.15.self_attn.to_q.weight",
"layers.15.self_attn.to_v.weight",
"layers.16.mlp.down_proj.weight",
"layers.16.mlp.gate_proj.weight",
"layers.16.mlp.up_proj.weight",
"layers.16.mlp_moe_gen.down_proj.weight",
"layers.16.mlp_moe_gen.gate_proj.weight",
"layers.16.mlp_moe_gen.up_proj.weight",
"layers.16.self_attn.add_k_proj.weight",
"layers.16.self_attn.add_q_proj.weight",
"layers.16.self_attn.add_v_proj.weight",
"layers.16.self_attn.to_k.weight",
"layers.16.self_attn.to_out.weight",
"layers.16.self_attn.to_q.weight",
"layers.16.self_attn.to_v.weight",
"layers.17.mlp.down_proj.weight",
"layers.17.mlp.gate_proj.weight",
"layers.17.mlp.up_proj.weight",
"layers.17.self_attn.add_k_proj.weight",
"layers.17.self_attn.add_q_proj.weight",
"layers.17.self_attn.add_v_proj.weight",
"layers.17.self_attn.to_k.weight",
"layers.17.self_attn.to_out.weight",
"layers.17.self_attn.to_q.weight",
"layers.17.self_attn.to_v.weight",
"layers.17.mlp_moe_gen.down_proj.weight",
"layers.17.mlp_moe_gen.gate_proj.weight",
"layers.17.mlp_moe_gen.up_proj.weight",
"layers.18.mlp.down_proj.weight",
"layers.18.mlp.gate_proj.weight",
"layers.18.mlp.up_proj.weight",
"layers.18.mlp_moe_gen.down_proj.weight",
"layers.18.mlp_moe_gen.gate_proj.weight",
"layers.18.mlp_moe_gen.up_proj.weight",
"layers.18.self_attn.add_k_proj.weight",
"layers.18.self_attn.add_q_proj.weight",
"layers.18.self_attn.add_v_proj.weight",
"layers.18.self_attn.to_k.weight",
"layers.18.self_attn.to_out.weight",
"layers.18.self_attn.to_q.weight",
"layers.18.self_attn.to_v.weight",
"layers.19.mlp.down_proj.weight",
"layers.19.mlp.gate_proj.weight",
"layers.19.mlp.up_proj.weight",
"layers.19.mlp_moe_gen.down_proj.weight",
"layers.19.mlp_moe_gen.gate_proj.weight",
"layers.19.mlp_moe_gen.up_proj.weight",
"layers.19.self_attn.add_k_proj.weight",
"layers.19.self_attn.add_q_proj.weight",
"layers.19.self_attn.add_v_proj.weight",
"layers.19.self_attn.to_k.weight",
"layers.19.self_attn.to_out.weight",
"layers.19.self_attn.to_q.weight",
"layers.19.self_attn.to_v.weight",
"layers.20.mlp.down_proj.weight",
"layers.20.mlp.gate_proj.weight",
"layers.20.mlp.up_proj.weight",
"layers.20.mlp_moe_gen.down_proj.weight",
|
| 269 |
+
"layers.20.mlp_moe_gen.gate_proj.weight",
|
| 270 |
+
"layers.20.mlp_moe_gen.up_proj.weight",
|
| 271 |
+
"layers.20.self_attn.add_k_proj.weight",
|
| 272 |
+
"layers.20.self_attn.add_q_proj.weight",
|
| 273 |
+
"layers.20.self_attn.add_v_proj.weight",
|
| 274 |
+
"layers.20.self_attn.to_k.weight",
|
| 275 |
+
"layers.20.self_attn.to_out.weight",
|
| 276 |
+
"layers.20.self_attn.to_q.weight",
|
| 277 |
+
"layers.20.self_attn.to_v.weight",
|
| 278 |
+
"layers.21.mlp.down_proj.weight",
|
| 279 |
+
"layers.21.mlp.gate_proj.weight",
|
| 280 |
+
"layers.21.mlp.up_proj.weight",
|
| 281 |
+
"layers.21.mlp_moe_gen.down_proj.weight",
|
| 282 |
+
"layers.21.mlp_moe_gen.gate_proj.weight",
|
| 283 |
+
"layers.21.mlp_moe_gen.up_proj.weight",
|
| 284 |
+
"layers.21.self_attn.add_k_proj.weight",
|
| 285 |
+
"layers.21.self_attn.add_q_proj.weight",
|
| 286 |
+
"layers.21.self_attn.add_v_proj.weight",
|
| 287 |
+
"layers.21.self_attn.to_k.weight",
|
| 288 |
+
"layers.21.self_attn.to_out.weight",
|
| 289 |
+
"layers.21.self_attn.to_q.weight",
|
| 290 |
+
"layers.21.self_attn.to_v.weight",
|
| 291 |
+
"layers.22.mlp.down_proj.weight",
|
| 292 |
+
"layers.22.mlp.gate_proj.weight",
|
| 293 |
+
"layers.22.mlp.up_proj.weight",
|
| 294 |
+
"layers.22.mlp_moe_gen.down_proj.weight",
|
| 295 |
+
"layers.22.mlp_moe_gen.gate_proj.weight",
|
| 296 |
+
"layers.22.mlp_moe_gen.up_proj.weight",
|
| 297 |
+
"layers.22.self_attn.add_k_proj.weight",
|
| 298 |
+
"layers.22.self_attn.add_q_proj.weight",
|
| 299 |
+
"layers.22.self_attn.add_v_proj.weight",
|
| 300 |
+
"layers.22.self_attn.to_k.weight",
|
| 301 |
+
"layers.22.self_attn.to_out.weight",
|
| 302 |
+
"layers.22.self_attn.to_q.weight",
|
| 303 |
+
"layers.22.self_attn.to_v.weight",
|
| 304 |
+
"layers.23.mlp.down_proj.weight",
|
| 305 |
+
"layers.23.mlp.gate_proj.weight",
|
| 306 |
+
"layers.23.mlp.up_proj.weight",
|
| 307 |
+
"layers.23.mlp_moe_gen.down_proj.weight",
|
| 308 |
+
"layers.23.mlp_moe_gen.gate_proj.weight",
|
| 309 |
+
"layers.23.mlp_moe_gen.up_proj.weight",
|
| 310 |
+
"layers.23.self_attn.add_k_proj.weight",
|
| 311 |
+
"layers.23.self_attn.add_q_proj.weight",
|
| 312 |
+
"layers.23.self_attn.add_v_proj.weight",
|
| 313 |
+
"layers.23.self_attn.to_k.weight",
|
| 314 |
+
"layers.23.self_attn.to_out.weight",
|
| 315 |
+
"layers.23.self_attn.to_q.weight",
|
| 316 |
+
"layers.23.self_attn.to_v.weight",
|
| 317 |
+
"layers.24.self_attn.to_k.weight",
|
| 318 |
+
"layers.24.self_attn.to_q.weight",
|
| 319 |
+
"layers.24.self_attn.to_v.weight",
|
| 320 |
+
"layers.24.mlp.down_proj.weight",
|
| 321 |
+
"layers.24.mlp.gate_proj.weight",
|
| 322 |
+
"layers.24.mlp.up_proj.weight",
|
| 323 |
+
"layers.24.mlp_moe_gen.down_proj.weight",
|
| 324 |
+
"layers.24.mlp_moe_gen.gate_proj.weight",
|
| 325 |
+
"layers.24.mlp_moe_gen.up_proj.weight",
|
| 326 |
+
"layers.24.self_attn.add_k_proj.weight",
|
| 327 |
+
"layers.24.self_attn.add_q_proj.weight",
|
| 328 |
+
"layers.24.self_attn.add_v_proj.weight",
|
| 329 |
+
"layers.24.self_attn.to_out.weight",
|
| 330 |
+
"layers.25.mlp.down_proj.weight",
|
| 331 |
+
"layers.25.mlp.gate_proj.weight",
|
| 332 |
+
"layers.25.mlp.up_proj.weight",
|
| 333 |
+
"layers.25.mlp_moe_gen.down_proj.weight",
|
| 334 |
+
"layers.25.mlp_moe_gen.gate_proj.weight",
|
| 335 |
+
"layers.25.mlp_moe_gen.up_proj.weight",
|
| 336 |
+
"layers.25.self_attn.add_k_proj.weight",
|
| 337 |
+
"layers.25.self_attn.add_q_proj.weight",
|
| 338 |
+
"layers.25.self_attn.add_v_proj.weight",
|
| 339 |
+
"layers.25.self_attn.to_k.weight",
|
| 340 |
+
"layers.25.self_attn.to_out.weight",
|
| 341 |
+
"layers.25.self_attn.to_q.weight",
|
| 342 |
+
"layers.25.self_attn.to_v.weight",
|
| 343 |
+
"layers.26.mlp.down_proj.weight",
|
| 344 |
+
"layers.26.mlp.gate_proj.weight",
|
| 345 |
+
"layers.26.mlp.up_proj.weight",
|
| 346 |
+
"layers.26.mlp_moe_gen.down_proj.weight",
|
| 347 |
+
"layers.26.mlp_moe_gen.gate_proj.weight",
|
| 348 |
+
"layers.26.mlp_moe_gen.up_proj.weight",
|
| 349 |
+
"layers.26.self_attn.add_k_proj.weight",
|
| 350 |
+
"layers.26.self_attn.add_q_proj.weight",
|
| 351 |
+
"layers.26.self_attn.add_v_proj.weight",
|
| 352 |
+
"layers.26.self_attn.to_k.weight",
|
| 353 |
+
"layers.26.self_attn.to_out.weight",
|
| 354 |
+
"layers.26.self_attn.to_q.weight",
|
| 355 |
+
"layers.26.self_attn.to_v.weight",
|
| 356 |
+
"layers.27.mlp.down_proj.weight",
|
| 357 |
+
"layers.27.mlp.gate_proj.weight",
|
| 358 |
+
"layers.27.mlp.up_proj.weight",
|
| 359 |
+
"layers.27.mlp_moe_gen.down_proj.weight",
|
| 360 |
+
"layers.27.mlp_moe_gen.gate_proj.weight",
|
| 361 |
+
"layers.27.mlp_moe_gen.up_proj.weight",
|
| 362 |
+
"layers.27.self_attn.add_k_proj.weight",
|
| 363 |
+
"layers.27.self_attn.add_q_proj.weight",
|
| 364 |
+
"layers.27.self_attn.add_v_proj.weight",
|
| 365 |
+
"layers.27.self_attn.to_k.weight",
|
| 366 |
+
"layers.27.self_attn.to_out.weight",
|
| 367 |
+
"layers.27.self_attn.to_q.weight",
|
| 368 |
+
"layers.27.self_attn.to_v.weight",
|
| 369 |
+
"layers.28.mlp.down_proj.weight",
|
| 370 |
+
"layers.28.mlp.gate_proj.weight",
|
| 371 |
+
"layers.28.mlp.up_proj.weight",
|
| 372 |
+
"layers.28.mlp_moe_gen.down_proj.weight",
|
| 373 |
+
"layers.28.mlp_moe_gen.gate_proj.weight",
|
| 374 |
+
"layers.28.mlp_moe_gen.up_proj.weight",
|
| 375 |
+
"layers.28.self_attn.add_k_proj.weight",
|
| 376 |
+
"layers.28.self_attn.add_q_proj.weight",
|
| 377 |
+
"layers.28.self_attn.add_v_proj.weight",
|
| 378 |
+
"layers.28.self_attn.to_k.weight",
|
| 379 |
+
"layers.28.self_attn.to_out.weight",
|
| 380 |
+
"layers.28.self_attn.to_q.weight",
|
| 381 |
+
"layers.28.self_attn.to_v.weight",
|
| 382 |
+
"layers.29.mlp.down_proj.weight",
|
| 383 |
+
"layers.29.mlp.gate_proj.weight",
|
| 384 |
+
"layers.29.mlp.up_proj.weight",
|
| 385 |
+
"layers.29.mlp_moe_gen.down_proj.weight",
|
| 386 |
+
"layers.29.mlp_moe_gen.gate_proj.weight",
|
| 387 |
+
"layers.29.mlp_moe_gen.up_proj.weight",
|
| 388 |
+
"layers.29.self_attn.add_k_proj.weight",
|
| 389 |
+
"layers.29.self_attn.add_q_proj.weight",
|
| 390 |
+
"layers.29.self_attn.add_v_proj.weight",
|
| 391 |
+
"layers.29.self_attn.to_k.weight",
|
| 392 |
+
"layers.29.self_attn.to_out.weight",
|
| 393 |
+
"layers.29.self_attn.to_q.weight",
|
| 394 |
+
"layers.29.self_attn.to_v.weight",
|
| 395 |
+
"layers.30.mlp.gate_proj.weight",
|
| 396 |
+
"layers.30.mlp.up_proj.weight",
|
| 397 |
+
"layers.30.self_attn.add_k_proj.weight",
|
| 398 |
+
"layers.30.self_attn.add_q_proj.weight",
|
| 399 |
+
"layers.30.self_attn.add_v_proj.weight",
|
| 400 |
+
"layers.30.self_attn.to_k.weight",
|
| 401 |
+
"layers.30.self_attn.to_out.weight",
|
| 402 |
+
"layers.30.self_attn.to_q.weight",
|
| 403 |
+
"layers.30.self_attn.to_v.weight",
|
| 404 |
+
"layers.30.mlp.down_proj.weight",
|
| 405 |
+
"layers.30.mlp_moe_gen.down_proj.weight",
|
| 406 |
+
"layers.30.mlp_moe_gen.gate_proj.weight",
|
| 407 |
+
"layers.30.mlp_moe_gen.up_proj.weight",
|
| 408 |
+
"layers.31.mlp.down_proj.weight",
|
| 409 |
+
"layers.31.mlp.gate_proj.weight",
|
| 410 |
+
"layers.31.mlp.up_proj.weight",
|
| 411 |
+
"layers.31.mlp_moe_gen.down_proj.weight",
|
| 412 |
+
"layers.31.mlp_moe_gen.gate_proj.weight",
|
| 413 |
+
"layers.31.mlp_moe_gen.up_proj.weight",
|
| 414 |
+
"layers.31.self_attn.add_k_proj.weight",
|
| 415 |
+
"layers.31.self_attn.add_q_proj.weight",
|
| 416 |
+
"layers.31.self_attn.add_v_proj.weight",
|
| 417 |
+
"layers.31.self_attn.to_k.weight",
|
| 418 |
+
"layers.31.self_attn.to_out.weight",
|
| 419 |
+
"layers.31.self_attn.to_q.weight",
|
| 420 |
+
"layers.31.self_attn.to_v.weight",
|
| 421 |
+
"layers.32.mlp.down_proj.weight",
|
| 422 |
+
"layers.32.mlp.gate_proj.weight",
|
| 423 |
+
"layers.32.mlp.up_proj.weight",
|
| 424 |
+
"layers.32.mlp_moe_gen.down_proj.weight",
|
| 425 |
+
"layers.32.mlp_moe_gen.gate_proj.weight",
|
| 426 |
+
"layers.32.mlp_moe_gen.up_proj.weight",
|
| 427 |
+
"layers.32.self_attn.add_k_proj.weight",
|
| 428 |
+
"layers.32.self_attn.add_q_proj.weight",
|
| 429 |
+
"layers.32.self_attn.add_v_proj.weight",
|
| 430 |
+
"layers.32.self_attn.to_k.weight",
|
| 431 |
+
"layers.32.self_attn.to_out.weight",
|
| 432 |
+
"layers.32.self_attn.to_q.weight",
|
| 433 |
+
"layers.32.self_attn.to_v.weight",
|
| 434 |
+
"layers.33.mlp.down_proj.weight",
|
| 435 |
+
"layers.33.mlp.gate_proj.weight",
|
| 436 |
+
"layers.33.mlp.up_proj.weight",
|
| 437 |
+
"layers.33.mlp_moe_gen.down_proj.weight",
|
| 438 |
+
"layers.33.mlp_moe_gen.gate_proj.weight",
|
| 439 |
+
"layers.33.mlp_moe_gen.up_proj.weight",
|
| 440 |
+
"layers.33.self_attn.add_k_proj.weight",
|
| 441 |
+
"layers.33.self_attn.add_q_proj.weight",
|
| 442 |
+
"layers.33.self_attn.add_v_proj.weight",
|
| 443 |
+
"layers.33.self_attn.to_k.weight",
|
| 444 |
+
"layers.33.self_attn.to_out.weight",
|
| 445 |
+
"layers.33.self_attn.to_q.weight",
|
| 446 |
+
"layers.33.self_attn.to_v.weight",
|
| 447 |
+
"layers.34.mlp.down_proj.weight",
|
| 448 |
+
"layers.34.mlp.gate_proj.weight",
|
| 449 |
+
"layers.34.mlp.up_proj.weight",
|
| 450 |
+
"layers.34.mlp_moe_gen.down_proj.weight",
|
| 451 |
+
"layers.34.mlp_moe_gen.gate_proj.weight",
|
| 452 |
+
"layers.34.mlp_moe_gen.up_proj.weight",
|
| 453 |
+
"layers.34.self_attn.add_k_proj.weight",
|
| 454 |
+
"layers.34.self_attn.add_q_proj.weight",
|
| 455 |
+
"layers.34.self_attn.add_v_proj.weight",
|
| 456 |
+
"layers.34.self_attn.to_k.weight",
|
| 457 |
+
"layers.34.self_attn.to_out.weight",
|
| 458 |
+
"layers.34.self_attn.to_q.weight",
|
| 459 |
+
"layers.34.self_attn.to_v.weight",
|
| 460 |
+
"layers.35.mlp.down_proj.weight",
|
| 461 |
+
"layers.35.mlp.gate_proj.weight",
|
| 462 |
+
"layers.35.mlp.up_proj.weight",
|
| 463 |
+
"layers.35.mlp_moe_gen.down_proj.weight",
|
| 464 |
+
"layers.35.mlp_moe_gen.gate_proj.weight",
|
| 465 |
+
"layers.35.mlp_moe_gen.up_proj.weight",
|
| 466 |
+
"layers.35.self_attn.add_k_proj.weight",
|
| 467 |
+
"layers.35.self_attn.add_q_proj.weight",
|
| 468 |
+
"layers.35.self_attn.add_v_proj.weight",
|
| 469 |
+
"layers.35.self_attn.to_k.weight",
|
| 470 |
+
"layers.35.self_attn.to_out.weight",
|
| 471 |
+
"layers.35.self_attn.to_q.weight",
|
| 472 |
+
"layers.35.self_attn.to_v.weight"
|
| 473 |
+
]
|
| 474 |
+
}
|
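
The weight-name list in `mlx_quant_config.json` covers 13 weights per transformer layer: three `mlp` projections, three `mlp_moe_gen` projections and seven `self_attn` projections. What the list controls (for example, which weights were quantized) is defined earlier in the file and is not visible in this excerpt, and the entries are not strictly sorted (layers 17, 24 and 30 use a different order). A minimal stdlib sketch, assuming the list has already been loaded into a Python list of strings (the enclosing key name is not shown here), groups the names by layer and reports layers whose module set differs from the majority, independent of ordering:

```python
import re
from collections import defaultdict

# Matches names such as "layers.14.mlp_moe_gen.up_proj.weight".
NAME_RE = re.compile(r"^layers\.(\d+)\.(.+)\.weight$")


def group_by_layer(names):
    """Map layer index -> sorted list of module paths (without '.weight')."""
    groups = defaultdict(list)
    for name in names:
        m = NAME_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected weight name: {name!r}")
        groups[int(m.group(1))].append(m.group(2))
    return {layer: sorted(mods) for layer, mods in groups.items()}


def inconsistent_layers(names):
    """Return layer indices whose module set differs from the most common one."""
    groups = group_by_layer(names)
    counts = defaultdict(int)
    for mods in groups.values():
        counts[tuple(mods)] += 1
    # The most frequent module set is taken as the reference layout.
    reference = max(counts, key=counts.get)
    return sorted(layer for layer, mods in groups.items() if tuple(mods) != reference)
```

Because each layer's modules are sorted before comparison, the reordered entries for layers 17, 24 and 30 are not reported; only a missing or extra projection would be.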
transformer/model-00001-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:33aa7326bc74dba9d1420041eb3b0f6e051befac05082053c80f8ecb1c22f90d
size 2503129397
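
Each `.safetensors` entry in this commit is a Git LFS pointer (spec version, `sha256` object id, size in bytes) rather than the tensor data itself. The sketch below, using only the standard library, parses such a pointer and checks a downloaded file against it; the function names are illustrative, not part of `huggingface_hub` or this repository.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def verify_file(path, pointer, chunk_size=1 << 20):
    """Check a local file against the size and sha256 recorded in its pointer."""
    # Cheap size check first; hashing a multi-GB shard takes a while.
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]
```

For example, the pointer above yields `size == 2503129397`, and `verify_file("Cosmos3-Nano-MLX-4bit/transformer/model-00001-of-00007.safetensors", pointer)` would check the first shard after a `huggingface-cli download` into that directory.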
transformer/model-00002-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dc369f9576c0cadfe691be1399ef580a5d34f9130fd9ec4d2cc765ac43220389
size 1724131896
transformer/model-00003-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:79e4356afb93a0a5e01d1844b2d7f6ae1e9250555d0ac4a2bfda1ae152959448
size 1680055054
transformer/model-00004-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:765416dc6fa5c34504b587c8a5da28789d9e62822d275519eb7dde3dbb53bfff
size 1695818020
transformer/model-00005-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa8c7cfe7c709e5ac217e56959d45f2813d49a54f0e4d56459055c1da782049a
size 1708369224
transformer/model-00006-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:29d5b08926ee17dc43270744a77926b929482e9f19b8ba1206b1e18babc073e2
size 1447282090
transformer/model-00007-of-00007.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f36f39ad47fd8b0cb2b43d7117cd9d03784d108fb77c9a13474da6184aa0bf08
size 1318361139
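
The `size` fields of the seven transformer pointers give the on-disk footprint of the 4-bit transformer weights alone (the VAE and sound tokenizer weights are separate files). This small calculation sums them; runtime memory use will differ from the file size and is not derived here.

```python
# Byte sizes copied from the seven transformer LFS pointers above.
SHARD_SIZES = {
    "model-00001-of-00007.safetensors": 2503129397,
    "model-00002-of-00007.safetensors": 1724131896,
    "model-00003-of-00007.safetensors": 1680055054,
    "model-00004-of-00007.safetensors": 1695818020,
    "model-00005-of-00007.safetensors": 1708369224,
    "model-00006-of-00007.safetensors": 1447282090,
    "model-00007-of-00007.safetensors": 1318361139,
}


def total_size(sizes):
    """Return (total bytes, total GiB) for a mapping of file name -> size."""
    total = sum(sizes.values())
    return total, total / 2**30


total_bytes, total_gib = total_size(SHARD_SIZES)
print(f"{total_bytes} bytes = {total_gib:.2f} GiB")  # prints: 12077146820 bytes = 11.25 GiB
```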
vae/config.json
ADDED
@@ -0,0 +1,129 @@
{
  "_class_name": "AutoencoderKLWan",
  "_diffusers_version": "0.37.1",
  "_name_or_path": "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
  "attn_scales": [],
  "base_dim": 160,
  "clip_output": false,
  "decoder_base_dim": 256,
  "dim_mult": [
    1,
    2,
    4,
    4
  ],
  "dropout": 0.0,
  "in_channels": 12,
  "is_residual": true,
  "latents_mean": [
    -0.2289,
    -0.0052,
    -0.1323,
    -0.2339,
    -0.2799,
    0.0174,
    0.1838,
    0.1557,
    -0.1382,
    0.0542,
    0.2813,
    0.0891,
    0.157,
    -0.0098,
    0.0375,
    -0.1825,
    -0.2246,
    -0.1207,
    -0.0698,
    0.5109,
    0.2665,
    -0.2108,
    -0.2158,
    0.2502,
    -0.2055,
    -0.0322,
    0.1109,
    0.1567,
    -0.0729,
    0.0899,
    -0.2799,
    -0.123,
    -0.0313,
    -0.1649,
    0.0117,
    0.0723,
    -0.2839,
    -0.2083,
    -0.052,
    0.3748,
    0.0152,
    0.1957,
    0.1433,
    -0.2944,
    0.3573,
    -0.0548,
    -0.1681,
    -0.0667
  ],
  "latents_std": [
    0.4765,
    1.0364,
    0.4514,
    1.1677,
    0.5313,
    0.499,
    0.4818,
    0.5013,
    0.8158,
    1.0344,
    0.5894,
    1.0901,
    0.6885,
    0.6165,
    0.8454,
    0.4978,
    0.5759,
    0.3523,
    0.7135,
    0.6804,
    0.5833,
    1.4146,
    0.8986,
    0.5659,
    0.7069,
    0.5338,
    0.4889,
    0.4917,
    0.4069,
    0.4999,
    0.6866,
    0.4093,
    0.5709,
    0.6065,
    0.6415,
    0.4944,
    0.5726,
    1.2042,
    0.5458,
    1.6887,
    0.3971,
    1.06,
    0.3943,
    0.5537,
    0.5444,
    0.4089,
    0.7468,
    0.7744
  ],
  "num_res_blocks": 2,
  "out_channels": 12,
  "patch_size": 2,
  "scale_factor_spatial": 16,
  "scale_factor_temporal": 4,
  "temperal_downsample": [
    false,
    true,
    true
  ],
  "z_dim": 48
}
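
The VAE config (taken from `Wan-AI/Wan2.2-TI2V-5B-Diffusers`) fixes the latent geometry: 48 channels (`z_dim`), 16x spatial and 4x temporal compression, plus one mean and standard deviation per latent channel. The sketch below assumes the convention used by the diffusers Wan pipelines, where the first frame is encoded on its own (so `num_frames` should be `1 + 4k`) and latents are denormalized as `z * std + mean` before decoding. Whether `mlx_pipeline.py` follows exactly this convention is not shown in this excerpt, and the helper names are hypothetical.

```python
import json


def load_config(path):
    """Load a diffusers-style config.json, e.g. 'vae/config.json'."""
    with open(path) as f:
        return json.load(f)


def latent_shape(cfg, num_frames, height, width):
    """Latent (C, T, H, W) for a video, assuming the Wan causal-first-frame layout."""
    t = cfg["scale_factor_temporal"]
    s = cfg["scale_factor_spatial"]
    if (num_frames - 1) % t or height % s or width % s:
        raise ValueError("need num_frames = 1 + k*temporal factor and H, W divisible by the spatial factor")
    return (cfg["z_dim"], (num_frames - 1) // t + 1, height // s, width // s)


def _check_len(values, cfg):
    if len(values) != len(cfg["latents_mean"]) or len(values) != len(cfg["latents_std"]):
        raise ValueError("expected one value per latent channel")


def normalize(channel_values, cfg):
    """Per-channel (z - mean) / std, applied to raw VAE encoder latents."""
    _check_len(channel_values, cfg)
    return [(z - m) / sd for z, m, sd in zip(channel_values, cfg["latents_mean"], cfg["latents_std"])]


def denormalize(channel_values, cfg):
    """Inverse of normalize: z * std + mean, applied before VAE decoding."""
    _check_len(channel_values, cfg)
    return [z * sd + m for z, m, sd in zip(channel_values, cfg["latents_mean"], cfg["latents_std"])]
```

With this config, a hypothetical 121-frame 704x1280 clip maps to a latent of shape `(48, 31, 44, 80)`. Real latents are tensors of shape `(C, T, H, W)`; the per-channel lists here only illustrate which statistic applies to which channel.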
vae/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:230496cb59ff85bc9c040487737c4062480cb61c71e697b197b4c30142f2a0da
size 1409400600