Add files using upload-large-folder tool

Files changed (9)

README.md +214 -0
README_zh.md +199 -0
cola_dlm/cola_dit/config.json +21 -0
cola_dlm/cola_dit/model-00001-of-00002.safetensors +3 -0
cola_dlm/cola_dit/model-00002-of-00002.safetensors +3 -0
cola_dlm/cola_dit/model.safetensors.index.json +477 -0
cola_dlm/cola_vae/config.json +42 -0
cola_dlm/cola_vae/model.safetensors +3 -0
tokenizer.json +0 -0

README.md CHANGED

@@ -1,3 +1,217 @@
 ---
 license: apache-2.0
+language:
+  - en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+  - text-generation
+  - language-model
+  - diffusion
+  - latent-diffusion
+  - flow-matching
+  - text-vae
+  - pytorch
+  - transformers
+  - research
 ---
+# Cola DLM
+[English](README.md) · [中文](README_zh.md)
+**Cola DLM** (`Co`ntinuous `La`tent `D`iffusion `L`anguage `M`odel) is a hierarchical continuous latent-space diffusion language model. It combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior: the VAE maps text into continuous latent sequences and decodes latents back to tokens, while the DiT performs latent prior transport through Flow Matching.
+This model repository contains the HuggingFace-format checkpoint for the paper **Continuous Latent Diffusion Language Model**.
+## Links
+- **Model repository:** <https://huggingface.co/ByteDance-Seed/Cola-DLM>
+- **GitHub repository:** <https://github.com/ByteDance-Seed/Cola-DLM>
+- **Paper:** <https://arxiv.org/abs/2605.06548>
+- **HuggingFace Daily Paper:** <https://huggingface.co/papers/2605.06548>
+- **Project page:** <https://hongcanguo.github.io/Cola-DLM/>
+- **Blog post:** <https://hongcanguo.github.io/posts/2026-cola-dlm.html>
+- **Zhihu article:** <https://zhuanlan.zhihu.com/p/2038324180920313704>
+## Model Files
+The expected repository layout is:
+```text
+.
+├── cola_dlm/
+│   ├── cola_dit/
+│   │   ├── config.json
+│   │   └── model.safetensors*
+│   └── cola_vae/
+│       ├── config.json
+│       └── model.safetensors*
+├── tokenizer.json
+├── README.md
+└── README_zh.md
+```
+The checkpoint consists of two cooperating modules:
+- `ColaDiTModel`: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
+- `ColaTextVAEModel`: a Text VAE encoder and conditional decoder for text-to-latent and latent-to-text mapping.
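A quick way to confirm that a local copy matches the layout above is to check for each expected file. The sketch below is illustrative only: `EXPECTED` and `missing_files` are hypothetical names, not part of the Cola DLM package, and the directory passed in is wherever the repository was downloaded.

```python
from pathlib import Path

# Files named in the layout above. "model*.safetensors" covers both the
# single-file VAE checkpoint and the sharded DiT checkpoint.
EXPECTED = {
    "cola_dlm/cola_dit": ["config.json", "model*.safetensors"],
    "cola_dlm/cola_vae": ["config.json", "model*.safetensors"],
    ".": ["tokenizer.json"],
}


def missing_files(root):
    """Return the 'dir/pattern' entries that match no file under root."""
    root = Path(root)
    missing = []
    for subdir, patterns in EXPECTED.items():
        folder = root / subdir
        for pattern in patterns:
            # A missing folder counts as every pattern in it being missing.
            if not folder.is_dir() or not any(folder.glob(pattern)):
                missing.append(f"{subdir}/{pattern}")
    return missing


if __name__ == "__main__":
    # "hf_models" is an assumed local download directory.
    print(missing_files("hf_models") or "layout looks complete")
```

An empty list means every expected entry was found; the check does not validate file contents or checksums.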
+## Quickstart
+Install the Cola DLM code package from the [GitHub repository](https://github.com/ByteDance-Seed/Cola-DLM), then install the download helper:
+```bash
+git clone https://github.com/ByteDance-Seed/Cola-DLM.git
+cd Cola-DLM
+pip install -e .
+pip install huggingface_hub
+```
+Download the model files:
+```bash
+huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
+```
+Run a minimal Python example:
+```python
+import torch
+from tokenizers import Tokenizer
+from cola_dlm import (
+    ColaDiTModel,
+    ColaTextVAEModel,
+    generate_task_repaint_inference,
+)
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
+vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
+tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")
+prompts = [{"question": "Question: What is the capital of France? Answer:"}]
+results = generate_task_repaint_inference(
+    dit=dit,
+    vae=vae,
+    tokenizer=tokenizer,
+    prompts=prompts,
+    task_name="lambada",
+    device=device,
+    max_new_tokens=32,
+    temperature=0.0,
+    guidance_scale=7.0,
+    timestep_num=16,
+    pad_token_id=100277,
+)
+print(results[0]["generate"])
+```
+## OpenAI-Compatible Serving
+The companion `openai_adapter/` service in the Cola DLM code release exposes this model through an OpenAI-compatible Chat Completions endpoint:
+```text
+POST /v1/chat/completions
+```
+Install the adapter dependencies from the code repository root:
+```bash
+pip install -e .
+pip install -r openai_adapter/requirements.txt
+```
+Start the service:
+```bash
+export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
+export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
+export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
+export COLA_MODEL_NAME=cola-dlm
+export COLA_API_KEY=change-me
+uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
+```
+Then send a request:
+```bash
+curl http://127.0.0.1:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer change-me" \
+  -d '{
+    "model": "cola-dlm",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Question: What is the capital of France? Answer:"
+      }
+    ],
+    "temperature": 0,
+    "max_tokens": 32,
+    "stream": false
+  }'
+```
+The adapter currently supports non-streaming completions.
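The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. `build_chat_request` and `ask` are hypothetical helper names, and the response shape (`choices[0].message.content`) is assumed from the OpenAI Chat Completions convention rather than confirmed for this adapter.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://127.0.0.1:8000", api_key="change-me"):
    """Build (but do not send) a non-streaming Chat Completions request."""
    payload = {
        "model": "cola-dlm",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 32,
        "stream": False,  # matches the non-streaming curl example
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(prompt):
    """Send the request and return the first choice's text.

    Assumption: the adapter replies with the usual Chat Completions
    structure, i.e. choices[0].message.content holds the generated text.
    """
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the service running, `ask("Question: What is the capital of France? Answer:")` would issue the request shown in the curl example.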
+## Model Details
+- **Architecture:** Text VAE + block-causal DiT latent prior.
+- **Training objective:** two-stage training with Text VAE pretraining followed by joint Text VAE + DiT training using Flow Matching.
+- **Training-compute checkpoint:** the released weights correspond to the 2000 EFLOPs checkpoint reported in the paper's RQ4 scaling curve.
+- **Tokenizer:** OLMo 2 tokenizer with a 100,278-entry vocabulary.
+- **Special token ids:** `pad_token_id=100277`, `eos_token_id=100257`, `im_end_token_id=100265`.
+- **Framework:** PyTorch 2.1+ and HuggingFace Transformers 4.40+.
+- **License:** Apache License 2.0.
+## Evaluation
+Reference zero-shot benchmark results from the open-source inference implementation:
+| Task | Accuracy (%) |
+| --- | ---: |
+| LAMBADA | 50.80 |
+| MMLU | 19.30 |
+| OBQA | 23.00 |
+| HellaSwag | 10.70 |
+| RACE | 19.60 |
+| SIQA | 28.90 |
+| SQuAD | 30.90 |
+| Story Cloze | 30.77 |
+| **Tasks Average** | **26.75** |
+The open-source HuggingFace Transformers implementation may differ slightly from the internal implementation used in the paper, so per-task numbers can fluctuate slightly. The overall trend is consistent with the paper.
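As a sanity check, the reported Tasks Average can be reproduced as the unweighted mean of the eight per-task accuracies in the table above (assuming that is how it was computed):

```python
# Per-task accuracies (%) copied from the evaluation table.
scores = {
    "LAMBADA": 50.80,
    "MMLU": 19.30,
    "OBQA": 23.00,
    "HellaSwag": 10.70,
    "RACE": 19.60,
    "SIQA": 28.90,
    "SQuAD": 30.90,
    "Story Cloze": 30.77,
}

# Unweighted mean over tasks: 213.97 / 8 = 26.74625.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 26.75
```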
+## Intended Use
+Cola DLM is intended primarily for research on hierarchical latent-variable language models, continuous latent diffusion for text, Flow Matching priors, and benchmark-style text generation.
+This checkpoint is **not instruction-tuned** and has not gone through RLHF. It should not be treated as a production chatbot or used for safety-critical decision making.
+## Limitations
+- The model was trained primarily on English text; other languages are not well evaluated.
+- Outputs may contain factual errors, offensive content, bias, or hallucinations.
+- Generation quality can be sensitive to prompt format and prompt length. QA-style prompts such as `"Question: ... Answer:"` are recommended for quick evaluation.
+- The model uses mutable KV caches during generation; service implementations should serialize generation inside one process unless cache handling is explicitly isolated.
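One way to follow the last point in a thread-based service is to guard generation with a process-wide lock. This is a minimal sketch, not the adapter's actual implementation: `serialized` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real call such as `generate_task_repaint_inference`.

```python
import threading
import time
from functools import wraps

# One lock per process: only one generation call may touch the caches at a time.
_generation_lock = threading.Lock()


def serialized(generate_fn):
    """Wrap generate_fn so concurrent callers run it one after another."""
    @wraps(generate_fn)
    def wrapper(*args, **kwargs):
        with _generation_lock:
            return generate_fn(*args, **kwargs)
    return wrapper


# Stand-in for a real generation call; it records how many calls overlap.
active = 0
max_active = 0


@serialized
def fake_generate(prompt):
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.01)  # pretend to run the model
    active -= 1
    return f"echo: {prompt}"


threads = [threading.Thread(target=fake_generate, args=(f"p{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)  # 1: no two calls ever ran at the same time
```

A blocking lock suits worker threads; an asyncio-based server would more likely use `asyncio.Lock` or a single-consumer queue so the event loop is not stalled.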
+## Citation
+If you use Cola DLM in your work, please cite:
+```bibtex
+@article{guo2026cola,
+  title   = {Continuous Latent Diffusion Language Model},
+  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
+             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
+             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
+  journal = {arXiv preprint arXiv:2605.06548},
+  year    = {2026},
+  url     = {https://arxiv.org/abs/2605.06548},
+}
+```

README_zh.md ADDED

	@@ -0,0 +1,199 @@

+# Cola DLM
+[English](README.md) · [中文](README_zh.md)
+**Cola DLM** (`Co`ntinuous `La`tent `D`iffusion `L`anguage `M`odel) is a hierarchical continuous latent-space diffusion language model. It consists of a Text VAE and a block-causal Diffusion Transformer (DiT) prior: the VAE maps between text and continuous latent sequences and decodes latents back to tokens, while the DiT performs prior transport in latent space through Flow Matching.
+This model repository contains the HuggingFace-format checkpoint for the paper **Continuous Latent Diffusion Language Model**.
+## Links
+- **Model repository:** <https://huggingface.co/ByteDance-Seed/Cola-DLM>
+- **GitHub repository:** <https://github.com/ByteDance-Seed/Cola-DLM>
+- **Paper:** <https://arxiv.org/abs/2605.06548>
+- **HuggingFace Daily Paper:** <https://huggingface.co/papers/2605.06548>
+- **Project page:** <https://hongcanguo.github.io/Cola-DLM/>
+- **Blog post:** <https://hongcanguo.github.io/posts/2026-cola-dlm.html>
+- **Zhihu article:** <https://zhuanlan.zhihu.com/p/2038324180920313704>
+## Model Files
+The expected model repository layout is:
+```text
+.
+├── cola_dlm/
+│   ├── cola_dit/
+│   │   ├── config.json
+│   │   └── model.safetensors*
+│   └── cola_vae/
+│       ├── config.json
+│       └── model.safetensors*
+├── tokenizer.json
+├── README.md
+└── README_zh.md
+```
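+The checkpoint consists of two cooperating modules:
+- `ColaDiTModel`: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
+- `ColaTextVAEModel`: a Text VAE encoder and conditional decoder that handle text-to-latent and latent-to-text mapping.
+## Quickstart
+First install the Cola DLM code package from the [GitHub repository](https://github.com/ByteDance-Seed/Cola-DLM), then install the download helper: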
+```bash
+git clone https://github.com/ByteDance-Seed/Cola-DLM.git
+cd Cola-DLM
+pip install -e .
+pip install huggingface_hub
+```
+Download the model files:
+```bash
+huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
+```
+Minimal Python usage example:
+```python
+import torch
+from tokenizers import Tokenizer
+from cola_dlm import (
+    ColaDiTModel,
+    ColaTextVAEModel,
+    generate_task_repaint_inference,
+)
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
+vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
+tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")
+prompts = [{"question": "Question: What is the capital of France? Answer:"}]
+results = generate_task_repaint_inference(
+    dit=dit,
+    vae=vae,
+    tokenizer=tokenizer,
+    prompts=prompts,
+    task_name="lambada",
+    device=device,
+    max_new_tokens=32,
+    temperature=0.0,
+    guidance_scale=7.0,
+    timestep_num=16,
+    pad_token_id=100277,
+)
+print(results[0]["generate"])
+```
+## OpenAI-Compatible Serving
+The `openai_adapter/` service in the Cola DLM code repository can expose this model through an OpenAI-compatible Chat Completions endpoint:
+```text
+POST /v1/chat/completions
+```
+Install the adapter dependencies from the source repository root:
+```bash
+pip install -e .
+pip install -r openai_adapter/requirements.txt
+```
+Start the service:
+```bash
+export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
+export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
+export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
+export COLA_MODEL_NAME=cola-dlm
+export COLA_API_KEY=change-me
+uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
+```
+Send a request:
+```bash
+curl http://127.0.0.1:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer change-me" \
+  -d '{
+    "model": "cola-dlm",
+    "messages": [
+      {
+        "role": "user",
+        "content": "Question: What is the capital of France? Answer:"
+      }
+    ],
+    "temperature": 0,
+    "max_tokens": 32,
+    "stream": false
+  }'
+```
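+The adapter currently supports non-streaming generation.
+## Model Details
+- **Architecture:** Text VAE + block-causal DiT latent prior.
+- **Training objective:** two-stage training, with Text VAE pretraining followed by joint training of the Text VAE and DiT using Flow Matching.
+- **Training-compute checkpoint:** the released weights correspond to the 2000 EFLOPs checkpoint on the paper's RQ4 scaling curve.
+- **Tokenizer:** OLMo 2 tokenizer with a vocabulary size of 100,278.
+- **Special token ids:** `pad_token_id=100277`, `eos_token_id=100257`, `im_end_token_id=100265`.
+- **Framework:** PyTorch 2.1+ and HuggingFace Transformers 4.40+.
+- **License:** Apache License 2.0.
+## Evaluation
+Zero-shot reference results from the open-source inference implementation:
+| Task | Accuracy (%) |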
+| --- | ---: |
+| LAMBADA | 50.80 |
+| MMLU | 19.30 |
+| OBQA | 23.00 |
+| HellaSwag | 10.70 |
+| RACE | 19.60 |
+| SIQA | 28.90 |
+| SQuAD | 30.90 |
+| Story Cloze | 30.77 |
+| **Tasks Average** | **26.75** |
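+The open-source HuggingFace Transformers implementation differs slightly from the internal implementation used in the paper, so per-task numbers may fluctuate slightly, but the overall trend is consistent with the paper.
+## Intended Use
+Cola DLM is intended primarily for research on hierarchical latent-variable language models, continuous latent-space diffusion for text, Flow Matching priors, and benchmark-style text generation.
+This checkpoint is **not instruction-tuned** and has not gone through RLHF; it should not be treated as a production-grade chatbot or used for safety-critical decision making.
+## Limitations
+- The model was trained primarily on English text; its ability in other languages has not been thoroughly evaluated.
+- Outputs may contain factual errors, offensive content, bias, or hallucinations.
+- Generation quality is sensitive to prompt format and length. QA-style prompts such as `"Question: ... Answer:"` are recommended for quick evaluation.
+- Inference uses mutable KV caches; service implementations should run generation serially within a single process unless cache state is explicitly isolated.
+## Citation
+If Cola DLM is helpful to your work, please cite: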
+```bibtex
+@article{guo2026cola,
+  title   = {Continuous Latent Diffusion Language Model},
+  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
+             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
+             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
+  journal = {arXiv preprint arXiv:2605.06548},
+  year    = {2026},
+  url     = {https://arxiv.org/abs/2605.06548},
+}
+```

cola_dlm/cola_dit/config.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "architectures": [
+    "ColaDiTModel"
+  ],
+  "block_size": 16,
+  "emb_dim": 2048,
+  "expand_ratio": 4,
+  "head_dim": 128,
+  "heads": 16,
+  "model_type": "cola_dit",
+  "norm_eps": 1e-05,
+  "num_layers": 24,
+  "patch_size": 1,
+  "qk_bias": false,
+  "rope_dim": 96,
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.1",
+  "txt_dim": 2048,
+  "txt_in_channels": 16,
+  "txt_out_channels": 16
+}
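A few relationships between these fields can be checked directly. The sketch assumes the conventional meanings of the field names (`heads * head_dim` as the attention width, `emb_dim * expand_ratio` as the MLP hidden width, `rope_dim` as the rotary portion of each head); the config itself does not document them.

```python
import json

# Subset of the DiT config above, copied verbatim.
dit_config = json.loads("""
{
  "block_size": 16,
  "emb_dim": 2048,
  "expand_ratio": 4,
  "head_dim": 128,
  "heads": 16,
  "num_layers": 24,
  "rope_dim": 96
}
""")

# Assumed interpretation: attention width is heads * head_dim.
attn_width = dit_config["heads"] * dit_config["head_dim"]
# Assumed interpretation: MLP hidden width is emb_dim * expand_ratio.
mlp_hidden = dit_config["emb_dim"] * dit_config["expand_ratio"]

print(attn_width == dit_config["emb_dim"])            # True: 16 * 128 = 2048
print(mlp_hidden)                                     # 8192
print(dit_config["rope_dim"] <= dit_config["head_dim"])  # True: 96 <= 128
```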

cola_dlm/cola_dit/model-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e4f4f2823d47553db823036cb80c5c5fec3f1a8769f17b4c2904f69b1bfe94e
+size 4936274728

cola_dlm/cola_dit/model-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fa0ceadba148ab3d13b89ae6c6906db2ece5884784839fb5433dcdb8354f955f
+size 2383280512
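The sizes in the two LFS pointers can be cross-checked against `metadata.total_size` in `model.safetensors.index.json` (7319507520). The sketch below assumes `total_size` counts tensor bytes only, so the small remainder would be per-file safetensors header overhead, and that every tensor is stored as 4-byte `float32`, as `torch_dtype` in the DiT config suggests.

```python
# Shard sizes in bytes, copied from the two Git LFS pointers.
shard_bytes = [4936274728, 2383280512]
# metadata.total_size from model.safetensors.index.json.
index_total_size = 7319507520

# Bytes on disk that are not tensor data (assumed: safetensors headers).
overhead = sum(shard_bytes) - index_total_size
# Assumed 4 bytes per value (float32); includes buffers such as rope freqs.
approx_values = index_total_size // 4

print(overhead)       # 47720
print(approx_values)  # 1829876880
```

Under these assumptions the DiT checkpoint holds roughly 1.83 billion stored values, counting non-trainable buffers listed in the index.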

cola_dlm/cola_dit/model.safetensors.index.json ADDED

	@@ -0,0 +1,477 @@

+{
+  "metadata": {
+    "total_size": 7319507520
+  },
+  "weight_map": {
+    "blocks.0.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.0.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.1.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.10.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.11.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.12.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.13.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.14.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.15.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.16.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.16.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.16.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.16.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.16.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.16.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.16.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.16.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.16.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.16.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.16.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.16.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.16.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.17.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.17.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.18.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.19.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.2.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.2.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.20.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.20.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.21.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.22.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.norm_k.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.norm_k.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.norm_q.bias": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.norm_q.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.proj_out.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
+    "blocks.23.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
+    "blocks.3.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.3.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.4.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.5.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.6.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.7.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.8.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.norm_k.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.norm_k.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.norm_q.bias": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.norm_q.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.proj_out.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
+    "blocks.9.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
+    "emb_in.proj_hid.bias": "model-00001-of-00002.safetensors",
+    "emb_in.proj_hid.weight": "model-00001-of-00002.safetensors",
+    "emb_in.proj_in.bias": "model-00001-of-00002.safetensors",
+    "emb_in.proj_in.weight": "model-00001-of-00002.safetensors",
+    "emb_in.proj_out.bias": "model-00001-of-00002.safetensors",
+    "emb_in.proj_out.weight": "model-00001-of-00002.safetensors",
+    "txt_in.proj.bias": "model-00001-of-00002.safetensors",
+    "txt_in.proj.weight": "model-00001-of-00002.safetensors",
+    "txt_out.proj.bias": "model-00002-of-00002.safetensors",
+    "txt_out.proj.weight": "model-00002-of-00002.safetensors",
+    "txt_out_ada.out_in.1.bias": "model-00002-of-00002.safetensors",
+    "txt_out_ada.out_in.1.weight": "model-00002-of-00002.safetensors",
+    "txt_out_norm.bias": "model-00002-of-00002.safetensors",
+    "txt_out_norm.weight": "model-00002-of-00002.safetensors"
+  }
+}
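The index above maps each parameter name to the shard file that stores it. The following is a minimal sketch of how such a map could be queried. It assumes the mapping sits under the conventional `weight_map` key of sharded safetensors indexes (the key itself is not visible in this part of the diff), and it uses only a few entries copied from the listing. The helper names `group_by_shard` and `shards_for_block` are hypothetical.

```python
import json
from collections import defaultdict

# A few entries copied from the listing above; the real index has many more.
# The top-level "weight_map" key is an assumption based on the usual layout.
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "blocks.2.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.20.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
    "emb_in.proj_in.weight": "model-00001-of-00002.safetensors",
    "txt_out.proj.weight": "model-00002-of-00002.safetensors"
  }
}
""")


def group_by_shard(weight_map):
    """Invert the map: shard file name -> sorted list of parameter names."""
    shards = defaultdict(list)
    for name, shard in weight_map.items():
        shards[shard].append(name)
    return {shard: sorted(names) for shard, names in shards.items()}


def shards_for_block(weight_map, block_index):
    """Return the set of shard files holding parameters of one block.

    The trailing dot in the prefix matters: "blocks.2" alone would also
    match "blocks.20", "blocks.21", and so on.
    """
    prefix = f"blocks.{block_index}."
    return {shard for name, shard in weight_map.items() if name.startswith(prefix)}


by_shard = group_by_shard(SAMPLE_INDEX["weight_map"])
block_2_shards = shards_for_block(SAMPLE_INDEX["weight_map"], 2)
block_20_shards = shards_for_block(SAMPLE_INDEX["weight_map"], 20)
```

The keys in the listing are sorted as strings, which is why `blocks.20` through `blocks.23` appear between `blocks.2` and `blocks.3`.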

cola_dlm/cola_vae/config.json ADDED

	@@ -0,0 +1,42 @@

+{
+  "act": "swiglu",
+  "architectures": [
+    "ColaTextVAEModel"
+  ],
+  "attn_dropout": 0.0,
+  "bias": true,
+  "block_causal": true,
+  "block_size": 1,
+  "causal": false,
+  "clip_qkv": null,
+  "decoder_num_blocks": 4,
+  "dim": 1536,
+  "dropout": 0.0,
+  "encoder_last_ln": true,
+  "encoder_num_blocks": 4,
+  "ffn_dim": 6144,
+  "init_cutoff_factor": 3,
+  "init_fn": "normal",
+  "init_std": 0.02,
+  "latent_dim": 16,
+  "layer_norm_affine": true,
+  "layer_norm_eps": 1e-06,
+  "layer_norm_type": "layer_norm",
+  "model_type": "cola_text_vae",
+  "num_heads": 12,
+  "patch_size": 1,
+  "post_norm": true,
+  "qk_bias": false,
+  "qk_norm": true,
+  "qk_norm_affine": true,
+  "rope_full_precision": true,
+  "rope_theta": 500000,
+  "scaling_factor": 1.0,
+  "shared_heads_kv": 1,
+  "shifting_factor": 0.0,
+  "torch_dtype": "float32",
+  "transformers_version": "4.46.1",
+  "use_emb": true,
+  "use_variation": true,
+  "vocab_size": 100278
+}
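A few quantities follow from the config values above by simple arithmetic. The sketch below computes them from a subset of the fields. How the model code actually uses these fields is not shown in this diff, so the per-head width (`dim / num_heads`) and the embedding table size (`vocab_size * dim`) are conventional assumptions, and `derived_shapes` is a hypothetical helper.

```python
import json

# Subset of the fields from cola_dlm/cola_vae/config.json shown above.
VAE_CONFIG = json.loads("""
{
  "dim": 1536,
  "num_heads": 12,
  "ffn_dim": 6144,
  "latent_dim": 16,
  "vocab_size": 100278
}
""")


def derived_shapes(cfg):
    """Compute quantities implied by the config under conventional assumptions."""
    dim, heads = cfg["dim"], cfg["num_heads"]
    if dim % heads != 0:
        raise ValueError(f"dim={dim} is not divisible by num_heads={heads}")
    return {
        # Assumption: attention splits the model width evenly across heads.
        "head_dim": dim // heads,
        # Ratio of the feed-forward width to the model width.
        "ffn_ratio": cfg["ffn_dim"] / dim,
        # Ratio of the model width to the per-position latent width.
        "width_to_latent_ratio": dim / cfg["latent_dim"],
        # Assumption: one vocab_size x dim token embedding table.
        "embedding_params": cfg["vocab_size"] * dim,
    }


shapes = derived_shapes(VAE_CONFIG)
```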

cola_dlm/cola_vae/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2b27cf0a73ecf687d37c28d45881117960979436ed6ec908abd44330b23e991c
+size 2007500120
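The three added lines are a Git LFS pointer rather than the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of parsing such a pointer and checking downloaded bytes against it is given below. The helper names are hypothetical, and the pointer text is the one above with the diff's leading `+` removed.

```python
import hashlib

# Pointer content from the diff above, without the leading "+" markers.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2b27cf0a73ecf687d37c28d45881117960979436ed6ec908abd44330b23e991c
size 2007500120
"""


def parse_lfs_pointer(text):
    """Parse "key value" lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check in-memory bytes against the pointer's size and digest.

    For a file of about 2 GB, hashing in chunks from disk would be more
    practical; this in-memory form only illustrates the check.
    """
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
```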

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff