HongcanGuo04 committed f47401a (verified) · 1 parent: 513e54b

Add files using upload-large-folder tool

README.md CHANGED

---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- language-model
- diffusion
- latent-diffusion
- flow-matching
- text-vae
- pytorch
- transformers
- research
---

# Cola DLM

[English](README.md) · [中文](README_zh.md)

**Cola DLM** (`Co`ntinuous `La`tent `D`iffusion `L`anguage `M`odel) is a hierarchical continuous latent-space diffusion language model. It combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior: the VAE encodes text into continuous latent sequences and decodes latents back into tokens, while the DiT models the prior over those latents using Flow Matching.

This model repository contains the HuggingFace-format checkpoint for the paper **Continuous Latent Diffusion Language Model**.
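
To make "Flow Matching" concrete, here is a toy one-dimensional sketch, not the Cola DLM code. It assumes the widely used linear path `x_t = (1 - t) * x0 + t * x1` between noise `x0` (at `t = 0`) and data `x1` (at `t = 1`), whose regression target is the constant velocity `x1 - x0`, and it samples by integrating a velocity field with a fixed number of Euler steps. In Cola DLM the velocity is predicted by the DiT over latent sequences; the time convention, the Euler solver, and reading the quickstart's `timestep_num` as a step count are all assumptions here.

```python
from typing import Callable


def interpolate(x0: float, x1: float, t: float) -> float:
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0: float, x1: float) -> float:
    """Flow Matching regression target for the linear path: d(x_t)/dt."""
    return x1 - x0


def euler_sample(
    velocity: Callable[[float, float], float], x0: float, num_steps: int
) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with uniform Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity(x, t)
    return x


if __name__ == "__main__":
    noise, data = -1.5, 2.0

    def oracle(x: float, t: float) -> float:
        # A perfect "model" that always predicts the straight-path velocity.
        return target_velocity(noise, data)

    print(euler_sample(oracle, noise, num_steps=16))  # 2.0
```

With a perfect velocity on a straight path, Euler integration lands exactly on the data point; a learned velocity field only approximates this, which is why the number of sampling steps matters in practice.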

## Links

- **Model repository:** <https://huggingface.co/ByteDance-Seed/Cola-DLM>
- **GitHub repository:** <https://github.com/ByteDance-Seed/Cola-DLM>
- **Paper:** <https://arxiv.org/abs/2605.06548>
- **HuggingFace Daily Paper:** <https://huggingface.co/papers/2605.06548>
- **Project page:** <https://hongcanguo.github.io/Cola-DLM/>
- **Blog post:** <https://hongcanguo.github.io/posts/2026-cola-dlm.html>
- **Zhihu article:** <https://zhuanlan.zhihu.com/p/2038324180920313704>

## Model Files

The expected repository layout is:

```text
.
├── cola_dlm/
│   ├── cola_dit/
│   │   ├── config.json
│   │   └── model.safetensors*
│   └── cola_vae/
│       ├── config.json
│       └── model.safetensors*
├── tokenizer.json
├── README.md
└── README_zh.md
```
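
A quick way to confirm that a local download matches this layout is a small checker such as the hypothetical `missing_files` helper below (it is not part of the `cola_dlm` package). It treats any `*.safetensors` file in each module directory as weights, because the DiT in this commit is stored as sharded `model-0000N-of-00002.safetensors` files.

```python
from pathlib import Path

# Hypothetical helper: compares a local directory against the layout listed above.
EXPECTED_FILES = [
    "cola_dlm/cola_dit/config.json",
    "cola_dlm/cola_vae/config.json",
    "tokenizer.json",
]
WEIGHT_DIRS = ["cola_dlm/cola_dit", "cola_dlm/cola_vae"]


def missing_files(root: str) -> list[str]:
    """Return the expected paths that are absent under `root`."""
    base = Path(root)
    missing = [p for p in EXPECTED_FILES if not (base / p).is_file()]
    for d in WEIGHT_DIRS:
        # Accept single-file or sharded weights, e.g. model-00001-of-00002.safetensors.
        if not any((base / d).glob("*.safetensors")):
            missing.append(f"{d}/*.safetensors")
    return missing


if __name__ == "__main__":
    problems = missing_files("hf_models")
    print("layout OK" if not problems else f"missing: {problems}")
```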

The checkpoint consists of two cooperating modules:

- `ColaDiTModel`: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
- `ColaTextVAEModel`: a Text VAE encoder and conditional decoder for text-to-latent and latent-to-text mapping.
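
A common way to realize block-causal attention is a boolean mask in which each position attends to every position in its own block and in all earlier blocks, but to no later block. The sketch below builds that mask in plain Python. Interpreting `block_size` (16 in `cola_dlm/cola_dit/config.json`) as the number of latent positions per block is an assumption, and the exact masking inside `ColaDiTModel` may differ.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Positions are grouped into consecutive blocks of `block_size`; attention is
    bidirectional inside a block and causal across blocks.
    """
    return [
        [j // block_size <= i // block_size for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Small example: 6 positions, blocks of 2 -> {0, 1}, {2, 3}, {4, 5}.
    for row in block_causal_mask(6, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

The example prints `xx....`, `xx....`, `xxxx..`, `xxxx..`, `xxxxxx`, `xxxxxx`: positions in the same block see each other, and every block sees all earlier blocks.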

## Quickstart

Install the Cola DLM code package from the [GitHub repository](https://github.com/ByteDance-Seed/Cola-DLM), then install the download helper:

```bash
git clone https://github.com/ByteDance-Seed/Cola-DLM.git
cd Cola-DLM
pip install -e .
pip install huggingface_hub
```

Download the model files:

```bash
huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
```

Run a minimal Python example:

```python
import torch
from tokenizers import Tokenizer

from cola_dlm import (
    ColaDiTModel,
    ColaTextVAEModel,
    generate_task_repaint_inference,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the DiT prior and the Text VAE from the downloaded checkpoint.
dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
# The tokenizer file is stored at the repository root.
tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")

# QA-style prompts are recommended for quick checks (see Limitations).
prompts = [{"question": "Question: What is the capital of France? Answer:"}]
results = generate_task_repaint_inference(
    dit=dit,
    vae=vae,
    tokenizer=tokenizer,
    prompts=prompts,
    task_name="lambada",
    device=device,
    max_new_tokens=32,
    temperature=0.0,
    guidance_scale=7.0,
    timestep_num=16,
    pad_token_id=100277,  # matches pad_token_id in Model Details
)

print(results[0]["generate"])
```
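
The `guidance_scale` argument suggests classifier-free guidance. A common formulation combines a conditional and an unconditional prediction as `v = v_uncond + s * (v_cond - v_uncond)`: with `s = 1` this reduces to the conditional prediction, and larger values push further along the conditional direction. The sketch below assumes Cola DLM applies guidance this way to DiT velocity predictions; that is an assumption, not something this README confirms.

```python
from typing import Sequence


def classifier_free_guidance(
    v_cond: Sequence[float], v_uncond: Sequence[float], guidance_scale: float
) -> list[float]:
    """Element-wise v_uncond + s * (v_cond - v_uncond)."""
    if len(v_cond) != len(v_uncond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


if __name__ == "__main__":
    cond, uncond = [1.0, 0.5], [0.0, 0.5]
    print(classifier_free_guidance(cond, uncond, 1.0))  # [1.0, 0.5]
    print(classifier_free_guidance(cond, uncond, 7.0))  # [7.0, 0.5]
```

Components where the two predictions agree are unchanged by the scale; only their disagreement is amplified.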

## OpenAI-Compatible Serving

The companion `openai_adapter/` service in the Cola DLM code release exposes this model through an OpenAI-compatible Chat Completions endpoint:

```text
POST /v1/chat/completions
```

Install the adapter dependencies from the code repository root:

```bash
pip install -e .
pip install -r openai_adapter/requirements.txt
```

Start the service:

```bash
export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
export COLA_MODEL_NAME=cola-dlm
export COLA_API_KEY=change-me

uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
```

Then send a request:

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer change-me" \
  -d '{
    "model": "cola-dlm",
    "messages": [
      {
        "role": "user",
        "content": "Question: What is the capital of France? Answer:"
      }
    ],
    "temperature": 0,
    "max_tokens": 32,
    "stream": false
  }'
```

The adapter currently supports only non-streaming completions (`"stream": false`).
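
The same request can be sent from Python with only the standard library. The `build_chat_request` and `send_chat_request` helpers below are hypothetical; they mirror the curl example above, and reading the reply from `choices[0].message.content` assumes the adapter returns the usual OpenAI response shape.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://127.0.0.1:8000",
    api_key: str = "change-me",
    model: str = "cola-dlm",
    max_tokens: int = 32,
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming Chat Completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
        "stream": False,  # the adapter only supports non-streaming responses
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response body.
    return body["choices"][0]["message"]["content"]


# Usage, with the service from the previous step running:
# print(send_chat_request(build_chat_request("Question: What is the capital of France? Answer:")))
```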

## Model Details

- **Architecture:** Text VAE + block-causal DiT latent prior.
- **Training objective:** two-stage training: Text VAE pretraining, followed by joint Text VAE + DiT training with Flow Matching.
- **Training-compute checkpoint:** the released weights correspond to the 2000 EFLOPs checkpoint reported in the paper's RQ4 scaling curve.
- **Tokenizer:** OLMo 2 tokenizer with a 100,278-entry vocabulary.
- **Special token ids:** `pad_token_id=100277`, `eos_token_id=100257`, `im_end_token_id=100265`.
- **Framework:** PyTorch 2.1+ and HuggingFace Transformers 4.40+.
- **License:** Apache License 2.0.
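
Given these special token ids, a typical post-processing step trims raw generated ids at the first end token and drops padding before decoding. The `trim_generated_ids` helper below is hypothetical: whether `generate_task_repaint_inference` already does this is not stated here, and treating `im_end_token_id` as a stop token is an assumption.

```python
# Special token ids listed in Model Details.
PAD_TOKEN_ID = 100277
EOS_TOKEN_ID = 100257
IM_END_TOKEN_ID = 100265


def trim_generated_ids(token_ids: list[int]) -> list[int]:
    """Cut at the first EOS or im_end token and drop any padding before it."""
    kept: list[int] = []
    for tok in token_ids:
        if tok in (EOS_TOKEN_ID, IM_END_TOKEN_ID):
            break
        if tok != PAD_TOKEN_ID:
            kept.append(tok)
    return kept


if __name__ == "__main__":
    print(trim_generated_ids([3923, 374, 100277, 100257, 12366]))  # [3923, 374]
```

The trimmed ids can then be turned back into text with `tokenizer.decode(ids)` from the `tokenizers` library.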

## Evaluation

Reference zero-shot benchmark results, measured with the open-source inference implementation:

| Task | Accuracy (%) |
| --- | ---: |
| LAMBADA | 50.80 |
| MMLU | 19.30 |
| OBQA | 23.00 |
| HellaSwag | 10.70 |
| RACE | 19.60 |
| SIQA | 28.90 |
| SQuAD | 30.90 |
| Story Cloze | 30.77 |
| **Tasks Average** | **26.75** |

The open-source HuggingFace Transformers implementation may differ slightly from the internal implementation used in the paper, so per-task numbers can deviate from those reported there. The overall trend is consistent with the paper.
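
The **Tasks Average** row matches the unweighted mean of the eight per-task accuracies, which is easy to check:

```python
from statistics import mean

# Per-task zero-shot accuracies (%) from the table above.
ACCURACY = {
    "LAMBADA": 50.80,
    "MMLU": 19.30,
    "OBQA": 23.00,
    "HellaSwag": 10.70,
    "RACE": 19.60,
    "SIQA": 28.90,
    "SQuAD": 30.90,
    "Story Cloze": 30.77,
}

average = mean(ACCURACY.values())
print(f"{average:.5f} -> {average:.2f}")  # 213.97 / 8 = 26.74625 -> 26.75
```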

## Intended Use

Cola DLM is intended primarily for research on hierarchical latent-variable language models, continuous latent diffusion for text, Flow Matching priors, and benchmark-style text generation.

This checkpoint is **not instruction-tuned** and has not undergone RLHF. It should not be treated as a production chatbot or used for safety-critical decision making.

## Limitations

- The model was trained primarily on English text; its behavior in other languages has not been thoroughly evaluated.
- Outputs may contain factual errors, offensive content, bias, or hallucinations.
- Generation quality can be sensitive to prompt format and prompt length. QA-style prompts such as `"Question: ... Answer:"` are recommended for quick evaluation.
- The model uses mutable KV caches during generation, so service implementations should serialize generation within a process unless cache handling is explicitly isolated.
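
One way to follow the KV-cache advice in a single-process server is to wrap the generation call in a lock, so requests run one at a time. `SerializedGenerator` below is a hypothetical, standard-library-only sketch. It does not isolate caches; it only prevents concurrent access to them, trading throughput for safety.

```python
import threading
from typing import Any, Callable


class SerializedGenerator:
    """Wrap a generation function so only one call runs at a time per process.

    `generate_fn` stands in for a call such as generate_task_repaint_inference;
    the lock keeps concurrent requests from touching shared mutable cache state.
    """

    def __init__(self, generate_fn: Callable[..., Any]) -> None:
        self._generate_fn = generate_fn
        self._lock = threading.Lock()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        with self._lock:
            return self._generate_fn(*args, **kwargs)


# Usage sketch (hypothetical wiring):
# generate = SerializedGenerator(generate_task_repaint_inference)
# results = generate(dit=dit, vae=vae, tokenizer=tokenizer, prompts=prompts, ...)
```

In an async server, the blocking call would typically be dispatched to a worker thread; the lock still guarantees that only one generation touches the model at a time.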

## Citation

If you use Cola DLM in your work, please cite:

```bibtex
@article{guo2026cola,
  title   = {Continuous Latent Diffusion Language Model},
  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
  journal = {arXiv preprint arXiv:2605.06548},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.06548},
}
```

README_zh.md ADDED
cola_dlm/cola_dit/config.json ADDED

{
  "architectures": [
    "ColaDiTModel"
  ],
  "block_size": 16,
  "emb_dim": 2048,
  "expand_ratio": 4,
  "head_dim": 128,
  "heads": 16,
  "model_type": "cola_dit",
  "norm_eps": 1e-05,
  "num_layers": 24,
  "patch_size": 1,
  "qk_bias": false,
  "rope_dim": 96,
  "torch_dtype": "float32",
  "transformers_version": "4.46.1",
  "txt_dim": 2048,
  "txt_in_channels": 16,
  "txt_out_channels": 16
}
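
A few quantities follow directly from this config and from the `total_size` recorded in `model.safetensors.index.json`: `heads * head_dim` equals `emb_dim` (16 * 128 = 2048), and with `torch_dtype` float32 (4 bytes per value) the 7,319,507,520 tensor bytes correspond to about 1.83 billion stored values. That count includes saved buffers such as `rope.rope.freqs` alongside trainable parameters. Reading `expand_ratio` as the MLP hidden-width multiplier is an assumption.

```python
# A subset of cola_dlm/cola_dit/config.json, copied from above.
config = {
    "emb_dim": 2048,
    "expand_ratio": 4,
    "head_dim": 128,
    "heads": 16,
    "num_layers": 24,
    "torch_dtype": "float32",
}

BYTES_PER_VALUE = {"float32": 4, "bfloat16": 2, "float16": 2}

# Attention width: heads * head_dim should equal the model width.
attn_width = config["heads"] * config["head_dim"]
assert attn_width == config["emb_dim"], attn_width

# Assumption: expand_ratio multiplies emb_dim to give the MLP hidden width.
mlp_hidden = config["emb_dim"] * config["expand_ratio"]

# total_size in model.safetensors.index.json counts tensor bytes only.
TOTAL_SIZE_BYTES = 7_319_507_520
num_values = TOTAL_SIZE_BYTES // BYTES_PER_VALUE[config["torch_dtype"]]

print(attn_width, mlp_hidden, f"{num_values:,}")  # 2048 8192 1,829,876,880
```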
cola_dlm/cola_dit/model-00001-of-00002.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:8e4f4f2823d47553db823036cb80c5c5fec3f1a8769f17b4c2904f69b1bfe94e
size 4936274728
cola_dlm/cola_dit/model-00002-of-00002.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:fa0ceadba148ab3d13b89ae6c6906db2ece5884784839fb5433dcdb8354f955f
size 2383280512
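
Each `.safetensors` entry above is a Git LFS pointer whose `size` is the full file size and whose `oid` is the SHA-256 of the file contents. The sketch below parses the two pointers, sums their sizes, and adds a hypothetical `sha256_of_file` helper for verifying downloaded shards. The shards total 7,319,555,240 bytes, 47,720 bytes more than the index's `total_size`; that gap is consistent with the safetensors layout, in which each file begins with an 8-byte header length and a JSON header before the tensor data.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file: one 'key value' pair per line."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file; compare with the pointer's oid after 'sha256:'."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


SHARD_POINTERS = {
    "model-00001-of-00002.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:8e4f4f2823d47553db823036cb80c5c5fec3f1a8769f17b4c2904f69b1bfe94e
size 4936274728""",
    "model-00002-of-00002.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:fa0ceadba148ab3d13b89ae6c6906db2ece5884784839fb5433dcdb8354f955f
size 2383280512""",
}

shard_bytes = sum(int(parse_lfs_pointer(p)["size"]) for p in SHARD_POINTERS.values())
# total_size in model.safetensors.index.json counts tensor data only.
TOTAL_TENSOR_BYTES = 7_319_507_520
print(shard_bytes, shard_bytes - TOTAL_TENSOR_BYTES)  # 7319555240 47720
```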
cola_dlm/cola_dit/model.safetensors.index.json ADDED

{
  "metadata": {
    "total_size": 7319507520
  },
  "weight_map": {
    "blocks.0.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.0.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.0.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.0.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.0.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.0.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.0.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.0.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.0.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.0.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.0.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.0.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.0.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.0.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.0.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.0.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.0.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.0.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.0.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.1.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.1.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.1.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.1.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.1.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.1.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.1.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.1.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.1.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.1.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.1.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.1.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.1.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.1.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.1.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.1.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.1.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.1.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.1.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.10.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.10.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.10.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.10.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.10.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.10.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.10.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.10.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.10.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.10.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.10.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.10.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.10.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.10.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.10.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.10.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.10.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.10.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.10.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.11.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.11.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.11.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.11.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.11.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.11.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.11.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.11.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.11.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.11.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.11.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.11.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.11.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.11.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.11.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.11.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.11.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.11.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.11.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.12.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.12.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.12.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.12.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.12.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.12.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.12.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.12.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.12.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.12.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.12.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.12.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.12.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.12.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.12.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.12.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.12.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.12.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.12.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.13.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.13.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.13.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.13.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.13.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.13.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.13.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.13.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.13.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.13.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.13.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.13.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.13.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.13.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.13.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.13.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.13.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.13.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.13.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.14.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.14.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.14.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.14.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.14.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.14.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.14.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.14.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.14.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.14.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.14.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.14.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.14.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.14.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.14.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.14.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.14.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.14.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.14.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.15.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.15.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.15.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.15.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.15.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.15.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.15.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.15.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.15.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.15.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.15.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
    "blocks.15.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.15.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.15.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.15.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.15.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.15.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.15.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.15.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.16.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.16.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.16.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.16.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.16.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.16.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.16.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.16.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.16.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
    "blocks.16.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
    "blocks.16.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
    "blocks.16.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.16.msa.norm_k.bias": "model-00001-of-00002.safetensors",
    "blocks.16.msa.norm_k.weight": "model-00001-of-00002.safetensors",
    "blocks.16.msa.norm_q.bias": "model-00001-of-00002.safetensors",
    "blocks.16.msa.norm_q.weight": "model-00001-of-00002.safetensors",
    "blocks.16.msa.proj_out.weight": "model-00001-of-00002.safetensors",
    "blocks.16.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
    "blocks.16.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
    "blocks.17.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.17.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.17.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.17.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.17.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.17.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.17.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.17.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.17.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
    "blocks.17.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
    "blocks.17.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
    "blocks.17.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.17.msa.norm_k.bias": "model-00002-of-00002.safetensors",
    "blocks.17.msa.norm_k.weight": "model-00002-of-00002.safetensors",
    "blocks.17.msa.norm_q.bias": "model-00002-of-00002.safetensors",
    "blocks.17.msa.norm_q.weight": "model-00002-of-00002.safetensors",
    "blocks.17.msa.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.17.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
    "blocks.17.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
    "blocks.18.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.18.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.18.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.18.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.18.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.18.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.18.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.18.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.18.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
    "blocks.18.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
    "blocks.18.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
    "blocks.18.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.18.msa.norm_k.bias": "model-00002-of-00002.safetensors",
    "blocks.18.msa.norm_k.weight": "model-00002-of-00002.safetensors",
    "blocks.18.msa.norm_q.bias": "model-00002-of-00002.safetensors",
    "blocks.18.msa.norm_q.weight": "model-00002-of-00002.safetensors",
    "blocks.18.msa.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.18.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
    "blocks.18.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
    "blocks.19.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.19.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.19.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.19.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.19.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
    "blocks.19.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
    "blocks.19.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
    "blocks.19.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
    "blocks.19.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
    "blocks.19.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
    "blocks.19.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
    "blocks.19.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.19.msa.norm_k.bias": "model-00002-of-00002.safetensors",
    "blocks.19.msa.norm_k.weight": "model-00002-of-00002.safetensors",
    "blocks.19.msa.norm_q.bias": "model-00002-of-00002.safetensors",
    "blocks.19.msa.norm_q.weight": "model-00002-of-00002.safetensors",
    "blocks.19.msa.proj_out.weight": "model-00002-of-00002.safetensors",
    "blocks.19.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
    "blocks.19.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
    "blocks.2.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.2.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.2.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.2.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.2.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
    "blocks.2.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
    "blocks.2.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
    "blocks.2.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
    "blocks.2.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
    "blocks.2.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
    "blocks.2.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
245
+ "blocks.2.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
246
+ "blocks.2.msa.norm_k.bias": "model-00001-of-00002.safetensors",
247
+ "blocks.2.msa.norm_k.weight": "model-00001-of-00002.safetensors",
248
+ "blocks.2.msa.norm_q.bias": "model-00001-of-00002.safetensors",
249
+ "blocks.2.msa.norm_q.weight": "model-00001-of-00002.safetensors",
250
+ "blocks.2.msa.proj_out.weight": "model-00001-of-00002.safetensors",
251
+ "blocks.2.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
252
+ "blocks.2.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
253
+ "blocks.20.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
254
+ "blocks.20.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
255
+ "blocks.20.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
256
+ "blocks.20.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
257
+ "blocks.20.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
258
+ "blocks.20.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
259
+ "blocks.20.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
260
+ "blocks.20.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
261
+ "blocks.20.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
262
+ "blocks.20.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
263
+ "blocks.20.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
264
+ "blocks.20.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
265
+ "blocks.20.msa.norm_k.bias": "model-00002-of-00002.safetensors",
266
+ "blocks.20.msa.norm_k.weight": "model-00002-of-00002.safetensors",
267
+ "blocks.20.msa.norm_q.bias": "model-00002-of-00002.safetensors",
268
+ "blocks.20.msa.norm_q.weight": "model-00002-of-00002.safetensors",
269
+ "blocks.20.msa.proj_out.weight": "model-00002-of-00002.safetensors",
270
+ "blocks.20.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
271
+ "blocks.20.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
272
+ "blocks.21.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
273
+ "blocks.21.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
274
+ "blocks.21.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
275
+ "blocks.21.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
276
+ "blocks.21.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
277
+ "blocks.21.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
278
+ "blocks.21.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
279
+ "blocks.21.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
280
+ "blocks.21.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
281
+ "blocks.21.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
282
+ "blocks.21.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
283
+ "blocks.21.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
284
+ "blocks.21.msa.norm_k.bias": "model-00002-of-00002.safetensors",
285
+ "blocks.21.msa.norm_k.weight": "model-00002-of-00002.safetensors",
286
+ "blocks.21.msa.norm_q.bias": "model-00002-of-00002.safetensors",
287
+ "blocks.21.msa.norm_q.weight": "model-00002-of-00002.safetensors",
288
+ "blocks.21.msa.proj_out.weight": "model-00002-of-00002.safetensors",
289
+ "blocks.21.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
290
+ "blocks.21.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
291
+ "blocks.22.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
292
+ "blocks.22.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
293
+ "blocks.22.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
294
+ "blocks.22.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
295
+ "blocks.22.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
296
+ "blocks.22.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
297
+ "blocks.22.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
298
+ "blocks.22.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
299
+ "blocks.22.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
300
+ "blocks.22.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
301
+ "blocks.22.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
302
+ "blocks.22.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
303
+ "blocks.22.msa.norm_k.bias": "model-00002-of-00002.safetensors",
304
+ "blocks.22.msa.norm_k.weight": "model-00002-of-00002.safetensors",
305
+ "blocks.22.msa.norm_q.bias": "model-00002-of-00002.safetensors",
306
+ "blocks.22.msa.norm_q.weight": "model-00002-of-00002.safetensors",
307
+ "blocks.22.msa.proj_out.weight": "model-00002-of-00002.safetensors",
308
+ "blocks.22.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
309
+ "blocks.22.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
310
+ "blocks.23.ada.mlp_in.1.bias": "model-00002-of-00002.safetensors",
311
+ "blocks.23.ada.mlp_in.1.weight": "model-00002-of-00002.safetensors",
312
+ "blocks.23.ada.mlp_out.1.bias": "model-00002-of-00002.safetensors",
313
+ "blocks.23.ada.mlp_out.1.weight": "model-00002-of-00002.safetensors",
314
+ "blocks.23.ada.msa_in.1.bias": "model-00002-of-00002.safetensors",
315
+ "blocks.23.ada.msa_in.1.weight": "model-00002-of-00002.safetensors",
316
+ "blocks.23.ada.msa_out.1.bias": "model-00002-of-00002.safetensors",
317
+ "blocks.23.ada.msa_out.1.weight": "model-00002-of-00002.safetensors",
318
+ "blocks.23.mlp.proj_in.bias": "model-00002-of-00002.safetensors",
319
+ "blocks.23.mlp.proj_in.weight": "model-00002-of-00002.safetensors",
320
+ "blocks.23.mlp.proj_out.bias": "model-00002-of-00002.safetensors",
321
+ "blocks.23.mlp.proj_out.weight": "model-00002-of-00002.safetensors",
322
+ "blocks.23.msa.norm_k.bias": "model-00002-of-00002.safetensors",
323
+ "blocks.23.msa.norm_k.weight": "model-00002-of-00002.safetensors",
324
+ "blocks.23.msa.norm_q.bias": "model-00002-of-00002.safetensors",
325
+ "blocks.23.msa.norm_q.weight": "model-00002-of-00002.safetensors",
326
+ "blocks.23.msa.proj_out.weight": "model-00002-of-00002.safetensors",
327
+ "blocks.23.msa.proj_qkv.weight": "model-00002-of-00002.safetensors",
328
+ "blocks.23.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
329
+ "blocks.3.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
330
+ "blocks.3.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
331
+ "blocks.3.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
332
+ "blocks.3.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
333
+ "blocks.3.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
334
+ "blocks.3.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
335
+ "blocks.3.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
336
+ "blocks.3.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
337
+ "blocks.3.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
338
+ "blocks.3.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
339
+ "blocks.3.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
340
+ "blocks.3.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
341
+ "blocks.3.msa.norm_k.bias": "model-00001-of-00002.safetensors",
342
+ "blocks.3.msa.norm_k.weight": "model-00001-of-00002.safetensors",
343
+ "blocks.3.msa.norm_q.bias": "model-00001-of-00002.safetensors",
344
+ "blocks.3.msa.norm_q.weight": "model-00001-of-00002.safetensors",
345
+ "blocks.3.msa.proj_out.weight": "model-00001-of-00002.safetensors",
346
+ "blocks.3.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
347
+ "blocks.3.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
348
+ "blocks.4.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
349
+ "blocks.4.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
350
+ "blocks.4.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
351
+ "blocks.4.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
352
+ "blocks.4.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
353
+ "blocks.4.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
354
+ "blocks.4.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
355
+ "blocks.4.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
356
+ "blocks.4.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
357
+ "blocks.4.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
358
+ "blocks.4.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
359
+ "blocks.4.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
360
+ "blocks.4.msa.norm_k.bias": "model-00001-of-00002.safetensors",
361
+ "blocks.4.msa.norm_k.weight": "model-00001-of-00002.safetensors",
362
+ "blocks.4.msa.norm_q.bias": "model-00001-of-00002.safetensors",
363
+ "blocks.4.msa.norm_q.weight": "model-00001-of-00002.safetensors",
364
+ "blocks.4.msa.proj_out.weight": "model-00001-of-00002.safetensors",
365
+ "blocks.4.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
366
+ "blocks.4.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
367
+ "blocks.5.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
368
+ "blocks.5.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
369
+ "blocks.5.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
370
+ "blocks.5.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
371
+ "blocks.5.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
372
+ "blocks.5.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
373
+ "blocks.5.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
374
+ "blocks.5.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
375
+ "blocks.5.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
376
+ "blocks.5.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
377
+ "blocks.5.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
378
+ "blocks.5.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
379
+ "blocks.5.msa.norm_k.bias": "model-00001-of-00002.safetensors",
380
+ "blocks.5.msa.norm_k.weight": "model-00001-of-00002.safetensors",
381
+ "blocks.5.msa.norm_q.bias": "model-00001-of-00002.safetensors",
382
+ "blocks.5.msa.norm_q.weight": "model-00001-of-00002.safetensors",
383
+ "blocks.5.msa.proj_out.weight": "model-00001-of-00002.safetensors",
384
+ "blocks.5.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
385
+ "blocks.5.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
386
+ "blocks.6.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
387
+ "blocks.6.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
388
+ "blocks.6.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
389
+ "blocks.6.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
390
+ "blocks.6.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
391
+ "blocks.6.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
392
+ "blocks.6.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
393
+ "blocks.6.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
394
+ "blocks.6.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
395
+ "blocks.6.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
396
+ "blocks.6.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
397
+ "blocks.6.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
398
+ "blocks.6.msa.norm_k.bias": "model-00001-of-00002.safetensors",
399
+ "blocks.6.msa.norm_k.weight": "model-00001-of-00002.safetensors",
400
+ "blocks.6.msa.norm_q.bias": "model-00001-of-00002.safetensors",
401
+ "blocks.6.msa.norm_q.weight": "model-00001-of-00002.safetensors",
402
+ "blocks.6.msa.proj_out.weight": "model-00001-of-00002.safetensors",
403
+ "blocks.6.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
404
+ "blocks.6.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
405
+ "blocks.7.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
406
+ "blocks.7.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
407
+ "blocks.7.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
408
+ "blocks.7.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
409
+ "blocks.7.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
410
+ "blocks.7.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
411
+ "blocks.7.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
412
+ "blocks.7.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
413
+ "blocks.7.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
414
+ "blocks.7.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
415
+ "blocks.7.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
416
+ "blocks.7.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
417
+ "blocks.7.msa.norm_k.bias": "model-00001-of-00002.safetensors",
418
+ "blocks.7.msa.norm_k.weight": "model-00001-of-00002.safetensors",
419
+ "blocks.7.msa.norm_q.bias": "model-00001-of-00002.safetensors",
420
+ "blocks.7.msa.norm_q.weight": "model-00001-of-00002.safetensors",
421
+ "blocks.7.msa.proj_out.weight": "model-00001-of-00002.safetensors",
422
+ "blocks.7.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
423
+ "blocks.7.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
424
+ "blocks.8.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
425
+ "blocks.8.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
426
+ "blocks.8.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
427
+ "blocks.8.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
428
+ "blocks.8.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
429
+ "blocks.8.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
430
+ "blocks.8.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
431
+ "blocks.8.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
432
+ "blocks.8.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
433
+ "blocks.8.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
434
+ "blocks.8.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
435
+ "blocks.8.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
436
+ "blocks.8.msa.norm_k.bias": "model-00001-of-00002.safetensors",
437
+ "blocks.8.msa.norm_k.weight": "model-00001-of-00002.safetensors",
438
+ "blocks.8.msa.norm_q.bias": "model-00001-of-00002.safetensors",
439
+ "blocks.8.msa.norm_q.weight": "model-00001-of-00002.safetensors",
440
+ "blocks.8.msa.proj_out.weight": "model-00001-of-00002.safetensors",
441
+ "blocks.8.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
442
+ "blocks.8.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
443
+ "blocks.9.ada.mlp_in.1.bias": "model-00001-of-00002.safetensors",
444
+ "blocks.9.ada.mlp_in.1.weight": "model-00001-of-00002.safetensors",
445
+ "blocks.9.ada.mlp_out.1.bias": "model-00001-of-00002.safetensors",
446
+ "blocks.9.ada.mlp_out.1.weight": "model-00001-of-00002.safetensors",
447
+ "blocks.9.ada.msa_in.1.bias": "model-00001-of-00002.safetensors",
448
+ "blocks.9.ada.msa_in.1.weight": "model-00001-of-00002.safetensors",
449
+ "blocks.9.ada.msa_out.1.bias": "model-00001-of-00002.safetensors",
450
+ "blocks.9.ada.msa_out.1.weight": "model-00001-of-00002.safetensors",
451
+ "blocks.9.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
452
+ "blocks.9.mlp.proj_in.weight": "model-00001-of-00002.safetensors",
453
+ "blocks.9.mlp.proj_out.bias": "model-00001-of-00002.safetensors",
454
+ "blocks.9.mlp.proj_out.weight": "model-00001-of-00002.safetensors",
455
+ "blocks.9.msa.norm_k.bias": "model-00001-of-00002.safetensors",
456
+ "blocks.9.msa.norm_k.weight": "model-00001-of-00002.safetensors",
457
+ "blocks.9.msa.norm_q.bias": "model-00001-of-00002.safetensors",
458
+ "blocks.9.msa.norm_q.weight": "model-00001-of-00002.safetensors",
459
+ "blocks.9.msa.proj_out.weight": "model-00001-of-00002.safetensors",
460
+ "blocks.9.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
461
+ "blocks.9.msa.rope.rope.freqs": "model-00001-of-00002.safetensors",
462
+ "emb_in.proj_hid.bias": "model-00001-of-00002.safetensors",
463
+ "emb_in.proj_hid.weight": "model-00001-of-00002.safetensors",
464
+ "emb_in.proj_in.bias": "model-00001-of-00002.safetensors",
465
+ "emb_in.proj_in.weight": "model-00001-of-00002.safetensors",
466
+ "emb_in.proj_out.bias": "model-00001-of-00002.safetensors",
467
+ "emb_in.proj_out.weight": "model-00001-of-00002.safetensors",
468
+ "txt_in.proj.bias": "model-00001-of-00002.safetensors",
469
+ "txt_in.proj.weight": "model-00001-of-00002.safetensors",
470
+ "txt_out.proj.bias": "model-00002-of-00002.safetensors",
471
+ "txt_out.proj.weight": "model-00002-of-00002.safetensors",
472
+ "txt_out_ada.out_in.1.bias": "model-00002-of-00002.safetensors",
473
+ "txt_out_ada.out_in.1.weight": "model-00002-of-00002.safetensors",
474
+ "txt_out_norm.bias": "model-00002-of-00002.safetensors",
475
+ "txt_out_norm.weight": "model-00002-of-00002.safetensors"
476
+ }
477
+ }
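The weight map above is keyed by tensor name and sorted as plain strings, which is why `blocks.2` appears between `blocks.19` and `blocks.20`. To see which transformer blocks live in which shard, the entries can be grouped numerically. The following is a minimal standard-library sketch; it assumes the parsed index follows the usual Hugging Face sharded-checkpoint layout with a top-level `"weight_map"` object (the enclosing key lies outside this excerpt), and `blocks_per_shard` is a hypothetical helper, not part of the Cola DLM code.

```python
import json
import re
from collections import defaultdict


def blocks_per_shard(index: dict) -> dict[str, list[int]]:
    """Map each shard file name to the sorted block indices whose tensors it holds."""
    shards: dict[str, set[int]] = defaultdict(set)
    for tensor_name, shard_file in index["weight_map"].items():
        match = re.match(r"blocks\.(\d+)\.", tensor_name)
        if match:  # skip non-block tensors such as emb_in.* and txt_out.*
            shards[shard_file].add(int(match.group(1)))
    return {shard: sorted(ids) for shard, ids in shards.items()}


# A few entries copied from the weight map above (assumed "weight_map" wrapper).
excerpt = json.loads("""
{"weight_map": {
  "blocks.17.msa.norm_k.bias": "model-00002-of-00002.safetensors",
  "blocks.2.msa.proj_qkv.weight": "model-00001-of-00002.safetensors",
  "blocks.23.msa.rope.rope.freqs": "model-00002-of-00002.safetensors",
  "blocks.9.mlp.proj_in.bias": "model-00001-of-00002.safetensors",
  "txt_out.proj.weight": "model-00002-of-00002.safetensors"
}}
""")
print(blocks_per_shard(excerpt))
# {'model-00002-of-00002.safetensors': [17, 23], 'model-00001-of-00002.safetensors': [2, 9]}
```

In the portion of the map shown here, blocks 2 through 9 sit in the first shard together with `emb_in.*` and `txt_in.*`, while blocks 17 through 23 sit in the second shard together with the `txt_out*` output head.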
cola_dlm/cola_vae/config.json ADDED
@@ -0,0 +1,42 @@
+ {
+ "act": "swiglu",
+ "architectures": [
+ "ColaTextVAEModel"
+ ],
+ "attn_dropout": 0.0,
+ "bias": true,
+ "block_causal": true,
+ "block_size": 1,
+ "causal": false,
+ "clip_qkv": null,
+ "decoder_num_blocks": 4,
+ "dim": 1536,
+ "dropout": 0.0,
+ "encoder_last_ln": true,
+ "encoder_num_blocks": 4,
+ "ffn_dim": 6144,
+ "init_cutoff_factor": 3,
+ "init_fn": "normal",
+ "init_std": 0.02,
+ "latent_dim": 16,
+ "layer_norm_affine": true,
+ "layer_norm_eps": 1e-06,
+ "layer_norm_type": "layer_norm",
+ "model_type": "cola_text_vae",
+ "num_heads": 12,
+ "patch_size": 1,
+ "post_norm": true,
+ "qk_bias": false,
+ "qk_norm": true,
+ "qk_norm_affine": true,
+ "rope_full_precision": true,
+ "rope_theta": 500000,
+ "scaling_factor": 1.0,
+ "shared_heads_kv": 1,
+ "shifting_factor": 0.0,
+ "torch_dtype": "float32",
+ "transformers_version": "4.46.1",
+ "use_emb": true,
+ "use_variation": true,
+ "vocab_size": 100278
+ }
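The Text VAE config above fixes the main shape quantities: hidden width `dim = 1536`, `num_heads = 12`, `ffn_dim = 6144`, a 16-dimensional latent per position, and `rope_theta = 500000`. The sketch below derives a few of them. It is illustrative only and rests on assumptions not confirmed by this file: that attention splits `dim` evenly across `num_heads`, that `patch_size` consecutive tokens share one latent position, and that rotary embeddings use the standard inverse-frequency formula `theta ** (-2i / head_dim)`. The model's own RoPE code is not shown here, and `vae_shapes` / `rope_inv_freq` are hypothetical helpers.

```python
import math


def vae_shapes(cfg: dict, seq_len: int) -> dict:
    """Derive illustrative shape quantities from a Text VAE config dict."""
    dim, heads = cfg["dim"], cfg["num_heads"]
    if dim % heads:
        raise ValueError("dim must be divisible by num_heads")
    # Assumption: patch_size consecutive tokens map to one latent position.
    positions = math.ceil(seq_len / cfg["patch_size"])
    return {
        "head_dim": dim // heads,
        "ffn_ratio": cfg["ffn_dim"] / dim,
        "latent_shape": (positions, cfg["latent_dim"]),
        "total_blocks": cfg["encoder_num_blocks"] + cfg["decoder_num_blocks"],
    }


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies theta**(-2i/head_dim) for i in [0, head_dim/2)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Values copied from the config.json above.
cfg = {"dim": 1536, "num_heads": 12, "ffn_dim": 6144, "latent_dim": 16,
       "patch_size": 1, "encoder_num_blocks": 4, "decoder_num_blocks": 4,
       "rope_theta": 500000}
shapes = vae_shapes(cfg, seq_len=128)
print(shapes)
# {'head_dim': 128, 'ffn_ratio': 4.0, 'latent_shape': (128, 16), 'total_blocks': 8}
freqs = rope_inv_freq(cfg["rope_theta"], shapes["head_dim"])
print(len(freqs), freqs[0])
# 64 1.0
```

Under these assumptions, with `patch_size = 1`, each token's 1536-wide hidden state is compressed to a 16-dimensional latent. The large `rope_theta` makes the lowest rotary frequencies very slow, which is a common choice for long contexts.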
cola_dlm/cola_vae/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2b27cf0a73ecf687d37c28d45881117960979436ed6ec908abd44330b23e991c
+ size 2007500120
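The `model.safetensors` entry above is a Git LFS pointer, not the weights themselves. It has three `key value` lines giving the pointer spec version, the SHA-256 object id, and the file size in bytes. A minimal standard-library sketch can parse such a pointer and check a downloaded copy against it. The helper names are hypothetical. The parameter figure printed at the end is only a rough upper bound: it assumes every tensor is stored as float32, as the config's `torch_dtype` suggests, and it counts the small safetensors header as if it were weights.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into its fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, info: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if info["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {info['algo']}")
    if os.path.getsize(path) != info["size"]:
        return False  # cheap size check before hashing a ~2 GB file
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == info["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2b27cf0a73ecf687d37c28d45881117960979436ed6ec908abd44330b23e991c
size 2007500120
"""
info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 2**30:.2f} GiB")  # 1.87 GiB
print(info["size"] // 4)  # at most 501875030 float32 values
```

Hashing in fixed-size chunks keeps memory use flat regardless of checkpoint size.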
tokenizer.json ADDED
The diff for this file is too large to render.