---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- language-model
- diffusion
- latent-diffusion
- flow-matching
- text-vae
- pytorch
- transformers
- research
---
# Cola DLM
[English](README.md) Β· [δΈζ](README_zh.md)
**Cola DLM** (**Co**ntinuous **La**tent **D**iffusion **L**anguage **M**odel) is a hierarchical continuous latent-space diffusion language model. It combines a Text VAE with a block-causal Diffusion Transformer (DiT) prior: the VAE maps text into continuous latent sequences and decodes latents back to tokens, while the DiT models the latent prior by transporting Gaussian noise to data latents with Flow Matching.
This model repository contains the HuggingFace-format checkpoint for the paper **Continuous Latent Diffusion Language Model**.
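As a rough illustration (not the repository's actual API), flow-matching generation integrates a learned velocity field from Gaussian noise toward a data latent with a few Euler steps. The sketch below substitutes a toy velocity field for the DiT:

```python
import torch

def flow_matching_sample(velocity_fn, latent_shape, steps=16, device="cpu"):
    """Integrate a velocity field from noise (t=0) to data (t=1) with Euler steps,
    the basic flow-matching sampling scheme."""
    x = torch.randn(latent_shape, device=device)  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((latent_shape[0],), i * dt, device=device)
        v = velocity_fn(x, t)  # the model predicts the velocity dx/dt
        x = x + v * dt         # Euler update toward the data latent
    return x

# Toy velocity field that pulls every latent toward a fixed target.
torch.manual_seed(0)
target = torch.ones(2, 4, 8)
latents = flow_matching_sample(lambda x, t: target - x, (2, 4, 8), steps=16)
```

In Cola DLM the velocity field is the block-causal DiT, and the integrated latents are handed to the VAE decoder to produce tokens.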
## Links
- **Model repository:** <https://huggingface.co/ByteDance-Seed/Cola-DLM>
- **GitHub repository:** <https://github.com/ByteDance-Seed/Cola-DLM>
- **Paper:** <https://arxiv.org/abs/2605.06548>
- **HuggingFace Daily Paper:** <https://huggingface.co/papers/2605.06548>
- **Project page:** <https://hongcanguo.github.io/Cola-DLM/>
- **Blog post:** <https://hongcanguo.github.io/posts/2026-cola-dlm.html>
- **Zhihu article:** <https://zhuanlan.zhihu.com/p/2038324180920313704>
## Model Files
The expected repository layout is:
```text
.
βββ cola_dlm/
β βββ cola_dit/
β β βββ config.json
β β βββ model.safetensors*
β βββ cola_vae/
β βββ config.json
β βββ model.safetensors*
βββ tokenizer.json
βββ README.md
βββ README_zh.md
```
The checkpoint consists of two cooperating modules:
- `ColaDiTModel`: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
- `ColaTextVAEModel`: a Text VAE encoder and conditional decoder for text-to-latent and latent-to-text mapping.
## Quickstart
Install the Cola DLM code package from the [GitHub repository](https://github.com/ByteDance-Seed/Cola-DLM), then install the download helper:
```bash
git clone https://github.com/ByteDance-Seed/Cola-DLM.git
cd Cola-DLM
pip install -e .
pip install huggingface_hub
```
Download the model files:
```bash
huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
```
Run a minimal Python example:
```python
import torch
from tokenizers import Tokenizer
from cola_dlm import (
ColaDiTModel,
ColaTextVAEModel,
generate_task_repaint_inference,
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")
prompts = [{"question": "Question: What is the capital of France? Answer:"}]
results = generate_task_repaint_inference(
dit=dit,
vae=vae,
tokenizer=tokenizer,
prompts=prompts,
task_name="lambada",
device=device,
max_new_tokens=32,
temperature=0.0,
guidance_scale=7.0,
timestep_num=16,
pad_token_id=100277,
)
print(results[0]["generate"])
```
## OpenAI-Compatible Serving
The companion `openai_adapter/` service in the Cola DLM code release exposes this model through an OpenAI-compatible Chat Completions endpoint:
```text
POST /v1/chat/completions
```
Install the adapter dependencies from the code repository root:
```bash
pip install -e .
pip install -r openai_adapter/requirements.txt
```
Start the service:
```bash
export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
export COLA_MODEL_NAME=cola-dlm
export COLA_API_KEY=change-me
uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
```
Then send a request:
```bash
curl http://127.0.0.1:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer change-me" \
-d '{
"model": "cola-dlm",
"messages": [
{
"role": "user",
"content": "Question: What is the capital of France? Answer:"
}
],
"temperature": 0,
"max_tokens": 32,
"stream": false
}'
```
The adapter currently supports non-streaming completions.
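From Python, the same request can be issued with any HTTP client. A minimal sketch of building the request body (mirroring the curl example above; `build_chat_request` is a hypothetical helper, and the field names follow the OpenAI Chat Completions schema):

```python
import json

def build_chat_request(question, model="cola-dlm", max_tokens=32):
    """Build the JSON body for the adapter's /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0,
        "max_tokens": max_tokens,
        "stream": False,  # the adapter only supports non-streaming responses
    }

body = json.dumps(
    build_chat_request("Question: What is the capital of France? Answer:")
)
```

POST `body` to `http://127.0.0.1:8000/v1/chat/completions` with the `Authorization: Bearer <COLA_API_KEY>` header, exactly as in the curl example.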
## Model Details
- **Architecture:** Text VAE + block-causal DiT latent prior.
- **Training objective:** two-stage training with Text VAE pretraining followed by joint Text VAE + DiT training using Flow Matching.
- **Training-compute checkpoint:** the released weights correspond to the 2000 EFLOPs checkpoint reported in the paper's RQ4 scaling curve.
- **Tokenizer:** OLMo 2 tokenizer with a 100,278-entry vocabulary.
- **Special token ids:** `pad_token_id=100277`, `eos_token_id=100257`, `im_end_token_id=100265`.
- **Framework:** PyTorch 2.1+ and HuggingFace Transformers 4.40+.
- **License:** Apache License 2.0.
## Evaluation
Reference zero-shot benchmark results from the open-source inference implementation:
| Task | Accuracy (%) |
| --- | ---: |
| LAMBADA | 50.80 |
| MMLU | 19.30 |
| OBQA | 23.00 |
| HellaSwag | 10.70 |
| RACE | 19.60 |
| SIQA | 28.90 |
| SQuAD | 30.90 |
| Story Cloze | 30.77 |
| **Tasks Average** | **26.75** |
The open-source HuggingFace Transformers implementation may differ slightly from the internal implementation used in the paper, so individual per-task numbers can vary; the overall trend is consistent with the paper.
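The Tasks Average in the table is the unweighted mean of the eight per-task accuracies:

```python
scores = {
    "LAMBADA": 50.80, "MMLU": 19.30, "OBQA": 23.00, "HellaSwag": 10.70,
    "RACE": 19.60, "SIQA": 28.90, "SQuAD": 30.90, "Story Cloze": 30.77,
}
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 26.75
```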
## Intended Use
Cola DLM is intended primarily for research on hierarchical latent-variable language models, continuous latent diffusion for text, Flow Matching priors, and benchmark-style text generation.
This checkpoint is **not instruction-tuned** and has not gone through RLHF. It should not be treated as a production chatbot or used for safety-critical decision making.
## Limitations
- The model was trained primarily on English text; other languages are not well evaluated.
- Outputs may contain factual errors, offensive content, bias, or hallucinations.
- Generation quality can be sensitive to prompt format and prompt length. QA-style prompts such as `"Question: ... Answer:"` are recommended for quick evaluation.
- The model uses mutable KV caches during generation; service implementations should serialize generation inside one process unless cache handling is explicitly isolated.
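For quick experiments, the recommended QA prompt shape can be produced with a tiny helper (hypothetical, not part of the released package):

```python
def format_qa_prompt(question):
    """Wrap a question in the 'Question: ... Answer:' shape recommended above."""
    return f"Question: {question.strip()} Answer:"

prompt = format_qa_prompt("What is the capital of France?")
# -> "Question: What is the capital of France? Answer:"
```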
## Safety Statement and Use Restrictions
Cola DLM is a research-oriented checkpoint for continuous latent diffusion language modeling. The released model is relatively small and has **not been instruction-tuned, RLHF-aligned, or systematically safety-aligned**. Therefore, it does not provide reliable refusal behavior, content moderation, or risk detection. Its outputs may contain inaccurate, offensive, biased, unlawful, inappropriate, or misleading content.
This model is intended only for academic research and technical experimentation. We do not encourage, support, or authorize the use of Cola DLM to generate, distribute, or assist with the following types of content:
- Pornographic, sexually explicit, exploitative, or otherwise inappropriate content;
- Gambling-related content, including gambling promotion, betting advice, or illegal gambling services;
- Content related to illegal drugs or controlled substances, including instructions for manufacturing, purchasing, selling, using, or evading regulation;
- Hate, harassment, discrimination, threats of violence, extremist, or inflammatory content;
- Political manipulation, targeted political persuasion, political misinformation, incitement of international or intergroup conflict, or sensitive political content that may escalate social, national, or geopolitical tensions;
- Illegal activities, regulatory evasion, cyber abuse, privacy violations, or other content that may cause real-world harm;
- Automated advice or decision-making in high-stakes domains such as medical, legal, financial, safety-critical, or security-sensitive settings.
Users who download, deploy, fine-tune, redistribute, or build applications based on this model are responsible for implementing appropriate safety and compliance measures. Such measures may include, but are not limited to, input and output moderation, access control, logging and auditing, human review, red-teaming, and compliance checks under applicable laws and regulations.
Cola DLM should not be treated as a production-ready chatbot or a safety-reliable general-purpose assistant. Any content generated by this model does not represent the views, positions, or endorsements of the authors, affiliated institutions, or contributors.
## Citation
If you use Cola DLM in your work, please cite:
```bibtex
@article{guo2026cola,
title = {Continuous Latent Diffusion Language Model},
author = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
journal = {arXiv preprint arXiv:2605.06548},
year = {2026},
url = {https://arxiv.org/abs/2605.06548},
}
```