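# Cola DLM

[English](README.md) · [中文](README_zh.md)

**Cola DLM** (`Co`ntinuous `La`tent `D`iffusion `L`anguage `M`odel) is a hierarchical continuous latent diffusion language model. It consists of a Text VAE and a block-causal Diffusion Transformer (DiT) prior. The VAE maps between text and a sequence of continuous latent variables and decodes latents back into tokens; the DiT performs prior transport in latent space via Flow Matching.

This model repository contains the HuggingFace-format checkpoint accompanying the paper **Continuous Latent Diffusion Language Model**.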
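The transport step can be illustrated with a toy sketch that is not the Cola DLM sampler: starting from Gaussian noise at t = 0, it integrates a velocity field to t = 1 with fixed-size Euler steps. It assumes the common linear path x_t = (1 - t) x_0 + t x_1; the paper's exact path, time convention, and solver may differ. `toy_velocity` is a hypothetical stand-in for the DiT, which in the real model predicts the velocity from context instead of knowing the target.

```python
import numpy as np


def toy_velocity(x: np.ndarray, t: float, target: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the DiT: the exact velocity of the linear
    path x_t = (1 - t) * x_0 + t * target, i.e. (target - x) / (1 - t)."""
    return (target - x) / (1.0 - t)


def euler_transport(x0: np.ndarray, target: np.ndarray, num_steps: int = 16) -> np.ndarray:
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (latent)."""
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        # One explicit Euler step; t < 1 here, so the toy velocity is finite.
        x = x + (t_next - t) * toy_velocity(x, t, target)
    return x


rng = np.random.default_rng(0)
latent_dim = 8                            # illustrative size, not the model's
target = rng.standard_normal(latent_dim)  # pretend "clean" latent
noise = rng.standard_normal(latent_dim)   # starting point at t = 0
sample = euler_transport(noise, target, num_steps=16)
print(np.max(np.abs(sample - target)))    # close to 0; only rounding error remains
```

Because the toy velocity is exact for this path, the Euler steps land on the target up to rounding; with a learned velocity field, using fewer steps trades accuracy for speed.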
## Links

- **Model repository**: <https://huggingface.co/ByteDance-Seed/Cola-DLM>
- **GitHub repository**: <https://github.com/ByteDance-Seed/Cola-DLM>
- **Paper**: <https://arxiv.org/abs/2605.06548>
- **HuggingFace Daily Paper**: <https://huggingface.co/papers/2605.06548>
- **Project page**: <https://hongcanguo.github.io/Cola-DLM/>
- **Blog post**: <https://hongcanguo.github.io/posts/2026-cola-dlm.html>
- **Zhihu article**: <https://zhuanlan.zhihu.com/p/2038324180920313704>
## Model Files

Expected layout of the model repository:

```text
.
├── cola_dlm/
│   ├── cola_dit/
│   │   ├── config.json
│   │   └── model.safetensors*
│   └── cola_vae/
│       ├── config.json
│       └── model.safetensors*
├── tokenizer.json
├── README.md
└── README_zh.md
```
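A small sketch for checking a local download against this layout, assuming it was saved to `hf_models` as in Quick Start. `missing_files` is a hypothetical helper, not part of the Cola DLM package; the trailing `*` from the layout is treated as a glob pattern, and the README files are deliberately not checked.

```python
from pathlib import Path

# Required files from the expected layout (README files are optional).
EXPECTED_FILES = [
    "cola_dlm/cola_dit/config.json",
    "cola_dlm/cola_vae/config.json",
    "tokenizer.json",
]
# Directories that must contain at least one "model.safetensors*" entry.
WEIGHT_DIRS = ["cola_dlm/cola_dit", "cola_dlm/cola_vae"]


def missing_files(root: str) -> list[str]:
    """Return the expected entries that are absent under `root`."""
    base = Path(root)
    missing = [f for f in EXPECTED_FILES if not (base / f).is_file()]
    for d in WEIGHT_DIRS:
        # glob() on a missing directory simply yields nothing.
        if not any((base / d).glob("model.safetensors*")):
            missing.append(f"{d}/model.safetensors*")
    return missing
```

After downloading, `missing_files("hf_models")` should return an empty list when everything is in place.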
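The checkpoint consists of two cooperating modules:

- `ColaDiTModel`: a block-causal 1-D Diffusion Transformer prior over continuous text latents.
- `ColaTextVAEModel`: the Text VAE encoder and conditional decoder, which map text to latents and latents back to text.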
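One common reading of "block-causal" attention is sketched below as a boolean mask over latent positions: each position attends to every position in its own block and in all earlier blocks, but to nothing in later blocks. The block size, and whether Cola DLM builds its mask exactly this way, are assumptions for illustration; `block_causal_mask` is a hypothetical helper, not part of the `cola_dlm` package.

```python
import numpy as np


def block_causal_mask(seq_len: int, block_size: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if position i may attend to j.

    Positions are grouped into consecutive blocks of `block_size`; a position
    sees every position in its own block and in all earlier blocks.
    """
    block_ids = np.arange(seq_len) // block_size
    return block_ids[:, None] >= block_ids[None, :]


print(block_causal_mask(6, 2).astype(int))
# [[1 1 0 0 0 0]
#  [1 1 0 0 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 1]]
```

With `block_size=1` this reduces to an ordinary causal mask, and with a single block covering the whole sequence it becomes full bidirectional attention, so a block-causal mask sits between the two.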
## Quick Start

First install the Cola DLM package from the [GitHub repository](https://github.com/ByteDance-Seed/Cola-DLM), then install the download helper:

```bash
git clone https://github.com/ByteDance-Seed/Cola-DLM.git
cd Cola-DLM
pip install -e .
pip install huggingface_hub
```
Download the model files:

```bash
huggingface-cli download ByteDance-Seed/Cola-DLM --local-dir hf_models
```
Minimal Python example:

```python
import torch
from tokenizers import Tokenizer
from cola_dlm import (
    ColaDiTModel,
    ColaTextVAEModel,
    generate_task_repaint_inference,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load both modules of the checkpoint and the tokenizer from the
# directory created by `huggingface-cli download`.
dit = ColaDiTModel.from_pretrained("hf_models/cola_dlm/cola_dit").to(device)
vae = ColaTextVAEModel.from_pretrained("hf_models/cola_dlm/cola_vae").to(device)
tokenizer = Tokenizer.from_file("hf_models/tokenizer.json")

# QA-style prompts are recommended for quick evaluation (see Limitations).
prompts = [{"question": "Question: What is the capital of France? Answer:"}]

results = generate_task_repaint_inference(
    dit=dit,
    vae=vae,
    tokenizer=tokenizer,
    prompts=prompts,
    task_name="lambada",
    device=device,
    max_new_tokens=32,
    temperature=0.0,
    guidance_scale=7.0,
    timestep_num=16,
    pad_token_id=100277,  # pad id of the OLMo 2 tokenizer (see Model Details)
)
print(results[0]["generate"])
```
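To run several questions at once, a small hypothetical helper can wrap raw questions in the same QA format and dict shape as `prompts` above. That `generate_task_repaint_inference` accepts more than one entry is an assumption, suggested by its list-typed `prompts` argument and the indexed `results[0]`.

```python
def make_qa_prompts(questions: list[str]) -> list[dict[str, str]]:
    """Wrap raw questions in the "Question: ... Answer:" format.

    Returns entries shaped like the quick-start `prompts` list,
    i.e. {"question": "<formatted prompt>"}.
    """
    return [{"question": f"Question: {q.strip()} Answer:"} for q in questions]


prompts = make_qa_prompts([
    "What is the capital of France?",
    "What is the largest planet in the Solar System?",
])
print(prompts[0]["question"])
# Question: What is the capital of France? Answer:
```

The resulting list could then be passed as `prompts=` to the call above, reading each result's `"generate"` field.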
## OpenAI-Compatible Serving

The `openai_adapter/` service in the Cola DLM code repository exposes this model through an OpenAI-compatible Chat Completions endpoint:

```text
POST /v1/chat/completions
```
Install the adapter dependencies from the root of the source repository:

```bash
pip install -e .
pip install -r openai_adapter/requirements.txt
```
Start the server:

```bash
export COLA_DIT_PATH=hf_models/cola_dlm/cola_dit
export COLA_VAE_PATH=hf_models/cola_dlm/cola_vae
export COLA_TOKENIZER_PATH=hf_models/tokenizer.json
export COLA_MODEL_NAME=cola-dlm
export COLA_API_KEY=change-me  # placeholder; replace with your own key
uvicorn openai_adapter.server:app --host 0.0.0.0 --port 8000
```
Send a request:

```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer change-me" \
  -d '{
    "model": "cola-dlm",
    "messages": [
      {
        "role": "user",
        "content": "Question: What is the capital of France? Answer:"
      }
    ],
    "temperature": 0,
    "max_tokens": 32,
    "stream": false
  }'
```
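For callers who prefer Python over curl, here is a standard-library sketch of the same request. It assumes the adapter responds in the standard OpenAI Chat Completions shape (`choices[0].message.content`); `build_chat_request` and `chat` are hypothetical helper names, and their defaults mirror the values in the curl example.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://127.0.0.1:8000",
                       api_key: str = "change-me",
                       model: str = "cola-dlm",
                       max_tokens: int = 32) -> urllib.request.Request:
    """Build the same non-streaming request as the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the first choice's message content.

    Assumes the standard response shape choices[0].message.content.
    """
    req = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

With the server running, `chat("Question: What is the capital of France? Answer:")` would send the request and return the generated text.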
The adapter currently supports non-streaming generation only.
## Model Details

- **Architecture**: Text VAE + block-causal DiT latent prior.
- **Training objective**: two-stage training: Text VAE pretraining first, then joint training of the Text VAE and DiT with Flow Matching.
- **Training compute**: the released weights correspond to the 2000 EFLOPs checkpoint on the RQ4 scaling curve in the paper.
- **Tokenizer**: OLMo 2 tokenizer, vocabulary size 100,278.
- **Special token ids**: `pad_token_id=100277`, `eos_token_id=100257`, `im_end_token_id=100265`.
- **Framework**: PyTorch 2.1+ and HuggingFace Transformers 4.40+.
- **License**: Apache License 2.0.
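The special token ids above matter when working with raw generated token ids, for example outside the provided generation helper. The sketch below drops `pad_token_id` entries and stops at the first `eos_token_id` or `im_end_token_id`; treating both as stop tokens is an assumption, and `clean_generated_ids` is a hypothetical helper rather than part of the Cola DLM code. It also checks that, with 100,278 ids (0 to 100,277), the pad id is the last id in the vocabulary.

```python
VOCAB_SIZE = 100_278
PAD_TOKEN_ID = 100277
EOS_TOKEN_ID = 100257
IM_END_TOKEN_ID = 100265
STOP_IDS = {EOS_TOKEN_ID, IM_END_TOKEN_ID}

# Valid ids are 0 .. VOCAB_SIZE - 1, so the pad id is the final entry.
assert PAD_TOKEN_ID == VOCAB_SIZE - 1


def clean_generated_ids(ids: list[int]) -> list[int]:
    """Drop padding and cut the sequence at the first stop token.

    Treating both eos and im_end as stop tokens is an assumption made
    for this sketch; the Cola DLM code defines its own stopping rules.
    """
    out = []
    for tok in ids:
        if tok in STOP_IDS:
            break
        if tok != PAD_TOKEN_ID:
            out.append(tok)
    return out


print(clean_generated_ids([791, 6864, 100277, 374, 100257, 12, 100277]))
# [791, 6864, 374]
```

The cleaned ids could then be turned back into text with `tokenizer.decode(ids)` from the `tokenizers` library, using the `tokenizer.json` shipped in this repository.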
## Evaluation

Zero-shot reference results obtained with the open-source inference implementation:

| Task | Accuracy (%) |
| --- | ---: |
| LAMBADA | 50.80 |
| MMLU | 19.30 |
| OBQA | 23.00 |
| HellaSwag | 10.70 |
| RACE | 19.60 |
| SIQA | 28.90 |
| SQuAD | 30.90 |
| Story Cloze | 30.77 |
| **Tasks Average** | **26.75** |

The open-source HuggingFace Transformers implementation differs slightly from the internal implementation used in the paper, so individual task scores may vary somewhat, but the overall trends are consistent with the paper.
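As a quick arithmetic check, the reported Tasks Average matches the unweighted mean of the eight per-task scores (213.97 / 8 = 26.746, which rounds to 26.75); that the authors computed it this way is an assumption consistent with the numbers.

```python
# Zero-shot accuracies (%) copied from the evaluation table.
scores = {
    "LAMBADA": 50.80,
    "MMLU": 19.30,
    "OBQA": 23.00,
    "HellaSwag": 10.70,
    "RACE": 19.60,
    "SIQA": 28.90,
    "SQuAD": 30.90,
    "Story Cloze": 30.77,
}

# Unweighted mean over the eight tasks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 26.75
```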
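## Intended Use

Cola DLM is intended mainly for research on hierarchical latent-variable language models, continuous latent-space text diffusion, Flow Matching priors, and benchmark-style text generation.

This checkpoint has **not been instruction-tuned** and has not been trained with RLHF. It should not be treated as a production-grade chatbot or used for safety-critical decisions.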
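## Limitations

- The model was trained mainly on English text; its capabilities in other languages have not been thoroughly evaluated.
- Outputs may contain factual errors, offensive content, biases, or hallucinations.
- Generation quality is sensitive to prompt format and length. For quick evaluation, QA-style prompts such as `"Question: ... Answer:"` are recommended.
- Inference uses a mutable KV cache. In a serving implementation, run generations serially within a single process unless cache state is explicitly isolated.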
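Because the KV cache is mutable, the last point above recommends serial generation. A minimal way to enforce that in a threaded server is to guard the model call with a single process-wide lock, as in this hypothetical `SerialGenerator` wrapper; whether the bundled `openai_adapter` already does something similar is not stated in this card.

```python
import threading
from typing import Any, Callable


class SerialGenerator:
    """Run a generate function one call at a time within this process.

    Hypothetical wrapper: `generate_fn` stands in for a call such as
    generate_task_repaint_inference(...), whose KV cache is mutable.
    The lock keeps concurrent requests (e.g. from a thread-pool server)
    from interleaving inside that shared state.
    """

    def __init__(self, generate_fn: Callable[..., Any]) -> None:
        self._generate_fn = generate_fn
        self._lock = threading.Lock()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Only one thread at a time may enter the model call.
        with self._lock:
            return self._generate_fn(*args, **kwargs)
```

A `threading.Lock` fits servers that run synchronous handlers in a thread pool; for handlers running on an asyncio event loop, an `asyncio.Lock` (with the blocking model call moved off the loop) would play the same role.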
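## Safety Notice and Usage Restrictions

Cola DLM is a research-oriented continuous latent diffusion language model checkpoint. The model is small and has **not undergone instruction tuning, RLHF, or systematic safety-alignment training**, so it cannot reliably refuse unsafe requests, filter content, or identify risks. Its outputs may be inaccurate, offensive, biased, illegal, inappropriate, or misleading.

This model is provided for academic research and technical validation only. We do not encourage, support, or authorize the use of Cola DLM to generate, disseminate, or facilitate any of the following:

- Pornographic or sexually explicit content, sexual exploitation, or any other inappropriate content;
- Gambling-related content, including gambling promotion, betting advice, or illegal gambling services;
- Content related to drugs, prohibited drugs, or controlled substances, including guidance on manufacturing, buying, selling, or using them, or on evading regulation;
- Hateful, harassing, discriminatory, violently threatening, extremist, or inflammatory content;
- Political manipulation, targeted political persuasion, political disinformation, incitement of cross-border conflict, or sensitive political content that may provoke antagonism between societies, groups, or nations;
- Illegal activities, evasion of legal regulation, cyberattacks, privacy violations, or other content that may cause real-world harm;
- Automated advice or judgments in high-risk decision-making settings such as medicine, law, finance, or safety.

Anyone who downloads, deploys, fine-tunes, distributes, or builds applications on this model is responsible for the corresponding safety and compliance obligations and should add safety mechanisms appropriate to their use case, including but not limited to input/output content moderation, access control, audit logging, human review, red-teaming, and compliance checks against local laws and regulations.

Cola DLM should not be regarded as a production-grade chatbot or a safe, reliable general-purpose assistant. Content generated by this model does not represent the views, positions, or endorsement of the authors, institutions, or contributors.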
## Citation

If Cola DLM is helpful to your work, please cite:

```bibtex
@article{guo2026cola,
  title   = {Continuous Latent Diffusion Language Model},
  author  = {Guo, Hongcan and Zhao, Qinyu and Zhao, Yian and Nie, Shen and
             Zhu, Rui and Guo, Qiushan and Wang, Feng and Yang, Tao and
             Zhao, Hengshuang and Wei, Guoqiang and Zeng, Yan},
  journal = {arXiv preprint arXiv:2605.06548},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.06548},
}
```