ELYZA-Diffusion-Base-1.0-Dream-7B


Model Description

ELYZA-Diffusion-Base-1.0-Dream-7B is a Japanese-adapted diffusion language model released by ELYZA, Inc. Based on the open-source diffusion LLM Dream-v0-Instruct-7B, this model has been further pretrained on large-scale Japanese corpora.

The model follows a Discrete Diffusion Masked Language Model (DDMLM) formulation, where text generation is performed via iterative denoising starting from an all-MASK sequence.
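The decoding scheme described above can be illustrated with a toy sketch (this is not the actual Dream decoder; the `predict` callback, `toy_denoise` helper, and the fixed-target toy "model" are all hypothetical stand-ins for illustration): every position starts as MASK, and at each denoising step the most confident predictions are committed until no masks remain.

```python
MASK = "<MASK>"

def toy_denoise(predict, length, steps):
    """Toy DDMLM-style decoding.

    predict(seq, i) -> (token, confidence) for a masked position i.
    Start from an all-MASK sequence; at each step, commit the
    highest-confidence predictions for a fraction of the masked slots.
    """
    seq = [MASK] * length
    per_step = -(-length // steps)  # ceil: guarantees all slots fill in `steps` steps
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Rank masked positions by the model's confidence, highest first.
        ranked = sorted(masked, key=lambda i: predict(seq, i)[1], reverse=True)
        for i in ranked[:per_step]:
            seq[i] = predict(seq, i)[0]
    return seq

# Hypothetical "model": always predicts a fixed target sequence,
# with confidence decreasing by position.
target = ["Japanese", "diffusion", "language", "model"]
pred = lambda seq, i: (target[i], 1.0 / (i + 1))
print(toy_denoise(pred, 4, 2))  # → ['Japanese', 'diffusion', 'language', 'model']
```

In a real diffusion LLM the `predict` callback would be a Transformer scoring all masked positions in parallel, and the commit schedule is a tuned hyperparameter rather than a fixed fraction.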

This repository provides the base (non-instruction-tuned) model intended for research and evaluation. It is not intended as an end-user–ready system. For practical applications, please refer to the instruction-tuned variant.

For more details on the model design and training setup, please refer to our technical blog post.

Training

  • Initialization: Dream-v0-Instruct-7B
  • Continued pretraining on Japanese text (~62B tokens)

How to Cite

@misc{elyza2026dllm,
  title = {elyza/ELYZA-Diffusion-Base-1.0-Dream-7B},
  url = {https://huggingface.co/elyza/ELYZA-Diffusion-Base-1.0-Dream-7B},
  author = {Tasavat Trisitichoke and Akira Sasaki and Congda Ma and Ryosuke Nakamoto and Satoshi Tohda and Shoetsu Sato and Masato Hirakawa},
  year = {2026}
}

Citations

@article{ye2025dream,
  title = {Dream 7B: Diffusion Large Language Models},
  author = {Ye, Jiacheng and Xie, Zhihui and Zheng, Lin and Gao, Jiahui and Wu, Zirui and Jiang, Xin and Li, Zhenguo and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2508.15487},
  year = {2025}
}