# Neural CTMC: Discrete Diffusion via Decoupled Jump Timing and Direction
This repository contains the inference checkpoint and demo code for Neural CTMC, a discrete diffusion model based on continuous-time Markov chains (CTMCs). Unlike prior methods that parameterize the reverse rate matrix as a monolithic object, Neural CTMC separately parameterizes the exit rate (when to jump) and the jump distribution (where to jump) via two dedicated network heads, aligning the parameterization with the intrinsic CTMC decomposition.
This checkpoint is trained on OpenWebText with a uniform forward process and is, to our knowledge, the first open-source checkpoint for a uniform-noise discrete diffusion language model.
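To make the decoupling concrete, here is a minimal sketch of a two-head readout on top of a transformer backbone. The class and layer names are illustrative assumptions, not the released architecture (which is the DiT backbone described below):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCTMCHeads(nn.Module):
    """Illustrative two-head readout: an exit-rate head (when to jump)
    and a jump-distribution head (where to jump). Names and shapes are
    assumptions for exposition, not the checkpoint's exact layers."""

    def __init__(self, hidden_dim: int = 768, vocab_size: int = 50304):
        super().__init__()
        self.exit_rate_head = nn.Linear(hidden_dim, 1)      # lambda_t^theta >= 0
        self.jump_head = nn.Linear(hidden_dim, vocab_size)  # r_t^theta over vocab

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, hidden_dim) from the transformer backbone
        exit_rate = F.softplus(self.exit_rate_head(h)).squeeze(-1)  # (B, L)
        jump_log_probs = self.jump_head(h).log_softmax(dim=-1)      # (B, L, V)
        return exit_rate, jump_log_probs
```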
## Model Details
| Property | Value |
|---|---|
| Architecture | DiT (Diffusion Transformer) |
| Parameters | ~170M |
| Transformer Blocks | 12 |
| Attention Heads | 12 |
| Hidden Dimension | 768 |
| Time-Conditioning Dimension | 128 |
| Vocabulary Size | 50,257 (GPT-2 BPE tokenizer) |
| Vocabulary Embedding | 50,304 (padded to nearest multiple of 128) |
| Max Sequence Length | 512 |
| Precision | float32 (trained with bf16 mixed precision) |
| Checkpoint Format | TorchScript (traced) |
| Forward Process | Uniform ($\alpha_t = 1 - t$, $\beta_t = t$) |
| Training Data | OpenWebText (262B tokens) |
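For intuition, under the uniform forward process each token of $x_0$ survives to time $t$ with probability $\alpha_t = 1 - t$ and is otherwise replaced by a uniformly random vocabulary token with probability $\beta_t = t$. A minimal sketch of sampling $x_t \mid x_0$ (illustrative only; not part of the inference code):

```python
import torch

def uniform_forward_sample(x0: torch.Tensor, t: float, vocab_size: int = 50257) -> torch.Tensor:
    """Sample x_t from the uniform forward process: keep each token of x0
    with probability alpha_t = 1 - t, otherwise redraw it uniformly."""
    corrupt = torch.rand_like(x0, dtype=torch.float) < t  # corrupted with prob beta_t = t
    noise = torch.randint_like(x0, vocab_size)            # uniform replacement tokens
    return torch.where(corrupt, noise, x0)
```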
## Performance
Generative perplexity (scored by Gemma-2, lower is better) on OpenWebText:
| Method | Training Tokens | 16 steps | 32 steps | 64 steps | 128 steps |
|---|---|---|---|---|---|
| MDLM | 262B | 1432.8 | 553.7 | 301.6 | 210.5 |
| GIDD | 262B | 702.0 | 398.9 | 270.8 | 249.8 |
| SEDD | 682B | 614.3 | 262.7 | 182.1 | 178.3 |
| Neural CTMC -- Euler (ours) | 262B | 578.3 | 264.5 | 189.7 | 183.6 |
| Neural CTMC -- $\tau$-leaping (ours) | 262B | 584.5 | 258.8 | 199.9 | 184.8 |
Neural CTMC achieves the best generative perplexity among equal-budget (262B) methods across all step counts, and remains competitive with SEDD despite using 2.6x fewer training tokens.
## Usage
### Requirements
```bash
pip install torch transformers
```
### Quick Start
```python
from demo_infer import CTMCHFModel

model = CTMCHFModel.from_pretrained(
    "owt_uniform.pt",
    device="cuda",
    tokenizer_name="gpt2",
)

texts = model.generate(
    n_samples=3,   # number of samples to generate
    n_steps=128,   # Euler discretization steps
    T=1.0,         # diffusion time horizon
)

for i, text in enumerate(texts):
    print(f"[Sample {i+1}]")
    print(text)
```
### Command Line
```bash
# Generate 5 samples with 128 Euler steps on GPU 0
GPU=0 bash run.sh
```
You can also call the script directly:
```bash
python demo_infer.py \
    --checkpoint owt_uniform.pt \
    --n_samples 5 \
    --n_steps 128 \
    --T 1.0 \
    --device cuda \
    --output output/samples.txt
```
## How It Works
The model generates text through reverse diffusion over discrete token sequences using the Euler sampler:
- Initialize a sequence of 512 uniformly random tokens.
- Iteratively denoise for `n_steps` Euler steps: at each step, the model predicts per-token exit rates $\lambda^\theta_t$ and a jump distribution $r^\theta_t$ over the vocabulary, then stochastically updates tokens via the CTMC reverse process (see the sketch below).
- Decode the final token sequence with the GPT-2 tokenizer.
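A schematic of a single Euler step, assuming per-token exit rates and jump logits in the shapes used above. The function `euler_step` and its signature are illustrative, not the `demo_infer.py` API:

```python
import torch

def euler_step(tokens, exit_rate, jump_logits, dt: float):
    """One Euler step of the reverse CTMC: each token jumps with
    probability approx. lambda_t^theta * dt; if it jumps, the new
    value is drawn from the jump distribution r_t^theta."""
    jump_prob = (exit_rate * dt).clamp(max=1.0)     # (B, L) first-order jump probability
    jumps = torch.rand_like(jump_prob) < jump_prob  # which tokens jump this step
    proposals = torch.distributions.Categorical(logits=jump_logits).sample()  # (B, L)
    return torch.where(jumps, proposals, tokens)
```

In the released sampler these quantities come from the TorchScript checkpoint with `dt = T / n_steps`; here they are free-standing tensors for illustration.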
The key insight is that the ELBO decomposes into a Poisson KL for jump timing and a categorical KL for jump direction, enabling the model to learn these two aspects with separate heads.
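For reference, the timing term uses the standard closed form of the KL divergence between two Poisson distributions (the exact per-token weighting follows the paper):

$$\mathrm{KL}\big(\mathrm{Pois}(\lambda)\,\|\,\mathrm{Pois}(\lambda^\theta)\big) = \lambda \log\frac{\lambda}{\lambda^\theta} - \lambda + \lambda^\theta,$$

while the direction term is the usual categorical KL between the true and predicted jump distributions.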
## File Structure
```
.
├── README.md        # This file
├── owt_uniform.pt   # Model checkpoint (~969 MB)
├── demo_infer.py    # Inference script with CTMCHFModel class
└── run.sh           # Convenience launch script
```
## Citation
If you find this model useful, please cite our work:
```bibtex
@article{li2025neuralctmc,
  title={Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction},
  author={Jingyuan Li and Xiaoyi Jiang and Fukang Wen and Wei Liu and Renqian Luo and Yi Zhu and Zuoqiang Shi and Pipi Hu},
  year={2025}
}
```
## License
This project is licensed under the MIT License.