# Neural CTMC: Discrete Diffusion via Decoupled Jump Timing and Direction
This repository contains the inference checkpoint and demo code for Neural CTMC, a discrete diffusion model based on continuous-time Markov chains (CTMCs). Unlike prior methods that parameterize the reverse rate matrix as a monolithic object, Neural CTMC separately parameterizes the exit rate (when to jump) and the jump distribution (where to jump) via two dedicated network heads, aligning the parameterization with the intrinsic CTMC decomposition.
This checkpoint is trained on OpenWebText with a uniform forward process and is, to our knowledge, the first open-source checkpoint for a uniform-noise discrete diffusion language model.
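To make the decoupling concrete, here is a minimal sketch of a two-head readout on top of a transformer backbone. The class and layer names are illustrative assumptions, not the released architecture (which is the DiT backbone described below):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCTMCHeads(nn.Module):
    """Illustrative two-head readout: an exit-rate head (when to jump)
    and a jump-distribution head (where to jump). Names and shapes are
    assumptions for exposition, not the checkpoint's exact layers."""

    def __init__(self, hidden_dim: int = 768, vocab_size: int = 50304):
        super().__init__()
        self.exit_rate_head = nn.Linear(hidden_dim, 1)      # lambda_t^theta >= 0
        self.jump_head = nn.Linear(hidden_dim, vocab_size)  # r_t^theta over vocab

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, hidden_dim) from the transformer backbone
        exit_rate = F.softplus(self.exit_rate_head(h)).squeeze(-1)  # (B, L)
        jump_log_probs = self.jump_head(h).log_softmax(dim=-1)      # (B, L, V)
        return exit_rate, jump_log_probs
```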
## Model Details
| Property | Value |
|---|---|
| Architecture | DiT (Diffusion Transformer) |
| Parameters | ~170M |
| Transformer Blocks | 12 |
| Attention Heads | 12 |
| Hidden Dimension | 768 |
| Time-Conditioning Dimension | 128 |
| Vocabulary Size | 50,257 (GPT-2 BPE tokenizer) |
| Vocabulary Embedding | 50,304 (padded to nearest multiple of 128) |
| Max Sequence Length | 512 |
| Precision | float32 (trained with bf16 mixed precision) |
| Checkpoint Format | TorchScript (traced) |
| Forward Process | Uniform ($\alpha_t = 1 - t$, $\beta_t = t$) |
| Training Data | OpenWebText (262B tokens) |
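For intuition, under the uniform forward process each token of $x_0$ survives to time $t$ with probability $\alpha_t = 1 - t$ and is otherwise replaced by a uniformly random vocabulary token with probability $\beta_t = t$. A minimal sketch of sampling $x_t \mid x_0$ (illustrative only; not part of the inference code):

```python
import torch

def uniform_forward_sample(x0: torch.Tensor, t: float, vocab_size: int = 50257) -> torch.Tensor:
    """Sample x_t from the uniform forward process: keep each token of x0
    with probability alpha_t = 1 - t, otherwise redraw it uniformly."""
    corrupt = torch.rand_like(x0, dtype=torch.float) < t  # corrupted with prob beta_t = t
    noise = torch.randint_like(x0, vocab_size)            # uniform replacement tokens
    return torch.where(corrupt, noise, x0)
```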
## Performance
Generative perplexity (scored by Gemma-2, lower is better) on OpenWebText:
| Method | Training Tokens | 16 steps | 32 steps | 64 steps | 128 steps |
|---|---|---|---|---|---|
| MDLM | 262B | 1432.8 | 553.7 | 301.6 | 210.5 |
| GIDD | 262B | 702.0 | 398.9 | 270.8 | 249.8 |
| SEDD | 682B | 614.3 | 262.7 | 182.1 | 178.3 |
| Neural CTMC -- Euler (ours) | 262B | 578.3 | 264.5 | 189.7 | 183.6 |
| Neural CTMC -- $\tau$-leaping (ours) | 262B | 584.5 | 258.8 | 199.9 | 184.8 |
Neural CTMC achieves the best generative perplexity among equal-budget (262B) methods across all step counts, and remains competitive with SEDD despite using 2.6x fewer training tokens.
## Usage
### Requirements
```bash
pip install torch transformers
```
### Quick Start
```python
from demo_infer import CTMCHFModel

model = CTMCHFModel.from_pretrained(
    "owt_uniform.pt",
    device="cuda",
    tokenizer_name="gpt2",
)

texts = model.generate(
    n_samples=3,   # number of samples to generate
    n_steps=128,   # Euler discretization steps
    T=1.0,         # diffusion time horizon
)

for i, text in enumerate(texts):
    print(f"[Sample {i+1}]")
    print(text)
```
### Command Line
```bash
# Generate 5 samples with 128 Euler steps on GPU 0
GPU=0 bash run.sh
```
You can also call the script directly:
```bash
python demo_infer.py \
    --checkpoint owt_uniform.pt \
    --n_samples 5 \
    --n_steps 128 \
    --T 1.0 \
    --device cuda \
    --output output/samples.txt
```
## How It Works
The model generates text through reverse diffusion over discrete token sequences using the Euler sampler:
- Initialize a sequence of 512 uniformly random tokens.
- Iteratively denoise for `n_steps` Euler steps: at each step, the model predicts per-token exit rates $\lambda^\theta_t$ and a jump distribution $r^\theta_t$ over the vocabulary, then stochastically updates tokens via the CTMC reverse process (see the sketch below).
- Decode the final token sequence with the GPT-2 tokenizer.
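A schematic of a single Euler step, assuming per-token exit rates and jump logits in the shapes used above. The function `euler_step` and its signature are illustrative, not the `demo_infer.py` API:

```python
import torch

def euler_step(tokens, exit_rate, jump_logits, dt: float):
    """One Euler step of the reverse CTMC: each token jumps with
    probability approx. lambda_t^theta * dt; if it jumps, the new
    value is drawn from the jump distribution r_t^theta."""
    jump_prob = (exit_rate * dt).clamp(max=1.0)     # (B, L) first-order jump probability
    jumps = torch.rand_like(jump_prob) < jump_prob  # which tokens jump this step
    proposals = torch.distributions.Categorical(logits=jump_logits).sample()  # (B, L)
    return torch.where(jumps, proposals, tokens)
```

In the released sampler these quantities come from the TorchScript checkpoint with `dt = T / n_steps`; here they are free-standing tensors for illustration.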
The key insight is that the ELBO decomposes into a Poisson KL for jump timing and a categorical KL for jump direction, enabling the model to learn these two aspects with separate heads.
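For reference, the timing term uses the standard closed form of the KL divergence between two Poisson distributions (the exact per-token weighting follows the paper):

$$\mathrm{KL}\big(\mathrm{Pois}(\lambda)\,\|\,\mathrm{Pois}(\lambda^\theta)\big) = \lambda \log\frac{\lambda}{\lambda^\theta} - \lambda + \lambda^\theta,$$

while the direction term is the usual categorical KL between the true and predicted jump distributions.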
## File Structure
```
.
├── README.md        # This file
├── owt_uniform.pt   # Model checkpoint (~969 MB)
├── demo_infer.py    # Inference script with CTMCHFModel class
└── run.sh           # Convenience launch script
```
## Citation
If you find this model useful, please cite our work:
```bibtex
@article{li2025neuralctmc,
  title={Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction},
  author={Jingyuan Li and Xiaoyi Jiang and Fukang Wen and Wei Liu and Renqian Luo and Yi Zhu and Zuoqiang Shi and Pipi Hu},
  year={2025}
}
```
## License
This project is licensed under the MIT License.