Neural CTMC: Discrete Diffusion via Decoupled Jump Timing and Direction

This repository contains the inference checkpoint and demo code for Neural CTMC, a discrete diffusion model based on continuous-time Markov chains (CTMCs). Unlike prior methods that parameterize the reverse rate matrix as a monolithic object, Neural CTMC separately parameterizes the exit rate (when to jump) and the jump distribution (where to jump) via two dedicated network heads, aligning the parameterization with the intrinsic CTMC decomposition.
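
As a rough illustration of this decomposition (the released checkpoint is a traced TorchScript module, so the class and attribute names below are hypothetical), the parameterization amounts to two small heads on a shared backbone:

import torch
import torch.nn as nn

class DecoupledCTMCHeads(nn.Module):
    """Hypothetical sketch of the two-head parameterization on a shared DiT backbone."""

    def __init__(self, hidden_dim: int = 768, vocab_size: int = 50304):
        super().__init__()
        self.exit_rate_head = nn.Linear(hidden_dim, 1)           # "when to jump"
        self.jump_dist_head = nn.Linear(hidden_dim, vocab_size)  # "where to jump"

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, hidden_dim) backbone features at diffusion time t
        exit_rate = nn.functional.softplus(self.exit_rate_head(h)).squeeze(-1)  # (batch, seq_len), non-negative
        jump_logits = self.jump_dist_head(h)                                    # (batch, seq_len, vocab_size)
        return exit_rate, jump_logits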

This checkpoint is trained on OpenWebText with a uniform forward process and is, to our knowledge, the first open-source checkpoint for a uniform-noise discrete diffusion language model.

Model Details

| Property | Value |
|---|---|
| Architecture | DiT (Diffusion Transformer) |
| Parameters | ~170M |
| Transformer Blocks | 12 |
| Attention Heads | 12 |
| Hidden Dimension | 768 |
| Time-Conditioning Dimension | 128 |
| Vocabulary Size | 50,257 (GPT-2 BPE tokenizer) |
| Vocabulary Embedding | 50,304 (padded to the nearest multiple of 128) |
| Max Sequence Length | 512 |
| Precision | float32 (trained with bf16 mixed precision) |
| Checkpoint Format | TorchScript (traced) |
| Forward Process | Uniform ($\alpha_t = 1 - t$, $\beta_t = t$) |
| Training Data | OpenWebText (262B tokens) |
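
Under the uniform forward process above, each token is kept with probability $\alpha_t = 1 - t$ and otherwise replaced by a token drawn uniformly from the vocabulary. A minimal sketch of sampling $x_t$ given $x_0$ (hypothetical helper, not part of demo_infer.py):

import torch

def uniform_forward_sample(x0: torch.Tensor, t: float, vocab_size: int = 50257) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) for the uniform process: keep each token with
    probability 1 - t, otherwise resample it uniformly (hypothetical helper)."""
    resampled = torch.rand(x0.shape, device=x0.device) < t            # positions hit by noise
    noise = torch.randint(0, vocab_size, x0.shape, device=x0.device)  # uniform replacement tokens
    return torch.where(resampled, noise, x0)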

Performance

Generative perplexity (scored by Gemma-2; lower is better) for models trained on OpenWebText:

| Method | Training Tokens | 16 steps | 32 steps | 64 steps | 128 steps |
|---|---|---|---|---|---|
| MDLM | 262B | 1432.8 | 553.7 | 301.6 | 210.5 |
| GIDD | 262B | 702.0 | 398.9 | 270.8 | 249.8 |
| SEDD | 682B | 614.3 | 262.7 | 182.1 | 178.3 |
| Neural CTMC -- Euler (ours) | 262B | 578.3 | 264.5 | 189.7 | 183.6 |
| Neural CTMC -- $\tau$-leaping (ours) | 262B | 584.5 | 258.8 | 199.9 | 184.8 |

Neural CTMC achieves the best generative perplexity among equal-budget (262B) methods across all step counts, and remains competitive with SEDD despite using 2.6x fewer training tokens.

Usage

Requirements

pip install torch transformers

Quick Start

from demo_infer import CTMCHFModel

model = CTMCHFModel.from_pretrained(
    "owt_uniform.pt",
    device="cuda",
    tokenizer_name="gpt2",
)

texts = model.generate(
    n_samples=3,   # number of samples to generate
    n_steps=128,   # Euler discretization steps
    T=1.0,         # diffusion time horizon
)

for i, text in enumerate(texts):
    print(f"[Sample {i+1}]")
    print(text)

Command Line

# Generate 5 samples with 128 Euler steps on GPU 0
GPU=0 bash run.sh

You can also call the script directly:

python demo_infer.py \
    --checkpoint owt_uniform.pt \
    --n_samples 5 \
    --n_steps 128 \
    --T 1.0 \
    --device cuda \
    --output output/samples.txt

How It Works

The model generates text through reverse diffusion over discrete token sequences using the Euler sampler:

  1. Initialize a sequence of uniformly random tokens of length 512.
  2. Iteratively denoise for n_steps Euler steps: at each step, the model predicts per-token exit rates $\lambda^\theta_t$ and a jump distribution $r^\theta_t$ over the vocabulary, then stochastically updates tokens via the CTMC reverse process (see the sketch after this list).
  3. Decode the final token sequence using the GPT-2 tokenizer.
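
For concreteness, step 2 can be sketched as a single first-order update. This is a simplified, hypothetical helper written from the description above, not the exact routine used inside demo_infer.py:

import torch

def euler_reverse_step(x_t: torch.Tensor, exit_rate: torch.Tensor,
                       jump_logits: torch.Tensor, dt: float) -> torch.Tensor:
    """One schematic Euler step of the reverse CTMC (hypothetical helper).

    x_t:         (batch, seq_len) current token ids
    exit_rate:   (batch, seq_len) predicted exit rates, lambda_t
    jump_logits: (batch, seq_len, vocab) logits of the jump distribution, r_t
    dt:          Euler step size
    """
    # Each position jumps with probability ~ lambda_t * dt (first-order Euler).
    jump_prob = (exit_rate * dt).clamp(max=1.0)
    jumps = torch.rand_like(jump_prob) < jump_prob
    # Where a jump occurs, draw the destination token from r_t.
    new_tokens = torch.distributions.Categorical(logits=jump_logits).sample()
    return torch.where(jumps, new_tokens, x_t)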

The key insight is that the ELBO decomposes into a Poisson KL for jump timing and a categorical KL for jump direction, enabling the model to learn these two aspects with separate heads.
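
Schematically, and written only from the decomposition described above (the paper's exact objective and weighting may differ), the per-token contribution at time $t$ takes the form

$$
\mathcal{L}_t \;\approx\; \underbrace{\lambda^\theta_t - \lambda_t \log \lambda^\theta_t}_{\text{jump timing (Poisson KL)}} \;+\; \underbrace{\lambda_t \, \mathrm{KL}\!\left(r_t \,\|\, r^\theta_t\right)}_{\text{jump direction (categorical KL)}},
$$

up to an additive constant in $\theta$, where $\lambda_t$ and $r_t$ are the exit rate and jump distribution of the true reverse process. The first term involves only the exit-rate head and the second only the jump-distribution head, which is what allows the two aspects to be learned separately.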

File Structure

.
├── README.md            # This file
├── owt_uniform.pt       # Model checkpoint (~969 MB)
├── demo_infer.py        # Inference script with CTMCHFModel class
└── run.sh               # Convenience launch script

Citation

If you find this model useful, please cite our work:

@article{li2025neuralctmc,
  title={Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction},
  author={Jingyuan Li and Xiaoyi Jiang and Fukang Wen and Wei Liu and Renqian Luo and Yi Zhu and Zuoqiang Shi and Pipi Hu},
  year={2025}
}

License

This project is licensed under the MIT License.
