---
language:
- en
tags:
- diffusion
- discrete-flow-matching
- flow-matching
- ctmc
- text-generation
- language-modeling
- pytorch
library_name: pytorch
pipeline_tag: text-generation
license: other
---

# FS-DFM (Few-Step Discrete Flow-Matching)

**FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Model**

Amin Karimi Monsefi, Nikhil Bhendawade, Manuel R. Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova (Jan 9, 2026)

arXiv: 2509.20624

[GitHub Link](https://github.com/apple/ml-fs-dfm/tree/main)

[Paper Link](https://arxiv.org/abs/2509.20624)

FS-DFM is a **token-space diffusion / flow-matching language model** designed for **fast long-text generation**: it is explicitly trained for a **user-specified step budget** (e.g., 1–8 steps) while preserving a CTMC-based discrete flow formulation.

## What’s in this repo

### Checkpoint files

- [`FS_DFM_checkpoint.pth`](FS_DFM_checkpoint.pth) — **FS-DFM 1.3B**, uniform source, **RK4 teacher distilled**
- [`DFM_checkpoint.pth`](DFM_checkpoint.pth) — **DFM 1.3B**, uniform source, DFM pretrained initialization

---

## Model summary

**Core idea (high level):**

- Condition the model on a **target inference step size/budget** and train it so that **one big step matches many small steps**.
- Use a **cumulative scalar** update to keep large steps stable on the probability simplex.
- Use **student–teacher distillation** (Runge–Kutta shortcut teachers, EMA stabilization) to improve few-step fidelity.

**Formulation:** discrete flow matching over a **CTMC** on token sequences; sampling uses custom solvers (e.g., `mixture_euler_with_cumulative_scalar`).

## Comparison of methods

| ARM | DFM | FS-DFM (Ours) |
|-----|-----|---------------|
| ![ARM](arm.gif) | ![DFM](dfm.gif) | ![FS-DFM](fs_dfm.gif) |

---

## Architecture

From the paper’s implementation details:

- Backbone is a **DiT-style transformer** with **rotary attention**
- **Adaptive LayerNorm conditioning** in each block
- Conditioning combines a **continuous time embedding** with a **step-size embedding**
- A final linear head produces logits; the solver converts the logits into a CTMC generator and performs the stepping

Tokenizer: **GPT-2 tokenizer**

Training/eval packing: documents are packed into **1024-token** blocks (EOS appended to each document, then everything is concatenated and split into blocks).

---

## Training data & evaluation data

- Training: **FineWeb-Edu**
- Evaluation: **WikiText-103**

(See the paper for details and the exact preprocessing pipeline.)

---

## How to use

FS-DFM uses custom discrete solvers and is not a drop-in `transformers` model. The intended usage is via the official training/evaluation scripts.

> PLEASE SEE [OUR OFFICIAL GITHUB](https://github.com/apple/ml-fs-dfm/tree/main)

### 1) Install the official code

```bash
git clone https://github.com/apple/ml-fs-dfm
cd ml-fs-dfm
conda env create -f fsdfm_environment.yml
conda activate FSDFM
pip install -e .
```
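After installation, the checkpoint files in this repo can be inspected with plain PyTorch. The snippet below is a minimal sketch for sanity-checking a download; the state-dict layout is defined by the official code, so the `"model"` / `"ema"` keys tried here are assumptions, not documented structure.

```python
import torch

# Minimal sanity check of a downloaded checkpoint (illustrative only).
# NOTE: the actual state-dict layout is defined by the official repo;
# the "model" / "ema" keys below are assumptions.
ckpt = torch.load("FS_DFM_checkpoint.pth", map_location="cpu")

if isinstance(ckpt, dict):
    print("top-level keys:", list(ckpt.keys()))
    state_dict = ckpt.get("model", ckpt.get("ema", ckpt))
else:
    state_dict = ckpt

# Rough parameter count to confirm this is the ~1.3B model.
n_params = sum(v.numel() for v in state_dict.values() if torch.is_tensor(v))
print(f"~{n_params / 1e9:.2f}B parameters in the loaded state dict")
```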
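The core idea in the model summary (condition the denoiser on a step budget and take only a handful of large steps from a uniform source) can be pictured with a generic few-step discrete sampler. This is a simplified illustration, not the official `mixture_euler_with_cumulative_scalar` solver; in particular, the `model(x_t, t, step_size)` call signature and the plain resampling update are assumptions made for the sketch.

```python
import torch

@torch.no_grad()
def few_step_sample(model, vocab_size, seq_len, num_steps=8, device="cpu"):
    """Generic few-step discrete sampler (illustrative, not the official solver).

    Assumes `model(x_t, t, step_size)` returns per-token logits over the
    vocabulary, conditioned on the noisy tokens, the time t in [0, 1],
    and the step size the model is expected to jump by.
    """
    # Uniform source: start from random tokens.
    x_t = torch.randint(0, vocab_size, (1, seq_len), device=device)
    step_size = 1.0 / num_steps

    for i in range(num_steps):
        t = torch.full((1,), i * step_size, device=device)
        h = torch.full((1,), step_size, device=device)
        probs = model(x_t, t, h).softmax(dim=-1)  # (1, seq_len, vocab_size)
        # The real CTMC solver mixes the current tokens with the predicted
        # distribution via a cumulative scalar; here we simply resample.
        x_t = torch.multinomial(probs.view(-1, vocab_size), 1).view(1, seq_len)
    return x_t
```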
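The architecture section mentions DiT-style blocks with adaptive LayerNorm conditioning on a continuous time embedding plus a step-size embedding. The module below shows the common way such conditioning is wired (scale and shift produced from the conditioning vector); the layer sizes and the exact fusion of the two embeddings are assumptions, not details taken from the official implementation.

```python
import torch
import torch.nn as nn

class AdaLNConditioning(nn.Module):
    """Sketch of DiT-style adaptive LayerNorm conditioning.

    A conditioning vector (e.g., time embedding + step-size embedding)
    is mapped to per-channel scale and shift that modulate a
    parameter-free LayerNorm. Sizes are illustrative only.
    """

    def __init__(self, hidden_dim: int, cond_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        self.to_scale_shift = nn.Sequential(
            nn.SiLU(),
            nn.Linear(cond_dim, 2 * hidden_dim),
        )

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, hidden), cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```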
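The data preparation described above (GPT-2 tokenizer, EOS appended per document, then everything concatenated and split into 1024-token blocks) can be reproduced in a few lines. This sketch uses the Hugging Face `transformers` GPT-2 tokenizer as a stand-in; the paper's exact preprocessing pipeline may differ.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
BLOCK_SIZE = 1024

def pack_documents(docs, block_size=BLOCK_SIZE):
    """Append EOS to each document, concatenate, and split into fixed-size blocks."""
    stream = []
    for doc in docs:
        stream.extend(tokenizer.encode(doc) + [tokenizer.eos_token_id])
    n_blocks = len(stream) // block_size  # drop the trailing remainder
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

blocks = pack_documents(["First document.", "Second document."])
print(f"{len(blocks)} packed blocks of {BLOCK_SIZE} tokens")
```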