---
language:
- en
tags:
- diffusion
- discrete-flow-matching
- flow-matching
- ctmc
- text-generation
- language-modeling
- pytorch
library_name: pytorch
pipeline_tag: text-generation
license: other
---

# FS-DFM (Few-Step Discrete Flow-Matching)

**FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Model**

Amin Karimi Monsefi, Nikhil Bhendawade, Manuel R. Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova (Jan 9, 2026)

ArXiv: 2509.20624

[GitHub Link](https://github.com/apple/ml-fs-dfm/tree/main)

[Paper Link](https://arxiv.org/abs/2509.20624)

FS-DFM is a **token-space diffusion / flow-matching language model** designed for **fast long-text generation**: it is explicitly trained for a **user-specified step budget** (e.g., 1–8 steps) while preserving a CTMC-based discrete flow formulation.

## What’s in this repo

### Checkpoint files

- [`FS_DFM_checkpoint.pth`](FS_DFM_checkpoint.pth) — **FS-DFM 1.3B**, uniform source, **RK4 teacher distilled**
- [`DFM_checkpoint.pth`](DFM_checkpoint.pth) — **DFM 1.3B**, uniform source, DFM pretrained initialization
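
Both files are regular PyTorch checkpoints. Below is a minimal, illustrative sketch of loading and inspecting one with `torch.load`; the wrapping key (`"model"`) and everything downstream are assumptions made for this example, so use the official ml-fs-dfm loading code for actual inference.

```python
import torch

# Load the raw checkpoint on CPU; the file is a regular PyTorch pickle.
# (Key names below are illustrative; inspect the file to see the real layout,
# and use the official ml-fs-dfm code to build and load the model.)
ckpt = torch.load("FS_DFM_checkpoint.pth", map_location="cpu")

# A .pth file may hold either a bare state_dict or a dict wrapping one.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Print a few parameter names/shapes to sanity-check what was saved.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))
```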

---

## Model summary

**Core idea (high level):**

- Condition the model on a **target inference step size/budget** and train it so that **one big step matches many small steps**.
- Use a **cumulative scalar** update to make large steps stable on the probability simplex.
- Use **student–teacher distillation** (Runge–Kutta shortcut teachers, EMA stabilization) to improve few-step fidelity.

**Formulation:** discrete flow-matching over a **CTMC** on token sequences; sampling uses custom solvers (e.g., `mixture_euler_with_cumulative_scalar`).
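
For intuition only, here is a heavily simplified sketch of what a step-budget-conditioned discrete sampler looks like. It is **not** the official `mixture_euler_with_cumulative_scalar` solver (that lives in the GitHub repo); the model call signature, the uniform-source initialization, and the per-step categorical resampling below are illustrative assumptions.

```python
import torch

@torch.no_grad()
def few_step_sample(model, vocab_size, seq_len, num_steps=8, device="cpu"):
    """Illustrative few-step discrete sampler (NOT the official FS-DFM solver).

    The model is assumed to take (tokens, t, step_size), mirroring the paper's
    time + step-size conditioning, and return per-position logits over the vocab.
    """
    # Uniform source: start from independently uniform random tokens.
    x = torch.randint(vocab_size, (1, seq_len), device=device)
    dt = 1.0 / num_steps

    for i in range(num_steps):
        t = torch.full((1,), i * dt, device=device)   # current time in [0, 1)
        h = torch.full((1,), dt, device=device)       # step-size conditioning
        logits = model(x, t, h)                       # (1, seq_len, vocab_size)
        probs = logits.softmax(dim=-1)
        # One "big" Euler-style jump: resample every position from the
        # model's predicted distribution for the end of this step.
        x = torch.multinomial(probs.reshape(-1, vocab_size), 1).view(1, seq_len)
    return x
```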

## Comparison of Methods

| ARM | DFM | FS-DFM (Ours) |
|-----|-----|---------------|
|  |  |  |

---

## Architecture

From the paper’s implementation details:

- Backbone is a **DiT-style transformer** with **rotary attention**
- **Adaptive LayerNorm conditioning** in each block
- Conditioning includes **continuous time embedding** + **step-size embedding**
- Final linear head produces logits; conversion from logits to a CTMC generator + stepping happens in the solver
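
As a rough illustration of that conditioning pathway (continuous time plus step size feeding adaptive LayerNorm, in the spirit of DiT blocks), here is a hedged PyTorch sketch; the class, attribute names, and dimensions are invented for this example and do not correspond to the actual classes in ml-fs-dfm.

```python
import torch
import torch.nn as nn

class AdaLNConditioning(nn.Module):
    """Illustrative DiT-style adaLN conditioning on time t and step size h.

    Produces a (shift, scale) pair from the two embeddings; the real FS-DFM
    implementation may differ in structure and naming.
    """
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.time_mlp = nn.Sequential(nn.Linear(1, hidden_dim), nn.SiLU())
        self.step_mlp = nn.Sequential(nn.Linear(1, hidden_dim), nn.SiLU())
        self.to_shift_scale = nn.Linear(hidden_dim, 2 * hidden_dim)

    def forward(self, x, t, h):
        # x: (batch, seq, hidden); t, h: (batch,) continuous scalars.
        cond = self.time_mlp(t[:, None]) + self.step_mlp(h[:, None])
        shift, scale = self.to_shift_scale(cond).chunk(2, dim=-1)
        # Modulate a parameter-free LayerNorm, as in adaLN.
        x = nn.functional.layer_norm(x, x.shape[-1:])
        return x * (1 + scale[:, None, :]) + shift[:, None, :]
```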

Tokenizer: **GPT-2 tokenizer**

Training/eval packing: documents are packed into **1024-token** blocks (EOS appended, then packed/concatenated), as sketched below.
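
A minimal sketch of that packing step, assuming the Hugging Face `GPT2TokenizerFast`: the block size of 1024 and the append-EOS-then-concatenate behavior come from the description above, while everything else is illustrative; the exact preprocessing pipeline is in the official repo.

```python
from transformers import GPT2TokenizerFast

BLOCK_SIZE = 1024  # block length used for training/eval packing
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def pack_documents(docs):
    """Tokenize each document, append EOS, concatenate into one stream,
    and cut the stream into fixed 1024-token blocks (remainder dropped)."""
    stream = []
    for doc in docs:
        stream.extend(tokenizer.encode(doc))
        stream.append(tokenizer.eos_token_id)
    n_blocks = len(stream) // BLOCK_SIZE
    return [stream[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(n_blocks)]
```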

---

## Training data & evaluation data

- Training: **FineWeb-Edu**
- Evaluation: **WikiText-103**

(See the paper for details and the exact preprocessing pipeline.)

---

## How to use

FS-DFM uses custom discrete solvers and is not a drop-in `transformers` model. The intended usage is via the official training/evaluation scripts.

> PLEASE SEE [OUR OFFICIAL GITHUB](https://github.com/apple/ml-fs-dfm/tree/main)

### 1) Install the official code

```bash
git clone https://github.com/apple/ml-fs-dfm
cd ml-fs-dfm

conda env create -f fsdfm_environment.yml
conda activate FSDFM

pip install -e .