aminr8 committed on
Commit 9ce66ce · verified · 1 Parent(s): f172ae3

Update README.md

Files changed (1)
  1. README.md +86 -3

README.md CHANGED
@@ -1,7 +1,90 @@
- # FS-DFM: FAST AND ACCURATE LONG TEXT GENERATION WITH FEW-STEP DIFFUSION LANGUAGE MODELS
- A PyTorch implementation of FS-DFM with custom solvers for efficient text generation and discrete sequence modeling. This software project accompanies the research paper [FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models](https://arxiv.org/abs/2509.20624). [Github Repository](https://github.com/apple/ml-fs-dfm)
- license: mit
---
language:
- en
tags:
- diffusion
- discrete-flow-matching
- flow-matching
- ctmc
- text-generation
- language-modeling
- pytorch
library_name: pytorch
pipeline_tag: text-generation
license: other
---

# FS-DFM (Few-Step Discrete Flow-Matching)

This repository provides **FS-DFM checkpoints** from the paper:

**FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models**
Amin Karimi Monsefi, Nikhil Bhendawade, Manuel R. Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova (Jan 9, 2026)
arXiv: [2509.20624](https://arxiv.org/abs/2509.20624)

FS-DFM is a **token-space diffusion / flow-matching language model** designed for **fast long-text generation**: it is explicitly trained for a **user-specified step budget** (e.g., 1–8 steps), while preserving a CTMC-based discrete-flow formulation.

## What’s in this repo

### Checkpoint files
- `FS_DFM_checkpoint.pth` — **FS-DFM 1.3B**, uniform source, **RK4 teacher distilled**
- `DFM_checkpoint.pth` — **DFM 1.3B**, uniform source, DFM pretrained initialization

---

## Model summary

**Core idea (high level):**
- Condition the model on a **target inference step size/budget** and train it so that **one big step matches many small steps**.
- Use a **cumulative-scalar** update to keep large steps stable on the probability simplex.
- Use **student–teacher distillation** (Runge–Kutta shortcut teachers, EMA stabilization) to improve few-step fidelity.
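The "one big step matches many small steps" idea can be seen in a toy, distribution-space setting: along the linear mixture path, and with the model's predicted clean-token distribution held fixed, small Euler steps telescope exactly into a single update with a cumulative scalar. A minimal NumPy sketch (function names are ours, not the repo's; FS-DFM trains the network so this property also survives when the prediction changes between steps):

```python
import numpy as np

def euler_step(q, p1, t, h):
    """One small Euler step of the mixture probability path toward p1."""
    return q + (h / (1.0 - t)) * (p1 - q)

def cumulative_step(q, p1, t, s):
    """One big step from time t to s using the cumulative scalar (s-t)/(1-t)."""
    return q + ((s - t) / (1.0 - t)) * (p1 - q)

# With p1 fixed, k small steps and one big step agree exactly, because the
# per-step factors (1 - h/(1 - t_i)) telescope to (1 - s)/(1 - t).
p1 = np.array([0.7, 0.2, 0.1])       # predicted clean-token distribution
q = np.full(3, 1.0 / 3.0)            # uniform source distribution
t, s, k = 0.0, 0.875, 16
h = (s - t) / k

q_small, tau = q.copy(), t
for _ in range(k):
    q_small = euler_step(q_small, p1, tau, h)
    tau += h

q_big = cumulative_step(q, p1, t, s)
print(np.allclose(q_small, q_big))   # True
```

This is why a single large update stays on the probability simplex: it is a convex combination of the current distribution and the prediction.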

**Formulation:** discrete flow matching over a **CTMC** on token sequences; sampling uses custom solvers (e.g., `mixture_euler_with_cumulative_scalar`).

---
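At the token level, such a solver can be sketched as a jump process: each position jumps to a sample from the model's predicted clean-token distribution with a cumulative per-step probability. The sketch below is an illustrative NumPy toy, not the repo's `mixture_euler_with_cumulative_scalar`; `model_p1` and `few_step_sample` are hypothetical names:

```python
import numpy as np

def few_step_sample(model_p1, x0, n_steps, rng):
    """Illustrative few-step sampler for a uniform-source mixture path.

    model_p1(x, t) -> (seq_len, vocab) probabilities over clean tokens.
    At each step a position jumps to a sample from the prediction with
    probability (s - t) / (1 - t), the cumulative scalar for that step.
    """
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        p1 = model_p1(x, t)                          # one model evaluation per step
        jump_prob = (s - t) / max(1.0 - t, 1e-8)
        jump = rng.random(len(x)) < jump_prob
        proposal = np.array([rng.choice(p1.shape[1], p=p1[i]) for i in range(len(x))])
        x = np.where(jump, proposal, x)
    return x
```

With `n_steps=8` this costs 8 model evaluations for the whole sequence, which is the regime the paper targets.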

## Architecture

From the paper’s implementation details:
- Backbone is a **DiT-style transformer** with **rotary attention**
- **Adaptive LayerNorm conditioning** in each block
- Conditioning includes a **continuous time embedding** and a **step-size embedding**
- A final linear head produces logits; converting logits into a CTMC generator and taking the step happen in the solver
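The adaptive-LayerNorm conditioning can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (names and shapes are illustrative; real DiT-style blocks also include attention/MLP sublayers and gating):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Parameter-free LayerNorm over the last (feature) axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ada_layer_norm(x, cond, w, b):
    """Adaptive LayerNorm: scale and shift are regressed from the
    conditioning vector (e.g., time embedding concatenated with the
    step-size embedding) instead of being fixed learned parameters."""
    scale, shift = np.split(cond @ w + b, 2, axis=-1)
    return layer_norm(x) * (1.0 + scale) + shift

# Example: (seq_len=4, d=6) activations, 3-dim conditioning vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
cond = rng.normal(size=3)                # concat(time_emb, step_size_emb)
w, b = np.zeros((3, 12)), np.zeros(12)   # zero-init -> plain LayerNorm
print(np.allclose(ada_layer_norm(x, cond, w, b), layer_norm(x)))  # True
```

Zero-initializing the regression is the common DiT trick: the block starts as an unconditioned LayerNorm and learns to modulate activations by time and step size.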

Tokenizer: **GPT-2 tokenizer**
Training/eval packing: documents packed into **1024-token** blocks (EOS appended, then packed/concatenated).
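The packing step can be sketched as below. This is a simplified version under our assumptions (the repo's exact pipeline may differ, e.g., in how a final partial block is handled); 50256 is GPT-2's `<|endoftext|>` id:

```python
def pack_documents(docs_tokens, block_size=1024, eos_id=50256):
    """Append EOS to each tokenized document, concatenate everything into
    one stream, then slice into fixed-size blocks (short tail dropped)."""
    stream = []
    for toks in docs_tokens:
        stream.extend(toks)
        stream.append(eos_id)
    n = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n)]

# Tiny example with block_size=4:
blocks = pack_documents([[1, 2, 3], [4, 5]], block_size=4)
print(blocks)  # [[1, 2, 3, 50256]]
```

Packing avoids padding waste and gives the model fixed 1024-token training sequences that can span document boundaries.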
---

## Training data & evaluation data

- Training: **FineWeb-Edu**
- Evaluation: **WikiText-103**

(See the paper for details and the exact preprocessing pipeline.)

---

## Reported behavior (paper)

FS-DFM targets **long-horizon language modeling**. In the paper, **8-step sampling** is reported to reach **perplexity parity** with a **1024-step** discrete-flow baseline for **1024-token generation**, yielding up to **128× fewer model evaluations**.

> For exact numbers, plots, and ablations, refer to the paper.

---

## How to use (recommended)

FS-DFM uses custom discrete solvers and is not a drop-in `transformers` model. The intended usage is via the official training/evaluation scripts.

### 1) Install the official code
```bash
git clone https://github.com/apple/ml-fs-dfm
cd ml-fs-dfm

conda env create -f fsdfm_environment.yml
conda activate FSDFM

pip install -e .