---
language:
- en
license: mit
pipeline_tag: text-generation
library_name: transformers
tags:
- text-diffusion
- discrete-diffusion
- pytorch
- mdlm
- seed-diffusion
- generative-ai
model-index:
- name: diffusionGPT
  results: []
custom_pipelines:
  text-diffusion:
    impl: pipeline.TextDiffusionPipeline
    pt:
    - AutoModelForMaskedLM
---

# diffusionGPT

[**GitHub Repository**](https://github.com/JorgeVanco/diffusionGPT) | [**Model License: MIT**](https://opensource.org/licenses/MIT)

DiffusionGPT is a **Masked Diffusion Language Model (MDLM)** fine-tuned for conversational AI. Unlike traditional autoregressive models (such as GPT-4 or Llama) that predict text one token at a time from left to right, DiffusionGPT starts from a fully masked sequence and generates text through an iterative denoising process.

This approach allows for parallel decoding, flexible text infilling, and "Seed Diffusion" editing capabilities.
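
To make the idea concrete, here is a heavily simplified sketch of such a denoising loop. This is hypothetical illustration code, not the pipeline's actual internals: at each step the model scores every masked position in parallel and commits only the predictions it is most confident about.

```python
import torch

def diffusion_generate(model, ids, mask_id, num_steps):
    """Toy denoising loop (illustrative only, not the real pipeline)."""
    for step in range(num_steps, 0, -1):
        masked = (ids == mask_id).nonzero(as_tuple=True)[0]
        if len(masked) == 0:
            break
        with torch.no_grad():
            logits = model(input_ids=ids.unsqueeze(0)).logits[0]  # predict all positions at once
        conf, preds = logits[masked].softmax(-1).max(-1)
        n_commit = max(1, len(masked) // step)  # commit everything by the last step
        keep = conf.topk(n_commit).indices      # highest-confidence positions first
        ids[masked[keep]] = preds[keep]
    return ids
```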

## Key Features

* **Parallel Decoding:** Generates and refines tokens simultaneously across the sequence.
* **Seed Diffusion Editing:** Implements advanced editing logic (per [arXiv:2508.02193](https://arxiv.org/pdf/2508.02193)) to refine existing text while maintaining context.
* **Semi-Autoregressive Generation:** Supports block-wise generation for long-form content, combining the strengths of diffusion with the length scaling of autoregression.
* **Custom Pipeline:** Built-in support for `TextDiffusionPipeline`, which handles the ancestral sampling and confidence-based unmasking automatically.

---

## Quickstart

To use this model, ensure you have the `pipeline.py` file from the repository in your local directory (Hugging Face will download it automatically if `trust_remote_code=True`).

### 1. Basic Chat Completion

```python
from transformers import pipeline

pipe = pipeline(
    "text-diffusion",
    model="JorgeVanco/diffusionGPT",
    trust_remote_code=True
)

messages = [{"role": "user", "content": "Explain diffusion models in simple terms."}]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate using standard diffusion
result = pipe(prompt, num_steps=50)
print(result["decoded_texts"][0])
```

### 2. Streaming Intermediate Denoising

Watch the model "think" as it refines the text from masks to a final response.

```python
for partial_text in pipe.stream_generation(prompt, num_steps=32):
    print(f"\033[H\033[J{partial_text}")  # Clears terminal for animation effect
```

### 3. Block-wise (Semi-Autoregressive) Generation

For longer responses that exceed the standard sequence length:

```python
response = pipe.stream_semi_autoregressive_generate(
    input_text=prompt,
    block_size=64,
    max_length=256,
    num_steps=32
)

for step in response:
    print(step)
```

## Technical Details

### Model Architecture

The backbone is a Transformer encoder (`AutoModelForMaskedLM`) configured for discrete diffusion.

- **Training Objective:** Multi-step corruption and reconstruction (MDLM formulation).
- **Corruption Strategy:** Uses a `DiscreteDiffusionCollator`, which applies random masking and optional "Insertion Corruption" using a `<|delete|>` token (see the sketch after this list).
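
For intuition, here is a minimal sketch of the masking half of that corruption, under the assumption that tokens are masked independently with probability equal to the diffusion time `t`. The function name is hypothetical; the real `DiscreteDiffusionCollator` (including insertion corruption) lives in the source repository.

```python
import torch

def corrupt_batch(input_ids: torch.Tensor, mask_token_id: int, t: float):
    """Mask each token independently with probability t (illustrative only)."""
    is_masked = torch.rand(input_ids.shape) < t
    corrupted = torch.where(is_masked, torch.full_like(input_ids, mask_token_id), input_ids)
    # Only the masked positions contribute to the reconstruction loss
    labels = torch.where(is_masked, input_ids, torch.full_like(input_ids, -100))
    return corrupted, labels
```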

### Sampling Parameters

You can tune generation in the `pipe()` call using:

- `num_steps`: More steps generally yield higher quality at the cost of slower inference.
- `use_confidence`: When `True`, the model uses confidence-based unmasking (top-k) instead of random unmasking.
- `allow_edits`: Enables Seed Diffusion logic to refine previously "visible" tokens (leave at `True` for better generation). An example combining these options follows this list.
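
For example, a configuration that trades speed for quality (the parameter values here are illustrative):

```python
result = pipe(
    prompt,
    num_steps=100,        # more refinement steps
    use_confidence=True,  # unmask the most confident tokens first
    allow_edits=True,     # let Seed Diffusion revise already-revealed tokens
)
print(result["decoded_texts"][0])
```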

## Training Setup

The model was trained using the `DiffusionTrainer` class provided in the [source repository](https://github.com/JorgeVanco/diffusionGPT).

### Training Configuration

- **Optimizer:** AdamW with a linear schedule.
- **Loss:** Time-weighted cross-entropy (MDLM); a sketch follows this list.
- **Curriculum:** Includes a `SeedDiffusionCurriculumCallback` that introduces corruption stages gradually to improve model robustness.
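
As a rough illustration of the time weighting, here is a sketch of an MDLM-style objective assuming a linear masking schedule, where the cross-entropy over masked positions is scaled by `1/t`. This is an assumption about the formulation, not the repository's exact loss code.

```python
import torch.nn.functional as F

def mdlm_loss(logits, labels, t):
    """Time-weighted cross-entropy over masked positions (sketch, linear schedule)."""
    ce = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,  # unmasked positions carry no loss
        reduction="mean",
    )
    return ce / t  # lighter corruption (small t) gets a larger per-token weight
```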

### Example Training Command

```bash
uv run train.py \
    --num_hidden_layers 12 \
    --hidden_size 768 \
    --num_diffusion_steps 32 \
    --max_seq_length 128 \
    --target_param_data_ratio 20
```

## ⚠️ Limitations & Bias

- **Factual Accuracy:** Like all LLMs, this model can hallucinate. It is not optimized for factual retrieval.
- **Coherence:** The model handles short-to-medium chat well, but very long-range coherence is still being improved through the semi-autoregressive block method.
- **Special Tokens:** The model relies on specific tokens like `<|im_start|>` and `<|im_end|>` for chat structure; see the note after this list.
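
If you build prompts by hand instead of using `apply_chat_template`, they presumably need a ChatML-style layout built from these tokens. The exact template shown here is an assumption; prefer the chat template call from the Quickstart.

```python
# Assumed ChatML-style layout (prefer tokenizer.apply_chat_template)
prompt = (
    "<|im_start|>user\n"
    "Explain diffusion models in simple terms.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
```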

## Citation & Acknowledgments

This implementation is inspired by recent research in discrete diffusion for language:

- **MDLM:** [Simple and Effective Masked Diffusion Language Models](https://s-sahoo.com/mdlm/)
- **Seed Diffusion:** [Seed Diffusion: Continuous Training of Discrete Diffusion Language Models](https://seed.bytedance.com/en/seed_diffusion)

## License

This model and its associated code are released under the **MIT License**.