---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- diffusion
- code-generation
- discrete-diffusion
- bidirectional
- text-generation
pipeline_tag: text-generation
model-index:
- name: adhd-diffusion
results: []
---
# adhd-diffusion
A discrete diffusion language model for code generation, based on the CoDA (Coding LM via Diffusion Adaptation) architecture.
> ⚠️ **Note:** This is an intermediate checkpoint (step 12,000) from an interrupted training run. The model may not be fully trained.
## Model Details
| Property | Value |
|----------|-------|
| **Architecture** | DiffusionQwen3 (Bidirectional Transformer) |
| **Base Model** | Qwen-based architecture |
| **Hidden Size** | 1536 |
| **Layers** | 28 |
| **Attention Heads** | 12 |
| **KV Heads** | 2 (GQA) |
| **Intermediate Size** | 8960 |
| **Max Position Embeddings** | 32,768 |
| **Vocab Size** | 151,666 |
| **Training Checkpoint** | 12,000 steps |
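The table above corresponds to a `config.json` along these lines. This is a hedged sketch: the field names follow Qwen-style Transformers configs, and the exact keys in the shipped `config.json` may differ.

```json
{
  "hidden_size": 1536,
  "num_hidden_layers": 28,
  "num_attention_heads": 12,
  "num_key_value_heads": 2,
  "intermediate_size": 8960,
  "max_position_embeddings": 32768,
  "vocab_size": 151666
}
```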
## How Diffusion LMs Work
Unlike autoregressive models that generate tokens left-to-right, this model uses **discrete diffusion**:
1. Start with all `<mask>` tokens in the generation region
2. Iteratively unmask tokens based on model confidence
3. Higher-confidence predictions are revealed first
4. Process repeats until all tokens are generated
This enables **bidirectional context** during generation, potentially improving coherence for code.
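The steps above can be sketched as a toy unmasking loop. This is an illustrative simplification, not the repo's `inference.py`: the function name `diffusion_generate`, the fixed reveal budget per step, and the assumption that `model` returns logits of shape `(1, seq_len, vocab_size)` from a single bidirectional pass are all assumptions for the sketch.

```python
import torch

def diffusion_generate(model, input_ids, mask_id, gen_len=16, steps=4):
    # 1. Start with the generation region filled entirely with <mask> tokens.
    seq = torch.cat(
        [input_ids, torch.full((1, gen_len), mask_id, dtype=torch.long)], dim=1
    )
    per_step = max(1, gen_len // steps)
    for _ in range(steps * 2):  # a few spare passes to absorb rounding
        masked = seq == mask_id
        if not masked.any():    # 4. stop once every token is generated
            break
        logits = model(seq)                      # (1, L, V), bidirectional pass
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)           # per-position confidence
        conf = conf.masked_fill(~masked, -1.0)   # only rank masked slots
        # 2–3. Unmask the highest-confidence predictions first.
        k = min(per_step, int(masked.sum()))
        idx = conf[0].topk(k).indices
        seq[0, idx] = pred[0, idx]
    return seq
```

Because every pass attends over the full sequence, tokens revealed late can condition on tokens to their right as well as their left.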
## Usage
### Installation
```bash
pip install torch transformers
```
### Inference
```python
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "shouryamaanjain/adhd-diffusion", trust_remote_code=True
)

# The model uses the custom DiffusionQwen3Model class shipped in
# modeling_diffusion_qwen3.py; see inference.py for the full
# diffusion-sampling loop.
```
For full inference with diffusion sampling, use the included `inference.py` script:
```bash
# Single prompt
python inference.py --checkpoint /path/to/model --prompt "def fibonacci(n):"
# Interactive chat
python inference.py --checkpoint /path/to/model --mode chat
# With custom parameters
python inference.py --checkpoint /path/to/model \
--prompt "Write a function to sort a list" \
--steps 128 \
--temperature 0.0 \
--max-tokens 256 \
--alg entropy
```
### Generation Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 128 | Number of diffusion denoising steps |
| `temperature` | 0.0 | Sampling temperature (0 = greedy) |
| `top_p` | None | Nucleus sampling threshold |
| `top_k` | None | Top-k sampling |
| `alg` | entropy | Sampling algorithm: `origin`, `entropy`, `maskgit_plus`, `topk_margin` |
| `alg_temp` | 0.1 | Algorithm-specific confidence temperature |
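As one example of how such algorithms can rank positions, the `entropy` choice plausibly scores each masked position by the (negative) entropy of its predicted distribution, so that low-entropy (more certain) positions are unmasked first. The function below is a hedged sketch of that idea, not the repo's exact implementation.

```python
import torch

def entropy_confidence(logits):
    # logits: (..., vocab_size). Returns one score per position;
    # higher score = lower entropy = more confident.
    probs = torch.softmax(logits, dim=-1)
    ent = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    return -ent
```

The `alg_temp` parameter would then control how sharply these scores are turned into a selection of which positions to reveal at each step.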
## Model Architecture
The model is a bidirectional transformer (non-causal attention) trained with discrete diffusion objectives:
```
DiffusionQwen3Model(
(model): Qwen2Model with bidirectional attention
(lm_head): Linear(1536, 151666)
)
```
### Training Objective
- **Forward process:** Randomly mask tokens with probability `σ ~ U[ε, 1]`
- **Reverse process:** Predict original tokens from masked input
- **Loss weighting:** `1/σ` (ELBO-derived)
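The three bullets above can be sketched as a single training-loss function. Assumptions for the sketch: one σ is sampled per call, tokens are masked independently with probability σ, and the cross-entropy over masked positions is weighted by 1/σ; the actual training code may batch and weight differently.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, input_ids, mask_id, eps=1e-3):
    # Forward process: sample a mask rate sigma ~ U[eps, 1] and mask
    # each token independently with probability sigma.
    sigma = torch.empty(1).uniform_(eps, 1.0).item()
    mask = torch.rand(input_ids.shape) < sigma
    if not mask.any():
        mask[0, 0] = True  # ensure at least one supervised position
    noisy = input_ids.masked_fill(mask, mask_id)
    # Reverse process: predict the original tokens at masked positions.
    logits = model(noisy)  # (B, L, V)
    loss = F.cross_entropy(logits[mask], input_ids[mask])
    # ELBO-derived 1/sigma weighting.
    return loss / sigma
```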
## Files
- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` - Tokenizer files
- `inference.py` - Standalone inference script
- `modeling_diffusion_qwen3.py` - Model class definition
## Citation
Based on CoDA by Salesforce AI Research:
```bibtex
@article{coda2024,
title={CoDA: Coding LM via Diffusion Adaptation},
author={Salesforce AI Research},
journal={arXiv preprint},
year={2024}
}
```
## License
Please refer to the base Qwen model license for usage terms.