---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- diffusion
- code-generation
- discrete-diffusion
- bidirectional
- text-generation
pipeline_tag: text-generation
model-index:
- name: adhd-diffusion
  results: []
---

# adhd-diffusion

A discrete diffusion language model for code generation, based on the CoDA (Coding LM via Diffusion Adaptation) architecture.

> ⚠️ **Note:** This is an intermediate checkpoint (step 12,000) from an interrupted training run. The model may not be fully trained.

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | DiffusionQwen3 (Bidirectional Transformer) |
| **Base Model** | Qwen-based architecture |
| **Hidden Size** | 1536 |
| **Layers** | 28 |
| **Attention Heads** | 12 |
| **KV Heads** | 2 (GQA) |
| **Intermediate Size** | 8960 |
| **Max Position Embeddings** | 32,768 |
| **Vocab Size** | 151,666 |
| **Training Checkpoint** | 12,000 steps |

## How Diffusion LMs Work

Unlike autoregressive models that generate tokens left-to-right, this model uses **discrete diffusion**:

1. Start with the generation region filled entirely with mask tokens
2. Iteratively unmask tokens based on model confidence
3. Higher-confidence predictions are revealed first
4. The process repeats until all tokens are generated

This enables **bidirectional context** during generation, potentially improving coherence for code.
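The unmasking loop above can be sketched with a self-contained toy example. This is illustrative only: the names (`toy_denoise`, `MASK_ID`, `logits_fn`) are assumptions for the sketch, not the actual `inference.py` API, and a stand-in `logits_fn` plays the role of the model.

```python
import torch

MASK_ID = 0  # hypothetical mask token id for this toy example


def toy_denoise(logits_fn, length, steps):
    """Confidence-based iterative unmasking (illustrative sketch).

    logits_fn maps a (length,) token tensor to (length, vocab) logits.
    """
    # Step 1: the whole generation region starts as mask tokens.
    x = torch.full((length,), MASK_ID, dtype=torch.long)
    for _ in range(steps):
        masked = x == MASK_ID
        if not masked.any():
            break
        probs = torch.softmax(logits_fn(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        conf[~masked] = -1.0  # never re-reveal already-fixed tokens
        # Steps 2-3: reveal roughly length/steps tokens per iteration,
        # most confident positions first.
        k = min(max(1, length // steps), int(masked.sum()))
        reveal = conf.topk(k).indices
        x[reveal] = pred[reveal]
    # Step 4: fill any masks left over after the loop greedily.
    masked = x == MASK_ID
    if masked.any():
        x[masked] = logits_fn(x).argmax(dim=-1)[masked]
    return x
```

With `steps` equal to the sequence length this degenerates to one token per iteration; the real sampler additionally supports the `alg` and `alg_temp` confidence variants listed under Generation Parameters.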
## Usage

### Installation

```bash
pip install torch transformers
```

### Inference

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "shouryamaanjain/adhd-diffusion", trust_remote_code=True
)

# Load model (see inference.py for the full diffusion generation logic)
# The model uses the custom DiffusionQwen3Model class
```

For full inference with diffusion sampling, use the included `inference.py` script:

```bash
# Single prompt
python inference.py --checkpoint /path/to/model --prompt "def fibonacci(n):"

# Interactive chat
python inference.py --checkpoint /path/to/model --mode chat

# With custom parameters
python inference.py --checkpoint /path/to/model \
  --prompt "Write a function to sort a list" \
  --steps 128 \
  --temperature 0.0 \
  --max-tokens 256 \
  --alg entropy
```

### Generation Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 128 | Number of diffusion denoising steps |
| `temperature` | 0.0 | Sampling temperature (0 = greedy) |
| `top_p` | None | Nucleus sampling threshold |
| `top_k` | None | Top-k sampling |
| `alg` | entropy | Sampling algorithm: `origin`, `entropy`, `maskgit_plus`, `topk_margin` |
| `alg_temp` | 0.1 | Algorithm-specific confidence temperature |

## Model Architecture

The model is a bidirectional transformer (non-causal attention) trained with a discrete diffusion objective:

```
DiffusionQwen3Model(
  (model): Qwen2Model with bidirectional attention
  (lm_head): Linear(1536, 151666)
)
```

### Training Objective

- **Forward process:** Randomly mask tokens with probability `σ ~ U[ε, 1]`
- **Reverse process:** Predict the original tokens from the masked input
- **Loss weighting:** `1/σ` (ELBO-derived)

## Files

- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` - Tokenizer files
- `inference.py` - Standalone inference script
- `modeling_diffusion_qwen3.py` - Model class definition

## Citation

Based on CoDA by Salesforce AI Research:

```bibtex
@article{coda2024,
  title={CoDA: Coding LM via Diffusion Adaptation},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2024}
}
```

## License

Please refer to the base Qwen model license for usage terms.