---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- diffusion
- code-generation
- discrete-diffusion
- bidirectional
- text-generation
pipeline_tag: text-generation
model-index:
- name: adhd-diffusion
  results: []
---

# adhd-diffusion

A discrete diffusion language model for code generation, based on the CoDA (Coding LM via Diffusion Adaptation) architecture.

> ⚠️ **Note:** This is an intermediate checkpoint (step 12,000) from an interrupted training run. The model may not be fully trained.

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | DiffusionQwen3 (bidirectional transformer) |
| **Base Model** | Qwen-based architecture |
| **Hidden Size** | 1536 |
| **Layers** | 28 |
| **Attention Heads** | 12 |
| **KV Heads** | 2 (GQA) |
| **Intermediate Size** | 8960 |
| **Max Position Embeddings** | 32,768 |
| **Vocab Size** | 151,666 |
| **Training Checkpoint** | 12,000 steps |

## How Diffusion LMs Work

Unlike autoregressive models that generate tokens left-to-right, this model uses **discrete diffusion**:

1. Start with all `<mask>` tokens in the generation region
2. Iteratively unmask tokens based on model confidence
3. Higher-confidence predictions are revealed first
4. The process repeats until all tokens are generated

This enables **bidirectional context** during generation, potentially improving coherence for code.

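The unmasking loop above can be sketched as a toy illustration. Everything here is illustrative, not the model's actual implementation: `logits_fn` stands in for a forward pass of the network, and the schedule simply reveals a fixed number of the most confident positions per step.

```python
import torch

def diffusion_unmask(logits_fn, seq_len, mask_id, steps):
    """Toy sketch of confidence-based iterative unmasking.

    logits_fn: maps a (seq_len,) token tensor to (seq_len, vocab) logits.
    Each step, the most confident still-masked positions are revealed.
    """
    tokens = torch.full((seq_len,), mask_id, dtype=torch.long)
    masked = torch.ones(seq_len, dtype=torch.bool)
    per_step = max(1, seq_len // steps)  # positions revealed per step

    while masked.any():
        probs = torch.softmax(logits_fn(tokens), dim=-1)  # (seq_len, vocab)
        conf, pred = probs.max(dim=-1)                    # per-position confidence
        conf[~masked] = -1.0                              # already-revealed tokens stay fixed
        k = min(per_step, int(masked.sum()))
        reveal = conf.topk(k).indices                     # most confident masked positions
        tokens[reveal] = pred[reveal]
        masked[reveal] = False
    return tokens
```

Because revealed tokens condition every later step through the full (bidirectional) forward pass, early high-confidence tokens can constrain the rest of the sequence.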
## Usage

### Installation

```bash
pip install torch transformers
```

### Inference

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("shouryamaanjain/adhd-diffusion", trust_remote_code=True)

# Load model (see inference.py for full diffusion generation logic)
# The model uses the custom DiffusionQwen3Model class
```

For full inference with diffusion sampling, use the included `inference.py` script:

```bash
# Single prompt
python inference.py --checkpoint /path/to/model --prompt "def fibonacci(n):"

# Interactive chat
python inference.py --checkpoint /path/to/model --mode chat

# With custom parameters
python inference.py --checkpoint /path/to/model \
    --prompt "Write a function to sort a list" \
    --steps 128 \
    --temperature 0.0 \
    --max-tokens 256 \
    --alg entropy
```

### Generation Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 128 | Number of diffusion denoising steps |
| `temperature` | 0.0 | Sampling temperature (0 = greedy) |
| `top_p` | None | Nucleus sampling threshold |
| `top_k` | None | Top-k sampling |
| `alg` | entropy | Sampling algorithm: `origin`, `entropy`, `maskgit_plus`, `topk_margin` |
| `alg_temp` | 0.1 | Algorithm-specific confidence temperature |

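The default `entropy` algorithm can be understood as ranking masked positions by predictive entropy, so low-entropy (confident) positions are unmasked first. The sketch below illustrates that idea only; the function name is hypothetical, and using `alg_temp`-scaled Gumbel noise to randomize the ranking is an assumption about what the confidence temperature does, not a reading of the script.

```python
import torch

def entropy_confidence(logits, alg_temp=0.1):
    """Score positions by negative predictive entropy (higher = more confident).

    With alg_temp > 0, Gumbel noise scaled by alg_temp makes the ranking
    stochastic; alg_temp = 0 gives a deterministic ordering.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    score = -entropy                       # low entropy -> high confidence
    if alg_temp > 0:
        gumbel = -torch.log(-torch.log(torch.rand_like(score)))
        score = score + alg_temp * gumbel  # noisy ranking, like noisy top-k
    return score  # higher score = unmask earlier
```

A sharply peaked distribution (the model is sure of one token) thus outranks a near-uniform one, which matches step 3 of the unmasking process described above.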
## Model Architecture

The model is a bidirectional transformer (non-causal attention) trained with discrete diffusion objectives:

```
DiffusionQwen3Model(
  (model): Qwen2Model with bidirectional attention
  (lm_head): Linear(1536, 151666)
)
```

### Training Objective

- **Forward process:** Randomly mask tokens with probability `σ ~ U[ε, 1]`
- **Reverse process:** Predict original tokens from masked input
- **Loss weighting:** `1/σ` (ELBO-derived)
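A single training step of this objective might look like the following sketch, derived only from the three bullets above. `model` stands for any function mapping token ids to per-position logits; all names are illustrative, not the repository's training code.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, tokens, mask_id, eps=1e-3):
    """One step of the masked-diffusion training objective (sketch).

    Sample a masking rate sigma ~ U[eps, 1], corrupt the input by replacing
    each token with <mask> independently with probability sigma, then score
    the model's reconstruction of the masked positions, weighted by 1/sigma.
    """
    sigma = eps + (1 - eps) * torch.rand(())            # sigma ~ U[eps, 1]
    mask = torch.rand(tokens.shape) < sigma             # forward (noising) process
    if not mask.any():                                  # guard a degenerate draw
        mask.view(-1)[0] = True
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                           # (batch, seq, vocab)
    loss = F.cross_entropy(logits[mask], tokens[mask])  # reverse process: predict originals
    return loss / sigma                                 # ELBO-derived 1/sigma weight
```

The `1/σ` factor upweights lightly-masked batches, where each masked position is individually more informative about the clean sequence.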
## Files

- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` - Tokenizer files
- `inference.py` - Standalone inference script
- `modeling_diffusion_qwen3.py` - Model class definition

## Citation

Based on CoDA by Salesforce AI Research:

```bibtex
@article{coda2024,
  title={CoDA: Coding LM via Diffusion Adaptation},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2024}
}
```

## License

This repository is tagged Apache-2.0 (see the metadata above); please also refer to the base Qwen model license for usage terms.