---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- diffusion
- code-generation
- discrete-diffusion
- bidirectional
- text-generation
pipeline_tag: text-generation
model-index:
- name: adhd-diffusion
  results: []
---

# adhd-diffusion

A discrete diffusion language model for code generation, based on the CoDA (Coding LM via Diffusion Adaptation) architecture.

> ⚠️ **Note:** This is an intermediate checkpoint (step 12,000) from an interrupted training run. The model may not be fully trained.

## Model Details

| Property | Value |
|----------|-------|
| **Architecture** | DiffusionQwen3 (Bidirectional Transformer) |
| **Base Model** | Qwen-based architecture |
| **Hidden Size** | 1536 |
| **Layers** | 28 |
| **Attention Heads** | 12 |
| **KV Heads** | 2 (GQA) |
| **Intermediate Size** | 8960 |
| **Max Position Embeddings** | 32,768 |
| **Vocab Size** | 151,666 |
| **Training Checkpoint** | 12,000 steps |

## How Diffusion LMs Work

Unlike autoregressive models that generate tokens left-to-right, this model uses **discrete diffusion**:

1. Start with the generation region filled entirely with mask tokens
2. Iteratively unmask tokens based on model confidence
3. Higher-confidence predictions are revealed first
4. The process repeats until all tokens are generated

This enables **bidirectional context** during generation, potentially improving coherence for code.
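The unmasking loop above can be sketched with a self-contained toy example. This is illustrative only: the names (`toy_denoise`, `MASK_ID`, `logits_fn`) are assumptions for the sketch, not the actual `inference.py` API, and a stand-in `logits_fn` plays the role of the model.

```python
import torch

MASK_ID = 0  # hypothetical mask token id for this toy example


def toy_denoise(logits_fn, length, steps):
    """Confidence-based iterative unmasking (illustrative sketch).

    logits_fn maps a (length,) token tensor to (length, vocab) logits.
    """
    # Step 1: the whole generation region starts as mask tokens.
    x = torch.full((length,), MASK_ID, dtype=torch.long)
    for _ in range(steps):
        masked = x == MASK_ID
        if not masked.any():
            break
        probs = torch.softmax(logits_fn(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        conf[~masked] = -1.0  # never re-reveal already-fixed tokens
        # Steps 2-3: reveal roughly length/steps tokens per iteration,
        # most confident positions first.
        k = min(max(1, length // steps), int(masked.sum()))
        reveal = conf.topk(k).indices
        x[reveal] = pred[reveal]
    # Step 4: fill any masks left over after the loop greedily.
    masked = x == MASK_ID
    if masked.any():
        x[masked] = logits_fn(x).argmax(dim=-1)[masked]
    return x
```

With `steps` equal to the sequence length this degenerates to one token per iteration; the real sampler additionally supports the `alg` and `alg_temp` confidence variants listed under Generation Parameters.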
## Usage

### Installation

```bash
pip install torch transformers
```

### Inference

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "shouryamaanjain/adhd-diffusion", trust_remote_code=True
)

# Load model (see inference.py for the full diffusion generation logic)
# The model uses the custom DiffusionQwen3Model class
```

For full inference with diffusion sampling, use the included `inference.py` script:

```bash
# Single prompt
python inference.py --checkpoint /path/to/model --prompt "def fibonacci(n):"

# Interactive chat
python inference.py --checkpoint /path/to/model --mode chat

# With custom parameters
python inference.py --checkpoint /path/to/model \
  --prompt "Write a function to sort a list" \
  --steps 128 \
  --temperature 0.0 \
  --max-tokens 256 \
  --alg entropy
```

### Generation Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 128 | Number of diffusion denoising steps |
| `temperature` | 0.0 | Sampling temperature (0 = greedy) |
| `top_p` | None | Nucleus sampling threshold |
| `top_k` | None | Top-k sampling |
| `alg` | entropy | Sampling algorithm: `origin`, `entropy`, `maskgit_plus`, `topk_margin` |
| `alg_temp` | 0.1 | Algorithm-specific confidence temperature |

## Model Architecture

The model is a bidirectional transformer (non-causal attention) trained with a discrete diffusion objective:

```
DiffusionQwen3Model(
  (model): Qwen2Model with bidirectional attention
  (lm_head): Linear(1536, 151666)
)
```

### Training Objective

- **Forward process:** Randomly mask tokens with probability `σ ~ U[ε, 1]`
- **Reverse process:** Predict the original tokens from the masked input
- **Loss weighting:** `1/σ` (ELBO-derived)

## Files

- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` - Tokenizer files
- `inference.py` - Standalone inference script
- `modeling_diffusion_qwen3.py` - Model class definition

## Citation

Based on CoDA by Salesforce AI Research:

```bibtex
@article{coda2024,
  title={CoDA: Coding LM via Diffusion Adaptation},
  author={Salesforce AI Research},
  journal={arXiv preprint},
  year={2024}
}
```

## License

Please refer to the base Qwen model license for usage terms.