---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- diffusion
- code-generation
- discrete-diffusion
- bidirectional
- text-generation
pipeline_tag: text-generation
model-index:
- name: adhd-diffusion
results: []
---
# adhd-diffusion
A discrete diffusion language model for code generation, based on the CoDA (Coding LM via Diffusion Adaptation) architecture.
> ⚠️ **Note:** This is an intermediate checkpoint (step 12,000) from an interrupted training run. The model may not be fully trained.
## Model Details
| Property | Value |
|----------|-------|
| **Architecture** | DiffusionQwen3 (Bidirectional Transformer) |
| **Base Model** | Qwen-based architecture |
| **Hidden Size** | 1536 |
| **Layers** | 28 |
| **Attention Heads** | 12 |
| **KV Heads** | 2 (GQA) |
| **Intermediate Size** | 8960 |
| **Max Position Embeddings** | 32,768 |
| **Vocab Size** | 151,666 |
| **Training Checkpoint** | 12,000 steps |
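The table above corresponds to a `config.json` along these lines. This is a hedged sketch: the field names follow Qwen-style Transformers configs, and the exact keys in the shipped `config.json` may differ.

```json
{
  "hidden_size": 1536,
  "num_hidden_layers": 28,
  "num_attention_heads": 12,
  "num_key_value_heads": 2,
  "intermediate_size": 8960,
  "max_position_embeddings": 32768,
  "vocab_size": 151666
}
```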
## How Diffusion LMs Work
Unlike autoregressive models that generate tokens left-to-right, this model uses **discrete diffusion**:
1. Start with all `<mask>` tokens in the generation region
2. Iteratively unmask tokens based on model confidence
3. Higher-confidence predictions are revealed first
4. Process repeats until all tokens are generated
This enables **bidirectional context** during generation, potentially improving coherence for code.
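The steps above can be sketched as a toy unmasking loop. This is an illustrative simplification, not the repo's `inference.py`: the function name `diffusion_generate`, the fixed reveal budget per step, and the assumption that `model` returns logits of shape `(1, seq_len, vocab_size)` from a single bidirectional pass are all assumptions for the sketch.

```python
import torch

def diffusion_generate(model, input_ids, mask_id, gen_len=16, steps=4):
    # 1. Start with the generation region filled entirely with <mask> tokens.
    seq = torch.cat(
        [input_ids, torch.full((1, gen_len), mask_id, dtype=torch.long)], dim=1
    )
    per_step = max(1, gen_len // steps)
    for _ in range(steps * 2):  # a few spare passes to absorb rounding
        masked = seq == mask_id
        if not masked.any():    # 4. stop once every token is generated
            break
        logits = model(seq)                      # (1, L, V), bidirectional pass
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)           # per-position confidence
        conf = conf.masked_fill(~masked, -1.0)   # only rank masked slots
        # 2–3. Unmask the highest-confidence predictions first.
        k = min(per_step, int(masked.sum()))
        idx = conf[0].topk(k).indices
        seq[0, idx] = pred[0, idx]
    return seq
```

Because every pass attends over the full sequence, tokens revealed late can condition on tokens to their right as well as their left.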
## Usage
### Installation
```bash
pip install torch transformers
```
### Inference
```python
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "shouryamaanjain/adhd-diffusion", trust_remote_code=True
)

# The model uses the custom DiffusionQwen3Model class shipped in
# modeling_diffusion_qwen3.py; see inference.py for the full
# diffusion-sampling loop.
```
For full inference with diffusion sampling, use the included `inference.py` script:
```bash
# Single prompt
python inference.py --checkpoint /path/to/model --prompt "def fibonacci(n):"
# Interactive chat
python inference.py --checkpoint /path/to/model --mode chat
# With custom parameters
python inference.py --checkpoint /path/to/model \
--prompt "Write a function to sort a list" \
--steps 128 \
--temperature 0.0 \
--max-tokens 256 \
--alg entropy
```
### Generation Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `steps` | 128 | Number of diffusion denoising steps |
| `temperature` | 0.0 | Sampling temperature (0 = greedy) |
| `top_p` | None | Nucleus sampling threshold |
| `top_k` | None | Top-k sampling |
| `alg` | entropy | Sampling algorithm: `origin`, `entropy`, `maskgit_plus`, `topk_margin` |
| `alg_temp` | 0.1 | Algorithm-specific confidence temperature |
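As one example of how such algorithms can rank positions, the `entropy` choice plausibly scores each masked position by the (negative) entropy of its predicted distribution, so that low-entropy (more certain) positions are unmasked first. The function below is a hedged sketch of that idea, not the repo's exact implementation.

```python
import torch

def entropy_confidence(logits):
    # logits: (..., vocab_size). Returns one score per position;
    # higher score = lower entropy = more confident.
    probs = torch.softmax(logits, dim=-1)
    ent = -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=-1)
    return -ent
```

The `alg_temp` parameter would then control how sharply these scores are turned into a selection of which positions to reveal at each step.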
## Model Architecture
The model is a bidirectional transformer (non-causal attention) trained with discrete diffusion objectives:
```
DiffusionQwen3Model(
(model): Qwen2Model with bidirectional attention
(lm_head): Linear(1536, 151666)
)
```
### Training Objective
- **Forward process:** Randomly mask tokens with probability `σ ~ U[ε, 1]`
- **Reverse process:** Predict original tokens from masked input
- **Loss weighting:** `1/σ` (ELBO-derived)
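The three bullets above can be sketched as a single training-loss function. Assumptions for the sketch: one σ is sampled per call, tokens are masked independently with probability σ, and the cross-entropy over masked positions is weighted by 1/σ; the actual training code may batch and weight differently.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, input_ids, mask_id, eps=1e-3):
    # Forward process: sample a mask rate sigma ~ U[eps, 1] and mask
    # each token independently with probability sigma.
    sigma = torch.empty(1).uniform_(eps, 1.0).item()
    mask = torch.rand(input_ids.shape) < sigma
    if not mask.any():
        mask[0, 0] = True  # ensure at least one supervised position
    noisy = input_ids.masked_fill(mask, mask_id)
    # Reverse process: predict the original tokens at masked positions.
    logits = model(noisy)  # (B, L, V)
    loss = F.cross_entropy(logits[mask], input_ids[mask])
    # ELBO-derived 1/sigma weighting.
    return loss / sigma
```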
## Files
- `pytorch_model.bin` - Model weights
- `config.json` - Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` - Tokenizer files
- `inference.py` - Standalone inference script
- `modeling_diffusion_qwen3.py` - Model class definition
## Citation
Based on CoDA by Salesforce AI Research:
```bibtex
@article{coda2024,
title={CoDA: Coding LM via Diffusion Adaptation},
author={Salesforce AI Research},
journal={arXiv preprint},
year={2024}
}
```
## License
Please refer to the base Qwen model license for usage terms.