# LLaDA CoreML Diffusion Loop Examples

These scripts show how to run `llada_8b_instruct_seq192.mlpackage` in an iterative diffusion loop (not single-pass argmax).

## Files

- `llada_generate.py`: text prompt -> tokenize -> run diffusion loop -> decode output
- `llada_diffuse.swift`: CoreML denoising loop runner (called by the Python wrapper)

## Prerequisites

- macOS with Xcode command line tools (`xcrun`, `swift`)
- Python 3.10+
- Hugging Face access for the tokenizer (`GSAI-ML/LLaDA-8B-Instruct`)

Install the Python dependencies:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install transformers sentencepiece jinja2
```

## Quick run

From the repo root (where `llada_8b_instruct_seq192.mlpackage` exists):

```bash
source .venv/bin/activate
python examples/llada_generate.py "Write one short sentence about the moon." --max-new-tokens 48 --steps 32
```

## Notes

- Uses `<|mdm_mask|>` as the diffusion mask token.
- `--steps` and `--max-new-tokens` are the main quality/speed knobs: more steps means more denoising passes (slower, usually higher quality), and more new tokens widens the generation window.
- The model is loaded through CoreML (`MLModel`), and the `.mlpackage` is auto-compiled at runtime.
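
## How the loop works

For orientation, here is a minimal sketch of the iterative unmasking scheme the runner implements: start with the answer span filled with mask tokens, and on each step commit the model's most confident predictions while leaving the rest masked. The `predict` callback, its signature, and the `MASK_ID` value are illustrative assumptions, not the actual CoreML interface:

```python
# Minimal sketch of a confidence-based diffusion decode loop.
# `predict` stands in for the real model call (an assumption, not the CoreML API).
import math

MASK_ID = 126336  # assumed id of <|mdm_mask|>; check the tokenizer for the real value


def diffusion_generate(prompt_ids, max_new_tokens, steps, predict):
    """Iteratively unmask: each step, run the model over the whole sequence
    and commit the highest-confidence predictions at still-masked positions."""
    seq = list(prompt_ids) + [MASK_ID] * max_new_tokens
    per_step = math.ceil(max_new_tokens / steps)  # positions revealed per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK_ID]
        if not masked:
            break
        # predict(seq, masked) -> {position: (token_id, confidence)}
        preds = predict(seq, masked)
        # Commit the most confident predictions; the rest stay masked
        # and are re-predicted on the next step.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for pos in best:
            seq[pos] = preds[pos][0]
    return seq
```

This is why `--steps` trades speed for quality: fewer steps force more tokens to be committed per pass, with less chance to revise low-confidence positions.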