# LLaDA CoreML Diffusion Loop Examples

These scripts show how to run `llada_8b_instruct_seq192.mlpackage` in an iterative diffusion loop (not a single-pass argmax decode).
## Files

- `llada_generate.py`: text prompt -> tokenize -> run diffusion loop -> decode output
- `llada_diffuse.swift`: CoreML denoising-loop runner (called by the Python wrapper)
## Prerequisites

- macOS with Xcode command line tools (`xcrun`, `swift`)
- Python 3.10+
- Hugging Face access for the tokenizer (`GSAI-ML/LLaDA-8B-Instruct`)
Install Python deps:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install transformers sentencepiece jinja2
```
## Quick run

From the repo root (where `llada_8b_instruct_seq192.mlpackage` exists):

```bash
source .venv/bin/activate
python examples/llada_generate.py "Write one short sentence about the moon." --max-new-tokens 48 --steps 32
```
## Notes

- Uses `<|mdm_mask|>` as the diffusion mask token.
- `--steps` (more denoising iterations: slower, typically higher quality) and `--max-new-tokens` (length of the generated span) are the main quality/speed knobs.
- The model is loaded through CoreML (`MLModel`); the `.mlpackage` is auto-compiled at runtime.
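The iterative loop can be sketched as follows. This is a minimal illustration of the technique, not the scripts' actual implementation: the new-token positions start out as the mask token, and each step commits the highest-confidence predictions while leaving the rest masked for the next iteration. The `predict` stub stands in for the CoreML model call.

```python
import math
import random

MASK = -1  # stand-in for the <|mdm_mask|> token id

def predict(seq):
    """Stub model: returns a (token_id, confidence) guess for each position."""
    return [(random.randrange(100), random.random()) for _ in seq]

def diffusion_decode(prompt_ids, max_new_tokens=8, steps=4, seed=0):
    random.seed(seed)
    seq = list(prompt_ids) + [MASK] * max_new_tokens
    # Roughly max_new_tokens/steps positions are committed per step.
    per_step = math.ceil(max_new_tokens / steps)
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        guesses = predict(seq)
        # Commit the highest-confidence guesses; the rest stay masked.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = guesses[i][0]
    return seq

out = diffusion_decode([1, 2, 3], max_new_tokens=8, steps=4)
print(out)
```

With this schedule, fewer steps commit more tokens per iteration (faster, less refinement), which is why `--steps` trades speed against quality.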