# LLaDA CoreML Diffusion Loop Examples

These scripts show how to run `llada_8b_instruct_seq192.mlpackage` in an iterative diffusion loop (not single-pass argmax).

## Files

- `llada_generate.py`: text prompt -> tokenize -> run diffusion loop -> decode output
- `llada_diffuse.swift`: CoreML denoising loop runner (called by the Python wrapper)

## Prerequisites

- macOS with Xcode command line tools (`xcrun`, `swift`)
- Python 3.10+
- Hugging Face access for the tokenizer (`GSAI-ML/LLaDA-8B-Instruct`)

Install the Python dependencies:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install transformers sentencepiece jinja2
```

## Quick run

From the repo root (where `llada_8b_instruct_seq192.mlpackage` exists):

```bash
source .venv/bin/activate
python examples/llada_generate.py "Write one short sentence about the moon." --max-new-tokens 48 --steps 32
```

## Notes

- Uses `<|mdm_mask|>` as the diffusion mask token.
- `--steps` and `--max-new-tokens` are the main quality/speed knobs: more steps means more denoising passes (slower, usually higher quality), and more new tokens widens the generation window.
- The model is loaded through CoreML (`MLModel`), and the `.mlpackage` is auto-compiled at runtime.
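
## How the loop works

For orientation, here is a minimal sketch of the iterative unmasking scheme the runner implements: start with the answer span filled with mask tokens, and on each step commit the model's most confident predictions while leaving the rest masked. The `predict` callback, its signature, and the `MASK_ID` value are illustrative assumptions, not the actual CoreML interface:

```python
# Minimal sketch of a confidence-based diffusion decode loop.
# `predict` stands in for the real model call (an assumption, not the CoreML API).
import math

MASK_ID = 126336  # assumed id of <|mdm_mask|>; check the tokenizer for the real value


def diffusion_generate(prompt_ids, max_new_tokens, steps, predict):
    """Iteratively unmask: each step, run the model over the whole sequence
    and commit the highest-confidence predictions at still-masked positions."""
    seq = list(prompt_ids) + [MASK_ID] * max_new_tokens
    per_step = math.ceil(max_new_tokens / steps)  # positions revealed per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK_ID]
        if not masked:
            break
        # predict(seq, masked) -> {position: (token_id, confidence)}
        preds = predict(seq, masked)
        # Commit the most confident predictions; the rest stay masked
        # and are re-predicted on the next step.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:per_step]
        for pos in best:
            seq[pos] = preds[pos][0]
    return seq
```

This is why `--steps` trades speed for quality: fewer steps force more tokens to be committed per pass, with less chance to revise low-confidence positions.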