# LLaDA CoreML Diffusion Loop Examples
These scripts show how to run `llada_8b_instruct_seq192.mlpackage` in an iterative diffusion loop (not a single-pass argmax decode).
## Files

- `llada_generate.py`: text prompt -> tokenize -> run diffusion loop -> decode output
- `llada_diffuse.swift`: CoreML denoising-loop runner (called by the Python wrapper)
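The split between the two files suggests the Python script shells out to the compiled Swift runner. The exact CLI contract is not documented here, so the sketch below shows only the subprocess plumbing; the binary name and flags in the comment are hypothetical.

```python
import subprocess

def run_denoiser(argv: list[str]) -> str:
    """Run an external denoising-loop binary and return its stdout.

    The real wrapper presumably passes token ids and loop parameters to
    the Swift runner; the flags shown below are illustrative only.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Hypothetical call shape (binary name and flags are not confirmed):
# run_denoiser(["./llada_diffuse", "--steps", "32", "--ids", "1,2,3"])
```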
## Prerequisites

- macOS with Xcode command line tools (`xcrun`, `swift`)
- Python 3.10+
- Hugging Face access for the tokenizer (`GSAI-ML/LLaDA-8B-Instruct`)
Install Python deps:

```sh
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install transformers sentencepiece jinja2
```
## Quick run

From the repo root (where `llada_8b_instruct_seq192.mlpackage` lives):

```sh
source .venv/bin/activate
python examples/llada_generate.py "Write one short sentence about the moon." --max-new-tokens 48 --steps 32
```
## Notes

- Uses `<|mdm_mask|>` as the diffusion mask token.
- `--steps` and `--max-new-tokens` are the main quality/speed knobs.
- The model is loaded through CoreML (`MLModel`); the `.mlpackage` is auto-compiled at runtime.
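To make the loop concrete, here is a minimal, self-contained sketch of one common masked-diffusion decoding scheme: start with an all-masked response region and, over `steps` iterations, commit the most confident predictions while leaving the rest masked. The stand-in `model` returns random logits so the control flow runs end to end; the mask id and the confidence-based unmasking schedule are assumptions, not a guaranteed match for what `llada_generate.py` does.

```python
import math
import random

MASK_ID = 126336   # assumed id for <|mdm_mask|>; check the real tokenizer
VOCAB = 32         # toy vocabulary size for the stand-in model

def model(tokens):
    """Stand-in for the CoreML forward pass: one row of logits per position."""
    rng = random.Random(0)
    return [[rng.random() for _ in range(VOCAB)] for _ in tokens]

def diffusion_generate(prompt_ids, max_new_tokens=8, steps=4):
    # Prompt stays fixed; the response region starts fully masked.
    tokens = list(prompt_ids) + [MASK_ID] * max_new_tokens
    per_step = math.ceil(max_new_tokens / steps)
    for _ in range(steps):
        logits = model(tokens)
        # Score each still-masked position by its argmax confidence.
        cands = []
        for i, tok in enumerate(tokens):
            if tok == MASK_ID:
                best = max(range(VOCAB), key=lambda v: logits[i][v])
                cands.append((logits[i][best], i, best))
        if not cands:
            break
        # Commit only the most confident positions this step.
        for _, i, best in sorted(cands, reverse=True)[:per_step]:
            tokens[i] = best
    return tokens
```

With `--steps` equal to `--max-new-tokens` the loop commits one token per iteration (slowest, typically best quality); fewer steps commit more tokens at once, trading quality for speed, which matches the knobs described above.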