Stable Diffusion v1.5 converted to LiteRT

This repository contains a LiteRT/TFLite export of the Hugging Face model stable-diffusion-v1-5/stable-diffusion-v1-5.

Base variants

  • fp32/: reference float32 export used by the android-gpu and ios-coreml profiles
  • int8/: mixed-precision bundle with an fp32 text encoder fallback, a PT2E dynamically quantized int8 UNet, and an fp32 VAE decoder fallback

Deployment profiles

  • android-qnn-npu: LiteRT with the Qualcomm AI Engine Direct (QNN) delegate (android, preferred accelerator=NPU)
  • android-gpu: LiteRT GPU delegate (android, preferred accelerator=GPU)
  • android-cpu: LiteRT CPU/XNNPACK (android, preferred accelerator=CPU)
  • ios-coreml: LiteRT Core ML delegate (ios, preferred accelerator=CORE_ML)

Profiles are emitted in conversion_manifest.json as manifest-level mappings onto the exported base variants. This avoids duplicating large model binaries while still letting each runtime pick backend-specific artifacts.
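
For illustration, a profile lookup might read conversion_manifest.json and map the chosen profile onto a base variant directory. This is a minimal sketch: the field names used here (deployment_profiles, base_variant, preferred_accelerator) are assumptions about the schema, not documented keys; inspect the manifest for the real ones.

```python
import json
from pathlib import Path

# Hypothetical sketch: resolve a deployment profile to the base variant it
# maps onto. The key names below are assumptions -- check
# conversion_manifest.json for the actual schema.
def resolve_profile(repo_dir: str, profile: str) -> Path:
    manifest = json.loads((Path(repo_dir) / "conversion_manifest.json").read_text())
    entry = manifest["deployment_profiles"][profile]      # assumed key
    variant_dir = Path(repo_dir) / entry["base_variant"]  # e.g. "fp32" or "int8"
    print(f"{profile}: accelerator={entry.get('preferred_accelerator')}")
    return variant_dir

# Per the list above, android-gpu maps onto the fp32 base variant.
unet_path = resolve_profile(".", "android-gpu") / "unet.tflite"
```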

Files per exported base variant

  • text_encoder.tflite
  • unet.tflite
  • vae_decoder.tflite
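
As a sketch of reloading one base variant, the three submodels can be opened with the LiteRT Python interpreter (here via the ai-edge-litert package; tf.lite.Interpreter behaves identically). The fp32/ paths are just one choice of variant:

```python
from ai_edge_litert.interpreter import Interpreter  # pip install ai-edge-litert

def load(model_path: str) -> Interpreter:
    interp = Interpreter(model_path=model_path)
    interp.allocate_tensors()
    return interp

text_encoder = load("fp32/text_encoder.tflite")
unet = load("fp32/unet.tflite")
vae_decoder = load("fp32/vae_decoder.tflite")

# Inspect expected inputs; names, shapes, and dtypes vary per export.
for detail in unet.get_input_details():
    print(detail["name"], detail["shape"], detail["dtype"])
```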

Shared assets

  • tokenizer/
  • scheduler/
  • configs/
  • configs/text_encoder_runtime_config.json
  • conversion_manifest.json
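
Continuing the sketch above, the shared assets slot in as follows: tokenize the prompt with the bundled tokenizer, cast the token ids to whatever dtype the runtime config recorded (the "input_dtype" key is an assumption about that file's schema), and run the text encoder:

```python
import json
import numpy as np
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("tokenizer")
ids = tokenizer(
    "a photo of an astronaut riding a horse",
    padding="max_length",
    max_length=tokenizer.model_max_length,  # 77 for SD v1.5's CLIP
    truncation=True,
    return_tensors="np",
).input_ids  # int64 on most platforms

# The exported token-id dtype is recorded per variant/profile; the exact
# key name ("input_dtype") is an assumption for this sketch.
cfg = json.load(open("configs/text_encoder_runtime_config.json"))
dtype = np.int32 if cfg.get("input_dtype") == "int32" else np.int64

inp = text_encoder.get_input_details()[0]
text_encoder.set_tensor(inp["index"], ids.astype(dtype))
text_encoder.invoke()
embeddings = text_encoder.get_tensor(text_encoder.get_output_details()[0]["index"])
```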

Notes

  • Stable Diffusion v1.5 is a multi-stage pipeline, so this export is split into submodels; a sketch of wiring them back together follows these notes.
  • The notebook first tries to export the text encoder with INT32 token ids for better GPU/Core ML delegate compatibility. The actual exported input dtype is recorded per variant and per deployment profile, so runtimes should cast token ids accordingly (as in the tokenizer sketch above).
  • The fp32 bundle is optional debug output; when converting on CPU-only runtimes it is skipped by default to avoid notebook kernel crashes during the fp32 UNet conversion.
  • android-qnn-npu is a LiteRT/QNN-oriented deployment profile, not a Qualcomm AOT context binary.
  • Both exported base variants are smoke-tested by reloading the serialized LiteRT models and executing inference.
  • The preview images in preview/ are decoder smoke tests, not final text-to-image samples.
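
Putting the submodels together, a single-prompt denoising loop might look like the sketch below. It continues the sketches above and borrows a diffusers scheduler; the scheduler class, the UNet input names, the timestep shape/dtype, and the latent scaling are assumptions about this particular export, and classifier-free guidance is omitted for brevity:

```python
import numpy as np
import torch
from diffusers import PNDMScheduler  # scheduler choice is an assumption

scheduler = PNDMScheduler.from_pretrained("scheduler")
scheduler.set_timesteps(25)

# SD v1.5 latent space: 4 channels at 1/8 resolution (64x64 for 512x512).
latents = np.random.randn(1, 4, 64, 64).astype(np.float32) * scheduler.init_noise_sigma

def run_unet(latent, t, context):
    # Matching inputs by name substring is an assumption -- check the names
    # reported by get_input_details() against the actual export.
    for d in unet.get_input_details():
        if "timestep" in d["name"]:
            unet.set_tensor(d["index"], np.array([t], dtype=d["dtype"]))
        elif "sample" in d["name"]:
            unet.set_tensor(d["index"], latent.astype(np.float32))
        else:  # assumed to be encoder_hidden_states (text embeddings)
            unet.set_tensor(d["index"], context.astype(np.float32))
    unet.invoke()
    return unet.get_tensor(unet.get_output_details()[0]["index"])

for t in scheduler.timesteps:
    latent_in = scheduler.scale_model_input(torch.from_numpy(latents), t).numpy()
    noise_pred = run_unet(latent_in, int(t), embeddings)
    latents = scheduler.step(
        torch.from_numpy(noise_pred), t, torch.from_numpy(latents)
    ).prev_sample.numpy()

# SD v1 convention scales latents by 1/0.18215 before decoding; the export
# may already fold this into the VAE decoder.
inp = vae_decoder.get_input_details()[0]
vae_decoder.set_tensor(inp["index"], (latents / 0.18215).astype(np.float32))
vae_decoder.invoke()
image = vae_decoder.get_tensor(vae_decoder.get_output_details()[0]["index"])
```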