---
base_model:
- TheyCallMeHex/LCM-Dreamshaper-V7-ONNX
tags:
- rknn
- LCM
- stable-diffusion
---

# Stable Diffusion 1.5 Latent Consistency Model (LCM-SD) for RKNN2

Run the **Stable Diffusion 1.5 Latent Consistency Model (LCM-SD)** on **Rockchip RKNPU2 (RK3588)** using RKNN2.

This repository supports **command-line inference** and a **production-ready HTTP server** optimized specifically for **LCM-SD**.

---

## Performance (RK3588, single NPU core)

| Resolution | Text Encoder | U-Net (per step) | VAE Decoder |
|-----------:|-------------:|-----------------:|------------:|
| 384×384    | ~0.05s       | ~2.36s           | ~5.48s      |
| 512×512    | ~0.05s       | ~5.65s           | ~11–14s     |

> NOTE: VAE decode latency is a known RKNN limitation and is not caused by layout, server, or postprocessing overhead.
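
As a rough worked example, the per-stage figures in the table can be combined into an end-to-end estimate. The sketch below assumes the text encoder and VAE decoder each run once per image and the U-Net runs once per inference step; it ignores scheduler, tokenization, and PNG-encoding overhead, so treat the result as a lower-bound approximation rather than a measurement. The names `STAGE_LATENCY` and `estimate_latency` are illustrative and not part of this repository.

```python
# Per-stage latencies in seconds, copied from the table above
# (RK3588, single NPU core). VAE decode is stored as a (low, high) range.
STAGE_LATENCY = {
    "384x384": {"text_encoder": 0.05, "unet_step": 2.36, "vae_decoder": (5.48, 5.48)},
    "512x512": {"text_encoder": 0.05, "unet_step": 5.65, "vae_decoder": (11.0, 14.0)},
}


def estimate_latency(size: str, num_inference_steps: int) -> tuple[float, float]:
    """Return a (low, high) estimate in seconds for generating one image.

    Assumption: one text-encoder pass, one U-Net pass per step, one VAE decode.
    """
    stage = STAGE_LATENCY[size]
    fixed = stage["text_encoder"] + num_inference_steps * stage["unet_step"]
    vae_low, vae_high = stage["vae_decoder"]
    return fixed + vae_low, fixed + vae_high


if __name__ == "__main__":
    for size in STAGE_LATENCY:
        low, high = estimate_latency(size, num_inference_steps=4)
        print(f"{size}: roughly {low:.2f}s to {high:.2f}s for 4 steps")
```

Under these assumptions, the VAE decode accounts for roughly a third of the total time at 4 steps, which is why the note above singles it out.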

---

## LCM-SD Optimizations & Quirks (Specific to This Repo)

- Correct tensor layouts:
  - Text encoder: **NCHW**
  - U-Net: **NHWC**
  - VAE decoder: **NHWC**
- All RKNN runtime auto-conversion warnings eliminated
- One RKNN runtime context per worker (safe multi-context usage)
- Deterministic generation via explicit `numpy.random.RandomState(seed)`
- VAE decode slowness is a **known RKNN behavior** and unaffected by toolkit version
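
Two of the points above, seeded generation and explicit tensor layouts, can be illustrated with plain NumPy. This is a minimal sketch, not the code used in this repository: the function names are hypothetical, and the latent shape (4 channels at one eighth of the image resolution, `float32`) is the usual Stable Diffusion 1.5 convention, assumed here rather than taken from the scripts.

```python
import numpy as np


def make_latents(seed: int, height: int, width: int) -> np.ndarray:
    """Draw initial latents in NCHW layout from a dedicated, seeded generator.

    Using a private RandomState (instead of the global NumPy state) means the
    same seed always yields the same latents, regardless of other workers.
    """
    rng = np.random.RandomState(seed)
    # Assumed SD 1.5 convention: 4 latent channels, spatial size = image / 8.
    shape = (1, 4, height // 8, width // 8)
    return rng.standard_normal(shape).astype(np.float32)


def nchw_to_nhwc(x: np.ndarray) -> np.ndarray:
    """Reorder (N, C, H, W) to (N, H, W, C) and make the result contiguous."""
    return np.ascontiguousarray(x.transpose(0, 2, 3, 1))


if __name__ == "__main__":
    latents = make_latents(seed=1234, height=512, width=512)
    print(latents.shape)                # NCHW
    print(nchw_to_nhwc(latents).shape)  # NHWC, as expected by the U-Net here
```

Converting the layout up front, rather than leaving it to the runtime, is one plausible way to avoid the auto-conversion warnings mentioned in the list.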

---

## Command-Line Usage (LCM-SD Only)

```bash
python ./run_rknn-lcm.py -i ./model -o ./images --num-inference-steps 4 -s 512x512 --prompt "Majestic mountain landscape with snow-capped peaks, autumn foliage in vibrant reds and oranges, a turquoise river winding through a valley, crisp and serene atmosphere, ultra-realistic style."
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6319d0860d7478ae0069cd92/50jwBxv0Edf7x0WoHmpwi.png)

## LCM-SD HTTP Server

### Start the Server (Command Line)

```bash
export MODEL_ROOT=./model
export NUM_WORKERS=3
export PORT=4200

python lcm_server.py
```

The server listens on:

```text
http://0.0.0.0:4200
```

## Server Endpoints (LCM-SD Only)

### POST /generate

Generate a PNG image using LCM-SD.

Request body (JSON):

```json
{
  "prompt": "a cinematic forest at sunrise",
  "size": "512x512",
  "num_inference_steps": 4,
  "guidance_scale": 1.0,
  "seed": 1234
}
```

Response:

- HTTP 200
- `Content-Type: image/png`
- Binary PNG payload

### curl Example (LCM-SD Server Only)

```bash
curl -X POST http://node1.lan:4200/generate \
  -H "Content-Type: application/json" \
  -o output.png \
  -d '{
    "prompt": "a cinematic forest at sunrise",
    "size": "512x512",
    "num_inference_steps": 4,
    "guidance_scale": 1.0,
    "seed": 1234
  }'
```
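
The same request can be issued from Python using only the standard library. The sketch below mirrors the JSON body and the `image/png` response documented above; the helper names `build_generate_request` and `generate_png`, and the 300-second timeout, are illustrative choices, not part of the server.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, seed: int,
                           size: str = "512x512",
                           num_inference_steps: int = 4,
                           guidance_scale: float = 1.0) -> urllib.request.Request:
    """Build a POST /generate request with the JSON body documented above."""
    payload = {
        "prompt": prompt,
        "size": size,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "seed": seed,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_png(request: urllib.request.Request, out_path: str,
                 timeout: float = 300.0) -> None:
    """Send the request and write the PNG payload to out_path."""
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        body = response.read()
    if not content_type.startswith("image/png"):
        raise RuntimeError(f"unexpected Content-Type: {content_type!r}")
    with open(out_path, "wb") as f:
        f.write(body)


if __name__ == "__main__":
    req = build_generate_request("http://node1.lan:4200",
                                 "a cinematic forest at sunrise", seed=1234)
    generate_png(req, "output.png")
```

A generous timeout is used because, per the performance table, a single 512×512 image can take well over half a minute.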

## Docker Usage (LCM-SD Server)

### Build Image

```bash
docker build \
  -t rknn-lcm-sd .
```

### Run Container

```bash
docker run --rm -it \
  --device /dev/dri \
  --device /dev/rknpu \
  -v "$(pwd)/model":/models \
  -e MODEL_ROOT=/models \
  -e NUM_WORKERS=3 \
  -p 4200:4200 \
  rknn-lcm-sd
```

A `docker-compose.yml` is also provided.

## Model Conversion

### Install dependencies

```bash
pip install diffusers pillow "numpy<2" rknn-toolkit2
```

### 1. Download the model

Download a Stable Diffusion 1.5 LCM model in ONNX format and place it in the `./model` directory.

```bash
huggingface-cli download TheyCallMeHex/LCM-Dreamshaper-V7-ONNX
cp -r -L ~/.cache/huggingface/hub/models--TheyCallMeHex--LCM-Dreamshaper-V7-ONNX/snapshots/4029a217f9cdc0437f395738d3ab686bb910ceea ./model
```

In theory, LCM inference should also be possible by merging the LCM LoRA into a regular Stable Diffusion 1.5 model and converting the result to ONNX. However, I'm not sure how to do this. If you know how, please feel free to submit a PR.

### 2. Convert the model

```bash
# Convert the model at 384x384 resolution
python ./convert-onnx-to-rknn.py -m ./model -r 384x384
```

Note that higher resolutions produce a larger model and take longer to convert. Very high resolutions are not recommended.

## Known Issues

1. ~~As of now, models converted using the latest version of rknn-toolkit2 (version 2.2.0) still suffer from severe precision loss, even when using fp16 data type. As shown in the image, the top is the result of inference using the ONNX model, and the bottom is the result using the RKNN model. All parameters are the same. Moreover, the higher the resolution, the more severe the precision loss. This is a bug in rknn-toolkit2.~~ (Fixed in v2.3.0)

2. The model conversion script accepts multiple resolutions (e.g., "384x384,256x256"), but doing so causes the conversion to fail. This is a bug in rknn-toolkit2.

## References

- [TheyCallMeHex/LCM-Dreamshaper-V7-ONNX](https://huggingface.co/TheyCallMeHex/LCM-Dreamshaper-V7-ONNX)
- [Optimum's LatentConsistencyPipeline](https://github.com/huggingface/optimum/blob/main/optimum/pipelines/diffusers/pipeline_latent_consistency.py)
- [happyme531/RK3588-stable-diffusion-GPU](https://github.com/happyme531/RK3588-stable-diffusion-GPU)