File size: 2,475 Bytes
7b25ead
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e1e707f
 
7b25ead
 
e1e707f
 
7b25ead
 
 
 
 
 
 
 
 
 
 
 
 
 
 
e1e707f
 
7b25ead
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
---
license: openrail++
library_name: coreai
pipeline_tag: image-to-image
tags:
  - super-resolution
  - diffusion
  - core-ai
  - apple
  - on-device
  - adcsr
  - stable-diffusion
---

# AdcSR ×4 Super-Resolution — Core AI

On-device **×4 super-resolution** with **AdcSR** ([Adversarial Diffusion Compression](https://github.com/Guaishou74851/AdcSR),
CVPR 2025) converted for Apple's **Core AI** stack. AdcSR compresses the one-step diffusion model
[OSEDiff](https://github.com/cswry/OSEDiff) into a small **diffusion-GAN**: a pruned Stable
Diffusion 2.1 UNet + a half-size VAE decoder, run in **one forward pass** — no iterative denoising,
no prompt, no noise — so it is fast and small enough to run fully on-device, including iPhone.

## What it is

- **fp32, ~1.7 GB.** Output matches the torch reference (cosine 1.000012). fp32 because the
  pruned SD-2.1 UNet's attention/group-norm overflow in fp16 (NaN on smooth tiles).
- **Image → image, one step.** Input a low-resolution tile, get a 4× tile back. No text, no noise.
- **456 M parameters** (pruned SD-2.1 UNet + half VAE decoder).
- The graph outputs the **raw** SR; AdcSR's per-image color-match is applied host-side by
  `SuperResolver` after tiling (baking it per-tile blows up uniform tiles).

## I/O contract (per tile)

- **input:** `lr` `[1,3,128,128]` in `[-1,1]` (a low-resolution tile).
- **output:** `sr` `[1,3,512,512]` in `[-1,1]` (×4), with the reference's per-image color-match baked in.

## Usage (CoreAIKit)

```swift
import CoreAIKitVision

let sr = try await SuperResolver(model: .adcsrX4)   // downloads this repo on first use
let big = try await sr.upscale(cgImage)             // ×4; tiles any-size input + feather-blends
```

`SuperResolver` splits any-size input into overlapping 128-px LR windows, runs each, and blends
(and caps very large inputs so the result stays a reasonable size).

## License & attribution

- **AdcSR** (method + the pruning/training code): Apache-2.0 — Bingchen Li et al., *Adversarial
  Diffusion Compression for Real-World Image Super-Resolution*, CVPR 2025.
- **Weights** are derived from **Stable Diffusion 2.1** (via OSEDiff) and therefore carry the
  **CreativeML Open RAIL++-M** license — commercial use is permitted under its use-based
  restrictions, the same license under which Apple distributes Stable Diffusion for Core ML.

This Core AI conversion inherits both. See `LICENSE` (Apache-2.0, AdcSR) and the SD-2.1 OpenRAIL++-M
terms.