---
license: apache-2.0
datasets:
- eugenesiow/Div2k
language:
- en
tags:
- RyzenAI
- Super Resolution
- SISR
- SESR
- ONNX
---
# SESR-S on AMD AI PC NPU
[Bhardwaj et al. (2022)](https://arxiv.org/abs/2103.09404) introduced the Super-Efficient Super Resolution (SESR) model to solve a classic computer vision problem: producing a high-resolution image from a low-resolution input. The SESR model is based on a "linear overparameterization of CNNs and creates an efficient model architecture for [Single Image Super Resolution (SISR)]." The official code can be found in the accompanying GitHub repository: https://github.com/ARM-software/sesr. A central design goal was computational efficiency: the overparameterized training-time blocks collapse into short, cheap convolutions at inference time.
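To make the linear-overparameterization idea concrete, here is a minimal sketch (ours, not from the official repository; channel counts are illustrative) showing that a 3x3 expansion convolution followed by a 1x1 projection, with no nonlinearity in between, collapses exactly into a single 3x3 convolution for inference:
```python
import torch
import torch.nn.functional as F

# Training-time overparameterized block: 3x3 expand -> 1x1 project,
# with no activation in between, so the composition stays linear.
x  = torch.randn(1, 16, 32, 32)
w1 = torch.randn(64, 16, 3, 3) * 0.1   # 3x3 conv, 16 -> 64 channels
w2 = torch.randn(16, 64, 1, 1) * 0.1   # 1x1 conv, 64 -> 16 channels
y_train = F.conv2d(F.conv2d(x, w1, padding=1), w2)

# Inference-time collapsed weight: fold the 1x1 projection into the
# 3x3 kernel, giving one cheap conv with (numerically) identical output.
w_fold = torch.einsum('oc,cikl->oikl', w2.flatten(1), w1)
print(torch.allclose(F.conv2d(x, w_fold, padding=1), y_train, atol=1e-4))  # True
```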
This version of the model is SESR-S (Small); it has been converted from PyTorch to ONNX and then quantized to INT8 to run on an AMD AI PC NPU with Ryzen AI software. In its current form, the model natively accepts a 256x256 RGB image and outputs a 512x512 RGB image; however, alternate versions of the model could accept a 1920x1080 input and upscale to 3840x2160 (4K) or 7680x4320 (8K).
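The fixed input/output shapes can be confirmed directly from the ONNX file (a quick sketch; `SESR_int8.onnx` is the quantized checkpoint shipped in this repository):
```python
import onnxruntime

# Inspect the graph's declared shapes on CPU; no NPU is required for this check.
session = onnxruntime.InferenceSession("SESR_int8.onnx", providers=["CPUExecutionProvider"])
for t in session.get_inputs():
    print("input :", t.name, t.shape)   # expect a [1, 3, 256, 256]-style shape
for t in session.get_outputs():
    print("output:", t.name, t.shape)   # expect a [1, 3, 512, 512]-style shape
```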
| Model Details | Description |
| ----------- | ----------- |
| Person or organization developing model | [Tong Shen (AMD)](https://rocm.blogs.amd.com/authors/tong-shen.html), [Benjamin Consolvo (AMD)](https://huggingface.co/bconsolvo) |
| Model date | January 9, 2026 |
| Model version | 1 |
| Model type | Super-Resolution (Image-to-Image) |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | The \\(\times2\\) SESR was trained for "300 epochs using ADAM optimizer with a constant learning rate of \\(5 \times 10^{-4}\\) and a batch size of 32 on DIV2K training set." The \\(\times4\\) SESR model starts with the pretrained \\(\times2\\) SESR model and replaces "the final layer of \\(5 \times 5 \times f \times 4\\) with a \\(5 \times 5 \times f \times 16\\) and then perform[s] the depth-to-space operation twice" ([Bhardwaj et al., 2022](https://arxiv.org/abs/2103.09404)); the depth-to-space step is illustrated in the sketch below this table. For more training details, refer to the paper.|
| Paper or other resource for more information| [Bhardwaj, K., Milosavljevic, M., O'Neil, L., Gope, D., Matas, R., Chalfin, A., ... & Loh, D. (2022). Collapsible linear blocks for super-efficient super resolution. Proceedings of machine learning and systems, 4, 529-547](https://arxiv.org/abs/2103.09404) |
| License | [Apache 2.0](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) |
| Where to send questions or comments about the model | [Community Tab](https://huggingface.co/amd/sesr/discussions) and [AMD Developer Community Discord](https://discord.gg/amd-dev)|
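The quoted depth-to-space operation is the standard pixel-shuffle rearrangement. A minimal PyTorch illustration (shapes only, not the SESR model itself) of how applying an \\(\times2\\) depth-to-space twice yields an \\(\times4\\) upscale from the 16-channel final layer:
```python
import torch
import torch.nn.functional as F

# 16 channels from the final 5x5 x f x 16 layer (single-channel output case):
x = torch.randn(1, 16, 64, 64)
y = F.pixel_shuffle(x, upscale_factor=2)   # -> (1, 4, 128, 128)
z = F.pixel_shuffle(y, upscale_factor=2)   # -> (1, 1, 256, 256), i.e. x4 overall
print(y.shape, z.shape)
```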
## Intended Use
| Intended Use | Description |
| ----------- | ----------- |
| Primary intended uses | The model can be used to create high-resolution images from low-resolution images. The model has been converted to ONNX format and quantized for optimized performance on AMD AI PC NPUs. |
| Primary intended users | Anyone using or evaluating super-resolution models on AMD AI PCs. |
| Out-of-scope uses | This model is not intended for generating misinformation or disinformation, impersonating others, facilitating or inciting harassment or violence, or any use that could lead to human rights violations. |
### How to Use
#### Hardware Prerequisites
Before getting started, make sure you meet the minimum hardware and OS requirements:
| Series | Codename | Abbreviation | Launch Year | Windows 11 | Linux |
|--------|----------|--------------|----------------|-------------|---------|
| Ryzen AI Max PRO 300 Series | Strix Halo | STX | 2025 | ✔️ | |
| Ryzen AI PRO 300 Series | Strix Point / Krackan Point | STX/KRK | 2025 | ✔️ | |
| Ryzen AI Max 300 Series | Strix Halo | STX | 2025 | ✔️ | |
| Ryzen AI 300 Series | Strix Point | STX | 2025 | ✔️ | |
| Ryzen PRO 200 Series | Hawk Point | HPT | 2025 | ✔️ | |
| Ryzen 200 Series | Hawk Point | HPT | 2025 | ✔️ | |
| Ryzen PRO 8000 Series | Hawk Point | HPT | 2024 | ✔️ | |
| Ryzen 8000 Series | Hawk Point | HPT | 2024 | ✔️ | |
| Ryzen PRO 7000 Series | Phoenix | PHX | 2023 | ✔️ | |
| Ryzen 7000 Series | Phoenix | PHX | 2023 | ✔️ | |
#### Getting Started
1. Follow the instructions here to download necessary NPU drivers and Ryzen AI software: [Ryzen AI SW Installation Instructions](https://ryzenai.docs.amd.com/en/latest/inst.html). Please allow for around **30 minutes** to install all of the necessary components of Ryzen AI SW. The tested working version as of writing is Ryzen AI 1.7.0.
2. Activate the previously installed conda environment from Ryzen AI (RAI) SW, and set the RAI environment variable to your installation path:
```powershell
conda activate ryzen-ai-1.7.0
$Env:RYZEN_AI_INSTALLATION_PATH = 'C:/Program Files/RyzenAI/1.7.0/'
```
3. Clone the Hugging Face model repository:
```powershell
git clone https://huggingface.co/amd/sesr
```
4. Install the prerequisites:
```powershell
pip install -r requirements.txt
```
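5. (Optional) Confirm that the NPU execution provider is visible to ONNX Runtime (a quick sanity check, not an official installation step):
```python
import onnxruntime

# A Ryzen AI installation registers the NPU backend with ONNX Runtime;
# "VitisAIExecutionProvider" should appear in this list.
print(onnxruntime.get_available_providers())
```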
## Quantitative Analyses
| Regime | Model | Parameters | MACs | Set5 | Set14 | BSD100 | Urban100 | Manga109 | DIV2K |
|--------|-------|------------|------|------|-------|--------|----------|----------|-------|
| **Small** | Bicubic | - | - | 33.68/0.9307 | 30.24/0.8693 | 29.56/0.8439 | 26.88/0.8408 | 30.82/0.9349 | 32.45/0.9043 |
| | FSRCNN (our setup) | 12.46K | 6.00G | 36.85/0.9561 | 32.47/0.9076 | 31.37/0.8891 | 29.43/0.8963 | 35.81/0.9689 | 34.73/0.9349 |
| | FSRCNN (Dong et al., 2016) | 12.46K | 6.00G | 36.98/0.9556 | 32.62/0.9087 | 31.50/0.8904 | 29.85/0.9009 | 36.62/0.9710 | 34.74/0.9340 |
| | MOREMNAS-C (Chu et al., 2020) | 25K | 5.5G | 37.06/0.9561 | 32.75/0.9094 | 31.50/0.8904 | 29.92/0.9023 | -/- | -/- |
| | SESR-M3 (f=16, m=3) | <mark>8.91K</mark> | <mark>2.05G</mark> | 37.21/0.9577 | 32.70/0.9100 | 31.56/0.8920 | 29.92/0.9034 | 36.47/0.9717 | 35.03/0.9373 |
| | SESR-M5 (f=16, m=5) | 13.52K | 3.11G | 37.39/0.9585 | 32.84/0.9115 | 31.70/0.8938 | 30.33/0.9087 | 37.07/0.9734 | 35.24/0.9389 |
| | <mark>SESR-M7 (f=16, m=7)</mark> | 18.12K | 4.17G | <mark>37.47</mark>/<mark>0.9588</mark> | <mark>32.91</mark>/<mark>0.9118</mark> | <mark>31.77</mark>/<mark>0.8946</mark> | <mark>30.49</mark>/<mark>0.9105</mark> | <mark>37.14</mark>/<mark>0.9738</mark> | <mark>35.32</mark>/<mark>0.9395</mark> |
| | | | | | | | | | |
| **Medium** | TPSR-NoGAN (Lee et al., 2020) | 60K | 14.0G | 37.38/0.9583 | 33.00/0.9123 | 31.75/0.8942 | 30.61/0.9119 | -/- | -/- |
| | <mark>SESR-M11 (f=16, m=11)</mark> | <mark>27.34K</mark> | <mark>6.30G</mark> | <mark>37.58</mark>/<mark>0.9593</mark> | <mark>33.03</mark>/<mark>0.9128</mark> | <mark>31.85</mark>/<mark>0.8956</mark> | <mark>30.72</mark>/<mark>0.9136</mark> | <mark>37.40</mark>/<mark>0.9746</mark> | <mark>35.45</mark>/<mark>0.9404</mark> |
| | | | | | | | | | |
| **Large** | VDSR (Kim et al., 2016) | 665K | 612.6G | 37.53/0.9587 | 33.05/0.9127 | 31.90/0.8960 | 30.77/0.9141 | 37.16/0.9740 | 35.43/0.9410 |
| | LapSRN (Lai et al., 2017) | 813K | 29.9G | 37.52/0.9590 | 33.08/0.9130 | 31.80/0.8950 | 30.41/0.9100 | 37.53/0.9740 | 35.31/0.9400 |
| | BTSRN (Fan et al., 2017) | 410K | 207.7G | 37.75/- | 33.20/- | <mark>32.05</mark>/- | <mark>31.63</mark>/- | -/- | -/- |
| | CARN-M (Ahn et al., 2018) | 412K | 91.2G | 37.53/0.9583 | <mark>33.26</mark>/0.9141 | 31.92/0.8960 | 31.23/<mark>0.9193</mark> | -/- | -/- |
| | MOREMNAS-B (Chu et al., 2020) | 1118K | 256.9G | 37.58/0.9584 | 33.22/0.9135 | 31.91/0.8959 | 31.14/0.9175 | -/- | -/- |
| | <mark>SESR-XL (f=32, m=11)</mark> | <mark>105.37K</mark> | <mark>24.27G</mark> | <mark>37.77/0.9601</mark> | 33.24/<mark>0.9145</mark> | 31.99/<mark>0.8976</mark> | 31.16/0.9184 | <mark>38.01</mark>/<mark>0.9759</mark> | <mark>35.67</mark>/<mark>0.9420</mark> |
Table 1: "PSNR/SSIM results on \\(\times\\)2 Super Resolution on several benchmark datasets. MACs are reported as the number of multiply-adds
needed to convert an image to 720p (1280 \\(\times\\) 720) resolution via \\(\times\\)2 SISR." Highlights indicate best score within each regime. Table from [Bhardwaj et al. (2022)](https://arxiv.org/abs/2103.09404).
| Regime | Model | Parameters | MACs | Set5 | Set14 | BSD100 | Urban100 | Manga109 | DIV2K |
|--------|-------|------------|------|------|-------|--------|----------|----------|-------|
| **Small** | Bicubic | - | - | 28.43/0.8113 | 26.00/0.7025 | 25.96/0.6682 | 23.14/0.6577 | 24.90/0.7855 | 28.10/0.7745 |
| | FSRCNN (our setup) | <mark>12.46K</mark> | 4.63G | 30.45/0.8648 | 27.44/0.7528 | 26.89/0.7124 | 24.39/0.7212 | 27.40/0.8539 | 29.37/0.8117 |
| | FSRCNN (Dong et al., 2016) | <mark>12.46K</mark> | 4.63G | 30.70/0.8657 | 27.59/0.7535 | 26.96/0.7128 | 24.60/0.7258 | 27.89/0.8590 | 29.36/0.8110 |
| | SESR-M3 (f=16, m=3) | 13.71K | <mark>0.79G</mark> | 30.75/0.8714 | 27.62/0.7579 | 27.00/0.7166 | 24.61/0.7304 | 27.90/0.8644 | 29.52/0.8155 |
| | SESR-M5 (f=16, m=5) | 18.32K | 1.05G | 30.99/0.8764 | 27.81/0.7624 | 27.11/0.7199 | 24.80/0.7389 | 28.29/0.8734 | 29.65/0.8189 |
| | <mark>SESR-M7 (f=16, m=7)</mark> | 22.92K | 1.32G | <mark>31.14</mark>/<mark>0.8787</mark> | <mark>27.88</mark>/<mark>0.7641</mark> | <mark>27.13</mark>/<mark>0.7209</mark> | <mark>24.90</mark>/<mark>0.7436</mark> | <mark>28.53</mark>/<mark>0.8778</mark> | <mark>29.72</mark>/<mark>0.8204</mark> |
| | | | | | | | | | |
| **Medium** | TPSR-NoGAN (Lee et al., 2020) | 61K | 3.6G | 31.10/0.8779 | <mark>27.95</mark>/<mark>0.7663</mark> | 27.15/0.7214 | 24.97/0.7456 | -/- | -/- |
| | <mark>SESR-M11 (f=16, m=11)</mark> | <mark>32.14K</mark> | <mark>1.85G</mark> | <mark>31.27</mark>/<mark>0.8810</mark> | 27.94/0.7660 | <mark>27.20</mark>/<mark>0.7225</mark> | <mark>25.00</mark>/<mark>0.7466</mark> | <mark>28.73</mark>/<mark>0.8815</mark> | <mark>29.81</mark>/<mark>0.8221</mark> |
| | | | | | | | | | |
| **Large** | VDSR (Kim et al., 2016) | 665K | 612.6G | 31.35/0.8838 | 28.02/0.7678 | 27.29/0.7252 | 25.18/0.7525 | 28.82/0.8860 | 29.82/0.8240 |
| | LapSRN (Lai et al., 2017) | 813K | 149.4G | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7280 | 25.21/0.7560 | <mark>29.09</mark>/0.8900| 29.88/0.8250 |
| | BTSRN (Fan et al., 2017) | 410K | 165.2G | 31.85/- | 28.20/- | <mark>27.47</mark>/- | <mark>25.74</mark>/- | -/- | -/- |
| | <mark>CARN-M (Ahn et al., 2018)</mark> | 412K | 32.5G | <mark>31.92</mark>/<mark>0.8903</mark> | <mark>28.42</mark>/<mark>0.7762</mark> | 27.44/<mark>0.7304</mark> | 25.62/<mark>0.7694</mark> | -/- | -/- |
| | SESR-XL (f=32, m=11) | <mark>114.97K</mark> | <mark>6.62G</mark> | 31.54/0.8866 | 28.12/0.7712 | 27.31/0.7277 | 25.31/0.7604 | 29.04/<mark>0.8901</mark> | <mark>29.94</mark>/<mark>0.8266</mark> |
Table 2: "PSNR/SSIM results on \\(\times\\)4 Super Resolution on several benchmark datasets. MACs are reported as the number of multiply-adds
needed to convert an image to 720p (1280 \\(\times\\) 720) resolution via \\(\times\\)4 SISR." Highlights indicate best score within each regime. Table from [Bhardwaj et al. (2022)](https://arxiv.org/abs/2103.09404).
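For reference, PSNR/SSIM pairs like those in the tables above can be computed with scikit-image (a sketch under our assumptions; the paper's exact protocol, e.g. Y-channel-only measurement and border cropping, may differ):
```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Compare a super-resolved output against its ground-truth HR counterpart.
hr = cv2.imread("dataset/benchmark/Set5/HR/baby.png")
sr = cv2.imread("test_data/sr.png")  # model output at the same resolution as hr
print("PSNR:", peak_signal_noise_ratio(hr, sr, data_range=255))
print("SSIM:", structural_similarity(hr, sr, channel_axis=2, data_range=255))
```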
## Model description
SESR is based on linear overparameterization of CNNs and creates an efficient model architecture for SISR. It was introduced in the paper [Collapsible Linear Blocks for Super-Efficient Super Resolution](https://arxiv.org/abs/2103.09404).
The official code for this work is available at https://github.com/ARM-software/sesr.
We developed a modified version that is supported by [AMD Ryzen AI](https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html).
## Intended uses & limitations
You can use the raw model for super resolution. See the [model hub](https://huggingface.co/models?search=amd/sesr) for all available models.
## How to use
### Installation
Follow [Ryzen AI Installation](https://ryzenai.docs.amd.com/en/latest/inst.html) to prepare the environment for Ryzen AI.
Run the following command to install the prerequisites for this model:
```bash
pip install -r requirements.txt
```
### Data Preparation (optional: for accuracy evaluation)
1. Download the [benchmark dataset](https://cv.snu.ac.kr/research/EDSR/benchmark.tar).
2. Organize the dataset directory as follows:
```plain
dataset
└── benchmark
    ├── Set5
    │   ├── HR
    │   │   ├── baby.png
    │   │   └── ...
    │   └── LR_bicubic
    │       └── X2
    │           ├── babyx2.png
    │           └── ...
    ├── Set14
    │   └── ...
    └── ...
```
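3. (Optional) Verify the HR/LR pairing with a few lines of Python (our sketch, not part of the repository):
```python
from pathlib import Path

# Check that every Set5 HR image has a matching x2 bicubic LR image.
root = Path("dataset/benchmark/Set5")
for hr in sorted((root / "HR").glob("*.png")):
    lr = root / "LR_bicubic" / "X2" / f"{hr.stem}x2.png"
    print(f"{hr.name} -> {lr.name}: {'ok' if lr.exists() else 'MISSING'}")
```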
### Test & Evaluation
- Code snippet from [`one_image_inference.py`](one_image_inference.py) showing how to run the model:
```python
import argparse

import cv2
import numpy as np
import onnxruntime

parser = argparse.ArgumentParser(description='SESR single-image inference')
parser.add_argument('--onnx_path', type=str, default='SESR_int8.onnx',
                    help='onnx path')
parser.add_argument('--image_path', default='test_data/test.png',
                    help='path of the input image')
parser.add_argument('--output_path', default='test_data/sr.png',
                    help='path of the output image')
parser.add_argument('--ipu', action='store_true',
                    help='use ipu')
parser.add_argument('--provider_config', type=str, default=None,
                    help='provider config path')
args = parser.parse_args()

# Select the NPU (VitisAI) execution provider when --ipu is set;
# otherwise fall back to CUDA/CPU.
if args.ipu:
    providers = ["VitisAIExecutionProvider"]
    provider_options = [{"config_file": args.provider_config}]
else:
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    provider_options = None

onnx_file_name = args.onnx_path
image_path = args.image_path
output_path = args.output_path
ort_session = onnxruntime.InferenceSession(onnx_file_name, providers=providers, provider_options=provider_options)

# Read the image as NCHW float32, super-resolve it tile by tile
# (tiling_inference is defined in this repository), then clip to the
# valid pixel range and write the result back as HWC uint8.
lr = cv2.imread(image_path)[np.newaxis, :, :, :].transpose((0, 3, 1, 2)).astype(np.float32)
sr = tiling_inference(ort_session, lr, 8, (56, 56))
sr = np.clip(sr, 0, 255)
sr = sr.squeeze().transpose((1, 2, 0)).astype(np.uint8)
cv2.imwrite(output_path, sr)
```
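`tiling_inference` above is provided by this repository: because the quantized graph has a fixed input shape, the image is processed as overlapping tiles whose upscaled outputs are stitched back together. A simplified sketch of the idea (our assumptions; the real implementation's overlap blending and tile-size handling may differ):
```python
import numpy as np

def tiling_inference_sketch(session, lr, overlap=8, patch=(56, 56), scale=2):
    """Run a fixed-input-shape SR session over overlapping tiles of an
    NCHW float32 image and stitch the upscaled outputs back together."""
    _, c, h, w = lr.shape
    ph, pw = patch
    out = np.zeros((1, c, h * scale, w * scale), dtype=np.float32)
    name = session.get_inputs()[0].name
    for y in range(0, h, ph - overlap):
        for x in range(0, w, pw - overlap):
            y0, x0 = min(y, h - ph), min(x, w - pw)  # clamp the last tile to the border
            tile = lr[:, :, y0:y0 + ph, x0:x0 + pw]
            sr_tile = session.run(None, {name: tile})[0]
            # Later tiles simply overwrite the overlapping strip.
            out[:, :, y0 * scale:(y0 + ph) * scale,
                      x0 * scale:(x0 + pw) * scale] = sr_tile
    return out
```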
- Run inference for a single image:
```bash
python one_image_inference.py --onnx_path SESR_int8.onnx --image_path /Path/To/Your/Image --ipu --provider_config Path/To/vaip_config.json
```
Note: **vaip_config.json** is located in the Ryzen AI setup package (refer to [Installation](https://huggingface.co/amd/yolox-s#installation)).
- Test accuracy of the quantized model:
```bash
python test.py --onnx_path SESR_int8.onnx --data_test Set5 --ipu --provider_config Path/To/vaip_config.json
```
### Performance
| Method | Scale | FLOPs | Set5 (PSNR) |
|------------|-------|-------|--------------|
|SESR-S (float) |X2 |10.22G |37.21|
|SESR-S (INT8) |X2 |10.22G |36.81|
- Note: FLOPs are calculated for an input resolution of 256x256.
## Citation
```bibtex
@misc{bhardwaj2022collapsible,
title={Collapsible Linear Blocks for Super-Efficient Super Resolution},
author={Kartikeya Bhardwaj and Milos Milosavljevic and Liam O'Neil and Dibakar Gope and Ramon Matas and Alex Chalfin and Naveen Suda and Lingchuan Meng and Danny Loh},
year={2022},
eprint={2103.09404},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
```