---
license: apache-2.0
datasets:
  - eugenesiow/Div2k
language:
  - en
tags:
  - RyzenAI
  - Super Resolution
  - SISR
  - SESR
  - ONNX
---

# πŸš€ SESR-S on AMD AI PC NPU

Bhardwaj et al. (2022) introduced the Super-Efficient Super Resolution (SESR) model to solve a classic computer vision problem: taking a low-resolution input image and producing a high-resolution output. SESR is based on a "linear overparameterization of CNNs and creates an efficient model architecture for [Single Image Super Resolution (SISR)]." The official code is available at the accompanying GitHub repository: https://github.com/ARM-software/sesr. One of the main design goals of the model was computational efficiency.

This version of the model is SESR-S (Small). It has been converted from PyTorch to ONNX and quantized to INT8 to run on an AMD AI PC NPU with Ryzen AI software. The model natively accepts a 256Γ—256 RGB image and outputs a 512Γ—512 RGB image; alternate versions of the model could accept 1920Γ—1080 input and upscale to 3840Γ—2160 (4K) or 7680Γ—4320 (8K).
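The 256Γ—256 RGB input must be laid out as an NCHW float32 tensor before it reaches the ONNX session (mirroring the preprocessing in the inference script further below). A minimal sketch; the helper name is illustrative, and the ONNX graph's actual input name should be checked with `onnxruntime`'s `get_inputs()` before a real run:

```python
import numpy as np

# Sketch: convert an HxWx3 uint8 image (as cv2.imread returns) into the
# 1x3x256x256 float32 NCHW tensor layout this model consumes.
def to_nchw(image: np.ndarray) -> np.ndarray:
    assert image.shape == (256, 256, 3), "model natively expects 256x256 RGB"
    return image[np.newaxis, :, :, :].transpose((0, 3, 1, 2)).astype(np.float32)

dummy = np.zeros((256, 256, 3), dtype=np.uint8)
print(to_nchw(dummy).shape)  # (1, 3, 256, 256)
```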

| Model Details | Description |
|---|---|
| Person or organization developing model | Tong Shen (AMD), Benjamin Consolvo (AMD) |
| Model date | January 9, 2026 |
| Model version | 1 |
| Model type | Super-Resolution (Image-to-Image) |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | The Γ—2 SESR was trained for "300 epochs using ADAM optimizer with a constant learning rate of 5Γ—10⁻⁴ and a batch size of 32 on DIV2K training set." The Γ—4 SESR model starts with the pretrained Γ—2 SESR model and replaces "the final layer of 5Γ—5Γ—fΓ—4 with a 5Γ—5Γ—fΓ—16 and then perform[s] the depth-to-space operation twice" (Bhardwaj et al., 2022). For more training details, refer to the paper. |
| Paper or other resource for more information | Bhardwaj, K., Milosavljevic, M., O'Neil, L., Gope, D., Matas, R., Chalfin, A., ... & Loh, D. (2022). Collapsible linear blocks for super-efficient super resolution. *Proceedings of Machine Learning and Systems*, 4, 529-547. |
| License | Apache 2.0 |
| Where to send questions or comments about the model | Community Tab and AMD Developer Community Discord |
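The depth-to-space operation used for the Γ—4 variant (often called pixel shuffle) rearranges channels into spatial blocks; applying it twice turns 16 extra channels into a Γ—4 upscale. A minimal numpy sketch, with the function name and block size handling illustrative rather than taken from the SESR codebase:

```python
import numpy as np

# Sketch of depth-to-space (pixel shuffle): groups of block*block channels
# are rearranged into block x block spatial neighbourhoods. Applying it
# twice with block=2 yields a x4 spatial upscale, as in the x4 SESR model.
def depth_to_space(x: np.ndarray, block: int = 2) -> np.ndarray:
    n, c, h, w = x.shape
    assert c % (block * block) == 0
    x = x.reshape(n, c // (block * block), block, block, h, w)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # interleave blocks into spatial dims
    return x.reshape(n, c // (block * block), h * block, w * block)

x = np.zeros((1, 16, 2, 2), dtype=np.float32)
y = depth_to_space(depth_to_space(x))  # applied twice: 2x2 -> 8x8
print(y.shape)  # (1, 1, 8, 8)
```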

## ⚑ Intended Use

| Intended Use | Description |
|---|---|
| Primary intended uses | The model can be used to create high-resolution images from low-resolution images. The model has been converted to ONNX format and quantized for optimized performance on AMD AI PC NPUs. |
| Primary intended users | Anyone using or evaluating super-resolution models on AMD AI PCs. |
| Out-of-scope uses | This model is not intended for generating misinformation or disinformation, impersonating others, facilitating or inciting harassment or violence, or any use that could lead to the violation of human rights. |

## How to Use

πŸ“ Hardware Prerequisites

Before getting started, make sure you meet the minimum hardware and OS requirements:

| Series | Codename | Abbreviation | Launch Year | Windows 11 | Linux |
|---|---|---|---|---|---|
| Ryzen AI Max PRO 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI PRO 300 Series | Strix Point / Krackan Point | STX/KRK | 2025 | β˜‘οΈ | |
| Ryzen AI Max 300 Series | Strix Halo | STX | 2025 | β˜‘οΈ | |
| Ryzen AI 300 Series | Strix Point | STX | 2025 | β˜‘οΈ | |
| Ryzen Pro 200 Series | Hawk Point | HPT | 2025 | β˜‘οΈ | |
| Ryzen 200 Series | Hawk Point | HPT | 2025 | β˜‘οΈ | |
| Ryzen PRO 8000 Series | Hawk Point | HPT | 2024 | β˜‘οΈ | |
| Ryzen 8000 Series | Hawk Point | HPT | 2024 | β˜‘οΈ | |
| Ryzen Pro 7000 Series | Phoenix | PHX | 2023 | β˜‘οΈ | |
| Ryzen 7000 Series | Phoenix | PHX | 2023 | β˜‘οΈ | |

### Getting Started

1. Follow the Ryzen AI SW Installation Instructions to download the necessary NPU drivers and Ryzen AI software. Please allow around 30 minutes to install all of the necessary components. The tested working version as of writing is Ryzen AI 1.7.0.

2. Activate the conda environment previously installed by Ryzen AI (RAI) SW, and set the RAI environment variable to your installation path:

   ```shell
   conda activate ryzen-ai-1.7.0
   $Env:RYZEN_AI_INSTALLATION_PATH = 'C:/Program Files/RyzenAI/1.7.0/'
   ```

3. Clone the Hugging Face model repository:

   ```shell
   git clone https://huggingface.co/amd/sesr
   ```

4. Install the prerequisites:

   ```shell
   pip install -r requirements.txt
   ```

## Quantitative Analyses

| Regime | Model | Parameters | MACs | Set5 | Set14 | BSD100 | Urban100 | Manga109 | DIV2K |
|---|---|---|---|---|---|---|---|---|---|
| Small | Bicubic | - | - | 33.68/0.9307 | 30.24/0.8693 | 29.56/0.8439 | 26.88/0.8408 | 30.82/0.9349 | 32.45/0.9043 |
| | FSRCNN (our setup) | 12.46K | 6.00G | 36.85/0.9561 | 32.47/0.9076 | 31.37/0.8891 | 29.43/0.8963 | 35.81/0.9689 | 34.73/0.9349 |
| | FSRCNN (Dong et al., 2016) | 12.46K | 6.00G | 36.98/0.9556 | 32.62/0.9087 | 31.50/0.8904 | 29.85/0.9009 | 36.62/0.9710 | 34.74/0.9340 |
| | MOREMNAS-C (Chu et al., 2020) | 25K | 5.5G | 37.06/0.9561 | 32.75/0.9094 | 31.50/0.8904 | 29.92/0.9023 | -/- | -/- |
| | SESR-M3 (f=16, m=3) | 8.91K | 2.05G | 37.21/0.9577 | 32.70/0.9100 | 31.56/0.8920 | 29.92/0.9034 | 36.47/0.9717 | 35.03/0.9373 |
| | SESR-M5 (f=16, m=5) | 13.52K | 3.11G | 37.39/0.9585 | 32.84/0.9115 | 31.70/0.8938 | 30.33/0.9087 | 37.07/0.9734 | 35.24/0.9389 |
| | SESR-M7 (f=16, m=7) | 18.12K | 4.17G | 37.47/0.9588 | 32.91/0.9118 | 31.77/0.8946 | 30.49/0.9105 | 37.14/0.9738 | 35.32/0.9395 |
| Medium | TPSR-NoGAN (Lee et al., 2020) | 60K | 14.0G | 37.38/0.9583 | 33.00/0.9123 | 31.75/0.8942 | 30.61/0.9119 | -/- | -/- |
| | SESR-M11 (f=16, m=11) | 27.34K | 6.30G | 37.58/0.9593 | 33.03/0.9128 | 31.85/0.8956 | 30.72/0.9136 | 37.40/0.9746 | 35.45/0.9404 |
| Large | VDSR (Kim et al., 2016) | 665K | 612.6G | 37.53/0.9587 | 33.05/0.9127 | 31.90/0.8960 | 30.77/0.9141 | 37.16/0.9740 | 35.43/0.9410 |
| | LapSRN (Lai et al., 2017) | 813K | 29.9G | 37.52/0.9590 | 33.08/0.9130 | 31.80/0.8950 | 30.41/0.9100 | 37.53/0.9740 | 35.31/0.9400 |
| | BTSRN (Fan et al., 2017) | 410K | 207.7G | 37.75/- | 33.20/- | 32.05/- | 31.63/- | -/- | -/- |
| | CARN-M (Ahn et al., 2018) | 412K | 91.2G | 37.53/0.9583 | 33.26/0.9141 | 31.92/0.8960 | 31.23/0.9193 | -/- | -/- |
| | MOREMNAS-B (Chu et al., 2020) | 1118K | 256.9G | 37.58/0.9584 | 33.22/0.9135 | 31.91/0.8959 | 31.14/0.9175 | -/- | -/- |
| | SESR-XL (f=32, m=11) | 105.37K | 24.27G | 37.77/0.9601 | 33.24/0.9145 | 31.99/0.8976 | 31.16/0.9184 | 38.01/0.9759 | 35.67/0.9420 |

Table 1: "PSNR/SSIM results on Γ—2 Super Resolution on several benchmark datasets. MACs are reported as the number of multiply-adds needed to convert an image to 720p (1280Γ—720) resolution via Γ—2 SISR." Highlights indicate the best score within each regime. Table from Bhardwaj et al. (2022).
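The PSNR half of each PSNR/SSIM pair above can be computed with a few lines of numpy. A minimal sketch for 8-bit images; note that published SISR benchmarks typically evaluate PSNR on the Y channel with border cropping, which this simplified version omits:

```python
import numpy as np

# Sketch: peak signal-to-noise ratio for 8-bit images,
# PSNR = 10 * log10(peak^2 / MSE), with peak = 255.
def psnr(ref: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((8, 8), dtype=np.uint8)
b = np.full((8, 8), 10, dtype=np.uint8)  # MSE = 100
print(round(psnr(a, b), 2))  # 28.13
```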

| Regime | Model | Parameters | MACs | Set5 | Set14 | BSD100 | Urban100 | Manga109 | DIV2K |
|---|---|---|---|---|---|---|---|---|---|
| Small | Bicubic | - | - | 28.43/0.8113 | 26.00/0.7025 | 25.96/0.6682 | 23.14/0.6577 | 24.90/0.7855 | 28.10/0.7745 |
| | FSRCNN (our setup) | 12.46K | 4.63G | 30.45/0.8648 | 27.44/0.7528 | 26.89/0.7124 | 24.39/0.7212 | 27.40/0.8539 | 29.37/0.8117 |
| | FSRCNN (Dong et al., 2016) | 12.46K | 4.63G | 30.70/0.8657 | 27.59/0.7535 | 26.96/0.7128 | 24.60/0.7258 | 27.89/0.8590 | 29.36/0.8110 |
| | SESR-M3 (f=16, m=3) | 13.71K | 0.79G | 30.75/0.8714 | 27.62/0.7579 | 27.00/0.7166 | 24.61/0.7304 | 27.90/0.8644 | 29.52/0.8155 |
| | SESR-M5 (f=16, m=5) | 18.32K | 1.05G | 30.99/0.8764 | 27.81/0.7624 | 27.11/0.7199 | 24.80/0.7389 | 28.29/0.8734 | 29.65/0.8189 |
| | SESR-M7 (f=16, m=7) | 22.92K | 1.32G | 31.14/0.8787 | 27.88/0.7641 | 27.13/0.7209 | 24.90/0.7436 | 28.53/0.8778 | 29.72/0.8204 |
| Medium | TPSR-NoGAN (Lee et al., 2020) | 61K | 3.6G | 31.10/0.8779 | 27.95/0.7663 | 27.15/0.7214 | 24.97/0.7456 | -/- | -/- |
| | SESR-M11 (f=16, m=11) | 32.14K | 1.85G | 31.27/0.8810 | 27.94/0.7660 | 27.20/0.7225 | 25.00/0.7466 | 28.73/0.8815 | 29.81/0.8221 |
| Large | VDSR (Kim et al., 2016) | 665K | 612.6G | 31.35/0.8838 | 28.02/0.7678 | 27.29/0.7252 | 25.18/0.7525 | 28.82/0.8860 | 29.82/0.8240 |
| | LapSRN (Lai et al., 2017) | 813K | 149.4G | 31.54/0.8850 | 28.19/0.7720 | 27.32/0.7280 | 25.21/0.7560 | 29.09/0.8900 | 29.88/0.8250 |
| | BTSRN (Fan et al., 2017) | 410K | 165.2G | 31.85/- | 28.20/- | 27.47/- | 25.74/- | -/- | -/- |
| | CARN-M (Ahn et al., 2018) | 412K | 32.5G | 31.92/0.8903 | 28.42/0.7762 | 27.44/0.7304 | 25.62/0.7694 | -/- | -/- |
| | SESR-XL (f=32, m=11) | 114.97K | 6.62G | 31.54/0.8866 | 28.12/0.7712 | 27.31/0.7277 | 25.31/0.7604 | 29.04/0.8901 | 29.94/0.8266 |

Table 2: "PSNR/SSIM results on Γ—4 Super Resolution on several benchmark datasets. MACs are reported as the number of multiply-adds needed to convert an image to 720p (1280Γ—720) resolution via Γ—4 SISR." Highlights indicate the best score within each regime. Table from Bhardwaj et al. (2022).

## Model description

SESR is based on linear overparameterization of CNNs and creates an efficient model architecture for SISR. It was introduced in the paper Collapsible Linear Blocks for Super-Efficient Super Resolution. The official code for this work is available at https://github.com/ARM-software/sesr.

We developed a modified version that is supported by AMD Ryzen AI.

## Intended uses & limitations

You can use the raw model for super resolution. See the model hub for all available models.

## How to use

### Installation

Follow the Ryzen AI Installation guide to prepare the environment for Ryzen AI. Then run the following command to install the prerequisites for this model:

```shell
pip install -r requirements.txt
```

### Data Preparation (optional: for accuracy evaluation)

1. Download the [benchmark dataset](https://cv.snu.ac.kr/research/EDSR/benchmark.tar).
2. Organize the dataset directory as follows:

```
└── dataset
     └── benchmark
          β”œβ”€β”€ Set5
          β”‚    β”œβ”€β”€ HR
          β”‚    β”‚    β”œβ”€β”€ baby.png
          β”‚    β”‚    └── ...
          β”‚    └── LR_bicubic
          β”‚         └── X2
          β”‚              β”œβ”€β”€ babyx2.png
          β”‚              └── ...
          β”œβ”€β”€ Set14
          └── ...
```
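The layout above pairs each HR image with its bicubic LR counterpart through a fixed naming scheme. A small sketch of that mapping (the helper name is hypothetical; the path convention follows the tree shown above):

```python
from pathlib import Path

# Sketch: map an HR image in the benchmark layout to its bicubic LR
# counterpart, e.g. Set5/HR/baby.png -> Set5/LR_bicubic/X2/babyx2.png.
def lr_path(hr_path: Path, scale: int = 2) -> Path:
    dataset_dir = hr_path.parent.parent          # .../Set5
    stem, suffix = hr_path.stem, hr_path.suffix  # 'baby', '.png'
    return dataset_dir / "LR_bicubic" / f"X{scale}" / f"{stem}x{scale}{suffix}"

print(lr_path(Path("dataset/benchmark/Set5/HR/baby.png")))
```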

### Test & Evaluation

```python
import argparse

import cv2
import numpy as np
import onnxruntime

# tiling_inference is provided by the scripts in this repository

parser = argparse.ArgumentParser(description='SESR inference')
parser.add_argument('--onnx_path', type=str, default='SESR_int8.onnx',
                    help='path of the ONNX model')
parser.add_argument('--image_path', default='test_data/test.png',
                    help='path of the input image')
parser.add_argument('--output_path', default='test_data/sr.png',
                    help='path of the output image')
parser.add_argument('--ipu', action='store_true',
                    help='use the NPU (IPU)')
parser.add_argument('--provider_config', type=str, default=None,
                    help='provider config path')
args = parser.parse_args()

if args.ipu:
    providers = ["VitisAIExecutionProvider"]
    provider_options = [{"config_file": args.provider_config}]
else:
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    provider_options = None

onnx_file_name = args.onnx_path
image_path = args.image_path
output_path = args.output_path

ort_session = onnxruntime.InferenceSession(onnx_file_name, providers=providers,
                                           provider_options=provider_options)
# HWC uint8 -> NCHW float32
lr = cv2.imread(image_path)[np.newaxis, :, :, :].transpose((0, 3, 1, 2)).astype(np.float32)
sr = tiling_inference(ort_session, lr, 8, (56, 56))
sr = np.clip(sr, 0, 255)
sr = sr.squeeze().transpose((1, 2, 0)).astype(np.uint8)
cv2.imwrite(output_path, sr)
```
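The script delegates the actual upscaling to a `tiling_inference` helper from this repository: the input is split into tiles, each tile is upscaled independently, and the results are stitched back together. A simplified pure-numpy sketch of the idea, using a nearest-neighbour Γ—2 upsampler as a stand-in for the ONNX session and omitting the overlap handling the real helper performs:

```python
import numpy as np

# Stand-in for the ONNX session: nearest-neighbour x2 upsampling of an
# NCHW tensor. The real pipeline would run the SESR model on each tile.
def upscale_x2(tile: np.ndarray) -> np.ndarray:
    return tile.repeat(2, axis=2).repeat(2, axis=3)

# Simplified tiled x2 inference without overlap: split, upscale, stitch.
def tiled_upscale(lr: np.ndarray, tile: int = 56) -> np.ndarray:
    n, c, h, w = lr.shape
    out = np.zeros((n, c, h * 2, w * 2), dtype=lr.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = lr[:, :, y:y + tile, x:x + tile]
            out[:, :, 2 * y:2 * y + 2 * patch.shape[2],
                2 * x:2 * x + 2 * patch.shape[3]] = upscale_x2(patch)
    return out

lr = np.random.rand(1, 3, 112, 112).astype(np.float32)
print(tiled_upscale(lr).shape)  # (1, 3, 224, 224)
```

Since nearest-neighbour upsampling is purely local, the tiled result here matches a whole-image pass exactly; with a learned model, overlapping tiles are needed to avoid seams at tile borders.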
  • Run inference for a single image
python one_image_inference.py --onnx_path SESR_int8.onnx --image_path /Path/To/Your/Image --ipu --provider_config Path/To/vaip_config.json

Note: vaip_config.json is located at the setup package of Ryzen AI (refer to Installation)

  • Test accuracy of the quantized model
python test.py --onnx_path SESR_int8.onnx --data_test Set5 --ipu --provider_config Path/To/vaip_config.json 

## Performance

| Method | Scale | FLOPs | Set5 (PSNR) |
|---|---|---|---|
| SESR-S (float) | Γ—2 | 10.22G | 37.21 |
| SESR-S (INT8) | Γ—2 | 10.22G | 36.81 |

Note: FLOPs are calculated with an input resolution of 256Γ—256.
```bibtex
@misc{bhardwaj2022collapsible,
      title={Collapsible Linear Blocks for Super-Efficient Super Resolution},
      author={Kartikeya Bhardwaj and Milos Milosavljevic and Liam O'Neil and Dibakar Gope and Ramon Matas and Alex Chalfin and Naveen Suda and Lingchuan Meng and Danny Loh},
      year={2022},
      eprint={2103.09404},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}
```