DRUNet (color) — ncnn port

ncnn-compatible weights for DRUNet, the plug-and-play color denoiser from cszn/DPIR (Zhang et al., Plug-and-Play Image Restoration with Deep Denoiser Prior, IEEE TPAMI 2021). One model handles every noise level σ ∈ [0, 50] because the noise level is passed in as a 4th input channel — no per-σ retraining needed.
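To make the "noise level as a 4th channel" idea concrete, here is a minimal sketch of how such an input could be assembled. `build_input` is a hypothetical helper, not part of DPIR or ncnn. It uses plain nested lists so it runs without dependencies, and it assumes 8-bit RGB pixels and σ expressed on the 0..255 scale.

```python
def build_input(rgb_u8, sigma):
    """Build a planar 4-channel input from an 8-bit RGB image.

    rgb_u8: H x W nested list of (r, g, b) tuples with values 0..255.
    sigma:  noise level on the 0..255 scale (the model covers 0..50).
    Returns [R, G, B, N], where each plane is an H x W list of floats.
    """
    h = len(rgb_u8)
    w = len(rgb_u8[0])
    # Colour planes, normalized to [0, 1].
    planes = [
        [[rgb_u8[y][x][c] / 255.0 for x in range(w)] for y in range(h)]
        for c in range(3)
    ]
    # Noise-level plane: the same value sigma/255 at every pixel.
    noise_plane = [[sigma / 255.0] * w for _ in range(h)]
    return planes + [noise_plane]
```

Changing `sigma` changes only the last plane, which is why one set of weights can serve the whole noise range.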

Converted from the official PyTorch checkpoint published at deepinv/drunet. To my knowledge no other ncnn port of DRUNet existed on the Hub — uploading so the next person doesn't have to spend the hour I did re-running pnnx.

Files

File	Size	Purpose
`drunet_color.ncnn.param`	~11 KB	Network topology (text format, 125 layers)
`drunet_color.ncnn.bin`	~65 MB	fp16-quantized weights

Storing the weights as fp16 halves the on-disk footprint relative to the original 130 MB fp32 .pth. On a real image at σ ≤ 50, the output differences vs fp32 are too small to be visually perceptible.
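As a rough illustration of the storage trade-off, the sketch below uses Python's `struct` module to round-trip a few values through IEEE half precision. It shows only the per-value storage size and rounding error. It is not a measurement of this model's output quality, and the sample values are arbitrary.

```python
import struct

def fp16_roundtrip(x):
    """Store x as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Bytes per stored value: half precision vs single precision.
bytes_fp16 = struct.calcsize("<e")
bytes_fp32 = struct.calcsize("<f")

# Relative rounding error for a few arbitrary weight-like magnitudes.
samples = [0.1, -0.0123, 0.75, 1.5]
rel_errors = [abs(fp16_roundtrip(v) - v) / abs(v) for v in samples]
```

For values in the normal fp16 range the relative rounding error is bounded by 2^-11 (about 0.05%), which is the per-weight perturbation introduced by storing in half precision.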

Usage (ncnn C++)

#include "net.h"

ncnn::Net net;
net.opt.use_vulkan_compute = true;   // ~5× faster than CPU on a real GPU
net.load_param("drunet_color.ncnn.param");
net.load_model("drunet_color.ncnn.bin");

// Input layout: 4-channel float, (1, 4, H, W)
//   ch0..2 = RGB normalized to [0, 1]
//   ch3    = σ/255 broadcast as a constant plane
// H and W must be multiples of 8 (3 stride-2 downsampling stages in the UNet, 2^3 = 8).
// Pad with cv::BORDER_REPLICATE and crop the result back.

ncnn::Mat in(W, H, 4);
// ...fill RGB and σ plane...

ncnn::Extractor ex = net.create_extractor();
ex.input("in0", in);
ncnn::Mat out;
ex.extract("out0", out);  // 3-channel float RGB in [0, 1]

A full C++ wrapper with tile-aware inference, replicate-padding, and Vulkan auto-detection lives in mlc-ncnn-img2img/src/denoise.cpp.
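The multiple-of-8 constraint and the pad-then-crop step from the usage comments can be sketched as follows. This is an illustration only: the helper names are hypothetical, padding is applied on the bottom and right edges as an assumption (the wrapper may distribute it differently), and planes are plain nested lists rather than `ncnn::Mat` channels.

```python
def pad_amounts(h, w, multiple=8):
    """Extra rows/columns needed to reach the next multiple of `multiple`."""
    return (-h) % multiple, (-w) % multiple

def replicate_pad(plane, pad_h, pad_w):
    """Extend a 2-D plane on the right and bottom by repeating edge values."""
    rows = [row + [row[-1]] * pad_w for row in plane]
    rows += [list(rows[-1]) for _ in range(pad_h)]
    return rows

def crop(plane, h, w):
    """Cut a padded plane back to its original h x w size."""
    return [row[:w] for row in plane[:h]]
```

The same `pad_amounts` result would be applied to all four input planes before inference, and `crop` to each of the three output planes afterwards.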

How this was produced

# python3, in a venv with torch + pnnx + opencv-python
import sys, torch
sys.path.insert(0, "DPIR")              # clone of github.com/cszn/DPIR
from models.network_unet import UNetRes

model = UNetRes(in_nc=4, out_nc=3, nc=[64,128,256,512], nb=4,
                act_mode="R", downsample_mode="strideconv",
                upsample_mode="convtranspose")
model.load_state_dict(torch.load("drunet_color.pth", weights_only=True))
model.eval()

x = torch.randn(1, 4, 256, 256)
torch.jit.trace(model, x, check_trace=False).save("drunet_color.pt")

import pnnx
pnnx.convert("drunet_color.pt", inputs=x, fp16=True)
# → drunet_color.ncnn.param + drunet_color.ncnn.bin

The complete driver script is convert_drunet.py in the sibling repo.

Performance

Eagle 1024×1024, σ=20 (single tile, no batching, fp16 weights):

Backend	Wall time	Notes
Vulkan, Apple M2 Ultra (MoltenVK)	1.3 s	warm; first run ~44 s (shader JIT)
Vulkan, NVIDIA RTX 3060 (Windows)	3.66 s	warm avg of 3; cold 3.49 s (5.7× over same-box CPU)
CPU, Apple M2 Ultra (4 threads)	3.1 s	native arm64, AppleClang + libomp
CPU, AMD Ryzen 7 2700X (4 threads, AVX2)	21 s	RTX 3060 box, MLC_NCNN_CPU=1 forced
CPU, Intel Xeon (4 threads, AVX2)	23 s	Linux box without hardware Vulkan
Vulkan, Mesa llvmpipe (software)	127 s	5× slower than CPU — filter this out

Notable: M2 Ultra Vulkan (1.3 s) beats RTX 3060 Vulkan (3.66 s) by ~2.8×. Likely a combination of M2 Ultra's unified memory (no per-tile PCIe round-trip) and its high FP16 throughput; the default tile size of 256 px favours architectures with cheap small-batch dispatches. A power-user knob to bump tile size on discrete GPUs with plenty of VRAM is a worthwhile follow-up.
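To get a feel for how tile size affects the number of dispatches, here is a minimal sketch of a tile grid along one axis. The wrapper's actual tiling scheme is not shown here, so `tile_origins`, its `overlap` parameter, and the edge-flush last tile are all assumptions made for illustration.

```python
def tile_origins(length, tile=256, overlap=0):
    """Start offsets of tiles covering [0, length) along one axis.

    Tiles advance by (tile - overlap); the last tile is placed flush
    with the far edge so no tile reaches outside the image.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    origins = list(range(0, length - tile, stride))
    origins.append(length - tile)
    return origins

def tile_count(height, width, tile=256, overlap=0):
    """Total number of tiles for a height x width image."""
    return len(tile_origins(height, tile, overlap)) * len(tile_origins(width, tile, overlap))
```

Under these assumptions a 1024×1024 image tiled at 256 px without overlap needs 16 tiles, while 512 px tiles need 4, which is the kind of reduction in per-tile transfers a larger tile size would buy on a discrete GPU.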

The M2 Ultra Vulkan path is ~17× faster than the Xeon CPU baseline (1.3 s vs 23 s). First-call latency on MoltenVK is dominated by Metal-shader JIT compilation (~40 s); subsequent invocations from the same process run at the warm number. For one-shot CLI invocations on Apple Silicon the difference matters; caching the compiled shaders across runs would close the gap (TODO).

If ncnn::get_gpu_info() only reports llvmpipe (Mesa software Vulkan on headless Linux), prefer CPU — software Vulkan is a slowdown for this model size, not a speedup. The companion C++ wrapper auto-detects this and falls back to CPU.
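The fallback rule described above can be sketched as a small filter over device names. This is not the wrapper's code: `pick_vulkan_device` is a hypothetical helper, the names are assumed to come from whatever the Vulkan device enumeration reports, and the markers other than `llvmpipe` are extra assumptions about how software implementations identify themselves.

```python
def pick_vulkan_device(device_names):
    """Return the index of the first hardware Vulkan device, or None.

    None means "no usable GPU, run on CPU". Matching is done on
    substrings of the reported device name, case-insensitively.
    """
    # "llvmpipe" is the case named in the text; the others are assumed.
    software_markers = ("llvmpipe", "lavapipe", "swiftshader")
    for index, name in enumerate(device_names):
        lowered = name.lower()
        if not any(marker in lowered for marker in software_markers):
            return index
    return None
```

A caller would enable Vulkan only when the result is not `None`, and otherwise leave `use_vulkan_compute` off so inference stays on the CPU path.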

License & citation

MIT, inherited from the original cszn/DPIR repository.

@article{zhang2021plug,
  title={Plug-and-Play Image Restoration with Deep Denoiser Prior},
  author={Zhang, Kai and Li, Yawei and Zuo, Wangmeng and Zhang, Lei and
          Van Gool, Luc and Timofte, Radu},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021}
}

Companion

This model ships as the denoise Tool Plugin in mlc OpticScript — JS scripts can call Engine.tool('denoise').apply(img, {strength: 20}) and the runtime spawns the bundled C++ binary that loads these weights.
