UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the UVDoc document unwarping model, modified to output a coordinate grid instead of an image. This enables high-resolution document unwarping via cv2.remap().

Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed documents. Unlike the PaddlePaddle ONNX variant, which outputs a fixed 288x288 image, this version outputs a coordinate mapping grid that can be applied to images of any resolution.

Key Difference: Grid Output vs Image Output

Approach	Output	Result resolution
Image-output models	288x288 RGB image	Fixed at 288x288 (must be upscaled)
This grid-output model	45x31 coordinate grid	Native resolution of the input image
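
To illustrate why a 45x31 grid is enough for any output resolution, here is a minimal NumPy sketch of bilinear grid upsampling. The helper name `upsample_grid` is hypothetical, and the sketch pins grid corners to image corners (an "align corners" convention). That convention is an assumption and can differ slightly at the borders from what `cv2.resize` produces.

```python
import numpy as np


def upsample_grid(grid: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resample a (2, gh, gw) grid to (2, out_h, out_w).

    Assumes gh >= 2 and gw >= 2, and maps grid corners onto output corners.
    """
    _, gh, gw = grid.shape
    # Fractional source position for every output row and column.
    ys = np.linspace(0, gh - 1, out_h)
    xs = np.linspace(0, gw - 1, out_w)
    # Index of the upper/left neighbour, clipped so that +1 stays in range.
    y0 = np.clip(np.floor(ys).astype(int), 0, gh - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, gw - 2)
    wy = (ys - y0)[None, :, None]  # vertical blend weights
    wx = (xs - x0)[None, None, :]  # horizontal blend weights

    rows_top = grid[:, y0]         # (2, out_h, gw)
    rows_bottom = grid[:, y0 + 1]  # (2, out_h, gw)
    top = rows_top[:, :, x0] * (1 - wx) + rows_top[:, :, x0 + 1] * wx
    bottom = rows_bottom[:, :, x0] * (1 - wx) + rows_bottom[:, :, x0 + 1] * wx
    return top * (1 - wy) + bottom * wy


# An identity grid at the model's 45x31 resolution (channel 0 = x, channel 1 = y).
gy, gx = np.meshgrid(np.linspace(-1, 1, 45), np.linspace(-1, 1, 31), indexing="ij")
coarse = np.stack([gx, gy]).astype(np.float32)

# The same grid resampled for a 1440x992 (height x width) image.
fine = upsample_grid(coarse, 1440, 992)
```

Because the grid stores coordinates rather than pixels, resampling it loses no image detail: the pixels are still read from the original high-resolution image.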

Model Details

Architecture: UVDoc (ResNet-based encoder-decoder)
Input: (1, 3, 720, 496) - RGB image, 720 high by 496 wide, values scaled to [0, 1]
Output: (1, 2, 45, 31) - Coordinate grid, 45 rows by 31 columns, values in the [-1, 1] range
ONNX Opset: 16
Size: ~30 MB

Input Specifications

Property	Value
Shape	`(batch, 3, 720, 496)`
Format	RGB (not BGR)
Range	`[0, 1]` (normalized)
Layout	NCHW (batch, channels, height, width)
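
As a sketch of what these properties mean in practice, the following NumPy helper turns an already-resized OpenCV-style image into a conforming input tensor. The function name `to_model_input` is hypothetical, and it assumes the image has already been resized to 720x496 (height x width).

```python
import numpy as np


def to_model_input(bgr_image: np.ndarray) -> np.ndarray:
    """Convert a (720, 496, 3) BGR uint8 image to a (1, 3, 720, 496) float32 blob."""
    if bgr_image.shape != (720, 496, 3):
        raise ValueError(f"expected shape (720, 496, 3), got {bgr_image.shape}")
    rgb = bgr_image[..., ::-1]                  # BGR -> RGB
    blob = rgb.astype(np.float32) / 255.0       # scale to [0, 1]
    blob = np.transpose(blob, (2, 0, 1))[None]  # HWC -> NCHW with a batch of 1
    return np.ascontiguousarray(blob)
```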

Output Specifications

Property	Value
Shape	`(batch, 2, 45, 31)`
Channels	2 (channel 0 = x, channel 1 = y)
Range	`[-1, 1]` (normalized coordinates)
Layout	NCHW (batch, channels, height, width)
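
The normalized range maps linearly onto pixel coordinates: -1 corresponds to the first pixel and 1 to the last. A minimal sketch of that conversion follows, using the same formula as the usage example. The helper name `grid_to_pixel_maps` is hypothetical.

```python
import numpy as np


def grid_to_pixel_maps(grid: np.ndarray, width: int, height: int):
    """Convert a (2, H, W) grid in [-1, 1] to float32 pixel maps (map_x, map_y)."""
    map_x = (grid[0] + 1.0) / 2.0 * (width - 1)   # -1 -> 0, 1 -> width - 1
    map_y = (grid[1] + 1.0) / 2.0 * (height - 1)  # -1 -> 0, 1 -> height - 1
    return map_x.astype(np.float32), map_y.astype(np.float32)
```

The two returned arrays have the form that `cv2.remap` expects for its map arguments.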

Usage

With ONNX Runtime (Python)

import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

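# Load and preprocess image (cv2.imread returns None if the file cannot be read)
image = cv2.imread("warped_document.jpg")
if image is None:
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]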

# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

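# Run inference (look up the input name instead of hard-coding 'image')
input_name = session.get_inputs()[0].name
result = session.run(None, {input_name: blob})[0]  # (1, 2, 45, 31)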

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)

With HuggingFace Hub

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)

Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from tanguymagne/UVDoc, with a wrapper to output only the 2D coordinate grid (discarding the 3D shape output).

Original Model

Paper: UVDoc: Neural Grid-based Document Unwarping
Authors: Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
Published: SIGGRAPH Asia 2023
Original Repository: tanguymagne/UVDoc

Limitations

Input must be resized to 720x496 (height x width) for inference, regardless of the source aspect ratio; the grid output is always 45x31
Works best on documents with visible text/content (needs features for grid estimation)
May not handle extreme perspective distortions well
CPU inference takes roughly 100-200 ms per image; the exact figure depends on hardware
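
Since latency varies by machine, it can be worth measuring on your own hardware. The following is a minimal, generic timing sketch using only the standard library. The helper name `time_calls` and its `warmup` and `runs` parameters are hypothetical and not part of this model or of ONNX Runtime.

```python
import statistics
import time


def time_calls(fn, warmup=3, runs=20):
    """Call fn repeatedly and return (median_ms, min_ms, max_ms)."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), min(samples), max(samples)
```

With the `session` and `blob` objects from the usage example, a call might look like `time_calls(lambda: session.run(None, {'image': blob}))`.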

Citation

If you use this model, please cite the original UVDoc paper:

@inproceedings{UVDoc,
    title={{UVDoc}: Neural Grid-based Document Unwarping},
    author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
    booktitle = {SIGGRAPH ASIA, Technical Papers},
    year = {2023},
    url={https://doi.org/10.1145/3610548.3618174}
}

License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also Apache 2.0 licensed. See the original repository for full license details.
