UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the UVDoc document unwarping model, modified to output a coordinate grid instead of an image. This enables high-resolution document unwarping via cv2.remap().

Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed documents. Unlike the PaddlePaddle ONNX variant that outputs a fixed 288x288 image, this version outputs a coordinate mapping grid that can be applied to images of any resolution.

Key Difference: Grid Output vs Image Output

Approach | Output | Quality
Image-output models | 288x288 RGB image | Poor (must upscale)
This grid-output model | 45x31 coordinate grid | Native resolution
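
To illustrate why a 45x31 grid is enough to drive a full-resolution remap: the grid is a smooth coordinate field, so it can be interpolated to any target size. The following is a minimal NumPy sketch of bilinear upsampling that keeps the grid corners fixed. The function name `upsample_grid` and the identity test grid are hypothetical, and corner alignment is a choice made here for illustration, not necessarily what the exported model's author uses.

```python
import numpy as np

def upsample_grid(grid, out_h, out_w):
    """Bilinearly resize a (H, W, C) grid to (out_h, out_w, C), keeping corners fixed."""
    in_h, in_w = grid.shape[:2]
    # Sample positions in source-grid index space; endpoints map to endpoints.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    # Fractional weights, shaped to broadcast over (out_h, out_w, C).
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# Hypothetical identity grid in the model's 45x31 layout: channel 0 = x, channel 1 = y.
gx, gy = np.meshgrid(np.linspace(-1, 1, 31), np.linspace(-1, 1, 45))
identity = np.stack([gx, gy], axis=-1)        # (45, 31, 2)
dense = upsample_grid(identity, 900, 620)     # (900, 620, 2)
```

Because the interpolation only needs the target height and width, the same 45x31 output can be stretched to whatever resolution the original photograph has.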

Model Details

  • Architecture: UVDoc (ResNet-based encoder-decoder)
  • Input: (1, 3, 720, 496) - RGB image, normalized [0, 1]
  • Output: (1, 2, 45, 31) - Coordinate grid in [-1, 1] range
  • ONNX Opset: 16
  • Size: ~30 MB

Input Specifications

Property | Value
Shape | (batch, 3, 720, 496)
Format | RGB (not BGR)
Range | [0, 1] (normalized)
Layout | NCHW (batch, channels, height, width)
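
The input requirements above can be checked in isolation. Here is a small NumPy-only sketch that converts an already-resized BGR image into the expected blob; `to_model_input` is a hypothetical helper name, and the synthetic all-blue image stands in for a real photograph.

```python
import numpy as np

def to_model_input(bgr_720x496):
    """Convert a 720x496 BGR uint8 image (H, W, 3) to a (1, 3, 720, 496) float32 RGB blob."""
    if bgr_720x496.shape != (720, 496, 3):
        raise ValueError(f"expected (720, 496, 3), got {bgr_720x496.shape}")
    rgb = bgr_720x496[..., ::-1]                # BGR -> RGB
    scaled = rgb.astype(np.float32) / 255.0     # [0, 255] -> [0, 1]
    # HWC -> CHW, then add the batch axis to get NCHW.
    return np.ascontiguousarray(scaled.transpose(2, 0, 1)[None])

# Stand-in for an image that was read with OpenCV (BGR order) and resized.
dummy = np.zeros((720, 496, 3), dtype=np.uint8)
dummy[..., 0] = 255                             # pure blue in BGR order
blob = to_model_input(dummy)
```

After conversion, blue ends up in channel index 2 of the blob, which is a quick way to confirm that the channel order really is RGB.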

Output Specifications

Property | Value
Shape | (batch, 2, 45, 31)
Channels | 2 (x, y coordinates)
Range | [-1, 1] (normalized coordinates)
Layout | NCHW (batch, channels, height, width)
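
As a worked example of the normalized range, the sketch below maps a coordinate in [-1, 1] to a pixel position, using the same convention as the `map_x` and `map_y` lines in the usage code (-1 is the first pixel, 1 is the last). The helper name `norm_to_pixel` is hypothetical.

```python
def norm_to_pixel(coord, size):
    """Map a normalized coordinate in [-1, 1] to a pixel position in [0, size - 1]."""
    return (coord + 1.0) / 2.0 * (size - 1)

# For an image 496 pixels wide:
left = norm_to_pixel(-1.0, 496)    # 0.0
center = norm_to_pixel(0.0, 496)   # 247.5
right = norm_to_pixel(1.0, 496)    # 495.0
```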

Usage

With ONNX Runtime (Python)

import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

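# Load and preprocess image (cv2.imread returns None if the file cannot be read)
image = cv2.imread("warped_document.jpg")
if image is None:
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]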

# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference
result = session.run(None, {'image': blob})[0]  # (1, 2, 45, 31)

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)
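
The snippet above feeds the tensor under the input name 'image'. If inference fails with a name or shape error, it may help to list what the loaded file actually declares. This is a minimal sketch using the `get_inputs()` and `get_outputs()` methods of an ONNX Runtime session; `describe_io` is a hypothetical helper, and the stand-in object (including the output name "grid") is invented so the example runs without the model file.

```python
from types import SimpleNamespace

def describe_io(session):
    """Return ({input_name: shape}, {output_name: shape}) for an ONNX Runtime session."""
    inputs = {i.name: list(i.shape) for i in session.get_inputs()}
    outputs = {o.name: list(o.shape) for o in session.get_outputs()}
    return inputs, outputs

# Stand-in session; the names and shapes here are placeholders, not read from the model.
fake = SimpleNamespace(
    get_inputs=lambda: [SimpleNamespace(name="image", shape=[1, 3, 720, 496])],
    get_outputs=lambda: [SimpleNamespace(name="grid", shape=[1, 2, 45, 31])],
)
inputs, outputs = describe_io(fake)
```

With the real model, pass the `session` object created by `ort.InferenceSession(...)` instead of the stand-in.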

With HuggingFace Hub

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)

Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from tanguymagne/UVDoc, with a wrapper to output only the 2D coordinate grid (discarding the 3D shape output).
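
The wrapper idea can be sketched in PyTorch as follows. This is an illustration under assumptions, not the export script used for this model: it assumes the wrapped network returns a `(grid_2d, shape_3d)` tuple, and `GridOnly` and `DummyUVDoc` are hypothetical names. The dummy network only mimics output shapes, and the 3D output shape is itself an assumption.

```python
import torch
from torch import nn

class GridOnly(nn.Module):
    """Wrap a model returning (grid_2d, shape_3d) so that only grid_2d is returned."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, image):
        grid_2d, _shape_3d = self.model(image)  # discard the 3D shape output
        return grid_2d

class DummyUVDoc(nn.Module):
    """Hypothetical stand-in with assumed output shapes; not the real network."""

    def forward(self, image):
        batch = image.shape[0]
        return torch.zeros(batch, 2, 45, 31), torch.zeros(batch, 3, 45, 31)

wrapped = GridOnly(DummyUVDoc())
out = wrapped(torch.zeros(1, 3, 720, 496))      # single tensor, shape (1, 2, 45, 31)
```

Exporting a wrapper like this would give an ONNX graph with a single output, which matches the output specification listed in this card.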

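Original Model

  • Repository: tanguymagne/UVDoc
  • Paper: UVDoc: Neural Grid-based Document Unwarping (SIGGRAPH Asia 2023), https://doi.org/10.1145/3610548.3618174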

Limitations

  • Input must be resized to 720x496 for inference (grid output is always 45x31)
  • Works best on documents with visible text/content (needs features for grid estimation)
  • May not handle extreme perspective distortions well
  • CPU inference takes roughly 100-200 ms per image; the exact figure depends on hardware
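
Since latency varies between machines, it may be worth measuring it locally. Below is a minimal timing sketch using only the standard library; `mean_latency_ms` is a hypothetical helper and the workload is a stand-in so the example runs without the model.

```python
import time

def mean_latency_ms(run, warmup=3, repeats=20):
    """Call run() a few times to warm up, then return the mean wall-clock time in ms."""
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(repeats):
        run()
    return (time.perf_counter() - start) * 1000.0 / repeats

# Stand-in workload. With the model loaded as in the usage section, one could pass
# lambda: session.run(None, {'image': blob}) instead.
latency = mean_latency_ms(lambda: sum(range(10_000)))
```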

Citation

If you use this model, please cite the original UVDoc paper:

@inproceedings{UVDoc,
    title={{UVDoc}: Neural Grid-based Document Unwarping},
    author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
    booktitle = {SIGGRAPH ASIA, Technical Papers},
    year = {2023},
    url={https://doi.org/10.1145/3610548.3618174}
}

License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also Apache 2.0 licensed. See the original repository for full license details.
