UVDoc Grid Output - Document Unwarping ONNX Model
This is an ONNX export of the UVDoc document unwarping model,
modified to output a coordinate grid instead of an image. This enables high-resolution document
unwarping via cv2.remap().
Model Description
UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed documents. Unlike the PaddlePaddle ONNX variant, which outputs a fixed 288x288 image, this version outputs a coordinate mapping grid that can be applied to images of any resolution.
Key Difference: Grid Output vs Image Output
| Approach | Output | Quality |
|---|---|---|
| Image-output models | 288x288 RGB image | Poor (must upscale) |
| This grid-output model | 45x31 coordinate grid | Native resolution |
Model Details
- Architecture: UVDoc (ResNet-based encoder-decoder)
- Input: (1, 3, 720, 496) - RGB image, normalized [0, 1]
- Output: (1, 2, 45, 31) - Coordinate grid in [-1, 1] range
- ONNX Opset: 16
- Size: ~30 MB
Input Specifications
| Property | Value |
|---|---|
| Shape | (batch, 3, 720, 496) |
| Format | RGB (not BGR) |
| Range | [0, 1] (normalized) |
| Layout | NCHW (batch, channels, height, width) |
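The input requirements in the table can be enforced in code before inference. The following is a minimal sketch, assuming the image has already been resized to 720x496 and converted to RGB; `to_model_input` is a hypothetical helper name and is not part of the exported model.

```python
import numpy as np

MODEL_H, MODEL_W = 720, 496  # input height and width from the table above


def to_model_input(rgb_u8: np.ndarray) -> np.ndarray:
    """Convert a (720, 496, 3) uint8 RGB array to a (1, 3, 720, 496) float32 blob in [0, 1].

    Hypothetical helper for illustration only.
    """
    if rgb_u8.shape != (MODEL_H, MODEL_W, 3):
        raise ValueError(f"expected shape {(MODEL_H, MODEL_W, 3)}, got {rgb_u8.shape}")
    if rgb_u8.dtype != np.uint8:
        raise ValueError(f"expected uint8 pixels, got {rgb_u8.dtype}")
    blob = rgb_u8.astype(np.float32) / 255.0    # scale to [0, 1]
    blob = np.transpose(blob, (2, 0, 1))[None]  # HWC -> NCHW with batch size 1
    return np.ascontiguousarray(blob)
```

Failing early on a wrong shape or dtype is usually easier to debug than a shape error raised inside the runtime.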
Output Specifications
| Property | Value |
|---|---|
| Shape | (batch, 2, 45, 31) |
| Channels | 2 (x, y coordinates) |
| Range | [-1, 1] (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |
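Because the grid values are normalized, the same grid can be scaled to pixel coordinates for any target resolution. The sketch below illustrates this with an identity grid (one that maps every point to itself). It assumes that -1 and 1 refer to the centers of the first and last pixels, which is the convention implied by scaling with `width - 1`; `grid_to_pixel_maps` is a hypothetical helper name.

```python
import numpy as np


def grid_to_pixel_maps(grid_hw2, width, height):
    """Scale an (H, W, 2) grid of normalized (x, y) values in [-1, 1] to pixel units.

    Assumes -1 maps to pixel 0 and +1 maps to the last pixel (width - 1 or height - 1).
    """
    map_x = (grid_hw2[..., 0] + 1.0) * 0.5 * (width - 1)
    map_y = (grid_hw2[..., 1] + 1.0) * 0.5 * (height - 1)
    return map_x.astype(np.float32), map_y.astype(np.float32)


# Identity grid at the model's output resolution: 45 rows by 31 columns.
xs = np.linspace(-1.0, 1.0, 31)
ys = np.linspace(-1.0, 1.0, 45)
identity = np.stack(np.meshgrid(xs, ys), axis=-1)  # shape (45, 31, 2), channel 0 = x

# The same 45x31 grid addresses a 3000x4000 image just as well as a 496x720 one.
map_x, map_y = grid_to_pixel_maps(identity, width=3000, height=4000)
```

A real model output deviates from this identity grid wherever the page is curved or tilted.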
Usage
With ONNX Runtime (Python)
import cv2
import numpy as np
import onnxruntime as ort
# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])
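# Load and preprocess image
image = cv2.imread("warped_document.jpg")
if image is None:  # cv2.imread returns None instead of raising on failure
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]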
# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720)) # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None] # (1, 3, 720, 496)
# Run inference (look up the input name rather than hard-coding it)
input_name = session.get_inputs()[0].name
result = session.run(None, {input_name: blob})[0] # (1, 2, 45, 31)
# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0)) # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)
map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)
# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)
cv2.imwrite("unwarped_document.jpg", unwarped)
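One detail of the upsampling step above: `cv2.resize` interpolates with pixel-center alignment, which differs slightly near the borders from a corner-aligned interpolation in which the first and last grid nodes land exactly on the first and last image pixels. If corner alignment is preferred, so that it matches the `(w_orig - 1)` scaling, a NumPy-only alternative could look like the sketch below. `upsample_align_corners` is a hypothetical helper, and whether it produces visibly better results than `cv2.resize` has not been evaluated here.

```python
import numpy as np


def upsample_align_corners(grid, out_h, out_w):
    """Bilinearly upsample an (H, W, C) grid so that its corner nodes map to the output corners."""
    grid = np.asarray(grid, dtype=np.float64)
    h, w = grid.shape[:2]

    # Fractional source positions for every output row and column.
    ys = np.linspace(0.0, h - 1, out_h)
    xs = np.linspace(0.0, w - 1, out_w)

    # Neighbouring source indices, clamped at the far edge.
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)

    # Interpolation weights, shaped for broadcasting over (out_h, out_w, C).
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]

    rows0, rows1 = grid[y0], grid[y1]  # each (out_h, W, C)
    top = rows0[:, x0] * (1 - wx) + rows0[:, x1] * wx
    bottom = rows1[:, x0] * (1 - wx) + rows1[:, x1] * wx
    return (top * (1 - wy) + bottom * wy).astype(np.float32)
```

With the variables from the usage code, `grid_up = upsample_align_corners(grid, h_orig, w_orig)` would replace the `cv2.resize` call; the rest of the pipeline stays the same.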
With HuggingFace Hub
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)
Training Details
This model was not retrained. It is a direct ONNX export of the original UVDoc weights from tanguymagne/UVDoc, with a wrapper to output only the 2D coordinate grid (discarding the 3D shape output).
Original Model
- Paper: UVDoc: Neural Grid-based Document Unwarping
- Authors: Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- Published: SIGGRAPH Asia 2023
- Original Repository: tanguymagne/UVDoc
Limitations
- Input must be resized to 720x496 for inference (grid output is always 45x31)
- Works best on documents with visible text/content (needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes roughly 100-200 ms per image (hardware-dependent)
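Since the latency figure depends on the machine, it may be worth measuring it locally. The following is a small timing sketch using only the standard library; `time_calls` is a hypothetical helper, and the warm-up and run counts are arbitrary defaults.

```python
import statistics
import time


def time_calls(fn, warmup=2, runs=10):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Example, assuming `session` and `blob` exist as in the Usage section:
# name = session.get_inputs()[0].name
# print(time_calls(lambda: session.run(None, {name: blob})))
```

The median is used rather than the mean so that a single slow run does not skew the result.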
Citation
If you use this model, please cite the original UVDoc paper:
@inproceedings{UVDoc,
title={{UVDoc}: Neural Grid-based Document Unwarping},
author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
booktitle = {SIGGRAPH ASIA, Technical Papers},
year = {2023},
url={https://doi.org/10.1145/3610548.3618174}
}
License
This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also Apache 2.0 licensed. See the original repository for full license details.