metadata
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
  - onnx
  - document-processing
  - document-unwarping
  - image-processing
  - ocr-preprocessing
  - computer-vision

UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the UVDoc document unwarping model, modified to output a coordinate grid instead of an image. This enables high-resolution document unwarping via cv2.remap().

Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed documents. Unlike the PaddlePaddle ONNX variant that outputs a fixed 288x288 image, this version outputs a coordinate mapping grid that can be applied to images of any resolution.

Key Difference: Grid Output vs Image Output

| Approach | Output | Quality |
| --- | --- | --- |
| Image-output models | 288x288 RGB image | Poor (must upscale) |
| This grid-output model | 45x31 coordinate grid | Native resolution |
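
The reason a coordinate grid scales to any resolution can be sketched with a few lines of NumPy. The helper name `grid_to_pixels` is hypothetical, and the mapping assumes that -1 corresponds to pixel 0 and +1 to the last pixel, which matches the `map_x` / `map_y` formulas in the Usage section.

```python
import numpy as np

def grid_to_pixels(grid_xy, width, height):
    """Convert normalized (x, y) coordinates in [-1, 1] to pixel coordinates.

    Assumes -1 maps to pixel 0 and +1 maps to the last pixel (size - 1).
    """
    grid_xy = np.asarray(grid_xy, dtype=np.float64)
    px = (grid_xy[..., 0] + 1.0) / 2.0 * (width - 1)
    py = (grid_xy[..., 1] + 1.0) / 2.0 * (height - 1)
    return np.stack([px, py], axis=-1)

# Three normalized points: top-left corner, center, bottom-right corner
points = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])

# The same normalized points address a small and a large image alike
low_res = grid_to_pixels(points, width=496, height=720)
high_res = grid_to_pixels(points, width=3000, height=4000)
```

The same normalized coordinates resolve to valid pixel positions at both sizes, so the sampling positions are computed at the image's own resolution rather than upscaled from a small raster.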

Model Details

  • Architecture: UVDoc (ResNet-based encoder-decoder)
  • Input: (1, 3, 720, 496) - RGB image, normalized [0, 1]
  • Output: (1, 2, 45, 31) - Coordinate grid in [-1, 1] range
  • ONNX Opset: 16
  • Size: ~30 MB

Input Specifications

| Property | Value |
| --- | --- |
| Shape | (batch, 3, 720, 496) |
| Format | RGB (not BGR) |
| Range | [0, 1] (normalized) |
| Layout | NCHW (batch, channels, height, width) |
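
As a rough sketch of what these input requirements mean in practice, the following NumPy-only helper converts an already-resized image into a blob with the listed shape, range, and layout. The function name `to_model_input` and the `is_bgr` flag are illustrative assumptions, not part of the model or of any library.

```python
import numpy as np

MODEL_H, MODEL_W = 720, 496  # input height and width from the specification

def to_model_input(image, is_bgr=False):
    """Turn an HxWx3 uint8 image (already resized to 720x496) into an NCHW float32 blob."""
    image = np.asarray(image)
    if image.shape != (MODEL_H, MODEL_W, 3):
        raise ValueError(f"expected shape {(MODEL_H, MODEL_W, 3)}, got {image.shape}")
    if is_bgr:
        image = image[..., ::-1]  # reverse channel order: BGR -> RGB
    blob = image.astype(np.float32) / 255.0     # scale to [0, 1]
    blob = np.transpose(blob, (2, 0, 1))[None]  # HWC -> NCHW with batch size 1
    return np.ascontiguousarray(blob)

# Random stand-in for a resized photo, in BGR order as cv2.imread would return it
dummy = np.random.default_rng(0).integers(0, 256, size=(MODEL_H, MODEL_W, 3), dtype=np.uint8)
blob = to_model_input(dummy, is_bgr=True)
```

Checking the shape up front turns a silent mismatch (for example, swapping width and height when resizing) into an explicit error before inference.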

Output Specifications

| Property | Value |
| --- | --- |
| Shape | (batch, 2, 45, 31) |
| Channels | 2 (x, y coordinates) |
| Range | [-1, 1] (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |
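
To make the output layout concrete, here is a small sketch that builds an identity grid with the same shape and splits it into its x and y planes. An identity grid is used only as a stand-in for a real model output (it would correspond to sampling each pixel from its own position); `identity_grid` and `split_grid` are hypothetical helper names.

```python
import numpy as np

GRID_H, GRID_W = 45, 31  # grid height and width from the specification

def identity_grid(batch=1):
    """Build a (batch, 2, 45, 31) grid where every point maps to itself."""
    xs = np.linspace(-1.0, 1.0, GRID_W, dtype=np.float32)
    ys = np.linspace(-1.0, 1.0, GRID_H, dtype=np.float32)
    grid_x, grid_y = np.meshgrid(xs, ys)       # each has shape (45, 31)
    grid = np.stack([grid_x, grid_y], axis=0)  # channel 0 = x, channel 1 = y
    return np.repeat(grid[None], batch, axis=0)

def split_grid(output):
    """Validate the NCHW layout and return the x and y planes."""
    output = np.asarray(output)
    if output.ndim != 4 or output.shape[1:] != (2, GRID_H, GRID_W):
        raise ValueError(f"expected (batch, 2, {GRID_H}, {GRID_W}), got {output.shape}")
    return output[:, 0], output[:, 1]

grid = identity_grid()
x_plane, y_plane = split_grid(grid)
```

In this stand-in, x varies along the width axis and y along the height axis, which is the channel order the Usage code relies on when it builds `map_x` and `map_y`.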

Usage

With ONNX Runtime (Python)

import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

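# Load and preprocess image (cv2.imread returns BGR, or None if the file cannot be read)
image = cv2.imread("warped_document.jpg")
if image is None:
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]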

# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference (look up the input name rather than hard-coding it)
input_name = session.get_inputs()[0].name
result = session.run(None, {input_name: blob})[0]  # (1, 2, 45, 31)

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)
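
One detail of the snippet above may be worth a second look. The `map_x` / `map_y` formulas treat -1 and +1 as the first and last pixel, whereas `cv2.resize` aligns pixel centers rather than corner samples, so the upsampled grid can differ slightly near the image borders. The following is a minimal NumPy sketch of a corner-aligned bilinear upsample that could be tried as an alternative; the function name is hypothetical, and whether it improves results for this model is an assumption that has not been verified here.

```python
import numpy as np

def upsample_corner_aligned(grid, out_h, out_w):
    """Bilinearly upsample an (H, W, C) grid so its corner samples land on the output corners."""
    grid = np.asarray(grid, dtype=np.float32)
    in_h, in_w = grid.shape[:2]
    # Fractional source positions for every output row and column
    ys = np.linspace(0.0, in_h - 1, out_h)
    xs = np.linspace(0.0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    return (top * (1 - wy) + bottom * wy).astype(np.float32)

# Identity grid in (45, 31, 2) layout as a stand-in for the transposed model output
xs = np.linspace(-1, 1, 31, dtype=np.float32)
ys = np.linspace(-1, 1, 45, dtype=np.float32)
identity = np.stack(np.meshgrid(xs, ys), axis=-1)
grid_up = upsample_corner_aligned(identity, out_h=90, out_w=62)
```

The result has the same `(height, width, 2)` layout as the output of `cv2.resize`, so it could be substituted for `grid_up` in the snippet above with `out_h=h_orig` and `out_w=w_orig`, leaving the remaining steps unchanged.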

With HuggingFace Hub

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)

Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from tanguymagne/UVDoc, with a wrapper to output only the 2D coordinate grid (discarding the 3D shape output).

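Original Model

  • Repository: tanguymagne/UVDoc
  • Paper: UVDoc: Neural Grid-based Document Unwarping (see Citation below)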

Limitations

  • Input must be resized to 720x496 for inference (grid output is always 45x31)
  • Works best on documents with visible text/content (needs features for grid estimation)
  • May not handle extreme perspective distortions well
  • CPU inference takes roughly 100-200 ms per image; actual latency depends on the hardware
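
Since latency varies between machines, it may be useful to measure it locally. The sketch below times an arbitrary callable and reports the median; `measure_latency_ms` and `fake_inference` are hypothetical names, and the stand-in workload only exists so that the example runs without the model file.

```python
import statistics
import time

def measure_latency_ms(fn, warmup=2, runs=10):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

def fake_inference():
    # Stand-in workload; replace with a real call to the ONNX session
    sum(i * i for i in range(10000))

latency = measure_latency_ms(fake_inference)
print(f"median latency: {latency:.2f} ms")
```

With the session and `blob` from the Usage section, the stand-in could be replaced by `lambda: session.run(None, {"image": blob})`.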

Citation

If you use this model, please cite the original UVDoc paper:

@inproceedings{UVDoc,
    title={{UVDoc}: Neural Grid-based Document Unwarping},
    author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
    booktitle = {SIGGRAPH ASIA, Technical Papers},
    year = {2023},
    url={https://doi.org/10.1145/3610548.3618174}
}

License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also Apache 2.0 licensed. See the original repository for full license details.