---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
- onnx
- document-processing
- document-unwarping
- image-processing
- ocr-preprocessing
- computer-vision
---

# UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model, modified to output a **coordinate grid** instead of an image. Because the grid is applied to the original photo with `cv2.remap()`, the unwarped document keeps the full resolution of the input.

## Model Description

UVDoc is a deep learning model that corrects perspective distortion and page curvature in photographed documents. Unlike the PaddlePaddle ONNX variant, which outputs a fixed 288x288 image, this version outputs a coordinate mapping grid that can be applied to images of any resolution.

### Key Difference: Grid Output vs Image Output

| Approach | Model output | Unwarped result |
|----------|--------------|-----------------|
| **Image-output models** | 288x288 RGB image | Fixed 288x288; must be upscaled, losing detail |
| **This grid-output model** | 45x31 coordinate grid | Full resolution of the input photo |

## Model Details

- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image, values normalized to [0, 1]
- **Output:** `(1, 2, 45, 31)` - coordinate grid with values in [-1, 1]
- **ONNX Opset:** 16
- **Size:** ~30 MB

### Input Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |

### Output Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2: channel 0 = x, channel 1 = y (sampling positions in the input image) |
| Range | `[-1, 1]` (normalized; -1 is the left/top edge, 1 the right/bottom edge) |
| Layout | NCHW (batch, channels, height, width) |

## Usage

### With ONNX Runtime (Python)

```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name  # e.g. "image"

# Load the photo (OpenCV reads images as BGR)
image = cv2.imread("warped_document.jpg")
if image is None:
    raise FileNotFoundError("Could not read warped_document.jpg")
h_orig, w_orig = image.shape[:2]

# Prepare model input: 720x496, RGB, normalized to [0, 1], NCHW
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # dsize is (width, height)
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference
result = session.run(None, {input_name: blob})[0]  # (1, 2, 45, 31)

# Upsample the grid to the original image size
grid = np.transpose(result[0], (1, 2, 0)).astype(np.float32)  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

# Map normalized [-1, 1] coordinates to pixel coordinates of the original image
map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to the original high-resolution image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE,
)
cv2.imwrite("unwarped_document.jpg", unwarped)
```

### With Hugging Face Hub

```python
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download (and cache) the ONNX file; replace repo_id with this model's repository
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx",
)
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
```

## Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), wrapped so that the graph outputs only the 2D coordinate grid (the 3D shape output is discarded).
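The export wrapper itself is not part of this card. Below is a minimal sketch of how such a grid-only export could look, not the script actually used. It assumes the original UVDoc network's `forward` returns a `(points_2d, points_3d)` tuple; the class `GridOnlyWrapper`, the function `export_grid_only`, and the output name `"grid"` are hypothetical, and the input name `"image"` is chosen to match the Usage section.

```python
import torch
from torch import nn


class GridOnlyWrapper(nn.Module):
    """Hypothetical wrapper that keeps only UVDoc's 2D grid output.

    Assumes the wrapped model's forward returns (points_2d, points_3d),
    where points_2d has shape (batch, 2, 45, 31).
    """

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        points_2d, _points_3d = self.model(image)  # discard the 3D shape output
        return points_2d


def export_grid_only(model: nn.Module, path: str = "UVDoc_grid.onnx") -> None:
    """Export the wrapped model to ONNX with opset 16 and a fixed 1x3x720x496 input."""
    wrapper = GridOnlyWrapper(model).eval()
    dummy = torch.zeros(1, 3, 720, 496)
    torch.onnx.export(
        wrapper,
        (dummy,),
        path,
        opset_version=16,
        input_names=["image"],
        output_names=["grid"],  # hypothetical output name
    )
```

Here `model` would be the original UVDoc network with its pretrained weights loaded using the code in the original repository; that loading step is not shown.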
### Original Model

- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)

## Limitations

- Input must be resized to 720x496 for inference; the grid output is always 45x31 regardless of the original image size.
- Works best on documents with visible text or other content, since grid estimation relies on those features.
- May not handle extreme perspective distortions well.
- CPU inference takes roughly 100-200 ms per image (hardware-dependent).

## Citation

If you use this model, please cite the original UVDoc paper:

```bibtex
@inproceedings{UVDoc,
  title={{UVDoc}: Neural Grid-based Document Unwarping},
  author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
  booktitle={SIGGRAPH ASIA, Technical Papers},
  year={2023},
  url={https://doi.org/10.1145/3610548.3618174}
}
```

## License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also Apache 2.0 licensed; see the original repository for full license details.