---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
- onnx
- document-processing
- document-unwarping
- image-processing
- ocr-preprocessing
- computer-vision
---
# UVDoc Grid Output - Document Unwarping ONNX Model
This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model,
modified to output a **coordinate grid** instead of an image. This enables high-resolution document
unwarping via `cv2.remap()`.
## Model Description
UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed
documents. Unlike the PaddlePaddle ONNX variant that outputs a fixed 288x288 image, this version
outputs a coordinate mapping grid that can be applied to images of any resolution.
### Key Difference: Grid Output vs Image Output
| Approach | Output | Quality |
|----------|--------|---------|
| **Image-output models** | 288x288 RGB image | Limited (output must be upscaled) |
| **This grid-output model** | 45x31 coordinate grid | Native resolution |
## Model Details
- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image (height 720, width 496), values scaled to [0, 1]
- **Output:** `(1, 2, 45, 31)` - Coordinate grid (45 rows, 31 columns) in the [-1, 1] range
- **ONNX Opset:** 16
- **Size:** ~30 MB
### Input Specifications
| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |
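The table above can be turned into a quick sanity check before calling the model. The following is a minimal sketch; `check_input` is a hypothetical helper name (not part of this repository), and the `float32` requirement is an assumption based on the usage example in this README.

```python
import numpy as np


def check_input(blob):
    """Validate a model input against the input specification table.

    Hypothetical helper: expects NCHW layout, 3 channels, 720x496 (H x W),
    float32 values in [0, 1]. float32 is assumed from the usage example.
    """
    if blob.ndim != 4 or blob.shape[1:] != (3, 720, 496):
        raise ValueError(f"expected (batch, 3, 720, 496), got {blob.shape}")
    if blob.dtype != np.float32:
        raise TypeError(f"expected float32, got {blob.dtype}")
    if blob.min() < 0.0 or blob.max() > 1.0:
        raise ValueError("values must be scaled to [0, 1]")
    return blob


# A correctly shaped, correctly scaled dummy input passes the check.
dummy = np.zeros((1, 3, 720, 496), dtype=np.float32)
checked = check_input(dummy)
```

A raw `uint8` image in HWC layout, as returned by `cv2.imread`, fails this check, which is usually the first mistake to rule out when results look wrong.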
### Output Specifications
| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2 (x, y coordinates) |
| Range | `[-1, 1]` (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |
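To illustrate the normalized range, the sketch below maps grid values in `[-1, 1]` to pixel coordinates. It assumes channel 0 holds x and channel 1 holds y (as listed in the table), and that `-1` maps to the first pixel and `+1` to the last, which matches the conversion used in the usage example. `grid_to_pixels` is a hypothetical helper name.

```python
import numpy as np


def grid_to_pixels(grid, width, height):
    """Convert an (H, W, 2) grid of normalized (x, y) values to pixel coordinates.

    Assumption: -1 maps to pixel 0 and +1 maps to the last pixel index.
    """
    grid = np.asarray(grid, dtype=np.float32)
    px = (grid[..., 0] + 1.0) / 2.0 * (width - 1)
    py = (grid[..., 1] + 1.0) / 2.0 * (height - 1)
    return px, py


# The four extreme grid values land on the four image corners.
corners = np.array(
    [[[-1.0, -1.0], [1.0, -1.0]],
     [[-1.0, 1.0], [1.0, 1.0]]]
)
px, py = grid_to_pixels(corners, width=496, height=720)
```

Because the coordinates are normalized, the same grid can be scaled to any target `width` and `height`, not only the 496x720 used here.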
## Usage
### With ONNX Runtime (Python)
```python
import cv2
import numpy as np
import onnxruntime as ort
# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])
# Load and preprocess image
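image = cv2.imread("warped_document.jpg")
if image is None:  # cv2.imread returns None instead of raising on failure
    raise FileNotFoundError("Could not read warped_document.jpg")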
h_orig, w_orig = image.shape[:2]
# Prepare model input (720x496 RGB normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720)) # width, height
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None] # (1, 3, 720, 496)
# Run inference
result = session.run(None, {'image': blob})[0] # (1, 2, 45, 31)
# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0)) # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)
map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)
# Apply unwarping to original high-res image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE,
)
cv2.imwrite("unwarped_document.jpg", unwarped)
```
### With HuggingFace Hub
```python
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx",
)
```
## Training Details
This model was not retrained. It is a direct ONNX export of the original UVDoc weights from
[tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), with a wrapper to output only the
2D coordinate grid (discarding the 3D shape output).
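The wrapper idea can be sketched in PyTorch as follows. This is not the export script used for this model: `DummyUVDoc` is a hypothetical stand-in that only mimics a network returning a `(2D grid, 3D shape)` tuple, and the shape of the 3D output is an assumption made for illustration.

```python
import torch
from torch import nn


class DummyUVDoc(nn.Module):
    """Hypothetical stand-in for the real network: returns (grid_2d, shape_3d)."""

    def forward(self, x):
        batch = x.shape[0]
        grid_2d = torch.zeros(batch, 2, 45, 31)   # 2D coordinate grid
        shape_3d = torch.zeros(batch, 3, 45, 31)  # 3D shape output (assumed shape)
        return grid_2d, shape_3d


class GridOnly(nn.Module):
    """Wrap a two-output model so that only the 2D grid is returned."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, image):
        grid_2d, _shape_3d = self.model(image)  # discard the 3D output
        return grid_2d


wrapped = GridOnly(DummyUVDoc()).eval()
with torch.no_grad():
    out = wrapped(torch.zeros(1, 3, 720, 496))

# With the real weights loaded, an export along these lines would be the next step:
# torch.onnx.export(wrapped, torch.zeros(1, 3, 720, 496), "UVDoc_grid.onnx",
#                   input_names=["image"], opset_version=16)
```

Wrapping the model rather than editing it keeps the original weights untouched, so the exported grid should match what the original network predicts.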
### Original Model
- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)
## Limitations
- Input must be resized to 720x496 (height x width) for inference; the grid output is always 45x31
- Works best on documents with visible text/content (needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes roughly 100-200 ms per image, depending on hardware
## Citation
If you use this model, please cite the original UVDoc paper:
```bibtex
@inproceedings{UVDoc,
  title = {{UVDoc}: Neural Grid-based Document Unwarping},
  author = {Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
  booktitle = {SIGGRAPH ASIA, Technical Papers},
  year = {2023},
  url = {https://doi.org/10.1145/3610548.3618174}
}
```
## License
This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also
Apache 2.0 licensed. See the original repository for full license details.