---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
- onnx
- document-processing
- document-unwarping
- image-processing
- ocr-preprocessing
- computer-vision
---

# UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model,
modified to output a **coordinate grid** instead of an image. This enables high-resolution document
unwarping via `cv2.remap()`.

## Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed
documents. Unlike the PaddlePaddle ONNX variant, which outputs a fixed 288x288 image, this version
outputs a coordinate mapping grid that can be applied to images of any resolution.

### Key Difference: Grid Output vs Image Output

| Approach | Output | Quality |
|----------|--------|---------|
| **Image-output models** | 288x288 RGB image | Poor (must upscale) |
| **This grid-output model** | 45x31 coordinate grid | Native resolution |

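The "native resolution" entry follows from the grid being expressed in normalized `[-1, 1]` coordinates: the same 45x31 grid can be rescaled to pixel coordinates at any target size. A minimal NumPy sketch, using an identity grid as a stand-in for real model output:

```python
import numpy as np

# Stand-in for a model output: an identity grid in the model's (2, 45, 31)
# layout, x in channel 0 and y in channel 1, both normalized to [-1, 1].
gy, gx = np.meshgrid(np.linspace(-1, 1, 45), np.linspace(-1, 1, 31), indexing="ij")
grid = np.stack([gx, gy])  # (2, 45, 31)

def to_pixel_maps(grid, width, height):
    # Scale normalized [-1, 1] coordinates to pixel coordinates
    # for a given output resolution.
    map_x = (grid[0] + 1) / 2 * (width - 1)
    map_y = (grid[1] + 1) / 2 * (height - 1)
    return map_x, map_y

# The same grid yields valid sampling maps at the model's input size...
map_x, map_y = to_pixel_maps(grid, 496, 720)
# ...and at a much higher resolution, with no intermediate image upscaling.
map_x_hr, map_y_hr = to_pixel_maps(grid, 2480, 3600)
```

In real use the grid is first upsampled to the target resolution before the rescaling step, as shown in the Usage section.
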
## Model Details

- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image, normalized to [0, 1]
- **Output:** `(1, 2, 45, 31)` - coordinate grid in [-1, 1]
- **ONNX Opset:** 16
- **Size:** ~30 MB

### Input Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |

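Given an image already resized to the 720x496 input size (width 496, height 720), the remaining preprocessing is a cast, a scale, and a layout change. A small sketch (`to_blob` is an illustrative helper, not part of the model):

```python
import numpy as np

def to_blob(img_rgb):
    # img_rgb: (720, 496, 3) uint8 RGB image, already resized to the input size.
    # Returns a (1, 3, 720, 496) float32 NCHW tensor scaled to [0, 1].
    blob = img_rgb.astype(np.float32) / 255.0    # [0, 255] -> [0, 1]
    return np.transpose(blob, (2, 0, 1))[None]   # HWC -> NCHW, add batch dim

blob = to_blob(np.zeros((720, 496, 3), dtype=np.uint8))
```
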
### Output Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2 (x, y coordinates) |
| Range | `[-1, 1]` (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |

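Turning a raw `(1, 2, 45, 31)` output into full-resolution remap coordinates means upsampling the grid to the target size and rescaling from `[-1, 1]` to pixel units. The Usage section does this with `cv2.resize`; below is a dependency-free NumPy sketch of the same steps (`grid_to_maps` and `_resize_bilinear` are illustrative helpers, not part of the model or any library):

```python
import numpy as np

def _resize_bilinear(a, out_h, out_w):
    # Bilinear resize of a 2D array to (out_h, out_w).
    in_h, in_w = a.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = a[np.ix_(y0, x0)] * (1 - wx) + a[np.ix_(y0, x1)] * wx
    bot = a[np.ix_(y1, x0)] * (1 - wx) + a[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def grid_to_maps(result, width, height):
    # result: raw model output of shape (1, 2, 45, 31), values in [-1, 1].
    # Returns float32 (height, width) map_x / map_y arrays usable with cv2.remap.
    gx = _resize_bilinear(result[0, 0], height, width)
    gy = _resize_bilinear(result[0, 1], height, width)
    map_x = ((gx + 1) / 2) * (width - 1)
    map_y = ((gy + 1) / 2) * (height - 1)
    return map_x.astype(np.float32), map_y.astype(np.float32)
```

With an identity grid as input, the resulting maps simply index every pixel of the output image in place; a real model output bends these maps to undo the page's warp.
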
## Usage

### With ONNX Runtime (Python)

```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

# Load and preprocess image
image = cv2.imread("warped_document.jpg")
h_orig, w_orig = image.shape[:2]

# Prepare model input (720x496 RGB, normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # (width, height)
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference
result = session.run(None, {'image': blob})[0]  # (1, 2, 45, 31)

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to the original high-resolution image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)
```

### With HuggingFace Hub

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)
```

## Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from
[tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), with a wrapper that outputs only the
2D coordinate grid (discarding the 3D shape output).

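Such a wrapper is straightforward: a module that forwards the input through UVDoc and keeps only the 2D grid head. A hedged PyTorch sketch (`GridOnly` is illustrative; the actual names and order of the UVDoc forward outputs should be checked against the original repository):

```python
import torch

class GridOnly(torch.nn.Module):
    # Wraps a UVDoc-style model whose forward is assumed to return
    # (grid_2d, grid_3d), exposing only the 2D coordinate grid for export.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, image):
        grid_2d, _grid_3d = self.model(image)  # assumed output order
        return grid_2d  # (1, 2, 45, 31)
```

The wrapped model can then be exported with something like `torch.onnx.export(GridOnly(uvdoc), dummy_input, "UVDoc_grid.onnx", opset_version=16)`.
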
### Original Model

- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)

## Limitations

- Input must be resized to 720x496 for inference (the grid output is always 45x31)
- Works best on documents with visible text/content (the model needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes ~100-200ms per image

## Citation

If you use this model, please cite the original UVDoc paper:

```bibtex
@inproceedings{UVDoc,
  title={{UVDoc}: Neural Grid-based Document Unwarping},
  author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
  booktitle={SIGGRAPH ASIA, Technical Papers},
  year={2023},
  url={https://doi.org/10.1145/3610548.3618174}
}
```

## License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also
Apache 2.0 licensed. See the original repository for full license details.