---
license: apache-2.0
library_name: onnx
pipeline_tag: image-to-image
tags:
- onnx
- document-processing
- document-unwarping
- image-processing
- ocr-preprocessing
- computer-vision
---

# UVDoc Grid Output - Document Unwarping ONNX Model

This is an ONNX export of the [UVDoc](https://github.com/tanguymagne/UVDoc) document unwarping model,
modified to output a **coordinate grid** instead of an image. This enables high-resolution document
unwarping via `cv2.remap()`.

## Model Description

UVDoc is a deep learning model for correcting perspective distortion and curvature in photographed
documents. Unlike the PaddlePaddle ONNX variant, which outputs a fixed 288x288 image, this version
outputs a coordinate mapping grid that can be applied to images of any resolution.

### Key Difference: Grid Output vs Image Output

| Approach | Output | Quality |
|----------|--------|---------|
| **Image-output models** | 288x288 RGB image | Poor (must upscale) |
| **This grid-output model** | 45x31 coordinate grid | Native resolution |

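The "native resolution" entry follows from the grid being expressed in normalized `[-1, 1]` coordinates: the same 45x31 grid can be rescaled to pixel coordinates at any target size. A minimal NumPy sketch, using an identity grid as a stand-in for real model output:

```python
import numpy as np

# Stand-in for a model output: an identity grid in the model's (2, 45, 31)
# layout, x in channel 0 and y in channel 1, both normalized to [-1, 1].
gy, gx = np.meshgrid(np.linspace(-1, 1, 45), np.linspace(-1, 1, 31), indexing="ij")
grid = np.stack([gx, gy])  # (2, 45, 31)

def to_pixel_maps(grid, width, height):
    # Scale normalized [-1, 1] coordinates to pixel coordinates
    # for a given output resolution.
    map_x = (grid[0] + 1) / 2 * (width - 1)
    map_y = (grid[1] + 1) / 2 * (height - 1)
    return map_x, map_y

# The same grid yields valid sampling maps at the model's input size...
map_x, map_y = to_pixel_maps(grid, 496, 720)
# ...and at a much higher resolution, with no intermediate image upscaling.
map_x_hr, map_y_hr = to_pixel_maps(grid, 2480, 3600)
```

In real use the grid is first upsampled to the target resolution before the rescaling step, as shown in the Usage section.
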
## Model Details

- **Architecture:** UVDoc (ResNet-based encoder-decoder)
- **Input:** `(1, 3, 720, 496)` - RGB image, normalized to [0, 1]
- **Output:** `(1, 2, 45, 31)` - coordinate grid in [-1, 1]
- **ONNX Opset:** 16
- **Size:** ~30 MB

### Input Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 3, 720, 496)` |
| Format | RGB (not BGR) |
| Range | `[0, 1]` (normalized) |
| Layout | NCHW (batch, channels, height, width) |

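Given an image already resized to the 720x496 input size (width 496, height 720), the remaining preprocessing is a cast, a scale, and a layout change. A small sketch (`to_blob` is an illustrative helper, not part of the model):

```python
import numpy as np

def to_blob(img_rgb):
    # img_rgb: (720, 496, 3) uint8 RGB image, already resized to the input size.
    # Returns a (1, 3, 720, 496) float32 NCHW tensor scaled to [0, 1].
    blob = img_rgb.astype(np.float32) / 255.0    # [0, 255] -> [0, 1]
    return np.transpose(blob, (2, 0, 1))[None]   # HWC -> NCHW, add batch dim

blob = to_blob(np.zeros((720, 496, 3), dtype=np.uint8))
```
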
### Output Specifications

| Property | Value |
|----------|-------|
| Shape | `(batch, 2, 45, 31)` |
| Channels | 2 (x, y coordinates) |
| Range | `[-1, 1]` (normalized coordinates) |
| Layout | NCHW (batch, channels, height, width) |

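Turning a raw `(1, 2, 45, 31)` output into full-resolution remap coordinates means upsampling the grid to the target size and rescaling from `[-1, 1]` to pixel units. The Usage section does this with `cv2.resize`; below is a dependency-free NumPy sketch of the same steps (`grid_to_maps` and `_resize_bilinear` are illustrative helpers, not part of the model or any library):

```python
import numpy as np

def _resize_bilinear(a, out_h, out_w):
    # Bilinear resize of a 2D array to (out_h, out_w).
    in_h, in_w = a.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = a[np.ix_(y0, x0)] * (1 - wx) + a[np.ix_(y0, x1)] * wx
    bot = a[np.ix_(y1, x0)] * (1 - wx) + a[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def grid_to_maps(result, width, height):
    # result: raw model output of shape (1, 2, 45, 31), values in [-1, 1].
    # Returns float32 (height, width) map_x / map_y arrays usable with cv2.remap.
    gx = _resize_bilinear(result[0, 0], height, width)
    gy = _resize_bilinear(result[0, 1], height, width)
    map_x = ((gx + 1) / 2) * (width - 1)
    map_y = ((gy + 1) / 2) * (height - 1)
    return map_x.astype(np.float32), map_y.astype(np.float32)
```

With an identity grid as input, the resulting maps simply index every pixel of the output image in place; a real model output bends these maps to undo the page's warp.
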
## Usage

### With ONNX Runtime (Python)

```python
import cv2
import numpy as np
import onnxruntime as ort

# Load model
session = ort.InferenceSession("UVDoc_grid.onnx", providers=['CPUExecutionProvider'])

# Load and preprocess image
image = cv2.imread("warped_document.jpg")
h_orig, w_orig = image.shape[:2]

# Prepare model input (720x496 RGB, normalized)
img_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(img_rgb, (496, 720))  # (width, height)
blob = resized.astype(np.float32) / 255.0
blob = np.transpose(blob, (2, 0, 1))[None]  # (1, 3, 720, 496)

# Run inference
result = session.run(None, {'image': blob})[0]  # (1, 2, 45, 31)

# Convert grid to remap coordinates
grid = np.transpose(result[0], (1, 2, 0))  # (45, 31, 2)
grid_up = cv2.resize(grid, (w_orig, h_orig), interpolation=cv2.INTER_LINEAR)

map_x = ((grid_up[..., 0] + 1) / 2) * (w_orig - 1)
map_y = ((grid_up[..., 1] + 1) / 2) * (h_orig - 1)

# Apply unwarping to the original high-resolution image
unwarped = cv2.remap(
    image,
    map_x.astype(np.float32),
    map_y.astype(np.float32),
    interpolation=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REPLICATE
)

cv2.imwrite("unwarped_document.jpg", unwarped)
```

### With HuggingFace Hub

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="YOUR_USERNAME/uvdoc-grid-onnx",
    filename="UVDoc_grid.onnx"
)
```

## Training Details

This model was not retrained. It is a direct ONNX export of the original UVDoc weights from
[tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc), with a wrapper that outputs only the
2D coordinate grid (discarding the 3D shape output).

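Such a wrapper is straightforward: a module that forwards the input through UVDoc and keeps only the 2D grid head. A hedged PyTorch sketch (`GridOnly` is illustrative; the actual names and order of the UVDoc forward outputs should be checked against the original repository):

```python
import torch

class GridOnly(torch.nn.Module):
    # Wraps a UVDoc-style model whose forward is assumed to return
    # (grid_2d, grid_3d), exposing only the 2D coordinate grid for export.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, image):
        grid_2d, _grid_3d = self.model(image)  # assumed output order
        return grid_2d  # (1, 2, 45, 31)
```

The wrapped model can then be exported with something like `torch.onnx.export(GridOnly(uvdoc), dummy_input, "UVDoc_grid.onnx", opset_version=16)`.
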
### Original Model

- **Paper:** [UVDoc: Neural Grid-based Document Unwarping](https://arxiv.org/abs/2302.02887)
- **Authors:** Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung (ETH Zurich)
- **Published:** SIGGRAPH Asia 2023
- **Original Repository:** [tanguymagne/UVDoc](https://github.com/tanguymagne/UVDoc)

## Limitations

- Input must be resized to 720x496 for inference (the grid output is always 45x31)
- Works best on documents with visible text/content (the model needs features for grid estimation)
- May not handle extreme perspective distortions well
- CPU inference takes ~100-200ms per image

## Citation

If you use this model, please cite the original UVDoc paper:

```bibtex
@inproceedings{UVDoc,
  title={{UVDoc}: Neural Grid-based Document Unwarping},
  author={Floor Verhoeven and Tanguy Magne and Olga Sorkine-Hornung},
  booktitle={SIGGRAPH ASIA, Technical Papers},
  year={2023},
  url={https://doi.org/10.1145/3610548.3618174}
}
```

## License

This ONNX export is provided under the Apache 2.0 license. The original UVDoc model is also
Apache 2.0 licensed. See the original repository for full license details.