---
license: apache-2.0
base_model: SBB/sbb_binarization
tags:
- document-binarization
- onnx
- tensorrt
- ocr
library_name: onnxruntime
---

# SBB Binarization: ONNX Model

ONNX conversion of [SBB/sbb_binarization](https://huggingface.co/SBB/sbb_binarization) by the Berlin State Library (Staatsbibliothek zu Berlin), developed as part of the [QURATOR](https://qurator.ai/) project.

The original model is a UNet + Vision Transformer hybrid that converts scanned document images to black and white for OCR. It works on 448x448 patches.
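
Because the model works on fixed 448x448 patches, a full page has to be cut into tiles before inference and stitched back together afterwards. The following is a minimal NumPy sketch of that idea, not the code used in `sample_workflow.py`. The function names, the edge-replicating padding, and the non-overlapping grid are all assumptions made for illustration.

```python
import numpy as np

PATCH = 448  # patch size stated in this README


def split_into_patches(image, patch=PATCH):
    """Pad an (H, W, C) image to a multiple of `patch` and cut it into tiles.

    Returns an (N, patch, patch, C) array and the (rows, cols) grid shape.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    # Assumption: pad by replicating edge pixels; the real scripts may differ.
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="edge")
    rows, cols = padded.shape[0] // patch, padded.shape[1] // patch
    tiles = (
        padded.reshape(rows, patch, cols, patch, -1)
        .swapaxes(1, 2)  # -> (rows, cols, patch, patch, C)
        .reshape(rows * cols, patch, patch, -1)
    )
    return tiles, (rows, cols)


def merge_patches(tiles, grid, out_hw):
    """Inverse of split_into_patches: stitch tiles and crop off the padding."""
    rows, cols = grid
    patch = tiles.shape[1]
    full = (
        tiles.reshape(rows, cols, patch, patch, -1)
        .swapaxes(1, 2)  # -> (rows, patch, cols, patch, C)
        .reshape(rows * patch, cols * patch, -1)
    )
    h, w = out_hw
    return full[:h, :w]
```

In a real pipeline the tiles would be passed through the ONNX model between the two calls. Splitting and merging without any inference in between should return the original image unchanged, which is a convenient sanity check.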

## What's here

- **model_convtranspose.onnx**: the model, ready to use with ONNX Runtime
- **fix_onnx.py**: the script that converts SBB's TensorFlow SavedModel to this ONNX file
- **sample_workflow.py**: minimal example showing how to binarize an image
- **example_gpu_pipeline.py**: full GPU pipeline using CuPy for production use

## Quick start

```bash
pip install onnxruntime-gpu numpy Pillow
python3 sample_workflow.py input.jpg output.tif
```

## What was changed from the original

The original TF model doesn't convert cleanly to ONNX for TensorRT. Three things needed fixing:

1. **Reshape batch dimension**: tf2onnx hardcoded -2048 where -1 is needed for dynamic batching
2. **Resize node attributes**: the exported nodes used TF-specific modes that TensorRT doesn't support; these were swapped for standard ONNX equivalents
3. **Resize to ConvTranspose**: nearest-neighbor upsampling was replaced with equivalent depthwise ConvTranspose ops so TensorRT can compile the model as a single subgraph instead of splitting it into 8+ pieces
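
To see why the third change preserves the output, here is a small NumPy sketch showing that nearest-neighbor upsampling matches a depthwise transposed convolution whose kernel is all ones and whose stride equals the kernel size. This is an illustration only: the 2x scale factor, the all-ones kernel, and both function names are assumptions, and `fix_onnx.py` may build its ConvTranspose nodes differently.

```python
import numpy as np


def nearest_upsample(x, scale=2):
    """Nearest-neighbor upsampling of an (N, C, H, W) array."""
    return x.repeat(scale, axis=2).repeat(scale, axis=3)


def depthwise_conv_transpose(x, kernel, stride):
    """Depthwise transposed convolution with no padding (one filter per channel).

    x: (N, C, H, W); kernel: (C, kH, kW).
    """
    n, c, h, w = x.shape
    kh, kw = kernel.shape[1:]
    out = np.zeros(
        (n, c, (h - 1) * stride + kh, (w - 1) * stride + kw), dtype=x.dtype
    )
    for i in range(h):
        for j in range(w):
            # Each input pixel scatters a scaled copy of its channel's kernel.
            out[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw] += (
                x[:, :, i:i + 1, j:j + 1] * kernel[None]
            )
    return out


def upsample_via_conv_transpose(x, scale=2):
    """Nearest-neighbor upsampling expressed as a depthwise ConvTranspose."""
    channels = x.shape[1]
    # Assumption: all-ones kernel with stride == kernel size, so the scattered
    # copies never overlap and every output pixel receives exactly one input value.
    kernel = np.ones((channels, scale, scale), dtype=x.dtype)
    return depthwise_conv_transpose(x, kernel, stride=scale)
```

Because the scattered kernel copies do not overlap, the two paths give identical values and not merely close ones, which is what makes the substitution safe for a binarization model.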

All of this is in `fix_onnx.py`. To reproduce from scratch:

```bash
pip install tf2onnx onnx tensorflow
python3 -m tf2onnx.convert --saved-model path/to/saved_model/2022-08-16 --output model.onnx --opset 17
python3 fix_onnx.py model.onnx model_convtranspose.onnx
```

The converted model's output differs from the original TF model's output in fewer than 0.01% of pixels.
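
A pixel-difference figure like this can be checked by counting mismatching pixels between two binarized outputs. The sketch below is one hypothetical way to do that. The function name is made up here, and this README does not say how the original comparison was run.

```python
import numpy as np


def pixel_difference_percent(a, b):
    """Percentage of positions where two same-shaped binarized images differ."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    if a.size == 0:
        raise ValueError("images must not be empty")
    return 100.0 * np.count_nonzero(a != b) / a.size
```

For example, loading the TF and ONNX outputs of the same page as arrays and passing them to `pixel_difference_percent` gives a number that can be compared against the 0.01% threshold.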

## License

Same as the original: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0). All credit to the SBB team and the QURATOR project for the model itself.