# OneOCR — Reverse-Engineered Cross-Platform OCR Pipeline

Full reimplementation of Microsoft's OneOCR engine from Windows Snipping Tool. `.onemodel` encryption cracked, 34 ONNX models extracted, all custom ops replaced — runs on any OS with `onnxruntime`.

---

## Project Status

| Component | Status | Details |
|---|---|---|
| **.onemodel decryption** | ✅ Done | AES-256-CFB128, static key + IV |
| **Model extraction** | ✅ Done | 34 ONNX models, 33 config files |
| **Custom op unlocking** | ✅ Done | `OneOCRFeatureExtract` → `Gemm`/`Conv1x1` |
| **ONNX pipeline** | ⚠️ Partial | **53% match rate** vs DLL (10/19 test images) |
| **DLL pipeline (Windows)** | ✅ Done | ctypes wrapper, 100% accuracy |
| **DLL pipeline (Linux)** | ✅ Done | Wine bridge, 100% accuracy, Docker ready |

### Known ONNX Engine Limitations

The Python reimplementation achieves **53% match rate** against the original DLL. Below is a detailed breakdown of the remaining issues.

#### Issue 1: False FPN2 Detections (4 images)

**Images:** ocr_test 6, 13, 17, 18

**Symptom:** Panel edges / dialog borders detected as text

**Cause:** FPN2 (stride=4) sees edges as text-like textures

**DLL solution:** `SeglinkProposals` — advanced C++ post-processing with multi-stage NMS:

- `textline_hardnms_iou_threshold = 0.32`
- `textline_groupnms_span_ratio_threshold = 0.3`
- `ambiguous_nms_threshold = 0.3` / `ambiguous_overlap_threshold = 0.5`
- `K_of_detections` — per-scale detection limit

#### Issue 2: Missing Small Characters "..." (2 images)

**Images:** ocr_test 7, 14

**Symptom:** Three dots too small to detect

**Cause:** Minimum `min_component_pixels` and `min_area` thresholds insufficient

**DLL solution:** `SeglinkGroup` — groups neighboring segments into a single line

#### Issue 3: Character Recognition Errors (2 images)

**Images:** ocr_test 1, 15

**Symptom:** "iob" instead of "job", extra text from margins

**Cause:** Differences in text cropping/preprocessing

**DLL solution:** `BaseNormalizer` — sophisticated text line normalization

#### Issue 4: Large Images (test.png — 31.8% match)

**Symptom:** 55 of 74 lines detected, some cut off at edges

**Cause:** Adaptive Scaling — DLL scales at multiple levels

**DLL solution:** `AdaptiveScaling` with `AS_LARGE_TEXT_THRESHOLD`

---

## Architecture

```
Image (PIL / numpy)
    │
    ▼
┌──────────────────────────────────┐
│  Detector (model_00)             │  PixelLink FPN (fpn2/3/4)
│  BGR, mean subtraction           │  stride = 4 / 8 / 16
│  → pixel_scores, link_scores     │  8-neighbor, Union-Find
│  → bounding quads (lines)        │  minAreaRect + NMS (IoU 0.2)
└──────────────────────────────────┘
    │
    ▼  for each detected line
┌──────────────────────────────────┐
│  Crop + padding (15%)            │  Axis-aligned / perspective
│  ScriptID (model_01)             │  10 scripts: Latin, CJK, Arabic...
│  RGB / 255.0, height=60px        │  HW/PC classification, flip detection
└──────────────────────────────────┘
    │
    ▼  per script
┌──────────────────────────────────┐
│  Recognizer (model_02–10)        │  DynamicQuantizeLSTM + CTC
│  Per-script character maps       │  Greedy decode with per-char confidence
│  → text + word confidences       │  Word splitting on spaces
└──────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────┐
│  Line grouping & sorting         │  Y-overlap clustering
│  Per-word bounding boxes         │  Proportional quad interpolation
│  Text angle estimation           │  Median of top-edge angles
└──────────────────────────────────┘
```

### Model Registry (34 models)

| Index | Role | Script | Custom Op | Status |
|-------|------|--------|-----------|--------|
| 0 | Detector | Universal | `QLinearSigmoid` | ✅ Works |
| 1 | ScriptID | Universal | — | ✅ Works |
| 2–10 | Recognizers | Latin/CJK/Arabic/Cyrillic/Devanagari/Greek/Hebrew/Tamil/Thai | `DynamicQuantizeLSTM` | ✅ Work |
| 11–21 | LangSm (confidence) | Per-script | `OneOCRFeatureExtract` → **Gemm** | ✅ Unlocked |
| 22–32 | LangMd (confidence) | Per-script | `OneOCRFeatureExtract` → **Gemm** | ✅ Unlocked |
| 33 | LineLayout | Universal | `OneOCRFeatureExtract` → **Conv1x1** | ✅ Unlocked |

---

## Quick Start

### Requirements

```bash
pip install onnxruntime numpy opencv-python-headless Pillow pycryptodome onnx
```

Or with `uv`:

```bash
uv sync --extra extract
```

### Model Extraction (one-time)

```bash
# Full pipeline: decrypt → extract → unlock → verify
python tools/extract_pipeline.py ocr_data/oneocr.onemodel

# Verify existing models only
python tools/extract_pipeline.py --verify-only
```

### Usage

```python
# Recommended: Unified engine (auto-selects best backend)
from ocr.engine_unified import OcrEngineUnified
from PIL import Image

engine = OcrEngineUnified()  # auto: DLL → Wine → ONNX
result = engine.recognize_pil(Image.open("screenshot.png"))

print(f"Backend: {engine.backend_name}")  # "dll" / "wine" / "onnx"
print(result.text)                        # "Hello World"
print(result.average_confidence)          # 0.975

for line in result.lines:
    for word in line.words:
        print(f"  '{word.text}' conf={word.confidence:.0%} "
              f"bbox=({word.bounding_rect.x1:.0f},{word.bounding_rect.y1:.0f})")
```

```bash
# CLI:
python main.py screenshot.png                  # auto backend
python main.py screenshot.png --backend dll    # force DLL (Windows)
python main.py screenshot.png --backend wine   # force Wine (Linux)
python main.py screenshot.png --backend onnx   # force ONNX (any OS)
python main.py screenshot.png -o result.json   # save JSON output
```

### ONNX Engine (alternative — cross-platform, no Wine needed)

```python
from ocr.engine_onnx import OcrEngineOnnx
from PIL import Image

engine = OcrEngineOnnx()
result = engine.recognize_pil(Image.open("screenshot.png"))
print(result.text)
```

### API Reference

```python
engine = OcrEngineOnnx(
    models_dir="path/to/onnx_models",      # optional
    config_dir="path/to/config_data",      # optional
    providers=["CUDAExecutionProvider"],   # optional (default: CPU)
)

# Input formats:
result = engine.recognize_pil(pil_image)     # PIL Image
result = engine.recognize_numpy(rgb_array)   # numpy (H,W,3) RGB
result = engine.recognize_bytes(png_bytes)   # raw bytes (PNG/JPEG)

# Result:
result.text                 # str — full recognized text
result.text_angle           # float — detected rotation angle
result.lines                # list[OcrLine]
result.average_confidence   # float — overall confidence 0-1
result.error                # str | None — error message

# Per-word:
word.text                   # str
word.confidence             # float — CTC confidence per word
word.bounding_rect          # BoundingRect (x1,y1...x4,y4 quadrilateral)
```

---

## Running on Linux (Wine Bridge — 100% accuracy)

The DLL has a remarkably clean dependency profile (only `KERNEL32`, `bcrypt`, `dbghelp` + shipped `onnxruntime.dll`), making it fully compatible with Wine.

### Option A: Docker (recommended)

```bash
# Build
docker build -t oneocr .

# Run OCR on an image
docker run --rm -v $(pwd)/working_space:/data oneocr \
    python main.py /data/input/test.png --output /data/output/result.json

# Interactive shell
docker run --rm -it -v $(pwd)/working_space:/data oneocr bash
```

### Option B: Native Wine

```bash
# 1. Install Wine + MinGW cross-compiler
# Ubuntu/Debian:
sudo apt install wine64 mingw-w64
# Fedora:
sudo dnf install wine mingw64-gcc
# Arch:
sudo pacman -S wine mingw-w64-gcc

# 2. Initialize 64-bit Wine prefix
WINEARCH=win64 wineboot --init

# 3. Compile the Wine loader (one-time)
x86_64-w64-mingw32-gcc -O2 -o tools/oneocr_loader.exe tools/oneocr_loader.c

# 4. Test
python main.py screenshot.png --backend wine
```

### Wine Bridge Architecture

```
Linux Python ──► subprocess (wine64) ──► oneocr_loader.exe ──► oneocr.dll
     ▲                                                             │
     │                                                             ▼
     └──── JSON stdout   ◄────   OCR results   ◄────   onnxruntime.dll
```

**DLL Dependencies (all implemented in Wine ≥ 8.0):**

| DLL | Functions | Wine Status | Notes |
|-----|-----------|-------------|-------|
| `KERNEL32.dll` | 183 | ✅ Full | Standard WinAPI |
| `bcrypt.dll` | 12 | ✅ Full | AES-256-CFB128 for model decryption |
| `dbghelp.dll` | 5 | ✅ Stubs | Debug symbols — non-critical |
| `onnxruntime.dll` | 1 | N/A | Shipped with package |

---

## Project Structure

```
ONEOCR/
├── main.py                    # CLI entry point (auto-selects backend)
├── Dockerfile                 # Docker setup for Linux (Wine + DLL)
├── pyproject.toml             # Project config & dependencies
├── README.md                  # This documentation
├── .gitignore
│
├── ocr/                       # Core OCR package
│   ├── __init__.py            # Exports all engines & models
│   ├── engine.py              # DLL wrapper (Windows only, 374 lines)
│   ├── engine_onnx.py         # ONNX engine (cross-platform, ~1100 lines)
│   ├── engine_unified.py      # Unified wrapper (DLL → Wine → ONNX)
│   └── models.py              # Data models: OcrResult, OcrLine, OcrWord
│
├── tools/                     # Utilities
│   ├── extract_pipeline.py    # Extraction pipeline (decrypt→extract→unlock→verify)
│   ├── visualize_ocr.py       # OCR result visualization with bounding boxes
│   ├── test_quick.py          # Quick OCR test on images
│   ├── wine_bridge.py         # Wine bridge for Linux (C loader + Python API)
│   └── oneocr_loader.c        # C source for Wine loader (auto-generated)
│
├── ocr_data/                  # Runtime data (DO NOT commit)
│   ├── oneocr.dll             # Original DLL (Windows only)
│   ├── oneocr.onemodel        # Encrypted model container
│   └── onnxruntime.dll        # ONNX Runtime DLL
│
├── oneocr_extracted/          # Extracted models (auto-generated)
│   ├── onnx_models/           # 34 raw ONNX (models 11-33 have custom ops)
│   ├── onnx_models_unlocked/  # 23 unlocked (models 11-33, standard ONNX ops)
│   └── config_data/           # Character maps, rnn_info, manifest, configs
│
├── working_space/             # Test images
│   └── input/                 # 19 test images
│
└── _archive/                  # Archive — RE scripts, analyses, prototypes
    ├── temp/re_output/        # DLL reverse engineering results
    ├── attempts/              # Decryption attempts
    ├── analysis/              # Cryptographic analyses
    └── hooks/                 # Frida hooks
```

---

## Technical Details

### .onemodel Encryption

| Element | Value |
|---------|-------|
| Algorithm | AES-256-CFB128 |
| Master Key | `kj)TGtrK>f]b[Piow.gU+nC@s""""""4` (32B) |
| IV | `Copyright @ OneO` (16B) |
| DX key | `SHA256(master_key + file[8:24])` |
| Config key | `SHA256(DX[48:64] + DX[32:48])` |
| Chunk key | `SHA256(chunk_header[16:32] + chunk_header[0:16])` |

### OneOCRFeatureExtract — Cracked Custom Op

Proprietary op (domain `com.microsoft.oneocr`) stores weights as a **big-endian float32** blob in a STRING tensor.
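
The blob layout listed in this section can be decoded with NumPy alone. The sketch below is an illustration, not the project's extraction code: `decode_feature_blob` and `apply_linear` are hypothetical names, the weights are assumed to be stored row-major as `[n_in, n_out]` followed by the bias and trailing metadata, and the demo blob is synthetic random data rather than a real model config.

```python
import numpy as np


def decode_feature_blob(blob: bytes, n_in: int, n_out: int):
    """Split a big-endian float32 blob into (W, b, metadata).

    Assumption: weights come first in row-major [n_in, n_out] order,
    then the bias, then any trailing metadata floats.
    """
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    # ">f4" reads big-endian float32; convert to native float32 for math.
    values = np.frombuffer(blob, dtype=">f4").astype(np.float32)
    n_weights = n_in * n_out
    if values.size < n_weights + n_out:
        raise ValueError("blob too short for the requested shape")
    W = values[:n_weights].reshape(n_in, n_out)
    b = values[n_weights:n_weights + n_out]
    metadata = values[n_weights + n_out:]
    return W, b, metadata


def apply_linear(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Dense layer equivalent of the replacement op: y = x @ W + b."""
    return x @ W + b


# Synthetic stand-in for a 21 -> 50 config blob (1050 + 50 + 23 floats).
rng = np.random.default_rng(0)
demo_blob = rng.standard_normal(21 * 50 + 50 + 23).astype(">f4").tobytes()

W, b, metadata = decode_feature_blob(demo_blob, n_in=21, n_out=50)
y = apply_linear(np.ones((1, 21), dtype=np.float32), W, b)
print(len(demo_blob), W.shape, b.shape, metadata.size, y.shape)
```

With `W` in `[n_in, n_out]` order, `x @ W + b` corresponds to an ONNX `Gemm` with no transposition. If the stored order turns out to be `[n_out, n_in]`, transpose `W` or set `transB=1` on the `Gemm` node instead.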

**Models 11–32** (21→50 features):

```
config_blob (4492B, big-endian float32):
  W[21×50]  = 1050 floats   (weight matrix)
  b[50]     = 50 floats     (bias)
  metadata  = 23 floats     (dimensions [21, 50, 2], flags, calibration)

Replacement: Gemm(input, W^T, b)
```

**Model 33** (256→16 channels):

```
config_blob (16548B, big-endian float32):
  W[256×16] = 4096 floats   (convolution weights)
  b[16]     = 16 floats     (bias)
  metadata  = 25 floats     (dimensions [256, 16], flags)

Replacement: Conv(input, W[in,out].T → [16,256,1,1], b, kernel=1x1)
```

### Detector Configuration (from DLL protobuf manifest)

```
segment_conf_threshold:                 0.7    (field 8)
textline_conf_threshold per-FPN:        P2=0.7, P3=0.8, P4=0.8  (field 9)
textline_nms_threshold:                 0.2    (field 10)
textline_overlap_threshold:             0.4    (field 11)
text_confidence_threshold:              0.8    (field 13)
ambiguous_nms_threshold:                0.3    (field 15)
ambiguous_overlap_threshold:            0.5    (field 16)
ambiguous_save_threshold:               0.4    (field 17)
textline_hardnms_iou_threshold:         0.32   (field 20)
textline_groupnms_span_ratio_threshold: 0.3    (field 21)
```

### PixelLink Detector

- **FPN levels**: fpn2 (stride=4), fpn3 (stride=8), fpn4 (stride=16)
- **Outputs per level**: `scores_hori/vert` (pixel text probability), `link_scores_hori/vert` (8-neighbor connectivity), `bbox_deltas_hori/vert` (corner offsets)
- **Post-processing**: Threshold pixels → Union-Find connected components → bbox regression → NMS
- **Detects TEXT LINES** — word splitting comes from the recognizer

### CTC Recognition

- Target height: 60px, aspect ratio preserved
- Input: RGB / 255.0, NCHW format
- Output: log-softmax [T, 1, N_chars]
- Decoding: greedy argmax with repeat merging + blank removal
- Per-character confidence via `exp(max_logprob)`

---

## DLL Reverse Engineering — Results & Materials

### DLL Source Structure (from debug symbols)

```
C:\__w\1\s\CoreEngine\Native\
├── TextDetector/
│   ├── AdaptiveScaling          ← multi-level image scaling
│   ├── SeglinkProposal          ← KEY: detection post-processing
│   ├── SeglinkGroup.h           ← segment grouping into lines
│   ├── TextLinePolygon          ← precise text contouring
│   ├── RelationRCNNRpn2         ← relational region proposal network
│   ├── BaseRCNN, DQDETR         ← alternative detectors
│   ├── PolyFitting              ← polynomial fitting
│   └── BarcodePolygon           ← barcode detection
│
├── TextRecognizer/
│   ├── TextLineRecognizerImpl   ← main CTC implementation
│   ├── ArgMaxDecoder            ← CTC decoding
│   ├── ConfidenceProcessor      ← confidence models (models 11-21)
│   ├── RejectionProcessor       ← rejection models (models 22-32)
│   ├── DbLstm                   ← dynamic batch LSTM
│   └── CharacterMap/            ← per-script character maps
│
├── TextAnalyzer/
│   ├── TextAnalyzerImpl         ← text layout analysis
│   └── AuxMltClsClassifier      ← auxiliary classifier
│
├── TextNormalizer/
│   ├── BaseNormalizer           ← text line normalization
│   └── ConcatTextLines          ← line concatenation
│
├── TextPipeline/
│   ├── TextPipelineDevImpl      ← main pipeline
│   └── FilterXY                 ← position-based filtering
│
├── CustomOps/onnxruntime/
│   ├── SeglinkProposalsOp       ← ONNX op (NOT in our models)
│   ├── XYSeglinkProposalsOp     ← XY variant
│   └── FeatureExtractOp         ← = Gemm / Conv1x1
│
├── ModelParser/
│   ├── ModelParser              ← .onemodel parsing
│   └── Crypto                   ← AES-256-CFB128
│
└── Common/
    ├── ImageUtility             ← image conversion
    └── ImageFeature             ← image features
```

### RE Materials

Reverse engineering results in `_archive/temp/re_output/`:

- `03_oneocr_classes.txt` — 186 C++ classes
- `06_config_strings.txt` — 429 config strings
- `15_manifest_decoded.txt` — 1182 lines of decoded protobuf manifest
- `09_constants.txt` — 42 float + 14 double constants (800.0, 0.7, 0.8, 0.92...)
- `10_disassembly.txt` — disassembly of key exports

---

## For Future Developers — Roadmap

### Priority 1: SeglinkProposals (hardest, highest impact)

This is the key C++ post-processing in the DLL that is NOT part of the ONNX models. Responsible for ~80% of the differences between the DLL and our implementation.

**What it does:**

1. Takes raw pixel_scores + link_scores + bbox_deltas from all 3 FPN levels
2. Groups segments into lines (SeglinkGroup) — merges neighboring small components into a single line
3. Multi-stage NMS: textline_nms → hardnms → ambiguous_nms → groupnms
4. Confidence filtering with `text_confidence_threshold = 0.8`
5. `K_of_detections` — detection count limit

**Where to look:**

- `_archive/temp/re_output/06_config_strings.txt` — parameter names
- `_archive/temp/re_output/15_manifest_decoded.txt` — parameter values
- `SeglinkProposal` class in DLL — ~2000 lines of C++

**Approach:**

- Decompile `SeglinkProposal::Process` with IDA Pro / Ghidra
- Alternatively: black-box testing of different NMS configurations

### Priority 2: AdaptiveScaling

The DLL dynamically scales images based on text size.

**Parameters:**

- `AS_LARGE_TEXT_THRESHOLD` — large text threshold
- Multi-scale: DLL can run the detector at multiple scales

### Priority 3: BaseNormalizer

The DLL normalizes text crops before recognition more effectively than our simple resize.

### Priority 4: Confidence/Rejection Models (11-32)

The DLL uses models 11-32 to filter results — we skip them. Integration could improve precision by removing false detections.

---

## Performance

| Operation | ONNX (CPU) | DLL | Notes |
|---|---|---|---|
| Detection (PixelLink) | ~50-200ms | ~15-50ms | Model inference + post-processing |
| ScriptID | ~5ms | ~3ms | Single forward pass |
| Recognition (CTC) | ~30ms/line | ~10ms/line | Per-script LSTM |
| Full pipeline | ~300-1000ms | ~15-135ms | Depends on line count |

---

## License

For research and educational purposes only.