| # OneOCR — Reverse-Engineered Cross-Platform OCR Pipeline | |
| Full reimplementation of Microsoft's OneOCR engine from Windows Snipping Tool. | |
| `.onemodel` encryption cracked, 34 ONNX models extracted, all custom ops replaced — runs on any OS with `onnxruntime`. | |
| --- | |
| ## Project Status | |
| | Component | Status | Details | | |
| |---|---|---| | |
| | **.onemodel decryption** | ✅ Done | AES-256-CFB128, static key + IV | | |
| | **Model extraction** | ✅ Done | 34 ONNX models, 33 config files | | |
| | **Custom op unlocking** | ✅ Done | `OneOCRFeatureExtract` → `Gemm`/`Conv1x1` | | |
| | **ONNX pipeline** | ⚠️ Partial | **53% match rate** vs DLL (10/19 test images) | | |
| | **DLL pipeline (Windows)** | ✅ Done | ctypes wrapper, 100% accuracy | | |
| | **DLL pipeline (Linux)** | ✅ Done | Wine bridge, 100% accuracy, Docker ready | | |
| ### Known ONNX Engine Limitations | |
| The Python reimplementation matches the original DLL on 10 of 19 test images (**53% match rate**). The nine remaining images fall into the four groups described below. | |
| #### Issue 1: False FPN2 Detections (4 images) | |
| **Images:** ocr_test 6, 13, 17, 18 | |
| **Symptom:** Panel edges / dialog borders detected as text | |
| **Cause:** FPN2 (stride=4) sees edges as text-like textures | |
| **DLL solution:** `SeglinkProposals` — advanced C++ post-processing with multi-stage NMS: | |
| - `textline_hardnms_iou_threshold = 0.32` | |
| - `textline_groupnms_span_ratio_threshold = 0.3` | |
| - `ambiguous_nms_threshold = 0.3` / `ambiguous_overlap_threshold = 0.5` | |
| - `K_of_detections` — per-scale detection limit | |
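To make the hard-NMS stage concrete, here is a minimal sketch of greedy NMS using the `textline_hardnms_iou_threshold = 0.32` value listed above. It is not the DLL's `SeglinkProposals` code: it assumes axis-aligned boxes rather than quads, and the function names `iou` and `hard_nms` are illustrative only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned (assumption)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes: List[Box], scores: List[float],
             iou_threshold: float = 0.32) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

A near-duplicate of a higher-scoring line is suppressed, while a line elsewhere on the page survives. The group and ambiguous NMS stages would need additional logic that is not sketched here.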
| #### Issue 2: Missing Small Characters "..." (2 images) | |
| **Images:** ocr_test 7, 14 | |
| **Symptom:** The three dots of an ellipsis are too small to be detected | |
| **Cause:** The `min_component_pixels` and `min_area` thresholds filter out components this small | |
| **DLL solution:** `SeglinkGroup` — groups neighboring segments into a single line | |
| #### Issue 3: Character Recognition Errors (2 images) | |
| **Images:** ocr_test 1, 15 | |
| **Symptom:** "iob" instead of "job", extra text from margins | |
| **Cause:** Differences in text cropping/preprocessing | |
| **DLL solution:** `BaseNormalizer` — sophisticated text line normalization | |
| #### Issue 4: Large Images (test.png — 31.8% match) | |
| **Symptom:** 55 of 74 lines detected, some cut off at edges | |
| **Cause:** The ONNX pipeline lacks adaptive scaling; the DLL runs detection at multiple scales | |
| **DLL solution:** `AdaptiveScaling` with `AS_LARGE_TEXT_THRESHOLD` | |
| --- | |
| ## Architecture | |
| ``` | |
| Image (PIL / numpy) | |
| │ | |
| ▼ | |
| ┌──────────────────────────────────┐ | |
| │ Detector (model_00) │ PixelLink FPN (fpn2/3/4) | |
| │ BGR, mean subtraction │ stride = 4 / 8 / 16 | |
| │ → pixel_scores, link_scores │ 8-neighbor, Union-Find | |
| │ → bounding quads (lines) │ minAreaRect + NMS (IoU 0.2) | |
| └──────────────────────────────────┘ | |
| │ | |
| ▼ for each detected line | |
| ┌──────────────────────────────────┐ | |
| │ Crop + padding (15%) │ Axis-aligned / perspective | |
| │ ScriptID (model_01) │ 10 scripts: Latin, CJK, Arabic... | |
| │ RGB / 255.0, height=60px │ HW/PC classification, flip detection | |
| └──────────────────────────────────┘ | |
| │ | |
| ▼ per script | |
| ┌──────────────────────────────────┐ | |
| │ Recognizer (model_02–10) │ DynamicQuantizeLSTM + CTC | |
| │ Per-script character maps │ Greedy decode with per-char confidence | |
| │ → text + word confidences │ Word splitting on spaces | |
| └──────────────────────────────────┘ | |
| │ | |
| ▼ | |
| ┌──────────────────────────────────┐ | |
| │ Line grouping & sorting │ Y-overlap clustering | |
| │ Per-word bounding boxes │ Proportional quad interpolation | |
| │ Text angle estimation │ Median of top-edge angles | |
| └──────────────────────────────────┘ | |
| ``` | |
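The last stage of the diagram estimates the text angle as the median of top-edge angles. A minimal sketch of that idea follows, assuming each line quad is ordered top-left, top-right, bottom-right, bottom-left and that the angle is reported in degrees. Both are assumptions, and the function names are illustrative.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def top_edge_angle(quad: Sequence[Point]) -> float:
    """Angle of the top edge in degrees (assumes quad[0]=top-left, quad[1]=top-right)."""
    (x1, y1), (x2, y2) = quad[0], quad[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def estimate_text_angle(quads: List[Sequence[Point]]) -> float:
    """Median top-edge angle over all detected lines; 0.0 when nothing was detected."""
    if not quads:
        return 0.0
    return median(top_edge_angle(q) for q in quads)
```

Using the median rather than the mean keeps a single badly fitted quad from skewing the page-level angle.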
| ### Model Registry (34 models) | |
| | Index | Role | Script | Custom Op | Status | | |
| |-------|------|--------|-----------|--------| | |
| | 0 | Detector | Universal | `QLinearSigmoid` | ✅ Works | | |
| | 1 | ScriptID | Universal | — | ✅ Works | | |
| | 2–10 | Recognizers | Latin/CJK/Arabic/Cyrillic/Devanagari/Greek/Hebrew/Tamil/Thai | `DynamicQuantizeLSTM` | ✅ Work | | |
| | 11–21 | LangSm (confidence) | Per-script | `OneOCRFeatureExtract` → **Gemm** | ✅ Unlocked | | |
| | 22–32 | LangMd (confidence) | Per-script | `OneOCRFeatureExtract` → **Gemm** | ✅ Unlocked | | |
| | 33 | LineLayout | Universal | `OneOCRFeatureExtract` → **Conv1x1** | ✅ Unlocked | | |
| --- | |
| ## Quick Start | |
| ### Requirements | |
| ```bash | |
| pip install onnxruntime numpy opencv-python-headless Pillow pycryptodome onnx | |
| ``` | |
| Or with `uv`: | |
| ```bash | |
| uv sync --extra extract | |
| ``` | |
| ### Model Extraction (one-time) | |
| ```bash | |
| # Full pipeline: decrypt → extract → unlock → verify | |
| python tools/extract_pipeline.py ocr_data/oneocr.onemodel | |
| # Verify existing models only | |
| python tools/extract_pipeline.py --verify-only | |
| ``` | |
| ### Usage | |
| ```python | |
| # Recommended: Unified engine (auto-selects best backend) | |
| from ocr.engine_unified import OcrEngineUnified | |
| from PIL import Image | |
| engine = OcrEngineUnified() # auto: DLL → Wine → ONNX | |
| result = engine.recognize_pil(Image.open("screenshot.png")) | |
| print(f"Backend: {engine.backend_name}") # "dll" / "wine" / "onnx" | |
| print(result.text) # "Hello World" | |
| print(result.average_confidence) # 0.975 | |
| for line in result.lines: | |
| for word in line.words: | |
| print(f" '{word.text}' conf={word.confidence:.0%} " | |
| f"bbox=({word.bounding_rect.x1:.0f},{word.bounding_rect.y1:.0f})") | |
| ``` | |
| ```bash | |
| # CLI: | |
| python main.py screenshot.png # auto backend | |
| python main.py screenshot.png --backend dll # force DLL (Windows) | |
| python main.py screenshot.png --backend wine # force Wine (Linux) | |
| python main.py screenshot.png --backend onnx # force ONNX (any OS) | |
| python main.py screenshot.png -o result.json # save JSON output | |
| ``` | |
| ### ONNX Engine (alternative — cross-platform, no Wine needed) | |
| ```python | |
| from ocr.engine_onnx import OcrEngineOnnx | |
| from PIL import Image | |
| engine = OcrEngineOnnx() | |
| result = engine.recognize_pil(Image.open("screenshot.png")) | |
| print(result.text) | |
| ``` | |
| ### API Reference | |
| ```python | |
| engine = OcrEngineOnnx( | |
| models_dir="path/to/onnx_models", # optional | |
| config_dir="path/to/config_data", # optional | |
| providers=["CUDAExecutionProvider"], # optional (default: CPU) | |
| ) | |
| # Input formats: | |
| result = engine.recognize_pil(pil_image) # PIL Image | |
| result = engine.recognize_numpy(rgb_array) # numpy (H,W,3) RGB | |
| result = engine.recognize_bytes(png_bytes) # raw bytes (PNG/JPEG) | |
| # Result: | |
| result.text # str — full recognized text | |
| result.text_angle # float — detected rotation angle | |
| result.lines # list[OcrLine] | |
| result.average_confidence # float — overall confidence 0-1 | |
| result.error # str | None — error message | |
| # Per-word: | |
| word.text # str | |
| word.confidence # float — CTC confidence per word | |
| word.bounding_rect # BoundingRect (x1,y1...x4,y4 quadrilateral) | |
| ``` | |
| --- | |
| ## Running on Linux (Wine Bridge — 100% accuracy) | |
| The DLL imports only `KERNEL32`, `bcrypt`, and `dbghelp`, plus the shipped `onnxruntime.dll`, which makes it fully compatible with Wine. | |
| ### Option A: Docker (recommended) | |
| ```bash | |
| # Build | |
| docker build -t oneocr . | |
| # Run OCR on an image | |
| docker run --rm -v "$(pwd)/working_space:/data" oneocr \ | |
| python main.py /data/input/test.png --output /data/output/result.json | |
| # Interactive shell | |
| docker run --rm -it -v "$(pwd)/working_space:/data" oneocr bash | |
| ``` | |
| ### Option B: Native Wine | |
| ```bash | |
| # 1. Install Wine + MinGW cross-compiler | |
| # Ubuntu/Debian: | |
| sudo apt install wine64 mingw-w64 | |
| # Fedora: | |
| sudo dnf install wine mingw64-gcc | |
| # Arch: | |
| sudo pacman -S wine mingw-w64-gcc | |
| # 2. Initialize 64-bit Wine prefix | |
| WINEARCH=win64 wineboot --init | |
| # 3. Compile the Wine loader (one-time) | |
| x86_64-w64-mingw32-gcc -O2 -o tools/oneocr_loader.exe tools/oneocr_loader.c | |
| # 4. Test | |
| python main.py screenshot.png --backend wine | |
| ``` | |
| ### Wine Bridge Architecture | |
| ``` | |
| Linux Python ──► subprocess (wine64) ──► oneocr_loader.exe ──► oneocr.dll | |
| ▲ │ | |
| │ ▼ | |
| └──── JSON stdout ◄──── OCR results ◄──── onnxruntime.dll | |
| ``` | |
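The bridge in the diagram can be sketched in a few lines of Python. This is not the code in `tools/wine_bridge.py`: the loader's command-line interface (a single image-path argument) and the JSON schema are assumptions, and `run_wine_loader` is a hypothetical name. The `runner` parameter exists only so the function can be exercised without Wine installed.

```python
import json
import subprocess
from typing import Any, Callable, Dict


def run_wine_loader(image_path: str,
                    loader: str = "tools/oneocr_loader.exe",
                    runner: Callable[..., Any] = subprocess.run,
                    timeout: float = 60.0) -> Dict[str, Any]:
    """Run the loader under wine64 and parse the JSON it prints to stdout."""
    # Assumption: the loader takes the image path as its only argument.
    proc = runner(["wine64", loader, image_path],
                  capture_output=True, text=True, timeout=timeout, check=False)
    if proc.returncode != 0:
        raise RuntimeError(
            f"loader exited with code {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)
```

Keeping the subprocess call behind an injectable `runner` makes it straightforward to unit-test the parsing path on machines without Wine.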
| **DLL Dependencies (all implemented in Wine ≥ 8.0):** | |
| | DLL | Functions | Wine Status | Notes | | |
| |-----|-----------|-------------|-------| | |
| | `KERNEL32.dll` | 183 | ✅ Full | Standard WinAPI | | |
| | `bcrypt.dll` | 12 | ✅ Full | AES-256-CFB128 for model decryption | | |
| | `dbghelp.dll` | 5 | ✅ Stubs | Debug symbols — non-critical | | |
| | `onnxruntime.dll` | 1 | N/A | Shipped with package | | |
| --- | |
| ## Project Structure | |
| ``` | |
| ONEOCR/ | |
| ├── main.py # CLI entry point (auto-selects backend) | |
| ├── Dockerfile # Docker setup for Linux (Wine + DLL) | |
| ├── pyproject.toml # Project config & dependencies | |
| ├── README.md # This documentation | |
| ├── .gitignore | |
| │ | |
| ├── ocr/ # Core OCR package | |
| │ ├── __init__.py # Exports all engines & models | |
| │ ├── engine.py # DLL wrapper (Windows only, 374 lines) | |
| │ ├── engine_onnx.py # ONNX engine (cross-platform, ~1100 lines) | |
| │ ├── engine_unified.py # Unified wrapper (DLL → Wine → ONNX) | |
| │ └── models.py # Data models: OcrResult, OcrLine, OcrWord | |
| │ | |
| ├── tools/ # Utilities | |
| │ ├── extract_pipeline.py # Extraction pipeline (decrypt→extract→unlock→verify) | |
| │ ├── visualize_ocr.py # OCR result visualization with bounding boxes | |
| │ ├── test_quick.py # Quick OCR test on images | |
| │ ├── wine_bridge.py # Wine bridge for Linux (C loader + Python API) | |
| │ └── oneocr_loader.c # C source for Wine loader (auto-generated) | |
| │ | |
| ├── ocr_data/ # Runtime data (DO NOT commit) | |
| │ ├── oneocr.dll # Original DLL (Windows only) | |
| │ ├── oneocr.onemodel # Encrypted model container | |
| │ └── onnxruntime.dll # ONNX Runtime DLL | |
| │ | |
| ├── oneocr_extracted/ # Extracted models (auto-generated) | |
| │ ├── onnx_models/ # 34 raw ONNX (models 11-33 have custom ops) | |
| │ ├── onnx_models_unlocked/ # 23 unlocked (models 11-33, standard ONNX ops) | |
| │ └── config_data/ # Character maps, rnn_info, manifest, configs | |
| │ | |
| ├── working_space/ # Test images | |
| │ └── input/ # 19 test images | |
| │ | |
| └── _archive/ # Archive — RE scripts, analyses, prototypes | |
| ├── temp/re_output/ # DLL reverse engineering results | |
| ├── attempts/ # Decryption attempts | |
| ├── analysis/ # Cryptographic analyses | |
| └── hooks/ # Frida hooks | |
| ``` | |
| --- | |
| ## Technical Details | |
| ### .onemodel Encryption | |
| | Element | Value | | |
| |---------|-------| | |
| | Algorithm | AES-256-CFB128 | | |
| | Master Key | `kj)TGtrK>f]b[Piow.gU+nC@s""""""4` (32B) | | |
| | IV | `Copyright @ OneO` (16B) | | |
| | DX key | `SHA256(master_key + file[8:24])` | | |
| | Config key | `SHA256(DX[48:64] + DX[32:48])` | | |
| | Chunk key | `SHA256(chunk_header[16:32] + chunk_header[0:16])` | | |
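The key derivations in the table map directly onto `hashlib`. The sketch below mirrors the three SHA-256 rows and adds a CFB128 decrypt helper based on pycryptodome, which is already in the requirements. It assumes that `DX` in the config-key row means the decrypted DX block rather than the DX key, and the function names are illustrative rather than taken from `tools/extract_pipeline.py`.

```python
import hashlib

MASTER_KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'  # 32 bytes, from the table
IV = b"Copyright @ OneO"                          # 16 bytes, from the table


def derive_dx_key(file_bytes: bytes) -> bytes:
    """DX key = SHA256(master_key + file[8:24])."""
    return hashlib.sha256(MASTER_KEY + file_bytes[8:24]).digest()


def derive_config_key(dx_block: bytes) -> bytes:
    """Config key = SHA256(DX[48:64] + DX[32:48]); dx_block is assumed to be the decrypted DX block."""
    return hashlib.sha256(dx_block[48:64] + dx_block[32:48]).digest()


def derive_chunk_key(chunk_header: bytes) -> bytes:
    """Chunk key = SHA256(chunk_header[16:32] + chunk_header[0:16])."""
    return hashlib.sha256(chunk_header[16:32] + chunk_header[0:16]).digest()


def aes_cfb128_decrypt(key: bytes, data: bytes, iv: bytes = IV) -> bytes:
    """AES-256 in CFB mode with a 128-bit segment size."""
    from Crypto.Cipher import AES  # pycryptodome, imported lazily
    return AES.new(key, AES.MODE_CFB, iv=iv, segment_size=128).decrypt(data)
```

Each derived key is 32 bytes, which is what AES-256 expects. Where each block starts and ends inside the container is not covered here.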
| ### OneOCRFeatureExtract — Cracked Custom Op | |
| The proprietary op (domain `com.microsoft.oneocr`) stores its weights as a **big-endian float32** blob in a STRING tensor. | |
| **Models 11–32** (21→50 features): | |
| ``` | |
| config_blob (4492B, big-endian float32): | |
| W[21×50] = 1050 floats (weight matrix) | |
| b[50] = 50 floats (bias) | |
| metadata = 23 floats (dimensions [21, 50, 2], flags, calibration) | |
| Replacement: Gemm(input, W^T, b) | |
| ``` | |
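As an illustration of the layout above, the following sketch unpacks a 4492-byte blob with `struct` and applies the resulting linear layer to one feature vector. It assumes the blob holds W, then b, then metadata in that byte order, and that W is stored row-major as `[21][50]`. Neither is confirmed here, so swap the indexing if the blob actually holds the transpose. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

N_IN, N_OUT, N_META = 21, 50, 23  # sizes from the layout above


def parse_feature_blob(blob: bytes) -> Tuple[List[List[float]], List[float], List[float]]:
    """Split a big-endian float32 blob into (W, b, metadata)."""
    n_w = N_IN * N_OUT                    # 1050 weights
    n_floats = n_w + N_OUT + N_META       # 1123 floats = 4492 bytes
    if len(blob) != 4 * n_floats:
        raise ValueError(f"expected {4 * n_floats} bytes, got {len(blob)}")
    values = struct.unpack(f">{n_floats}f", blob)
    # Assumption: W is row-major, one row of N_OUT values per input feature.
    weights = [list(values[r * N_OUT:(r + 1) * N_OUT]) for r in range(N_IN)]
    bias = list(values[n_w:n_w + N_OUT])
    meta = list(values[n_w + N_OUT:])
    return weights, bias, meta


def apply_linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """y = x @ W + b for a single input vector (what the Gemm replacement computes)."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]
```

The size check doubles as a sanity test: 1050 + 50 + 23 floats at 4 bytes each is exactly the 4492 bytes stated above.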
| **Model 33** (256→16 channels): | |
| ``` | |
| config_blob (16548B, big-endian float32): | |
| W[256×16] = 4096 floats (convolution weights) | |
| b[16] = 16 floats (bias) | |
| metadata = 25 floats (dimensions [256, 16], flags) | |
| Replacement: Conv(input, W[in,out].T → [16,256,1,1], b, kernel=1x1) | |
| ``` | |
| ### Detector Configuration (from DLL protobuf manifest) | |
| ``` | |
| segment_conf_threshold: 0.7 (field 8) | |
| textline_conf_threshold per-FPN: P2=0.7, P3=0.8, P4=0.8 (field 9) | |
| textline_nms_threshold: 0.2 (field 10) | |
| textline_overlap_threshold: 0.4 (field 11) | |
| text_confidence_threshold: 0.8 (field 13) | |
| ambiguous_nms_threshold: 0.3 (field 15) | |
| ambiguous_overlap_threshold: 0.5 (field 16) | |
| ambiguous_save_threshold: 0.4 (field 17) | |
| textline_hardnms_iou_threshold: 0.32 (field 20) | |
| textline_groupnms_span_ratio_threshold: 0.3 (field 21) | |
| ``` | |
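One way to carry these values through a Python pipeline is a frozen dataclass. The sketch below copies the thresholds from the listing and shows the per-FPN confidence filter. The class name, the `(level, confidence, payload)` tuple shape, and the use of `>=` for the comparison are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class DetectorConfig:
    """Thresholds copied from the manifest listing above."""
    segment_conf_threshold: float = 0.7                   # field 8
    textline_conf_threshold: Dict[str, float] = field(    # field 9
        default_factory=lambda: {"P2": 0.7, "P3": 0.8, "P4": 0.8})
    textline_nms_threshold: float = 0.2                   # field 10
    textline_overlap_threshold: float = 0.4               # field 11
    text_confidence_threshold: float = 0.8                # field 13
    ambiguous_nms_threshold: float = 0.3                  # field 15
    ambiguous_overlap_threshold: float = 0.5              # field 16
    ambiguous_save_threshold: float = 0.4                 # field 17
    textline_hardnms_iou_threshold: float = 0.32          # field 20
    textline_groupnms_span_ratio_threshold: float = 0.3   # field 21


Detection = Tuple[str, float, Any]  # (fpn_level, confidence, payload), hypothetical shape


def filter_by_level(dets: List[Detection], cfg: DetectorConfig) -> List[Detection]:
    """Keep detections whose confidence meets the threshold of their FPN level."""
    return [d for d in dets if d[1] >= cfg.textline_conf_threshold[d[0]]]
```

Note that P2 has the lowest line threshold (0.7), so a detection that would be dropped at P3 or P4 can still pass at P2.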
| ### PixelLink Detector | |
| - **FPN levels**: fpn2 (stride=4), fpn3 (stride=8), fpn4 (stride=16) | |
| - **Outputs per level**: `scores_hori/vert` (pixel text probability), `link_scores_hori/vert` (8-neighbor connectivity), `bbox_deltas_hori/vert` (corner offsets) | |
| - **Post-processing**: Threshold pixels → Union-Find connected components → bbox regression → NMS | |
| - **Detects TEXT LINES** — word splitting comes from the recognizer | |
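The thresholding and Union-Find steps can be sketched in pure Python as follows. This is a simplified illustration, not the code in `engine_onnx.py`: the order of the 8 link channels, the `link_thr` default, and the reuse of 0.7 as the pixel threshold are assumptions, and bbox regression and NMS are left out.

```python
from typing import Dict, List, Sequence, Tuple

# 8-neighborhood offsets as (dy, dx); the channel order is an assumption.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def find(parent: List[int], i: int) -> int:
    """Find the root of i with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def connected_components(pixel_scores: Sequence[Sequence[float]],
                         link_scores: Sequence[Sequence[Sequence[float]]],
                         pixel_thr: float = 0.7,
                         link_thr: float = 0.5) -> List[List[Tuple[int, int]]]:
    """Group text pixels joined by a positive link; returns lists of (x, y)."""
    h, w = len(pixel_scores), len(pixel_scores[0])
    parent = list(range(h * w))
    text = [[pixel_scores[y][x] >= pixel_thr for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            if not text[y][x]:
                continue
            for k, (dy, dx) in enumerate(NEIGHBORS):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and text[ny][nx]
                        and link_scores[y][x][k] >= link_thr):
                    ra, rb = find(parent, y * w + x), find(parent, ny * w + nx)
                    if ra != rb:
                        parent[rb] = ra  # union the two components
    groups: Dict[int, List[Tuple[int, int]]] = {}
    for y in range(h):
        for x in range(w):
            if text[y][x]:
                groups.setdefault(find(parent, y * w + x), []).append((x, y))
    return list(groups.values())
```

Each returned component is a candidate text line. A production version would vectorize this with numpy, but the grouping logic is the same.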
| ### CTC Recognition | |
| - Target height: 60px, aspect ratio preserved | |
| - Input: RGB / 255.0, NCHW format | |
| - Output: log-softmax [T, 1, N_chars] | |
| - Decoding: greedy argmax with repeat merging + blank removal | |
| - Per-character confidence via `exp(max_logprob)` | |
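The decoding rule above (greedy argmax, merge repeats, drop blanks, confidence from `exp(max_logprob)`) fits in a few lines. The sketch below works on a `[T][N_chars]` list of log-probabilities for a single line. It assumes the blank class is index 0 and takes each character's confidence from the first frame of its run, neither of which is confirmed by the source.

```python
import math
from typing import List, Sequence, Tuple


def ctc_greedy_decode(log_probs: Sequence[Sequence[float]],
                      charset: Sequence[str],
                      blank: int = 0) -> Tuple[str, List[float]]:
    """Return (text, per-character confidences) for one text line."""
    chars: List[str] = []
    confs: List[float] = []
    prev = blank
    for frame in log_probs:
        idx = max(range(len(frame)), key=frame.__getitem__)  # argmax over classes
        # Emit only when the class changes and is not the blank symbol.
        if idx != blank and idx != prev:
            chars.append(charset[idx])
            confs.append(math.exp(frame[idx]))  # exp(max log-prob) of this frame
        prev = idx
    return "".join(chars), confs
```

A blank frame between two identical classes is what lets doubled letters survive the repeat merge. Word confidences can then be aggregated from the per-character values after splitting on spaces.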
| --- | |
| ## DLL Reverse Engineering — Results & Materials | |
| ### DLL Source Structure (from debug symbols) | |
| ``` | |
| C:\__w\1\s\CoreEngine\Native\ | |
| ├── TextDetector/ | |
| │ ├── AdaptiveScaling ← multi-level image scaling | |
| │ ├── SeglinkProposal ← KEY: detection post-processing | |
| │ ├── SeglinkGroup.h ← segment grouping into lines | |
| │ ├── TextLinePolygon ← precise text contouring | |
| │ ├── RelationRCNNRpn2 ← relational region proposal network | |
| │ ├── BaseRCNN, DQDETR ← alternative detectors | |
| │ ├── PolyFitting ← polynomial fitting | |
| │ └── BarcodePolygon ← barcode detection | |
| │ | |
| ├── TextRecognizer/ | |
| │ ├── TextLineRecognizerImpl ← main CTC implementation | |
| │ ├── ArgMaxDecoder ← CTC decoding | |
| │ ├── ConfidenceProcessor ← confidence models (models 11-21) | |
| │ ├── RejectionProcessor ← rejection models (models 22-32) | |
| │ ├── DbLstm ← dynamic batch LSTM | |
| │ └── CharacterMap/ ← per-script character maps | |
| │ | |
| ├── TextAnalyzer/ | |
| │ ├── TextAnalyzerImpl ← text layout analysis | |
| │ └── AuxMltClsClassifier ← auxiliary classifier | |
| │ | |
| ├── TextNormalizer/ | |
| │ ├── BaseNormalizer ← text line normalization | |
| │ └── ConcatTextLines ← line concatenation | |
| │ | |
| ├── TextPipeline/ | |
| │ ├── TextPipelineDevImpl ← main pipeline | |
| │ └── FilterXY ← position-based filtering | |
| │ | |
| ├── CustomOps/onnxruntime/ | |
| │ ├── SeglinkProposalsOp ← ONNX op (NOT in our models) | |
| │ ├── XYSeglinkProposalsOp ← XY variant | |
| │ └── FeatureExtractOp ← = Gemm / Conv1x1 | |
| │ | |
| ├── ModelParser/ | |
| │ ├── ModelParser ← .onemodel parsing | |
| │ └── Crypto ← AES-256-CFB128 | |
| │ | |
| └── Common/ | |
| ├── ImageUtility ← image conversion | |
| └── ImageFeature ← image features | |
| ``` | |
| ### RE Materials | |
| Reverse engineering results are stored in `_archive/temp/re_output/`: | |
| - `03_oneocr_classes.txt` — 186 C++ classes | |
| - `06_config_strings.txt` — 429 config strings | |
| - `15_manifest_decoded.txt` — 1182 lines of decoded protobuf manifest | |
| - `09_constants.txt` — 42 float + 14 double constants (800.0, 0.7, 0.8, 0.92...) | |
| - `10_disassembly.txt` — disassembly of key exports | |
| --- | |
| ## For Future Developers — Roadmap | |
| ### Priority 1: SeglinkProposals (hardest, highest impact) | |
| This is the key C++ post-processing step in the DLL; it is NOT part of the ONNX models. | |
| It accounts for roughly 80% of the differences between the DLL and our implementation. | |
| **What it does:** | |
| 1. Takes raw pixel_scores + link_scores + bbox_deltas from all 3 FPN levels | |
| 2. Groups segments into lines (SeglinkGroup) — merges neighboring small components into a single line | |
| 3. Multi-stage NMS: textline_nms → hardnms → ambiguous_nms → groupnms | |
| 4. Confidence filtering with `text_confidence_threshold = 0.8` | |
| 5. `K_of_detections` — detection count limit | |
| **Where to look:** | |
| - `_archive/temp/re_output/06_config_strings.txt` — parameter names | |
| - `_archive/temp/re_output/15_manifest_decoded.txt` — parameter values | |
| - `SeglinkProposal` class in DLL — ~2000 lines of C++ | |
| **Approach:** | |
| - Decompile `SeglinkProposal::Process` with IDA Pro / Ghidra | |
| - Alternatively: black-box testing of different NMS configurations | |
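For the black-box route, a small harness that scores candidate NMS settings against DLL output might look like the sketch below. The definition of a match (exact text equality per image after whitespace normalization) is an assumption about how the match rate is measured, and `run_onnx` is a hypothetical callable that runs the ONNX pipeline on the test set with a given configuration.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def normalize(text: str) -> str:
    """Collapse whitespace so layout differences do not count as mismatches."""
    return " ".join(text.split())


def match_rate(reference: Sequence[str], candidate: Sequence[str]) -> float:
    """Fraction of images whose text equals the DLL reference."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must cover the same images")
    if not reference:
        return 0.0
    hits = sum(normalize(r) == normalize(c) for r, c in zip(reference, candidate))
    return hits / len(reference)


def sweep(run_onnx: Callable[[Dict[str, float]], List[str]],
          reference: Sequence[str],
          grid: Dict[str, Sequence[float]]) -> Tuple[Dict[str, float], float]:
    """Try every combination in the grid and return the best (config, match rate)."""
    names = list(grid)
    best_cfg: Dict[str, float] = {}
    best_score = -1.0
    for values in product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = match_rate(reference, run_onnx(cfg))
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With 19 test images, a full grid over three or four thresholds is cheap enough to run exhaustively. A larger test set would reduce the risk of overfitting the thresholds to these particular screenshots.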
| ### Priority 2: AdaptiveScaling | |
| The DLL dynamically scales images based on text size. | |
| **Parameters:** | |
| - `AS_LARGE_TEXT_THRESHOLD` — large text threshold | |
| - Multi-scale: DLL can run the detector at multiple scales | |
| ### Priority 3: BaseNormalizer | |
| The DLL normalizes text crops before recognition more effectively than our simple resize. | |
| ### Priority 4: Confidence/Rejection Models (11-32) | |
| The DLL uses models 11-32 to filter results; the ONNX pipeline currently skips them. | |
| Integrating them could improve precision by removing false detections. | |
| --- | |
| ## Performance | |
| | Operation | ONNX (CPU) | DLL | Notes | | |
| |---|---|---|---| | |
| | Detection (PixelLink) | ~50-200ms | ~15-50ms | Model inference + post-processing | | |
| | ScriptID | ~5ms | ~3ms | Single forward pass | | |
| | Recognition (CTC) | ~30ms/line | ~10ms/line | Per-script LSTM | | |
| | Full pipeline | ~300-1000ms | ~15-135ms | Depends on line count | | |
| --- | |
| ## License | |
| For research and educational purposes only. | |