OneOCR — Reverse-Engineered Cross-Platform OCR Pipeline
Full reimplementation of Microsoft's OneOCR engine from Windows Snipping Tool. The .onemodel encryption is cracked, 34 ONNX models are extracted, and all custom ops are replaced, so the pipeline runs on any OS with onnxruntime.
Project Status
| Component | Status | Details |
|---|---|---|
| .onemodel decryption | ✅ Done | AES-256-CFB128, static key + IV |
| Model extraction | ✅ Done | 34 ONNX models, 33 config files |
| Custom op unlocking | ✅ Done | OneOCRFeatureExtract → Gemm/Conv1x1 |
| ONNX pipeline | ⚠️ Partial | 53% match rate vs DLL (10/19 test images) |
| DLL pipeline (Windows) | ✅ Done | ctypes wrapper, 100% accuracy |
| DLL pipeline (Linux) | ✅ Done | Wine bridge, 100% accuracy, Docker ready |
Known ONNX Engine Limitations
The Python reimplementation achieves a 53% match rate against the original DLL (10 of 19 test images). The remaining issues are broken down below.
Issue 1: False FPN2 Detections (4 images)
Images: ocr_test 6, 13, 17, 18
Symptom: Panel edges / dialog borders detected as text
Cause: FPN2 (stride=4) sees edges as text-like textures
DLL solution: SeglinkProposals — advanced C++ post-processing with multi-stage NMS:
- textline_hardnms_iou_threshold = 0.32
- textline_groupnms_span_ratio_threshold = 0.3
- ambiguous_nms_threshold = 0.3 / ambiguous_overlap_threshold = 0.5
- K_of_detections: per-scale detection limit
Issue 2: Missing Small Characters "..." (2 images)
Images: ocr_test 7, 14
Symptom: Three dots too small to detect
Cause: The minimum-size thresholds (min_component_pixels, min_area) are insufficient for components this small
DLL solution: SeglinkGroup — groups neighboring segments into a single line
Issue 3: Character Recognition Errors (2 images)
Images: ocr_test 1, 15
Symptom: "iob" instead of "job", extra text from margins
Cause: Differences in text cropping/preprocessing
DLL solution: BaseNormalizer — sophisticated text line normalization
Issue 4: Large Images (test.png — 31.8% match)
Symptom: 55 of 74 lines detected, some cut off at edges
Cause: Missing adaptive scaling; the DLL scales the image at multiple levels
DLL solution: AdaptiveScaling with AS_LARGE_TEXT_THRESHOLD
Architecture
Image (PIL / numpy)
│
▼
┌──────────────────────────────────┐
│ Detector (model_00) │ PixelLink FPN (fpn2/3/4)
│ BGR, mean subtraction │ stride = 4 / 8 / 16
│ → pixel_scores, link_scores │ 8-neighbor, Union-Find
│ → bounding quads (lines) │ minAreaRect + NMS (IoU 0.2)
└──────────────────────────────────┘
│
▼ for each detected line
┌──────────────────────────────────┐
│ Crop + padding (15%) │ Axis-aligned / perspective
│ ScriptID (model_01) │ 10 scripts: Latin, CJK, Arabic...
│ RGB / 255.0, height=60px │ HW/PC classification, flip detection
└──────────────────────────────────┘
│
▼ per script
┌──────────────────────────────────┐
│ Recognizer (model_02–10) │ DynamicQuantizeLSTM + CTC
│ Per-script character maps │ Greedy decode with per-char confidence
│ → text + word confidences │ Word splitting on spaces
└──────────────────────────────────┘
│
▼
┌──────────────────────────────────┐
│ Line grouping & sorting │ Y-overlap clustering
│ Per-word bounding boxes │ Proportional quad interpolation
│ Text angle estimation │ Median of top-edge angles
└──────────────────────────────────┘
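The last stage of the diagram (Y-overlap clustering and the median of top-edge angles) can be sketched in plain Python. This is an illustration under assumptions, not the code in engine_onnx.py: the quad point order (top-left first, clockwise), the `min_overlap` ratio of 0.5, and all function names are hypothetical.

```python
import math
from statistics import median

def top_edge_angle(quad):
    """Angle in degrees of the top edge (point 1 -> point 2) of a quad."""
    (x1, y1), (x2, y2) = quad[0], quad[1]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def estimate_text_angle(quads):
    """Median of the top-edge angles; 0.0 when nothing was detected."""
    return median(top_edge_angle(q) for q in quads) if quads else 0.0

def group_by_y_overlap(quads, min_overlap=0.5):
    """Cluster quads into rows when their vertical extents overlap enough.

    min_overlap is an assumed ratio relative to the shorter of the two extents.
    """
    rows = []
    for q in sorted(quads, key=lambda q: min(y for _, y in q)):
        top = min(y for _, y in q)
        bottom = max(y for _, y in q)
        for row in rows:
            overlap = min(bottom, row["bottom"]) - max(top, row["top"])
            shorter = min(bottom - top, row["bottom"] - row["top"])
            if shorter > 0 and overlap / shorter >= min_overlap:
                row["quads"].append(q)
                row["top"] = min(row["top"], top)
                row["bottom"] = max(row["bottom"], bottom)
                break
        else:
            rows.append({"top": top, "bottom": bottom, "quads": [q]})
    # Within each row, order quads left to right.
    return [sorted(r["quads"], key=lambda q: min(x for x, _ in q)) for r in rows]
```

Rows come out top to bottom because the quads are visited in order of their top coordinate.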
Model Registry (34 models)
| Index | Role | Script | Custom Op | Status |
|---|---|---|---|---|
| 0 | Detector | Universal | QLinearSigmoid | ✅ Works |
| 1 | ScriptID | Universal | — | ✅ Works |
| 2–10 | Recognizers | Latin/CJK/Arabic/Cyrillic/Devanagari/Greek/Hebrew/Tamil/Thai | DynamicQuantizeLSTM | ✅ Work |
| 11–21 | LangSm (confidence) | Per-script | OneOCRFeatureExtract → Gemm | ✅ Unlocked |
| 22–32 | LangMd (confidence) | Per-script | OneOCRFeatureExtract → Gemm | ✅ Unlocked |
| 33 | LineLayout | Universal | OneOCRFeatureExtract → Conv1x1 | ✅ Unlocked |
Quick Start
Requirements
pip install onnxruntime numpy opencv-python-headless Pillow pycryptodome onnx
Or with uv:
uv sync --extra extract
Model Extraction (one-time)
# Full pipeline: decrypt → extract → unlock → verify
python tools/extract_pipeline.py ocr_data/oneocr.onemodel
# Verify existing models only
python tools/extract_pipeline.py --verify-only
Usage
# Recommended: Unified engine (auto-selects best backend)
from ocr.engine_unified import OcrEngineUnified
from PIL import Image
engine = OcrEngineUnified() # auto: DLL → Wine → ONNX
result = engine.recognize_pil(Image.open("screenshot.png"))
print(f"Backend: {engine.backend_name}") # "dll" / "wine" / "onnx"
print(result.text) # "Hello World"
print(result.average_confidence) # 0.975
for line in result.lines:
    for word in line.words:
        print(f" '{word.text}' conf={word.confidence:.0%} "
              f"bbox=({word.bounding_rect.x1:.0f},{word.bounding_rect.y1:.0f})")
# CLI:
python main.py screenshot.png # auto backend
python main.py screenshot.png --backend dll # force DLL (Windows)
python main.py screenshot.png --backend wine # force Wine (Linux)
python main.py screenshot.png --backend onnx # force ONNX (any OS)
python main.py screenshot.png -o result.json # save JSON output
ONNX Engine (alternative — cross-platform, no Wine needed)
from ocr.engine_onnx import OcrEngineOnnx
from PIL import Image
engine = OcrEngineOnnx()
result = engine.recognize_pil(Image.open("screenshot.png"))
print(result.text)
API Reference
engine = OcrEngineOnnx(
models_dir="path/to/onnx_models", # optional
config_dir="path/to/config_data", # optional
providers=["CUDAExecutionProvider"], # optional (default: CPU)
)
# Input formats:
result = engine.recognize_pil(pil_image) # PIL Image
result = engine.recognize_numpy(rgb_array) # numpy (H,W,3) RGB
result = engine.recognize_bytes(png_bytes) # raw bytes (PNG/JPEG)
# Result:
result.text # str — full recognized text
result.text_angle # float — detected rotation angle
result.lines # list[OcrLine]
result.average_confidence # float — overall confidence 0-1
result.error # str | None — error message
# Per-word:
word.text # str
word.confidence # float — CTC confidence per word
word.bounding_rect # BoundingRect (x1,y1...x4,y4 quadrilateral)
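As a small usage sketch of the result fields listed above, the helper below collects words whose confidence falls under a threshold. It relies only on the documented attributes (`error`, `lines`, `words`, `text`, `confidence`); the helper name, the 0.8 default, and the stand-in objects are hypothetical, so it runs without the engine installed.

```python
from types import SimpleNamespace

def low_confidence_words(result, threshold=0.8):
    """Return (text, confidence) for every word below `threshold`."""
    if result.error:
        raise RuntimeError(f"OCR failed: {result.error}")
    return [
        (word.text, word.confidence)
        for line in result.lines
        for word in line.words
        if word.confidence < threshold
    ]

# Stand-in for an OcrResult, built from the documented attributes only.
demo = SimpleNamespace(
    error=None,
    lines=[
        SimpleNamespace(words=[
            SimpleNamespace(text="Hello", confidence=0.98),
            SimpleNamespace(text="W0rld", confidence=0.61),
        ])
    ],
)
print(low_confidence_words(demo))
```

With a real engine, the same function would presumably accept the object returned by `engine.recognize_pil(...)`.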
Running on Linux (Wine Bridge — 100% accuracy)
The DLL has a remarkably clean dependency profile (only KERNEL32, bcrypt, dbghelp + shipped onnxruntime.dll), making it fully compatible with Wine.
Option A: Docker (recommended)
# Build
docker build -t oneocr .
# Run OCR on an image
docker run --rm -v $(pwd)/working_space:/data oneocr \
python main.py /data/input/test.png --output /data/output/result.json
# Interactive shell
docker run --rm -it -v $(pwd)/working_space:/data oneocr bash
Option B: Native Wine
# 1. Install Wine + MinGW cross-compiler
# Ubuntu/Debian:
sudo apt install wine64 mingw-w64
# Fedora:
sudo dnf install wine mingw64-gcc
# Arch:
sudo pacman -S wine mingw-w64-gcc
# 2. Initialize 64-bit Wine prefix
WINEARCH=win64 wineboot --init
# 3. Compile the Wine loader (one-time)
x86_64-w64-mingw32-gcc -O2 -o tools/oneocr_loader.exe tools/oneocr_loader.c
# 4. Test
python main.py screenshot.png --backend wine
Wine Bridge Architecture
Linux Python ──► subprocess (wine64) ──► oneocr_loader.exe ──► oneocr.dll
▲ │
│ ▼
└──── JSON stdout ◄──── OCR results ◄──── onnxruntime.dll
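The flow in the diagram (Python starts a subprocess, the loader prints JSON on stdout) might look roughly like the sketch below. It is not the code in tools/wine_bridge.py: the argument layout in `wine_command` and the JSON shape are assumptions, and a Python one-liner stands in for the loader so the example runs without Wine.

```python
import json
import subprocess
import sys

def run_loader(command, timeout=60):
    """Run a loader command and parse its stdout as JSON."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(f"loader exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)

def wine_command(image_path, loader="tools/oneocr_loader.exe"):
    # Hypothetical argument layout: the real loader's CLI is not documented here.
    return ["wine64", loader, image_path]

# Stand-in for the loader process; the JSON payload is illustrative only.
fake_loader = [sys.executable, "-c",
               "import json; print(json.dumps({'text': 'Hello World'}))"]
result = run_loader(fake_loader)
print(result["text"])
```

Keeping the command construction separate from the runner makes it easy to swap the stand-in for `wine_command("screenshot.png")` on a machine with Wine.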
DLL Dependencies (all implemented in Wine ≥ 8.0):
| DLL | Functions | Wine Status | Notes |
|---|---|---|---|
| KERNEL32.dll | 183 | ✅ Full | Standard WinAPI |
| bcrypt.dll | 12 | ✅ Full | AES-256-CFB128 for model decryption |
| dbghelp.dll | 5 | ✅ Stubs | Debug symbols, non-critical |
| onnxruntime.dll | 1 | N/A | Shipped with package |
Project Structure
ONEOCR/
├── main.py # CLI entry point (auto-selects backend)
├── Dockerfile # Docker setup for Linux (Wine + DLL)
├── pyproject.toml # Project config & dependencies
├── README.md # This documentation
├── .gitignore
│
├── ocr/ # Core OCR package
│ ├── __init__.py # Exports all engines & models
│ ├── engine.py # DLL wrapper (Windows only, 374 lines)
│ ├── engine_onnx.py # ONNX engine (cross-platform, ~1100 lines)
│ ├── engine_unified.py # Unified wrapper (DLL → Wine → ONNX)
│ └── models.py # Data models: OcrResult, OcrLine, OcrWord
│
├── tools/ # Utilities
│ ├── extract_pipeline.py # Extraction pipeline (decrypt→extract→unlock→verify)
│ ├── visualize_ocr.py # OCR result visualization with bounding boxes
│ ├── test_quick.py # Quick OCR test on images
│ ├── wine_bridge.py # Wine bridge for Linux (C loader + Python API)
│ └── oneocr_loader.c # C source for Wine loader (auto-generated)
│
├── ocr_data/ # Runtime data (DO NOT commit)
│ ├── oneocr.dll # Original DLL (Windows only)
│ ├── oneocr.onemodel # Encrypted model container
│ └── onnxruntime.dll # ONNX Runtime DLL
│
├── oneocr_extracted/ # Extracted models (auto-generated)
│ ├── onnx_models/ # 34 raw ONNX (models 11-33 have custom ops)
│ ├── onnx_models_unlocked/ # 23 unlocked (models 11-33, standard ONNX ops)
│ └── config_data/ # Character maps, rnn_info, manifest, configs
│
├── working_space/ # Test images
│ └── input/ # 19 test images
│
└── _archive/ # Archive — RE scripts, analyses, prototypes
├── temp/re_output/ # DLL reverse engineering results
├── attempts/ # Decryption attempts
├── analysis/ # Cryptographic analyses
└── hooks/ # Frida hooks
Technical Details
.onemodel Encryption
| Element | Value |
|---|---|
| Algorithm | AES-256-CFB128 |
| Master Key | kj)TGtrK>f]b[Piow.gU+nC@s""""""4 (32B) |
| IV | Copyright @ OneO (16B) |
| DX key | SHA256(master_key + file[8:24]) |
| Config key | SHA256(DX[48:64] + DX[32:48]) |
| Chunk key | SHA256(chunk_header[16:32] + chunk_header[0:16]) |
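The key derivations in the table can be written down with hashlib alone. This is a sketch of the formulas as stated, not the extraction tool itself: the function names are hypothetical, and `dx` is assumed to be the decrypted DX block (at least 64 bytes).

```python
import hashlib

# Static values quoted from the table above.
MASTER_KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'  # 32 bytes
IV = b"Copyright @ OneO"                           # 16 bytes

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def derive_dx_key(file_bytes: bytes) -> bytes:
    """DX key = SHA256(master_key + file[8:24])."""
    return sha256(MASTER_KEY + file_bytes[8:24])

def derive_config_key(dx: bytes) -> bytes:
    """Config key = SHA256(DX[48:64] + DX[32:48]); dx is assumed decrypted."""
    return sha256(dx[48:64] + dx[32:48])

def derive_chunk_key(chunk_header: bytes) -> bytes:
    """Chunk key = SHA256(chunk_header[16:32] + chunk_header[0:16])."""
    return sha256(chunk_header[16:32] + chunk_header[0:16])
```

Each derived key is 32 bytes, which fits AES-256. With pycryptodome, the matching cipher would be created with `AES.new(key, AES.MODE_CFB, iv=IV, segment_size=128)`.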
OneOCRFeatureExtract — Cracked Custom Op
Proprietary op (domain com.microsoft.oneocr) stores weights as a big-endian float32 blob in a STRING tensor.
Models 11–32 (21→50 features):
config_blob (4492B, big-endian float32):
W[21×50] = 1050 floats (weight matrix)
b[50] = 50 floats (bias)
metadata = 23 floats (dimensions [21, 50, 2], flags, calibration)
Replacement: Gemm(input, W^T, b)
Model 33 (256→16 channels):
config_blob (16548B, big-endian float32):
W[256×16] = 4096 floats (convolution weights)
b[16] = 16 floats (bias)
metadata = 25 floats (dimensions [256, 16], flags)
Replacement: Conv(input, W[in,out].T → [16,256,1,1], b, kernel=1x1)
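To make the blob layout concrete, here is a minimal parsing sketch using only the struct module. It assumes the blob is ordered W, then b, then metadata, with W stored row-major as [n_in][n_out]; the listings above give the counts but not the exact ordering, so treat that as a guess. The function names are hypothetical.

```python
import struct

def parse_feature_blob(blob: bytes, n_in: int, n_out: int):
    """Split a big-endian float32 blob into (W, b, metadata)."""
    if len(blob) % 4:
        raise ValueError("blob length is not a multiple of 4 bytes")
    floats = struct.unpack(f">{len(blob) // 4}f", blob)
    n_w = n_in * n_out
    if len(floats) < n_w + n_out:
        raise ValueError("blob too short for the requested dimensions")
    # Assumed layout: weights first, then bias, then trailing metadata.
    W = [list(floats[r * n_out:(r + 1) * n_out]) for r in range(n_in)]
    b = list(floats[n_w:n_w + n_out])
    metadata = list(floats[n_w + n_out:])
    return W, b, metadata

def gemm(x, W, b):
    """y = x . W + b for one input vector x of length n_in."""
    return [sum(x[i] * W[i][j] for i in range(len(W))) + b[j]
            for j in range(len(b))]
```

For models 11–32 the call would be `parse_feature_blob(blob, 21, 50)`: 1050 + 50 + 23 floats is 1123 floats, or 4492 bytes, which matches the stated blob size.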
Detector Configuration (from DLL protobuf manifest)
segment_conf_threshold: 0.7 (field 8)
textline_conf_threshold per-FPN: P2=0.7, P3=0.8, P4=0.8 (field 9)
textline_nms_threshold: 0.2 (field 10)
textline_overlap_threshold: 0.4 (field 11)
text_confidence_threshold: 0.8 (field 13)
ambiguous_nms_threshold: 0.3 (field 15)
ambiguous_overlap_threshold: 0.5 (field 16)
ambiguous_save_threshold: 0.4 (field 17)
textline_hardnms_iou_threshold: 0.32 (field 20)
textline_groupnms_span_ratio_threshold: 0.3 (field 21)
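For orientation, the simplest stage these thresholds feed into is a greedy hard NMS. The sketch below uses textline_nms_threshold (0.2) and text_confidence_threshold (0.8) from the listing; it works on axis-aligned boxes rather than rotated quads, which is a simplification, and it does not attempt the DLL's ambiguous or group NMS stages.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def hard_nms(boxes, scores, iou_threshold=0.2, min_score=0.8):
    """Return indices of kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= min_score),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Black-box experiments with other thresholds from the manifest (for example 0.32 for the hard-NMS stage) only require changing `iou_threshold`.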
PixelLink Detector
- FPN levels: fpn2 (stride=4), fpn3 (stride=8), fpn4 (stride=16)
- Outputs per level:
  - scores_hori/vert (pixel text probability)
  - link_scores_hori/vert (8-neighbor connectivity)
  - bbox_deltas_hori/vert (corner offsets)
- Post-processing: Threshold pixels → Union-Find connected components → bbox regression → NMS
- Detects TEXT LINES — word splitting comes from the recognizer
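The thresholding and Union-Find step can be illustrated on a small grid. This is a pure-Python sketch, not the engine's implementation: the neighbour order, the 0.7 link threshold, and the use of a plain bounding box instead of bbox regression are all assumptions.

```python
# Offsets of the 8 neighbours as (dy, dx); the real channel order is an assumption.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def find(parent, i):
    """Find the root of i, halving the path on the way up."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def connected_components(pixel_scores, link_scores, pixel_thr=0.7, link_thr=0.7):
    """Group text pixels joined by strong links; returns lists of (x, y)."""
    h, w = len(pixel_scores), len(pixel_scores[0])
    parent = list(range(h * w))
    text = [[pixel_scores[y][x] >= pixel_thr for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            if not text[y][x]:
                continue
            for k, (dy, dx) in enumerate(NEIGHBOURS):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and text[ny][nx]
                        and link_scores[y][x][k] >= link_thr):
                    ra, rb = find(parent, y * w + x), find(parent, ny * w + nx)
                    if ra != rb:
                        parent[rb] = ra
    groups = {}
    for y in range(h):
        for x in range(w):
            if text[y][x]:
                groups.setdefault(find(parent, y * w + x), []).append((x, y))
    return list(groups.values())

def bounding_box(pixels, stride=4):
    """Axis-aligned box in image coordinates for one component."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs) * stride, min(ys) * stride,
            (max(xs) + 1) * stride, (max(ys) + 1) * stride)
```

The `stride` argument maps feature-map cells back to image pixels, so fpn2 components would use 4, fpn3 would use 8, and fpn4 would use 16.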
CTC Recognition
- Target height: 60px, aspect ratio preserved
- Input: RGB / 255.0, NCHW format
- Output: log-softmax [T, 1, N_chars]
- Decoding: greedy argmax with repeat merging + blank removal
- Per-character confidence via exp(max_logprob)
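The decoding rules above (greedy argmax, repeat merging, blank removal, confidence as exp of the winning log-probability) fit in a few lines. In this sketch the blank index of 0, taking the confidence from the first frame of a repeated run, and averaging character confidences per word are assumptions; the batch dimension of the [T, 1, N_chars] output is taken as already removed.

```python
import math

def ctc_greedy_decode(log_probs, charset, blank=0):
    """Decode T rows of N log-probabilities into (text, per-char confidences)."""
    chars, confs = [], []
    prev = blank
    for row in log_probs:
        idx = max(range(len(row)), key=row.__getitem__)
        # Emit only when the class changes and is not the blank symbol.
        if idx != blank and idx != prev:
            chars.append(charset[idx])
            confs.append(math.exp(row[idx]))
        prev = idx
    return "".join(chars), confs

def word_confidences(text, confs):
    """Split on spaces and average character confidences per word (assumed rule)."""
    words, cur_chars, cur_confs = [], [], []
    for ch, c in zip(text, confs):
        if ch == " ":
            if cur_chars:
                words.append(("".join(cur_chars), sum(cur_confs) / len(cur_confs)))
            cur_chars, cur_confs = [], []
        else:
            cur_chars.append(ch)
            cur_confs.append(c)
    if cur_chars:
        words.append(("".join(cur_chars), sum(cur_confs) / len(cur_confs)))
    return words
```

Because `prev` is updated on blank frames too, a blank between two identical characters keeps both, which is how CTC represents doubled letters.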
DLL Reverse Engineering — Results & Materials
DLL Source Structure (from debug symbols)
C:\__w\1\s\CoreEngine\Native\
├── TextDetector/
│ ├── AdaptiveScaling ← multi-level image scaling
│ ├── SeglinkProposal ← KEY: detection post-processing
│ ├── SeglinkGroup.h ← segment grouping into lines
│ ├── TextLinePolygon ← precise text contouring
│ ├── RelationRCNNRpn2 ← relational region proposal network
│ ├── BaseRCNN, DQDETR ← alternative detectors
│ ├── PolyFitting ← polynomial fitting
│ └── BarcodePolygon ← barcode detection
│
├── TextRecognizer/
│ ├── TextLineRecognizerImpl ← main CTC implementation
│ ├── ArgMaxDecoder ← CTC decoding
│ ├── ConfidenceProcessor ← confidence models (models 11-21)
│ ├── RejectionProcessor ← rejection models (models 22-32)
│ ├── DbLstm ← dynamic batch LSTM
│ └── CharacterMap/ ← per-script character maps
│
├── TextAnalyzer/
│ ├── TextAnalyzerImpl ← text layout analysis
│ └── AuxMltClsClassifier ← auxiliary classifier
│
├── TextNormalizer/
│ ├── BaseNormalizer ← text line normalization
│ └── ConcatTextLines ← line concatenation
│
├── TextPipeline/
│ ├── TextPipelineDevImpl ← main pipeline
│ └── FilterXY ← position-based filtering
│
├── CustomOps/onnxruntime/
│ ├── SeglinkProposalsOp ← ONNX op (NOT in our models)
│ ├── XYSeglinkProposalsOp ← XY variant
│ └── FeatureExtractOp ← = Gemm / Conv1x1
│
├── ModelParser/
│ ├── ModelParser ← .onemodel parsing
│ └── Crypto ← AES-256-CFB128
│
└── Common/
├── ImageUtility ← image conversion
└── ImageFeature ← image features
RE Materials
Reverse engineering results in _archive/temp/re_output/:
- 03_oneocr_classes.txt: 186 C++ classes
- 06_config_strings.txt: 429 config strings
- 15_manifest_decoded.txt: 1182 lines of decoded protobuf manifest
- 09_constants.txt: 42 float + 14 double constants (800.0, 0.7, 0.8, 0.92...)
- 10_disassembly.txt: disassembly of key exports
For Future Developers — Roadmap
Priority 1: SeglinkProposals (hardest, highest impact)
This is the key C++ post-processing in the DLL that is NOT part of the ONNX models.
Responsible for ~80% of the differences between the DLL and our implementation.
What it does:
- Takes raw pixel_scores + link_scores + bbox_deltas from all 3 FPN levels
- Groups segments into lines (SeglinkGroup) — merges neighboring small components into a single line
- Multi-stage NMS: textline_nms → hardnms → ambiguous_nms → groupnms
- Confidence filtering with text_confidence_threshold = 0.8
- K_of_detections: detection count limit
Where to look:
- _archive/temp/re_output/06_config_strings.txt: parameter names
- _archive/temp/re_output/15_manifest_decoded.txt: parameter values
- SeglinkProposal class in DLL: ~2000 lines of C++
Approach:
- Decompile SeglinkProposal::Process with IDA Pro / Ghidra
- Alternatively: black-box testing of different NMS configurations
Priority 2: AdaptiveScaling
The DLL dynamically scales images based on text size.
Parameters:
- AS_LARGE_TEXT_THRESHOLD: large text threshold
- Multi-scale: DLL can run the detector at multiple scales
Priority 3: BaseNormalizer
The DLL normalizes text crops before recognition more effectively than our simple resize.
Priority 4: Confidence/Rejection Models (11-32)
The DLL uses models 11-32 to filter results — we skip them. Integration could improve precision by removing false detections.
Performance
| Operation | ONNX (CPU) | DLL | Notes |
|---|---|---|---|
| Detection (PixelLink) | ~50-200ms | ~15-50ms | Model inference + post-processing |
| ScriptID | ~5ms | ~3ms | Single forward pass |
| Recognition (CTC) | ~30ms/line | ~10ms/line | Per-script LSTM |
| Full pipeline | ~300-1000ms | ~15-135ms | Depends on line count |
License
For research and educational purposes only.