OneOCR — Reverse-Engineered Cross-Platform OCR Pipeline

Reimplementation of Microsoft's OneOCR engine from the Windows Snipping Tool.
The .onemodel encryption is cracked, 34 ONNX models are extracted, and the proprietary OneOCRFeatureExtract op is replaced with standard ops, so the models run on any OS with onnxruntime.

Project Status

Component	Status	Details
.onemodel decryption	✅ Done	AES-256-CFB128, static key + IV
Model extraction	✅ Done	34 ONNX models, 33 config files
Custom op unlocking	✅ Done	`OneOCRFeatureExtract` → `Gemm`/`Conv1x1`
ONNX pipeline	⚠️ Partial	53% match rate vs DLL (10/19 test images)
DLL pipeline (Windows)	✅ Done	ctypes wrapper, 100% accuracy
DLL pipeline (Linux)	✅ Done	Wine bridge, 100% accuracy, Docker ready

Known ONNX Engine Limitations

The Python reimplementation (ONNX engine) matches the original DLL's output on 53% of the test images (10 of 19). The remaining issues are broken down below.

Issue 1: False FPN2 Detections (4 images)

Images: ocr_test 6, 13, 17, 18
Symptom: Panel edges / dialog borders detected as text
Cause: FPN2 (stride=4) sees edges as text-like textures
DLL solution: SeglinkProposals — advanced C++ post-processing with multi-stage NMS:

textline_hardnms_iou_threshold = 0.32
textline_groupnms_span_ratio_threshold = 0.3
ambiguous_nms_threshold = 0.3 / ambiguous_overlap_threshold = 0.5
K_of_detections — per-scale detection limit

Issue 2: Missing Small Characters "..." (2 images)

Images: ocr_test 7, 14
Symptom: The three dots of an ellipsis are too small to be detected
Cause: The min_component_pixels and min_area thresholds filter out components this small
DLL solution: SeglinkGroup — groups neighboring segments into a single line

Issue 3: Character Recognition Errors (2 images)

Images: ocr_test 1, 15
Symptom: "iob" instead of "job", extra text from margins
Cause: Differences in text cropping/preprocessing
DLL solution: BaseNormalizer — sophisticated text line normalization

Issue 4: Large Images (test.png — 31.8% match)

Symptom: 55 of 74 lines detected, some cut off at edges
Cause: The DLL applies adaptive scaling and can run the detector at multiple scales, which the ONNX engine does not yet implement
DLL solution: AdaptiveScaling with AS_LARGE_TEXT_THRESHOLD

Architecture

Image (PIL / numpy)
    │
    ▼
┌──────────────────────────────────┐
│  Detector (model_00)             │  PixelLink FPN (fpn2/3/4)
│  BGR, mean subtraction           │  stride = 4 / 8 / 16
│  → pixel_scores, link_scores    │  8-neighbor, Union-Find
│  → bounding quads (lines)       │  minAreaRect + NMS (IoU 0.2)
└──────────────────────────────────┘
    │
    ▼ for each detected line
┌──────────────────────────────────┐
│  Crop + padding (15%)            │  Axis-aligned / perspective
│  ScriptID (model_01)             │  10 scripts: Latin, CJK, Arabic...
│  RGB / 255.0, height=60px       │  HW/PC classification, flip detection
└──────────────────────────────────┘
    │
    ▼ per script
┌──────────────────────────────────┐
│  Recognizer (model_02–10)        │  DynamicQuantizeLSTM + CTC
│  Per-script character maps       │  Greedy decode with per-char confidence
│  → text + word confidences       │  Word splitting on spaces
└──────────────────────────────────┘
    │
    ▼
┌──────────────────────────────────┐
│  Line grouping & sorting         │  Y-overlap clustering
│  Per-word bounding boxes         │  Proportional quad interpolation
│  Text angle estimation           │  Median of top-edge angles
└──────────────────────────────────┘
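
The "median of top-edge angles" step in the last stage can be illustrated with a minimal sketch. The function name and the corner order (top-left, top-right, bottom-right, bottom-left) are assumptions of this example, not taken from engine_onnx.py.

```python
import math
from statistics import median


def estimate_text_angle(quads):
    """Median angle, in degrees, of the top edges of the line quads.

    Each quad is (x1, y1, x2, y2, x3, y3, x4, y4). ASSUMPTION: corners run
    top-left, top-right, bottom-right, bottom-left, so the top edge is
    (x1, y1) -> (x2, y2).
    """
    if not quads:
        return 0.0
    angles = [math.degrees(math.atan2(q[3] - q[1], q[2] - q[0])) for q in quads]
    return median(angles)


# Two level lines and one tilted outlier: the median ignores the outlier.
level = (0, 0, 100, 0, 100, 20, 0, 20)
tilted = (0, 0, 100, 100, 80, 120, -20, 20)
print(estimate_text_angle([level, level, tilted]))
```

Using the median rather than the mean keeps a single badly detected line from skewing the page angle.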

Model Registry (34 models)

Index	Role	Script	Custom Op	Status
0	Detector	Universal	`QLinearSigmoid`	✅ Works
1	ScriptID	Universal	—	✅ Works
2–10	Recognizers	Latin/CJK/Arabic/Cyrillic/Devanagari/Greek/Hebrew/Tamil/Thai	`DynamicQuantizeLSTM`	✅ Work
11–21	LangSm (confidence)	Per-script	`OneOCRFeatureExtract` → Gemm	✅ Unlocked
22–32	LangMd (confidence)	Per-script	`OneOCRFeatureExtract` → Gemm	✅ Unlocked
33	LineLayout	Universal	`OneOCRFeatureExtract` → Conv1x1	✅ Unlocked

Quick Start

Requirements

pip install onnxruntime numpy opencv-python-headless Pillow pycryptodome onnx

Or with uv:

uv sync --extra extract

Model Extraction (one-time)

# Full pipeline: decrypt → extract → unlock → verify
python tools/extract_pipeline.py ocr_data/oneocr.onemodel

# Verify existing models only
python tools/extract_pipeline.py --verify-only

Usage

# Recommended: Unified engine (auto-selects best backend)
from ocr.engine_unified import OcrEngineUnified
from PIL import Image

engine = OcrEngineUnified()  # auto: DLL → Wine → ONNX
result = engine.recognize_pil(Image.open("screenshot.png"))

print(f"Backend: {engine.backend_name}")  # "dll" / "wine" / "onnx"
print(result.text)                         # "Hello World"
print(result.average_confidence)           # 0.975

for line in result.lines:
    for word in line.words:
        print(f"  '{word.text}' conf={word.confidence:.0%} "
              f"bbox=({word.bounding_rect.x1:.0f},{word.bounding_rect.y1:.0f})")

# CLI:
python main.py screenshot.png                # auto backend
python main.py screenshot.png --backend dll  # force DLL (Windows)
python main.py screenshot.png --backend wine # force Wine (Linux)
python main.py screenshot.png --backend onnx # force ONNX (any OS)
python main.py screenshot.png -o result.json # save JSON output
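
The DLL → Wine → ONNX fallback order can be pictured roughly as follows. This is a hypothetical sketch: `pick_backend` and its parameters are invented for illustration, and the real selection logic in `OcrEngineUnified` may check different conditions.

```python
import os
import shutil
import sys


def pick_backend(platform, dll_available, wine_available):
    """Return "dll", "wine" or "onnx" following the documented fallback order.

    Hypothetical helper, not part of the ocr package.
    """
    if platform == "win32" and dll_available:
        return "dll"    # native DLL through ctypes
    if platform.startswith("linux") and dll_available and wine_available:
        return "wine"   # DLL through the Wine bridge
    return "onnx"       # pure onnxruntime, works everywhere


backend = pick_backend(
    sys.platform,
    os.path.exists("ocr_data/oneocr.dll"),   # path from the project layout
    shutil.which("wine64") is not None,      # is Wine on PATH?
)
print(backend)
```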

ONNX Engine (alternative — cross-platform, no Wine needed)

from ocr.engine_onnx import OcrEngineOnnx
from PIL import Image

engine = OcrEngineOnnx()
result = engine.recognize_pil(Image.open("screenshot.png"))
print(result.text)

API Reference

engine = OcrEngineOnnx(
    models_dir="path/to/onnx_models",       # optional
    config_dir="path/to/config_data",        # optional
    providers=["CUDAExecutionProvider"],      # optional (default: CPU)
)

# Input formats:
result = engine.recognize_pil(pil_image)       # PIL Image
result = engine.recognize_numpy(rgb_array)     # numpy (H,W,3) RGB
result = engine.recognize_bytes(png_bytes)     # raw bytes (PNG/JPEG)

# Result:
result.text                # str — full recognized text
result.text_angle          # float — detected rotation angle
result.lines               # list[OcrLine]
result.average_confidence  # float — overall confidence 0-1
result.error               # str | None — error message

# Per-word:
word.text                  # str
word.confidence            # float — CTC confidence per word
word.bounding_rect         # BoundingRect (x1,y1...x4,y4 quadrilateral)
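
As a usage sketch for the result fields above, the helper below collects words whose confidence falls under a threshold. `words_below` is a hypothetical name, and the `SimpleNamespace` objects only stand in for a real `OcrResult` so the example runs without the models.

```python
from types import SimpleNamespace


def words_below(result, threshold=0.5):
    """List (text, confidence) for every word under the threshold."""
    return [
        (word.text, word.confidence)
        for line in result.lines
        for word in line.words
        if word.confidence < threshold
    ]


# Stand-in for an OcrResult; only the attributes used above are modelled.
fake_result = SimpleNamespace(lines=[
    SimpleNamespace(words=[
        SimpleNamespace(text="Hello", confidence=0.98),
        SimpleNamespace(text="iob", confidence=0.41),
    ]),
])
print(words_below(fake_result))
```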

Running on Linux (Wine Bridge — 100% accuracy)

The DLL has a remarkably clean dependency profile (only KERNEL32, bcrypt, dbghelp + shipped onnxruntime.dll), making it fully compatible with Wine.

Option A: Docker (recommended)

# Build
docker build -t oneocr .

# Run OCR on an image
docker run --rm -v $(pwd)/working_space:/data oneocr \
    python main.py /data/input/test.png --output /data/output/result.json

# Interactive shell
docker run --rm -it -v $(pwd)/working_space:/data oneocr bash

Option B: Native Wine

# 1. Install Wine + MinGW cross-compiler
# Ubuntu/Debian:
sudo apt install wine64 mingw-w64

# Fedora:
sudo dnf install wine mingw64-gcc

# Arch:
sudo pacman -S wine mingw-w64-gcc

# 2. Initialize 64-bit Wine prefix
WINEARCH=win64 wineboot --init

# 3. Compile the Wine loader (one-time)
x86_64-w64-mingw32-gcc -O2 -o tools/oneocr_loader.exe tools/oneocr_loader.c

# 4. Test
python main.py screenshot.png --backend wine

Wine Bridge Architecture

Linux Python ──► subprocess (wine64) ──► oneocr_loader.exe ──► oneocr.dll
    ▲                                           │
    │                                           ▼
    └──── JSON stdout ◄──── OCR results ◄──── onnxruntime.dll
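
On the Python side, the flow above amounts to a subprocess call followed by JSON parsing. The sketch below is an illustration only: it assumes the loader takes the image path as its single argument and prints one JSON document to stdout, neither of which is confirmed here. See tools/wine_bridge.py for the actual implementation.

```python
import json
import subprocess


def build_wine_command(image_path, loader="tools/oneocr_loader.exe", wine="wine64"):
    """Command line for the loader. ASSUMPTION: one positional image argument."""
    return [wine, loader, image_path]


def run_loader(image_path, runner=subprocess.run):
    """Run the loader under Wine and parse its stdout as JSON.

    `runner` is injectable so the function can be exercised without Wine.
    """
    proc = runner(
        build_wine_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

Passing `check=True` surfaces loader crashes as exceptions instead of silently returning empty output.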

DLL Dependencies (all implemented in Wine ≥ 8.0):

DLL	Functions	Wine Status	Notes
`KERNEL32.dll`	183	✅ Full	Standard WinAPI
`bcrypt.dll`	12	✅ Full	AES-256-CFB128 for model decryption
`dbghelp.dll`	5	✅ Stubs	Debug symbols — non-critical
`onnxruntime.dll`	1	N/A	Shipped with package

Project Structure

ONEOCR/
├── main.py                          # CLI entry point (auto-selects backend)
├── Dockerfile                       # Docker setup for Linux (Wine + DLL)
├── pyproject.toml                   # Project config & dependencies
├── README.md                        # This documentation
├── .gitignore
│
├── ocr/                             # Core OCR package
│   ├── __init__.py                  # Exports all engines & models
│   ├── engine.py                    # DLL wrapper (Windows only, 374 lines)
│   ├── engine_onnx.py               # ONNX engine (cross-platform, ~1100 lines)
│   ├── engine_unified.py            # Unified wrapper (DLL → Wine → ONNX)
│   └── models.py                    # Data models: OcrResult, OcrLine, OcrWord
│
├── tools/                           # Utilities
│   ├── extract_pipeline.py          # Extraction pipeline (decrypt→extract→unlock→verify)
│   ├── visualize_ocr.py             # OCR result visualization with bounding boxes
│   ├── test_quick.py                # Quick OCR test on images
│   ├── wine_bridge.py               # Wine bridge for Linux (C loader + Python API)
│   └── oneocr_loader.c              # C source for Wine loader (auto-generated)
│
├── ocr_data/                        # Runtime data (DO NOT commit)
│   ├── oneocr.dll                   # Original DLL (Windows only)
│   ├── oneocr.onemodel              # Encrypted model container
│   └── onnxruntime.dll              # ONNX Runtime DLL
│
├── oneocr_extracted/                # Extracted models (auto-generated)
│   ├── onnx_models/                 # 34 raw ONNX (models 11-33 have custom ops)
│   ├── onnx_models_unlocked/        # 23 unlocked (models 11-33, standard ONNX ops)
│   └── config_data/                 # Character maps, rnn_info, manifest, configs
│
├── working_space/                   # Test images
│   └── input/                       # 19 test images
│
└── _archive/                        # Archive — RE scripts, analyses, prototypes
    ├── temp/re_output/              # DLL reverse engineering results
    ├── attempts/                    # Decryption attempts
    ├── analysis/                    # Cryptographic analyses
    └── hooks/                       # Frida hooks

Technical Details

.onemodel Encryption

Element	Value
Algorithm	AES-256-CFB128
Master Key	`kj)TGtrK>f]b[Piow.gU+nC@s""""""4` (32B)
IV	`Copyright @ OneO` (16B)
DX key	`SHA256(master_key + file[8:24])`
Config key	`SHA256(DX[48:64] + DX[32:48])`
Chunk key	`SHA256(chunk_header[16:32] + chunk_header[0:16])`
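
The key derivations in the table translate directly to `hashlib` calls. The following is a minimal sketch, not the code from tools/extract_pipeline.py: the function names are made up, `dx` is assumed to be the raw bytes of the DX block, and reusing the static IV for every decryption is also an assumption. The table does not say where each encrypted region starts, so locating the ciphertext is left out.

```python
import hashlib

MASTER_KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'  # 32 bytes, from the table
IV = b"Copyright @ OneO"                           # 16 bytes, from the table


def derive_dx_key(file_bytes):
    """SHA256(master_key + file[8:24])."""
    return hashlib.sha256(MASTER_KEY + file_bytes[8:24]).digest()


def derive_config_key(dx):
    """SHA256(DX[48:64] + DX[32:48]); `dx` is assumed to be the DX block bytes."""
    return hashlib.sha256(dx[48:64] + dx[32:48]).digest()


def derive_chunk_key(chunk_header):
    """SHA256(chunk_header[16:32] + chunk_header[0:16])."""
    return hashlib.sha256(chunk_header[16:32] + chunk_header[0:16]).digest()


def aes_cfb128_decrypt(key, ciphertext):
    """AES-256-CFB128 decryption with the static IV (requires pycryptodome)."""
    from Crypto.Cipher import AES  # imported lazily so the key helpers work alone
    return AES.new(key, AES.MODE_CFB, iv=IV, segment_size=128).decrypt(ciphertext)
```

`segment_size=128` matters: pycryptodome defaults to CFB8, which would produce garbage for CFB128 data.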

OneOCRFeatureExtract — Cracked Custom Op

The proprietary op (domain com.microsoft.oneocr) stores its weights as a big-endian float32 blob in a STRING tensor.

Models 11–32 (21→50 features):

config_blob (4492B, big-endian float32):
  W[21×50] = 1050 floats     (weight matrix)
  b[50]    = 50 floats       (bias)
  metadata = 23 floats       (dimensions [21, 50, 2], flags, calibration)

  Replacement: Gemm(input, W^T, b)

Model 33 (256→16 channels):

config_blob (16548B, big-endian float32):
  W[256×16] = 4096 floats    (convolution weights)
  b[16]     = 16 floats      (bias)
  metadata  = 25 floats      (dimensions [256, 16], flags)

  Replacement: Conv(input, W[in,out].T → [16,256,1,1], b, kernel=1x1)
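
To make the blob layouts above concrete, here is a small stdlib-only parsing sketch. It assumes the blob is ordered weights, then bias, then metadata, and that W is stored row-major as [n_in][n_out]. Both are assumptions of this example, so check against the real blobs before relying on them. `parse_feature_extract_blob` and `apply_linear` are hypothetical names.

```python
import struct


def parse_feature_extract_blob(blob, n_in, n_out):
    """Split a big-endian float32 blob into (weights, bias, metadata).

    ASSUMPTIONS: layout is W, then b, then metadata; W is row-major [n_in][n_out].
    """
    count = len(blob) // 4
    values = struct.unpack(f">{count}f", blob[: count * 4])
    n_w = n_in * n_out
    weights = [list(values[i * n_out:(i + 1) * n_out]) for i in range(n_in)]
    bias = list(values[n_w:n_w + n_out])
    metadata = list(values[n_w + n_out:])
    return weights, bias, metadata


def apply_linear(x, weights, bias):
    """y = x @ W + b, the computation the Gemm replacement performs."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


# Synthetic 4492-byte blob shaped like models 11-32 (21 -> 50 features).
values = [0.0] * (21 * 50 + 50 + 23)
values[3] = 2.0          # W[0][3]
values[21 * 50 + 3] = 0.5  # b[3]
blob = struct.pack(f">{len(values)}f", *values)
W, b, meta = parse_feature_extract_blob(blob, 21, 50)
print(len(blob), len(W), len(b), len(meta))
```

The same function covers model 33 with `n_in=256, n_out=16`; the weights would then be reshaped to [16, 256, 1, 1] for the 1x1 Conv.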

Detector Configuration (from DLL protobuf manifest)

segment_conf_threshold:               0.7   (field 8)
textline_conf_threshold per-FPN:      P2=0.7, P3=0.8, P4=0.8  (field 9)
textline_nms_threshold:               0.2   (field 10)
textline_overlap_threshold:           0.4   (field 11)
text_confidence_threshold:            0.8   (field 13)
ambiguous_nms_threshold:              0.3   (field 15)
ambiguous_overlap_threshold:          0.5   (field 16)
ambiguous_save_threshold:             0.4   (field 17)
textline_hardnms_iou_threshold:       0.32  (field 20)
textline_groupnms_span_ratio_threshold: 0.3 (field 21)
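
For orientation, the sketch below shows how two of these values could feed a plain greedy NMS: `text_confidence_threshold` (0.8) as the score cut-off and `textline_nms_threshold` (0.2) as the IoU limit. It is a textbook single-stage NMS on axis-aligned boxes, not the DLL's multi-stage SeglinkProposals logic, and the mapping of parameters to roles is an assumption.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr=0.2, score_thr=0.8):
    """Indices of kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
scores = [0.90, 0.85, 0.95, 0.50]
print(greedy_nms(boxes, scores))
```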

PixelLink Detector

FPN levels: fpn2 (stride=4), fpn3 (stride=8), fpn4 (stride=16)
Outputs per level: scores_hori/vert (pixel text probability), link_scores_hori/vert (8-neighbor connectivity), bbox_deltas_hori/vert (corner offsets)
Post-processing: Threshold pixels → Union-Find connected components → bbox regression → NMS
Detects TEXT LINES — word splitting comes from the recognizer
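
The threshold → Union-Find step can be sketched in a few lines of plain Python. This is an illustration under assumptions: the neighbor ordering, both default thresholds, and the rule "join when the link from the current pixel passes the threshold" are choices made for this example, not values read from the model or the DLL.

```python
# 8-neighbor offsets as (dy, dx); the ordering is an assumption of this sketch.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def find(parent, p):
    """Root of p with path halving."""
    while parent[p] != p:
        parent[p] = parent[parent[p]]
        p = parent[p]
    return p


def connected_components(pixel_scores, link_scores, pixel_thr=0.7, link_thr=0.5):
    """Group text pixels that are joined by confident links.

    pixel_scores[y][x] is a text probability; link_scores[y][x][k] is the link
    probability toward NEIGHBORS[k]. Returns sorted lists of (y, x) pixels.
    """
    h, w = len(pixel_scores), len(pixel_scores[0])
    positive = {(y, x) for y in range(h) for x in range(w)
                if pixel_scores[y][x] >= pixel_thr}
    parent = {p: p for p in positive}
    for (y, x) in positive:
        for k, (dy, dx) in enumerate(NEIGHBORS):
            q = (y + dy, x + dx)
            if q in positive and link_scores[y][x][k] >= link_thr:
                ra, rb = find(parent, (y, x)), find(parent, q)
                if ra != rb:
                    parent[rb] = ra  # union the two components
    groups = {}
    for p in positive:
        groups.setdefault(find(parent, p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# One row with a gap in the middle: two components are expected.
row_scores = [[0.9, 0.9, 0.1, 0.9, 0.9]]
row_links = [[[1.0] * 8 for _ in range(5)]]
print(connected_components(row_scores, row_links))
```

Each resulting component would then go through bbox regression (or minAreaRect) and NMS, as listed above.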

CTC Recognition

Target height: 60px, aspect ratio preserved
Input: RGB / 255.0, NCHW format
Output: log-softmax [T, 1, N_chars]
Decoding: greedy argmax with repeat merging + blank removal
Per-character confidence via exp(max_logprob)
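
The decoding rule above (greedy argmax, merge repeats, drop blanks, confidence as exp of the winning log-probability) looks roughly like this. The sketch takes the output with the batch axis already squeezed, i.e. T rows of N_chars log-probabilities. The blank index of 0 and taking the confidence from the first frame of each run are assumptions.

```python
import math


def ctc_greedy_decode(log_probs, blank=0):
    """Return [(class_index, confidence), ...] for one text line.

    log_probs: T frames, each a list of N log-probabilities.
    ASSUMPTION: `blank` is the CTC blank index (0 here).
    """
    decoded = []
    prev = blank
    for frame in log_probs:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax
        if best != blank and best != prev:
            # First frame of a new run: emit the character once.
            decoded.append((best, math.exp(frame[best])))
        prev = best
    return decoded


# Frame-wise argmax path 1, 1, blank, 1, 2 decodes to classes 1, 1, 2.
probs = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.60, 0.30],
    [0.10, 0.20, 0.70],
]
frames = [[math.log(p) for p in row] for row in probs]
print(ctc_greedy_decode(frames))
```

The blank between the two runs of class 1 is what allows a doubled character to survive repeat merging.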

DLL Reverse Engineering — Results & Materials

DLL Source Structure (from debug symbols)

C:\__w\1\s\CoreEngine\Native\
├── TextDetector/
│   ├── AdaptiveScaling           ← multi-level image scaling
│   ├── SeglinkProposal           ← KEY: detection post-processing
│   ├── SeglinkGroup.h            ← segment grouping into lines
│   ├── TextLinePolygon           ← precise text contouring
│   ├── RelationRCNNRpn2          ← relational region proposal network
│   ├── BaseRCNN, DQDETR          ← alternative detectors
│   ├── PolyFitting               ← polynomial fitting
│   └── BarcodePolygon            ← barcode detection
│
├── TextRecognizer/
│   ├── TextLineRecognizerImpl    ← main CTC implementation
│   ├── ArgMaxDecoder             ← CTC decoding
│   ├── ConfidenceProcessor       ← confidence models (models 11-21)
│   ├── RejectionProcessor        ← rejection models (models 22-32)
│   ├── DbLstm                    ← dynamic batch LSTM
│   └── CharacterMap/             ← per-script character maps
│
├── TextAnalyzer/
│   ├── TextAnalyzerImpl          ← text layout analysis
│   └── AuxMltClsClassifier       ← auxiliary classifier
│
├── TextNormalizer/
│   ├── BaseNormalizer            ← text line normalization
│   └── ConcatTextLines           ← line concatenation
│
├── TextPipeline/
│   ├── TextPipelineDevImpl       ← main pipeline
│   └── FilterXY                  ← position-based filtering
│
├── CustomOps/onnxruntime/
│   ├── SeglinkProposalsOp        ← ONNX op (NOT in our models)
│   ├── XYSeglinkProposalsOp      ← XY variant
│   └── FeatureExtractOp          ← = Gemm / Conv1x1
│
├── ModelParser/
│   ├── ModelParser               ← .onemodel parsing
│   └── Crypto                    ← AES-256-CFB128
│
└── Common/
    ├── ImageUtility              ← image conversion
    └── ImageFeature              ← image features

RE Materials

Reverse engineering results in _archive/temp/re_output/:

03_oneocr_classes.txt — 186 C++ classes
06_config_strings.txt — 429 config strings
15_manifest_decoded.txt — 1182 lines of decoded protobuf manifest
09_constants.txt — 42 float + 14 double constants (800.0, 0.7, 0.8, 0.92...)
10_disassembly.txt — disassembly of key exports

For Future Developers — Roadmap

Priority 1: SeglinkProposals (hardest, highest impact)

This is the key C++ post-processing step in the DLL; it is NOT part of the ONNX models.
It accounts for roughly 80% of the differences between the DLL and our implementation.

What it does:

Takes raw pixel_scores + link_scores + bbox_deltas from all 3 FPN levels
Groups segments into lines (SeglinkGroup) — merges neighboring small components into a single line
Multi-stage NMS: textline_nms → hardnms → ambiguous_nms → groupnms
Confidence filtering with text_confidence_threshold = 0.8
K_of_detections — detection count limit

Where to look:

_archive/temp/re_output/06_config_strings.txt — parameter names
_archive/temp/re_output/15_manifest_decoded.txt — parameter values
SeglinkProposal class in DLL — ~2000 lines of C++

Approach:

Decompile SeglinkProposal::Process with IDA Pro / Ghidra
Alternatively: black-box testing of different NMS configurations

Priority 2: AdaptiveScaling

The DLL dynamically scales images based on text size.

Parameters:

AS_LARGE_TEXT_THRESHOLD — large text threshold
Multi-scale: DLL can run the detector at multiple scales

Priority 3: BaseNormalizer

The DLL normalizes text crops before recognition more effectively than our simple resize.

Priority 4: Confidence/Rejection Models (11-32)

The DLL uses models 11-32 to filter results — we skip them. Integration could improve precision by removing false detections.

Performance

Operation	ONNX (CPU)	DLL	Notes
Detection (PixelLink)	~50-200ms	~15-50ms	Model inference + post-processing
ScriptID	~5ms	~3ms	Single forward pass
Recognition (CTC)	~30ms/line	~10ms/line	Per-script LSTM
Full pipeline	~300-1000ms	~15-135ms	Depends on line count

License

For research and educational purposes only.
