OneOCR Dev committed on
Commit ce847d4 · 0 Parent(s):

OneOCR - reverse engineering complete, ONNX pipeline 53% match rate

Files changed (50)
  1. .gitignore +31 -0
  2. .python-version +1 -0
  3. README.md +397 -0
  4. _archive/analysis/analyze_boundaries.py +91 -0
  5. _archive/analysis/analyze_crypto_log.py +95 -0
  6. _archive/analysis/analyze_decrypt.py +581 -0
  7. _archive/analysis/analyze_dx.py +137 -0
  8. _archive/analysis/analyze_extracted.py +145 -0
  9. _archive/analysis/analyze_model.py +64 -0
  10. _archive/analysis/decrypt_config.py +259 -0
  11. _archive/analysis/find_chunks.py +80 -0
  12. _archive/analysis/walk_payload.py +129 -0
  13. _archive/analyze_lm_features.py +110 -0
  14. _archive/analyze_models.py +82 -0
  15. _archive/analyze_pipeline.py +79 -0
  16. _archive/attempts/bcrypt_decrypt.py +423 -0
  17. _archive/attempts/create_test_image.py +21 -0
  18. _archive/attempts/decrypt_model.py +338 -0
  19. _archive/attempts/decrypt_with_static_iv.py +302 -0
  20. _archive/attempts/disasm_bcrypt_calls.py +143 -0
  21. _archive/attempts/disasm_crypto.py +156 -0
  22. _archive/attempts/disasm_full_cipher.py +138 -0
  23. _archive/attempts/disasm_proper.py +95 -0
  24. _archive/attempts/discover_key_derivation.py +126 -0
  25. _archive/attempts/dll_bcrypt_analysis.py +63 -0
  26. _archive/attempts/dll_crypto_analysis.py +183 -0
  27. _archive/attempts/extract_onnx.py +235 -0
  28. _archive/attempts/extract_strings.py +37 -0
  29. _archive/attempts/find_offset.py +44 -0
  30. _archive/attempts/frida_hook.py +328 -0
  31. _archive/attempts/frida_loader.py +50 -0
  32. _archive/attempts/peek_header.py +92 -0
  33. _archive/attempts/static_decrypt.py +289 -0
  34. _archive/attempts/verify_bcrypt.py +181 -0
  35. _archive/attempts/verify_key_derivation.py +98 -0
  36. _archive/attempts/verify_models.py +228 -0
  37. _archive/brainstorm.md +355 -0
  38. _archive/crack_config.py +84 -0
  39. _archive/crack_endian.py +65 -0
  40. _archive/debug_detector.py +80 -0
  41. _archive/decode_config.py +74 -0
  42. _archive/dedup.py +687 -0
  43. _archive/dedup_old.py +595 -0
  44. _archive/hooks/hook_decrypt.py +344 -0
  45. _archive/hooks/hook_full_bcrypt.py +441 -0
  46. _archive/hooks/hook_full_log.py +265 -0
  47. _archive/hooks/hook_hash.py +340 -0
  48. _archive/inspect_config_blob.py +80 -0
  49. _archive/inspect_custom_ops.py +39 -0
  50. _archive/inspect_graph_deep.py +60 -0
.gitignore ADDED
@@ -0,0 +1,31 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.egg-info/
+ build/
+ dist/
+
+ # Virtual environments
+ .venv/
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Runtime data — large binary files (do NOT commit)
+ ocr_data/
+ oneocr_extracted/
+
+ # Working space outputs
+ working_space/output/
+
+ # UV lock (optional — regenerated by uv)
+ uv.lock
+
+ # Test images - too large for HF, stored locally
+ working_space/input/*.png
+
.python-version ADDED
@@ -0,0 +1 @@
+ 3.12
README.md ADDED
@@ -0,0 +1,397 @@
+ # OneOCR — Reverse-Engineered Cross-Platform OCR Pipeline
+
+ Full reimplementation of Microsoft's OneOCR engine from Windows Snipping Tool.
+ `.onemodel` encryption cracked, 34 ONNX models extracted, all custom ops replaced — runs on any OS with `onnxruntime`.
+
+ ---
+
+ ## Project Status
+
+ | Component | Status | Details |
+ |---|---|---|
+ | **.onemodel decryption** | ✅ Done | AES-256-CFB128, static key + IV |
+ | **Model extraction** | ✅ Done | 34 ONNX models, 33 config files |
+ | **Custom op unlocking** | ✅ Done | `OneOCRFeatureExtract` → `Gemm`/`Conv1x1` |
+ | **ONNX pipeline** | ⚠️ Partial | **53% match rate** vs DLL (10/19 test images) |
+ | **DLL pipeline** | ✅ Done | ctypes wrapper, Windows only |
+
+ ### Known ONNX Engine Limitations
+
+ The Python reimplementation achieves a **53% match rate** against the original DLL. Below is a detailed breakdown of the remaining issues.
+
+ #### Issue 1: False FPN2 Detections (4 images)
+ **Images:** ocr_test 6, 13, 17, 18
+ **Symptom:** Panel edges / dialog borders detected as text
+ **Cause:** FPN2 (stride=4) sees edges as text-like textures
+ **DLL solution:** `SeglinkProposals` — advanced C++ post-processing with multi-stage NMS:
+ - `textline_hardnms_iou_threshold = 0.32`
+ - `textline_groupnms_span_ratio_threshold = 0.3`
+ - `ambiguous_nms_threshold = 0.3` / `ambiguous_overlap_threshold = 0.5`
+ - `K_of_detections` — per-scale detection limit
+
+ #### Issue 2: Missing Small Characters "..." (2 images)
+ **Images:** ocr_test 7, 14
+ **Symptom:** Three dots too small to detect
+ **Cause:** The `min_component_pixels` and `min_area` thresholds reject such tiny components
+ **DLL solution:** `SeglinkGroup` — groups neighboring segments into a single line
+
+ #### Issue 3: Character Recognition Errors (2 images)
+ **Images:** ocr_test 1, 15
+ **Symptom:** "iob" instead of "job", extra text from margins
+ **Cause:** Differences in text cropping/preprocessing
+ **DLL solution:** `BaseNormalizer` — sophisticated text line normalization
+
+ #### Issue 4: Large Images (test.png — 31.8% match)
+ **Symptom:** 55 of 74 lines detected, some cut off at edges
+ **Cause:** Adaptive scaling — the DLL rescales the image at multiple levels, our pipeline does not
+ **DLL solution:** `AdaptiveScaling` with `AS_LARGE_TEXT_THRESHOLD`
+
+ ---
+
+ ## Architecture
+
+ ```
+ Image (PIL / numpy)
+         │
+         ▼
+ ┌──────────────────────────────────┐
+ │ Detector (model_00)              │  PixelLink FPN (fpn2/3/4)
+ │   BGR, mean subtraction          │  stride = 4 / 8 / 16
+ │ → pixel_scores, link_scores      │  8-neighbor, Union-Find
+ │ → bounding quads (lines)         │  minAreaRect + NMS (IoU 0.2)
+ └──────────────────────────────────┘
+         │
+         ▼  for each detected line
+ ┌──────────────────────────────────┐
+ │ Crop + padding (15%)             │  Axis-aligned / perspective
+ │ ScriptID (model_01)              │  10 scripts: Latin, CJK, Arabic...
+ │   RGB / 255.0, height=60px       │  HW/PC classification, flip detection
+ └──────────────────────────────────┘
+         │
+         ▼  per script
+ ┌──────────────────────────────────┐
+ │ Recognizer (model_02–10)         │  DynamicQuantizeLSTM + CTC
+ │ Per-script character maps        │  Greedy decode with per-char confidence
+ │ → text + word confidences        │  Word splitting on spaces
+ └──────────────────────────────────┘
+         │
+         ▼
+ ┌──────────────────────────────────┐
+ │ Line grouping & sorting          │  Y-overlap clustering
+ │ Per-word bounding boxes          │  Proportional quad interpolation
+ │ Text angle estimation            │  Median of top-edge angles
+ └──────────────────────────────────┘
+ ```
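The "Proportional quad interpolation" step in the last stage can be sketched as below. This is an illustrative sketch, not the project's actual code: `interpolate_word_quad`, its argument names, and the assumption that a word's quad is obtained by lerping along the line quad's top and bottom edges are all hypothetical.

```python
def lerp(p, q, t):
    """Linear interpolation between 2D points p and q at fraction t."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def interpolate_word_quad(line_quad, start_frac, end_frac):
    """Derive a word quad from a line quad by interpolating along the
    top (TL->TR) and bottom (BL->BR) edges. line_quad is ordered
    TL, TR, BR, BL; the fractions are the word's horizontal span (0..1)."""
    tl, tr, br, bl = line_quad
    word_tl = lerp(tl, tr, start_frac)
    word_tr = lerp(tl, tr, end_frac)
    word_br = lerp(bl, br, end_frac)
    word_bl = lerp(bl, br, start_frac)
    return [word_tl, word_tr, word_br, word_bl]
```

For an axis-aligned line this degenerates to simple x-interpolation; for a rotated line the word quad inherits the line's slant.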
+
+ ### Model Registry (34 models)
+
+ | Index | Role | Script | Custom Op | Status |
+ |-------|------|--------|-----------|--------|
+ | 0 | Detector | Universal | `QLinearSigmoid` | ✅ Works |
+ | 1 | ScriptID | Universal | — | ✅ Works |
+ | 2–10 | Recognizers | Latin/CJK/Arabic/Cyrillic/Devanagari/Greek/Hebrew/Tamil/Thai | `DynamicQuantizeLSTM` | ✅ Work |
+ | 11–21 | LangSm (confidence) | Per-script | `OneOCRFeatureExtract` → **Gemm** | ✅ Unlocked |
+ | 22–32 | LangMd (confidence) | Per-script | `OneOCRFeatureExtract` → **Gemm** | ✅ Unlocked |
+ | 33 | LineLayout | Universal | `OneOCRFeatureExtract` → **Conv1x1** | ✅ Unlocked |
+
+ ---
+
+ ## Quick Start
+
+ ### Requirements
+
+ ```bash
+ pip install onnxruntime numpy opencv-python-headless Pillow pycryptodome onnx
+ ```
+
+ Or with `uv`:
+ ```bash
+ uv sync --extra extract
+ ```
+
+ ### Model Extraction (one-time)
+
+ ```bash
+ # Full pipeline: decrypt → extract → unlock → verify
+ python tools/extract_pipeline.py ocr_data/oneocr.onemodel
+
+ # Verify existing models only
+ python tools/extract_pipeline.py --verify-only
+ ```
+
+ ### Usage
+
+ ```python
+ from ocr.engine_onnx import OcrEngineOnnx
+ from PIL import Image
+
+ engine = OcrEngineOnnx()
+ result = engine.recognize_pil(Image.open("screenshot.png"))
+
+ print(result.text)                 # "Hello World"
+ print(result.average_confidence)   # 0.975
+ print(result.text_angle)           # 0.0
+
+ for line in result.lines:
+     for word in line.words:
+         print(f"  '{word.text}' conf={word.confidence:.0%} "
+               f"bbox=({word.bounding_rect.x1:.0f},{word.bounding_rect.y1:.0f})")
+ ```
+
+ ### API Reference
+
+ ```python
+ engine = OcrEngineOnnx(
+     models_dir="path/to/onnx_models",       # optional
+     config_dir="path/to/config_data",       # optional
+     providers=["CUDAExecutionProvider"],    # optional (default: CPU)
+ )
+
+ # Input formats:
+ result = engine.recognize_pil(pil_image)      # PIL Image
+ result = engine.recognize_numpy(rgb_array)    # numpy (H,W,3) RGB
+ result = engine.recognize_bytes(png_bytes)    # raw bytes (PNG/JPEG)
+
+ # Result:
+ result.text                  # str — full recognized text
+ result.text_angle            # float — detected rotation angle
+ result.lines                 # list[OcrLine]
+ result.average_confidence    # float — overall confidence 0-1
+ result.error                 # str | None — error message
+
+ # Per-word:
+ word.text             # str
+ word.confidence       # float — CTC confidence per word
+ word.bounding_rect    # BoundingRect (x1,y1...x4,y4 quadrilateral)
+ ```
+
+ ---
+
+ ## Project Structure
+
+ ```
+ ONEOCR/
+ ├── main.py                    # Usage example (both engines)
+ ├── pyproject.toml             # Project config & dependencies
+ ├── README.md                  # This documentation
+ ├── .gitignore
+
+ ├── ocr/                       # Core OCR package
+ │   ├── __init__.py            # Exports OcrEngine, OcrEngineOnnx, models
+ │   ├── engine.py              # DLL wrapper (Windows only, 374 lines)
+ │   ├── engine_onnx.py         # ONNX engine (cross-platform, ~1100 lines)
+ │   └── models.py              # Data models: OcrResult, OcrLine, OcrWord
+
+ ├── tools/                     # Utilities
+ │   ├── extract_pipeline.py    # Extraction pipeline (decrypt→extract→unlock→verify)
+ │   ├── visualize_ocr.py       # OCR result visualization with bounding boxes
+ │   └── test_quick.py          # Quick OCR test on images
+
+ ├── ocr_data/                  # Runtime data (DO NOT commit)
+ │   ├── oneocr.dll             # Original DLL (Windows only)
+ │   ├── oneocr.onemodel        # Encrypted model container
+ │   └── onnxruntime.dll        # ONNX Runtime DLL
+
+ ├── oneocr_extracted/          # Extracted models (auto-generated)
+ │   ├── onnx_models/           # 34 raw ONNX (models 11-33 have custom ops)
+ │   ├── onnx_models_unlocked/  # 23 unlocked (models 11-33, standard ONNX ops)
+ │   └── config_data/           # Character maps, rnn_info, manifest, configs
+
+ ├── working_space/             # Test images
+ │   └── input/                 # 19 test images
+
+ └── _archive/                  # Archive — RE scripts, analyses, prototypes
+     ├── temp/re_output/        # DLL reverse engineering results
+     ├── attempts/              # Decryption attempts
+     ├── analysis/              # Cryptographic analyses
+     └── hooks/                 # Frida hooks
+ ```
+
+ ---
+
+ ## Technical Details
+
+ ### .onemodel Encryption
+
+ | Element | Value |
+ |---------|-------|
+ | Algorithm | AES-256-CFB128 |
+ | Master key | `kj)TGtrK>f]b[Piow.gU+nC@s""""""4` (32 B) |
+ | IV | `Copyright @ OneO` (16 B) |
+ | DX key | `SHA256(master_key + file[8:24])` |
+ | Config key | `SHA256(DX[48:64] + DX[32:48])` |
+ | Chunk key | `SHA256(chunk_header[16:32] + chunk_header[0:16])` |
+
+ ### OneOCRFeatureExtract — Cracked Custom Op
+
+ The proprietary op (domain `com.microsoft.oneocr`) stores its weights as a **big-endian float32** blob inside a STRING tensor.
+
+ **Models 11–32** (21 → 50 features):
+ ```
+ config_blob (4492 B, big-endian float32):
+   W[21×50]  = 1050 floats  (weight matrix)
+   b[50]     =   50 floats  (bias)
+   metadata  =   23 floats  (dimensions [21, 50, 2], flags, calibration)
+
+ Replacement: Gemm(input, W^T, b)
+ ```
+
+ **Model 33** (256 → 16 channels):
+ ```
+ config_blob (16548 B, big-endian float32):
+   W[256×16] = 4096 floats  (convolution weights)
+   b[16]     =   16 floats  (bias)
+   metadata  =   25 floats  (dimensions [256, 16], flags)
+
+ Replacement: Conv(input, W[in,out].T → [16,256,1,1], b, kernel=1x1)
+ ```
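The blob layout above can be unpacked with NumPy and applied as the equivalent `Gemm`. A sketch under the stated layout; the helper names are hypothetical, and the actual unlocking rewrites the ONNX graph rather than running in Python.

```python
import numpy as np

def parse_feature_extract_blob(blob: bytes, in_dim: int, out_dim: int):
    """Split a big-endian float32 config blob into W, b and metadata.
    dtype '>f4' handles the byte-order swap on little-endian hosts."""
    floats = np.frombuffer(blob, dtype=">f4").astype(np.float32)
    n_w = in_dim * out_dim
    W = floats[:n_w].reshape(in_dim, out_dim)   # weight matrix
    b = floats[n_w:n_w + out_dim]               # bias
    meta = floats[n_w + out_dim:]               # dimensions, flags, calibration
    return W, b, meta

def feature_extract(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    # The Gemm replacement: y = x @ W + b, shape [batch, out_dim]
    return x @ W + b
```

For models 11–32 the blob size checks out: (1050 + 50 + 23) floats × 4 bytes = 4492 B.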
+
+ ### Detector Configuration (from DLL protobuf manifest)
+
+ ```
+ segment_conf_threshold:                  0.7                      (field 8)
+ textline_conf_threshold per-FPN:         P2=0.7, P3=0.8, P4=0.8   (field 9)
+ textline_nms_threshold:                  0.2                      (field 10)
+ textline_overlap_threshold:              0.4                      (field 11)
+ text_confidence_threshold:               0.8                      (field 13)
+ ambiguous_nms_threshold:                 0.3                      (field 15)
+ ambiguous_overlap_threshold:             0.5                      (field 16)
+ ambiguous_save_threshold:                0.4                      (field 17)
+ textline_hardnms_iou_threshold:          0.32                     (field 20)
+ textline_groupnms_span_ratio_threshold:  0.3                      (field 21)
+ ```
+
+ ### PixelLink Detector
+
+ - **FPN levels**: fpn2 (stride=4), fpn3 (stride=8), fpn4 (stride=16)
+ - **Outputs per level**: `scores_hori/vert` (pixel text probability), `link_scores_hori/vert` (8-neighbor connectivity), `bbox_deltas_hori/vert` (corner offsets)
+ - **Post-processing**: threshold pixels → Union-Find connected components → bbox regression → NMS
+ - **Detects TEXT LINES** — word splitting comes from the recognizer
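The Union-Find grouping step can be sketched as follows. This is a simplified illustration: the 8-neighbour ordering, the boolean-mask layout, and the function name are assumptions, and a production version would vectorize instead of looping per pixel.

```python
import numpy as np

def connected_components(pixel_mask, link_mask):
    """8-neighbour Union-Find over thresholded pixels, PixelLink-style.
    pixel_mask: (H, W) bool. link_mask: (H, W, 8) bool, one channel per
    neighbour; a pixel joins a neighbour only if both are text AND linked."""
    H, W = pixel_mask.shape
    parent = {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Assumed neighbour order: E, SE, S, SW, W, NW, N, NE
    offs = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
    for y in range(H):
        for x in range(W):
            if not pixel_mask[y, x]:
                continue
            parent.setdefault((y, x), (y, x))
            for k, (dy, dx) in enumerate(offs):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W and pixel_mask[ny, nx] and link_mask[y, x, k]:
                    parent.setdefault((ny, nx), (ny, nx))
                    union((y, x), (ny, nx))

    # Relabel roots to small consecutive component ids
    labels, comp = {}, {}
    for p in parent:
        r = find(p)
        comp.setdefault(r, len(comp))
        labels[p] = comp[r]
    return labels
```

Each resulting component is a candidate text line; `minAreaRect` over its pixels then yields the bounding quad.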
+
+ ### CTC Recognition
+
+ - Target height: 60 px, aspect ratio preserved
+ - Input: RGB / 255.0, NCHW format
+ - Output: log-softmax [T, 1, N_chars]
+ - Decoding: greedy argmax with repeat merging + blank removal
+ - Per-character confidence via `exp(max_logprob)`
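The greedy decode described above can be sketched in a few lines; the character-map format (a list indexed by class id, blank at index 0) is an assumption for illustration.

```python
import numpy as np

def ctc_greedy_decode(log_probs, charmap, blank=0):
    """log_probs: [T, N_chars] log-softmax frames. Greedy argmax per frame,
    merge repeats, drop blanks; per-char confidence = exp(max log-prob)."""
    ids = log_probs.argmax(axis=1)
    confs = np.exp(log_probs.max(axis=1))
    text, char_confs, prev = [], [], blank
    for t, i in enumerate(ids):
        if i != blank and i != prev:  # new (non-repeated, non-blank) symbol
            text.append(charmap[i])
            char_confs.append(float(confs[t]))
        prev = i
    return "".join(text), char_confs
```

Word confidences then follow by splitting the decoded string on spaces and aggregating the per-character values.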
+
+ ---
+
+ ## DLL Reverse Engineering — Results & Materials
+
+ ### DLL Source Structure (from debug symbols)
+
+ ```
+ C:\__w\1\s\CoreEngine\Native\
+ ├── TextDetector/
+ │   ├── AdaptiveScaling         ← multi-level image scaling
+ │   ├── SeglinkProposal         ← KEY: detection post-processing
+ │   ├── SeglinkGroup.h          ← segment grouping into lines
+ │   ├── TextLinePolygon         ← precise text contouring
+ │   ├── RelationRCNNRpn2        ← relational region proposal network
+ │   ├── BaseRCNN, DQDETR        ← alternative detectors
+ │   ├── PolyFitting             ← polynomial fitting
+ │   └── BarcodePolygon          ← barcode detection
+
+ ├── TextRecognizer/
+ │   ├── TextLineRecognizerImpl  ← main CTC implementation
+ │   ├── ArgMaxDecoder           ← CTC decoding
+ │   ├── ConfidenceProcessor     ← confidence models (models 11-21)
+ │   ├── RejectionProcessor      ← rejection models (models 22-32)
+ │   ├── DbLstm                  ← dynamic batch LSTM
+ │   └── CharacterMap/           ← per-script character maps
+
+ ├── TextAnalyzer/
+ │   ├── TextAnalyzerImpl        ← text layout analysis
+ │   └── AuxMltClsClassifier     ← auxiliary classifier
+
+ ├── TextNormalizer/
+ │   ├── BaseNormalizer          ← text line normalization
+ │   └── ConcatTextLines         ← line concatenation
+
+ ├── TextPipeline/
+ │   ├── TextPipelineDevImpl     ← main pipeline
+ │   └── FilterXY                ← position-based filtering
+
+ ├── CustomOps/onnxruntime/
+ │   ├── SeglinkProposalsOp      ← ONNX op (NOT in our models)
+ │   ├── XYSeglinkProposalsOp    ← XY variant
+ │   └── FeatureExtractOp        ← = Gemm / Conv1x1
+
+ ├── ModelParser/
+ │   ├── ModelParser             ← .onemodel parsing
+ │   └── Crypto                  ← AES-256-CFB128
+
+ └── Common/
+     ├── ImageUtility            ← image conversion
+     └── ImageFeature            ← image features
+ ```
+
+ ### RE Materials
+
+ Reverse engineering results live in `_archive/temp/re_output/`:
+ - `03_oneocr_classes.txt` — 186 C++ classes
+ - `06_config_strings.txt` — 429 config strings
+ - `15_manifest_decoded.txt` — 1182 lines of decoded protobuf manifest
+ - `09_constants.txt` — 42 float + 14 double constants (800.0, 0.7, 0.8, 0.92...)
+ - `10_disassembly.txt` — disassembly of key exports
+
+ ---
+
+ ## For Future Developers — Roadmap
+
+ ### Priority 1: SeglinkProposals (hardest, highest impact)
+
+ This is the key C++ post-processing step in the DLL that is NOT part of the ONNX models.
+ It is responsible for roughly 80% of the differences between the DLL and our implementation.
+
+ **What it does:**
+ 1. Takes raw pixel_scores + link_scores + bbox_deltas from all 3 FPN levels
+ 2. Groups segments into lines (SeglinkGroup) — merges neighboring small components into a single line
+ 3. Multi-stage NMS: textline_nms → hardnms → ambiguous_nms → groupnms
+ 4. Confidence filtering with `text_confidence_threshold = 0.8`
+ 5. `K_of_detections` — detection count limit
+
+ **Where to look:**
+ - `_archive/temp/re_output/06_config_strings.txt` — parameter names
+ - `_archive/temp/re_output/15_manifest_decoded.txt` — parameter values
+ - the `SeglinkProposal` class in the DLL — ~2000 lines of C++
+
+ **Approach:**
+ - Decompile `SeglinkProposal::Process` with IDA Pro / Ghidra
+ - Alternatively: black-box testing of different NMS configurations
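For the black-box route, one hard-NMS stage over axis-aligned boxes can be sketched like this. It is a simplified stand-in (the DLL works on quads and runs several NMS stages); the `0.32` default mirrors `textline_hardnms_iou_threshold` from the manifest, and the function names are hypothetical.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def hard_nms(boxes, scores, iou_threshold=0.32):
    """Keep boxes greedily by descending score; drop any box whose IoU
    with an already-kept box exceeds the threshold. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Sweeping the threshold and comparing match rates against the DLL output is one cheap way to probe which stage the remaining false detections come from.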
+
+ ### Priority 2: AdaptiveScaling
+
+ The DLL dynamically scales images based on text size.
+
+ **Parameters:**
+ - `AS_LARGE_TEXT_THRESHOLD` — large-text threshold
+ - Multi-scale: the DLL can run the detector at multiple scales
+
+ ### Priority 3: BaseNormalizer
+
+ The DLL normalizes text crops before recognition more effectively than our simple resize.
+
+ ### Priority 4: Confidence/Rejection Models (11-32)
+
+ The DLL uses models 11-32 to filter results; we currently skip them. Integrating them could improve
+ precision by removing false detections.
+
+ ---
+
+ ## Performance
+
+ | Operation | ONNX (CPU) | DLL | Notes |
+ |---|---|---|---|
+ | Detection (PixelLink) | ~50-200 ms | ~15-50 ms | Model inference + post-processing |
+ | ScriptID | ~5 ms | ~3 ms | Single forward pass |
+ | Recognition (CTC) | ~30 ms/line | ~10 ms/line | Per-script LSTM |
+ | Full pipeline | ~300-1000 ms | ~15-135 ms | Depends on line count |
+
+ ---
+
+ ## License
+
+ For research and educational purposes only.
_archive/analysis/analyze_boundaries.py ADDED
@@ -0,0 +1,91 @@
+ """Analyze exact chunk boundary structure in the .onemodel file."""
+ import struct, json
+
+ with open("ocr_data/oneocr.onemodel", "rb") as f:
+     fdata = f.read()
+ log = json.load(open("temp/crypto_log.json"))
+
+ sha256s = [op for op in log if op["op"] == "sha256"]
+ sha_map = {s["output"]: s["input"] for s in sha256s}
+ decrypts = [op for op in log if op["op"] == "decrypt"]
+
+ # Get info for first few payload chunks
+ def get_chunk_info(dec_idx):
+     d = decrypts[dec_idx]
+     sha_inp = bytes.fromhex(sha_map[d["aes_key"]])
+     s1, s2 = struct.unpack_from("<QQ", sha_inp, 0)
+     chk = sha_inp[16:32]
+     chk_pos = fdata.find(chk)
+     return {
+         "dec_idx": dec_idx,
+         "enc_size": d["input_size"],
+         "size1": s1,
+         "size2": s2,
+         "chk": chk,
+         "chk_pos": chk_pos,
+     }
+
+ # Focus on first few consecutive large chunks
+ # From the sorted output, the order in file is: dec#02, dec#03, dec#06, dec#11, dec#16, dec#23, ...
+ chunks_in_order = [2, 3, 6, 11, 16, 23, 28, 33]
+ infos = [get_chunk_info(i) for i in chunks_in_order]
+
+ print("=== Chunk boundary analysis ===\n")
+ for i, info in enumerate(infos):
+     print(f"dec#{info['dec_idx']:02d}: chk_pos={info['chk_pos']}, size1={info['size1']}, enc_size={info['enc_size']}")
+
+     if i > 0:
+         prev = infos[i-1]
+         # Hypothesis: on-disk encrypted data = size1 + 8 (data_size + container_header)
+         prev_data_start = prev['chk_pos'] + 32
+         prev_on_disk = prev['size1'] + 8
+         expected_next_chk = prev_data_start + prev_on_disk
+         actual_next_chk = info['chk_pos']
+         delta = actual_next_chk - expected_next_chk
+         print(f"  Expected chk_pos: {expected_next_chk}, actual: {actual_next_chk}, delta: {delta}")
+
+ # Now figure out the EXACT header structure
+ print("\n=== Bytes around first few chunk boundaries ===\n")
+
+ # Between DX and first chunk
+ dx_end = 24 + 22624  # = 22648
+ print(f"--- DX end ({dx_end}) to first chunk ---")
+ for off in range(dx_end, infos[0]['chk_pos'] + 48, 8):
+     raw = fdata[off:off+8]
+     val = struct.unpack_from("<Q", raw)[0] if len(raw) == 8 else 0
+     print(f"  {off:>8}: {raw.hex()} (uint64={val})")
+
+ # Between chunk 0 and chunk 1
+ c0 = infos[0]
+ c1 = infos[1]
+ # data starts at chk_pos + 32, on-disk size is approximately size1+8 or enc_size
+ # Let's look at bytes around where the boundary should be
+ c0_data_start = c0['chk_pos'] + 32
+ c0_approx_end = c0_data_start + c0['size1'] + 8
+ print(f"\n--- End of dec#{c0['dec_idx']:02d} / Start of dec#{c1['dec_idx']:02d} ---")
+ print(f"  c0 data_start: {c0_data_start}")
+ print(f"  c0 size1+8:    {c0['size1']+8}")
+ print(f"  c0 approx end: {c0_approx_end}")
+ print(f"  c1 chk_pos:    {c1['chk_pos']}")
+
+ for off in range(c0_approx_end - 16, c1['chk_pos'] + 48, 8):
+     raw = fdata[off:off+8]
+     val = struct.unpack_from("<Q", raw)[0] if len(raw) == 8 else 0
+     ascii_s = ''.join(chr(b) if 32 <= b < 127 else '.' for b in raw)
+     print(f"  {off:>8}: {raw.hex()} val={val:<15d} {ascii_s}")
+
+ # Check file header
+ header_size = struct.unpack_from("<Q", fdata, 0)[0]
+ print(f"\nFile header uint64: {header_size}")
+ print("  = file[0:8] as uint64 LE")
+
+ # What if it's NOT a uint64 but two uint32?
+ h1, h2 = struct.unpack_from("<II", fdata, 0)
+ print(f"  As two uint32: ({h1}, {h2})")
+
+ # file[0:24] detailed view
+ print("\nFile header [0:24]:")
+ for off in range(0, 24, 8):
+     raw = fdata[off:off+8]
+     val = struct.unpack_from("<Q", raw)[0]
+     print(f"  {off:>3}: {raw.hex()} uint64={val}")
_archive/analysis/analyze_crypto_log.py ADDED
@@ -0,0 +1,95 @@
+ """Analyze crypto_log.json to understand decrypt sequence and chunk mapping."""
+ import json
+ import struct
+
+ with open("temp/crypto_log.json") as f:
+     log = json.load(f)
+
+ decrypts = [op for op in log if op["op"] == "decrypt"]
+ sha256s = [op for op in log if op["op"] == "sha256"]
+ encrypts = [op for op in log if op["op"] == "encrypt"]
+
+ print(f"Total ops: {len(log)} (sha256={len(sha256s)}, decrypt={len(decrypts)}, encrypt={len(encrypts)})")
+
+ # Build SHA256 output -> input mapping
+ sha_map = {}  # output_hex -> input_hex
+ for s in sha256s:
+     sha_map[s["output"]] = s["input"]
+
+ # Pair each decrypt with its SHA256 key derivation
+ print("\n=== Decrypt operations with key derivation ===")
+ for i, d in enumerate(decrypts):
+     key = d["aes_key"]
+     sha_input_hex = sha_map.get(key, "UNKNOWN")
+     sha_input = bytes.fromhex(sha_input_hex) if sha_input_hex != "UNKNOWN" else b""
+
+     if len(sha_input) == 48:
+         desc = "DX_KEY (master+file[8:24])"
+     elif len(sha_input) == 32:
+         s1, s2 = struct.unpack_from("<QQ", sha_input, 0)
+         chk = sha_input[16:32].hex()[:16] + "..."
+         desc = f"CHK sizes=({s1},{s2}) chk={chk}"
+     elif len(sha_input) == 16:
+         s1, s2 = struct.unpack_from("<QQ", sha_input, 0)
+         desc = f"NOCHK sizes=({s1},{s2})"
+     else:
+         desc = f"len={len(sha_input)}"
+
+     first = d["first_bytes"][:32]
+     print(f"  dec#{i:02d}: size={d['input_size']:>8}B {desc:50s} out={first}")
+
+ # Now search for plaintext first_bytes in decrypted DX to find embedded chunks
+ dx = open("temp/dx_index_decrypted.bin", "rb").read()
+ fdata = open("ocr_data/oneocr.onemodel", "rb").read()
+
+ print("\n=== Locating encrypted data ===")
+ for i, d in enumerate(decrypts):
+     size = d["input_size"]
+     first = bytes.fromhex(d["first_bytes"][:32])
+
+     # Search in decrypted DX for the plaintext (this was decrypted in-place)
+     # But we need the CIPHERTEXT, which is in the original file (encrypted DX) or payload
+
+     # For chunks embedded in DX: ciphertext is at file offset 24 + dx_offset
+     # For chunks in payload: ciphertext is at some file offset after 22684
+
+     # Let's find plaintext in decrypted DX
+     dx_pos = dx.find(first)
+
+     # Find ciphertext (first 16 bytes from hook_decrypt dumps)
+     # We don't have ciphertext in logs, but we know:
+     # - DX encrypted data is at file[24:24+22624]
+     # - Payload data is after file[22684]
+
+     if i == 0:
+         loc = "DX index itself at file[24:]"
+     elif dx_pos >= 0:
+         loc = f"embedded in DX at dx_offset={dx_pos} (file_off={24+dx_pos})"
+     else:
+         loc = "payload (after file[22684])"
+
+     print(f"  dec#{i:02d}: size={size:>8}B {loc}")
+
+ # Scan DX for all uint64 pairs where second = first + 24
+ print("\n=== All size-pair patterns in DX (s2 = s1 + 24) ===")
+ pairs = []
+ for off in range(0, len(dx) - 16):
+     s1, s2 = struct.unpack_from("<QQ", dx, off)
+     if s2 == s1 + 24 and 0 < s1 < 100_000_000 and s1 > 10:
+         pairs.append((off, s1, s2))
+ print(f"Found {len(pairs)} size pairs")
+ # Deduplicate overlapping pairs
+ filtered = []
+ for p in pairs:
+     if not filtered or p[0] >= filtered[-1][0] + 16:
+         filtered.append(p)
+ print(f"After dedup: {len(filtered)} pairs")
+ for off, s1, s2 in filtered:
+     # Check if there's a 16-byte checksum before this pair
+     has_chk = False
+     if off >= 16:
+         # Check if the 16 bytes before could be a checksum (non-trivial bytes)
+         potential_chk = dx[off-16:off]
+         non_zero = sum(1 for b in potential_chk if b != 0)
+         has_chk = non_zero > 8  # At least 8 non-zero bytes
+     print(f"  offset={off:>5} (0x{off:04x}): sizes=({s1}, {s2}) chk_before={'YES' if has_chk else 'no'}")
_archive/analysis/analyze_decrypt.py ADDED
@@ -0,0 +1,581 @@
1
+ """
2
+ OneOCR .onemodel file analysis and decryption attempt.
3
+
4
+ Known facts:
5
+ - AES-256-CFB via Windows BCrypt CNG API
6
+ - SHA256 used somewhere in the process
7
+ - Key: kj)TGtrK>f]b[Piow.gU+nC@s""""""4 (32 ASCII bytes = 256 bits)
8
+ - After decryption → decompression (zlib/lz4/etc.)
9
+ - Error on wrong key: meta->magic_number == MAGIC_NUMBER (0 vs. 1)
10
+ """
11
+
12
+ import struct
13
+ import hashlib
14
+ import zlib
15
+ import os
16
+ from collections import Counter
17
+ from typing import Optional
18
+
19
+ # ── Try to import crypto libraries ──
20
+ try:
21
+ from Crypto.Cipher import AES as PyCryptoAES
22
+ HAS_PYCRYPTODOME = True
23
+ except ImportError:
24
+ HAS_PYCRYPTODOME = False
25
+ print("[WARN] PyCryptodome not available, install with: pip install pycryptodome")
26
+
27
+ try:
28
+ from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
29
+ from cryptography.hazmat.backends import default_backend
30
+ HAS_CRYPTOGRAPHY = True
31
+ except ImportError:
32
+ HAS_CRYPTOGRAPHY = False
33
+ print("[WARN] cryptography not available, install with: pip install cryptography")
34
+
35
+
36
+ # ═══════════════════════════════════════════════════════════════
37
+ # CONFIGURATION
38
+ # ═══════════════════════════════════════════════════════════════
39
+
40
+ MODEL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.onemodel"
41
+
42
+ # The key as raw bytes (32 bytes = 256 bits for AES-256)
43
+ KEY_RAW = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
44
+ assert len(KEY_RAW) == 32, f"Key must be 32 bytes, got {len(KEY_RAW)}"
45
+
46
+ # SHA256 of the key (another possible key derivation)
47
+ KEY_SHA256 = hashlib.sha256(KEY_RAW).digest()
48
+
49
+
50
+ # ═══════════════════════════════════════════════════════════════
51
+ # HELPER FUNCTIONS
52
+ # ═══════════════════════════════════════════════════════════════
53
+
54
+ def hex_dump(data: bytes, offset: int = 0, max_lines: int = 32) -> str:
55
+ """Format bytes as hex dump with ASCII column."""
56
+ lines = []
57
+ for i in range(0, min(len(data), max_lines * 16), 16):
58
+ hex_part = " ".join(f"{b:02x}" for b in data[i:i+16])
59
+ ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data[i:i+16])
60
+ lines.append(f" {offset+i:08x}: {hex_part:<48s} {ascii_part}")
61
+ return "\n".join(lines)
62
+
63
+
64
+ def entropy(data: bytes) -> float:
65
+ """Calculate Shannon entropy (0-8 bits per byte)."""
66
+ if not data:
67
+ return 0.0
68
+ import math
69
+ freq = Counter(data)
70
+ total = len(data)
71
+ return -sum((c / total) * math.log2(c / total) for c in freq.values())
+
+
+ def unique_byte_ratio(data: bytes) -> str:
+     """Return unique bytes count."""
+     return f"{len(set(data))}/256"
+
+
+ def check_known_headers(data: bytes) -> list[str]:
+     """Check if data starts with known file/compression magic numbers."""
+     findings = []
+     if len(data) < 4:
+         return findings
+
+     # Magic number checks
+     magics = {
+         b"\x08": "Protobuf varint field tag (field 1, wire type 0)",
+         b"\x0a": "Protobuf length-delimited field tag (field 1, wire type 2)",
+         b"\x78\x01": "Zlib (low compression)",
+         b"\x78\x5e": "Zlib (default compression)",
+         b"\x78\x9c": "Zlib (best speed/default)",
+         b"\x78\xda": "Zlib (best compression)",
+         b"\x1f\x8b": "Gzip",
+         b"\x04\x22\x4d\x18": "LZ4 frame",
+         b"\x28\xb5\x2f\xfd": "Zstandard",
+         b"\xfd\x37\x7a\x58\x5a\x00": "XZ",
+         b"\x42\x5a\x68": "Bzip2",
+         b"PK": "ZIP archive",
+         b"\x89PNG": "PNG image",
+         b"ONNX": "ONNX text",
+         b"\x08\x00": "Protobuf: field 1, varint, value will follow",
+         b"\x08\x01": "Protobuf: field 1, varint = 1 (could be magic_number=1!)",
+         b"\x08\x02": "Protobuf: field 1, varint = 2",
+         b"\x08\x03": "Protobuf: field 1, varint = 3",
+         b"\x08\x04": "Protobuf: field 1, varint = 4",
+         b"\x50\x42": "Possible PB (protobuf) marker",
+         b"\x01\x00\x00\x00": "uint32 LE = 1 (possible magic_number=1)",
+         b"\x00\x00\x00\x01": "uint32 BE = 1 (possible magic_number=1)",
+     }
+
+     for magic, desc in magics.items():
+         if data[:len(magic)] == magic:
+             findings.append(f" ★ MATCH: {desc} ({magic.hex()})")
+
+     # Check first uint32 LE/BE
+     u32_le = struct.unpack_from("<I", data, 0)[0]
+     u32_be = struct.unpack_from(">I", data, 0)[0]
+     if u32_le == 1:
+         findings.append(" ★ uint32_LE at offset 0 = 1 (MAGIC_NUMBER match!)")
+     if u32_be == 1:
+         findings.append(" ★ uint32_BE at offset 0 = 1 (MAGIC_NUMBER match!)")
+
+     return findings
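As a quick sanity check on the zlib entries in the magic table above: Python's stdlib emits the `78 9c` header at its default compression level, so a magic-byte scan like `check_known_headers` would flag any zlib-wrapped plaintext immediately:

```python
import zlib

blob = zlib.compress(b"OneOCR model payload" * 10)

# Default-level zlib streams start with the CMF/FLG pair 0x78 0x9c.
print(blob[:2].hex())  # 789c

# The header survives round-tripping, so it is a reliable detection signal.
assert blob[:2] == b"\x78\x9c"
assert zlib.decompress(blob) == b"OneOCR model payload" * 10
```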
124
+
125
+
126
+ def try_decompress(data: bytes, label: str = "") -> Optional[bytes]:
127
+ """Try various decompression methods."""
128
+ results = []
129
+
130
+ # Zlib (with and without header)
131
+ for wbits in [15, -15, 31]: # standard, raw deflate, gzip
132
+ try:
133
+ dec = zlib.decompress(data, wbits)
134
+ results.append(("zlib" + (f" wbits={wbits}" if wbits != 15 else ""), dec))
135
+ except:
136
+ pass
137
+
138
+ # LZ4
139
+ try:
140
+ import lz4.frame
141
+ dec = lz4.frame.decompress(data)
142
+ results.append(("lz4.frame", dec))
143
+ except:
144
+ pass
145
+
146
+ try:
147
+ import lz4.block
148
+ for size in [1 << 20, 1 << 22, 1 << 24]:
149
+ try:
150
+ dec = lz4.block.decompress(data, uncompressed_size=size)
151
+ results.append((f"lz4.block (uncompressed_size={size})", dec))
152
+ break
153
+ except:
154
+ pass
155
+ except:
156
+ pass
157
+
158
+ # Zstandard
159
+ try:
160
+ import zstandard as zstd
161
+ dctx = zstd.ZstdDecompressor()
162
+ dec = dctx.decompress(data, max_output_size=len(data) * 10)
163
+ results.append(("zstandard", dec))
164
+ except:
165
+ pass
166
+
167
+ if results:
168
+ for method, dec in results:
169
+ print(f" ✓ {label} Decompression SUCCESS with {method}: {len(dec)} bytes")
170
+ print(f" First 64 bytes: {dec[:64].hex()}")
171
+ print(f" Entropy: {entropy(dec[:4096]):.3f}, unique: {unique_byte_ratio(dec[:4096])}")
172
+ headers = check_known_headers(dec)
173
+ for h in headers:
174
+ print(f" {h}")
175
+ return results[0][1]
176
+ return None
177
+
178
+
179
+ def decrypt_aes_cfb(data: bytes, key: bytes, iv: bytes, segment_size: int = 8) -> Optional[bytes]:
180
+ """Decrypt using AES-CFB with given parameters."""
181
+ if HAS_PYCRYPTODOME:
182
+ try:
183
+ cipher = PyCryptoAES.new(key, PyCryptoAES.MODE_CFB, iv=iv, segment_size=segment_size)
184
+ return cipher.decrypt(data)
185
+ except Exception as e:
186
+ return None
187
+
188
+ if HAS_CRYPTOGRAPHY:
189
+ try:
190
+ if segment_size == 128:
191
+ cipher = Cipher(algorithms.AES(key), modes.CFB(iv), backend=default_backend())
192
+ elif segment_size == 8:
193
+ cipher = Cipher(algorithms.AES(key), modes.CFB8(iv), backend=default_backend())
194
+ else:
195
+ return None
196
+ decryptor = cipher.decryptor()
197
+ return decryptor.update(data) + decryptor.finalize()
198
+ except Exception as e:
199
+ return None
200
+
201
+ return None
202
+
203
+
204
+ def analyze_decrypted(data: bytes, label: str) -> bool:
205
+ """Analyze decrypted data and return True if it looks promising."""
206
+ if data is None:
207
+ return False
208
+
209
+ ent = entropy(data[:4096])
210
+ unique = unique_byte_ratio(data[:4096])
211
+ headers = check_known_headers(data)
212
+
213
+ is_promising = (
214
+ ent < 7.5 or # reduced entropy
215
+ len(headers) > 0 or # known header match
216
+ data[:4] == b"\x01\x00\x00\x00" or # magic_number = 1 LE
217
+ data[:4] == b"\x00\x00\x00\x01" or # magic_number = 1 BE
218
+ data[:2] == b"\x08\x01" # protobuf magic_number = 1
219
+ )
220
+
221
+ if is_promising:
222
+ print(f" ★★★ PROMISING: {label}")
223
+ print(f" Entropy: {ent:.3f}, Unique bytes: {unique}")
224
+ print(f" First 128 bytes:")
225
+ print(hex_dump(data[:128]))
226
+ for h in headers:
227
+ print(f" {h}")
228
+
229
+ # Try decompression on promising results
230
+ try_decompress(data, label)
231
+
232
+ # If starts with protobuf-like data or magic=1, also try decompressing after skipping some bytes
233
+ for skip in [4, 8, 12, 16, 20]:
234
+ if len(data) > skip + 10:
235
+ try_decompress(data[skip:], f"{label} [skip {skip} bytes]")
236
+
237
+ return True
238
+ return False
+
+
+ # ═══════════════════════════════════════════════════════════════
+ # MAIN ANALYSIS
+ # ═══════════════════════════════════════════════════════════════
+
+ def main():
+     print("=" * 80)
+     print("OneOCR .onemodel File Analysis & Decryption Attempt")
+     print("=" * 80)
+
+     # ── Step 1: Read file ──
+     with open(MODEL_PATH, "rb") as f:
+         full_data = f.read()
+
+     filesize = len(full_data)
+     print(f"\nFile size: {filesize:,} bytes ({filesize/1024/1024:.2f} MB)")
+
+     # ── Step 2: Parse top-level structure ──
+     print("\n" + "═" * 80)
+     print("SECTION 1: FILE STRUCTURE ANALYSIS")
+     print("═" * 80)
+
+     header_offset = struct.unpack_from("<I", full_data, 0)[0]
+     field_at_4 = struct.unpack_from("<I", full_data, 4)[0]
+     print(f"\n [0-3] uint32_LE (header_offset/size): {header_offset} (0x{header_offset:08x})")
+     print(f" [4-7] uint32_LE: {field_at_4} (0x{field_at_4:08x})")
+
+     # Check if it's a uint64
+     u64_at_0 = struct.unpack_from("<Q", full_data, 0)[0]
+     print(f" [0-7] uint64_LE: {u64_at_0} (0x{u64_at_0:016x})")
+
+     # Analyze the metadata at offset 22636
+     print(f"\n At offset {header_offset} (0x{header_offset:04x}):")
+     meta_magic_8 = full_data[header_offset:header_offset+8]
+     meta_size = struct.unpack_from("<Q", full_data, header_offset + 8)[0]
+     print(f" [+0..+7] 8 bytes: {meta_magic_8.hex()}")
+     print(f" [+8..+15] uint64_LE: {meta_size:,} (0x{meta_size:016x})")
+     encrypted_start = header_offset + 16
+     encrypted_size = meta_size
+     print(f" Encrypted payload: offset {encrypted_start} ({encrypted_start:#x}), size {encrypted_size:,}")
+     print(f" Check: {encrypted_start} + {encrypted_size} = {encrypted_start + encrypted_size} "
+           f"vs filesize {filesize} → {'MATCH ✓' if encrypted_start + encrypted_size == filesize else 'MISMATCH ✗'}")
+
+     # ── Step 3: Analyze header region ──
+     print(f"\n Header region [8 .. {header_offset-1}]: {header_offset - 8} bytes")
+     header_data = full_data[8:header_offset]
+     print(f" Entropy: {entropy(header_data[:4096]):.3f}")
+     print(f" Unique bytes (first 4KB): {unique_byte_ratio(header_data[:4096])}")
+     print(f" Null bytes: {header_data.count(0)}/{len(header_data)}")
+
+     # ── Step 4: Analyze encrypted payload region ──
+     print(f"\n Encrypted payload [{encrypted_start} .. {filesize-1}]: {encrypted_size:,} bytes")
+     payload_sample = full_data[encrypted_start:encrypted_start+4096]
+     print(f" Entropy (first 4KB): {entropy(payload_sample):.3f}")
+     print(f" Unique bytes (first 4KB): {unique_byte_ratio(payload_sample)}")
+
+     # ── Step 5: Look for structure in metadata ──
+     print(f"\n Detailed metadata dump at offset {header_offset}:")
+     print(hex_dump(full_data[header_offset:header_offset+128], offset=header_offset))
+
+     # Parse more fields from the metadata region
+     print("\n Parsing fields after metadata header:")
+     meta_region = full_data[header_offset:header_offset + 256]
+     for i in range(0, 128, 4):
+         u32 = struct.unpack_from("<I", meta_region, i)[0]
+         if u32 > 0 and u32 < filesize:
+             print(f" +{i:3d}: u32={u32:12,d} (0x{u32:08x})"
+                   f" {'← could be offset/size' if 100 < u32 < filesize else ''}")
+
+     # ── Step 6: Hash analysis of key ──
+     print("\n" + "═" * 80)
+     print("SECTION 2: KEY ANALYSIS")
+     print("═" * 80)
+     print(f"\n Raw key ({len(KEY_RAW)} bytes): {KEY_RAW}")
+     print(f" Raw key hex: {KEY_RAW.hex()}")
+     print(f" SHA256 of key: {KEY_SHA256.hex()}")
+
+     # Check if SHA256 of key appears in the file header
+     if KEY_SHA256 in full_data[:header_offset + 256]:
+         idx = full_data.index(KEY_SHA256)
+         print(f" ★ SHA256 of key FOUND in file at offset {idx}!")
+     else:
+         print(f" SHA256 of key not found in first {header_offset + 256} bytes")
+
+     # Check if the 8-byte magic at offset 22636 could be related to the key hash
+     key_sha256_first8 = KEY_SHA256[:8]
+     print(f" First 8 bytes of SHA256(key): {key_sha256_first8.hex()}")
+     print(f" 8 bytes at offset {header_offset}: {meta_magic_8.hex()}")
+     print(f" Match: {'YES ★' if key_sha256_first8 == meta_magic_8 else 'NO'}")
+
+     # ── Step 7: Decryption attempts ──
+     print("\n" + "═" * 80)
+     print("SECTION 3: DECRYPTION ATTEMPTS")
+     print("═" * 80)
+
+     # Prepare IV candidates
+     iv_zero = b"\x00" * 16
+     iv_from_8 = full_data[8:24]
+     iv_from_4 = full_data[4:20]
+     iv_from_file_start = full_data[0:16]
+     iv_from_meta = full_data[header_offset:header_offset + 16]
+     iv_from_meta_8 = meta_magic_8 + b"\x00" * 8  # pad the 8-byte magic to 16
+
+     # SHA256 of key, take first 16 bytes as IV
+     iv_sha256_key_first16 = KEY_SHA256[:16]
+
+     iv_candidates = {
+         "all-zeros": iv_zero,
+         "file[8:24]": iv_from_8,
+         "file[4:20]": iv_from_4,
+         "file[0:16]": iv_from_file_start,
+         f"file[{header_offset}:{header_offset+16}]": iv_from_meta,
+         "meta_magic+padding": iv_from_meta_8,
+         "SHA256(key)[:16]": iv_sha256_key_first16,
+     }
+
+     # Key candidates
+     key_candidates = {
+         "RAW key (32 bytes)": KEY_RAW,
+         "SHA256(RAW key)": KEY_SHA256,
+     }
+
+     # Data regions to try decrypting
+     # We try both the header data and the start of the encrypted payload
+     regions = {
+         "header[8:22636]": full_data[8:min(8 + 4096, header_offset)],
+         f"payload[{encrypted_start}:]": full_data[encrypted_start:encrypted_start + 4096],
+     }
+
+     # Also try: what if the entire region from byte 8 to end is one encrypted blob?
+     regions["all_encrypted[8:]"] = full_data[8:8 + 4096]
+
+     # Segment sizes: Windows BCrypt CFB defaults to 8-bit (CFB8), also try 128-bit (CFB128)
+     segment_sizes = [8, 128]
+
+     total_attempts = 0
+     promising_results = []
+
+     for key_name, key in key_candidates.items():
+         for iv_name, iv in iv_candidates.items():
+             for seg_size in segment_sizes:
+                 for region_name, region_data in regions.items():
+                     total_attempts += 1
+                     label = f"key={key_name}, iv={iv_name}, CFB{seg_size}, region={region_name}"
+
+                     decrypted = decrypt_aes_cfb(region_data, key, iv, seg_size)
+                     if decrypted and analyze_decrypted(decrypted, label):
+                         promising_results.append(label)
+
+     print(f"\n Total attempts: {total_attempts}")
+     print(f" Promising results: {len(promising_results)}")
+
+     # ── Step 8: Additional IV strategies ──
+     print("\n" + "═" * 80)
+     print("SECTION 4: ADVANCED IV STRATEGIES")
+     print("═" * 80)
+
+     # Strategy: the IV might be derived from the file content.
+     # Try every 4-byte-aligned position in the first 256 bytes as the IV.
+     print("\n Trying every 4-byte-aligned offset in first 256 bytes as IV...")
+
+     for iv_offset in range(0, 256, 4):  # try every 4-byte step
+         iv_cand = full_data[iv_offset:iv_offset + 16]
+         if len(iv_cand) < 16:
+             continue
+
+         for key in [KEY_RAW, KEY_SHA256]:
+             for seg in [8, 128]:
+                 # Try decrypting the payload
+                 payload_start = encrypted_start
+                 test_data = full_data[payload_start:payload_start + 4096]
+                 decrypted = decrypt_aes_cfb(test_data, key, iv_cand, seg)
+                 if decrypted:
+                     is_good = analyze_decrypted(decrypted,
+                         f"iv_offset={iv_offset}, key={'raw' if key == KEY_RAW else 'sha256'}, CFB{seg}, payload")
+                     if is_good:
+                         promising_results.append(f"Advanced: iv_offset={iv_offset}")
+
+                 # Try decrypting from byte 8 (header encrypted area)
+                 test_data2 = full_data[8:8 + 4096]
+                 decrypted2 = decrypt_aes_cfb(test_data2, key, iv_cand, seg)
+                 if decrypted2:
+                     is_good = analyze_decrypted(decrypted2,
+                         f"iv_offset={iv_offset}, key={'raw' if key == KEY_RAW else 'sha256'}, CFB{seg}, header[8:]")
+                     if is_good:
+                         promising_results.append(f"Advanced: iv_offset={iv_offset} header")
+
+     # ── Step 9: Try with IV = SHA256 of various things ──
+     print("\n" + "═" * 80)
+     print("SECTION 5: DERIVED IV STRATEGIES")
+     print("═" * 80)
+
+     derived_ivs = {
+         "SHA256(key)[:16]": hashlib.sha256(KEY_RAW).digest()[:16],
+         "SHA256(key)[16:]": hashlib.sha256(KEY_RAW).digest()[16:],
+         "SHA256('')[:16]": hashlib.sha256(b"").digest()[:16],
+         "MD5(key)": hashlib.md5(KEY_RAW).digest(),
+         "SHA256(file[0:8])[:16]": hashlib.sha256(full_data[0:8]).digest()[:16],
+         "SHA256(file[0:4])[:16]": hashlib.sha256(full_data[0:4]).digest()[:16],
+         "SHA256('oneocr')[:16]": hashlib.sha256(b"oneocr").digest()[:16],
+         "SHA256('oneocr.onemodel')[:16]": hashlib.sha256(b"oneocr.onemodel").digest()[:16],
+     }
+
+     for iv_name, iv in derived_ivs.items():
+         for key_name, key in key_candidates.items():
+             for seg in [8, 128]:
+                 for region_name, region_data in regions.items():
+                     label = f"key={key_name}, iv={iv_name}, CFB{seg}, region={region_name}"
+                     decrypted = decrypt_aes_cfb(region_data, key, iv, seg)
+                     if decrypted and analyze_decrypted(decrypted, label):
+                         promising_results.append(label)
+
+     # ── Step 10: What if the structure is different? ──
+     print("\n" + "═" * 80)
+     print("SECTION 6: ALTERNATIVE STRUCTURE HYPOTHESES")
+     print("═" * 80)
+
+     # Hypothesis A: Bytes 0-3 = offset, 4-7 = 0, 8-23 = IV, 24+ = encrypted data
+     print("\n Hypothesis A: [0-3]=offset, [4-7]=flags, [8-23]=IV, [24+]=encrypted")
+     iv_hyp_a = full_data[8:24]
+     encrypted_hyp_a = full_data[24:24 + 4096]
+     for key_name, key in key_candidates.items():
+         for seg in [8, 128]:
+             dec = decrypt_aes_cfb(encrypted_hyp_a, key, iv_hyp_a, seg)
+             if dec:
+                 analyze_decrypted(dec, f"HypA: key={key_name}, CFB{seg}")
+
+     # Hypothesis B: [0-7]=header, [8-23]=IV, [24-22635]=encrypted meta, then payload also encrypted
+     print("\n Hypothesis B: [0-7]=header, [22636-22651]=16-byte meta, payload starts at 22652")
+     print(" If meta[22636:22652] contains IV for payload:")
+     iv_hyp_b = full_data[header_offset:header_offset + 16]
+     enc_payload = full_data[encrypted_start:encrypted_start + 4096]
+     for key_name, key in key_candidates.items():
+         for seg in [8, 128]:
+             dec = decrypt_aes_cfb(enc_payload, key, iv_hyp_b, seg)
+             if dec:
+                 analyze_decrypted(dec, f"HypB: key={key_name}, CFB{seg}, payload with meta-IV")
+
+     # Hypothesis C: The entire file from byte 8 to end is one encrypted stream (IV = zeros)
+     print("\n Hypothesis C: Single encrypted stream from byte 8, IV=zeros")
+     single_stream = full_data[8:8 + 4096]
+     for key_name, key in key_candidates.items():
+         for seg in [8, 128]:
+             dec = decrypt_aes_cfb(single_stream, key, iv_zero, seg)
+             if dec:
+                 analyze_decrypted(dec, f"HypC: key={key_name}, CFB{seg}")
+
+     # Hypothesis D: Encrypted data starts right at byte 0 (the header_size field IS part of encrypted data)
+     # This would mean the header_size value 22636 is coincidental
+     print("\n Hypothesis D: Encrypted from byte 0, IV=zeros")
+     for key_name, key in key_candidates.items():
+         for seg in [8, 128]:
+             dec = decrypt_aes_cfb(full_data[:4096], key, iv_zero, seg)
+             if dec:
+                 analyze_decrypted(dec, f"HypD: key={key_name}, CFB{seg}, from byte 0")
+
+     # Hypothesis E: Windows CNG might prepend the IV to the ciphertext
+     # So bytes 0-3 = header_size, 4-7 = 0, 8-23 = IV (embedded in encrypted blob), 24+ = ciphertext
+     print("\n Hypothesis E: IV prepended to ciphertext at various offsets")
+     for data_start in [0, 4, 8]:
+         iv_e = full_data[data_start:data_start + 16]
+         ct_e = full_data[data_start + 16:data_start + 16 + 4096]
+         for key_name, key in key_candidates.items():
+             for seg in [8, 128]:
+                 dec = decrypt_aes_cfb(ct_e, key, iv_e, seg)
+                 if dec:
+                     analyze_decrypted(dec, f"HypE: data_start={data_start}, key={key_name}, CFB{seg}")
+
+     # ── Step 11: Try OFB, CBC, and ECB modes too (in case CFB was misidentified) ──
+     print("\n" + "═" * 80)
+     print("SECTION 7: ALTERNATIVE CIPHER MODES (OFB, CBC, ECB)")
+     print("═" * 80)
+
+     if HAS_PYCRYPTODOME:
+         for data_start in [8, 24, encrypted_start]:
+             for iv_offset in [0, 4, 8]:
+                 iv_alt = full_data[iv_offset:iv_offset + 16]
+                 test_data = full_data[data_start:data_start + 4096]
+                 for key in [KEY_RAW, KEY_SHA256]:
+                     key_label = "raw" if key == KEY_RAW else "sha256"
+
+                     # OFB
+                     try:
+                         cipher = PyCryptoAES.new(key, PyCryptoAES.MODE_OFB, iv=iv_alt)
+                         dec = cipher.decrypt(test_data)
+                         analyze_decrypted(dec, f"OFB: data@{data_start}, iv@{iv_offset}, key={key_label}")
+                     except Exception:
+                         pass
+
+                     # CBC (needs padding but try anyway)
+                     try:
+                         cipher = PyCryptoAES.new(key, PyCryptoAES.MODE_CBC, iv=iv_alt)
+                         dec = cipher.decrypt(test_data)
+                         analyze_decrypted(dec, f"CBC: data@{data_start}, iv@{iv_offset}, key={key_label}")
+                     except Exception:
+                         pass
+
+                     # ECB (no IV)
+                     try:
+                         cipher = PyCryptoAES.new(key, PyCryptoAES.MODE_ECB)
+                         # ECB needs data aligned to 16 bytes
+                         aligned = test_data[:len(test_data) - (len(test_data) % 16)]
+                         dec = cipher.decrypt(aligned)
+                         analyze_decrypted(dec, f"ECB: data@{data_start}, key={key_label}")
+                     except Exception:
+                         pass
+
+     # ── Step 12: Summary ──
+     print("\n" + "═" * 80)
+     print("SUMMARY")
+     print("═" * 80)
+
+     print("\n File structure (confirmed):")
+     print(f" [0x0000 - 0x0007] 8-byte header: offset = {header_offset}")
+     print(f" [0x0008 - 0x{header_offset-1:04x}] Encrypted header data ({header_offset - 8} bytes)")
+     print(f" [0x{header_offset:04x} - 0x{header_offset+7:04x}] 8-byte magic/hash: {meta_magic_8.hex()}")
+     print(f" [0x{header_offset+8:04x} - 0x{header_offset+15:04x}] uint64 payload size: {meta_size:,}")
+     print(f" [0x{encrypted_start:04x} - 0x{filesize-1:07x}] Encrypted payload ({encrypted_size:,} bytes)")
+
+     print("\n Key info:")
+     print(f" Raw key: {KEY_RAW}")
+     print(f" Raw key hex: {KEY_RAW.hex()}")
+     print(f" SHA256(key): {KEY_SHA256.hex()}")
+
+     print(f"\n Total promising decryption results: {len(promising_results)}")
+     for r in promising_results:
+         print(f" ★ {r}")
+
+     if not promising_results:
+         print("\n No successful decryption found with standard approaches.")
+         print(" Possible reasons:")
+         print(" 1. The key might be processed differently (PBKDF2, HKDF, etc.)")
+         print(" 2. The IV might be derived in a non-standard way")
+         print(" 3. The file structure might be more complex")
+         print(" 4. The CFB segment size might be non-standard")
+         print(" 5. There might be additional authentication (AEAD)")
+         print(" 6. The BCrypt CNG API might use a specific key blob format")
+         print(" 7. Think about the BCRYPT_KEY_DATA_BLOB_HEADER structure")
+
+
+ if __name__ == "__main__":
+     main()
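Reason 7 in the summary above is worth spelling out. When raw AES key material is handed to CNG via `BCryptImportKey`, it travels inside a `BCRYPT_KEY_DATA_BLOB`: a 12-byte header (magic `KDBM` = 0x4d42444b, version 1, key length in bytes) followed by the key itself. The sketch below builds that blob layout with stdlib `struct`; it shows the wire format only, not a claim about how OneOCR actually imports its key:

```python
import struct

KEY_RAW = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'

# BCRYPT_KEY_DATA_BLOB_HEADER: ULONG dwMagic, ULONG dwVersion, ULONG cbKeyData
BCRYPT_KEY_DATA_BLOB_MAGIC = 0x4D42444B      # reads as b"KDBM" in little-endian
BCRYPT_KEY_DATA_BLOB_VERSION1 = 1

blob = struct.pack("<III",
                   BCRYPT_KEY_DATA_BLOB_MAGIC,
                   BCRYPT_KEY_DATA_BLOB_VERSION1,
                   len(KEY_RAW)) + KEY_RAW

print(blob[:4])   # b'KDBM'
print(len(blob))  # 44 (12-byte header + 32-byte key)
```

If the DLL wrapped the key this way before hashing or mixing, the effective key material would differ from `KEY_RAW` alone, which would explain the failed attempts above.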
_archive/analysis/analyze_dx.py ADDED
@@ -0,0 +1,137 @@
+ """Analyze DX index structure to understand chunk record format."""
+ import hashlib
+ import struct
+ import json
+ from pathlib import Path
+ from Crypto.Cipher import AES
+
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+ IV = b"Copyright @ OneO"
+
+ file_data = Path("ocr_data/oneocr.onemodel").read_bytes()
+
+ # Step 1: Decrypt DX
+ header_hash = file_data[8:24]
+ dx_key = hashlib.sha256(KEY + header_hash).digest()
+ encrypted_dx = file_data[24:24 + 22624]
+ cipher = AES.new(dx_key, AES.MODE_CFB, iv=IV, segment_size=128)
+ dx = cipher.decrypt(encrypted_dx)
+ assert dx[:2] == b"DX"
+
+ # Load crypto log
+ with open("temp/crypto_log.json") as fh:
+     crypto_log = json.load(fh)
+
+ # Get unique SHA256 inputs in order
+ sha_ops = [x for x in crypto_log if x['op'] == 'sha256']
+ seen = set()
+ unique_sha = []
+ for s in sha_ops:
+     if s['input'] not in seen:
+         seen.add(s['input'])
+         unique_sha.append(s)
+
+ # Get decrypt ops
+ dec_ops = [x for x in crypto_log if x['op'] == 'decrypt']
+
+ # For each SHA256 input, find its position in DX
+ print("=" * 80)
+ print("DX Index Structure Analysis")
+ print("=" * 80)
+ print(f"DX size: {len(dx)} bytes, valid: {struct.unpack('<Q', dx[8:16])[0]}")
+ print()
+
+ # Skip first SHA256 (DX key derivation uses master_key + file_header, not DX data)
+ print("SHA256 input #0: DX key = SHA256(master_key + file[8:24]) [special case]")
+ print()
+
+ for i, s in enumerate(unique_sha[1:], 1):
+     inp = bytes.fromhex(s['input'])
+     pos = dx.find(inp)
+
+     # Also try finding parts of the input
+     first_uint64 = inp[:8]
+     pos_partial = dx.find(first_uint64)
+
+     if pos >= 0:
+         print(f"SHA256 #{i:3d}: len={s['input_len']:2d} found at DX offset {pos:5d} (0x{pos:04x})")
+     elif pos_partial >= 0:
+         # The input might be rearranged from DX
+         size1 = struct.unpack('<Q', inp[:8])[0]
+         size2 = struct.unpack('<Q', inp[8:16])[0]
+         checksum = inp[16:] if len(inp) > 16 else b""
+
+         # Check if sizes and checksum are nearby but in a different order
+         pos_sizes = dx.find(inp[:16])
+         pos_check = dx.find(checksum) if checksum else -1
+
+         if pos_sizes >= 0:
+             print(f"SHA256 #{i:3d}: len={s['input_len']:2d} sizes at DX offset {pos_sizes:5d}, checksum at {pos_check}")
+         else:
+             # Sizes might be in a different order or interleaved
+             pos_s1 = dx.find(first_uint64)
+             print(f"SHA256 #{i:3d}: len={s['input_len']:2d} first_uint64 at DX offset {pos_s1:5d} (rearranged?)")
+             print(f" size1={size1} size2={size2} diff={size2-size1}")
+     else:
+         size1 = struct.unpack('<Q', inp[:8])[0]
+         size2 = struct.unpack('<Q', inp[8:16])[0]
+         print(f"SHA256 #{i:3d}: len={s['input_len']:2d} NOT FOUND (size1={size1} size2={size2})")
+
+ # Now let's dump the DX structure around the first few records
+ print()
+ print("=" * 80)
+ print("DX Record Structure (first 128 bytes)")
+ print("=" * 80)
+
+ off = 0
+ print(f"[{off:4d}] DX Magic: {dx[off:off+8]!r}")
+ off += 8
+ print(f"[{off:4d}] Valid Size: {struct.unpack('<Q', dx[off:off+8])[0]}")
+ off += 8
+ print(f"[{off:4d}] Container: {dx[off:off+8].hex()}")
+ off += 8
+ val = struct.unpack('<Q', dx[off:off+8])[0]
+ print(f"[{off:4d}] Value: {val} (0x{val:x})")
+ off += 8
+ print(f"[{off:4d}] Checksum: {dx[off:off+16].hex()}")
+ off += 16
+ s1 = struct.unpack('<Q', dx[off:off+8])[0]
+ s2 = struct.unpack('<Q', dx[off+8:off+16])[0]
+ print(f"[{off:4d}] Sizes: {s1}, {s2} (diff={s2-s1})")
+ off += 16
+
+ print(f"[{off:4d}] Enc data starts: {dx[off:off+32].hex()}")
+
+ # The config chunk data is here, 11920 bytes
+ config_enc_size = 11920
+ config_end = off + config_enc_size
+ print(f" Config encrypted data: offset {off} to {config_end} ({config_enc_size} bytes)")
+
+ # What's after the config?
+ print(f"\n--- After config chunk ({config_end}) ---")
+ for j in range(0, 80, 16):
+     pos = config_end + j
+     if pos + 16 > len(dx):
+         break
+     chunk = dx[pos:pos+16]
+     hex_str = ' '.join(f'{b:02x}' for b in chunk)
+     ascii_str = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
+     print(f" {pos:5d} ({pos:#06x}): {hex_str:<48s} {ascii_str}")
+
+ # Look at the area around found patterns
+ for name, dx_off in [("Chunk2(encrypt) 0x2ed7", 0x2ed7),
+                      ("Chunk4(ONNX) 0x2f80", 0x2f80),
+                      ("Chunk5(ONNX2) 0x4692", 0x4692)]:
+     print(f"\n--- Area around {name} ---")
+     start = max(0, dx_off - 48)
+     for j in range(0, 128, 16):
+         pos = start + j
+         if pos + 16 > len(dx):
+             break
+         chunk = dx[pos:pos+16]
+         hex_str = ' '.join(f'{b:02x}' for b in chunk)
+         ascii_str = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
+         marker = " <<<" if pos == dx_off else ""
+         print(f" {pos:5d} ({pos:#06x}): {hex_str:<48s} {ascii_str}{marker}")
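The key-derivation step at the top of this script (`dx_key = SHA256(master_key + file[8:24])`) is easy to isolate with stdlib `hashlib` alone. In the sketch below, placeholder bytes stand in for the real `file[8:24]` header hash:

```python
import hashlib

KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'   # master key from the script above
header_hash = bytes(range(16))               # placeholder for the real file[8:24]

# Per-file AES key: hash the master key concatenated with the file's header bytes.
dx_key = hashlib.sha256(KEY + header_hash).digest()

print(len(dx_key))  # 32 → a full AES-256 key
```

Because the derivation mixes in file bytes, every `.onemodel` file gets its own AES key even though the master key is a single hard-coded constant.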
_archive/analysis/analyze_extracted.py ADDED
@@ -0,0 +1,145 @@
+ """Manually parse protobuf structure of extracted files."""
+ from pathlib import Path
+
+ EXTRACT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\extracted_models")
+
+ def read_varint(data, pos):
+     val = 0
+     shift = 0
+     while pos < len(data):
+         b = data[pos]
+         pos += 1
+         val |= (b & 0x7f) << shift
+         if not (b & 0x80):
+             break
+         shift += 7
+     return val, pos
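`read_varint` above is a standard protobuf base-128 varint decoder: each byte contributes its low 7 bits (least-significant group first), and a set high bit means another byte follows. The canonical example from the protobuf encoding docs, `96 01` → 150, checks out:

```python
def read_varint(data: bytes, pos: int):
    """Decode a protobuf base-128 varint starting at `pos`; return (value, new_pos)."""
    val = 0
    shift = 0
    while pos < len(data):
        b = data[pos]
        pos += 1
        val |= (b & 0x7F) << shift   # low 7 bits, least-significant group first
        if not (b & 0x80):           # high bit clear → last byte
            break
        shift += 7
    return val, pos

# 0x96 → low bits 0x16 (22), continuation set; 0x01 → 1 << 7 = 128; 128 + 22 = 150
print(read_varint(b"\x96\x01", 0))  # (150, 2)
# A single tag byte 0x08 is field 1, wire type 0 (varint)
print(read_varint(b"\x08\x01", 0))  # (8, 1)
```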
+
+ def parse_protobuf_fields(data, max_fields=10):
+     """Parse protobuf wire format and return field info."""
+     pos = 0
+     fields = []
+     for _ in range(max_fields):
+         if pos >= len(data):
+             break
+         tag_byte = data[pos]
+         field_num = tag_byte >> 3
+         wire_type = tag_byte & 0x07
+         pos += 1
+
+         if wire_type == 0:  # varint
+             val, pos = read_varint(data, pos)
+             fields.append((field_num, 'varint', val, None))
+         elif wire_type == 2:  # length-delimited
+             length, pos = read_varint(data, pos)
+             if length > len(data) - pos or length < 0:
+                 fields.append((field_num, 'len-delim', length, 'OVERFLOW'))
+                 break
+             preview = data[pos:pos+min(length, 100)]
+             pos += length
+             fields.append((field_num, 'len-delim', length, preview))
+         elif wire_type == 1:  # 64-bit
+             val = data[pos:pos+8]
+             pos += 8
+             fields.append((field_num, '64bit', int.from_bytes(val, 'little'), None))
+         elif wire_type == 5:  # 32-bit
+             val = data[pos:pos+4]
+             pos += 4
+             fields.append((field_num, '32bit', int.from_bytes(val, 'little'), None))
+         else:
+             fields.append((field_num, f'wire{wire_type}', 0, 'UNKNOWN'))
+             break
+     return fields
+
+ # Check the top 10 largest heap files
+ files = sorted(
+     [f for f in EXTRACT_DIR.glob("*.bin") if "0x271a" in f.name],
+     key=lambda f: f.stat().st_size,
+     reverse=True
+ )
+
+ print("=" * 70)
+ print("PROTOBUF STRUCTURE ANALYSIS of largest heap files")
+ print("=" * 70)
+
+ for f in files[:10]:
+     with f.open('rb') as fh:
+         data = fh.read(2048)
+     size = f.stat().st_size
+     print(f"\n{f.name} ({size//1024}KB):")
+     print(f" First 32 bytes: {data[:32].hex()}")
+
+     fields = parse_protobuf_fields(data)
+     for fn, wt, val, preview in fields:
+         if wt == 'varint':
+             print(f" field={fn} {wt} value={val}")
+         elif wt == 'len-delim':
+             if preview == 'OVERFLOW':
+                 print(f" field={fn} {wt} length={val} OVERFLOW!")
+             elif val < 200 and preview:
+                 try:
+                     txt = preview.decode('utf-8', errors='replace')
+                     printable = all(c.isprintable() or c in '\n\r\t' for c in txt[:50])
+                     if printable and len(txt) > 0:
+                         print(f" field={fn} {wt} length={val} text='{txt[:80]}'")
+                     else:
+                         print(f" field={fn} {wt} length={val} hex={preview[:40].hex()}")
+                 except Exception:
+                     print(f" field={fn} {wt} length={val} hex={preview[:40].hex()}")
+             else:
+                 if preview:
+                     print(f" field={fn} {wt} length={val} first_bytes={preview[:20].hex()}")
+                 else:
+                     print(f" field={fn} {wt} length={val}")
+         else:
+             print(f" field={fn} {wt} value={val}")
+
+ # Also check mid-sized files that might be complete models
+ print("\n" + "=" * 70)
+ print("CHECKING MID-SIZED FILES (100KB - 2MB range)")
+ print("=" * 70)
+
+ mid_files = sorted(
+     [f for f in EXTRACT_DIR.glob("*.bin")
+      if "0x271a" in f.name and 100*1024 < f.stat().st_size < 2*1024*1024],
+     key=lambda f: f.stat().st_size,
+     reverse=True
+ )
+
+ import onnx
+ valid_count = 0
+ for f in mid_files[:100]:
+     try:
+         m = onnx.load(str(f))
+         valid_count += 1
+         print(f" VALID: {f.name} ({f.stat().st_size//1024}KB)")
+         print(f" ir={m.ir_version} producer='{m.producer_name}' "
+               f"graph='{m.graph.name}' nodes={len(m.graph.node)}")
+     except Exception:
+         pass
+
+ if valid_count == 0:
+     print(" No valid ONNX models in mid-range files either.")
+
+ # Check if the largest files might be a container/archive
+ print("\n" + "=" * 70)
+ print("CHECKING FOR INTERNAL ONNX BOUNDARIES IN LARGEST FILE")
+ print("=" * 70)
+
+ biggest = files[0]
+ data = biggest.read_bytes()
+ print(f"File: {biggest.name}, total size: {len(data)} bytes")
+
+ # Search for all occurrences of valid ONNX-like starts
+ import re
+ # Look for the 0x08 [3-9] 0x12 pattern (ir_version + field 2)
+ pattern = re.compile(b'\\x08[\\x03-\\x09]\\x12')
+ matches = [(m.start(), data[m.start()+1]) for m in pattern.finditer(data[:1000])]
+ print(f"ONNX-like headers in first 1000 bytes: {len(matches)}")
+ for offset, ir in matches[:10]:
+     print(f" offset={offset}: ir_version={ir}")
+
+ # Also search for "ONNX", "onnx", "graph" and common operator names
+ for needle in [b'ONNX', b'onnx', b'graph', b'Conv', b'Relu', b'BatchNorm', b'MatMul']:
+     positions = [m.start() for m in re.finditer(re.escape(needle), data[:50000])]
+     if positions:
+         print(f" Found '{needle.decode()}' at offsets: {positions[:5]}")
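The regex used above encodes a guess about ONNX ModelProto headers: byte `08` is the field-1 varint tag (`ir_version`), `03`-`09` are the plausible IR versions, and `12` is the tag of a following length-delimited field 2 (`producer_name` in ModelProto). A synthetic header (the bytes are made up for illustration) shows what the pattern does and does not match:

```python
import re

# field 1 varint = 7 (ir_version), then field 2 tag 0x12 with length 4
fake_header = b"\x08\x07\x12\x04test" + b"rest-of-model"
pattern = re.compile(b"\x08[\x03-\x09]\x12")

m = pattern.search(fake_header)
print(m.start())                         # 0 → looks like a ModelProto start
print(pattern.search(b"\x01\x02\x03"))   # None → arbitrary bytes don't match
```

The false-positive rate is nonzero (any three-byte run matching the pattern fires), which is why the scripts above validate hits with `onnx.load` afterwards.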
_archive/analysis/analyze_model.py ADDED
@@ -0,0 +1,64 @@
+ """Analyze oneocr.onemodel file format."""
+ import os
+ import struct
+
+ MODEL_PATH = r"ocr_data\oneocr.onemodel"
+
+ with open(MODEL_PATH, "rb") as f:
+     data = f.read()
+
+ print(f"Total size: {len(data)} bytes = {len(data)/1024/1024:.2f} MB")
+ print(f"First 8 bytes (hex): {data[:8].hex()}")
+ print(f"First 4 bytes as uint32 LE: {struct.unpack('<I', data[:4])[0]}")
+ print(f"First 8 bytes as uint64 LE: {struct.unpack('<Q', data[:8])[0]}")
+ print()
+
+ # Search for known patterns
+ patterns = [b"onnx", b"ai.onnx", b"ONNX", b"ort_", b"onnxruntime",
+             b"ir_version", b"ORTM", b"FORT", b"ORT ", b"model",
+             b"graph", b"Conv", b"Relu", b"Softmax", b"tensor",
+             b"float", b"int64", b"opset", b"producer"]
+
+ for pattern in patterns:
+     idx = data.find(pattern)
+     if idx >= 0:
+         ctx_start = max(0, idx - 8)
+         ctx_end = min(len(data), idx + len(pattern) + 8)
+         print(f"Found '{pattern.decode(errors='replace')}' at offset {idx} (0x{idx:x})")
+         print(f"  Context hex: {data[ctx_start:ctx_end].hex()}")
+
+ print()
+
+ # Check entropy by sections
+ import collections
+ def entropy_score(chunk):
+     c = collections.Counter(chunk)
+     unique = len(c)
+     return unique
+
+ print("Entropy analysis (unique byte values per 4KB block):")
+ for i in range(0, min(len(data), 64*1024), 4096):
+     chunk = data[i:i+4096]
+     e = entropy_score(chunk)
+     print(f"  Offset 0x{i:06x}: {e}/256 unique bytes",
+           "(encrypted/compressed)" if e > 240 else "(structured)" if e < 100 else "")
+
+ # Look at first int as possible header size
+ hdr_size = struct.unpack('<I', data[:4])[0]
+ print(f"\nFirst uint32 = {hdr_size} (0x{hdr_size:x})")
+ print(f"If header size, data starts at offset {hdr_size}")
+ if hdr_size < len(data):
+     print(f"Data at offset {hdr_size}: {data[hdr_size:hdr_size+32].hex()}")
+
+ # Check what's at byte 8
+ print(f"\nBytes 8-16: {data[8:16].hex()}")
+ print(f"If offset 8 is data: unique bytes = {entropy_score(data[8:8+4096])}/256")
+
+ # XOR analysis - try single byte XOR keys
+ print("\nXOR key analysis (checking if XOR of first bytes gives ONNX protobuf header):")
+ # ONNX protobuf starts with 0x08 (varint, field 1 = ir_version)
+ xor_key_byte0 = data[0] ^ 0x08
+ print(f"  If first byte should be 0x08: XOR key = 0x{xor_key_byte0:02x}")
+ # Try XOR with that key on first 16 bytes
+ test = bytes(b ^ xor_key_byte0 for b in data[:16])
+ print(f"  XOR'd first 16 bytes: {test.hex()}")
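The `entropy_score` heuristic above only counts unique byte values per block. A true Shannon entropy in bits per byte separates encrypted/compressed data (close to 8.0) from structured data more reliably; here is a standalone sketch of that check, not part of the original script:

```python
import math
from collections import Counter

def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy in bits per byte: ~8.0 for random/encrypted, low for structured."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum((c / n) * math.log2(c / n) for c in Counter(chunk).values())

print(shannon_entropy(bytes(range(256))))  # uniform distribution → exactly 8.0
print(shannon_entropy(b"\x00" * 4096))     # constant block → 0.0
```

A threshold around 7.9 bits/byte is a common rule of thumb for flagging encrypted regions.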
_archive/analysis/decrypt_config.py ADDED
@@ -0,0 +1,259 @@
+ """Decrypt the config chunk from DX and analyze its protobuf structure.
+ Config = first encrypted payload inside DX index.
+ """
+ import struct
+ import hashlib
+ from Crypto.Cipher import AES
+
+ MASTER_KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+ IV = b"Copyright @ OneO"
+
+ def aes_cfb128_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
+     cipher = AES.new(key, AES.MODE_CFB, iv=iv, segment_size=128)
+     return cipher.decrypt(data)
+
+ def decode_varint(data: bytes, offset: int) -> tuple[int, int]:
+     """Decode protobuf varint, return (value, new_offset)."""
+     result = 0
+     shift = 0
+     while offset < len(data):
+         b = data[offset]
+         result |= (b & 0x7F) << shift
+         offset += 1
+         if not (b & 0x80):
+             break
+         shift += 7
+     return result, offset
+
+ def decode_protobuf_fields(data: bytes, indent: int = 0, max_depth: int = 3, prefix: str = ""):
+     """Recursively decode protobuf-like structure."""
+     off = 0
+     field_idx = 0
+     pad = "  " * indent
+     while off < len(data) and field_idx < 200:
+         tag_byte = data[off]
+         field_num = tag_byte >> 3
+         wire_type = tag_byte & 0x07
+
+         if field_num == 0 or field_num > 30:
+             break
+
+         off += 1
+
+         if wire_type == 0:  # varint
+             val, off = decode_varint(data, off)
+             print(f"{pad}field {field_num} (varint): {val}")
+         elif wire_type == 2:  # length-delimited
+             length, off = decode_varint(data, off)
+             if off + length > len(data):
+                 print(f"{pad}field {field_num} (bytes, len={length}): TRUNCATED at off={off}")
+                 break
+             payload = data[off:off+length]
+             # Try to decode as string
+             try:
+                 s = payload.decode('utf-8')
+                 if all(c.isprintable() or c in '\n\r\t' for c in s):
+                     if len(s) > 100:
+                         print(f"{pad}field {field_num} (string, len={length}): {s[:100]}...")
+                     else:
+                         print(f"{pad}field {field_num} (string, len={length}): {s}")
+                 else:
+                     raise ValueError()
+             except (UnicodeDecodeError, ValueError):
+                 if indent < max_depth and length > 2 and length < 100000:
+                     # Try parsing as sub-message
+                     print(f"{pad}field {field_num} (msg, len={length}):")
+                     decode_protobuf_fields(payload, indent + 1, max_depth, prefix=f"{prefix}f{field_num}.")
+                 else:
+                     print(f"{pad}field {field_num} (bytes, len={length}): {payload[:32].hex()}...")
+             off += length
+         elif wire_type == 5:  # 32-bit
+             if off + 4 > len(data):
+                 break
+             val = struct.unpack_from("<I", data, off)[0]
+             off += 4
+             # Try float interpretation
+             fval = struct.unpack_from("<f", data, off-4)[0]
+             print(f"{pad}field {field_num} (fixed32): {val} (0x{val:08x}, float={fval:.4f})")
+         elif wire_type == 1:  # 64-bit
+             if off + 8 > len(data):
+                 break
+             val = struct.unpack_from("<Q", data, off)[0]
+             off += 8
+             print(f"{pad}field {field_num} (fixed64): {val}")
+         else:
+             print(f"{pad}field {field_num} (wire={wire_type}): unknown, stopping")
+             break
+         field_idx += 1
+
+ # Read file
+ with open("ocr_data/oneocr.onemodel", "rb") as f:
+     fdata = f.read()
+
+ # Step 1: Decrypt DX
+ file_header_hash = fdata[8:24]
+ dx_key = hashlib.sha256(MASTER_KEY + file_header_hash).digest()
+ dx_encrypted = fdata[24:24+22624]
+ dx = aes_cfb128_decrypt(dx_key, IV, dx_encrypted)
+
+ print("=== DX Header ===")
+ print(f"Magic: {dx[:8]}")
+ valid_size = struct.unpack_from("<Q", dx, 8)[0]
+ print(f"Valid size: {valid_size}")
+ print(f"Container magic: {dx[16:24].hex()}")
+ total_value = struct.unpack_from("<Q", dx, 24)[0]
+ print(f"DX[24] value: {total_value}")
+ checksum = dx[32:48]
+ print(f"Checksum: {checksum.hex()}")
+ s1, s2 = struct.unpack_from("<QQ", dx, 48)
+ print(f"Sizes: ({s1}, {s2})")
+
+ # Step 2: Decrypt config
+ sha_input = dx[48:64] + dx[32:48]  # sizes + checksum
+ config_key = hashlib.sha256(sha_input).digest()
+ config_enc = dx[64:64+11920]
+ config_dec = aes_cfb128_decrypt(config_key, IV, config_enc)
+
+ # Save
+ with open("temp/config_decrypted.bin", "wb") as f:
+     f.write(config_dec)
+ print(f"\nConfig decrypted: {len(config_dec)} bytes, saved to temp/config_decrypted.bin")
+
+ # Check container magic
+ magic = config_dec[:8]
+ print(f"Config container magic: {magic.hex()}")
+ assert magic == bytes.fromhex("4a1a082b25000000"), "Container magic mismatch!"
+
+ # Strip 8-byte container header
+ config_data = config_dec[8:]
+ print(f"Config payload: {len(config_data)} bytes")
+
+ print("\n=== Config Protobuf Structure (top-level fields only) ===")
+ # Parse just the top level to see field patterns
+ off = 0
+ config_fields = []
+ while off < len(config_data):
+     tag_byte = config_data[off]
+     field_num = tag_byte >> 3
+     wire_type = tag_byte & 0x07
+     if field_num == 0 or field_num > 30:
+         break
+     off += 1
+     if wire_type == 0:
+         val, off = decode_varint(config_data, off)
+         config_fields.append({"fn": field_num, "wt": wire_type, "val": val, "off": off})
+     elif wire_type == 2:
+         length, off = decode_varint(config_data, off)
+         if off + length > len(config_data):
+             break
+         payload = config_data[off:off+length]
+         # Try string
+         try:
+             s = payload.decode('ascii')
+             readable = all(c.isprintable() or c in '\n\r\t' for c in s)
+         except UnicodeDecodeError:
+             readable = False
+         if readable and len(payload) < 200:
+             print(f"  field {field_num} (string, len={length}, off={off}): {payload[:80]}")
+         else:
+             # check first bytes for sub-message identification
+             fbytes = payload[:16].hex()
+             print(f"  field {field_num} (msg/bytes, len={length}, off={off}): {fbytes}...")
+         config_fields.append({"fn": field_num, "wt": wire_type, "len": length, "off": off, "data": payload})
+         off += length
+     elif wire_type == 5:
+         if off + 4 > len(config_data):
+             break
+         val = struct.unpack_from("<I", config_data, off)[0]
+         config_fields.append({"fn": field_num, "wt": wire_type, "val": val, "off": off})
+         off += 4
+     elif wire_type == 1:
+         if off + 8 > len(config_data):
+             break
+         val = struct.unpack_from("<Q", config_data, off)[0]
+         config_fields.append({"fn": field_num, "wt": wire_type, "val": val, "off": off})
+         off += 8
+     else:
+         break
+
+ # Count field types
+ from collections import Counter
+ field_counts = Counter(f["fn"] for f in config_fields)
+ print(f"\nField type counts: {dict(field_counts)}")
+ print(f"Total fields: {len(config_fields)}")
+
+ # Decode each field 1 (repeated message) to find model entries
+ print("\n=== Model entries (field 1) ===")
+ f1_entries = [f for f in config_fields if f["fn"] == 1 and "data" in f]
+ for i, entry in enumerate(f1_entries):
+     data = entry["data"]
+     # Parse sub-fields
+     sub_off = 0
+     name = ""
+     model_type = -1
+     onnx_path = ""
+     while sub_off < len(data):
+         tag = data[sub_off]
+         fn = tag >> 3
+         wt = tag & 7
+         if fn == 0 or fn > 20:
+             break
+         sub_off += 1
+         if wt == 0:
+             val, sub_off = decode_varint(data, sub_off)
+             if fn == 2:
+                 model_type = val
+         elif wt == 2:
+             ln, sub_off = decode_varint(data, sub_off)
+             if sub_off + ln > len(data):
+                 break
+             p = data[sub_off:sub_off+ln]
+             if fn == 1:
+                 try:
+                     name = p.decode('ascii')
+                 except UnicodeDecodeError:
+                     name = p.hex()
+             elif fn == 3:
+                 onnx_path = p.decode('ascii', errors='replace')
+             sub_off += ln
+         elif wt == 5:
+             sub_off += 4
+         elif wt == 1:
+             sub_off += 8
+         else:
+             break
+     print(f"  [{i:02d}] name={name:20s} type={model_type}")
+     if onnx_path:
+         print(f"       path={onnx_path[:80]}")
+
+ # Now look for checksums in the ENTIRE config (not just the protobuf)
+ print("\n=== Searching ALL known checksums in config ===")
+ import json
+ with open("temp/crypto_log.json") as f:
+     log = json.load(f)
+ sha256s = [op for op in log if op["op"] == "sha256"]
+
+ # Get all unique checksums from 32-byte SHA256 inputs
+ checksums_found = 0
+ for s in sha256s:
+     inp = bytes.fromhex(s["input"])
+     if len(inp) == 32:
+         chk = inp[16:32]  # last 16 bytes = checksum
+         pos = config_data.find(chk)
+         if pos >= 0:
+             checksums_found += 1
+             if checksums_found <= 5:
+                 sizes = struct.unpack_from("<QQ", inp, 0)
+                 print(f"  FOUND checksum at config offset {pos}: sizes={sizes}")
+
+ print(f"Total checksums found in config: {checksums_found} / {len([s for s in sha256s if len(bytes.fromhex(s['input'])) == 32])}")
_archive/analysis/find_chunks.py ADDED
@@ -0,0 +1,80 @@
+ """Find all chunk checksums and their positions in the .onemodel file."""
+ import struct, json
+
+ with open("ocr_data/oneocr.onemodel", "rb") as f:
+     fdata = f.read()
+ log = json.load(open("temp/crypto_log.json"))
+
+ sha256s = [op for op in log if op["op"] == "sha256"]
+ sha_map = {}
+ for s in sha256s:
+     sha_map[s["output"]] = s["input"]
+
+ decrypts = [op for op in log if op["op"] == "decrypt"]
+
+ print(f"File size: {len(fdata)} bytes")
+ print("Payload starts at: 22684")
+
+ # For each decrypt, find its checksum in the file
+ results = []
+ for i, d in enumerate(decrypts[1:], 1):  # skip DX (dec#00)
+     sha_inp = bytes.fromhex(sha_map[d["aes_key"]])
+     if len(sha_inp) < 32:
+         continue
+     chk = sha_inp[16:32]
+     s1, s2 = struct.unpack_from("<QQ", sha_inp, 0)
+     enc_size = d["input_size"]
+
+     pos = fdata.find(chk)
+     results.append({
+         "dec_idx": i,
+         "chk_file_offset": pos,
+         "chk_hex": chk.hex(),
+         "size1": s1,
+         "size2": s2,
+         "enc_size": enc_size,
+     })
+
+ # Sort by checksum file offset
+ results.sort(key=lambda r: r["chk_file_offset"])
+
+ print(f"\n{'dec#':>5} {'chk_offset':>12} {'data_offset':>12} {'enc_size':>10} {'end_offset':>12} {'size1':>10} {'size2':>10}")
+ print("-" * 90)
+ for r in results:
+     if r["chk_file_offset"] >= 0:
+         # The chunk header is: 4_bytes + 16_checksum + 8_size1 + 8_size2 = 36 bytes
+         # Data starts at chk_offset - 4 + 36 = chk_offset + 32
+         data_off = r["chk_file_offset"] + 32
+         end_off = data_off + r["enc_size"]
+         print(f"  {r['dec_idx']:3d} {r['chk_file_offset']:12d} {data_off:12d} {r['enc_size']:10d} {end_off:12d} {r['size1']:10d} {r['size2']:10d}")
+     else:
+         print(f"  {r['dec_idx']:3d}    NOT FOUND {r['enc_size']:10d} {r['size1']:10d} {r['size2']:10d}")
+
+ # Verify chunk continuity
+ print("\n=== Chunk continuity check ===")
+ prev_end = None
+ for r in results:
+     if r["chk_file_offset"] < 0:
+         continue
+     data_off = r["chk_file_offset"] + 32
+     chunk_header_start = r["chk_file_offset"] - 4  # 4 bytes before checksum
+
+     if prev_end is not None:
+         gap = chunk_header_start - prev_end
+         if gap != 0:
+             print(f"  Gap between chunks: {gap} bytes (prev_end={prev_end}, next_header={chunk_header_start})")
+             if gap > 0:
+                 gap_data = fdata[prev_end:chunk_header_start]
+                 print(f"  Gap bytes: {gap_data.hex()}")
+
+     prev_end = data_off + r["enc_size"]
+
+ print(f"\nExpected file end: {prev_end}")
+ print(f"Actual file end:   {len(fdata)}")
+
+ # Verify the 4 bytes before each checksum
+ print("\n=== 4 bytes before each checksum ===")
+ for r in results[:10]:
+     if r["chk_file_offset"] >= 4:
+         pre = fdata[r["chk_file_offset"]-4:r["chk_file_offset"]]
+         print(f"  dec#{r['dec_idx']:02d}: pre_bytes={pre.hex()} ({struct.unpack_from('<I', pre)[0]})")
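The 36-byte chunk header inferred above (a 4-byte field, a 16-byte checksum, then two little-endian uint64 sizes) can be parsed with a small helper. This is a sketch against synthetic bytes, using the layout as reconstructed here, not an authoritative format specification:

```python
import struct

def parse_chunk_header(buf: bytes, off: int) -> dict:
    """Parse the inferred layout: u32 pre | 16-byte checksum | u64 size1 | u64 size2."""
    pre, = struct.unpack_from("<I", buf, off)
    checksum = buf[off + 4:off + 20]
    s1, s2 = struct.unpack_from("<QQ", buf, off + 20)
    return {"pre": pre, "checksum": checksum, "size1": s1, "size2": s2,
            "data_offset": off + 36,      # payload follows the 36-byte header
            "enc_size": s1 + 8}           # observed: encrypted payload = size1 + 8

# Synthetic header honoring the observed invariant size2 == size1 + 24
hdr = struct.pack("<I", 1) + b"\xAA" * 16 + struct.pack("<QQ", 1000, 1024)
h = parse_chunk_header(hdr, 0)
assert h["size2"] == h["size1"] + 24
print(h["data_offset"], h["enc_size"])  # 36 1008
```

Encapsulating the layout this way makes the `chk_offset + 32` arithmetic in the script above explicit: the checksum sits 4 bytes into the header, so data starts 32 bytes after it.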
_archive/analysis/walk_payload.py ADDED
@@ -0,0 +1,129 @@
+ """Walk ALL payload chunks in the .onemodel file and decrypt them statically.
+ Full cross-platform static decryptor - no DLL or Windows APIs needed.
+ """
+ import struct
+ import hashlib
+ from Crypto.Cipher import AES
+
+ MASTER_KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+ IV = b"Copyright @ OneO"
+ CONTAINER_MAGIC = bytes.fromhex("4a1a082b25000000")
+
+ def aes_cfb128_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
+     cipher = AES.new(key, AES.MODE_CFB, iv=iv, segment_size=128)
+     return cipher.decrypt(data)
+
+ with open("ocr_data/oneocr.onemodel", "rb") as f:
+     fdata = f.read()
+
+ # Parse file header
+ H = struct.unpack_from("<Q", fdata, 0)[0]
+ file_hash = fdata[8:24]
+ print(f"File size: {len(fdata):,} bytes")
+ print(f"Header value H: {H}")
+ print(f"DX encrypted size: {H-12}")
+ print(f"Payload start: {H+16}")
+
+ # Decrypt DX index
+ dx_key = hashlib.sha256(MASTER_KEY + file_hash).digest()
+ dx_enc = fdata[24:H+12]
+ dx = aes_cfb128_decrypt(dx_key, IV, dx_enc)
+
+ valid_size = struct.unpack_from("<Q", dx, 8)[0]
+ print(f"DX magic: {dx[:8]}")
+ print(f"DX valid size: {valid_size}")
+
+ # Decrypt config from DX
+ config_sha_input = dx[48:64] + dx[32:48]  # sizes + checksum
+ config_key = hashlib.sha256(config_sha_input).digest()
+ config_s1 = struct.unpack_from("<Q", dx, 48)[0]
+ config_enc = dx[64:64+config_s1+8]
+ config_dec = aes_cfb128_decrypt(config_key, IV, config_enc)
+ print(f"Config decrypted: {len(config_dec)} bytes, magic match: {config_dec[:8] == CONTAINER_MAGIC}")
+
+ # Walk payload chunks
+ off = H + 16
+ chunk_idx = 0
+ chunks = []
+
+ while off + 32 <= len(fdata):
+     chk = fdata[off:off+16]
+     s1, s2 = struct.unpack_from("<QQ", fdata, off+16)
+
+     if s2 != s1 + 24 or s1 == 0 or s1 > len(fdata):
+         break
+
+     enc_size = s1 + 8
+     data_off = off + 32
+
+     if data_off + enc_size > len(fdata):
+         print(f"WARNING: chunk#{chunk_idx} extends past file end!")
+         break
+
+     # Derive per-chunk key
+     sha_input = fdata[off+16:off+32] + fdata[off:off+16]  # sizes + checksum
+     chunk_key = hashlib.sha256(sha_input).digest()
+
+     # Decrypt
+     dec_data = aes_cfb128_decrypt(chunk_key, IV, fdata[data_off:data_off+enc_size])
+
+     magic_ok = dec_data[:8] == CONTAINER_MAGIC
+     payload = dec_data[8:]  # strip container header
+
+     chunks.append({
+         "idx": chunk_idx,
+         "file_offset": off,
+         "data_offset": data_off,
+         "size1": s1,
+         "enc_size": enc_size,
+         "magic_ok": magic_ok,
+         "payload": payload,
+     })
+
+     print(f"  chunk#{chunk_idx:02d}: off={off:>10} s1={s1:>10} magic={'OK' if magic_ok else 'FAIL'} payload_start={payload[:8].hex()}")
+
+     off = data_off + enc_size
+     chunk_idx += 1
+
+ print(f"\nTotal chunks: {chunk_idx}")
+ print(f"File bytes remaining: {len(fdata) - off}")
+ print(f"All magic OK: {all(c['magic_ok'] for c in chunks)}")
+
+ # Identify ONNX models (start with protobuf field tags typical for ONNX ModelProto)
+ print("\n=== ONNX model identification ===")
+ onnx_count = 0
+ for c in chunks:
+     payload = c["payload"]
+     # ONNX ModelProto fields: 1 (ir_version), 2 (opset_import), 3 (producer_name), etc.
+     # Field 1 varint starts with 0x08; check for ir_version 6 or 7
+     is_onnx = len(payload) > 100 and payload[0] == 0x08 and payload[1] in (0x06, 0x07)
+
+     if is_onnx:
+         onnx_count += 1
+         print(f"  chunk#{c['idx']:02d}: ONNX model, size={len(payload):,} bytes")
+
+ print(f"\nTotal ONNX models found: {onnx_count}")
+ print(f"Total non-ONNX chunks: {chunk_idx - onnx_count}")
+
+ # Show what non-ONNX chunks look like
+ print("\n=== Non-ONNX chunk types ===")
+ for c in chunks:
+     payload = c["payload"]
+     if len(payload) < 100 or payload[0] != 0x08 or payload[1] not in (0x06, 0x07):
+         # Try ASCII
+         try:
+             s = payload[:40].decode('ascii')
+             readable = all(ch.isprintable() or ch in '\n\r\t' for ch in s)
+         except UnicodeDecodeError:
+             readable = False
+
+         if readable:
+             preview = payload[:60].decode('ascii', errors='replace').replace('\n', '\\n')
+         else:
+             preview = payload[:32].hex()
+         print(f"  chunk#{c['idx']:02d}: size={len(payload):>8,} type={'text' if readable else 'binary'} preview={preview}")
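The key schedule used above (the DX key salts the master key with the 16-byte file-header hash; each chunk key hashes the 16 size bytes followed by the 16-byte checksum) can be isolated into two stdlib-only helpers. This is a sketch of the derivation alone, with placeholder inputs; the AES-CFB step needs pycryptodome and is omitted here:

```python
import hashlib
import struct

MASTER_KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'

def derive_dx_key(file_header_hash: bytes) -> bytes:
    """Key for the DX index: SHA256(master_key || 16-byte file-header hash)."""
    return hashlib.sha256(MASTER_KEY + file_header_hash).digest()

def derive_chunk_key(size1: int, size2: int, checksum: bytes) -> bytes:
    """Per-chunk key: SHA256(u64 size1 || u64 size2 || 16-byte checksum)."""
    return hashlib.sha256(struct.pack("<QQ", size1, size2) + checksum).digest()

# Placeholder inputs, just to show shapes: both derivations yield 32-byte AES-256 keys.
print(len(derive_dx_key(b"\x00" * 16)))                 # 32
print(len(derive_chunk_key(1000, 1024, b"\x11" * 16)))  # 32
```

Keeping the derivation separate from the cipher makes it easy to sanity-check keys against a logged trace before attempting a full decrypt.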
_archive/analyze_lm_features.py ADDED
@@ -0,0 +1,110 @@
+ """Understand what the 21-dim input features are for LM models 11-32.
+ These models take data[1,21,1,1] → softmax[1,2] (binary classifier).
+ We need to figure out what 21 features to compute from the recognizer output."""
+ import onnx
+ from onnx import numpy_helper
+ import numpy as np
+ from pathlib import Path
+ import onnxruntime as ort
+
+ # The 21 input features likely come from CTC recognizer statistics.
+ # Let's test with the unlocked models using some hypothetical feature vectors.
+
+ models_dir = Path("oneocr_extracted/onnx_models_unlocked")
+
+ # Load a LangSm model (model_11 = Latin LangSm)
+ sess_sm = ort.InferenceSession(str(list(models_dir.glob("model_11_*"))[0]))
+ # Load a LangMd model (model_22 = Latin LangMd)
+ sess_md = ort.InferenceSession(str(list(models_dir.glob("model_22_*"))[0]))
+
+ print("LangSm (model_11) inputs:", [(i.name, i.shape, i.type) for i in sess_sm.get_inputs()])
+ print("LangSm (model_11) outputs:", [(o.name, o.shape, o.type) for o in sess_sm.get_outputs()])
+ print()
+ print("LangMd (model_22) inputs:", [(i.name, i.shape, i.type) for i in sess_md.get_inputs()])
+ print("LangMd (model_22) outputs:", [(o.name, o.shape, o.type) for o in sess_md.get_outputs()])
+
+ # The normalization constants inside the model tell us about expected feature ranges
+ # From earlier analysis:
+ # Add constant: [-1.273, 0.396, 0.134, 0.151, 0.084, 0.346, 0.472, 0.435,
+ #                0.346, 0.581, 0.312, 0.036, 0.045, 0.033, 0.026, 0.022,
+ #                0.044, 0.038, 0.029, 0.031, 0.696]
+ # Div constant: [0.641, 0.914, 0.377, 0.399, 0.302, 0.657, 0.814, 0.769,
+ #                0.658, 0.878, 0.617, 0.153, 0.166, 0.137, 0.120, 0.108,
+ #                0.132, 0.115, 0.105, 0.108, 0.385]
+ #
+ # This means typical feature ranges are:
+ #   feature[0]:  mean = 1.273, std = 0.641 (large negative offset → feature is centered around 1.27)
+ #   feature[20]: mean = -0.696, std = 0.385
+ #
+ # Feature 0:      large range → possibly average log-probability or entropy
+ # Features 1-10:  medium range → possibly per-class probabilities or scores
+ # Features 11-20: small range → possibly confidence statistics
+
+ # Let's check: extract normalization params from model_11
+ model_11 = onnx.load(str(list(Path("oneocr_extracted/onnx_models").glob("model_11_*"))[0]))
+
+ for node in model_11.graph.node:
+     if node.op_type == "Constant":
+         name = node.output[0]
+         if name in ['26', '28']:  # Add and Div constants
+             for attr in node.attribute:
+                 if attr.type == 4:  # AttributeProto.TENSOR
+                     data = numpy_helper.to_array(attr.t)
+                     label = "Add (=-mean)" if name == '26' else "Div (=std)"
+                     print(f"\n{label}: {data.flatten()}")
+                     # The mean tells us the expected center of each feature
+                     if name == '26':
+                         # mean = -add_const
+                         means = -data.flatten()
+                         print(f"  Implied means: {means}")
+
+ # Hypothesis: The 21 features are CTC decoder statistics.
+ # Based on the normalization centers (means):
+ #   feat[0]:     ~1.27   — could be average negative log-likelihood (NLL) per character
+ #   feat[1]:     ~-0.40  — could be a score
+ #   feat[2-10]:  ~0-0.5  — could be per-script probabilities from ScriptID
+ #   feat[11-20]: ~0-0.04 — could be character-level statistics
+
+ # Let's test what outputs the recognizer produces
+ rec_path = list(Path("oneocr_extracted/onnx_models").glob("model_02_*"))[0]
+ rec_sess = ort.InferenceSession(str(rec_path))
+ print("\nRecognizer (model_02) outputs:")
+ for o in rec_sess.get_outputs():
+     print(f"  {o.name}: {o.shape}")
+
+ # Try running recognizer and computing statistics
+ test_data = np.random.randn(1, 3, 60, 200).astype(np.float32) * 0.1
+ seq_lengths = np.array([50], dtype=np.int32)  # 200/4
+ result = rec_sess.run(None, {"data": test_data, "seq_lengths": seq_lengths})
+ logprobs = result[0]
+ print(f"\nRecognizer output: {logprobs.shape}")
+ print(f"  Log-prob range: [{logprobs.min():.4f}, {logprobs.max():.4f}]")
+
+ # Compute possible features from recognizer output:
+ lp = logprobs[:, 0, :]  # [T, num_classes]
+ best_probs = np.exp(lp.max(axis=-1))  # Best probability per frame
+ mean_best = best_probs.mean()
+ print(f"\n  Mean best prob per frame: {mean_best:.4f}")
+ print(f"  Mean log-prob max: {lp.max(axis=-1).mean():.4f}")
+ print(f"  Entropy per frame: {(-np.exp(lp) * lp).sum(axis=-1).mean():.4f}")
+
+ # The 21 features might be computed as:
+ #   feat[0]       = average log-probability (NLL) → how confident the model is
+ #   feat[1..K]    = character frequency statistics
+ #   feat[K+1..20] = transition statistics
+ #
+ # Without the exact feature computation code from the DLL, we'll need to
+ # reverse-engineer or approximate the feature vector.
+
+ # For now, test the LM models with various feature values
+ print("\n--- Testing LM models with various inputs ---")
+ for name, features in [
+     ("all_zeros", np.zeros(21)),
+     ("high_conf", np.array([0.0, 0.5, 0.9, 0.9, 0.9, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 1.0])),
+     ("low_conf", np.array([3.0, -0.5, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.1])),
+     ("typical", np.array([1.2, -0.4, 0.1, 0.15, 0.08, 0.35, 0.47, 0.43, 0.35, 0.58, 0.31, 0.04, 0.05, 0.03, 0.03, 0.02, 0.04, 0.04, 0.03, 0.03, 0.7])),
+ ]:
+     data = features.astype(np.float32).reshape(1, 21, 1, 1)
+     sm_out = sess_sm.run(None, {"data": data})[0]
+     md_out = sess_md.run(None, {"data": data})[0]
+     print(f"  {name:12s}: LangSm={sm_out.flatten()}, LangMd={md_out.flatten()}")
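The `(x + add) / div` normalization implied by the extracted constants is a per-feature z-score with `add = -mean` and `div = std`. A minimal sketch with made-up two-element constants (the real 21-dim vectors live in the model graph):

```python
import numpy as np

# Hypothetical per-feature constants in the models' (x + add) / div form,
# i.e. add = -mean and div = std for each feature (illustrative values only).
add_const = np.array([-1.273, 0.396], dtype=np.float32)
div_const = np.array([0.641, 0.914], dtype=np.float32)

def normalize(features: np.ndarray) -> np.ndarray:
    """Z-score the features the way the LM graphs do: (x + add) / div."""
    return (features + add_const) / div_const

x = np.array([1.273, -0.396], dtype=np.float32)  # inputs equal to the implied means
print(normalize(x))  # mean-valued inputs normalize to exactly zero
```

This is why the "typical" feature vector tested above lands near zero after normalization: it was chosen to sit at the implied means.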
_archive/analyze_models.py ADDED
@@ -0,0 +1,82 @@
+ """Analyze all extracted ONNX models — inputs, outputs, ops, runtime compatibility."""
+ import onnx
+ import onnxruntime as ort
+ from pathlib import Path
+
+ models_dir = Path("oneocr_extracted/onnx_models")
+
+ print("=" * 120)
+ print(f"{'#':>3} {'Name':40s} {'KB':>7} {'IR':>3} {'Producer':15s} {'Nodes':>5} {'Inputs':35s} {'Outputs':25s} {'RT':10s} Custom Ops")
+ print("=" * 120)
+
+ for f in sorted(models_dir.glob("*.onnx")):
+     m = onnx.load(str(f))
+     idx = f.name.split("_")[1]
+     ir = m.ir_version
+     prod = (m.producer_name or "?")[:15]
+     nodes = len(m.graph.node)
+
+     # Input shapes
+     inputs = []
+     for i in m.graph.input:
+         dims = []
+         if i.type.tensor_type.HasField("shape"):
+             for d in i.type.tensor_type.shape.dim:
+                 dims.append(str(d.dim_value) if d.dim_value else d.dim_param or "?")
+         inputs.append(f"{i.name}[{','.join(dims)}]")
+
+     # Output shapes
+     outputs = []
+     for o in m.graph.output:
+         dims = []
+         if o.type.tensor_type.HasField("shape"):
+             for d in o.type.tensor_type.shape.dim:
+                 dims.append(str(d.dim_value) if d.dim_value else d.dim_param or "?")
+         outputs.append(f"{o.name}[{','.join(dims)}]")
+
+     inp_str = "; ".join(inputs)[:35]
+     out_str = "; ".join(outputs)[:25]
+
+     # Custom ops
+     opsets = [o.domain for o in m.opset_import if o.domain]
+     custom = ", ".join(opsets) if opsets else "-"
+
+     # Runtime check
+     try:
+         sess = ort.InferenceSession(str(f), providers=["CPUExecutionProvider"])
+         rt = "OK"
+     except Exception:
+         rt = "CUSTOM"
+
+     size_kb = f.stat().st_size // 1024
+     print(f"{idx:>3} {f.stem:40s} {size_kb:>7} {ir:>3} {prod:15s} {nodes:>5} {inp_str:35s} {out_str:25s} {rt:10s} {custom}")
+
+ # Summary
+ print("\n=== OCR Pipeline Architecture ===")
+ print("""
+ OneOCR uses a MULTI-MODEL pipeline (not a single model):
+
+ 1. DETECTOR (model_03, 13MB) — text detection in image
+    - Input: image tensor → Output: bounding boxes of text regions
+
+ 2. CHARACTER RECOGNIZERS (model_00..10, 33) — per-script recognition
+    - Each script (Latin, Arabic, CJK, Cyrillic, etc.) has its own recognizer
+    - Input: cropped text region → Output: character probabilities
+    - Accompanied by: rnn.info, char2ind.txt, char2inschar.txt files
+
+ 3. SMALL LANGUAGE MODELS (model_11..32, 26-28KB each)
+    - Post-processing character-level language models
+    - One per supported script/language
+
+ Problem for cross-platform:
+ - 23 models use custom op domain 'com.microsoft.oneocr'
+ - Custom ops like OneOCRFeatureExtract, DynamicQuantizeLSTM
+ - These are ONLY implemented in oneocr.dll (Windows)
+ - To run on Linux: need to reimplement these custom ops or find alternatives
+ """)
+
+ # Show config structure
+ print("=== Config Files (per-recognizer) ===")
+ config_dir = Path("oneocr_extracted/config_data")
+ config = (config_dir / "chunk_66_ocr_config.config.txt").read_text(errors="replace")
+ print(config[:500])
_archive/analyze_pipeline.py ADDED
@@ -0,0 +1,79 @@
+ """Full analysis of detector and scriptID models."""
+ import onnx
+ import numpy as np
+ from pathlib import Path
+
+ def print_io(model_path, label):
+     m = onnx.load(str(model_path))
+     print(f'\n=== {label} ({Path(model_path).name}) ===')
+     print(f'Nodes: {len(m.graph.node)}')
+
+     print('Inputs:')
+     for i in m.graph.input:
+         dims = []
+         for d in i.type.tensor_type.shape.dim:
+             dims.append(str(d.dim_value) if d.dim_value else d.dim_param or '?')
+         print(f'  {i.name}: [{", ".join(dims)}] dtype={i.type.tensor_type.elem_type}')
+
+     print('Outputs:')
+     for o in m.graph.output:
+         dims = []
+         for d in o.type.tensor_type.shape.dim:
+             dims.append(str(d.dim_value) if d.dim_value else d.dim_param or '?')
+         print(f'  {o.name}: [{", ".join(dims)}] dtype={o.type.tensor_type.elem_type}')
+
+     custom = set()
+     for n in m.graph.node:
+         if n.domain:
+             custom.add((n.domain, n.op_type))
+     if custom:
+         print(f'Custom ops: {custom}')
+     else:
+         print('Custom ops: none')
+     return m
+
+ models_dir = Path('oneocr_extracted/onnx_models')
+
+ # Detector
+ m0 = print_io(next(models_dir.glob('model_00_*')), 'DETECTOR')
+
+ # ScriptID
+ m1 = print_io(next(models_dir.glob('model_01_*')), 'SCRIPT ID')
+
+ # A recognizer (Latin)
+ m2 = print_io(next(models_dir.glob('model_02_*')), 'RECOGNIZER Latin')
+
+ # Try running detector to see actual output shapes
+ import onnxruntime as ort
+ from PIL import Image
+
+ img = Image.open('image.png').convert('RGB')
+ w, h = img.size
+
+ sess = ort.InferenceSession(str(next(models_dir.glob('model_00_*'))),
+                             providers=['CPUExecutionProvider'])
+
+ scale = 800 / max(h, w)
+ dh = (int(h * scale) + 31) // 32 * 32
+ dw = (int(w * scale) + 31) // 32 * 32
+
+ img_d = img.resize((dw, dh), Image.LANCZOS)
+ arr_d = np.array(img_d, dtype=np.float32)
+ arr_d = arr_d[:, :, ::-1] - [102.9801, 115.9465, 122.7717]
+ data_d = arr_d.transpose(2, 0, 1)[np.newaxis].astype(np.float32)
+ im_info = np.array([[dh, dw, scale]], dtype=np.float32)
+
+ outputs = sess.run(None, {"data": data_d, "im_info": im_info})
+ print(f'\n=== DETECTOR OUTPUT SHAPES (image {w}x{h} -> {dw}x{dh}) ===')
+ output_names = [o.name for o in sess.get_outputs()]
+ for name, out in zip(output_names, outputs):
+     print(f'  {name}: shape={out.shape} dtype={out.dtype} min={out.min():.4f} max={out.max():.4f}')
+
+ # Specifically analyze pixel_link outputs
+ # PixelLink has: pixel scores (text/non-text) + link scores (8 neighbors)
+ # FPN produces 3 scales
+ print('\n=== DETECTOR OUTPUT ANALYSIS ===')
+ for i, (name, out) in enumerate(zip(output_names, outputs)):
+     scores = 1.0 / (1.0 + np.exp(-out))  # sigmoid
+     hot = (scores > 0.5).sum()
+     print(f'  [{i:2d}] {name:25s} shape={str(out.shape):20s} sigmoid_max={scores.max():.4f} hot_pixels(>0.5)={hot}')
_archive/attempts/bcrypt_decrypt.py ADDED
@@ -0,0 +1,423 @@
1
+ """
2
+ OneOCR .onemodel decryption using Windows BCrypt CNG API directly.
3
+ Replicates the exact behavior of oneocr.dll's Crypto.cpp.
4
+
5
+ Known from DLL analysis:
6
+ - BCryptOpenAlgorithmProvider with L"AES"
7
+ - BCryptSetProperty L"ChainingMode" = L"ChainingModeCFB"
8
+ - BCryptGetProperty L"BlockLength" (→ 16)
9
+ - BCryptSetProperty L"MessageBlockLength" = 16 (→ CFB128)
10
+ - BCryptGenerateSymmetricKey with raw key bytes
11
+ - BCryptDecrypt
12
+ - SHA256Hash function exists (uses BCryptCreateHash/BCryptHashData/BCryptFinishHash)
13
+ """
14
+
15
+ import ctypes
16
+ import ctypes.wintypes as wintypes
17
+ import struct
18
+ import hashlib
19
+ import zlib
20
+ from collections import Counter
21
+ import math
22
+ import os
23
+
24
+ # ═══════════════════════════════════════════════════════════════
25
+ # Windows BCrypt API via ctypes
26
+ # ═══════════════════════════════════════════════════════════════
27
+
28
+ bcrypt = ctypes.WinDLL("bcrypt")
29
+
30
+ BCRYPT_ALG_HANDLE = ctypes.c_void_p
31
+ BCRYPT_KEY_HANDLE = ctypes.c_void_p
32
+ NTSTATUS = ctypes.c_long
33
+
34
+ # Constants
35
+ BCRYPT_AES_ALGORITHM = "AES"
36
+ BCRYPT_SHA256_ALGORITHM = "SHA256"
37
+ BCRYPT_CHAINING_MODE = "ChainingMode"
38
+ BCRYPT_CHAIN_MODE_CFB = "ChainingModeCFB"
39
+ BCRYPT_BLOCK_LENGTH = "BlockLength"
40
+ BCRYPT_MESSAGE_BLOCK_LENGTH = "MessageBlockLength"
41
+
42
+ def check_status(status, msg=""):
43
+ if status != 0:
44
+ raise OSError(f"BCrypt error 0x{status & 0xFFFFFFFF:08x}: {msg}")
45
+
46
+
47
+ def bcrypt_sha256(data: bytes) -> bytes:
48
+ """Compute SHA256 using Windows BCrypt API."""
49
+ hAlg = BCRYPT_ALG_HANDLE()
50
+ status = bcrypt.BCryptOpenAlgorithmProvider(
51
+ ctypes.byref(hAlg),
52
+ ctypes.c_wchar_p(BCRYPT_SHA256_ALGORITHM),
53
+ None, 0)
54
+ check_status(status, "SHA256 OpenAlgorithmProvider")
55
+
56
+ hHash = ctypes.c_void_p()
57
+ status = bcrypt.BCryptCreateHash(hAlg, ctypes.byref(hHash), None, 0, None, 0, 0)
58
+ check_status(status, "CreateHash")
59
+
60
+ status = bcrypt.BCryptHashData(hHash, data, len(data), 0)
61
+ check_status(status, "HashData")
62
+
63
+ hash_out = (ctypes.c_ubyte * 32)()
64
+ status = bcrypt.BCryptFinishHash(hHash, hash_out, 32, 0)
65
+ check_status(status, "FinishHash")
66
+
67
+ bcrypt.BCryptDestroyHash(hHash)
68
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
69
+
70
+ return bytes(hash_out)
71
+
72
+
73
+ def bcrypt_aes_cfb_decrypt(ciphertext: bytes, key: bytes, iv: bytes,
74
+ message_block_length: int = 16) -> bytes:
75
+ """Decrypt using AES-CFB via Windows BCrypt CNG API.
76
+
77
+ message_block_length: 1 for CFB8, 16 for CFB128
78
+ """
79
+ hAlg = BCRYPT_ALG_HANDLE()
80
+ status = bcrypt.BCryptOpenAlgorithmProvider(
81
+ ctypes.byref(hAlg),
82
+ ctypes.c_wchar_p(BCRYPT_AES_ALGORITHM),
83
+ None, 0)
84
+ check_status(status, "AES OpenAlgorithmProvider")
85
+
86
+ # Set chaining mode to CFB
87
+ mode_str = BCRYPT_CHAIN_MODE_CFB
88
+ mode_buf = ctypes.create_unicode_buffer(mode_str)
89
+ mode_size = (len(mode_str) + 1) * 2 # UTF-16 with null terminator
90
+ status = bcrypt.BCryptSetProperty(
91
+ hAlg,
92
+ ctypes.c_wchar_p(BCRYPT_CHAINING_MODE),
93
+ mode_buf, mode_size, 0)
94
+ check_status(status, "SetProperty ChainingMode")
95
+
96
+ # Set message block length (feedback size)
97
+ mbl = ctypes.c_ulong(message_block_length)
98
+ status = bcrypt.BCryptSetProperty(
99
+ hAlg,
100
+ ctypes.c_wchar_p(BCRYPT_MESSAGE_BLOCK_LENGTH),
101
+ ctypes.byref(mbl), ctypes.sizeof(mbl), 0)
102
+ check_status(status, f"SetProperty MessageBlockLength={message_block_length}")
103
+
104
+ # Generate symmetric key
105
+ hKey = BCRYPT_KEY_HANDLE()
106
+ key_buf = (ctypes.c_ubyte * len(key))(*key)
107
+ status = bcrypt.BCryptGenerateSymmetricKey(
108
+ hAlg, ctypes.byref(hKey), None, 0, key_buf, len(key), 0)
109
+ check_status(status, "GenerateSymmetricKey")
110
+
111
+ # Prepare IV (BCrypt modifies it during decryption, so use a copy)
112
+ iv_buf = (ctypes.c_ubyte * 16)(*iv)
113
+
114
+ # Prepare input/output buffers
115
+ ct_buf = (ctypes.c_ubyte * len(ciphertext))(*ciphertext)
116
+ pt_buf = (ctypes.c_ubyte * len(ciphertext))()
117
+ result_len = ctypes.c_ulong(0)
118
+
119
+ # Decrypt
120
+ status = bcrypt.BCryptDecrypt(
121
+ hKey, ct_buf, len(ciphertext), None,
122
+ iv_buf, 16,
123
+ pt_buf, len(ciphertext),
124
+ ctypes.byref(result_len), 0)
125
+ check_status(status, "BCryptDecrypt")
126
+
127
+ # Cleanup
128
+ bcrypt.BCryptDestroyKey(hKey)
129
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
130
+
131
+ return bytes(pt_buf[:result_len.value])
132
+
133
+
134
+ def entropy(data: bytes) -> float:
135
+ """Shannon entropy (bits per byte)."""
136
+ if not data:
137
+ return 0.0
138
+ freq = Counter(data)
139
+ total = len(data)
140
+ return -sum((c / total) * math.log2(c / total) for c in freq.values())
141
+
142
+
143
+ def hex_dump(data: bytes, offset: int = 0, max_lines: int = 8) -> str:
144
+ lines = []
145
+ for i in range(0, min(len(data), max_lines * 16), 16):
146
+ hex_part = " ".join(f"{b:02x}" for b in data[i:i+16])
147
+ ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data[i:i+16])
148
+ lines.append(f" {offset+i:08x}: {hex_part:<48s} {ascii_part}")
149
+ return "\n".join(lines)
150
+
151
+
152
+ def check_decrypted(data: bytes, label: str) -> bool:
153
+ """Check if decrypted data looks valid. Return True if promising."""
154
+ if not data or len(data) < 16:
155
+ return False
156
+
157
+ ent = entropy(data[:min(4096, len(data))])
158
+ u32_le = struct.unpack_from("<I", data, 0)[0]
159
+
160
+ # Check for magic_number = 1
161
+ magic_match = (u32_le == 1)
162
+
163
+ # Check for protobuf
164
+ protobuf = data[0] == 0x08 or data[0] == 0x0a
165
+
166
+ # Check for compression headers
167
+ zlib_header = data[:2] in [b"\x78\x01", b"\x78\x5e", b"\x78\x9c", b"\x78\xda"]
168
+ gzip_header = data[:2] == b"\x1f\x8b"
169
+ lz4_header = data[:4] == b"\x04\x22\x4d\x18"
170
+
171
+ is_promising = magic_match or (ent < 7.0) or zlib_header or gzip_header or lz4_header
172
+
173
+ if is_promising or protobuf:
174
+ print(f"\n ★★★ {'MAGIC=1 !!!' if magic_match else 'Promising'}: {label}")
175
+ print(f" Entropy: {ent:.3f}, uint32_LE[0]={u32_le}, first_byte=0x{data[0]:02x}")
176
+ print(f" First 128 bytes:")
177
+ print(hex_dump(data[:128]))
178
+ if zlib_header:
179
+ print(f" → ZLIB header detected!")
180
+ if gzip_header:
181
+ print(f" → GZIP header detected!")
182
+ if lz4_header:
183
+ print(f" → LZ4 header detected!")
184
+ if magic_match:
185
+ print(f" → MAGIC_NUMBER = 1 !! This is likely correct decryption!")
186
+ # Try decompression after offset 4 or later
187
+ for skip in [0, 4, 8, 16, 32, 64]:
188
+ chunk = data[skip:skip+min(10000, len(data)-skip)]
189
+ try:
190
+ dec = zlib.decompress(chunk)
191
+ print(f" → ZLIB decompress SUCCESS at skip={skip}: {len(dec)} bytes!")
192
+ print(f" First 64: {dec[:64].hex()}")
193
+ return True
194
+ except:
195
+ pass
196
+ try:
197
+ dec = zlib.decompress(chunk, -15)
198
+ print(f" → Raw DEFLATE decompress SUCCESS at skip={skip}: {len(dec)} bytes!")
199
+ print(f" First 64: {dec[:64].hex()}")
200
+ return True
201
+ except:
202
+ pass
203
+ return True
204
+ return False
205
+
206
+
207
+ # ═══════════════════════════════════════════════════════════════
208
+ # MAIN
209
+ # ═══════════════════════════════════════════════════════════════
210
+
211
+ MODEL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.onemodel"
212
+ KEY_RAW = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
213
+ KEY_SHA256 = bcrypt_sha256(KEY_RAW)
214
+
215
+ print("=" * 80)
216
+ print("OneOCR Decryption via Windows BCrypt CNG API")
217
+ print("=" * 80)
218
+
219
+ print(f"\nKey (raw): {KEY_RAW.hex()}")
220
+ print(f"Key (SHA256): {KEY_SHA256.hex()}")
221
+ print(f"Python hashlib SHA256: {hashlib.sha256(KEY_RAW).digest().hex()}")
222
+ print(f"BCrypt SHA256 match: {KEY_SHA256 == hashlib.sha256(KEY_RAW).digest()}")
223
+
224
+ # Read file
225
+ with open(MODEL_PATH, "rb") as f:
226
+ full_data = f.read()
227
+ filesize = len(full_data)
228
+
229
+ header_offset = struct.unpack_from("<I", full_data, 0)[0] # 22636
230
+ payload_size = struct.unpack_from("<Q", full_data, header_offset + 8)[0] # 58431147
231
+ payload_start = header_offset + 16 # 22652
232
+
233
+ print(f"\nFile size: {filesize:,}")
234
+ print(f"Header offset: {header_offset}")
235
+ print(f"Payload size: {payload_size:,}")
236
+ print(f"Payload start: {payload_start}")
237
+
238
+ # ═══════════════════════════════════════════════════════════════
239
+ # Test 1: Try standard combinations with BCrypt API
240
+ # ═══════════════════════════════════════════════════════════════
241
+
242
+ print("\n" + "=" * 80)
243
+ print("TEST 1: Standard combinations via BCrypt CFB128")
244
+ print("=" * 80)
245
+
246
+ iv_zero = b"\x00" * 16
247
+ iv_candidates = {
248
+ "zeros": iv_zero,
249
+ "file[8:24]": full_data[8:24],
250
+ "file[4:20]": full_data[4:20],
251
+ "file[0:16]": full_data[0:16],
252
+ f"file[{header_offset}:{header_offset+16}]": full_data[header_offset:header_offset+16],
253
+ f"file[{payload_start}:{payload_start+16}]": full_data[payload_start:payload_start+16],
254
+ "SHA256(key)[:16]": KEY_SHA256[:16],
255
+ "SHA256(key)[16:]": KEY_SHA256[16:],
256
+ "key_raw[:16]": KEY_RAW[:16],
257
+ "key_raw[16:]": KEY_RAW[16:],
258
+ }
259
+
260
+ key_candidates = {
261
+ "raw": KEY_RAW,
262
+ "SHA256": KEY_SHA256,
263
+ }
264
+
265
+ data_regions = {
266
+ "header[8:]": full_data[8:8+4096],
267
+ f"payload[{payload_start}:]": full_data[payload_start:payload_start+4096],
268
+ }
269
+
270
+ for mbl in [16, 1]: # CFB128 first (most likely), then CFB8
271
+ for key_name, key in key_candidates.items():
272
+ for iv_name, iv in iv_candidates.items():
273
+ for region_name, region_data in data_regions.items():
274
+ label = f"CFB{'128' if mbl == 16 else '8'} key={key_name} iv={iv_name} data={region_name}"
275
+ try:
276
+ dec = bcrypt_aes_cfb_decrypt(region_data, key, iv, mbl)
277
+ if check_decrypted(dec, label):
278
+ pass # Already printed
279
+ except Exception as e:
280
+ pass # Silently skip errors
281
+
282
+ # ═══════════════════════════════════════════════════════════════
283
+ # Test 2: Known-plaintext IV search
284
+ # ═══════════════════════════════════════════════════════════════
285
+
286
+ print("\n" + "=" * 80)
287
+ print("TEST 2: Known-plaintext IV search (magic_number=1)")
288
+ print("=" * 80)
289
+ print(" Searching for IV that produces magic_number=1 (0x01000000) at start...")
290
+
291
+ # For AES-CFB128, first block:
292
+ # plaintext[0:16] = AES_ECB_encrypt(IV, key) XOR ciphertext[0:16]
293
+ # We want plaintext[0:4] = 01 00 00 00 (LE)
294
+ # So: AES_ECB_encrypt(IV, key)[0:4] = ciphertext[0:4] XOR 01 00 00 00
295
+
296
+ # We can't easily predict AES output, so we try each IV candidate
297
+ # Try every 4-byte aligned position in header as IV, with both key candidates
298
+
299
+ found = False
300
+ for key_name, key in key_candidates.items():
301
+ for mbl in [16, 1]:
302
+ # Try IV from file at every 4-byte step in the first 22700 bytes
303
+ for iv_offset in range(0, min(22700, filesize - 16), 4):
304
+ iv = full_data[iv_offset:iv_offset + 16]
305
+
306
+ # Try decrypting header encrypted data (byte 8+)
307
+ ct = full_data[8:24] # Just decrypt first 16 bytes
308
+ try:
309
+ dec = bcrypt_aes_cfb_decrypt(ct, key, iv, mbl)
310
+ u32 = struct.unpack_from("<I", dec, 0)[0]
311
+ if u32 == 1:
312
+ print(f"\n ★★★ FOUND! magic_number=1 with iv_offset={iv_offset}, key={key_name}, CFB{'128' if mbl==16 else '8'}")
313
+ print(f" IV: {iv.hex()}")
314
+ print(f" Decrypted first 16 bytes: {dec[:16].hex()}")
315
+ # Decrypt more data
316
+ dec_full = bcrypt_aes_cfb_decrypt(full_data[8:8+4096], key, iv, mbl)
317
+ check_decrypted(dec_full, f"FULL header with iv_offset={iv_offset}")
318
+ found = True
319
+ except:
320
+ pass
321
+
322
+ # Try decrypting payload data
323
+ ct2 = full_data[payload_start:payload_start+16]
324
+ try:
325
+ dec2 = bcrypt_aes_cfb_decrypt(ct2, key, iv, mbl)
326
+ u32_2 = struct.unpack_from("<I", dec2, 0)[0]
327
+ if u32_2 == 1:
328
+ print(f"\n ★★★ FOUND! magic_number=1 with iv_offset={iv_offset}, key={key_name}, CFB{'128' if mbl==16 else '8'}")
329
+ print(f" IV: {iv.hex()}")
330
+ print(f" Decrypted first 16 bytes: {dec2[:16].hex()}")
331
+ # Decrypt more data
332
+ dec_full2 = bcrypt_aes_cfb_decrypt(full_data[payload_start:payload_start+4096], key, iv, mbl)
333
+ check_decrypted(dec_full2, f"FULL payload with iv_offset={iv_offset}")
334
+ found = True
335
+ except:
336
+ pass
337
+
338
+ if found:
339
+ break
340
+ if found:
341
+ break
342
+
343
+ if not found:
344
+ print(" No IV found in file that produces magic_number=1")
345
+
346
+ # ═══════════════════════════════════════════════════════════════
347
+ # Test 3: Try derived IVs not from file
348
+ # ═══════════════════════════════════════════════════════════════
349
+
350
+ print("\n" + "=" * 80)
351
+ print("TEST 3: Derived IV strategies via BCrypt")
352
+ print("=" * 80)
353
+
354
+ derived_ivs = {
355
+ "zeros": b"\x00" * 16,
356
+ "SHA256(key)[:16]": KEY_SHA256[:16],
357
+ "SHA256(key)[16:]": KEY_SHA256[16:],
358
+ "key[:16]": KEY_RAW[:16],
359
+ "key[16:]": KEY_RAW[16:],
360
+ "SHA256('')[:16]": hashlib.sha256(b"").digest()[:16],
361
+ "SHA256('\\0')[:16]": hashlib.sha256(b"\x00").digest()[:16],
362
+ "MD5(key)": hashlib.md5(KEY_RAW).digest(),
363
+ "SHA256('oneocr')[:16]": hashlib.sha256(b"oneocr").digest()[:16],
364
+ "SHA256(key+\\0)[:16]": hashlib.sha256(KEY_RAW + b"\x00").digest()[:16],
365
+ "SHA256(key_reversed)[:16]": hashlib.sha256(KEY_RAW[::-1]).digest()[:16],
366
+ "key XOR 0x36 [:16]": bytes(b ^ 0x36 for b in KEY_RAW[:16]), # HMAC ipad
367
+ "key XOR 0x5c [:16]": bytes(b ^ 0x5c for b in KEY_RAW[:16]), # HMAC opad
368
+ }
369
+
370
+ for iv_name, iv in derived_ivs.items():
371
+ for key_name, key in key_candidates.items():
372
+ for mbl in [16, 1]:
373
+ for region_name, ct in [("header[8:]", full_data[8:8+4096]),
374
+ (f"payload", full_data[payload_start:payload_start+4096])]:
375
+ try:
376
+ dec = bcrypt_aes_cfb_decrypt(ct, key, iv, mbl)
377
+ label = f"CFB{'128' if mbl==16 else '8'} key={key_name} iv={iv_name} data={region_name}"
378
+ check_decrypted(dec, label)
379
+ except:
380
+ pass
381
+
382
+ # ═══════════════════════════════════════════════════════════════
383
+ # Test 4: What if entire file from byte 0 is encrypted?
384
+ # ═══════════════════════════════════════════════════════════════
385
+
386
+ print("\n" + "=" * 80)
387
+ print("TEST 4: Entire file encrypted from byte 0")
388
+ print("=" * 80)
389
+
390
+ for key_name, key in key_candidates.items():
391
+ for mbl in [16, 1]:
392
+ for iv_name, iv in [("zeros", iv_zero), ("SHA256(key)[:16]", KEY_SHA256[:16]),
393
+ ("key[:16]", KEY_RAW[:16])]:
394
+ try:
395
+ dec = bcrypt_aes_cfb_decrypt(full_data[:4096], key, iv, mbl)
396
+ label = f"CFB{'128' if mbl==16 else '8'} key={key_name} iv={iv_name} data=file[0:]"
397
+ check_decrypted(dec, label)
398
+ except:
399
+ pass
400
+
401
+ # ═══════════════════════════════════════════════════════════════
402
+ # Test 5: Decrypt with IV prepended to ciphertext in file
403
+ # ═══════════════════════════════════════════════════════════════
404
+
405
+ print("\n" + "=" * 80)
406
+ print("TEST 5: IV prepended to ciphertext at various offsets")
407
+ print("=" * 80)
408
+
409
+ for data_start in [0, 4, 8, 16, 24, header_offset, payload_start]:
410
+ iv_test = full_data[data_start:data_start+16]
411
+ ct_test = full_data[data_start+16:data_start+16+4096]
412
+ for key_name, key in key_candidates.items():
413
+ for mbl in [16, 1]:
414
+ try:
415
+ dec = bcrypt_aes_cfb_decrypt(ct_test, key, iv_test, mbl)
416
+ label = f"CFB{'128' if mbl==16 else '8'} key={key_name} IV=file[{data_start}:{data_start+16}] CT=file[{data_start+16}:]"
417
+ check_decrypted(dec, label)
418
+ except:
419
+ pass
420
+
421
+ print("\n" + "=" * 80)
422
+ print("DONE")
423
+ print("=" * 80)
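The `entropy()` triage heuristic used throughout these attempts rests on a simple observation: AES-encrypted data is statistically close to uniform random bytes (about 8.0 bits/byte), while correctly decrypted or decompressed model data measurably drops below that. A self-contained sketch of the same Shannon-entropy check (the helper name is ours, mirroring the script's `entropy`):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant input, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

print(shannon_entropy(bytes(range(256)) * 16))  # uniform byte values -> 8.0
print(shannon_entropy(b"\x00\x01" * 128))       # two equiprobable symbols -> 1.0
```

This is why the script flags anything with `ent < 7.0` as promising: a successful decryption of structured (protobuf/compressed-then-expanded) data should not look like noise.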
_archive/attempts/create_test_image.py ADDED
@@ -0,0 +1,21 @@
+ """Create test image with text 'ONE OCR DZIALA!' for OCR testing."""
+ from PIL import Image, ImageDraw, ImageFont
+ 
+ # Create white image
+ img = Image.new("RGB", (600, 150), color="white")
+ draw = ImageDraw.Draw(img)
+ 
+ # Try to use a good font, fall back to the default bitmap font
+ try:
+     font = ImageFont.truetype("arial.ttf", 48)
+ except OSError:
+     try:
+         font = ImageFont.truetype("C:/Windows/Fonts/arial.ttf", 48)
+     except OSError:
+         font = ImageFont.load_default()
+ 
+ # Draw black text
+ draw.text((30, 40), "ONE OCR DZIALA!", fill="black", font=font)
+ 
+ img.save("image.png")
+ print("Created image.png with text 'ONE OCR DZIALA!'")
_archive/attempts/decrypt_model.py ADDED
@@ -0,0 +1,338 @@
1
+ """
2
+ OneOCR Model Extraction via Runtime Memory Dump.
3
+
4
+ Strategy: Load the OCR pipeline (which decrypts the model internally),
5
+ then scan our own process memory for ONNX/protobuf patterns and dump them.
6
+
7
+ Since oneocr.dll decrypts and decompresses models into memory during
8
+ CreateOcrPipeline, we can capture them by scanning process memory.
9
+ """
10
+
11
+ import ctypes
12
+ import ctypes.wintypes as wintypes
13
+ import struct
14
+ import os
15
+ import sys
16
+ import time
17
+ from pathlib import Path
18
+ from collections import Counter
19
+ import math
20
+
21
+ # ═══════════════════════════════════════════════════════════════
22
+ # Constants
23
+ # ═══════════════════════════════════════════════════════════════
24
+
25
+ OCR_DATA_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data")
26
+ DLL_PATH = str(OCR_DATA_DIR / "oneocr.dll")
27
+ ORT_DLL_PATH = str(OCR_DATA_DIR / "onnxruntime.dll")
28
+ MODEL_PATH = str(OCR_DATA_DIR / "oneocr.onemodel")
29
+ KEY = 'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
30
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\extracted_models")
31
+
32
+ # ═══════════════════════════════════════════════════════════════
33
+ # Windows API
34
+ # ═══════════════════════════════════════════════════════════════
35
+
36
+ kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
37
+
38
+ class MEMORY_BASIC_INFORMATION(ctypes.Structure):
39
+ _fields_ = [
40
+ ("BaseAddress", ctypes.c_void_p),
41
+ ("AllocationBase", ctypes.c_void_p),
42
+ ("AllocationProtect", wintypes.DWORD),
43
+ ("RegionSize", ctypes.c_size_t),
44
+ ("State", wintypes.DWORD),
45
+ ("Protect", wintypes.DWORD),
46
+ ("Type", wintypes.DWORD),
47
+ ]
48
+
49
+ MEM_COMMIT = 0x1000
50
+ PAGE_NOACCESS = 0x01
51
+ PAGE_GUARD = 0x100
52
+
53
+
54
+ def entropy(data: bytes) -> float:
55
+ if not data:
56
+ return 0.0
57
+ freq = Counter(data)
58
+ total = len(data)
59
+ return -sum((c / total) * math.log2(c / total) for c in freq.values())
60
+
61
+
62
+ def scan_memory_regions():
63
+ """Enumerate all committed, readable memory regions."""
64
+ regions = []
65
+ handle = kernel32.GetCurrentProcess()
66
+ mbi = MEMORY_BASIC_INFORMATION()
67
+ address = 0
68
+ max_addr = (1 << 47) - 1
69
+
70
+ while address < max_addr:
71
+ result = kernel32.VirtualQuery(
72
+ ctypes.c_void_p(address),
73
+ ctypes.byref(mbi),
74
+ ctypes.sizeof(mbi)
75
+ )
76
+ if result == 0:
77
+ break
78
+
79
+ base_addr = mbi.BaseAddress or 0
80
+ region_size = mbi.RegionSize or 0
81
+
82
+ if region_size == 0:
83
+ break
84
+
85
+ if (mbi.State == MEM_COMMIT and
86
+ mbi.Protect not in (0, PAGE_NOACCESS, PAGE_GUARD) and
87
+ not (mbi.Protect & PAGE_GUARD)):
88
+ regions.append((base_addr, region_size))
89
+
90
+ new_address = base_addr + region_size
91
+ if new_address <= address:
92
+ break
93
+ address = new_address
94
+ return regions
95
+
96
+
97
+ def read_mem(address, size):
98
+ """Read memory from current process - direct access since it's our own memory."""
99
+ try:
100
+ return ctypes.string_at(address, size)
101
+ except Exception:
102
+ # Fallback to ReadProcessMemory
103
+ try:
104
+ buf = (ctypes.c_ubyte * size)()
105
+ n = ctypes.c_size_t(0)
106
+ handle = kernel32.GetCurrentProcess()
107
+ ok = kernel32.ReadProcessMemory(
108
+ handle, ctypes.c_void_p(address), buf, size, ctypes.byref(n)
109
+ )
110
+ if ok and n.value > 0:
111
+ return bytes(buf[:n.value])
112
+ except Exception:
113
+ pass
114
+ return None
115
+
116
+
117
+ # ═══════════════════════════════════════════════════════════════
118
+ # Step 1: Snapshot BEFORE loading OCR
119
+ # ═══════════════════════════════════════════════════════════════
120
+
121
+ print("=" * 80)
122
+ print("OneOCR Model Extraction via Runtime Memory Dump")
123
+ print("=" * 80)
124
+
125
+ print("\n[1/5] Memory snapshot BEFORE OCR load...")
126
+ before = set()
127
+ before_data = {}
128
+ for base, size in scan_memory_regions():
129
+ before.add(base)
130
+ # Store hash of small regions for change detection
131
+ if size <= 65536:
132
+ d = read_mem(base, size)
133
+ if d:
134
+ before_data[base] = hash(d)
135
+ print(f" {len(before)} regions before")
136
+
137
+ # ═══════════════════════════════════════════════════════════════
138
+ # Step 2: Load DLLs
139
+ # ═══════════════════════════════════════════════════════════════
140
+
141
+ print("\n[2/5] Loading DLLs...")
142
+ os.add_dll_directory(str(OCR_DATA_DIR))
143
+ os.environ["PATH"] = str(OCR_DATA_DIR) + ";" + os.environ.get("PATH", "")
144
+
145
+ ort_dll = ctypes.WinDLL(ORT_DLL_PATH)
146
+ print(f" OK: onnxruntime.dll")
147
+
148
+ ocr_dll = ctypes.WinDLL(DLL_PATH)
149
+ print(f" OK: oneocr.dll")
150
+
151
+ # ═══════════════════════════════════════════════════════════════
152
+ # Step 3: Init OCR pipeline (triggers decryption)
153
+ # ═══════════════════════════════════════════════════════════════
154
+
155
+ print("\n[3/5] Creating OCR pipeline (decrypts model)...")
156
+
157
+ CreateOcrInitOptions = ocr_dll.CreateOcrInitOptions
158
+ CreateOcrInitOptions.restype = ctypes.c_int64
159
+ CreateOcrInitOptions.argtypes = [ctypes.POINTER(ctypes.c_int64)]
160
+
161
+ OcrInitOptionsSetUseModelDelayLoad = ocr_dll.OcrInitOptionsSetUseModelDelayLoad
162
+ OcrInitOptionsSetUseModelDelayLoad.restype = ctypes.c_int64
163
+ OcrInitOptionsSetUseModelDelayLoad.argtypes = [ctypes.c_int64, ctypes.c_char]
164
+
165
+ CreateOcrPipeline = ocr_dll.CreateOcrPipeline
166
+ CreateOcrPipeline.restype = ctypes.c_int64
167
+ CreateOcrPipeline.argtypes = [ctypes.c_int64, ctypes.c_int64, ctypes.c_int64, ctypes.POINTER(ctypes.c_int64)]
168
+
169
+ ctx = ctypes.c_int64(0)
170
+ res = CreateOcrInitOptions(ctypes.byref(ctx))
171
+ assert res == 0, f"CreateOcrInitOptions failed: {res}"
172
+
173
+ # Disable delay load → load ALL models immediately
174
+ res = OcrInitOptionsSetUseModelDelayLoad(ctx, ctypes.c_char(0))
175
+ assert res == 0, f"SetUseModelDelayLoad failed: {res}"
176
+
177
+ model_path_c = ctypes.c_char_p(MODEL_PATH.encode("utf-8"))
178
+ key_c = ctypes.c_char_p(KEY.encode("utf-8"))
179
+
180
+ pipeline = ctypes.c_int64(0)
181
+ res = CreateOcrPipeline(
182
+ ctypes.cast(model_path_c, ctypes.c_void_p).value,
183
+ ctypes.cast(key_c, ctypes.c_void_p).value,
184
+ ctx.value,
185
+ ctypes.byref(pipeline)
186
+ )
187
+
188
+ if res != 0:
189
+ print(f" ERROR: CreateOcrPipeline returned {res}")
190
+ sys.exit(1)
191
+
192
+ print(f" Pipeline OK! handle=0x{pipeline.value:x}")
193
+ time.sleep(0.5)
194
+
195
+ # ═══════════════════════════════════════════════════════════════
196
+ # Step 4: Find new/changed memory regions & search for ONNX
197
+ # ═══════════════════════════════════════════════════════════════
198
+
199
+ print("\n[4/5] Scanning process memory for ONNX models...")
200
+
201
+ OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
202
+
203
+ after_regions = scan_memory_regions()
204
+ new_regions = [(b, s) for b, s in after_regions if b not in before]
205
+ print(f" Total regions after: {len(after_regions)}")
206
+ print(f" New regions: {len(new_regions)}")
207
+
208
+ # Size distribution of new regions
209
+ new_large = [(b, s) for b, s in new_regions if s >= 1024*1024]
210
+ new_total = sum(s for _, s in new_regions)
211
+ print(f" New large regions (>1MB): {len(new_large)}")
212
+ print(f" Total new memory: {new_total/1024/1024:.1f} MB")
213
+
214
+ found = []
215
+
216
+ # ONNX protobuf field patterns for start of file
217
+ # ir_version (field 1, varint) followed by a length-delimited field 2
218
+ # or producer_name (field 2, len-delimited) etc.
219
+
220
+ # Search patterns
221
+ PATTERNS = [
222
+ b"\x08\x07\x12", # ir_v=7, then field 2
223
+ b"\x08\x08\x12", # ir_v=8
224
+ b"\x08\x06\x12", # ir_v=6
225
+ b"\x08\x05\x12", # ir_v=5
226
+ b"\x08\x04\x12", # ir_v=4
227
+ b"\x08\x03\x12", # ir_v=3
228
+ b"\x08\x09\x12", # ir_v=9
229
+ b"ORTM", # ORT model format
230
+ b"ONNX", # Just in case
231
+ b"\x08\x07\x1a", # ir_v=7, field 3
232
+ b"\x08\x08\x1a", # ir_v=8, field 3
233
+ ]
234
+
235
+ # Scan ALL new large regions
236
+ for ridx, (base, size) in enumerate(sorted(new_regions, key=lambda x: x[1], reverse=True)):
237
+ if size < 4096:
238
+ continue
239
+
240
+ read_size = min(size, 200 * 1024 * 1024)
241
+ data = read_mem(base, read_size)
242
+ if not data:
243
+ continue
244
+
245
+ # Check entropy of first 4KB
246
+ ent = entropy(data[:4096])
247
+ uniq = len(set(data[:4096]))
248
+
249
+ if size >= 100000:
250
+ # Log large regions regardless
251
+ print(f" Region 0x{base:x} size={size:,} ent={ent:.2f} uniq={uniq}/256 first={data[:16].hex()}")
252
+
253
+ # Search for patterns
254
+ for pattern in PATTERNS:
255
+ offset = 0
256
+ while True:
257
+ idx = data.find(pattern, offset)
258
+ if idx < 0:
259
+ break
260
+
261
+ # Validate: check surrounding context
262
+ chunk = data[idx:idx+min(4096, len(data)-idx)]
263
+ chunk_ent = entropy(chunk[:1024]) if len(chunk) >= 1024 else entropy(chunk)
264
+
265
+ # Valid models should have moderate entropy (not encrypted high-entropy)
266
+ if chunk_ent < 7.5 and len(chunk) > 64:
267
+ addr = base + idx
268
+ remaining = len(data) - idx
269
+ found.append({
270
+ "addr": addr,
271
+ "base": base,
272
+ "offset": idx,
273
+ "size": remaining,
274
+ "pattern": pattern.hex(),
275
+ "ent": chunk_ent,
276
+ "first_32": data[idx:idx+32].hex(),
277
+ })
278
+ print(f" ★ ONNX candidate at 0x{addr:x}: pattern={pattern.hex()} "
279
+ f"ent={chunk_ent:.2f} remaining={remaining:,}")
280
+ print(f" First 32: {data[idx:idx+32].hex()}")
281
+
282
+ offset = idx + len(pattern)
283
+
284
+ print(f"\n Found {len(found)} ONNX candidates total")
285
+
286
+ # ═══════════════════════════════════════════════════════════════
287
+ # Step 5: Dump candidates
288
+ # ═══════════════════════════════════════════════════════════════
289
+
290
+ print("\n[5/5] Dumping models...")
291
+
292
+ if found:
293
+ # Deduplicate by address
294
+ seen = set()
295
+ for i, m in enumerate(found):
296
+ if m["addr"] in seen:
297
+ continue
298
+ seen.add(m["addr"])
299
+
300
+ dump_size = min(m["size"], 100 * 1024 * 1024)
301
+ data = read_mem(m["addr"], dump_size)
302
+ if data:
303
+ fname = f"onnx_{i}_0x{m['addr']:x}_{dump_size//1024}KB.bin"
304
+ out = OUTPUT_DIR / fname
305
+ with open(out, "wb") as f:
306
+ f.write(data)
307
+ print(f" Saved: {fname} ({len(data):,} bytes)")
308
+ else:
309
+ print(" No ONNX patterns found. Dumping ALL large new regions (>1MB)...")
310
+
311
+ for i, (base, size) in enumerate(new_large):
312
+ data = read_mem(base, min(size, 200*1024*1024))
313
+ if data:
314
+ ent = entropy(data[:4096])
315
+ fname = f"region_{i}_0x{base:x}_{size//1024//1024}MB_ent{ent:.1f}.bin"
316
+ out = OUTPUT_DIR / fname
317
+ with open(out, "wb") as f:
318
+ f.write(data)
319
+ print(f" Saved: {fname} ({len(data):,} bytes, ent={ent:.2f})")
320
+
321
+ # Summary
322
+ print("\n" + "=" * 80)
323
+ print("RESULTS")
324
+ print("=" * 80)
325
+
326
+ if OUTPUT_DIR.exists():
327
+ files = sorted(OUTPUT_DIR.iterdir())
328
+ if files:
329
+ total_size = sum(f.stat().st_size for f in files)
330
+ print(f"\nExtracted {len(files)} files ({total_size/1024/1024:.1f} MB):")
331
+ for f in files:
332
+ sz = f.stat().st_size
333
+ # Quick check if it's ONNX
334
+ with open(f, "rb") as fh:
335
+ header = fh.read(32)
336
+ print(f" {f.name}: {sz:,} bytes | first_16={header[:16].hex()}")
337
+ else:
338
+ print("\nNo files extracted.")
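The `PATTERNS` list above works because the first bytes of a serialized ONNX `ModelProto` are protobuf tags: each tag byte packs `(field_number << 3) | wire_type`, so `0x08` is field 1 (ir_version) with varint wire type 0, and `0x12`/`0x1a` are fields 2/3 (e.g. producer_name) with length-delimited wire type 2. A tiny decoder showing the tag layout (a sketch; no protobuf library needed):

```python
def decode_tag(tag_byte: int):
    """Split a single protobuf tag byte into (field_number, wire_type)."""
    return tag_byte >> 3, tag_byte & 0x07

# b"\x08\x07\x12...": tag 0x08 -> field 1, wire type 0 (varint) = ir_version,
# varint value 7, then tag 0x12 -> field 2, wire type 2 (length-delimited)
print(decode_tag(0x08))  # (1, 0)
print(decode_tag(0x12))  # (2, 2)
print(decode_tag(0x1a))  # (3, 2)
```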
_archive/attempts/decrypt_with_static_iv.py ADDED
@@ -0,0 +1,302 @@
+ """
+ Extract the static IV string from the DLL and work out how key derivation works.
+
+ Key findings from disassembly:
+ 1. Static 30-byte string at RVA 0x02725C60 used as IV (truncated to 16)
+ 2. SHA256(combined) used as AES key material
+ 3. Combined = some_function(key_string, iv_from_data, flag)
+ 4. Function at 0x18006c3d0 combines key + iv_prefix
+
+ Need to:
+ a) Read the static IV string
+ b) Disassemble function 0x18006c3d0 to understand the combination
+ c) Try decryption
+ """
+ import struct, hashlib
+ from capstone import Cs, CS_ARCH_X86, CS_MODE_64
+
+ DLL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll"
+ MODEL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.onemodel"
+
+ with open(DLL_PATH, "rb") as f:
+     dll = f.read()
+
+ with open(MODEL_PATH, "rb") as f:
+     model = f.read()
+
+ # Parse PE sections for RVA -> file offset mapping
+ e_lfanew = struct.unpack_from('<I', dll, 0x3c)[0]
+ num_sections = struct.unpack_from('<H', dll, e_lfanew + 6)[0]
+ opt_size = struct.unpack_from('<H', dll, e_lfanew + 20)[0]
+ sections_off = e_lfanew + 24 + opt_size
+
+ sections = []
+ for i in range(num_sections):
+     so = sections_off + i * 40
+     name = dll[so:so+8].rstrip(b'\x00').decode('ascii', errors='replace')
+     vsize = struct.unpack_from('<I', dll, so + 8)[0]
+     vrva = struct.unpack_from('<I', dll, so + 12)[0]
+     rawsize = struct.unpack_from('<I', dll, so + 16)[0]
+     rawoff = struct.unpack_from('<I', dll, so + 20)[0]
+     sections.append((name, vrva, vsize, rawoff, rawsize))
+     print(f"Section {name}: RVA=0x{vrva:08x} VSize=0x{vsize:08x} Raw=0x{rawoff:08x} RawSize=0x{rawsize:08x}")
+
+ def rva_to_foff(rva):
+     for name, vrva, vsize, rawoff, rawsize in sections:
+         if vrva <= rva < vrva + rawsize:
+             return rawoff + (rva - vrva)
+     return None
+
+ IMAGE_BASE = 0x180000000
+ TEXT_VA = 0x1000
+ TEXT_FILE_OFFSET = 0x400
+
+ def text_rva_to_file(rva):
+     return rva - TEXT_VA + TEXT_FILE_OFFSET
+
+ # 1. Read the static 30-byte string at RVA 0x02725C60
+ # The LEA instruction was at RVA 0x0015baac:
+ #   lea rdx, [rip + 0x25ca1ad]
+ # RIP = 0x0015baac + 7 = 0x0015bab3
+ # Target RVA = 0x0015bab3 + 0x25ca1ad = 0x02725c60
+ target_rva = 0x0015bab3 + 0x25ca1ad
+ print(f"\nStatic IV string RVA: 0x{target_rva:08x}")
+ foff = rva_to_foff(target_rva)
+ print(f"File offset: 0x{foff:08x}" if foff else "NOT FOUND")
+
+ if foff:
+     static_iv_30 = dll[foff:foff+30]
+     print(f"Static IV (30 bytes): {static_iv_30.hex()}")
+     print(f"Static IV (30 chars): {static_iv_30}")
+     static_iv_16 = static_iv_30[:16]
+     print(f"Static IV truncated to 16: {static_iv_16.hex()}")
+     print(f"Static IV truncated (repr): {static_iv_16}")
+
+ # Also check nearby strings for context
+ if foff:
+     print(f"\nContext around static IV string (foff-16 to foff+64):")
+     for i in range(-16, 64):
+         c = dll[foff+i]
+         print(f"  +{i:3d}: 0x{c:02x} ({chr(c) if 32 <= c < 127 else '.'})")
+
+ # 2. Read the first 16 bytes of encrypted data (the "prefix" extracted before key derivation)
+ header_offset = struct.unpack_from('<Q', model, 0)[0]
+ prefix_16 = model[8:24]
+ print(f"\nData prefix (first 16 bytes after offset): {prefix_16.hex()}")
+ print(f"Data prefix repr: {prefix_16}")
+
+ # 3. Disassemble the key combination function at 0x18006c3d0
+ # RVA = 0x0006c3d0
+ combo_rva = 0x0006c3d0
+ combo_foff = rva_to_foff(combo_rva)
+ print(f"\nKey combination function RVA: 0x{combo_rva:08x}, file: 0x{combo_foff:08x}" if combo_foff else "NOT FOUND")
+
+ md = Cs(CS_ARCH_X86, CS_MODE_64)
+ md.detail = False
+
+ if combo_foff:
+     code = dll[combo_foff:combo_foff + 0x200]
+     print(f"\n{'='*100}")
+     print(f"Key combination function at RVA 0x{combo_rva:08x}")
+     print(f"{'='*100}")
+     for insn in md.disasm(code, IMAGE_BASE + combo_rva):
+         foff2 = rva_to_foff(insn.address - IMAGE_BASE)
+         line = f"  {insn.address - IMAGE_BASE:08x} ({foff2:08x}): {insn.bytes.hex():<40s} {insn.mnemonic:<10s} {insn.op_str}"
+         if insn.mnemonic == 'ret':
+             print(line)
+             break
+         print(line)
+
+ # 4. Also check which string is compared when the key is empty
+ # At 0x0015b88d: lea rdx, [rip + 0x25b59db]
+ # RIP = 0x0015b88d + 7 = 0x0015b894
+ # Target = 0x0015b894 + 0x25b59db = 0x0271126f
+ empty_key_str_rva = 0x0015b894 + 0x25b59db
+ empty_foff = rva_to_foff(empty_key_str_rva)
+ if empty_foff:
+     s = dll[empty_foff:empty_foff+64].split(b'\x00')[0]
+     print(f"\nDefault key comparison string: {s}")
+
+ # 5. Try decryption with various key derivation methods
+ print("\n" + "="*100)
+ print("DECRYPTION ATTEMPTS WITH STATIC IV AND DERIVED KEY")
+ print("="*100)
+
+ KEY_RAW = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+
+ # Encrypted header: model[8 : header_offset]
+ enc_header = model[8:header_offset]
+ # The prefix/IV: first 16 bytes
+ data_prefix = enc_header[:16]
+ # The ciphertext: remaining bytes
+ ciphertext = enc_header[16:]
+
+ MAGIC = 0x252b081a4a
+
+ from Crypto.Cipher import AES
+
+ def try_dec(aes_key, iv, ct, label):
+     """Try AES-256-CFB128 decryption."""
+     try:
+         cipher = AES.new(aes_key, AES.MODE_CFB, iv=iv, segment_size=128)
+         pt = cipher.decrypt(ct[:256])
+         if len(pt) >= 24:
+             magic = struct.unpack_from('<Q', pt, 0x10)[0]
+             if magic == MAGIC:
+                 print(f"  *** SUCCESS *** {label}")
+                 print(f"  First 64 bytes: {pt[:64].hex()}")
+                 return True
+             else:
+                 unique = len(set(pt[:64]))
+                 # Only print if somewhat promising
+                 if unique < 45 or magic & 0xFF == 0x4a:
+                     print(f"  {label}: magic=0x{magic:016x}, unique_64={unique}")
+     except Exception as e:
+         print(f"  {label}: ERROR {e}")
+     return False
+
+ if foff:
+     iv = static_iv_16
+
+     # Try 1: SHA256(key + data_prefix)
+     combined1 = KEY_RAW + data_prefix
+     aes_key1 = hashlib.sha256(combined1).digest()
+     try_dec(aes_key1, iv, ciphertext, "SHA256(key + data_prefix)")
+
+     # Try 2: SHA256(data_prefix + key)
+     combined2 = data_prefix + KEY_RAW
+     aes_key2 = hashlib.sha256(combined2).digest()
+     try_dec(aes_key2, iv, ciphertext, "SHA256(data_prefix + key)")
+
+     # Try 3: SHA256(key) with static IV
+     aes_key3 = hashlib.sha256(KEY_RAW).digest()
+     try_dec(aes_key3, iv, ciphertext, "SHA256(key) + static_iv")
+
+     # Try 4: Raw key with static IV
+     try_dec(KEY_RAW, iv, ciphertext, "raw_key + static_iv")
+
+     # Try 5: SHA256(key + data_prefix) on full enc_header (no prefix removal)
+     try_dec(aes_key1, iv, enc_header, "SHA256(key+prefix) + full_header")
+     try_dec(aes_key2, iv, enc_header, "SHA256(prefix+key) + full_header")
+
+     # Try 6: Maybe the prefix is NOT stripped from the ciphertext for BCrypt
+     try_dec(aes_key3, iv, enc_header, "SHA256(key) + static_iv + full_header")
+     try_dec(KEY_RAW, iv, enc_header, "raw_key + static_iv + full_header")
+
+     # Also try the full static_iv_30 string as both key and IV source:
+     # maybe the static string IS the key, and data_prefix IS the IV
+     try_dec(hashlib.sha256(static_iv_30).digest(), data_prefix, ciphertext, "SHA256(static30) + data_prefix_iv")
+
+     # What if key derivation involves the static string too?
+     # SHA256(key + static_string)
+     combined3 = KEY_RAW + static_iv_30
+     aes_key6 = hashlib.sha256(combined3).digest()
+     try_dec(aes_key6, data_prefix, ciphertext, "SHA256(key + static30) + prefix_iv")
+     try_dec(aes_key6, iv, ciphertext, "SHA256(key + static30) + static_iv")
+
+     # What if the function combines the key with the static string, and data_prefix is the IV?
+     # Try many concatenation variants
+     variants = [
+         (KEY_RAW + data_prefix, iv, ciphertext, "key||prefix"),
+         (data_prefix + KEY_RAW, iv, ciphertext, "prefix||key"),
+         (KEY_RAW + static_iv_16, iv, ciphertext, "key||static16"),
+         (KEY_RAW + static_iv_30, iv, ciphertext, "key||static30"),
+         (static_iv_16 + KEY_RAW, iv, ciphertext, "static16||key"),
+         (static_iv_30 + KEY_RAW, iv, ciphertext, "static30||key"),
+         (KEY_RAW + data_prefix, data_prefix, ciphertext, "key||prefix, iv=prefix"),
+         (data_prefix + KEY_RAW, data_prefix, ciphertext, "prefix||key, iv=prefix"),
+     ]
+
+     for combo, iv_used, ct, desc in variants:
+         aes_key = hashlib.sha256(combo).digest()
+         try_dec(aes_key, iv_used, ct, f"SHA256({desc})")
+
+     # Maybe the function at 0x06c3d0 does something more complex.
+     # Also possible: the "combined" value is just the key (no IV involvement)
+     # and the function merely copies/formats it - hence the different IV sources above.
+
+     # Try with the BCrypt API directly
+     print("\n--- BCrypt API tests with static IV ---")
+     import ctypes
+     bcrypt = ctypes.windll.bcrypt
+
+     def bcrypt_dec(key_bytes, iv_bytes, ct_bytes, label):
+         hAlg = ctypes.c_void_p()
+         status = bcrypt.BCryptOpenAlgorithmProvider(ctypes.byref(hAlg), "AES", None, 0)
+         if status != 0:
+             print(f"  {label}: OpenAlg failed {status}")
+             return None
+
+         mode = "ChainingModeCFB".encode('utf-16-le') + b'\x00\x00'
+         bcrypt.BCryptSetProperty(hAlg, "ChainingMode", mode, len(mode), 0)
+
+         block_len = ctypes.c_ulong(16)
+         bcrypt.BCryptSetProperty(hAlg, "MessageBlockLength",
+                                  ctypes.byref(block_len), 4, 0)
+
+         hKey = ctypes.c_void_p()
+         # c_ubyte, not c_byte: key/data bytes may exceed 127
+         kb = (ctypes.c_ubyte * len(key_bytes))(*key_bytes)
+         status = bcrypt.BCryptGenerateSymmetricKey(
+             hAlg, ctypes.byref(hKey), None, 0, kb, len(key_bytes), 0)
+         if status != 0:
+             bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
+             print(f"  {label}: GenKey failed {status}")
+             return None
+
+         ct_buf = (ctypes.c_ubyte * len(ct_bytes))(*ct_bytes)
+         iv_buf = (ctypes.c_ubyte * len(iv_bytes))(*iv_bytes)
+
+         out_size = ctypes.c_ulong(0)
+         status = bcrypt.BCryptDecrypt(
+             hKey, ct_buf, len(ct_bytes), None,
+             iv_buf, len(iv_bytes), None, 0,
+             ctypes.byref(out_size), 0)
+
+         if status != 0:
+             bcrypt.BCryptDestroyKey(hKey)
+             bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
+             print(f"  {label}: Decrypt size query failed {status:#x}")
+             return None
+
+         pt_buf = (ctypes.c_ubyte * out_size.value)()
+         iv_buf2 = (ctypes.c_ubyte * len(iv_bytes))(*iv_bytes)
+         result = ctypes.c_ulong(0)
+         status = bcrypt.BCryptDecrypt(
+             hKey, ct_buf, len(ct_bytes), None,
+             iv_buf2, len(iv_bytes), pt_buf, out_size.value,
+             ctypes.byref(result), 0)
+
+         bcrypt.BCryptDestroyKey(hKey)
+         bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
+
+         if status != 0:
+             print(f"  {label}: Decrypt failed {status:#x}")
+             return None
+
+         pt = bytes(pt_buf[:result.value])
+         if len(pt) >= 24:
+             magic = struct.unpack_from('<Q', pt, 0x10)[0]
+             if magic == MAGIC:
+                 print(f"  *** BCrypt SUCCESS *** {label}")
+                 print(f"  First 64: {pt[:64].hex()}")
+                 return pt
+         return pt
+
+     # BCrypt tests with various key derivations
+     for combo_data, desc in [
+         (KEY_RAW, "raw_key"),
+         (hashlib.sha256(KEY_RAW).digest(), "SHA256(key)"),
+         (hashlib.sha256(KEY_RAW + data_prefix).digest(), "SHA256(key+prefix)"),
+         (hashlib.sha256(data_prefix + KEY_RAW).digest(), "SHA256(prefix+key)"),
+     ]:
+         for iv_data, iv_desc in [(iv, "static16"), (data_prefix, "data_prefix")]:
+             for ct_data, ct_desc in [(ciphertext, "ct_no_prefix"), (enc_header, "full_header")]:
+                 result = bcrypt_dec(combo_data, iv_data, ct_data[:512],
+                                     f"key={desc}, iv={iv_desc}, ct={ct_desc}")
+                 if result:
+                     magic = struct.unpack_from('<Q', result, 0x10)[0] if len(result) >= 24 else 0
+                     if magic == MAGIC:
+                         print("FOUND THE CORRECT PARAMETERS!")
+
+ print("\nDone.")
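The container layout this script assumes for `oneocr.onemodel` (a little-endian u64 offset, then an encrypted header whose first 16 bytes are treated as a prefix/IV) can be sketched on synthetic bytes. The helper name `split_onemodel_header` is ours, not something from the DLL:

```python
import struct

def split_onemodel_header(blob: bytes):
    """Split a .onemodel blob into (header_offset, iv_prefix, ciphertext)
    under the assumed layout: u64-LE offset at byte 0, then the encrypted
    header model[8:offset] whose first 16 bytes act as a prefix/IV."""
    header_offset = struct.unpack_from("<Q", blob, 0)[0]
    enc_header = blob[8:header_offset]
    return header_offset, enc_header[:16], enc_header[16:]

# Synthetic blob: offset 40 -> 16-byte prefix + 16 bytes of "ciphertext"
fake = struct.pack("<Q", 40) + bytes(range(16)) + b"C" * 16
off, prefix, ct = split_onemodel_header(fake)
assert off == 40 and prefix == bytes(range(16)) and ct == b"C" * 16
```

This mirrors the `header_offset` / `data_prefix` / `ciphertext` split done inline above, so each decryption attempt only varies the key derivation and IV choice.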
_archive/attempts/disasm_bcrypt_calls.py ADDED
@@ -0,0 +1,143 @@
+ """
+ Disassemble the actual BCrypt crypto operations at 0x18015ba45+
+ and map all indirect calls to IAT entries.
+ """
+ import struct
+ from capstone import Cs, CS_ARCH_X86, CS_MODE_64
+
+ DLL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll"
+ IMAGE_BASE = 0x180000000
+ TEXT_VA = 0x1000
+ TEXT_FILE_OFFSET = 0x400
+
+ def rva_to_file(rva):
+     return rva - TEXT_VA + TEXT_FILE_OFFSET
+
+ def file_to_rva(foff):
+     return foff - TEXT_FILE_OFFSET + TEXT_VA
+
+ with open(DLL_PATH, "rb") as f:
+     dll_data = f.read()
+
+ md = Cs(CS_ARCH_X86, CS_MODE_64)
+ md.detail = False
+
+ def disasm_region(name, file_start, file_end):
+     rva_start = file_to_rva(file_start)
+     va_start = IMAGE_BASE + rva_start
+     code = dll_data[file_start:file_end]
+     print(f"\n{'='*100}")
+     print(f"{name}")
+     print(f"File: 0x{file_start:08x}-0x{file_end:08x}, RVA: 0x{rva_start:08x}")
+     print(f"{'='*100}")
+     for insn in md.disasm(code, va_start):
+         foff = rva_to_file(insn.address - IMAGE_BASE)
+         line = f"  {insn.address - IMAGE_BASE:08x} ({foff:08x}): {insn.bytes.hex():<40s} {insn.mnemonic:<14s} {insn.op_str}"
+         # Annotate indirect calls
+         if insn.mnemonic == 'call' and insn.bytes[0] == 0xFF and insn.bytes[1] == 0x15:
+             disp = struct.unpack_from('<i', bytes(insn.bytes), 2)[0]
+             target_rva = (insn.address - IMAGE_BASE) + insn.size + disp
+             line += f"  ; IAT@0x{target_rva:08x}"
+         print(line)
+
+ # First, identify ALL BCrypt IAT entries.
+ # From previous analysis:
+ #   BCryptOpenAlgorithmProvider -> IAT 0x0081a5e0
+ #   BCryptGetProperty           -> IAT 0x0081a5d0
+ #   BCryptSetProperty           -> IAT 0x0081a608
+ # Find the rest by walking the import section.
+
+ # Parse PE to find BCrypt imports
+ print("="*100)
+ print("FINDING ALL BCRYPT IAT ENTRIES")
+ print("="*100)
+
+ # Parse PE headers
+ e_lfanew = struct.unpack_from('<I', dll_data, 0x3c)[0]
+ opt_hdr_off = e_lfanew + 24
+ import_dir_rva = struct.unpack_from('<I', dll_data, opt_hdr_off + 120)[0]  # Import directory RVA
+ import_dir_size = struct.unpack_from('<I', dll_data, opt_hdr_off + 124)[0]
+
+ # Find sections for RVA -> file offset mapping
+ num_sections = struct.unpack_from('<H', dll_data, e_lfanew + 6)[0]
+ sections_off = e_lfanew + 24 + struct.unpack_from('<H', dll_data, e_lfanew + 20)[0]
+
+ sections = []
+ for i in range(num_sections):
+     sec_off = sections_off + i * 40
+     name = dll_data[sec_off:sec_off+8].rstrip(b'\x00').decode('ascii', errors='replace')
+     vsize = struct.unpack_from('<I', dll_data, sec_off + 8)[0]
+     vrva = struct.unpack_from('<I', dll_data, sec_off + 12)[0]
+     rawsize = struct.unpack_from('<I', dll_data, sec_off + 16)[0]
+     rawoff = struct.unpack_from('<I', dll_data, sec_off + 20)[0]
+     sections.append((name, vrva, vsize, rawoff, rawsize))
+
+ def rva_to_foff(rva):
+     for name, vrva, vsize, rawoff, rawsize in sections:
+         if vrva <= rva < vrva + vsize:
+             return rawoff + (rva - vrva)
+     return None
+
+ if import_dir_rva:
+     ioff = rva_to_foff(import_dir_rva)
+     if ioff:
+         idx = 0
+         while True:
+             desc_off = ioff + idx * 20
+             ilt_rva = struct.unpack_from('<I', dll_data, desc_off)[0]
+             name_rva = struct.unpack_from('<I', dll_data, desc_off + 12)[0]
+             iat_rva = struct.unpack_from('<I', dll_data, desc_off + 16)[0]
+             if ilt_rva == 0 and name_rva == 0:
+                 break
+             name_off = rva_to_foff(name_rva)
+             if name_off:
+                 dname = dll_data[name_off:name_off+64].split(b'\x00')[0].decode('ascii', errors='replace')
+                 if 'bcrypt' in dname.lower():
+                     print(f"\nDLL: {dname}, ILT RVA: 0x{ilt_rva:08x}, IAT RVA: 0x{iat_rva:08x}")
+                     # Walk the ILT to find function names
+                     ilt_off = rva_to_foff(ilt_rva)
+                     iat_entry_rva = iat_rva
+                     j = 0
+                     while ilt_off:
+                         entry = struct.unpack_from('<Q', dll_data, ilt_off + j * 8)[0]
+                         if entry == 0:
+                             break
+                         if entry & (1 << 63):
+                             ordinal = entry & 0xFFFF
+                             print(f"  IAT 0x{iat_entry_rva:08x}: Ordinal {ordinal}")
+                         else:
+                             hint_rva = entry & 0x7FFFFFFF
+                             hint_off = rva_to_foff(hint_rva)
+                             if hint_off:
+                                 hint = struct.unpack_from('<H', dll_data, hint_off)[0]
+                                 fname = dll_data[hint_off+2:hint_off+66].split(b'\x00')[0].decode('ascii', errors='replace')
+                                 print(f"  IAT 0x{iat_entry_rva:08x}: {fname} (hint {hint})")
+                         iat_entry_rva += 8
+                         j += 1
+             idx += 1
+
+ # Now disassemble the actual BCrypt crypto operation block
+ # at RVA 0x0015ba45 = file 0x0015ae45
+ disasm_region(
+     "BCrypt crypto operations (key gen + decrypt)",
+     0x0015ae45, 0x0015b200
+ )
+
+ # Find all indirect calls in an extended range
+ print("\n" + "="*100)
+ print("ALL INDIRECT CALLS (ff 15) in range 0x0015ae00-0x0015c200")
+ print("="*100)
+
+ for i in range(0x0015ae00, 0x0015c200):
+     if dll_data[i] == 0xFF and dll_data[i+1] == 0x15:
+         rva = file_to_rva(i)
+         disp = struct.unpack_from('<i', dll_data, i + 2)[0]
+         target_rva = rva + 6 + disp
+         print(f"  File 0x{i:08x} (RVA 0x{rva:08x}): call [rip+0x{disp:x}] -> IAT@0x{target_rva:08x}")
+
+ # Also disassemble the function at 0x18015abd0 (called to process data when r14b=true)
+ # RVA 0x0015abd0, file 0x00159fd0
+ disasm_region(
+     "Function at 0x18015abd0 (called on data when r14b=true)",
+     0x00159fd0, 0x0015a0c0
+ )
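The `ff 15` annotation logic in this script reduces to one small calculation: for an x64 `call [rip+disp32]`, RIP points just past the 6-byte instruction, so the IAT slot read is `insn_rva + 6 + disp32`. A minimal sketch (function name ours):

```python
import struct

def indirect_call_target(code: bytes, insn_rva: int) -> int:
    """Given the 6 bytes of an x64 `call [rip+disp32]` (ff 15 xx xx xx xx)
    located at RVA insn_rva, return the RVA of the IAT slot it reads."""
    assert code[0] == 0xFF and code[1] == 0x15, "not an ff 15 indirect call"
    disp = struct.unpack_from("<i", code, 2)[0]   # signed 32-bit displacement
    return insn_rva + 6 + disp                    # RIP = end of the instruction

# call [rip+0x10] at RVA 0x1000 reads the slot at 0x1000 + 6 + 0x10 = 0x1016
assert indirect_call_target(b"\xff\x15\x10\x00\x00\x00", 0x1000) == 0x1016
# Negative displacements work the same way (disp = -6 lands back on the call itself)
assert indirect_call_target(b"\xff\x15\xfa\xff\xff\xff", 0x2000) == 0x2000
```

The same `+ instruction_length + disp32` rule is what both the capstone-based annotation and the raw byte scan above rely on.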
_archive/attempts/disasm_crypto.py ADDED
@@ -0,0 +1,156 @@
+ """
+ Disassemble the Cipher function in oneocr.dll to find the exact crypto parameters.
+ Find code references to the crypto strings we identified.
+ """
+ import struct
+
+ dll_path = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll"
+ with open(dll_path, "rb") as f:
+     data = f.read()
+
+ # Parse PE headers to find section info
+ pe_sig_offset = struct.unpack_from("<I", data, 0x3C)[0]
+ assert data[pe_sig_offset:pe_sig_offset+4] == b"PE\x00\x00"
+
+ # COFF header
+ coff_start = pe_sig_offset + 4
+ num_sections = struct.unpack_from("<H", data, coff_start + 2)[0]
+ opt_header_size = struct.unpack_from("<H", data, coff_start + 16)[0]
+
+ # Optional header
+ opt_start = coff_start + 20
+ magic = struct.unpack_from("<H", data, opt_start)[0]
+ assert magic == 0x20B  # PE32+
+
+ image_base = struct.unpack_from("<Q", data, opt_start + 24)[0]
+
+ # Sections
+ section_start = opt_start + opt_header_size
+ sections = []
+ for i in range(num_sections):
+     s_off = section_start + i * 40
+     name = data[s_off:s_off+8].rstrip(b"\x00").decode("ascii", errors="replace")
+     vsize = struct.unpack_from("<I", data, s_off + 8)[0]
+     va = struct.unpack_from("<I", data, s_off + 12)[0]
+     raw_size = struct.unpack_from("<I", data, s_off + 16)[0]
+     raw_ptr = struct.unpack_from("<I", data, s_off + 20)[0]
+     sections.append((name, va, vsize, raw_ptr, raw_size))
+     print(f"Section: {name:10s} VA=0x{va:08x} VSize=0x{vsize:08x} RawPtr=0x{raw_ptr:08x} RawSize=0x{raw_size:08x}")
+
+ print(f"\nImage base: 0x{image_base:016x}")
+
+ def rva_to_file_offset(rva):
+     for name, va, vsize, raw_ptr, raw_size in sections:
+         if va <= rva < va + vsize:
+             return raw_ptr + (rva - va)
+     return None
+
+ def file_offset_to_rva(offset):
+     for name, va, vsize, raw_ptr, raw_size in sections:
+         if raw_ptr <= offset < raw_ptr + raw_size:
+             return va + (offset - raw_ptr)
+     return None
+
+ # Key string file offsets we found
+ crypto_strings = {
+     "SHA256 (wide)": 0x02724b60,
+     "AES (wide)": 0x02724b70,
+     "BlockLength (wide)": 0x02724b78,
+     "ChainingModeCFB (wide)": 0x02724b90,
+     "meta->magic_number == MAGIC_NUMBER": 0x02724bb0,
+     "Unable to uncompress": 0x02724bd8,
+     "Crypto.cpp": 0x02724c08,
+     "Error returned from crypto API": 0x02724c40,
+     "ChainingMode (wide)": 0x02724c80,
+     "MessageBlockLength (wide)": 0x02724ca0,
+ }
+
+ # Calculate the RVAs of these strings
+ print("\n=== String RVAs ===")
+ for name, file_off in crypto_strings.items():
+     rva = file_offset_to_rva(file_off)
+     if rva:
+         print(f"  {name}: file=0x{file_off:08x} RVA=0x{rva:08x}")
+
+ # Find code references to these strings via LEA instruction patterns.
+ # In x64, LEA reg, [rip+disp32] is encoded as:
+ #   48 8D xx yy yy yy yy (where xx determines the register)
+ #   or 4C 8D xx yy yy yy yy
+ # The target address = instruction_address + 7 + disp32
+
+ print("\n=== Searching for code references to crypto strings ===")
+
+ # Focus on the most important strings
+ key_strings = {
+     "ChainingModeCFB (wide)": 0x02724b90,
+     "SHA256 (wide)": 0x02724b60,
+     "AES (wide)": 0x02724b70,
+     "Crypto.cpp": 0x02724c08,
+     "MessageBlockLength (wide)": 0x02724ca0,
+     "meta->magic_number": 0x02724bb0,
+ }
+
+ # Find the .text section (code)
+ text_section = None
+ for name, va, vsize, raw_ptr, raw_size in sections:
+     if name == ".text":
+         text_section = (va, vsize, raw_ptr, raw_size)
+         break
+
+ if text_section:
+     text_va, text_vsize, text_raw, text_rawsize = text_section
+     print(f"\n.text section: VA=0x{text_va:08x} size=0x{text_vsize:08x}")
+
+     for string_name, string_file_off in key_strings.items():
+         string_rva = file_offset_to_rva(string_file_off)
+         if string_rva is None:
+             continue
+
+         # Search for LEA instructions referencing this RVA.
+         # LEA uses RIP-relative addressing: target = RIP + disp32,
+         # where RIP = instruction_RVA + instruction_length (7 for this LEA form).
+         refs_found = []
+
+         for code_off in range(text_raw, text_raw + text_rawsize - 7):
+             # Check for LEA patterns
+             b0 = data[code_off]
+             b1 = data[code_off + 1]
+
+             # 48 8D /r = LEA with REX.W
+             # 4C 8D /r = LEA with REX.WR
+             if b0 in (0x48, 0x4C) and b1 == 0x8D:
+                 modrm = data[code_off + 2]
+                 if (modrm & 0xC7) == 0x05:  # mod=00, rm=101 (RIP-relative)
+                     disp32 = struct.unpack_from("<i", data, code_off + 3)[0]
+                     instr_rva = file_offset_to_rva(code_off)
+                     if instr_rva is None:
+                         continue
+                     target_rva = instr_rva + 7 + disp32
+                     if target_rva == string_rva:
+                         reg_idx = (modrm >> 3) & 7
+                         if b0 == 0x4C:
+                             reg_idx += 8
+                         reg_names = ["rax","rcx","rdx","rbx","rsp","rbp","rsi","rdi",
+                                      "r8","r9","r10","r11","r12","r13","r14","r15"]
+                         reg = reg_names[reg_idx]
+                         refs_found.append((code_off, instr_rva, reg))
+
+         if refs_found:
+             print(f"\n  References to '{string_name}' (RVA=0x{string_rva:08x}):")
+             for code_off, instr_rva, reg in refs_found[:5]:
+                 print(f"    at file=0x{code_off:08x} RVA=0x{instr_rva:08x}: LEA {reg}, [{string_name}]")
+                 # Dump surrounding code
+                 ctx_start = max(text_raw, code_off - 64)
+                 ctx_end = min(text_raw + text_rawsize, code_off + 128)
+
+                 # Simple bytecode dump with a marker on the matching line
+                 print(f"    Context (file offset 0x{ctx_start:08x} - 0x{ctx_end:08x}):")
+                 for i in range(ctx_start, ctx_end, 16):
+                     chunk = data[i:i+16]
+                     hex_part = " ".join(f"{b:02x}" for b in chunk)
+                     rva_i = file_offset_to_rva(i)
+                     marker = "  <<<" if i <= code_off < i + 16 else ""
+                     print(f"      {rva_i:08x}: {hex_part}{marker}")
+         else:
+             print(f"\n  No code references found for '{string_name}'")
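The LEA-pattern scan above hinges on the ModRM byte: `mod=00, rm=101` selects RIP-relative addressing, the `reg` field names the destination register, and REX.R (the `4C` prefix) extends it to r8-r15. A standalone decoder sketch (name ours) makes the bit-twiddling testable:

```python
import struct

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_rip_lea(code: bytes, insn_rva: int):
    """Decode a 7-byte x64 `lea reg, [rip+disp32]` (48/4C 8D /r with
    mod=00, rm=101). Returns (register name, target RVA) or None."""
    if code[0] not in (0x48, 0x4C) or code[1] != 0x8D:
        return None
    modrm = code[2]
    if (modrm & 0xC7) != 0x05:      # mod=00, rm=101 -> RIP-relative
        return None
    reg = ((modrm >> 3) & 7) + (8 if code[0] == 0x4C else 0)  # REX.R extension
    disp = struct.unpack_from("<i", code, 3)[0]
    return REGS[reg], insn_rva + 7 + disp   # RIP = end of 7-byte instruction

# 48 8d 15 00 01 00 00 = lea rdx, [rip+0x100]; at RVA 0x5000 -> target 0x5107
assert decode_rip_lea(b"\x48\x8d\x15\x00\x01\x00\x00", 0x5000) == ("rdx", 0x5107)
# 4c 8d 05 04 00 00 00 = lea r8, [rip+4] -> 0 + 7 + 4 = 11
assert decode_rip_lea(b"\x4c\x8d\x05\x04\x00\x00\x00", 0) == ("r8", 11)
```

Running the scan the other way (compute each candidate's target and compare it to a known string RVA) is exactly what the loop above does.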
_archive/attempts/disasm_full_cipher.py ADDED
@@ -0,0 +1,138 @@
+ """
+ Full disassembly of the Cipher function from AES setup through BCryptDecrypt.
+ Based on findings:
+ - SHA256 provider at file 0x0015a3e2 (RVA 0x0015afe2)
+ - AES provider at file 0x0015a702 (RVA 0x0015b302)
+ - ChainingModeCFB at file 0x0015a7cd (RVA 0x0015b3cd)
+ - MessageBlockLength at file 0x0015a7fc (RVA 0x0015b3fc)
+ - BCryptGenerateSymmetricKey import at ~0x027ef0a2
+ - Still needed: key handling, IV passing, the BCryptDecrypt call
+ """
+ import struct
+ from capstone import Cs, CS_ARCH_X86, CS_MODE_64
+
+ DLL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll"
+ IMAGE_BASE = 0x180000000
+ TEXT_VA = 0x1000
+ TEXT_FILE_OFFSET = 0x400  # .text section file offset
+
+ def rva_to_file(rva):
+     return rva - TEXT_VA + TEXT_FILE_OFFSET
+
+ def file_to_rva(foff):
+     return foff - TEXT_FILE_OFFSET + TEXT_VA
+
+ with open(DLL_PATH, "rb") as f:
+     dll_data = f.read()
+
+ md = Cs(CS_ARCH_X86, CS_MODE_64)
+ md.detail = False
+
+ def disasm_region(name, file_start, file_end):
+     rva_start = file_to_rva(file_start)
+     va_start = IMAGE_BASE + rva_start
+     code = dll_data[file_start:file_end]
+     print(f"\n{'='*100}")
+     print(f"{name}")
+     print(f"File: 0x{file_start:08x}-0x{file_end:08x}, RVA: 0x{rva_start:08x}, VA: 0x{va_start:016x}")
+     print(f"{'='*100}")
+     for insn in md.disasm(code, va_start):
+         foff = rva_to_file(insn.address - IMAGE_BASE)
+         print(f"  {insn.address - IMAGE_BASE:08x} ({foff:08x}): {insn.bytes.hex():<40s} {insn.mnemonic:<14s} {insn.op_str}")
+
+ # The Cipher function appears to start before the AES setup.
+ # Find the function prologue by scanning backwards from the AES setup:
+ # the AES LEA is at file 0x0015a702, so look for a typical prologue before it.
+
+ print("\n" + "="*100)
+ print("SCANNING FOR FUNCTION PROLOGUE before AES setup (file 0x0015a702)")
+ print("="*100)
+
+ # Search backwards from 0x0015a702 for push rbp / sub rsp patterns
+ search_start = 0x0015a500  # start after the SHA256Hash function
+ search_end = 0x0015a710
+ search_region = dll_data[search_start:search_end]
+
+ # Common x64 function prologues:
+ #   48 89 5C 24 xx       = mov [rsp+xx], rbx
+ #   48 89 74 24 xx       = mov [rsp+xx], rsi
+ #   55                   = push rbp
+ #   40 55                = push rbp (with REX prefix)
+ #   48 8B EC             = mov rbp, rsp
+ #   48 81 EC xx xx xx xx = sub rsp, imm32
+
+ for i in range(len(search_region) - 8):
+     b = search_region[i:i+8]
+     foff = search_start + i
+     rva = file_to_rva(foff)
+
+     # Look for function start patterns
+     if b[:5] == bytes([0x48, 0x89, 0x5C, 0x24, 0x08]):  # mov [rsp+8], rbx
+         print(f"  Possible prologue at file 0x{foff:08x} (RVA 0x{rva:08x}): mov [rsp+8], rbx")
+     elif b[:2] == bytes([0x40, 0x55]):  # push rbp with REX
+         print(f"  Possible prologue at file 0x{foff:08x} (RVA 0x{rva:08x}): REX push rbp")
+     elif b[:1] == bytes([0x55]) and (i == 0 or search_region[i-1] in (0xC3, 0xCC, 0x90)):
+         print(f"  Possible prologue at file 0x{foff:08x} (RVA 0x{rva:08x}): push rbp (after ret/nop/int3)")
+     elif b[:4] == bytes([0x48, 0x83, 0xEC, 0x28]):  # sub rsp, 0x28
+         print(f"  Possible prologue at file 0x{foff:08x} (RVA 0x{rva:08x}): sub rsp, 0x28")
+     elif b[:3] == bytes([0x48, 0x81, 0xEC]):  # sub rsp, imm32
+         val = struct.unpack_from('<I', b, 3)[0]
+         print(f"  Possible prologue at file 0x{foff:08x} (RVA 0x{rva:08x}): sub rsp, 0x{val:X}")
+
+ # Now disassemble the ENTIRE Cipher function region in meaningful chunks,
+ # from after SHA256Hash to well past all the setup.
+
+ # Region 1: function start to AES provider setup
+ disasm_region(
+     "Cipher function part 1: prologue to AES provider",
+     0x0015a500, 0x0015a720
+ )
+
+ # Region 2: AES provider setup through ChainingMode and MessageBlockLength
+ disasm_region(
+     "Cipher function part 2: AES provider, ChainingModeCFB, MessageBlockLength",
+     0x0015a720, 0x0015a880
+ )
+
+ # Region 3: after IV extraction - BCryptGenerateSymmetricKey and BCryptDecrypt calls.
+ # This is the critical region.
+ disasm_region(
+     "Cipher function part 3: key gen and decrypt (extended)",
+     0x0015abd0, 0x0015ae00
+ )
+
+ # Also check what's around the BCryptDecrypt import call.
+ # BCrypt imports are indirect calls through the IAT, so
+ # find all indirect calls (FF 15) in the Cipher function range.
+ print("\n" + "="*100)
+ print("ALL INDIRECT CALLS (ff 15) in Cipher function region 0x0015a500-0x0015ae00")
+ print("="*100)
+
+ search_start = 0x0015a500
+ search_end = 0x0015ae00
+ for i in range(search_end - search_start - 6):
+     foff = search_start + i
+     if dll_data[foff] == 0xFF and dll_data[foff+1] == 0x15:
+         rva = file_to_rva(foff)
+         disp = struct.unpack_from('<i', dll_data, foff + 2)[0]
+         target_rva = rva + 6 + disp  # RIP-relative
+         target_foff = rva_to_file(target_rva)
+         # Read the IAT entry (8 bytes at the target)
+         iat_value = struct.unpack_from('<Q', dll_data, target_foff)[0] if target_foff + 8 <= len(dll_data) else 0
+         print(f"  File 0x{foff:08x} (RVA 0x{rva:08x}): call [rip+0x{disp:x}] -> IAT at RVA 0x{target_rva:08x}")
+
+ # Also disassemble the region between the IV handling (0x0015abdb) and the
+ # magic number check (0x0015a170); it might contain the actual BCryptDecrypt call.
+ disasm_region(
+     "Cipher function part 4: from end of IV path to function cleanup",
+     0x0015ac00, 0x0015ae00
+ )
+
+ # Look at the region right before the magic number check function.
+ # The Cipher function should return, and then a caller invokes the magic check.
+ disasm_region(
+     "Pre-magic-check function caller",
+     0x0015a0c0, 0x0015a170
+ )
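The prologue heuristics used in the scan above can be isolated into a small function and exercised on a synthetic buffer; this is a sketch of the same byte-pattern matching (function name ours), not code from the DLL:

```python
import struct

def find_prologues(code: bytes):
    """Scan a code blob for a few common x64 prologue byte patterns
    (a subset of the heuristics used in the scan) and return (offset, desc)."""
    hits = []
    for i in range(len(code) - 8):
        b = code[i:i+8]
        if b[:5] == b"\x48\x89\x5c\x24\x08":       # mov [rsp+8], rbx
            hits.append((i, "mov [rsp+8], rbx"))
        elif b[:4] == b"\x48\x83\xec\x28":         # sub rsp, 0x28
            hits.append((i, "sub rsp, 0x28"))
        elif b[:3] == b"\x48\x81\xec":             # sub rsp, imm32
            imm = struct.unpack_from("<I", b, 3)[0]
            hits.append((i, f"sub rsp, 0x{imm:X}"))
    return hits

# int3 padding, `sub rsp, 0x28`, ret, padding, then `sub rsp, 0x88`, nop sled
blob = (b"\xcc\xcc" + b"\x48\x83\xec\x28" + b"\xc3" + b"\xcc"
        + b"\x48\x81\xec\x88\x00\x00\x00" + b"\x90" * 8)
hits = find_prologues(blob)
assert (2, "sub rsp, 0x28") in hits
assert (8, "sub rsp, 0x88") in hits
```

As in the original scan, these are heuristics: the patterns also match mid-function bytes, so each hit still has to be confirmed by disassembling from the candidate offset.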
_archive/attempts/disasm_proper.py ADDED
@@ -0,0 +1,95 @@
1
+ """
2
+ Proper disassembly of the Cipher function in oneocr.dll using capstone.
3
+ Focus on the crypto setup flow: key derivation, IV, AES parameters.
4
+ """
5
+ import struct
6
+ from capstone import Cs, CS_ARCH_X86, CS_MODE_64
7
+
8
+ dll_path = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll"
9
+ with open(dll_path, "rb") as f:
10
+ data = f.read()
11
+
12
+ # PE parsing (simplified)
13
+ pe_sig_offset = struct.unpack_from("<I", data, 0x3C)[0]
14
+ coff_start = pe_sig_offset + 4
15
+ opt_header_size = struct.unpack_from("<H", data, coff_start + 16)[0]
16
+ opt_start = coff_start + 20
17
+ image_base = struct.unpack_from("<Q", data, opt_start + 24)[0]
18
+ num_sections = struct.unpack_from("<H", data, coff_start + 2)[0]
19
+ section_start = opt_start + opt_header_size
20
+
21
+ sections = []
22
+ for i in range(num_sections):
23
+ s_off = section_start + i * 40
24
+     name = data[s_off:s_off+8].rstrip(b"\x00").decode("ascii", errors="replace")
+     vsize = struct.unpack_from("<I", data, s_off + 8)[0]
+     va = struct.unpack_from("<I", data, s_off + 12)[0]
+     raw_size = struct.unpack_from("<I", data, s_off + 16)[0]
+     raw_ptr = struct.unpack_from("<I", data, s_off + 20)[0]
+     sections.append((name, va, vsize, raw_ptr, raw_size))
+
+ def rva_to_file_offset(rva):
+     for name, va, vsize, raw_ptr, raw_size in sections:
+         if va <= rva < va + vsize:
+             return raw_ptr + (rva - va)
+     return None
+
+ def file_offset_to_rva(offset):
+     for name, va, vsize, raw_ptr, raw_size in sections:
+         if raw_ptr <= offset < raw_ptr + raw_size:
+             return va + (offset - raw_ptr)
+     return None
+
+ md = Cs(CS_ARCH_X86, CS_MODE_64)
+ md.detail = True
+
+ def disasm_region(file_start, file_end, label=""):
+     """Disassemble a file region and print its instructions with RVA/file offsets."""
+     code_bytes = data[file_start:file_end]
+     base_rva = file_offset_to_rva(file_start)
+     base_addr = image_base + base_rva
+
+     print(f"\n{'='*80}")
+     print(f"{label}")
+     print(f"File: 0x{file_start:08x}-0x{file_end:08x}, RVA: 0x{base_rva:08x}, VA: 0x{base_addr:016x}")
+     print(f"{'='*80}")
+
+     for instr in md.disasm(code_bytes, base_addr):
+         file_off = file_start + (instr.address - base_addr)
+         rva = base_rva + (instr.address - base_addr)
+         hex_bytes = " ".join(f"{b:02x}" for b in instr.bytes)
+         print(f" {rva:08x} ({file_off:08x}): {hex_bytes:<30s} {instr.mnemonic:10s} {instr.op_str}")
+
+
+ # Key code regions to disassemble (file offsets of the crypto code found earlier)
+ regions = [
+     # SHA256 provider setup
+     (0x0015a3a0, 0x0015a500, "SHA256Hash function - BCryptOpenAlgorithmProvider for SHA256"),
+     # AES provider setup and ChainingMode/MessageBlockLength
+     (0x0015a6b0, 0x0015a880, "Cipher function - AES setup, ChainingModeCFB, MessageBlockLength"),
+     # Key generation and decrypt/encrypt
+     (0x0015a880, 0x0015aa00, "Cipher function - key generation and encrypt/decrypt"),
+     # Magic number check and uncompress
+     (0x0015a170, 0x0015a300, "Magic number check and uncompress"),
+ ]
+
+ for file_start, file_end, label in regions:
+     disasm_region(file_start, file_end, label)
+
+ # Also look for the function that calls BCryptDecrypt. It is invoked indirectly
+ # through the import table, so locate the BCryptDecrypt IAT entry.
+ print("\n\n=== Finding BCryptDecrypt call sites ===")
+
+ # The call at 0015b3de: ff 15 23 f2 6b 00 is CALL [rip+0x006bf223],
+ # an indirect call through the IAT. Similar patterns should appear near the
+ # ChainingModeCFB reference: after ChainingMode and MessageBlockLength are set,
+ # the next step is GenerateSymmetricKey.
+
+ # Disassemble the broader decrypt region
+ disasm_region(0x0015a880, 0x0015abe0, "Post-setup: key generation, IV, encrypt/decrypt")
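The `rva_to_file_offset` / `file_offset_to_rva` helpers above are plain interval lookups over the PE section table. A minimal standalone sketch, using a made-up single-section layout (not the real oneocr.dll sections), shows the round trip:

```python
# Hypothetical one-section table: .text mapped at RVA 0x1000, raw data at file offset 0x400.
sections = [(".text", 0x1000, 0x5000, 0x400, 0x5000)]  # (name, va, vsize, raw_ptr, raw_size)

def rva_to_file_offset(rva):
    # Map a relative virtual address to its position in the on-disk file.
    for name, va, vsize, raw_ptr, raw_size in sections:
        if va <= rva < va + vsize:
            return raw_ptr + (rva - va)
    return None

def file_offset_to_rva(offset):
    # Inverse mapping: file position back to an RVA.
    for name, va, vsize, raw_ptr, raw_size in sections:
        if raw_ptr <= offset < raw_ptr + raw_size:
            return va + (offset - raw_ptr)
    return None

print(hex(rva_to_file_offset(0x1500)))  # 0x900
print(hex(file_offset_to_rva(0x900)))   # 0x1500
print(rva_to_file_offset(0x100000))     # None (outside every section)
```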
_archive/attempts/discover_key_derivation.py ADDED
@@ -0,0 +1,126 @@
+ """Discover key derivation: what SHA256 input produces each chunk's secret key?"""
+ import hashlib
+ import hmac
+ import struct
+ from pathlib import Path
+
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+ IV = b"Copyright @ OneO"
+
+ # Captured per-chunk secrets from hook
+ CHUNK_SECRETS = {
+     0: bytes.fromhex("d13142a17603a8e25c9ca2f90761f7fdf31ad106fd224fb7fe6a33e695c0f25a"),  # DX index
+     1: bytes.fromhex("82aa42940241cc1ef7b72b3b8a22acd7f1eac465069c4b375d129f304dbd9363"),  # Config
+     2: bytes.fromhex("af1442f4972ca3254d4b496c6c1c55e071a808089f814957c7002c4762fecd15"),  # ONNX encrypt+chunk
+     3: bytes.fromhex("1bc0a4cfe390d35e0597d4a67451d9c8f62f53df962804a6e6907cddb3d0004b"),  # Big ONNX model
+     4: bytes.fromhex("c1e03295f3793ee74c685bfe3872ec795e76f731e939abfd09120ada886a9228"),  # ONNX model
+ }
+
+ print("=" * 70)
+ print("SHA256 Key Derivation Discovery")
+ print("=" * 70)
+ print(f"Master key: {KEY!r}")
+ print(f"SHA256(key) = {hashlib.sha256(KEY).hexdigest()}")
+ print()
+
+ # Test various derivation schemes
+ def try_hash(label, data, target_idx=None):
+     h = hashlib.sha256(data).digest()
+     for idx, secret in CHUNK_SECRETS.items():
+         if target_idx is not None and idx != target_idx:
+             continue
+         if h == secret:
+             print(f" *** MATCH chunk {idx}! *** {label} -> {h.hex()}")
+             return True
+     return False
+
+ print("--- Simple hashes ---")
+ try_hash("SHA256(key)", KEY)
+ try_hash("SHA256(IV)", IV)
+ try_hash("SHA256(key+IV)", KEY + IV)
+ try_hash("SHA256(IV+key)", IV + KEY)
+
+ print("\n--- Key + counter ---")
+ for i in range(10):
+     try_hash(f"SHA256(key + uint8({i}))", KEY + bytes([i]))
+     try_hash(f"SHA256(key + uint32LE({i}))", KEY + struct.pack('<I', i))
+     try_hash(f"SHA256(key + uint64LE({i}))", KEY + struct.pack('<Q', i))
+     try_hash(f"SHA256(uint8({i}) + key)", bytes([i]) + KEY)
+     try_hash(f"SHA256(uint32LE({i}) + key)", struct.pack('<I', i) + KEY)
+     try_hash(f"SHA256(uint64LE({i}) + key)", struct.pack('<Q', i) + KEY)
+
+ print("\n--- Key + string counter ---")
+ for i in range(10):
+     try_hash(f"SHA256(key + '{i}')", KEY + str(i).encode())
+     try_hash(f"SHA256('{i}' + key)", str(i).encode() + KEY)
+
+ print("\n--- Double hash ---")
+ h1 = hashlib.sha256(KEY).digest()
+ try_hash("SHA256(SHA256(key))", h1)
+ for i in range(10):
+     try_hash(f"SHA256(SHA256(key) + uint8({i}))", h1 + bytes([i]))
+     try_hash(f"SHA256(SHA256(key) + uint32LE({i}))", h1 + struct.pack('<I', i))
+
+ print("\n--- HMAC-SHA256 ---")
+ for i in range(10):
+     h = hmac.new(KEY, bytes([i]), hashlib.sha256).digest()
+     for idx, secret in CHUNK_SECRETS.items():
+         if h == secret:
+             print(f" *** MATCH chunk {idx}! *** HMAC(key, uint8({i}))")
+     h = hmac.new(KEY, struct.pack('<I', i), hashlib.sha256).digest()
+     for idx, secret in CHUNK_SECRETS.items():
+         if h == secret:
+             print(f" *** MATCH chunk {idx}! *** HMAC(key, uint32LE({i}))")
+
+ # Read file header data that might be used in derivation
+ file_data = Path("ocr_data/oneocr.onemodel").read_bytes()
+ header = file_data[:24]  # First 24 bytes (before encrypted DX)
+ print(f"\nFile header (offset 0-23): {header.hex()}")
+ header_size = struct.unpack('<I', file_data[:4])[0]
+ print(f"Header size field: {header_size}")
+
+ print("\n--- Key + file header data ---")
+ try_hash("SHA256(key + header[:8])", KEY + header[:8])
+ try_hash("SHA256(key + header[:16])", KEY + header[:16])
+ try_hash("SHA256(key + header[:24])", KEY + header[:24])
+ try_hash("SHA256(header[:8] + key)", header[:8] + KEY)
+ try_hash("SHA256(header[:16] + key)", header[:16] + KEY)
+ try_hash("SHA256(header[:24] + key)", header[:24] + KEY)
+
+ # Try with known offsets
+ print("\n--- Key + chunk offset ---")
+ offsets = [24, 22648, 22684]  # Known offsets
+ for off in offsets:
+     try_hash(f"SHA256(key + uint32LE({off}))", KEY + struct.pack('<I', off))
+     try_hash(f"SHA256(key + uint64LE({off}))", KEY + struct.pack('<Q', off))
+
+ # Try with chunk sizes
+ print("\n--- Key + chunk sizes ---")
+ sizes = [22624, 11920, 11553680]
+ for sz in sizes:
+     try_hash(f"SHA256(key + uint32LE({sz}))", KEY + struct.pack('<I', sz))
+
+ # Try iterative: SHA256(key), SHA256(prev_hash), ...
+ print("\n--- Iterative hashing (chain) ---")
+ h = KEY
+ for i in range(10):
+     h = hashlib.sha256(h).digest()
+     for idx, secret in CHUNK_SECRETS.items():
+         if h == secret:
+             print(f" *** MATCH chunk {idx}! *** SHA256^{i+1}(key)")
+
+ # Try key + IV combos
+ print("\n--- Key + IV + counter ---")
+ for i in range(10):
+     try_hash(f"SHA256(key + IV + uint8({i}))", KEY + IV + bytes([i]))
+     try_hash(f"SHA256(IV + key + uint8({i}))", IV + KEY + bytes([i]))
+     try_hash(f"SHA256(key + uint8({i}) + IV)", KEY + bytes([i]) + IV)
+
+ # Try XOR-based derivation
+ print("\n--- XOR key with counter ---")
+ for i in range(10):
+     xor_key = bytes(b ^ i for b in KEY)
+     try_hash(f"SHA256(key XOR {i})", xor_key)
+
+ print("\n--- If no match found, hook BCryptHash/BCryptHashData ---")
+ print("to see the exact SHA256 input data")
_archive/attempts/dll_bcrypt_analysis.py ADDED
@@ -0,0 +1,63 @@
+ import re
+
+ data = open(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll", "rb").read()
+
+ # Find ALL BCrypt function occurrences
+ print("=== All BCrypt function references ===")
+ for m in re.finditer(rb'BCrypt\w+', data):
+     offset = m.start()
+     name = m.group().decode('ascii')
+     print(f" [0x{offset:08x}] {name}")
+
+ # Search for BCryptGenerateSymmetricKey and BCryptImportKey specifically
+ print()
+ for fn in [b"BCryptGenerateSymmetricKey", b"BCryptImportKey", b"BCryptCreateHash"]:
+     pos = data.find(fn)
+     print(f" {fn.decode()}: {'FOUND at 0x' + format(pos, '08x') if pos != -1 else 'NOT FOUND'}")
+
+ # Look for MAGIC_NUMBER constant value
+ print()
+ print("=== Looking for MAGIC_NUMBER = 1 constant context ===")
+ for pattern in [b"magic_number == MAGIC_NUMBER"]:
+     pos = data.find(pattern)
+     while pos != -1:
+         # Dump wider context
+         ctx_start = max(0, pos - 100)
+         ctx_end = min(len(data), pos + 100)
+         ctx = data[ctx_start:ctx_end]
+         # Find strings in context
+         for m in re.finditer(rb'[\x20-\x7e]{4,}', ctx):
+             print(f" [0x{ctx_start + m.start():08x}] {m.group().decode('ascii')}")
+         pos = data.find(pattern, pos + 1)
+
+ # Look at the region right around the crypto strings for more context
+ print()
+ print("=== Extended crypto region dump 0x02724b00-0x02724d00 ===")
+ for i in range(0x02724b00, 0x02724d00, 16):
+     chunk = data[i:i+16]
+     hex_part = " ".join(f"{b:02x}" for b in chunk)
+     ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
+     print(f" {i:08x}: {hex_part:<48s} {ascii_part}")
+
+ # Check for constant values near magic_number assertion - look for "1" as uint32.
+ # Find the code that references the magic_number string.
+ print()
+ print("=== Finding code references to Crypto.cpp ===")
+ crypto_path = b"C:\\__w\\1\\s\\CoreEngine\\Native\\ModelParser\\Crypto.cpp"
+ pos = data.find(crypto_path)
+ if pos != -1:
+     # This is in the .rdata section. Find cross-references to this address:
+     # in x64, look for LEA instructions referencing this RVA.
+     print(f" Crypto.cpp string at: 0x{pos:08x}")
+
+ # Look for the "block length" being set - find 16 as a byte constant near the
+ # BlockLength string (BCrypt property names are stored as UTF-16LE).
+ print()
+ print("=== Looking for block length values near crypto code ===")
+ bl_str = data.find("BlockLength".encode("utf-16-le"))
+ if bl_str != -1:
+     print(f" BlockLength wide string at: 0x{bl_str:08x}")
+ ml_str = data.find("MessageBlockLength".encode("utf-16-le"))
+ if ml_str != -1:
+     print(f" MessageBlockLength wide string at: 0x{ml_str:08x}")
_archive/attempts/dll_crypto_analysis.py ADDED
@@ -0,0 +1,183 @@
+ """
+ Deep analysis of oneocr.dll to find the exact decryption algorithm.
+ Searches for Crypto.cpp references, key/IV derivation patterns, and
+ the structure of the .onemodel container format.
+ """
+ import struct
+ import re
+ import math
+ from collections import Counter
+
+ DLL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll"
+ MODEL_PATH = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.onemodel"
+
+ with open(DLL_PATH, "rb") as f:
+     dll = f.read()
+
+ with open(MODEL_PATH, "rb") as f:
+     model = f.read()
+
+ def entropy_calc(data):
+     """Shannon entropy in bits per byte (0.0 = constant, 8.0 = uniform)."""
+     if not data:
+         return 0.0
+     freq = Counter(data)
+     total = len(data)
+     return -sum((c/total) * math.log2(c/total) for c in freq.values())
+
+ print("=" * 80)
+ print("PHASE 1: Crypto.cpp and error message strings")
+ print("=" * 80)
+
+ search_strings = [
+     b'Crypto.cpp', b'magic_number', b'MAGIC_NUMBER', b'uncompress',
+     b'Uncompress', b'Source data', b'mismatch', b'Check failed',
+     b'Unable to', b'model_data', b'ModelData', b'decrypt', b'Decrypt',
+     b'LoadModel', b'load_model', b'onemodel', b'.onemodel',
+     b'ParseModel', b'DeserializeModel', b'ReadModel',
+ ]
+
+ all_found = set()
+ for term in search_strings:
+     for m in re.finditer(re.escape(term), dll, re.IGNORECASE):
+         offset = m.start()
+         s = offset
+         while s > 0 and 0x20 <= dll[s-1] < 0x7f:
+             s -= 1
+         e = offset + len(term)
+         while e < len(dll) and 0x20 <= dll[e] < 0x7f:
+             e += 1
+         full = dll[s:e].decode('ascii', errors='ignore')
+         if full not in all_found and len(full) > 3:
+             all_found.add(full)
+             print(f" 0x{s:08x}: {full[:250]}")
+
+ print("\n" + "=" * 80)
+ print("PHASE 2: Compression library strings")
+ print("=" * 80)
+
+ for pattern in [b'uncompress', b'compress', b'inflate', b'deflate',
+                 b'lz4', b'LZ4', b'snappy', b'Snappy', b'zstd', b'ZSTD',
+                 b'zlib', b'ZLIB', b'brotli', b'lzma', b'LZMA']:
+     idx = 0
+     seen = set()
+     while True:
+         idx = dll.find(pattern, idx)
+         if idx < 0:
+             break
+         s = idx
+         while s > 0 and 0x20 <= dll[s-1] < 0x7f:
+             s -= 1
+         e = idx + len(pattern)
+         while e < len(dll) and 0x20 <= dll[e] < 0x7f:
+             e += 1
+         full = dll[s:e].decode('ascii', errors='ignore')
+         if full not in seen and len(full) > 3:
+             seen.add(full)
+             print(f" 0x{s:08x}: {full[:200]}")
+         idx = e
+
+ print("\n" + "=" * 80)
+ print("PHASE 3: .onemodel file structure analysis")
+ print("=" * 80)
+
+ filesize = len(model)
+ h_size = struct.unpack_from("<I", model, 0)[0]  # 22636
+
+ print(f"File size: {filesize:,} bytes ({filesize/1024/1024:.2f} MB)")
+ print(f"Header size (uint32 @ 0): {h_size}")
+
+ # Detailed header boundary analysis
+ print(f"\nAt header boundary (offset {h_size}):")
+ for i in range(0, 64, 4):
+     off = h_size + i
+     val32 = struct.unpack_from("<I", model, off)[0]
+     print(f" @{off:6d} (+{i:2d}): u32={val32:>12,} (0x{val32:08x}) hex={model[off:off+4].hex()}")
+
+ # Critical check: does any uint64 at the header boundary == remaining data size?
+ print("\nSize field search at header boundary:")
+ for i in range(0, 32, 4):
+     off = h_size + i
+     if off + 8 <= filesize:
+         val64 = struct.unpack_from("<Q", model, off)[0]
+         remaining = filesize - (off + 8)
+         diff = abs(val64 - remaining)
+         if diff < 1000:
+             print(f" *** @{off} (+{i}): u64={val64:,} remaining={remaining:,} diff={diff}")
+
+ # Check header entropy pattern
+ print("\nHeader entropy (256-byte chunks):")
+ for chunk_start in range(0, h_size, 256):
+     chunk_end = min(chunk_start + 256, h_size)
+     chunk = model[chunk_start:chunk_end]
+     ent = entropy_calc(chunk)
+     uniq = len(set(chunk))
+     tag = " ← STRUCTURED!" if ent < 5.0 else (" ← moderate" if ent < 7.0 else "")
+     if ent < 6.0 or chunk_start < 256 or chunk_start >= h_size - 256:
+         print(f" [{chunk_start:5d}:{chunk_end:5d}] ent={ent:.3f} uniq={uniq:3d}/256{tag}")
+
+ # Search for substructures within header: look for recurring uint32 patterns
+ print("\nSearching for structure markers in header (first 100 bytes):")
+ for i in range(0, min(100, h_size), 4):
+     val = struct.unpack_from("<I", model, i)[0]
+     if val < 1000 or (1000000 < val < filesize):
+         print(f" @{i:4d}: u32={val:>12,} (0x{val:08x})")
+
+ print("\n" + "=" * 80)
+ print("PHASE 4: Sub-model references in DLL")
+ print("=" * 80)
+
+ submodel_patterns = [
+     b'detector', b'Detector', b'recognizer', b'Recognizer',
+     b'normalizer', b'Normalizer', b'classifier', b'Classifier',
+     b'dispatch', b'Dispatch', b'barcode', b'Barcode',
+     b'text_detect', b'text_recog', b'TextDetect', b'TextRecog',
+     b'CTC', b'transformer', b'Transformer',
+     b'model_type', b'ModelType', b'model_name', b'ModelName',
+     b'sub_model', b'SubModel', b'segment',
+ ]
+
+ found = set()
+ for pattern in submodel_patterns:
+     for m in re.finditer(re.escape(pattern), dll, re.IGNORECASE):
+         s = m.start()
+         while s > 0 and 0x20 <= dll[s-1] < 0x7f:
+             s -= 1
+         e = m.end()
+         while e < len(dll) and 0x20 <= dll[e] < 0x7f:
+             e += 1
+         full = dll[s:e].decode('ascii', errors='ignore')
+         interesting = ('OneOCR' in full or 'model' in full.lower()
+                        or 'detect' in full.lower() or 'recog' in full.lower())
+         if full not in found and 4 < len(full) < 200 and interesting:
+             found.add(full)
+             print(f" 0x{s:08x}: {full}")
+
+ print("\n" + "=" * 80)
+ print("PHASE 5: ORT session creation patterns")
+ print("=" * 80)
+
+ ort_patterns = [
+     b'OrtGetApiBase', b'CreateSession', b'SessionOptions',
+     b'CreateSessionFromArray', b'OrtApi', b'InferenceSession',
+     b'SessionFromBuffer', b'CreateSessionFromBuffer',
+     b'AppendExecutionProvider', b'ModelMetadata',
+ ]
+
+ for pattern in ort_patterns:
+     idx = 0
+     while True:
+         idx = dll.find(pattern, idx)
+         if idx < 0:
+             break
+         s = idx
+         while s > 0 and 0x20 <= dll[s-1] < 0x7f:
+             s -= 1
+         e = idx + len(pattern)
+         while e < len(dll) and 0x20 <= dll[e] < 0x7f:
+             e += 1
+         full = dll[s:e].decode('ascii', errors='ignore')
+         print(f" 0x{s:08x}: {full[:200]}")
+         idx = e
+
+ print("\n" + "=" * 80)
+ print("DONE")
+ print("=" * 80)
_archive/attempts/extract_onnx.py ADDED
@@ -0,0 +1,235 @@
+ """Extract valid ONNX models from BCryptDecrypt dumps.
+ Strips the 8-byte container header and trailing garbage bytes.
+ """
+ from pathlib import Path
+
+ DUMP_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_dump")
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\onnx_models")
+ OUTPUT_DIR.mkdir(exist_ok=True)
+
+ CONTAINER_HEADER = bytes.fromhex("4a1a082b25000000")
+ HEADER_LEN = 8
+
+
+ def read_varint(data, pos):
+     """Read a protobuf varint. Returns (value, new_pos)."""
+     val = 0
+     shift = 0
+     while pos < len(data):
+         b = data[pos]; pos += 1
+         val |= (b & 0x7f) << shift
+         if not (b & 0x80):
+             break
+         shift += 7
+     return val, pos
+
+
+ def measure_protobuf(data):
+     """Walk through protobuf fields and return the byte length of valid data.
+     Stops at the first unknown/invalid field for ONNX ModelProto.
+     Valid fields: 1-9, 14, 20."""
+     VALID_FIELDS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 20}
+     pos = 0
+     last_valid = 0
+
+     while pos < len(data):
+         start = pos
+         # Read tag
+         tag, pos = read_varint(data, pos)
+         if pos > len(data):
+             break
+         field_num = tag >> 3
+         wire_type = tag & 7
+
+         if field_num not in VALID_FIELDS:
+             return start
+
+         if wire_type == 0:  # VARINT
+             _, pos = read_varint(data, pos)
+         elif wire_type == 1:  # I64
+             pos += 8
+         elif wire_type == 2:  # LEN
+             length, pos = read_varint(data, pos)
+             pos += length
+         elif wire_type == 5:  # I32
+             pos += 4
+         else:
+             return start
+
+         if pos > len(data):
+             return start
+         last_valid = pos
+
+     return last_valid
+
+
+ def try_onnx_load(filepath):
+     try:
+         import onnx
+         model = onnx.load(str(filepath))
+         return {
+             'ir_version': model.ir_version,
+             'producer': model.producer_name,
+             'producer_version': model.producer_version,
+             'opset': [f"{o.domain or 'ai.onnx'}:{o.version}" for o in model.opset_import],
+             'graph_name': model.graph.name if model.graph else None,
+             'num_nodes': len(model.graph.node) if model.graph else 0,
+             'num_inputs': len(model.graph.input) if model.graph else 0,
+             'num_outputs': len(model.graph.output) if model.graph else 0,
+             'node_types': sorted(set(n.op_type for n in model.graph.node)) if model.graph else [],
+         }
+     except Exception as e:
+         return {'error': str(e)[:200]}
+
+
+ def try_ort_load(filepath):
+     try:
+         import onnxruntime as ort
+         sess = ort.InferenceSession(str(filepath), providers=['CPUExecutionProvider'])
+         return {
+             'inputs': [(i.name, i.shape, i.type) for i in sess.get_inputs()],
+             'outputs': [(o.name, o.shape, o.type) for o in sess.get_outputs()],
+         }
+     except Exception as e:
+         return {'error': str(e)[:200]}
+
+
+ print("=" * 70)
+ print("EXTRACTING ONNX MODELS (WITH TRAILING GARBAGE REMOVAL)")
+ print("=" * 70)
+
+ # Clean output dir
+ for old in OUTPUT_DIR.glob("*.onnx"):
+     old.unlink()
+
+ files = sorted(DUMP_DIR.glob("decrypt_*.bin"), key=lambda f: f.stat().st_size, reverse=True)
+ print(f"Total decrypt files: {len(files)}\n")
+
+ models = []
+ non_models = []
+
+ for f in files:
+     raw = f.read_bytes()
+
+     # Strip container header (accept a partial match on the first 5 bytes)
+     if raw[:HEADER_LEN] == CONTAINER_HEADER or raw[:5] == CONTAINER_HEADER[:5]:
+         data = raw[HEADER_LEN:]
+     else:
+         non_models.append({'src': f.name, 'size': len(raw), 'reason': 'no container header',
+                            'first_16': raw[:16].hex()})
+         continue
+
+     # Check if data starts with valid ONNX (field 1 = ir_version, varint)
+     if len(data) < 2 or data[0] != 0x08 or data[1] < 1 or data[1] > 12:
+         preview = data[:40].decode('utf-8', errors='replace')
+         non_models.append({'src': f.name, 'size': len(raw), 'reason': 'not ONNX',
+                            'preview': preview})
+         continue
+
+     # Measure valid protobuf length (strip trailing garbage)
+     valid_len = measure_protobuf(data)
+     trimmed = len(data) - valid_len
+     onnx_data = data[:valid_len]
+
+     # Determine producer
+     producer = "unknown"
+     if b"PyTorch" in data[:100]:
+         producer = "pytorch"
+     elif b"onnx.quantize" in data[:100]:
+         producer = "onnx_quantize"
+     elif b"pytorch" in data[:100]:
+         producer = "pytorch_small"
+
+     ir_version = data[1]
+
+     idx = len(models)
+     fname = f"model_{idx:02d}_ir{ir_version}_{producer}_{valid_len//1024}KB.onnx"
+     outpath = OUTPUT_DIR / fname
+     outpath.write_bytes(onnx_data)
+
+     models.append({
+         'src': f.name, 'dst': fname, 'raw_size': len(raw),
+         'onnx_size': valid_len, 'trimmed': trimmed,
+         'ir_version': ir_version, 'producer': producer,
+     })
+
+ print(f"ONNX models extracted: {len(models)}")
+ print(f"Non-model files: {len(non_models)}")
+
+ # Verify all models
+ print("\n" + "=" * 70)
+ print("VERIFICATION WITH onnx + onnxruntime")
+ print("=" * 70)
+
+ verified_onnx = 0
+ verified_ort = 0
+
+ for m in models:
+     outpath = OUTPUT_DIR / m['dst']
+
+     r_onnx = try_onnx_load(outpath)
+     r_ort = try_ort_load(outpath)
+
+     onnx_ok = 'error' not in r_onnx
+     ort_ok = 'error' not in r_ort
+
+     if onnx_ok:
+         verified_onnx += 1
+     if ort_ok:
+         verified_ort += 1
+
+     status = "OK" if onnx_ok and ort_ok else ("onnx" if onnx_ok else ("ort" if ort_ok else "FAIL"))
+
+     print(f"\n [{status:>4}] {m['dst']}")
+     print(f" Raw: {m['raw_size']:>10,} -> ONNX: {m['onnx_size']:>10,} (trimmed {m['trimmed']} bytes)")
+
+     if onnx_ok:
+         r = r_onnx
+         print(f" graph='{r['graph_name']}', nodes={r['num_nodes']}, "
+               f"inputs={r['num_inputs']}, outputs={r['num_outputs']}")
+         print(f" opset: {', '.join(r['opset'][:5])}")
+         ops = r['node_types']
+         print(f" ops({len(ops)}): {', '.join(ops[:15])}")
+         if len(ops) > 15:
+             print(f" ... +{len(ops)-15} more")
+     elif ort_ok:
+         r = r_ort
+         for inp in r['inputs']:
+             print(f" Input: {inp[0]} {inp[1]} {inp[2]}")
+         for out in r['outputs']:
+             print(f" Output: {out[0]} {out[1]} {out[2]}")
+     else:
+         print(f" onnx: {r_onnx.get('error', '')[:100]}")
+         print(f" ort: {r_ort.get('error', '')[:100]}")
+
+ # Summary
+ print("\n" + "=" * 70)
+ print("FINAL SUMMARY")
+ print("=" * 70)
+ print(f"Decrypted dumps: {len(files)}")
+ print(f"ONNX models: {len(models)}")
+ print(f" - onnx.load OK: {verified_onnx}")
+ print(f" - onnxruntime OK: {verified_ort}")
+ print(f"Non-model data: {len(non_models)}")
+
+ if models:
+     total = sum(m['onnx_size'] for m in models)
+     print(f"\nTotal ONNX model size: {total:,} bytes ({total/1024/1024:.1f} MB)")
+
+ print("\nNon-model content:")
+ for nm in non_models[:15]:
+     desc = nm.get('preview', nm.get('first_16', ''))[:50]
+     print(f" {nm['src']}: {nm['size']:>10,} bytes | {nm['reason']} | {desc!r}")
+
+ print(f"\n{'='*70}")
+ print("CRYPTO PARAMS (CONFIRMED)")
+ print(f"{'='*70}")
+ print('Key: kj)TGtrK>f]b[Piow.gU+nC@s""""""4 (32 bytes, raw)')
+ print('IV: Copyright @ OneO (16 bytes)')
+ print("Mode: AES-256-CFB (full block, BCrypt CNG)")
+ print("Container: 8-byte header 4a1a082b25000000 per chunk")
+ print("Model: ONNX protobuf + trailing metadata (trimmed)")
_archive/attempts/extract_strings.py ADDED
@@ -0,0 +1,37 @@
+ import re
+
+ data = open(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.dll", "rb").read()
+
+ all_strings = re.findall(rb'[\x20-\x7e]{6,}', data)
+ crypto_keywords = [b'crypt', b'aes', b'bcrypt', b'key', b'iv', b'cipher', b'cfb', b'hash',
+                    b'sha', b'magic', b'decomp', b'uncomp', b'compress', b'model', b'meta',
+                    b'onnx', b'ONNX', b'decrypt', b'encrypt', b'Crypto', b'init', b'blob',
+                    b'MAGIC', b'check', b'Check', b'fail', b'Fail', b'number']
+
+ print(f"Total strings: {len(all_strings)}")
+ print()
+ print("=== Crypto/model-related strings ===")
+ seen = set()
+ for s in all_strings:
+     s_lower = s.lower()
+     for kw in crypto_keywords:
+         if kw.lower() in s_lower:
+             if s not in seen:
+                 seen.add(s)
+                 offset = data.find(s)
+                 text = s.decode("ascii", errors="replace")
+                 print(f" [0x{offset:08x}] {text}")
+             break
+
+ # Also look for wide strings (UTF-16LE) related to BCrypt
+ print()
+ print("=== Wide (UTF-16LE) strings ===")
+ wide_strings = re.findall(rb'(?:[\x20-\x7e]\x00){4,}', data)
+ for ws in wide_strings:
+     decoded = ws.decode("utf-16-le", errors="replace")
+     d_lower = decoded.lower()
+     for kw in ['crypt', 'aes', 'cfb', 'chain', 'algorithm', 'key', 'sha', 'hash']:
+         if kw in d_lower:
+             offset = data.find(ws)
+             print(f" [0x{offset:08x}] {decoded}")
+             break
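The wide-string regex above works because UTF-16LE encodes each ASCII character as the byte itself followed by a NUL; the pattern simply looks for four or more such pairs. A small demo on a synthetic buffer (the `ChainingModeCFB` name is one of the BCrypt property strings the scan targets):

```python
import re

# Printable ASCII byte followed by NUL, repeated 4+ times = embedded UTF-16LE text.
blob = b"\x90\x12" + "ChainingModeCFB".encode("utf-16-le") + b"\x00\x00\xfe"
wide = re.findall(rb"(?:[\x20-\x7e]\x00){4,}", blob)
print([w.decode("utf-16-le") for w in wide])  # ['ChainingModeCFB']
```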
_archive/attempts/find_offset.py ADDED
@@ -0,0 +1,44 @@
+ """Map encrypted input bytes from hook to file offsets."""
+ from pathlib import Path
+ import struct
+
+ data = Path("ocr_data/oneocr.onemodel").read_bytes()
+
+ # Encrypted input first bytes from hook (call #, first 32 enc bytes hex, chunk_size)
+ chunks_encrypted = [
+     (0, "2e0c10c7c967f66b6d03821271115ad6c19ca7d91b668e5c484018e02c9632b4", 22624),
+     (2, "f7d14a6dbd04af02b6de5e5454af59d007bb5c174e3b6be6a73513b995c7dc1a", 11920),
+     (4, "7bf021af201c559217035b95ebf758ff70c860f126c9c1529421bb2d75898bf9", 11553680),
+ ]
+
+ print("Searching for encrypted chunk starts in file:")
+ print(f"File size: {len(data):,}")
+ print()
+
+ prev_end = 0
+ for call_num, hex_str, chunk_size in chunks_encrypted:
+     search_bytes = bytes.fromhex(hex_str[:16])  # First 8 bytes
+     idx = data.find(search_bytes)
+     if idx >= 0:
+         gap = idx - prev_end if prev_end > 0 else idx
+         print(f" Call #{call_num}: offset {idx} ({idx:#x}), size={chunk_size:,}, gap={gap}")
+         print(f" Range: [{idx:#x}, {idx+chunk_size:#x})")
+         prev_end = idx + chunk_size
+
+         full = bytes.fromhex(hex_str)
+         if data[idx:idx+len(full)] == full:
+             print(" 32-byte match: OK")
+     else:
+         print(f" Call #{call_num}: NOT FOUND")
+
+ # File structure
+ print("\n--- File structure ---")
+ print(f"Offset 0: header_size = {struct.unpack_from('<I', data, 0)[0]}")
+ print(f"Offset 4: {struct.unpack_from('<I', data, 4)[0]}")
+ print(f"Offset 8-23: {data[8:24].hex()}")
+
+ chunk1_end = 24 + 22624  # = 22648
+ print(f"\nChunk 1 ends at offset {chunk1_end}")
+ for o in range(22636, 22680, 4):
+     v = struct.unpack_from('<I', data, o)[0]
+     print(f" offset {o}: {data[o:o+4].hex()} = uint32 {v} ({v:#x})")
_archive/attempts/frida_hook.py ADDED
@@ -0,0 +1,328 @@
+ """
+ Frida-based hooking of BCryptDecrypt in oneocr.dll to intercept decrypted ONNX models.
+
+ Strategy:
+ 1. Load oneocr.dll in a child process
+ 2. Hook BCryptDecrypt in bcrypt.dll to capture decrypted output
+ 3. Call CreateOcrPipeline, which triggers model decryption
+ 4. Save all decrypted buffers
+ """
+ import frida
+ import sys
+ import os
+ import struct
+ import time
+ import json
+ import ctypes
+ import subprocess
+ from pathlib import Path
+
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_dump")
+ OUTPUT_DIR.mkdir(exist_ok=True)
+
+ # JavaScript hook script for Frida
+ FRIDA_SCRIPT = """
+ 'use strict';
+
+ var MIN_SIZE = 100;
+ var decryptCallNum = 0;
+
+ // Hook BCryptDecrypt
+ var bcryptDecrypt = Module.findExportByName('bcrypt.dll', 'BCryptDecrypt');
+ if (bcryptDecrypt) {
+     Interceptor.attach(bcryptDecrypt, {
+         onEnter: function(args) {
+             this.pbInput = args[1];
+             this.cbInput = args[2].toInt32();
+             this.pbIV = args[4];
+             this.cbIV = args[5].toInt32();
+             this.pbOutput = args[6];
+             this.cbOutput = args[7].toInt32();
+             this.pcbResult = args[8];
+             this.dwFlags = args[9].toInt32();
+             this.callNum = decryptCallNum++;
+         },
+         onLeave: function(retval) {
+             var status = retval.toInt32();
+             var cbResult = 0;
+             try {
+                 if (!this.pcbResult.isNull()) {
+                     cbResult = this.pcbResult.readU32();
+                 }
+             } catch(e) {}
+
+             var info = {
+                 call: this.callNum,
+                 status: status,
+                 inputSize: this.cbInput,
+                 ivSize: this.cbIV,
+                 outputSize: cbResult,
+                 flags: this.dwFlags
+             };
+
+             if (this.cbIV > 0 && !this.pbIV.isNull()) {
+                 try {
+                     info.iv = [];
+                     var ivBuf = this.pbIV.readByteArray(this.cbIV);
+                     var ivArr = new Uint8Array(ivBuf);
+                     for (var k = 0; k < ivArr.length; k++) info.iv.push(ivArr[k]);
+                 } catch(e) {}
+             }
+
+             send({type: 'decrypt_call', info: info});
+
+             if (status === 0 && cbResult >= MIN_SIZE && !this.pbOutput.isNull()) {
+                 try {
+                     var data = this.pbOutput.readByteArray(cbResult);
+                     send({type: 'decrypt_data', call: this.callNum, size: cbResult}, data);
+                 } catch(e) {
+                     send({type: 'log', msg: 'Read output failed: ' + e});
+                 }
+             }
+         }
+     });
+     send({type: 'log', msg: 'Hooked BCryptDecrypt at ' + bcryptDecrypt});
+ } else {
+     send({type: 'log', msg: 'ERROR: BCryptDecrypt not found'});
+ }
+
+ // Hook BCryptGenerateSymmetricKey to capture the per-chunk secret
+ var bcryptGenKey = Module.findExportByName('bcrypt.dll', 'BCryptGenerateSymmetricKey');
+ if (bcryptGenKey) {
+     Interceptor.attach(bcryptGenKey, {
+         onEnter: function(args) {
+             this.pbSecret = args[3];
+             this.cbSecret = args[4].toInt32();
+         },
+         onLeave: function(retval) {
+             if (retval.toInt32() === 0 && this.cbSecret > 0) {
+                 try {
+                     var keyBuf = this.pbSecret.readByteArray(this.cbSecret);
+                     var keyArr = new Uint8Array(keyBuf);
+                     var arr = [];
+                     for (var i = 0; i < keyArr.length; i++) arr.push(keyArr[i]);
+                     send({type: 'key_generated', size: this.cbSecret, key: arr});
+                 } catch(e) {}
+             }
+         }
+     });
+     send({type: 'log', msg: 'Hooked BCryptGenerateSymmetricKey'});
+ }
+
+ // Hook BCryptSetProperty to see ChainingMode/MessageBlockLength being set
+ var bcryptSetProp = Module.findExportByName('bcrypt.dll', 'BCryptSetProperty');
+ if (bcryptSetProp) {
+     Interceptor.attach(bcryptSetProp, {
+         onEnter: function(args) {
+             try {
+                 var propName = args[1].readUtf16String();
+                 var cbInput = args[3].toInt32();
+                 var propValue = null;
+                 if (cbInput > 0 && cbInput < 256 && !args[2].isNull()) {
+                     try { propValue = args[2].readUtf16String(); } catch(e2) {}
+                 }
+                 send({type: 'set_property', name: propName, value: propValue, size: cbInput});
+             } catch(e) {}
+         }
+     });
+     send({type: 'log', msg: 'Hooked BCryptSetProperty'});
+ }
+
+ send({type: 'log', msg: 'All hooks installed. Ready.'});
+ """
+
+
+ def create_loader_script():
+     """Create a small Python script that loads oneocr.dll and creates a pipeline."""
+     script = r'''
+ import ctypes
+ from ctypes import c_int64, c_char_p, POINTER, byref
+ import time
+ import sys
+ import os
+
+ DLL_DIR = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data"
+ MODEL_PATH = os.path.join(DLL_DIR, "oneocr.onemodel")
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+
+ # Load DLLs
+ kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
+ kernel32.SetDllDirectoryW(DLL_DIR)
+ dll = ctypes.WinDLL(os.path.join(DLL_DIR, "oneocr.dll"))
+
+ # Setup function types
+ dll.CreateOcrInitOptions.argtypes = [POINTER(c_int64)]
+ dll.CreateOcrInitOptions.restype = c_int64
+ dll.OcrInitOptionsSetUseModelDelayLoad.argtypes = [c_int64, ctypes.c_char]
+ dll.OcrInitOptionsSetUseModelDelayLoad.restype = c_int64
+ dll.CreateOcrPipeline.argtypes = [c_char_p, c_char_p, c_int64, POINTER(c_int64)]
+ dll.CreateOcrPipeline.restype = c_int64
160
+
161
+ # Create init options
162
+ init_options = c_int64()
163
+ ret = dll.CreateOcrInitOptions(byref(init_options))
164
+ print(f"LOADER: CreateOcrInitOptions -> {ret}", flush=True)
165
+ assert ret == 0
166
+
167
+ ret = dll.OcrInitOptionsSetUseModelDelayLoad(init_options, 0)
168
+ print(f"LOADER: SetUseModelDelayLoad -> {ret}", flush=True)
169
+ assert ret == 0
170
+
171
+ # Create pipeline (this triggers decryption!)
172
+ pipeline = c_int64()
173
+ model_buf = ctypes.create_string_buffer(MODEL_PATH.encode())
174
+ key_buf = ctypes.create_string_buffer(KEY)
175
+
176
+ print("LOADER: Creating OCR pipeline (triggers decryption)...", flush=True)
177
+ ret = dll.CreateOcrPipeline(model_buf, key_buf, init_options, byref(pipeline))
178
+ print(f"LOADER: CreateOcrPipeline returned {ret}, pipeline={pipeline.value}", flush=True)
179
+
180
+ if ret != 0:
181
+ print(f"LOADER: ERROR - return code {ret}", flush=True)
182
+ sys.exit(1)
183
+
184
+ print("LOADER: Pipeline created successfully! Waiting...", flush=True)
185
+ time.sleep(5)
186
+ print("LOADER: Done.", flush=True)
187
+ '''
188
+ loader_path = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_loader.py")
189
+ loader_path.write_text(script)
190
+ return loader_path
191
+
192
+
193
+ def on_message(message, data):
194
+ """Handle messages from Frida script."""
195
+ if message['type'] == 'send':
196
+ payload = message['payload']
197
+ msg_type = payload.get('type', '')
198
+
199
+ if msg_type == 'log':
200
+ print(f"[FRIDA] {payload['msg']}")
201
+
202
+ elif msg_type == 'decrypt_call':
203
+ info = payload['info']
204
+ iv_hex = ''
205
+ if 'iv' in info:
206
+ iv_hex = bytes(info['iv']).hex()
207
+ print(f"[DECRYPT #{info['call']}] status={info['status']} "
208
+ f"in={info['inputSize']} out={info['outputSize']} "
209
+ f"iv_size={info['ivSize']} iv={iv_hex[:32]}... flags={info['flags']}")
210
+
211
+ elif msg_type == 'decrypt_data':
212
+ call_num = payload['call']
213
+ size = payload['size']
214
+ fname = OUTPUT_DIR / f"decrypt_{call_num}_{size}bytes.bin"
215
+ fname.write_bytes(data)
216
+
217
+ # Check first 4 bytes for magic number
218
+ magic = struct.unpack('<I', data[:4])[0] if len(data) >= 4 else 0
219
+ first_16 = data[:16].hex() if data else ''
220
+ print(f" -> Saved {fname.name} | magic={magic} | first_16={first_16}")
221
+
222
+ if magic == 1:
223
+ print(f" *** MAGIC NUMBER == 1 FOUND! This is the decrypted model container! ***")
224
+
225
+ elif msg_type == 'key_generated':
226
+ key_bytes = bytes(payload['key'])
227
+ print(f"[KEY] size={payload['size']} key={key_bytes}")
228
+ try:
229
+ print(f" ASCII: {key_bytes.decode('ascii', errors='replace')}")
230
+ except:
231
+ pass
232
+
233
+ elif msg_type == 'set_property':
234
+ print(f"[PROPERTY] {payload['name']} = {payload['value']} (size={payload['size']})")
235
+
236
+ elif msg_type == 'uncompress':
237
+ print(f"[UNCOMPRESS] sourceLen={payload['sourceLen']} -> destLen={payload['destLen']}")
238
+
239
+ elif msg_type == 'uncompress_data':
240
+ size = payload['size']
241
+ fname = OUTPUT_DIR / f"uncompressed_{size}bytes.bin"
242
+ fname.write_bytes(data)
243
+ first_32 = data[:32].hex() if data else ''
244
+ print(f" -> Saved {fname.name} | first_32={first_32}")
245
+
246
+ elif msg_type == 'ort_export':
247
+ print(f"[ORT] {payload['name']} @ {payload['addr']}")
248
+
249
+ else:
250
+ print(f"[MSG] {payload}")
251
+
252
+ elif message['type'] == 'error':
253
+ print(f"[FRIDA ERROR] {message['description']}")
254
+ if 'stack' in message:
255
+ print(message['stack'])
256
+
257
+
258
+ def main():
259
+ print("=" * 70)
260
+ print("FRIDA HOOKING: Intercepting OneOCR model decryption")
261
+ print("=" * 70)
262
+
263
+ # Create loader script
264
+ loader_path = create_loader_script()
265
+ print(f"Loader script: {loader_path}")
266
+
267
+ # Find Python executable in venv
268
+ venv_python = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\.venv\Scripts\python.exe")
269
+ if not venv_python.exists():
270
+ print("ERROR: Python venv not found")
271
+ sys.exit(1)
272
+
273
+ # Spawn the loader process
274
+ print(f"Spawning: {venv_python} {loader_path}")
275
+ pid = frida.spawn([str(venv_python), str(loader_path)])
276
+ print(f"Process spawned, PID={pid}")
277
+
278
+ session = frida.attach(pid)
279
+ print("Attached to process")
280
+
281
+ script = session.create_script(FRIDA_SCRIPT)
282
+ script.on('message', on_message)
283
+ script.load()
284
+ print("Script loaded, resuming process...")
285
+
286
+ frida.resume(pid)
287
+
288
+ # Wait for the process to finish
289
+ print("Waiting for process to complete...")
290
+ try:
291
+ # Wait up to 60 seconds
292
+ for _ in range(120):
293
+ time.sleep(0.5)
294
+ try:
295
+ # Check if process is still alive
296
+ session.is_detached
297
+ except:
298
+ break
299
+ except KeyboardInterrupt:
300
+ print("\nInterrupted by user")
301
+ except frida.InvalidOperationError:
302
+ print("Process terminated")
303
+
304
+ # Summary
305
+ print()
306
+ print("=" * 70)
307
+ print("RESULTS")
308
+ print("=" * 70)
309
+
310
+ if OUTPUT_DIR.exists():
311
+ files = sorted(OUTPUT_DIR.iterdir())
312
+ if files:
313
+ print(f"Dumped {len(files)} files:")
314
+ for f in files:
315
+ size = f.stat().st_size
316
+ print(f" {f.name}: {size:,} bytes")
317
+ if size >= 4:
318
+ header = open(f, 'rb').read(16)
319
+ magic = struct.unpack('<I', header[:4])[0]
320
+ print(f" magic={magic}, first_16={header.hex()}")
321
+ else:
322
+ print("No files dumped.")
323
+
324
+ print("\nDone!")
325
+
326
+
327
+ if __name__ == '__main__':
328
+ main()
_archive/attempts/frida_loader.py ADDED
@@ -0,0 +1,50 @@
1
+
2
+ import ctypes
3
+ from ctypes import c_int64, c_char_p, POINTER, byref
4
+ import time
5
+ import sys
6
+ import os
7
+
8
+ DLL_DIR = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data"
9
+ MODEL_PATH = os.path.join(DLL_DIR, "oneocr.onemodel")
10
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
11
+
12
+ # Load DLLs
13
+ kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
14
+ kernel32.SetDllDirectoryW(DLL_DIR)
15
+ dll = ctypes.WinDLL(os.path.join(DLL_DIR, "oneocr.dll"))
16
+
17
+ # Setup function types
18
+ dll.CreateOcrInitOptions.argtypes = [POINTER(c_int64)]
19
+ dll.CreateOcrInitOptions.restype = c_int64
20
+ dll.OcrInitOptionsSetUseModelDelayLoad.argtypes = [c_int64, ctypes.c_char]
21
+ dll.OcrInitOptionsSetUseModelDelayLoad.restype = c_int64
22
+ dll.CreateOcrPipeline.argtypes = [c_char_p, c_char_p, c_int64, POINTER(c_int64)]
23
+ dll.CreateOcrPipeline.restype = c_int64
24
+
25
+ # Create init options
26
+ init_options = c_int64()
27
+ ret = dll.CreateOcrInitOptions(byref(init_options))
28
+ print(f"LOADER: CreateOcrInitOptions -> {ret}", flush=True)
29
+ assert ret == 0
30
+
31
+ ret = dll.OcrInitOptionsSetUseModelDelayLoad(init_options, 0)
32
+ print(f"LOADER: SetUseModelDelayLoad -> {ret}", flush=True)
33
+ assert ret == 0
34
+
35
+ # Create pipeline (this triggers decryption!)
36
+ pipeline = c_int64()
37
+ model_buf = ctypes.create_string_buffer(MODEL_PATH.encode())
38
+ key_buf = ctypes.create_string_buffer(KEY)
39
+
40
+ print("LOADER: Creating OCR pipeline (triggers decryption)...", flush=True)
41
+ ret = dll.CreateOcrPipeline(model_buf, key_buf, init_options, byref(pipeline))
42
+ print(f"LOADER: CreateOcrPipeline returned {ret}, pipeline={pipeline.value}", flush=True)
43
+
44
+ if ret != 0:
45
+ print(f"LOADER: ERROR - return code {ret}", flush=True)
46
+ sys.exit(1)
47
+
48
+ print("LOADER: Pipeline created successfully! Waiting...", flush=True)
49
+ time.sleep(5)
50
+ print("LOADER: Done.", flush=True)
_archive/attempts/peek_header.py ADDED
@@ -0,0 +1,92 @@
1
+ import struct
2
+
3
+ filepath = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data\oneocr.onemodel"
4
+
5
+ with open(filepath, "rb") as f:
6
+ data = f.read(23000) # read a bit more than 22636
7
+ f.seek(0, 2)
8
+ filesize = f.tell()
9
+
10
+ print(f"File size: {filesize} bytes ({filesize/1024/1024:.2f} MB)")
11
+ print()
12
+
13
+ # Hex dump first 512 bytes
14
+ print("=== First 512 bytes hex dump ===")
15
+ for i in range(0, 512, 16):
16
+ hex_part = " ".join(f"{b:02x}" for b in data[i:i+16])
17
+ ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data[i:i+16])
18
+ print(f"{i:08x}: {hex_part:<48s} {ascii_part}")
19
+
20
+ print()
21
+ print("=== uint32 LE values at key offsets ===")
22
+ for off in range(0, 64, 4):
23
+ val = struct.unpack_from("<I", data, off)[0]
24
+ print(f" offset {off:4d} (0x{off:04x}): {val:12d} (0x{val:08x})")
25
+
26
+ print()
27
+ print("=== Check around offset 22636 (header size?) ===")
28
+ off = 22636
29
+ for i in range(off - 32, off + 64, 16):
30
+ hex_part = " ".join(f"{b:02x}" for b in data[i:i+16])
31
+ ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in data[i:i+16])
32
+ print(f"{i:08x}: {hex_part:<48s} {ascii_part}")
33
+
34
+ print()
35
+ print("=== Entropy analysis of header vs body ===")
36
+ from collections import Counter
37
+ header = data[:22636]
38
+ body_sample = data[22636:22636+4096]
39
+ h_counter = Counter(header)
40
+ b_counter = Counter(body_sample)
41
+ print(f" Header unique bytes: {len(h_counter)}/256")
42
+ print(f" Body sample unique bytes: {len(b_counter)}/256")
43
+
44
+ # Check for null bytes in header
45
+ null_count = header.count(0)
46
+ print(f" Header null bytes: {null_count}/{len(header)} ({100*null_count/len(header):.1f}%)")
47
+
48
+ # Look for patterns in header
49
+ print()
50
+ print("=== Looking for potential sub-structures in header ===")
51
+ # Check if there are recognizable strings
52
+ import re
53
+ strings = re.findall(b'[\x20-\x7e]{4,}', header)
54
+ if strings:
55
+ print(" ASCII strings found in header:")
56
+ for s in strings[:30]:
57
+ print(f" {s.decode('ascii', errors='replace')}")
58
+ else:
59
+ print(" No ASCII strings >= 4 chars found in header")
60
+
61
+ # Check for potential magic numbers
62
+ print()
63
+ print("=== Magic number checks at offset 0 ===")
64
+ print(f" Bytes 0-3: {data[0:4].hex()}")
65
+ print(f" Bytes 0-7: {data[0:8].hex()}")
66
+ print(f" As string: {data[0:8]}")
67
+
68
+ # Look for repeating 4-byte patterns
69
+ print()
70
+ print("=== Byte values in first 64 bytes ===")
71
+ for i in range(64):
72
+ if i % 16 == 0:
73
+ print(f" {i:3d}: ", end="")
74
+ print(f"{data[i]:3d}", end=" ")
75
+ if i % 16 == 15:
76
+ print()
77
+
78
+ # Check if header has structure - look for uint32 values that could be offsets/sizes
79
+ print()
80
+ print("=== Potential offset/size table at start ===")
81
+ for i in range(0, min(256, len(header)), 4):
82
+ val = struct.unpack_from("<I", data, i)[0]
83
+ if 0 < val < filesize:
84
+ print(f" offset {i}: uint32={val} (could be offset/size, {val/1024:.1f}KB)")
85
+
86
+ # Check byte patterns for IV detection
87
+ print()
88
+ print("=== 16-byte blocks that could be IV ===")
89
+ for start in [4, 8, 12, 16, 20]:
90
+ block = data[start:start+16]
91
+ unique = len(set(block))
92
+ print(f" offset {start:3d}: {block.hex()} (unique bytes: {unique}/16)")
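peek_header.py approximates randomness with "unique bytes" counts. A standalone Shannon-entropy helper (a sketch, not part of the original script) gives a sharper signal: AES-encrypted payloads measure near 8 bits/byte, while a sparse structured header sits far lower.

```python
import math
from collections import Counter

def shannon_entropy(buf: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 for uniform)."""
    if not buf:
        return 0.0
    n = len(buf)
    # log2(n/c) keeps every term non-negative, avoiding a -0.0 result.
    return sum((c / n) * math.log2(n / c) for c in Counter(buf).values())

# Constant data carries no information; a uniform byte spread maxes out at 8 bits/byte.
low = shannon_entropy(b"\x00" * 4096)
high = shannon_entropy(bytes(range(256)) * 16)
```

Running this over `data[:22636]` versus `data[22636:]` would quantify the header/body split the script probes by eye.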
_archive/attempts/static_decrypt.py ADDED
@@ -0,0 +1,289 @@
1
+ """
2
+ Static decryptor for OneOCR .onemodel files using BCrypt CNG API.
3
+ Finds chunk boundaries by re-encrypting known plaintext patterns.
4
+ Works on Windows only (BCrypt CNG). For Linux, use the hook-based approach.
5
+
6
+ Usage: python static_decrypt.py [model_path] [-o output_dir]
7
+ """
8
+ import ctypes
9
+ import ctypes.wintypes as wt
10
+ from ctypes import c_void_p, c_ulong, POINTER, byref
11
+ import struct
12
+ import sys
13
+ import os
14
+ from pathlib import Path
15
+
16
+ # ═══════════════════════════════════════════════════════════════
17
+ # CRYPTO PARAMETERS (discovered via IAT hook interception)
18
+ # ═══════════════════════════════════════════════════════════════
19
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
20
+ IV = b"Copyright @ OneO"
21
+ CONTAINER_HEADER = bytes.fromhex("4a1a082b25000000")
22
+ ONNX_VALID_FIELDS = {1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 20}
23
+
24
+ # BCrypt constants
25
+ BCRYPT_AES = "AES\0".encode('utf-16-le')
26
+ BCRYPT_CHAINING_MODE = "ChainingMode\0".encode('utf-16-le')
27
+ BCRYPT_CHAIN_MODE_CFB = "ChainingModeCFB\0".encode('utf-16-le')
28
+
29
+ bcrypt = ctypes.windll.bcrypt
30
+
31
+
32
+ class BCRYPT_KEY_DATA_BLOB_HEADER(ctypes.Structure):
33
+ _fields_ = [
34
+ ("dwMagic", c_ulong),
35
+ ("dwVersion", c_ulong),
36
+ ("cbKeyData", c_ulong),
37
+ ]
38
+
39
+
40
+ def setup_bcrypt():
41
+ hAlg = c_void_p()
42
+ assert bcrypt.BCryptOpenAlgorithmProvider(byref(hAlg), BCRYPT_AES, None, 0) == 0
43
+ assert bcrypt.BCryptSetProperty(hAlg, BCRYPT_CHAINING_MODE,
44
+ BCRYPT_CHAIN_MODE_CFB, len(BCRYPT_CHAIN_MODE_CFB), 0) == 0
45
+ header = BCRYPT_KEY_DATA_BLOB_HEADER(dwMagic=0x4d42444b, dwVersion=1, cbKeyData=len(KEY))
46
+ blob = bytes(header) + KEY
47
+ hKey = c_void_p()
48
+ assert bcrypt.BCryptGenerateSymmetricKey(hAlg, byref(hKey), None, 0, blob, len(blob), 0) == 0
49
+ return hAlg, hKey
50
+
51
+
52
+ def bcrypt_op(hKey, data, encrypt=False):
53
+ """Encrypt or decrypt data using BCrypt AES-CFB with fresh IV."""
54
+ iv = bytearray(IV)
55
+ func = bcrypt.BCryptEncrypt if encrypt else bcrypt.BCryptDecrypt
56
+ result_size = c_ulong(0)
57
+ func(hKey, data, len(data), None, None, 0, None, 0, byref(result_size), 0)
58
+ output = (ctypes.c_ubyte * result_size.value)()
59
+ actual = c_ulong(0)
60
+ status = func(hKey, data, len(data), None,
61
+ (ctypes.c_ubyte * len(iv))(*iv), len(iv),
62
+ output, result_size.value, byref(actual), 0)
63
+ assert status == 0, f"BCrypt op failed: {status:#x}"
64
+ return bytes(output[:actual.value])
65
+
66
+
67
+ def read_varint(data, pos):
68
+ val = 0; shift = 0
69
+ while pos < len(data):
70
+ b = data[pos]; pos += 1
71
+ val |= (b & 0x7f) << shift
72
+ if not (b & 0x80): break
73
+ shift += 7
74
+ return val, pos
75
+
76
+
77
+ def measure_onnx(data):
78
+ pos = 0; last = 0
79
+ while pos < len(data):
80
+ start = pos
81
+ tag, pos = read_varint(data, pos)
82
+ if pos > len(data): break
83
+ fn = tag >> 3; wtype = tag & 7  # avoid shadowing ctypes.wintypes alias 'wt'
84
+ if fn not in ONNX_VALID_FIELDS: return start
85
+ if wtype == 0: _, pos = read_varint(data, pos)
86
+ elif wtype == 1: pos += 8
87
+ elif wtype == 2: ln, pos = read_varint(data, pos); pos += ln
88
+ elif wtype == 5: pos += 4
89
+ else: return start
90
+ if pos > len(data): return start
91
+ last = pos
92
+ return last
93
+
94
+
95
+ def main():
96
+ import argparse
97
+ parser = argparse.ArgumentParser(description="OneOCR .onemodel decryptor (Windows BCrypt)")
98
+ parser.add_argument("model_path", nargs="?", default="ocr_data/oneocr.onemodel")
99
+ parser.add_argument("-o", "--output", default="onnx_models_static")
100
+ args = parser.parse_args()
101
+
102
+ model_path = Path(args.model_path)
103
+ output_dir = Path(args.output)
104
+ output_dir.mkdir(exist_ok=True, parents=True)
105
+ for old in output_dir.glob("*"): old.unlink()
106
+
107
+ data = model_path.read_bytes()
108
+ print(f"{'='*70}")
109
+ print(f"OneOCR Static Decryptor (BCrypt CNG)")
110
+ print(f"{'='*70}")
111
+ print(f"File: {model_path} ({len(data):,} bytes)")
112
+
113
+ hAlg, hKey = setup_bcrypt()
114
+ print(f"AES-256-CFB initialized")
115
+
116
+ # Step 1: Decrypt DX index (offset 24, size 22624)
117
+ dx_offset = 24
118
+ dx_size = 22624
119
+ dx_dec = bcrypt_op(hKey, data[dx_offset:dx_offset + dx_size])
120
+ print(f"\nDX index: starts with {dx_dec[:2].hex()}")
121
+ assert dx_dec[:2] == b'DX', f"DX header not found! Got: {dx_dec[:8].hex()}"
122
+ (output_dir / "dx_index.bin").write_bytes(dx_dec)
123
+
124
+ # Step 2: Parse DX to find embedded chunks
125
+ # DX contains sub-chunks that need independent decryption
126
+ # We'll also find main payload chunks by scanning the file
127
+
128
+ # The DX contains a list of uint64 values that might be chunk sizes/offsets
129
+ dx_values = []
130
+ for i in range(0, len(dx_dec) - 7, 8):
131
+ v = struct.unpack_from('<Q', dx_dec, i)[0]
132
+ if v > 0 and v < len(data):
133
+ dx_values.append((i, v))
134
+
135
+ # Step 3: Try to decrypt every possible chunk in the payload area
136
+ # Payload starts after DX (offset 22648) + 36 bytes gap = 22684
137
+ payload_start = dx_offset + dx_size + 36
138
+
139
+ print(f"\n--- Scanning payload for encrypted chunks ---")
140
+ print(f"Payload starts at offset {payload_start}")
141
+
142
+ # Strategy: try decrypting at current offset, check if result starts
143
+ # with container magic. If yes, extract chunk, determine its size
144
+ # from the DX index or by scanning forward.
145
+
146
+ # Known chunk sizes from the DX index analysis:
147
+ # We know the DX has entries like 11943, 11903, 11927 etc.
148
+ # And the main payload has large ONNX models.
149
+
150
+ # Let's try a different approach: scan the encrypted file for positions
151
+ # where decryption produces valid container magic
152
+
153
+ print(f"\nSearching for chunk boundaries by trial decryption...")
154
+
155
+ # The container magic `4a1a082b25000000` after decryption = specific encrypted pattern
156
+ # Compute what the container magic encrypts TO:
157
+ magic_encrypted = bcrypt_op(hKey, CONTAINER_HEADER, encrypt=True)
158
+ print(f"Container magic encrypted: {magic_encrypted.hex()}")
159
+
160
+ # Search for this pattern in the payload area
161
+ chunk_starts = []
162
+ search_start = payload_start
163
+
164
+ # Also check DX sub-chunks
165
+ # First, find container magic encryptions within the DX encrypted data
166
+
167
+ while search_start < len(data) - 16:
168
+ idx = data.find(magic_encrypted[:8], search_start)
169
+ if idx < 0:
170
+ break
171
+ # Verify by decrypting 16 bytes
172
+ test = bcrypt_op(hKey, data[idx:idx+16])
173
+ if test[:8] == CONTAINER_HEADER:
174
+ chunk_starts.append(idx)
175
+ search_start = idx + 1
176
+ else:
177
+ search_start = idx + 1
178
+
179
+ print(f"Found {len(chunk_starts)} potential chunk starts")
180
+
181
+ if not chunk_starts:
182
+ # Fallback: just try sequential decryption
183
+ print("No chunk starts found via magic pattern. Trying sequential...")
184
+ # Try decrypting from payload_start with large block sizes
185
+ remaining = len(data) - payload_start
186
+ dec = bcrypt_op(hKey, data[payload_start:payload_start + remaining])
187
+
188
+ # Find container magic in decrypted data
189
+ pos = 0
190
+ chunks_data = []
191
+ while True:
192
+ idx = dec.find(CONTAINER_HEADER, pos)
193
+ if idx < 0:
194
+ # Handle remaining data
195
+ if pos < len(dec):
196
+ chunks_data.append(dec[pos:])
197
+ break
198
+ if idx > pos:
199
+ chunks_data.append(dec[pos:idx])
200
+ pos = idx # Will be split on next iteration
201
+ # Find next occurrence
202
+ next_idx = dec.find(CONTAINER_HEADER, pos + 8)
203
+ if next_idx < 0:
204
+ chunks_data.append(dec[pos:])
205
+ break
206
+ chunks_data.append(dec[pos:next_idx])
207
+ pos = next_idx
208
+
209
+ print(f"Found {len(chunks_data)} chunks in sequential decryption")
210
+ else:
211
+ # Decrypt each chunk
212
+ chunk_starts.sort()
213
+ chunks_data = []
214
+ for i, start in enumerate(chunk_starts):
215
+ end = chunk_starts[i + 1] if i + 1 < len(chunk_starts) else len(data)
216
+ encrypted = data[start:end]
217
+ try:
218
+ dec = bcrypt_op(hKey, encrypted)
219
+ chunks_data.append(dec)
220
+ except:
221
+ pass
222
+
223
+ # Extract models from chunks
224
+ print(f"\n--- Extracting ONNX models ---")
225
+ models = []
226
+ data_files = []
227
+
228
+ for chunk in chunks_data:
229
+ if chunk[:8] == CONTAINER_HEADER:
230
+ payload = chunk[8:]
231
+ else:
232
+ payload = chunk
233
+
234
+ if len(payload) >= 2 and payload[0] == 0x08 and 1 <= payload[1] <= 12:
235
+ valid_len = measure_onnx(payload)
236
+ onnx_data = payload[:valid_len]
237
+ if valid_len < 100: # Too small to be a real model
238
+ continue
239
+
240
+ producer = "unknown"
241
+ if b"PyTorch" in payload[:100]: producer = "pytorch"
242
+ elif b"onnx.quantize" in payload[:100]: producer = "onnx_quantize"
243
+ elif b"pytorch" in payload[:100]: producer = "pytorch_small"
244
+
245
+ ir = payload[1]
246
+ idx = len(models)
247
+ fname = f"model_{idx:02d}_ir{ir}_{producer}_{valid_len//1024}KB.onnx"
248
+ (output_dir / fname).write_bytes(onnx_data)
249
+ models.append({'name': fname, 'size': valid_len})
250
+ print(f" ONNX: {fname} ({valid_len:,} bytes)")
251
+ elif len(payload) > 100:
252
+ preview = payload[:30].decode('utf-8', errors='replace')
253
+ idx = len(data_files)
254
+ fname = f"data_{idx:02d}_{len(payload)}B.bin"
255
+ (output_dir / fname).write_bytes(payload)
256
+ data_files.append({'name': fname, 'size': len(payload)})
257
+ print(f" Data: {fname} ({len(payload):,} bytes) {preview[:30]!r}")
258
+
259
+ # Summary
260
+ print(f"\n{'='*70}")
261
+ print(f"EXTRACTION COMPLETE")
262
+ print(f"{'='*70}")
263
+ print(f"ONNX models: {len(models)}")
264
+ print(f"Data files: {len(data_files)}")
265
+ if models:
266
+ total = sum(m['size'] for m in models)
267
+ print(f"Total ONNX: {total:,} bytes ({total/1024/1024:.1f} MB)")
268
+
269
+ # Verify
270
+ try:
271
+ import onnx
273
+ ok = 0
274
+ for m in models:
275
+ try:
276
+ onnx.load(str(output_dir / m['name']))
277
+ ok += 1
278
+ except:
279
+ pass
280
+ print(f"Verified with onnx.load: {ok}/{len(models)}")
281
+ except ImportError:
282
+ pass
283
+
284
+ bcrypt.BCryptDestroyKey(hKey)
285
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
286
+
287
+
288
+ if __name__ == "__main__":
289
+ main()
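The `read_varint`/`measure_onnx` pair above walks raw protobuf to find where a valid ONNX message ends. The varint decoder can be sanity-checked in isolation on a hand-encoded message (the byte layout is standard protobuf base-128 encoding; the field interpretation is just for illustration):

```python
def read_varint(data: bytes, pos: int):
    """Decode a protobuf base-128 varint starting at pos; returns (value, new_pos)."""
    val = 0
    shift = 0
    while pos < len(data):
        b = data[pos]
        pos += 1
        val |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return val, pos

# Hand-encoded message: tag byte 0x08 = (field 1 << 3) | wire type 0,
# followed by the value 300 encoded as 0xAC 0x02 (0x2C + (0x02 << 7)).
msg = bytes([0x08, 0xAC, 0x02])
tag, pos = read_varint(msg, 0)
val, pos = read_varint(msg, pos)
assert (tag >> 3, tag & 7) == (1, 0)
assert (val, pos) == (300, 3)
```

The same walk, repeated tag-by-tag with the wire-type dispatch in `measure_onnx`, yields the last offset at which the buffer still parses as valid top-level ONNX fields.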
_archive/attempts/verify_bcrypt.py ADDED
@@ -0,0 +1,181 @@
1
+ """Verify BCrypt CNG setup - test raw key + different CFB segment sizes."""
2
+ import ctypes
3
+ from ctypes import c_void_p, c_ulong, byref
4
+ from pathlib import Path
5
+ import struct
6
+
7
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
8
+ IV = b"Copyright @ OneO"
9
+
10
+ bcrypt = ctypes.windll.bcrypt
11
+
12
+ # Known plaintext (DX header from hook dump)
13
+ dx_plain = bytes.fromhex("44580000000000005c58000000000000")
14
+ # Known ciphertext (from file at offset 24, first 16 bytes)
15
+ file_ct = bytes.fromhex("2e0c10c7c967f66b6d03821271115ad6")
16
+
17
+ # Full file data
18
+ file_data = Path("ocr_data/oneocr.onemodel").read_bytes()
19
+ hook_dx = Path("frida_dump/decrypt_1_in22624_out22624.bin").read_bytes()
20
+
21
+ print("=" * 70)
22
+ print("BCrypt CNG CFB Segment Size Test")
23
+ print("=" * 70)
24
+ print(f"KEY: {KEY}")
25
+ print(f"IV: {IV}")
26
+ print(f"Expected PT: {dx_plain.hex()}")
27
+ print(f"Expected CT: {file_ct.hex()}")
28
+ print()
29
+
30
+
31
+ def test_cfb(msg_block_length, use_blob=False):
32
+ """Test BCrypt AES-CFB with given MessageBlockLength."""
33
+ tag = "MBL={}".format("default" if msg_block_length is None else msg_block_length)
34
+ if use_blob:
35
+ tag += "+blob"
36
+
37
+ hAlg = c_void_p()
38
+ status = bcrypt.BCryptOpenAlgorithmProvider(
39
+ byref(hAlg), "AES\0".encode("utf-16-le"), None, 0
40
+ )
41
+ if status != 0:
42
+ print(" [{}] OpenAlgorithm failed: {:#010x}".format(tag, status))
43
+ return None
44
+
45
+ mode = "ChainingModeCFB\0".encode("utf-16-le")
46
+ status = bcrypt.BCryptSetProperty(
47
+ hAlg, "ChainingMode\0".encode("utf-16-le"), mode, len(mode), 0
48
+ )
49
+ if status != 0:
50
+ print(" [{}] SetProperty ChainingMode failed: {:#010x}".format(tag, status))
51
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
52
+ return None
53
+
54
+ if msg_block_length is not None:
55
+ mbl = c_ulong(msg_block_length)
56
+ status = bcrypt.BCryptSetProperty(
57
+ hAlg, "MessageBlockLength\0".encode("utf-16-le"),
58
+ byref(mbl), 4, 0
59
+ )
60
+ if status != 0:
61
+ print(" [{}] SetProperty MBL={} failed: {:#010x}".format(tag, msg_block_length, status))
62
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
63
+ return None
64
+
65
+ hKey = c_void_p()
66
+ if use_blob:
67
+ blob = struct.pack('<III', 0x4d42444b, 1, len(KEY)) + KEY
68
+ status = bcrypt.BCryptGenerateSymmetricKey(
69
+ hAlg, byref(hKey), None, 0, blob, len(blob), 0
70
+ )
71
+ else:
72
+ status = bcrypt.BCryptGenerateSymmetricKey(
73
+ hAlg, byref(hKey), None, 0, KEY, len(KEY), 0
74
+ )
75
+
76
+ if status != 0:
77
+ print(" [{}] GenerateSymmetricKey failed: {:#010x}".format(tag, status))
78
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
79
+ return None
80
+
81
+ # Encrypt test
82
+ iv_enc = bytearray(IV)
83
+ result_size = c_ulong(0)
84
+ bcrypt.BCryptEncrypt(hKey, dx_plain, len(dx_plain), None, None, 0,
85
+ None, 0, byref(result_size), 0)
86
+ output = (ctypes.c_ubyte * result_size.value)()
87
+ actual = c_ulong(0)
88
+ iv_buf = (ctypes.c_ubyte * 16)(*iv_enc)
89
+ bcrypt.BCryptEncrypt(hKey, dx_plain, len(dx_plain), None,
90
+ iv_buf, 16, output, result_size.value, byref(actual), 0)
91
+ our_ct = bytes(output[:actual.value])
92
+ ct_match = our_ct[:16] == file_ct
93
+
94
+ # Decrypt test (fresh key)
95
+ hKey2 = c_void_p()
96
+ if use_blob:
97
+ blob = struct.pack('<III', 0x4d42444b, 1, len(KEY)) + KEY
98
+ bcrypt.BCryptGenerateSymmetricKey(hAlg, byref(hKey2), None, 0, blob, len(blob), 0)
99
+ else:
100
+ bcrypt.BCryptGenerateSymmetricKey(hAlg, byref(hKey2), None, 0, KEY, len(KEY), 0)
101
+
102
+ iv_dec = bytearray(IV)
103
+ encrypted_chunk = file_data[24:24 + 32]
104
+ result_size = c_ulong(0)
105
+ bcrypt.BCryptDecrypt(hKey2, encrypted_chunk, len(encrypted_chunk), None, None, 0,
106
+ None, 0, byref(result_size), 0)
107
+ output2 = (ctypes.c_ubyte * result_size.value)()
108
+ iv_buf2 = (ctypes.c_ubyte * 16)(*iv_dec)
109
+ status = bcrypt.BCryptDecrypt(hKey2, encrypted_chunk, len(encrypted_chunk), None,
110
+ iv_buf2, 16, output2, result_size.value, byref(actual), 0)
111
+ our_pt = bytes(output2[:actual.value])
112
+ pt_match = our_pt[:2] == b"DX"
113
+
114
+ mark = "*** MATCH! ***" if ct_match else ""
115
+ print(" [{}] Enc->CT: {} {} {}".format(tag, our_ct[:16].hex(), "OK" if ct_match else "FAIL", mark))
116
+ print(" [{}] Dec->PT: {} {}".format(tag, our_pt[:16].hex(), "OK DX" if pt_match else "FAIL"))
117
+
118
+ bcrypt.BCryptDestroyKey(hKey)
119
+ bcrypt.BCryptDestroyKey(hKey2)
120
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
121
+ return ct_match
122
+
123
+
124
+ print("--- Raw key (correct for BCryptGenerateSymmetricKey) ---")
125
+ test_cfb(None)
126
+ test_cfb(1)
127
+ test_cfb(16)
128
+ print()
129
+
130
+ print("--- Blob key (has 12-byte header prepended - wrong) ---")
131
+ test_cfb(None, use_blob=True)
132
+ test_cfb(1, use_blob=True)
133
+ test_cfb(16, use_blob=True)
134
+ print()
135
+
136
+ print("--- BCryptImportKey with BCRYPT_KEY_DATA_BLOB ---")
137
+ for mbl in [None, 1, 16]:
138
+ tag = "Import+MBL={}".format("default" if mbl is None else mbl)
139
+ hAlg = c_void_p()
140
+ bcrypt.BCryptOpenAlgorithmProvider(byref(hAlg), "AES\0".encode("utf-16-le"), None, 0)
141
+ mode = "ChainingModeCFB\0".encode("utf-16-le")
142
+ bcrypt.BCryptSetProperty(hAlg, "ChainingMode\0".encode("utf-16-le"), mode, len(mode), 0)
143
+
144
+ if mbl is not None:
145
+ mbl_val = c_ulong(mbl)
146
+ bcrypt.BCryptSetProperty(hAlg, "MessageBlockLength\0".encode("utf-16-le"),
147
+ byref(mbl_val), 4, 0)
148
+
149
+ blob = struct.pack('<III', 0x4d42444b, 1, len(KEY)) + KEY
150
+ hKey = c_void_p()
151
+ status = bcrypt.BCryptImportKey(
152
+ hAlg, None, "KeyDataBlob\0".encode("utf-16-le"),
153
+ byref(hKey), None, 0, blob, len(blob), 0
154
+ )
155
+ if status != 0:
156
+ print(" [{}] ImportKey failed: {:#010x}".format(tag, status))
157
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
158
+ continue
159
+
160
+ iv_dec = bytearray(IV)
161
+ encrypted_chunk = file_data[24:24 + 32]
162
+ result_size = c_ulong(0)
163
+ bcrypt.BCryptDecrypt(hKey, encrypted_chunk, 32, None, None, 0,
164
+ None, 0, byref(result_size), 0)
165
+ output = (ctypes.c_ubyte * result_size.value)()
166
+ actual = c_ulong(0)
167
+ iv_buf = (ctypes.c_ubyte * 16)(*iv_dec)
168
+ status = bcrypt.BCryptDecrypt(hKey, encrypted_chunk, 32, None,
169
+ iv_buf, 16, output, result_size.value, byref(actual), 0)
170
+ dec = bytes(output[:actual.value])
171
+ match = dec[:2] == b"DX"
172
+ mark = "*** MATCH! ***" if match else ""
173
+ print(" [{}] Decrypt: {} {} {}".format(tag, dec[:16].hex(), "OK DX" if match else "FAIL", mark))
174
+
175
+ bcrypt.BCryptDestroyKey(hKey)
176
+ bcrypt.BCryptCloseAlgorithmProvider(hAlg, 0)
177
+
178
+ print()
179
+ print("=" * 70)
180
+ print("If no method matched, need to hook BCryptSetProperty in the DLL")
181
+ print("to discover ALL properties set before BCryptDecrypt is called.")
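verify_bcrypt.py packs the BCRYPT_KEY_DATA_BLOB inline with `struct.pack('<III', 0x4d42444b, 1, len(KEY))` in three places. The layout it relies on, as a small self-contained sketch:

```python
import struct

# BCRYPT_KEY_DATA_BLOB magic: the bytes b"KDBM" read as a little-endian uint32.
KEY_DATA_BLOB_MAGIC = 0x4D42444B

def build_key_data_blob(key: bytes) -> bytes:
    """BCRYPT_KEY_DATA_BLOB_HEADER (dwMagic, dwVersion=1, cbKeyData) + raw key bytes."""
    return struct.pack("<III", KEY_DATA_BLOB_MAGIC, 1, len(key)) + key

def parse_key_data_blob(blob: bytes) -> bytes:
    magic, version, cb = struct.unpack_from("<III", blob, 0)
    assert magic == KEY_DATA_BLOB_MAGIC and version == 1
    return blob[12:12 + cb]

key = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'  # 32 bytes -> AES-256
blob = build_key_data_blob(key)
assert struct.pack("<I", KEY_DATA_BLOB_MAGIC) == b"KDBM"
assert parse_key_data_blob(blob) == key and len(blob) == 12 + 32
```

This blob format is what `BCryptImportKey` with the `KeyDataBlob` type expects; `BCryptGenerateSymmetricKey` takes the raw key bytes instead, which is why the script tests both paths.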
_archive/attempts/verify_key_derivation.py ADDED
@@ -0,0 +1,98 @@
1
+ """
2
+ Verify key derivation and analyze DX index structure.
3
+ Proven scheme:
4
+ DX key = SHA256(master_key_32 + file[8:24])
5
+ AES-256-CFB128, IV = "Copyright @ OneO"
6
+ """
7
+ import hashlib
8
+ import struct
9
+ from pathlib import Path
10
+ from Crypto.Cipher import AES
11
+
12
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
13
+ IV = b"Copyright @ OneO"
14
+
15
+ file_data = Path("ocr_data/oneocr.onemodel").read_bytes()
16
+
17
+ # Step 1: Derive DX key
18
+ header_hash = file_data[8:24]
19
+ derived_key = hashlib.sha256(KEY + header_hash).digest()
20
+ print(f"DX derived key: {derived_key.hex()}")
21
+
22
+ # Step 2: Decrypt DX index
23
+ encrypted_dx = file_data[24:24 + 22624]
24
+ cipher = AES.new(derived_key, AES.MODE_CFB, iv=IV, segment_size=128)
25
+ dx = cipher.decrypt(encrypted_dx)
26
+
27
+ assert dx[:2] == b"DX", "DX header mismatch!"
28
+ valid_size = struct.unpack('<Q', dx[8:16])[0]
29
+ print(f"DX valid size: {valid_size}, starts with DX: OK")
30
+
31
+ # Step 3: Hex dump
32
+ print(f"\nDX hex dump (first 512 bytes):")
33
+ for i in range(0, min(512, len(dx)), 16):
34
+ chunk = dx[i:i+16]
35
+ hex_str = ' '.join(f'{b:02x}' for b in chunk)
36
+ ascii_str = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
37
+ print(f" {i:04x}: {hex_str:<48s} {ascii_str}")
38
+
39
+ # Step 4: Search for known hash inputs from hook data
40
+ print(f"\n--- Searching for hash input patterns in DX ---")
41
+ patterns = {
42
+ "Chunk1(config)": "7f2e000000000000972e0000000000003fe51f12a6d7432577c9b6b367b1ff4d",
43
+ "Chunk2(encrypt)": "78000000000000009000000000000000",
44
+ "Chunk3(bigONNX)": "7f4bb00000000000974bb00000000000165e6ebce48ad4c5b45554019f6cefe8",
45
+ "Chunk4(ONNX)": "5c000000000000007400000000000000",
46
+ "Chunk5(ONNX2)": "63000000000000007b00000000000000",
47
+ "Chunk6(ONNX3)": "69bf34000000000081bf340000000000c7ed80dc84ea4fc4a891feae316ccc8e",
48
+ }
49
+
50
+ for name, hex_pat in patterns.items():
51
+ target = bytes.fromhex(hex_pat)
52
+ pos = dx.find(target)
53
+ if pos >= 0:
54
+ print(f" {name}: found at DX offset {pos} ({pos:#x})")
55
+ else:
56
+ print(f" {name}: NOT found in DX (len={len(target)})")
57
+
58
+ # Step 5: Analyze DX structure around container header magic
59
+ magic = bytes.fromhex("4a1a082b25000000")
60
+ print(f"\nContainer magic 4a1a082b25000000 locations:")
61
+ pos = 0
62
+ while True:
63
+ pos = dx.find(magic, pos)
64
+ if pos < 0:
65
+ break
66
+ # Read surrounding context
67
+ ctx = dx[pos:pos+40]
68
+ print(f" offset {pos} ({pos:#x}): {ctx.hex()}")
69
+ pos += 1
70
+
71
+ # Step 6: Parse DX as record-based structure
72
+ # Looking at the structure:
73
+ # Offset 0-7: "DX\x00\x00\x00\x00\x00\x00"
74
+ # Offset 8-15: valid_size (uint64) = 22620
75
+ # Offset 16-23: container magic = 4a1a082b25000000
76
+ # Offset 24-31: uint64 = 0x2ea7 = 11943
77
+ # Let's see what's after that
78
+
79
+ print(f"\n--- DX parsed fields ---")
80
+ off = 0
81
+ print(f" [{off}] Magic: {dx[off:off+8]}")
82
+ off = 8
83
+ print(f" [{off}] ValidSize: {struct.unpack('<Q', dx[off:off+8])[0]}")
84
+ off = 16
85
+ print(f" [{off}] ContainerMagic: {dx[off:off+8].hex()}")
86
+ off = 24
87
+ print(f" [{off}] Value: {struct.unpack('<Q', dx[off:off+8])[0]}")
88
+ off = 32
89
+
90
+ # Look for uint64 pairs that were hash inputs
91
+ # The 16-byte patterns are two uint64 LE values
92
+ # The 32-byte patterns are two uint64 LE + 16-byte hash
93
+ # Let me scan for all pairs of uint64 in DX and see structure
94
+
95
+ # Save DX for manual analysis
96
+ Path("temp").mkdir(exist_ok=True)
97
+ Path("temp/dx_index_decrypted.bin").write_bytes(dx)
98
+ print(f"\nSaved DX to temp/dx_index_decrypted.bin ({len(dx)} bytes)")
_archive/attempts/verify_models.py ADDED
@@ -0,0 +1,228 @@
+ """Verify extracted .bin files as valid ONNX models."""
2
+ import os
3
+ import struct
4
+ from pathlib import Path
5
+
6
+ EXTRACT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\extracted_models")
7
+ VERIFIED_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\verified_models")
8
+ VERIFIED_DIR.mkdir(exist_ok=True)
9
+
10
+ def try_parse_onnx_protobuf(data: bytes) -> dict | None:
11
+ """Try to parse the first few fields of an ONNX ModelProto protobuf."""
12
+ # ONNX ModelProto:
13
+ # field 1 (varint) = ir_version
14
+ # field 2 (len-delimited) = opset_import (repeated)
15
+ # field 3 (len-delimited) = producer_name
16
+ # field 4 (len-delimited) = producer_version
17
+ # field 5 (len-delimited) = domain
18
+ # field 6 (varint) = model_version
19
+ # field 7 (len-delimited) = doc_string
20
+ # field 8 (len-delimited) = graph (GraphProto)
21
+
22
+ if len(data) < 4:
23
+ return None
24
+
25
+ pos = 0
26
+ result = {}
27
+
28
+ try:
29
+ # Field 1: ir_version (varint, field tag = 0x08)
30
+ if data[pos] != 0x08:
31
+ return None
32
+ pos += 1
33
+
34
+ # Read varint
35
+ ir_version = 0
36
+ shift = 0
37
+ while pos < len(data):
38
+ b = data[pos]
39
+ pos += 1
40
+ ir_version |= (b & 0x7F) << shift
41
+ if not (b & 0x80):
42
+ break
43
+ shift += 7
44
+
45
+ if ir_version < 1 or ir_version > 12:
46
+ return None
47
+ result['ir_version'] = ir_version
48
+
49
+ # Next field - check tag
50
+ if pos >= len(data):
51
+ return None
52
+
53
+ tag = data[pos]
54
+ field_num = tag >> 3
55
+ wire_type = tag & 0x07
56
+
57
+ # We expect field 2 (opset_import, len-delimited, tag=0x12) or
58
+ # field 3 (producer_name, len-delimited, tag=0x1a)
59
+ if tag == 0x12: # field 2, length-delimited
60
+ pos += 1
61
+ # Read length varint
62
+ length = 0
63
+ shift = 0
64
+ while pos < len(data):
65
+ b = data[pos]
66
+ pos += 1
67
+ length |= (b & 0x7F) << shift
68
+ if not (b & 0x80):
69
+ break
70
+ shift += 7
71
+
72
+ if length > 0 and length < len(data):
73
+ result['has_opset_or_producer'] = True
74
+ result['next_field_len'] = length
75
+ else:
76
+ return None
77
+ elif tag == 0x1a: # field 3, length-delimited
78
+ pos += 1
79
+ length = 0
80
+ shift = 0
81
+ while pos < len(data):
82
+ b = data[pos]
83
+ pos += 1
84
+ length |= (b & 0x7F) << shift
85
+ if not (b & 0x80):
86
+ break
87
+ shift += 7
88
+
89
+ if length > 0 and length < 1000:
90
+ producer = data[pos:pos+length]
91
+ try:
92
+ result['producer_name'] = producer.decode('utf-8', errors='strict')
93
+ except:
94
+ result['producer_name'] = f"<binary {length}b>"
95
+ result['has_opset_or_producer'] = True
96
+ else:
97
+ return None
98
+
99
+ return result
100
+
101
+ except (IndexError, ValueError):
102
+ return None
103
+
104
+
105
+ def check_onnx_with_lib(filepath: str) -> dict | None:
106
+ """Try loading with onnx library."""
107
+ try:
108
+ import onnx
109
+ model = onnx.load(filepath)
110
+ return {
111
+ 'ir_version': model.ir_version,
112
+ 'producer': model.producer_name,
113
+ 'model_version': model.model_version,
114
+ 'opset': [f"{o.domain or 'ai.onnx'}:{o.version}" for o in model.opset_import],
115
+ 'graph_name': model.graph.name if model.graph else None,
116
+ 'num_nodes': len(model.graph.node) if model.graph else 0,
117
+ 'num_inputs': len(model.graph.input) if model.graph else 0,
118
+ 'num_outputs': len(model.graph.output) if model.graph else 0,
119
+ }
120
+ except Exception as e:
121
+ return None
122
+
123
+
124
+ # Phase 1: Quick protobuf header scan
125
+ print("=" * 70)
126
+ print("PHASE 1: Quick protobuf header scan")
127
+ print("=" * 70)
128
+
129
+ candidates = []
130
+ files = sorted(EXTRACT_DIR.glob("*.bin"), key=lambda f: f.stat().st_size, reverse=True)
131
+ print(f"Total files: {len(files)}")
132
+
133
+ for f in files:
134
+ size = f.stat().st_size
135
+ if size < 1000: # Skip tiny files
136
+ continue
137
+
138
+ with open(f, 'rb') as fh:
139
+ header = fh.read(256)
140
+
141
+ info = try_parse_onnx_protobuf(header)
142
+ if info and info.get('ir_version', 0) >= 3:
143
+ candidates.append((f, size, info))
144
+
145
+ print(f"Candidates with valid ONNX protobuf header: {len(candidates)}")
146
+ print()
147
+
148
+ # Group by ir_version
149
+ from collections import Counter
150
+ ir_counts = Counter(c[2]['ir_version'] for c in candidates)
151
+ print("IR version distribution:")
152
+ for v, cnt in sorted(ir_counts.items()):
153
+ total_size = sum(c[1] for c in candidates if c[2]['ir_version'] == v)
154
+ print(f" ir_version={v}: {cnt} files, total {total_size/1024/1024:.1f} MB")
155
+
156
+ # Phase 2: Try onnx.load on top candidates (by size, unique sizes to avoid duplicates)
157
+ print()
158
+ print("=" * 70)
159
+ print("PHASE 2: Verify with onnx library (top candidates by size)")
160
+ print("=" * 70)
161
+
162
+ # Take unique sizes - many files may be near-duplicates from overlapping memory
163
+ seen_sizes = set()
164
+ unique_candidates = []
165
+ for f, size, info in candidates:
166
+ # Round to nearest 1KB to detect near-duplicates
167
+ size_key = size // 1024
168
+ if size_key not in seen_sizes:
169
+ seen_sizes.add(size_key)
170
+ unique_candidates.append((f, size, info))
171
+
172
+ print(f"Unique-size candidates: {len(unique_candidates)}")
173
+ print()
174
+
175
+ verified = []
176
+ for i, (f, size, info) in enumerate(unique_candidates[:50]): # Check top 50 by size
177
+ result = check_onnx_with_lib(str(f))
178
+ if result:
179
+ verified.append((f, size, result))
180
+ print(f" VALID ONNX: {f.name}")
181
+ print(f" Size: {size/1024:.0f} KB")
182
+ print(f" ir={result['ir_version']} producer='{result['producer']}' "
183
+ f"opset={result['opset']}")
184
+ print(f" graph='{result['graph_name']}' nodes={result['num_nodes']} "
185
+ f"inputs={result['num_inputs']} outputs={result['num_outputs']}")
186
+
187
+ # Copy to verified dir
188
+ import shutil
189
+ dest_name = f"model_{len(verified):02d}_ir{result['ir_version']}_{result['graph_name'] or 'unknown'}_{size//1024}KB.onnx"
190
+ # Clean filename
191
+ dest_name = dest_name.replace('/', '_').replace('\\', '_').replace(':', '_')
192
+ dest = VERIFIED_DIR / dest_name
193
+ shutil.copy2(f, dest)
194
+ print(f" -> Saved as {dest_name}")
195
+ print()
196
+
197
+ if not verified:
198
+ print(" No files passed onnx.load validation in top 50.")
199
+ print()
200
+ # Try even more
201
+ print(" Trying ALL candidates...")
202
+ for i, (f, size, info) in enumerate(unique_candidates):
203
+ if i < 50:
204
+ continue
205
+ result = check_onnx_with_lib(str(f))
206
+ if result:
207
+ verified.append((f, size, result))
208
+ print(f" VALID ONNX: {f.name}")
209
+ print(f" Size: {size/1024:.0f} KB, ir={result['ir_version']}, "
210
+ f"producer='{result['producer']}', nodes={result['num_nodes']}")
211
+
212
+ import shutil
213
+ dest_name = f"model_{len(verified):02d}_ir{result['ir_version']}_{result['graph_name'] or 'unknown'}_{size//1024}KB.onnx"
214
+ dest_name = dest_name.replace('/', '_').replace('\\', '_').replace(':', '_')
215
+ dest = VERIFIED_DIR / dest_name
216
+ shutil.copy2(f, dest)
217
+
218
+ print()
219
+ print("=" * 70)
220
+ print(f"SUMMARY: {len(verified)} verified ONNX models out of {len(candidates)} candidates")
221
+ print("=" * 70)
222
+
223
+ if verified:
224
+ total_size = sum(v[1] for v in verified)
225
+ print(f"Total size: {total_size/1024/1024:.1f} MB")
226
+ for f, size, result in verified:
227
+ print(f" {f.name}: {size/1024:.0f}KB, {result['num_nodes']} nodes, "
228
+ f"graph='{result['graph_name']}'")
_archive/brainstorm.md ADDED
@@ -0,0 +1,355 @@
+# ⚡ Skill: Brainstorm
+
+> **Category:** analysis | **Difficulty:** advanced
+> **Tokens:** ~2500 | **Model:** any (recommended: Claude / GPT-4+)
+> **Version:** 1.0.0 | **Created:** 2026-02-10
+> **Activation commands:** `mały brainstorm` | `duży brainstorm`
+
+---
+
+## When to use
+
+When you need to **think a problem through in depth** - an idea, a decision, a strategy, or an architecture - instead of acting right away. Brainstorming is the deliberative phase before the execution phase.
+
+---
+
+## Modes
+
+| Mode | Command | Output length | Use case |
+|------|---------|---------------|----------|
+| 🟢 Small | `mały brainstorm` | ~500 lines (~2-4 A4 pages) | Quick think-through of a topic, a decision, pros/cons |
+| 🔴 Large | `duży brainstorm` | ~1000-2000 lines (~6-15 A4 pages) | Deep planning, architecture, strategy, multi-dimensional analysis |
+
+---
+
+## Role (System Prompt)
+
+<role>
+You are a **Strategic Brainstorm Architect** - an expert in deliberative thinking, multi-dimensional analysis, and systematic idea evaluation. You combine **Chain-of-Thought** techniques (step-by-step reasoning), **Tree-of-Thought** (branched exploration with backtracking), and **creative divergence** (generating non-obvious solutions).
+
+**Your mission:** Do not answer immediately - **THINK DEEPLY**, explore the solution space, evaluate, eliminate, synthesize. Brainstorming is your arena, and the result is content the user could not generate on their own.
+
+**Core competencies:**
+- Multi-dimensional problem analysis (technical, business, human, temporal)
+- Generating 5-15+ solutions/approaches per problem (divergence)
+- Critical evaluation using scales, matrices, and metrics (convergence)
+- Exploring the repository and project context before you start thinking
+- Identifying hidden risks, dependencies, and second-order effects
+- Synthesis: picking the best option with a clear "why"
+
+**Working principles:**
+- 🔍 **Context first** - BEFORE you start the brainstorm: scan the repository, read the README, understand what the user is building, gather context, and where needed use a survey tool to ask the user
+- 🌐 **Search the web** - if you have search access, USE IT actively. Check trends, best practices, existing solutions, benchmarks
+- 🧠 **Self-prompting** - ask YOURSELF guiding questions while thinking: "What haven't I considered yet?", "What is the hidden catch?", "What would an expert in X say?"
+- 🎨 **Unleash creativity** - also generate unconventional, bold, experimental solutions - even risky ones
+- 📏 **Truth table** - the user's stated SACRED RULES (constraints) are ABSOLUTE - never break them
+- ⭐ **Rate everything** - every solution/idea gets a 1-10 star rating
+- 🔄 **Iterate** - revisit earlier ideas in light of new findings (ToT backtracking)
+</role>
+
+---
+
+## Instructions
+
+<instructions>
+
+### 📋 Brainstorm Structure (Output)
+
+A brainstorm produces **2 .md files**:
+
+**File 1:** `BRAINSTORM_{TEMAT}.md` - the full brainstorm (in `temp/brain_storm/`)
+**File 2:** `BRAINSTORM_{TEMAT}_SUMMARY.md` - summary + task list (in `temp/brain_storm/`)
+
+---
+
+### PHASE 0: Context Gathering (MANDATORY)
+
+Before you write even a single heading:
+
+1. **Scan the repository** - read the README, the folder structure, the key files
+2. **Understand the user's context** - who they are, what they are building, what their goal is (check knowledge/ if it exists)
+3. **Read the files related to the topic** - if the brainstorm concerns code → read the code; if strategy → read the plans
+4. **Search the web** (if available) - check trends, existing solutions, articles, benchmarks
+5. **Identify the user's SACRED RULES** - constraints that are NOT up for debate (constraints/non-negotiables)
+
+> 💡 **Self-prompt:** "Do I have enough context? What am I missing? What should I ask about?"
+
+---
+
+### PHASE 1: Problem Definition and the Truth Table
+
+```markdown
+## 🎯 Problem Definition
+[A clear, precise statement: WHAT exactly we are brainstorming and WHY]
+
+## 📐 Truth Table (Constraints)
+| # | Sacred Rule (Non-Negotiable) | Source | Status |
+|---|------------------------------|--------|--------|
+| 1 | [user rule] | user | 🔒 ABSOLUTE |
+| 2 | [user rule] | user | 🔒 ABSOLUTE |
+| 3 | [context rule] | repo | 🔒 ABSOLUTE |
+
+> ⚠️ Every solution MUST pass the truth-table test. If it breaks even one rule → REJECTED.
+```
+
+---
+
+### PHASE 2: Divergence - Idea Generation (Tree-of-Thought)
+
+Generate **many** approaches/solutions. Minimum:
+- 🟢 Small brainstorm: **5-8 ideas**
+- 🔴 Large brainstorm: **10-20+ ideas**
+
+For each idea:
+
+```markdown
+### 💡 Idea X: [Name]
+**Description:** [2-5 sentences: what it is]
+**Mechanism:** [How it works / how to implement it]
+**Strengths:** [What is brilliant about it]
+**Weaknesses:** [What might not work]
+**Risk:** [What could go wrong]
+**Rating:** ⭐⭐⭐⭐⭐⭐⭐⭐☆☆ (8/10)
+**Truth-table test:** ✅ Passed / ❌ Violates rule #X
+```
+
+> 💡 **Self-prompts while generating:**
+> - "What solution would someone from a completely different industry propose?"
+> - "What if I turn the problem upside down?"
+> - "Which approach is the riskiest but also the most promising?"
+> - "What would I NOT want to do here - and why? Am I right to rule it out?"
+
+**Idea categories to consider:**
+- 🛡️ **Safe** - proven, low-risk solutions
+- 🚀 **Ambitious** - demanding, but with big potential
+- 🎲 **Experimental** - wildcards, innovative, might not work
+- 🤝 **Hybrid** - a combination of several approaches
+
+---
+
+### PHASE 3: Convergence - Evaluation and Ranking (Chain-of-Thought)
+
+#### 3.1 Comparison Matrix
+
+```markdown
+## 📊 Comparison Matrix
+
+| Criterion | Weight | Idea 1 | Idea 2 | Idea 3 | ... |
+|-----------|--------|--------|--------|--------|-----|
+| Feasibility | 25% | ⭐⭐⭐⭐⭐⭐⭐⭐☆☆ | ⭐⭐⭐⭐⭐⭐☆☆☆☆ | ... | ... |
+| ROI / Value | 25% | ⭐⭐⭐⭐⭐⭐⭐☆☆☆ | ⭐⭐⭐⭐⭐⭐⭐⭐⭐☆ | ... | ... |
+| Risk (lower=better) | 20% | ⭐⭐⭐⭐⭐⭐⭐⭐☆☆ | ⭐⭐⭐⭐☆☆☆☆☆☆ | ... | ... |
+| Time to deliver | 15% | ⭐⭐⭐⭐⭐⭐⭐☆☆☆ | ⭐⭐⭐⭐⭐⭐⭐⭐☆☆ | ... | ... |
+| Innovation | 15% | ⭐⭐⭐⭐⭐☆☆☆☆☆ | ⭐⭐⭐⭐⭐⭐⭐⭐⭐☆ | ... | ... |
+| **WEIGHTED SUM** | 100% | **7.3** | **7.1** | ... | ... |
+```
+
+#### 3.2 Decision Strategies
+
+Apply **at least 3** evaluation strategies to the idea set:
+
+| Strategy | Description | When it works well |
+|----------|-------------|--------------------|
+| **Negative elimination** | Reject everything that breaks constraints → see what remains | When you have many options to filter |
+| **Pareto 80/20** | Which idea delivers 80% of the result for 20% of the effort? | When time/resources are limited |
+| **Premortem** | "It is a year later and the project failed - WHY?" | Uncovering hidden risks |
+| **10/10/10** | How will I judge this decision in 10 minutes / 10 months / 10 years? | Strategic decisions with a long horizon |
+| **Inversion** | "What would happen if I picked the WORST option?" | Realizing the gap between options may be small |
+| **First Principles** | Break the problem down to fundamental truths → build up from zero | When existing solutions do not fit |
+| **Eisenhower Matrix** | Urgent vs. Important → priorities | Planning and roadmaps |
+| **Red Team / Devil's Advocate** | Actively attack your best option - what is wrong with it? | Validation before the final decision |
+
+---
+
+### PHASE 4: Deep Dive - Top 3 Analysis (large brainstorm only)
+
+For a **large brainstorm** - an extended analysis of the 3 best ideas:
+
+```markdown
+## 🔬 Deep Dive: [Idea X]
+
+### Implementation plan
+[Step by step: what, how, when, who]
+
+### Dependencies
+[What must exist / be ready BEFORE we do this]
+
+### Potential problems and mitigation
+| Problem | Likelihood | Impact | Mitigation |
+|---------|-----------|--------|------------|
+| [problem] | HIGH/MEDIUM/LOW | CRITICAL/SIGNIFICANT/MINOR | [how to prevent it] |
+
+### Required resources
+[Time, tools, knowledge, people]
+
+### Success metrics
+[How will we measure that it works?]
+```
+
+---
+
+### PHASE 5: Scouting the Terrain - Good vs. Bad (Contextual Split)
+
+```markdown
+## ✅❌ Contextual Split
+
+### ✅ Potentially GOOD in this context
+| # | What | Why it is good | Success condition |
+|---|------|----------------|-------------------|
+| 1 | [element] | [rationale] | [what must hold] |
+
+### ❌ Potentially BAD in this context
+| # | What | Why it is bad | When it could work |
+|---|------|---------------|--------------------|
+| 1 | [element] | [rationale] | [different context] |
+
+### ⚠️ Context-dependent (could be good OR bad)
+| # | What | When good | When bad |
+|---|------|-----------|----------|
+| 1 | [element] | [condition A] | [condition B] |
+```
+
+---
+
+### PHASE 6: Picking the Best Option (Final Verdict)
+
+```markdown
+## 🏆 FINAL RECOMMENDATION
+
+### Chosen idea: [Name]
+**Final rating:** ⭐⭐⭐⭐⭐⭐⭐⭐⭐☆ (9/10)
+
+### Why this one?
+[3-5 sentences of justification - refer to the matrix, the strategies, and the truth table]
+
+### Why NOT the others?
+[Briefly: what disqualifies the top-2 and top-3]
+
+### Plan B (fallback)
+[Which idea is the backup and when to switch to it]
+```
+
+---
+
+### PHASE 7: Summary + Generating the Summary File
+
+After writing the full brainstorm - **CREATE A SECOND FILE**:
+
+**`BRAINSTORM_{TEMAT}_SUMMARY.md`** contains:
+
+```markdown
+# 📋 SUMMARY: [Topic]
+
+> **Source:** `BRAINSTORM_{TEMAT}.md`
+> **Date:** [date] | **Mode:** [small/large]
+
+## TL;DR
+[3-5 sentences: problem → recommendation → why]
+
+## Recommendation
+[Chosen idea + justification]
+
+## Key Insights
+1. [Insight 1]
+2. [Insight 2]
+3. [Insight 3]
+
+## 📝 Task List (Actionable Steps)
+
+### Priority: 🔴 CRITICAL
+- [ ] **Step 1:** [Exactly what to do] → **Result:** [what should be produced]
+- [ ] **Step 2:** [Exactly what to do] → **Result:** [what should be produced]
+
+### Priority: 🟡 HIGH
+- [ ] **Step 3:** [Exactly what to do] → **Result:** [what should be produced]
+- [ ] **Step 4:** [Exactly what to do] → **Result:** [what should be produced]
+
+### Priority: 🟢 NORMAL
+- [ ] **Step 5:** [Exactly what to do] → **Result:** [what should be produced]
+
+## Risks to monitor
+| Risk | Trigger | Action |
+|------|---------|--------|
+| [risk] | [when to react] | [what to do] |
+
+## Open questions
+- ❓ [Question that needs a user decision]
+```
+
+</instructions>
+
+---
+
+## Constraints
+
+<constraints>
+
+**Absolute rules (breaking one = fail):**
+- ❌ **Do NOT skip Phase 0** (context gathering) - without context the brainstorm is worthless
+- ❌ **Do NOT break the Truth Table** - the user's constraints are SACRED
+- ❌ **Do NOT rate without justification** - every star rating must come with a "why"
+- ❌ **Do NOT finish without the Summary** - ALWAYS generate 2 files (brainstorm + summary)
+- ❌ **Do NOT generate banal/obvious ideas** - your value is depth, not quantity
+
+**Best practices (always applied):**
+- ✅ **Actively search the web** - if you have search tools, USE THEM
+- ✅ **Self-prompting** - regularly ask yourself guiding questions
+- ✅ **Stars with justification** - ⭐ 1-10 scale, but ALWAYS with a comment
+- ✅ **At least 3 decision strategies** in the convergence phase
+- ✅ **Emoji-driven structure** - use emoji as visual section markers
+- ✅ **Backtracking** - revisit earlier ideas if new information changes their rating
+- ✅ **Adapt the criteria** - tailor the matrix criteria to the specific problem (not always the same 5)
+
+</constraints>
+
+---
+
+## Star Scale (Reference)
+
+| Rating | Stars | Meaning |
+|--------|-------|---------|
+| 1/10 | ⭐☆☆☆☆☆☆☆☆☆ | Terrible - unusable |
+| 2/10 | ⭐⭐☆☆☆☆☆☆☆☆ | Very weak - serious flaws |
+| 3/10 | ⭐⭐⭐☆☆☆☆☆☆☆ | Weak - more flaws than strengths |
+| 4/10 | ⭐⭐⭐⭐☆☆☆☆☆☆ | Below average - risky |
+| 5/10 | ⭐⭐⭐⭐⭐☆☆☆☆☆ | Average - OK but nothing special |
+| 6/10 | ⭐⭐⭐⭐⭐⭐☆☆☆☆ | Decent - there is potential |
+| 7/10 | ⭐⭐⭐⭐⭐⭐⭐☆☆☆ | Good - a solid option |
+| 8/10 | ⭐⭐⭐⭐⭐⭐⭐⭐☆☆ | Very good - strong recommendation |
+| 9/10 | ⭐⭐⭐⭐⭐⭐⭐⭐⭐☆ | Excellent - top tier |
+| 10/10 | ⭐⭐⭐⭐⭐⭐⭐⭐⭐⭐ | Perfect - rare; justify it exceptionally well |
+
+---
+
+## Usage example
+
+**User input:**
+```
+duży brainstorm: How do I design an AI agent system for my ProPrompts repository?
+```
+
+**Agent response:**
+```
+[Phase 0: Scans the repository, reads README, MattyMroz.md, ZIP.md, existing agents]
+[Phase 1: Defines the problem, builds the truth table from the user's constraints]
+[Phase 2: Generates 12+ ideas with star ratings]
+[Phase 3: Comparison matrix + 4 decision strategies]
+[Phase 4: Deep dive on the top 3 ideas]
+[Phase 5: Contextual good/bad split]
+[Phase 6: Final recommendation with justification]
+[Phase 7: Creates BRAINSTORM_SYSTEM_AGENTOW_SUMMARY.md with the task list]
+```
+
+---
+
+## Variants
+
+- **Variant A: Technical brainstorm** - focus on architecture, code, tooling. Add criteria: performance, maintainability, scalability.
+- **Variant B: Strategic brainstorm** - focus on business, market, decisions. Add criteria: ROI, market fit, competitive advantage.
+- **Variant C: Creative brainstorm** - focus on ideas, naming, branding. Loosen the rigor, maximize divergence (20+ ideas), use techniques such as SCAMPER and lateral thinking.
+
+---
+
+## Changelog
+
+- **v1.0.0** [2026-02-10]: First version of the brainstorm skill - full 7-phase structure with small/large modes
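The "WEIGHTED SUM" row in the comparison matrix is a plain weighted average of the per-criterion star ratings. A tiny illustration with hypothetical scores (the numbers below are ours, not from the skill file):

```python
# Weights mirror the matrix criteria above (25/25/20/15/15 = 100%);
# the scores are a made-up example idea rated on the 1-10 star scale.
weights = {"feasibility": 0.25, "roi": 0.25, "risk": 0.20, "time": 0.15, "innovation": 0.15}
scores = {"feasibility": 8, "roi": 7, "risk": 8, "time": 7, "innovation": 5}

weighted = sum(weights[k] * scores[k] for k in weights)
assert abs(sum(weights.values()) - 1.0) < 1e-9   # weights must total 100%
assert abs(weighted - 7.15) < 1e-6               # weighted sum stays on the 1-10 scale
```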
_archive/crack_config.py ADDED
@@ -0,0 +1,84 @@
+ """Crack the OneOCRFeatureExtract config blob — find the hidden weight matrix."""
2
+ import onnx
3
+ import numpy as np
4
+ from pathlib import Path
5
+
6
+ models_dir = Path("oneocr_extracted/onnx_models")
7
+
8
+ # Load model_11
9
+ model = onnx.load(str(list(models_dir.glob("model_11_*"))[0]))
10
+
11
+ # Get feature/config blob
12
+ config_blob = None
13
+ for init in model.graph.initializer:
14
+ if init.name == "feature/config":
15
+ config_blob = bytes(init.string_data[0])
16
+ break
17
+
18
+ print(f"Config blob size: {len(config_blob)} bytes")
19
+ print(f"As float32 count: {len(config_blob) // 4} = {len(config_blob) / 4}")
20
+
21
+ # Full float32 interpretation
22
+ all_floats = np.frombuffer(config_blob, dtype=np.float32)
23
+ print(f"\nFull blob as float32:")
24
+ print(f" Count: {len(all_floats)}")
25
+ print(f" Finite: {np.isfinite(all_floats).sum()}")
26
+ print(f" In [-10,10]: {np.sum(np.abs(all_floats) < 10)}")
27
+ print(f" Range: [{all_floats.min():.4f}, {all_floats.max():.4f}]")
28
+ print(f" Mean: {all_floats.mean():.4f}, Std: {all_floats.std():.4f}")
29
+ print(f" First 20: {all_floats[:20]}")
30
+
31
+ # 4492 bytes / 4 = 1123 floats
32
+ # Hypothesis: some header + 21×50 weight matrix + 50 bias
33
+ # 1123 - 1050 - 50 = 23 extra floats (92 bytes header)
34
+
35
+ # Try different header sizes
36
+ for header_floats in range(0, 40):
37
+ remaining = len(all_floats) - header_floats
38
+ # Check if remaining = in_dim * out_dim + out_dim for some dimensions
39
+ for in_dim in [20, 21, 22]:
40
+ for out_dim in [48, 49, 50, 51, 52]:
41
+ needed = in_dim * out_dim + out_dim
42
+ if remaining == needed:
43
+ print(f"\n *** MATCH: header={header_floats} ({header_floats*4}B) + "
44
+ f"W[{in_dim}×{out_dim}] + b[{out_dim}] = {needed} floats")
45
+ W = all_floats[header_floats:header_floats + in_dim*out_dim].reshape(in_dim, out_dim)
46
+ b = all_floats[header_floats + in_dim*out_dim:header_floats + needed]
47
+ print(f" W range: [{W.min():.4f}, {W.max():.4f}], mean={W.mean():.4f}")
48
+ print(f" b range: [{b.min():.4f}, {b.max():.4f}], mean={b.mean():.4f}")
49
+
50
+ if header_floats > 0:
51
+ header = all_floats[:header_floats]
52
+ print(f" Header values: {header}")
53
+
54
+ # Also try: the blob might encode multiple layers
55
+ # Or maybe it's quantized (int8/uint8)?
56
+ print(f"\n--- Trying int8 interpretation ---")
57
+ int8_arr = np.frombuffer(config_blob, dtype=np.int8)
58
+ print(f" int8 range: [{int8_arr.min()}, {int8_arr.max()}]")
59
+
60
+ uint8_arr = np.frombuffer(config_blob, dtype=np.uint8)
61
+ print(f" uint8 range: [{uint8_arr.min()}, {uint8_arr.max()}]")
62
+
63
+ # Maybe float16?
64
+ if len(config_blob) % 2 == 0:
65
+ f16_arr = np.frombuffer(config_blob, dtype=np.float16)
66
+ finite_f16 = np.isfinite(f16_arr).sum()
67
+ print(f" float16 count: {len(f16_arr)}, finite: {finite_f16}")
68
+ if finite_f16 > len(f16_arr) * 0.9:
69
+ print(f" float16 could work! range=[{f16_arr[np.isfinite(f16_arr)].min():.4f}, {f16_arr[np.isfinite(f16_arr)].max():.4f}]")
70
+
71
+ # Check the Slice in model_11 to understand input dimensions
72
+ print(f"\n--- Checking Slice constants to understand feature extraction ---")
73
+ for node in model.graph.node:
74
+ if node.op_type == "Constant":
75
+ for attr in node.attribute:
76
+ if attr.type == 4: # TENSOR
77
+ t = attr.t
78
+ data = onnx.numpy_helper.to_array(t)
79
+ print(f" Constant '{node.output[0]}': {data}")
80
+
81
+ # Check Add and Div constants
82
+ for node in model.graph.node:
83
+ if node.op_type in ("Add", "Div"):
84
+ print(f"\n {node.op_type}: {list(node.input)} → {list(node.output)}")
_archive/crack_endian.py ADDED
@@ -0,0 +1,65 @@
+ """Test big-endian float32 interpretation of OneOCRFeatureExtract config blob."""
2
+ import onnx
3
+ import numpy as np
4
+ from pathlib import Path
5
+
6
+ models_dir = Path("oneocr_extracted/onnx_models")
7
+ model = onnx.load(str(list(models_dir.glob("model_11_*"))[0]))
8
+
9
+ # Get config blob
10
+ for init in model.graph.initializer:
11
+ if init.name == "feature/config":
12
+ blob = bytes(init.string_data[0])
13
+ break
14
+
15
+ print(f"Blob: {len(blob)} bytes = {len(blob) // 4} float32s")
16
+
17
+ # Big-endian float32
18
+ be_arr = np.frombuffer(blob, dtype='>f4') # big-endian
19
+ le_arr = np.frombuffer(blob, dtype='<f4') # little-endian
20
+
21
+ print(f"\nBig-endian float32:")
22
+ print(f" Finite: {np.isfinite(be_arr).sum()} / {len(be_arr)}")
23
+ in_range = np.sum(np.abs(be_arr[np.isfinite(be_arr)]) < 10)
24
+ print(f" In [-10,10]: {in_range} ({100*in_range/len(be_arr):.1f}%)")
25
+ be_finite = be_arr[np.isfinite(be_arr)]
26
+ print(f" Mean: {be_finite.mean():.4f}, Std: {be_finite.std():.4f}")
27
+ print(f" Range: [{be_finite.min():.4f}, {be_finite.max():.4f}]")
28
+ print(f" First 20: {be_arr[:20]}")
29
+
30
+ print(f"\nLittle-endian float32:")
31
+ print(f" Finite: {np.isfinite(le_arr).sum()} / {len(le_arr)}")
32
+ in_range_le = np.sum(np.abs(le_arr[np.isfinite(le_arr)]) < 10)
33
+ print(f" In [-10,10]: {in_range_le} ({100*in_range_le/len(le_arr):.1f}%)")
34
+
35
+ # If big-endian works, try to extract 21×50 weight matrix + 50 bias
36
+ # 1123 total floats
37
+ # Check feasible dimensions
38
+ print(f"\n--- Dimension search for big-endian ---")
39
+ for header in range(0, 40):
40
+ remaining = len(be_arr) - header
41
+ for in_d in [20, 21, 22]:
42
+ for out_d in [48, 49, 50, 51, 52]:
43
+ if remaining == in_d * out_d + out_d:
44
+ W = be_arr[header:header + in_d*out_d].reshape(in_d, out_d)
45
+ b = be_arr[header + in_d*out_d:]
46
+ w_finite = np.isfinite(W).sum()
47
+ w_reasonable = np.sum(np.abs(W[np.isfinite(W)]) < 10)
48
+ if w_reasonable > in_d * out_d * 0.7:
49
+ print(f" *** header={header} + W[{in_d}×{out_d}] + b[{out_d}]")
50
+ print(f" W finite={w_finite}, reasonable={w_reasonable}")
51
+ print(f" W range: [{W[np.isfinite(W)].min():.4f}, {W[np.isfinite(W)].max():.4f}]")
52
+ print(f" b range: [{b[np.isfinite(b)].min():.4f}, {b[np.isfinite(b)].max():.4f}]")
53
+
54
+ # Also test: could be byteswapped structure with header
55
+ # Try offset by checking where the "nice" values start
56
+ print(f"\n--- Finding good float32 regions (big-endian) ---")
57
+ for start_byte in range(0, 100, 4):
58
+ chunk = np.frombuffer(blob[start_byte:start_byte+84], dtype='>f4')
59
+ all_reasonable = all(np.isfinite(chunk)) and all(np.abs(chunk) < 10)
60
+ if all_reasonable:
61
+ print(f" offset={start_byte}: ALL 21 values reasonable: {chunk}")
62
+ break
63
+ decent = np.sum((np.abs(chunk) < 10) & np.isfinite(chunk))
64
+ if decent >= 18:
65
+ print(f" offset={start_byte}: {decent}/21 reasonable: {chunk}")
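The `'>f4'` / `'<f4'` probe above uses NumPy's dtype byte-order prefixes: the same four bytes decode to entirely different floats depending on which byte order you assume. A minimal self-contained sketch of that mechanism:

```python
import struct
import numpy as np

# Encode 1.5 as a big-endian float32 (bytes 3F C0 00 00).
payload = struct.pack('>f', 1.5)

be = np.frombuffer(payload, dtype='>f4')[0]  # big-endian read → 1.5
le = np.frombuffer(payload, dtype='<f4')[0]  # little-endian read → a tiny denormal

print(be)          # 1.5
print(be == le)    # False — wrong endianness gives garbage values

# A byteswap converts one interpretation into the other:
swapped = np.frombuffer(payload, dtype='<f4').byteswap()[0]
print(swapped)     # 1.5
```

This is why the script scores each interpretation by "finite, small-magnitude" fraction: the wrong byte order tends to yield denormals, huge exponents, or NaN/Inf.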
_archive/debug_detector.py ADDED
@@ -0,0 +1,80 @@
+"""Debug detector output to understand word segmentation."""
+import numpy as np
+import onnxruntime as ort
+from PIL import Image
+from pathlib import Path
+
+models_dir = Path("oneocr_extracted/onnx_models")
+img = Image.open("image.png").convert("RGB")
+w, h = img.size
+
+# Detector setup
+sess = ort.InferenceSession(str(next(models_dir.glob("model_00_*"))),
+                            providers=["CPUExecutionProvider"])
+
+scale = 800 / max(h, w)
+dh = (int(h * scale) + 31) // 32 * 32
+dw = (int(w * scale) + 31) // 32 * 32
+img_d = np.array(img.resize((dw, dh), Image.LANCZOS), dtype=np.float32)
+img_d = img_d[:, :, ::-1] - np.array([102.9801, 115.9465, 122.7717], dtype=np.float32)
+data = img_d.transpose(2, 0, 1)[np.newaxis].astype(np.float32)
+im_info = np.array([[dh, dw, scale]], dtype=np.float32)
+
+outputs = sess.run(None, {"data": data, "im_info": im_info})
+output_names = [o.name for o in sess.get_outputs()]
+out_dict = dict(zip(output_names, outputs))
+
+# Analyze FPN2 (highest resolution)
+pixel_scores = out_dict["scores_hori_fpn2"][0, 0]  # [56, 200]
+link_scores = out_dict["link_scores_hori_fpn2"][0]  # [8, 56, 200]
+
+print(f"FPN2 shape: {pixel_scores.shape}")
+print(f"Pixel scores: min={pixel_scores.min():.4f} max={pixel_scores.max():.4f}")
+
+# Find text region
+text_mask = pixel_scores > 0.6
+print(f"Text pixels (>0.6): {text_mask.sum()}")
+
+# Get the row/column range of text pixels
+ys, xs = np.where(text_mask)
+if len(ys) > 0:
+    print(f"Text region: rows [{ys.min()}-{ys.max()}], cols [{xs.min()}-{xs.max()}]")
+
+    # Check link scores within text region - do they separate words?
+    # Link 2 is East neighbor (right), Link 6 is West neighbor (left)
+    # If link between words is low, they should separate
+    row_mid = (ys.min() + ys.max()) // 2
+    print(f"\nHorizontal link scores at row {row_mid} (East neighbor):")
+    link_east = link_scores[2, row_mid, :]  # E neighbor
+    for x in range(xs.min(), xs.max()+1):
+        ps = pixel_scores[row_mid, x]
+        le = link_east[x]
+        marker = "TEXT" if ps > 0.6 else "    "
+        link_marker = "LINK" if le > 0.5 else "gap "
+        if ps > 0.3:
+            print(f"  col={x:3d}: pixel={ps:.3f} [{marker}] east_link={le:.3f} [{link_marker}]")
+
+    # Also check if there are distinct "gap" regions in pixel scores
+    print(f"\nPixel scores along row {row_mid}:")
+    for x in range(max(0, xs.min()-2), min(pixel_scores.shape[1], xs.max()+3)):
+        ps = pixel_scores[row_mid, x]
+        bar = "█" * int(ps * 40)
+        print(f"  col={x:3d}: {ps:.3f} {bar}")
+
+# Try different thresholds
+for thresh in [0.5, 0.6, 0.7, 0.8, 0.9]:
+    mask = pixel_scores > thresh
+    n = mask.sum()
+    # Connected components via scipy; import inside the try so the
+    # ImportError fallback actually triggers when scipy is missing
+    try:
+        from scipy import ndimage
+        labels, n_comps = ndimage.label(mask)
+        print(f"\nThreshold {thresh}: {n} pixels, {n_comps} components")
+        for c in range(1, min(n_comps+1, 10)):
+            comp_mask = labels == c
+            area = comp_mask.sum()
+            ys_c, xs_c = np.where(comp_mask)
+            print(f"  Component {c}: area={area}, cols=[{xs_c.min()}-{xs_c.max()}]")
+    except ImportError:
+        # Fallback without scipy
+        print(f"Threshold {thresh}: {n} pixels")
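The threshold sweep above counts connected components to see how many word blobs each cutoff produces. For clarity, here is a minimal pure-NumPy/Python 4-connected labeler (a BFS stand-in for `scipy.ndimage.label`, which the script uses when scipy is available) run on a toy mask with two blobs:

```python
import numpy as np

def label_components(mask):
    """Minimal 4-connected component labeling via BFS (mirrors ndimage.label)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            if mask[y, x] and labels[y, x] == 0:
                current += 1                    # start a new component
                stack = [(y, x)]
                labels[y, x] = current
                while stack:                    # flood-fill its 4-neighbors
                    cy, cx = stack.pop()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            stack.append((ny, nx))
    return labels, current

# Toy "pixel score" mask: two separate text blobs.
mask = np.array([[0, 1, 1, 0, 0, 1, 1, 1],
                 [0, 1, 1, 0, 0, 0, 1, 1]], dtype=bool)
labels, n = label_components(mask)
print(n)  # 2
for c in range(1, n + 1):
    ys, xs = np.where(labels == c)
    print(f"component {c}: cols [{xs.min()}-{xs.max()}]")
```

Two words separated by a low-score gap column show up as two components, which is exactly the signal the sweep is looking for.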
_archive/decode_config.py ADDED
@@ -0,0 +1,74 @@
+"""Decode OneOCRFeatureExtract config blob."""
+import onnx
+import numpy as np
+import struct
+from pathlib import Path
+
+m = onnx.load('oneocr_extracted/onnx_models/model_11_ir6_1.9_26KB.onnx')
+
+for init in m.graph.initializer:
+    if init.name == 'feature/config':
+        raw = init.string_data[0]
+        print(f'Total bytes: {len(raw)}')
+        print(f'First 100 bytes hex: {raw[:100].hex()}')
+
+        # Try different structure interpretations
+        for offset in [0, 4, 8, 12]:
+            vals = struct.unpack_from('<4f', raw, offset)
+            print(f'Offset {offset:3d} as 4xfloat32: {vals}')
+
+# Parse rnn_info to find LogPrior values
+rnn = Path('oneocr_extracted/config_data/chunk_36_rnn_info.rnn_info').read_text()
+rnn_lines = rnn.strip().split('\n')
+lp_count = int(rnn_lines[0].split()[-1])
+print(f'\nLogPrior count from rnn_info: {lp_count}')
+lp_val = float(rnn_lines[1])
+print(f'LogPrior[0] = {lp_val}')
+
+lp_f32 = struct.pack('<f', np.float32(lp_val))
+lp_f64 = struct.pack('<d', lp_val)
+pos_f32 = raw.find(lp_f32)
+pos_f64 = raw.find(lp_f64)
+print(f'LogPrior as float32 at pos: {pos_f32}')
+print(f'LogPrior as float64 at pos: {pos_f64}')
+
+# Just look at data structure sections
+# Check for repeating patterns, zeros, etc.
+arr_f32 = np.frombuffer(raw, dtype=np.float32)
+
+# Find sections of "reasonable" float values
+reasonable = (np.abs(arr_f32) < 20) & (arr_f32 != 0)
+transitions = np.diff(reasonable.astype(int))
+starts = np.where(transitions == 1)[0] + 1
+ends = np.where(transitions == -1)[0] + 1
+
+print(f'\nSections of reasonable float32 values:')
+for s, e in zip(starts[:10], ends[:10]):
+    print(f'  [{s}:{e}] ({e-s} values) first: {arr_f32[s:s+3]}')
+
+# Check if first few bytes are a header
+header_ints = struct.unpack_from('<8I', raw, 0)
+print(f'\nFirst 8 uint32: {header_ints}')
+
+header_shorts = struct.unpack_from('<16H', raw, 0)
+print(f'First 16 uint16: {header_shorts}')
+
+# Maybe it's a rnn_info-like structure embedded
+# The rnn_info has sections: <LogPrior>, <TransMat>, <LmSmall>/<LmMedium>
+# Let's check the rnn_info structure fully
+print('\n=== rnn_info structure ===')
+section = None
+counts = {}
+for line in rnn_lines:
+    if line.startswith('<') and line.endswith('>'):
+        section = line
+    elif line.startswith('<') and '>' in line:
+        parts = line.strip().split()
+        section = parts[0].rstrip('>') + '>'
+        count = int(parts[-1]) if len(parts) > 1 else 0
+        counts[section] = count
+        print(f'Section: {section} count={count}')
+    else:
+        if section and section not in counts:
+            counts[section] = 0
+print(f'Sections found: {counts}')
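The "sections of reasonable float values" scan above uses a common NumPy trick: run boundaries of a boolean mask are the +1/-1 transitions of its integer form under `np.diff`. A tiny standalone sketch on synthetic data (note the edge case: a run that extends to the end of the array produces a start with no matching end, which the `zip(starts, ends)` loop silently drops):

```python
import numpy as np

vals = np.array([0.0, 1.2, -3.4, 0.5, 100.0, 0.0, 2.0, 2.5])
reasonable = (np.abs(vals) < 20) & (vals != 0)   # [F,T,T,T,F,F,T,T]

trans = np.diff(reasonable.astype(int))          # +1 = run starts, -1 = run ends
starts = np.where(trans == 1)[0] + 1
ends = np.where(trans == -1)[0] + 1

print(starts.tolist(), ends.tolist())  # [1, 6] [4] — trailing run has no end
for s, e in zip(starts, ends):
    print(f"run [{s}:{e}] → {vals[s:e]}")
```

For a robust version you would pad `reasonable` with a leading and trailing `False` before diffing, so every run gets both a start and an end.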
_archive/dedup.py ADDED
@@ -0,0 +1,687 @@
1
+ """Smart OCR deduplication — stabilization-first approach.
2
+
3
+ Core principle: **don't read text until it STOPS CHANGING**.
4
+ Then check against read history to avoid repeats.
5
+
6
+ Architecture:
7
+
8
+ Phase 1 — **Snapshot Stabilization**
9
+ Each tick compares the full OCR output (all regions merged) with the
10
+ previous tick. If text is growing (typewriter effect), we wait.
11
+ Only when the snapshot is identical for ``stabilize_ticks`` consecutive
12
+ ticks do we consider it "stable" and proceed.
13
+
14
+ Phase 2 — **Line History Dedup**
15
+ Once stable, each line is fuzzy-compared against a history of previously
16
+ emitted lines. Only genuinely new lines pass through. History entries
17
+ expire via TTL so the same text can be re-read after a cooldown.
18
+
19
+ Phase 3 — **Significance Check**
20
+ Rejects composed output that is too short, has too few real words,
21
+ or is mostly non-alphanumeric (OCR garbage / UI artifacts).
22
+
23
+ This naturally handles:
24
+ - **Typewriter effects**: text grows → wait → stabilize → read complete sentence
25
+ - **Static UI** (HP bars, names): stabilizes → read once → in history → skip
26
+ - **OCR noise**: fuzzy matching tolerates minor variations
27
+ - **Dialog changes**: snapshot changes → re-stabilize → emit new parts only
28
+ - **Repeated dialog**: TTL expiry allows re-reading after cooldown
29
+
30
+ Usage::
31
+
32
+ from src.services.ocr.dedup import SmartDedup
33
+
34
+ dedup = SmartDedup()
35
+ text = dedup.process(region_labels, ocr_results)
36
+ if text is not None:
37
+ translate_and_speak(text)
38
+ """
39
+
40
+ from __future__ import annotations
41
+
42
+ import time
43
+ from collections import deque
44
+ from dataclasses import dataclass
45
+ from difflib import SequenceMatcher
46
+
47
+ from src.services.ocr.models import OcrResult
48
+ from src.utils.logger import logger
49
+
50
+ # ── Constants (sensible defaults) ────────────────────────────────
51
+
52
+ DEFAULT_STABILIZE_TICKS: int = 3
53
+ DEFAULT_SNAPSHOT_SIMILARITY: float = 0.92
54
+ DEFAULT_LINE_SIMILARITY: float = 0.80
55
+ DEFAULT_LINE_TTL: float = 120.0
56
+ DEFAULT_HISTORY_TTL: float = 90.0
57
+ DEFAULT_HISTORY_SIZE: int = 30
58
+ DEFAULT_MIN_NEW_CHARS: int = 8
59
+ DEFAULT_MIN_NEW_WORDS: int = 2
60
+ DEFAULT_MIN_ALNUM_RATIO: float = 0.35
61
+
62
+
63
+ # ── Data classes ─────────────────────────────────────────────────
64
+
65
+
66
+ @dataclass
67
+ class HistoryEntry:
68
+ """An entry in the global text history ring buffer."""
69
+
70
+ norm_text: str
71
+ original_text: str
72
+ first_seen: float
73
+ last_seen: float
74
+ hit_count: int = 1
75
+
76
+
77
+ @dataclass
78
+ class DedupConfig:
79
+ """All tunable knobs for the dedup system.
80
+
81
+ Attributes:
82
+ stabilize_ticks: Consecutive identical ticks before text is considered "stable".
83
+ snapshot_similarity: Fuzzy threshold for treating two snapshots as identical (0-1).
84
+ line_similarity: Fuzzy threshold for line-level history matching (0-1).
85
+ line_ttl: Seconds before a known line in history expires.
86
+ history_ttl: Seconds before a global history entry expires.
87
+ history_size: Max entries in the global history ring buffer.
88
+ history_similarity: Alias for line_similarity (backward compat with bridge.py).
89
+ min_new_chars: Minimum characters for a change to be significant.
90
+ min_new_words: Minimum word count for significance.
91
+ min_alnum_ratio: Minimum alphanumeric ratio for significance.
92
+ debounce_time: Legacy field — not used internally, kept for bridge compat.
93
+ """
94
+
95
+ stabilize_ticks: int = DEFAULT_STABILIZE_TICKS
96
+ snapshot_similarity: float = DEFAULT_SNAPSHOT_SIMILARITY
97
+ line_similarity: float = DEFAULT_LINE_SIMILARITY
98
+ line_ttl: float = DEFAULT_LINE_TTL
99
+ history_ttl: float = DEFAULT_HISTORY_TTL
100
+ history_size: int = DEFAULT_HISTORY_SIZE
101
+ history_similarity: float = DEFAULT_LINE_SIMILARITY
102
+ min_new_chars: int = DEFAULT_MIN_NEW_CHARS
103
+ min_new_words: int = DEFAULT_MIN_NEW_WORDS
104
+ min_alnum_ratio: float = DEFAULT_MIN_ALNUM_RATIO
105
+ debounce_time: float = 0.0 # legacy — mapped to stabilize_ticks externally
106
+ instant_mode: bool = False # skip stabilization — emit text on first identical tick
107
+
108
+
109
+ # ── Helpers ──────────────────────────────────────────────────────
110
+
111
+
112
+ def _normalize(text: str) -> str:
113
+ """Collapse whitespace, strip, lowercase — for comparison only."""
114
+ return " ".join(text.split()).strip().lower()
115
+
116
+
117
+ # ── Line History ─────────────────────────────────────────────────
118
+
119
+
120
+ class LineHistory:
121
+ """Tracks previously emitted lines with TTL-based expiry.
122
+
123
+ Each emitted line is stored (normalized) with a timestamp.
124
+ Old entries expire after ``ttl`` seconds, allowing re-reading.
125
+ Fuzzy matching handles OCR noise on short lines.
126
+ """
127
+
128
+ def __init__(
129
+ self,
130
+ ttl: float = DEFAULT_LINE_TTL,
131
+ similarity: float = DEFAULT_LINE_SIMILARITY,
132
+ ) -> None:
133
+ self._entries: dict[str, float] = {} # norm_line → last_emitted_at
134
+ self._ttl = ttl
135
+ self._similarity = similarity
136
+
137
+ def is_known(self, line: str) -> bool:
138
+ """Check if a line was emitted recently (within TTL).
139
+
140
+ Uses exact match first, then fuzzy for short lines.
141
+
142
+ Args:
143
+ line: Raw (non-normalized) line text.
144
+
145
+ Returns:
146
+ True if line is in recent history (should be skipped).
147
+ """
148
+ norm = _normalize(line)
149
+ if len(norm) < 2:
150
+ return True # too short → treat as known (skip garbage)
151
+
152
+ now = time.monotonic()
153
+ self._gc(now)
154
+
155
+ # Fast path: exact match
156
+ if norm in self._entries:
157
+ return True
158
+
159
+ # Slow path: fuzzy match (short lines where OCR noise matters)
160
+ if len(norm) < 60:
161
+ for key in self._entries:
162
+ if abs(len(norm) - len(key)) > max(5, len(key) * 0.25):
163
+ continue
164
+ ratio = SequenceMatcher(None, norm, key).ratio()
165
+ if ratio >= self._similarity:
166
+ return True
167
+
168
+ return False
169
+
170
+ def mark_emitted(self, line: str) -> None:
171
+ """Record a line as emitted."""
172
+ norm = _normalize(line)
173
+ if norm:
174
+ self._entries[norm] = time.monotonic()
175
+
176
+ def reset(self) -> None:
177
+ """Clear all history."""
178
+ self._entries.clear()
179
+
180
+ @property
181
+ def size(self) -> int:
182
+ return len(self._entries)
183
+
184
+ def _gc(self, now: float) -> None:
185
+ """Remove entries older than TTL."""
186
+ expired = [k for k, ts in self._entries.items() if now - ts > self._ttl]
187
+ for k in expired:
188
+ del self._entries[k]
189
+
190
+
191
+ # ── Global Text History (ring buffer for full text blocks) ───────
192
+
193
+
194
+ class GlobalTextHistory:
195
+ """Ring buffer of recently emitted text blocks with TTL.
196
+
197
+ Prevents the same composed text from being re-emitted within
198
+ the TTL window. Uses fuzzy matching to handle OCR noise.
199
+ """
200
+
201
+ def __init__(
202
+ self,
203
+ max_size: int = DEFAULT_HISTORY_SIZE,
204
+ ttl: float = DEFAULT_HISTORY_TTL,
205
+ similarity: float = DEFAULT_LINE_SIMILARITY,
206
+ ) -> None:
207
+ self._entries: deque[HistoryEntry] = deque(maxlen=max_size)
208
+ self._ttl = ttl
209
+ self._similarity = similarity
210
+
211
+ def is_duplicate(self, text: str) -> tuple[bool, float]:
212
+ """Check whether text duplicates something in recent history.
213
+
214
+ Args:
215
+ text: Composed text block.
216
+
217
+ Returns:
218
+ ``(is_dup, best_similarity)``
219
+ """
220
+ now = time.monotonic()
221
+ norm = _normalize(text)
222
+ if not norm:
223
+ return (True, 1.0)
224
+
225
+ best_sim = 0.0
226
+ for entry in self._entries:
227
+ if now - entry.last_seen > self._ttl:
228
+ continue
229
+
230
+ if entry.norm_text == norm:
231
+ entry.last_seen = now
232
+ entry.hit_count += 1
233
+ return (True, 1.0)
234
+
235
+ ratio = SequenceMatcher(None, norm, entry.norm_text).ratio()
236
+ best_sim = max(best_sim, ratio)
237
+ if ratio >= self._similarity:
238
+ entry.last_seen = now
239
+ entry.hit_count += 1
240
+ return (True, ratio)
241
+
242
+ return (False, best_sim)
243
+
244
+ def add(self, text: str) -> None:
245
+ """Record a new text block in history."""
246
+ norm = _normalize(text)
247
+ now = time.monotonic()
248
+ self._entries.append(
249
+ HistoryEntry(
250
+ norm_text=norm,
251
+ original_text=text,
252
+ first_seen=now,
253
+ last_seen=now,
254
+ )
255
+ )
256
+
257
+ def reset(self) -> None:
258
+ self._entries.clear()
259
+
260
+ @property
261
+ def size(self) -> int:
262
+ return len(self._entries)
263
+
264
+
265
+ # ── Significance Check ───────────────────────────────────────────
266
+
267
+
268
+ class ChangeDetector:
269
+ """Decide whether new lines constitute a meaningful change.
270
+
271
+ Rejects very short text, too few words, or mostly non-alphanumeric content.
272
+ """
273
+
274
+ def __init__(
275
+ self,
276
+ min_chars: int = DEFAULT_MIN_NEW_CHARS,
277
+ min_words: int = DEFAULT_MIN_NEW_WORDS,
278
+ min_alnum_ratio: float = DEFAULT_MIN_ALNUM_RATIO,
279
+ ) -> None:
280
+ self._min_chars = min_chars
281
+ self._min_words = min_words
282
+ self._min_alnum_ratio = min_alnum_ratio
283
+
284
+ def is_significant(self, new_lines: list[str]) -> bool:
285
+ """Return True if the new lines represent real content, not OCR garbage."""
286
+ text = " ".join(line.strip() for line in new_lines).strip()
287
+
288
+ if len(text) < self._min_chars:
289
+ return False
290
+
291
+ words = text.split()
292
+ if len(words) < self._min_words:
293
+ return False
294
+
295
+ alnum = sum(1 for c in text if c.isalnum())
296
+ ratio = alnum / len(text) if text else 0
297
+ if ratio < self._min_alnum_ratio:
298
+ return False
299
+
300
+ return True
301
+
302
+
303
+ # ── Main Facade: SmartDedup ──────────────────────────────────────
304
+
305
+
306
+ class SmartDedup:
307
+ """Stabilization-first OCR deduplication.
308
+
309
+ Core algorithm:
310
+
311
+ 1. Each tick: merge all OCR results into a single text snapshot
312
+ 2. Compare snapshot with previous tick — growing? same? different?
313
+ 3. When snapshot is identical for ``stabilize_ticks`` consecutive ticks → STABLE
314
+ 4. Extract lines, filter against read history → emit only NEW lines
315
+ 5. Significance check → reject OCR garbage
316
+ 6. Add emitted lines to history, record in global ring buffer
317
+
318
+ This replaces the old per-line-tracker approach which caused:
319
+ - Sentence fragments (read partial text too early)
320
+ - Infinite silence (partial lines marked "known" too aggressively)
321
+
322
+ Example::
323
+
324
+ dedup = SmartDedup()
325
+
326
+ # On each pipeline tick:
327
+ text = dedup.process(region_labels, ocr_results)
328
+ if text is not None:
329
+ await translate_and_speak(text)
330
+
331
+ # On pipeline stop or config change:
332
+ dedup.reset()
333
+ """
334
+
335
+ def __init__(self, config: DedupConfig | None = None) -> None:
336
+ self._cfg = config or DedupConfig()
337
+
338
+ # Stabilization state
339
+ self._last_snapshot: str | None = None
340
+ self._last_raw: str | None = None
341
+ self._stable_count: int = 0
342
+ self._processed_snapshot: str | None = None
343
+
344
+ # Why: track last emitted text to detect post-emit growth
345
+ # (e.g. we emitted 2 lines, then lines 3-4 appear → continuation, not new text)
346
+ self._last_emitted_norm: str | None = None
347
+
348
+ # History layers
349
+ self._line_history = LineHistory(
350
+ ttl=self._cfg.line_ttl,
351
+ similarity=self._cfg.line_similarity,
352
+ )
353
+ self._global_history = GlobalTextHistory(
354
+ max_size=self._cfg.history_size,
355
+ ttl=self._cfg.history_ttl,
356
+ similarity=self._cfg.history_similarity,
357
+ )
358
+ self._change_detector = ChangeDetector(
359
+ min_chars=self._cfg.min_new_chars,
360
+ min_words=self._cfg.min_new_words,
361
+ min_alnum_ratio=self._cfg.min_alnum_ratio,
362
+ )
363
+
364
+ # ── Public API ───────────────────────────────────────────────
365
+
366
+ def process(
367
+ self,
368
+ region_labels: list[str],
369
+ ocr_results: list[OcrResult],
370
+ *,
371
+ force: bool = False,
372
+ ) -> str | None:
373
+ """Run stabilization-based dedup on multi-region OCR results.
374
+
375
+ Args:
376
+ region_labels: Label/ID for each region (for diagnostics).
377
+ ocr_results: OCR result per region (same order as labels).
378
+ force: If True, skip all dedup and return all text immediately.
379
+
380
+ Returns:
381
+ Text to translate + speak, or None if suppressed by dedup.
382
+ """
383
+ # ── Merge all regions into one snapshot ──
384
+ raw_parts: list[str] = []
385
+ for result in ocr_results:
386
+ if result.error or result.is_empty:
387
+ continue
388
+ text = result.text.strip()
389
+ if text:
390
+ raw_parts.append(text)
391
+
392
+ if not raw_parts:
393
+ return None
394
+
395
+ full_raw = "\n".join(raw_parts)
396
+ full_norm = _normalize(full_raw)
397
+
398
+ if not full_norm or len(full_norm) < 2:
399
+ return None
400
+
401
+ # ── Force read: bypass all dedup ──
402
+ if force:
403
+ self._global_history.add(full_raw)
404
+ self._mark_all_lines_known(full_raw)
405
+ self._last_snapshot = full_norm
406
+ self._last_raw = full_raw
407
+ self._processed_snapshot = full_norm
408
+ self._stable_count = 0
409
+ logger.info("Dedup: force read — emitting %d chars", len(full_raw))
410
+ return full_raw
411
+
412
+ # ── Phase 1: Stabilization check ──
413
+ if self._last_snapshot is None:
414
+ # First tick — record snapshot, wait for next
415
+ self._last_snapshot = full_norm
416
+ self._last_raw = full_raw
417
+ self._stable_count = 0
418
+ self._processed_snapshot = None
419
+ # Why: in instant mode, skip waiting — proceed on the very first tick
420
+ if not self._cfg.instant_mode:
421
+ return None
422
+
423
+ # Compare current snapshot with previous
424
+ snapshot_sim = self._snapshot_similarity(self._last_snapshot, full_norm)
425
+
426
+ if snapshot_sim >= self._cfg.snapshot_similarity:
427
+ # Same (or very similar due to OCR noise) → count toward stability
428
+ self._stable_count += 1
429
+ elif self._is_text_growing(self._last_snapshot, full_norm):
430
+ # Text is expanding (typewriter effect) → reset, keep waiting
431
+ self._stable_count = 0
432
+ self._last_snapshot = full_norm
433
+ self._last_raw = full_raw
434
+ self._processed_snapshot = None
435
+ logger.debug("Dedup: text growing, waiting for stabilization")
436
+ return None
437
+ elif (
438
+ self._last_emitted_norm is not None
439
+ and self._is_text_growing(self._last_emitted_norm, full_norm)
440
+ ):
441
+ # Why: post-emit growth — we emitted lines 1-2, now lines 1-4 are visible.
442
+ # The new snapshot is a SUPERSET of what we emitted → continuation.
443
+ # Reset stability and wait for the full text to settle.
444
+ self._stable_count = 0
445
+ self._last_snapshot = full_norm
446
+ self._last_raw = full_raw
447
+ self._processed_snapshot = None
448
+ logger.debug("Dedup: post-emit growth detected, waiting for continuation")
449
+ return None
450
+ else:
451
+ # Completely different content → new text, start fresh
452
+ self._stable_count = 0
453
+ self._last_snapshot = full_norm
454
+ self._last_raw = full_raw
455
+ self._processed_snapshot = None
456
+ logger.debug("Dedup: snapshot changed, waiting for stabilization")
457
+ return None
458
+
459
+ # Update raw text (keep latest version even during stability counting)
460
+ self._last_snapshot = full_norm
461
+ self._last_raw = full_raw
462
+
463
+ # Not stable yet?
464
+ required_ticks = 1 if self._cfg.instant_mode else self._cfg.stabilize_ticks
465
+ if self._stable_count < required_ticks:
466
+ return None
467
+
468
+ # ── Already processed this exact snapshot? ──
469
+ if self._processed_snapshot is not None:
470
+ sim = self._snapshot_similarity(full_norm, self._processed_snapshot)
471
+ if sim >= self._cfg.snapshot_similarity:
472
+ return None # already evaluated, nothing new
473
+
474
+ # ── Phase 2: Text is STABLE — extract new lines ──
475
+ all_lines = self._extract_lines(full_raw, ocr_results)
476
+ new_lines: list[str] = []
477
+
478
+ for line in all_lines:
479
+ if not self._line_history.is_known(line):
480
+ new_lines.append(line)
481
+
482
+ # Also check against global text history (full text block dedup)
483
+ if new_lines:
484
+ composed = "\n".join(new_lines)
485
+ is_dup, sim = self._global_history.is_duplicate(composed)
486
+ if is_dup:
487
+ logger.debug("Dedup: global history match (sim=%.3f)", sim)
488
+ new_lines = []
489
+
490
+ if not new_lines:
491
+ # All lines already known — mark snapshot as processed
492
+ self._processed_snapshot = full_norm
493
+ return None
494
+
495
+ # ── Phase 3: Significance check ──
496
+ if not self._change_detector.is_significant(new_lines):
497
+ logger.debug(
498
+ "Dedup: new lines not significant (%d lines, %d chars)",
499
+ len(new_lines),
500
+ sum(len(line) for line in new_lines),
501
+ )
502
+ self._processed_snapshot = full_norm
503
+ return None
504
+
505
+ # ── EMIT! ──
506
+ composed = "\n".join(new_lines)
507
+ self._mark_all_lines_known(composed)
508
+ self._global_history.add(composed)
509
+ self._processed_snapshot = full_norm
510
+ # Why: track what we emitted so we can detect post-emit growth
511
+ self._last_emitted_norm = full_norm
512
+ # Why: reset stable_count to prevent immediate re-emit on next tick
513
+ self._stable_count = 0
514
+
515
+ logger.info(
516
+ "Dedup: emitting %d new lines (%d chars, %d known lines in history)",
517
+ len(new_lines),
518
+ len(composed),
519
+ self._line_history.size,
520
+ )
521
+ return composed
522
+
523
+ def force_flush(self) -> str | None:
524
+ """Force-emit whatever raw text is pending (for force-read button)."""
525
+ if self._last_raw:
526
+ raw = self._last_raw
527
+ self._global_history.add(raw)
528
+ self._mark_all_lines_known(raw)
529
+ return raw
530
+ return None
531
+
532
+ def update_config(self, config: DedupConfig) -> None:
533
+ """Apply new configuration. Rebuilds internal components."""
534
+ self._cfg = config
535
+ self._line_history = LineHistory(
536
+ ttl=config.line_ttl,
537
+ similarity=config.line_similarity,
538
+ )
539
+ self._global_history = GlobalTextHistory(
540
+ max_size=config.history_size,
541
+ ttl=config.history_ttl,
542
+ similarity=config.history_similarity,
543
+ )
544
+ self._change_detector = ChangeDetector(
545
+ min_chars=config.min_new_chars,
546
+ min_words=config.min_new_words,
547
+ min_alnum_ratio=config.min_alnum_ratio,
548
+ )
549
+ logger.info("SmartDedup: config updated")
550
+
551
+ def reset(self) -> None:
552
+ """Clear all state (e.g. on scene change or pipeline restart)."""
553
+ self._last_snapshot = None
554
+ self._last_raw = None
555
+ self._stable_count = 0
556
+ self._processed_snapshot = None
557
+ self._last_emitted_norm = None
558
+ self._line_history.reset()
559
+ self._global_history.reset()
560
+ logger.info("SmartDedup: all state reset")
561
+
562
+ def reset_region(self, label: str) -> None:
563
+ """No-op in snapshot-based approach — kept for backward compat."""
564
+ pass
565
+
566
+ @property
567
+ def stats(self) -> dict[str, int]:
568
+ """Return diagnostic stats."""
569
+ return {
570
+ "tracked_regions": 0,
571
+ "total_known_lines": self._line_history.size,
572
+ "history_size": self._global_history.size,
573
+ "stable_count": self._stable_count,
574
+ }
575
+
576
+ # ── Internal ─────────────────────────────────────────────────
577
+
578
+ @staticmethod
579
+    def _snapshot_similarity(a: str, b: str) -> float:
+        """Fast similarity between two normalized snapshots."""
+        if a == b:
+            return 1.0
+        if not a or not b:
+            return 0.0
+        return SequenceMatcher(None, a, b).ratio()
+
+    @staticmethod
+    def _is_text_growing(old_norm: str, new_norm: str) -> bool:
+        """Check if new text is an expansion of old text (typewriter effect).
+
+        Returns True if new_norm is longer AND contains most of old_norm's
+        words at the beginning (prefix-like growth).
+        """
+        if len(new_norm) <= len(old_norm):
+            return False
+
+        # Simple prefix check — covers most typewriter cases
+        if new_norm.startswith(old_norm):
+            return True
+
+        # Word-level check: old words appear at the start of new word sequence
+        old_words = old_norm.split()
+        new_words = new_norm.split()
+
+        if len(new_words) <= len(old_words):
+            return False
+
+        # Count matching words at the beginning
+        matching = 0
+        for old_w, new_w in zip(old_words, new_words):
+            if old_w == new_w:
+                matching += 1
+            elif SequenceMatcher(None, old_w, new_w).ratio() > 0.8:
+                # Why: OCR noise may corrupt already-visible words slightly
+                matching += 1
+
+        # Why: 60% threshold — allows some OCR noise in the matching portion
+        return matching >= len(old_words) * 0.6
+
+    def _extract_lines(
+        self, raw_text: str, ocr_results: list[OcrResult]
+    ) -> list[str]:
+        """Extract individual lines from OCR results.
+
+        Prefers structured ``OcrResult.lines`` when available.
+        Deduplicates across regions (overlapping capture areas).
+
+        Args:
+            raw_text: Fallback raw text (used if no structured lines).
+            ocr_results: OCR results with structured lines.
+
+        Returns:
+            List of unique raw line texts.
+        """
+        lines: list[str] = []
+        seen_norms: set[str] = set()
+
+        for result in ocr_results:
+            if result.error or result.is_empty:
+                continue
+            for ocr_line in result.lines:
+                raw = ocr_line.text.strip()
+                if not raw:
+                    continue
+                norm = _normalize(raw)
+                if len(norm) < 2:
+                    continue
+
+                # Why: skip duplicate lines across regions (overlapping capture areas)
+                if norm in seen_norms:
+                    continue
+
+                # Fuzzy cross-region dedup for short lines
+                # Why: high threshold (0.95) because overlapping regions produce
+                # near-identical text, not merely similar text
+                is_cross_dup = False
+                if len(norm) < 60:
+                    for seen in seen_norms:
+                        if abs(len(norm) - len(seen)) > 3:
+                            continue
+                        if SequenceMatcher(None, norm, seen).ratio() >= 0.95:
+                            is_cross_dup = True
+                            break
+                if is_cross_dup:
+                    continue
+
+                seen_norms.add(norm)
+                lines.append(raw)
+
+        # Fallback: if no structured lines, split raw text
+        if not lines:
+            for line in raw_text.split("\n"):
+                stripped = line.strip()
+                if stripped and len(_normalize(stripped)) >= 2:
+                    norm = _normalize(stripped)
+                    if norm not in seen_norms:
+                        seen_norms.add(norm)
+                        lines.append(stripped)
+
+        return lines
+
+    def _mark_all_lines_known(self, text: str) -> None:
+        """Add all lines in text to line history."""
+        for line in text.split("\n"):
+            stripped = line.strip()
+            if stripped and len(_normalize(stripped)) >= 2:
+                self._line_history.mark_emitted(stripped)
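The prefix-growth heuristic above can be exercised in isolation. Below is a minimal, self-contained sketch of the same idea; `is_text_growing` here is a local re-implementation for illustration, not the class method itself:

```python
from difflib import SequenceMatcher

def is_text_growing(old: str, new: str) -> bool:
    # Longer text that starts with the old text is classic typewriter growth.
    if len(new) <= len(old):
        return False
    if new.startswith(old):
        return True
    old_w, new_w = old.split(), new.split()
    if len(new_w) <= len(old_w):
        return False
    # Tolerate slight OCR corruption of already-visible words (ratio > 0.8).
    matching = sum(
        1 for a, b in zip(old_w, new_w)
        if a == b or SequenceMatcher(None, a, b).ratio() > 0.8
    )
    # 60% of the old words must survive at the front of the new text.
    return matching >= len(old_w) * 0.6

assert is_text_growing("hello there", "hello there, traveler")      # prefix growth
assert not is_text_growing("hello there", "goodbye")                # unrelated text
assert is_text_growing("the quick brown fox", "the qulck brown fox jumps")  # OCR noise
```

The word-level branch is what distinguishes this from a plain `startswith` check: it still fires when OCR re-reads an already-visible word slightly differently on the next tick.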
_archive/dedup_old.py ADDED
@@ -0,0 +1,595 @@
+ """Smart OCR deduplication — multi-layer heuristic to avoid re-reading the same text.
+
+ Architecture (3 layers):
+
+ Layer 1 — **Per-Region Line Tracker**
+     Each capture region keeps a dict of known OCR lines (normalized text → metadata).
+     New OCR results are compared line-by-line; only genuinely new lines pass through.
+     Stale entries expire after ``line_ttl`` seconds.
+
+ Layer 2 — **Global Text History** (ring buffer)
+     After composing new lines into a text block, the block is fuzzy-matched against
+     a bounded history of recently emitted texts. TTL-based expiry allows the same
+     dialog to be read again after a configurable cooldown.
+
+ Layer 3 — **Semantic Change Detector**
+     Rejects composed text that is too short, has too few real words, or is mostly
+     non-alphanumeric (OCR garbage / UI artifacts).
+
+ Debounce (optional)
+     When text grows incrementally (typewriter effect), the emitter waits for
+     stabilization before yielding the final text.
+
+ Usage::
+
+     from src.services.ocr.dedup import SmartDedup
+
+     dedup = SmartDedup()
+     text = dedup.process(regions, ocr_results)
+     if text is not None:
+         translate_and_speak(text)
+ """
+
+ from __future__ import annotations
+
+ import time
+ from collections import deque
+ from dataclasses import dataclass
+ from difflib import SequenceMatcher
+
+ from src.services.ocr.models import OcrResult
+ from src.utils.logger import logger
+
+ # ── Constants (sensible defaults) ────────────────────────────────
+
+ DEFAULT_LINE_TTL: float = 120.0
+ DEFAULT_LINE_SIMILARITY: float = 0.80
+ DEFAULT_HISTORY_SIZE: int = 30
+ DEFAULT_HISTORY_TTL: float = 90.0
+ DEFAULT_HISTORY_SIMILARITY: float = 0.82
+ DEFAULT_MIN_NEW_CHARS: int = 8
+ DEFAULT_MIN_NEW_WORDS: int = 2
+ DEFAULT_MIN_ALNUM_RATIO: float = 0.35
+ DEFAULT_DEBOUNCE_TIME: float = 0.0  # 0 = disabled
+
+
+ # ── Data classes ─────────────────────────────────────────────────
+
+
+ @dataclass
+ class KnownLine:
+     """A line previously seen by a RegionLineTracker."""
+
+     text: str
+     first_seen: float
+     last_seen: float
+     hit_count: int = 1
+
+
+ @dataclass
+ class HistoryEntry:
+     """An entry in the global text history ring buffer."""
+
+     norm_text: str
+     original_text: str
+     first_seen: float
+     last_seen: float
+     hit_count: int = 1
+
+
+ @dataclass
+ class DedupConfig:
+     """All tunable knobs for the dedup system.
+
+     Attributes:
+         line_ttl: Seconds before a known line expires (Layer 1).
+         line_similarity: Fuzzy threshold for line-level dedup (0-1).
+         history_size: Max entries in global ring buffer (Layer 2).
+         history_ttl: Seconds before a global history entry expires.
+         history_similarity: Fuzzy threshold for global dedup (0-1).
+         min_new_chars: Minimum characters for a change to be significant (Layer 3).
+         min_new_words: Minimum word count for significance.
+         min_alnum_ratio: Minimum alphanumeric ratio for significance.
+         debounce_time: Seconds to wait for text stabilization (0 = off).
+     """
+
+     line_ttl: float = DEFAULT_LINE_TTL
+     line_similarity: float = DEFAULT_LINE_SIMILARITY
+     history_size: int = DEFAULT_HISTORY_SIZE
+     history_ttl: float = DEFAULT_HISTORY_TTL
+     history_similarity: float = DEFAULT_HISTORY_SIMILARITY
+     min_new_chars: int = DEFAULT_MIN_NEW_CHARS
+     min_new_words: int = DEFAULT_MIN_NEW_WORDS
+     min_alnum_ratio: float = DEFAULT_MIN_ALNUM_RATIO
+     debounce_time: float = DEFAULT_DEBOUNCE_TIME
+
+
+ # ── Helpers ──────────────────────────────────────────────────────
+
+
+ def _normalize(text: str) -> str:
+     """Collapse whitespace, strip, lowercase — for comparison only."""
+     return " ".join(text.split()).strip().lower()
+
+
+ # ── Layer 1: Per-Region Line Tracker ─────────────────────────────
+
+
+ class RegionLineTracker:
+     """Track known lines for a single capture region.
+
+     Lines already seen (exact or fuzzy match) are filtered out.
+     Entries expire after ``line_ttl`` seconds so the same text
+     can be re-read after a cooldown.
+     """
+
+     def __init__(
+         self,
+         similarity: float = DEFAULT_LINE_SIMILARITY,
+         line_ttl: float = DEFAULT_LINE_TTL,
+     ) -> None:
+         self._known: dict[str, KnownLine] = {}
+         self._similarity = similarity
+         self._line_ttl = line_ttl
+
+     def extract_new_lines(self, ocr_result: OcrResult) -> list[str]:
+         """Return only lines that are NOT already known.
+
+         Args:
+             ocr_result: OCR result with ``.lines`` populated.
+
+         Returns:
+             List of *original* (non-normalized) line texts that are new.
+         """
+         now = time.monotonic()
+         self._gc(now)
+
+         new_lines: list[str] = []
+         for line in ocr_result.lines:
+             raw = line.text.strip()
+             if not raw:
+                 continue
+             norm = _normalize(raw)
+             if len(norm) < 2:
+                 continue
+
+             # Fast path: exact match
+             if norm in self._known:
+                 self._known[norm].last_seen = now
+                 self._known[norm].hit_count += 1
+                 continue
+
+             # Slow path: fuzzy match (only short texts where OCR noise matters)
+             matched = False
+             if len(norm) < 60:
+                 for key, entry in self._known.items():
+                     # Skip candidates with very different length
+                     if abs(len(norm) - len(key)) > max(5, len(key) * 0.2):
+                         continue
+                     ratio = SequenceMatcher(None, norm, key).ratio()
+                     if ratio >= self._similarity:
+                         entry.last_seen = now
+                         entry.hit_count += 1
+                         matched = True
+                         break
+
+             if not matched:
+                 self._known[norm] = KnownLine(
+                     text=norm, first_seen=now, last_seen=now
+                 )
+                 new_lines.append(raw)
+
+         return new_lines
+
+     def reset(self) -> None:
+         """Clear all known lines (e.g. on scene change)."""
+         self._known.clear()
+
+     @property
+     def known_count(self) -> int:
+         """Number of tracked lines."""
+         return len(self._known)
+
+     def _gc(self, now: float) -> None:
+         """Remove lines not seen for longer than TTL."""
+         expired = [
+             k for k, v in self._known.items() if now - v.last_seen > self._line_ttl
+         ]
+         for k in expired:
+             del self._known[k]
+
+
+ # ── Layer 2: Global Text History ─────────────────────────────────
+
+
+ class GlobalTextHistory:
+     """Ring buffer of recently emitted text blocks with TTL.
+
+     Prevents the same composed text from being processed twice
+     within the TTL window, even if it comes from different regions
+     or after a brief interruption.
+     """
+
+     def __init__(
+         self,
+         max_size: int = DEFAULT_HISTORY_SIZE,
+         ttl: float = DEFAULT_HISTORY_TTL,
+         similarity: float = DEFAULT_HISTORY_SIMILARITY,
+     ) -> None:
+         self._entries: deque[HistoryEntry] = deque(maxlen=max_size)
+         self._ttl = ttl
+         self._similarity = similarity
+
+     def is_duplicate(self, text: str) -> tuple[bool, float]:
+         """Check whether *text* duplicates something in recent history.
+
+         Args:
+             text: Composed text block (already new-line joined).
+
+         Returns:
+             ``(is_dup, best_similarity)`` — whether it matched and how closely.
+         """
+         now = time.monotonic()
+         norm = _normalize(text)
+         if not norm:
+             return (True, 1.0)  # empty → always "duplicate"
+
+         best_sim = 0.0
+         for entry in self._entries:
+             if now - entry.last_seen > self._ttl:
+                 continue  # expired
+
+             # Fast path: identical normalized text
+             if entry.norm_text == norm:
+                 entry.last_seen = now
+                 entry.hit_count += 1
+                 return (True, 1.0)
+
+             # Fuzzy path
+             ratio = SequenceMatcher(None, norm, entry.norm_text).ratio()
+             best_sim = max(best_sim, ratio)
+             if ratio >= self._similarity:
+                 entry.last_seen = now
+                 entry.hit_count += 1
+                 return (True, ratio)
+
+         return (False, best_sim)
+
+     def add(self, text: str) -> None:
+         """Record a new text block in history."""
+         norm = _normalize(text)
+         now = time.monotonic()
+         self._entries.append(
+             HistoryEntry(
+                 norm_text=norm,
+                 original_text=text,
+                 first_seen=now,
+                 last_seen=now,
+             )
+         )
+
+     def reset(self) -> None:
+         """Clear all history entries."""
+         self._entries.clear()
+
+     @property
+     def size(self) -> int:
+         return len(self._entries)
+
+
+ # ── Layer 3: Semantic Change Detector ────────────────────────────
+
+
+ class ChangeDetector:
+     """Decide whether a set of new lines constitutes a meaningful change.
+
+     Rejects:
+     - Very short text (< ``min_chars`` printable characters)
+     - Too few words (< ``min_words``)
+     - Mostly non-alphanumeric (ratio < ``min_alnum_ratio``)
+     """
+
+     def __init__(
+         self,
+         min_chars: int = DEFAULT_MIN_NEW_CHARS,
+         min_words: int = DEFAULT_MIN_NEW_WORDS,
+         min_alnum_ratio: float = DEFAULT_MIN_ALNUM_RATIO,
+     ) -> None:
+         self._min_chars = min_chars
+         self._min_words = min_words
+         self._min_alnum_ratio = min_alnum_ratio
+
+     def is_significant(self, new_lines: list[str]) -> bool:
+         """Return ``True`` if the new lines represent a real content change."""
+         text = " ".join(line.strip() for line in new_lines).strip()
+
+         if len(text) < self._min_chars:
+             return False
+
+         words = text.split()
+         if len(words) < self._min_words:
+             return False
+
+         alnum = sum(1 for c in text if c.isalnum())
+         ratio = alnum / len(text) if text else 0
+         if ratio < self._min_alnum_ratio:
+             return False
+
+         return True
+
+
+ # ── Debounce Emitter ─────────────────────────────────────────────
+
+
+ class DebouncedEmitter:
+     """Buffer text and only yield it after stabilization.
+
+     Useful for typewriter-effect dialogs where text appears incrementally.
+     If ``stabilize_time`` is 0, debouncing is disabled (pass-through).
+     """
+
+     def __init__(self, stabilize_time: float = DEFAULT_DEBOUNCE_TIME) -> None:
+         self._stabilize = stabilize_time
+         self._pending: str | None = None
+         self._pending_since: float = 0.0
+
+     def feed(self, text: str) -> str | None:
+         """Feed new text. Returns the text once it has been stable long enough.
+
+         Args:
+             text: The candidate text to emit.
+
+         Returns:
+             The stabilized text, or ``None`` if still waiting.
+         """
+         if self._stabilize <= 0:
+             return text  # debounce disabled → immediate
+
+         now = time.monotonic()
+
+         if self._pending is None or _normalize(text) != _normalize(self._pending):
+             # New or changed text → reset timer
+             self._pending = text
+             self._pending_since = now
+             return None
+
+         # Text unchanged — check if stable long enough
+         if now - self._pending_since >= self._stabilize:
+             result = self._pending
+             self._pending = None
+             return result
+
+         return None  # still waiting
+
+     def flush(self) -> str | None:
+         """Force-emit whatever is pending (used on pipeline stop / force-read)."""
+         result = self._pending
+         self._pending = None
+         return result
+
+     def reset(self) -> None:
+         """Discard pending text."""
+         self._pending = None
+
+
+ # ── Cross-Region Dedup Pool ──────────────────────────────────────
+
+
+ class CrossRegionPool:
+     """Tracks lines across regions within a single tick to prevent cross-region duplication.
+
+     Within a single pipeline tick, if region A already yielded line X,
+     region B should skip it.
+     """
+
+     def __init__(self, similarity: float = DEFAULT_LINE_SIMILARITY) -> None:
+         self._seen: dict[str, str] = {}  # norm → original
+         self._similarity = similarity
+
+     def is_seen(self, line: str) -> bool:
+         """Check if this line was already yielded by another region this tick."""
+         norm = _normalize(line)
+         if not norm:
+             return True
+
+         # Exact
+         if norm in self._seen:
+             return True
+
+         # Fuzzy (short lines only)
+         if len(norm) < 60:
+             for key in self._seen:
+                 if abs(len(norm) - len(key)) > max(4, len(key) * 0.2):
+                     continue
+                 if SequenceMatcher(None, norm, key).ratio() >= self._similarity:
+                     return True
+
+         return False
+
+     def mark(self, line: str) -> None:
+         """Record a line as yielded this tick."""
+         norm = _normalize(line)
+         if norm:
+             self._seen[norm] = line
+
+     def clear(self) -> None:
+         """Reset for next tick."""
+         self._seen.clear()
+
+
+ # ── Main Facade: SmartDedup ──────────────────────────────────────
+
+
+ class SmartDedup:
+     """Three-layer OCR deduplication with debounce and cross-region awareness.
+
+     Replaces the old single-``_last_ocr_text`` comparison in ``bridge.py``.
+
+     Example::
+
+         dedup = SmartDedup()
+
+         # On each pipeline tick:
+         text = dedup.process(region_labels, ocr_results)
+         if text is not None:
+             await translate_and_speak(text)
+
+         # On pipeline stop or config change:
+         dedup.reset()
+     """
+
+     def __init__(self, config: DedupConfig | None = None) -> None:
+         self._cfg = config or DedupConfig()
+         self._region_trackers: dict[str, RegionLineTracker] = {}
+         self._global_history = GlobalTextHistory(
+             max_size=self._cfg.history_size,
+             ttl=self._cfg.history_ttl,
+             similarity=self._cfg.history_similarity,
+         )
+         self._change_detector = ChangeDetector(
+             min_chars=self._cfg.min_new_chars,
+             min_words=self._cfg.min_new_words,
+             min_alnum_ratio=self._cfg.min_alnum_ratio,
+         )
+         self._debouncer = DebouncedEmitter(stabilize_time=self._cfg.debounce_time)
+         self._cross_pool = CrossRegionPool(similarity=self._cfg.line_similarity)
+
+     # ── Public API ───────────────────────────────────────────────
+
+     def process(
+         self,
+         region_labels: list[str],
+         ocr_results: list[OcrResult],
+         *,
+         force: bool = False,
+     ) -> str | None:
+         """Run all dedup layers on multi-region OCR results.
+
+         Args:
+             region_labels: Label/ID for each region (used as tracker key).
+             ocr_results: OCR result per region (same order as labels).
+             force: If ``True``, skip all dedup and return all text.
+
+         Returns:
+             Text to translate + speak, or ``None`` if dedup suppressed it.
+         """
+         if force:
+             texts = [r.text.strip() for r in ocr_results if r.text.strip()]
+             combined = "\n".join(texts) if texts else None
+             if combined:
+                 self._global_history.add(combined)
+                 # Also update region trackers so we don't double-read next tick
+                 for label, result in zip(region_labels, ocr_results):
+                     tracker = self._get_tracker(label)
+                     tracker.extract_new_lines(result)  # just mark as known
+             self._debouncer.flush()  # discard any pending debounced text
+             return combined
+
+         # Layer 1: Per-region line tracking + cross-region dedup
+         self._cross_pool.clear()
+         all_new_lines: list[str] = []
+
+         for label, result in zip(region_labels, ocr_results):
+             if result.error or result.is_empty:
+                 continue
+             tracker = self._get_tracker(label)
+             region_new = tracker.extract_new_lines(result)
+
+             for line in region_new:
+                 if not self._cross_pool.is_seen(line):
+                     self._cross_pool.mark(line)
+                     all_new_lines.append(line)
+
+         if not all_new_lines:
+             return None
+
+         # Layer 3: Semantic significance check
+         if not self._change_detector.is_significant(all_new_lines):
+             logger.debug(
+                 "Dedup: new lines not significant (%d lines, %d chars)",
+                 len(all_new_lines),
+                 sum(len(line) for line in all_new_lines),
+             )
+             return None
+
+         composed = "\n".join(all_new_lines)
+
+         # Layer 2: Global history check
+         is_dup, sim = self._global_history.is_duplicate(composed)
+         if is_dup:
+             logger.debug("Dedup: global history match (sim=%.3f)", sim)
+             return None
+
+         # Debounce (typewriter effect protection)
+         stabilized = self._debouncer.feed(composed)
+         if stabilized is None:
+             logger.debug("Dedup: waiting for text stabilization")
+             return None
+
+         # ✅ New, significant, stabilized text — emit!
+         self._global_history.add(stabilized)
+         return stabilized
+
+     def force_flush(self) -> str | None:
+         """Force-emit any debounced pending text."""
+         pending = self._debouncer.flush()
+         if pending:
+             self._global_history.add(pending)
+         return pending
+
+     def update_config(self, config: DedupConfig) -> None:
+         """Apply new configuration. Recreates internal components."""
+         self._cfg = config
+         # Rebuild components with new settings
+         self._global_history = GlobalTextHistory(
+             max_size=config.history_size,
+             ttl=config.history_ttl,
+             similarity=config.history_similarity,
+         )
+         self._change_detector = ChangeDetector(
+             min_chars=config.min_new_chars,
+             min_words=config.min_new_words,
+             min_alnum_ratio=config.min_alnum_ratio,
+         )
+         self._debouncer = DebouncedEmitter(stabilize_time=config.debounce_time)
+         self._cross_pool = CrossRegionPool(similarity=config.line_similarity)
+         # Update existing region trackers in place
+         for tracker in self._region_trackers.values():
+             tracker._similarity = config.line_similarity
+             tracker._line_ttl = config.line_ttl
+
+     def reset(self) -> None:
+         """Clear all state (e.g. on scene change or pipeline restart)."""
+         for tracker in self._region_trackers.values():
+             tracker.reset()
+         self._global_history.reset()
+         self._debouncer.reset()
+         self._cross_pool.clear()
+         logger.info("SmartDedup: all layers reset")
+
+     def reset_region(self, label: str) -> None:
+         """Reset a specific region tracker."""
+         if label in self._region_trackers:
+             self._region_trackers[label].reset()
+
+     @property
+     def stats(self) -> dict[str, int]:
+         """Return diagnostic stats."""
+         return {
+             "tracked_regions": len(self._region_trackers),
+             "total_known_lines": sum(
+                 t.known_count for t in self._region_trackers.values()
+             ),
+             "history_size": self._global_history.size,
+         }
+
+     # ── Internal ─────────────────────────────────────────────────
+
+     def _get_tracker(self, label: str) -> RegionLineTracker:
+         """Get or create a line tracker for the given region label."""
+         if label not in self._region_trackers:
+             self._region_trackers[label] = RegionLineTracker(
+                 similarity=self._cfg.line_similarity,
+                 line_ttl=self._cfg.line_ttl,
+             )
+         return self._region_trackers[label]
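The debounce contract above (the same text must persist for `stabilize_time` before it is emitted) is easiest to see with an injectable clock. This is an illustrative sketch, not the archived `DebouncedEmitter` class; the `clock` parameter is an assumption added here so the behavior is deterministic:

```python
class Debouncer:
    def __init__(self, stabilize: float, clock):
        self.stabilize = stabilize
        self.clock = clock          # injectable time source for testing
        self.pending = None
        self.since = 0.0

    def feed(self, text: str):
        now = self.clock()
        if self.pending != text:    # new or changed text resets the timer
            self.pending, self.since = text, now
            return None
        if now - self.since >= self.stabilize:
            out, self.pending = self.pending, None
            return out
        return None                 # unchanged, but not yet stable

t = [0.0]
d = Debouncer(stabilize=1.0, clock=lambda: t[0])
assert d.feed("Hel") is None        # typewriter: text still appearing
t[0] = 0.5
assert d.feed("Hello") is None      # text changed, so the timer restarts
t[0] = 1.6
assert d.feed("Hello") == "Hello"   # stable for >= 1.0s, so it is emitted
```

The production class uses `time.monotonic()` directly; injecting the clock is purely a testing convenience.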
_archive/hooks/hook_decrypt.py ADDED
@@ -0,0 +1,344 @@
+ """
+ Hook BCryptDecrypt using ctypes in-process hooking via DLL detour.
+ Instead of Frida, we directly hook BCryptDecrypt's IAT entry in oneocr.dll.
+ """
+ import ctypes
+ import ctypes.wintypes as wt
+ from ctypes import (
+     c_int64, c_char_p, c_ubyte, POINTER, byref, Structure,
+     c_void_p, c_ulong, c_int32, WINFUNCTYPE, CFUNCTYPE, c_uint8
+ )
+ import os
+ import sys
+ import struct
+ from pathlib import Path
+
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_dump")
+ OUTPUT_DIR.mkdir(exist_ok=True)
+
+ DLL_DIR = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data"
+ MODEL_PATH = os.path.join(DLL_DIR, "oneocr.onemodel")
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+
+ # ── Globals to collect intercepted data ──
+ intercepted_calls = []
+ decrypt_call_num = 0
+
+ # ── BCryptDecrypt signature ──
+ # NTSTATUS BCryptDecrypt(BCRYPT_KEY_HANDLE, PUCHAR pbInput, ULONG cbInput,
+ #                        VOID* pPadding, PUCHAR pbIV, ULONG cbIV, PUCHAR pbOutput,
+ #                        ULONG cbOutput, ULONG* pcbResult, ULONG dwFlags)
+
+ BCRYPT_DECRYPT_TYPE = WINFUNCTYPE(
+     c_ulong,           # NTSTATUS return
+     c_void_p,          # hKey
+     c_void_p,          # pbInput
+     c_ulong,           # cbInput
+     c_void_p,          # pPaddingInfo
+     c_void_p,          # pbIV
+     c_ulong,           # cbIV
+     c_void_p,          # pbOutput
+     c_ulong,           # cbOutput
+     POINTER(c_ulong),  # pcbResult
+     c_ulong,           # dwFlags
+ )
+
+ # Store original function
+ original_bcrypt_decrypt = None
+
+
+ def hooked_bcrypt_decrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
+                           pbOutput, cbOutput, pcbResult, dwFlags):
+     """Our hook that intercepts BCryptDecrypt calls."""
+     global decrypt_call_num
+
+     call_num = decrypt_call_num
+     decrypt_call_num += 1
+
+     # Read IV before the call (it may be modified)
+     iv_before = None
+     if pbIV and cbIV > 0:
+         try:
+             iv_before = ctypes.string_at(pbIV, cbIV)
+         except Exception:
+             pass
+
+     # Read encrypted input BEFORE the call
+     encrypted_input = None
+     if pbInput and cbInput > 0:
+         try:
+             encrypted_input = ctypes.string_at(pbInput, min(cbInput, 64))
+         except Exception:
+             pass
+
+     # Call original
+     status = original_bcrypt_decrypt(hKey, pbInput, cbInput, pPadding,
+                                      pbIV, cbIV, pbOutput, cbOutput,
+                                      pcbResult, dwFlags)
+
+     # Get result size
+     result_size = 0
+     if pcbResult:
+         result_size = pcbResult[0]
+
+     # Read IV after (CFB mode modifies the IV)
+     iv_after = None
+     if pbIV and cbIV > 0:
+         try:
+             iv_after = ctypes.string_at(pbIV, cbIV)
+         except Exception:
+             pass
+
+     info = {
+         'call': call_num,
+         'status': status,
+         'cbInput': cbInput,
+         'cbIV': cbIV,
+         'cbOutput': result_size,
+         'dwFlags': dwFlags,
+         'iv_before': iv_before.hex() if iv_before else None,
+         'iv_after': iv_after.hex() if iv_after else None,
+     }
+
+     print(f"[BCryptDecrypt #{call_num}] status={status:#x} "
+           f"in={cbInput} out={result_size} iv_len={cbIV} flags={dwFlags}")
+     if encrypted_input:
+         print(f"  Encrypted input[:32]: {encrypted_input[:32].hex()}")
+         print(f"  pbInput addr: {pbInput:#x}")
+     if iv_before:
+         print(f"  IV before: {iv_before.hex()}")
+     if iv_after and iv_after != iv_before:
+         print(f"  IV after:  {iv_after.hex()}")
+
+     # Save decrypted data
+     if status == 0 and result_size > 0 and pbOutput:
+         try:
+             decrypted = ctypes.string_at(pbOutput, result_size)
+
+             # Check for magic number
+             if len(decrypted) >= 4:
+                 magic = struct.unpack('<I', decrypted[:4])[0]
+                 info['magic'] = magic
+                 print(f"  Magic: {magic} | First 32 bytes: {decrypted[:32].hex()}")
+
+                 if magic == 1:
+                     print("  *** MAGIC NUMBER == 1 FOUND! ***")
+
+             # Save to file
+             fname = OUTPUT_DIR / f"decrypt_{call_num}_in{cbInput}_out{result_size}.bin"
+             fname.write_bytes(decrypted)
+             print(f"  -> Saved: {fname.name} ({result_size:,} bytes)")
+
+         except Exception as e:
+             print(f"  Error reading output: {e}")
+
+     intercepted_calls.append(info)
+     return status
+
140
+ """
141
+ Hook a function by patching the Import Address Table (IAT) of a DLL.
142
+ Returns the original function pointer.
143
+ """
144
+ import pefile
145
+
146
+ # Get the DLL file path
147
+ kernel32 = ctypes.windll.kernel32
148
+ buf = ctypes.create_unicode_buffer(260)
149
+ h = ctypes.c_void_p(dll_handle)
150
+ kernel32.GetModuleFileNameW(h, buf, 260)
151
+ dll_path = buf.value
152
+
153
+ print(f"Analyzing IAT of: {dll_path}")
154
+
155
+ pe = pefile.PE(dll_path)
156
+
157
+ # Find the import
158
+ base_addr = dll_handle
159
+ if hasattr(dll_handle, '_handle'):
160
+ base_addr = dll_handle._handle
161
+
162
+ for entry in pe.DIRECTORY_ENTRY_IMPORT:
163
+ import_name = entry.dll.decode('utf-8', errors='ignore').lower()
164
+ if target_dll_name.lower() not in import_name:
165
+ continue
166
+
167
+ for imp in entry.imports:
168
+ if imp.name and imp.name.decode('utf-8', errors='ignore') == target_func_name:
169
+ # Found it! The IAT entry is at base_addr + imp.address - pe.OPTIONAL_HEADER.ImageBase
170
+ iat_rva = imp.address - pe.OPTIONAL_HEADER.ImageBase
171
+ iat_addr = base_addr + iat_rva
172
+
173
+ print(f"Found {target_func_name} in IAT at RVA={iat_rva:#x}, "
174
+ f"VA={iat_addr:#x}")
175
+
176
+ # Read current value (original function pointer)
177
+ original_ptr = ctypes.c_void_p()
178
+ ctypes.memmove(ctypes.byref(original_ptr), iat_addr, 8)
179
+ print(f"Original function pointer: {original_ptr.value:#x}")
180
+
181
+ # Create callback
182
+ callback = BCRYPT_DECRYPT_TYPE(hook_func)
183
+ callback_ptr = ctypes.cast(callback, c_void_p).value
184
+
185
+ # Make IAT page writable
186
+ old_protect = c_ulong()
187
+ PAGE_READWRITE = 0x04
188
+ kernel32.VirtualProtect(
189
+ ctypes.c_void_p(iat_addr), 8,
190
+ PAGE_READWRITE, ctypes.byref(old_protect)
191
+ )
192
+
193
+ # Patch IAT
194
+ new_ptr = ctypes.c_void_p(callback_ptr)
195
+ ctypes.memmove(iat_addr, ctypes.byref(new_ptr), 8)
196
+
197
+ # Restore protection
198
+ kernel32.VirtualProtect(
199
+ ctypes.c_void_p(iat_addr), 8,
200
+ old_protect.value, ctypes.byref(old_protect)
201
+ )
202
+
203
+ print(f"IAT patched! New function pointer: {callback_ptr:#x}")
204
+
205
+ # Create callable from original
206
+ original_func = BCRYPT_DECRYPT_TYPE(original_ptr.value)
207
+
208
+ pe.close()
209
+ return original_func, callback # Return both to prevent GC
210
+
211
+ pe.close()
212
+ return None, None
213
+
214
+
215
+ def main():
216
+ global original_bcrypt_decrypt
217
+
218
+ print("=" * 70)
219
+ print("IN-PROCESS BCryptDecrypt HOOKING")
220
+ print("=" * 70)
221
+
222
+ # Load DLL
223
+ kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
224
+ kernel32.SetDllDirectoryW(DLL_DIR)
225
+
226
+ dll_path = os.path.join(DLL_DIR, "oneocr.dll")
227
+ print(f"Loading: {dll_path}")
228
+ dll = ctypes.WinDLL(dll_path)
229
+
230
+ # Setup function types
231
+ dll.CreateOcrInitOptions.argtypes = [POINTER(c_int64)]
232
+ dll.CreateOcrInitOptions.restype = c_int64
233
+ dll.OcrInitOptionsSetUseModelDelayLoad.argtypes = [c_int64, c_ubyte]
234
+ dll.OcrInitOptionsSetUseModelDelayLoad.restype = c_int64
235
+ dll.CreateOcrPipeline.argtypes = [c_char_p, c_char_p, c_int64, POINTER(c_int64)]
236
+ dll.CreateOcrPipeline.restype = c_int64
237
+
238
+ # Try approach 1: Direct BCryptDecrypt function pointer replacement
239
+ print("\n--- Setting up BCryptDecrypt hook ---")
240
+
241
+ # Get the real BCryptDecrypt
242
+ bcrypt_dll = ctypes.WinDLL("bcrypt")
243
+ real_decrypt_addr = ctypes.cast(
244
+ bcrypt_dll.BCryptDecrypt, c_void_p
245
+ ).value
246
+ print(f"Real BCryptDecrypt address: {real_decrypt_addr:#x}")
247
+
248
+ # Instead of IAT patching, let's use a simpler approach:
249
+ # We'll call BCryptDecrypt ourselves to first get a "sizing" call,
250
+ # then intercept the actual decrypt.
251
+
252
+ # Actually, the simplest approach: use a manual detour
253
+ # But let's try IAT patching first if pefile is available
254
+ try:
255
+ import pefile
256
+ print("pefile available, trying IAT hook...")
257
+
258
+ original_bcrypt_decrypt_func, callback_ref = hook_iat(
259
+ dll._handle, 'bcrypt', 'BCryptDecrypt', hooked_bcrypt_decrypt
260
+ )
261
+
262
+ if original_bcrypt_decrypt_func:
263
+ original_bcrypt_decrypt = original_bcrypt_decrypt_func
264
+ print("IAT hook installed successfully!")
265
+ else:
266
+ raise Exception("IAT hook failed - function not found in imports")
267
+
268
+ except ImportError:
269
+ print("pefile not available, installing...")
270
+ os.system("uv pip install pefile")
271
+ import pefile
272
+
273
+ original_bcrypt_decrypt_func, callback_ref = hook_iat(
274
+ dll._handle, 'bcrypt', 'BCryptDecrypt', hooked_bcrypt_decrypt
275
+ )
276
+
277
+ if original_bcrypt_decrypt_func:
278
+ original_bcrypt_decrypt = original_bcrypt_decrypt_func
279
+ else:
280
+ print("ERROR: Could not hook BCryptDecrypt")
281
+ return
282
+
283
+ # Now create the pipeline - this will trigger decryption via our hook
284
+ print("\n--- Creating OCR Pipeline (will trigger BCryptDecrypt) ---")
285
+
286
+ init_options = c_int64()
287
+ ret = dll.CreateOcrInitOptions(byref(init_options))
288
+ print(f"CreateOcrInitOptions: {ret}")
289
+
290
+ ret = dll.OcrInitOptionsSetUseModelDelayLoad(init_options, 0)
291
+ print(f"SetUseModelDelayLoad: {ret}")
292
+
293
+ pipeline = c_int64()
294
+ model_buf = ctypes.create_string_buffer(MODEL_PATH.encode())
295
+ key_buf = ctypes.create_string_buffer(KEY)
296
+
297
+ print(f"\nCalling CreateOcrPipeline...")
298
+ print(f"Model: {MODEL_PATH}")
299
+ print(f"Key: {KEY}")
300
+ print()
301
+
302
+ ret = dll.CreateOcrPipeline(model_buf, key_buf, init_options, byref(pipeline))
303
+
304
+ print(f"\nCreateOcrPipeline returned: {ret}")
305
+ print(f"Pipeline handle: {pipeline.value}")
306
+
307
+ # Summary
308
+ print()
309
+ print("=" * 70)
310
+ print("SUMMARY")
311
+ print("=" * 70)
312
+ print(f"Total BCryptDecrypt calls intercepted: {len(intercepted_calls)}")
313
+
314
+ magic_1_files = []
315
+ for info in intercepted_calls:
316
+ if info.get('magic') == 1:
317
+ magic_1_files.append(info)
318
+
319
+ if magic_1_files:
320
+ print(f"\n*** Found {len(magic_1_files)} calls with magic_number == 1! ***")
321
+ for info in magic_1_files:
322
+ print(f" Call #{info['call']}: input={info['cbInput']:,}, "
323
+ f"output={info['cbOutput']:,}")
324
+
325
+ # List saved files
326
+ if OUTPUT_DIR.exists():
327
+ files = sorted(OUTPUT_DIR.glob("decrypt_*.bin"))
328
+ if files:
329
+ print(f"\nSaved {len(files)} decrypted buffers:")
330
+ total = 0
331
+ for f in files:
332
+ sz = f.stat().st_size
333
+ total += sz
334
+ with open(f, 'rb') as fh: header = fh.read(4)
335
+ magic = struct.unpack('<I', header)[0] if len(header) >= 4 else -1
336
+ marker = " *** MAGIC=1 ***" if magic == 1 else ""
337
+ print(f" {f.name}: {sz:,} bytes (magic={magic}){marker}")
338
+ print(f"Total: {total:,} bytes ({total/1024/1024:.1f} MB)")
339
+
340
+ print("\nDone!")
341
+
342
+
343
+ if __name__ == '__main__':
344
+ main()
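The summary step above keys off the first little-endian dword of each dumped buffer to spot `magic == 1` chunks. That check can be reproduced standalone (a minimal sketch, not taken from the original script):

```python
import struct

def buffer_magic(data: bytes) -> int:
    """Return the little-endian uint32 at the start of a dumped buffer,
    or -1 when the buffer is too short to hold one."""
    if len(data) < 4:
        return -1
    return struct.unpack('<I', data[:4])[0]

# A decrypted chunk whose first dword is 1, mimicking the "magic=1" case
chunk = struct.pack('<I', 1) + b'payload'
print(buffer_magic(chunk))    # -> 1
print(buffer_magic(b'\xff'))  # -> -1
```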
_archive/hooks/hook_full_bcrypt.py ADDED
@@ -0,0 +1,441 @@
1
+ """
2
+ Extended BCrypt hook - intercepts ALL BCrypt functions to capture the full
3
+ crypto setup: algorithm provider, properties, and actual key material.
4
+ """
5
+ import ctypes
6
+ import ctypes.wintypes as wt
7
+ from ctypes import (
8
+ c_int64, c_char_p, c_ubyte, POINTER, byref, Structure,
9
+ c_void_p, c_ulong, c_int32, WINFUNCTYPE, CFUNCTYPE, c_uint8
10
+ )
11
+ import os
12
+ import sys
13
+ import struct
14
+ from pathlib import Path
15
+
16
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_dump")
17
+ OUTPUT_DIR.mkdir(exist_ok=True)
18
+
19
+ DLL_DIR = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data"
20
+ MODEL_PATH = os.path.join(DLL_DIR, "oneocr.onemodel")
21
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
22
+
23
+ # ── Globals ──
24
+ intercepted_bcrypt = []
25
+ decrypt_call_num = 0
26
+
27
+ # ── Function types ──
28
+ # BCryptDecrypt
29
+ BCRYPT_DECRYPT_TYPE = WINFUNCTYPE(
30
+ c_ulong, c_void_p, c_void_p, c_ulong, c_void_p,
31
+ c_void_p, c_ulong, c_void_p, c_ulong, POINTER(c_ulong), c_ulong
32
+ )
33
+
34
+ # BCryptOpenAlgorithmProvider(phAlgorithm, pszAlgId, pszImplementation, dwFlags)
35
+ BCRYPT_OPEN_ALG_TYPE = WINFUNCTYPE(
36
+ c_ulong, POINTER(c_void_p), c_void_p, c_void_p, c_ulong
37
+ )
38
+
39
+ # BCryptSetProperty(hObject, pszProperty, pbInput, cbInput, dwFlags)
40
+ BCRYPT_SET_PROP_TYPE = WINFUNCTYPE(
41
+ c_ulong, c_void_p, c_void_p, c_void_p, c_ulong, c_ulong
42
+ )
43
+
44
+ # BCryptGetProperty(hObject, pszProperty, pbOutput, cbOutput, pcbResult, dwFlags)
45
+ BCRYPT_GET_PROP_TYPE = WINFUNCTYPE(
46
+ c_ulong, c_void_p, c_void_p, c_void_p, c_ulong, POINTER(c_ulong), c_ulong
47
+ )
48
+
49
+ # BCryptGenerateSymmetricKey(hAlgorithm, phKey, pbKeyObject, cbKeyObject,
50
+ # pbSecret, cbSecret, dwFlags)
51
+ BCRYPT_GEN_KEY_TYPE = WINFUNCTYPE(
52
+ c_ulong, c_void_p, POINTER(c_void_p), c_void_p, c_ulong,
53
+ c_void_p, c_ulong, c_ulong
54
+ )
55
+
56
+ # BCryptImportKey(hAlgorithm, hImportKey, pszBlobType, phKey, pbKeyObject,
57
+ # cbKeyObject, pbInput, cbInput, dwFlags)
58
+ BCRYPT_IMPORT_KEY_TYPE = WINFUNCTYPE(
59
+ c_ulong, c_void_p, c_void_p, c_void_p, POINTER(c_void_p),
60
+ c_void_p, c_ulong, c_void_p, c_ulong, c_ulong
61
+ )
62
+
63
+ # BCryptEncrypt - same signature as BCryptDecrypt
64
+ BCRYPT_ENCRYPT_TYPE = WINFUNCTYPE(
65
+ c_ulong, c_void_p, c_void_p, c_ulong, c_void_p,
66
+ c_void_p, c_ulong, c_void_p, c_ulong, POINTER(c_ulong), c_ulong
67
+ )
68
+
69
+ # Store originals
70
+ orig_decrypt = None
71
+ orig_open_alg = None
72
+ orig_set_prop = None
73
+ orig_get_prop = None
74
+ orig_gen_key = None
75
+ orig_import_key = None
76
+ orig_encrypt = None
77
+
78
+ # Keep callback references alive
79
+ _callback_refs = []
80
+
81
+ # Track key handles -> key material
82
+ key_handle_to_material = {}
83
+ alg_handle_to_name = {}
84
+
85
+
86
+ def read_wstr(ptr):
87
+ """Read a null-terminated UTF-16LE string from a pointer."""
88
+ if not ptr:
89
+ return "<null>"
90
+ try:
91
+ buf = ctypes.wstring_at(ptr)
92
+ return buf
93
+ except Exception:
94
+ return "<err>"
95
+
96
+
97
+ def hooked_open_alg(phAlgorithm, pszAlgId, pszImplementation, dwFlags):
98
+ alg_name = read_wstr(pszAlgId)
99
+ impl = read_wstr(pszImplementation)
100
+ status = orig_open_alg(phAlgorithm, pszAlgId, pszImplementation, dwFlags)
101
+ handle = phAlgorithm[0] if phAlgorithm else None
102
+ if handle:
103
+ alg_handle_to_name[handle.value if hasattr(handle, 'value') else handle] = alg_name
104
+ print(f"[BCryptOpenAlgorithmProvider] alg={alg_name!r} impl={impl!r} "
105
+ f"flags={dwFlags:#x} -> handle={handle} status={status:#010x}")
106
+ return status
107
+
108
+
109
+ def hooked_set_prop(hObject, pszProperty, pbInput, cbInput, dwFlags):
110
+ prop_name = read_wstr(pszProperty)
111
+
112
+ # Read property value
113
+ value_repr = ""
114
+ if pbInput and cbInput > 0:
115
+ try:
116
+ raw = ctypes.string_at(pbInput, cbInput)
117
+ # Try as wstring first (for chaining mode etc)
118
+ try:
119
+ value_repr = raw.decode('utf-16-le').rstrip('\x00')
120
+ except Exception:
121
+ value_repr = raw.hex()
122
+ # Also try as DWORD for numeric properties
123
+ if cbInput == 4:
124
+ dword_val = struct.unpack('<I', raw)[0]
125
+ value_repr = f"{value_repr} (dword={dword_val})"
126
+ except Exception:
127
+ value_repr = "<err>"
128
+
129
+ status = orig_set_prop(hObject, pszProperty, pbInput, cbInput, dwFlags)
130
+ h = hObject.value if hasattr(hObject, 'value') else hObject
131
+ alg = alg_handle_to_name.get(h, "?")
132
+ print(f"[BCryptSetProperty] obj={h:#x} ({alg}) prop={prop_name!r} "
133
+ f"value={value_repr!r} size={cbInput} flags={dwFlags:#x} "
134
+ f"-> status={status:#010x}")
135
+ return status
136
+
137
+
138
+ def hooked_get_prop(hObject, pszProperty, pbOutput, cbOutput, pcbResult, dwFlags):
139
+ prop_name = read_wstr(pszProperty)
140
+ status = orig_get_prop(hObject, pszProperty, pbOutput, cbOutput, pcbResult, dwFlags)
141
+
142
+ result_size = pcbResult[0] if pcbResult else 0
143
+ value_repr = ""
144
+ if status == 0 and pbOutput and result_size > 0:
145
+ try:
146
+ raw = ctypes.string_at(pbOutput, result_size)
147
+ if result_size == 4:
148
+ value_repr = f"dword={struct.unpack('<I', raw)[0]}"
149
+ elif result_size <= 64:
150
+ try:
151
+ value_repr = raw.decode('utf-16-le').rstrip('\x00')
152
+ except Exception:
153
+ value_repr = raw.hex()
154
+ else:
155
+ value_repr = f"{result_size} bytes"
156
+ except Exception:
157
+ pass
158
+
159
+ print(f"[BCryptGetProperty] prop={prop_name!r} -> {value_repr!r} "
160
+ f"({result_size} bytes) status={status:#010x}")
161
+ return status
162
+
163
+
164
+ def hooked_gen_key(hAlgorithm, phKey, pbKeyObject, cbKeyObject,
165
+ pbSecret, cbSecret, dwFlags):
166
+ # Capture the secret key material BEFORE the call
167
+ secret = None
168
+ if pbSecret and cbSecret > 0:
169
+ try:
170
+ secret = ctypes.string_at(pbSecret, cbSecret)
171
+ except Exception:
172
+ pass
173
+
174
+ status = orig_gen_key(hAlgorithm, phKey, pbKeyObject, cbKeyObject,
175
+ pbSecret, cbSecret, dwFlags)
176
+
177
+ key_handle = phKey[0] if phKey else None
178
+ alg_h = hAlgorithm.value if hasattr(hAlgorithm, 'value') else hAlgorithm
179
+ alg = alg_handle_to_name.get(alg_h, "?")
180
+
181
+ print(f"[BCryptGenerateSymmetricKey] alg={alg} secret_len={cbSecret} "
182
+ f"keyObjSize={cbKeyObject} flags={dwFlags:#x} "
183
+ f"-> key={key_handle} status={status:#010x}")
184
+ if secret:
185
+ print(f" Secret bytes: {secret.hex()}")
186
+ print(f" Secret ASCII: {secret!r}")
187
+ if key_handle:
188
+ kh = key_handle.value if hasattr(key_handle, 'value') else key_handle
189
+ key_handle_to_material[kh] = secret
190
+
191
+ return status
192
+
193
+
194
+ def hooked_import_key(hAlgorithm, hImportKey, pszBlobType, phKey,
195
+ pbKeyObject, cbKeyObject, pbInput, cbInput, dwFlags):
196
+ blob_type = read_wstr(pszBlobType)
197
+
198
+ blob_data = None
199
+ if pbInput and cbInput > 0:
200
+ try:
201
+ blob_data = ctypes.string_at(pbInput, cbInput)
202
+ except Exception:
203
+ pass
204
+
205
+ status = orig_import_key(hAlgorithm, hImportKey, pszBlobType, phKey,
206
+ pbKeyObject, cbKeyObject, pbInput, cbInput, dwFlags)
207
+
208
+ key_handle = phKey[0] if phKey else None
209
+ print(f"[BCryptImportKey] blob_type={blob_type!r} blob_size={cbInput} "
210
+ f"flags={dwFlags:#x} -> key={key_handle} status={status:#010x}")
211
+ if blob_data:
212
+ print(f" Blob: {blob_data.hex()}")
213
+ if cbInput > 12:
214
+ magic, ver, key_len = struct.unpack('<III', blob_data[:12])
215
+ key_bytes = blob_data[12:12+key_len]
216
+ print(f" Magic={magic:#x} Ver={ver} KeyLen={key_len}")
217
+ print(f" Key: {key_bytes.hex()}")
218
+ print(f" Key ASCII: {key_bytes!r}")
219
+
220
+ return status
221
+
222
+
223
+ def hooked_encrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
224
+ pbOutput, cbOutput, pcbResult, dwFlags):
225
+ status = orig_encrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
226
+ pbOutput, cbOutput, pcbResult, dwFlags)
227
+ result_size = pcbResult[0] if pcbResult else 0
228
+ print(f"[BCryptEncrypt] in={cbInput} out={result_size} iv_len={cbIV} "
229
+ f"flags={dwFlags:#x} status={status:#010x}")
230
+ return status
231
+
232
+
233
+ def hooked_bcrypt_decrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
234
+ pbOutput, cbOutput, pcbResult, dwFlags):
235
+ global decrypt_call_num
236
+ call_num = decrypt_call_num
237
+ decrypt_call_num += 1
238
+
239
+ iv_before = None
240
+ if pbIV and cbIV > 0:
241
+ try:
242
+ iv_before = ctypes.string_at(pbIV, cbIV)
243
+ except Exception:
244
+ pass
245
+
246
+ encrypted_input = None
247
+ if pbInput and cbInput > 0:
248
+ try:
249
+ encrypted_input = ctypes.string_at(pbInput, min(cbInput, 64))
250
+ except:
251
+ pass
252
+
253
+ status = orig_decrypt(hKey, pbInput, cbInput, pPadding,
254
+ pbIV, cbIV, pbOutput, cbOutput, pcbResult, dwFlags)
255
+
256
+ result_size = pcbResult[0] if pcbResult else 0
257
+
258
+ iv_after = None
259
+ if pbIV and cbIV > 0:
260
+ try:
261
+ iv_after = ctypes.string_at(pbIV, cbIV)
262
+ except Exception:
263
+ pass
264
+
265
+ # Check if we know the key material for this handle
266
+ kh = hKey.value if hasattr(hKey, 'value') else hKey
267
+ known_key = key_handle_to_material.get(kh)
268
+
269
+ print(f"[BCryptDecrypt #{call_num}] status={status:#x} "
270
+ f"in={cbInput} out={result_size} iv_len={cbIV} flags={dwFlags}")
271
+ if known_key:
272
+ print(f" Key material: {known_key.hex()}")
273
+ if encrypted_input:
274
+ print(f" Enc input[:32]: {encrypted_input[:32].hex()}")
275
+ if iv_before:
276
+ print(f" IV before: {iv_before.hex()}")
277
+ if iv_after and iv_after != iv_before:
278
+ print(f" IV after: {iv_after.hex()}")
279
+
280
+ if status == 0 and result_size > 0 and pbOutput:
281
+ try:
282
+ decrypted = ctypes.string_at(pbOutput, result_size)
283
+ print(f" Decrypted[:32]: {decrypted[:32].hex()}")
284
+ fname = OUTPUT_DIR / f"decrypt_{call_num}_in{cbInput}_out{result_size}.bin"
285
+ fname.write_bytes(decrypted)
286
+ print(f" -> Saved: {fname.name}")
287
+ except Exception as e:
288
+ print(f" Error: {e}")
289
+
290
+ return status
291
+
292
+
293
+ def hook_iat_generic(dll_handle, target_dll_name, target_func_name, hook_func, func_type):
294
+ """Hook a function by patching the IAT. Returns (original_func, callback_ref)."""
295
+ import pefile
296
+
297
+ kernel32 = ctypes.windll.kernel32
298
+ buf = ctypes.create_unicode_buffer(260)
299
+ h = ctypes.c_void_p(dll_handle)
300
+ kernel32.GetModuleFileNameW(h, buf, 260)
301
+ dll_path = buf.value
302
+
303
+ pe = pefile.PE(dll_path)
304
+ base_addr = dll_handle
305
+
306
+ for entry in pe.DIRECTORY_ENTRY_IMPORT:
307
+ import_name = entry.dll.decode('utf-8', errors='ignore').lower()
308
+ if target_dll_name.lower() not in import_name:
309
+ continue
310
+
311
+ for imp in entry.imports:
312
+ if imp.name and imp.name.decode('utf-8', errors='ignore') == target_func_name:
313
+ iat_rva = imp.address - pe.OPTIONAL_HEADER.ImageBase
314
+ iat_addr = base_addr + iat_rva
315
+
316
+ original_ptr = ctypes.c_void_p()
317
+ ctypes.memmove(ctypes.byref(original_ptr), iat_addr, 8)
318
+
319
+ callback = func_type(hook_func)
320
+ callback_ptr = ctypes.cast(callback, c_void_p).value
321
+
322
+ old_protect = c_ulong()
323
+ kernel32.VirtualProtect(ctypes.c_void_p(iat_addr), 8, 0x04, ctypes.byref(old_protect))
324
+ new_ptr = ctypes.c_void_p(callback_ptr)
325
+ ctypes.memmove(iat_addr, ctypes.byref(new_ptr), 8)
326
+ kernel32.VirtualProtect(ctypes.c_void_p(iat_addr), 8, old_protect.value, ctypes.byref(old_protect))
327
+
328
+ original_func = func_type(original_ptr.value)
329
+ pe.close()
330
+ print(f" Hooked {target_func_name} at IAT RVA={iat_rva:#x}")
331
+ return original_func, callback
332
+
333
+ pe.close()
334
+ return None, None
335
+
336
+
337
+ def main():
338
+ global orig_decrypt, orig_open_alg, orig_set_prop, orig_get_prop
339
+ global orig_gen_key, orig_import_key, orig_encrypt
340
+
341
+ print("=" * 70)
342
+ print("EXTENDED BCrypt HOOK - Capturing ALL crypto setup")
343
+ print("=" * 70)
344
+
345
+ # Clean dump dir
346
+ for f in OUTPUT_DIR.glob("decrypt_*.bin"):
347
+ f.unlink()
348
+
349
+ kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
350
+ kernel32.SetDllDirectoryW(DLL_DIR)
351
+
352
+ dll_path = os.path.join(DLL_DIR, "oneocr.dll")
353
+ print(f"Loading: {dll_path}")
354
+ dll = ctypes.WinDLL(dll_path)
355
+
356
+ dll.CreateOcrInitOptions.argtypes = [POINTER(c_int64)]
357
+ dll.CreateOcrInitOptions.restype = c_int64
358
+ dll.OcrInitOptionsSetUseModelDelayLoad.argtypes = [c_int64, c_ubyte]
359
+ dll.OcrInitOptionsSetUseModelDelayLoad.restype = c_int64
360
+ dll.CreateOcrPipeline.argtypes = [c_char_p, c_char_p, c_int64, POINTER(c_int64)]
361
+ dll.CreateOcrPipeline.restype = c_int64
362
+
363
+ import pefile # noqa
364
+
365
+ # Hook ALL BCrypt functions
366
+ hooks = [
367
+ ('bcrypt', 'BCryptOpenAlgorithmProvider', hooked_open_alg, BCRYPT_OPEN_ALG_TYPE),
368
+ ('bcrypt', 'BCryptSetProperty', hooked_set_prop, BCRYPT_SET_PROP_TYPE),
369
+ ('bcrypt', 'BCryptGetProperty', hooked_get_prop, BCRYPT_GET_PROP_TYPE),
370
+ ('bcrypt', 'BCryptGenerateSymmetricKey', hooked_gen_key, BCRYPT_GEN_KEY_TYPE),
371
+ ('bcrypt', 'BCryptImportKey', hooked_import_key, BCRYPT_IMPORT_KEY_TYPE),
372
+ ('bcrypt', 'BCryptEncrypt', hooked_encrypt, BCRYPT_ENCRYPT_TYPE),
373
+ ('bcrypt', 'BCryptDecrypt', hooked_bcrypt_decrypt, BCRYPT_DECRYPT_TYPE),
374
+ ]
375
+
376
+ originals = {}
377
+ print("\n--- Installing IAT hooks ---")
378
+ for target_dll, func_name, hook_func, func_type in hooks:
379
+ orig, cb = hook_iat_generic(dll._handle, target_dll, func_name, hook_func, func_type)
380
+ if orig:
381
+ originals[func_name] = orig
382
+ _callback_refs.append(cb)
383
+ else:
384
+ print(f" WARNING: {func_name} not found in IAT (may not be imported)")
385
+
386
+ orig_open_alg = originals.get('BCryptOpenAlgorithmProvider')
387
+ orig_set_prop = originals.get('BCryptSetProperty')
388
+ orig_get_prop = originals.get('BCryptGetProperty')
389
+ orig_gen_key = originals.get('BCryptGenerateSymmetricKey')
390
+ orig_import_key = originals.get('BCryptImportKey')
391
+ orig_encrypt = originals.get('BCryptEncrypt')
392
+ orig_decrypt = originals.get('BCryptDecrypt')
393
+
394
+ if not orig_decrypt:
395
+ print("FATAL: Could not hook BCryptDecrypt!")
396
+ return
397
+
398
+ print("\n--- Creating OCR Pipeline ---")
399
+ init_options = c_int64()
400
+ ret = dll.CreateOcrInitOptions(byref(init_options))
401
+ print(f"CreateOcrInitOptions: {ret}")
402
+
403
+ ret = dll.OcrInitOptionsSetUseModelDelayLoad(init_options, 0)
404
+ print(f"SetUseModelDelayLoad: {ret}")
405
+
406
+ pipeline = c_int64()
407
+ model_buf = ctypes.create_string_buffer(MODEL_PATH.encode())
408
+ key_buf = ctypes.create_string_buffer(KEY)
409
+
410
+ print(f"\nCalling CreateOcrPipeline...")
411
+ print(f"Model: {MODEL_PATH}")
412
+ print(f"Key: {KEY}")
413
+ print()
414
+
415
+ ret = dll.CreateOcrPipeline(model_buf, key_buf, init_options, byref(pipeline))
416
+
417
+ print(f"\nCreateOcrPipeline returned: {ret}")
418
+ print(f"Pipeline handle: {pipeline.value}")
419
+
420
+ # Summary
421
+ print()
422
+ print("=" * 70)
423
+ print("SUMMARY")
424
+ print("=" * 70)
425
+ print(f"Key handles tracked: {len(key_handle_to_material)}")
426
+ for kh, mat in key_handle_to_material.items():
427
+ print(f" Handle {kh:#x}: {mat.hex()}")
428
+ print(f" ASCII: {mat!r}")
429
+ print(f" Length: {len(mat)}")
430
+
431
+ files = sorted(OUTPUT_DIR.glob("decrypt_*.bin"))
432
+ if files:
433
+ print(f"\nSaved {len(files)} decrypted buffers")
434
+ total = sum(f.stat().st_size for f in files)
435
+ print(f"Total: {total:,} bytes ({total/1024/1024:.1f} MB)")
436
+
437
+ print("\nDone!")
438
+
439
+
440
+ if __name__ == '__main__':
441
+ main()
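`hooked_import_key` above assumes the blob starts with a 12-byte header of three little-endian dwords (magic, version, key length) followed by the raw key bytes, i.e. the `BCRYPT_KEY_DATA_BLOB` layout. A standalone sketch of that parse (the sample magic `0x4d42444b` is CNG's "KDBM" constant, not a value captured from oneocr.dll):

```python
import struct

def parse_key_blob(blob: bytes):
    """Split a BCRYPT_KEY_DATA_BLOB into (magic, version, key bytes):
    12-byte header of three little-endian uint32s, then the raw key."""
    magic, ver, key_len = struct.unpack('<III', blob[:12])
    return magic, ver, blob[12:12 + key_len]

# 0x4d42444b == BCRYPT_KEY_DATA_BLOB_MAGIC ("KDBM"), version 1, 16-byte key
blob = struct.pack('<III', 0x4d42444b, 1, 16) + bytes(range(16))
magic, ver, key = parse_key_blob(blob)
print(hex(magic), ver, key.hex())  # -> 0x4d42444b 1 000102030405060708090a0b0c0d0e0f
```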
_archive/hooks/hook_full_log.py ADDED
@@ -0,0 +1,265 @@
1
+ """
2
+ Full BCrypt hash hook - saves all hash inputs and AES keys to JSON for analysis.
3
+ """
4
+ import ctypes
5
+ from ctypes import (
6
+ c_int64, c_char_p, c_ubyte, POINTER, byref,
7
+ c_void_p, c_ulong, WINFUNCTYPE
8
+ )
9
+ import os
10
+ import struct
11
+ import json
12
+ from pathlib import Path
13
+
14
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_dump")
15
+ OUTPUT_DIR.mkdir(exist_ok=True)
16
+
17
+ DLL_DIR = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data"
18
+ MODEL_PATH = os.path.join(DLL_DIR, "oneocr.onemodel")
19
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
20
+
21
+ # Globals
22
+ decrypt_call_num = 0
23
+ _callback_refs = []
24
+ key_handle_to_material = {}
25
+ hash_handle_to_data = {}
26
+ alg_handle_to_name = {}
27
+
28
+ # Collect all crypto operations for JSON output
29
+ crypto_log = []
30
+
31
+ DECRYPT_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_void_p,
32
+ c_void_p, c_ulong, c_void_p, c_ulong, POINTER(c_ulong), c_ulong)
33
+ OPEN_ALG_T = WINFUNCTYPE(c_ulong, POINTER(c_void_p), c_void_p, c_void_p, c_ulong)
34
+ SET_PROP_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_void_p, c_ulong, c_ulong)
35
+ GEN_KEY_T = WINFUNCTYPE(c_ulong, c_void_p, POINTER(c_void_p), c_void_p, c_ulong,
36
+ c_void_p, c_ulong, c_ulong)
37
+ CREATE_HASH_T = WINFUNCTYPE(c_ulong, c_void_p, POINTER(c_void_p), c_void_p, c_ulong,
38
+ c_void_p, c_ulong, c_ulong)
39
+ HASH_DATA_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_ulong)
40
+ FINISH_HASH_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_ulong)
41
+ ENCRYPT_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_void_p,
42
+ c_void_p, c_ulong, c_void_p, c_ulong, POINTER(c_ulong), c_ulong)
43
+
44
+ orig = {}
45
+
46
+
47
+ def read_wstr(ptr):
48
+ if not ptr:
49
+ return "<null>"
50
+ try:
51
+ return ctypes.wstring_at(ptr)
52
+ except Exception:
53
+ return "<err>"
54
+
55
+
56
+ def hooked_open_alg(phAlgorithm, pszAlgId, pszImplementation, dwFlags):
57
+ alg_name = read_wstr(pszAlgId)
58
+ status = orig['OpenAlgorithmProvider'](phAlgorithm, pszAlgId, pszImplementation, dwFlags)
59
+ handle = phAlgorithm[0] if phAlgorithm else None
60
+ if handle:
61
+ h = handle.value if hasattr(handle, 'value') else handle
62
+ alg_handle_to_name[h] = alg_name
63
+ return status
64
+
65
+
66
+ def hooked_set_prop(hObject, pszProperty, pbInput, cbInput, dwFlags):
67
+ return orig['SetProperty'](hObject, pszProperty, pbInput, cbInput, dwFlags)
68
+
69
+
70
+ def hooked_create_hash(hAlgorithm, phHash, pbHashObject, cbHashObject,
71
+ pbSecret, cbSecret, dwFlags):
72
+ status = orig['CreateHash'](hAlgorithm, phHash, pbHashObject, cbHashObject,
73
+ pbSecret, cbSecret, dwFlags)
74
+ hash_handle = phHash[0] if phHash else None
75
+ hmac_key = None
76
+ if pbSecret and cbSecret > 0:
77
+ hmac_key = ctypes.string_at(pbSecret, cbSecret)
78
+ hh = hash_handle.value if hasattr(hash_handle, 'value') else hash_handle
79
+ hash_handle_to_data[hh] = {'hmac_key': hmac_key, 'data_chunks': []}
80
+ return status
81
+
82
+
83
+ def hooked_hash_data(hHash, pbInput, cbInput, dwFlags):
84
+ status = orig['HashData'](hHash, pbInput, cbInput, dwFlags)
85
+ hh = hHash.value if hasattr(hHash, 'value') else hHash
86
+ if pbInput and cbInput > 0:
87
+ data = ctypes.string_at(pbInput, cbInput)
88
+ if hh in hash_handle_to_data:
89
+ hash_handle_to_data[hh]['data_chunks'].append(data)
90
+ return status
91
+
92
+
93
+ def hooked_finish_hash(hHash, pbOutput, cbOutput, dwFlags):
94
+ status = orig['FinishHash'](hHash, pbOutput, cbOutput, dwFlags)
95
+ hh = hHash.value if hasattr(hHash, 'value') else hHash
96
+ output = None
97
+ if pbOutput and cbOutput > 0:
98
+ output = ctypes.string_at(pbOutput, cbOutput)
99
+ info = hash_handle_to_data.get(hh)
100
+ if info and output:
101
+ all_data = b"".join(info['data_chunks'])
102
+ crypto_log.append({
103
+ 'op': 'sha256',
104
+ 'input': all_data.hex(),
105
+ 'input_len': len(all_data),
106
+ 'output': output.hex(),
107
+ })
108
+ return status
109
+
110
+
111
+ def hooked_gen_key(hAlgorithm, phKey, pbKeyObject, cbKeyObject,
112
+ pbSecret, cbSecret, dwFlags):
113
+ secret = None
114
+ if pbSecret and cbSecret > 0:
115
+ secret = ctypes.string_at(pbSecret, cbSecret)
116
+ status = orig['GenerateSymmetricKey'](hAlgorithm, phKey, pbKeyObject, cbKeyObject,
117
+ pbSecret, cbSecret, dwFlags)
118
+ key_handle = phKey[0] if phKey else None
119
+ if key_handle and secret:
120
+ kh = key_handle.value if hasattr(key_handle, 'value') else key_handle
121
+ key_handle_to_material[kh] = secret
122
+ return status
123
+
124
+
125
+ def hooked_encrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
126
+ pbOutput, cbOutput, pcbResult, dwFlags):
127
+ status = orig['Encrypt'](hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
128
+ pbOutput, cbOutput, pcbResult, dwFlags)
129
+ result_size = pcbResult[0] if pcbResult else 0
130
+ if cbIV > 0:
131
+ iv = ctypes.string_at(pbIV, cbIV) if pbIV else None
132
+ enc_in = ctypes.string_at(pbInput, min(cbInput, 32)) if pbInput and cbInput > 0 else None
133
+ enc_out = ctypes.string_at(pbOutput, min(result_size, 32)) if pbOutput and result_size > 0 else None
134
+ kh = hKey.value if hasattr(hKey, 'value') else hKey
135
+ crypto_log.append({
136
+ 'op': 'encrypt',
+ 'iv': iv.hex() if iv else None,
137
+ 'input_size': cbInput,
138
+ 'output_size': result_size,
139
+ 'aes_key': key_handle_to_material.get(kh, b'').hex(),
140
+ 'input_preview': enc_in.hex() if enc_in else None,
141
+ 'output_preview': enc_out.hex() if enc_out else None,
142
+ })
143
+ return status
144
+
145
+
146
+ def hooked_decrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
147
+ pbOutput, cbOutput, pcbResult, dwFlags):
148
+ global decrypt_call_num
149
+ status = orig['Decrypt'](hKey, pbInput, cbInput, pPadding,
150
+ pbIV, cbIV, pbOutput, cbOutput, pcbResult, dwFlags)
151
+ result_size = pcbResult[0] if pcbResult else 0
152
+
153
+ if cbIV > 0:
154
+ call_num = decrypt_call_num
155
+ decrypt_call_num += 1
156
+ kh = hKey.value if hasattr(hKey, 'value') else hKey
157
+ aes_key = key_handle_to_material.get(kh, b'').hex()
158
+
159
+ dec_data = None
160
+ if status == 0 and result_size > 0 and pbOutput:
161
+ dec_data = ctypes.string_at(pbOutput, result_size)
162
+ fname = OUTPUT_DIR / f"decrypt_{call_num}_in{cbInput}_out{result_size}.bin"
163
+ fname.write_bytes(dec_data)
164
+
165
+ crypto_log.append({
166
+ 'op': 'decrypt',
167
+ 'call_num': call_num,
168
+ 'input_size': cbInput,
169
+ 'output_size': result_size,
170
+ 'aes_key': aes_key,
171
+ 'first_bytes': dec_data[:32].hex() if dec_data else None,
172
+ })
173
+
174
+ return status
175
+
176
+
177
+ def hook_iat(dll_handle, target_dll, func_name, hook_func, func_type):
178
+ import pefile
179
+ kernel32 = ctypes.windll.kernel32
180
+ buf = ctypes.create_unicode_buffer(260)
181
+ kernel32.GetModuleFileNameW(ctypes.c_void_p(dll_handle), buf, 260)
182
+ pe = pefile.PE(buf.value)
183
+ for entry in pe.DIRECTORY_ENTRY_IMPORT:
184
+ if target_dll.lower() not in entry.dll.decode('utf-8', errors='ignore').lower():
185
+ continue
186
+ for imp in entry.imports:
187
+ if imp.name and imp.name.decode('utf-8', errors='ignore') == func_name:
188
+ iat_rva = imp.address - pe.OPTIONAL_HEADER.ImageBase
189
+ iat_addr = dll_handle + iat_rva
190
+ original_ptr = ctypes.c_void_p()
191
+ ctypes.memmove(ctypes.byref(original_ptr), iat_addr, 8)
192
+ callback = func_type(hook_func)
193
+ callback_ptr = ctypes.cast(callback, c_void_p).value
194
+ old_protect = c_ulong()
195
+ kernel32.VirtualProtect(ctypes.c_void_p(iat_addr), 8, 0x04, byref(old_protect))
196
+ new_ptr = ctypes.c_void_p(callback_ptr)
197
+ ctypes.memmove(iat_addr, ctypes.byref(new_ptr), 8)
198
+ kernel32.VirtualProtect(ctypes.c_void_p(iat_addr), 8, old_protect.value, byref(old_protect))
199
+ original_func = func_type(original_ptr.value)
200
+ pe.close()
201
+ _callback_refs.append(callback)
202
+ return original_func
203
+ pe.close()
204
+ return None
205
+
206
+
207
+ def main():
208
+ print("BCrypt Full Hook - collecting all crypto operations to JSON...")
209
+
210
+ for f in OUTPUT_DIR.glob("decrypt_*.bin"):
211
+ f.unlink()
212
+
213
+ kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
214
+ kernel32.SetDllDirectoryW(DLL_DIR)
215
+ dll = ctypes.WinDLL(os.path.join(DLL_DIR, "oneocr.dll"))
216
+
217
+ dll.CreateOcrInitOptions.argtypes = [POINTER(c_int64)]
218
+ dll.CreateOcrInitOptions.restype = c_int64
219
+ dll.OcrInitOptionsSetUseModelDelayLoad.argtypes = [c_int64, c_ubyte]
220
+ dll.OcrInitOptionsSetUseModelDelayLoad.restype = c_int64
221
+ dll.CreateOcrPipeline.argtypes = [c_char_p, c_char_p, c_int64, POINTER(c_int64)]
222
+ dll.CreateOcrPipeline.restype = c_int64
223
+
224
+ import pefile # noqa
225
+
226
+ hooks = [
227
+ ('bcrypt', 'BCryptOpenAlgorithmProvider', hooked_open_alg, OPEN_ALG_T),
228
+ ('bcrypt', 'BCryptSetProperty', hooked_set_prop, SET_PROP_T),
229
+ ('bcrypt', 'BCryptCreateHash', hooked_create_hash, CREATE_HASH_T),
230
+ ('bcrypt', 'BCryptHashData', hooked_hash_data, HASH_DATA_T),
231
+ ('bcrypt', 'BCryptFinishHash', hooked_finish_hash, FINISH_HASH_T),
232
+ ('bcrypt', 'BCryptGenerateSymmetricKey', hooked_gen_key, GEN_KEY_T),
233
+ ('bcrypt', 'BCryptEncrypt', hooked_encrypt, ENCRYPT_T),
234
+ ('bcrypt', 'BCryptDecrypt', hooked_decrypt, DECRYPT_T),
235
+ ]
236
+
237
+ for target_dll, func_name, hook_func, func_type in hooks:
238
+ o = hook_iat(dll._handle, target_dll, func_name, hook_func, func_type)
239
+ if o:
240
+ orig[func_name.replace('BCrypt', '')] = o
241
+
242
+ init_options = c_int64()
243
+ dll.CreateOcrInitOptions(byref(init_options))
244
+ dll.OcrInitOptionsSetUseModelDelayLoad(init_options, 0)
245
+
246
+ pipeline = c_int64()
247
+ ret = dll.CreateOcrPipeline(
248
+ ctypes.create_string_buffer(MODEL_PATH.encode()),
249
+ ctypes.create_string_buffer(KEY),
250
+ init_options, byref(pipeline)
251
+ )
252
+
253
+ print(f"CreateOcrPipeline: {ret}")
254
+ print(f"Total crypto ops: {len(crypto_log)}")
255
+ print(f"Decrypted chunks: {decrypt_call_num}")
256
+
257
+ # Save crypto log
258
+ out_path = Path("temp/crypto_log.json")
259
+ out_path.parent.mkdir(exist_ok=True)
260
+ out_path.write_text(json.dumps(crypto_log, indent=2))
261
+ print(f"Saved crypto log to {out_path}")
262
+
263
+
264
+ if __name__ == '__main__':
265
+ main()
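Because `hooked_finish_hash` records each hash operation's full input and output as hex, the resulting `crypto_log.json` entries can be replayed offline to confirm they really are plain SHA-256 (a hypothetical verifier over the log shape this script emits, not part of the original code):

```python
import hashlib

def verify_sha256_ops(log_entries):
    """Re-hash each logged 'sha256' entry's input and check it
    reproduces the logged output digest."""
    ok = True
    for entry in log_entries:
        if entry.get('op') != 'sha256':
            continue  # skip encrypt/decrypt entries
        digest = hashlib.sha256(bytes.fromhex(entry['input'])).hexdigest()
        ok = ok and (digest == entry['output'])
    return ok

entry = {'op': 'sha256', 'input': b'secret'.hex(),
         'output': hashlib.sha256(b'secret').hexdigest()}
print(verify_sha256_ops([entry]))  # -> True
```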
_archive/hooks/hook_hash.py ADDED
@@ -0,0 +1,340 @@
+ """
+ Hook BCrypt hash functions (CreateHash, HashData, FinishHash) to discover
+ the key derivation scheme. Also hook GenerateSymmetricKey and BCryptDecrypt.
+ """
+ import ctypes
+ from ctypes import (
+     c_int64, c_char_p, c_ubyte, POINTER, byref,
+     c_void_p, c_ulong, WINFUNCTYPE
+ )
+ import os
+ import struct
+ from pathlib import Path
+
+ OUTPUT_DIR = Path(r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\frida_dump")
+ OUTPUT_DIR.mkdir(exist_ok=True)
+
+ DLL_DIR = r"c:\Users\MattyMroz\Desktop\PROJECTS\ONEOCR\ocr_data"
+ MODEL_PATH = os.path.join(DLL_DIR, "oneocr.onemodel")
+ KEY = b'kj)TGtrK>f]b[Piow.gU+nC@s""""""4'
+
+ # Globals
+ decrypt_call_num = 0
+ _callback_refs = []
+ key_handle_to_material = {}
+ hash_handle_to_data = {}  # track hash data per handle
+ alg_handle_to_name = {}
+
+ # ── Function types ──
+ DECRYPT_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_void_p,
+                         c_void_p, c_ulong, c_void_p, c_ulong, POINTER(c_ulong), c_ulong)
+ OPEN_ALG_T = WINFUNCTYPE(c_ulong, POINTER(c_void_p), c_void_p, c_void_p, c_ulong)
+ SET_PROP_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_void_p, c_ulong, c_ulong)
+ GEN_KEY_T = WINFUNCTYPE(c_ulong, c_void_p, POINTER(c_void_p), c_void_p, c_ulong,
+                         c_void_p, c_ulong, c_ulong)
+
+ # BCryptCreateHash(hAlgorithm, phHash, pbHashObject, cbHashObject,
+ #                  pbSecret, cbSecret, dwFlags)
+ CREATE_HASH_T = WINFUNCTYPE(c_ulong, c_void_p, POINTER(c_void_p), c_void_p, c_ulong,
+                             c_void_p, c_ulong, c_ulong)
+
+ # BCryptHashData(hHash, pbInput, cbInput, dwFlags)
+ HASH_DATA_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_ulong)
+
+ # BCryptFinishHash(hHash, pbOutput, cbOutput, dwFlags)
+ FINISH_HASH_T = WINFUNCTYPE(c_ulong, c_void_p, c_void_p, c_ulong, c_ulong)
+
+ # Originals
+ orig = {}
+
+
+ def read_wstr(ptr):
+     if not ptr:
+         return "<null>"
+     try:
+         return ctypes.wstring_at(ptr)
+     except:
+         return "<err>"
+
+
+ def hooked_open_alg(phAlgorithm, pszAlgId, pszImplementation, dwFlags):
+     alg_name = read_wstr(pszAlgId)
+     status = orig['OpenAlgorithmProvider'](phAlgorithm, pszAlgId, pszImplementation, dwFlags)
+     handle = phAlgorithm[0] if phAlgorithm else None
+     if handle:
+         h = handle.value if hasattr(handle, 'value') else handle
+         alg_handle_to_name[h] = alg_name
+     print(f"[OpenAlg] {alg_name!r} -> {status:#010x}")
+     return status
+
+
+ def hooked_set_prop(hObject, pszProperty, pbInput, cbInput, dwFlags):
+     prop_name = read_wstr(pszProperty)
+     value = ""
+     if pbInput and cbInput > 0:
+         raw = ctypes.string_at(pbInput, cbInput)
+         try:
+             value = raw.decode('utf-16-le').rstrip('\x00')
+         except:
+             value = raw.hex()
+         if cbInput == 4:
+             value += f" (dword={struct.unpack('<I', raw)[0]})"
+     status = orig['SetProperty'](hObject, pszProperty, pbInput, cbInput, dwFlags)
+     print(f"[SetProp] {prop_name!r} = {value!r} -> {status:#010x}")
+     return status
+
+
+ def hooked_create_hash(hAlgorithm, phHash, pbHashObject, cbHashObject,
+                        pbSecret, cbSecret, dwFlags):
+     status = orig['CreateHash'](hAlgorithm, phHash, pbHashObject, cbHashObject,
+                                 pbSecret, cbSecret, dwFlags)
+     hash_handle = phHash[0] if phHash else None
+
+     hmac_key = None
+     if pbSecret and cbSecret > 0:
+         hmac_key = ctypes.string_at(pbSecret, cbSecret)
+
+     hh = hash_handle.value if hasattr(hash_handle, 'value') else hash_handle
+     ah = hAlgorithm.value if hasattr(hAlgorithm, 'value') else hAlgorithm
+     alg = alg_handle_to_name.get(ah, "?")
+
+     hash_handle_to_data[hh] = {
+         'alg': alg,
+         'hmac_key': hmac_key,
+         'data_chunks': [],
+         'total_len': 0,
+     }
+
+     hmac_info = ""
+     if hmac_key:
+         hmac_info = f" HMAC_KEY={hmac_key.hex()} ({hmac_key!r})"
+
+     print(f"[CreateHash] alg={alg} hash={hh:#x}{hmac_info} -> {status:#010x}")
+     return status
+
+
+ def hooked_hash_data(hHash, pbInput, cbInput, dwFlags):
+     status = orig['HashData'](hHash, pbInput, cbInput, dwFlags)
+
+     hh = hHash.value if hasattr(hHash, 'value') else hHash
+     data_bytes = None
+     if pbInput and cbInput > 0:
+         data_bytes = ctypes.string_at(pbInput, cbInput)
+
+     if hh in hash_handle_to_data and data_bytes:
+         info = hash_handle_to_data[hh]
+         info['data_chunks'].append(data_bytes)
+         info['total_len'] += len(data_bytes)
+
+     # Show data
+     data_hex = data_bytes.hex() if data_bytes else ""
+     data_ascii = ""
+     if data_bytes:
+         try:
+             data_ascii = data_bytes.decode('ascii', errors='replace')
+         except:
+             pass
+     preview = data_hex[:128]
+     if len(data_hex) > 128:
+         preview += "..."
+
+     print(f"[HashData] hash={hh:#x} len={cbInput} data={preview}")
+     if data_ascii and all(32 <= c < 127 or c in (10, 13) for c in (data_bytes or b"")):
+         print(f" ASCII: {data_ascii!r}")
+     return status
+
+
+ def hooked_finish_hash(hHash, pbOutput, cbOutput, dwFlags):
+     status = orig['FinishHash'](hHash, pbOutput, cbOutput, dwFlags)
+
+     hh = hHash.value if hasattr(hHash, 'value') else hHash
+     output = None
+     if pbOutput and cbOutput > 0:
+         output = ctypes.string_at(pbOutput, cbOutput)
+
+     info = hash_handle_to_data.get(hh)
+     all_data = b""
+     if info:
+         all_data = b"".join(info['data_chunks'])
+
+     print(f"[FinishHash] hash={hh:#x} output_len={cbOutput}")
+     if output:
+         print(f" Result: {output.hex()}")
+     if info:
+         print(f" Input was: {info['total_len']} bytes in {len(info['data_chunks'])} chunks")
+         if info['total_len'] <= 256:
+             print(f" Full input: {all_data.hex()}")
+             try:
+                 print(f" Input ASCII: {all_data!r}")
+             except:
+                 pass
+         if info['hmac_key']:
+             print(f" HMAC key: {info['hmac_key'].hex()}")
+
+     return status
+
+
+ def hooked_gen_key(hAlgorithm, phKey, pbKeyObject, cbKeyObject,
+                    pbSecret, cbSecret, dwFlags):
+     secret = None
+     if pbSecret and cbSecret > 0:
+         secret = ctypes.string_at(pbSecret, cbSecret)
+
+     status = orig['GenerateSymmetricKey'](hAlgorithm, phKey, pbKeyObject, cbKeyObject,
+                                           pbSecret, cbSecret, dwFlags)
+
+     key_handle = phKey[0] if phKey else None
+     if key_handle and secret:
+         kh = key_handle.value if hasattr(key_handle, 'value') else key_handle
+         key_handle_to_material[kh] = secret
+
+     print(f"[GenSymKey] secret_len={cbSecret} -> {status:#010x}")
+     if secret:
+         print(f" Secret: {secret.hex()}")
+     return status
+
+
+ def hooked_decrypt(hKey, pbInput, cbInput, pPadding, pbIV, cbIV,
+                    pbOutput, cbOutput, pcbResult, dwFlags):
+     global decrypt_call_num
+
+     iv_before = None
+     if pbIV and cbIV > 0:
+         iv_before = ctypes.string_at(pbIV, cbIV)
+
+     status = orig['Decrypt'](hKey, pbInput, cbInput, pPadding,
+                              pbIV, cbIV, pbOutput, cbOutput, pcbResult, dwFlags)
+
+     result_size = pcbResult[0] if pcbResult else 0
+
+     # Only log actual decrypts (with IV), skip sizing calls
+     if cbIV > 0:
+         call_num = decrypt_call_num
+         decrypt_call_num += 1
+
+         kh = hKey.value if hasattr(hKey, 'value') else hKey
+         known_key = key_handle_to_material.get(kh)
+
+         print(f"[Decrypt #{call_num}] in={cbInput} out={result_size} iv_len={cbIV}")
+         if iv_before:
+             print(f" IV: {iv_before.hex()}")
+         if known_key:
+             print(f" AES key: {known_key.hex()}")
+
+         if status == 0 and result_size > 0 and pbOutput:
+             decrypted = ctypes.string_at(pbOutput, result_size)
+             print(f" Decrypted[:32]: {decrypted[:32].hex()}")
+             fname = OUTPUT_DIR / f"decrypt_{call_num}_in{cbInput}_out{result_size}.bin"
+             fname.write_bytes(decrypted)
+
+     return status
+
+
+ def hook_iat(dll_handle, target_dll, func_name, hook_func, func_type):
+     import pefile
+     kernel32 = ctypes.windll.kernel32
+     buf = ctypes.create_unicode_buffer(260)
+     kernel32.GetModuleFileNameW(ctypes.c_void_p(dll_handle), buf, 260)
+     pe = pefile.PE(buf.value)
+
+     for entry in pe.DIRECTORY_ENTRY_IMPORT:
+         if target_dll.lower() not in entry.dll.decode('utf-8', errors='ignore').lower():
+             continue
+         for imp in entry.imports:
+             if imp.name and imp.name.decode('utf-8', errors='ignore') == func_name:
+                 iat_rva = imp.address - pe.OPTIONAL_HEADER.ImageBase
+                 iat_addr = dll_handle + iat_rva
+
+                 original_ptr = ctypes.c_void_p()
+                 ctypes.memmove(ctypes.byref(original_ptr), iat_addr, 8)
+
+                 callback = func_type(hook_func)
+                 callback_ptr = ctypes.cast(callback, c_void_p).value
+
+                 old_protect = c_ulong()
+                 kernel32.VirtualProtect(ctypes.c_void_p(iat_addr), 8, 0x04, byref(old_protect))
+                 new_ptr = ctypes.c_void_p(callback_ptr)
+                 ctypes.memmove(iat_addr, ctypes.byref(new_ptr), 8)
+                 kernel32.VirtualProtect(ctypes.c_void_p(iat_addr), 8, old_protect.value, byref(old_protect))
+
+                 original_func = func_type(original_ptr.value)
+                 pe.close()
+                 _callback_refs.append(callback)
+                 return original_func
+
+     pe.close()
+     return None
+
+
+ def main():
+     print("=" * 70)
+     print("BCrypt HASH HOOK - Discover SHA256 key derivation input")
+     print("=" * 70)
+
+     for f in OUTPUT_DIR.glob("decrypt_*.bin"):
+         f.unlink()
+
+     kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
+     kernel32.SetDllDirectoryW(DLL_DIR)
+
+     dll_path = os.path.join(DLL_DIR, "oneocr.dll")
+     print(f"Loading: {dll_path}")
+     dll = ctypes.WinDLL(dll_path)
+
+     dll.CreateOcrInitOptions.argtypes = [POINTER(c_int64)]
+     dll.CreateOcrInitOptions.restype = c_int64
+     dll.OcrInitOptionsSetUseModelDelayLoad.argtypes = [c_int64, c_ubyte]
+     dll.OcrInitOptionsSetUseModelDelayLoad.restype = c_int64
+     dll.CreateOcrPipeline.argtypes = [c_char_p, c_char_p, c_int64, POINTER(c_int64)]
+     dll.CreateOcrPipeline.restype = c_int64
+
+     import pefile  # noqa
+
+     hooks = [
+         ('bcrypt', 'BCryptOpenAlgorithmProvider', hooked_open_alg, OPEN_ALG_T),
+         ('bcrypt', 'BCryptSetProperty', hooked_set_prop, SET_PROP_T),
+         ('bcrypt', 'BCryptCreateHash', hooked_create_hash, CREATE_HASH_T),
+         ('bcrypt', 'BCryptHashData', hooked_hash_data, HASH_DATA_T),
+         ('bcrypt', 'BCryptFinishHash', hooked_finish_hash, FINISH_HASH_T),
+         ('bcrypt', 'BCryptGenerateSymmetricKey', hooked_gen_key, GEN_KEY_T),
+         ('bcrypt', 'BCryptDecrypt', hooked_decrypt, DECRYPT_T),
+     ]
+
+     print("\n--- Installing hooks ---")
+     for target_dll, func_name, hook_func, func_type in hooks:
+         o = hook_iat(dll._handle, target_dll, func_name, hook_func, func_type)
+         if o:
+             short = func_name.replace('BCrypt', '')
+             orig[short] = o
+             print(f" OK: {func_name}")
+         else:
+             print(f" FAIL: {func_name}")
+
+     print("\n--- Creating OCR Pipeline (triggers crypto) ---")
+     init_options = c_int64()
+     dll.CreateOcrInitOptions(byref(init_options))
+     dll.OcrInitOptionsSetUseModelDelayLoad(init_options, 0)
+
+     pipeline = c_int64()
+     model_buf = ctypes.create_string_buffer(MODEL_PATH.encode())
+     key_buf = ctypes.create_string_buffer(KEY)
+
+     print(f"Model: {MODEL_PATH}")
+     print(f"Key: {KEY}")
+     print()
+
+     ret = dll.CreateOcrPipeline(model_buf, key_buf, init_options, byref(pipeline))
+     print(f"\nCreateOcrPipeline: {ret}")
+
+     # Summary
+     print("\n" + "=" * 70)
+     print("KEY DERIVATION SUMMARY")
+     print("=" * 70)
+     print(f"Unique derived keys: {len(key_handle_to_material)}")
+     print(f"Hash operations tracked: {len(hash_handle_to_data)}")
+     print(f"Decrypted chunks: {decrypt_call_num}")
+     print("\nDone!")
+
+
+ if __name__ == '__main__':
+     main()
_archive/inspect_config_blob.py ADDED
@@ -0,0 +1,80 @@
+ """Deep-dive into model_11 and model_22 graph structure — handle binary config."""
+ import onnx
+ import numpy as np
+ from pathlib import Path
+
+ models_dir = Path("oneocr_extracted/onnx_models")
+
+ for idx in [11, 22]:
+     matches = list(models_dir.glob(f"model_{idx:02d}_*"))
+     model = onnx.load(str(matches[0]))
+
+     print(f"\n{'='*70}")
+     print(f"FULL GRAPH: model_{idx:02d}")
+     print(f"{'='*70}")
+
+     # All initializers (weights)
+     print(f"\n Initializers ({len(model.graph.initializer)}):")
+     for init in model.graph.initializer:
+         if init.data_type == 8:  # STRING
+             raw = init.string_data[0] if init.string_data else init.raw_data
+             print(f" {init.name}: STRING, {len(raw)} bytes (binary)")
+         else:
+             data = onnx.numpy_helper.to_array(init)
+             print(f" {init.name}: shape={data.shape}, dtype={data.dtype}, "
+                   f"range=[{data.min():.4f}, {data.max():.4f}]")
+
+     # All nodes
+     print(f"\n Nodes ({len(model.graph.node)}):")
+     for i, node in enumerate(model.graph.node):
+         domain_str = f" [{node.domain}]" if node.domain else ""
+         print(f" [{i}] {node.op_type}{domain_str}: {list(node.input)} → {list(node.output)}")
+         for attr in node.attribute:
+             if attr.type == 2:
+                 print(f" {attr.name} = {attr.i}")
+             elif attr.type == 1:
+                 print(f" {attr.name} = {attr.f}")
+             elif attr.type == 7:
+                 print(f" {attr.name} = {list(attr.ints)}")
+
+     # Analyze feature/config blob
+     for init in model.graph.initializer:
+         if "config" in init.name.lower():
+             raw = init.string_data[0] if init.string_data else init.raw_data
+             blob = bytes(raw)
+             print(f"\n ── feature/config analysis ──")
+             print(f" Total bytes: {len(blob)}")
+             print(f" First 32 bytes hex: {blob[:32].hex()}")
+
+             # Hypothesis: header + weight_matrix(input_dim × output_dim) + bias(output_dim)
+             # If input=21, output=50: 21*50=1050 floats = 4200 bytes, bias=50 floats = 200 bytes
+             # Total weights = 4400 bytes, header = 4492-4400 = 92 bytes
+
+             # Try reading first few uint32 as header
+             header_u32 = [int.from_bytes(blob[i:i+4], 'little') for i in range(0, min(96, len(blob)), 4)]
+             print(f" First 24 uint32 LE values: {header_u32}")
+
+             # Try float32 interpretation after various offsets
+             for offset in [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92]:
+                 remaining = len(blob) - offset
+                 n_floats = remaining // 4
+                 if n_floats == 0:
+                     continue
+                 arr = np.frombuffer(blob[offset:offset + n_floats*4], dtype=np.float32)
+                 valid = np.isfinite(arr).sum()
+                 reasonable = np.sum((np.abs(arr) < 10) & np.isfinite(arr))
+                 if reasonable > n_floats * 0.7:  # >70% reasonable values
+                     print(f" *** offset={offset}: {n_floats} floats, {valid} finite, "
+                           f"{reasonable} in [-10,10] ({100*reasonable/n_floats:.0f}%)")
+                     print(f" First 10: {arr[:10]}")
+                     print(f" Stats: mean={arr.mean():.4f}, std={arr.std():.4f}")
+                     # Check if it could be weight matrix 21×50
+                     if n_floats >= 1050 + 50:
+                         W = arr[:1050].reshape(21, 50)
+                         b = arr[1050:1100]
+                         print(f" As 21×50 weight: W_range=[{W.min():.4f},{W.max():.4f}], "
+                               f"b_range=[{b.min():.4f},{b.max():.4f}]")
+                         # Test with random input
+                         x = np.random.randn(1, 21).astype(np.float32)
+                         y = x @ W + b
+                         print(f" Test: input(21) → output(50), y_range=[{y.min():.4f},{y.max():.4f}]")
_archive/inspect_custom_ops.py ADDED
@@ -0,0 +1,39 @@
+ """Inspect custom ops in models 11, 12, 22, 33 to determine exact op names and domains."""
+ import onnx
+ from pathlib import Path
+
+ models_dir = Path("oneocr_extracted/onnx_models")
+
+ for idx in [11, 12, 22, 33]:
+     matches = list(models_dir.glob(f"model_{idx:02d}_*"))
+     if not matches:
+         print(f"model_{idx:02d}: NOT FOUND")
+         continue
+
+     model = onnx.load(str(matches[0]))
+     print(f"\n{'='*60}")
+     print(f"model_{idx:02d}: {matches[0].name}")
+     print(f" IR version: {model.ir_version}")
+     print(f" Opset imports: {[(o.domain, o.version) for o in model.opset_import]}")
+
+     # Find all non-standard ops
+     for node in model.graph.node:
+         if node.domain and node.domain != "":
+             print(f" Node: op_type={node.op_type!r}, domain={node.domain!r}")
+             print(f" inputs: {list(node.input)}")
+             print(f" outputs: {list(node.output)}")
+             # Print attributes
+             for attr in node.attribute:
+                 if attr.type == 2:  # INT
+                     print(f" attr {attr.name} = {attr.i}")
+                 elif attr.type == 1:  # FLOAT
+                     print(f" attr {attr.name} = {attr.f}")
+                 elif attr.type == 3:  # STRING
+                     print(f" attr {attr.name} = {attr.s.decode()!r}")
+                 elif attr.type == 4:  # TENSOR
+                     t = attr.t
+                     print(f" attr {attr.name} = tensor(dtype={t.data_type}, dims={list(t.dims)}, raw_bytes={len(t.raw_data)})")
+
+     # Also show graph inputs/outputs
+     print(f" Graph inputs: {[(i.name, [d.dim_value or d.dim_param for d in i.type.tensor_type.shape.dim]) for i in model.graph.input]}")
+     print(f" Graph outputs: {[(o.name, [d.dim_value or d.dim_param for d in o.type.tensor_type.shape.dim]) for o in model.graph.output]}")
_archive/inspect_graph_deep.py ADDED
@@ -0,0 +1,60 @@
+ """Deep-dive into model_11 and model_22 graph structure to understand OneOCRFeatureExtract."""
+ import onnx
+ import numpy as np
+ from pathlib import Path
+
+ models_dir = Path("oneocr_extracted/onnx_models")
+
+ for idx in [11, 22]:
+     matches = list(models_dir.glob(f"model_{idx:02d}_*"))
+     model = onnx.load(str(matches[0]))
+
+     print(f"\n{'='*70}")
+     print(f"FULL GRAPH: model_{idx:02d}")
+     print(f"{'='*70}")
+
+     # All initializers (weights)
+     print(f"\n Initializers ({len(model.graph.initializer)}):")
+     for init in model.graph.initializer:
+         data = onnx.numpy_helper.to_array(init)
+         print(f" {init.name}: shape={data.shape}, dtype={data.dtype}, "
+               f"range=[{data.min():.4f}, {data.max():.4f}]")
+
+     # All nodes
+     print(f"\n Nodes ({len(model.graph.node)}):")
+     for i, node in enumerate(model.graph.node):
+         domain_str = f" (domain={node.domain!r})" if node.domain else ""
+         print(f" [{i}] {node.op_type}{domain_str}")
+         print(f" in: {list(node.input)}")
+         print(f" out: {list(node.output)}")
+         for attr in node.attribute:
+             if attr.type == 2:  # INT
+                 print(f" {attr.name} = {attr.i}")
+             elif attr.type == 1:  # FLOAT
+                 print(f" {attr.name} = {attr.f}")
+             elif attr.type == 3:  # STRING
+                 val = attr.s
+                 if len(val) > 100:
+                     print(f" {attr.name} = bytes({len(val)})")
+                 else:
+                     print(f" {attr.name} = {val!r}")
+             elif attr.type == 4:  # TENSOR
+                 t = attr.t
+                 print(f" {attr.name} = tensor(dtype={t.data_type}, dims={list(t.dims)}, "
+                       f"raw_bytes={len(t.raw_data)})")
+             elif attr.type == 7:  # INTS
+                 print(f" {attr.name} = {list(attr.ints)}")
+             elif attr.type == 6:  # FLOATS
+                 print(f" {attr.name} = {list(attr.floats)[:10]}...")
+
+     # Show feature/config initializer details
+     for init in model.graph.initializer:
+         if "config" in init.name.lower() or "feature" in init.name.lower():
+             raw = init.raw_data
+             print(f"\n feature/config blob: {len(raw)} bytes")
+             print(f" First 64 bytes (hex): {raw[:64].hex()}")
+             print(f" Last 32 bytes (hex): {raw[-32:].hex()}")
+             # Try to interpret structure
+             # Check if starts with dimension info
+             print(f" As uint32 first 8 values: {[int.from_bytes(raw[i:i+4], 'little') for i in range(0, 32, 4)]}")
+             print(f" As float32 first 8 values: {list(np.frombuffer(raw[:32], dtype=np.float32))}")