Spaces: Devam0/gridlock

Devam0 committed 13 days ago

Commit

c25b15a

1 Parent(s): 4a390bf

Initialize project with proper LFS tracking

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +1 -0
README.md +253 -11
app.py +32 -0
models/FSRCNN_x3.pb +3 -0
models/depth_anything_v2/.gitattributes +35 -0
models/depth_anything_v2/README.md +108 -0
models/depth_anything_v2/config.json +53 -0
models/depth_anything_v2/model.safetensors +3 -0
models/depth_anything_v2/preprocessor_config.json +44 -0
models/helmet_v11.pt +3 -0
models/license.pt +3 -0
models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/.gitattributes +36 -0
models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/README.md +152 -0
models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/config.json +102 -0
models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/inference.json +0 -0
models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/inference.pdiparams +3 -0
models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/inference.yml +48 -0
models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/.gitattributes +36 -0
models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/README.md +104 -0
models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/config.json +98 -0
models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/inference.json +0 -0
models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/inference.pdiparams +3 -0
models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/inference.yml +46 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/.msc +0 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/.mv +1 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/README.md +219 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/config.json +111 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/inference.json +0 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/inference.pdiparams +3 -0
models/paddleocr/official_models/PP-OCRv5_mobile_det/inference.yml +53 -0
models/paddleocr/official_models/UVDoc/.gitattributes +36 -0
models/paddleocr/official_models/UVDoc/README.md +131 -0
models/paddleocr/official_models/UVDoc/config.json +57 -0
models/paddleocr/official_models/UVDoc/inference.json +0 -0
models/paddleocr/official_models/UVDoc/inference.pdiparams +3 -0
models/paddleocr/official_models/UVDoc/inference.yml +16 -0
models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/.gitattributes +36 -0
models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/README.md +169 -0
models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/config.json +533 -0
models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/inference.json +0 -0
models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/inference.pdiparams +3 -0
models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/inference.yml +479 -0
models/stage1_best.pt +3 -0
models/yolov8s.pt +3 -0
patch_safetensors.py +13 -0
requirements.txt +48 -0
run_inference.py +25 -0
solution.py +405 -0
testimages/1.jpg +0 -0
testimages/2.webp +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+*.pdiparams filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -1,14 +1,256 @@
 ---
-title: Gridlock
-emoji: 👁
-colorFrom: gray
-colorTo: red
-sdk: gradio
-sdk_version: 6.19.0
-python_version: '3.13'
-app_file: app.py
-pinned: false
-short_description: flipkart gridlock hackathon
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# AID 728 — Traffic Rule Violation Detection
+**IIIT Bangalore**
+Detects traffic rule violations involving two-wheelers from single RGB street-camera images. Identifies **helmet violations**, **over-riding (>2 riders on one bike)**, and extracts the **license plate text** of every violating vehicle.
+---
+## Submission Files
+```
+final_submission/
+├── solution.py          # Core detection pipeline (TrafficViolationDetector class)
+├── requirements.txt     # All Python dependencies
+├── README.md            # This file
+└── models/              # All model weights (bundled, fully offline)
+    ├── yolov8s.pt                        # COCO primary detector          (21.54 MB)
+    ├── stage1_best.pt                    # Custom two-wheeler detector     (21.49 MB)
+    ├── helmet_v11.pt                     # Helmet classifier               (5.22 MB)
+    ├── license.pt                        # License plate localiser         (42.77 MB)
+    ├── FSRCNN_x3.pb                      # Super-resolution for plates     (0.04 MB)
+    ├── depth_anything_v2/                # Depth-Anything V2 Small (HF)    (47.31 MB fp16)
+    └── paddleocr/                        # Bundled PaddleOCR models
+        └── official_models/
+            └── ...
+Total model size: 194.59 MB  (limit: 250 MB)
+```
+The pipeline also uses the `inference_sdk` to query the Roboflow API for:
+- **Wrong-way driving detection** (`wrong-way-driving-detection-gqdmg/1`)
+- **Seatbelt classification** (`seat-belt-detection-udcfg/5`)
+---
+## Quick Start
+### Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### Run inference
+```python
+from solution import TrafficViolationDetector
+detector = TrafficViolationDetector(model_dir="./models")
+result   = detector.predict("path/to/image.jpg")
+print(result)
+```
+### Output format
+```json
+{
+  "violations": [
+    {
+      "vehicle_type":      "two_wheeler",
+      "num_riders":        2,
+      "helmet_violations": 1,
+      "wrong_way":         false,
+      "license_plate":     "DL 7S AF 8144"
+    },
+    {
+      "vehicle_type":        "four_wheeler",
+      "seatbelt_violations": 1,
+      "wrong_way":           true,
+      "license_plate":       "MH 12 AB 1234"
+    }
+  ]
+}
+```
+- One entry per **violating** vehicle only
+- `violations` is an empty list `[]` if no violations are found
+- `license_plate` is `"UNKNOWN"` when the plate cannot be read
+- `num_riders` counts riders per bike; `helmet_violations` counts those without a helmet
+---
+## Pipeline Architecture
+The pipeline runs in 9 sequential stages per image:
+```
+Input Image
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────────┐
+│  Stage 1 — Primary Detection (yolov8s.pt, COCO)                │
+│  Detects: persons (cls 0), motorcycles (cls 3)                  │
+└────────────────────────┬────────────────────────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 2 — Supplemental Bike Detection  │
+    │  (stage1_best.pt — custom trained)      │
+    │  Merged with Stage 1 bikes via NMS      │
+    └────────────────────┬────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 3 — Monocular Depth Estimation   │
+    │  (Depth-Anything V2 Small, fp16 stored) │
+    │  Produces normalised depth map [0,1]    │
+    └────────────────────┬────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 4 — Person → Bike Association    │
+    │  Criteria: IoU overlap + column align   │
+    │            + depth proximity check      │
+    └────────────────────┬────────────────────┘
+                         │
+              ┌──────────▼──────────┐
+              │   Per-bike loop     │
+              └──────────┬──────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 5 — Helmet Classification        │
+    │  (helmet_v11.pt — YOLOv11 custom)       │
+    │  Crops top 45% of each rider bbox       │
+    │  (head region), runs cls 0=helmet       │
+    └────────────────────┬────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 6 — Wrong Way Detection (API)    │
+    │  (wrong-way-driving-detection-gqdmg/1)  │
+    │  Flags vehicle bounding boxes that      │
+    │  overlap with 'wrong-side' detections   │
+    └────────────────────┬────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 7 — Seatbelt Detection (API)     │
+    │  (seat-belt-detection-udcfg/5)          │
+    │  Runs only on four-wheeler crops        │
+    └────────────────────┬────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 8 — License Plate Localisation   │
+    │  (license.pt — YOLO custom)             │
+    │  Runs on violating vehicles             │
+    └────────────────────┬────────────────────┘
+                         │
+    ┌────────────────────▼────────────────────┐
+    │  Stage 9 — OCR (PaddleOCR 3.5.0)       │
+    │  FSRCNN x3 super-resolution → CLAHE     │
+    │  sharpening → PP-OCRv5 mobile det+rec   │
+    │  Text cleaned: uppercase alphanumeric   │
+    └────────────────────┬────────────────────┘
+                         │
+                         ▼
+              Output: violations list
+```
+### Violation Logic
+- A bike is flagged as a **violation** if:
+  - `num_riders >= 3` (over-riding), **OR**
+  - `helmet_violations > 0` (at least one rider without a helmet)
+- Only violating bikes appear in the output list
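The two-wheeler rule above can be sketched as a small predicate. This is a minimal illustration, not the code in `solution.py`; the function names `is_violation` and `filter_violations` are hypothetical, and the dictionary keys are taken from the output format shown earlier.

```python
def is_violation(num_riders: int, helmet_violations: int) -> bool:
    """Return True when a bike breaks either rule stated in the README."""
    over_riding = num_riders >= 3            # three or more riders on one bike
    missing_helmet = helmet_violations > 0   # at least one rider without a helmet
    return over_riding or missing_helmet


def filter_violations(bikes: list[dict]) -> list[dict]:
    """Keep only the bikes that should appear in the output list (hypothetical helper)."""
    return [b for b in bikes if is_violation(b["num_riders"], b["helmet_violations"])]
```

A bike with two helmeted riders is dropped by `filter_violations`, while a bike with three riders is kept even if all of them wear helmets.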
+---
+## Model Details
+### `yolov8s.pt` — COCO Primary Detector
+- **Type**: YOLOv8 Small, pretrained on COCO
+- **Used for**: Detecting persons (class 0) and motorcycles (class 3)
+- **Confidence**: 0.30, IoU: 0.45
+### `stage1_best.pt` — Custom Two-Wheeler Detector
+- **Type**: YOLOv8-based, custom trained
+- **Used for**: Supplementing COCO detections with domain-specific two-wheeler types (scooters, three-wheelers, etc. that COCO misses)
+- **Merge**: Combined with COCO bike boxes via IoU-based NMS (threshold 0.45)
+- **Augmented inference** (`augment=True`) for improved recall
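The IoU-based merge of the two detectors' bike boxes could look roughly like the sketch below. It assumes each detection is a `(box, score)` pair with `box = (x1, y1, x2, y2)` and uses plain greedy NMS; the names `iou` and `merge_detections` are illustrative, and the actual merge in `solution.py` may differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_detections(coco_boxes, custom_boxes, iou_thresh=0.45):
    """Greedy NMS over the union of both detectors' (box, score) pairs."""
    candidates = sorted(coco_boxes + custom_boxes, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

With the 0.45 threshold, two boxes covering the same scooter collapse into the higher-scoring one, while a bike seen by only one detector survives.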
+### `depth_anything_v2/` — Monocular Depth Estimation
+- **Type**: Depth-Anything V2 Small (Hugging Face Transformers)
+- **Used for**: Filtering out background pedestrians that share column overlap with a detected bike but are at a different depth plane
+- **Storage**: fp16 safetensors on disk (47.3 MB vs 94.6 MB fp32) — loaded as fp32 at runtime for CPU inference speed
+- **Output**: Normalised depth map [0, 1] resized to match the input image
+### `helmet_v11.pt` — Helmet Classifier
+- **Type**: YOLOv11-based, custom trained on merged dataset
+- **Training data**: 4 merged Kaggle datasets (andrewmvd, aneesarom, roboflow ×2) — all remapped to 2 classes: `with_helmet (0)`, `without_helmet (1)`
+- **Input**: Top 45% of each rider bounding box (head crop) with 5% lateral padding
+- **Confidence**: 0.25
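The head-crop geometry described above can be sketched as follows. This assumes the 5% lateral padding is measured against the rider box width and that the crop is clipped to the image; `head_crop_box` is a hypothetical helper, not a function confirmed to exist in `solution.py`.

```python
def head_crop_box(bbox, img_w, img_h, top_frac=0.45, pad_frac=0.05):
    """Return the (x1, y1, x2, y2) head region for a rider bounding box.

    Takes the top `top_frac` of the box height and widens it by
    `pad_frac` of the box width on each side, clipped to the image.
    """
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    pad = pad_frac * w
    cx1 = max(0, int(round(x1 - pad)))
    cx2 = min(img_w, int(round(x2 + pad)))
    cy1 = max(0, int(round(y1)))
    cy2 = min(img_h, int(round(y1 + top_frac * h)))
    return cx1, cy1, cx2, cy2
```

The resulting box can then be used to slice the image array (`image[cy1:cy2, cx1:cx2]`) before running the helmet model.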
+### `license.pt` — License Plate Localiser
+- **Type**: YOLO custom, trained on Indian license plates
+- **Used for**: Detecting the tight bounding box of the license plate within a bike crop
+- **Confidence**: 0.20 (low threshold to catch partially visible plates)
+### `FSRCNN_x3.pb` — Super-Resolution
+- **Type**: FSRCNN (Fast Super-Resolution CNN), ×3 scale, TensorFlow/OpenCV DNN
+- **Used for**: Upscaling small plate crops (often <100px tall) 3× before OCR to improve recognition accuracy
+### `paddleocr/` — OCR Engine (PaddleOCR 3.5.0)
+- **Detection**: `PP-OCRv5_mobile_det` (4.7 MB) — finds text line bounding boxes within the plate crop
+- **Recognition**: `en_PP-OCRv5_mobile_rec` (7.6 MB) — reads each text line
+- **Orientation models**: `PP-LCNet_x1_0_doc_ori`, `PP-LCNet_x1_0_textline_ori` — handle rotated plates
+- **Unwarping**: `UVDoc` — corrects perspective distortion
+- **API**: Uses the legacy `.ocr()` method (not `.predict()`). Both call the same underlying pipeline, but `.ocr()` uses a compatible inference backend on Windows/Linux CPU without triggering the OneDNN fused_conv2d operator crash present in the newer `.predict()` path
+- **Post-processing**: Text is uppercased, non-alphanumeric characters stripped, tokens shorter than 2 characters discarded
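The post-processing rule can be illustrated with a short sketch. It assumes the OCR step yields a list of text fragments that are joined with spaces, and it reuses the `"UNKNOWN"` fallback from the output format; `clean_plate_text` is a hypothetical name.

```python
import re


def clean_plate_text(tokens):
    """Uppercase each token, strip non-alphanumerics, drop tokens under 2 chars."""
    cleaned = []
    for tok in tokens:
        tok = re.sub(r"[^A-Z0-9]", "", tok.upper())
        if len(tok) >= 2:
            cleaned.append(tok)
    # Assumption: an empty result maps to the README's "UNKNOWN" placeholder.
    return " ".join(cleaned) if cleaned else "UNKNOWN"
```

Stray punctuation and single-character noise from the detector are removed this way, at the cost of also discarding a genuine one-character token.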
+---
+## Offline Operation
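+All model weights are bundled in `./models/`, so the local models need no internet connection at runtime. The Roboflow API stages (wrong-way and seatbelt detection) are the exception: they query a remote endpoint.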
+PaddleOCR 3.5.0 uses [paddlex](https://github.com/PaddlePaddle/PaddleX) internally and looks for models via the `PADDLE_PDX_CACHE_HOME` environment variable. `solution.py` sets this variable to `./models/paddleocr/` **before** any paddle import, so paddlex resolves all models from the bundled path:
+```python
+os.environ["PADDLE_PDX_CACHE_HOME"] = str(Path(__file__).parent / "models" / "paddleocr")
+```
 ---
+## Design Decisions
+### Why two bike detectors?
+COCO's `motorcycle` class (cls 3) misses many Indian two-wheeler types. The custom `stage1_best.pt` trained on traffic footage recovers these. Boxes from both are merged via NMS.
+### Why depth filtering?
+In busy street scenes, COCO frequently detects pedestrians on the footpath who share horizontal overlap with a detected bike. Depth-Anything V2 provides a proxy for Z-distance; persons whose median depth differs from the bike's median depth by more than 35% are excluded from association.
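A minimal sketch of the depth check follows. It assumes the 35% figure is a relative difference measured against the bike's median depth, that the depth map is a 2D grid of values in [0, 1], and that boxes are non-empty integer pixel coordinates; the helper names are hypothetical.

```python
from statistics import median


def median_depth(depth_map, box):
    """Median of the depth values inside an (x1, y1, x2, y2) pixel box."""
    x1, y1, x2, y2 = box
    values = [v for row in depth_map[y1:y2] for v in row[x1:x2]]
    return median(values)


def same_depth_plane(depth_map, person_box, bike_box, max_rel_diff=0.35):
    """True when the person's median depth is within 35% of the bike's."""
    d_person = median_depth(depth_map, person_box)
    d_bike = median_depth(depth_map, bike_box)
    if d_bike == 0:
        # Avoid division by zero; only an identical zero depth counts as a match.
        return d_person == 0
    return abs(d_person - d_bike) / d_bike <= max_rel_diff
```

Using the median rather than the mean keeps a few background pixels inside a box from shifting the estimate.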
+### Why not use PaddleOCR's server detection model?
+`PP-OCRv5_server_det` is 84.3 MB — bundling it would push the total over 250 MB. Instead, `license.pt` performs the coarse plate localisation (narrowing the search area to ~125×90 px), then `PP-OCRv5_mobile_det` (4.7 MB) finds individual text lines within that small crop, and `en_PP-OCRv5_mobile_rec` reads them. This two-stage localisation gives equivalent quality at a fraction of the size.
+### Why store depth model as fp16?
+`model.safetensors` was converted from fp32 (94.6 MB) to fp16 (47.3 MB) at submission time using `safetensors.torch`. At runtime the model is loaded as fp32 (`dtype=torch.float32`) because most x86 CPUs lack native fp16 arithmetic, and running fp16 tensors on such CPUs can be roughly 10× slower. The disk saving therefore costs nothing at inference time.
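The quoted sizes can be checked with a little arithmetic from the byte count in the Git LFS pointer for `models/depth_anything_v2/model.safetensors`. The sketch ignores the small safetensors header, so the parameter count is approximate, and it treats the README's "MB" as mebibytes, which is an assumption.

```python
FP16_FILE_BYTES = 49_603_410   # "size" field of the LFS pointer for model.safetensors
MIB = 1024 * 1024

# Two bytes per fp16 weight, four bytes per fp32 weight.
approx_params = FP16_FILE_BYTES / 2
fp16_mib = FP16_FILE_BYTES / MIB
fp32_mib = approx_params * 4 / MIB

print(f"~{approx_params / 1e6:.1f}M params, fp16 {fp16_mib:.1f} MiB, fp32 {fp32_mib:.1f} MiB")
```

The result is consistent with the 47.3 MB and 94.6 MB figures and with the roughly 24M parameters listed in the constraints table.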
+### Fallback for missing riders
+If no COCO person is associated with a detected bike (e.g., very small image, occluded rider), one rider with no helmet is assumed. This is a conservative choice: it risks a false positive, but a bike whose rider went undetected is not silently dropped.
 ---
+## Constraints Compliance
+| Constraint | Status |
+|---|---|
+| Model size ≤ 250 MB | ✅ 194.6 MB |
+| No VLMs > 1B parameters | ✅ Largest model is Depth-Anything V2 Small (~24M params) |
+| Fully offline execution | ✅ All weights in `./models/`, `PADDLE_PDX_CACHE_HOME` redirected |
+| `TrafficViolationDetector` interface | ✅ `__init__(model_dir)` + `predict(image_path) → dict` |
+| Stateless `predict()` | ✅ No mutable shared state between calls |
+| Error handling | ✅ All exceptions caught; returns `{"violations": []}` on failure |
+---
+## Performance (Local Windows CPU)
+| Metric | Value |
+|---|---|
+| Init time (cold start) | ~3–4 s |
+| Inference — simple scene (1–2 bikes) | ~4–5 s |
+| Inference — dense scene (8+ bikes) | ~10–12 s |
+> **Note**: The evaluation server runs Linux with a faster CPU; inference times are expected to be lower. Depth estimation (Depth-Anything V2) is the primary bottleneck on CPU.

app.py ADDED

	@@ -0,0 +1,32 @@

+import gradio as gr
+from solution import TrafficViolationDetector
+# Initialize the detector once at startup to keep models loaded in memory
+print("Loading models for Hugging Face Space...")
+detector = TrafficViolationDetector(model_dir="./models")
+print("Models loaded successfully!")
+def detect_violations(image_path):
+    if image_path is None:
+        return {"error": "No image provided"}
+    try:
+        # The detector.predict expects a path to the image
+        result = detector.predict(image_path)
+        return result
+    except Exception as e:
+        return {"error": str(e)}
+# Create the Gradio interface
+iface = gr.Interface(
+    fn=detect_violations,
+    inputs=gr.Image(type="filepath", label="Upload Traffic Image"),
+    outputs=gr.JSON(label="Violation Results"),
+    title="Traffic Rule Violation Detection API",
+    description="Upload an image to detect traffic violations. Supports two-wheelers (helmet, over-riding, wrong-way) and four-wheelers (seatbelt, wrong-way). Detects and runs OCR on the license plates of violating vehicles.\n\nThis application can be accessed programmatically via its built-in API.",
+    allow_flagging="never"
+)
+if __name__ == "__main__":
+    # Launch on 0.0.0.0 to allow Hugging Face to route traffic
+    iface.launch(server_name="0.0.0.0", server_port=7860)

models/FSRCNN_x3.pb ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:efd38655a815908c6c8954db6052f128e76a735f1de657894c477d0dc0b64481
+size 40093
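The three added lines are a Git LFS pointer, not the model itself: the repository stores this small text file while the binary lives in LFS storage. A short sketch of reading such a pointer is shown below; `parse_lfs_pointer` is a hypothetical helper and assumes the simple `key value` layout seen here.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:efd38655a815908c6c8954db6052f128e76a735f1de657894c477d0dc0b64481
size 40093
"""
info = parse_lfs_pointer(pointer)
```

Here `info["size"]` is 40093 bytes, which matches the 0.04 MB listed for `FSRCNN_x3.pb` in the README.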

models/depth_anything_v2/.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

models/depth_anything_v2/README.md ADDED

	@@ -0,0 +1,108 @@

+---
+license: apache-2.0
+tags:
+- depth
+- relative depth
+pipeline_tag: depth-estimation
+library: transformers
+widget:
+- inference: false
+---
+# Depth Anything V2 Small – Transformers Version
+Depth Anything V2 is trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features:
+- more fine-grained details than Depth Anything V1
+- more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard)
+- more efficient (10x faster) and more lightweight than SD-based models
+- impressive fine-tuned performance with our pre-trained models
+This model checkpoint is compatible with the transformers library.
+Depth Anything V2 was introduced in [the paper of the same name](https://arxiv.org/abs/2406.09414) by Lihe Yang et al. It uses the same architecture as the original Depth Anything release, but uses synthetic data and a larger capacity teacher model to achieve much finer and robust depth predictions. The original Depth Anything model was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al., and was first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
+[Online demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-V2).
+## Model description
+Depth Anything V2 leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
+The model is trained on ~600K synthetic labeled images and ~62 million real unlabeled images, obtaining state-of-the-art results for both relative and absolute depth estimation.
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
+alt="drawing" width="600"/>
+<small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>
+## Intended uses & limitations
+You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
+other versions on a task that interests you.
+### How to use
+Here is how to use this model to perform zero-shot depth estimation:
+```python
+from transformers import pipeline
+from PIL import Image
+import requests
+# load pipe
+pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")
+# load image
+url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+image = Image.open(requests.get(url, stream=True).raw)
+# inference
+depth = pipe(image)["depth"]
+```
+Alternatively, you can use the model and processor classes:
+```python
+from transformers import AutoImageProcessor, AutoModelForDepthEstimation
+import torch
+import numpy as np
+from PIL import Image
+import requests
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Small-hf")
+model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Small-hf")
+# prepare image for the model
+inputs = image_processor(images=image, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+    predicted_depth = outputs.predicted_depth
+# interpolate to original size
+prediction = torch.nn.functional.interpolate(
+    predicted_depth.unsqueeze(1),
+    size=image.size[::-1],
+    mode="bicubic",
+    align_corners=False,
+)
+```
+For more code examples, please refer to the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).
+### Citation
+```bibtex
+@misc{yang2024depth,
+      title={Depth Anything V2},
+      author={Lihe Yang and Bingyi Kang and Zilong Huang and Zhen Zhao and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
+      year={2024},
+      eprint={2406.09414},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```

models/depth_anything_v2/config.json ADDED

	@@ -0,0 +1,53 @@

+{
+  "_commit_hash": null,
+  "architectures": [
+    "DepthAnythingForDepthEstimation"
+  ],
+  "backbone": null,
+  "backbone_config": {
+    "architectures": [
+      "Dinov2Model"
+    ],
+    "hidden_size": 384,
+    "image_size": 518,
+    "model_type": "dinov2",
+    "num_attention_heads": 6,
+    "out_features": [
+      "stage3",
+      "stage6",
+      "stage9",
+      "stage12"
+    ],
+    "out_indices": [
+      3,
+      6,
+      9,
+      12
+    ],
+    "patch_size": 14,
+    "reshape_hidden_states": false,
+    "torch_dtype": "float32"
+  },
+  "fusion_hidden_size": 64,
+  "head_hidden_size": 32,
+  "head_in_index": -1,
+  "initializer_range": 0.02,
+  "model_type": "depth_anything",
+  "neck_hidden_sizes": [
+    48,
+    96,
+    192,
+    384
+  ],
+  "patch_size": 14,
+  "reassemble_factors": [
+    4,
+    2,
+    1,
+    0.5
+  ],
+  "reassemble_hidden_size": 384,
+  "torch_dtype": "float32",
+  "transformers_version": null,
+  "use_pretrained_backbone": false
+}

models/depth_anything_v2/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:453e3d7ffaa5d89985ea6593e87af0091026e098d028ae7ab7f929780c3a2f85
+size 49603410

models/depth_anything_v2/preprocessor_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "_valid_processor_keys": [
+    "images",
+    "do_resize",
+    "size",
+    "keep_aspect_ratio",
+    "ensure_multiple_of",
+    "resample",
+    "do_rescale",
+    "rescale_factor",
+    "do_normalize",
+    "image_mean",
+    "image_std",
+    "do_pad",
+    "size_divisor",
+    "return_tensors",
+    "data_format",
+    "input_data_format"
+  ],
+  "do_normalize": true,
+  "do_pad": false,
+  "do_rescale": true,
+  "do_resize": true,
+  "ensure_multiple_of": 14,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "DPTImageProcessor",
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "keep_aspect_ratio": true,
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 518,
+    "width": 518
+  },
+  "size_divisor": null
+}

models/helmet_v11.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a228e55c20f452a1e19ca42d5c9fd2f115611667a7cddce3c68046aa6c590e43
+size 5478490

models/license.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e30a2bfd1f8342eb7f21d9f7d3bb9452c8570eb597df3eb8bbe04e66f8fde0f6
+size 44849729

models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+inference.pdiparams filter=lfs diff=lfs merge=lfs -text

models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/README.md ADDED

	@@ -0,0 +1,152 @@

+---
+license: apache-2.0
+library_name: PaddleOCR
+language:
+- en
+- zh
+pipeline_tag: image-to-text
+tags:
+- OCR
+- PaddlePaddle
+- PaddleOCR
+- doc_img_orientation_classification
+---
+# PP-LCNet_x1_0_doc_ori
+## Introduction
+The Document Image Orientation Classification Module is primarily designed to distinguish the orientation of document images and correct them through post-processing. During processes such as document scanning or ID photo capturing, the device might be rotated to achieve clearer images, resulting in images with various orientations. Standard OCR pipelines may not handle these images effectively. By leveraging image classification techniques, the orientation of documents or IDs containing text regions can be pre-determined and adjusted, thereby improving the accuracy of OCR processing. The key accuracy metrics are as follows:
+<table>
+<tr>
+<th>Model</th>
+<th>Recognition Avg Accuracy(%)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+<tr>
+<td>PP-LCNet_x1_0_doc_ori</td>
+<td>99.06</td>
+<td>7</td>
+<td>A document image classification model based on PP-LCNet_x1_0, with four categories: 0°, 90°, 180°, and 270°.</td>
+</tr>
+</table>
+## Quick Start
+### Installation
+1. PaddlePaddle
+Please refer to the following commands to install PaddlePaddle using pip:
+```bash
+# for CUDA11.8
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+# for CUDA12.6
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+# for CPU
+python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+```
+For details about PaddlePaddle installation, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/en/install/quick).
+2. PaddleOCR
+Install the latest version of the PaddleOCR inference package from PyPI:
+```bash
+python -m pip install paddleocr
+```
+### Model Usage
+You can quickly experience the functionality with a single command:
+```bash
+paddleocr doc_img_orientation_classification \
+    --model_name PP-LCNet_x1_0_doc_ori \
+    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/4ifXaBJmFByG_mAnF86Vv.png
+```
+You can also integrate the model inference of the document image orientation classification module into your project. Before running the following code, please download the sample image to your local machine.
+```python
+from paddleocr import DocImgOrientationClassification
+model = DocImgOrientationClassification(model_name="PP-LCNet_x1_0_doc_ori")
+output = model.predict(input="4ifXaBJmFByG_mAnF86Vv.png", batch_size=1)
+for res in output:
+    res.print()
+    res.save_to_img(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
+```
+After running, the obtained result is as follows:
+```json
+{'res': {'input_path': '/root/.paddlex/predict_input/4ifXaBJmFByG_mAnF86Vv.png', 'page_index': None, 'class_ids': array([2], dtype=int32), 'scores': array([0.90971], dtype=float32), 'label_names': ['180']}}
+```
+The visualized image is as follows:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/DU_k30fxijLXFdXl179-0.png)
+For details about usage command and descriptions of parameters, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/text_recognition.html#iii-quick-start).
+### Pipeline Usage
+A single model has limited capability, but a pipeline that combines several models can handle harder real-world problems.
+#### doc_preprocessor
+The Document Image Preprocessing Pipeline integrates two key functions: document orientation classification and geometric distortion correction. The document orientation classification module automatically identifies the four possible orientations of a document (0°, 90°, 180°, 270°), ensuring that the document is processed in the correct direction. The text image unwarping model is designed to correct geometric distortions that occur during document photography or scanning, restoring the document's original shape and proportions. This pipeline is suitable for digital document management, preprocessing tasks for OCR, and any scenario requiring improved document image quality. By automating orientation correction and geometric distortion correction, this module significantly enhances the accuracy and efficiency of document processing, providing a more reliable foundation for image analysis. The pipeline also offers flexible service-oriented deployment options, supporting calls from various programming languages on multiple hardware platforms. Additionally, the pipeline supports secondary development, allowing you to fine-tune the models on your own datasets and seamlessly integrate the trained models. The pipeline has two modules:
+* Document Image Orientation Classification Module (Optional)
+* Text Image Unwarping Module (Optional)
+Run a single command to try the document preprocessing pipeline:
+```bash
+paddleocr doc_preprocessor -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/pY6sY6wLDuoHF1-cGUvDr.png \
+    --use_doc_orientation_classify True \
+    --use_doc_unwarping True \
+    --doc_orientation_classify_model_name PP-LCNet_x1_0_doc_ori \
+    --save_path ./output \
+    --device gpu:0
+```
+Results are printed to the terminal:
+```json
+{'res': {'input_path': '/root/.paddlex/predict_input/pY6sY6wLDuoHF1-cGUvDr.png', 'page_index': None, 'model_settings': {'use_doc_orientation_classify': True, 'use_doc_unwarping': True}, 'angle': 180}}
+```
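The `angle` field in the output above reports the detected page orientation. A minimal sketch of how a caller might undo that rotation on a tiny row-major grid is given here. It assumes the angle counts how far the content was turned clockwise (the direction is an assumption; for 180 it makes no difference). `rotate_cw` and `restore_upright` are hypothetical helpers, not part of PaddleOCR.

```python
def rotate_cw(grid):
    """Rotate a row-major 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def restore_upright(grid, angle):
    """Undo a detected rotation of `angle` degrees (0, 90, 180 or 270).

    Assumption: `angle` is how far the content was turned clockwise, so we
    apply the remaining clockwise quarter turns to complete a full circle.
    """
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {angle}")
    for _ in range(((360 - angle) % 360) // 90):
        grid = rotate_cw(grid)
    return grid


page = [[1, 2],
        [3, 4]]
upside_down = rotate_cw(rotate_cw(page))  # simulate a page scanned upside down
restored = restore_upright(upside_down, 180)
print(restored)
```

On a real image the same idea would be expressed with an image library's rotate call; the nested lists only keep the sketch dependency-free.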
+If `save_path` is specified, the visualization results are saved under that path. The visualization output is shown below:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/HM8xQKtyBHx-CNVGk2ZJd.png)
+The command line is meant for quick trials. Integrating the pipeline into a project also takes only a few lines of code:
+```python
+from paddleocr import DocPreprocessor
+ocr = DocPreprocessor(
+    doc_orientation_classify_model_name="PP-LCNet_x1_0_doc_ori",
+    use_doc_orientation_classify=True, # Use use_doc_orientation_classify to enable/disable document orientation classification model
+    use_doc_unwarping=True, # Use use_doc_unwarping to enable/disable document unwarping module
+    device="gpu:0", # Use device to specify GPU for model inference
+)
+result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/pY6sY6wLDuoHF1-cGUvDr.png")
+for res in result:
+    res.print()
+    res.save_to_img("output")
+    res.save_to_json("output")
+```
+## Links
+[PaddleOCR Repo](https://github.com/paddlepaddle/paddleocr)
+[PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)

models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/config.json ADDED

	@@ -0,0 +1,102 @@

+{
+    "Global": {
+        "model_name": "PP-LCNet_x1_0_doc_ori"
+    },
+    "Hpi": {
+        "backend_configs": {
+            "paddle_infer": {
+                "trt_dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            224,
+                            224
+                        ],
+                        [
+                            1,
+                            3,
+                            224,
+                            224
+                        ],
+                        [
+                            8,
+                            3,
+                            224,
+                            224
+                        ]
+                    ]
+                }
+            },
+            "tensorrt": {
+                "dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            224,
+                            224
+                        ],
+                        [
+                            1,
+                            3,
+                            224,
+                            224
+                        ],
+                        [
+                            8,
+                            3,
+                            224,
+                            224
+                        ]
+                    ]
+                }
+            }
+        }
+    },
+    "PreProcess": {
+        "transform_ops": [
+            {
+                "ResizeImage": {
+                    "resize_short": 256
+                }
+            },
+            {
+                "CropImage": {
+                    "size": 224
+                }
+            },
+            {
+                "NormalizeImage": {
+                    "channel_num": 3,
+                    "mean": [
+                        0.485,
+                        0.456,
+                        0.406
+                    ],
+                    "order": "",
+                    "scale": 0.00392156862745098,
+                    "std": [
+                        0.229,
+                        0.224,
+                        0.225
+                    ]
+                }
+            },
+            {
+                "ToCHWImage": null
+            }
+        ]
+    },
+    "PostProcess": {
+        "Topk": {
+            "topk": 1,
+            "label_list": [
+                "0",
+                "90",
+                "180",
+                "270"
+            ]
+        }
+    }
+}
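The `NormalizeImage` and `Topk` entries in this config describe simple arithmetic. The sketch below shows what they plausibly compute for one pixel and one score vector. The formula `(value * scale - mean) / std` and the channel order are assumptions based on common image-classification preprocessing, not something the config states; the scores are hypothetical.

```python
# Values copied from the PreProcess / PostProcess sections of config.json.
SCALE = 0.00392156862745098  # equals 1 / 255
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]
LABELS = ["0", "90", "180", "270"]


def normalize_pixel(pixel):
    """Map one 3-channel 0-255 pixel to (value * scale - mean) / std (assumed formula)."""
    return [(v * SCALE - m) / s for v, m, s in zip(pixel, MEAN, STD)]


def top1(scores):
    """Return (label, score) for the highest-scoring class, i.e. topk = 1."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[idx], scores[idx]


print(normalize_pixel([255, 255, 255]))
print(top1([0.03, 0.02, 0.91, 0.04]))  # hypothetical class scores
```

Class index 2 maps to the label `180`, which matches the `class_ids` / `label_names` pair printed in the README's sample result.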

models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/inference.json ADDED

The diff for this file is too large to render.

models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/inference.pdiparams ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e8d6e7c5d264507e40e58a655779059d616b20d7441ea22047d829eb3931989c
+size 6754166
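The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. A small sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper written for illustration.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e8d6e7c5d264507e40e58a655779059d616b20d7441ea22047d829eb3931989c
size 6754166
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], round(info["size_bytes"] / 2**20, 2), "MiB")
```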

models/paddleocr/official_models/PP-LCNet_x1_0_doc_ori/inference.yml ADDED

	@@ -0,0 +1,48 @@

+Global:
+  model_name: PP-LCNet_x1_0_doc_ori
+Hpi:
+  backend_configs:
+    paddle_infer:
+      trt_dynamic_shapes: &id001
+        x:
+        - - 1
+          - 3
+          - 224
+          - 224
+        - - 1
+          - 3
+          - 224
+          - 224
+        - - 8
+          - 3
+          - 224
+          - 224
+    tensorrt:
+      dynamic_shapes: *id001
+PreProcess:
+  transform_ops:
+  - ResizeImage:
+      resize_short: 256
+  - CropImage:
+      size: 224
+  - NormalizeImage:
+      channel_num: 3
+      mean:
+      - 0.485
+      - 0.456
+      - 0.406
+      order: ''
+      scale: 0.00392156862745098
+      std:
+      - 0.229
+      - 0.224
+      - 0.225
+  - ToCHWImage: null
+PostProcess:
+  Topk:
+    topk: 1
+    label_list:
+    - '0'
+    - '90'
+    - '180'
+    - '270'
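In this YAML file, `&id001` defines an anchor and `*id001` is an alias for it, so `tensorrt.dynamic_shapes` holds the same three shapes as `paddle_infer.trt_dynamic_shapes`. The sketch below assumes the three entries are the minimum, optimal, and maximum input shapes (a common TensorRT convention, not stated in the file) and checks whether a batch fits. `fits_dynamic_range` is a hypothetical helper.

```python
# Shapes copied from inference.yml; reading them as [min, opt, max] is an assumption.
MIN_SHAPE = [1, 3, 224, 224]
OPT_SHAPE = [1, 3, 224, 224]
MAX_SHAPE = [8, 3, 224, 224]


def fits_dynamic_range(shape, lo=MIN_SHAPE, hi=MAX_SHAPE):
    """Return True if every dimension of `shape` lies within [lo, hi]."""
    return len(shape) == len(lo) and all(
        a <= d <= b for a, d, b in zip(lo, shape, hi)
    )


print(fits_dynamic_range([4, 3, 224, 224]))   # batch of 4 at the fixed 224x224 size
print(fits_dynamic_range([16, 3, 224, 224]))  # batch larger than the maximum
```

Under that reading, only the batch dimension varies (1 to 8) while the spatial size stays fixed at 224x224.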

models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+inference.pdiparams filter=lfs diff=lfs merge=lfs -text
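Each line above routes files matching a pattern through Git LFS. As a rough illustration, the sketch below tests file names against a handful of those patterns with Python's `fnmatch`. This only approximates gitattributes matching: it compares the basename, which is how slash-free patterns behave, and it skips `saved_model/**/*`. `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few slash-free patterns copied from the .gitattributes above.
LFS_PATTERNS = ["*.pt", "*.pb", "*.safetensors", "*.onnx", "*tfevents*", "inference.pdiparams"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: match the file's basename against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["models/helmet_v11.pt", "models/FSRCNN_x3.pb", "app.py"]:
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer inside a checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.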

models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/README.md ADDED

	@@ -0,0 +1,104 @@

+---
+license: apache-2.0
+library_name: PaddleOCR
+language:
+- en
+- zh
+pipeline_tag: image-to-text
+tags:
+- OCR
+- PaddlePaddle
+- PaddleOCR
+- textline_orientation_classification
+---
+# PP-LCNet_x1_0_textline_ori
+## Introduction
+The text line orientation classification module determines the orientation of text lines so that post-processing can correct them. During document scanning or license/certificate photography, the capture device may be rotated to get a clearer image, which produces text lines in various orientations. Standard OCR pipelines handle such data poorly. Classifying the orientation of each text line in advance and adjusting it improves OCR accuracy. The key accuracy metrics are as follows:
+<table>
+<tr>
+<th>Model</th>
+<th>Recognition Avg Accuracy(%)</th>
+<th>Model Storage Size (M)</th>
+<th>Introduction</th>
+</tr>
+<tr>
+<td>PP-LCNet_x1_0_textline_ori</td>
+<td>98.85</td>
+<td>0.96</td>
+<td>Text line classification model based on PP-LCNet_x0_25, with two classes: 0 degrees and 180 degrees</td>
+</tr>
+</table>
+## Quick Start
+### Installation
+1. PaddlePaddle
+Please refer to the following commands to install PaddlePaddle using pip:
+```bash
+# for CUDA11.8
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+# for CUDA12.6
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+# for CPU
+python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+```
+For details about PaddlePaddle installation, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/en/install/quick).
+2. PaddleOCR
+Install the latest version of the PaddleOCR inference package from PyPI:
+```bash
+python -m pip install paddleocr
+```
+### Model Usage
+You can quickly experience the functionality with a single command:
+```bash
+paddleocr text_line_orientation_classification \
+    --model_name PP-LCNet_x1_0_textline_ori \
+    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/m3ZmUPAnst1f9xXvTVLKS.png
+```
+You can also integrate inference with the text line orientation classification module into your project. Before running the following code, download the sample image to your local machine.
+```python
+from paddleocr import TextLineOrientationClassification
+model = TextLineOrientationClassification(model_name="PP-LCNet_x1_0_textline_ori")
+output = model.predict(input="m3ZmUPAnst1f9xXvTVLKS.png", batch_size=1)
+for res in output:
+    res.print()
+    res.save_to_img(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
+```
+After running, the obtained result is as follows:
+```json
+{'res': {'input_path': '/root/.paddlex/predict_input/m3ZmUPAnst1f9xXvTVLKS.png', 'page_index': None, 'class_ids': array([1], dtype=int32), 'scores': array([0.99829], dtype=float32), 'label_names': ['180_degree']}}
+```
+The visualized image is as follows:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/0y5rEbMTzgsqP6Ptnj-Er.png)
+For command usage and parameter descriptions, refer to the [documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/text_recognition.html#iii-quick-start).
+## Links
+[PaddleOCR Repo](https://github.com/paddlepaddle/paddleocr)
+[PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)

models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/config.json ADDED

	@@ -0,0 +1,98 @@

+{
+    "Global": {
+        "model_name": "PP-LCNet_x1_0_textline_ori"
+    },
+    "Hpi": {
+        "backend_configs": {
+            "paddle_infer": {
+                "trt_dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            80,
+                            160
+                        ],
+                        [
+                            1,
+                            3,
+                            80,
+                            160
+                        ],
+                        [
+                            8,
+                            3,
+                            80,
+                            160
+                        ]
+                    ]
+                }
+            },
+            "tensorrt": {
+                "dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            80,
+                            160
+                        ],
+                        [
+                            1,
+                            3,
+                            80,
+                            160
+                        ],
+                        [
+                            8,
+                            3,
+                            80,
+                            160
+                        ]
+                    ]
+                }
+            }
+        }
+    },
+    "PreProcess": {
+        "transform_ops": [
+            {
+                "ResizeImage": {
+                    "size": [
+                        160,
+                        80
+                    ]
+                }
+            },
+            {
+                "NormalizeImage": {
+                    "channel_num": 3,
+                    "mean": [
+                        0.485,
+                        0.456,
+                        0.406
+                    ],
+                    "order": "",
+                    "scale": 0.00392156862745098,
+                    "std": [
+                        0.229,
+                        0.224,
+                        0.225
+                    ]
+                }
+            },
+            {
+                "ToCHWImage": null
+            }
+        ]
+    },
+    "PostProcess": {
+        "Topk": {
+            "topk": 1,
+            "label_list": [
+                "0_degree",
+                "180_degree"
+            ]
+        }
+    }
+}
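The `ResizeImage` size `[160, 80]` and the dynamic shapes `[1, 3, 80, 160]` in this config line up if the resize target is written as `[width, height]` while the tensor is laid out as `(N, C, H, W)`. That reading is an assumption inferred from the two values, and `nchw_shape` is a hypothetical helper that spells it out.

```python
def nchw_shape(resize_size, channels=3, batch=1):
    """Build an (N, C, H, W) shape from a [width, height] resize target."""
    width, height = resize_size
    return (batch, channels, height, width)


shape = nchw_shape([160, 80])
print(shape)                          # single image
print(nchw_shape([160, 80], batch=8)) # largest batch listed in the config
```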

models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/inference.json ADDED

The diff for this file is too large to render.

models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/inference.pdiparams ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0de2bcf996cf553e2b848dd7b1769dafffc6917b1ccdf55c1d8efe7909fbf743
+size 6743918

models/paddleocr/official_models/PP-LCNet_x1_0_textline_ori/inference.yml ADDED

	@@ -0,0 +1,46 @@

+Global:
+  model_name: PP-LCNet_x1_0_textline_ori
+Hpi:
+  backend_configs:
+    paddle_infer:
+      trt_dynamic_shapes: &id001
+        x:
+        - - 1
+          - 3
+          - 80
+          - 160
+        - - 1
+          - 3
+          - 80
+          - 160
+        - - 8
+          - 3
+          - 80
+          - 160
+    tensorrt:
+      dynamic_shapes: *id001
+PreProcess:
+  transform_ops:
+  - ResizeImage:
+      size:
+      - 160
+      - 80
+  - NormalizeImage:
+      channel_num: 3
+      mean:
+      - 0.485
+      - 0.456
+      - 0.406
+      order: ''
+      scale: 0.00392156862745098
+      std:
+      - 0.229
+      - 0.224
+      - 0.225
+  - ToCHWImage: null
+PostProcess:
+  Topk:
+    topk: 1
+    label_list:
+    - 0_degree
+    - 180_degree

models/paddleocr/official_models/PP-OCRv5_mobile_det/.msc ADDED

Binary file (366 Bytes).

models/paddleocr/official_models/PP-OCRv5_mobile_det/.mv ADDED

	@@ -0,0 +1 @@


+Revision:master,CreatedAt:1751518563
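The `.mv` file holds a single `Key:Value,Key:Value` record. Assuming `CreatedAt` is a Unix timestamp in seconds (the file does not say so), it can be decoded as sketched below; `parse_mv` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_mv(line):
    """Split 'Key:Value,Key:Value' into fields and decode CreatedAt as UTC."""
    fields = dict(part.split(":", 1) for part in line.strip().split(","))
    # Assumption: CreatedAt counts seconds since 1970-01-01 UTC.
    created = datetime.fromtimestamp(int(fields["CreatedAt"]), tz=timezone.utc)
    return fields["Revision"], created


revision, created = parse_mv("Revision:master,CreatedAt:1751518563")
print(revision, created.isoformat())
```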

models/paddleocr/official_models/PP-OCRv5_mobile_det/README.md ADDED

	@@ -0,0 +1,219 @@

+---
+license: apache-2.0
+library_name: PaddleOCR
+language:
+- en
+- zh
+pipeline_tag: image-to-text
+tags:
+- OCR
+- PaddlePaddle
+- PaddleOCR
+---
+# PP-OCRv5_mobile_det
+## Introduction
+PP-OCRv5_mobile_det belongs to the PP-OCRv5_det series, the latest generation of text detection models developed by the PaddleOCR team. It aims to efficiently and accurately detect text in diverse scenarios, including handwritten, vertical, rotated, and curved text, across languages such as Simplified Chinese, Traditional Chinese, English, and Japanese. Key features include robust handling of complex layouts, varying text sizes, and challenging backgrounds, which makes it suitable for practical applications such as document analysis, license plate recognition, and scene text detection. The key accuracy metrics are as follows:
+| Handwritten Chinese | Handwritten English | Printed Chinese | Printed English | Traditional Chinese | Ancient Text | Japanese | General Scenario | Pinyin | Rotation | Distortion | Artistic Text | Average |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| 0.744 | 0.777 | 0.905 | 0.910 | 0.823 | 0.581 | 0.727 | 0.721 | 0.575 | 0.647 | 0.827 | 0.525 | 0.770 |
+## Quick Start
+### Installation
+1. PaddlePaddle
+Please refer to the following commands to install PaddlePaddle using pip:
+```bash
+# for CUDA11.8
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+# for CUDA12.6
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+# for CPU
+python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+```
+For details about PaddlePaddle installation, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/en/install/quick).
+2. PaddleOCR
+Install the latest version of the PaddleOCR inference package from PyPI:
+```bash
+python -m pip install paddleocr
+```
+### Model Usage
+You can quickly experience the functionality with a single command:
+```bash
+paddleocr text_detection \
+    --model_name PP-OCRv5_mobile_det \
+    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png
+```
+You can also integrate the model inference of the text detection module into your project. Before running the following code, please download the sample image to your local machine.
+```python
+from paddleocr import TextDetection
+model = TextDetection(model_name="PP-OCRv5_mobile_det")
+output = model.predict(input="3ul2Rq4Sk5Cn-l69D695U.png", batch_size=1)
+for res in output:
+    res.print()
+    res.save_to_img(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
+```
+After running, the obtained result is as follows:
+```json
+{'res': {'input_path': '/root/.paddlex/predict_input/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'dt_polys': array([[[ 105, 1431],
+        ...,
+        [ 105, 1452]],
+       ...,
+       [[ 353,  106],
+        ...,
+        [ 353,  129]]], dtype=int16), 'dt_scores': [0.8306416015066644, 0.7603795581201811, ..., 0.8819806867477359]}}
+```
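The result pairs each polygon in `dt_polys` with a confidence in `dt_scores`. A downstream consumer may want to drop weaker detections; the sketch below does that with plain lists. The threshold of 0.8 and the placeholder polygons are hypothetical, the three scores are the ones visible in the output above, and `keep_confident` is not a PaddleOCR function.

```python
def keep_confident(polys, scores, min_score):
    """Keep (polygon, score) pairs whose score is at least `min_score`."""
    return [(poly, score) for poly, score in zip(polys, scores) if score >= min_score]


# Scores visible in the sample output; polygons are placeholders.
scores = [0.8306416015066644, 0.7603795581201811, 0.8819806867477359]
polys = ["poly_a", "poly_b", "poly_c"]

kept = keep_confident(polys, scores, min_score=0.8)  # hypothetical threshold
print([name for name, _ in kept])
```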
+The visualized image is as follows:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/x7iTnr_hOnfTdyblW0qcb.jpeg)
+For command usage and parameter descriptions, refer to the [documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/text_detection.html#iii-quick-start).
+### Pipeline Usage
+A single model has limited capability, but a pipeline that combines several models can handle harder real-world problems.
+#### PP-OCRv5
+The general OCR pipeline solves text recognition tasks by extracting text from images and outputting it as strings. The pipeline contains five modules:
+* Document Image Orientation Classification Module (Optional)
+* Text Image Unwarping Module (Optional)
+* Text Line Orientation Classification Module (Optional)
+* Text Detection Module
+* Text Recognition Module
+Run a single command to quickly experience the OCR pipeline:
+```bash
+paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png \
+    --text_detection_model_name PP-OCRv5_mobile_det \
+    --use_doc_orientation_classify False \
+    --use_doc_unwarping False \
+    --use_textline_orientation True \
+    --save_path ./output \
+    --device gpu:0
+```
+Results are printed to the terminal:
+```json
+{'res': {'input_path': 'printing_en/3ul2Rq4Sk5Cn-l69D695U.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': True}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[ 352,  105],
+        ...,
+        [ 352,  128]],
+       ...,
+       [[ 632, 1431],
+        ...,
+        [ 632, 1447]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([0, ..., 0]), 'text_rec_score_thresh': 0.0, 'rec_texts': ['Algorithms for the Markov Entropy Decomposition', 'Andrew J. Ferris and David Poulin', 'Département de Physique, Université de Sherbrooke, Québec, JI K 2R1, Canada', '(Dated: October 31, 2018)', 'The Markov entropy decomposition (MED) is a recently-proposed, cluster-based simulation method for fi -', 'nite temperature quantum systems with arbitrary geometry. In this paper, we detail numerical algorithms for', 'performing the required steps of the MED, principally solving a minimization problem with a preconditioned', 'arXiv:1212.1442v1 [cond-mat.stat-mech] 6 Dec 2012', "Newton's algorithm, as well as how to extract global susceptibilities and thermal responses. We demonstrate", 'the power of the method with the spin-1/2 XXZ model on the 2D square lattice, including the extraction of', 'critical points and details of each phase. Although the method shares some qualitative similarities with exact-', 'diagonalization, we show the MED is both more accurate and significantly more flexible.', 'PACS numbers: 05.10.—a, 02.50.Ng, 03.67.–a, 74.40.Kb', 'I. INTRODUCTION', 'This approximation becomes exact in the case of a 1D quan-', 'tum (or classical) Markov chain [1O], and leads to an expo-', 'Although the equations governing quantum many-body', 'nential reduction of cost for exact entropy calculations when', 'systems are simple to write down, finding solutions for the', 'the global density matrix is a higher-dimensional Markov net-', 'majority of systems remains incredibly difficult. Modern', 'work state [12, 13].', 'physics finds itself in need of new tools to compute the emer-', 'The second approximation used in the MED approach is', 'gent behavior of large, many-body systems.', 'related to the N-representibility problem. Given a set of lo-', 'There has been a great variety of tools developed to tackle', 'cal but overlapping reduced density matrices { ρi }, it is a very', 'many-body problems, but in general, large 2D and 3D quan-', 'challenging problem to determine if there exists a global den.', 'tum systems remain hard to deal with. Most systems are', 'sity operator which is positive semi-definite and whose partial', 'thought to be non-integrable, so exact analytic solutions are', 'trace agrees with each ρi. This problem is QMA-hard (the', 'not usually expected. Direct numerical diagonalization can be', 'quantum analogue of NP) [14, 15], and is hopelessly diffi-', 'performed for relatively small systems — however the emer-', 'cult to enforce. Thus, the second approximation employed', 'gent behavior of a system in the thermodynamic limit may be', 'involves ignoring global consistency with a positive opera-', 'difficult to extract, especially in systems with large correlation', 'tor, while requiring local consistency on any overlapping re-', 'lengths. Monte Carlo approaches are technically exact (up to', 'gions between the ρi. At the zero-temperature limit, the MED', 'sampling error), but suffer from the so-called sign problem', 'approach becomes analogous to the variational nth-order re-', 'for fermionic, frustrated, or dynamical problems. Thus we are', 'duced density matrix approach, where positivity is enforced', 'limited to search for clever approximations to solve the ma-', 'on all reduced density matrices of size n [16–18].', 'jority of many-body problems.', 'The MED approach is an extremely flexible cluster method.', 'Over the past century, hundreds of such approximations', 'applicable to both translationally invariant systems of any di-', 'have been proposed, and we will mention just a few notable', 'mension in the thermodynamic limit, as well as finite systems', 'examples applicable to quantum lattice models. Mean-field', 'or systems without translational invariance (e.g. disordered', 'theory is simple and frequently arrives at the correct quali-', 'lattices, or harmonically trapped atoms in optical lattices).', 'tative description, but often fails when correlations are im-', 'The free energy given by MED is guaranteed to lower bound', 'portant. Density-matrix renormalisation group (DMRG) [1]', 'the true free energy, which in turn lower-bounds the ground', 'is efficient and extremely accurate at solving 1D problems,', 'state energy — thus providing a natural complement to varia-', 'but the computational cost grows exponentially with system', 'tional approaches which upper-bound the ground state energy.', 'size in two- or higher-dimensions [2, 3]. Related tensor-', 'The ability to provide a rigorous ground-state energy window', 'network techniques designed for 2D systems are still in their', 'is a powerful validation tool, creating a very compelling rea-', 'infancy [4–6]. Series-expansion methods [7] can be success-', 'son to use this approach.', 'ful, but may diverge or otherwise converge slowly, obscuring', 'In this paper we paper we present a pedagogical introduc-', 'the state in certain regimes. There exist a variety of cluster-', 'tion to MED, including numerical implementation issues and', 'based techniques, such as dynamical-mean-field theory [8]', 'applications to 2D quantum lattice models in the thermody-', 'and density-matrix embedding [9]', 'namic limit. In Sec. II. we giye a brief deriyation of the', 'Here we discuss the so-called Markov entropy decompo-', 'Markov entropy decomposition. Section III outlines a robust', 'sition (MED), recently proposed by Poulin & Hastings [10]', 'numerical strategy for optimizing the clusters that make up', '(and analogous to a slightly earlier classical algorithm [11]).', 'the decomposition. In Sec. IV we show how we can extend', 'This is a self-consistent cluster method for fi nite temperature', 'these algorithms to extract non-trivial information, such as', 'systems that takes advantage of an approximation of the (von', 'specific heat and susceptibilities. We present an application of', 'Neumann) entropy. In [10], it was shown that the entropy', 'the method to the spin-1/2 XXZ model on a 2D square lattice', 'per site can be rigorously upper bounded using only local in-', 'in Sec. V, describing how to characterize the phase diagram', 'formation — a local, reduced density matrix on N sites, say.', 'and determine critical points, before concluding in Sec. VI.'], 'rec_scores': array([0.99388635, ..., 0.99304372]), 'rec_polys': array([[[ 352,  105],
+        ...,
+        [ 352,  128]],
+       ...,
+       [[ 632, 1431],
+        ...,
+        [ 632, 1447]]], dtype=int16), 'rec_boxes': array([[ 352, ...,  128],
+       ...,
+       [ 632, ..., 1447]], dtype=int16)}}
+```
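In the output above, the first entry of `rec_polys` starts at `[352, 105]` and ends at `[352, 128]`, while the first entry of `rec_boxes` reads `[352, ..., 128]`. That is consistent with each box being the axis-aligned bounds `[x_min, y_min, x_max, y_max]` of its polygon, which is an inference from the printed values rather than a documented guarantee. A minimal sketch of that conversion, with `poly_to_box` as a hypothetical helper:

```python
def poly_to_box(poly):
    """Return [x_min, y_min, x_max, y_max] for a list of [x, y] points."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return [min(xs), min(ys), max(xs), max(ys)]


# The first and last points come from the output above; the two middle
# points are hypothetical stand-ins for the elided "..." values.
poly = [[352, 105], [700, 105], [700, 128], [352, 128]]
box = poly_to_box(poly)
print(box)
```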
+If `save_path` is specified, the visualization results are saved under that path. The visualization output is shown below:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/4lLYO_jQJwz3qWuv7CAyf.png)
+The command line is meant for quick trials. Integrating the pipeline into a project also takes only a few lines of code:
+```python
+from paddleocr import PaddleOCR
+ocr = PaddleOCR(
+    text_detection_model_name="PP-OCRv5_mobile_det",
+    use_doc_orientation_classify=False, # Use use_doc_orientation_classify to enable/disable document orientation classification model
+    use_doc_unwarping=False, # Use use_doc_unwarping to enable/disable document unwarping module
+    use_textline_orientation=True, # Use use_textline_orientation to enable/disable textline orientation classification model
+    device="gpu:0", # Use device to specify GPU for model inference
+)
+result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/3ul2Rq4Sk5Cn-l69D695U.png")
+for res in result:
+    res.print()
+    res.save_to_img("output")
+    res.save_to_json("output")
+```
+The pipeline uses `PP-OCRv5_server_det` by default, so you need to select `PP-OCRv5_mobile_det` explicitly with the `text_detection_model_name` argument. You can also load a local model with the `text_detection_model_dir` argument. For command usage and parameter descriptions, refer to the [documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/pipeline_usage/OCR.html#2-quick-start).
+#### PP-StructureV3
+Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:
+* Layout Detection Module
+* General OCR Pipeline
+* Document Image Preprocessing Pipeline (Optional)
+* Table Recognition Pipeline (Optional)
+* Seal Recognition Pipeline (Optional)
+* Formula Recognition Pipeline (Optional)
+Run a single command to quickly experience the PP-StructureV3 pipeline:
+```bash
+paddleocr pp_structurev3 -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/mG4tnwfrvECoFMu-S9mxo.png \
+    --text_detection_model_name PP-OCRv5_mobile_det \
+    --use_doc_orientation_classify False \
+    --use_doc_unwarping False \
+    --use_textline_orientation False \
+    --device gpu:0
+```
+Results are printed to the terminal. If `save_path` is specified, the results are also saved under that path. The predicted Markdown visualization is shown below:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/SfxF0X4drBTNGnfFOtZij.png)
+Running pipeline inference from Python takes only a few lines of code, shown here for PP-StructureV3:
+```python
+from paddleocr import PPStructureV3
+pipeline = PPStructureV3(
+    text_detection_model_name="PP-OCRv5_mobile_det",
+    use_doc_orientation_classify=False, # Use use_doc_orientation_classify to enable/disable document orientation classification model
+    use_doc_unwarping=False,    # Use use_doc_unwarping to enable/disable document unwarping module
+    use_textline_orientation=False, # Use use_textline_orientation to enable/disable textline orientation classification model
+    device="gpu:0", # Use device to specify GPU for model inference
+)
+output = pipeline.predict("./pp_structure_v3_demo.png")
+for res in output:
+    res.print() # Print the structured prediction output
+    res.save_to_json(save_path="output") # Save the current image's structured result in JSON format
+    res.save_to_markdown(save_path="output") # Save the current image's result in Markdown format
+```
+The pipeline uses `PP-OCRv5_server_det` by default, so you need to select `PP-OCRv5_mobile_det` explicitly with the `text_detection_model_name` argument. You can also load a local model with the `text_detection_model_dir` argument. For command usage and parameter descriptions, refer to the [documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/pipeline_usage/PP-StructureV3.html#2-quick-start).
+## Links
+[PaddleOCR Repo](https://github.com/paddlepaddle/paddleocr)
+[PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)

models/paddleocr/official_models/PP-OCRv5_mobile_det/config.json ADDED

	@@ -0,0 +1,111 @@

+{
+    "Global": {
+        "model_name": "PP-OCRv5_mobile_det"
+    },
+    "Hpi": {
+        "backend_configs": {
+            "paddle_infer": {
+                "trt_dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            32,
+                            32
+                        ],
+                        [
+                            1,
+                            3,
+                            736,
+                            736
+                        ],
+                        [
+                            1,
+                            3,
+                            4000,
+                            4000
+                        ]
+                    ]
+                }
+            },
+            "tensorrt": {
+                "dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            32,
+                            32
+                        ],
+                        [
+                            1,
+                            3,
+                            736,
+                            736
+                        ],
+                        [
+                            1,
+                            3,
+                            4000,
+                            4000
+                        ]
+                    ]
+                }
+            }
+        }
+    },
+    "PreProcess": {
+        "transform_ops": [
+            {
+                "DecodeImage": {
+                    "channel_first": false,
+                    "img_mode": "BGR"
+                }
+            },
+            {
+                "DetLabelEncode": null
+            },
+            {
+                "DetResizeForTest": {
+                    "resize_long": 960
+                }
+            },
+            {
+                "NormalizeImage": {
+                    "mean": [
+                        0.485,
+                        0.456,
+                        0.406
+                    ],
+                    "order": "hwc",
+                    "scale": "1./255.",
+                    "std": [
+                        0.229,
+                        0.224,
+                        0.225
+                    ]
+                }
+            },
+            {
+                "ToCHWImage": null
+            },
+            {
+                "KeepKeys": {
+                    "keep_keys": [
+                        "image",
+                        "shape",
+                        "polys",
+                        "ignore_tags"
+                    ]
+                }
+            }
+        ]
+    },
+    "PostProcess": {
+        "name": "DBPostProcess",
+        "thresh": 0.3,
+        "box_thresh": 0.6,
+        "max_candidates": 1000,
+        "unclip_ratio": 1.5
+    }
+}
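The `PreProcess` block reads as a recipe: scale the image so that its long side equals `resize_long`, then normalize each channel as `(pixel * scale - mean) / std`. Below is a minimal pure-Python sketch of those two steps. It assumes the conventional normalization formula and plain aspect-preserving scaling; the library's own resize may additionally round dimensions, and the function names are hypothetical.

```python
# Values copied from the PreProcess section of config.json.
RESIZE_LONG = 960
SCALE = 1.0 / 255.0
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def resize_long_side(height: int, width: int, resize_long: int = RESIZE_LONG) -> tuple[int, int]:
    """Return (height, width) after scaling so the longer side equals resize_long."""
    ratio = resize_long / max(height, width)
    return int(round(height * ratio)), int(round(width * ratio))


def normalize_pixel(pixel: tuple[int, int, int]) -> tuple[float, ...]:
    """Normalize one 3-channel pixel given as 0-255 integers."""
    return tuple((v * SCALE - m) / s for v, m, s in zip(pixel, MEAN, STD))
```

For instance, `resize_long_side(1080, 1920)` gives `(540, 960)` under this simplified rule.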

models/paddleocr/official_models/PP-OCRv5_mobile_det/inference.json ADDED

The diff for this file is too large to render. See raw diff

models/paddleocr/official_models/PP-OCRv5_mobile_det/inference.pdiparams ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:afa1820cb16c1fd0dad589d0f8b389139061c1ef6d68019685fd07be997dda5b
+size 4692937
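The three `+` lines above are a Git LFS pointer, not the weights themselves: a spec version, a SHA-256 object id, and the real file size in bytes. Here is a small sketch of parsing such a pointer, assuming the simple `key value` layout shown here. `parse_lfs_pointer` is a hypothetical helper, not part of any Git tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash algorithm, oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:afa1820cb16c1fd0dad589d0f8b389139061c1ef6d68019685fd07be997dda5b\n"
    "size 4692937\n"
)
print(pointer["size"] / 1e6)  # file size in decimal megabytes
```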

models/paddleocr/official_models/PP-OCRv5_mobile_det/inference.yml ADDED

	@@ -0,0 +1,53 @@

+Global:
+  model_name: PP-OCRv5_mobile_det
+Hpi:
+  backend_configs:
+    paddle_infer:
+      trt_dynamic_shapes: &id001
+        x:
+        - - 1
+          - 3
+          - 32
+          - 32
+        - - 1
+          - 3
+          - 736
+          - 736
+        - - 1
+          - 3
+          - 4000
+          - 4000
+    tensorrt:
+      dynamic_shapes: *id001
+PreProcess:
+  transform_ops:
+  - DecodeImage:
+      channel_first: false
+      img_mode: BGR
+  - DetLabelEncode: null
+  - DetResizeForTest:
+      resize_long: 960
+  - NormalizeImage:
+      mean:
+      - 0.485
+      - 0.456
+      - 0.406
+      order: hwc
+      scale: 1./255.
+      std:
+      - 0.229
+      - 0.224
+      - 0.225
+  - ToCHWImage: null
+  - KeepKeys:
+      keep_keys:
+      - image
+      - shape
+      - polys
+      - ignore_tags
+PostProcess:
+  name: DBPostProcess
+  thresh: 0.3
+  box_thresh: 0.6
+  max_candidates: 1000
+  unclip_ratio: 1.5
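The `PostProcess` values map onto the usual DB (Differentiable Binarization) post-processing steps: binarize the probability map at `thresh`, drop candidate boxes whose mean score falls below `box_thresh`, and expand the survivors by a distance derived from `unclip_ratio`. The sketch below simplifies those steps by using axis-aligned rectangles instead of polygons. The helper names are hypothetical and this is not the library's implementation.

```python
THRESH = 0.3
BOX_THRESH = 0.6
UNCLIP_RATIO = 1.5


def binarize(prob_map: list[list[float]], thresh: float = THRESH) -> list[list[int]]:
    """Turn a probability map into a 0/1 mask."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


def keep_box(mean_score: float, box_thresh: float = BOX_THRESH) -> bool:
    """Keep a candidate box only if its mean probability reaches box_thresh."""
    return mean_score >= box_thresh


def unclip_distance(width: float, height: float, unclip_ratio: float = UNCLIP_RATIO) -> float:
    """Offset used to grow a shrunk text box: area * ratio / perimeter."""
    area = width * height
    perimeter = 2 * (width + height)
    return area * unclip_ratio / perimeter
```

With these settings, a 100 x 20 box is grown by an offset of 12.5 pixels, so raising `unclip_ratio` yields looser boxes around each text line.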

models/paddleocr/official_models/UVDoc/.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+inference.pdiparams filter=lfs diff=lfs merge=lfs -text
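These patterns decide which files are stored through Git LFS, which is why `inference.pdiparams` appears in this diff as a three-line pointer. Under this file alone, `inference.json` would not match any pattern. The sketch below approximates the lookup with `fnmatch`, which does not reproduce gitattributes matching exactly (directory patterns such as `saved_model/**/*` are left out). The pattern subset and `is_lfs_tracked` are illustrative.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.pt", "*.pb", "*.safetensors", "*.onnx", "*.tar.*", "inference.pdiparams"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file name against slash-free LFS patterns."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)
```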

models/paddleocr/official_models/UVDoc/README.md ADDED

	@@ -0,0 +1,131 @@

+---
+license: apache-2.0
+library_name: PaddleOCR
+language:
+- en
+- zh
+pipeline_tag: image-to-text
+tags:
+- OCR
+- PaddlePaddle
+- PaddleOCR
+- doc_img_unwarping
+---
+# UVDoc
+## Introduction
+Text image correction applies a geometric transformation to a document image to correct distortion, skew, perspective deformation, and similar problems, so that subsequent text recognition is more accurate.
+| Model| CER |
+|  --- | --- |
+|UVDoc |  0.179 |
+**Note**: CER is measured on the DocUNet benchmark dataset.
+## Quick Start
+### Installation
+1. PaddlePaddle
+Please refer to the following commands to install PaddlePaddle using pip:
+```bash
+# for CUDA11.8
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+# for CUDA12.6
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+# for CPU
+python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+```
+For details about PaddlePaddle installation, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/en/install/quick).
+2. PaddleOCR
+Install the latest version of the PaddleOCR inference package from PyPI:
+```bash
+python -m pip install paddleocr
+```
+### Model Usage
+You can quickly experience the functionality with a single command:
+```bash
+paddleocr text_image_unwarping --model_name UVDoc -i https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/SfMVKd0xnMII5KBDV6Mfz.jpeg
+```
+You can also integrate the model inference of the TextImageUnwarping module into your project. Before running the following code, please download the sample image to your local machine.
+```python
+from paddleocr import TextImageUnwarping
+model = TextImageUnwarping(model_name="UVDoc")
+output = model.predict("SfMVKd0xnMII5KBDV6Mfz.jpeg", batch_size=1)
+for res in output:
+    res.print()
+    res.save_to_img(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
+```
+After running, the obtained result is as follows:
+```json
+{'res': {'input_path': 'doc_test.jpg', 'page_index': None, 'doctr_img': '...'}}
+```
+The visualized image is as follows:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/1405yNIYq_hA9VL3_8Itn.jpeg)
+For details about usage command and descriptions of parameters, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/text_image_unwarping.html#iii-quick-integration).
+### Pipeline Usage
+A single model has limited capability, but a pipeline that combines several models can handle harder problems in real-world scenarios.
+#### PP-StructureV3
+Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 includes the following six modules:
+* Layout Detection Module
+* General OCR Sub-pipeline
+* Document Image Preprocessing Sub-pipeline (Optional)
+* Table Recognition Sub-pipeline (Optional)
+* Seal Recognition Sub-pipeline (Optional)
+* Formula Recognition Sub-pipeline (Optional)
+You can quickly experience the PP-StructureV3 pipeline with a single command.
+```bash
+paddleocr pp_structurev3 --use_doc_unwarping True -i https://cdn-uploads.huggingface.co/production/uploads/63d7b8ee07cd1aa3c49a2026/KP10tiSZfAjMuwZUSLtRp.png
+```
+You can experience the inference of the pipeline with just a few lines of code. Taking the PP-StructureV3 pipeline as an example:
+```python
+from paddleocr import PPStructureV3
+pipeline = PPStructureV3(use_doc_unwarping=True) # Use use_doc_unwarping to enable/disable document unwarping module
+output = pipeline.predict("./KP10tiSZfAjMuwZUSLtRp.png")
+for res in output:
+    res.print() ## Print the structured prediction output
+    res.save_to_json(save_path="output") ## Save the current image's structured result in JSON format
+    res.save_to_markdown(save_path="output") ## Save the current image's result in Markdown format
+```
+For details about usage command and descriptions of parameters, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/pipeline_usage/PP-StructureV3.html#2-quick-start).
+## Links
+[PaddleOCR Repo](https://github.com/paddlepaddle/paddleocr)
+[PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)
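The UVDoc README reports CER without defining it. Assuming it is the usual character error rate (Levenshtein edit distance between recognized and reference text, divided by the reference length), a minimal sketch looks like this. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(predicted: str, reference: str) -> float:
    """Character error rate: edits needed per reference character."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(predicted, reference) / len(reference)
```

Under this definition a lower value is better, and one wrong character in a four-character reference gives 0.25.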

models/paddleocr/official_models/UVDoc/config.json ADDED

	@@ -0,0 +1,57 @@

+{
+    "Global": {
+        "model_name": "UVDoc"
+    },
+    "Hpi": {
+        "backend_configs": {
+            "paddle_infer": {
+                "trt_dynamic_shapes": {
+                    "img": [
+                        [
+                            1,
+                            3,
+                            128,
+                            64
+                        ],
+                        [
+                            1,
+                            3,
+                            256,
+                            128
+                        ],
+                        [
+                            8,
+                            3,
+                            512,
+                            256
+                        ]
+                    ]
+                }
+            },
+            "tensorrt": {
+                "dynamic_shapes": {
+                    "img": [
+                        [
+                            1,
+                            3,
+                            128,
+                            64
+                        ],
+                        [
+                            1,
+                            3,
+                            256,
+                            128
+                        ],
+                        [
+                            8,
+                            3,
+                            512,
+                            256
+                        ]
+                    ]
+                }
+            }
+        }
+    }
+}
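Each `trt_dynamic_shapes` entry lists three NCHW shapes. Assuming the common TensorRT convention of (minimum, optimal, maximum), which the config itself does not state, an input is usable when every dimension lies between the first and last shape. A hypothetical check:

```python
# Shapes copied from the UVDoc config; the min/opt/max reading is an assumption.
UVDOC_SHAPES = [[1, 3, 128, 64], [1, 3, 256, 128], [8, 3, 512, 256]]


def fits_dynamic_range(shape: list[int], profile: list[list[int]] = UVDOC_SHAPES) -> bool:
    """Return True if each dimension of shape is within [min, max] of the profile."""
    lower, _optimal, upper = profile
    if len(shape) != len(lower):
        return False
    return all(lo <= dim <= hi for lo, dim, hi in zip(lower, shape, upper))
```

Under that reading, a batch of 4 images at 256 x 128 fits the profile, while a single 1024 x 512 image does not.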

models/paddleocr/official_models/UVDoc/inference.json ADDED

The diff for this file is too large to render. See raw diff

models/paddleocr/official_models/UVDoc/inference.pdiparams ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:810488899520e0da843b9bd9769ba4949f1c81e357f0eceb12d4a7da459c3eca
+size 32054311
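After `git lfs pull`, the downloaded weights can be checked against the pointer's `oid` and `size`. The sketch below uses only `hashlib`. `matches_pointer` is a hypothetical helper, and the commented call assumes the repository layout from this commit.

```python
import hashlib
from pathlib import Path


def matches_pointer(path: Path, expected_oid: str, expected_size: int) -> bool:
    """Compare a file's size and SHA-256 digest with the values from an LFS pointer."""
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Example call (not executed here):
# matches_pointer(
#     Path("models/paddleocr/official_models/UVDoc/inference.pdiparams"),
#     "810488899520e0da843b9bd9769ba4949f1c81e357f0eceb12d4a7da459c3eca",
#     32054311,
# )
```

Checking the size first avoids hashing a file that is still a pointer or only partially downloaded.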

models/paddleocr/official_models/UVDoc/inference.yml ADDED

	@@ -0,0 +1,16 @@

+Global:
+  model_name: UVDoc
+Hpi:
+  backend_configs:
+    paddle_infer:
+      trt_dynamic_shapes:
+        img:
+          - [1, 3, 128, 64]
+          - [1, 3, 256, 128]
+          - [8, 3, 512, 256]
+    tensorrt:
+      dynamic_shapes:
+        img:
+          - [1, 3, 128, 64]
+          - [1, 3, 256, 128]
+          - [8, 3, 512, 256]

models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+inference.pdiparams filter=lfs diff=lfs merge=lfs -text

models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/README.md ADDED

	@@ -0,0 +1,169 @@

+---
+license: apache-2.0
+library_name: PaddleOCR
+language:
+- en
+pipeline_tag: image-to-text
+tags:
+- OCR
+- PaddlePaddle
+- PaddleOCR
+- textline_recognition
+---
+# en_PP-OCRv5_mobile_rec
+## Introduction
+en_PP-OCRv5_mobile_rec belongs to the PP-OCRv5_rec series, the latest generation of text line recognition models developed by the PaddleOCR team. It aims to recognize English text efficiently and accurately. The key accuracy metric is as follows:
+| Model | Accuracy (%) |
+|-|-|
+| en_PP-OCRv5_mobile_rec | 85.3|
+**Note**: If any character (including punctuation) in a line was incorrect, the entire line was marked as wrong. This ensures higher accuracy in practical applications.
+## Quick Start
+### Installation
+1. PaddlePaddle
+Please refer to the following commands to install PaddlePaddle using pip:
+```bash
+# for CUDA11.8
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+# for CUDA12.6
+python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
+# for CPU
+python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+```
+For details about PaddlePaddle installation, please refer to the [PaddlePaddle official website](https://www.paddlepaddle.org.cn/en/install/quick).
+2. PaddleOCR
+Install the latest version of the PaddleOCR inference package from PyPI:
+```bash
+python -m pip install paddleocr
+```
+### Model Usage
+You can quickly experience the functionality with a single command:
+```bash
+paddleocr text_recognition \
+    --model_name en_PP-OCRv5_mobile_rec \
+    -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/QmaPtftqwOgCtx0AIvU2z.png
+```
+You can also integrate the model inference of the text recognition module into your project. Before running the following code, please download the sample image to your local machine.
+```python
+from paddleocr import TextRecognition
+model = TextRecognition(model_name="en_PP-OCRv5_mobile_rec")
+output = model.predict(input="QmaPtftqwOgCtx0AIvU2z.png", batch_size=1)
+for res in output:
+    res.print()
+    res.save_to_img(save_path="./output/")
+    res.save_to_json(save_path="./output/res.json")
+```
+After running, the obtained result is as follows:
+```json
+{'res': {'input_path': '/root/.paddlex/predict_input/QmaPtftqwOgCtx0AIvU2z.png', 'page_index': None, 'rec_text': 'the number of model parameters and FLOPs get larger, it', 'rec_score': 0.993655264377594}}
+```
+The visualized image is as follows:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/Xe-blNpCl-X-U1o3L4Rav.png)
+For details about usage command and descriptions of parameters, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/module_usage/text_recognition.html#iii-quick-start).
+### Pipeline Usage
+A single model has limited capability, but a pipeline that combines several models can handle harder problems in real-world scenarios.
+#### PP-OCRv5
+The general OCR pipeline solves text recognition tasks by extracting text from images and outputting it as strings. The pipeline contains 5 modules:
+* Document Image Orientation Classification Module (Optional)
+* Text Image Unwarping Module (Optional)
+* Text Line Orientation Classification Module (Optional)
+* Text Detection Module
+* Text Recognition Module
+Run a single command to quickly experience the OCR pipeline:
+```bash
+paddleocr ocr -i https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/c3hSldnYVQXp48T5V0Ze4.png \
+    --text_recognition_model_name en_PP-OCRv5_mobile_rec \
+    --use_doc_orientation_classify False \
+    --use_doc_unwarping False \
+    --use_textline_orientation True \
+    --save_path ./output \
+    --device gpu:0
+```
+Results are printed to the terminal:
+```json
+{'res': {'input_path': '/root/.paddlex/predict_input/c3hSldnYVQXp48T5V0Ze4.png', 'page_index': None, 'model_settings': {'use_doc_preprocessor': True, 'use_textline_orientation': False}, 'doc_preprocessor_res': {'input_path': None, 'page_index': None, 'model_settings': {'use_doc_orientation_classify': False, 'use_doc_unwarping': False}, 'angle': -1}, 'dt_polys': array([[[252, 172],
+        ...,
+        [254, 241]],
+       ...,
+       [[665, 566],
+        ...,
+        [663, 601]]], dtype=int16), 'text_det_params': {'limit_side_len': 64, 'limit_type': 'min', 'thresh': 0.3, 'max_side_limit': 4000, 'box_thresh': 0.6, 'unclip_ratio': 1.5}, 'text_type': 'general', 'textline_orientation_angles': array([-1, ..., -1]), 'text_rec_score_thresh': 0.0, 'return_word_box': False, 'rec_texts': ['The moon tells the sky', 'The sky tells the sea', 'The sea tells the tide', 'And the tide tells me', 'Lemn Sissay'], 'rec_scores': array([0.98405874, ..., 0.9837752 ]), 'rec_polys': array([[[252, 172],
+        ...,
+        [254, 241]],
+       ...,
+       [[665, 566],
+        ...,
+        [663, 601]]], dtype=int16), 'rec_boxes': array([[252, ..., 241],
+       ...,
+       [663, ..., 612]], dtype=int16)}}
+```
+If save_path is specified, the visualization results will be saved under `save_path`. The visualization output is shown below:
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/DcAem61DifjkUQK9f-0iZ.png)
+The command line is convenient for a quick trial. For project integration, only a few lines of code are needed:
+```python
+from paddleocr import PaddleOCR
+ocr = PaddleOCR(
+    text_recognition_model_name="en_PP-OCRv5_mobile_rec",
+    use_doc_orientation_classify=False, # Use use_doc_orientation_classify to enable/disable document orientation classification model
+    use_doc_unwarping=False, # Use use_doc_unwarping to enable/disable document unwarping module
+    use_textline_orientation=True, # Use use_textline_orientation to enable/disable textline orientation classification model
+    device="gpu:0", # Use device to specify GPU for model inference
+)
+result = ocr.predict("https://cdn-uploads.huggingface.co/production/uploads/681c1ecd9539bdde5ae1733c/6KQKOS42DKVEUnrticvhd.png")
+for res in result:
+    res.print()
+    res.save_to_img("output")
+    res.save_to_json("output")
+```
+The pipeline uses `PP-OCRv5_server_rec` by default, so `en_PP-OCRv5_mobile_rec` must be selected explicitly with the `text_recognition_model_name` argument. You can also load a local model file with the `text_recognition_model_dir` argument. For details about the usage command and parameter descriptions, please refer to the [Document](https://paddlepaddle.github.io/PaddleOCR/latest/en/version3.x/pipeline_usage/OCR.html#2-quick-start).
+## Links
+[PaddleOCR Repo](https://github.com/paddlepaddle/paddleocr)
+[PaddleOCR Documentation](https://paddlepaddle.github.io/PaddleOCR/latest/en/index.html)
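The note under the accuracy table in the en_PP-OCRv5_mobile_rec README describes an exact-match metric: a line counts as correct only if every character, punctuation included, matches. Below is a minimal sketch of that scoring rule. The function name is hypothetical, and the sample strings are borrowed from the README's sample output with one deliberately altered.

```python
def line_accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of lines whose prediction equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one line is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


score = line_accuracy(
    ["The moon tells the sky", "The sky tells the sea."],  # stray period in line 2
    ["The moon tells the sky", "The sky tells the sea"],
)
```

Here a single extra period makes the second line count as wrong, so `score` is 50.0.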

models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/config.json ADDED

	@@ -0,0 +1,533 @@

+{
+    "Global": {
+        "model_name": "en_PP-OCRv5_mobile_rec"
+    },
+    "Hpi": {
+        "backend_configs": {
+            "paddle_infer": {
+                "trt_dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            48,
+                            160
+                        ],
+                        [
+                            1,
+                            3,
+                            48,
+                            320
+                        ],
+                        [
+                            8,
+                            3,
+                            48,
+                            3200
+                        ]
+                    ]
+                }
+            },
+            "tensorrt": {
+                "dynamic_shapes": {
+                    "x": [
+                        [
+                            1,
+                            3,
+                            48,
+                            160
+                        ],
+                        [
+                            1,
+                            3,
+                            48,
+                            320
+                        ],
+                        [
+                            8,
+                            3,
+                            48,
+                            3200
+                        ]
+                    ]
+                }
+            }
+        }
+    },
+    "PreProcess": {
+        "transform_ops": [
+            {
+                "DecodeImage": {
+                    "channel_first": false,
+                    "img_mode": "BGR"
+                }
+            },
+            {
+                "MultiLabelEncode": {
+                    "gtc_encode": "NRTRLabelEncode"
+                }
+            },
+            {
+                "RecResizeImg": {
+                    "image_shape": [
+                        3,
+                        48,
+                        320
+                    ]
+                }
+            },
+            {
+                "KeepKeys": {
+                    "keep_keys": [
+                        "image",
+                        "label_ctc",
+                        "label_gtc",
+                        "length",
+                        "valid_ratio"
+                    ]
+                }
+            }
+        ]
+    },
+    "PostProcess": {
+        "name": "CTCLabelDecode",
+        "character_dict": [
+            "0",
+            "1",
+            "2",
+            "3",
+            "4",
+            "5",
+            "6",
+            "7",
+            "8",
+            "9",
+            "A",
+            "B",
+            "C",
+            "D",
+            "E",
+            "F",
+            "G",
+            "H",
+            "I",
+            "J",
+            "K",
+            "L",
+            "M",
+            "N",
+            "O",
+            "P",
+            "Q",
+            "R",
+            "S",
+            "T",
+            "U",
+            "V",
+            "W",
+            "X",
+            "Y",
+            "Z",
+            "a",
+            "b",
+            "c",
+            "d",
+            "e",
+            "f",
+            "g",
+            "h",
+            "i",
+            "j",
+            "k",
+            "l",
+            "m",
+            "n",
+            "o",
+            "p",
+            "q",
+            "r",
+            "s",
+            "t",
+            "u",
+            "v",
+            "w",
+            "x",
+            "y",
+            "z",
+            "!",
+            "\"",
+            "#",
+            "$",
+            "%",
+            "&",
+            "'",
+            "(",
+            ")",
+            "*",
+            "+",
+            ",",
+            "-",
+            ".",
+            "/",
+            ":",
+            ";",
+            "<",
+            "=",
+            ">",
+            "?",
+            "@",
+            "[",
+            "\\",
+            "]",
+            "_",
+            "`",
+            "{",
+            "|",
+            "}",
+            "^",
+            "~",
+            "©",
+            "®",
+            "℉",
+            "№",
+            "Ω",
+            "℮",
+            "™",
+            "∆",
+            "✓",
+            "✔",
+            "✗",
+            "✘",
+            "✕",
+            "☑",
+            "☒",
+            "●",
+            "▪",
+            "▫",
+            "◼",
+            "▶",
+            "◀",
+            "⬆",
+            "¤",
+            "¦",
+            "§",
+            "¨",
+            "ª",
+            "«",
+            "¬",
+            "¯",
+            "°",
+            "²",
+            "³",
+            "´",
+            "µ",
+            "¶",
+            "¸",
+            "¹",
+            "º",
+            "»",
+            "¼",
+            "½",
+            "¾",
+            "¿",
+            "×",
+            "‐",
+            "‑",
+            "‒",
+            "—",
+            "―",
+            "‖",
+            "‗",
+            "‘",
+            "’",
+            "‚",
+            "‛",
+            "“",
+            "”",
+            "„",
+            "‟",
+            "†",
+            "‡",
+            "‣",
+            "․",
+            "…",
+            "‧",
+            "‰",
+            "‴",
+            "‵",
+            "‶",
+            "‷",
+            "‸",
+            "‹",
+            "›",
+            "※",
+            "‼",
+            "‽",
+            "‾",
+            "−",
+            "₤",
+            "₡",
+            "₹",
+            "₽",
+            "₴",
+            "₿",
+            "¢",
+            "€",
+            "£",
+            "¥",
+            "Ⅰ",
+            "Ⅱ",
+            "Ⅲ",
+            "Ⅳ",
+            "Ⅴ",
+            "Ⅵ",
+            "Ⅶ",
+            "Ⅷ",
+            "Ⅸ",
+            "Ⅹ",
+            "Ⅺ",
+            "Ⅻ",
+            "ⅰ",
+            "ⅱ",
+            "ⅲ",
+            "ⅳ",
+            "ⅴ",
+            "ⅵ",
+            "ⅶ",
+            "ⅷ",
+            "ⅸ",
+            "ⅹ",
+            "ⅺ",
+            "ⅻ",
+            "➀",
+            "➁",
+            "➂",
+            "➃",
+            "➄",
+            "➅",
+            "➆",
+            "➇",
+            "➈",
+            "➉",
+            "➊",
+            "➋",
+            "➌",
+            "➍",
+            "➎",
+            "➏",
+            "➐",
+            "➑",
+            "➒",
+            "➓",
+            "❶",
+            "❷",
+            "❸",
+            "❹",
+            "❺",
+            "❻",
+            "❼",
+            "❽",
+            "❾",
+            "❿",
+            "①",
+            "②",
+            "③",
+            "④",
+            "⑤",
+            "⑥",
+            "⑦",
+            "⑧",
+            "⑨",
+            "⑩",
+            "↑",
+            "→",
+            "↓",
+            "↕",
+            "←",
+            "↔",
+            "⇒",
+            "⇐",
+            "⇔",
+            "∀",
+            "∃",
+            "∄",
+            "∴",
+            "∵",
+            "∝",
+            "∞",
+            "∩",
+            "∪",
+            "∂",
+            "∫",
+            "∬",
+            "∭",
+            "∮",
+            "∯",
+            "∰",
+            "∑",
+            "∏",
+            "√",
+            "∛",
+            "∜",
+            "∱",
+            "∲",
+            "∳",
+            "∶",
+            "∷",
+            "∼",
+            "∖",
+            "∗",
+            "≈",
+            "≠",
+            "≡",
+            "≤",
+            "≥",
+            "⊂",
+            "⊃",
+            "⊥",
+            "⊾",
+            "⊿",
+            "□",
+            "∥",
+            "∋",
+            "ƒ",
+            "′",
+            "″",
+            "À",
+            "Á",
+            "Â",
+            "Ã",
+            "Ä",
+            "Å",
+            "Æ",
+            "Ç",
+            "È",
+            "É",
+            "Ê",
+            "Ë",
+            "Ì",
+            "Í",
+            "Î",
+            "Ï",
+            "Ð",
+            "Ñ",
+            "Ò",
+            "Ó",
+            "Ô",
+            "Õ",
+            "Ö",
+            "Ø",
+            "Ù",
+            "Ú",
+            "Û",
+            "Ü",
+            "Ý",
+            "Þ",
+            "à",
+            "á",
+            "â",
+            "ã",
+            "ä",
+            "å",
+            "æ",
+            "ç",
+            "è",
+            "é",
+            "ê",
+            "ë",
+            "ì",
+            "í",
+            "î",
+            "ï",
+            "ð",
+            "ñ",
+            "ò",
+            "ó",
+            "ô",
+            "õ",
+            "ö",
+            "ø",
+            "ù",
+            "ú",
+            "û",
+            "ü",
+            "ý",
+            "þ",
+            "ÿ",
+            "Α",
+            "Β",
+            "Γ",
+            "Δ",
+            "Ε",
+            "Ζ",
+            "Η",
+            "Θ",
+            "Ι",
+            "Κ",
+            "Λ",
+            "Μ",
+            "Ν",
+            "Ξ",
+            "Ο",
+            "Π",
+            "Ρ",
+            "Σ",
+            "Τ",
+            "Υ",
+            "Φ",
+            "Χ",
+            "Ψ",
+            "Ω",
+            "α",
+            "β",
+            "γ",
+            "δ",
+            "ε",
+            "ζ",
+            "η",
+            "θ",
+            "ι",
+            "κ",
+            "λ",
+            "μ",
+            "ν",
+            "ξ",
+            "ο",
+            "π",
+            "ρ",
+            "σ",
+            "ς",
+            "τ",
+            "υ",
+            "φ",
+            "χ",
+            "ψ",
+            "ω",
+            "Å",
+            "ℏ",
+            "⌀",
+            "⍺",
+            "⍵",
+            "𝑢",
+            "𝜓",
+            "०",
+            "‥",
+            "︽",
+            "﹥",
+            "•",
+            "÷",
+            "∕",
+            "∙",
+            "⋅",
+            "·",
+            "±",
+            "∓",
+            "∟",
+            "∠",
+            "∡",
+            "∢",
+            "℧",
+            "☺"
+        ]
+    }
+}

models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/inference.json ADDED

The diff for this file is too large to render. See raw diff

models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/inference.pdiparams ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3ec8a97ed6cefe8568d3e2ee90bb193299b566a7661aa4fd52d224b96b59f66b
+size 7772315

models/paddleocr/official_models/en_PP-OCRv5_mobile_rec/inference.yml ADDED

	@@ -0,0 +1,479 @@

+Global:
+  model_name: en_PP-OCRv5_mobile_rec
+Hpi:
+  backend_configs:
+    paddle_infer:
+      trt_dynamic_shapes: &id001
+        x:
+        - - 1
+          - 3
+          - 48
+          - 160
+        - - 1
+          - 3
+          - 48
+          - 320
+        - - 8
+          - 3
+          - 48
+          - 3200
+    tensorrt:
+      dynamic_shapes: *id001
+PreProcess:
+  transform_ops:
+  - DecodeImage:
+      channel_first: false
+      img_mode: BGR
+  - MultiLabelEncode:
+      gtc_encode: NRTRLabelEncode
+  - RecResizeImg:
+      image_shape:
+      - 3
+      - 48
+      - 320
+  - KeepKeys:
+      keep_keys:
+      - image
+      - label_ctc
+      - label_gtc
+      - length
+      - valid_ratio
+PostProcess:
+  name: CTCLabelDecode
+  character_dict:
+  - '0'
+  - '1'
+  - '2'
+  - '3'
+  - '4'
+  - '5'
+  - '6'
+  - '7'
+  - '8'
+  - '9'
+  - A
+  - B
+  - C
+  - D
+  - E
+  - F
+  - G
+  - H
+  - I
+  - J
+  - K
+  - L
+  - M
+  - N
+  - O
+  - P
+  - Q
+  - R
+  - S
+  - T
+  - U
+  - V
+  - W
+  - X
+  - Y
+  - Z
+  - a
+  - b
+  - c
+  - d
+  - e
+  - f
+  - g
+  - h
+  - i
+  - j
+  - k
+  - l
+  - m
+  - n
+  - o
+  - p
+  - q
+  - r
+  - s
+  - t
+  - u
+  - v
+  - w
+  - x
+  - y
+  - z
+  - '!'
+  - '"'
+  - '#'
+  - $
+  - '%'
+  - '&'
+  - ''''
+  - (
+  - )
+  - '*'
+  - +
+  - ','
+  - '-'
+  - .
+  - /
+  - ':'
+  - ;
+  - <
+  - '='
+  - '>'
+  - '?'
+  - '@'
+  - '['
+  - \
+  - ']'
+  - _
+  - '`'
+  - '{'
+  - '|'
+  - '}'
+  - ^
+  - '~'
+  - ©
+  - ®
+  - ℉
+  - №
+  - Ω
+  - ℮
+  - ™
+  - ∆
+  - ✓
+  - ✔
+  - ✗
+  - ✘
+  - ✕
+  - ☑
+  - ☒
+  - ●
+  - ▪
+  - ▫
+  - ◼
+  - ▶
+  - ◀
+  - ⬆
+  - ¤
+  - ¦
+  - §
+  - ¨
+  - ª
+  - «
+  - ¬
+  - ¯
+  - °
+  - ²
+  - ³
+  - ´
+  - µ
+  - ¶
+  - ¸
+  - ¹
+  - º
+  - »
+  - ¼
+  - ½
+  - ¾
+  - ¿
+  - ×
+  - ‐
+  - ‑
+  - ‒
+  - —
+  - ―
+  - ‖
+  - ‗
+  - ‘
+  - ’
+  - ‚
+  - ‛
+  - “
+  - ”
+  - „
+  - ‟
+  - †
+  - ‡
+  - ‣
+  - ․
+  - …
+  - ‧
+  - ‰
+  - ‴
+  - ‵
+  - ‶
+  - ‷
+  - ‸
+  - ‹
+  - ›
+  - ※
+  - ‼
+  - ‽
+  - ‾
+  - −
+  - ₤
+  - ₡
+  - ₹
+  - ₽
+  - ₴
+  - ₿
+  - ¢
+  - €
+  - £
+  - ¥
+  - Ⅰ
+  - Ⅱ
+  - Ⅲ
+  - Ⅳ
+  - Ⅴ
+  - Ⅵ
+  - Ⅶ
+  - Ⅷ
+  - Ⅸ
+  - Ⅹ
+  - Ⅺ
+  - Ⅻ
+  - ⅰ
+  - ⅱ
+  - ⅲ
+  - ⅳ
+  - ⅴ
+  - ⅵ
+  - ⅶ
+  - ⅷ
+  - ⅸ
+  - ⅹ
+  - ⅺ
+  - ⅻ
+  - ➀
+  - ➁
+  - ➂
+  - ➃
+  - ➄
+  - ➅
+  - ➆
+  - ➇
+  - ➈
+  - ➉
+  - ➊
+  - ➋
+  - ➌
+  - ➍
+  - ➎
+  - ➏
+  - ➐
+  - ➑
+  - ➒
+  - ➓
+  - ❶
+  - ❷
+  - ❸
+  - ❹
+  - ❺
+  - ❻
+  - ❼
+  - ❽
+  - ❾
+  - ❿
+  - ①
+  - ②
+  - ③
+  - ④
+  - ⑤
+  - ⑥
+  - ⑦
+  - ⑧
+  - ⑨
+  - ⑩
+  - ↑
+  - →
+  - ↓
+  - ↕
+  - ←
+  - ↔
+  - ⇒
+  - ⇐
+  - ⇔
+  - ∀
+  - ∃
+  - ∄
+  - ∴
+  - ∵
+  - ∝
+  - ∞
+  - ∩
+  - ∪
+  - ∂
+  - ∫
+  - ∬
+  - ∭
+  - ∮
+  - ∯
+  - ∰
+  - ∑
+  - ∏
+  - √
+  - ∛
+  - ∜
+  - ∱
+  - ∲
+  - ∳
+  - ∶
+  - ∷
+  - ∼
+  - ∖
+  - ∗
+  - ≈
+  - ≠
+  - ≡
+  - ≤
+  - ≥
+  - ⊂
+  - ⊃
+  - ⊥
+  - ⊾
+  - ⊿
+  - □
+  - ∥
+  - ∋
+  - ƒ
+  - ′
+  - ″
+  - À
+  - Á
+  - Â
+  - Ã
+  - Ä
+  - Å
+  - Æ
+  - Ç
+  - È
+  - É
+  - Ê
+  - Ë
+  - Ì
+  - Í
+  - Î
+  - Ï
+  - Ð
+  - Ñ
+  - Ò
+  - Ó
+  - Ô
+  - Õ
+  - Ö
+  - Ø
+  - Ù
+  - Ú
+  - Û
+  - Ü
+  - Ý
+  - Þ
+  - à
+  - á
+  - â
+  - ã
+  - ä
+  - å
+  - æ
+  - ç
+  - è
+  - é
+  - ê
+  - ë
+  - ì
+  - í
+  - î
+  - ï
+  - ð
+  - ñ
+  - ò
+  - ó
+  - ô
+  - õ
+  - ö
+  - ø
+  - ù
+  - ú
+  - û
+  - ü
+  - ý
+  - þ
+  - ÿ
+  - Α
+  - Β
+  - Γ
+  - Δ
+  - Ε
+  - Ζ
+  - Η
+  - Θ
+  - Ι
+  - Κ
+  - Λ
+  - Μ
+  - Ν
+  - Ξ
+  - Ο
+  - Π
+  - Ρ
+  - Σ
+  - Τ
+  - Υ
+  - Φ
+  - Χ
+  - Ψ
+  - Ω
+  - α
+  - β
+  - γ
+  - δ
+  - ε
+  - ζ
+  - η
+  - θ
+  - ι
+  - κ
+  - λ
+  - μ
+  - ν
+  - ξ
+  - ο
+  - π
+  - ρ
+  - σ
+  - ς
+  - τ
+  - υ
+  - φ
+  - χ
+  - ψ
+  - ω
+  - Å
+  - ℏ
+  - ⌀
+  - ⍺
+  - ⍵
+  - 𝑢
+  - 𝜓
+  - ०
+  - ‥
+  - ︽
+  - ﹥
+  - •
+  - ÷
+  - ∕
+  - ∙
+  - ⋅
+  - ·
+  - ±
+  - ∓
+  - ∟
+  - ∠
+  - ∡
+  - ∢
+  - ℧
+  - ☺
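The `PostProcess` section above names `CTCLabelDecode` and supplies the `character_dict` it decodes into. As a rough illustration of what greedy CTC decoding against such a dictionary involves, here is a minimal standard-library sketch. It assumes that class index 0 is reserved for the CTC blank and that dictionary entries start at index 1; the YAML does not state this offset, so treat it as an assumption. `ctc_greedy_decode` and `one_hot` are hypothetical helpers, not PaddleOCR functions.

```python
BLANK = 0  # assumption: index 0 is the CTC blank, dictionary entries follow it


def ctc_greedy_decode(step_scores, character_dict):
    """Pick the best class per time step, collapse repeats, drop blanks."""
    decoded = []
    previous = BLANK
    for scores in step_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when it is not blank and not a repeat
        # of the class chosen at the previous time step.
        if best != BLANK and best != previous:
            decoded.append(character_dict[best - 1])
        previous = best
    return "".join(decoded)


def one_hot(index, size):
    """Hypothetical helper: a score row whose maximum sits at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy three-character dictionary standing in for the full list in the YAML.
toy_dict = ["0", "1", "A"]
steps = [one_hot(i, 4) for i in [3, 3, 0, 3, 1, 1, 2]]
print(ctc_greedy_decode(steps, toy_dict))
```

The blank between the two runs of class 3 is what allows the same character to appear twice in a row in the decoded text.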

models/stage1_best.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5bdfda1f591c8a33c1b60d0b4d013116b3dde30c2735f1a5ea6420c4d62bada8
+size 22532266

models/yolov8s.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1f47a78bf100391c2a140b7ac73a1caae18c32779be7d310658112f7ac9aa78a
+size 22588772

patch_safetensors.py ADDED

	@@ -0,0 +1,13 @@

+import safetensors.torch
+import os
+path = "models/depth_anything_v2/model.safetensors"
+temp_path = "models/depth_anything_v2/model_temp.safetensors"
+try:
+    tensors = safetensors.torch.load_file(path)
+    safetensors.torch.save_file(tensors, temp_path, metadata={"format": "pt"})
+    os.replace(temp_path, path)  # overwrite in one step, so the model file never goes missing
+    print("Successfully patched model.safetensors")
+except Exception as e:
+    print("Error:", e)
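Rewriting a model file in place is safest when the final swap is a single step, so that a crash cannot leave the target path empty. The following is a minimal standard-library sketch of that pattern, not the script's own code. `atomic_rewrite` and its `transform` callback are hypothetical names introduced for illustration; the sketch works on raw bytes and does not call `safetensors`.

```python
import os
import tempfile


def atomic_rewrite(path, transform):
    """Rewrite `path` with transform(old_bytes), swapping the file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        data = src.read()
    # Create the temporary file next to the target so both are on the same
    # filesystem, which os.replace needs for a single-step swap.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(transform(data))
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If `transform` raises, the original file is still in place and the temporary file is removed.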

requirements.txt ADDED

	@@ -0,0 +1,48 @@

+# requirements.txt — AID 728 Traffic Rule Violation Detection
+# Install with: pip install -r requirements.txt
+# ── Core ML / Vision ─────────────────────────────────────────────────────────
+torch==2.12.0
+torchvision==0.27.0
+numpy==1.26.4
+Pillow==12.2.0
+# ── OpenCV ────────────────────────────────────────────────────────────────────
+opencv-python==4.11.0.86
+# ── Object Detection ──────────────────────────────────────────────────────────
+ultralytics==8.4.51
+dill==0.4.1
+# ── Depth Estimation ──────────────────────────────────────────────────────────
+transformers==5.8.1
+huggingface_hub==1.15.0
+safetensors==0.7.0
+tokenizers==0.22.2
+# ── OCR (PaddleOCR 3.5.0 + PaddlePaddle) ─────────────────────────────────────
+paddlepaddle==3.3.1
+paddleocr==3.5.0
+# ── PaddleOCR transitive deps ─────────────────────────────────────────────────
+pyclipper==1.4.0
+shapely==2.1.2
+lmdb==2.2.0
+imgaug==0.4.0
+scikit-image==0.25.2
+python-docx==1.2.0
+fire==0.7.1
+beautifulsoup4==4.14.3
+lxml==6.1.0
+RapidFuzz==3.14.5
+# ── Utilities ─────────────────────────────────────────────────────────────────
+requests==2.34.2
+tqdm==4.67.3
+PyYAML==6.0.2
+regex==2026.5.9
+scipy==1.15.3
+packaging==26.2
+filelock==3.29.0
+# ── Web UI / Roboflow client (unpinned) ───────────────────────────────────────
+gradio
+inference-sdk

run_inference.py ADDED

	@@ -0,0 +1,25 @@

+from solution import TrafficViolationDetector
+from pathlib import Path
+import json
+def run():
+    print("Loading models...")
+    detector = TrafficViolationDetector(model_dir=str(Path("models").resolve()))
+    images = ["testimages/1.jpg", "testimages/2.webp", "testimages/images.jpg"]
+    results = {}
+    print("Running inference...")
+    for img in images:
+        if Path(img).exists():
+            print(f"Processing {img}...")
+            res = detector.predict(img)
+            results[img] = res
+        else:
+            results[img] = "File not found"
+    print("\n--- RESULTS ---")
+    print(json.dumps(results, indent=2))
+if __name__ == "__main__":
+    run()
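The runner above works from a hard-coded list of three image paths. One possible alternative is to discover the images in a folder. The sketch below uses only `pathlib`; `collect_images` and `IMAGE_SUFFIXES` are hypothetical names, and the suffix set is an assumption about which formats are of interest, not something the repository specifies.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever the detector should see.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def collect_images(folder):
    """Return sorted paths of image files directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        str(p)
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

A call such as `collect_images("testimages")` could then replace the literal `images` list. Files that disappear between listing and processing would still need the existing `Path(img).exists()` check.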

solution.py ADDED

	@@ -0,0 +1,405 @@

+"""
+solution.py — AID 728 Traffic Rule Violation Detection
+=======================================================
+Pipeline:
+  1. YOLOv8s (COCO) + custom bike detector  →  bike boxes + person boxes + car boxes
+  2. Depth-Anything V2 (fp32)               →  depth map for person→bike association
+  3. Helmet classifier (YOLO)                →  helmet / no-helmet per rider
+  4. license.pt (YOLO)                       →  license plate bounding box
+  5. PaddleOCR 3.5.0 (mobile det+rec)       →  plate text via legacy ocr() API
+  6. Roboflow inference_sdk                  →  wrong-way vehicle classification
+  7. Roboflow inference_sdk                  →  seatbelt classification for cars
+"""
+import os
+import re
+from pathlib import Path
+# Point paddlex to bundled offline models BEFORE any paddle import.
+_MODEL_DIR = Path(__file__).parent / "models"
+os.environ["PADDLE_PDX_CACHE_HOME"] = str(_MODEL_DIR / "paddleocr")
+import cv2
+import numpy as np
+import torch
+from PIL import Image
+from transformers import pipeline as hf_pipeline
+from ultralytics import YOLO
+from paddleocr import PaddleOCR
+try:
+    from inference_sdk import InferenceHTTPClient
+    CLIENT = InferenceHTTPClient(
+        api_url="https://serverless.roboflow.com",
+        # Assumed env var name; keeps the secret out of version control.
+        api_key=os.environ.get("ROBOFLOW_API_KEY", ""),
+    )
+except ImportError:
+    CLIENT = None
+# ── CONSTANTS ─────────────────────────────────────────────────────────────────
+COCO_PERSON = 0
+COCO_MOTO   = 3
+COCO_CAR    = 2
+COCO_BUS    = 5
+COCO_TRUCK  = 7
+FOUR_WHEELERS = {COCO_CAR, COCO_BUS, COCO_TRUCK}
+COCO_CONF = 0.30
+COCO_IOU  = 0.45
+S1_CONF   = 0.344
+S1_IOU    = 0.45
+S3_CONF   = 0.25
+S3_IOU    = 0.60
+S4_CONF   = 0.20
+PERSON_BIKE_IOU_THRESH = 0.10
+PERSON_BIKE_COL_MARGIN = 0.35
+HEAD_CROP_FRACTION = 0.45
+HEAD_CROP_MIN_PX   = 40
+DEPTH_THRESHOLD    = 0.35
+OCR_MIN_CONF       = 0.25
+class TrafficViolationDetector:
+    """
+    Detects traffic violations on vehicles in a single RGB image.
+    All models are loaded once in __init__; predict() keeps no state between calls.
+    """
+    def __init__(self, model_dir: str = "./models"):
+        md = Path(model_dir)
+        # Ensure paddlex finds bundled offline models
+        os.environ["PADDLE_PDX_CACHE_HOME"] = str(md / "paddleocr")
+        # 1. Depth estimation
+        self.depth_estimator = hf_pipeline(
+            "depth-estimation",
+            model=(md / "depth_anything_v2").as_posix(),
+            device=0 if torch.cuda.is_available() else -1,
+            dtype=torch.float32,
+        )
+        # 2. YOLO models
+        self.s_coco = YOLO(str(md / "yolov8s.pt"))
+        self.s1     = YOLO(str(md / "stage1_best.pt"))
+        self.s3     = YOLO(str(md / "helmet_v11.pt"))
+        self.s4     = YOLO(str(md / "license.pt"))
+        # 3. Super-resolution
+        self.sr_engine, self.has_sr = self._init_sr(md / "FSRCNN_x3.pb")
+        # 4. PaddleOCR
+        self.ocr_engine = PaddleOCR(
+            lang="en",
+            device="cpu",
+            enable_mkldnn=False,
+            text_detection_model_name="PP-OCRv5_mobile_det",
+            text_recognition_model_name="en_PP-OCRv5_mobile_rec",
+        )
+    # ── helpers ───────────────────────────────────────────────────────────────
+    @staticmethod
+    def _init_sr(sr_path):
+        try:
+            sr = cv2.dnn_superres.DnnSuperResImpl_create()
+        except AttributeError:
+            return None, False
+        if Path(sr_path).exists():
+            try:
+                sr.readModel(str(sr_path))
+                sr.setModel("fsrcnn", 3)
+                return sr, True
+            except Exception:
+                pass
+        return sr, False
+    @staticmethod
+    def _box_iou(a, b):
+        ax1, ay1, ax2, ay2 = a
+        bx1, by1, bx2, by2 = b
+        ix1, iy1 = max(ax1, bx1), max(ay1, by1)
+        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
+        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
+        if inter == 0:
+            return 0.0
+        return inter / ((ax2-ax1)*(ay2-ay1) + (bx2-bx1)*(by2-by1) - inter + 1e-6)
+    @staticmethod
+    def _region_depth(depth_map, x1, y1, x2, y2):
+        h, w = depth_map.shape
+        x1, y1 = max(0, int(x1)), max(0, int(y1))
+        x2, y2 = min(w, int(x2)), min(h, int(y2))
+        patch = depth_map[y1:y2, x1:x2]
+        return float(np.median(patch)) if patch.size > 0 else 0.5
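+    def _is_depth_ok(self, pd, bd):
+        # pd / bd: median normalized depth inside the person and bike boxes.
+        # A near-zero bike depth makes the relative test unstable, so fall
+        # back to an absolute difference with half the threshold.
+        if bd < 0.05:
+            return abs(pd - bd) <= DEPTH_THRESHOLD * 0.5
+        return abs(pd - bd) / (bd + 1e-6) <= DEPTH_THRESHOLD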
+    def _merge_bike_boxes(self, coco, custom, iou_thresh=0.45):
+        if not coco and not custom:
+            return np.zeros((0, 4), dtype=np.float32)
+        if not coco:
+            return np.array(custom, dtype=np.float32)
+        if not custom:
+            return np.array(coco, dtype=np.float32)
+        merged = list(coco)
+        for cb in custom:
+            if not any(self._box_iou(cb, mb) > iou_thresh for mb in merged):
+                merged.append(cb)
+        return np.array(merged, dtype=np.float32)
+    def _associate_persons_to_bikes(self, person_boxes, bike_boxes, depth_map, h, w):
+        bike_persons = [[] for _ in range(len(bike_boxes))]
+        for p_box in person_boxes:
+            px1, py1, px2, py2 = p_box
+            p_cx = (px1 + px2) / 2
+            p_bottom = py2
+            best_bike, best_score = -1, -1.0
+            for b_idx, b_box in enumerate(bike_boxes):
+                bx1, by1, bx2, by2 = b_box
+                bw = bx2 - bx1
+                iou = self._box_iou(p_box, b_box)
+                in_col = (
+                    bx1 - PERSON_BIKE_COL_MARGIN * bw <= p_cx <= bx2 + PERSON_BIKE_COL_MARGIN * bw
+                    and p_bottom <= by2 + 0.3 * (by2 - by1)
+                )
+                if iou < PERSON_BIKE_IOU_THRESH and not in_col:
+                    continue
+                pd_val = self._region_depth(depth_map, px1, py1, px2, py2)
+                bd_val = self._region_depth(depth_map, bx1, by1, bx2, by2)
+                if not self._is_depth_ok(pd_val, bd_val):
+                    continue
+                score = iou + 0.5 * (1.0 - abs(p_cx - (bx1 + bx2) / 2) / (w + 1e-6))
+                if score > best_score:
+                    best_score, best_bike = score, b_idx
+            if best_bike >= 0:
+                bike_persons[best_bike].append(p_box)
+        return bike_persons
+    def _get_depth_map(self, image_cv):
+        img_rgb = cv2.cvtColor(image_cv, cv2.COLOR_BGR2RGB)
+        result  = self.depth_estimator(Image.fromarray(img_rgb))
+        depth   = np.array(result["depth"]).astype(np.float32)
+        lo, hi  = depth.min(), depth.max()
+        depth   = (depth - lo) / (hi - lo + 1e-8)
+        if depth.shape != image_cv.shape[:2]:
+            depth = cv2.resize(depth, (image_cv.shape[1], image_cv.shape[0]))
+        return depth
+    def _classify_helmets(self, full_image, person_boxes):
+        if not person_boxes:
+            return 0, 0, 0
+        h_img, w_img = full_image.shape[:2]
+        with_h = without_h = 0
+        for p_box in person_boxes:
+            px1, py1, px2, py2 = map(int, p_box)
+            head_h = max(int((py2 - py1) * HEAD_CROP_FRACTION), HEAD_CROP_MIN_PX)
+            pad_x  = max(4, int((px2 - px1) * 0.05))
+            crop = full_image[max(0, py1):min(h_img, py1 + head_h),
+                              max(0, px1 - pad_x):min(w_img, px2 + pad_x)]
+            if crop.size == 0:
+                without_h += 1
+                continue
+            res = self.s3.predict(crop, conf=S3_CONF, iou=S3_IOU, verbose=False)[0]
+            if len(res.boxes) == 0:
+                without_h += 1
+            elif int(res.boxes[res.boxes.conf.argmax()].cls) == 0:
+                with_h += 1
+            else:
+                without_h += 1
+        return with_h + without_h, with_h, without_h
+    def _preprocess_plate(self, plate_img):
+        h, w = plate_img.shape[:2]
+        if self.has_sr and self.sr_engine is not None:
+            try:
+                plate_img = self.sr_engine.upsample(plate_img)
+            except Exception:
+                plate_img = cv2.resize(plate_img, (0, 0), fx=3, fy=3,
+                                       interpolation=cv2.INTER_CUBIC)
+        else:
+            if h < 100:
+                scale = 100 / h
+                plate_img = cv2.resize(plate_img,
+                                       (int(w * scale), int(h * scale)),
+                                       interpolation=cv2.INTER_CUBIC)
+        lab = cv2.cvtColor(plate_img, cv2.COLOR_BGR2LAB)
+        l, a, b = cv2.split(lab)
+        l = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(4, 4)).apply(l)
+        plate_img = cv2.cvtColor(cv2.merge([l, a, b]), cv2.COLOR_LAB2BGR)
+        return cv2.filter2D(plate_img, -1, np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]]))
+    def _run_ocr(self, plate_img):
+        processed = self._preprocess_plate(plate_img)
+        texts, scores = [], []
+        try:
+            result = self.ocr_engine.ocr(processed)
+            if result and isinstance(result, list):
+                for page in result:
+                    if isinstance(page, dict):
+                        page_texts  = page.get("rec_texts", [])
+                        page_scores = page.get("rec_scores", [])
+                        for t, s in zip(page_texts, page_scores):
+                            if str(t).strip():
+                                texts.append(str(t).strip())
+                                scores.append(float(s))
+                    elif isinstance(page, list):
+                        for line in page:
+                            if isinstance(line, (list, tuple)) and len(line) == 2:
+                                try:
+                                    txt   = str(line[1][0])
+                                    score = float(line[1][1])
+                                    if txt.strip():
+                                        texts.append(txt.strip())
+                                        scores.append(score)
+                                except (TypeError, ValueError, IndexError):
+                                    pass
+        except Exception:
+            pass
+        if not texts:
+            return "UNKNOWN", 0.0
+        return " ".join(texts), (sum(scores) / len(scores) if scores else 0.0)
+    def _extract_plate(self, vehicle_crop, plate_box):
+        h, w = vehicle_crop.shape[:2]
+        pad = 4
+        x1 = max(0, int(plate_box[0]) - pad)
+        y1 = max(0, int(plate_box[1]) - pad)
+        x2 = min(w, int(plate_box[2]) + pad)
+        y2 = min(h, int(plate_box[3]) + pad)
+        crop = vehicle_crop[y1:y2, x1:x2]
+        if crop.size == 0:
+            return "UNKNOWN"
+        raw, conf = self._run_ocr(crop)
+        if conf < OCR_MIN_CONF:
+            return "UNKNOWN"
+        text   = re.sub(r"[^A-Z0-9 \-]", "", raw.upper())
+        text   = re.sub(r"\s+", " ", text).strip()
+        tokens = [t for t in text.split() if len(t) > 1]
+        return " ".join(tokens) if tokens else "UNKNOWN"
+    def _get_plate(self, img, h_img, w_img, vehicle_box):
+        x1, y1, x2, y2 = map(int, vehicle_box)
+        bw, bh = x2 - x1, y2 - y1
+        vcrop = img[
+            max(0,     int(y1 - 0.20 * bh)): min(h_img, int(y2 + 0.10 * bh)),
+            max(0,     int(x1 - 0.15 * bw)): min(w_img, int(x2 + 0.15 * bw))
+        ]
+        plate_text = "UNKNOWN"
+        try:
+            if vcrop.size > 0:
+                p_res = self.s4.predict(vcrop, conf=S4_CONF, verbose=False)[0]
+                if len(p_res.boxes) > 0:
+                    best_pb = p_res.boxes.xyxy.cpu().numpy()[p_res.boxes.conf.argmax()]
+                    plate_text = self._extract_plate(vcrop, best_pb)
+        except Exception:
+            pass
+        return plate_text
+    # ── predict ───────────────────────────────────────────────────────────────
+    def predict(self, image_path: str) -> dict:
+        try:
+            img = cv2.imread(str(image_path))
+            if img is None:
+                return {"violations": []}
+            h_img, w_img = img.shape[:2]
+            # Stage 1: COCO primary detection
+            coco_res   = self.s_coco.predict(img, conf=COCO_CONF, iou=COCO_IOU,
+                                             verbose=False)[0]
+            coco_boxes = coco_res.boxes.xyxy.cpu().numpy()
+            coco_cls   = coco_res.boxes.cls.cpu().numpy().astype(int)
+            person_boxes = coco_boxes[coco_cls == COCO_PERSON].tolist()
+            coco_motos   = coco_boxes[coco_cls == COCO_MOTO].tolist()
+            coco_cars    = coco_boxes[np.isin(coco_cls, list(FOUR_WHEELERS))].tolist()
+            # Stage 2: Supplemental bike detector
+            s1_res       = self.s1.predict(img, conf=S1_CONF, iou=S1_IOU,
+                                           augment=True, verbose=False)[0]
+            custom_bikes = s1_res.boxes.xyxy.cpu().numpy().tolist()
+            bike_boxes   = self._merge_bike_boxes(coco_motos, custom_bikes)
+            # Stage 3: Depth map for spatial person→bike association
+            depth_map = self._get_depth_map(img)
+            # Stage 4: Associate persons to bikes
+            bike_persons = self._associate_persons_to_bikes(
+                person_boxes, bike_boxes, depth_map, h_img, w_img)
+            # Detect Wrong Way using Roboflow API
+            ww_boxes = []
+            if CLIENT is not None:
+                try:
+                    result = CLIENT.infer(img, model_id="wrong-way-driving-detection-gqdmg/1")
+                    for pred in result.get('predictions', []):
+                        if "wrong" in pred.get('class', '').lower():
+                            px, py, pw, ph = pred['x'], pred['y'], pred['width'], pred['height']
+                            wx1, wy1 = px - pw/2, py - ph/2
+                            wx2, wy2 = px + pw/2, py + ph/2
+                            ww_boxes.append([wx1, wy1, wx2, wy2])
+                except Exception as e:
+                    print("[Warning] Wrong-way API error:", e)
+            def is_wrong_way(v_box):
+                for wb in ww_boxes:
+                    if self._box_iou(v_box, wb) > 0.4:
+                        return True
+                return False
+            violations = []
+            # Process Two-wheelers
+            for i, bike_box in enumerate(bike_boxes):
+                num_riders, with_h, without_h = self._classify_helmets(
+                    img, bike_persons[i])
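+                # No person was associated with this bike: fall back to one
+                # rider without a helmet, so the bike is always reported below.
+                if num_riders == 0:
+                    num_riders, with_h, without_h = 1, 0, 1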
+                ww = is_wrong_way(bike_box)
+                # Check for violation first, then do plate OCR if violation exists
+                if (num_riders >= 3) or (without_h > 0) or ww:
+                    plate_text = self._get_plate(img, h_img, w_img, bike_box)
+                    violations.append({
+                        "vehicle_type":      "two_wheeler",
+                        "num_riders":        num_riders,
+                        "helmet_violations": without_h,
+                        "wrong_way":         ww,
+                        "license_plate":     plate_text,
+                    })
+            # Process Four-wheelers (Cars/Trucks/Buses)
+            for car_box in coco_cars:
+                x1, y1, x2, y2 = map(int, car_box)
+                ww = is_wrong_way(car_box)
+                sb_viols = 0
+                if CLIENT is not None:
+                    ccrop = img[max(0, y1):min(h_img, y2), max(0, x1):min(w_img, x2)]
+                    if ccrop.size > 0:
+                        try:
+                            res = CLIENT.infer(ccrop, model_id="seat-belt-detection-udcfg/5")
+                            for pred in res.get('predictions', []):
+                                cls_name = pred.get('class', '').lower()
+                                if "no" in cls_name and "seatbelt" in cls_name:
+                                    sb_viols += 1
+                        except Exception as e:
+                            print("[Warning] Seatbelt API error:", e)
+                # Check for violation first, then do plate OCR if violation exists
+                if sb_viols > 0 or ww:
+                    plate_text = self._get_plate(img, h_img, w_img, car_box)
+                    violations.append({
+                        "vehicle_type": "four_wheeler",
+                        "seatbelt_violations": sb_viols,
+                        "wrong_way": ww,
+                        "license_plate": plate_text
+                    })
+            return {"violations": violations}
+        except Exception as e:
+            print(f"[ERROR] predict() failed for {image_path}: {e}")
+            return {"violations": []}
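The wrong-way check in `predict()` converts centre-format predictions (`x`, `y`, `width`, `height`) to corner boxes and then matches them to vehicle boxes by IoU. The arithmetic can be checked in isolation with the standalone sketch below. It mirrors the logic of `_box_iou` and `is_wrong_way` but is not the class's code: `center_to_corners`, `box_iou` and `is_flagged` are hypothetical names, the sample boxes are made up, and the small epsilon used in the original denominator is omitted.

```python
def center_to_corners(x, y, width, height):
    """Convert a centre-format box to corner format [x1, y1, x2, y2]."""
    return [x - width / 2, y - height / 2, x + width / 2, y + height / 2]


def box_iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    if inter == 0:
        return 0.0
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union


def is_flagged(vehicle_box, flagged_boxes, iou_thresh=0.4):
    """True if the vehicle overlaps any flagged box above the threshold."""
    return any(box_iou(vehicle_box, fb) > iou_thresh for fb in flagged_boxes)


# Made-up prediction in the centre format used by the remote detector.
pred = {"x": 150, "y": 100, "width": 100, "height": 100}
flagged = [center_to_corners(pred["x"], pred["y"], pred["width"], pred["height"])]

# A vehicle box shifted 10 px to the right of the flagged box:
# intersection 90 * 100 = 9000, union 10000 + 10000 - 9000 = 11000.
vehicle = [110, 50, 210, 150]
print(round(box_iou(vehicle, flagged[0]), 3), is_flagged(vehicle, flagged))
```

With the 0.4 threshold used in `is_wrong_way`, a vehicle box has to overlap the flagged box substantially before it inherits the wrong-way label.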

testimages/1.jpg ADDED

testimages/2.webp ADDED