
EL HELAL Studio: Technical Documentation

A headless FastAPI backend for AI-powered ID card photo processing.


Architecture

  • main.py: FastAPI application entry point. All routes are REST-only (JSON responses).
  • core/: Pure Python image processing logic, UI-agnostic.
  • newcolor/: AI color correction model (ColorUNet) and in-memory inference code.
  • config/: Global settings (settings.json) for retouch, layout defaults.
  • assets/: Branding assets (logo) and frame overlays.

No GUI, no desktop wrapper. Clients interact exclusively via the REST API; the interactive API documentation is served at /docs.


The 5-Step AI Pipeline

Every photo processed by the studio follows a strictly sequenced pipeline:

1. Auto-Crop & Face Detection (core/crop.py)

  • Technology: OpenCV Haar Cascades.
  • Logic: Detects the largest face, centers it, and calculates a 5:7 (4x6cm) aspect ratio crop.
  • Fallback: Centers the crop if no face is detected to ensure the pipeline never breaks.
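The crop geometry described above can be sketched in plain Python. This is a minimal illustration, not the code in core/crop.py: the function names are hypothetical, face boxes are assumed to be (x, y, width, height) tuples as OpenCV cascades return them, and the sketch simply takes the largest 5:7 rectangle that fits the image, whereas the real module may size the crop relative to the face.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def largest_face(faces: Sequence[Box]) -> Optional[Box]:
    """Pick the detection with the largest area, or None if there are none."""
    return max(faces, key=lambda f: f[2] * f[3]) if faces else None


def compute_crop_box(image_w: int, image_h: int, face: Optional[Box] = None,
                     ratio_w: int = 5, ratio_h: int = 7) -> Box:
    """Return a ratio_w:ratio_h crop centred on the face, or on the image if face is None."""
    # Largest rectangle with the requested aspect ratio that fits inside the image.
    crop_h = min(image_h, image_w * ratio_h // ratio_w)
    crop_w = crop_h * ratio_w // ratio_h

    # Centre on the face when one was detected; otherwise fall back to the image centre.
    if face is not None:
        fx, fy, fw, fh = face
        cx, cy = fx + fw // 2, fy + fh // 2
    else:
        cx, cy = image_w // 2, image_h // 2

    # Clamp so the crop never leaves the image bounds.
    x = min(max(cx - crop_w // 2, 0), image_w - crop_w)
    y = min(max(cy - crop_h // 2, 0), image_h - crop_h)
    return x, y, crop_w, crop_h
```

Because the fallback path only needs the image dimensions, the function always returns a valid box, which is what keeps the later pipeline steps from failing on photos without a detectable face.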

2. AI Background Removal (core/process_images.py)

  • Model: BiRefNet (RMBG-2.0) via the transformers library.
  • Optimization: Uses CUDA automatically when a GPU is available; otherwise falls back to CPU with dynamic quantization.

3. AI Color Correction (newcolor/inference.py)

  • Model: ColorUNet via custom PyTorch model and weights.
  • Mechanism: Predicts corrected colors at model resolution (1024x1024), fits a quadratic polynomial color transform (10 parameters) on subject pixels using the alpha mask, and applies it to the full-resolution image.
  • Optimization: Dynamic device-aware PyTorch execution (reuses the RMBG execution device, e.g., CUDA or optimized CPU).
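One way to read the fitting step is as an ordinary least-squares problem. The sketch below assumes the 10 parameters are the quadratic terms 1, r, g, b, r², g², b², rg, rb, gb fitted once per output channel, that images are float arrays in [0, 1], and that an alpha threshold of 0.5 selects subject pixels. The function names and the threshold are illustrative and are not taken from newcolor/inference.py.

```python
import numpy as np


def quadratic_features(rgb: np.ndarray) -> np.ndarray:
    """Expand (N, 3) RGB values into the 10 quadratic terms per pixel."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack(
        [np.ones_like(r), r, g, b, r * r, g * g, b * b, r * g, r * b, g * b],
        axis=1,
    )


def fit_color_transform(source: np.ndarray, target: np.ndarray,
                        alpha: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Fit a (10, 3) coefficient matrix mapping source colors to target colors.

    Only pixels whose alpha exceeds `threshold` (the subject) contribute.
    """
    mask = alpha.reshape(-1) > threshold
    src = source.reshape(-1, 3)[mask]
    tgt = target.reshape(-1, 3)[mask]
    coeffs, *_ = np.linalg.lstsq(quadratic_features(src), tgt, rcond=None)
    return coeffs


def apply_color_transform(image: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Apply the fitted transform to an image of any resolution."""
    out = quadratic_features(image.reshape(-1, 3)) @ coeffs
    return np.clip(out, 0.0, 1.0).reshape(image.shape)
```

In this reading, `source` and `target` would be the model-resolution input and the model's prediction, while `apply_color_transform` runs on the full-resolution photo. Fitting a small global transform instead of upscaling the prediction avoids introducing resampling blur into the final image.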

4. Surgical Retouching (core/retouch.py)

  • Landmarking: Uses MediaPipe Face Mesh (468 points) to generate a precise skin mask, excluding eyes, lips, and hair.
  • Frequency Separation: Splits the image into High Frequency (texture/pores) and Low Frequency (tone/color).
  • Blemish Removal: Detects anomalies on the High-Freq layer and inpaints them using surrounding texture.
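The frequency split itself is a short computation: blur the image to get the low-frequency layer and subtract it from the original to get the high-frequency layer. The NumPy sketch below works on a single-channel float image and uses a box blur as a stand-in for whatever blur core/retouch.py applies. The function names, the blur radius, and the outlier threshold `k` are assumptions, and neither the skin mask nor the inpainting step is shown.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Separable box blur with edge padding (a simple stand-in for a Gaussian blur)."""
    size = 2 * radius + 1
    kernel = np.ones(size) / size
    padded = np.pad(image, radius, mode="edge")
    # Blur along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def split_frequencies(image: np.ndarray, radius: int = 4):
    """Return (low, high) such that low + high reconstructs the image."""
    low = box_blur(image, radius)   # tone and color
    high = image - low              # texture and pores
    return low, high


def find_blemishes(high: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag pixels whose high-frequency value deviates by more than k standard deviations."""
    return np.abs(high) > k * high.std()
```

Since `low + high` reproduces the input, edits made to one layer (for example, inpainting flagged pixels on the high-frequency layer) leave the other layer untouched when the two are added back together.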

5. Layout Composition (core/layout_engine.py)

  • Rendering: Composes a 300 DPI canvas for printing.
  • Localization: Uses arabic_reshaper and python-bidi for correct Arabic script rendering.
  • Dynamic Assets: Overlays IDs with specific offsets and studio branding (logos).
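Rendering at 300 DPI comes down to converting physical sizes to pixel counts (1 inch = 2.54 cm). The helpers below are a hypothetical sketch of that arithmetic and of placing photos on a grid; the names and the `gap` and `margin` parameters are illustrative and do not mirror core/layout_engine.py.

```python
def cm_to_px(cm: float, dpi: int = 300) -> int:
    """Convert a physical length in centimetres to pixels at the given DPI."""
    return round(cm / 2.54 * dpi)


def canvas_size_px(width_cm: float, height_cm: float, dpi: int = 300):
    """Pixel dimensions of a canvas with the given physical size."""
    return cm_to_px(width_cm, dpi), cm_to_px(height_cm, dpi)


def cell_origin(col: int, row: int, cell_w: int, cell_h: int, gap: int, margin: int):
    """Top-left pixel of the grid cell at (col, row), with a gap between cells."""
    return margin + col * (cell_w + gap), margin + row * (cell_h + gap)
```

For example, a 4x6 cm photo at 300 DPI is `canvas_size_px(4, 6)`, which gives (472, 709).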

Configuration

The system is controlled by config/settings.json. The layout engine hot-reloads this file on every request. You can adjust id_font_size, grid_gap, or retouch_sensitivity and see changes in the next processed photo without restarting.
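Hot-reloading can be as simple as re-reading the file at the start of each request. The following is a minimal sketch, assuming a flat JSON object; the default values and the `load_settings` name are hypothetical and are not taken from the project.

```python
import json
from pathlib import Path

# Hypothetical defaults, used when the file is missing or temporarily invalid.
DEFAULTS = {"id_font_size": 48, "grid_gap": 10, "retouch_sensitivity": 3.0}


def load_settings(path="config/settings.json"):
    """Re-read settings from disk on every call so edits apply without a restart."""
    settings = dict(DEFAULTS)
    try:
        settings.update(json.loads(Path(path).read_text(encoding="utf-8")))
    except (FileNotFoundError, json.JSONDecodeError):
        # Keep the defaults if the file is absent or is being edited mid-write.
        pass
    return settings
```

Reading a small JSON file per request is cheap compared with model inference, so this trade of a little I/O for restart-free tuning is usually acceptable.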


Known Dependency Conflicts

  • TensorFlow vs. Transformers: Standard tensorflow (especially nightly) conflicts with transformers and numpy >= 2.0.
  • Resolution: Uninstall TensorFlow. The pipeline is 100% PyTorch-based.
  • Pinned Versions:
    • numpy < 2.0.0: Compatibility with basicsr and older torchvision.
    • protobuf <= 3.20.3: Prevents "Double Registration" errors in multi-model environments.

Environment Setup

conda create -n idmaker python=3.10
conda activate idmaker
pip install -r requirements.txt
pip uninstall tensorflow tb-nightly tensorboard  # Remove conflicts if present

Docker

docker-compose up --build

The API will be available at http://localhost:8000 (or the port defined by $PORT).


🛠 Troubleshooting (Common Pitfalls)

| Issue | Root Cause | Solution |
| --- | --- | --- |
| "Tofu" Boxes in Text | Missing or corrupted fonts. | Ensure assets/arialbd.ttf is not a Git LFS pointer (size > 300KB). |
| NumPy AttributeError | Conflict between NumPy 2.x and TensorFlow/Transformers. | Uninstall tensorflow and ensure numpy < 2.0.0 is installed. |
| [Errno 10048] Socket Bind | Port 7860 is already in use by another server process. | Close the previous server instance or set a new PORT environment variable. |
| Meta-Tensor Error | Transformers 4.50+ CPU bug. | Handled by torch.linspace monkeypatch in process_images.py. |
| Slow Processing | CPU bottleneck. | Ensure torch is using multiple threads or enable CUDA. |
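For the font issue, a file that was never fetched from Git LFS is a small text stub that begins with the LFS spec line rather than binary font data. The check below is a sketch built on that property; `is_lfs_pointer` is a hypothetical helper and not part of the codebase.

```python
# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file at `path` is a Git LFS pointer stub, not the real asset."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER
```

If `is_lfs_pointer("assets/arialbd.ttf")` returns True, running `git lfs pull` in the repository should replace the stub with the actual font.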

Testing Framework

The codebase includes two kinds of tests: lightweight, mock-based unit tests and full integration tests.

Unit & Mocked API Tests

Located in the tests/ directory:

  • test_layout_engine.py: Validates canvas scaling, grid composition, margins, Arabic text shaping, and bidi rendering.
  • test_crop.py: Validates OpenCV face-detection coordinates and the 5:7 ratio auto-crop fallback mechanism.
  • test_white_bg.py: Verifies transparency compositing onto white canvas and 300 DPI preservation.
  • test_color_steal.py: Validates red/green/blue 1D LUT extraction, .npz caching, and .cube file export.
  • test_api_mocked.py: Validates FastAPI endpoints (/settings, /status, /frames, /upload, /process) using FastAPI TestClient and unittest.mock to mock ML processing. No GPU, VRAM, or large weight downloads are required.

Run them instantly using:

venv\Scripts\python.exe -m unittest discover -s tests -p "test_*.py"
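The mocking pattern behind these tests can be shown with the standard library alone. The sketch below is not taken from tests/test_api_mocked.py: `Pipeline` and its methods are hypothetical stand-ins for the real processing code, and the FastAPI TestClient layer is left out so the example stays self-contained.

```python
import unittest
from unittest import mock


class Pipeline:
    """Hypothetical stand-in for the real image processing code."""

    def remove_background(self, image_bytes: bytes) -> bytes:
        # The real step would load model weights and run inference.
        raise RuntimeError("model weights are not available in unit tests")

    def process(self, image_bytes: bytes) -> dict:
        result = self.remove_background(image_bytes)
        return {"status": "ok", "size": len(result)}


class ProcessTest(unittest.TestCase):
    def test_process_uses_mocked_model(self):
        pipeline = Pipeline()
        # Replace the expensive ML step with a canned response.
        with mock.patch.object(pipeline, "remove_background",
                               return_value=b"fake-png") as fake:
            result = pipeline.process(b"raw")
        fake.assert_called_once_with(b"raw")
        self.assertEqual(result, {"status": "ok", "size": 8})
```

Patching only the model call keeps the surrounding request handling under test while removing the dependency on weights and GPU memory.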

Integration Tests

Located in the root:

  • test_api.py: Validates end-to-end processing with real ML model weights and HTTP requests on a running server.

Run with:

venv\Scripts\python.exe test_api.py

Last Updated: June 2026, EL HELAL Studio Engineering