
EL HELAL Studio — Technical Documentation

A headless FastAPI backend for AI-powered ID card photo processing.

Architecture

main.py: FastAPI application entry point. All routes are REST-only (JSON responses).
core/: Pure Python image processing logic — UI-agnostic.
newcolor/: AI color correction model (ColorUNet) and in-memory inference code.
config/: Global settings (settings.json) for retouching and layout defaults.
assets/: Branding assets (logo) and frame overlays.

No GUI, no desktop wrapper. Clients interact exclusively through the REST API; FastAPI's interactive API documentation is served at /docs.

The 5-Step AI Pipeline

Every photo processed by the studio follows a strictly sequenced pipeline:

1. Auto-Crop & Face Detection (`core/crop.py`)

Technology: OpenCV Haar Cascades.
Logic: Detects the largest face, centers it, and calculates a crop with a 5:7 (width:height) aspect ratio.
Fallback: Centers the crop if no face is detected, so the pipeline always produces an output.
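The crop geometry can be sketched in plain Python. This sketch assumes face boxes arrive as (x, y, w, h) tuples, which is the format OpenCV's `CascadeClassifier.detectMultiScale` returns. The `FACE_SCALE` padding factor and all function names are hypothetical; the real values live in `core/crop.py`.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), as returned by detectMultiScale

ASPECT = 5 / 7     # target width / height
FACE_SCALE = 2.0   # hypothetical: crop height as a multiple of face height


def largest_face(faces: Sequence[Box]) -> Optional[Box]:
    """Pick the face with the largest area, or None if nothing was detected."""
    return max(faces, key=lambda f: f[2] * f[3]) if len(faces) else None


def fit_inside(w: float, h: float, max_w: int, max_h: int) -> Tuple[int, int]:
    """Shrink (w, h) uniformly, keeping its ratio, until it fits max_w x max_h."""
    scale = min(1.0, max_w / w, max_h / h)
    return int(w * scale), int(h * scale)


def compute_crop(img_w: int, img_h: int, faces: Sequence[Box]) -> Box:
    """Return a 5:7 crop window (left, top, width, height) inside the image."""
    face = largest_face(faces)
    if face is None:
        # Fallback: the largest 5:7 window, centered in the image.
        crop_w, crop_h = fit_inside(img_h * ASPECT, img_h, img_w, img_h)
        cx, cy = img_w / 2, img_h / 2
    else:
        x, y, w, h = face
        wanted_h = h * FACE_SCALE
        crop_w, crop_h = fit_inside(wanted_h * ASPECT, wanted_h, img_w, img_h)
        cx, cy = x + w / 2, y + h / 2
    # Center the window on (cx, cy), then clamp it to the image bounds.
    left = int(min(max(cx - crop_w / 2, 0), img_w - crop_w))
    top = int(min(max(cy - crop_h / 2, 0), img_h - crop_h))
    return left, top, crop_w, crop_h
```

The clamping step keeps the window inside the image when a face sits near an edge. The crop is therefore off-center in that case rather than padded.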

2. AI Background Removal (`core/process_images.py`)

Model: BiRefNet (RMBG-2.0) via the transformers library.
Optimization: Automatically detects and utilizes CUDA/GPU. Falls back to CPU with dynamic quantization.
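Background removal yields an alpha matte, which the later steps reuse. Placing the cut-out subject on a white canvas, the behavior `test_white_bg.py` checks, is a per-pixel blend. The NumPy sketch below assumes an 8-bit RGB image and a float matte in [0, 1]; the function name is hypothetical.

```python
import numpy as np


def composite_on_white(rgb: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Blend an HxWx3 uint8 image over white using an HxW alpha matte in [0, 1]."""
    a = np.clip(alpha.astype(np.float32), 0.0, 1.0)[..., None]  # HxWx1 for broadcasting
    out = rgb.astype(np.float32) * a + 255.0 * (1.0 - a)
    return np.round(out).astype(np.uint8)
```

When the result is written to disk, Pillow's `Image.save(path, dpi=(300, 300))` records the print resolution in the file metadata.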

3. AI Color Correction (`newcolor/inference.py`)

Model: ColorUNet, a custom PyTorch model loaded with its own weights.
Mechanism: Predicts corrected colors at model resolution (1024x1024), fits a quadratic polynomial color transform (10 parameters) on subject pixels using the alpha mask, and applies it to the full-resolution image.
Optimization: Dynamic device-aware PyTorch execution (reuses the RMBG execution device, e.g., CUDA or optimized CPU).
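The transform-fitting step can be sketched with NumPy least squares. This sketch assumes the "10 parameters" are the ten monomials of a quadratic in (r, g, b): 1, r, g, b, r², g², b², rg, rb, gb. Each output channel gets one coefficient per monomial. The function names are hypothetical, and the downsampled input, the model prediction, and the mask are taken as given.

```python
import numpy as np


def quadratic_features(rgb: np.ndarray) -> np.ndarray:
    """Map N x 3 colors to the 10 quadratic basis terms (N x 10)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack(
        [np.ones_like(r), r, g, b, r * r, g * g, b * b, r * g, r * b, g * b], axis=1
    )


def fit_color_transform(src: np.ndarray, dst: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fit 10 x 3 coefficients mapping src -> dst on pixels where mask > 0.5.

    src, dst: H x W x 3 float images in [0, 1] at model resolution.
    mask: H x W alpha matte; only subject pixels constrain the fit.
    """
    sel = mask > 0.5
    X = quadratic_features(src[sel])  # subject pixels only
    Y = dst[sel]
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coeffs


def apply_color_transform(img: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Apply the fitted coefficients to an image of any resolution."""
    h, w, _ = img.shape
    out = quadratic_features(img.reshape(-1, 3)) @ coeffs
    return np.clip(out, 0.0, 1.0).reshape(h, w, 3)
```

Fitting a low-order color mapping, rather than upsampling the network output directly, keeps full-resolution detail. Only a compact color mapping is learned at 1024x1024.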

4. Surgical Retouching (`core/retouch.py`)

Landmarking: Uses MediaPipe Face Mesh (468 points) to generate a precise skin mask, excluding eyes, lips, and hair.
Frequency Separation: Splits the image into High Frequency (texture/pores) and Low Frequency (tone/color).
Blemish Removal: Detects anomalies on the High-Freq layer and inpaints them using surrounding texture.
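Frequency separation can be illustrated on a single grayscale channel with NumPy. This sketch rests on three assumptions:

- A Gaussian blur defines the low-frequency layer.
- Blemishes are flagged as outliers in the high-frequency layer. The `k` threshold is hypothetical.
- The actual inpainting (for example OpenCV's `cv2.inpaint`) is left out.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel with radius 3 * sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array with reflected borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")

    def conv(v: np.ndarray) -> np.ndarray:
        return np.convolve(v, k, mode="valid")

    rows = np.apply_along_axis(conv, 1, padded)  # blur horizontally
    return np.apply_along_axis(conv, 0, rows)    # then vertically


def frequency_split(img: np.ndarray, sigma: float = 2.0):
    """Return (low, high) layers such that low + high == img."""
    low = gaussian_blur(img, sigma)
    return low, img - low


def blemish_mask(high: np.ndarray, skin: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag skin pixels whose high-frequency detail is a statistical outlier."""
    threshold = k * high[skin].std()
    return skin & (np.abs(high) > threshold)
```

Because the two layers sum back to the original, edits to one layer (removing a spot from the high-frequency layer, say) leave the other untouched.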

5. Layout Composition (`core/layout_engine.py`)

Rendering: Composes a 300 DPI canvas for printing.
Localization: Uses arabic_reshaper and python-bidi for correct Arabic script rendering.
Dynamic Assets: Overlays IDs with specific offsets and studio branding (logos).
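The print geometry reduces to unit conversion at 300 DPI (2.54 cm per inch) plus tiling. The sketch below assumes photos are laid out row by row with a uniform `grid_gap` in pixels and an outer margin. The function names and the `margin` parameter are hypothetical, not the layout engine's actual API.

```python
from typing import List, Tuple

DPI = 300
CM_PER_INCH = 2.54


def cm_to_px(cm: float, dpi: int = DPI) -> int:
    """Convert a physical length in centimeters to pixels at the given DPI."""
    return round(cm / CM_PER_INCH * dpi)


def grid_positions(canvas_w: int, canvas_h: int, tile_w: int, tile_h: int,
                   gap: int, margin: int = 0) -> List[Tuple[int, int]]:
    """Top-left corners of as many tiles as fit, separated by `gap` pixels."""
    usable_w = canvas_w - 2 * margin
    usable_h = canvas_h - 2 * margin
    # n tiles need n * tile + (n - 1) * gap pixels, so n = (usable + gap) // (tile + gap).
    cols = max(0, (usable_w + gap) // (tile_w + gap))
    rows = max(0, (usable_h + gap) // (tile_h + gap))
    return [(margin + c * (tile_w + gap), margin + r * (tile_h + gap))
            for r in range(rows) for c in range(cols)]
```

For example, a 4 cm edge becomes `cm_to_px(4)` pixels (about 472) on a 300 DPI canvas.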

Configuration

The system is controlled by config/settings.json. The layout engine hot-reloads this file on every request. You can adjust id_font_size, grid_gap, or retouch_sensitivity and see changes in the next processed photo without restarting.
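Re-reading on every request can be as simple as the sketch below. It assumes `settings.json` is a flat JSON object with the keys named above. The default values and the fall-back-on-error behavior are hypothetical; the real file may be nested.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical fallbacks; the real defaults live in config/settings.json.
DEFAULTS: Dict[str, Any] = {"id_font_size": 32, "grid_gap": 20, "retouch_sensitivity": 0.5}


def load_settings(path: Path = Path("config/settings.json")) -> Dict[str, Any]:
    """Re-read the settings file on every call so edits apply to the next request."""
    try:
        user = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or half-written file: fall back to defaults instead of failing.
        user = {}
    return {**DEFAULTS, **user}
```

Calling `load_settings()` at the start of each request handler gives hot-reload behavior at the cost of one small file read per request.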

Known Dependency Conflicts

TensorFlow vs. Transformers: Standard tensorflow (especially nightly) conflicts with transformers and numpy >= 2.0.
Resolution: Uninstall TensorFlow. The pipeline is 100% PyTorch-based.
Pinned Versions:
- numpy < 2.0.0: Required for compatibility with basicsr and older torchvision.
- protobuf <= 3.20.3: Prevents "Double Registration" errors in multi-model environments.
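A quick way to check an environment against these pins is to read installed versions with `importlib.metadata`. The sketch below uses a deliberately simplified numeric version parser (no pre-release semantics); the function names are hypothetical.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, List, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Numeric release parts padded to three, e.g. '2.0' -> (2, 0, 0). Simplified."""
    parts = [int(p) for p in re.findall(r"\d+", v.split("+")[0])[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def find_conflicts(installed: Dict[str, Optional[str]]) -> List[str]:
    """Check a {package: version or None} map against the documented pins."""
    problems = []
    for name in ("tensorflow", "tb-nightly", "tensorboard"):
        if installed.get(name):
            problems.append(f"{name} is installed; uninstall it")
    numpy_v = installed.get("numpy")
    if numpy_v and parse_version(numpy_v) >= (2, 0, 0):
        problems.append(f"numpy {numpy_v} violates numpy < 2.0.0")
    protobuf_v = installed.get("protobuf")
    if protobuf_v and parse_version(protobuf_v) > (3, 20, 3):
        problems.append(f"protobuf {protobuf_v} violates protobuf <= 3.20.3")
    return problems


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed distribution versions; None means not installed."""
    out: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None
    return out


if __name__ == "__main__":
    names = ["numpy", "protobuf", "tensorflow", "tb-nightly", "tensorboard"]
    for problem in find_conflicts(installed_versions(names)) or ["no known conflicts"]:
        print(problem)
```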

Environment Setup

```bash
conda create -n idmaker python=3.10
conda activate idmaker
pip install -r requirements.txt
pip uninstall -y tensorflow tb-nightly tensorboard  # Remove conflicts if present
```

Docker

```bash
docker-compose up --build
```

The API will be available at http://localhost:8000 (or the port defined by $PORT).
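A standard-library client is enough to smoke-test the running container. This sketch assumes `/status` answers a plain GET with JSON. The HTTP methods and payloads for `/upload` and `/process` are not documented here, so check `/docs` for those; the helper names are hypothetical.

```python
import json
import os
import urllib.request
from typing import Any, Optional


def api_url(path: str, host: str = "localhost", port: Optional[int] = None) -> str:
    """Build a URL for the local API, honoring $PORT and defaulting to 8000."""
    port = port or int(os.environ.get("PORT", "8000"))
    return f"http://{host}:{port}/{path.lstrip('/')}"


def get_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container up, `get_json(api_url("/status"))` returns the decoded status payload.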

🛠 Troubleshooting (Common Pitfalls)

| Issue | Root Cause | Solution |
|---|---|---|
| "Tofu" boxes in text | Missing or corrupted fonts. | Ensure `assets/arialbd.ttf` is not a Git LFS pointer (size > 300KB). |
| NumPy AttributeError | Conflict between NumPy 2.x and TensorFlow/Transformers. | Uninstall `tensorflow` and ensure `numpy < 2.0.0` is installed. |
| [Errno 10048] Socket bind (Windows) | The server port (e.g. 7860) is already in use by another process. | Stop the previous server instance or set a different `PORT` environment variable. |
| Meta-tensor error | Transformers 4.50+ CPU bug. | Handled by a `torch.linspace` monkeypatch in `core/process_images.py`. |
| Slow processing | CPU bottleneck. | Check that `torch` uses multiple CPU threads (`torch.set_num_threads`) or enable CUDA. |
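The font check can be automated. A Git LFS pointer is a small text file that starts with the line `version https://git-lfs.github.com/spec/v1`, so reading the first bytes is enough to detect one. The helper name and the exact byte threshold below are illustrative.

```python
from pathlib import Path

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"
MIN_FONT_BYTES = 300_000  # the "size > 300KB" rule of thumb for arialbd.ttf


def font_problem(path: Path) -> str:
    """Describe what is wrong with a font file, or return '' if it looks fine."""
    if not path.exists():
        return "missing"
    with path.open("rb") as f:
        head = f.read(len(LFS_HEADER))
    if head == LFS_HEADER:
        return "Git LFS pointer; run `git lfs pull`"
    if path.stat().st_size < MIN_FONT_BYTES:
        return "suspiciously small"
    return ""
```

Usage: `font_problem(Path("assets/arialbd.ttf"))` returns an empty string for a real font file.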

Testing Framework

The codebase includes a comprehensive testing framework divided into lightweight, mock-based unit tests and full integration tests.

Unit & Mocked API Tests

Located in the tests/ directory:

test_layout_engine.py: Validates canvas scaling, grid composition, margins, Arabic text shaping, and bidi rendering.
test_crop.py: Validates OpenCV face-detection coordinates and the 5:7 ratio auto-crop fallback mechanism.
test_white_bg.py: Verifies transparency compositing onto white canvas and 300 DPI preservation.
test_color_steal.py: Validates red/green/blue 1D LUT extraction, .npz caching, and .cube file export.
test_api_mocked.py: Validates FastAPI endpoints (/settings, /status, /frames, /upload, /process) using FastAPI TestClient and unittest.mock to mock ML processing, so no GPU, VRAM, or large weight downloads are required.
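The mocking idea behind these tests can be shown without FastAPI: replace the expensive model call so the surrounding logic runs instantly. In this self-contained sketch, `run_background_removal` and `process_photo` are hypothetical stand-ins for the real pipeline functions. In the actual suite, the same patching would wrap requests made through FastAPI's `TestClient`.

```python
import unittest
from unittest import mock


def run_background_removal(image_bytes: bytes) -> bytes:
    """Stand-in for the heavy model call; never reached when patched."""
    raise RuntimeError("would load model weights")


def process_photo(image_bytes: bytes) -> dict:
    """Stand-in for the pipeline glue an endpoint would call."""
    cutout = run_background_removal(image_bytes)
    return {"ok": True, "size": len(cutout)}


class ProcessPhotoTest(unittest.TestCase):
    def test_pipeline_without_model(self):
        fake = mock.Mock(return_value=b"png")
        # Swap the name in the namespace where process_photo looks it up.
        with mock.patch.dict(globals(), {"run_background_removal": fake}):
            result = process_photo(b"raw")
        fake.assert_called_once_with(b"raw")
        self.assertEqual(result, {"ok": True, "size": 3})
```

Saved as a `test_*.py` file under `tests/`, it is picked up by the same `unittest discover` command.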

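Run them with the following command (Windows virtualenv path shown; inside the activated `idmaker` conda environment, plain `python` works too):

```
venv\Scripts\python.exe -m unittest discover -s tests -p "test_*.py"
```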

Integration Tests

Located in the repository root:

test_api.py: Validates end-to-end processing with real ML model weights, sending HTTP requests to a running server.

Run with (start the server first):

```
venv\Scripts\python.exe test_api.py
```

Last Updated: June 2026 — EL HELAL Studio Engineering