	# EL HELAL Studio — Technical Documentation

	A headless FastAPI backend for AI-powered ID card photo processing.

	---

	## Architecture

	- `main.py`: FastAPI application entry point. All routes are REST-only (JSON responses).
	- `core/`: Pure Python image processing logic — UI-agnostic.
	- `newcolor/`: AI color correction model (`ColorUNet`) and in-memory inference code.
	- `config/`: Global settings (`settings.json`) for retouch, layout defaults.
	- `assets/`: Branding assets (logo) and frame overlays.

	No GUI, no desktop wrapper. Clients interact exclusively through the REST API; the interactive API documentation is served at `/docs`.

	---

	## The 5-Step AI Pipeline

	Every photo processed by the studio follows a strictly sequenced pipeline:

	### 1. Auto-Crop & Face Detection (`core/crop.py`)

	- Technology: OpenCV Haar Cascades.
	- Logic: Detects the largest face, centers it, and calculates a 5:7 (4x6cm) aspect ratio crop.
	- Fallback: Centers the crop if no face is detected to ensure the pipeline never breaks.
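
The crop geometry described above can be sketched in a few lines. This is a minimal illustration, not the code in `core/crop.py`: the function names are hypothetical, faces are assumed to arrive as `(x, y, w, h)` boxes (the shape OpenCV cascade detectors return), and the crop is assumed to be the largest 5:7 box that fits in the image.

```python
def largest_face(faces):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    return max(faces, key=lambda f: f[2] * f[3], default=None)


def crop_box(img_w, img_h, faces, ratio=(5, 7)):
    """Return (left, top, width, height) of a ratio-preserving crop.

    The crop is centered on the largest face, or on the image center
    when no face was detected (the fallback path).
    """
    rw, rh = ratio
    # Largest box with the requested aspect ratio that fits in the image.
    if img_w * rh <= img_h * rw:
        crop_w, crop_h = img_w, img_w * rh // rw
    else:
        crop_w, crop_h = img_h * rw // rh, img_h

    face = largest_face(faces)
    if face is None:
        cx, cy = img_w // 2, img_h // 2          # fallback: image center
    else:
        x, y, w, h = face
        cx, cy = x + w // 2, y + h // 2          # face center

    # Center the box, then clamp it so it never leaves the image.
    left = max(0, min(cx - crop_w // 2, img_w - crop_w))
    top = max(0, min(cy - crop_h // 2, img_h - crop_h))
    return left, top, crop_w, crop_h
```

For example, `crop_box(1000, 1000, [])` takes the fallback path and returns a centered 714x1000 box, while passing several face boxes centers the crop on the largest one and clamps it to the image edges.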

	### 2. AI Background Removal (`core/process_images.py`)

	- Model: BiRefNet (RMBG-2.0) via the `transformers` library.
	- Optimization: Automatically detects and utilizes CUDA/GPU. Falls back to CPU with dynamic quantization.
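
A device-selection step of this kind might look like the sketch below. The helper names are hypothetical, and limiting dynamic quantization to `nn.Linear` layers is an assumption made for illustration; the project may quantize a different set of layers.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Lets this sketch run on machines without PyTorch installed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def prepare_model(model, device):
    """Move the model to the GPU, or quantize it dynamically for CPU use."""
    import torch

    if device == "cuda":
        return model.to("cuda").eval()
    # Dynamic quantization stores the weights of the listed layer types
    # as int8 and quantizes activations on the fly during inference.
    return torch.quantization.quantize_dynamic(
        model.eval(), {torch.nn.Linear}, dtype=torch.qint8
    )
```

Calling `prepare_model(model, pick_device())` once at startup keeps the choice in a single place, so later stages can reuse the same device.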

	### 3. AI Color Correction (`newcolor/inference.py`)

	- Model: ColorUNet via custom PyTorch model and weights.
	- Mechanism: Predicts corrected colors at model resolution (1024x1024), fits a quadratic polynomial color transform (10 parameters) on subject pixels using the alpha mask, and applies it to the full-resolution image.
	- Optimization: Dynamic device-aware PyTorch execution (reuses the RMBG execution device, e.g., CUDA or optimized CPU).
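
One way to read "10 parameters" is as the ten monomials of a degree-2 polynomial in R, G, and B. The sketch below shows that expansion and how a fitted transform would be applied to a pixel. The names are hypothetical, and using one row of ten weights per output channel is an assumption; the fitting step itself (typically a least-squares solve over the masked subject pixels) is not shown.

```python
def quadratic_features(r, g, b):
    """Expand one RGB pixel (floats in [0, 1]) into 10 quadratic terms."""
    return [1.0, r, g, b, r * r, g * g, b * b, r * g, r * b, g * b]


def apply_transform(coeffs, pixel):
    """Map a pixel through the transform.

    coeffs holds one row of 10 weights per output channel (3 x 10).
    """
    feats = quadratic_features(*pixel)
    out = []
    for row in coeffs:
        value = sum(w * f for w, f in zip(row, feats))
        out.append(min(1.0, max(0.0, value)))  # clamp to the valid range
    return out


# Identity transform: each output channel copies its own linear term.
IDENTITY = [
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]
```

Because the transform is a small set of coefficients rather than a per-pixel prediction, it can be fitted at model resolution and then evaluated on every pixel of the full-resolution image.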

	### 4. Surgical Retouching (`core/retouch.py`)

	- Landmarking: Uses MediaPipe Face Mesh (468 points) to generate a precise skin mask, excluding eyes, lips, and hair.
	- Frequency Separation: Splits the image into High Frequency (texture/pores) and Low Frequency (tone/color).
	- Blemish Removal: Detects anomalies on the High-Freq layer and inpaints them using surrounding texture.
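
The idea behind frequency separation can be shown on a single row of pixel values. This is a one-dimensional toy with hypothetical function names, a box blur, and a fixed threshold; the real module works on 2-D images restricted to the skin mask, and its blur and inpainting methods may differ.

```python
def box_blur(signal, radius):
    """Low-pass filter: average each sample with its neighbours."""
    blurred = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        blurred.append(sum(window) / len(window))
    return blurred


def split_frequencies(signal, radius=2):
    """Return (low, high) such that low[i] + high[i] reproduces signal[i]."""
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def find_blemishes(high, threshold=30):
    """Indices whose high-frequency value deviates strongly from zero."""
    return [i for i, h in enumerate(high) if abs(h) > threshold]


def repair(signal, radius=2, threshold=30):
    """Replace blemished detail with the mean detail of direct neighbours."""
    low, high = split_frequencies(signal, radius)
    fixed = list(high)
    for i in find_blemishes(high, threshold):
        neighbours = [high[j] for j in (i - 1, i + 1) if 0 <= j < len(high)]
        fixed[i] = sum(neighbours) / len(neighbours)
    # Recombine the untouched low layer with the repaired high layer.
    return [l + h for l, h in zip(low, fixed)]


row = [120, 121, 119, 120, 60, 120, 121, 120, 119]  # one dark spot at index 4
```

Running `repair(row)` changes only the sample at index 4 and leaves the rest of the row as it was, which is why the technique removes spots without smoothing away skin texture.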

	### 5. Layout Composition (`core/layout_engine.py`)

	- Rendering: Composes a 300 DPI canvas for printing.
	- Localization: Uses `arabic_reshaper` and `python-bidi` for correct Arabic script rendering.
	- Dynamic Assets: Overlays IDs with specific offsets and studio branding (logos).
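
Two small helpers illustrate the rendering and localization points above. They are a sketch with hypothetical names, not the code in `core/layout_engine.py`; `shape_arabic` needs the `arabic_reshaper` and `python-bidi` packages and imports them lazily.

```python
def cm_to_px(cm, dpi=300):
    """Convert a physical length in centimetres to whole pixels."""
    return round(cm / 2.54 * dpi)  # 2.54 cm per inch


def shape_arabic(text):
    """Prepare Arabic text for a renderer that draws left to right.

    Reshaping joins letters into their contextual forms; the bidi step
    then reorders the characters for visual display.
    """
    import arabic_reshaper
    from bidi.algorithm import get_display

    return get_display(arabic_reshaper.reshape(text))
```

At 300 DPI a 4x6 cm photo comes out at `cm_to_px(4)` by `cm_to_px(6)`, which is 472 by 709 pixels.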

	---

	## Configuration

	The system is controlled by `config/settings.json`. The layout engine hot-reloads this file on every request. You can adjust `id_font_size`, `grid_gap`, or `retouch_sensitivity` and see changes in the next processed photo without restarting.
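
Reloading on every request can be as simple as re-reading the file each time settings are needed. The sketch below assumes a flat JSON object; the function name and the default values are hypothetical and only stand in for whatever the layout engine actually uses.

```python
import json
from pathlib import Path

# Hypothetical fallback values, used when a key or the file is missing.
DEFAULTS = {"id_font_size": 48, "grid_gap": 10, "retouch_sensitivity": 0.5}


def load_settings(path="config/settings.json"):
    """Read the settings file from disk and overlay it on the defaults.

    Called once per request, so edits to the file take effect on the
    next processed photo without a restart.
    """
    settings = dict(DEFAULTS)
    try:
        settings.update(json.loads(Path(path).read_text(encoding="utf-8")))
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or half-written file should not break the request.
        pass
    return settings
```

Falling back to defaults on a parse error is a design choice for this sketch: it keeps a request alive if the file is being edited at that moment, at the cost of silently ignoring a broken file.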

	---

	## Known Dependency Conflicts

	- TensorFlow vs. Transformers: Standard `tensorflow` (especially nightly) conflicts with `transformers` and `numpy >= 2.0`.
	  - Resolution: Uninstall TensorFlow. The pipeline is 100% PyTorch-based.
	- Pinned Versions:
	  - `numpy < 2.0.0`: Compatibility with `basicsr` and older `torchvision`.
	  - `protobuf <= 3.20.3`: Prevents "Double Registration" errors in multi-model environments.
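
A quick way to check an environment against these constraints is to query the installed package metadata. The script below is a sketch using only the standard library; the function names are hypothetical and the checks mirror the pins listed above.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn "3.20.3" into (3, 20, 3); suffixes such as "rc1" are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def find_conflicts():
    """Return a list of human-readable problems (empty when all is well)."""
    problems = []
    if installed_version("tensorflow") is not None:
        problems.append("tensorflow is installed; uninstall it")
    numpy_v = installed_version("numpy")
    if numpy_v and version_tuple(numpy_v)[:1] >= (2,):
        problems.append(f"numpy {numpy_v} found; need numpy < 2.0.0")
    protobuf_v = installed_version("protobuf")
    if protobuf_v and version_tuple(protobuf_v) > (3, 20, 3):
        problems.append(f"protobuf {protobuf_v} found; need protobuf <= 3.20.3")
    return problems
```

Printing `find_conflicts()` after `pip install -r requirements.txt` gives an early warning before any model is loaded.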

	---

	## Environment Setup

	```bash
	conda create -n idmaker python=3.10
	conda activate idmaker
	pip install -r requirements.txt
	pip uninstall tensorflow tb-nightly tensorboard # Remove conflicts if present
	```

	---

	## Docker

	```bash
	docker-compose up --build
	```

	The API will be available at `http://localhost:8000` (or the port defined by `$PORT`).

	---

	## 🛠 Troubleshooting (Common Pitfalls)

	| Issue | Root Cause | Solution |
	| ----------------------------- | ------------------------------------------------------- | ---------------------------------------------------------------------------- |
	| "Tofu" Boxes in Text | Missing or corrupted fonts. | Ensure `assets/arialbd.ttf` is not a Git LFS pointer (size > 300 KB). |
	| NumPy AttributeError | Conflict between NumPy 2.x and TensorFlow/Transformers. | Uninstall `tensorflow` and ensure `numpy < 2.0.0` is installed. |
	| [Errno 10048] Socket Bind | Port 7860 is already in use by another server process. | Close the previous server instance or set a new `PORT` environment variable. |
	| Meta-Tensor Error | Transformers 4.50+ CPU bug. | Handled by `torch.linspace` monkeypatch in `process_images.py`. |
	| Slow Processing | CPU bottleneck. | Ensure `torch` is using multiple threads or enable CUDA. |
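
The font check from the first row can be automated. A Git LFS pointer is a small text file that begins with a `version https://git-lfs.github.com/spec/v1` line, so the sketch below looks for that header and applies the 300 KB size rule from the table. The function names are hypothetical.

```python
from pathlib import Path

LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file starts like a Git LFS pointer file."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER


def font_looks_valid(path, min_bytes=300 * 1024):
    """Return True if the file exists, is large enough, and is real data."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > min_bytes and not is_lfs_pointer(p)
```

If `font_looks_valid("assets/arialbd.ttf")` returns `False` after a fresh clone, fetching the LFS objects (for example with `git lfs pull`) is the likely fix.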

	---

	## Testing Framework

	The codebase includes a comprehensive testing framework divided into lightweight, mock-based unit tests and full integration tests.

	### Unit & Mocked API Tests
	Located in the `tests/` directory:
	- `test_layout_engine.py`: Validates canvas scaling, grid composition, margins, Arabic text shaping, and bidi rendering.
	- `test_crop.py`: Validates OpenCV face-detection coordinates and the 5:7 ratio auto-crop fallback mechanism.
	- `test_white_bg.py`: Verifies transparency compositing onto white canvas and 300 DPI preservation.
	- `test_color_steal.py`: Validates red/green/blue 1D LUT extraction, `.npz` caching, and `.cube` file export.
	- `test_api_mocked.py`: Validates FastAPI endpoints (`/settings`, `/status`, `/frames`, `/upload`, `/process`) using FastAPI `TestClient` and `unittest.mock` to mock ML processing. Avoids GPU/VRAM or large weights download requirements.

	Run them instantly using:
	```bash
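	# Windows virtual environment (cmd or PowerShell):
	venv\Scripts\python.exe -m unittest discover -s tests -p "test_*.py"
	# With the environment activated (any OS):
	python -m unittest discover -s tests -p "test_*.py"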
	```

	### Integration Tests
	Located in the root:
	- `test_api.py`: Validates end-to-end processing with real ML model weights and HTTP requests on a running server.

	Run with:
	```bash
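	# Windows virtual environment (cmd or PowerShell):
	venv\Scripts\python.exe test_api.py
	# With the environment activated (any OS):
	python test_api.py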
	```

	---

	_Last Updated: June 2026 — EL HELAL Studio Engineering_