# EL HELAL Studio: Technical Documentation
A headless FastAPI backend for AI-powered ID card photo processing.
---
## Architecture
- **`main.py`**: FastAPI application entry point. All routes are REST-only (JSON responses).
- **`core/`**: Pure Python image processing logic, UI-agnostic.
- **`newcolor/`**: AI color correction model (`ColorUNet`) and in-memory inference code.
- **`config/`**: Global settings (`settings.json`) for retouching and layout defaults.
- **`assets/`**: Branding assets (logo) and frame overlays.
**No GUI, no desktop wrapper.** Clients interact exclusively via the REST API; the interactive API documentation is served at `/docs`.
---
## The 5-Step AI Pipeline
Every photo processed by the studio follows a strictly sequenced pipeline:
### 1. Auto-Crop & Face Detection (`core/crop.py`)
- **Technology:** OpenCV Haar Cascades.
- **Logic:** Detects the largest face, centers it, and calculates a 5:7 (4x6cm) aspect ratio crop.
- **Fallback:** If no face is detected, the crop is centered on the image so the pipeline never fails.
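
The crop geometry described above can be sketched in plain Python. This is a minimal illustration, not the code in `core/crop.py`: the function names, the `face_scale` factor, and the `(x, y, w, h)` box format are assumptions, and face detection itself (the Haar cascade) is left out.

```python
def largest_face(faces):
    """Pick the face box (x, y, w, h) with the largest area, or None."""
    return max(faces, key=lambda f: f[2] * f[3]) if faces else None


def crop_box(img_w, img_h, face=None, ratio=(5, 7), face_scale=2.5):
    """Return (x, y, w, h) of a ratio[0]:ratio[1] crop inside the image.

    face_scale is a hypothetical factor: crop width = face width * face_scale.
    """
    rw, rh = ratio
    # Widest crop of the target ratio that still fits inside the image.
    max_w = min(img_w, img_h * rw // rh)
    if face is None:
        # Fallback: no face found, so center the largest possible crop.
        crop_w = max_w
        cx, cy = img_w / 2, img_h / 2
    else:
        fx, fy, fw, fh = face
        crop_w = min(max_w, int(fw * face_scale))
        cx, cy = fx + fw / 2, fy + fh / 2
    crop_h = crop_w * rh // rw
    # Clamp the box so it never leaves the image bounds.
    x = int(min(max(cx - crop_w / 2, 0), img_w - crop_w))
    y = int(min(max(cy - crop_h / 2, 0), img_h - crop_h))
    return x, y, crop_w, crop_h


faces = [(10, 10, 50, 50), (400, 300, 200, 200)]
print(crop_box(1000, 1400, largest_face(faces)))  # (250, 50, 500, 700)
print(crop_box(1000, 1000))                       # fallback: (143, 0, 714, 999)
```

Clamping the box, rather than rejecting faces near the border, keeps the output size constant for every input.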
### 2. AI Background Removal (`core/process_images.py`)
- **Model:** **BiRefNet (RMBG-2.0)** via the `transformers` library.
- **Optimization:** Uses a CUDA GPU automatically when one is available; otherwise falls back to CPU with dynamic quantization.
### 3. AI Color Correction (`newcolor/inference.py`)
- **Model:** **ColorUNet**, a custom PyTorch model with its own weights.
- **Mechanism:** Predicts corrected colors at model resolution (1024x1024), fits a quadratic polynomial color transform (10 parameters) on subject pixels using the alpha mask, and applies it to the full-resolution image.
- **Optimization:** Dynamic device-aware PyTorch execution (reuses the RMBG execution device, e.g., CUDA or optimized CPU).
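
To make the polynomial fit concrete, here is a minimal NumPy sketch. It assumes the 10 parameters are the terms `1, r, g, b, r², g², b², rg, rb, gb` fitted per output channel by least squares, and that images are float arrays in `[0, 1]`. The function names and the alpha threshold are hypothetical; the actual code in `newcolor/inference.py` may differ.

```python
import numpy as np


def poly_features(rgb):
    """Expand (N, 3) RGB rows into the 10 quadratic terms (assumed basis)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack(
        [np.ones_like(r), r, g, b, r * r, g * g, b * b, r * g, r * b, g * b],
        axis=1,
    )


def fit_color_transform(src, dst, alpha, threshold=0.5):
    """Least-squares fit of dst ~ poly(src), using subject pixels only."""
    mask = alpha.reshape(-1) > threshold
    X = poly_features(src.reshape(-1, 3)[mask])
    Y = dst.reshape(-1, 3)[mask]
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coeffs  # shape (10, 3): 10 terms per output channel


def apply_color_transform(image, coeffs):
    """Apply the fitted transform to an image of any resolution."""
    out = poly_features(image.reshape(-1, 3)) @ coeffs
    return np.clip(out, 0.0, 1.0).reshape(image.shape)
```

Because the fit yields a small coefficient matrix rather than a per-pixel map, it can be estimated at model resolution and then applied to the full-resolution image without upscaling the network output.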
### 4. Surgical Retouching (`core/retouch.py`)
- **Landmarking:** Uses **MediaPipe Face Mesh** (468 points) to generate a precise skin mask, excluding eyes, lips, and hair.
- **Frequency Separation:** Splits the image into **High Frequency** (texture/pores) and **Low Frequency** (tone/color).
- **Blemish Removal:** Detects anomalies on the High-Freq layer and inpaints them using surrounding texture.
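
Frequency separation can be illustrated on a single-channel image with NumPy alone. The sketch below makes several assumptions: a box blur stands in for whatever low-pass filter `core/retouch.py` uses, blemishes are flagged by a simple `k`-sigma threshold on the high-frequency layer, and all names are hypothetical. Inpainting is omitted.

```python
import numpy as np


def box_blur(img, radius):
    """Separable box blur with edge padding (stand-in for a Gaussian blur)."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" trims the padding again.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def split_frequencies(img, radius=2):
    """Return (low, high) such that low + high reconstructs img."""
    low = box_blur(img, radius)   # tone and color
    high = img - low              # texture and pores
    return low, high


def blemish_mask(high, skin_mask, k=3.0):
    """Flag skin pixels whose high-frequency value is an outlier (assumed rule)."""
    sigma = high[skin_mask].std()
    return skin_mask & (np.abs(high) > k * sigma)
```

Since `low + high` reproduces the input, edits made to one layer leave the other untouched when the layers are summed back together.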
### 5. Layout Composition (`core/layout_engine.py`)
- **Rendering:** Composes a 300 DPI canvas for printing.
- **Localization:** Uses `arabic_reshaper` and `python-bidi` for correct Arabic script rendering.
- **Dynamic Assets:** Overlays IDs with specific offsets and studio branding (logos).
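
As a small worked example of the 300 DPI arithmetic, the sketch below converts physical sizes to pixels and counts how many photos fit along one axis of a canvas. The helper names and the 20 px gap are illustrative assumptions, not values taken from `core/layout_engine.py`.

```python
def cm_to_px(cm, dpi=300):
    """Convert centimetres to pixels at the given print resolution."""
    return round(cm / 2.54 * dpi)


def grid_capacity(canvas_px, photo_px, gap_px):
    """Number of photos that fit along one axis with gap_px between them."""
    return max(0, (canvas_px + gap_px) // (photo_px + gap_px))


photo_w, photo_h = cm_to_px(4), cm_to_px(6)
print(photo_w, photo_h)                             # 472 709
print(grid_capacity(cm_to_px(10), photo_w, 20))     # 2
```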
---
## Configuration
The system is controlled by `config/settings.json`. The layout engine hot-reloads this file on every request. You can adjust `id_font_size`, `grid_gap`, or `retouch_sensitivity` and see changes in the next processed photo without restarting.
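
One simple way to get this behavior is to re-read the file on every call and merge it over defaults. The sketch below assumes exactly that; the `load_settings` name and the default values are hypothetical, and only the three keys named above come from this document.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real values live in config/settings.json.
DEFAULTS = {"id_font_size": 48, "grid_gap": 20, "retouch_sensitivity": 0.5}


def load_settings(path="config/settings.json"):
    """Read settings from disk on every call so edits apply immediately."""
    settings = dict(DEFAULTS)
    try:
        settings.update(json.loads(Path(path).read_text(encoding="utf-8")))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or half-written file: keep serving with the defaults.
        pass
    return settings
```

Reading a small JSON file per request is cheap compared with the image pipeline, and falling back to defaults means a malformed edit does not take the service down.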
---
## Known Dependency Conflicts
- **TensorFlow vs. Transformers:** Standard `tensorflow` (especially nightly) conflicts with `transformers` and `numpy >= 2.0`.
  - **Resolution:** Uninstall TensorFlow. The pipeline is 100% PyTorch-based.
- **Pinned Versions:**
  - `numpy < 2.0.0`: Compatibility with `basicsr` and older `torchvision`.
  - `protobuf <= 3.20.3`: Prevents "Double Registration" errors in multi-model environments.
---
## Environment Setup
```bash
conda create -n idmaker python=3.10
conda activate idmaker
pip install -r requirements.txt
pip uninstall tensorflow tb-nightly tensorboard # Remove conflicts if present
```
---
## Docker
```bash
docker-compose up --build
```
The API will be available at `http://localhost:8000` (or the port defined by `$PORT`).
---
## 🛠 Troubleshooting (Common Pitfalls)
| Issue | Root Cause | Solution |
| ----------------------------- | ------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **"Tofu" Boxes in Text** | Missing or corrupted fonts. | Ensure `assets/arialbd.ttf` is the real font file, not a Git LFS pointer (the real file is larger than 300 KB). |
| **NumPy AttributeError** | Conflict between NumPy 2.x and TensorFlow/Transformers. | Uninstall `tensorflow` and ensure `numpy < 2.0.0` is installed. |
| **[Errno 10048] Socket Bind** | Port 7860 is already in use by another server process. | Close the previous server instance or set a new `PORT` environment variable. |
| **Meta-Tensor Error** | Transformers 4.50+ CPU bug. | Handled by `torch.linspace` monkeypatch in `process_images.py`. |
| **Slow Processing** | CPU bottleneck. | Ensure `torch` is using multiple threads or enable CUDA. |
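
The font check in the first row can be automated. Git LFS pointer files are small text files that begin with the line `version https://git-lfs.github.com/spec/v1`, so inspecting the first bytes is enough. The helper name below is an assumption and is not part of the codebase.

```python
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is a Git LFS pointer rather than real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_HEADER)) == LFS_HEADER
```

If `is_lfs_pointer("assets/arialbd.ttf")` returns `True`, fetching the real file (for example with `git lfs pull`) should resolve the missing glyphs.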
---
## Testing Framework
The codebase includes a comprehensive testing framework divided into lightweight, mock-based unit tests and full integration tests.
### Unit & Mocked API Tests
Located in the `tests/` directory:
- **`test_layout_engine.py`**: Validates canvas scaling, grid composition, margins, Arabic text shaping, and bidi rendering.
- **`test_crop.py`**: Validates OpenCV face-detection coordinates and the 5:7 ratio auto-crop fallback mechanism.
- **`test_white_bg.py`**: Verifies transparency compositing onto white canvas and 300 DPI preservation.
- **`test_color_steal.py`**: Validates red/green/blue 1D LUT extraction, `.npz` caching, and `.cube` file export.
- **`test_api_mocked.py`**: Validates FastAPI endpoints (`/settings`, `/status`, `/frames`, `/upload`, `/process`) using FastAPI `TestClient` and `unittest.mock` to mock ML processing. Avoids GPU/VRAM or large weights download requirements.
Run them with the following command (Windows virtual-environment path shown):
```bash
venv\Scripts\python.exe -m unittest discover -s tests -p "test_*.py"
```
### Integration Tests
Located in the root:
- **`test_api.py`**: Validates end-to-end processing with real ML model weights and HTTP requests on a running server.
Run with the server already started (Windows virtual-environment path shown):
```bash
venv\Scripts\python.exe test_api.py
```
---
_Last Updated: June 2026, EL HELAL Studio Engineering_