# AI Image Processing

## Overview
An AI-powered image processing API with multiple features:
- Image enhancement/upscaling using Real-ESRGAN
- Background removal using BiRefNet via rembg
- Noise reduction using OpenCV Non-Local Means Denoising
- Document scanning with auto-crop, alignment, and HD enhancement
- FastAPI backend with automatic Swagger API documentation
- Simple web frontend for testing

## Current State
- **Local Preview**: Running with simple processing (no heavy AI models due to size constraints)
- **Full AI Mode**: Available when deployed to Hugging Face Spaces

## Project Structure
```
/
├── app.py              # Full FastAPI app for Hugging Face deployment
├── app_local.py        # Lightweight local preview server
├── enhancer.py         # Real-ESRGAN model wrapper (for HF deployment)
├── document_scanner.py # Document scanning with OpenCV (auto-crop, align, enhance)
├── templates/
│   └── index.html      # Frontend interface
├── requirements.txt    # Dependencies for Hugging Face Spaces
├── Dockerfile          # Docker configuration for HF Spaces
├── README.md           # Hugging Face Spaces configuration
├── uploads/            # Temporary upload storage
└── outputs/            # Processed image outputs
```

## API Endpoints
- `GET /` - Web frontend
- `GET /docs` - Swagger API documentation
- `GET /health` - Health check
- `GET /model-info` - Model information
- `GET /progress/{job_id}` - Get async job progress
- `GET /result/{job_id}` - Get completed job result
- `POST /enhance` - Enhance/upscale image (Real-ESRGAN) - supports `async_mode=true`
- `POST /enhance/async` - Start async enhancement with progress tracking
- `POST /enhance/base64` - Enhance image (returns base64)
- `POST /remove-background` - Remove image background (BiRefNet) - supports `async_mode=true`
- `POST /remove-background/async` - Start async background removal with progress
- `POST /remove-background/base64` - Remove background (returns base64)
- `POST /denoise` - Reduce image noise (OpenCV NLM) - supports `async_mode=true`
- `POST /denoise/async` - Start async denoising with progress
- `POST /denoise/base64` - Denoise image (returns base64)
- `POST /docscan` - Scan document (auto-crop, align, HD enhance) - supports `async_mode=true`
- `POST /docscan/async` - Start async document scan with progress
- `POST /docscan/base64` - Scan document (returns base64)

## Document Scanner Features
The `/docscan` endpoint provides:
- **Auto-detection**: Edge detection using Canny algorithm
- **Auto-crop**: Contour detection and perspective correction
- **Alignment**: Four-point perspective transform
- **Contrast**: CLAHE (Contrast Limited Adaptive Histogram Equalization)
- **Denoising**: Bilateral filter (preserves edges while reducing noise)
- **Sharpening**: Unsharp masking for crisp text
- **HD Upscaling**: Optional Real-ESRGAN enhancement (1-4x scale)
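
The four-point perspective transform needs the detected corners in a consistent order and a target size for the flattened page. The sketch below shows one common way to do that bookkeeping in plain Python. The helper names `order_corners` and `target_size` are hypothetical and are not taken from `document_scanner.py`; the actual warp would then be done with OpenCV.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def order_corners(points: Sequence[Point]) -> List[Point]:
    """Return corners as [top-left, top-right, bottom-right, bottom-left].

    Hypothetical helper: assumes image coordinates (y grows downward)
    and a roughly upright document.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    # x + y is smallest at the top-left and largest at the bottom-right.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # y - x is smallest at the top-right and largest at the bottom-left.
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(corners: Sequence[Point]) -> Tuple[int, int]:
    """Width and height of the flattened page, using the longer opposing edges."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return int(round(width)), int(round(height))


# Corners as they might come out of contour detection, in arbitrary order.
detected = [(90, 10), (10, 12), (95, 140), (8, 150)]
ordered = order_corners(detected)
print(ordered, target_size(ordered))
```

The ordered corners and the target size are the inputs a perspective warp typically needs: the four source points map to the rectangle `(0, 0)` to `(width, height)`.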

## Deploying to Hugging Face Spaces
1. Create a new Space on Hugging Face
2. Select "Docker" as the SDK
3. Upload all files: `app.py`, `enhancer.py`, `document_scanner.py`, `templates/`, `requirements.txt`, `Dockerfile`, `README.md`
4. The Space will auto-build the container and download AI models

## Progress Tracking API
All image processing endpoints support async mode with progress tracking:

1. Start a job with `async_mode=true` or use the `/async` endpoints
2. Receive a `job_id` in the response
3. Poll `/progress/{job_id}` to get current progress (0-100%)
4. When complete, fetch result from `/result/{job_id}`
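
A client-side polling loop for steps 3 and 4 might look like the sketch below. It is only an illustration: the function name `wait_for_job`, the injected `fetch_progress` callable, and the terminal status names `"completed"` and `"failed"` are assumptions, since this document only shows the `"processing"` status.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    job_id: str,
    fetch_progress: Callable[[str], Dict[str, Any]],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status or the timeout expires.

    fetch_progress is expected to return the parsed JSON body of
    GET /progress/{job_id}.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = fetch_progress(job_id)
        status = info.get("status")
        if status == "completed":  # assumed name of the success status
            return info
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(info.get("message", "job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(poll_interval)


# Usage with a stand-in for the HTTP call, so the sketch runs without a server.
_fake_responses = iter([
    {"job_id": "abc-123", "status": "processing", "progress": 45.0},
    {"job_id": "abc-123", "status": "completed", "progress": 100.0},
])
final = wait_for_job("abc-123", lambda _id: next(_fake_responses), sleep=lambda s: None)
print(final["progress"])
```

Passing the HTTP call in as a function keeps the loop independent of any particular HTTP client. Once the loop returns, the result can be fetched from `/result/{job_id}`.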

Example response from progress endpoint:
```json
{
  "job_id": "abc-123",
  "status": "processing",
  "progress": 45.0,
  "message": "Enhancing image (4 tiles)...",
  "current_step": 2,
  "total_steps": 5
}
```
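
On the server side, a response of this shape can be produced by a small thread-safe job registry. The following is a minimal sketch of that idea, not the contents of `progress_tracker.py`: the class and method names, the initial `"queued"` status, and the explicit `progress` argument are all assumptions.

```python
import threading
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class JobState:
    job_id: str
    status: str = "queued"  # assumed initial status
    progress: float = 0.0
    message: str = ""
    current_step: int = 0
    total_steps: int = 0


class ProgressTracker:
    """Hypothetical in-memory job registry guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, JobState] = {}

    def create(self, total_steps: int) -> str:
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = JobState(job_id=job_id, total_steps=total_steps)
        return job_id

    def update(self, job_id: str, progress: float, message: str, current_step: int) -> None:
        with self._lock:
            job = self._jobs[job_id]
            job.status = "processing"
            # Clamp so a worker cannot report values outside 0-100.
            job.progress = max(0.0, min(100.0, float(progress)))
            job.message = message
            job.current_step = current_step

    def snapshot(self, job_id: str) -> Dict[str, Any]:
        """Return a copy, so callers never see a half-updated job."""
        with self._lock:
            return asdict(self._jobs[job_id])


tracker = ProgressTracker()
job_id = tracker.create(total_steps=5)
tracker.update(job_id, 45.0, "Enhancing image (4 tiles)...", current_step=2)
print(tracker.snapshot(job_id))
```

A worker thread would call `update` as it moves through its steps, while the `/progress/{job_id}` handler would only ever call `snapshot`.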

## Performance Optimizations
- **Tile processing**: Images processed in 256px tiles for memory efficiency
- **Max input size**: 512x512 for enhance (auto-resized), 2048x2048 for docscan
- **Default scale**: 2x (faster than 4x, still good quality)
- **Background threading**: Server stays responsive during processing
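
To make the first two limits concrete, the sketch below computes the resized dimensions and the resulting tile grid. It assumes the auto-resize preserves aspect ratio and that tiles do not overlap; a real Real-ESRGAN pipeline typically also pads tiles, which is ignored here. The function names `fit_within` and `tile_boxes` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def fit_within(width: int, height: int, max_side: int = 512) -> Tuple[int, int]:
    """Shrink (never enlarge) so both sides fit in max_side, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def tile_boxes(width: int, height: int, tile: int = 256) -> List[Box]:
    """Split an image into non-overlapping tiles; edge tiles may be smaller."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


w, h = fit_within(1024, 768)   # resized to fit the 512x512 limit
tiles = tile_boxes(w, h)       # 256px tiles over the resized image
print((w, h), len(tiles))
```

Under these assumptions an image at the 512x512 limit splits into four 256px tiles, so only one tile's worth of data has to be held by the model at a time.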

## Recent Changes
- 2025-11-28: Fixed frontend progress bar and image return issues
  - Added visual progress bar with percentage display to frontend
  - Updated frontend to use async endpoints for all features
  - Added polling logic with proper error handling and timeouts
  - Fixed image result fetching with 202 status handling
  - All endpoints now show real-time progress during processing
- 2025-11-28: Added progress tracking and async processing
  - New progress_tracker.py module with thread-safe job tracking
  - /progress/{job_id} and /result/{job_id} endpoints
  - async_mode parameter on all image processing endpoints
  - Dedicated /async endpoints for each feature
  - Automatic cleanup of old jobs and result files
  - Performance optimizations: tile=256, max_size=512, default scale=2
- 2025-11-28: Added document scanning feature
  - Auto-crop with edge detection and contour finding
  - Perspective correction for skewed documents
  - CLAHE contrast enhancement
  - Bilateral filter denoising (preserves details)
  - Unsharp mask sharpening
  - Optional HD upscaling with Real-ESRGAN
- 2025-11-28: Added background removal and noise reduction features
  - BiRefNet integration via rembg for background removal
  - OpenCV Non-Local Means Denoising
  - Updated frontend with feature tabs
  - Updated API documentation
- 2025-11-28: Initial creation of AI Image Enhancer API
  - FastAPI backend with Swagger docs
  - Real-ESRGAN integration for Hugging Face
  - Simple frontend for testing
  - Lightweight local preview mode
  - Docker configuration for HF Spaces deployment