m9di6crga committed
Commit 4ca6349 · 1 Parent(s): f96243d

Add a document scanning endpoint with AI enhancements

Add a new `/docscan` endpoint to the API that performs auto-cropping, perspective correction, alignment, contrast enhancement, noise reduction, sharpening, and optional HD upscaling of document images.

Replit-Commit-Author: Agent
Replit-Commit-Session-Id: dc097ae8-2157-4d92-8d04-6b44128d6d7c
Replit-Commit-Checkpoint-Type: full_checkpoint
Replit-Commit-Event-Id: dd0bd260-40d9-4e6b-8be5-962fa7796efb
Replit-Commit-Screenshot-Url: https://storage.googleapis.com/screenshot-production-us-central1/01531b1e-f634-49fa-b952-38b1db7203b1/dc097ae8-2157-4d92-8d04-6b44128d6d7c/BHf9clb

Files changed (5)
  1. .replit +4 -0
  2. README.md +40 -1
  3. app.py +152 -2
  4. document_scanner.py +200 -0
  5. replit.md +22 -1
.replit CHANGED
@@ -37,3 +37,7 @@ externalPort = 80
 [[ports]]
 localPort = 38887
 externalPort = 3000
+
+[[ports]]
+localPort = 44343
+externalPort = 3001
README.md CHANGED
@@ -10,13 +10,14 @@ license: mit
 
 # AI Image Processing API
 
-A comprehensive image processing API with multiple AI-powered features including super-resolution, background removal, and noise reduction.
+A comprehensive image processing API with multiple AI-powered features including super-resolution, background removal, noise reduction, and document scanning.
 
 ## Features
 
 - **Image Enhancement**: Upscale images 2x or 4x using Real-ESRGAN
 - **Background Removal**: Remove backgrounds using BiRefNet AI model via rembg
 - **Noise Reduction**: Reduce image noise using OpenCV Non-Local Means Denoising
+- **Document Scanning**: Auto-crop, align, and enhance document photos with AI
 - **RESTful API**: Full API with automatic OpenAPI/Swagger documentation
 - **Web Interface**: Simple drag-and-drop interface for testing
 
@@ -46,6 +47,24 @@ Reduce image noise using Non-Local Means Denoising.
 - `file`: Image file
 - `strength`: Denoising strength (1-30, default: 10)
 
+### Document Scanning
+#### `POST /docscan`
+Scan and enhance document images with AI-powered processing.
+
+**Features:**
+- Auto-detection of document edges
+- Auto-crop and perspective correction
+- Alignment and straightening
+- CLAHE contrast enhancement
+- Bilateral noise reduction (preserves edges)
+- Unsharp mask sharpening
+- Optional HD upscaling with Real-ESRGAN
+
+**Parameters:**
+- `file`: Document image (PNG, JPG, JPEG, WebP, BMP)
+- `enhance_hd`: Enable AI HD enhancement (default: true)
+- `scale`: Upscale factor 1-4 (default: 2)
+
 ### Other Endpoints
 - `GET /docs` - Interactive Swagger UI documentation
 - `GET /redoc` - ReDoc documentation
@@ -59,6 +78,7 @@ Reduce image noise using Non-Local Means Denoising.
 | Super Resolution | Real-ESRGAN x4plus | State-of-the-art image upscaling |
 | Background Removal | BiRefNet-general | High-accuracy segmentation via rembg |
 | Noise Reduction | OpenCV NLM | Non-Local Means Denoising |
+| Document Scanning | OpenCV + Real-ESRGAN | Edge detection, perspective correction, HD enhancement |
 
 ## Local Development
 
@@ -126,6 +146,21 @@ with open("denoised.png", "wb") as f:
     f.write(response.content)
 ```
 
+### Python - Document Scanning
+```python
+import requests
+
+with open("document_photo.jpg", "rb") as f:
+    response = requests.post(
+        "https://your-space.hf.space/docscan",
+        files={"file": f},
+        params={"enhance_hd": True, "scale": 2}
+    )
+
+with open("scanned_document.png", "wb") as f:
+    f.write(response.content)
+```
+
 ### cURL Examples
 ```bash
 # Enhance image
@@ -139,6 +174,10 @@ curl -X POST "https://your-space.hf.space/remove-background?bgcolor=transparent"
 # Denoise image
 curl -X POST "https://your-space.hf.space/denoise?strength=10" \
   -F "file=@noisy.jpg" -o denoised.png
+
+# Scan document
+curl -X POST "https://your-space.hf.space/docscan?enhance_hd=true&scale=2" \
+  -F "file=@document.jpg" -o scanned.png
 ```
 
 ## License
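Reviewer note: the README documents `enhance_hd` and `scale` as query parameters, with `scale` limited to 1-4. A client could check that range before uploading anything. The sketch below is only an illustration; `build_docscan_url` is a hypothetical helper that does not exist in this repository, and the base URL is the placeholder from the README.

```python
from urllib.parse import urlencode


def build_docscan_url(base_url: str, enhance_hd: bool = True, scale: int = 2) -> str:
    """Build the /docscan request URL with its two documented query parameters."""
    # The endpoint declares scale with ge=1, le=4, so reject other values early.
    if not 1 <= scale <= 4:
        raise ValueError("scale must be between 1 and 4")
    # Lowercase the boolean so the URL matches the form used in the cURL example.
    query = urlencode({"enhance_hd": str(enhance_hd).lower(), "scale": scale})
    return f"{base_url.rstrip('/')}/docscan?{query}"


url = build_docscan_url("https://your-space.hf.space", enhance_hd=True, scale=2)
print(url)
```

The resulting URL has the same shape as the one in the cURL example, so it can be passed to `requests.post` together with `files={"file": f}`.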
app.py CHANGED
@@ -25,6 +25,7 @@ A comprehensive image processing API with multiple AI-powered features.
 - **Image Upscaling**: Enhance image resolution up to 4x using Real-ESRGAN
 - **Background Removal**: Remove backgrounds using rembg with BiRefNet model
 - **Noise Reduction**: Reduce image noise using advanced denoising algorithms
+- **Document Scanning**: Auto-crop, align, and enhance document photos with AI
 - **Quality Enhancement**: Improve image clarity and reduce artifacts
 
 ### Supported Formats:
@@ -34,8 +35,9 @@ A comprehensive image processing API with multiple AI-powered features.
 - **Super Resolution**: Real-ESRGAN x4plus
 - **Background Removal**: rembg with BiRefNet-massive model
 - **Noise Reduction**: OpenCV Non-Local Means Denoising
+- **Document Scanner**: OpenCV edge detection + Real-ESRGAN upscaling
 """,
-    version="2.0.0",
+    version="2.1.0",
     docs_url="/docs",
     redoc_url="/redoc",
 )
@@ -87,7 +89,7 @@ async def health_check():
     return {
         "status": "healthy",
         "version": "2.0.0",
-        "features": ["enhance", "remove-background", "denoise"]
+        "features": ["enhance", "remove-background", "denoise", "docscan"]
     }
 
 @app.get("/model-info")
@@ -110,6 +112,12 @@ async def model_info():
                 "name": "Non-Local Means Denoising",
                 "description": "Advanced noise reduction algorithm",
                 "source": "OpenCV"
+            },
+            "document_scanner": {
+                "name": "AI Document Scanner",
+                "description": "Auto-crop, perspective correction, alignment, and HD enhancement",
+                "features": ["edge detection", "perspective transform", "CLAHE contrast", "bilateral denoising", "unsharp masking", "Real-ESRGAN upscaling"],
+                "source": "OpenCV + Real-ESRGAN"
             }
         },
         "supported_formats": ["png", "jpg", "jpeg", "webp", "bmp"],
@@ -476,6 +484,148 @@ async def denoise_image_base64(
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Error denoising image: {str(e)}")
 
+
+doc_scanner = None
+
+def get_doc_scanner():
+    global doc_scanner
+    if doc_scanner is None:
+        from document_scanner import get_document_scanner
+        doc_scanner = get_document_scanner()
+    return doc_scanner
+
+
+@app.post("/docscan")
+async def scan_document(
+    file: UploadFile = File(..., description="Document image to scan (PNG, JPG, JPEG, WebP, BMP)"),
+    enhance_hd: bool = Query(default=True, description="Apply HD enhancement using AI (Real-ESRGAN)"),
+    scale: int = Query(default=2, ge=1, le=4, description="Upscale factor for HD enhancement (1-4)")
+):
+    """
+    Scan and enhance a document image with AI-powered processing.
+
+    This endpoint performs:
+    - **Auto-detection**: Finds document edges automatically using edge detection
+    - **Auto-crop**: Removes background and crops to document boundaries
+    - **Perspective correction**: Straightens tilted or skewed documents
+    - **Alignment**: Ensures the document is properly aligned
+    - **Contrast enhancement**: Applies CLAHE for improved readability
+    - **Noise reduction**: Uses bilateral filtering to reduce noise while preserving edges
+    - **Sharpening**: Applies unsharp masking for crisp text without artifacts
+    - **HD upscaling**: Optionally uses Real-ESRGAN for high-definition output
+
+    Parameters:
+    - **file**: Upload a photo of a document (supports various angles and lighting)
+    - **enhance_hd**: Enable AI-powered HD enhancement (default: True)
+    - **scale**: Upscaling factor 1-4 (default: 2 for balanced quality/size)
+
+    Returns the scanned document as a high-quality PNG file.
+    """
+    allowed_types = ["image/png", "image/jpeg", "image/jpg", "image/webp", "image/bmp"]
+    if file.content_type not in allowed_types:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Invalid file type. Allowed types: {', '.join(allowed_types)}"
+        )
+
+    try:
+        contents = await file.read()
+        input_image = Image.open(io.BytesIO(contents))
+
+        if input_image.mode != "RGB":
+            input_image = input_image.convert("RGB")
+
+        max_size = 2048
+        if input_image.width > max_size or input_image.height > max_size:
+            ratio = min(max_size / input_image.width, max_size / input_image.height)
+            new_size = (int(input_image.width * ratio), int(input_image.height * ratio))
+            input_image = input_image.resize(new_size, Image.LANCZOS)
+
+        original_size = {"width": input_image.width, "height": input_image.height}
+
+        scanner = get_doc_scanner()
+        scanned_image = scanner.process_document(input_image, enhance_hd=enhance_hd, scale=scale)
+
+        file_id = str(uuid.uuid4())
+        output_path = OUTPUT_DIR / f"{file_id}_scanned.png"
+        scanned_image.save(output_path, "PNG", optimize=True)
+
+        return FileResponse(
+            output_path,
+            media_type="image/png",
+            filename=f"scanned_{file.filename.rsplit('.', 1)[0]}.png"
+        )
+
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Error scanning document: {str(e)}")
+
+
+@app.post("/docscan/base64")
+async def scan_document_base64(
+    file: UploadFile = File(..., description="Document image to scan"),
+    enhance_hd: bool = Query(default=True, description="Apply HD enhancement using AI"),
+    scale: int = Query(default=2, ge=1, le=4, description="Upscale factor for HD enhancement (1-4)")
+):
+    """
+    Scan and enhance a document image, returning the result as base64.
+
+    Same processing as /docscan but returns base64-encoded image data.
+    Useful for integrations that prefer base64 over file downloads.
+    """
+    import base64
+
+    allowed_types = ["image/png", "image/jpeg", "image/jpg", "image/webp", "image/bmp"]
+    if file.content_type not in allowed_types:
+        raise HTTPException(
+            status_code=400,
+            detail=f"Invalid file type. Allowed types: {', '.join(allowed_types)}"
+        )
+
+    try:
+        contents = await file.read()
+        input_image = Image.open(io.BytesIO(contents))
+
+        if input_image.mode != "RGB":
+            input_image = input_image.convert("RGB")
+
+        max_size = 2048
+        if input_image.width > max_size or input_image.height > max_size:
+            ratio = min(max_size / input_image.width, max_size / input_image.height)
+            new_size = (int(input_image.width * ratio), int(input_image.height * ratio))
+            input_image = input_image.resize(new_size, Image.LANCZOS)
+
+        original_size = {"width": input_image.width, "height": input_image.height}
+
+        scanner = get_doc_scanner()
+        scanned_image = scanner.process_document(input_image, enhance_hd=enhance_hd, scale=scale)
+
+        buffer = io.BytesIO()
+        scanned_image.save(buffer, format="PNG", optimize=True)
+        buffer.seek(0)
+
+        img_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
+
+        return JSONResponse({
+            "success": True,
+            "image_base64": img_base64,
+            "original_size": original_size,
+            "scanned_size": {"width": scanned_image.width, "height": scanned_image.height},
+            "enhance_hd": enhance_hd,
+            "scale_factor": scale,
+            "processing": {
+                "auto_crop": True,
+                "perspective_correction": True,
+                "contrast_enhancement": "CLAHE",
+                "noise_reduction": "bilateral_filter",
+                "sharpening": "unsharp_mask",
+                "hd_upscaling": "Real-ESRGAN" if enhance_hd else "disabled"
+            }
+        })
+
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Error scanning document: {str(e)}")
+
+
 if __name__ == "__main__":
     import uvicorn
     uvicorn.run(app, host="0.0.0.0", port=7860)
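Reviewer note: `/docscan/base64` returns JSON with the PNG in `image_base64` and a `success` flag, so a client has to decode the payload before saving it. The following is a minimal sketch of that step. `decode_docscan_payload` is a hypothetical client-side helper, and the payload is a stand-in built locally rather than a real server response.

```python
import base64


def decode_docscan_payload(payload: dict) -> bytes:
    """Return the raw image bytes carried in a /docscan/base64 JSON response."""
    if not payload.get("success"):
        raise ValueError("response did not report success")
    return base64.b64decode(payload["image_base64"])


# Stand-in payload with the same field names as the endpoint's JSON body.
# A real client would obtain it from the HTTP response instead, for example
# with requests.post(..., files={"file": f}).json().
fake_png = b"\x89PNG\r\n\x1a\n" + b"demo"
payload = {
    "success": True,
    "image_base64": base64.b64encode(fake_png).decode("utf-8"),
}

png_bytes = decode_docscan_payload(payload)
print(len(png_bytes))
```

The decoded bytes can then be written to disk in binary mode, the same way the README example writes `response.content`.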
document_scanner.py ADDED
@@ -0,0 +1,200 @@
+import cv2
+import numpy as np
+from PIL import Image, ImageEnhance, ImageFilter
+
+class DocumentScanner:
+    def __init__(self):
+        pass
+
+    def order_points(self, pts):
+        rect = np.zeros((4, 2), dtype="float32")
+        s = pts.sum(axis=1)
+        rect[0] = pts[np.argmin(s)]
+        rect[2] = pts[np.argmax(s)]
+        diff = np.diff(pts, axis=1)
+        rect[1] = pts[np.argmin(diff)]
+        rect[3] = pts[np.argmax(diff)]
+        return rect
+
+    def four_point_transform(self, image, pts):
+        rect = self.order_points(pts)
+        (tl, tr, br, bl) = rect
+
+        widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
+        widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
+        maxWidth = max(int(widthA), int(widthB))
+
+        heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
+        heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
+        maxHeight = max(int(heightA), int(heightB))
+
+        dst = np.array([
+            [0, 0],
+            [maxWidth - 1, 0],
+            [maxWidth - 1, maxHeight - 1],
+            [0, maxHeight - 1]], dtype="float32")
+
+        M = cv2.getPerspectiveTransform(rect, dst)
+        warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))
+        return warped
+
+    def detect_document(self, image):
+        orig = image.copy()
+        height, width = image.shape[:2]
+
+        ratio = height / 500.0
+        new_width = int(width / ratio)
+        resized = cv2.resize(image, (new_width, 500))
+
+        gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
+
+        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
+
+        edged = cv2.Canny(blurred, 50, 200)
+
+        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
+        edged = cv2.dilate(edged, kernel, iterations=1)
+
+        contours, _ = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
+        contours = sorted(contours, key=cv2.contourArea, reverse=True)[:10]
+
+        screen_cnt = None
+        for c in contours:
+            peri = cv2.arcLength(c, True)
+            approx = cv2.approxPolyDP(c, 0.02 * peri, True)
+
+            if len(approx) == 4:
+                screen_cnt = approx
+                break
+
+        if screen_cnt is None:
+            edge_margin = 0.02
+            h, w = resized.shape[:2]
+            margin_x = int(w * edge_margin)
+            margin_y = int(h * edge_margin)
+            screen_cnt = np.array([
+                [[margin_x, margin_y]],
+                [[w - margin_x, margin_y]],
+                [[w - margin_x, h - margin_y]],
+                [[margin_x, h - margin_y]]
+            ])
+
+        return screen_cnt.reshape(4, 2) * ratio
+
+    def auto_crop_and_align(self, image):
+        if isinstance(image, Image.Image):
+            image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
+
+        doc_contour = self.detect_document(image)
+
+        warped = self.four_point_transform(image, doc_contour)
+
+        return warped
+
+    def enhance_sharpness(self, image, amount=1.5):
+        if isinstance(image, np.ndarray):
+            pil_image = Image.fromarray(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
+        else:
+            pil_image = image
+
+        blurred = pil_image.filter(ImageFilter.GaussianBlur(radius=1))
+
+        blurred_np = np.array(blurred).astype(np.float32)
+        original_np = np.array(pil_image).astype(np.float32)
+
+        sharpened = original_np + (original_np - blurred_np) * amount
+        sharpened = np.clip(sharpened, 0, 255).astype(np.uint8)
+
+        return Image.fromarray(sharpened)
+
+    def adaptive_contrast(self, image):
+        if isinstance(image, Image.Image):
+            image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
+
+        lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
+        l, a, b = cv2.split(lab)
+
+        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
+        l = clahe.apply(l)
+
+        lab = cv2.merge([l, a, b])
+        result = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
+
+        return result
+
+    def denoise_preserve_details(self, image, strength=3):
+        if isinstance(image, Image.Image):
+            image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
+
+        denoised = cv2.bilateralFilter(image, 9, strength * 10, strength * 10)
+
+        return denoised
+
+    def process_document(self, pil_image, enhance_hd=True, scale=2):
+        img_array = np.array(pil_image)
+        if len(img_array.shape) == 2:
+            img_array = cv2.cvtColor(img_array, cv2.COLOR_GRAY2BGR)
+        else:
+            img_array = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
+
+        cropped = self.auto_crop_and_align(img_array)
+
+        denoised = self.denoise_preserve_details(cropped, strength=2)
+
+        contrasted = self.adaptive_contrast(denoised)
+
+        result_rgb = cv2.cvtColor(contrasted, cv2.COLOR_BGR2RGB)
+        result_pil = Image.fromarray(result_rgb)
+
+        sharpened = self.enhance_sharpness(result_pil, amount=0.8)
+
+        enhancer = ImageEnhance.Brightness(sharpened)
+        brightened = enhancer.enhance(1.05)
+
+        if enhance_hd:
+            try:
+                from enhancer import ImageEnhancer
+                ai_enhancer = ImageEnhancer()
+                hd_image = ai_enhancer.enhance(brightened, scale=scale)
+                return hd_image
+            except Exception as e:
+                print(f"AI enhancement not available: {e}")
+                new_size = (brightened.width * scale, brightened.height * scale)
+                hd_image = brightened.resize(new_size, Image.LANCZOS)
+                return self.enhance_sharpness(hd_image, amount=0.5)
+
+        return brightened
+
+
+class FallbackDocumentScanner:
+    def process_document(self, pil_image, enhance_hd=True, scale=2):
+        if pil_image.mode != "RGB":
+            pil_image = pil_image.convert("RGB")
+
+        enhancer = ImageEnhance.Contrast(pil_image)
+        contrasted = enhancer.enhance(1.15)
+
+        enhancer = ImageEnhance.Sharpness(contrasted)
+        sharpened = enhancer.enhance(1.3)
+
+        enhancer = ImageEnhance.Brightness(sharpened)
+        brightened = enhancer.enhance(1.05)
+
+        if enhance_hd:
+            new_size = (brightened.width * scale, brightened.height * scale)
+            hd_image = brightened.resize(new_size, Image.LANCZOS)
+
+            enhancer = ImageEnhance.Sharpness(hd_image)
+            final = enhancer.enhance(1.2)
+            return final
+
+        return brightened
+
+
+def get_document_scanner():
+    try:
+        import cv2
+        return DocumentScanner()
+    except ImportError:
+        print("OpenCV not available, using fallback scanner")
+        return FallbackDocumentScanner()
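Reviewer note: `order_points` relies on a coordinate trick. The top-left corner has the smallest `x + y`, the bottom-right the largest, the top-right the smallest `y - x`, and the bottom-left the largest `y - x`. The sketch below restates that rule in plain Python so it can be checked by hand. `order_corners` is a hypothetical stand-in for illustration, not the NumPy implementation from the commit, and the sample coordinates are made up.

```python
def order_corners(points):
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    # Smallest and largest coordinate sums pick out the top-left and
    # bottom-right corners.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # np.diff(pts, axis=1) in the commit computes y - x for each point;
    # its minimum is the top-right corner and its maximum the bottom-left.
    top_right = min(points, key=lambda p: p[1] - p[0])
    bottom_left = max(points, key=lambda p: p[1] - p[0])
    return [top_left, top_right, bottom_right, bottom_left]


# A slightly skewed quadrilateral given in arbitrary order.
corners = [(210, 300), (10, 10), (5, 290), (200, 20)]
ordered = order_corners(corners)
print(ordered)  # [(10, 10), (200, 20), (210, 300), (5, 290)]
```

The rule assumes a roughly upright quadrilateral. For a document rotated close to 45 degrees the sums or differences of two corners can tie, so the ordering may be ambiguous.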
replit.md CHANGED
@@ -5,6 +5,7 @@ An AI-powered image processing API with multiple features:
 - Image enhancement/upscaling using Real-ESRGAN
 - Background removal using BiRefNet via rembg
 - Noise reduction using OpenCV Non-Local Means Denoising
+- Document scanning with auto-crop, alignment, and HD enhancement
 - FastAPI backend with automatic Swagger API documentation
 - Simple web frontend for testing
 
@@ -18,6 +19,7 @@ An AI-powered image processing API with multiple features:
 ├── app.py # Full FastAPI app for Hugging Face deployment
 ├── app_local.py # Lightweight local preview server
 ├── enhancer.py # Real-ESRGAN model wrapper (for HF deployment)
+├── document_scanner.py # Document scanning with OpenCV (auto-crop, align, enhance)
 ├── templates/
 │   └── index.html # Frontend interface
 ├── requirements.txt # Dependencies for Hugging Face Spaces
@@ -38,14 +40,33 @@ An AI-powered image processing API with multiple features:
 - `POST /remove-background/base64` - Remove background (returns base64)
 - `POST /denoise` - Reduce image noise (OpenCV NLM)
 - `POST /denoise/base64` - Denoise image (returns base64)
+- `POST /docscan` - Scan document (auto-crop, align, HD enhance)
+- `POST /docscan/base64` - Scan document (returns base64)
+
+## Document Scanner Features
+The `/docscan` endpoint provides:
+- **Auto-detection**: Edge detection using Canny algorithm
+- **Auto-crop**: Contour detection and perspective correction
+- **Alignment**: Four-point perspective transform
+- **Contrast**: CLAHE (Contrast Limited Adaptive Histogram Equalization)
+- **Denoising**: Bilateral filter (preserves edges while reducing noise)
+- **Sharpening**: Unsharp masking for crisp text
+- **HD Upscaling**: Optional Real-ESRGAN enhancement (1-4x scale)
 
 ## Deploying to Hugging Face Spaces
 1. Create a new Space on Hugging Face
 2. Select "Docker" as the SDK
-3. Upload all files: `app.py`, `enhancer.py`, `templates/`, `requirements.txt`, `Dockerfile`, `README.md`
+3. Upload all files: `app.py`, `enhancer.py`, `document_scanner.py`, `templates/`, `requirements.txt`, `Dockerfile`, `README.md`
 4. The Space will auto-build the container and download AI models
 
 ## Recent Changes
+- 2025-11-28: Added document scanning feature
+  - Auto-crop with edge detection and contour finding
+  - Perspective correction for skewed documents
+  - CLAHE contrast enhancement
+  - Bilateral filter denoising (preserves details)
+  - Unsharp mask sharpening
+  - Optional HD upscaling with Real-ESRGAN
 - 2025-11-28: Added background removal and noise reduction features
   - BiRefNet integration via rembg for background removal
   - OpenCV Non-Local Means Denoising
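Reviewer note: the unsharp-mask step listed under Document Scanner Features computes `original + (original - blurred) * amount`, as in `enhance_sharpness`. The one-dimensional sketch below shows the effect on a single row of pixel values crossing an edge. It is an illustration under simplifying assumptions: a 3-tap box blur stands in for the Gaussian blur used in the commit, and `box_blur` and `unsharp_mask` are hypothetical names.

```python
def box_blur(signal):
    """Apply a 3-tap box blur, repeating the edge samples at the borders."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(signal))]


def unsharp_mask(signal, amount=0.8):
    """Sharpen by adding back the difference between the signal and its blur."""
    blurred = box_blur(signal)
    sharpened = [s + (s - b) * amount for s, b in zip(signal, blurred)]
    # Clamp to the 8-bit range, mirroring the np.clip call in the commit.
    return [min(255, max(0, round(v))) for v in sharpened]


# A dark-to-bright step, like the boundary between ink and paper.
edge = [50, 50, 50, 200, 200, 200]
print(unsharp_mask(edge))
```

Flat regions are left unchanged, while the samples on either side of the step are pushed further apart. That is the contrast boost at edges that makes text look crisper. The `amount=0.8` default matches the value `process_document` passes to `enhance_sharpness`.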