shailgsits committed on
Commit e9ac033 · verified · 1 Parent(s): 4a36d74

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +149 -165

README.md CHANGED
@@ -7,224 +7,208 @@ tags:
  - efficientnet
  - computer-vision
  license: mit
- framework: tensorflow
  pipeline_tag: image-classification
  ---

  # Document Classifier

- A TensorFlow SavedModel for classifying real-world document images into structured categories. Built on **EfficientNet** with its standard preprocessing, the model is designed for production use and includes an extensive validation pipeline covering image quality, fake/AI detection, and confidence thresholding.

  ---

- ## Supported Document Types
-
- | Class Key | Label | Description |
- |---|---|---|
- | `1_visiting_card` | Visiting Card | Business cards, name cards |
- | `2_prescription` | Prescription | Medical prescriptions |
- | `3_shop_banner` | Shop Banner | Storefront signage, banners |
- | `4_invalid_image` | Invalid | Rejected / unrecognized documents |
-
- ---
-
- ## Model Details
-
- | Property | Value |
- |---|---|
- | Architecture | EfficientNet (TF SavedModel) |
- | Input Size | Configured via `settings.IMAGE_SIZE` |
- | Preprocessing | `efficientnet.preprocess_input` |
- | Output | Softmax class probabilities |
- | Confidence Threshold | Configured via `settings.CONFIDENCE_THRESHOLD` |
-
- ---
-
- ## Repository Structure
-
- ```
- document-classifier/
- ├── saved_model.pb
- ├── variables/
- │   ├── variables.index
- │   └── variables.data-00000-of-00001
- ├── class_index.json
- └── README.md
- ```
-
- ### `class_index.json` format
-
- ```json
- {
-     "1_visiting_card": 0,
-     "2_prescription": 1,
-     "3_shop_banner": 2,
-     "4_invalid_image": 3
- }
- ```
-
- ---
-
- ## Installation

- ```bash
- pip install tensorflow opencv-python pillow huggingface_hub
- # Optional but recommended:
- pip install pytesseract  # for AI watermark OCR detection
- ```
-
- ---
-
- ## Usage

- ### Load from Hugging Face

- ```python
  from huggingface_hub import snapshot_download
  import tensorflow as tf
  import json

- # Download model + class index
- local_path = snapshot_download(repo_id="your-username/document-classifier")

- # Load model
  model = tf.saved_model.load(local_path)
  infer = model.signatures["serving_default"]

- # Load class labels
  with open(f"{local_path}/class_index.json") as f:
      class_indices = json.load(f)
-
  LABELS = {int(v): k for k, v in class_indices.items()}
- ```
-
- ### Run Inference
-
- ```python
- import cv2
- import numpy as np
- from tensorflow.keras.applications.efficientnet import preprocess_input

- IMAGE_SIZE = (224, 224)  # match your training config

- def predict(image_path: str):
      img = cv2.imread(image_path)
-     img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
-     resized = cv2.resize(img_rgb, IMAGE_SIZE)
      input_arr = np.expand_dims(resized.astype(np.float32), axis=0)
      input_arr = preprocess_input(input_arr)

-     outputs = infer(tf.constant(input_arr))
-     preds = list(outputs.values())[0].numpy()[0]
-
      class_id = int(np.argmax(preds))
      confidence = float(np.max(preds))
      label = LABELS.get(class_id, "unknown")
-
-     return {"label": label, "confidence": round(confidence * 100, 2)}
-
- result = predict("my_document.jpg")
  print(result)
- # {'label': '1_visiting_card', 'confidence': 97.43}
  ```

  ---

- ## Validation Pipeline

- Before inference runs, every image passes through a multi-stage validation pipeline. Requests are rejected early and cheaply when possible.
-
- ### Image Quality Checks
-
- | Check | Condition | Rejection Code |
- |---|---|---|
- | Blank image | Grayscale std < 12 | `BLANK_IMAGE` |
- | Blurry image | Laplacian variance < 10 | `BLURRED_IMAGE` |
- | Ruled paper | ≥5 evenly spaced horizontal lines | `RULED_PAPER` |
- | No text | Fewer than 6 text-like connected components | `NO_MEANINGFUL_TEXT` |
-
- ### AI / Fake Image Detection
-
- The pipeline runs AI-detection checks from cheapest to most expensive:
-
- | Step | Method | Description |
- |---|---|---|
- | 1 | **EXIF/XMP metadata** | Scans for AI tool keywords (`midjourney`, `dall-e`, `stable-diffusion`, etc.) and flags a Google ICC profile without camera EXIF tags |
- | 2 | **Screenshot / UI detection** | Rejects app screenshots with >55% near-white pixels or flat white corners |
- | 3 | **AI watermark OCR** | Scans the bottom 20% of the image for known AI-generator watermarks via Tesseract |
- | 4 | **Gemini ✦ sparkle** | Detects the characteristic Gemini/Imagen sparkle artifact in the bottom-right corner using both absolute and local-contrast blob analysis |
- | 5 | **AI staged background** | Detects bokeh-blurred backgrounds behind a sharp foreground card (card/background sharpness ratio > 5.0) |
- | 6 | **Perspective tilt** | Flags images where >35% of detected lines fall in the 15°–45° diagonal range |
- | 7 | **DCT frequency analysis** | Flags unnaturally uniform high-frequency energy (ratio > 0.12) |
- | 8 | **Texture uniformity** | Flags a low patch-variance coefficient of variation (< 0.4) combined with low mean variance (< 50) |
-
- ### Response Format
-
- **Valid document:**
- ```json
- {
-     "status": "VALID",
-     "title": "Document Verified Successfully",
-     "message": "Your document has been identified as a Visiting Card.",
-     "document_type": "1_visiting_card",
-     "document_type_label": "Visiting Card",
-     "confidence": 97.43,
-     "doc_type_received": null
- }
- ```

- **Invalid / rejected:**
- ```json
- {
-     "status": "INVALID",
-     "reason_code": "AI_GENERATED_IMAGE",
-     "title": "AI-Generated Image Detected",
-     "message": "The uploaded image appears to be AI-generated and cannot be accepted.",
-     "suggestion": "Please upload a real photograph of your document."
- }
- ```

- ### All Rejection Codes

- | Code | Meaning |
  |---|---|
- | `BLANK_IMAGE` | Blank or uniformly white/black image |
- | `BLURRED_IMAGE` | Image too blurry to process |
- | `RULED_PAPER` | Lined/ruled paper detected |
- | `NO_MEANINGFUL_TEXT` | No readable text components found |
- | `SCREENSHOT_DOCUMENT` | App screenshot or web UI render |
- | `AI_GENERATED_IMAGE` | AI-generated image (any detection method) |
- | `MODEL_REJECTED` | Model confidence below threshold or invalid class |
- | `UNREADABLE_IMAGE` | File could not be decoded |
- | `SERVER_ERROR` | Unexpected server-side error |

  ---

- ## Dependencies

- | Package | Purpose |
- |---|---|
- | `tensorflow` | Model loading and inference |
- | `opencv-python` | Image decoding, quality checks, AI detection |
- | `pillow` | EXIF/XMP metadata reading |
- | `pytesseract` | AI watermark OCR scan (optional) |
- | `numpy` | Array operations |

  ---

- ## Configuration

- The model reads settings from a `config.py` / `get_settings()` object. Key settings:

- | Setting | Description |
  |---|---|
- | `MODEL_PATH` | Path to the SavedModel directory |
- | `CLASS_INDEX_FILE` | Path to `class_index.json` |
- | `IMAGE_SIZE` | Tuple, e.g. `(224, 224)` |
- | `CONFIDENCE_THRESHOLD` | Float, e.g. `0.75` (minimum confidence to accept) |

  ---

  ## License

- MIT
  - efficientnet
  - computer-vision
  license: mit
  pipeline_tag: image-classification
+ library_name: tf-keras
  ---

  # Document Classifier

+ A Keras EfficientNet model for classifying real-world document images into structured categories. It includes a full validation pipeline covering image quality checks and AI/fake-image detection.

  ---

+ ## How to use this model

+ ```python
+ # Step 1: install dependencies
+ # pip install huggingface_hub tensorflow opencv-python pillow

+ # Step 2: copy and run this complete example
  from huggingface_hub import snapshot_download
  import tensorflow as tf
+ import numpy as np
+ import cv2
  import json
+ from tensorflow.keras.applications.efficientnet import preprocess_input

+ # Download model from Hugging Face (cached after the first run)
+ local_path = snapshot_download(repo_id="shailgsits/document-classifier")

+ # Load model + class labels
  model = tf.saved_model.load(local_path)
  infer = model.signatures["serving_default"]

  with open(f"{local_path}/class_index.json") as f:
      class_indices = json.load(f)
  LABELS = {int(v): k for k, v in class_indices.items()}

+ DOCUMENT_TYPE_LABELS = {
+     "1_visiting_card": "Visiting Card",
+     "2_prescription": "Prescription",
+     "3_shop_banner": "Shop Banner",
+     "4_invalid_image": "Invalid",
+ }

+ def predict(image_path: str) -> dict:
      img = cv2.imread(image_path)
+     if img is None:
+         return {"status": "ERROR", "message": "Could not read image"}
+
+     img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
+     resized = cv2.resize(img_rgb, (224, 224))
      input_arr = np.expand_dims(resized.astype(np.float32), axis=0)
      input_arr = preprocess_input(input_arr)

+     outputs = infer(tf.constant(input_arr))
+     preds = list(outputs.values())[0].numpy()[0]

      class_id = int(np.argmax(preds))
      confidence = float(np.max(preds))
      label = LABELS.get(class_id, "unknown")
+     friendly = DOCUMENT_TYPE_LABELS.get(label, label)
+
+     return {
+         "status": "VALID" if confidence >= 0.75 else "LOW_CONFIDENCE",
+         "document_type": label,
+         "document_type_label": friendly,
+         "confidence": round(confidence * 100, 2),
+         "all_scores": {
+             DOCUMENT_TYPE_LABELS.get(LABELS[i], LABELS[i]): round(float(p) * 100, 2)
+             for i, p in enumerate(preds)
+         },
+     }
+
+ # --- Run prediction ---
+ result = predict("your_image.jpg")
  print(result)
+
+ # Example output:
+ # {
+ #     'status': 'VALID',
+ #     'document_type': '1_visiting_card',
+ #     'document_type_label': 'Visiting Card',
+ #     'confidence': 97.43,
+ #     'all_scores': {'Visiting Card': 97.43, 'Prescription': 1.2, 'Shop Banner': 0.9, 'Invalid': 0.47}
+ # }
  ```

  ---

+ ## Supported Document Types

+ | Label | Description |
+ |---|---|
+ | `visiting_card` | Business / name cards |
+ | `prescription` | Medical prescriptions |
+ | `shop_banner` | Storefront signage, banners |
+ | `invalid_image` | Rejected / unrecognized documents |

+ ---

+ ## Files in this repo

+ | File | Description |
  |---|---|
+ | `document_classifier_final.keras` | Trained Keras model (EfficientNet) |
+ | `class_index.json` | Class name → index mapping |

  ---

+ ## Quick Test in Google Colab

+ ```python
+ !pip install huggingface_hub tensorflow pillow opencv-python requests -q
+
+ import tensorflow as tf, numpy as np, cv2, requests, json
+ from PIL import Image
+ from io import BytesIO
+ from huggingface_hub import hf_hub_download
+ from tensorflow.keras.applications.efficientnet import preprocess_input
+
+ # Load model + class mapping
+ model = tf.keras.models.load_model(
+     hf_hub_download("shailgsits/document-classifier", "document_classifier_final.keras")
+ )
+ with open(hf_hub_download("shailgsits/document-classifier", "class_index.json")) as f:
+     index_to_label = {v: k.split("_", 1)[1] for k, v in json.load(f).items()}
+
+ # Predict from any image URL
+ def predict_from_url(url: str):
+     # PIL loads RGB; [:, :, ::-1] flips to BGR (cv2-style channel order)
+     img = np.array(Image.open(BytesIO(requests.get(url).content)).convert("RGB"))[:, :, ::-1]
+     # Letterbox to 224x224: scale to fit, then center on a white canvas
+     h, w = img.shape[:2]
+     scale = min(224 / w, 224 / h)
+     nw, nh = int(w * scale), int(h * scale)
+     res = cv2.resize(img, (nw, nh))
+     canvas = np.ones((224, 224, 3), np.uint8) * 255
+     canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
+     input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
+     pred = model.predict(input_arr)[0]
+     idx = int(np.argmax(pred))
+     return {"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)}
+
+ # Test with a Google Drive image
+ url = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"
+ print(predict_from_url(url))
+ # {'label': 'visiting_card', 'confidence': 97.43}
+ ```
+
+ ---
+
+ ## Predict from local file (Colab upload)
+
+ ```python
+ from google.colab import files
+ uploaded = files.upload()
+ image_path = list(uploaded.keys())[0]
+
+ img = cv2.imread(image_path)  # cv2 loads BGR, matching predict_from_url above
+ h, w = img.shape[:2]
+ scale = min(224 / w, 224 / h)
+ nw, nh = int(w * scale), int(h * scale)
+ res = cv2.resize(img, (nw, nh))
+ canvas = np.ones((224, 224, 3), np.uint8) * 255
+ canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
+ input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
+ pred = model.predict(input_arr)[0]
+ idx = int(np.argmax(pred))
+ print({"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)})
+ ```

  ---

+ ## Preprocessing Details

+ Images are resized with **letterboxing** (aspect ratio preserved, white padding) to 224×224, then passed through EfficientNet's `preprocess_input`.
 
  ---

+ ## Validation Pipeline

+ Before inference, every image passes through:

+ | Check | Condition |
  |---|---|
+ | Blank image | Grayscale std < 12 |
+ | Blurry image | Laplacian variance < 10 |
+ | Ruled paper | ≥5 evenly spaced horizontal lines |
+ | No text detected | Fewer than 6 connected text components |
+ | AI metadata | EXIF/XMP contains AI tool keywords |
+ | Screenshot/UI | >55% near-white pixels |
+ | AI watermark | OCR detects generator text in the bottom strip |
+ | Gemini sparkle | Sparkle artifact in the bottom-right corner |
+ | AI staged background | Card/background sharpness ratio > 5.0 |
+ | Perspective tilt | >35% of lines in the 15°–45° diagonal range |
+ | DCT frequency | High-frequency energy ratio > 0.12 |
+ | Texture uniformity | Patch variance CV < 0.4 and mean variance < 50 |
 
  ---

  ## License

+ MIT
+
+ ---
+
+ ## Author
+
+ Developed and trained by **[Shailendra Singh Tiwari](https://www.linkedin.com/in/shailendra-singh-tiwari/)**