shailgsits committed on
Commit 6a90f4f · verified · 1 Parent(s): 05b50f7

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,230 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language: en
3
+ tags:
4
+ - image-classification
5
+ - document-classification
6
+ - tensorflow
7
+ - efficientnet
8
+ - computer-vision
9
+ license: mit
10
+ framework: tensorflow
11
+ pipeline_tag: image-classification
12
+ ---
13
+
14
+ # Document Classifier
15
+
16
+ A TensorFlow SavedModel that classifies real-world document images into four categories: visiting cards, prescriptions, shop banners, and invalid images. The model is built on **EfficientNet** and expects inputs prepared with `efficientnet.preprocess_input`. It is intended for production use behind a validation pipeline that covers image quality, fake/AI detection, and confidence thresholding.
17
+
18
+ ---
19
+
20
+ ## Supported Document Types
21
+
22
+ | Class Key | Label | Description |
23
+ |---|---|---|
24
+ | `1_visiting_card` | Visiting Card | Business cards, name cards |
25
+ | `2_prescription` | Prescription | Medical prescriptions |
26
+ | `3_shop_banner` | Shop Banner | Storefront signage, banners |
27
+ | `4_invalid_image` | Invalid | Rejected / unrecognized documents |
28
+
29
+ ---
30
+
31
+ ## Model Details
32
+
33
+ | Property | Value |
34
+ |---|---|
35
+ | Architecture | EfficientNet (TF SavedModel) |
36
+ | Input Size | Configured via `settings.IMAGE_SIZE` |
37
+ | Preprocessing | `efficientnet.preprocess_input` |
38
+ | Output | Softmax class probabilities |
39
+ | Confidence Threshold | Configured via `settings.CONFIDENCE_THRESHOLD` |
40
+
41
+ ---
42
+
43
+ ## Repository Structure
44
+
45
+ ```
46
+ document-classifier/
47
+ ├── saved_model.pb
48
+ ├── variables/
49
+ │   ├── variables.index
50
+ │   └── variables.data-00000-of-00001
51
+ ├── class_index.json
52
+ └── README.md
53
+ ```
54
+
55
+ ### `class_index.json` format
56
+
57
+ ```json
58
+ {
59
+   "1_visiting_card": 0,
60
+   "2_prescription": 1,
61
+   "3_shop_banner": 2,
62
+   "4_invalid_image": 3
63
+ }
64
+ ```
65
+
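Because the file maps class keys to indices, inference code has to invert it before it can look up a predicted index. The following is a minimal sketch of that step; the helper name `load_labels` is hypothetical, and the contiguity check rests on the assumption that the indices follow the order of the softmax output.

```python
import json


def load_labels(raw_json: str) -> dict[int, str]:
    """Invert a {class_key: index} mapping into {index: class_key}."""
    class_indices = json.loads(raw_json)
    labels = {int(idx): key for key, idx in class_indices.items()}
    # Duplicate indices would silently collapse during inversion.
    if len(labels) != len(class_indices):
        raise ValueError("class_index.json contains duplicate indices")
    # Assumption: indices are contiguous and start at 0.
    if sorted(labels) != list(range(len(labels))):
        raise ValueError("class indices must be contiguous and start at 0")
    return labels


EXAMPLE = (
    '{"1_visiting_card": 0, "2_prescription": 1, '
    '"3_shop_banner": 2, "4_invalid_image": 3}'
)
LABELS = load_labels(EXAMPLE)
print(LABELS[2])
```

Validating the mapping at load time surfaces a malformed file immediately instead of as an `"unknown"` label at prediction time.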
66
+ ---
67
+
68
+ ## Installation
69
+
70
+ ```bash
71
+ pip install tensorflow opencv-python pillow huggingface_hub
72
+ # Optional but recommended:
73
+ pip install pytesseract  # For AI watermark OCR detection; also requires the Tesseract binary to be installed
74
+ ```
75
+
76
+ ---
77
+
78
+ ## Usage
79
+
80
+ ### Load from Hugging Face
81
+
82
+ ```python
83
+ from huggingface_hub import snapshot_download
84
+ import tensorflow as tf
85
+ import json
86
+
87
+ # Download model + class index
88
+ local_path = snapshot_download(repo_id="your-username/document-classifier")
89
+
90
+ # Load model
91
+ model = tf.saved_model.load(local_path)
92
+ infer = model.signatures["serving_default"]
93
+
94
+ # Load class labels
95
+ with open(f"{local_path}/class_index.json") as f:
96
+     class_indices = json.load(f)
97
+
98
+ LABELS = {int(v): k for k, v in class_indices.items()}
99
+ ```
100
+
101
+ ### Run Inference
102
+
103
+ ```python
104
+ import cv2
105
+ import numpy as np
106
+ from tensorflow.keras.applications.efficientnet import preprocess_input
107
+
108
+ IMAGE_SIZE = (224, 224) # match your training config
109
+
110
+ def predict(image_path: str):
111
+     img = cv2.imread(image_path)
112
+     img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
113
+     resized = cv2.resize(img_rgb, IMAGE_SIZE)
114
+     input_arr = np.expand_dims(resized.astype(np.float32), axis=0)
115
+     input_arr = preprocess_input(input_arr)
116
+
117
+     outputs = infer(tf.constant(input_arr))
118
+     preds = list(outputs.values())[0].numpy()[0]
119
+
120
+     class_id = int(np.argmax(preds))
121
+     confidence = float(np.max(preds))
122
+     label = LABELS.get(class_id, "unknown")
123
+
124
+     return {"label": label, "confidence": round(confidence * 100, 2)}
125
+
126
+ result = predict("my_document.jpg")
127
+ print(result)
128
+ # {'label': '1_visiting_card', 'confidence': 97.43}
129
+ ```
130
+
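The `predict` function returns the top class unconditionally, while the README describes a `MODEL_REJECTED` outcome for low confidence or the invalid class. The sketch below shows one way that decision could be layered on top of the raw probabilities. The function name `interpret`, the `<` comparison, and the default of `0.75` (the README's example value) are assumptions, not the pipeline's confirmed behavior.

```python
from typing import Sequence

INVALID_CLASS = "4_invalid_image"  # class key from class_index.json


def interpret(preds: Sequence[float], labels: dict, threshold: float = 0.75) -> dict:
    """Turn softmax probabilities into an accept/reject decision."""
    class_id = max(range(len(preds)), key=lambda i: preds[i])
    confidence = float(preds[class_id])
    label = labels.get(class_id, "unknown")
    # Assumption: "below threshold" means strictly less than the threshold.
    if confidence < threshold or label in (INVALID_CLASS, "unknown"):
        return {"status": "INVALID", "reason_code": "MODEL_REJECTED"}
    return {
        "status": "VALID",
        "document_type": label,
        "confidence": round(confidence * 100, 2),
    }


DEMO_LABELS = {
    0: "1_visiting_card",
    1: "2_prescription",
    2: "3_shop_banner",
    3: "4_invalid_image",
}
print(interpret([0.9743, 0.01, 0.01, 0.0057], DEMO_LABELS))
print(interpret([0.40, 0.30, 0.20, 0.10], DEMO_LABELS))
```

Separately, `cv2.imread` returns `None` instead of raising when a file cannot be decoded, so a caller would need to check for that before colour conversion; mapping that case to `UNREADABLE_IMAGE` would be a natural fit.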
131
+ ---
132
+
133
+ ## Validation Pipeline
134
+
135
+ Before inference runs, every image passes through a multi-stage validation pipeline. Requests are rejected early and cheaply when possible.
136
+
137
+ ### Image Quality Checks
138
+
139
+ | Check | Condition | Rejection Code |
140
+ |---|---|---|
141
+ | Blank image | Grayscale std < 12 | `BLANK_IMAGE` |
142
+ | Blurry image | Laplacian variance < 10 | `BLURRED_IMAGE` |
143
+ | Ruled paper | ≥5 evenly-spaced horizontal lines | `RULED_PAPER` |
144
+ | No text | Fewer than 6 text-like connected components | `NO_MEANINGFUL_TEXT` |
145
+
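The first two rows of the table can be illustrated with plain NumPy. This is a sketch under stated assumptions rather than the pipeline's actual code, which is not part of this repository. The 4-neighbour Laplacian here stands in for the usual OpenCV idiom `cv2.Laplacian(gray, cv2.CV_64F).var()` and ignores border pixels. The function names are hypothetical.

```python
import numpy as np

BLANK_STD_THRESHOLD = 12.0      # "Grayscale std < 12"
BLUR_VARIANCE_THRESHOLD = 10.0  # "Laplacian variance < 10"


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    g = gray.astype(np.float64)
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1]      # pixels above and below
        + g[1:-1, :-2] + g[1:-1, 2:]    # pixels left and right
        - 4.0 * g[1:-1, 1:-1]           # centre pixel
    )
    return float(lap.var())


def quality_rejection(gray: np.ndarray):
    """Return a rejection code for a grayscale image, or None if it passes."""
    if float(gray.std()) < BLANK_STD_THRESHOLD:
        return "BLANK_IMAGE"
    if laplacian_variance(gray) < BLUR_VARIANCE_THRESHOLD:
        return "BLURRED_IMAGE"
    return None


blank = np.full((64, 64), 255, dtype=np.uint8)
ramp = np.tile(np.linspace(0, 255, 64), (64, 1))           # smooth gradient, no edges
checker = (np.indices((64, 64)).sum(axis=0) % 2) * 255.0   # maximally sharp pattern
print(quality_rejection(blank), quality_rejection(ramp), quality_rejection(checker))
```

The blank check runs first because a uniform image also has near-zero Laplacian variance and would otherwise be reported as blurry.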
146
+ ### AI / Fake Image Detection
147
+
148
+ The pipeline runs AI-detection checks from cheapest to most expensive:
149
+
150
+ | Step | Method | Description |
151
+ |---|---|---|
152
+ | 1 | **EXIF/XMP Metadata** | Scans for AI tool keywords (`midjourney`, `dall-e`, `stable-diffusion`, etc.) and flags Google ICC profile without camera EXIF tags |
153
+ | 2 | **Screenshot / UI detection** | Rejects app screenshots with >55% near-white pixels or flat white corners |
154
+ | 3 | **AI watermark OCR** | Scans the bottom 20% of the image for known AI generator watermarks via Tesseract |
155
+ | 4 | **Gemini ✦ sparkle** | Detects the characteristic Gemini/Imagen sparkle artifact in the bottom-right corner using both absolute and local-contrast blob analysis |
156
+ | 5 | **AI staged background** | Detects bokeh-blurred backgrounds with a sharp foreground card (card/background sharpness ratio > 5.0) |
157
+ | 6 | **Perspective tilt** | Flags images where >35% of detected lines fall in the 15°–45° diagonal range |
158
+ | 7 | **DCT frequency analysis** | Flags unnaturally uniform high-frequency energy (ratio > 0.12) |
159
+ | 8 | **Texture uniformity** | Flags low patch variance coefficient of variation (< 0.4) combined with low mean variance (< 50) |
160
+
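The cheapest-first ordering amounts to a short-circuiting chain of checks. The sketch below wires up simplified versions of steps 1 and 6 only. The `ctx` dictionary, its keys, and all function names are hypothetical, and the checks assume that metadata and line angles have already been extracted by other code.

```python
from typing import Callable, Optional

# Subset of keywords named in the table; the full list is not given in this README.
AI_KEYWORDS = ("midjourney", "dall-e", "stable-diffusion")


def metadata_check(ctx: dict) -> Optional[str]:
    """Step 1: look for AI tool keywords in metadata values."""
    blob = " ".join(str(v) for v in ctx.get("metadata", {}).values()).lower()
    return "AI_GENERATED_IMAGE" if any(k in blob for k in AI_KEYWORDS) else None


def tilt_check(ctx: dict) -> Optional[str]:
    """Step 6: flag when more than 35% of line angles fall within 15 to 45 degrees."""
    angles = ctx.get("line_angles_deg", [])
    if not angles:
        return None
    diagonal = sum(1 for a in angles if 15.0 <= abs(a) <= 45.0)
    return "AI_GENERATED_IMAGE" if diagonal / len(angles) > 0.35 else None


# Ordered from cheapest to most expensive, as in the table.
CHECKS: list[Callable[[dict], Optional[str]]] = [metadata_check, tilt_check]


def run_checks(ctx: dict, checks=CHECKS) -> Optional[str]:
    """Return the first rejection code, skipping all later checks."""
    for check in checks:
        code = check(ctx)
        if code is not None:
            return code
    return None


print(run_checks({"metadata": {"Software": "Midjourney v6"}}))
print(run_checks({"metadata": {"Make": "Canon"}, "line_angles_deg": [0, 90, 30, 1]}))
```

Keeping the checks in a list makes the cost ordering explicit and lets a deployment drop an optional step, such as the OCR scan, without touching the others.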
161
+ ### Response Format
162
+
163
+ **Valid document:**
164
+ ```json
165
+ {
166
+   "status": "VALID",
167
+   "title": "Document Verified Successfully",
168
+   "message": "Your document has been identified as a Visiting Card.",
169
+   "document_type": "1_visiting_card",
170
+   "document_type_label": "Visiting Card",
171
+   "confidence": 97.43,
172
+   "doc_type_received": null
173
+ }
174
+ ```
175
+
176
+ **Invalid / rejected:**
177
+ ```json
178
+ {
179
+   "status": "INVALID",
179
+   "reason_code": "AI_GENERATED_IMAGE",
180
+   "title": "AI-Generated Image Detected",
181
+   "message": "The uploaded image appears to be AI-generated and cannot be accepted.",
182
+   "suggestion": "Please upload a real photograph of your document."
184
+ }
185
+ ```
186
+
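Both payload shapes can be produced by two small builder functions. This sketch reuses only strings that appear in this README. The function names are hypothetical, the meaning of `doc_type_received` is not explained in the source so it is passed through unchanged, and rejection codes other than `AI_GENERATED_IMAGE` get no title or message because their texts are not shown.

```python
from typing import Optional

# Display labels from the "Supported Document Types" table.
DOCUMENT_LABELS = {
    "1_visiting_card": "Visiting Card",
    "2_prescription": "Prescription",
    "3_shop_banner": "Shop Banner",
}

# Only the texts shown in this README are filled in.
REJECTION_TEXT = {
    "AI_GENERATED_IMAGE": {
        "title": "AI-Generated Image Detected",
        "message": "The uploaded image appears to be AI-generated and cannot be accepted.",
        "suggestion": "Please upload a real photograph of your document.",
    },
}


def valid_response(document_type: str, confidence: float,
                   doc_type_received: Optional[str] = None) -> dict:
    """Build the VALID payload for an accepted document."""
    label = DOCUMENT_LABELS[document_type]
    return {
        "status": "VALID",
        "title": "Document Verified Successfully",
        "message": f"Your document has been identified as a {label}.",
        "document_type": document_type,
        "document_type_label": label,
        "confidence": confidence,
        "doc_type_received": doc_type_received,
    }


def invalid_response(reason_code: str) -> dict:
    """Build the INVALID payload; unknown codes carry no descriptive text."""
    return {"status": "INVALID", "reason_code": reason_code,
            **REJECTION_TEXT.get(reason_code, {})}


print(valid_response("1_visiting_card", 97.43))
print(invalid_response("AI_GENERATED_IMAGE"))
```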
187
+ ### All Rejection Codes
188
+
189
+ | Code | Meaning |
190
+ |---|---|
191
+ | `BLANK_IMAGE` | Blank or uniformly white/black image |
192
+ | `BLURRED_IMAGE` | Image too blurry to process |
193
+ | `RULED_PAPER` | Lined/ruled paper detected |
194
+ | `NO_MEANINGFUL_TEXT` | No readable text components found |
195
+ | `SCREENSHOT_DOCUMENT` | App screenshot or web UI render |
196
+ | `AI_GENERATED_IMAGE` | AI-generated image (any detection method) |
197
+ | `MODEL_REJECTED` | Model confidence below threshold or invalid class |
198
+ | `UNREADABLE_IMAGE` | File could not be decoded |
199
+ | `SERVER_ERROR` | Unexpected server-side error |
200
+
201
+ ---
202
+
203
+ ## Dependencies
204
+
205
+ | Package | Purpose |
206
+ |---|---|
207
+ | `tensorflow` | Model loading and inference |
208
+ | `opencv-python` | Image decoding, quality checks, AI detection |
209
+ | `pillow` | EXIF/XMP metadata reading |
210
+ | `pytesseract` | AI watermark OCR scan (optional) |
211
+ | `numpy` | Array operations |
212
+
213
+ ---
214
+
215
+ ## Configuration
216
+
217
+ The application code, not the SavedModel itself, reads its settings from `config.py` via `get_settings()`. Key settings:
218
+
219
+ | Setting | Description |
220
+ |---|---|
221
+ | `MODEL_PATH` | Path to the SavedModel directory |
222
+ | `CLASS_INDEX_FILE` | Path to `class_index.json` |
223
+ | `IMAGE_SIZE` | Tuple, e.g. `(224, 224)` |
224
+ | `CONFIDENCE_THRESHOLD` | Float, e.g. `0.75` — minimum confidence to accept |
225
+
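Since `config.py` is not included in this repository, the following is only a guess at its shape: a frozen dataclass holding the four settings above, with a cached `get_settings()` accessor. The default paths and the environment-variable overrides are assumptions made for illustration; the example values `(224, 224)` and `0.75` come from the table.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Settings:
    # Hypothetical default paths; adjust to wherever the snapshot was downloaded.
    MODEL_PATH: str = "./document-classifier"
    CLASS_INDEX_FILE: str = "./document-classifier/class_index.json"
    IMAGE_SIZE: tuple = (224, 224)
    CONFIDENCE_THRESHOLD: float = 0.75

    def __post_init__(self):
        # Softmax confidences lie in [0, 1], so the threshold must as well.
        if not 0.0 <= self.CONFIDENCE_THRESHOLD <= 1.0:
            raise ValueError("CONFIDENCE_THRESHOLD must be between 0 and 1")


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    """Build the settings once; environment overrides are an assumption."""
    overrides = {}
    if "MODEL_PATH" in os.environ:
        overrides["MODEL_PATH"] = os.environ["MODEL_PATH"]
    if "CONFIDENCE_THRESHOLD" in os.environ:
        overrides["CONFIDENCE_THRESHOLD"] = float(os.environ["CONFIDENCE_THRESHOLD"])
    return Settings(**overrides)


settings = get_settings()
print(settings.IMAGE_SIZE)
```

Note that the threshold is on the 0 to 1 scale of the raw softmax output, whereas the `confidence` field in responses is reported as a percentage.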
226
+ ---
227
+
228
+ ## License
229
+
230
+ MIT
class_index.json ADDED
@@ -0,0 +1 @@
1
+ {"1_visiting_card": 0, "2_prescription": 1, "3_shop_banner": 2, "4_invalid_image": 3}
fingerprint.pb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb66d3a7ea49d1815a57db751e427f83733297c68ae54d5def094ab65f915bcc
3
+ size 97
saved_model.pb ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:830d7b949a0d27fa1af2f0fae6cf093f46ae14c66396d46c25bd88eeee018169
3
+ size 2350545
variables/variables.data-00000-of-00001 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:39fa2d05ba34b21ff5dfb0b669060ce2a58c60397dd68d1e2340b5b4be392adb
3
+ size 32548732
variables/variables.index ADDED
Binary file (36.6 kB).