Upload folder using huggingface_hub
README.md CHANGED
@@ -7,224 +7,208 @@ tags:
- efficientnet
- computer-vision
license: mit
framework: tensorflow
pipeline_tag: image-classification
---

# Document Classifier

A TensorFlow SavedModel EfficientNet classifier for document images.

---

## Supported Document Types

| Class Key | Label | Description |
|---|---|---|
| `1_visiting_card` | Visiting Card | Business cards, name cards |
| `2_prescription` | Prescription | Medical prescriptions |
| `3_shop_banner` | Shop Banner | Storefront signage, banners |
| `4_invalid_image` | Invalid | Rejected / unrecognized documents |

---

## Model Details

| Property | Value |
|---|---|
| Architecture | EfficientNet (TF SavedModel) |
| Input Size | Configured via `settings.IMAGE_SIZE` |
| Preprocessing | `efficientnet.preprocess_input` |
| Output | Softmax class probabilities |
| Confidence Threshold | Configured via `settings.CONFIDENCE_THRESHOLD` |
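
For context, a minimal sketch of how these two settings might gate a prediction; the `settings` values shown here are assumptions (the quickstart further below uses a 0.75 threshold), not the repo's actual configuration:

```python
# Hypothetical settings object mirroring the table above
class settings:
    IMAGE_SIZE = (224, 224)      # assumed input size
    CONFIDENCE_THRESHOLD = 0.75  # assumed threshold

def accept(label: str, confidence: float) -> bool:
    # Reject low-confidence predictions and the explicit invalid class
    return confidence >= settings.CONFIDENCE_THRESHOLD and label != "4_invalid_image"
```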

---

## Repository Structure

```
document-classifier/
├── saved_model.pb
├── variables/
│   ├── variables.index
│   └── variables.data-00000-of-00001
├── class_index.json
└── README.md
```

### `class_index.json` format

```json
{
  "1_visiting_card": 0,
  "2_prescription": 1,
  "3_shop_banner": 2,
  "4_invalid_image": 3
}
```

---

## Installation

```
pip install tensorflow opencv-python huggingface_hub
pip install pytesseract  # For AI watermark OCR detection
```

---

## Usage

### Load the Model

```python
from huggingface_hub import snapshot_download
import tensorflow as tf
import json

# Download model
local_path = snapshot_download(repo_id="shailgsits/document-classifier")

# Load model
model = tf.saved_model.load(local_path)
infer = model.signatures["serving_default"]

# Load class labels
with open(f"{local_path}/class_index.json") as f:
    class_indices = json.load(f)

LABELS = {int(v): k for k, v in class_indices.items()}
```

### Run Inference

```python
import cv2
import numpy as np
from tensorflow.keras.applications.efficientnet import preprocess_input


def predict(image_path: str):
    img = cv2.imread(image_path)

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(img_rgb, (224, 224))
    input_arr = np.expand_dims(resized.astype(np.float32), axis=0)
    input_arr = preprocess_input(input_arr)

    outputs = infer(tf.constant(input_arr))
    preds = list(outputs.values())[0].numpy()[0]

    class_id = int(np.argmax(preds))
    confidence = float(np.max(preds))
    label = LABELS.get(class_id, "unknown")

    return {"label": label, "confidence": confidence}


result = predict("your_image.jpg")
print(result)
```

---

## Validation Pipeline

Every uploaded image is validated before classification.

### Image Quality Checks

| Check | Condition | Reason Code |
|---|---|---|
| Blank image | Grayscale std < 12 | `BLANK_IMAGE` |
| Blurry image | Laplacian variance < 10 | `BLURRED_IMAGE` |
| Ruled paper | ≥5 evenly-spaced horizontal lines | `RULED_PAPER` |
| No text | Fewer than 6 text-like connected components | `NO_MEANINGFUL_TEXT` |
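
As an illustration, the first two checks map directly onto OpenCV one-liners. This is a minimal sketch using the thresholds above; the function names are ours, not the service's actual code:

```python
import cv2
import numpy as np

def is_blank(img_bgr: np.ndarray) -> bool:
    # Near-uniform pixel intensities -> BLANK_IMAGE
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    return float(gray.std()) < 12

def is_blurry(img_bgr: np.ndarray) -> bool:
    # Low Laplacian (edge) variance -> BLURRED_IMAGE
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var()) < 10
```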

### AI / Fake Image Detection

The pipeline runs AI-detection checks from cheapest to most expensive:

| Step | Method | Description |
|---|---|---|
| 1 | **EXIF/XMP Metadata** | Scans for AI tool keywords (`midjourney`, `dall-e`, `stable-diffusion`, etc.) and flags a Google ICC profile without camera EXIF tags |
| 2 | **Screenshot / UI detection** | Rejects app screenshots with >55% near-white pixels or flat white corners |
| 3 | **AI watermark OCR** | Scans the bottom 20% of the image for known AI generator watermarks via Tesseract |
| 4 | **Gemini ✦ sparkle** | Detects the characteristic Gemini/Imagen sparkle artifact in the bottom-right corner using both absolute and local-contrast blob analysis |
| 5 | **AI staged background** | Detects bokeh-blurred backgrounds with a sharp foreground card (card/background sharpness ratio > 5.0) |
| 6 | **Perspective tilt** | Flags images where >35% of detected lines fall in the 15°–45° diagonal range |
| 7 | **DCT frequency analysis** | Flags unnaturally uniform high-frequency energy (ratio > 0.12) |
| 8 | **Texture uniformity** | Flags low patch-variance coefficient of variation (< 0.4) combined with low mean variance (< 50) |
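
As an example, step 7 can be approximated with a global DCT. This is a minimal sketch assuming a 256×256 grayscale transform with the top-left 32×32 coefficients treated as "low frequency"; only the 0.12 cutoff comes from the table, while the block size and helper name are assumptions:

```python
import cv2
import numpy as np

def high_freq_energy_ratio(img_bgr: np.ndarray) -> float:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gray = cv2.resize(gray, (256, 256))
    dct = np.abs(cv2.dct(gray))
    low = dct[:32, :32].sum()  # energy in the low-frequency block
    return float((dct.sum() - low) / dct.sum())

# An image is flagged when the ratio exceeds 0.12
```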

### Response Format

**Valid document:**

```json
{
  "status": "VALID",
  "title": "Document Verified Successfully",
  "message": "Your document has been identified as a Visiting Card.",
  "document_type": "1_visiting_card",
  "document_type_label": "Visiting Card",
  "confidence": 97.43,
  "doc_type_received": null
}
```

**Invalid document:**

```json
{
  "status": "INVALID",
  "reason_code": "AI_GENERATED_IMAGE",
  "title": "AI-Generated Image Detected",
  "message": "The uploaded image appears to be AI-generated and cannot be accepted.",
  "suggestion": "Please upload a real photograph of your document."
}
```

## Reason Codes

| Code | Meaning |
|---|---|
| `BLANK_IMAGE` | Blank or near-uniform image |
| `BLURRED_IMAGE` | Image too blurry to read |
| `RULED_PAPER` | Lined/ruled paper detected |
| `NO_MEANINGFUL_TEXT` | No readable text components found |
| `SCREENSHOT_DOCUMENT` | App screenshot or web UI render |
| `AI_GENERATED_IMAGE` | AI-generated image (any detection method) |
| `MODEL_REJECTED` | Model confidence below threshold or invalid class |
| `UNREADABLE_IMAGE` | File could not be decoded |
| `SERVER_ERROR` | Unexpected server-side error |

---

## License

MIT

- efficientnet
- computer-vision
license: mit
pipeline_tag: image-classification
library_name: tf-keras
---

# Document Classifier

A Keras EfficientNet model for classifying real-world document images into structured categories. Includes a full validation pipeline covering image quality checks and AI/fake image detection.

---

## How to use this model

```python
# Step 1: Install dependencies
# pip install huggingface_hub tensorflow opencv-python pillow

# Step 2: Copy and run this complete code
from huggingface_hub import snapshot_download
import tensorflow as tf
import numpy as np
import cv2
import json
from tensorflow.keras.applications.efficientnet import preprocess_input

# Download model from Hugging Face (cached after first run)
local_path = snapshot_download(repo_id="shailgsits/document-classifier")

# Load model + class labels
model = tf.saved_model.load(local_path)
infer = model.signatures["serving_default"]

with open(f"{local_path}/class_index.json") as f:
    class_indices = json.load(f)
LABELS = {int(v): k for k, v in class_indices.items()}

DOCUMENT_TYPE_LABELS = {
    "1_visiting_card": "Visiting Card",
    "2_prescription": "Prescription",
    "3_shop_banner": "Shop Banner",
    "4_invalid_image": "Invalid",
}

def predict(image_path: str) -> dict:
    img = cv2.imread(image_path)
    if img is None:
        return {"status": "ERROR", "message": "Could not read image"}

    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(img_rgb, (224, 224))
    input_arr = np.expand_dims(resized.astype(np.float32), axis=0)
    input_arr = preprocess_input(input_arr)

    outputs = infer(tf.constant(input_arr))
    preds = list(outputs.values())[0].numpy()[0]
    class_id = int(np.argmax(preds))
    confidence = float(np.max(preds))
    label = LABELS.get(class_id, "unknown")
    friendly = DOCUMENT_TYPE_LABELS.get(label, label)

    return {
        "status": "VALID" if confidence >= 0.75 else "LOW_CONFIDENCE",
        "document_type": label,
        "document_type_label": friendly,
        "confidence": round(confidence * 100, 2),
        "all_scores": {
            DOCUMENT_TYPE_LABELS.get(LABELS[i], LABELS[i]): round(float(p) * 100, 2)
            for i, p in enumerate(preds)
        },
    }

# --- Run prediction ---
result = predict("your_image.jpg")
print(result)

# Example output:
# {
#   'status': 'VALID',
#   'document_type': '1_visiting_card',
#   'document_type_label': 'Visiting Card',
#   'confidence': 97.43,
#   'all_scores': {'Visiting Card': 97.43, 'Prescription': 1.2, 'Shop Banner': 0.9, 'Invalid': 0.47}
# }
```

---

## Supported Document Types

| Label | Description |
|---|---|
| `visiting_card` | Business / name cards |
| `prescription` | Medical prescriptions |
| `shop_banner` | Storefront signage, banners |
| `invalid_image` | Rejected / unrecognized documents |

---

## Files in this repo

| File | Description |
|---|---|
| `document_classifier_final.keras` | Trained Keras model (EfficientNet) |
| `class_index.json` | Class name → index mapping |

---

## Quick Test in Google Colab

```python
!pip install huggingface_hub tensorflow pillow opencv-python requests -q

import tensorflow as tf, numpy as np, cv2, requests, json
from PIL import Image
from io import BytesIO
from huggingface_hub import hf_hub_download
from tensorflow.keras.applications.efficientnet import preprocess_input

# Load model + class mapping
model = tf.keras.models.load_model(
    hf_hub_download("shailgsits/document-classifier", "document_classifier_final.keras")
)
with open(hf_hub_download("shailgsits/document-classifier", "class_index.json")) as f:
    index_to_label = {v: k.split("_", 1)[1] for k, v in json.load(f).items()}

# Predict from any image URL
def predict_from_url(url: str):
    img = np.array(Image.open(BytesIO(requests.get(url).content)).convert("RGB"))[:, :, ::-1]
    h, w = img.shape[:2]
    scale = min(224 / w, 224 / h)
    nw, nh = int(w * scale), int(h * scale)
    res = cv2.resize(img, (nw, nh))
    canvas = np.ones((224, 224, 3), np.uint8) * 255
    canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
    input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
    pred = model.predict(input_arr)[0]
    idx = int(np.argmax(pred))
    return {"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)}

# Test with a Google Drive image
url = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"
print(predict_from_url(url))
# {'label': 'visiting_card', 'confidence': 97.43}
```

---

## Predict from local file (Colab upload)

```python
# Reuses model, index_to_label and the imports from the cell above
from google.colab import files

uploaded = files.upload()
image_path = list(uploaded.keys())[0]

img = cv2.imread(image_path)
h, w = img.shape[:2]
scale = min(224 / w, 224 / h)
nw, nh = int(w * scale), int(h * scale)
res = cv2.resize(img, (nw, nh))
canvas = np.ones((224, 224, 3), np.uint8) * 255
canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
pred = model.predict(input_arr)[0]
idx = int(np.argmax(pred))
print({"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)})
```

---

## Preprocessing Details

Images are resized with **letterboxing** (aspect ratio preserved, white padding) to 224×224, then passed through `EfficientNet`'s `preprocess_input`.
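
A minimal standalone sketch of this letterboxing step, mirroring the logic in the Colab snippets above (the helper name is illustrative, not part of the repo):

```python
import cv2
import numpy as np

def letterbox(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize while preserving aspect ratio, centered on a white canvas."""
    h, w = img.shape[:2]
    scale = min(size / w, size / h)
    nw, nh = int(w * scale), int(h * scale)
    resized = cv2.resize(img, (nw, nh))
    canvas = np.ones((size, size, 3), np.uint8) * 255  # white padding
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```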

---

## Validation Pipeline

Before inference, every image passes through:

| Check | Condition |
|---|---|
| Blank image | Grayscale std < 12 |
| Blurry image | Laplacian variance < 10 |
| Ruled paper | ≥5 evenly-spaced horizontal lines |
| No text detected | Fewer than 6 connected text components |
| AI metadata | EXIF/XMP contains AI tool keywords |
| Screenshot/UI | >55% near-white pixels |
| AI watermark | OCR detects generator text in bottom strip |
| Gemini sparkle | Sparkle artifact in bottom-right corner |
| AI staged background | Card/background sharpness ratio > 5.0 |
| Perspective tilt | >35% of lines in the 15°–45° diagonal range |
| DCT frequency | High-frequency energy ratio > 0.12 |
| Texture uniformity | Patch variance CV < 0.4 and mean variance < 50 |
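
As an illustration, the screenshot/UI check can be sketched as follows; only the >55% near-white threshold comes from the table, while the helper name and the 245 intensity cutoff are assumptions:

```python
import cv2
import numpy as np

NEAR_WHITE = 245           # assumed intensity cutoff for "near-white"
MAX_WHITE_FRACTION = 0.55  # threshold from the table above

def looks_like_screenshot(img_bgr: np.ndarray) -> bool:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    white_fraction = float(np.mean(gray >= NEAR_WHITE))
    return white_fraction > MAX_WHITE_FRACTION
```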

---

## License

MIT

---

## Author

Developed and trained by **[Shailendra Singh Tiwari](https://www.linkedin.com/in/shailendra-singh-tiwari/)**