---
language: en
tags:
  - image-classification
  - document-classification
  - tensorflow
  - efficientnet
  - computer-vision
license: mit
pipeline_tag: image-classification
library_name: tf-keras
---

# Document Classifier

A Keras EfficientNet model for classifying real-world document images into structured categories. Includes a full validation pipeline covering image quality checks and AI/fake image detection.


## How to use this model

```python
# Step 1 — install dependencies:
#   pip install huggingface_hub tensorflow opencv-python pillow

# Step 2 — copy and run this complete snippet

from huggingface_hub import snapshot_download
import tensorflow as tf
import numpy as np
import cv2
import json
from tensorflow.keras.applications.efficientnet import preprocess_input

# Download the model repo from Hugging Face (cached after the first run)
local_path = snapshot_download(repo_id="shailgsits/document-classifier")

# Load model + class labels (the repo ships a .keras file, so use load_model)
model = tf.keras.models.load_model(f"{local_path}/document_classifier_final.keras")

with open(f"{local_path}/class_index.json") as f:
    class_indices = json.load(f)
LABELS = {int(v): k for k, v in class_indices.items()}

DOCUMENT_TYPE_LABELS = {
    "1_visiting_card": "Visiting Card",
    "2_prescription":  "Prescription",
    "3_shop_banner":   "Shop Banner",
    "4_invalid_image": "Invalid",
}

def predict(image_path: str) -> dict:
    img = cv2.imread(image_path)
    if img is None:
        return {"status": "ERROR", "message": "Could not read image"}

    # OpenCV loads BGR; the model expects RGB
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Letterbox to 224x224: preserve aspect ratio, pad with white
    h, w = img_rgb.shape[:2]
    scale = min(224 / w, 224 / h)
    nw, nh = int(w * scale), int(h * scale)
    resized = cv2.resize(img_rgb, (nw, nh))
    canvas = np.full((224, 224, 3), 255, np.uint8)
    top, left = (224 - nh) // 2, (224 - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), axis=0))

    preds      = model.predict(input_arr, verbose=0)[0]
    class_id   = int(np.argmax(preds))
    confidence = float(np.max(preds))
    label      = LABELS.get(class_id, "unknown")
    friendly   = DOCUMENT_TYPE_LABELS.get(label, label)

    return {
        "status":              "VALID" if confidence >= 0.75 else "LOW_CONFIDENCE",
        "document_type":       label,
        "document_type_label": friendly,
        "confidence":          round(confidence * 100, 2),
        "all_scores": {
            DOCUMENT_TYPE_LABELS.get(LABELS[i], LABELS[i]): round(float(p) * 100, 2)
            for i, p in enumerate(preds)
        }
    }

# --- Run prediction ---
result = predict("your_image.jpg")
print(result)

# Example output:
# {
#   'status': 'VALID',
#   'document_type': '1_visiting_card',
#   'document_type_label': 'Visiting Card',
#   'confidence': 97.43,
#   'all_scores': {'Visiting Card': 97.43, 'Prescription': 1.2, 'Shop Banner': 0.9, 'Invalid': 0.47}
# }
```

## Supported Document Types

| Label | Description |
|-------|-------------|
| `visiting_card` | Business / name cards |
| `prescription` | Medical prescriptions |
| `shop_banner` | Storefront signage, banners |
| `invalid_image` | Rejected / unrecognized documents |

## Files in this repo

| File | Description |
|------|-------------|
| `document_classifier_final.keras` | Trained Keras model (EfficientNet) |
| `class_index.json` | Class name → index mapping |

## Quick Test in Google Colab

```python
!pip install huggingface_hub tensorflow pillow opencv-python requests -q

import tensorflow as tf, numpy as np, cv2, requests, json
from PIL import Image
from io import BytesIO
from huggingface_hub import hf_hub_download
from tensorflow.keras.applications.efficientnet import preprocess_input

# Load model + class mapping
model = tf.keras.models.load_model(
    hf_hub_download("shailgsits/document-classifier", "document_classifier_final.keras")
)
with open(hf_hub_download("shailgsits/document-classifier", "class_index.json")) as f:
    index_to_label = {int(v): k.split("_", 1)[1] for k, v in json.load(f).items()}

# Predict from any image URL
def predict_from_url(url: str):
    # Keep the image in RGB — EfficientNet's preprocess_input expects RGB order
    img = np.array(Image.open(BytesIO(requests.get(url).content)).convert("RGB"))
    h, w = img.shape[:2]
    scale = min(224 / w, 224 / h)
    nw, nh = int(w * scale), int(h * scale)
    res = cv2.resize(img, (nw, nh))
    canvas = np.ones((224, 224, 3), np.uint8) * 255
    canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
    input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
    pred = model.predict(input_arr, verbose=0)[0]
    idx = int(np.argmax(pred))
    return {"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)}

# Test with a Google Drive image
url = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"
print(predict_from_url(url))
# {'label': 'visiting_card', 'confidence': 97.43}
```

### Predict from local file (Colab upload)

```python
from google.colab import files
uploaded = files.upload()
image_path = list(uploaded.keys())[0]

# OpenCV loads BGR; convert to RGB before preprocessing
img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
scale = min(224 / w, 224 / h)
nw, nh = int(w * scale), int(h * scale)
res = cv2.resize(img, (nw, nh))
canvas = np.ones((224, 224, 3), np.uint8) * 255
canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
pred = model.predict(input_arr, verbose=0)[0]
idx = int(np.argmax(pred))
print({"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)})
```

## Preprocessing Details

Images are resized with letterboxing (aspect ratio preserved, white padding) to 224×224, then passed through EfficientNet's `preprocess_input`.


## Validation Pipeline

Before inference, every image passes through:

| Check | Condition |
|-------|-----------|
| Blank image | Grayscale std < 12 |
| Blurry image | Laplacian variance < 10 |
| Ruled paper | ≥ 5 evenly spaced horizontal lines |
| No text detected | Fewer than 6 connected text components |
| AI metadata | EXIF/XMP contains AI-tool keywords |
| Screenshot/UI | > 55% near-white pixels |
| AI watermark | OCR detects generator text in the bottom strip |
| Gemini sparkle | Sparkle artifact in the bottom-right corner |
| AI staged background | Card/background sharpness ratio > 5.0 |
| Perspective tilt | > 35% of lines in the 15°–45° diagonal range |
| DCT frequency | High-frequency energy ratio > 0.12 |
| Texture uniformity | Patch variance CV < 0.4 and mean variance < 50 |

## License

MIT


## Author

Developed and trained by Shailendra Singh Tiwari