# Document Classifier

A Keras EfficientNet model for classifying real-world document images into structured categories. Includes a full validation pipeline covering image-quality checks and AI/fake-image detection.

## How to use this model
```python
# Step 1 — Install dependencies
# pip install huggingface_hub tensorflow opencv-python pillow

# Step 2 — Copy and run this complete code
from huggingface_hub import snapshot_download
import tensorflow as tf
import numpy as np
import cv2
import json
from tensorflow.keras.applications.efficientnet import preprocess_input

# Download model from Hugging Face (cached after first run)
local_path = snapshot_download(repo_id="shailgsits/document-classifier")

# Load model + class labels (the repo ships a .keras file, not a SavedModel)
model = tf.keras.models.load_model(f"{local_path}/document_classifier_final.keras")

with open(f"{local_path}/class_index.json") as f:
    class_indices = json.load(f)
LABELS = {int(v): k for k, v in class_indices.items()}

DOCUMENT_TYPE_LABELS = {
    "1_visiting_card": "Visiting Card",
    "2_prescription": "Prescription",
    "3_shop_banner": "Shop Banner",
    "4_invalid_image": "Invalid",
}

def predict(image_path: str) -> dict:
    img = cv2.imread(image_path)
    if img is None:
        return {"status": "ERROR", "message": "Could not read image"}
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # Letterbox to 224x224: preserve aspect ratio, pad with white
    h, w = img_rgb.shape[:2]
    scale = min(224 / w, 224 / h)
    nw, nh = int(w * scale), int(h * scale)
    resized = cv2.resize(img_rgb, (nw, nh))
    canvas = np.ones((224, 224, 3), np.uint8) * 255
    y0, x0 = (224 - nh) // 2, (224 - nw) // 2
    canvas[y0:y0 + nh, x0:x0 + nw] = resized

    input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), axis=0))
    preds = model.predict(input_arr)[0]
    class_id = int(np.argmax(preds))
    confidence = float(preds[class_id])
    label = LABELS.get(class_id, "unknown")
    friendly = DOCUMENT_TYPE_LABELS.get(label, label)
    return {
        "status": "VALID" if confidence >= 0.75 else "LOW_CONFIDENCE",
        "document_type": label,
        "document_type_label": friendly,
        "confidence": round(confidence * 100, 2),
        "all_scores": {
            DOCUMENT_TYPE_LABELS.get(LABELS[i], LABELS[i]): round(float(p) * 100, 2)
            for i, p in enumerate(preds)
        },
    }

# --- Run prediction ---
result = predict("your_image.jpg")
print(result)

# Example output:
# {
#     'status': 'VALID',
#     'document_type': '1_visiting_card',
#     'document_type_label': 'Visiting Card',
#     'confidence': 97.43,
#     'all_scores': {'Visiting Card': 97.43, 'Prescription': 1.2, 'Shop Banner': 0.9, 'Invalid': 0.47}
# }
```
## Supported Document Types

| Label | Description |
|---|---|
| `visiting_card` | Business / name cards |
| `prescription` | Medical prescriptions |
| `shop_banner` | Storefront signage, banners |
| `invalid_image` | Rejected / unrecognized documents |
## Files in this repo

| File | Description |
|---|---|
| `document_classifier_final.keras` | Trained Keras model (EfficientNet) |
| `class_index.json` | Class name → index mapping |
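The exact contents of `class_index.json` are not reproduced here, but both snippets in this card assume it maps prefixed class names to integer indices. A minimal sketch of that assumed shape and the two inversions used below (the sample mapping is hypothetical):

```python
import json

# Hypothetical class_index.json contents, matching the label names used in this card
sample = '{"1_visiting_card": 0, "2_prescription": 1, "3_shop_banner": 2, "4_invalid_image": 3}'
class_indices = json.loads(sample)

# Invert to index -> full class name (as in the main snippet)
labels = {int(v): k for k, v in class_indices.items()}

# Or strip the numeric prefix for display (as in the Colab snippet)
short_labels = {int(v): k.split("_", 1)[1] for k, v in class_indices.items()}

print(labels[0])        # 1_visiting_card
print(short_labels[0])  # visiting_card
```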
## Quick Test in Google Colab

```python
!pip install huggingface_hub tensorflow pillow opencv-python requests -q

import tensorflow as tf, numpy as np, cv2, requests, json
from PIL import Image
from io import BytesIO
from huggingface_hub import hf_hub_download
from tensorflow.keras.applications.efficientnet import preprocess_input

# Load model + class mapping
model = tf.keras.models.load_model(
    hf_hub_download("shailgsits/document-classifier", "document_classifier_final.keras")
)
with open(hf_hub_download("shailgsits/document-classifier", "class_index.json")) as f:
    index_to_label = {int(v): k.split("_", 1)[1] for k, v in json.load(f).items()}

# Predict from any image URL
def predict_from_url(url: str):
    img = np.array(Image.open(BytesIO(requests.get(url).content)).convert("RGB"))[:, :, ::-1]
    # Letterbox to 224x224 with white padding
    h, w = img.shape[:2]
    scale = min(224 / w, 224 / h)
    nw, nh = int(w * scale), int(h * scale)
    res = cv2.resize(img, (nw, nh))
    canvas = np.ones((224, 224, 3), np.uint8) * 255
    canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res
    input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
    pred = model.predict(input_arr)[0]
    idx = int(np.argmax(pred))
    return {"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)}

# Test with a Google Drive image
url = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"
print(predict_from_url(url))
# {'label': 'visiting_card', 'confidence': 97.43}
```
### Predict from local file (Colab upload)

```python
from google.colab import files

uploaded = files.upload()
image_path = list(uploaded.keys())[0]

img = cv2.imread(image_path)
assert img is not None, f"Could not read {image_path}"

# Letterbox to 224x224 with white padding
h, w = img.shape[:2]
scale = min(224 / w, 224 / h)
nw, nh = int(w * scale), int(h * scale)
res = cv2.resize(img, (nw, nh))
canvas = np.ones((224, 224, 3), np.uint8) * 255
canvas[(224 - nh) // 2:(224 - nh) // 2 + nh, (224 - nw) // 2:(224 - nw) // 2 + nw] = res

input_arr = preprocess_input(np.expand_dims(canvas.astype(np.float32), 0))
pred = model.predict(input_arr)[0]
idx = int(np.argmax(pred))
print({"label": index_to_label[idx], "confidence": round(float(pred[idx]) * 100, 2)})
```
## Preprocessing Details

Images are resized with letterboxing (aspect ratio preserved, white padding) to 224×224, then passed through EfficientNet's `preprocess_input`.
## Validation Pipeline

Before inference, every image passes through the following checks:
| Check | Condition |
|---|---|
| Blank image | Grayscale std < 12 |
| Blurry image | Laplacian variance < 10 |
| Ruled paper | ≥5 evenly-spaced horizontal lines |
| No text detected | Fewer than 6 connected text components |
| AI metadata | EXIF/XMP contains AI tool keywords |
| Screenshot/UI | >55% near-white pixels |
| AI watermark | OCR detects generator text in bottom strip |
| Gemini sparkle | Sparkle artifact in bottom-right corner |
| AI staged background | Card/background sharpness ratio > 5.0 |
| Perspective tilt | >35% lines in 15°–45° diagonal range |
| DCT frequency | High-freq energy ratio > 0.12 |
| Texture uniformity | Patch variance CV < 0.4 and mean var < 50 |
## License

MIT

## Author

Developed and trained by Shailendra Singh Tiwari