
---
license: mit
pipeline_tag: image-classification
datasets:
- monkt/doctype
---

# DocType - Document Image Classification

A lightweight MobileNetV3-based classifier that assigns document images to one of 7 categories. It is packaged in ONNX format for production deployment.

## 🎯 Model Overview

This model classifies document images into the following categories:

| Category | Description |
|----------|-------------|
| chart | Charts, graphs, and data visualizations |
| diagram | Flowcharts, diagrams, and technical drawings |
| document_handwritten | Handwritten documents and notes |
| document_printed | Printed text documents |
| map | Maps and geographic visualizations |
| photo | Photographs and natural images |
| screenshot | Screenshots and screen captures |
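The card does not show how to read the model's output. As a minimal sketch, assuming the ONNX model returns one 7-element score vector per image whose indices follow the alphabetical order of the table above (an assumption to verify against the label mapping shipped with the model), the vector can be mapped to a label as follows. Whether the exported graph already ends in a softmax is also not stated, so the helper accepts either raw scores or probabilities. `CATEGORIES`, `softmax`, and `decode_prediction` are hypothetical names.

```python
import math

# ASSUMPTION: output index order matches the (alphabetical) table order.
CATEGORIES = [
    "chart",
    "diagram",
    "document_handwritten",
    "document_printed",
    "map",
    "photo",
    "screenshot",
]


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(scores, already_probabilities=False):
    """Return (label, confidence) for one image's 7-element output vector."""
    if len(scores) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} scores, got {len(scores)}")
    # Skip the softmax if the exported graph already produces probabilities.
    probs = list(scores) if already_probabilities else softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]
```

For example, `decode_prediction([0.1, 0.2, 0.0, 3.5, 0.1, 0.3, 0.2])` returns `"document_printed"` with a confidence of about 0.83.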


## 🚀 Performance

### Model Metrics

- Architecture: MobileNetV3-Large (transfer learning + fine-tuning)
- Input Size: 320×320 pixels
- Parameters: ~5.4M (lightweight and efficient)
- Inference Time: ~10–30 ms on CPU (hardware-dependent)
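Because the ~10–30 ms figure depends on hardware, it is worth re-measuring on the target machine. Below is a minimal sketch of a timing harness, assuming `infer` is any callable that runs one preprocessed 320×320 input through the model; the helper name `measure_latency_ms` and the warm-up and run counts are illustrative, not part of the model release.

```python
import statistics
import time


def measure_latency_ms(infer, sample, warmup=5, runs=50):
    """Return (median, approx. 95th percentile) latency of infer(sample) in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        infer(sample)  # warm-up calls are excluded from the statistics
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    # Nearest-rank style pick, clamped to the last element.
    p95 = timings[min(len(timings) - 1, int(0.95 * len(timings)))]
    return statistics.median(timings), p95
```

With ONNX Runtime, `infer` could be something like `lambda x: session.run(None, {input_name: x})`, where `session` is an `onnxruntime.InferenceSession` and `input_name` comes from `session.get_inputs()[0].name`; the exact input shape, layout, and scaling are not documented in this card.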

### Training Details

- Dataset Size: 21,000 images (17,500 train / 2,100 validation / 1,400 test)
- Training Strategy:
  - Phase 1: Transfer learning with a frozen base (40 epochs)
  - Phase 2: Fine-tuning of the entire model (20 epochs)
- Data Augmentation: Rotation, shifts, zoom, brightness variation
- Optimizer: Adam (learning rate 1e-3 in phase 1, lowered to 1e-5 for fine-tuning in phase 2)
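The two-phase schedule and the split sizes can be written down as data so the numbers are easy to check. This sketch reads the optimizer line as a constant learning rate of 1e-3 in phase 1 and 1e-5 in phase 2; the card does not say whether the rate also decays within a phase. `Phase`, `SCHEDULE`, `SPLITS`, `phase_for_epoch`, and `split_fractions` are illustrative names, not part of any training code released with the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    learning_rate: float
    base_trainable: bool  # False means the MobileNetV3 backbone is frozen


# Values transcribed from the Training Details list; the structure is illustrative.
SCHEDULE = (
    Phase("transfer_learning", epochs=40, learning_rate=1e-3, base_trainable=False),
    Phase("fine_tuning", epochs=20, learning_rate=1e-5, base_trainable=True),
)

SPLITS = {"train": 17_500, "val": 2_100, "test": 1_400}


def phase_for_epoch(epoch, schedule=SCHEDULE):
    """Return the Phase active at a 0-based global epoch index."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for phase in schedule:
        if epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError(f"epoch {epoch} is past the end of the schedule")


def split_fractions(splits=SPLITS):
    """Fraction of the full dataset assigned to each split."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}
```

Under this reading, global epochs 0–39 train with the backbone frozen at 1e-3 and epochs 40–59 update the whole network at 1e-5, and `split_fractions()` gives roughly 83.3% / 10% / 6.7% for train / validation / test.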

## 📮 Citation

If you use this model in your research or project, please cite it.