---
license: mit
pipeline_tag: image-classification
datasets:
- monkt/doctype
---
# DocType - Document Image Classification
A MobileNetV3-Large document classifier that sorts document images into 7 types. The model is provided in ONNX format for production deployment.
## 🎯 Model Overview
This model classifies document images into the following categories:
| Category | Description |
|----------|-------------|
| **chart** | Charts, graphs, and data visualizations |
| **diagram** | Flowcharts, diagrams, and technical drawings |
| **document_handwritten** | Handwritten documents and notes |
| **document_printed** | Printed text documents |
| **map** | Maps and geographic visualizations |
| **photo** | Photographs and natural images |
| **screenshot** | Screenshots and screen captures |
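
To make the mapping from model output to category concrete, here is a minimal sketch of decoding a 7-element score vector into one of the labels above. It assumes the output indices follow the alphabetical order of the table and that the model emits raw scores that still need a softmax; neither is confirmed by this card, so check both against the exported model. `LABELS`, `softmax`, and `decode` are illustrative names, not part of the model's API.

```python
import math

# Assumption: class indices follow the alphabetical order used in the table.
LABELS = [
    "chart",
    "diagram",
    "document_handwritten",
    "document_printed",
    "map",
    "photo",
    "screenshot",
]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(scores, apply_softmax=True):
    """Return (label, confidence) for one 7-element output vector.

    Pass apply_softmax=False if the model already outputs probabilities.
    """
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    probs = softmax(scores) if apply_softmax else list(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

For example, `decode([0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3])` picks index 2 and returns `"document_handwritten"` together with its probability.
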
## 🚀 Performance
### Model Metrics
- **Architecture**: MobileNetV3-Large (transfer learning + fine-tuning)
- **Input Size**: 320×320 pixels
- **Parameters**: ~5.4M (lightweight and efficient)
- **Inference Time**: ~10-30ms on CPU (depending on hardware)
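
Because the CPU latency depends on hardware, it may be worth measuring it on your own machine. The sketch below is one way to do that with ONNX Runtime, under several assumptions: the model file and image paths are hypothetical, the input is a single float32 image tensor, and pixel values are passed unscaled in the 0..255 range (scaling does not affect timing, but it does affect predictions, so verify the expected preprocessing before using this for classification). The tensor layout is read from the model's input shape rather than assumed.

```python
import statistics
import time


def is_channels_first(shape):
    """Return True when a 4-D input shape such as [N, 3, H, W] is NCHW."""
    return len(shape) == 4 and shape[1] == 3


def load_image(path, size=320, channels_first=False):
    """Load an image as a float32 batch of one, resized to size x size."""
    # Imported here so the timing helpers work without these packages.
    import numpy as np
    from PIL import Image

    img = Image.open(path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32)  # HWC; assumption: unscaled 0..255
    if channels_first:
        arr = arr.transpose(2, 0, 1)  # HWC -> CHW
    return arr[np.newaxis, ...]  # add the batch dimension


def median_latency_ms(fn, runs=20, warmup=3):
    """Call fn repeatedly and return the median wall-clock time in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def benchmark(model_path, image_path):
    """Measure single-image CPU inference latency for an ONNX model."""
    import onnxruntime as ort

    session = ort.InferenceSession(
        model_path, providers=["CPUExecutionProvider"]
    )
    meta = session.get_inputs()[0]
    batch = load_image(
        image_path, channels_first=is_channels_first(meta.shape)
    )
    return median_latency_ms(lambda: session.run(None, {meta.name: batch}))
```

A call such as `benchmark("model.onnx", "sample.png")` (both file names are placeholders) returns the median latency in milliseconds over 20 timed runs.
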
### Training Details
- **Dataset Size**: 21,000 images (17,500 train / 2,100 val / 1,400 test)
- **Training Strategy**:
  - Phase 1: Transfer learning with frozen base (40 epochs)
  - Phase 2: Fine-tuning entire model (20 epochs)
- **Data Augmentation**: Rotation, shifts, zoom, brightness variation
- **Optimizer**: Adam (lr=0.001 → 1e-5 for fine-tuning)
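
The two-phase schedule above can be sketched roughly as follows. This is not the author's training script: it assumes Keras (the card does not name a framework), reads the optimizer line as lr=0.001 for phase 1 and 1e-5 for phase 2, uses a plain pooling plus softmax head, assumes integer class labels, and omits the augmentation step. `PHASES`, `build_model`, and `train` are illustrative names.

```python
# Schedule taken from the list above; the learning-rate split per phase
# is an interpretation of "lr=0.001 -> 1e-5 for fine-tuning".
PHASES = [
    {"name": "transfer", "epochs": 40, "lr": 1e-3, "train_base": False},
    {"name": "fine_tune", "epochs": 20, "lr": 1e-5, "train_base": True},
]


def build_model(num_classes=7, size=320):
    """Build a MobileNetV3-Large backbone with a small classification head."""
    import tensorflow as tf  # assumption: Keras is the training framework

    base = tf.keras.applications.MobileNetV3Large(
        input_shape=(size, size, 3), include_top=False, weights="imagenet"
    )
    inputs = tf.keras.Input(shape=(size, size, 3))
    x = base(inputs)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base


def train(model, base, train_ds, val_ds):
    """Run both phases; train_ds and val_ds yield (image, int_label) batches."""
    import tensorflow as tf

    for phase in PHASES:
        # Freeze the backbone in phase 1, unfreeze it in phase 2.
        base.trainable = phase["train_base"]
        # Recompile so the trainable flag and learning rate take effect.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=phase["lr"]),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        model.fit(train_ds, validation_data=val_ds, epochs=phase["epochs"])
```

Recompiling between phases matters in Keras because changes to `trainable` only apply to training after the next `compile` call.
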
## 📮 Citation
If you use this model in your research or project, please cite it.