---
license: mit
pipeline_tag: image-classification
datasets:
- monkt/doctype
---

# DocType - Document Image Classification

A lightweight MobileNetV3-based classifier that sorts document images into 7 types. It is optimized for production deployment and distributed in ONNX format.

## Model Overview

This model classifies document images into the following categories:

| Category | Description |
|----------|-------------|
| **chart** | Charts, graphs, and data visualizations |
| **diagram** | Flowcharts, diagrams, and technical drawings |
| **document_handwritten** | Handwritten documents and notes |
| **document_printed** | Printed text documents |
| **map** | Maps and geographic visualizations |
| **photo** | Photographs and natural images |
| **screenshot** | Screenshots and screen captures |

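To turn the model's 7-element output into one of the categories above, a minimal sketch might look like the following. It assumes the class indices follow the alphabetical order of the table, which this card does not confirm, and `top_k` is a hypothetical helper name. The card does not say whether the model emits logits or probabilities, so the sketch only ranks the scores, which gives the same ordering either way.

```python
# Assumption: output index i corresponds to LABELS[i] (alphabetical order,
# as listed in the category table). Verify against the model's own label map.
LABELS = [
    "chart",
    "diagram",
    "document_handwritten",
    "document_printed",
    "map",
    "photo",
    "screenshot",
]


def top_k(scores, k=3, labels=LABELS):
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError(
            f"expected {len(labels)} scores, got {len(scores)}"
        )
    # Pair each label with its score and sort by score, descending.
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For a single image, `top_k(scores, k=1)[0][0]` gives the predicted category name.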
## Performance

### Model Metrics

- **Architecture**: MobileNetV3-Large (transfer learning + fine-tuning)
- **Input Size**: 320×320 pixels
- **Parameters**: ~5.4M (lightweight and efficient)
- **Inference Time**: ~10-30ms on CPU (depending on hardware)

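Because the inference time depends on hardware, it can be worth measuring latency on the target machine. The sketch below is a generic timing harness using only the standard library; `benchmark_ms` is a hypothetical helper, and the warm-up and run counts are arbitrary defaults rather than the settings behind the figures above.

```python
import statistics
import time


def benchmark_ms(run_once, warmup=5, runs=50):
    """Return (median, p95) latency in milliseconds for a zero-argument callable."""
    # Warm-up calls are executed but not timed (caches, lazy initialization).
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank style 95th percentile, clamped to the last sample.
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95


# Hypothetical usage with ONNX Runtime, assuming `session`, `input_name`,
# and a preprocessed 320x320 `batch` already exist:
#   median, p95 = benchmark_ms(lambda: session.run(None, {input_name: batch}))
```

Reporting the median alongside a high percentile gives a fairer picture than a single run, since the first few calls are often slower.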
### Training Details

- **Dataset Size**: 21,000 images (17,500 train / 2,100 val / 1,400 test)
- **Training Strategy**:
  - Phase 1: Transfer learning with frozen base (40 epochs)
  - Phase 2: Fine-tuning entire model (20 epochs)
- **Data Augmentation**: Rotation, shifts, zoom, brightness variation
- **Optimizer**: Adam (lr=0.001 → 1e-5 for fine-tuning)

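The two-phase schedule and the dataset split can be written down as plain data, which makes the numbers easy to check. This is a framework-agnostic sketch, not the original training script: the `Phase` type and `phase_at` helper are hypothetical, and it reads the optimizer line as a learning rate of 0.001 in Phase 1 and 1e-5 in Phase 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    epochs: int
    learning_rate: float
    base_trainable: bool  # False = MobileNetV3 backbone frozen


# Assumption: lr=0.001 applies to Phase 1 and 1e-5 to Phase 2.
SCHEDULE = [
    Phase("transfer_learning", epochs=40, learning_rate=1e-3, base_trainable=False),
    Phase("fine_tuning", epochs=20, learning_rate=1e-5, base_trainable=True),
]

# Split sizes as stated in the training details.
SPLITS = {"train": 17_500, "val": 2_100, "test": 1_400}


def phase_at(epoch, schedule=SCHEDULE):
    """Return the phase active at a 0-based global epoch index."""
    start = 0
    for phase in schedule:
        if start <= epoch < start + phase.epochs:
            return phase
        start += phase.epochs
    raise ValueError(f"epoch {epoch} is outside the {start}-epoch schedule")


def split_fractions(splits=SPLITS):
    """Return each split's share of the full dataset."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}
```

With these numbers the splits sum to 21,000 images, and the schedule covers 60 epochs in total, with the backbone unfrozen from epoch index 40 onward.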
## Citation

If you use this model in your research or project, please cite it.