arpit-gour02/document-classification

Tags: Image Classification, computer-vision, document-classification, Eval Results (legacy)


arpit-gour02 committed 29 days ago

Commit 8a37ee0 · 1 parent: 1a7a5b2

update readme

Files changed (1): README.md (+25 -1)

@@ -38,6 +38,8 @@ This model is a **ResNet-50** Convolutional Neural Network (CNN) finetuned to cl
 ## Model Details
+![Model Architecture](aechitecture.png)
 ### Model Description
 This model utilizes the standard ResNet-50 architecture designed for image classification. Instead of "reading" the text like an OCR system, it analyzes the visual layout, structure, and low-level texture features of a whole document page to determine its category (e.g., recognizing the block layout of a resume versus the dense, two-column text of a scientific report).
@@ -48,7 +50,20 @@ It was trained using **Transfer Learning**, starting with weights pre-trained on
 - **Model type:** Computer Vision (Image Classification / CNN)
 - **Language(s) (NLP):** English (Implicitly, via the text present in the RVL-CDIP dataset images)
 - **License:** MIT
-- **Finetuned from model:** ResNet-50 (ImageNet weights)
+## Why ResNet50
+| Model      | Approximate Parameters | Year Released | Layers |
+|------------|------------------------|---------------|--------|
+| VGG16      | 138.4 Million          | 2014          | 16     |
+| AlexNet    | 61.1 Million           | 2012          | 8      |
+| ResNet-50  | 25.6 Million           | 2015          | 50     |
+| Model      | FLOPs (Billions) | Efficiency Score      |
+|------------|------------------|-----------------------|
+| AlexNet    | 0.7 GFLOPs       | Low Cost / Low Acc    |
+| ResNet-50  | 3.8 GFLOPs       | High Efficiency       |
+| VGG-16     | 15.5 GFLOPs      | Terribly Inefficient  |
 ### Model Sources
@@ -183,6 +198,15 @@ The model was evaluated on the standard, unseen **RVL-CDIP Test Split** containi
 | **Overall Accuracy** | **88.46%** | Solid baseline performance. |
 | **Top-3 Accuracy** | **95.62%** | Excellent reliability for triage tasks. |
+![Loss and Accuracy Curves](results/loss_and_acc_curve.png)
+#### Confusion Matrix
+![Confusion Matrix](results/cm.png)
+#### Detailed Classification report
+![Detailed Classification report](results/detailed_classification_report.png)
 #### Detailed Performance Analysis (The "Traffic Light" Report)
 An analysis of per-class F1-scores reveals distinct tiers of performance:
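The "Why ResNet50" table gives ResNet-50 roughly 25.6 million parameters. That figure can be reproduced by hand, which also shows how much of the budget sits in the classifier head that fine-tuning replaces. The sketch below assumes a torchvision-style layout (bias-free convolutions, two learnable parameters per BatchNorm channel, running statistics excluded) and assumes a 16-way head, one output per RVL-CDIP class; the head size used by this checkpoint is an assumption here, and the helper names are hypothetical.

```python
# Analytic parameter count for ResNet-50, assuming a torchvision-style layout:
# convolutions have no bias, BatchNorm has a learnable scale and shift per channel,
# and running mean/variance are buffers rather than parameters.


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a bias-free k x k convolution."""
    return c_in * c_out * k * k


def bn_params(c: int) -> int:
    """Learnable BatchNorm parameters: one scale and one shift per channel."""
    return 2 * c


def bottleneck_params(c_in: int, mid: int, c_out: int, downsample: bool) -> int:
    """1x1 reduce -> 3x3 -> 1x1 expand, each followed by BatchNorm, plus an optional projection shortcut."""
    total = conv_params(c_in, mid, 1) + bn_params(mid)
    total += conv_params(mid, mid, 3) + bn_params(mid)
    total += conv_params(mid, c_out, 1) + bn_params(c_out)
    if downsample:
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet50_params(num_classes: int = 1000) -> int:
    # Stem: 7x7 convolution with 64 filters on an RGB input, then BatchNorm.
    total = conv_params(3, 64, 7) + bn_params(64)
    c_in = 64
    # (bottleneck width, output channels, number of blocks) for the four stages.
    for mid, c_out, blocks in [(64, 256, 3), (128, 512, 4), (256, 1024, 6), (512, 2048, 3)]:
        for i in range(blocks):
            # The first block of each stage changes the channel count, so it needs a projection shortcut.
            total += bottleneck_params(c_in, mid, c_out, downsample=(i == 0))
            c_in = c_out
    # Fully connected classifier on the 2048-d pooled features (weights + bias).
    total += 2048 * num_classes + num_classes
    return total


if __name__ == "__main__":
    print(f"ImageNet head (1000 classes): {resnet50_params(1000):,}")
    print(f"Assumed RVL-CDIP head (16 classes): {resnet50_params(16):,}")
```

With the 1000-class ImageNet head this returns 25,557,032, the count usually reported for torchvision's `resnet50`. The backbone alone accounts for 23,508,032 parameters, so an assumed 16-class head brings the total to 23,540,816.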
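The evaluation table reports both overall (top-1) accuracy and Top-3 accuracy. Top-k accuracy counts a prediction as correct when the true label is among the k highest-scoring classes, which is why it suits triage, where a person picks from a short list. A minimal, dependency-free sketch, assuming per-class scores (logits or probabilities) are available for each page; `top_k_accuracy` is a hypothetical helper, not part of this repository:

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score (stable sort, so ties keep index order) and keep k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Toy 4-class example: one top-1 hit, one hit only within the top 3, one miss.
    scores = [
        [0.10, 0.60, 0.20, 0.10],  # true class 1 ranked first
        [0.50, 0.05, 0.30, 0.15],  # true class 2 ranked second
        [0.40, 0.30, 0.20, 0.10],  # true class 3 ranked last
    ]
    labels = [1, 2, 3]
    print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 correct
    print(top_k_accuracy(scores, labels, k=3))  # 2 of 3 correct
```

With k=1 the function reduces to ordinary accuracy, so it covers both rows of the table. Breaking ties by class index is an arbitrary choice made for this sketch.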
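The confusion matrix, the classification report, and the per-class F1 "traffic light" tiers are all derived from the same counts. The sketch below builds the matrix from true and predicted labels, computes per-class F1 from it, and buckets each score into a tier. The tier thresholds (0.90 and 0.80) are hypothetical placeholders, since the cutoffs behind the report are not given here, and the function names are likewise illustrative.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pages whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """Per-class F1 from a confusion matrix (0.0 for a class with no true positives)."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted as c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def traffic_light(f1: float, green: float = 0.90, amber: float = 0.80) -> str:
    """Bucket an F1-score into a tier. The default thresholds are hypothetical placeholders."""
    if f1 >= green:
        return "green"
    if f1 >= amber:
        return "amber"
    return "red"


if __name__ == "__main__":
    # Toy 3-class example standing in for the RVL-CDIP document classes.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
    y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 2, 1]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    for cls, f1 in enumerate(per_class_f1(cm)):
        print(cls, round(f1, 3), traffic_light(f1))
```

In the toy run, class 2 has perfect precision but only 50% recall and still lands in the lowest tier, because F1 is the harmonic mean of the two and is pulled toward the weaker one.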