arpit-gour02 committed on
Commit
8a37ee0
·
1 Parent(s): 1a7a5b2

update readme

  1. README.md +25 -1
README.md CHANGED
@@ -38,6 +38,8 @@ This model is a **ResNet-50** Convolutional Neural Network (CNN) finetuned to cl
38
 
39
  ## Model Details
40
 
 
 
41
  ### Model Description
42
 
43
  This model utilizes the standard ResNet-50 architecture designed for image classification. Instead of "reading" the text like an OCR system, it analyzes the visual layout, structure, and low-level texture features of a whole document page to determine its category (e.g., recognizing the block layout of a resume versus the dense, two-column text of a scientific report).
@@ -48,7 +50,20 @@ It was trained using **Transfer Learning**, starting with weights pre-trained on
48
  - **Model type:** Computer Vision (Image Classification / CNN)
49
  - **Language(s) (NLP):** English (Implicitly, via the text present in the RVL-CDIP dataset images)
50
  - **License:** MIT
51
- - **Finetuned from model:** ResNet-50 (ImageNet weights)
52
 
53
  ### Model Sources
54
 
@@ -183,6 +198,15 @@ The model was evaluated on the standard, unseen **RVL-CDIP Test Split** containi
183
  | **Overall Accuracy** | **88.46%** | Solid baseline performance. |
184
  | **Top-3 Accuracy** | **95.62%** | Excellent reliability for triage tasks. |
185
 
 
 
 
 
 
 
 
 
 
186
  #### Detailed Performance Analysis (The "Traffic Light" Report)
187
 
188
  An analysis of per-class F1-scores reveals distinct tiers of performance:
 
38
 
39
  ## Model Details
40
 
41
+ ![Model Architecture](aechitecture.png)
42
+
43
  ### Model Description
44
 
45
  This model utilizes the standard ResNet-50 architecture designed for image classification. Instead of "reading" the text like an OCR system, it analyzes the visual layout, structure, and low-level texture features of a whole document page to determine its category (e.g., recognizing the block layout of a resume versus the dense, two-column text of a scientific report).
 
50
  - **Model type:** Computer Vision (Image Classification / CNN)
51
  - **Language(s) (NLP):** English (Implicitly, via the text present in the RVL-CDIP dataset images)
52
  - **License:** MIT
53
+
54
+ ### Why ResNet-50
55
+
56
+ | Model | Approximate Parameters | Year Released | Layers |
57
+ |------------|------------------------|---------------|--------|
58
+ | VGG-16 | 138.4 Million | 2014 | 16 |
59
+ | AlexNet | 61.1 Million | 2012 | 8 |
60
+ | ResNet-50 | 25.6 Million | 2015 | 50 |
61
+
62
+ | Model | Compute (GFLOPs) | Efficiency |
62
+ |------------|------------------|-------------------------|
63
+ | AlexNet | 0.7 | Low cost, low accuracy |
64
+ | ResNet-50 | 3.8 | High efficiency |
65
+ | VGG-16 | 15.5 | High cost, inefficient |
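
The comparison in the two tables can be made concrete with a little arithmetic. The sketch below is illustrative only: it copies the parameter and compute figures from the tables and expresses each model relative to ResNet-50. The `models` dictionary and the `relative_to` helper are hypothetical names introduced for this example, not part of the model's code.

```python
# Figures copied from the tables above:
# parameters in millions, compute in billions of FLOPs (GFLOPs).
models = {
    "AlexNet": {"params_m": 61.1, "gflops": 0.7},
    "ResNet-50": {"params_m": 25.6, "gflops": 3.8},
    "VGG-16": {"params_m": 138.4, "gflops": 15.5},
}


def relative_to(baseline, stats):
    """Express every model's size and compute as a multiple of the baseline's."""
    base = stats[baseline]
    return {
        name: {
            "params_x": round(s["params_m"] / base["params_m"], 2),
            "gflops_x": round(s["gflops"] / base["gflops"], 2),
        }
        for name, s in stats.items()
    }


ratios = relative_to("ResNet-50", models)
for name, r in ratios.items():
    print(f"{name}: {r['params_x']}x parameters, {r['gflops_x']}x compute")
```

By these figures, VGG-16 carries roughly 5.4 times the parameters and about 4 times the compute of ResNet-50, while AlexNet needs far less compute but still has more than twice the parameters.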
67
 
68
  ### Model Sources
69
 
 
198
  | **Overall Accuracy** | **88.46%** | Solid baseline performance. |
199
  | **Top-3 Accuracy** | **95.62%** | Excellent reliability for triage tasks. |
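
To make the difference between the two metrics concrete, here is a minimal sketch of how top-1 and top-3 accuracy are typically computed from per-class scores. The `top_k_accuracy` function and the toy scores are assumptions made for illustration; this is not the evaluation script used to produce the numbers above.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example with 4 classes (made-up scores, not real model output).
scores = [
    [0.70, 0.20, 0.10, 0.00],  # true class 0: correct at top-1
    [0.10, 0.50, 0.30, 0.10],  # true class 2: ranked second, top-3 hit only
    [0.40, 0.30, 0.20, 0.10],  # true class 3: ranked last, missed by both
    [0.05, 0.15, 0.20, 0.60],  # true class 3: correct at top-1
]
labels = [0, 2, 3, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Top-3 accuracy is always at least as high as top-1, which is why it is the more relevant figure when a human reviewer or downstream step can pick from a short list of candidate categories.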
200
 
201
+ ![Loss and Accuracy Curves](results/loss_and_acc_curve.png)
202
+
203
+ #### Confusion Matrix
204
+ ![Confusion Matrix](results/cm.png)
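
For readers who want to relate a confusion matrix to per-class scores, the sketch below derives precision, recall, and F1 for each class from a small matrix. It assumes rows are true classes and columns are predicted classes; the `per_class_metrics` helper and the 3-class matrix are hypothetical and do not reproduce the values in the image above.

```python
def per_class_metrics(cm):
    """Return a (precision, recall, f1) tuple for each class of a square confusion matrix.

    Assumes cm[true][predicted] holds the sample count for that pair.
    """
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other classes predicted as c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Made-up 3-class confusion matrix (rows: true class, columns: predicted class).
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [1, 0, 9],
]

metrics = per_class_metrics(cm)
for idx, (p, r, f1) in enumerate(metrics):
    print(f"class {idx}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```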
205
+
206
+ #### Detailed Classification Report
207
+ ![Detailed Classification Report](results/detailed_classification_report.png)
208
+
209
+
210
  #### Detailed Performance Analysis (The "Traffic Light" Report)
211
 
212
  An analysis of per-class F1-scores reveals distinct tiers of performance: