arpit-gour02 committed
Commit 257ae3b · 1 Parent(s): 8c37263

Saving local graphs and readme

.idea/.gitignore ADDED
@@ -0,0 +1,10 @@
+ # Default ignored files
+ /shelf/
+ /workspace.xml
+ # Ignored default folder with query files
+ /queries/
+ # Datasource local storage ignored files
+ /dataSources/
+ /dataSources.local.xml
+ # Editor-based HTTP Client requests
+ /httpRequests/
.idea/document-classification.iml ADDED
@@ -0,0 +1,8 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <module type="PYTHON_MODULE" version="4">
+   <component name="NewModuleRootManager">
+     <content url="file://$MODULE_DIR$" />
+     <orderEntry type="inheritedJdk" />
+     <orderEntry type="sourceFolder" forTests="false" />
+   </component>
+ </module>
.idea/inspectionProfiles/profiles_settings.xml ADDED
@@ -0,0 +1,6 @@
+ <component name="InspectionProjectProfileManager">
+   <settings>
+     <option name="USE_PROJECT_PROFILE" value="false" />
+     <version value="1.0" />
+   </settings>
+ </component>
.idea/modules.xml ADDED
@@ -0,0 +1,8 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <project version="4">
+   <component name="ProjectModuleManager">
+     <modules>
+       <module fileurl="file://$PROJECT_DIR$/.idea/document-classification.iml" filepath="$PROJECT_DIR$/.idea/document-classification.iml" />
+     </modules>
+   </component>
+ </project>
README.md CHANGED
@@ -38,8 +38,6 @@ This model is a **ResNet-50** Convolutional Neural Network (CNN) finetuned to cl
 
  ## Model Details
 
- ![Model Architecture](aechitecture.png)
-
  ### Model Description
 
  This model utilizes the standard ResNet-50 architecture designed for image classification. Instead of "reading" the text like an OCR system, it analyzes the visual layout, structure, and low-level texture features of a whole document page to determine its category (e.g., recognizing the block layout of a resume versus the dense, two-column text of a scientific report).
@@ -50,20 +48,7 @@ It was trained using **Transfer Learning**, starting with weights pre-trained on
  - **Model type:** Computer Vision (Image Classification / CNN)
  - **Language(s) (NLP):** English (Implicitly, via the text present in the RVL-CDIP dataset images)
  - **License:** MIT
-
- ## Why ResNet50
-
- | Model | Approximate Parameters | Year Released | Layers |
- |------------|------------------------|---------------|--------|
- | VGG16 | 138.4 Million | 2014 | 16 |
- | AlexNet | 61.1 Million | 2012 | 8 |
- | ResNet-50 | 25.6 Million | 2015 | 50 |
-
- | Model | FLOPs (Billions) | Efficiency Score |
- |------------|------------------|-----------------------|
- | AlexNet | 0.7 GFLOPs | Low Cost / Low Acc |
- | ResNet-50 | 3.8 GFLOPs | High Efficiency |
- | VGG-16 | 15.5 GFLOPs | Terribly Inefficient |
+ - **Finetuned from model:** ResNet-50 (ImageNet weights)
 
  ### Model Sources
 
@@ -198,15 +183,6 @@ The model was evaluated on the standard, unseen **RVL-CDIP Test Split** containi
  | **Overall Accuracy** | **88.46%** | Solid baseline performance. |
  | **Top-3 Accuracy** | **95.62%** | Excellent reliability for triage tasks. |
 
- ![Loss and Accuracy Curves](results/loss_and_acc_curve.png)
-
- #### Confusion Matrix
- ![Confusion Matrix](results/cm.png)
-
- #### Detailed Classificatio report
- ![Detailed Classification report](results/detailed_classification_report.png)
-
-
  #### Detailed Performance Analysis (The "Traffic Light" Report)
 
  An analysis of per-class F1-scores reveals distinct tiers of performance:
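
The README hunk above reports both an overall (top-1) accuracy and a top-3 accuracy. As a rough illustration of how the two metrics differ, here is a minimal pure-Python sketch. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the repository's notebooks; the author's actual evaluation code is not visible in this diff.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy data only: 4 samples over 4 hypothetical classes (not RVL-CDIP results).
toy_scores = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0 is ranked first
    [0.20, 0.50, 0.25, 0.05],  # true class 2 is ranked second
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked last
    [0.10, 0.20, 0.30, 0.40],  # true class 3 is ranked first
]
toy_labels = [0, 2, 3, 3]

print(top_k_accuracy(toy_scores, toy_labels, k=1))  # 0.5
print(top_k_accuracy(toy_scores, toy_labels, k=3))  # 0.75
```

The second sample shows why top-3 accuracy is always at least as high as top-1: the true class is not the highest-scoring one, but it still falls within the three best candidates.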
main.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
res.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
results.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
results/loss_and_acc_curve.png CHANGED

Git LFS Details

  • SHA256: 4984e1dc4b6048fc5328a18c94c2ac3db5b244538178b443e25fbf32158f06b3
  • Pointer size: 130 Bytes
  • Size of remote file: 72.9 kB

Git LFS Details

  • SHA256: cd82f3722791d2fc18c1f4fee3d0c7f5ea8a349ba26ac89ae8099ebea7aba8fe
  • Pointer size: 131 Bytes
  • Size of remote file: 262 kB
test.ipynb ADDED
The diff for this file is too large to render. See raw diff