Commit 257ae3b
Parent(s): 8c37263

Saving local graphs and readme

- .idea/.gitignore +10 -0
- .idea/document-classification.iml +8 -0
- .idea/inspectionProfiles/profiles_settings.xml +6 -0
- .idea/modules.xml +8 -0
- README.md +1 -25
- main.ipynb +0 -0
- res.ipynb +0 -0
- results.ipynb +0 -0
- results/loss_and_acc_curve.png +2 -2
- test.ipynb +0 -0
.idea/.gitignore
ADDED
@@ -0,0 +1,10 @@
+# Default ignored files
+/shelf/
+/workspace.xml
+# Ignored default folder with query files
+/queries/
+# Datasource local storage ignored files
+/dataSources/
+/dataSources.local.xml
+# Editor-based HTTP Client requests
+/httpRequests/

.idea/document-classification.iml
ADDED
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<module type="PYTHON_MODULE" version="4">
+  <component name="NewModuleRootManager">
+    <content url="file://$MODULE_DIR$" />
+    <orderEntry type="inheritedJdk" />
+    <orderEntry type="sourceFolder" forTests="false" />
+  </component>
+</module>

.idea/inspectionProfiles/profiles_settings.xml
ADDED
@@ -0,0 +1,6 @@
+<component name="InspectionProjectProfileManager">
+  <settings>
+    <option name="USE_PROJECT_PROFILE" value="false" />
+    <version value="1.0" />
+  </settings>
+</component>

.idea/modules.xml
ADDED
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<project version="4">
+  <component name="ProjectModuleManager">
+    <modules>
+      <module fileurl="file://$PROJECT_DIR$/.idea/document-classification.iml" filepath="$PROJECT_DIR$/.idea/document-classification.iml" />
+    </modules>
+  </component>
+</project>

README.md
CHANGED
@@ -38,8 +38,6 @@ This model is a **ResNet-50** Convolutional Neural Network (CNN) finetuned to cl
 
 ## Model Details
 
-
-
 ### Model Description
 
 This model utilizes the standard ResNet-50 architecture designed for image classification. Instead of "reading" the text like an OCR system, it analyzes the visual layout, structure, and low-level texture features of a whole document page to determine its category (e.g., recognizing the block layout of a resume versus the dense, two-column text of a scientific report).
@@ -50,20 +48,7 @@ It was trained using **Transfer Learning**, starting with weights pre-trained on
 - **Model type:** Computer Vision (Image Classification / CNN)
 - **Language(s) (NLP):** English (Implicitly, via the text present in the RVL-CDIP dataset images)
 - **License:** MIT
-
-## Why ResNet50
-
-| Model | Approximate Parameters | Year Released | Layers |
-|------------|------------------------|---------------|--------|
-| VGG16 | 138.4 Million | 2014 | 16 |
-| AlexNet | 61.1 Million | 2012 | 8 |
-| ResNet-50 | 25.6 Million | 2015 | 50 |
-
-| Model | FLOPs (Billions) | Efficiency Score |
-|------------|------------------|-----------------------|
-| AlexNet | 0.7 GFLOPs | Low Cost / Low Acc |
-| ResNet-50 | 3.8 GFLOPs | High Efficiency |
-| VGG-16 | 15.5 GFLOPs | Terribly Inefficient |
+- **Finetuned from model:** ResNet-50 (ImageNet weights)
 
 ### Model Sources
 
@@ -198,15 +183,6 @@ The model was evaluated on the standard, unseen **RVL-CDIP Test Split** containi
 | **Overall Accuracy** | **88.46%** | Solid baseline performance. |
 | **Top-3 Accuracy** | **95.62%** | Excellent reliability for triage tasks. |
 
-
-
-#### Confusion Matrix
-
-
-#### Detailed Classificatio report
-
-
-
 #### Detailed Performance Analysis (The "Traffic Light" Report)
 
 An analysis of per-class F1-scores reveals distinct tiers of performance:

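The README reports a Top-3 accuracy next to overall accuracy, and bases its "traffic light" analysis on per-class F1 scores. The project's own evaluation code is not visible in this diff, so the following is only a minimal, dependency-free Python sketch of how these two metrics are commonly defined. The function names `top_k_accuracy` and `per_class_f1` are hypothetical, and the confusion-matrix layout (rows are true classes, columns are predicted classes) is an assumption.

```python
from typing import List, Sequence


# Hypothetical helpers for illustration; not taken from the project's notebooks.
def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if not labels:
        raise ValueError("labels must not be empty")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; the stable sort keeps lower indices first on ties.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


def per_class_f1(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Per-class F1 from a square confusion matrix, assuming confusion[true][pred]."""
    n = len(confusion)
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, but another true class
        fn = sum(confusion[c]) - tp  # true class c, but predicted as another class
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN); 0.0 if the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return f1_scores


if __name__ == "__main__":
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [2, 0, 0]
    print(top_k_accuracy(scores, labels, k=2))  # 2 of the 3 samples hit within the top 2
    print(per_class_f1([[5, 1], [2, 2]]))
```

Top-k accuracy is never lower than top-1 accuracy, because a correct top-1 prediction is also inside the top k; this is why a Top-3 figure suits triage, where a short list of candidate classes is acceptable.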
|
main.ipynb
CHANGED
The diff for this file is too large to render.

res.ipynb
ADDED
The diff for this file is too large to render.

results.ipynb
ADDED
The diff for this file is too large to render.

results/loss_and_acc_curve.png
CHANGED
test.ipynb
ADDED
The diff for this file is too large to render.