Commit 8a37ee0
Parent(s): 1a7a5b2
update readme

README.md CHANGED
@@ -38,6 +38,8 @@ This model is a **ResNet-50** Convolutional Neural Network (CNN) finetuned to cl
@@ -48,7 +50,20 @@ It was trained using **Transfer Learning**, starting with weights pre-trained on
@@ -183,6 +198,15 @@ The model was evaluated on the standard, unseen **RVL-CDIP Test Split** containi
## Model Details
### Model Description
This model utilizes the standard ResNet-50 architecture designed for image classification. Instead of "reading" the text like an OCR system, it analyzes the visual layout, structure, and low-level texture features of a whole document page to determine its category (e.g., recognizing the block layout of a resume versus the dense, two-column text of a scientific report).
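
A classification head like this one typically outputs one score (logit) per document category. As a minimal, framework-free sketch of how such scores could become a predicted category plus a short list of alternatives, the snippet below applies a softmax and ranks the labels. It assumes 16 logits in the usual RVL-CDIP label order; the label list, `rank_categories`, and the example logits are illustrative assumptions, not part of this repository.

```python
import math

# Assumed label order for the 16 RVL-CDIP categories; check the model's
# own config before relying on these indices.
RVL_CDIP_LABELS = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_categories(logits, k=3):
    """Hypothetical helper: return the k most likely (label, probability) pairs."""
    if len(logits) != len(RVL_CDIP_LABELS):
        raise ValueError("expected one logit per RVL-CDIP class")
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(RVL_CDIP_LABELS[i], probs[i]) for i in order[:k]]


# Made-up logits for illustration only: index 14 ("resume") scores highest.
example_logits = [0.0] * 16
example_logits[14] = 4.0
example_logits[5] = 2.5
example_logits[6] = 2.0
print(rank_categories(example_logits))
```

Returning the top few labels rather than only the argmax is what makes a metric like Top-3 accuracy relevant for triage-style use.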

...

- **Model type:** Computer Vision (Image Classification / CNN)
- **Language(s) (NLP):** English (implicitly, via the text present in the RVL-CDIP dataset images)
- **License:** MIT

### Why ResNet-50

| Model | Approximate Parameters | Year Released | Layers |
|------------|------------------------|---------------|--------|
| VGG-16 | 138.4 Million | 2014 | 16 |
| AlexNet | 61.1 Million | 2012 | 8 |
| ResNet-50 | 25.6 Million | 2015 | 50 |
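
Parameter count translates directly into weight storage. As a rough, illustrative calculation, assuming 32-bit floating-point weights (4 bytes each) and ignoring any file-format overhead, the counts in the table imply the following weight sizes. The dictionary and `weight_size_mb` helper are hypothetical names for this sketch.

```python
# Parameter counts from the table above, in millions.
PARAMS_MILLIONS = {
    "VGG-16": 138.4,
    "AlexNet": 61.1,
    "ResNet-50": 25.6,
}

BYTES_PER_FLOAT32 = 4  # assumption: weights stored as float32


def weight_size_mb(params_millions, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate weight size in megabytes (1 MB = 10**6 bytes).

    Millions of parameters times bytes per parameter gives megabytes directly.
    """
    return params_millions * bytes_per_param


for name, p in PARAMS_MILLIONS.items():
    print(f"{name:10s} ~{weight_size_mb(p):6.1f} MB of float32 weights")

# VGG-16 carries roughly 5.4x as many parameters as ResNet-50.
vgg_to_resnet = PARAMS_MILLIONS["VGG-16"] / PARAMS_MILLIONS["ResNet-50"]
print(f"VGG-16 / ResNet-50 parameter ratio: {vgg_to_resnet:.1f}")
```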
| Model | FLOPs (billions) | Efficiency (qualitative) |
|------------|------------------|--------------------------|
| AlexNet | 0.7 | Low cost / low accuracy |
| ResNet-50 | 3.8 | High efficiency |
| VGG-16 | 15.5 | Very inefficient |
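
To put the compute figures in relative terms, the sketch below normalizes each model's FLOPs to ResNet-50's. It only re-expresses the numbers in the table; it does not measure runtime, which also depends on hardware, batch size, and implementation. `GFLOPS` and `relative_cost` are illustrative names.

```python
# Compute figures from the table above, in billions of FLOPs.
GFLOPS = {"AlexNet": 0.7, "ResNet-50": 3.8, "VGG-16": 15.5}


def relative_cost(model, baseline="ResNet-50"):
    """How many times the baseline's compute `model` needs."""
    return GFLOPS[model] / GFLOPS[baseline]


for name in GFLOPS:
    print(f"{name:10s} {relative_cost(name):5.2f}x the compute of ResNet-50")
```

On these figures VGG-16 needs about 4.1x the compute of ResNet-50 while also carrying more parameters, which is the trade-off the table summarizes.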
### Model Sources

...

| **Overall Accuracy** | **88.46%** | Solid baseline performance. |
| **Top-3 Accuracy** | **95.62%** | Excellent reliability for triage tasks. |
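
Top-3 accuracy counts a prediction as correct when the true label is among the model's three highest-scoring classes, which is why it is always at least as high as plain (top-1) accuracy. A minimal sketch of both metrics follows; the scores and labels are invented for illustration and are not from the RVL-CDIP evaluation.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists (one list per sample)
    labels: list of true class indices
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if true_label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example: 4 samples, 4 classes (invented numbers).
scores = [
    [0.7, 0.1, 0.15, 0.05],  # true 0 -> ranked 1st
    [0.5, 0.3, 0.15, 0.05],  # true 1 -> ranked 2nd
    [0.3, 0.2, 0.4, 0.1],    # true 3 -> ranked 4th
    [0.1, 0.2, 0.3, 0.4],    # true 3 -> ranked 1st
]
labels = [0, 1, 3, 3]
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=3))  # 0.75
```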
#### Confusion Matrix
#### Detailed Classification Report

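A classification report lists precision, recall, and F1 per class, and all three can be read off a confusion matrix. The sketch below assumes the common convention that rows are true classes and columns are predicted classes (check the orientation of any plotted matrix before reusing it); the 3-class matrix and the `per_class_metrics` helper are invented for illustration.

```python
def per_class_metrics(cm):
    """Precision, recall and F1 for each class of a square confusion matrix.

    Convention assumed here: cm[i][j] = number of samples whose true class
    is i and whose predicted class is j. Undefined ratios are reported as 0.0.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted_c = sum(cm[r][c] for r in range(n))  # column sum
        actual_c = sum(cm[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1))
    return results


# Invented 3-class example (not RVL-CDIP results).
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
for cls, (p, r, f1) in enumerate(per_class_metrics(cm)):
    print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Per-class F1 values computed this way are the kind of numbers a tiered per-class analysis groups together.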
#### Detailed Performance Analysis (The "Traffic Light" Report)
An analysis of per-class F1-scores reveals distinct tiers of performance: