Commit 6bfa0ae by 0xgr3y (verified) · 1 parent: 44f233b

Upload README.md with huggingface_hub

  1. README.md +194 -7
README.md CHANGED
@@ -2,10 +2,197 @@
2
  license: apache-2.0
3
  pipeline_tag: image-classification
4
  tags:
5
- - TensorFlow,
6
- - feature-extraction,
7
- - densenet121,
8
- - architectural,
9
- - building,
10
- - CNN,
11
- ---
 
2
  license: apache-2.0
3
  pipeline_tag: image-classification
4
  tags:
5
+ - tensorflow
6
+ - keras
7
+ - image-classification
8
+ - densenet121
9
+ - architecture
10
+ - building
11
+ - cnn
12
+ - fgvc
13
+ - transfer-learning
14
+ - gem-pooling
15
+ - swa
16
+ library_name: keras
17
+ language: en
18
+ datasets:
19
+ - Saugani/arch-building-dataset
20
+ widget:
21
+ - structure:
22
+ src: https://cdn-uploads.huggingface.co/production/uploads/66cdac913f233bf2c7b4f590/HzXxNze2jmCkV5KPY_fpQ.png
23
+ example_title: Bridge Classification
24
+ ---
25
+
26
+ # Arch Building Image Classifier
27
+
28
+ <table>
29
+ <tr>
30
+ <td><strong>Architecture</strong></td>
31
+ <td>DenseNet121 + GeM Pooling (p=3.0) + SWA</td>
32
+ </tr>
33
+ <tr>
34
+ <td><strong>Task</strong></td>
35
+ <td>Fine-Grained Visual Categorization (FGVC)</td>
36
+ </tr>
37
+ <tr>
38
+ <td><strong>Test Accuracy</strong></td>
39
+ <td>96.23% (970/1,008)</td>
40
+ </tr>
41
+ <tr>
42
+ <td><strong>Classes</strong></td>
43
+ <td>6 (Bridge, Castle, Mosque, Skyscraper, Stadium, Temple)</td>
44
+ </tr>
45
+ <tr>
46
+ <td><strong>Input Size</strong></td>
47
+ <td>320 x 320 pixels</td>
48
+ </tr>
49
+ <tr>
50
+ <td><strong>Framework</strong></td>
51
+ <td>TensorFlow / Keras 3</td>
52
+ </tr>
53
+ <tr>
54
+ <td><strong>License</strong></td>
55
+ <td><a href="https://www.apache.org/licenses/LICENSE-2.0">Apache-2.0</a></td>
56
+ </tr>
57
+ </table>
58
+
59
+ ## Model Description
60
+
61
+ A fine-grained image classifier for six types of architectural structures (bridge, castle, mosque, skyscraper, stadium, temple). It builds on DenseNet121 pretrained on ImageNet and adds GeM pooling (generalized mean pooling with a learnable exponent), Focal Loss, and Stochastic Weight Averaging (SWA).
62
+
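For reference, generalized mean (GeM) pooling reduces the H x W activations of each channel to `(mean(x ** p)) ** (1 / p)`. Below is a minimal pure-Python sketch for a single channel. `gem_pool` is an illustrative helper name rather than part of the released code, and it takes a fixed `p`, whereas the model treats `p` as a trainable weight.

```python
def gem_pool(channel, p=3.0, eps=1e-6):
    """Generalized-mean pool one channel given as a 2-D list of activations."""
    # Clamp to eps so that raising to a fractional power stays well defined.
    values = [max(v, eps) for row in channel for v in row]
    mean_pow = sum(v ** p for v in values) / len(values)
    return mean_pow ** (1.0 / p)

feature_map = [[0.0, 1.0], [2.0, 3.0]]
avg = gem_pool(feature_map, p=1.0)    # p = 1 reduces to average pooling
gem = gem_pool(feature_map, p=3.0)    # p = 3 weights strong activations more
big = gem_pool(feature_map, p=50.0)   # a large p approaches max pooling
print(avg, gem, big)
```

The pooled value moves from the mean towards the maximum as `p` grows, which is why a learnable `p` lets the network choose how strongly to emphasize the most active regions.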
63
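+ **Key techniques:**
64
+ - **GeM Pooling (p=3.0)**: replaces global average pooling with generalized mean pooling whose exponent p is learnable (initialized at 3.0), which emphasizes strong local activations and helps fine-grained discrimination
65
+ - **Focal Loss (gamma=2.0)**: down-weights easy examples so training focuses on hard-to-classify building pairs
66
+ - **DiscriminativeAdamW**: custom optimizer that applies per-layer learning-rate multipliers to the backbone layers
67
+ - **SWA with BN re-estimation**: averages weights across epochs for better generalization, then recomputes BatchNormalization statistics for the averaged weights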
68
+
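To illustrate how focal loss with gamma=2.0 shifts attention to hard examples, here is a minimal sketch of the standard per-sample formula `-(1 - p_t) ** gamma * log(p_t)`. It assumes no class-balancing alpha term, since the card only states gamma; the `FocalLoss` class used in training may differ in details.

```python
import math

def focal_loss(probs, true_index, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t) ** gamma * log(p_t)."""
    p_t = probs[true_index]  # predicted probability of the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss([0.9, 0.1], true_index=0)  # confident, correct prediction
hard = focal_loss([0.3, 0.7], true_index=0)  # true class gets low probability
ce_easy = -math.log(0.9)                     # plain cross-entropy for comparison
print(easy, hard, ce_easy)
```

For the easy sample the modulating factor `(1 - 0.9) ** 2` shrinks the cross-entropy by a factor of 100, while the hard sample keeps about half of its cross-entropy, so hard samples dominate the gradient.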
69
+ ## Architecture
70
+
71
+ ```
72
+ Input (320, 320, 3)
73
+ |
74
+ DenseNet121 (ImageNet, 8M params)
75
+ |
76
+ Conv2D(256, 3x3, ReLU, padding=same)
77
+ BatchNormalization
78
+ MaxPooling2D(2x2)
79
+ |
80
+ GeM Pooling(p=3.0, eps=1e-6, learnable)
81
+ |
82
+ Dense(256, ReLU)
83
+ BatchNormalization
84
+ Dropout(0.4)
85
+ |
86
+ Dense(6, Softmax)
87
+ |
88
+ Output (6 classes)
89
+ ```
90
+
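The tensor shapes implied by the diagram can be traced with simple arithmetic. This sketch assumes the backbone is used without its classification head (so it outputs 1024 channels at 1/32 of the input resolution) and that the max pooling layer uses the default stride of 2 with no padding; neither detail is stated in the card.

```python
def trace_shapes(size=320):
    """Follow the per-sample output shape through the stack in the diagram."""
    shapes = [("Input", (size, size, 3))]
    side = size // 32  # DenseNet121 downsamples by 32 overall
    shapes.append(("DenseNet121", (side, side, 1024)))
    shapes.append(("Conv2D 3x3 same", (side, side, 256)))  # 'same' keeps H and W
    side //= 2  # 2x2 max pooling with stride 2
    shapes.append(("MaxPooling2D", (side, side, 256)))
    shapes.append(("GeM Pooling", (256,)))  # pools away the spatial axes
    shapes.append(("Dense 256", (256,)))
    shapes.append(("Dense 6", (6,)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:<16} {shape}")

# Parameter counts for the added head layers (weights + biases).
conv_params = 3 * 3 * 1024 * 256 + 256
dense_params = 256 * 256 + 256
out_params = 256 * 6 + 6
print(conv_params, dense_params, out_params)
```

Under these assumptions GeM pooling sees a 5 x 5 x 256 feature map, and most of the added parameters sit in the 3x3 convolution.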
91
+ ## Performance
92
+
93
+ | Metric | Value |
94
+ |--------|-------|
95
+ | Test Accuracy | **96.23%** |
96
+ | Validation Accuracy (SWA) | **95.93%** |
97
+ | Accuracy with Test-Time Augmentation (TTA) | **96.33%** |
98
+ | Overfitting Gap | 3.22% |
99
+
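The card does not say which augmentations produced the TTA figure. As a minimal sketch, assuming TTA means averaging class probabilities over the original image and a horizontally flipped copy, the idea looks like this. `hflip`, `predict_with_tta`, and `toy_predict` are hypothetical names, and `toy_predict` merely stands in for `model.predict`.

```python
def hflip(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]

def predict_with_tta(predict, image, augmentations):
    """Average class probabilities over the original and each augmented view."""
    views = [image] + [aug(image) for aug in augmentations]
    all_probs = [predict(view) for view in views]
    n_classes = len(all_probs[0])
    return [sum(p[c] for p in all_probs) / len(all_probs) for c in range(n_classes)]

def toy_predict(image):
    """Hypothetical stand-in model: scores depend on which image side is brighter."""
    left = sum(row[0] for row in image)
    right = sum(row[-1] for row in image)
    p = left / (left + right)
    return [p, 1.0 - p]

image = [[3.0, 1.0], [3.0, 1.0]]
averaged = predict_with_tta(toy_predict, image, [hflip])
print(averaged)
```

The toy model is deliberately sensitive to flipping, so averaging cancels that sensitivity; with a real classifier the effect is a smaller reduction in prediction variance.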
100
+ ### Per-Class Results
101
+
102
+ | Class | F1 Score | Recall |
103
+ |-------|----------|--------|
104
+ | Bridge | 95.29% | 96.43% |
105
+ | Castle | 97.92% | 98.21% |
106
+ | Mosque | 95.93% | 98.21% |
107
+ | Skyscraper | 97.95% | 99.40% |
108
+ | Stadium | 94.12% | 90.48% |
109
+ | Temple | 96.07% | 94.64% |
110
+
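The per-class recall values can be cross-checked against the overall test accuracy. The sketch below assumes the 1,008 test images are split evenly (168 per class), which the card implies by describing the dataset as balanced but does not state for the test split itself.

```python
recall_pct = {
    "Bridge": 96.43, "Castle": 98.21, "Mosque": 98.21,
    "Skyscraper": 99.40, "Stadium": 90.48, "Temple": 94.64,
}
per_class = 1008 // 6  # assumed: an exactly balanced test split

# Recover the number of correct predictions per class from the rounded recall.
correct = {name: round(pct / 100 * per_class) for name, pct in recall_pct.items()}
total_correct = sum(correct.values())
accuracy = 100 * total_correct / 1008
print(correct)
print(total_correct, f"{accuracy:.2f}%")
```

The recovered counts add up to the 970 correct predictions reported in the summary table, so the per-class and overall figures are mutually consistent.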
111
+ ## Training
112
+
113
+ - **Dataset:** 10,080 images (1,680 per class, balanced) from Pexels
114
+ - **Split:** 80/10/10 (train/val/test), seed=42
115
+ - **Phase 1:** Feature extraction, AdamW LR=0.001, CutMix+Mixup
116
+ - **Phase 2:** Selective fine-tuning conv4+conv5, DiscriminativeAdamW
117
+ - **Post-training:** SWA 5 epochs + BN re-estimation
118
+
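Stochastic Weight Averaging replaces the final weights with an equal-weight average of the weights captured at several points late in training. The following is a minimal sketch with made-up numbers; `swa_average` is an illustrative helper, and the flattened two-parameter "checkpoints" are hypothetical.

```python
def swa_average(checkpoints):
    """Equal-weight average of parameter vectors, one vector per checkpoint."""
    n = len(checkpoints)
    return [sum(values) / n for values in zip(*checkpoints)]

# Hypothetical flattened weights captured at the end of each of 5 SWA epochs.
checkpoints = [
    [0.10, -0.50],
    [0.20, -0.40],
    [0.30, -0.30],
    [0.20, -0.40],
    [0.20, -0.40],
]
swa_weights = swa_average(checkpoints)
print(swa_weights)
```

Because the averaged weights were never used in a forward pass during training, the BatchNormalization running mean and variance no longer match them; BN re-estimation recomputes those statistics by running training data through the averaged model.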
119
+ ## Files
120
+
121
+ | File | Description |
122
+ |------|-------------|
123
+ | `best_phase2_swa.keras` | Best model: SWA-averaged weights (val_acc=95.93%) |
124
+ | `best_phase2.keras` | Phase 2 checkpoint (val_acc=93.35%) |
125
+ | `config.json` | Full model configuration and evaluation metrics |
126
+ | `label_mapping.json` | Class name <-> ID mapping |
127
+ | `preprocessor_config.json` | Input preprocessing specification |
128
+
129
+ ## Usage
130
+
131
+ ### Gradio Demo
132
+
133
+ Try the live demo: [arch-building-classifier Space](https://huggingface.co/spaces/0xgr3y/arch-building-classifier)
134
+
135
+ ### Python
136
+
137
+ ```python
+ from huggingface_hub import hf_hub_download
+ import tensorflow as tf
+ from tensorflow.keras.applications.densenet import preprocess_input
+ from tensorflow.keras.layers import Layer
+ from tensorflow.keras.optimizers import Optimizer
+ from PIL import Image
+ import numpy as np
+
+ # Class order is assumed to follow the alphabetical list in the summary table;
+ # confirm it against label_mapping.json before relying on it.
+ LABELS = ["Bridge", "Castle", "Mosque", "Skyscraper", "Stadium", "Temple"]
+
+ # --- Custom layers (must match training definition) ---
+ class GeMPooling(Layer):
+     """Generalized mean pooling over height and width with a learnable exponent p."""
+
+     def __init__(self, p=3.0, eps=1e-6, **kwargs):
+         super().__init__(**kwargs)
+         self.p_init = p
+         self.eps = eps
+
+     def build(self, input_shape):
+         # Scalar exponent shared by all channels, initialized to p_init.
+         self.p = self.add_weight(name="gem_p", shape=(), dtype=tf.float32,
+                                  initializer=tf.keras.initializers.Constant(self.p_init),
+                                  trainable=True)
+         super().build(input_shape)
+
+     def call(self, x):
+         # Clamp activations to at least eps so the fractional power stays defined.
+         x = tf.clip_by_value(x, self.eps, tf.reduce_max(x))
+         x = tf.pow(x, self.p)
+         x = tf.reduce_mean(x, axis=[1, 2], keepdims=False)
+         return tf.pow(x, 1.0 / self.p)
+
+     def get_config(self):
+         return {**super().get_config(), "p": self.p_init, "eps": self.eps}
+
+ # ... (FocalLoss, DiscriminativeAdamW definitions) ...
+
+ custom_objects = {"GeMPooling": GeMPooling, "FocalLoss": FocalLoss, "DiscriminativeAdamW": DiscriminativeAdamW}
+ model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "best_phase2_swa.keras")
+ model = tf.keras.models.load_model(model_path, custom_objects=custom_objects, compile=False)
+
+ # Resize to the 320 x 320 training resolution and apply DenseNet preprocessing.
+ img = Image.open("building.jpg").convert("RGB").resize((320, 320))
+ arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)
+ preds = model.predict(arr, verbose=0)[0]
+ print(f"Predicted: {LABELS[np.argmax(preds)]} ({np.max(preds)*100:.1f}%)")
+ ```
175
+
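To my understanding, the DenseNet `preprocess_input` used above scales pixels to the range [0, 1] and then standardizes each channel with the ImageNet mean and standard deviation. The sketch below reproduces that arithmetic for a single RGB pixel in pure Python; `preprocess_pixel` is an illustrative helper, and the authoritative specification for this model is `preprocessor_config.json`.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def preprocess_pixel(rgb):
    """Scale one 0-255 RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

black = preprocess_pixel((0, 0, 0))
white = preprocess_pixel((255, 255, 255))
print(black, white)
```

Feeding raw 0-255 pixels or applying a different normalization would put inputs far outside the range seen during training, so the same preprocessing should be used at inference time.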
176
+ ## Intended Use
177
+
178
+ - Building-type classification from photographs (the six classes are structure types, not architectural styles)
179
+ - Educational tool for architecture recognition
180
+ - Research baseline for fine-grained visual categorization
181
+
182
+ ## Limitations
183
+
184
+ - Trained on Pexels stock photography, so accuracy may be lower on casual, user-generated photos
185
+ - Limited to 6 architectural classes
186
+ - Stadium has the lowest recall (90.48%); Temple images (94.64% recall) are often misclassified as Mosque
187
+
188
+ ## Citation
189
+
190
+ ```bibtex
191
+ @misc{saugani2024_arch_building,
192
+ title={Architecture Building Image Classifier: FGVC with DenseNet121 + GeM Pooling + SWA},
193
+ author={Saugani},
194
+ year={2024},
195
+ publisher={Hugging Face},
196
+ url={https://huggingface.co/0xgr3y/Arch-Building-Image-Classification}
197
+ }
198
+ ```