| license: apache-2.0 | |
| pipeline_tag: image-classification | |
| tags: | |
| - efficientnetv2 | |
| - fgic | |
| - safetensors | |
| - transfer-learning | |
| - gem-pooling | |
| - focal-loss | |
| - swa | |
| - grad-cam | |
| - calibration | |
| - temperature-scaling | |
| - computer-vision | |
| - tensorflow.js | |
| library_name: keras | |
| language: en | |
| datasets: | |
| - 0xgr3y/arch-building-dataset | |
| model-index: | |
| - name: Architectural Building Image Classifier | |
| results: | |
| - task: | |
| type: image-classification | |
| name: Fine-Grained Image Classification | |
| dataset: | |
| type: imagefolder | |
| name: arch-building-dataset | |
| split: test | |
| metrics: | |
| - type: accuracy | |
| value: 0.9777 | |
| name: Test Accuracy | |
| - type: accuracy | |
| value: 0.9836 | |
| name: Validation Accuracy (SWA) | |
| - type: accuracy | |
| value: 0.9799 | |
| name: TTA Accuracy | |
| - type: f1 | |
| value: 0.9777 | |
| name: Macro F1 | |
| - type: precision | |
| value: 0.9777 | |
| name: Macro Precision | |
| - type: recall | |
| value: 0.9777 | |
| name: Macro Recall | |
| - type: roc_auc | |
| value: 0.9985 | |
| name: Macro ROC-AUC (OvR) | |
|  | |
| # Fine-Grained Image Classification of World Architecture: An EfficientNetV2-S Transfer Learning Approach with Layered Regularization | |
| ### Architectural Building Image Classifier | |
| Fine-Grained Image Classification (FGIC) of world architectural buildings using CNN transfer learning with EfficientNetV2-S, enhanced with GeM Pooling, Focal Loss, Discriminative AdamW (LR), Stochastic Weight Averaging (SWA), Grad-CAM explainability, and calibration analysis. | |
| <table> | |
| <tr><td><strong>Architecture</strong></td><td>EfficientNetV2-S + GeM Pooling + Focal Loss + SWA</td></tr> | |
| <tr><td><strong>Task</strong></td><td>Fine-Grained Image Classification (FGIC)</td></tr> | |
| <tr><td><strong>Test Accuracy</strong></td><td>97.77%</td></tr> | |
| <tr><td><strong>Classes</strong></td><td>8 (barn, bridge, castle, mosque, skyscraper, stadium, temple, windmill)</td></tr> | |
| <tr><td><strong>Input Size</strong></td><td>320 × 320 pixels</td></tr> | |
| <tr><td><strong>Parameters</strong></td><td>23,350,633</td></tr> | |
| <tr><td><strong>Framework</strong></td><td>TensorFlow / Keras 3</td></tr> | |
| <tr><td><strong>License</strong></td><td><a href="https://www.apache.org/licenses/LICENSE-2.0">Apache-2.0</a></td></tr> | |
| </table> | |
| ## Model Description | |
| A fine-grained image classification model for world architectural buildings. It is built on EfficientNetV2-S pretrained on ImageNet and enhanced with GeM Pooling (learnable generalized mean pooling), Focal Loss, Discriminative AdamW, and Stochastic Weight Averaging (SWA). The evaluation adds Grad-CAM explainability visualization, ROC-AUC evaluation, ECE calibration analysis, and t-SNE embedding visualization. | |
| **Key architectural contributions:** | |
| - **GeM Pooling** (Radenovic et al., IEEE TPAMI 2018): replaces global average pooling with a generalized mean whose power parameter is learnable (p=3.0), emphasizing high-activation features and yielding stronger discriminative representations for FGIC tasks | |
| - **Focal Loss** (Lin et al., ICCV 2017, gamma=2.0): down-weights well-classified examples to focus gradient updates on hard-to-classify building pairs | |
| - **DiscriminativeAdamW LR**: extends AdamW with per-variable LR scaling on block6 (×0.1) via an `update_step` override, combined with selective fine-tuning (block6+top_conv unfrozen, BN frozen). Because the learning rate itself is scaled, block6 variables receive a 10× smaller learning rate than head variables (117 trainable variables in total: 105 in block6 + 12 in the head) | |
| - **Mixup + CutMix** (Zhang et al., ICLR 2018; Yun et al., ICCV 2019): alternated per batch (50/50) between Mixup (alpha=0.2, linear interpolation) and CutMix (alpha=1.0, spatial patch). Applied only in Phase 1 training to regularize head learning | |
| - **Selective Unfreeze** (Yosinski et al., 2014): Phase 2 unfreezes the block6+top_conv layers (180/513 EfficientNetV2-S layers) while keeping BatchNormalization frozen to preserve pretrained statistics | |
| - **SWA with BN re-estimation** (Izmailov et al., UAI 2018): 10-epoch post-training weight averaging with constant LR 1e-4, followed by 100-step batch normalization statistics re-estimation (3,200 images) | |
| - **Test-Time Augmentation**: averages 6 variants at inference (original, horizontal flip, center crop 85%, center crop 70%, corner crop top-left 80%, corner crop bottom-right 80%). Yields a +0.22 percentage-point accuracy improvement (97.77% → 97.99%) | |
| - **Grad-CAM** (Selvaraju et al., ICCV 2017): gradient-weighted class activation mapping for explainability, targeting *top_conv* (last Conv2D layer of EfficientNetV2-S) | |
| - **ECE Calibration** (Guo et al., ICML 2017): Expected Calibration Error with a 15-bin reliability diagram to assess how reliable the prediction confidences are | |
| - **Temperature Scaling** (Guo et al., ICML 2017): post-hoc calibration via a scalar temperature parameter T optimized on the validation set (NLL minimization). T=0.54 reduces ECE from 12.04% (underconfident due to Label Smoothing) to 0.53%; it is applied at inference as `softmax(log(probs) / T)` | |
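
The temperature-scaling step in the last bullet can be sketched in NumPy. This is a minimal illustration rather than the repository's code: `apply_temperature` is a hypothetical helper name, and the sketch assumes the model outputs softmax probabilities, so that `log(probs)` recovers the logits up to an additive constant.

```python
import numpy as np

def apply_temperature(probs, T=0.54, eps=1e-12):
    """Rescale softmax probabilities with temperature T (hypothetical helper)."""
    # log(probs) equals the original logits up to a per-row constant
    logits = np.log(np.clip(probs, eps, 1.0)) / T
    # subtract the row maximum so the exponentials stay numerically stable
    logits -= logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

probs = np.array([[0.60, 0.25, 0.15]])
sharpened = apply_temperature(probs, T=0.54)
print(sharpened.round(3))
```

With T below 1 the distribution is sharpened, which is the direction needed to correct an underconfident model; the predicted class (arg-max) is unchanged.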
| ## Architecture | |
```
Input (320, 320, 3)
        ↓
EfficientNetV2-S (ImageNet pretrained, 513 layers, 20.33M params)
        ↓
Conv2D(256, 3×3, ReLU, padding=same)      → 2,949,376 params
BatchNormalization                        → 1,024 params
MaxPooling2D(2×2)                         → 0 params
        ↓
GeM Pooling(p=3.0, eps=1e-6, learnable)   → 1 param
        ↓
Dense(256, ReLU)                          → 65,792 params
BatchNormalization                        → 1,024 params
Dropout(0.4)                              → 0 params
        ↓
Dense(8, Softmax)                         → 2,056 params
        ↓
Output (8 classes)
```
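
The GeM pooling step in the diagram can be sketched in NumPy as a generalized mean over the spatial axes. This is an illustrative version, not the repository's `GeMPooling` layer: the function name `gem_pool` is hypothetical, and `p` is treated as a fixed constant here, whereas the model learns it.

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized mean pooling over the spatial axes of a (N, H, W, C) array."""
    clipped = np.maximum(features, eps)          # keep values positive before the power
    return np.mean(clipped ** p, axis=(1, 2)) ** (1.0 / p)

# Toy feature map with the same spatial size and channel count as the head input to GeM
x = np.random.default_rng(0).random((2, 5, 5, 256)).astype(np.float32)
pooled = gem_pool(x)
print(pooled.shape)  # (2, 256)
```

With `p=1` this reduces to average pooling, and as `p` grows it approaches max pooling, which is why larger `p` emphasizes high activations.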
| | Component | Output Shape | Parameters | | |
| |-----------|-------------|------------| | |
| | EfficientNetV2-S (Functional) | (None, 10, 10, 1280) | 20,331,360 | | |
| | Conv2D 256 3×3 | (None, 10, 10, 256) | 2,949,376 | |
| | BatchNormalization | (None, 10, 10, 256) | 1,024 | |
| | MaxPooling2D 2×2 | (None, 5, 5, 256) | 0 | |
| | GeM Pooling p=3.0 | (None, 256) | 1 | | |
| | Dense 256 ReLU | (None, 256) | 65,792 | | |
| | BatchNormalization | (None, 256) | 1,024 | | |
| | Dropout 0.4 | (None, 256) | 0 | | |
| | Dense 8 Softmax | (None, 8) | 2,056 | | |
| | **Total** | | **23,350,633** | | |
| | Trainable (Phase 1) | | **3,018,249** (11.51 MB) | | |
| | Trainable (Phase 2) | | **17,810,225** (67.94 MB) | | |
| | Non-trainable (Phase 1) | | **20,332,384** (77.56 MB) | | |
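
The head parameter counts in the table follow from standard layer formulas, which gives a quick consistency check. The helper names below are illustrative, and the backbone count is taken from the table rather than derived.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out

def batchnorm_params(channels):
    """gamma, beta, moving mean, moving variance."""
    return 4 * channels

head = {
    "conv2d": conv2d_params(3, 1280, 256),
    "bn_after_conv": batchnorm_params(256),
    "gem_p": 1,
    "dense_256": dense_params(256, 256),
    "bn_after_dense": batchnorm_params(256),
    "dense_8": dense_params(256, 8),
}
backbone = 20_331_360                       # EfficientNetV2-S row of the table
total = backbone + sum(head.values())
bn_moving_stats = 2 * 256 + 2 * 256         # non-trainable part of the two head BN layers
trainable_phase1 = sum(head.values()) - bn_moving_stats
print(total, trainable_phase1)              # 23350633 3018249
```

Both results match the Total and Trainable (Phase 1) rows, and `backbone + bn_moving_stats` gives the 20,332,384 non-trainable parameters listed for Phase 1.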
| ## Performance | |
| ### Overall Metrics | |
| | Metric | Value | | |
| |--------|-------| | |
| | Test Accuracy | 97.77% | | |
| | Validation Accuracy (SWA) | 98.36% | | |
| | Test-Time Augmentation | 97.99% | | |
| | Test Loss | 0.4262 | | |
| | Overfitting Gap (Train − Test) | 2.11% | |
| | Macro Avg Precision | 0.9777 | | |
| | Macro Avg Recall | 0.9777 | | |
| | Macro Avg F1-Score | 0.9777 | | |
| | Top-2 Accuracy | 99.26% | | |
| | Top-3 Accuracy | 99.70% | | |
| | Macro ROC-AUC (OvR) | 0.9985 | | |
| | ECE (15 bins) | 0.1204 before temperature scaling; 0.0053 after (T=0.54) | |
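
The ECE figure in the last row is a weighted average of the gap between accuracy and mean confidence per confidence bin. A minimal NumPy sketch, assuming equal-width bins over (0, 1] (the function name and the toy inputs are illustrative, not taken from the repository):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE with equal-width confidence bins (assumed binning scheme)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap      # weight the gap by the bin's share of samples
    return ece

# Toy example: top-1 confidences and whether each prediction was right
ece = expected_calibration_error([0.95, 0.95, 0.65, 0.65], [1, 1, 1, 0])
print(round(ece, 4))
```

In practice `confidences` would be the maximum softmax probability per test image and `correct` the matching 0/1 indicator.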
| ### Per-Class Results | |
| | Class | Precision | Recall | F1-Score | AUC (OvR) | Support | | |
| |-------|-----------|--------|----------|-----------|---------| | |
| | barn | 0.9760 | 0.9702 | 0.9731 | 0.9950 | 168 | | |
| | bridge | 0.9591 | 0.9762 | 0.9676 | 0.9983 | 168 | | |
| | castle | 0.9763 | 0.9821 | 0.9792 | 0.9996 | 168 | | |
| | mosque | 0.9763 | 0.9821 | 0.9792 | 0.9987 | 168 | | |
| | skyscraper | 0.9940 | 0.9940 | 0.9940 | 0.9999 | 168 | | |
| | stadium | 0.9820 | 0.9762 | 0.9791 | 0.9999 | 168 | | |
| | temple | 0.9816 | 0.9524 | 0.9668 | 0.9976 | 168 | | |
| | windmill | 0.9765 | 0.9881 | 0.9822 | 0.9987 | 168 | | |
| | **Macro Avg** | **0.9777** | **0.9777** | **0.9777** | **0.9985** | **1,344** | | |
| ### Model Selection | |
| Four candidate models were evaluated on the validation set: | |
| | Checkpoint | Val Accuracy | Val Loss | Description | | |
| |------------|-------------|----------|-------------| | |
| | `head_training.keras` | 92.34% | 1.0109 | Phase 1 checkpoint (backbone frozen) | | |
| | `fine_tuning.keras` | 96.28% | 0.5655 | Phase 2 checkpoint (block6+top_conv unfrozen) | | |
| | `fine_tuning_ema.keras` | 93.53% | 0.6007 | Phase 2 EMA (per-step Polyak averaging) | | |
| | **`fine_tuning_swa.keras`** | **98.36%** | **0.4109** | **SWA averaged weights (selected)** | |
| ### Training Progression | |
| | Phase | Epoch | Train Acc | Val Accuracy | Val Loss | | |
| |-------|-------|-----------|-------------|----------| | |
| | Phase 1 (Head Training) | 1 | 56.96% | 92.19% | 1.0079 | | |
| | Phase 2 (Selective Fine-Tuning) | 1 | 84.96% | 96.21% | 0.5656 | | |
| | SWA | 1 | 90.83% | 95.76% | 0.5831 | | |
| | SWA | 2 | 94.07% | 97.62% | 0.5116 | | |
| | SWA | 3 | 95.36% | 97.69% | 0.4748 | | |
| | SWA | 4 | 96.56% | 96.95% | 0.4390 | | |
| | SWA | 5 | 97.18% | 97.47% | 0.4490 | | |
| | SWA | 6 | 97.76% | 97.84% | 0.4416 | | |
| | SWA | 7 | 97.91% | 98.14% | 0.4055 | | |
| | SWA | 8 | 98.19% | 97.32% | 0.4359 | | |
| | SWA | 9 | 98.14% | 97.02% | 0.4519 | | |
| | SWA | 10 | 98.59% | 97.54% | 0.4226 | | |
| | **SWA + BN (final)** | – | – | **98.36%** | **0.4109** | |
| > Phase 1 and Phase 2 each stopped after 1 epoch via `myCallback` (custom early stopping at target accuracy: 85% Phase 1, 92% Phase 2). SWA ran 10 epochs with constant LR 1e-4, followed by BN re-estimation (100 steps, 3,200 images). Values shown are training-time metrics from the progress bar; checkpoint evaluation values may differ slightly (see the Model Selection table above). | |
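
The weight averaging behind the SWA rows can be sketched as a running mean over per-epoch weight snapshots. This is an illustrative NumPy version: `swa_update` is a hypothetical helper, random arrays stand in for real checkpoints, and the BN re-estimation pass that follows averaging is not shown.

```python
import numpy as np

def swa_update(swa_weights, new_weights, n_averaged):
    """Fold one more snapshot into a running average of n_averaged snapshots."""
    return [(s * n_averaged + w) / (n_averaged + 1)
            for s, w in zip(swa_weights, new_weights)]

# Ten toy "epoch-end" snapshots, each a list of weight arrays
rng = np.random.default_rng(0)
snapshots = [[rng.normal(size=(3, 2)), rng.normal(size=(2,))] for _ in range(10)]

swa = snapshots[0]
for n, weights in enumerate(snapshots[1:], start=1):
    swa = swa_update(swa, weights, n)

print(swa[1])  # equals the plain mean of the ten bias-like vectors
```

The running form avoids keeping all snapshots in memory; the result is the same as averaging them at the end.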
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| ## Training Details | |
| ### Training Strategy | |
| Two-phase progressive training with SWA post-processing: | |
| | Phase | Description | Backbone | Optimizer | LR | Max Epochs | Actual Epochs | CutMix+Mixup | FocalLoss LS | | |
| |-------|-------------|----------|-----------|-----|-----------|---------------|---------------|-------------| | |
| | **Phase 1**: Feature Extraction | Train custom head only | Frozen (all) | AdamW (wd=2e-5) | 0.001 + CosineDecay + Warmup 3ep | 25 | 1¹ | Yes (50/50 alternation) | 0.1 | |
| | **Phase 2**: Selective Fine-Tuning | Load head_training → fine-tune | block6 + top_conv unfrozen (BN frozen) | DiscriminativeAdamW (block6=0.1×) | 3e-4 + CosineDecay + Warmup 5ep | 50 | 1² + 10 SWA | No | 0.05 | |
| > ¹ Phase 1 stops once `val_accuracy ≥ 85%` (myCallback threshold). | |
| > ² Phase 2 stops once `val_accuracy ≥ 92%` (myCallback threshold), followed by 10 SWA epochs (constant LR 1e-4). | |
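
The `CosineDecay + Warmup` schedules in the table can be sketched as a step-wise function. This is an illustrative version, not the repository's `WarmupCosineDecay`: the function name, the linear warmup shape, the decay to `min_lr=0`, and the per-step (rather than per-epoch) granularity are all assumptions. The 336 steps per epoch follow from 10,752 training images at a batch size of 32.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to base_lr, then cosine annealing to min_lr (assumed form)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

steps_per_epoch = 10_752 // 32
# Phase 1 settings from the table: LR 0.001, 3 warmup epochs, 25 max epochs
phase1 = dict(base_lr=1e-3,
              warmup_steps=3 * steps_per_epoch,
              total_steps=25 * steps_per_epoch)

for epoch in (0, 3, 14, 25):
    lr = warmup_cosine_lr(epoch * steps_per_epoch, **phase1)
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

Since both phases stopped after one epoch, only the early warmup part of such a schedule would have been exercised in the reported run.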
| ### Hyperparameters | |
| | Parameter | Phase 1 | Phase 2 | | |
| |-----------|---------|---------| | |
| | Optimizer | AdamW | DiscriminativeAdamW | | |
| | Learning Rate | 0.001 | 3×10⁻⁴ | |
| | LR Schedule | WarmupCosineDecay (warmup=3) | WarmupCosineDecay (warmup=5) | |
| | Weight Decay | 2×10⁻⁵ | 2×10⁻⁵ | |
| | LR Multiplier (block6) | – | 0.1× (LR scaling via `update_step`) | |
| | LR Multiplier (top_conv+head) | – | 1.0× | |
| | Loss | FocalLoss (gamma=2.0, LS=0.1) | FocalLoss (gamma=2.0, LS=0.05) | | |
| | Batch Size | 32 | 32 | | |
| | Early Stopping Patience | 7 | 12 | | |
| | myCallback Threshold | val_acc ≥ 0.85 | val_acc ≥ 0.92 | |
| | EMA Decay (per-step) | 0.999 | 0.999 | |
| | SWA Epochs | – | 10 (post-training) | |
| | SWA LR | – | 1×10⁻⁴ (constant) | |
| | BN Re-estimation Steps | – | 100 | |
| | CutMix (alpha=1.0) | Yes (50% batches) | No | |
| | Mixup (alpha=0.2) | Yes (50% batches) | No | |
| | Hardware | 2× Tesla T4 (MirroredStrategy) | 2× Tesla T4 (MirroredStrategy) | |
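
The LR multiplier rows describe per-variable learning-rate scaling. The idea can be sketched as follows, with two loud assumptions: variables are matched by the substring `block6` in a path-like name (the names below are made up), and plain SGD stands in for the AdamW update to keep the example short. It is not the repository's `DiscriminativeAdamW`.

```python
import numpy as np

def lr_multiplier(variable_path, block6_scale=0.1):
    """LR multiplier chosen from the variable's path (naming is an assumption)."""
    return block6_scale if "block6" in variable_path else 1.0

def scaled_sgd_step(variables, grads, base_lr):
    """One plain-SGD step with per-variable LR scaling (SGD stands in for AdamW)."""
    return {
        path: value - base_lr * lr_multiplier(path) * grads[path]
        for path, value in variables.items()
    }

# Hypothetical variable names: one backbone variable, one head variable
variables = {"block6a_expand_conv/kernel": np.ones(3), "dense/kernel": np.ones(3)}
grads = {path: np.ones(3) for path in variables}

updated = scaled_sgd_step(variables, grads, base_lr=3e-4)
for path, value in updated.items():
    print(path, 1.0 - value[0])   # size of the applied update
```

With identical gradients, the block6 variable moves ten times less than the head variable, which is the behaviour the 0.1× and 1.0× rows describe.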
| ### Regularization Strategy | |
| | Technique | Implementation | Reference | | |
| |-----------|---------------|-----------| | |
| | Transfer Learning | EfficientNetV2-S backbone frozen in Phase 1 | Yosinski et al., NeurIPS 2014 | | |
| | Selective Fine-Tuning | Unfreeze block6+top_conv only, BN stays frozen | Howard & Ruder, ACL 2018 | | |
| | Discriminative LR Scaling | block6 LR×0.1 via `update_step` (10× smaller updates for pretrained features) | Howard & Ruder, ACL 2018 | |
| | CutMix + Mixup | Alternation per batch (50/50), Phase 1 only | Yun et al., ICCV 2019; Zhang et al., ICLR 2018 | |
| | Focal Loss | gamma=2.0, down-weights easy examples | Lin et al., ICCV 2017 | |
| | Label Smoothing | 0.1 (Phase 1) → 0.05 (Phase 2) | Szegedy et al., CVPR 2016 | |
| | GeM Pooling | p=3.0 learnable, replaces GAP | Radenovic et al., IEEE TPAMI 2018 | |
| | Dropout | 0.4 after Dense(256)+BN | Srivastava et al., JMLR 2014 | |
| | Batch Normalization | After Conv2D and Dense; frozen during fine-tuning | Ioffe & Szegedy, arXiv 2015 | |
| | EMA (per-step) | Shadow weights, decay=0.999, Polyak averaging | Tarvainen & Valpola, NeurIPS 2017 | |
| | SWA | 10-epoch post-training, constant LR 1e-4 | Izmailov et al., UAI 2018 | |
| | Data Augmentation | Rotation ±15°, shift ±10%, shear ±0.1 rad, zoom ±20%, brightness 0.75–1.15, channel shift ±10.0, horizontal flip | Perez & Wang, arXiv 2017 | |
| | Random Erasing | p=0.5, area [0.02–0.15], aspect [0.3–3.3], applied pre-normalization | Zhong et al., AAAI 2020 | |
| | Test-Time Augmentation | 6 augmentation variants, averaged | Shanmugam et al., ICML 2020 | | |
| | WarmupCosineDecay | Linear warmup + cosine annealing | Loshchilov & Hutter, ICLR 2017 (SGDR) | | |
| | Early Stopping | Patience 7 (Phase 1) / 12 (Phase 2) | Prechelt, Neural Networks 1998 | | |
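
The focal loss and label smoothing rows can be combined in a short NumPy sketch of the multi-class form. How the repository's `FocalLoss` applies smoothing is not stated here, so smoothing the one-hot targets before applying the focal weight is an assumption, as is the function signature.

```python
import numpy as np

def focal_loss(y_true, probs, gamma=2.0, label_smoothing=0.1, eps=1e-7):
    """Multi-class focal loss on softmax outputs with smoothed targets (assumed form)."""
    y_true = np.asarray(y_true, dtype=float)
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    n_classes = y_true.shape[-1]
    # spread label_smoothing uniformly over all classes
    y_smooth = y_true * (1.0 - label_smoothing) + label_smoothing / n_classes
    # (1 - p)^gamma shrinks the contribution of confidently predicted classes
    per_class = -y_smooth * (1.0 - probs) ** gamma * np.log(probs)
    return per_class.sum(axis=-1).mean()

y = np.array([[1.0, 0.0, 0.0]])
easy = np.array([[0.90, 0.05, 0.05]])   # confident and correct
hard = np.array([[0.40, 0.30, 0.30]])   # uncertain
print(focal_loss(y, easy), focal_loss(y, hard))
```

Setting `gamma=0` and `label_smoothing=0` recovers ordinary categorical cross-entropy, which makes the effect of each term easy to isolate.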
| ### Dataset | |
| See the dataset curation page for [World Architectural Buildings Dataset for Multi-Class Image Classification](https://huggingface.co/datasets/0xgr3y/arch-building-dataset): 13,440 images (8 classes × 1,680, balanced) sourced from Pexels with perceptual (pHash) and exact (SHA256) deduplication. | |
| | Split | Images | Percentage | | |
| |-------|--------|------------| | |
| | Train | 10,752 | 80% | | |
| | Validation | 1,344 | 10% | | |
| | Test | 1,344 | 10% | | |
| ### Data Preprocessing | |
| - **Normalization:** `preprocess_input` from `tf.keras.applications.efficientnet_v2` (ImageNet distribution) | |
| - **Input resolution:** 320×320 (higher than ImageNet default 224×224 to capture fine-grained architectural details such as textures, ornaments, and facade patterns) | |
| - **Augmentation:** Applied to the training set only; validation and test sets use clean preprocessing | |
| - **Split method:** `splitfolders.ratio` from `dataset/`, seed=42 | |
| ## Files | |
| | Category | Files | | |
| |----------|-------| | |
| | **Model (best)** | `fine_tuning_swa.keras` (227 MB) · `.weights.h5` (158 MB) · `.safetensors` (157 MB) | |
| | **Code** | `build_model.py` (21 KB): architecture + CLI inference | |
| | **Config** | `config.json` · `label_mapping.json` · `preprocessor_config.json` | |
| | **Evaluation** | `calibration_data.json` · `model_benchmark.json` · `confusion_pairs.json` · `class_confidence_stats.json` · `temperature_config.json` | |
| | **Deployment** | `saved_model/` (183 MB) · `tflite/` (88 MB) · `tfjs_model/` (90 MB, 23 shards) | |
| | **Results** | `results/`: 12 PNG (augmentation, reliability-diagram, training curves, confusion matrix, ROC, t-SNE, Grad-CAM, etc.) | |
| | **Archive** | `models_keras/`: 3 checkpoints (head_training, fine_tuning, fine_tuning_ema) | |
| ## Usage | |
| ### Gradio Space | |
| Try the live building classifier: [Architecture Building Image Classifier with Space](https://huggingface.co/spaces/0xgr3y/arch-building-classifier) | |
| ### Python: build_model.py (recommended) | |
| `build_model.py` is a standalone module that provides: | |
| - **Custom class definitions** (`GeMPooling`, `FocalLoss`, `DiscriminativeAdamW`) with `@register_keras_serializable`: importing the module registers all custom classes globally, so `load_model()` works without explicit `custom_objects`. | |
| - **`ArchBuildingClassifier`**: high-level wrapper class with `build()`, `from_weights()`, `from_keras()`, `predict()`, and `predict_batch()` methods. | |
| - **`CUSTOM_OBJECTS`** dict: fallback for explicit `custom_objects=` in `load_model()`. | |
| - **`build_model()`**: backward-compatible function that returns a raw `tf.keras.Model`. | |
| Place `build_model.py` in the same directory as your script or add its directory to `PYTHONPATH`. | |
| > **Note:** Filenames below use `fine_tuning_swa` as an example. The actual best checkpoint filename depends on training results; check the repo for the actual `.keras`, `.weights.h5`, and `.safetensors` filenames. | |
```python
from build_model import ArchBuildingClassifier
from huggingface_hub import hf_hub_download
from PIL import Image

# Download weights (clean format)
weights_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "fine_tuning_swa.weights.h5")

# Load model: architecture + weights
clf = ArchBuildingClassifier.from_weights(weights_path)

# Inference: returns the top label, its confidence, and the top-3 (class, probability) pairs
label, confidence, top3 = clf.predict(Image.open("skyscraper_00000.jpg"))
print(f"Predicted: {label} ({confidence:.1%})")
for cls, prob in top3:
    print(f"  {cls}: {prob:.1%}")
```
| ### Python: TF-Lite (fastest inference) | |
```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from PIL import Image

try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except (ImportError, ModuleNotFoundError):
    from tensorflow.keras.applications.efficientnet import preprocess_input

# Download the TFLite model and the label mapping
model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "tflite/model.tflite")
labels_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "label_mapping.json")
with open(labels_path) as f:
    LABELS = json.load(f)["labels"]

interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess: RGB, 320x320, preprocess_input, then add a batch axis
img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)

interpreter.set_tensor(input_details[0]["index"], arr)
interpreter.invoke()
preds = interpreter.get_tensor(output_details[0]["index"])[0]

# Print the three most probable classes
top3_idx = np.argsort(preds)[::-1][:3]
for i in top3_idx:
    print(f"  {LABELS[i]}: {preds[i]*100:.1f}%")
```
| ### Python: Keras (convenient) | |
```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from PIL import Image

import build_model  # registers custom classes via @register_keras_serializable

try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except (ImportError, ModuleNotFoundError):
    from tensorflow.keras.applications.efficientnet import preprocess_input

model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "fine_tuning_swa.keras")
labels_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "label_mapping.json")

model = tf.keras.models.load_model(model_path, compile=False)  # custom_objects not needed
with open(labels_path) as f:
    LABELS = json.load(f)["labels"]

# Preprocess: RGB, 320x320, preprocess_input, then add a batch axis
img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)

preds = model.predict(arr, verbose=0)[0]
print(f"Predicted: {LABELS[np.argmax(preds)]} ({np.max(preds)*100:.1f}%)")
```
| ### Python: SavedModel (TF Serving) | |
```python
import numpy as np
import tensorflow as tf
from huggingface_hub import snapshot_download
from PIL import Image

try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except (ImportError, ModuleNotFoundError):
    from tensorflow.keras.applications.efficientnet import preprocess_input

snapshot_download("0xgr3y/Arch-Building-Image-Classification", allow_patterns=["saved_model/*"], local_dir=".")

# Load SavedModel (created via model.export(): inference-only, no custom_objects needed)
loaded = tf.saved_model.load("saved_model")
# Artifacts written by model.export() usually expose the forward pass as `serve`;
# fall back to calling the loaded object directly if that endpoint is absent.
infer = getattr(loaded, "serve", loaded)

img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = tf.constant(np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0))

preds = infer(arr).numpy()[0]
top3_idx = np.argsort(preds)[::-1][:3]
for i in top3_idx:
    print(f"  Class {i}: {preds[i]*100:.1f}%")
```
| ### Python: safetensors (HF standard, cross-framework) | |
| > **Note:** safetensors stores raw weight tensors without architecture metadata. To load, reconstruct the architecture with `build_model.py` first, then map tensors manually. For most use cases, `.weights.h5` (via `ArchBuildingClassifier.from_weights()`) is simpler and equally clean. | |
```python
from PIL import Image
from safetensors.numpy import load_file

from build_model import ArchBuildingClassifier

# Reconstruct architecture
clf = ArchBuildingClassifier.build()

# Load safetensors tensors (a dict of name -> NumPy array)
tensors = load_file("fine_tuning_swa.safetensors")

# Map tensors to model weights (iterate layers, not .variables, for Keras 3 compatibility)
matched = 0
for layer in clf.keras_model.layers:
    for w in layer.weights:
        name = w.name.replace(':', '_').replace('/', '_')
        if name in tensors:
            w.assign(tensors[name])
            matched += 1
# A low count here means the tensor names did not line up with the model's weights
print(f"Assigned {matched} of {len(tensors)} tensors")

# Inference
label, confidence, top3 = clf.predict(Image.open("skyscraper_00000.jpg"))
```
| ## Inference Verification | |
| Keras vs TFLite consistency was verified on 8 random test samples (1 per class): | |
| | Metric | Result | | |
| |--------|--------| | |
| | Keras correct | 7/8 (88%) | | |
| | TFLite correct | 7/8 (88%) | | |
| | Keras vs TFLite match | **8/8 (100%)**, identical predictions | |
| | Keras inference speed | 358.0 ms | |
| | TFLite inference speed | 170.0 ms | |
| > The 1 misclassification (castle→barn, 65% confidence) is consistent with the 97.77% test accuracy. The 8/8 match indicates that TFLite conversion preserved the model's predictions on these samples. | |
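
A consistency check of this kind reduces to comparing arg-max predictions from the two runtimes. A minimal sketch, assuming the per-sample probability vectors from each runtime have already been collected into arrays (the helper name and the toy values are illustrative):

```python
import numpy as np

def prediction_agreement(probs_a, probs_b):
    """Fraction of samples where two models pick the same top-1 class."""
    top_a = np.argmax(probs_a, axis=-1)
    top_b = np.argmax(probs_b, axis=-1)
    return float(np.mean(top_a == top_b))

# Toy outputs for 3 samples and 3 classes; small numeric drift, same winners
keras_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
tflite_probs = np.array([[0.69, 0.21, 0.10], [0.12, 0.78, 0.10], [0.31, 0.29, 0.40]])
print(prediction_agreement(keras_probs, tflite_probs))  # 1.0
```

Agreement on the top-1 class does not require the probabilities to be bit-identical, so comparing with `np.allclose` as well gives a stricter check.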
|  | |
| ## Security Notice (PAIT-KERAS-301) | |
| The `.keras` files in this repository are flagged **"Unsafe"** by [Protect AI Guardian](https://protectai.com/insights/models/0xgr3y/Arch-Building-Image-Classification) (threat: PAIT-KERAS-301). This is a **structural false positive**, not a malware detection: | |
| - **What the scanner checks:** String-matching of `class_name` fields in the Keras v3 config against a whitelist of built-in Keras layers. | |
| - **Why flagged:** The model contains a custom layer (`GeMPooling`); a non-standard class name triggers the flag. | |
| - **What it does NOT check:** The scanner does not analyze the Python code of the custom class, does not look for `eval()`/`exec()`/`os.system()`, and does not detect actual malware. | |
| - **Other scanners:** VirusTotal, JFrog, and HF Picklescan all report clean. Only Protect AI flags this file. | |
| **The custom classes are safe and open source:** | |
| - `GeMPooling`: Generalized Mean Pooling (Radenovic et al., IEEE TPAMI 2018). Pure tensor ops: `tf.pow`, `tf.reduce_mean`, `tf.maximum`. | |
| - `FocalLoss`: Focal Loss (Lin et al., ICCV 2017). Pure tensor ops. | |
| - `DiscriminativeAdamW`: AdamW subclass with per-variable learning-rate scaling. No file I/O, no network calls, no arbitrary code. | |
| Full source code for all custom classes is available in [`build_model.py`](https://huggingface.co/0xgr3y/Arch-Building-Image-Classification/blob/main/build_model.py) and the training notebook for public audit. | |
| ## Multi-Format Deployment Guide | |
| The model is provided in multiple formats to suit different deployment scenarios. Formats marked ✓ are **not flagged** by Protect AI (no custom class serialization). | |
| | Format | File | Size | Protect AI | Inference Speed | Best For | |
| |--------|------|------|------------|-----------------|----------| |
| | **TF-Lite** ✓ | `tflite/model.tflite` | ~88 MB | ✓ Safe | **170.0 ms** (fastest) | Mobile, edge, embedded, HF Space | |
| | **SavedModel** ✓ | `saved_model/` | ~183 MB | ✓ Safe | – | TensorFlow Serving, cloud backend | |
| | **TFJS** ✓ | `tfjs_model/` | ~90 MB | ✓ Safe | – | Browser, Node.js (no backend) | |
| | **Weights H5** ✓ | `fine_tuning_swa.weights.h5` | ~158 MB | ✓ Safe | – | Programmatic load via `build_model.py` | |
| | **safetensors** ✓ | `fine_tuning_swa.safetensors` | ~157 MB | ✓ Safe | – | HF standard, cross-framework | |
| | **Build Script** ✓ | `build_model.py` | ~21 KB | ✓ Safe | – | Architecture reconstruction + `load_weights()` | |
| | **Keras** ℹ | `fine_tuning_swa.keras` | ~227 MB | ℹ Flagged | 358.0 ms | Developer reference, fine-tuning | |
| ### Load Examples | |
| See **Usage** section above for complete load + inference examples for each format. | |
| ## Intended Use | |
| - Architectural style classification from building photographs | |
| - Educational tool for architecture recognition | |
| - Research baseline for fine-grained image classification (FGIC) | |
| - Transfer learning experiments on architectural imagery | |
| ## Limitations | |
| - Trained on Pexels stock photography; performance may differ on user-generated or field photographs | |
| - Limited to 8 architectural classes (barn, bridge, castle, mosque, skyscraper, stadium, temple, windmill) | |
| - Confusion pair analysis found **0 significant pairs** (threshold >5%), so all 8 classes are well-distinguished by the model; see `confusion_pairs.json` for details | |
| - Barn and windmill share 3 cross-class duplicates (0.02% of dataset), left as-is due to negligible impact | |
| - Inference confidence can be low on atypical examples | |
|  | |
| ## Ethical Considerations | |
| - All training images sourced from [Pexels.com](https://www.pexels.com) under the Pexels License (free for commercial use, no attribution required). No copyrighted or personally identifiable images were used. | |
| - The dataset contains only photographs of buildings and structures; no people, faces, or private property are the subject of classification. | |
| - The model reflects the visual distribution of Pexels stock photography, which may over-represent Western and iconic architectural styles and under-represent vernacular or regional architecture. | |
| - The 8 class categories are broad and do not capture the full diversity of world architecture. Results should not be used to make definitive claims about architectural categorization. | |
| - URL pattern filtering during dataset collection explicitly excluded AI-generated art, illustrations, and non-photographic content to ensure authenticity. | |
| ## Links | |
| - **Gradio Space (Live):** [arch-building-classifier Space](https://huggingface.co/spaces/0xgr3y/arch-building-classifier) | |
| - **Dataset Studio:** [0xgr3y/arch-building-dataset](https://huggingface.co/datasets/0xgr3y/arch-building-dataset) | |
| - **GitHub Repository:** [arcxteam/building-architectural-image-classifier](https://github.com/arcxteam/building-architectural-image-classifier) | |
| ## References | |
| 1. Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller Models and Faster Training. *ICML 2021*. [arXiv:2104.00298](https://arxiv.org/abs/2104.00298) | |
| 2. Radenovic, F., Tolias, G., & Chum, O. (2018). Fine-Tuning CNN Image Retrieval with No Human Annotation. *IEEE TPAMI*. [arXiv:1711.02512](https://arxiv.org/abs/1711.02512) | |
| 3. Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollar, P. (2017). Focal Loss for Dense Object Detection. *ICCV 2017*. [arXiv:1708.02002](https://arxiv.org/abs/1708.02002) | |
| 4. Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging Weights Leads to Wider Optima and Better Generalization. *UAI 2018*. [arXiv:1803.05407](https://arxiv.org/abs/1803.05407) | |
| 5. Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. *ICLR 2018*. [arXiv:1710.09412](https://arxiv.org/abs/1710.09412) | |
| 6. Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. *ICCV 2019*. [arXiv:1905.04899](https://arxiv.org/abs/1905.04899) | |
| 7. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. *CVPR 2016*. [arXiv:1512.00567](https://arxiv.org/abs/1512.00567) | |
| 8. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How Transferable Are Features in Deep Neural Networks? *NeurIPS 2014*. [arXiv:1411.1792](https://arxiv.org/abs/1411.1792) | |
| 9. Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. *ACL 2018*. [arXiv:1801.06146](https://arxiv.org/abs/1801.06146) | |
| 10. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. *JMLR*, 15(56), 1929–1958. [http://jmlr.org/papers/v15/srivastava14a.html](http://jmlr.org/papers/v15/srivastava14a.html) | |
| 11. Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. *arXiv preprint*. [arXiv:1502.03167](https://arxiv.org/abs/1502.03167) | |
| 12. Tarvainen, A., & Valpola, H. (2017). Mean Teachers are Better Role Models: Weight-averaged Consistency Targets Improve Semi-supervised Deep Learning Results. *NeurIPS 2017*. [arXiv:1703.01780](https://arxiv.org/abs/1703.01780) | |
| 13. Perez, L., & Wang, J. (2017). The Effectiveness of Data Augmentation in Image Classification using Deep Learning. *arXiv preprint*. [arXiv:1712.04621](https://arxiv.org/abs/1712.04621) | |
| 14. Shanmugam, D., Blalock, D., Balakrishnan, G., Guttag, J., & Sarma, A. (2020). Towards Principled Test-Time Augmentation. *ICML 2020*. [PDF](https://dmshanmugam.github.io/pdfs/icml_2020_testaug.pdf) | |
| 15. Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic Gradient Descent with Warm Restarts. *ICLR 2017*. [arXiv:1608.03983](https://arxiv.org/abs/1608.03983) | |
| 16. Prechelt, L. (1998). Automatic Early Stopping Using Cross Validation: Quantifying the Criteria. *Neural Networks*, 11(4), 761–767. [https://doi.org/10.1016/S0893-6080(98)00010-0](https://doi.org/10.1016/S0893-6080(98)00010-0) | |
| 17. Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. *ICML 2017*. [arXiv:1706.04599](https://arxiv.org/abs/1706.04599) | |
| 18. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. *ICCV 2017*. [arXiv:1610.02391](https://arxiv.org/abs/1610.02391) | |
| 19. van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. *JMLR*, 9(Nov), 2579–2605. [http://jmlr.org/papers/v9/vandermaaten08a.html](http://jmlr.org/papers/v9/vandermaaten08a.html) | |
| 20. Hand, D. J., & Till, R. J. (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. *Machine Learning*, 45(2), 171–186. [https://doi.org/10.1023/A:1010920819831](https://doi.org/10.1023/A:1010920819831) | |
| 21. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. *IJCV*, 115(3), 211–252. [arXiv:1409.0575](https://arxiv.org/abs/1409.0575) | |
| 22. Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. *NeurIPS 2017*. [arXiv:1612.01474](https://arxiv.org/abs/1612.01474) | |
| ## Citation | |
```bibtex
@misc{saugani2026_arch_building,
  title={Fine-Grained Image Classification of World Architecture:
         An EfficientNetV2-S Transfer Learning Approach with Layered Regularization},
  author={Saugani},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/0xgr3y/Arch-Building-Image-Classification}
}
```