---
license: apache-2.0
pipeline_tag: image-classification
tags:
- efficientnetv2
- fgic
- safetensors
- transfer-learning
- gem-pooling
- focal-loss
- swa
- grad-cam
- calibration
- temperature-scaling
- computer-vision
- tensorflow.js
library_name: keras
language: en
datasets:
- 0xgr3y/arch-building-dataset
model-index:
- name: Architectural Building Image Classifier
  results:
  - task:
      type: image-classification
      name: Fine-Grained Image Classification
    dataset:
      type: imagefolder
      name: arch-building-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.9777
      name: Test Accuracy
    - type: accuracy
      value: 0.9836
      name: Validation Accuracy (SWA)
    - type: accuracy
      value: 0.9799
      name: TTA Accuracy
    - type: f1
      value: 0.9777
      name: Macro F1
    - type: precision
      value: 0.9777
      name: Macro Precision
    - type: recall
      value: 0.9777
      name: Macro Recall
    - type: roc_auc
      value: 0.9985
      name: Macro ROC-AUC (OvR)
---

![Arch-Building-Image-Classification](results/greyscope-labs-architecture-classification-efficientnetv2.jpg)

# Fine-Grained Image Classification of World Architecture: An EfficientNetV2-S Transfer Learning Approach with Layered Regularization

### Architectural Building Image Classifier

Fine-Grained Image Classification (FGIC) of world architectural buildings using CNN transfer learning with EfficientNetV2-S, enhanced with GeM Pooling, Focal Loss, Discriminative AdamW (LR), Stochastic Weight Averaging (SWA), Grad-CAM explainability, and calibration analysis.

| Property | Value |
|----------|-------|
| Architecture | EfficientNetV2-S + GeM Pooling + Focal Loss + SWA |
| Task | Fine-Grained Image Classification (FGIC) |
| Test Accuracy | 97.77% |
| Classes | 8 (barn, bridge, castle, mosque, skyscraper, stadium, temple, windmill) |
| Input Size | 320 × 320 pixels |
| Parameters | 23,350,633 |
| Framework | TensorFlow / Keras 3 |
| License | Apache-2.0 |

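
As a quick sanity check, the parameter total listed above can be recomputed from the per-layer figures reported in the Architecture section of this card. The snippet below is a minimal sketch: the backbone count is copied from the card as given, the helper names are illustrative, and the formulas are the standard ones for Conv2D, Dense, and BatchNormalization layers.

```python
# Recompute the parameter counts from standard layer formulas.
# Helper names are illustrative; the backbone figure is copied from this card.

def conv2d_params(kernel, in_ch, out_ch):
    # kernel x kernel weights per (input, output) channel pair, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

def batchnorm_params(channels):
    # gamma and beta (trainable) + moving mean and variance (non-trainable)
    return 4 * channels

BACKBONE = 20_331_360  # EfficientNetV2-S, as reported in the Architecture table

head = {
    "conv2d_256": conv2d_params(3, 1280, 256),
    "bn_conv": batchnorm_params(256),
    "gem_p": 1,  # single learnable power parameter
    "dense_256": dense_params(256, 256),
    "bn_dense": batchnorm_params(256),
    "dense_8": dense_params(256, 8),
}

total = BACKBONE + sum(head.values())

# Phase 1: the backbone is frozen, and the moving statistics of the two
# head BatchNormalization layers are never trainable.
head_non_trainable = 2 * 256 + 2 * 256
trainable_phase1 = sum(head.values()) - head_non_trainable

print(total)             # 23350633
print(trainable_phase1)  # 3018249
```

Both values agree with the totals reported in the card (23,350,633 parameters overall, 3,018,249 trainable in Phase 1).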
## Model Description

A fine-grained image classification model for world architectural buildings. Built on EfficientNetV2-S pretrained on ImageNet, enhanced with GeM Pooling (learnable generalized mean pooling), Focal Loss, Discriminative AdamW, and Stochastic Weight Averaging (SWA). Extended with Grad-CAM explainability visualization, ROC-AUC evaluation, ECE calibration analysis, and t-SNE embedding visualization.

**Key architectural contributions:**

- **GeM Pooling** (Radenovic et al., IEEE TPAMI 2018): replaces global average pooling with a learnable power parameter (p=3.0) that emphasizes high-activation features, yielding stronger discriminative representations for FGIC tasks
- **Focal Loss** (Lin et al., ICCV 2017, gamma=2.0): down-weights well-classified examples to focus gradient updates on hard-to-classify building pairs
- **DiscriminativeAdamW LR**: extends AdamW with per-variable LR scaling on block6 (×0.1) via an `update_step` override, combined with selective fine-tuning (block6+top_conv unfrozen, BN frozen). The scaling makes the updates genuinely discriminative: block6 variables receive a 10× smaller learning rate than head variables (117 variables in total: 105 block6 + 12 head)
- **Mixup + CutMix** (Zhang et al., ICLR 2018; Yun et al., ICCV 2019): alternating per batch (50/50) between Mixup (alpha=0.2, linear interpolation) and CutMix (alpha=1.0, spatial patch).
Applied only in Phase 1 training to regularize head learning
- **Selective Unfreeze** (Yosinski et al., 2014): Phase 2 unfreezes block6+top_conv layers (180/513 EfficientNetV2-S layers) while keeping BatchNormalization frozen to preserve pretrained statistics
- **SWA with BN re-estimation** (Izmailov et al., UAI 2018): 10-epoch post-training weight averaging with constant LR 1e-4, followed by 100-step batch normalization statistics re-estimation (3,200 images)
- **Test-Time Augmentation**: 6 variations averaged at inference (original, horizontal flip, center crop 85%, center crop 70%, corner crop top-left 80%, corner crop bottom-right 80%). Yields a +0.22 percentage-point accuracy improvement (97.77% → 97.99%)
- **Grad-CAM** (Selvaraju et al., ICCV 2017): gradient-weighted class activation mapping for explainability, targeting *top_conv* (last Conv2D layer of EfficientNetV2-S)
- **ECE Calibration** (Guo et al., ICML 2017): Expected Calibration Error with a 15-bin reliability diagram to assess prediction confidence reliability
- **Temperature Scaling** (Guo et al., ICML 2017): post-hoc calibration via a scalar temperature parameter T optimized on the validation set (NLL minimization).
T=0.54 reduces ECE from 12.04% (underconfident due to Label Smoothing) to 0.53%; it is applied at inference via `softmax(log(probs) / T)`

## Architecture

```
Input (320, 320, 3)
        │
EfficientNetV2-S (ImageNet pretrained, 513 layers, 20.33M params)
        │
Conv2D(256, 3×3, ReLU, padding=same)      → 2,949,376 params
BatchNormalization                        → 1,024 params
MaxPooling2D(2×2)                         → 0 params
        │
GeM Pooling(p=3.0, eps=1e-6, learnable)   → 1 param
        │
Dense(256, ReLU)                          → 65,792 params
BatchNormalization                        → 1,024 params
Dropout(0.4)                              → 0 params
        │
Dense(8, Softmax)                         → 2,056 params
        │
Output (8 classes)
```

| Component | Output Shape | Parameters |
|-----------|-------------|------------|
| EfficientNetV2-S (Functional) | (None, 10, 10, 1280) | 20,331,360 |
| Conv2D 256 3×3 | (None, 10, 10, 256) | 2,949,376 |
| BatchNormalization | (None, 10, 10, 256) | 1,024 |
| MaxPooling2D 2×2 | (None, 5, 5, 256) | 0 |
| GeM Pooling p=3.0 | (None, 256) | 1 |
| Dense 256 ReLU | (None, 256) | 65,792 |
| BatchNormalization | (None, 256) | 1,024 |
| Dropout 0.4 | (None, 256) | 0 |
| Dense 8 Softmax | (None, 8) | 2,056 |
| **Total** | | **23,350,633** |
| Trainable (Phase 1) | | **3,018,249** (11.51 MB) |
| Trainable (Phase 2) | | **17,810,225** (67.94 MB) |
| Non-trainable (Phase 1) | | **20,332,384** (77.56 MB) |

## Performance

### Overall Metrics

| Metric | Value |
|--------|-------|
| Test Accuracy | 97.77% |
| Validation Accuracy (SWA) | 98.36% |
| Test-Time Augmentation | 97.99% |
| Test Loss | 0.4262 |
| Overfitting Gap (Train − Test) | 2.11% |
| Macro Avg Precision | 0.9777 |
| Macro Avg Recall | 0.9777 |
| Macro Avg F1-Score | 0.9777 |
| Top-2 Accuracy | 99.26% |
| Top-3 Accuracy | 99.70% |
| Macro ROC-AUC (OvR) | 0.9985 |
| ECE (15 bins) | 0.1204 (pre-T-scaling; post-T-scaling: 0.0053, T=0.54) |

### Per-Class Results

| Class | Precision | Recall | F1-Score | AUC (OvR) | Support |
|-------|-----------|--------|----------|-----------|---------|
| barn | 0.9760 | 0.9702 | 0.9731 | 0.9950 | 168 |
| bridge | 0.9591 | 0.9762 | 0.9676 | 0.9983 | 168 |
| castle | 0.9763 | 0.9821 | 0.9792 | 0.9996 | 168 |
| mosque | 0.9763 | 0.9821 | 0.9792 | 0.9987 | 168 |
| skyscraper | 0.9940 | 0.9940 | 0.9940 | 0.9999 | 168 |
| stadium | 0.9820 | 0.9762 | 0.9791 | 0.9999 | 168 |
| temple | 0.9816 | 0.9524 | 0.9668 | 0.9976 | 168 |
| windmill | 0.9765 | 0.9881 | 0.9822 | 0.9987 | 168 |
| **Macro Avg** | **0.9777** | **0.9777** | **0.9777** | **0.9985** | **1,344** |

### Model Selection

Four candidate models were evaluated on the validation set:

| Checkpoint | Val Accuracy | Val Loss | Description |
|------------|-------------|----------|-------------|
| `head_training.keras` | 92.34% | 1.0109 | Phase 1 checkpoint (backbone frozen) |
| `fine_tuning.keras` | 96.28% | 0.5655 | Phase 2 checkpoint (block6+top_conv unfrozen) |
| `fine_tuning_ema.keras` | 93.53% | 0.6007 | Phase 2 EMA (per-step Polyak averaging) |
| **`fine_tuning_swa.keras`** | **98.36%** | **0.4109** | **SWA averaged weights ← SELECTED** |

### Training Progression

| Phase | Epoch | Train Acc | Val Accuracy | Val Loss |
|-------|-------|-----------|-------------|----------|
| Phase 1 (Head Training) | 1 | 56.96% | 92.19% | 1.0079 |
| Phase 2 (Selective Fine-Tuning) | 1 | 84.96% | 96.21% | 0.5656 |
| SWA | 1 | 90.83% | 95.76% | 0.5831 |
| SWA | 2 | 94.07% | 97.62% | 0.5116 |
| SWA | 3 | 95.36% | 97.69% | 0.4748 |
| SWA | 4 | 96.56% | 96.95% | 0.4390 |
| SWA | 5 | 97.18% | 97.47% | 0.4490 |
| SWA | 6 | 97.76% | 97.84% | 0.4416 |
| SWA | 7 | 97.91% | 98.14% | 0.4055 |
| SWA | 8 | 98.19% | 97.32% | 0.4359 |
| SWA | 9 | 98.14% | 97.02% | 0.4519 |
| SWA | 10 | 98.59% | 97.54% | 0.4226 |
| **SWA + BN (final)** | n/a | n/a | **98.36%** | **0.4109** |

> Phase 1 and Phase 2 each stopped after 1
epoch via `myCallback` (custom early stopping at target accuracy: 85% Phase 1, 92% Phase 2). SWA ran 10 epochs with constant LR 1e-4, followed by BN re-estimation (100 steps, 3,200 images). Values shown are training-time metrics from the progress bar; checkpoint evaluation values may differ slightly (see Model Selection table above).

![Training Curves](results/training_curves.png)

![Confusion Matrix](results/confusion_matrix.png)

![Per-Class Accuracy](results/per_class_accuracy.png)

![Confidence Per Class](results/confidence_per_class.png)

![t-SNE Embedding](results/tsne_embedding.png)

![Grad-CAM Heatmaps](results/gradcam_heatmaps.png)

## Training Details

### Training Strategy

Two-phase progressive training with SWA post-processing:

| Phase | Description | Backbone | Optimizer | LR | Max Epochs | Actual Epochs | CutMix+Mixup | FocalLoss LS |
|-------|-------------|----------|-----------|-----|-----------|---------------|---------------|-------------|
| **Phase 1**: Feature Extraction | Train custom head only | Frozen (all) | AdamW (wd=2e-5) | 0.001 + CosineDecay + Warmup 3ep | 25 | 1 ¹ | Yes (50/50 alternation) | 0.1 |
| **Phase 2**: Selective Fine-Tuning | Load head_training → fine-tune | block6 + top_conv unfrozen (BN frozen) | DiscriminativeAdamW (block6=0.1×) | 3e-4 + CosineDecay + Warmup 5ep | 50 | 1 + 10 SWA ² | No | 0.05 |

> ¹ Phase 1 stops once `val_accuracy ≥ 85%` is reached (`myCallback` threshold).
>
> ² Phase 2 stops once `val_accuracy ≥ 92%` is reached (`myCallback` threshold), followed by 10 SWA epochs (constant LR 1e-4).

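
The "CosineDecay + Warmup" entries in the Training Strategy table can be illustrated with a small, framework-free function. This is a sketch under assumptions, not the card's `WarmupCosineDecay` implementation: it assumes a linear warmup from 0 to the base learning rate, a cosine decay to 0 over the remaining steps, per-batch stepping, and 336 steps per epoch (10,752 training images at batch size 32). The function name `warmup_cosine_lr` is hypothetical.

```python
import math

def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step`: linear warmup, then cosine decay to zero (assumed form)."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

STEPS_PER_EPOCH = 10_752 // 32  # 336 steps, assuming every batch is full

# Phase 1 settings from the table: base LR 0.001, 3 warmup epochs, 25 max epochs.
phase1 = dict(
    base_lr=1e-3,
    warmup_steps=3 * STEPS_PER_EPOCH,
    total_steps=25 * STEPS_PER_EPOCH,
)

for epoch in (0, 1, 3, 14, 25):
    lr = warmup_cosine_lr(epoch * STEPS_PER_EPOCH, **phase1)
    print(f"start of epoch {epoch:2d}: lr = {lr:.6f}")
```

Under these assumptions, a run that stops after a single epoch, as both phases did here, would still be on the warmup ramp and would not have reached the base learning rate.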
### Hyperparameters

| Parameter | Phase 1 | Phase 2 |
|-----------|---------|---------|
| Optimizer | AdamW | DiscriminativeAdamW |
| Learning Rate | 0.001 | 3×10⁻⁴ |
| LR Schedule | WarmupCosineDecay (warmup=3) | WarmupCosineDecay (warmup=5) |
| Weight Decay | 2×10⁻⁵ | 2×10⁻⁵ |
| LR Multiplier (block6) | n/a | 0.1× (LR scaling via `update_step`, truly discriminative) |
| LR Multiplier (top_conv+head) | n/a | 1.0× |
| Loss | FocalLoss (gamma=2.0, LS=0.1) | FocalLoss (gamma=2.0, LS=0.05) |
| Batch Size | 32 | 32 |
| Early Stopping Patience | 7 | 12 |
| myCallback Threshold | val_acc ≥ 0.85 | val_acc ≥ 0.92 |
| EMA Decay (per-step) | 0.999 | 0.999 |
| SWA Epochs | n/a | 10 (post-training) |
| SWA LR | n/a | 1×10⁻⁴ (constant) |
| BN Re-estimation Steps | n/a | 100 |
| CutMix (alpha=1.0) | Yes (50% batches) | No |
| Mixup (alpha=0.2) | Yes (50% batches) | No |
| Hardware | 2× Tesla T4 (MirroredStrategy) | 2× Tesla T4 (MirroredStrategy) |

### Regularization Strategy

| Technique | Implementation | Reference |
|-----------|---------------|-----------|
| Transfer Learning | EfficientNetV2-S backbone frozen in Phase 1 | Yosinski et al., NeurIPS 2014 |
| Selective Fine-Tuning | Unfreeze block6+top_conv only, BN stays frozen | Howard & Ruder, ACL 2018 |
| Discriminative LR Scaling | block6 LR×0.1 via `update_step` (truly discriminative: 10× smaller updates for pretrained features) | Howard & Ruder, ACL 2018 |
| CutMix + Mixup | Alternation per batch (50/50), Phase 1 only | Yun et al., ICCV 2019; Zhang et al., ICLR 2018 |
| Focal Loss | gamma=2.0, down-weights easy examples | Lin et al., ICCV 2017 |
| Label Smoothing | 0.1 (Phase 1) → 0.05 (Phase 2) | Szegedy et al., CVPR 2016 |
| GeM Pooling | p=3.0 learnable, replaces GAP | Radenovic et al., IEEE TPAMI 2018 |
| Dropout | 0.4 after Dense(256)+BN | Srivastava et al., JMLR 2014 |
| Batch Normalization | After Conv2D and Dense; frozen during fine-tuning | Ioffe & Szegedy, arXiv 2015 |
| EMA (per-step) | Shadow weights, decay=0.999, Polyak averaging | Tarvainen & Valpola, NeurIPS 2017 |
| SWA | 10-epoch post-training, constant LR 1e-4 | Izmailov et al., UAI 2018 |
| Data Augmentation | Rotation ±15°, shift ±10%, shear ±0.1 rad, zoom ±20%, brightness 0.75–1.15, channel shift ±10.0, horizontal flip | Perez & Wang, arXiv 2017 |
| Random Erasing | p=0.5, area [0.02–0.15], aspect [0.3–3.3], applied pre-normalization | Zhong et al., AAAI 2020 |
| Test-Time Augmentation | 6 augmentation variants, averaged | Shanmugam et al., ICML 2020 |
| WarmupCosineDecay | Linear warmup + cosine annealing | Loshchilov & Hutter, ICLR 2017 (SGDR) |
| Early Stopping | Patience 7 (Phase 1) / 12 (Phase 2) | Prechelt, Neural Networks 1998 |

### Dataset

See the dataset curation page, [World Architectural Buildings Dataset for Multi‑Class Image Classification](https://huggingface.co/datasets/0xgr3y/arch-building-dataset): 13,440 images (8 classes × 1,680, balanced) sourced from Pexels with perceptual (pHash) and exact (SHA256) deduplication.

| Split | Images | Percentage |
|-------|--------|------------|
| Train | 10,752 | 80% |
| Validation | 1,344 | 10% |
| Test | 1,344 | 10% |

### Data Preprocessing

- **Normalization:** `preprocess_input` from `tf.keras.applications.efficientnet_v2` (ImageNet distribution)
- **Input resolution:** 320×320 (higher than the ImageNet default of 224×224, to capture fine-grained architectural details such as textures, ornaments, and facade patterns)
- **Augmentation:** Applied to training set only.
Validation and test sets use clean preprocessing
- **Split method:** `splitfolders.ratio` from `dataset/`, seed=42

## Files

| Category | Files |
|----------|-------|
| **Model (best)** | `fine_tuning_swa.keras` (227 MB) · `.weights.h5` (158 MB) · `.safetensors` (157 MB) |
| **Code** | `build_model.py` (21 KB): architecture + CLI inference |
| **Config** | `config.json` · `label_mapping.json` · `preprocessor_config.json` |
| **Evaluation** | `calibration_data.json` · `model_benchmark.json` · `confusion_pairs.json` · `class_confidence_stats.json` · `temperature_config.json` |
| **Deployment** | `saved_model/` (183 MB) · `tflite/` (88 MB) · `tfjs_model/` (90 MB, 23 shards) |
| **Results** | `results/`: 12 PNG (augmentation, reliability-diagram, training curves, confusion matrix, ROC, t-SNE, Grad-CAM, etc.) |
| **Archive** | `models_keras/`: 3 checkpoints (head_training, fine_tuning, fine_tuning_ema) |

## Usage

### Gradio Space

Try the live building classifier: [Architecture Building Image Classifier with Space](https://huggingface.co/spaces/0xgr3y/arch-building-classifier)

### Python: build_model.py (recommended)

`build_model.py` is a standalone module that provides:

- **Custom class definitions** (`GeMPooling`, `FocalLoss`, `DiscriminativeAdamW`) with `@register_keras_serializable`: importing the module registers all custom classes globally, so `load_model()` works without explicit `custom_objects`.
- **`ArchBuildingClassifier`**: high-level wrapper class with `build()`, `from_weights()`, `from_keras()`, `predict()`, `predict_batch()` methods.
- **`CUSTOM_OBJECTS`** dict: fallback for explicit `custom_objects=` in `load_model()`.
- **`build_model()`**: backward-compatible function that returns a raw `tf.keras.Model`.

Place `build_model.py` in the same directory as your script or add its location to `PYTHONPATH`.

> **Note:** Filenames below use `fine_tuning_swa` as an example.
The actual best checkpoint filename depends on training results; check the repo for the actual `.keras`, `.weights.h5`, and `.safetensors` filenames.

```python
from build_model import ArchBuildingClassifier
from huggingface_hub import hf_hub_download
from PIL import Image

# Download weights (clean format)
weights_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "fine_tuning_swa.weights.h5")

# Load model: architecture + weights
clf = ArchBuildingClassifier.from_weights(weights_path)

# Inference
label, confidence, top3 = clf.predict(Image.open("skyscraper_00000.jpg"))
print(f"Predicted: {label} ({confidence:.1%})")
for cls, prob in top3:
    print(f"  {cls}: {prob:.1%}")
```

### Python: TF-Lite (fastest inference)

```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from PIL import Image

try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except (ImportError, ModuleNotFoundError):
    from tensorflow.keras.applications.efficientnet import preprocess_input

# Download the TFLite model and the label mapping
model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "tflite/model.tflite")
labels_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "label_mapping.json")
with open(labels_path) as f:
    LABELS = json.load(f)["labels"]

interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Preprocess: RGB, 320×320, float32, batch dimension
img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)

interpreter.set_tensor(input_details[0]["index"], arr)
interpreter.invoke()
preds = interpreter.get_tensor(output_details[0]["index"])[0]

top3_idx = np.argsort(preds)[::-1][:3]
for i in top3_idx:
    print(f"  {LABELS[i]}: {preds[i]*100:.1f}%")
```

### Python: Keras (convenient)

```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from PIL import Image

import build_model  # registers custom classes via @register_keras_serializable

try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except (ImportError, ModuleNotFoundError):
    from tensorflow.keras.applications.efficientnet import preprocess_input

model_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "fine_tuning_swa.keras")
labels_path = hf_hub_download("0xgr3y/Arch-Building-Image-Classification", "label_mapping.json")

model = tf.keras.models.load_model(model_path, compile=False)  # custom_objects not needed
with open(labels_path) as f:
    LABELS = json.load(f)["labels"]

img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0)
preds = model.predict(arr, verbose=0)[0]
print(f"Predicted: {LABELS[np.argmax(preds)]} ({np.max(preds)*100:.1f}%)")
```

### Python: SavedModel (TF Serving)

```python
import numpy as np
import tensorflow as tf
from huggingface_hub import snapshot_download
from PIL import Image

try:
    from tensorflow.keras.applications.efficientnet_v2 import preprocess_input
except (ImportError, ModuleNotFoundError):
    from tensorflow.keras.applications.efficientnet import preprocess_input

snapshot_download("0xgr3y/Arch-Building-Image-Classification",
                  allow_patterns=["saved_model/*"], local_dir=".")

# Load SavedModel (created via model.export(): inference-only, no custom_objects needed)
loaded = tf.saved_model.load("saved_model")

img = Image.open("skyscraper_00000.jpg").convert("RGB").resize((320, 320))
arr = tf.constant(np.expand_dims(preprocess_input(np.array(img, dtype=np.float32)), axis=0))

# Artifacts written by model.export() expose inference through the `serve` endpoint
preds = loaded.serve(arr).numpy()[0]

top3_idx = np.argsort(preds)[::-1][:3]
for i in top3_idx:
    print(f"  Class {i}: {preds[i]*100:.1f}%")
```

### Python: safetensors (HF standard, cross-framework)

> **Note:** safetensors stores raw weight
tensors without architecture metadata. To load, reconstruct the architecture with `build_model.py` first, then map tensors manually. For most use cases, `.weights.h5` (via `ArchBuildingClassifier.from_weights()`) is simpler and equally clean.

```python
from safetensors.numpy import load_file
from build_model import ArchBuildingClassifier
from PIL import Image

# Reconstruct architecture
clf = ArchBuildingClassifier.build()

# Load safetensors tensors
tensors = load_file("fine_tuning_swa.safetensors")

# Map tensors to model weights (iterate layers, not .variables; Keras 3 compatible)
for layer in clf.keras_model.layers:
    for w in layer.weights:
        name = w.name.replace(':', '_').replace('/', '_')
        if name in tensors:
            w.assign(tensors[name])

# Inference
label, confidence, top3 = clf.predict(Image.open("skyscraper_00000.jpg"))
```

## Inference Verification

Keras vs TFLite consistency was verified on 8 random test samples (1 per class):

| Metric | Result |
|--------|--------|
| Keras correct | 7/8 (88%) |
| TFLite correct | 7/8 (88%) |
| Keras vs TFLite match | **8/8 (100%)**, identical predictions |
| Keras inference speed | 358.0 ms |
| TFLite inference speed | 170.0 ms |

> With only 8 samples, the single misclassification (castle→barn, 65% confidence) is not at odds with the 97.77% test accuracy. The 8/8 match indicates that TFLite conversion preserved the predicted class on every sampled image.

![TFLite Inference](results/inference_tflite.png)

## Security Notice (PAIT-KERAS-301)

The `.keras` files in this repository are flagged **"Unsafe"** by [Protect AI Guardian](https://protectai.com/insights/models/0xgr3y/Arch-Building-Image-Classification) (threat: PAIT-KERAS-301). This is a **structural false positive**, not a malware detection:

- **What the scanner checks:** String-matching of `class_name` fields in the Keras v3 config against a whitelist of built-in Keras layers.
- **Why flagged:** The model contains a custom layer (`GeMPooling`); a non-standard class name triggers the flag.
- **What it does NOT check:** The scanner does not analyze the Python code of the custom class, does not look for `eval()`/`exec()`/`os.system()`, and does not detect actual malware.
- **Other scanners:** VirusTotal, JFrog, and HF Picklescan all report the file as clean; only Protect AI flags it.

**The custom classes are safe and open source:**

- `GeMPooling`: Generalized Mean Pooling (Radenovic et al., IEEE TPAMI 2018). Pure tensor ops: `tf.pow`, `tf.reduce_mean`, `tf.maximum`.
- `FocalLoss`: Focal Loss (Lin et al., ICCV 2017). Pure tensor ops.
- `DiscriminativeAdamW`: AdamW subclass with per-variable learning-rate scaling. No file I/O, no network calls, no arbitrary code.

Full source code for all custom classes is available in [`build_model.py`](https://huggingface.co/0xgr3y/Arch-Building-Image-Classification/blob/main/build_model.py) and the training notebook for public audit.

## Multi-Format Deployment Guide

The model is provided in multiple formats to suit different deployment scenarios. Formats marked ✓ are **not flagged** by Protect AI (no custom class serialization).


| Format | File | Size | Protect AI | Inference Speed | Best For |
|--------|------|------|------------|-----------------|----------|
| **TF-Lite** ✓ | `tflite/model.tflite` | ~88 MB | ✓ Safe | **170.0 ms** (fastest) | Mobile, edge, embedded, HF Space |
| **SavedModel** ✓ | `saved_model/` | ~183 MB | ✓ Safe | n/a | TensorFlow Serving, cloud backend |
| **TFJS** ✓ | `tfjs_model/` | ~90 MB | ✓ Safe | n/a | Browser, Node.js (no backend) |
| **Weights H5** ✓ | `fine_tuning_swa.weights.h5` | ~158 MB | ✓ Safe | n/a | Programmatic load via `build_model.py` |
| **safetensors** ✓ | `fine_tuning_swa.safetensors` | ~157 MB | ✓ Safe | n/a | HF standard, cross-framework |
| **Build Script** ✓ | `build_model.py` | ~21 KB | ✓ Safe | n/a | Architecture reconstruction + `load_weights()` |
| **Keras** ℹ | `fine_tuning_swa.keras` | ~227 MB | ℹ Flagged | 358.0 ms | Developer reference, fine-tuning |

### Load Examples

See **Usage** section above for complete load + inference examples for each format.

## Intended Use

- Architectural style classification from building photographs
- Educational tool for architecture recognition
- Research baseline for fine-grained image classification (FGIC)
- Transfer learning experiments on architectural imagery

## Limitations

- Trained on Pexels stock photography; performance may differ on user-generated or field photographs
- Limited to 8 architectural classes (barn, bridge, castle, mosque, skyscraper, stadium, temple, windmill)
- Confusion pair analysis found **0 significant pairs** (threshold >5%); all 8 classes are well-distinguished by the model.
See `confusion_pairs.json` for details
- Barn and windmill share 3 cross-class duplicates (0.02% of dataset), left as-is due to negligible impact
- Inference confidence can be low on atypical examples

![Misclassification Examples](results/misclassification_examples.png)

## Ethical Considerations

- All training images sourced from [Pexels.com](https://www.pexels.com) under the Pexels License (free for commercial use, no attribution required). No copyrighted or personally identifiable images were used.
- The dataset contains only photographs of buildings and structures; no people, faces, or private property are the subject of classification.
- The model reflects the visual distribution of Pexels stock photography, which may over-represent Western and iconic architectural styles and under-represent vernacular or regional architecture.
- The 8 class categories are broad and do not capture the full diversity of world architecture. Results should not be used to make definitive claims about architectural categorization.
- URL pattern filtering during dataset collection explicitly excluded AI-generated art, illustrations, and non-photographic content to ensure authenticity.

## Links

- **Gradio Space (Live):** [arch-building-classifier Space](https://huggingface.co/spaces/0xgr3y/arch-building-classifier)
- **Dataset Studio:** [0xgr3y/arch-building-dataset](https://huggingface.co/datasets/0xgr3y/arch-building-dataset)
- **GitHub Repository:** [arcxteam/building-architectural-image-classifier](https://github.com/arcxteam/building-architectural-image-classifier)

## References

1. Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller Models and Faster Training. *ICML 2021*. [arXiv:2104.00298](https://arxiv.org/abs/2104.00298)
2. Radenovic, F., Tolias, G., & Chum, O. (2018). Fine-Tuning CNN Image Retrieval with No Human Annotation. *IEEE TPAMI*. [arXiv:1711.02512](https://arxiv.org/abs/1711.02512)
3. Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollar, P. (2017).
Focal Loss for Dense Object Detection. *ICCV 2017*. [arXiv:1708.02002](https://arxiv.org/abs/1708.02002)
4. Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging Weights Leads to Wider Optima and Better Generalization. *UAI 2018*. [arXiv:1803.05407](https://arxiv.org/abs/1803.05407)
5. Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond Empirical Risk Minimization. *ICLR 2018*. [arXiv:1710.09412](https://arxiv.org/abs/1710.09412)
6. Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., & Yoo, Y. (2019). CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. *ICCV 2019*. [arXiv:1905.04899](https://arxiv.org/abs/1905.04899)
7. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception Architecture for Computer Vision. *CVPR 2016*. [arXiv:1512.00567](https://arxiv.org/abs/1512.00567)
8. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How Transferable Are Features in Deep Neural Networks? *NeurIPS 2014*. [arXiv:1411.1792](https://arxiv.org/abs/1411.1792)
9. Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. *ACL 2018*. [arXiv:1801.06146](https://arxiv.org/abs/1801.06146)
10. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. *JMLR*, 15(56), 1929–1958. [http://jmlr.org/papers/v15/srivastava14a.html](http://jmlr.org/papers/v15/srivastava14a.html)
11. Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. *arXiv preprint*. [arXiv:1502.03167](https://arxiv.org/abs/1502.03167)
12. Tarvainen, A., & Valpola, H. (2017). Mean Teachers are Better Role Models: Weight-averaged Consistency Targets Improve Semi-supervised Deep Learning Results. *NeurIPS 2017*. [arXiv:1703.01780](https://arxiv.org/abs/1703.01780)
13. Perez, L., & Wang, J.
(2017). The Effectiveness of Data Augmentation in Image Classification using Deep Learning. *arXiv preprint*. [arXiv:1712.04621](https://arxiv.org/abs/1712.04621)
14. Shanmugam, D., Blalock, D., Balakrishnan, G., Guttag, J., & Sarma, A. (2020). Towards Principled Test-Time Augmentation. *ICML 2020*. [PDF](https://dmshanmugam.github.io/pdfs/icml_2020_testaug.pdf)
15. Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic Gradient Descent with Warm Restarts. *ICLR 2017*. [arXiv:1608.03983](https://arxiv.org/abs/1608.03983)
16. Prechelt, L. (1998). Automatic Early Stopping Using Cross Validation: Quantifying the Criteria. *Neural Networks*, 11(4), 761–767. [https://doi.org/10.1016/S0893-6080(98)00010-0](https://doi.org/10.1016/S0893-6080(98)00010-0)
17. Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017). On Calibration of Modern Neural Networks. *ICML 2017*. [arXiv:1706.04599](https://arxiv.org/abs/1706.04599)
18. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. *ICCV 2017*. [arXiv:1610.02391](https://arxiv.org/abs/1610.02391)
19. van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. *JMLR*, 9(Nov), 2579–2605. [http://jmlr.org/papers/v9/vandermaaten08a.html](http://jmlr.org/papers/v9/vandermaaten08a.html)
20. Hand, D. J., & Till, R. J. (2001). A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems. *Machine Learning*, 45(2), 171–186. [https://doi.org/10.1023/A:1010920819831](https://doi.org/10.1023/A:1010920819831)
21. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. *IJCV*, 115(3), 211–252. [arXiv:1409.0575](https://arxiv.org/abs/1409.0575)
22. Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles.
*NeurIPS 2017*. [arXiv:1612.01474](https://arxiv.org/abs/1612.01474)

## Citation

```bibtex
@misc{saugani2026_arch_building,
  title={Fine-Grained Image Classification of World Architecture: An EfficientNetV2-S Transfer Learning Approach with Layered Regularization},
  author={Saugani},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/0xgr3y/Arch-Building-Image-Classification}
}
```