clip-arch-classifier

Architectural style image classifier built on frozen CLIP ViT-B/32 embeddings.

Classifies exterior building photographs into 26 architectural styles. The classifier is a LinearSVC fitted on 512-dimensional, L2-normalised CLIP image embeddings. A Platt calibrator (logistic regression) on top converts the SVM decision scores into class probabilities.


Model description

| Component | Detail |
|---|---|
| Feature extractor | CLIP ViT-B/32 (openai/clip-vit-base-patch32), frozen |
| Embedding dim | 512, L2-normalised |
| Classifier | sklearn.svm.LinearSVC (C=1, balanced class weights) |
| Calibration | Platt scaling: sklearn.linear_model.LogisticRegression fitted on val-set decision scores |
| Training date | 2026-05-08 |
| Random seed | 42 |
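
The two-stage fit described above can be sketched as follows. This is a minimal illustration, not the actual training script: the data are synthetic unit vectors standing in for CLIP embeddings, only 4 classes are used, and the LogisticRegression settings (`max_iter=1000`, otherwise defaults) are assumptions, since the card does not list them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)


def make_split(n_per_class, centres, rng):
    """Synthetic stand-in for CLIP embeddings: noisy class centres, L2-normalised."""
    y = np.repeat(np.arange(len(centres)), n_per_class)
    X = centres[y] + 0.05 * rng.standard_normal((len(y), centres.shape[1]))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, y


centres = rng.standard_normal((4, 512))          # 4 toy classes, 512-dim like CLIP ViT-B/32
X_train, y_train = make_split(40, centres, rng)
X_val, y_val = make_split(15, centres, rng)

# Stage 1: linear SVM on the embeddings (settings taken from the table above).
svc = LinearSVC(C=1.0, class_weight="balanced", random_state=42)
svc.fit(X_train, y_train)

# Stage 2: logistic regression on validation-set decision scores.
# Hyperparameters here are assumed, not taken from the card.
platt = LogisticRegression(max_iter=1000)
platt.fit(svc.decision_function(X_val), y_val)

probs = platt.predict_proba(svc.decision_function(X_val[:3]))
print(probs.shape, probs.sum(axis=1))
```

Fitting the calibrator on validation scores rather than training scores keeps the probability mapping from being tuned on data the SVM has already separated.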

Files

| File | Description |
|---|---|
| linearsvc.joblib | Fitted LinearSVC |
| label_encoder.joblib | sklearn LabelEncoder (integer ↔ class name) |
| platt_calibrator.joblib | Platt calibrator; use this for predict_proba |
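
The card does not state whether the fitted models store integer ids or class-name strings in `classes_`. If they turn out to be integers, the LabelEncoder is what maps them back to names. The sketch below shows that round trip on a locally fitted encoder with three class names taken from this card; it does not load the shipped `label_encoder.joblib`.

```python
from sklearn.preprocessing import LabelEncoder

# Three class names from this card, used only to illustrate the mapping.
class_names = ["Gothic architecture", "Art Deco architecture", "Bauhaus architecture"]

encoder = LabelEncoder().fit(class_names)   # classes_ are stored in sorted order

ids = encoder.transform(["Gothic architecture", "Bauhaus architecture"])
names = encoder.inverse_transform(ids)      # integer ids back to class names

print(list(encoder.classes_))
print(list(ids), list(names))
```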

Training data

Trained on the Architectural Styles Dataset (Curated and Extended): 9,767 images across 26 classes, split 70/15/15 train/val/test (stratified, seed 42).
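
A stratified 70/15/15 split with a fixed seed can be produced with two calls to `train_test_split`. This is a sketch under assumptions: the labels are synthetic (5 classes of 100 items), and the exact splitting code used for this model is not published in the card.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels standing in for the real dataset: 5 classes x 100 images.
labels = np.repeat(np.arange(5), 100)
indices = np.arange(len(labels))

# First cut: 70 % train, 30 % held back.
train_idx, rest_idx = train_test_split(
    indices, test_size=0.30, stratify=labels, random_state=42
)
# Second cut: split the held-back 30 % evenly into validation and test.
val_idx, test_idx = train_test_split(
    rest_idx, test_size=0.50, stratify=labels[rest_idx], random_state=42
)

print(len(train_idx), len(val_idx), len(test_idx))
```

Stratifying both cuts keeps each class's share roughly constant across the three subsets, which matters for the smaller classes.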

The 26 classes are: Achaemenid, American Craftsman, American Foursquare, Ancient Egyptian, Art Deco, Art Nouveau, Baroque, Bauhaus, Beaux-Arts, Brutalism, Byzantine, Chicago school, Colonial, Deconstructivism, Edwardian, Georgian, Gothic, Greek Revival, International style, Novelty, Palladian, Postmodern, Queen Anne, Romanesque, Russian Revival, Tudor Revival.


Evaluation

Test set: 1,489 images (held-out, never seen during training or calibration)

| Metric | Value |
|---|---|
| Top-1 accuracy | 0.7616 |
| Top-3 accuracy | 0.9261 |
| Top-5 accuracy | 0.9664 |
| Macro F1 | 0.7577 |
| Weighted F1 | 0.7582 |
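
For readers who want to compute the same kinds of metrics on their own predictions, here is a small sketch using scikit-learn. The labels and probabilities are toy values for three classes, not outputs of this model.

```python
import numpy as np
from sklearn.metrics import f1_score, top_k_accuracy_score

# Toy ground truth and calibrated probabilities for 3 classes (illustrative only).
y_true = np.array([0, 1, 2, 2, 1, 0])
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],   # true class is second-ranked: counts for top-2, not top-1
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],   # true class is ranked last: a miss at both k
    [0.2, 0.6, 0.2],
    [0.8, 0.1, 0.1],
])
class_ids = [0, 1, 2]

top1 = top_k_accuracy_score(y_true, probs, k=1, labels=class_ids)
top2 = top_k_accuracy_score(y_true, probs, k=2, labels=class_ids)

y_pred = probs.argmax(axis=1)
macro_f1 = f1_score(y_true, y_pred, average="macro")        # unweighted mean over classes
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # mean weighted by class support

print(f"top-1={top1:.3f} top-2={top2:.3f} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```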

Per-class F1 (test set)

| Class | F1 | Support |
|---|---|---|
| Ancient Egyptian architecture | 0.952 | 53 |
| Achaemenid architecture | 0.938 | 55 |
| Novelty architecture | 0.920 | 54 |
| Gothic architecture | 0.915 | 47 |
| Brutalism architecture | 0.867 | 44 |
| Deconstructivism | 0.872 | 44 |
| Russian Revival architecture | 0.844 | 49 |
| Chicago school architecture | 0.824 | 39 |
| Art Nouveau architecture | 0.813 | 90 |
| Romanesque architecture | 0.805 | 44 |
| Byzantine architecture | 0.795 | 45 |
| Queen Anne architecture | 0.793 | 107 |
| Greek Revival architecture | 0.776 | 76 |
| Tudor Revival architecture | 0.776 | 65 |
| Art Deco architecture | 0.764 | 83 |
| Baroque architecture | 0.740 | 66 |
| American Foursquare architecture | 0.732 | 53 |
| Postmodern architecture | 0.674 | 47 |
| Bauhaus architecture | 0.674 | 45 |
| American craftsman style | 0.698 | 52 |
| Georgian architecture | 0.634 | 53 |
| Beaux-Arts architecture | 0.650 | 61 |
| Colonial architecture | 0.610 | 68 |
| International style | 0.561 | 59 |
| Palladian architecture | 0.547 | 49 |
| Edwardian architecture | 0.526 | 41 |
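
As a consistency check, the summary metrics can be recomputed from the per-class table: macro F1 is the plain mean of the per-class F1 values, weighted F1 is the support-weighted mean, and the supports should add up to the test-set size. The figures below are copied from the table; because the per-class values are rounded to three decimals, agreement with the reported 0.7577 and 0.7582 is only expected to within rounding.

```python
# (F1, support) per class, copied from the per-class table above.
per_class = {
    "Ancient Egyptian architecture": (0.952, 53),
    "Achaemenid architecture": (0.938, 55),
    "Novelty architecture": (0.920, 54),
    "Gothic architecture": (0.915, 47),
    "Brutalism architecture": (0.867, 44),
    "Deconstructivism": (0.872, 44),
    "Russian Revival architecture": (0.844, 49),
    "Chicago school architecture": (0.824, 39),
    "Art Nouveau architecture": (0.813, 90),
    "Romanesque architecture": (0.805, 44),
    "Byzantine architecture": (0.795, 45),
    "Queen Anne architecture": (0.793, 107),
    "Greek Revival architecture": (0.776, 76),
    "Tudor Revival architecture": (0.776, 65),
    "Art Deco architecture": (0.764, 83),
    "Baroque architecture": (0.740, 66),
    "American Foursquare architecture": (0.732, 53),
    "Postmodern architecture": (0.674, 47),
    "Bauhaus architecture": (0.674, 45),
    "American craftsman style": (0.698, 52),
    "Georgian architecture": (0.634, 53),
    "Beaux-Arts architecture": (0.650, 61),
    "Colonial architecture": (0.610, 68),
    "International style": (0.561, 59),
    "Palladian architecture": (0.547, 49),
    "Edwardian architecture": (0.526, 41),
}

total_support = sum(s for _, s in per_class.values())

# Macro F1: every class counts equally.
macro_f1 = sum(f for f, _ in per_class.values()) / len(per_class)
# Weighted F1: each class counts in proportion to its number of test images.
weighted_f1 = sum(f * s for f, s in per_class.values()) / total_support

print(f"classes={len(per_class)}  test images={total_support}")
print(f"macro F1={macro_f1:.4f}  weighted F1={weighted_f1:.4f}")
```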

Most-confused pairs

| True class | Predicted as | Confusion rate |
|---|---|---|
| International style | Bauhaus architecture | 27.1 % |
| Postmodern architecture | International style | 17.0 % |
| American craftsman style | American Foursquare | 15.4 % |
| Palladian architecture | Greek Revival architecture | 14.3 % |
| Byzantine architecture | Russian Revival architecture | 13.3 % |
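
A table like this can be derived from a confusion matrix. The sketch below assumes that "confusion rate" means the share of a true class's test images assigned to one specific other class (a row-normalised off-diagonal entry). That reading is consistent with the supports listed earlier, for example 27.1 % of 59 International style images is about 16, but the card does not define the term. The function name `most_confused_pairs` and the toy data are illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def most_confused_pairs(y_true, y_pred, class_names, top_n=5):
    """Return (true class, predicted class, rate) for the largest off-diagonal rates."""
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(class_names)))
    row_totals = cm.sum(axis=1, keepdims=True)
    rates = cm / np.maximum(row_totals, 1)      # row-normalise; guard against empty classes
    np.fill_diagonal(rates, 0.0)                # ignore correct predictions
    order = np.argsort(rates, axis=None)[::-1][:top_n]
    pairs = []
    for flat in order:
        i, j = np.unravel_index(flat, rates.shape)
        if rates[i, j] > 0:
            pairs.append((class_names[i], class_names[j], float(rates[i, j])))
    return pairs


# Toy example with three classes.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
pairs = most_confused_pairs(y_true, y_pred, ["A", "B", "C"])
for true_name, pred_name, rate in pairs:
    print(f"{true_name} -> {pred_name}: {rate:.1%}")
```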

Intended use

  • Classifying exterior building photographs by architectural style
  • Educational and research use in architectural history and computer vision
  • Input to downstream retrieval or recommendation systems

Not intended for:

  • Interior photographs, architectural renders, or drawings
  • Styles not in the 26-class vocabulary
  • High-stakes decisions without human review

Limitations

  • Weak classes: Edwardian (F1 = 0.53), Palladian (0.55), and International style (0.56) are the least reliable; treat their predictions as soft signals
  • Style overlap: International ↔ Bauhaus and Postmodern ↔ International confusions reflect genuine art-historical ambiguity, not purely model error
  • Geographic bias: training data is heavily Western/European
  • Modality: trained exclusively on exterior photographs; performance on interiors and non-photographic images is undefined
  • Leakage caveat: Ancient Egyptian and Novelty classes contain multiple photographs of the same landmark buildings; their F1 scores are likely slightly optimistic
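
One way to act on the weak-class advice in a downstream system is to flag predictions for review when the top class is one of the three weak classes or when its probability is low. This is a hypothetical helper: `flag_prediction` and the 0.6 threshold are illustrative choices, not part of this model or validated on its outputs.

```python
# The three least reliable classes named in the limitations above.
WEAK_CLASSES = {
    "Edwardian architecture",
    "Palladian architecture",
    "International style",
}


def flag_prediction(label, prob, min_prob=0.6):
    """Return 'soft' if the prediction deserves human review, else 'ok'.

    min_prob is an arbitrary illustrative threshold; tune it on your own data.
    """
    if label in WEAK_CLASSES or prob < min_prob:
        return "soft"
    return "ok"


print(flag_prediction("Gothic architecture", 0.91))
print(flag_prediction("Edwardian architecture", 0.95))
print(flag_prediction("Gothic architecture", 0.40))
```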

Usage

import joblib
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from huggingface_hub import hf_hub_download

REPO_ID = "axel-riben/clip-arch-classifier"

# Load CLIP
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
clip      = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

# Load classifier and calibrator
svc   = joblib.load(hf_hub_download(REPO_ID, "linearsvc.joblib"))
platt = joblib.load(hf_hub_download(REPO_ID, "platt_calibrator.joblib"))

# Predict
image  = Image.open("building.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
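with torch.no_grad():
    feats = clip.get_image_features(**inputs)   # (1, 512) image embedding
# Handle the case where an output object is returned instead of a bare tensor
if not isinstance(feats, torch.Tensor):
    feats = feats.pooler_output
emb = F.normalize(feats, dim=-1).numpy()     # L2-normalise, matching the training features

scores = svc.decision_function(emb)          # (1, 26) per-class SVM decision scores
probs  = platt.predict_proba(scores)[0]      # (26,) calibrated class probabilities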

top5 = sorted(zip(platt.classes_, probs), key=lambda x: -x[1])[:5]
for label, prob in top5:
    print(f"{prob:.3f}  {label}")

Citation

If you use this model, please also cite the original dataset:

Danci, Marian Dumitru/dumitrux. (n.d.). Architectural Styles Dataset [Data set].
Kaggle. https://www.kaggle.com/datasets/dumitrux/architectural-styles-dataset

License

Code and model weights: MIT. Training data licences: see the dataset card.
