DRAGON NAS Classifier

Two-class image classifier trained on the datasets defined in configs/alpaga_datasets.json.

Method

  • Backbone : DINOv2-base (facebook/dinov2-base) - frozen, 86M params
  • Head : found by Neural Architecture Search (DRAGON + Mutant-UCB)
  • NAS objective (minimized) : -(0.70*MacroF1(Horama/Horama_WOW) + 0.30*MacroF1(_other))
  • Data augmentation :
    • Image-level : ? augmented views/image
    • Feature-level : Mixup (alpha=0.0, i.e. effectively disabled)
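With alpha=0.0, sampling lam from Beta(alpha, alpha) degenerates and lam is pinned to 1, so feature-level mixup becomes a no-op. A minimal numpy sketch of what feature-level mixup computes (an illustration of the technique, not the actual training code):

```python
import numpy as np

def mixup_features(feats, labels, alpha, rng=None):
    """Feature-level mixup: convex combination of feature vectors and
    (one-hot) labels with a permuted copy of the batch.

    With alpha == 0.0 we fix lam = 1.0, i.e. mixup is disabled and every
    sample is returned unchanged.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha) if alpha > 0 else 1.0
    perm = rng.permutation(feats.shape[0])
    mixed_feats = lam * feats + (1 - lam) * feats[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_feats, mixed_labels
```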

Results

| Split | Accuracy | Loss | Macro-F1 | Weighted F1 | Recall | Kappa |
|---|---|---|---|---|---|---|
| validation | 55.02% | 0.6896 | 44.50% | 47.39% | 55.02% | 0.0117 |
| test | 52.63% | 0.6921 | 40.62% | 43.81% | 52.63% | -0.0448 |
| test_thoiry | 25.00% | 0.7293 | 20.00% | 40.00% | 25.00% | N/A |
| test_combined | 51.61% | 0.6934 | 41.01% | 42.97% | 51.61% | -0.0298 |
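The negative Kappa values on the test splits mean agreement with the ground-truth labels is slightly below chance level. For reference, a minimal Cohen's kappa implementation (illustrative only, not the evaluation code used for these numbers):

```python
import numpy as np

def cohens_kappa(y_true, y_pred, n_classes=2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is observed agreement, p_e is the agreement expected by chance
    from the marginal label frequencies. Negative values indicate
    agreement below chance.
    """
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    n = cm.sum()
    p_o = np.trace(cm) / n                        # observed agreement
    p_e = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)
```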

Model variants

| Variant | Description | Metric |
|---|---|---|
| best_nas | NAS search weights (before retrain) | - |
| best_retrain (recommended) | Best validation loss | 0.8802 |

Classes (2)

  • alpaga
  • vigogne

Architecture

Best DRAGON Architecture

  • Nodes : 4
  • Operations : [['add', 'Identity', 'Identity'], ['concat', 'BatchNorm1d', 'SiLU'], ['add', 'Identity', 'Identity'], ['concat', 'Dropout', 0.8124402237012616, 'ELU']]
  • LR : 0.000235
  • WD : 0.004388
  • Classes : 2
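Each entry in the Operations list above encodes one head node as [combiner, layer, (optional layer argument), activation]. How DRAGON actually decodes these into modules is library-specific; the sketch below is an assumed reading that only illustrates the two combiners ('add', 'concat') and the activations (SiLU, ELU) that appear in this architecture, using numpy stand-ins rather than the real DRAGON modules:

```python
import numpy as np

# Numpy stand-ins for the activations named in the searched operations.
ACTIVATIONS = {
    "Identity": lambda x: x,
    "SiLU": lambda x: x / (1.0 + np.exp(-x)),        # x * sigmoid(x)
    "ELU": lambda x: np.where(x > 0, x, np.exp(x) - 1.0),
}

def combine(inputs, mode):
    """Merge a node's incoming feature tensors.

    'add' sums the inputs element-wise (shapes must match);
    'concat' concatenates them along the feature axis.
    """
    if mode == "add":
        return sum(inputs)
    return np.concatenate(inputs, axis=-1)
```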

Architecture diagram

```mermaid
graph TD
    subgraph BACKBONE ["Backbone (frozen)"]
    IMG[/"Image"/] --> ENCODER["Encoder"]
    ENCODER --> CLS["Features"]
    end
    subgraph HEAD ["Classification head (DRAGON NAS)"]
    N0["['add', 'Identity', 'Identity'] [add]"]
    N1["['concat', 'BatchNorm1d', 'SiLU'] | SiLU [concat]"]
    N2["['add', 'Identity', 'Identity'] [add]"]
    N3["['concat', 'Dropout', 0.8124402237012616, 'ELU'] | ELU [concat]"]
    OUT_MLP["Linear -> 2 classes"]
    end
    subgraph OUTPUT ["Output"]
    SOFTMAX["Softmax"] --> PRED[/"Prediction<br/>2 classes"/]
    end
    CLS --> N0
    N0 --> N1
    N0 --> N2
    N0 --> N3
    N1 --> N3
    N2 --> N3
    N3 --> OUT_MLP
    OUT_MLP --> SOFTMAX
    style BACKBONE fill:#f0f0f0,stroke:#666
    style HEAD fill:#e8f4fd,stroke:#1a73e8
    style OUTPUT fill:#e8fde8,stroke:#1a8c1a
```

Usage (ONNX)

```python
import numpy as np
import onnxruntime as ort
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import Dinov2Model

# Frozen DINOv2 backbone extracts the CLS-token features the ONNX head expects.
backbone = Dinov2Model.from_pretrained("facebook/dinov2-base").eval()
transform = T.Compose([
    T.Resize(518, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(518),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    features = backbone(transform(image).unsqueeze(0)).last_hidden_state[:, 0]

# The NAS head runs as an ONNX graph over the backbone features.
session = ort.InferenceSession("model_head.onnx")
logits = session.run(None, {"features": features.numpy()})[0]
pred = np.argmax(logits, axis=1)
```
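To turn the head's logits into a class name with a confidence score, apply a softmax and take the argmax. The index order below assumes the listing order from the Classes section (alpaga = 0, vigogne = 1); this mapping is an assumption, not something confirmed by the export:

```python
import numpy as np

CLASSES = ["alpaga", "vigogne"]  # assumed index order, from the Classes section

def predict_label(logits):
    """Map a (1, 2) logits array to (class name, probability)."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    idx = int(np.argmax(probs, axis=1)[0])
    return CLASSES[idx], float(probs[0, idx])
```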