# DRAGON NAS Classifier

Image classifier trained on `configs/alpaga_datasets.json` with 2 classes.

## Method
- Backbone: DINOv2-base (`facebook/dinov2-base`), frozen, 86M parameters
- Head: found by Neural Architecture Search (DRAGON + Mutant-UCB)
- NAS objective: `-(0.70 * MacroF1(Horama/Horama_WOW) + 0.30 * MacroF1(_other))`
- Data augmentation:
  - Image-level: ? augmented views/image
  - Feature-level: Mixup (alpha=0.0)
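The NAS objective above can be sketched as a scoring function. A minimal sketch using scikit-learn's `f1_score`; the `nas_objective` helper and the boolean `horama_mask` used to split samples into the Horama/Horama_WOW group versus the rest are illustrative assumptions, not the project's actual code:

```python
import numpy as np
from sklearn.metrics import f1_score


def nas_objective(y_true, y_pred, horama_mask):
    """Weighted macro-F1 objective, negated because the search minimizes it.

    horama_mask is a boolean array selecting the Horama/Horama_WOW samples;
    how that grouping is computed is an assumption for illustration.
    """
    f1_horama = f1_score(y_true[horama_mask], y_pred[horama_mask], average="macro")
    f1_other = f1_score(y_true[~horama_mask], y_pred[~horama_mask], average="macro")
    return -(0.70 * f1_horama + 0.30 * f1_other)
```

A lower (more negative) value is better, so a perfect classifier scores -1.0.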
## Results

| Split | Accuracy | Loss | Macro-F1 | F1w | Recall | Kappa |
|---|---|---|---|---|---|---|
| validation | 55.02% | 0.6896 | 44.50% | 47.39% | 55.02% | 0.0117 |
| test | 52.63% | 0.6921 | 40.62% | 43.81% | 52.63% | -0.0448 |
| test_thoiry | 25.00% | 0.7293 | 20.00% | 40.00% | 25.00% | N/A |
| test_combined | 51.61% | 0.6934 | 41.01% | 42.97% | 51.61% | -0.0298 |
## Model variants

| Variant | Description | Metric |
|---|---|---|
| best_nas | NAS search weights (before retrain) | - |
| best_retrain (recommended) | Best validation loss | 0.8802 |
## Classes (2)
## Architecture

### Best DRAGON architecture

- Nodes: 4
- Operations: `[['add', 'Identity', 'Identity'], ['concat', 'BatchNorm1d', 'SiLU'], ['add', 'Identity', 'Identity'], ['concat', 'Dropout', 0.8124402237012616, 'ELU']]`
- Learning rate: 0.000235
- Weight decay: 0.004388
- Classes: 2
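The searched learning rate and weight decay plug straight into an optimizer for the head. A minimal sketch; AdamW and the `Linear(768, 2)` placeholder head are assumptions, since the card names neither the optimizer nor the exported layer shapes:

```python
import torch

# Placeholder for the searched DRAGON head (assumption: 768-dim DINOv2-base features).
head = torch.nn.Linear(768, 2)

# lr and weight_decay are the searched hyperparameters from the card;
# AdamW itself is an assumption — the card does not state the optimizer.
optimizer = torch.optim.AdamW(head.parameters(), lr=0.000235, weight_decay=0.004388)
```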
```mermaid
graph TD
    subgraph BACKBONE ["Backbone (frozen)"]
        IMG[/"Image"/] --> ENCODER["Encoder"]
        ENCODER --> CLS["Features"]
    end
    subgraph HEAD ["Classification head (DRAGON NAS)"]
        N0["['add', 'Identity', 'Identity'] [add]"]
        N1["['concat', 'BatchNorm1d', 'SiLU'] | SiLU [concat]"]
        N2["['add', 'Identity', 'Identity'] [add]"]
        N3["['concat', 'Dropout', 0.8124402237012616, 'ELU'] | ELU [concat]"]
        OUT_MLP["Linear -> 2 classes"]
    end
    subgraph OUTPUT ["Output"]
        SOFTMAX["Softmax"] --> PRED[/"Prediction<br/>2 classes"/]
    end
    CLS --> N0
    N0 --> N1
    N0 --> N2
    N0 --> N3
    N1 --> N3
    N2 --> N3
    N3 --> OUT_MLP
    OUT_MLP --> SOFTMAX
    style BACKBONE fill:#f0f0f0,stroke:#666
    style HEAD fill:#e8f4fd,stroke:#1a73e8
    style OUTPUT fill:#e8fde8,stroke:#1a8c1a
```
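The head's DAG can be sketched as a PyTorch module. This is a hedged reconstruction, not the exported ONNX head: the feature dimension (768 for DINOv2-base), the exact DRAGON combiner semantics, and the choice to merge N3's three inputs by concatenation are assumptions made for illustration:

```python
import torch
import torch.nn as nn


class DragonHeadSketch(nn.Module):
    """Illustrative sketch of the searched head; wiring follows the DAG in the card."""

    def __init__(self, dim=768, num_classes=2):
        super().__init__()
        self.n1 = nn.Sequential(nn.BatchNorm1d(dim), nn.SiLU())  # N1: BatchNorm1d -> SiLU
        self.drop3 = nn.Dropout(p=0.8124402237012616)            # N3: searched dropout rate
        self.act3 = nn.ELU()                                     # N3: ELU activation
        self.classifier = nn.Linear(3 * dim, num_classes)        # Linear -> 2 classes

    def forward(self, x):
        n0 = x                                    # N0: identity pass-through of features
        n1 = self.n1(n0)                          # N1 fed by N0
        n2 = n0                                   # N2: another identity node fed by N0
        merged = torch.cat([n0, n1, n2], dim=-1)  # N3 combiner: concat of its three inputs
        n3 = self.act3(self.drop3(merged))        # N3: Dropout -> ELU
        return self.classifier(n3)                # raw logits; softmax is applied downstream
```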
## Usage (ONNX)

```python
import numpy as np
import onnxruntime as ort
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import Dinov2Model

# Frozen DINOv2 backbone: extracts the CLS-token features the ONNX head expects.
backbone = Dinov2Model.from_pretrained("facebook/dinov2-base").eval()

transform = T.Compose([
    T.Resize(518, interpolation=T.InterpolationMode.BICUBIC),
    T.CenterCrop(518),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("path/to/image.jpg").convert("RGB")
with torch.no_grad():
    features = backbone(transform(image).unsqueeze(0)).last_hidden_state[:, 0]

session = ort.InferenceSession("model_head.onnx")
logits = session.run(None, {"features": features.numpy()})[0]
pred = np.argmax(logits, axis=1)
```
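The ONNX head returns raw logits; the softmax stage shown in the architecture diagram can be applied in NumPy to recover class probabilities. A minimal, numerically stable sketch (the example logits are hypothetical):

```python
import numpy as np


def softmax(logits):
    # Subtract the per-row max before exponentiating to avoid overflow.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical 2-class logits for one image.
probs = softmax(np.array([[2.0, 0.5]]))
```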