---
license: cc-by-sa-4.0
library_name: pytorch
pipeline_tag: image-classification
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
tags:
  - image-classification
  - computer-vision
  - dinov3
  - pytorch
  - safetensors
  - prototype-learning
  - hard-example-mining
  - feedback-routing
  - experimental
metrics:
  - accuracy
  - f1
  - precision
  - recall
---

# ProtoMorph-DINO

**Feedback-Gated Prototype Morphing for Hard-Case Image Classification**

ProtoMorph-DINO is an experimental image classification head designed to run on top of a frozen DINOv3 vision backbone.

This model card is for the Hugging Face repository:

```text
shiowo/DINO-Protomorph
```

This repository currently contains an initial research scaffold and a custom ProtoMorph head checkpoint. Evaluation results are **pending** because the repository was created before full training and benchmarking.

This project is independent and is not affiliated with Meta AI, Hugging Face, or the official DINOv3 project.

---

## Architecture

```text
Image
↓
Frozen DINOv3
↓
Patch map z0
↓
ProtoMorph block 1
↓
Layer Memory Attention
↓
ProtoMorph block 2
↓
Layer Memory Attention
↓
Main logits
↓
Hard-case gate
    ├── easy: return main logits
    └── hard:
          feedback from top-2 probabilities
          modulate DINO patch map
          run Delta-RBF hard expert
          fuse logits
```

---

## Model Summary

ProtoMorph-DINO explores whether a frozen foundation vision backbone can be improved with a custom hard-case refinement head.

For easy images, the model returns the main classifier output directly. For difficult or ambiguous images, the model activates a feedback branch. The feedback branch uses the top-2 predicted probabilities to modulate the DINO patch map, sends the modified representation through a Delta-RBF hard expert, and fuses the refined logits with the main logits.

The main research question is whether feedback-guided hard-case refinement can improve classification performance over simpler frozen-backbone heads such as a linear probe or MLP classifier.
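
The easy/hard routing described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, the default thresholds are copied from the `hard_pmax_threshold`, `hard_margin_threshold`, and `hard_entropy_threshold` fields in the config, and combining the three tests with a logical OR is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_hard_case(logits, pmax_threshold=0.65, margin_threshold=0.15,
                 entropy_threshold=1.35):
    """Return True if the main logits look ambiguous.

    Assumption: a sample is "hard" if ANY of the three tests fires.
    The real gate may combine them differently.
    """
    probs = softmax(logits)
    ranked = sorted(probs, reverse=True)
    p1, p2 = ranked[0], ranked[1]  # top-2 probabilities
    # Shannon entropy in nats (natural logarithm).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return (
        p1 < pmax_threshold              # low top-1 confidence
        or (p1 - p2) < margin_threshold  # top-2 classes too close
        or entropy > entropy_threshold   # distribution too flat
    )


confident = [10.0] + [0.0] * 9       # one dominant class
ambiguous = [2.0, 1.9] + [0.0] * 8   # two near-tied classes
print(is_hard_case(confident), is_hard_case(ambiguous))
```

Under these assumptions, only samples flagged as hard would pay the extra cost of the feedback branch and the Delta-RBF hard expert.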

---

## Current Status

**Status: research scaffold / pre-training setup**

Unless a later release states otherwise, treat the current checkpoint as randomly initialized and suitable only for smoke testing.

Predictions are **not meaningful** until the ProtoMorph head is trained on a real dataset.

---

## Results

**Evaluation results: Pending**

No benchmark results are reported yet because the repository is being prepared before training and evaluation.

| Metric | Value |
|---|---:|
| Accuracy | Pending |
| F1 | Pending |
| Precision | Pending |
| Recall | Pending |
| Confusion-pair improvement | Pending |
| Hard-case routing benefit | Pending |

Recommended future baselines:

| Baseline | Purpose |
|---|---|
| DINOv3 + Linear Probe | Minimal frozen-backbone baseline |
| DINOv3 + MLP Head | Strong simple head baseline |
| CLIP + Linear Probe | Popular vision-language comparison |
| ConvNeXt | Strong CNN-style baseline |
| ViT | Standard transformer baseline |

---

## Intended Use

This model is intended for:

- image classification research
- hard-example routing experiments
- prototype learning experiments
- frozen-backbone classifier research
- fine-grained classification experiments
- educational computer vision experiments

This model is **not** intended for safety-critical use.

Do not use this model for medical, legal, financial, biometric, security-critical, or production decisions without independent validation.

---

## Model Files

Recommended repository layout:

```text
.
├── README.md
├── LICENSE-WEIGHTS.md
├── config.json
├── labels.txt
├── checkpoints/
│   ├── config.json
│   ├── labels.txt
│   └── protomorph_head.safetensors
├── infer.py
├── scripts/
│   └── upload_to_hf.py
└── src/
    └── protomorph/
```
```

The main weight file is:

```text
checkpoints/protomorph_head.safetensors
```

This file contains only the custom ProtoMorph classification head.

DINOv3 backbone weights are **not** included in this repository.

---

## Backbone

Default backbone:

```text
facebook/dinov3-vits16-pretrain-lvd1689m
```

The backbone is used as a frozen visual feature extractor.

For RTX 3090-class GPUs, ViT-S/16 is a practical starting point because it keeps VRAM usage manageable while still producing useful patch embeddings.

---

## Installation

Recommended environment:

```text
Python 3.11
PyTorch 2.4.0
CUDA 12.4 PyTorch wheel
```

Install PyTorch:

```bash
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
```

Install dependencies:

```bash
pip install -r requirements-core.txt
```

---

## RunPod Environment Variables

This project supports the RunPod environment variable names shown below:

```text
hf_key=hf_your_huggingface_write_token_here
hf_repo=shiowo/DINO-Protomorph
```

Standard Hugging Face names are also supported:

```text
HF_TOKEN=hf_your_huggingface_write_token_here
HF_REPO_ID=shiowo/DINO-Protomorph
```
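
A script that accepts both naming schemes could resolve them roughly as follows. This is a hedged sketch: the helper name `resolve_hf_settings` is hypothetical, and giving the RunPod-style names precedence over the standard ones is an assumption, not documented behavior of `scripts/upload_to_hf.py`.

```python
import os


def resolve_hf_settings(env=None):
    """Return (token, repo_id) from RunPod-style or standard variable names.

    Assumption: RunPod-style names (hf_key, hf_repo) win when both are set.
    """
    env = os.environ if env is None else env
    token = env.get("hf_key") or env.get("HF_TOKEN")
    repo_id = env.get("hf_repo") or env.get("HF_REPO_ID")
    if not token:
        raise RuntimeError("Set hf_key or HF_TOKEN before uploading.")
    if not repo_id:
        raise RuntimeError("Set hf_repo or HF_REPO_ID before uploading.")
    return token, repo_id


# Example with an explicit mapping instead of the real environment.
token, repo_id = resolve_hf_settings(
    {"HF_TOKEN": "hf_placeholder", "HF_REPO_ID": "shiowo/DINO-Protomorph"}
)
print(repo_id)
```

Passing a mapping explicitly keeps the helper easy to test without touching real credentials.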

Never commit your real Hugging Face token to the repository.

---

## Inference

Run inference from the command line:

```bash
python infer.py \
  --image examples/sample_image.jpg \
  --config checkpoints/config.json \
  --checkpoint checkpoints/protomorph_head.safetensors \
  --labels checkpoints/labels.txt \
  --topk 5
```

For smoke testing only:

```bash
python infer.py --image examples/sample_image.jpg --allow-random-head
```

If the head is untrained, the output is only useful for checking that the pipeline runs.

---

## Upload to Hugging Face from RunPod

After setting `hf_key` and `hf_repo` in RunPod, run:

```bash
cd /workspace/protomorph_dinov3_runpod
source .venv/bin/activate
python scripts/upload_to_hf.py
```

Or use the helper script:

```bash
bash runpod/upload_to_hf.sh
```

Dry run before upload:

```bash
python scripts/upload_to_hf.py --dry-run
```

---

## Config Example

```json
{
  "dino_model_name": "facebook/dinov3-vits16-pretrain-lvd1689m",
  "num_classes": 10,
  "embed_dim": 384,
  "patch_size": 16,
  "proto_count": 64,
  "memory_tokens": 16,
  "rbf_count": 128,
  "num_heads": 8,
  "dropout": 0.0,
  "hard_pmax_threshold": 0.65,
  "hard_margin_threshold": 0.15,
  "hard_entropy_threshold": 1.35,
  "image_size": 512,
  "use_bf16_autocast": true,
  "normalize_patch_tokens": true
}
```
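
A few quantities implied by these values can be checked with simple arithmetic. The sketch below is illustrative only: the helper `derive_shapes` is hypothetical, and it assumes square inputs, a per-head width of `embed_dim / num_heads` as in standard multi-head attention, and an entropy threshold measured in nats.

```python
import math

# Subset of the config values shown above.
config = {
    "num_classes": 10,
    "embed_dim": 384,
    "patch_size": 16,
    "num_heads": 8,
    "hard_entropy_threshold": 1.35,
    "image_size": 512,
}


def derive_shapes(cfg):
    """Derive patch-grid and attention sizes from a config dict."""
    if cfg["image_size"] % cfg["patch_size"] != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    if cfg["embed_dim"] % cfg["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    side = cfg["image_size"] // cfg["patch_size"]
    return {
        "grid_side": side,               # patches per image side
        "num_patches": side * side,      # patch tokens per image
        "head_dim": cfg["embed_dim"] // cfg["num_heads"],
        # Entropy of a uniform distribution: the largest value the
        # entropy test could ever see for this number of classes.
        "max_entropy": math.log(cfg["num_classes"]),
    }


shapes = derive_shapes(config)
print(shapes["grid_side"], shapes["num_patches"], shapes["head_dim"])
# An entropy threshold at or above max_entropy could never trigger.
print(config["hard_entropy_threshold"] < shapes["max_entropy"])
```

With the example values, a 512×512 image yields a 32×32 grid of 1024 patch tokens, and the 1.35 entropy threshold sits below the ln(10) ≈ 2.30 ceiling for 10 classes, so the entropy test remains reachable.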

---

## Limitations

Known limitations:

- The architecture is experimental.
- Evaluation results are pending.
- The hard-case gate requires threshold tuning.
- The Delta-RBF hard expert may overfit small datasets.
- Inference may be slower for hard samples.
- The model should be compared against simple baselines before claiming improvement.
- This repository does not include DINOv3 weights.
- The custom head may not generalize outside the dataset it was trained on.

---

## License

The ProtoMorph head weights in this repository are released under:

```text
Creative Commons Attribution-ShareAlike 4.0 International
CC BY-SA 4.0
```

You may use, share, and adapt these weights, including commercially, provided that you give appropriate credit and distribute adapted versions under CC BY-SA 4.0 or a compatible license.

This license applies only to the ProtoMorph head weights and related files released in this repository.

It does not apply to:

- DINOv3
- PyTorch
- Hugging Face Transformers
- third-party datasets
- third-party model weights
- upstream dependencies

DINOv3 is not redistributed in this repository. Users are responsible for obtaining DINOv3 separately and complying with its license.

---

## Attribution

If you use this model or build on it, please credit:

```text
ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification
Author: shiowo
Repository: https://huggingface.co/shiowo/DINO-Protomorph
```

BibTeX:

```bibtex
@software{protomorph_dino_2026,
  title = {ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification},
  author = {shiowo},
  year = {2026},
  url = {https://huggingface.co/shiowo/DINO-Protomorph}
}
```

---

## Disclaimer

This is a research prototype.

The model is provided for experimentation and educational use. It should not be used in production or high-stakes environments without independent validation, dataset auditing, robustness testing, and bias evaluation.