shiowo committed · Commit 31886b5 · verified · 1 Parent(s): 30d3a70

Update README.md

  1. README.md +402 -3
README.md CHANGED
@@ -1,3 +1,402 @@
1
- ---
2
- license: cc-by-sa-4.0
3
- ---
1
+ ---
2
+ license: cc-by-sa-4.0
3
+ library_name: pytorch
4
+ pipeline_tag: image-classification
5
+ base_model: facebook/dinov3-vits16-pretrain-lvd1689m
6
+ tags:
7
+ - image-classification
8
+ - computer-vision
9
+ - dinov3
10
+ - pytorch
11
+ - safetensors
12
+ - prototype-learning
13
+ - hard-example-mining
14
+ - feedback-routing
15
+ - experimental
16
+ datasets:
17
+ - pending
18
+ metrics:
19
+ - accuracy
20
+ - f1
21
+ - precision
22
+ - recall
23
+ ---
24
+
25
+ # DINO-Protomorph
26
+
27
+ **Feedback-Gated Prototype Morphing for Hard-Case Image Classification**
28
+
29
+ ProtoMorph-DINO (hosted in this repository as DINO-Protomorph) is an experimental image classification head designed to run on top of a frozen DINOv3 vision backbone.
30
+
31
+ The model explores a custom architecture for hard-case image classification using:
32
+
33
+ - frozen DINOv3 patch embeddings
34
+ - ProtoMorph prototype-style transformation blocks
35
+ - layer memory attention
36
+ - confidence-based hard-case routing
37
+ - top-2 probability feedback
38
+ - Delta-RBF hard expert refinement
39
+ - logit fusion for difficult samples
40
+
41
+ This repository currently contains the early project/model-card setup for ProtoMorph-DINO. Training and evaluation results are still pending.
42
+
43
+ This repository does **not** redistribute DINOv3 weights. Users must download DINOv3 separately from its official source and comply with the upstream DINOv3 license.
44
+
45
+ This project is an independent research implementation and is not affiliated with Meta AI, Hugging Face, or the official DINOv3 project.
46
+
47
+ ---
48
+
49
+ ## Architecture
50
+
51
+ ```text
52
+ Image
53
+
54
+ Frozen DINOv3
55
+
56
+ Patch map z0
57
+
58
+ ProtoMorph block 1
59
+
60
+ Layer Memory Attention
61
+
62
+ ProtoMorph block 2
63
+
64
+ Layer Memory Attention
65
+
66
+ Main logits
67
+
68
+ Hard-case gate
69
+ ├── easy: return main logits
70
+ └── hard:
71
+     feedback from top-2 probabilities
72
+     modulate DINO patch map
73
+     run Delta-RBF hard expert
74
+     fuse logits
75
+ ```
76
+
77
+ ---
78
+
79
+ ## Model Summary
80
+
81
+ ProtoMorph-DINO is built around the idea that not every image needs the same amount of computation.
82
+
83
+ For easy images, the model returns the main classifier output directly.
84
+
85
+ For difficult or ambiguous images, the model activates a feedback branch. This branch uses the top-2 predicted probabilities to modulate the DINO patch map, then sends the modified representation through a specialized Delta-RBF hard expert before fusing the logits.
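
The routing described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `hard_case_mask` and `fuse_logits` are hypothetical, the rule "hard if confidence is low or the top-2 margin is small" is an assumption, and the additive fusion with a fixed weight is only one plausible reading of "logit fusion". The default thresholds mirror the values in the config example in this card.

```python
import torch


def hard_case_mask(logits, conf_threshold=0.65, margin_threshold=0.15):
    """Return a boolean mask that is True for samples treated as hard."""
    probs = torch.softmax(logits, dim=-1)
    top2 = probs.topk(2, dim=-1).values  # (batch, 2), sorted descending
    confidence = top2[:, 0]
    margin = top2[:, 0] - top2[:, 1]
    # Assumption: a sample is hard if either signal indicates ambiguity.
    return (confidence < conf_threshold) | (margin < margin_threshold)


def fuse_logits(main_logits, expert_logits, hard_mask, expert_weight=0.5):
    """Add weighted expert logits for hard samples; leave easy samples unchanged."""
    gate = hard_mask.to(main_logits.dtype).unsqueeze(-1)
    return main_logits + expert_weight * gate * expert_logits


# Toy logits: the first sample is confident, the second is ambiguous.
main_logits = torch.tensor([[4.0, 0.0, 0.0], [1.0, 0.9, 0.0]])
expert_logits = torch.tensor([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])

mask = hard_case_mask(main_logits)
fused = fuse_logits(main_logits, expert_logits, mask)
print(mask, fused.argmax(dim=-1))
```

For clarity the sketch computes expert logits for the whole batch. To actually save compute, the expert would be run only on the rows selected by the mask.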
86
+
87
+ The main research goal is to test whether feedback-guided hard-case refinement can improve classification performance over a standard frozen-backbone linear or MLP head.
88
+
89
+ ---
90
+
91
+ ## Intended Use
92
+
93
+ This model is intended for:
94
+
95
+ - image classification research
96
+ - hard-example routing experiments
97
+ - prototype learning experiments
98
+ - frozen-backbone classifier research
99
+ - fine-grained classification experiments
100
+ - educational and experimental computer vision projects
101
+
102
+ This model is **not** intended for safety-critical use.
103
+
104
+ Do not use this model for medical, legal, financial, biometric, security-critical, or production decisions without proper validation.
105
+
106
+ ---
107
+
108
+ ## Model Files
109
+
110
+ Recommended repository layout:
111
+
112
+ ```text
113
+ .
114
+ ├── README.md
115
+ ├── config.json
116
+ ├── labels.txt
117
+ ├── protomorph_head.safetensors
118
+ └── inference/
119
+     ├── model.py
120
+     └── infer.py
121
+ ```
122
+
123
+ The main weight file is expected to be:
124
+
125
+ ```text
126
+ protomorph_head.safetensors
127
+ ```
128
+
129
+ This file contains only the custom ProtoMorph classification head.
130
+
131
+ DINOv3 backbone weights are not included.
132
+
133
+ ---
134
+
135
+ ## Backbone
136
+
137
+ Default backbone:
138
+
139
+ ```text
140
+ facebook/dinov3-vits16-pretrain-lvd1689m
141
+ ```
142
+
143
+ The backbone is used as a frozen visual feature extractor.
144
+
145
+ For RTX 3090-class GPUs, the ViT-S/16 DINOv3 variant is recommended as a practical starting point because it keeps VRAM usage manageable while still producing strong patch embeddings.
146
+
147
+ ---
148
+
149
+ ## Installation
150
+
151
+ Recommended environment:
152
+
153
+ ```text
154
+ Python 3.11
155
+ PyTorch 2.4.0
156
+ CUDA 12.4 PyTorch wheel
157
+ ```
158
+
159
+ Install PyTorch:
160
+
161
+ ```bash
162
+ pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
163
+ ```
164
+
165
+ Install dependencies:
166
+
167
+ ```bash
168
+ pip install transformers safetensors pillow numpy tqdm accelerate
169
+ ```
170
+
171
+ ---
172
+
173
+ ## Example Usage
174
+
175
+ ```python
176
+ import torch
177
+ from PIL import Image
178
+ from transformers import AutoImageProcessor, AutoModel
179
+ from safetensors.torch import load_file
180
+
181
+ # Replace with your local or Hugging Face repo path.
182
+ REPO_ID = "YOUR_USERNAME/protomorph-dino"
183
+
184
+ # DINOv3 is loaded separately.
185
+ BACKBONE_NAME = "facebook/dinov3-vits16-pretrain-lvd1689m"
186
+
187
+ device = "cuda" if torch.cuda.is_available() else "cpu"
188
+
189
+ processor = AutoImageProcessor.from_pretrained(BACKBONE_NAME)
190
+ backbone = AutoModel.from_pretrained(
191
+     BACKBONE_NAME,
192
+     torch_dtype=torch.float16 if device == "cuda" else torch.float32,
193
+ ).to(device)
194
+
195
+ backbone.eval()
196
+ for p in backbone.parameters():
197
+     p.requires_grad = False
198
+
199
+ # Load your ProtoMorph model class from your local code.
200
+ # from model import ProtoMorphDINOClassifier
201
+ #
202
+ # model = ProtoMorphDINOClassifier(...)
203
+ # state = load_file("protomorph_head.safetensors")
204
+ # model.load_state_dict(state, strict=True)
205
+ # model.to(device)
206
+ # model.eval()
207
+
208
+ image = Image.open("example.jpg").convert("RGB")
209
+ inputs = processor(images=image, return_tensors="pt").to(device, dtype=backbone.dtype)  # match backbone dtype
210
+
211
+ with torch.no_grad():
212
+     outputs = backbone(**inputs)
213
+     tokens = outputs.last_hidden_state
214
+
215
+ # DINOv3 ViT outputs include special tokens before patch tokens.
216
+ # Your implementation should remove CLS/register tokens according to its config.
217
+ #
218
+ # logits = model(tokens)
219
+ # probs = torch.softmax(logits, dim=-1)
220
+ # print(probs)
221
+ ```
222
+
223
+ For the full runnable inference script, see the associated GitHub repository.
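
The token-removal step mentioned in the example's comments might look like the sketch below. It assumes the sequence layout is one CLS token, then the register tokens, then the patch tokens, and that there are 4 register tokens. Both are assumptions to check against the loaded backbone config (for instance a `num_register_tokens` field, if present). The helper `split_tokens` is hypothetical, and a random tensor stands in for `outputs.last_hidden_state`.

```python
import torch


def split_tokens(tokens, num_register_tokens=4):
    """Split (batch, seq, dim) tokens into CLS, register, and patch tokens."""
    num_special = 1 + num_register_tokens
    cls_token = tokens[:, 0]                    # (batch, dim)
    register_tokens = tokens[:, 1:num_special]  # (batch, registers, dim)
    patch_tokens = tokens[:, num_special:]      # (batch, patches, dim)
    return cls_token, register_tokens, patch_tokens


# Stand-in for the backbone output: a 224x224 image with 16x16 patches
# yields 14 * 14 = 196 patch tokens, plus 1 CLS and 4 register tokens.
tokens = torch.randn(2, 1 + 4 + 196, 384)
cls_token, register_tokens, patch_tokens = split_tokens(tokens)
print(patch_tokens.shape)
```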
224
+
225
+ ---
226
+
227
+ ## Config Example
228
+
229
+ ```json
230
+ {
231
+ "model_name": "ProtoMorph-DINO",
232
+ "backbone_name": "facebook/dinov3-vits16-pretrain-lvd1689m",
233
+ "num_classes": "pending",
234
+ "patch_dim": 384,
235
+ "hidden_dim": 512,
236
+ "num_prototypes": 64,
237
+ "memory_heads": 8,
238
+ "hard_gate_confidence_threshold": 0.65,
239
+ "hard_gate_margin_threshold": 0.15,
240
+ "hard_expert_weight": 0.5,
241
+ "dtype": "float16"
242
+ }
243
+ ```
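
Because `num_classes` is still a placeholder string, code that builds the head from this config may want to fail early. The following is a small sketch of such a check. The function `validate_config` and its rules are illustrative assumptions, not part of the repository.

```python
import json

POSITIVE_INT_KEYS = ("patch_dim", "hidden_dim", "num_prototypes", "memory_heads")
UNIT_INTERVAL_KEYS = ("hard_gate_confidence_threshold", "hard_gate_margin_threshold")


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means usable."""
    problems = []
    if not isinstance(cfg.get("num_classes"), int):
        problems.append("num_classes must be an integer (still a placeholder)")
    for key in POSITIVE_INT_KEYS:
        if not isinstance(cfg.get(key), int) or cfg[key] <= 0:
            problems.append(f"{key} must be a positive integer")
    for key in UNIT_INTERVAL_KEYS:
        value = cfg.get(key)
        if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be a number in [0, 1]")
    return problems


cfg = json.loads("""
{
  "num_classes": "pending",
  "patch_dim": 384,
  "hidden_dim": 512,
  "num_prototypes": 64,
  "memory_heads": 8,
  "hard_gate_confidence_threshold": 0.65,
  "hard_gate_margin_threshold": 0.15
}
""")
problems = validate_config(cfg)
print(problems)
```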
244
+
245
+ ---
246
+
247
+ ## Training Status
248
+
249
+ **Status: Pending**
250
+
251
+ This repository is being prepared ahead of full training and evaluation. Final training runs, benchmark comparisons, and validated metrics are not yet available.
252
+
253
+ If this repository contains an untrained or randomly initialized head, predictions are not meaningful yet.
254
+
255
+ ---
256
+
257
+ ## Dataset
258
+
259
+ **Dataset: Pending**
260
+
261
+ Training dataset information will be added after the dataset selection and training split are finalized.
262
+
263
+ Expected fields to add later:
264
+
265
+ - dataset name
266
+ - number of classes
267
+ - train/validation/test split
268
+ - preprocessing steps
269
+ - augmentation strategy
270
+ - label mapping
271
+
272
+ Class labels are expected to be stored in:
273
+
274
+ ```text
275
+ labels.txt
276
+ ```
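
The format of `labels.txt` is not yet specified. A minimal sketch, assuming one label per line with the line index used as the class index, could read it like this. The helper `load_labels` and the sample labels are made up for illustration.

```python
import tempfile
from pathlib import Path


def load_labels(path):
    """Read one label per line; the line index is taken as the class index."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo with a throwaway file and made-up labels.
with tempfile.TemporaryDirectory() as tmp:
    labels_path = Path(tmp) / "labels.txt"
    labels_path.write_text("cat\ndog\nbird\n", encoding="utf-8")
    labels = load_labels(labels_path)

predicted_index = 1
print(labels[predicted_index])
```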
277
+
278
+ ---
279
+
280
+ ## Evaluation
281
+
282
+ **Evaluation results: Pending**
283
+
284
+ The model has not yet been fully trained and evaluated. Metrics will be added after experiments are complete.
285
+
286
+ | Metric | Value |
287
+ |---|---:|
288
+ | Accuracy | Pending |
289
+ | F1 | Pending |
290
+ | Precision | Pending |
291
+ | Recall | Pending |
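
Once predictions exist, the four metrics in the table could be computed along these lines. The card does not say how precision, recall, and F1 will be averaged across classes, so macro averaging here is an assumption, and `macro_metrics` is a hypothetical helper.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy labels, not real results.
metrics = macro_metrics(y_true=[0, 0, 1, 1], y_pred=[0, 1, 1, 1], num_classes=2)
print(metrics)
```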
292
+
293
+ Recommended baselines:
294
+
295
+ | Baseline | Why Compare |
296
+ |---|---|
297
+ | DINOv3 + Linear Probe | Minimal frozen-backbone baseline |
298
+ | DINOv3 + MLP Head | Strong simple head baseline |
299
+ | CLIP + Linear Probe | Popular vision-language baseline |
300
+ | ConvNeXt | Strong CNN-style baseline |
301
+ | ViT | Standard transformer baseline |
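
For reference, the first baseline in the table can be sketched as a single linear layer over mean-pooled patch tokens. Mean pooling, the class name `LinearProbe`, and `num_classes=10` are assumptions made for illustration. A probe on the CLS token would be an equally reasonable variant.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """Linear classifier on mean-pooled frozen patch embeddings."""

    def __init__(self, patch_dim=384, num_classes=10):
        super().__init__()
        self.classifier = nn.Linear(patch_dim, num_classes)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, patches, patch_dim), special tokens already removed.
        pooled = patch_tokens.float().mean(dim=1)
        return self.classifier(pooled)


probe = LinearProbe()
logits = probe(torch.randn(2, 196, 384))
print(logits.shape)
```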
302
+
303
+ ---
304
+
305
+ ## Planned Experiments
306
+
307
+ Planned research questions:
308
+
309
+ - Can feedback from top-2 probabilities improve hard-case classification?
310
+ - Can prototype-style transformations improve frozen DINO features?
311
+ - Does hard-case routing reduce unnecessary compute?
312
+ - Can a Delta-RBF expert improve class-boundary decisions?
313
+ - Does memory attention help preserve useful intermediate representations?
314
+ - Can this approach outperform a normal linear or MLP head on fine-grained datasets?
315
+
316
+ ---
317
+
318
+ ## Limitations
319
+
320
+ Known limitations:
321
+
322
+ - The architecture is experimental.
323
+ - Training and evaluation results are currently pending.
324
+ - The hard-case gate requires threshold tuning.
325
+ - The Delta-RBF hard expert may overfit small datasets.
326
+ - Inference may be slower for hard samples.
327
+ - The model should be compared against simple baselines before claiming improvement.
328
+ - This repo does not include DINOv3 weights.
329
+ - The custom head may not generalize outside the dataset it was trained on.
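
The threshold-tuning limitation suggests sweeping the gate on a validation set. The sketch below reports, for each candidate confidence threshold, what fraction of samples would be routed to the hard branch and how accurate the main head is on the remaining easy samples. The helper `routing_report`, the gate rule, and the sample values are all hypothetical.

```python
def routing_report(samples, conf_thresholds, margin_threshold=0.15):
    """samples: list of (confidence, top2_margin, main_head_correct) tuples."""
    rows = []
    for threshold in conf_thresholds:
        # Assumed gate rule: low confidence or small top-2 margin means hard.
        is_hard = [conf < threshold or margin < margin_threshold
                   for conf, margin, _ in samples]
        easy_correct = [correct for (_, _, correct), hard in zip(samples, is_hard)
                        if not hard]
        routed_fraction = sum(is_hard) / len(samples)
        easy_accuracy = sum(easy_correct) / len(easy_correct) if easy_correct else None
        rows.append((threshold, routed_fraction, easy_accuracy))
    return rows


# Made-up validation samples, not real measurements.
samples = [(0.95, 0.90, True), (0.70, 0.45, True), (0.55, 0.20, False), (0.40, 0.05, False)]
rows = routing_report(samples, conf_thresholds=[0.5, 0.65, 0.8])
for threshold, routed_fraction, easy_accuracy in rows:
    print(threshold, routed_fraction, easy_accuracy)
```

A lower routed fraction means less extra compute, while a higher easy-sample accuracy means fewer mistakes slip past the gate, so the sweep makes that trade-off visible.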
330
+
331
+ ---
332
+
333
+ ## License
334
+
335
+ The ProtoMorph head weights in this repository are released under:
336
+
337
+ ```text
338
+ Creative Commons Attribution-ShareAlike 4.0 International
339
+ CC BY-SA 4.0
340
+ ```
341
+
342
+ You may use, share, and adapt these weights, including commercially, provided that you give appropriate credit and distribute adapted versions under CC BY-SA 4.0 or a compatible license.
343
+
344
+ This license applies only to the ProtoMorph head weights and related files released in this repository.
345
+
346
+ It does not apply to:
347
+
348
+ - DINOv3
349
+ - PyTorch
350
+ - Hugging Face Transformers
351
+ - third-party datasets
352
+ - third-party model weights
353
+ - upstream dependencies
354
+
355
+ DINOv3 is not redistributed in this repository. Users are responsible for obtaining DINOv3 separately and complying with its license.
356
+
357
+ ---
358
+
359
+ ## Attribution
360
+
361
+ If you use this model or build on it, please credit:
362
+
363
+ ```text
364
+ ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification
365
+ Author: YOUR_NAME
366
+ Repository: https://huggingface.co/YOUR_USERNAME/protomorph-dino
367
+ ```
368
+
369
+ BibTeX:
370
+
371
+ ```bibtex
372
+ @software{protomorph_dino_2026,
373
+ title = {ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification},
374
+ author = {YOUR_NAME},
375
+ year = {2026},
376
+ url = {https://huggingface.co/YOUR_USERNAME/protomorph-dino}
377
+ }
378
+ ```
379
+
380
+ ---
381
+
382
+ ## Disclaimer
383
+
384
+ This is a research prototype.
385
+
386
+ The model is provided for experimentation and educational use. It should not be used in production or high-stakes environments without independent validation, dataset auditing, robustness testing, and bias evaluation.
387
+
388
+ ---
389
+
390
+ ## Project Links
391
+
392
+ GitHub repository (coming soon):
393
+
394
+ ```text
395
+ https://github.com/shiowo/DINO-Protomorph
396
+ ```
397
+
398
+ Hugging Face model page:
399
+
400
+ ```text
401
+ https://huggingface.co/shiowo/DINO-Protomorph
402
+ ```