+---
+license: cc-by-sa-4.0
+library_name: pytorch
+pipeline_tag: image-classification
+base_model: facebook/dinov3-vits16-pretrain-lvd1689m
+tags:
+  - image-classification
+  - computer-vision
+  - dinov3
+  - pytorch
+  - safetensors
+  - prototype-learning
+  - hard-example-mining
+  - feedback-routing
+  - experimental
+datasets:
+  - pending
+metrics:
+  - accuracy
+  - f1
+  - precision
+  - recall
+---
+# DINO-Protomorph
+**Feedback-Gated Prototype Morphing for Hard-Case Image Classification**
+ProtoMorph-DINO is an experimental image classification head designed to run on top of a frozen DINOv3 vision backbone.
+The model explores a custom architecture for hard-case image classification using:
+- frozen DINOv3 patch embeddings
+- ProtoMorph prototype-style transformation blocks
+- layer memory attention
+- confidence-based hard-case routing
+- top-2 probability feedback
+- Delta-RBF hard expert refinement
+- logit fusion for difficult samples
+This repository currently contains the early project/model-card setup for ProtoMorph-DINO. Training and evaluation results are still pending.
+This repository does **not** redistribute DINOv3 weights. Users must download DINOv3 separately from its official source and comply with the upstream DINOv3 license.
+This project is an independent research implementation and is not affiliated with Meta AI, Hugging Face, or the official DINOv3 project.
+---
+## Architecture
+```text
+Image
+↓
+Frozen DINOv3
+↓
+Patch map z0
+↓
+ProtoMorph block 1
+↓
+Layer Memory Attention
+↓
+ProtoMorph block 2
+↓
+Layer Memory Attention
+↓
+Main logits
+↓
+Hard-case gate
+    ├── easy: return main logits
+    └── hard:
+          feedback from top-2 probabilities
+          modulate DINO patch map
+          run Delta-RBF hard expert
+          fuse logits
+```
+---
+## Model Summary
+ProtoMorph-DINO is built around the idea that not every image needs the same amount of computation.
+For easy images, the model returns the main classifier output directly.
+For difficult or ambiguous images, the model activates a feedback branch. This branch uses the top-2 predicted probabilities to modulate the DINO patch map, then sends the modified representation through a specialized Delta-RBF hard expert before fusing the logits.
+The main research goal is to test whether feedback-guided hard-case refinement can improve classification performance over a standard frozen-backbone linear or MLP head.
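The routing described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names (`softmax`, `is_hard_case`, `fuse_logits`, `route`), the rule that a sample counts as hard when either the top-1 confidence or the top-1/top-2 margin falls below its threshold, and the weighted-sum fusion are all assumptions. The default values mirror the thresholds in the config example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_hard_case(probs, confidence_threshold=0.65, margin_threshold=0.15):
    """Assumed rule: hard if top-1 is unconfident OR top-1 and top-2 are close."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 < confidence_threshold or (top1 - top2) < margin_threshold


def fuse_logits(main_logits, expert_logits, expert_weight=0.5):
    """Assumed fusion: main logits plus a weighted expert correction."""
    return [m + expert_weight * e for m, e in zip(main_logits, expert_logits)]


def route(main_logits, hard_expert):
    """Return (logits, was_hard). The expert runs only for hard samples."""
    probs = softmax(main_logits)
    if not is_hard_case(probs):
        return main_logits, False
    expert_logits = hard_expert(main_logits, probs)
    return fuse_logits(main_logits, expert_logits), True


# Hypothetical stand-in for the hard expert: it nudges the runner-up class.
def toy_expert(logits, probs):
    return [0.0, 2.0, 0.0]


print(route([5.0, 0.0, 0.0], toy_expert))  # confident sample, expert skipped
print(route([1.0, 0.5, 0.0], toy_expert))  # ambiguous sample, expert fused in
```

Because the expert is only called inside the hard branch, easy samples pay for the main head alone, which is the compute saving the gate is meant to test.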
+---
+## Intended Use
+This model is intended for:
+- image classification research
+- hard-example routing experiments
+- prototype learning experiments
+- frozen-backbone classifier research
+- fine-grained classification experiments
+- educational and experimental computer vision projects
+This model is **not** intended for safety-critical use.
+Do not use this model for medical, legal, financial, biometric, security-critical, or production decisions without proper validation.
+---
+## Model Files
+Recommended repository layout:
+```text
+.
+├── README.md
+├── config.json
+├── labels.txt
+├── protomorph_head.safetensors
+└── inference/
+    ├── model.py
+    └── infer.py
+```
+The main weight file is expected to be:
+```text
+protomorph_head.safetensors
+```
+This file contains only the custom ProtoMorph classification head.
+DINOv3 backbone weights are not included.
+---
+## Backbone
+Default backbone:
+```text
+facebook/dinov3-vits16-pretrain-lvd1689m
+```
+The backbone is used as a frozen visual feature extractor.
+For RTX 3090-class GPUs, the ViT-S/16 DINOv3 variant is recommended as a practical starting point because it keeps VRAM usage manageable while still producing strong patch embeddings.
+---
+## Installation
+Recommended environment:
+```text
+Python 3.11
+PyTorch 2.4.0
+CUDA 12.4 PyTorch wheel
+```
+Install PyTorch:
+```bash
+pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
+```
+Install dependencies:
+```bash
+pip install "transformers>=4.56.0" safetensors pillow numpy tqdm accelerate
+```
+---
+## Example Usage
+```python
+import torch
+from PIL import Image
+from transformers import AutoImageProcessor, AutoModel
+from safetensors.torch import load_file
+# Replace with your local or Hugging Face repo path.
+REPO_ID = "YOUR_USERNAME/protomorph-dino"
+# DINOv3 is loaded separately.
+BACKBONE_NAME = "facebook/dinov3-vits16-pretrain-lvd1689m"
+device = "cuda" if torch.cuda.is_available() else "cpu"
+processor = AutoImageProcessor.from_pretrained(BACKBONE_NAME)
+backbone = AutoModel.from_pretrained(
+    BACKBONE_NAME,
+    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
+).to(device)
+backbone.eval()
+for p in backbone.parameters():
+    p.requires_grad = False
+# Load your ProtoMorph model class from your local code.
+# from model import ProtoMorphDINOClassifier
+#
+# model = ProtoMorphDINOClassifier(...)
+# state = load_file("protomorph_head.safetensors")
+# model.load_state_dict(state, strict=True)
+# model.to(device)
+# model.eval()
+image = Image.open("example.jpg").convert("RGB")
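+# Cast pixel values to the backbone dtype; float32 inputs fail against float16 weights.
+inputs = processor(images=image, return_tensors="pt").to(device, dtype=backbone.dtype)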
+with torch.no_grad():
+    outputs = backbone(**inputs)
+    tokens = outputs.last_hidden_state
+    # DINOv3 ViT outputs include special tokens before patch tokens.
+    # Your implementation should remove CLS/register tokens according to its config.
+    #
+    # logits = model(tokens)
+    # probs = torch.softmax(logits, dim=-1)
+    # print(probs)
+```
+A full runnable inference script will be published in the associated GitHub repository (see Project Links).
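As a rough guide to the token layout mentioned in the example's comments, the sketch below computes where patch tokens start in `last_hidden_state`. It assumes one CLS token, then the register tokens, then the patch tokens, with a 16-pixel patch size and four register tokens. The helper name `token_layout` is hypothetical, and the real values should be read from the backbone config (`patch_size`, `num_register_tokens`) rather than hard-coded.

```python
def token_layout(image_size, patch_size=16, num_register_tokens=4):
    """Describe the token sequence for a square input image.

    Assumed order: [CLS] + register tokens + patch tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    grid = image_size // patch_size
    first_patch = 1 + num_register_tokens
    return {
        "grid": (grid, grid),
        "num_patches": grid * grid,
        "first_patch_index": first_patch,
        "sequence_length": first_patch + grid * grid,
    }


layout = token_layout(224)
# With a tensor of shape (batch, sequence, dim), the patch tokens would be:
#   patch_tokens = tokens[:, layout["first_patch_index"]:, :]
print(layout)
```

For a 224x224 input this gives a 14x14 grid, so a head that expects a spatial patch map can reshape the 196 patch tokens back to that grid.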
+---
+## Config Example
+```json
+{
+  "model_name": "ProtoMorph-DINO",
+  "backbone_name": "facebook/dinov3-vits16-pretrain-lvd1689m",
+  "num_classes": "pending",
+  "patch_dim": 384,
+  "hidden_dim": 512,
+  "num_prototypes": 64,
+  "memory_heads": 8,
+  "hard_gate_confidence_threshold": 0.65,
+  "hard_gate_margin_threshold": 0.15,
+  "hard_expert_weight": 0.5,
+  "dtype": "float16"
+}
+```
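One way to consume this config is to parse it into a typed object and fail early on inconsistent values. The sketch below is an assumption about how a loader might look, not part of the repository: `ProtoMorphConfig` and `parse_config` are hypothetical names, the placeholder string `"pending"` is mapped to `None`, and the check that `hidden_dim` is divisible by `memory_heads` assumes a standard multi-head attention layout.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProtoMorphConfig:
    model_name: str
    backbone_name: str
    num_classes: Optional[int]  # None while the value is still "pending"
    patch_dim: int
    hidden_dim: int
    num_prototypes: int
    memory_heads: int
    hard_gate_confidence_threshold: float
    hard_gate_margin_threshold: float
    hard_expert_weight: float
    dtype: str


def parse_config(text):
    """Parse the JSON config and validate the values the gate depends on."""
    raw = json.loads(text)
    if not isinstance(raw.get("num_classes"), int):
        raw["num_classes"] = None  # placeholder such as "pending"
    cfg = ProtoMorphConfig(**raw)
    for name in ("hard_gate_confidence_threshold", "hard_gate_margin_threshold"):
        value = getattr(cfg, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if cfg.hidden_dim % cfg.memory_heads != 0:
        raise ValueError("hidden_dim must be divisible by memory_heads")
    return cfg


EXAMPLE_CONFIG = """
{
  "model_name": "ProtoMorph-DINO",
  "backbone_name": "facebook/dinov3-vits16-pretrain-lvd1689m",
  "num_classes": "pending",
  "patch_dim": 384,
  "hidden_dim": 512,
  "num_prototypes": 64,
  "memory_heads": 8,
  "hard_gate_confidence_threshold": 0.65,
  "hard_gate_margin_threshold": 0.15,
  "hard_expert_weight": 0.5,
  "dtype": "float16"
}
"""

cfg = parse_config(EXAMPLE_CONFIG)
print(cfg)
```

Keeping `num_classes` as `None` until the dataset is chosen makes it harder to build a head with a placeholder class count by accident.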
+---
+## Training Status
+**Status: Pending**
+This repository is being prepared before full training and evaluation. Final training runs, benchmark comparisons, and validated metrics are not yet available.
+If this repository contains an untrained or randomly initialized head, predictions are not meaningful yet.
+---
+## Dataset
+**Dataset: Pending**
+Training dataset information will be added after the dataset selection and training split are finalized.
+Expected fields to add later:
+- dataset name
+- number of classes
+- train/validation/test split
+- preprocessing steps
+- augmentation strategy
+- label mapping
+Class labels are expected to be stored in:
+```text
+labels.txt
+```
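The format of `labels.txt` is not specified yet. Assuming one class name per line, with the line order giving the class index, a loader and a small top-k helper might look like the sketch below. Both `load_labels` and `top_k` are hypothetical names introduced for illustration.

```python
from pathlib import Path


def load_labels(path):
    """Read labels, assuming one class name per line in class-index order."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    labels = [line.strip() for line in lines if line.strip()]
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate labels found")
    return labels


def top_k(probs, labels, k=2):
    """Return the k most probable (label, probability) pairs."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

With `k=2` the helper returns the same top-2 pair that the feedback branch is described as using, which makes it convenient for inspecting ambiguous predictions.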
+---
+## Evaluation
+**Evaluation results: Pending**
+The model has not yet been fully trained and evaluated. Metrics will be added after experiments are complete.
+| Metric | Value |
+|---|---:|
+| Accuracy | Pending |
+| F1 | Pending |
+| Precision | Pending |
+| Recall | Pending |
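Once predictions exist, the four metrics in the table can be computed from integer class indices as sketched below. The card does not say how precision, recall, and F1 will be averaged across classes; this sketch assumes macro averaging, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = [safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)]
    recall = [safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)]
    f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]
    return {
        "accuracy": sum(tp) / len(y_true),
        "precision": sum(precision) / num_classes,
        "recall": sum(recall) / num_classes,
        "f1": sum(f1) / num_classes,
    }


print(classification_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Macro averaging weights every class equally, which tends to expose weak performance on rare or fine-grained classes that plain accuracy can hide.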
+Recommended baselines:
+| Baseline | Why Compare |
+|---|---|
+| DINOv3 + Linear Probe | Minimal frozen-backbone baseline |
+| DINOv3 + MLP Head | Strong simple head baseline |
+| CLIP + Linear Probe | Popular vision-language baseline |
+| ConvNeXt | Strong CNN-style baseline |
+| ViT | Standard transformer baseline |
+---
+## Planned Experiments
+Planned research questions:
+- Can feedback from top-2 probabilities improve hard-case classification?
+- Can prototype-style transformations improve frozen DINO features?
+- Does hard-case routing reduce unnecessary compute?
+- Can a Delta-RBF expert improve class-boundary decisions?
+- Does memory attention help preserve useful intermediate representations?
+- Can this approach outperform a normal linear or MLP head on fine-grained datasets?
+---
+## Limitations
+Known limitations:
+- The architecture is experimental.
+- Training and evaluation results are currently pending.
+- The hard-case gate requires threshold tuning.
+- The Delta-RBF hard expert may overfit small datasets.
+- Inference may be slower for hard samples.
+- The model should be compared against simple baselines before claiming improvement.
+- This repo does not include DINOv3 weights.
+- The custom head may not generalize outside the dataset it was trained on.
+---
+## License
+The ProtoMorph head weights in this repository are released under:
+```text
+Creative Commons Attribution-ShareAlike 4.0 International
+CC BY-SA 4.0
+```
+You may use, share, and adapt these weights, including commercially, provided that you give appropriate credit and distribute adapted versions under CC BY-SA 4.0 or a compatible license.
+This license applies only to the ProtoMorph head weights and related files released in this repository.
+It does not apply to:
+- DINOv3
+- PyTorch
+- Hugging Face Transformers
+- third-party datasets
+- third-party model weights
+- upstream dependencies
+DINOv3 is not redistributed in this repository. Users are responsible for obtaining DINOv3 separately and complying with its license.
+---
+## Attribution
+If you use this model or build on it, please credit:
+```text
+ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification
+Author: YOUR_NAME
+Repository: https://huggingface.co/YOUR_USERNAME/protomorph-dino
+```
+BibTeX:
+```bibtex
+@software{protomorph_dino_2026,
+  title = {ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification},
+  author = {YOUR_NAME},
+  year = {2026},
+  url = {https://huggingface.co/YOUR_USERNAME/protomorph-dino}
+}
+```
+---
+## Disclaimer
+This is a research prototype.
+The model is provided for experimentation and educational use. It should not be used in production or high-stakes environments without independent validation, dataset auditing, robustness testing, and bias evaluation.
+---
+## Project Links
+GitHub repository (coming soon):
+```text
+https://github.com/shiowo/DINO-Protomorph
+```
+Hugging Face model page:
+```text
+https://huggingface.co/shiowo/DINO-Protomorph
+```