---
license: cc-by-sa-4.0
library_name: pytorch
pipeline_tag: image-classification
base_model: facebook/dinov3-vits16-pretrain-lvd1689m
tags:
  - image-classification
  - computer-vision
  - dinov3
  - pytorch
  - safetensors
  - prototype-learning
  - hard-example-mining
  - feedback-routing
  - experimental
metrics:
  - accuracy
  - f1
  - precision
  - recall
---

# ProtoMorph-DINO

**Feedback-Gated Prototype Morphing for Hard-Case Image Classification**

ProtoMorph-DINO is an experimental image classification head designed to run on top of a frozen DINOv3 vision backbone.

This model card is for the Hugging Face repository:

```text
shiowo/DINO-Protomorph
```

This repository currently contains an initial research scaffold and a custom ProtoMorph head checkpoint. Evaluation results are **pending** because the repository was created before full training and benchmarking.

This project is independent and is not affiliated with Meta AI, Hugging Face, or the official DINOv3 project.

---

## Architecture

```text
Image
↓
Frozen DINOv3
↓
Patch map z0
↓
ProtoMorph block 1
↓
Layer Memory Attention
↓
ProtoMorph block 2
↓
Layer Memory Attention
↓
Main logits
↓
Hard-case gate
    ├── easy: return main logits
    └── hard:
          feedback from top-2 probabilities
          modulate DINO patch map
          run Delta-RBF hard expert
          fuse logits
```

---

## Model Summary

ProtoMorph-DINO explores whether a frozen foundation vision backbone can be improved with a custom hard-case refinement head.

For easy images, the model returns the main classifier output directly. For difficult or ambiguous images, the model activates a feedback branch. The feedback branch uses the top-2 predicted probabilities to modulate the DINO patch map, sends the modified representation through a Delta-RBF hard expert, and fuses the refined logits with the main logits.
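
The routing decision described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, the default thresholds are taken from the config example in this card, and combining the three tests with a logical OR (and measuring entropy in nats) is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_hard_case(logits, pmax_thr=0.65, margin_thr=0.15, entropy_thr=1.35):
    """Return True if the main logits look ambiguous.

    Assumption: a sample is "hard" if ANY of the three tests fires.
    The real gate may combine them differently.
    """
    probs = softmax(logits)
    top = sorted(probs, reverse=True)
    p1, p2 = top[0], top[1]
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    low_confidence = p1 < pmax_thr
    small_margin = (p1 - p2) < margin_thr
    high_entropy = entropy > entropy_thr
    return low_confidence or small_margin or high_entropy


# One dominant class: the easy path would return the main logits.
print(is_hard_case([10.0] + [0.0] * 9))
# Two near-tied classes: the hard path would run the feedback branch.
print(is_hard_case([2.0, 1.9] + [0.0] * 8))
```

Under these assumptions the first call takes the easy path and the second is routed to the hard expert.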

The main research question is whether feedback-guided hard-case refinement can improve classification performance over simpler frozen-backbone heads such as a linear probe or MLP classifier.

---

## Current Status

**Status: research scaffold / pre-training setup**

The current checkpoint may be randomly initialized or intended only for smoke testing, unless a later release says otherwise.

Predictions are **not meaningful** until the ProtoMorph head is trained on a real dataset.

---

## Results

**Evaluation results: Pending**

No benchmark results are reported yet because the repository is being prepared before training and evaluation.

| Metric | Value |
|---|---:|
| Accuracy | Pending |
| F1 | Pending |
| Precision | Pending |
| Recall | Pending |
| Confusion-pair improvement | Pending |
| Hard-case routing benefit | Pending |
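
Once predictions are available, the first four rows of the table could be filled in with something like the sketch below. The function name is hypothetical, and macro averaging over classes is an assumption; this card does not state which averaging will be reported.

```python
def macro_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall and F1.

    y_true and y_pred are sequences of integer class ids in [0, num_classes).
    """
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def safe_div(a, b):
        # Classes with no predictions or no samples score 0 instead of raising.
        return a / b if b else 0.0

    precision = [safe_div(tp[c], tp[c] + fp[c]) for c in range(num_classes)]
    recall = [safe_div(tp[c], tp[c] + fn[c]) for c in range(num_classes)]
    f1 = [safe_div(2 * p * r, p + r) for p, r in zip(precision, recall)]
    return {
        "accuracy": safe_div(sum(tp), len(y_true)),
        "precision": sum(precision) / num_classes,
        "recall": sum(recall) / num_classes,
        "f1": sum(f1) / num_classes,
    }


# Toy example with two classes and one mistake.
print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Running the same function on the baselines listed below, on the same split, keeps the comparison like for like.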

Recommended future baselines:

| Baseline | Purpose |
|---|---|
| DINOv3 + Linear Probe | Minimal frozen-backbone baseline |
| DINOv3 + MLP Head | Strong simple head baseline |
| CLIP + Linear Probe | Popular vision-language comparison |
| ConvNeXt | Strong CNN-style baseline |
| ViT | Standard transformer baseline |

---

## Intended Use

This model is intended for:

- image classification research
- hard-example routing experiments
- prototype learning experiments
- frozen-backbone classifier research
- fine-grained classification experiments
- educational computer vision experiments

This model is **not** intended for safety-critical use.

Do not use this model for medical, legal, financial, biometric, security-critical, or production decisions without independent validation.

---

## Model Files

Recommended repository layout:

```text
.
├── README.md
├── LICENSE-WEIGHTS.md
├── config.json
├── labels.txt
├── checkpoints/
│   ├── config.json
│   ├── labels.txt
│   └── protomorph_head.safetensors
├── infer.py
├── scripts/
│   └── upload_to_hf.py
└── src/
    └── protomorph/
```

The main weight file is:

```text
checkpoints/protomorph_head.safetensors
```

This file contains only the custom ProtoMorph classification head.

DINOv3 backbone weights are **not** included in this repository.
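
To check what the checkpoint actually contains without loading any weights, the safetensors header can be read with the standard library alone. The sketch below relies on the published file layout (an 8-byte little-endian header length followed by a JSON header); the function name is hypothetical, and the tensor names in the file are not documented in this card.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} from a .safetensors file.

    Only the JSON header is read; tensor data is never loaded.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return {name: (info["dtype"], info["shape"]) for name, info in header.items()}


# Example usage (path taken from the layout above):
# for name, (dtype, shape) in read_safetensors_header(
#     "checkpoints/protomorph_head.safetensors"
# ).items():
#     print(name, dtype, shape)
```

If the listing shows only head parameters and no backbone tensors, the file matches the description above.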

---

## Backbone

Default backbone:

```text
facebook/dinov3-vits16-pretrain-lvd1689m
```

The backbone is used as a frozen visual feature extractor.

For RTX 3090-class GPUs, ViT-S/16 is a practical starting point because it keeps VRAM usage manageable while still producing useful patch embeddings.

---

## Installation

Recommended environment:

```text
Python 3.11
PyTorch 2.4.0
CUDA 12.4 PyTorch wheel
```

Install PyTorch:

```bash
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
```

Install dependencies:

```bash
pip install -r requirements-core.txt
```

---

## RunPod Environment Variables

This project supports the RunPod environment variable names shown below:

```text
hf_key=hf_your_huggingface_write_token_here
hf_repo=shiowo/DINO-Protomorph
```

Standard Hugging Face names are also supported:

```text
HF_TOKEN=hf_your_huggingface_write_token_here
HF_REPO_ID=shiowo/DINO-Protomorph
```

Never commit your real Hugging Face token to the repository.
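
A script that accepts both naming schemes might resolve them as in the sketch below. The function name is hypothetical, and giving the standard Hugging Face names precedence over the RunPod names is an assumption; `scripts/upload_to_hf.py` may behave differently.

```python
import os


def resolve_hf_settings(env=None):
    """Return (token, repo_id) from either naming scheme.

    Assumption: HF_TOKEN / HF_REPO_ID win over hf_key / hf_repo when both are set.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN") or env.get("hf_key")
    repo_id = env.get("HF_REPO_ID") or env.get("hf_repo")
    if not token:
        raise RuntimeError("Set HF_TOKEN or hf_key before uploading.")
    if not repo_id:
        raise RuntimeError("Set HF_REPO_ID or hf_repo before uploading.")
    return token, repo_id


# Passing a dict instead of os.environ keeps the example free of real secrets.
token, repo_id = resolve_hf_settings(
    {"hf_key": "hf_placeholder", "hf_repo": "shiowo/DINO-Protomorph"}
)
print(repo_id)
```

Reading the token from the environment at run time, rather than from a file in the repository, keeps it out of version control.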

---

## Inference

Run inference from the command line:

```bash
python infer.py \
  --image examples/sample_image.jpg \
  --config checkpoints/config.json \
  --checkpoint checkpoints/protomorph_head.safetensors \
  --labels checkpoints/labels.txt \
  --topk 5
```

For smoke testing only:

```bash
python infer.py --image examples/sample_image.jpg --allow-random-head
```

If the head is untrained, the output is only useful for checking that the pipeline runs.

---

## Upload to Hugging Face from RunPod

After setting `hf_key` and `hf_repo` in RunPod, run:

```bash
cd /workspace/protomorph_dinov3_runpod
source .venv/bin/activate
python scripts/upload_to_hf.py
```

Or use the helper script:

```bash
bash runpod/upload_to_hf.sh
```

Dry run before upload:

```bash
python scripts/upload_to_hf.py --dry-run
```

---

## Config Example

```json
{
  "dino_model_name": "facebook/dinov3-vits16-pretrain-lvd1689m",
  "num_classes": 10,
  "embed_dim": 384,
  "patch_size": 16,
  "proto_count": 64,
  "memory_tokens": 16,
  "rbf_count": 128,
  "num_heads": 8,
  "dropout": 0.0,
  "hard_pmax_threshold": 0.65,
  "hard_margin_threshold": 0.15,
  "hard_entropy_threshold": 1.35,
  "image_size": 512,
  "use_bf16_autocast": true,
  "normalize_patch_tokens": true
}
```
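
A few quantities follow directly from these values, and checking them before a long run can catch configuration mistakes early. The sketch below is illustrative: the function name is hypothetical, the patch count excludes any class or register tokens the backbone may add, and it assumes `num_heads` must divide `embed_dim` and that the entropy threshold is measured in nats.

```python
import math


def summarize_config(cfg):
    """Validate basic divisibility constraints and derive a few sizes."""
    if cfg["image_size"] % cfg["patch_size"] != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    if cfg["embed_dim"] % cfg["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    side = cfg["image_size"] // cfg["patch_size"]
    max_entropy = math.log(cfg["num_classes"])  # entropy of a uniform prediction
    if cfg["hard_entropy_threshold"] >= max_entropy:
        raise ValueError("hard_entropy_threshold can never be exceeded")
    return {
        "patches_per_side": side,
        "num_patches": side * side,
        "head_dim": cfg["embed_dim"] // cfg["num_heads"],
        "max_entropy_nats": max_entropy,
    }


example = {
    "num_classes": 10,
    "embed_dim": 384,
    "patch_size": 16,
    "num_heads": 8,
    "hard_entropy_threshold": 1.35,
    "image_size": 512,
}
print(summarize_config(example))
```

For the example config this gives a 32 × 32 patch map (1024 patch tokens), 48 dimensions per attention head, and a maximum entropy of ln 10 ≈ 2.30 nats, so the 1.35 threshold sits a little above the midpoint of the possible range.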

---

## Limitations

Known limitations:

- The architecture is experimental.
- Evaluation results are pending.
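- The hard-case gate thresholds (`hard_pmax_threshold`, `hard_margin_threshold`, `hard_entropy_threshold`) require tuning.
- The Delta-RBF hard expert may overfit small datasets.
- Inference may be slower for hard samples, because they also run the feedback branch and the Delta-RBF hard expert.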
- The model should be compared against simple baselines before claiming improvement.
- This repository does not include DINOv3 weights.
- The custom head may not generalize outside the dataset it was trained on.

---

## License

The ProtoMorph head weights in this repository are released under:

```text
Creative Commons Attribution-ShareAlike 4.0 International
CC BY-SA 4.0
```

You may use, share, and adapt these weights, including commercially, provided that you give appropriate credit and distribute adapted versions under CC BY-SA 4.0 or a compatible license.

This license applies only to the ProtoMorph head weights and related files released in this repository.

It does not apply to:

- DINOv3
- PyTorch
- Hugging Face Transformers
- third-party datasets
- third-party model weights
- upstream dependencies

DINOv3 is not redistributed in this repository. Users are responsible for obtaining DINOv3 separately and complying with its license.

---

## Attribution

If you use this model or build on it, please credit:

```text
ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification
Author: shiowo
Repository: https://huggingface.co/shiowo/DINO-Protomorph
```

BibTeX:

```bibtex
@software{protomorph_dino_2026,
  title = {ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification},
  author = {shiowo},
  year = {2026},
  url = {https://huggingface.co/shiowo/DINO-Protomorph}
}
```

---

## Disclaimer

This is a research prototype.

The model is provided for experimentation and educational use. It should not be used in production or high-stakes environments without independent validation, dataset auditing, robustness testing, and bias evaluation.