| --- |
| license: cc-by-sa-4.0 |
| library_name: pytorch |
| pipeline_tag: image-classification |
| base_model: facebook/dinov3-vits16-pretrain-lvd1689m |
| tags: |
| - image-classification |
| - computer-vision |
| - dinov3 |
| - pytorch |
| - safetensors |
| - prototype-learning |
| - hard-example-mining |
| - feedback-routing |
| - experimental |
| metrics: |
| - accuracy |
| - f1 |
| - precision |
| - recall |
| --- |
| |
| # ProtoMorph-DINO |
|
|
| **Feedback-Gated Prototype Morphing for Hard-Case Image Classification** |
|
|
| ProtoMorph-DINO is an experimental image classification head designed to run on top of a frozen DINOv3 vision backbone. |
|
|
| This model card is for the Hugging Face repository: |
|
|
| ```text |
| shiowo/DINO-Protomorph |
| ``` |
|
|
| This repository currently contains an initial research scaffold and a custom ProtoMorph head checkpoint. Evaluation results are **pending** because the repository was created before full training and benchmarking. |
|
|
| This project is independent and is not affiliated with Meta AI, Hugging Face, or the official DINOv3 project. |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ```text |
| Image |
| ↓ |
| Frozen DINOv3 |
| ↓ |
| Patch map z0 |
| ↓ |
| ProtoMorph block 1 |
| ↓ |
| Layer Memory Attention |
| ↓ |
| ProtoMorph block 2 |
| ↓ |
| Layer Memory Attention |
| ↓ |
| Main logits |
| ↓ |
| Hard-case gate |
|   ├── easy: return main logits |
|   └── hard: |
|         feedback from top-2 probabilities |
|         modulate DINO patch map |
|         run Delta-RBF hard expert |
|         fuse logits |
| ``` |
|
|
| --- |
|
|
| ## Model Summary |
|
|
| ProtoMorph-DINO explores whether a frozen foundation vision backbone can be improved with a custom hard-case refinement head. |
|
|
| For easy images, the model returns the main classifier output directly. For difficult or ambiguous images, the model activates a feedback branch. The feedback branch uses the top-2 predicted probabilities to modulate the DINO patch map, sends the modified representation through a Delta-RBF hard expert, and fuses the refined logits with the main logits. |
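
The routing decision described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: the function name `is_hard_case` is hypothetical, the three statistics (top probability, top-2 margin, entropy) are inferred from the threshold names in the config example, and treating a sample as hard when *any* one threshold is crossed is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_hard_case(logits, pmax_thr=0.65, margin_thr=0.15, entropy_thr=1.35):
    """Return True when the main logits look ambiguous.

    Assumption: a sample is hard if ANY of the three criteria fires.
    The real gate may combine them differently.
    """
    if len(logits) < 2:
        raise ValueError("need at least two classes to compute a top-2 margin")
    probs = softmax(logits)
    top = sorted(probs, reverse=True)
    pmax = top[0]                      # confidence of the best class
    margin = top[0] - top[1]           # gap between the top-2 classes
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)  # in nats
    return pmax < pmax_thr or margin < margin_thr or entropy > entropy_thr
```

With ten classes, a confident prediction such as `[10.0, 0.0, ...]` stays on the easy path, while near-tied logits such as `[1.0, 0.9, 0.0, ...]` would be routed to the hard expert under these assumed rules.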
|
|
| The main research question is whether feedback-guided hard-case refinement can improve classification performance over simpler frozen-backbone heads such as a linear probe or MLP classifier. |
|
|
| --- |
|
|
| ## Current Status |
|
|
| **Status: research scaffold / pre-training setup** |
|
|
| Unless a later release states otherwise, treat the current checkpoint as randomly initialized or suitable only for smoke testing. |
|
|
| Predictions are **not meaningful** until the ProtoMorph head is trained on a real dataset. |
|
|
| --- |
|
|
| ## Results |
|
|
| **Evaluation results: Pending** |
|
|
| No benchmark results are reported yet because the repository is being prepared before training and evaluation. |
|
|
| | Metric | Value | |
| |---|---:| |
| | Accuracy | Pending | |
| | F1 | Pending | |
| | Precision | Pending | |
| | Recall | Pending | |
| | Confusion-pair improvement | Pending | |
| | Hard-case routing benefit | Pending | |
|
|
| Recommended future baselines: |
|
|
| | Baseline | Purpose | |
| |---|---| |
| | DINOv3 + Linear Probe | Minimal frozen-backbone baseline | |
| | DINOv3 + MLP Head | Strong simple head baseline | |
| | CLIP + Linear Probe | Popular vision-language comparison | |
| | ConvNeXt | Strong CNN-style baseline | |
| | ViT | Standard transformer baseline | |
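
Once predictions exist, the four standard metrics in the results table could be computed as sketched below. The function name `classification_metrics` is hypothetical, and macro averaging (an unweighted mean over classes) is an assumption, since this card does not state which averaging will be reported. The two custom metrics (confusion-pair improvement, hard-case routing benefit) are not defined here and are left out.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def safe_div(a, b):
        return a / b if b else 0.0

    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = safe_div(tp[c], tp[c] + fp[c])
        rec = safe_div(tp[c], tp[c] + fn[c])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Running the same function on the outputs of each baseline keeps the comparison like-for-like.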
|
|
| --- |
|
|
| ## Intended Use |
|
|
| This model is intended for: |
|
|
| - image classification research |
| - hard-example routing experiments |
| - prototype learning experiments |
| - frozen-backbone classifier research |
| - fine-grained classification experiments |
| - educational computer vision experiments |
|
|
| This model is **not** intended for safety-critical use. |
|
|
| Do not use this model for medical, legal, financial, biometric, security-critical, or production decisions without independent validation. |
|
|
| --- |
|
|
| ## Model Files |
|
|
| Recommended repository layout: |
|
|
| ```text |
| . |
| ├── README.md |
| ├── LICENSE-WEIGHTS.md |
| ├── config.json |
| ├── labels.txt |
| ├── checkpoints/ |
| │   ├── config.json |
| │   ├── labels.txt |
| │   └── protomorph_head.safetensors |
| ├── infer.py |
| ├── scripts/ |
| │   └── upload_to_hf.py |
| └── src/ |
|     └── protomorph/ |
| ``` |
|
|
| The main weight file is: |
|
|
| ```text |
| checkpoints/protomorph_head.safetensors |
| ``` |
|
|
| This file contains only the custom ProtoMorph classification head. |
|
|
| DINOv3 backbone weights are **not** included in this repository. |
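
Because the checkpoint holds only the head, it can help to confirm that the head files are present before running inference. The helper below is a hypothetical sketch; the paths come from the recommended layout above, and the name `missing_head_files` is an assumption, not part of the repository.

```python
from pathlib import Path

# Paths taken from the recommended layout, relative to the repository root.
REQUIRED_FILES = (
    "checkpoints/config.json",
    "checkpoints/labels.txt",
    "checkpoints/protomorph_head.safetensors",
)


def missing_head_files(repo_root):
    """Return the required head files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
```

Calling `missing_head_files(".")` from the repository root returns an empty list when all three files are in place.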
|
|
| --- |
|
|
| ## Backbone |
|
|
| Default backbone: |
|
|
| ```text |
| facebook/dinov3-vits16-pretrain-lvd1689m |
| ``` |
|
|
| The backbone is used as a frozen visual feature extractor. |
|
|
| For RTX 3090-class GPUs (24 GB of VRAM), ViT-S/16 is a practical starting point: at roughly 21M parameters it keeps VRAM usage manageable while still producing useful 384-dimensional patch embeddings. |
|
|
| --- |
|
|
| ## Installation |
|
|
| Recommended environment: |
|
|
| ```text |
| Python 3.11 |
| PyTorch 2.4.0 |
| CUDA 12.4 PyTorch wheel |
| ``` |
|
|
| Install PyTorch: |
|
|
| ```bash |
| pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124 |
| ``` |
|
|
| Install dependencies: |
|
|
| ```bash |
| pip install -r requirements-core.txt |
| ``` |
|
|
| --- |
|
|
| ## RunPod Environment Variables |
|
|
| This project supports the RunPod environment variable names shown below: |
|
|
| ```text |
| hf_key=hf_your_huggingface_write_token_here |
| hf_repo=shiowo/DINO-Protomorph |
| ``` |
|
|
| Standard Hugging Face names are also supported: |
|
|
| ```text |
| HF_TOKEN=hf_your_huggingface_write_token_here |
| HF_REPO_ID=shiowo/DINO-Protomorph |
| ``` |
|
|
| Never commit your real Hugging Face token to the repository. |
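
One way to read either pair of names at runtime is sketched below. The function name `resolve_hf_settings` is hypothetical, and giving the RunPod-style names precedence when both sets are defined is an assumption; the upload script in this repository may resolve them differently.

```python
import os


def resolve_hf_settings(env=None):
    """Return (token, repo_id) from the RunPod-style or standard variable names."""
    env = os.environ if env is None else env
    # Assumption: the RunPod-style names win when both sets are defined.
    token = env.get("hf_key") or env.get("HF_TOKEN")
    repo_id = env.get("hf_repo") or env.get("HF_REPO_ID")
    if not token:
        raise RuntimeError("Set hf_key or HF_TOKEN before uploading.")
    if not repo_id:
        raise RuntimeError("Set hf_repo or HF_REPO_ID before uploading.")
    return token, repo_id
```

Reading the token from the environment at call time, rather than hard-coding it, keeps it out of version control.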
|
|
| --- |
|
|
| ## Inference |
|
|
| Run inference from the command line: |
|
|
| ```bash |
| python infer.py \ |
|   --image examples/sample_image.jpg \ |
|   --config checkpoints/config.json \ |
|   --checkpoint checkpoints/protomorph_head.safetensors \ |
|   --labels checkpoints/labels.txt \ |
|   --topk 5 |
| ``` |
|
|
| For smoke testing only: |
|
|
| ```bash |
| python infer.py --image examples/sample_image.jpg --allow-random-head |
| ``` |
|
|
| If the head is untrained, the output is only useful for checking that the pipeline runs. |
|
|
| --- |
|
|
| ## Upload to Hugging Face from RunPod |
|
|
| After setting `hf_key` and `hf_repo` in RunPod, run: |
|
|
| ```bash |
| cd /workspace/protomorph_dinov3_runpod |
| source .venv/bin/activate |
| python scripts/upload_to_hf.py |
| ``` |
|
|
| Or use the helper script: |
|
|
| ```bash |
| bash runpod/upload_to_hf.sh |
| ``` |
|
|
| Dry run before upload: |
|
|
| ```bash |
| python scripts/upload_to_hf.py --dry-run |
| ``` |
|
|
| --- |
|
|
| ## Config Example |
|
|
| ```json |
| { |
|   "dino_model_name": "facebook/dinov3-vits16-pretrain-lvd1689m", |
|   "num_classes": 10, |
|   "embed_dim": 384, |
|   "patch_size": 16, |
|   "proto_count": 64, |
|   "memory_tokens": 16, |
|   "rbf_count": 128, |
|   "num_heads": 8, |
|   "dropout": 0.0, |
|   "hard_pmax_threshold": 0.65, |
|   "hard_margin_threshold": 0.15, |
|   "hard_entropy_threshold": 1.35, |
|   "image_size": 512, |
|   "use_bf16_autocast": true, |
|   "normalize_patch_tokens": true |
| } |
| ``` |
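
A few quantities follow directly from these values, and a short check can catch inconsistent settings early. The sketch below is illustrative only: `check_config` is a hypothetical helper, and it assumes square inputs, an attention layout that splits `embed_dim` evenly across `num_heads`, and an entropy threshold measured in nats.

```python
import json
import math


def check_config(cfg):
    """Derive basic quantities from a ProtoMorph-style config and sanity-check them."""
    if cfg["image_size"] % cfg["patch_size"] != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    if cfg["embed_dim"] % cfg["num_heads"] != 0:
        raise ValueError("embed_dim must be divisible by num_heads")

    grid = cfg["image_size"] // cfg["patch_size"]
    # Entropy of a uniform prediction, in nats: the largest value entropy can take.
    max_entropy = math.log(cfg["num_classes"])
    if cfg["hard_entropy_threshold"] >= max_entropy:
        raise ValueError("entropy threshold can never be exceeded")

    return {
        "patch_grid": (grid, grid),
        "num_patches": grid * grid,
        "head_dim": cfg["embed_dim"] // cfg["num_heads"],
        "max_entropy": max_entropy,
    }


example = json.loads(
    '{"num_classes": 10, "embed_dim": 384, "patch_size": 16, '
    '"num_heads": 8, "hard_entropy_threshold": 1.35, "image_size": 512}'
)
info = check_config(example)
print(info["num_patches"])  # 1024: a 32 x 32 grid of 16-pixel patches
```

With the example values, a 512-pixel input yields a 32 x 32 patch grid, and the entropy threshold of 1.35 sits below the ten-class maximum of about 2.30 nats, so the entropy criterion can actually fire.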
|
|
| --- |
|
|
| ## Limitations |
|
|
| Known limitations: |
|
|
| - The architecture is experimental. |
| - Evaluation results are pending. |
| - The hard-case gate requires threshold tuning. |
| - The Delta-RBF hard expert may overfit small datasets. |
| - Inference may be slower for hard samples. |
| - The model should be compared against simple baselines before claiming improvement. |
| - This repository does not include DINOv3 weights. |
| - The custom head may not generalize outside the dataset it was trained on. |
|
|
| --- |
|
|
| ## License |
|
|
| The ProtoMorph head weights in this repository are released under: |
|
|
| ```text |
| Creative Commons Attribution-ShareAlike 4.0 International |
| CC BY-SA 4.0 |
| ``` |
|
|
| You may use, share, and adapt these weights, including commercially, provided that you give appropriate credit and distribute adapted versions under CC BY-SA 4.0 or a compatible license. |
|
|
| This license applies only to the ProtoMorph head weights and related files released in this repository. |
|
|
| It does not apply to: |
|
|
| - DINOv3 |
| - PyTorch |
| - Hugging Face Transformers |
| - third-party datasets |
| - third-party model weights |
| - upstream dependencies |
|
|
| DINOv3 is not redistributed in this repository. Users are responsible for obtaining DINOv3 separately and complying with its license. |
|
|
| --- |
|
|
| ## Attribution |
|
|
| If you use this model or build on it, please credit: |
|
|
| ```text |
| ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification |
| Author: shiowo |
| Repository: https://huggingface.co/shiowo/DINO-Protomorph |
| ``` |
|
|
| BibTeX: |
|
|
| ```bibtex |
| @software{protomorph_dino_2026, |
| title = {ProtoMorph-DINO: Feedback-Gated Prototype Morphing for Hard-Case Image Classification}, |
| author = {shiowo}, |
| year = {2026}, |
| url = {https://huggingface.co/shiowo/DINO-Protomorph} |
| } |
| ``` |
|
|
| --- |
|
|
| ## Disclaimer |
|
|
| This is a research prototype. |
|
|
| The model is provided for experimentation and educational use. It should not be used in production or high-stakes environments without independent validation, dataset auditing, robustness testing, and bias evaluation. |
|
|