ID-Sim (DINOv3 ViT-L/16, cls_patch)
ID-Sim is a fine-grained identity similarity metric trained to capture the subtle visual differences humans rely on when distinguishing highly similar subjects across varied viewpoints and contexts. This checkpoint uses a DINOv3 ViT-L/16 backbone with LoRA fine-tuning and outputs both CLS and patch-level embeddings (cls_patch).
Paper: ID-Sim: An Identity-Focused Similarity Metric — CVPR 2026
Project page: https://juliachae.github.io/id_sim.github.io/
Code: https://github.com/JuliaChae/id_sim
Usage
The DINOv3 backbone weights are gated and must be downloaded manually from Meta. Place dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth in ./models/id_sim_checkpoint/checkpoints/. The ID-Sim adapter weights download automatically on first use.
import torch
from PIL import Image
from id_sim import id_sim

# Load the model together with its matching preprocessing transform.
model, preprocess = id_sim(pretrained=True, device="cuda")

# Preprocess both images and move them to the GPU.
img_a = preprocess(Image.open("a.jpg")).cuda()
img_b = preprocess(Image.open("b.jpg")).cuda()

with torch.inference_mode():
    distance = model(img_a, img_b)  # lower = more similar
Install: pip install git+https://github.com/JuliaChae/id_sim.git
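Because a lower distance means a closer identity match, ranking a gallery against a query reduces to sorting by distance. The sketch below illustrates that pattern only. `rank_by_distance` and `toy_distance` are hypothetical names, and the toy distance on plain numbers stands in for a wrapper around `model(img_a, img_b)` so the example runs without the checkpoint. It assumes the model's output can be converted to a single float per pair, which this card does not state.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_by_distance(
    query: T,
    gallery: Sequence[T],
    distance_fn: Callable[[T, T], float],
) -> List[Tuple[int, float]]:
    """Return (gallery index, distance) pairs, most similar first."""
    scored = [(i, float(distance_fn(query, item))) for i, item in enumerate(gallery)]
    return sorted(scored, key=lambda pair: pair[1])


def toy_distance(a: float, b: float) -> float:
    """Stand-in for the real metric (assumption): absolute difference of two numbers."""
    return abs(a - b)


ranking = rank_by_distance(5, [9, 4, 2], toy_distance)
print(ranking)  # [(1, 1.0), (2, 3.0), (0, 4.0)]
```

With the real model, `distance_fn` would be a small function that preprocesses two images and calls the model inside `torch.inference_mode()`.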
Model Details
| Property | Value |
|---|---|
| Base model | DINOv3 ViT-L/16 (dinov3_vitl16) |
| Fine-tuning | LoRA (r=16, α=32, dropout=0.05) + MLP heads (hidden=512) |
| Feature type | cls_patch — joint CLS + Sinkhorn patch loss |
| License | MIT |
| Developed by | Julia Chae, Nicholas Kolkin, Jui-Hsien Wang, Richard Zhang, Sara Beery, Cusuh Ham |
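To make the LoRA row concrete: with r=16 and α=32, the low-rank update is scaled by α/r = 2 before it is added to the frozen layer's output. The following is a minimal pure-Python sketch of that forward pass, not the repository's code. `matvec`, `lora_forward`, and the toy dimensions are hypothetical, dropout is omitted, and which backbone layers carry adapters is not stated in this card.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r=16, alpha=32):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, A and B are trained."""
    scale = alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


# Toy dimensions (assumption): a 32-d layer with a rank-16 adapter.
random.seed(0)
dim, r, alpha = 32, 16, 32
W = [[random.gauss(0, 0.02) for _ in range(dim)] for _ in range(dim)]
A = [[random.gauss(0, 0.02) for _ in range(dim)] for _ in range(r)]  # r x dim
B = [[0.0] * r for _ in range(dim)]  # dim x r, zero-initialised
x = [random.gauss(0, 1) for _ in range(dim)]

# With B at zero the adapter is a no-op, so the output matches the frozen layer.
y_start = lora_forward(W, A, B, x, r=r, alpha=alpha)
print(all(abs(a - b) < 1e-12 for a, b in zip(y_start, matvec(W, x))))
```

Zero-initialising B is the usual LoRA convention, so fine-tuning starts from the pretrained backbone's behaviour.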
Training
Trained on ~10k triplets (anchor / positive / negative) spanning ~10k instances across 10 real datasets (landmarks, fashion, objects, art, animals), augmented with generative identity-preserving and identity-altering edits. Key hyperparameters: lr=3e-4, effective batch size 32 (8×4 grad accum), 3 epochs, bf16, InfoNCE + Sinkhorn patch loss.
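The two loss terms named above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: the temperature (0.07), the Sinkhorn regularisation (`epsilon=0.1`), the iteration count, the cosine cost, the uniform patch weights, and all function names are hypothetical. How the paper weights or combines the two terms is not stated here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_info_nce(anchor, positive, negative, temperature=0.07):
    """InfoNCE with one positive and one negative: -log softmax of the positive logit."""
    pos = cosine(anchor, positive) / temperature
    neg = cosine(anchor, negative) / temperature
    m = max(pos, neg)  # subtract the max for numerical stability
    log_denominator = m + math.log(math.exp(pos - m) + math.exp(neg - m))
    return log_denominator - pos


def sinkhorn_distance(patches_a, patches_b, epsilon=0.1, n_iters=50):
    """Entropy-regularised transport cost between two patch sets with uniform weights."""
    n, m = len(patches_a), len(patches_b)
    cost = [[1.0 - cosine(p, q) for q in patches_b] for p in patches_a]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the uniform marginals.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan is diag(u) @ kernel @ diag(v); return its total cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m))


same = sinkhorn_distance([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = sinkhorn_distance([[1, 0], [0, 1]], [[0, 1], [1, 0]])
different = sinkhorn_distance([[1, 0], [0, 1]], [[-1, 0], [0, -1]])
print(same, shuffled, different)
```

The transport plan matches each patch to its closest counterpart, so reordering the patches leaves the distance unchanged while a genuinely different patch set raises it.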
Evaluation
The model was evaluated across a diverse set of instance-level tasks ranging from concept-preservation evaluation in generative models and perceptual similarity ranking to fine-grained re-identification and retrieval. For full evaluation instructions see the repository; for full evaluation results see the paper.
Citation
@InProceedings{Chae_2026_CVPR,
author = {Chae, Julia and Kolkin, Nicholas and Wang, Jui-Hsien and Zhang, Richard and Beery, Sara and Ham, Cusuh},
title = {ID-Sim: An Identity-Focused Similarity Metric},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026},
pages = {11250-11262}
}
Framework versions
- PEFT 0.16.0