ID-Sim (DINOv3 ViT-L/16, cls_patch)

ID-Sim is a fine-grained identity similarity metric trained to capture the subtle visual differences humans rely on when distinguishing highly similar subjects across varied viewpoints and contexts. This checkpoint uses a DINOv3 ViT-L/16 backbone with LoRA fine-tuning and outputs both CLS and patch-level embeddings (cls_patch).

Paper: ID-Sim: An Identity-Focused Similarity Metric — CVPR 2026
Project page: https://juliachae.github.io/id_sim.github.io/
Code: https://github.com/JuliaChae/id_sim

Usage

DINOv3 backbone weights must be downloaded manually from Meta (gated). Place dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth in ./models/id_sim_checkpoint/checkpoints/. ID-Sim adapter weights download automatically on first use.

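import torch
from PIL import Image
from id_sim import id_sim

# Load the metric and its preprocessing function (adapter weights download on first use)
model, preprocess = id_sim(pretrained=True, device="cuda")

# Preprocess both images and move them to the same device as the model
img_a = preprocess(Image.open("a.jpg")).cuda()
img_b = preprocess(Image.open("b.jpg")).cuda()

# Gradients are not needed for inference
with torch.inference_mode():
    distance = model(img_a, img_b)  # lower = more similar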

Install: pip install git+https://github.com/JuliaChae/id_sim.git
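
Because the backbone weights are not fetched automatically, it may help to confirm that the file is in place before calling id_sim(...). The following is a minimal sketch using only the Python standard library; the directory and filename are taken from the note above, while the helper names backbone_path and backbone_is_present are hypothetical and not part of the id_sim package.

```python
from pathlib import Path

# Directory and filename as stated in the usage note; adjust if your layout differs.
CHECKPOINT_DIR = Path("./models/id_sim_checkpoint/checkpoints")
BACKBONE_FILE = "dinov3_vitl16_pretrain_lvd1689m-8aa4cbdd.pth"


def backbone_path(checkpoint_dir: Path = CHECKPOINT_DIR) -> Path:
    """Return the expected location of the DINOv3 backbone weights (hypothetical helper)."""
    return Path(checkpoint_dir) / BACKBONE_FILE


def backbone_is_present(checkpoint_dir: Path = CHECKPOINT_DIR) -> bool:
    """Return True if the manually downloaded backbone file exists at the expected path."""
    return backbone_path(checkpoint_dir).is_file()
```

Calling backbone_is_present() before loading the model gives a clear early signal when the gated download step was skipped.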

Model Details


Base model	DINOv3 ViT-L/16 (`dinov3_vitl16`)
Fine-tuning	LoRA (r=16, α=32, dropout=0.05) + MLP heads (hidden=512)
Feature type	`cls_patch` — joint CLS + Sinkhorn patch loss
License	MIT
Developed by	Julia Chae, Nicholas Kolkin, Jui-Hsien Wang, Richard Zhang, Sara Beery, Cusuh Ham
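
For readers unfamiliar with LoRA, the r and α values above set the rank of the low-rank update and its scaling factor α/r (here 32/16 = 2). The sketch below shows the standard LoRA forward pass in plain Python on toy matrices. It is a generic illustration, not the ID-Sim or PEFT implementation; the function names are made up for this example, and dropout and the MLP heads are left out.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, alpha):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen weight (d_out x d_in); A (r x d_in) and B (d_out x r)
    are the trainable low-rank factors.
    """
    scaling = alpha / r
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, low_rank)]


# Values from the table above.
R, ALPHA = 16, 32
SCALING = ALPHA / R  # 2.0
```

In standard LoRA, B is initialized to zero, so the adapted layer starts out identical to the frozen base layer and only drifts from it as training proceeds.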

Training

Trained on ~10k triplets (anchor / positive / negative) spanning ~10k instances across 10 real datasets (landmarks, fashion, objects, art, animals), augmented with generative identity-preserving and identity-altering edits. Key hyperparameters: lr=3e-4, effective batch size 32 (8×4 grad accum), 3 epochs, bf16, InfoNCE + Sinkhorn patch loss.
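
To make the triplet objective concrete, here is a minimal sketch of an InfoNCE term for a single (anchor, positive, negative) triplet, together with the effective batch size arithmetic. It illustrates the general technique rather than the authors' training code: the cosine similarity, the temperature of 0.07, and the function names are assumptions not stated in this card, the Sinkhorn patch term is omitted, and reading "8×4" as 8 samples per step times 4 accumulation steps is an interpretation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_info_nce(anchor, positive, negative, temperature=0.07):
    """InfoNCE with one positive and one negative; temperature is an assumed value."""
    s_pos = cosine(anchor, positive) / temperature
    s_neg = cosine(anchor, negative) / temperature
    # Numerically stable log(exp(s_pos) + exp(s_neg)).
    m = max(s_pos, s_neg)
    log_denominator = m + math.log(math.exp(s_pos - m) + math.exp(s_neg - m))
    # Equivalent to -log(exp(s_pos) / (exp(s_pos) + exp(s_neg))).
    return log_denominator - s_pos


# Assumed reading of "8×4 grad accum": 8 samples per step, 4 accumulation steps.
PER_STEP_BATCH = 8
GRAD_ACCUM_STEPS = 4
EFFECTIVE_BATCH = PER_STEP_BATCH * GRAD_ACCUM_STEPS  # 32
```

The loss is small when the anchor is closer to the positive than to the negative and grows when the order is reversed, which is the behavior an identity-focused metric is trained toward.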

Evaluation

The model was evaluated across a diverse set of instance-level tasks ranging from concept-preservation evaluation in generative models and perceptual similarity ranking to fine-grained re-identification and retrieval. For full evaluation instructions see the repository; for full evaluation results see the paper.

Citation

@InProceedings{Chae_2026_CVPR,
    author    = {Chae, Julia and Kolkin, Nicholas and Wang, Jui-Hsien and Zhang, Richard and Beery, Sara and Ham, Cusuh},
    title     = {ID-Sim: An Identity-Focused Similarity Metric},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {11250-11262}
}

Framework versions

PEFT 0.16.0

Paper: ID-Sim: An Identity-Focused Similarity Metric (arXiv:2604.05039, published Apr 6)