---
license: apache-2.0
tags:
- vision
- self-supervised-learning
- masked-image-modeling
- knowledge-distillation
- multi-objective-learning
- vit
datasets:
- ILSVRC/imagenet-1k
- 1aurent/ADE20K
metrics:
- accuracy
- mIoU
pipeline_tag: image-classification
---
# MEDiC ViT-Base/16

**Multi-objective Exploration of Distillation from CLIP**

This model was trained with the MEDiC codebase, implementing the method from *MEDiC: Multi-objective Exploration of Distillation from CLIP*.
## Model Description

MEDiC extends MaskDistill by combining three complementary training objectives:
- Token Distillation: Smooth L1 loss between student and CLIP teacher patch features
- CLS Token Alignment: Cosine similarity between student and teacher CLS tokens
- Pixel Reconstruction: MAE-style decoder reconstructing normalized pixel patches
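The three objectives above can be sketched as a single combined loss. This is a minimal illustration, not the repository's implementation: the loss weights (`w_cls`, `w_pix`) and the restriction of each term to masked positions are assumptions.

```python
import torch
import torch.nn.functional as F

def medic_loss(student_patches, teacher_patches, student_cls, teacher_cls,
               pred_pixels, target_pixels, mask, w_cls=1.0, w_pix=1.0):
    """Combined MEDiC-style objective (illustrative sketch).

    student_patches / teacher_patches: (B, N, D) patch features
    student_cls / teacher_cls:         (B, D) CLS tokens
    pred_pixels / target_pixels:       (B, N, P) per-patch pixel values
    mask:                              (B, N) bool, True = masked patch
    """
    # 1) Token distillation: Smooth L1 between student and CLIP teacher patch features
    l_tok = F.smooth_l1_loss(student_patches[mask], teacher_patches[mask])
    # 2) CLS alignment: 1 - cosine similarity between student and teacher CLS tokens
    l_cls = (1.0 - F.cosine_similarity(student_cls, teacher_cls, dim=-1)).mean()
    # 3) Pixel reconstruction: MAE-style regression on (normalized) pixel patches
    l_pix = F.mse_loss(pred_pixels[mask], target_pixels[mask])
    return l_tok + w_cls * l_cls + w_pix * l_pix
```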
**Architecture:**
- Student: ViT-Base/16 (86M params)
- Teacher: CLIP ViT-B/16 (frozen)
- Decoder: 8-block transformer (512 dim, for pixel reconstruction)
- Pretraining: 300 epochs on ImageNet-1K with sparse encoding and 40% block masking
## Results
| Evaluation | Result |
|---|---|
| k-NN (k=20) | 73.9% top-1 |
| Linear Probe | 60.5% top-1 |
| Finetuning (ImageNet-1K) | 85.1% top-1 |
| Sem. Seg. (ADE20K, UPerNet) | 52.5 mIoU |
### Loss Ablation

| Configuration | k-NN top-1 |
|---|---|
| Token only | 68.6% |
| + Pixel | 71.4% |
| + CLS | 72.3% |
| + Pixel + CLS (MEDiC) | 73.9% |
## Usage

```python
import torch
from src.models.vision_transformer import VisionTransformerMIM

# Load the pretrained student backbone
model = VisionTransformerMIM(
    img_size=224, patch_size=16, embed_dim=768, depth=12, num_heads=12,
    use_abs_pos_emb=True, use_mask_tokens=False,
)
ckpt = torch.load("medic_vit_base_ep299.pth", map_location="cpu")
# Keep only student weights and strip the DDP/student prefix
state = {k.replace("module.student.", ""): v
         for k, v in ckpt["model"].items() if k.startswith("module.student.")}
model.load_state_dict(state, strict=False)
```
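The k-NN result in the table above comes from classifying each validation feature by a majority vote over its 20 nearest training features. A self-contained sketch of that protocol (cosine similarity and majority voting are assumptions about the exact evaluation details):

```python
import torch

def knn_classify(train_feats, train_labels, test_feats, k=20):
    """Cosine-similarity k-NN classification (illustrative sketch)."""
    # L2-normalize so that a dot product equals cosine similarity
    train_feats = torch.nn.functional.normalize(train_feats, dim=1)
    test_feats = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.T          # (n_test, n_train)
    _, idx = sims.topk(k, dim=1)               # indices of the k nearest neighbors
    neighbor_labels = train_labels[idx]        # (n_test, k)
    return neighbor_labels.mode(dim=1).values  # majority vote per test sample
```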
See the GitHub repository for the full training and evaluation code.
## Citation

```bibtex
@article{georgiou2025medic,
  title={MEDiC: Multi-objective Exploration of Distillation from CLIP},
  author={Georgiou, Kostas},
  journal={arXiv preprint arXiv:2603.29009},
  year={2025}
}
```