Upload README.md with huggingface_hub
Browse files
README.md
CHANGED
|
@@ -13,11 +13,9 @@ library_name: pytorch
|
|
| 13 |
pipeline_tag: image-classification
|
| 14 |
---
|
| 15 |
|
| 16 |
-
# DHO: Simple
|
| 17 |
|
| 18 |
[arXiv](https://arxiv.org/abs/2505.07675v1)
|
| 19 |
-
[](https://paperswithcode.com/sota/semi-supervised-image-classification-on-1?p=simple-semi-supervised-knowledge-distillation)
|
| 20 |
-
[](https://paperswithcode.com/sota/semi-supervised-image-classification-on-2?p=simple-semi-supervised-knowledge-distillation)
|
| 21 |
|
| 22 |
This repository contains pretrained checkpoints for **DHO (Dual-Head Optimization)**, a simple yet effective approach for semi-supervised knowledge distillation from Vision-Language Models.
|
| 23 |
|
|
@@ -52,33 +50,87 @@ The method achieves state-of-the-art performance on ImageNet semi-supervised lea
|
|
| 52 |
|
| 53 |
```python
|
| 54 |
import torch
|
|
|
|
|
|
|
| 55 |
import clip
|
| 56 |
-
|
| 57 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 58 |
device = "cuda" if torch.cuda.is_available() else "cpu"
|
|
|
|
|
|
|
| 59 |
|
| 60 |
-
#
|
| 61 |
-
model
|
| 62 |
|
| 63 |
-
#
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
)
|
| 68 |
|
| 69 |
-
#
|
| 70 |
-
|
|
|
|
| 71 |
model.eval()
|
| 72 |
|
| 73 |
-
#
|
| 74 |
from PIL import Image
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 75 |
|
| 76 |
image = preprocess(Image.open("path/to/image.jpg")).unsqueeze(0).to(device)
|
| 77 |
with torch.no_grad():
|
| 78 |
-
|
| 79 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 80 |
```
|
| 81 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 82 |
### Training Your Own Model
|
| 83 |
|
| 84 |
To train your own DHO model, please visit the [official GitHub repository](https://github.com/yourusername/DHO) for detailed instructions and training scripts.
|
|
|
|
| 13 |
pipeline_tag: image-classification
|
| 14 |
---
|
| 15 |
|
| 16 |
+
# DHO: Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
|
| 17 |
|
| 18 |
[arXiv](https://arxiv.org/abs/2505.07675v1)
|
|
|
|
|
|
|
| 19 |
|
| 20 |
This repository contains pretrained checkpoints for **DHO (Dual-Head Optimization)**, a simple yet effective approach for semi-supervised knowledge distillation from Vision-Language Models.
|
| 21 |
|
|
|
|
| 50 |
|
| 51 |
```python
|
| 52 |
import torch
|
| 53 |
+
import torch.nn as nn
|
| 54 |
+
import torch.nn.functional as F
|
| 55 |
import clip
|
| 56 |
+
from huggingface_hub import hf_hub_download
|
| 57 |
+
|
| 58 |
+
# Define the DHO StudentModel architecture with dual heads
class StudentModel(nn.Module):
    """DHO student: a CLIP visual backbone with two linear classification heads.

    The CE head is supervised with cross-entropy on labeled data; the KD head
    distills the vision-language teacher's predictions from L2-normalized
    features.

    Args:
        num_classes: Number of output classes (default 1000 for ImageNet).
        model_name: Backbone identifier — one of 'RN50', 'ViT-B-16',
            'ViT-L-14', 'ViT-L-14-336px'.
    """

    # The OpenAI `clip` package uses slash-style names (e.g. 'ViT-B/16');
    # map the hyphenated identifiers used throughout this README onto them,
    # otherwise clip.load() rejects the name.
    _CLIP_NAMES = {
        'RN50': 'RN50',
        'ViT-B-16': 'ViT-B/16',
        'ViT-L-14': 'ViT-L/14',
        'ViT-L-14-336px': 'ViT-L/14@336px',
    }

    # Visual feature dimensionality per architecture.
    _IN_FEATURES = {
        'RN50': 1024,
        'ViT-B-16': 512,
        'ViT-L-14': 768,
        'ViT-L-14-336px': 768,
    }

    def __init__(self, num_classes=1000, model_name='ViT-B-16'):
        super().__init__()
        # Load CLIP backbone on CPU; the caller moves the model to a device.
        clip_model, _ = clip.load(self._CLIP_NAMES[model_name], device='cpu')
        self.backbone = clip_model.float().visual

        in_features = self._IN_FEATURES[model_name]

        # Dual-head architecture
        self.ce_head = nn.Linear(in_features, num_classes)  # CE branch
        self.kd_head = nn.Linear(in_features, num_classes)  # KD branch

    def forward(self, x):
        features = self.backbone(x)
        ce_out = self.ce_head(features)
        # KD branch scores L2-normalized features, scaled by 100
        # (CLIP-style logit scaling).
        kd_out = self.kd_head(F.normalize(features, dim=1)) * 100
        return ce_out, kd_out
|
| 83 |
+
|
| 84 |
+
# Download the pretrained checkpoint from the Hugging Face Hub and load it.
device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint_path = hf_hub_download(repo_id="erjui/dho", filename="vit_b_10.pt")
checkpoint = torch.load(checkpoint_path, map_location=device)

# Initialize model
model = StudentModel(num_classes=1000, model_name='ViT-B-16').to(device)

# Checkpoints saved from DDP training prefix every key with 'module.'.
# Strip only that leading prefix — str.replace would also corrupt keys
# that merely contain the substring elsewhere.
state_dict = checkpoint['model_state_dict']
state_dict = {k.removeprefix('module.'): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)

# Optimal inference parameters stored alongside the weights.
alpha = checkpoint['alpha']  # Mixing weight for the CE head
beta = checkpoint['beta']    # Softmax temperature for the KD head
model.eval()
|
| 101 |
|
| 102 |
+
# Inference example
from PIL import Image
import torchvision.transforms as transforms

# CLIP image preprocessing (224px input, CLIP normalization statistics).
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711))
])

# Convert to RGB so grayscale/RGBA files also yield the 3 channels that
# Normalize expects.
image = Image.open("path/to/image.jpg").convert("RGB")
image = preprocess(image).unsqueeze(0).to(device)
with torch.no_grad():
    ce_logits, kd_logits = model(image)

    # Combine the two heads with the saved parameters:
    # p = alpha * softmax(ce) + (1 - alpha) * softmax(kd / beta)
    probs_ce = F.softmax(ce_logits, dim=1)
    probs_kd = F.softmax(kd_logits / beta, dim=1)
    probs = alpha * probs_ce + (1 - alpha) * probs_kd

    predicted_class = probs.argmax(dim=1)
    print(f"Predicted class: {predicted_class.item()}")
|
| 126 |
```
|
| 127 |
|
| 128 |
+
**Important Notes:**
|
| 129 |
+
- DHO checkpoints contain: `model_state_dict`, `epoch`, `acc`, `alpha`, `beta`
|
| 130 |
+
- The model has a **dual-head architecture** (CE head + KD head)
|
| 131 |
+
- Use the saved `alpha` and `beta` parameters for optimal inference
|
| 132 |
+
- For ViT-L checkpoints, change `model_name='ViT-L-14'` and use image size 224 (or 336 for ViT-L-14-336px)
|
| 133 |
+
|
| 134 |
### Training Your Own Model
|
| 135 |
|
| 136 |
To train your own DHO model, please visit the [official GitHub repository](https://github.com/yourusername/DHO) for detailed instructions and training scripts.
|