gafda committed · Commit 0118473 · 1 Parent(s): fbe1b74

Add ONNX models for visual similarity and perceptual comparison

README.md ADDED
---
license: mit
tags:
- onnx
- clip
- lpips
- image-similarity
- computer-vision
---

# ONNX Models for vidupe.net

This repository contains ONNX-exported models used by [vidupe.net](https://vidupe.net) for visual similarity and perceptual comparison tasks.

## Models

### `vidupe.net/models/clip_visual_vit_b32.onnx`

CLIP visual encoder (ViT-B/32) exported to ONNX. This model encodes images into a 512-dimensional embedding space, enabling semantic image similarity comparisons.

- **Source:** [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)
- **Input:** RGB image tensor `[batch, 3, 224, 224]`, normalized with CLIP's per-channel mean and standard deviation
- **Output:** Image embeddings `[batch, 512]`

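To make the input spec concrete: standard CLIP preprocessing scales pixels to `[0, 1]` and then applies per-channel mean/std normalization, and embeddings are usually compared with cosine similarity. A minimal NumPy sketch, assuming the export keeps the standard openai/clip-vit-base-patch32 constants (the helper names below are illustrative, not part of this repository):

```python
import numpy as np

# Standard CLIP preprocessing constants (openai/clip-vit-base-patch32).
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def preprocess(img: np.ndarray) -> np.ndarray:
    """224x224x3 uint8 RGB image -> [1, 3, 224, 224] normalized float32 tensor."""
    x = img.astype(np.float32) / 255.0        # pixels to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD            # per-channel normalization
    return x.transpose(2, 0, 1)[np.newaxis]   # HWC -> NCHW, add batch dim

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The resulting tensor can be fed directly to the ONNX session, and pairs of output embeddings compared with `cosine_similarity`.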
### `vidupe.net/models/lpips_alexnet.onnx`

LPIPS (Learned Perceptual Image Patch Similarity) model with an AlexNet backbone, exported to ONNX. It computes the perceptual distance between two image patches; lower scores mean the patches are more perceptually similar.

- **Source:** [richzhang/PerceptualSimilarity](https://github.com/richzhang/PerceptualSimilarity)
- **Input:** Two RGB image tensors `[batch, 3, H, W]`, scaled to `[-1, 1]`
- **Output:** Perceptual distance score `[batch, 1, 1, 1]`

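The upstream LPIPS implementation expects inputs scaled to `[-1, 1]`. A minimal sketch of converting a `uint8` RGB image into that layout, assuming the export keeps the upstream convention (the helper name is ours):

```python
import numpy as np

def to_lpips_tensor(img: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image to a [1, 3, H, W] float32 tensor in [-1, 1]."""
    x = img.astype(np.float32) / 255.0        # [0, 1]
    x = x * 2.0 - 1.0                         # [-1, 1]
    return x.transpose(2, 0, 1)[np.newaxis]   # HWC -> NCHW, add batch dim
```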
## Usage

```python
import onnxruntime as ort
import numpy as np

# CLIP visual encoder: encode a batch of images into 512-d embeddings
clip_session = ort.InferenceSession("vidupe.net/models/clip_visual_vit_b32.onnx")
image = np.random.randn(1, 3, 224, 224).astype(np.float32)
embeddings = clip_session.run(None, {"input": image})[0]  # shape [1, 512]

# LPIPS perceptual similarity: lower distance = more perceptually similar
lpips_session = ort.InferenceSession("vidupe.net/models/lpips_alexnet.onnx")
img0 = np.random.randn(1, 3, 64, 64).astype(np.float32)
img1 = np.random.randn(1, 3, 64, 64).astype(np.float32)
distance = lpips_session.run(None, {"input0": img0, "input1": img1})[0]  # shape [1, 1, 1, 1]
```

## Requirements

```
onnxruntime>=1.16.0
numpy
```
clip_visual_vit_b32.onnx → vidupe.net/models/clip_visual_vit_b32.onnx RENAMED
File without changes
lpips_alexnet.onnx → vidupe.net/models/lpips_alexnet.onnx RENAMED
File without changes