---
library_name: onnx
pipeline_tag: feature-extraction
license: cc-by-nc-4.0
tags:
  - onnx
  - vision
  - clip
  - hyperbolic
  - image-embedding
  - hyperboloid
  - non-euclidean
  - lorentz
  - meru
  - hycoclip
language:
  - en
---

# Hyperbolic CLIP Models (ONNX)

This repository contains **ONNX exports** of the hyperbolic vision-language models **MERU** and **HyCoCLIP**, which produce image embeddings in hyperbolic space.

## Available Models

| Model | Architecture | Embedding Dim | Size | Path |
|-------|--------------|---------------|------|------|
| **hycoclip-vit-b** | ViT-B/16 | 513 | ~350 MB | `hycoclip-vit-b/model.onnx` |
| **hycoclip-vit-s** | ViT-S/16 | 513 | ~84 MB | `hycoclip-vit-s/model.onnx` |
| **meru-vit-b** | ViT-B/16 | 513 | ~350 MB | `meru-vit-b/model.onnx` |
| **meru-vit-s** | ViT-S/16 | 513 | ~84 MB | `meru-vit-s/model.onnx` |

## Quick Start

```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download

# Download a model
onnx_path = hf_hub_download(
    repo_id="mnm-matin/hyperbolic-clip",
    filename="hycoclip-vit-s/model.onnx"  # or other model path
)

# Load and run
session = ort.InferenceSession(onnx_path)
image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # Your preprocessed image
embedding, curvature = session.run(None, {"image": image})

print(f"Embedding shape: {embedding.shape}")  # (1, 513) - hyperboloid format
```
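The Quick Start above feeds random data into the model. A minimal preprocessing sketch follows, assuming a 224×224 RGB input and CLIP-style normalization constants — the exact transform (resize, crop, mean/std) should be verified against the upstream MERU/HyCoCLIP repositories:

```python
import numpy as np

# Assumed CLIP-style normalization constants; confirm against the
# upstream MERU/HyCoCLIP preprocessing before relying on them.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_uint8: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB image (already resized to 224x224)
    into the (1, 3, 224, 224) float32 tensor the model expects."""
    img = img_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    img = (img - MEAN) / STD                    # per-channel normalize
    img = img.transpose(2, 0, 1)                # HWC -> CHW
    return img[None]                            # add batch dimension
```

The resulting array can be passed directly as the `image` input in the Quick Start snippet.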

## Model Details

All models output embeddings in **Lorentz/hyperboloid format**:
- Output: `(t, x₁ … xₙ)`, where `t = √(1/c + ‖x‖²)`
- Embedding dim: 513 (1 time component + 512 spatial components)
- The curvature `c` is learned during training and returned as a secondary output
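The time-component relation above can be checked numerically. A small sketch, using an illustrative random vector rather than a real model output:

```python
import numpy as np

def lift_to_hyperboloid(x: np.ndarray, c: float) -> np.ndarray:
    """Prepend the time component t = sqrt(1/c + ||x||^2) to spatial
    coordinates x, yielding a point on the curvature-c hyperboloid."""
    t = np.sqrt(1.0 / c + np.sum(x**2, axis=-1, keepdims=True))
    return np.concatenate([t, x], axis=-1)

# Illustrative 512-d spatial vector (a real embedding comes from the model)
x = np.random.randn(1, 512) * 0.1
z = lift_to_hyperboloid(x, c=1.0)

# The Lorentz inner product <z, z>_L = -t^2 + ||x||^2 equals -1/c
inner = -z[:, 0] ** 2 + np.sum(z[:, 1:] ** 2, axis=-1)
```

Valid model outputs should satisfy the same constraint up to floating-point error.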

### Converting to Poincaré Ball

```python
t = embedding[:, 0:1]   # time component
x = embedding[:, 1:]    # spatial components
poincare = x / (t + 1)  # stereographic projection (assumes unit curvature c = 1)
```
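Distances between embeddings can also be computed directly in the Lorentz model. The sketch below uses the standard hyperboloid geodesic-distance formula, not code taken from the upstream repositories:

```python
import numpy as np

def lorentz_distance(u: np.ndarray, v: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Geodesic distance on the curvature-c hyperboloid:
    d(u, v) = arccosh(-c * <u, v>_L) / sqrt(c),
    where <u, v>_L = -t_u * t_v + x_u . x_v is the Lorentz inner product."""
    inner = -u[..., 0] * v[..., 0] + np.sum(u[..., 1:] * v[..., 1:], axis=-1)
    # Clamp for numerical safety: -c * inner is >= 1 for on-manifold points
    arg = np.maximum(-c * inner, 1.0)
    return np.arccosh(arg) / np.sqrt(c)
```

Pass the model's `embedding` output (and the exported curvature) straight into this function; no Poincaré conversion is needed for distance computations.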

## Usage with HyperView

```python
import hyperview as hv
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download("mnm-matin/hyperbolic-clip", "hycoclip-vit-s/model.onnx")

# Use with HyperView
ds = hv.Dataset("my_images")
ds.add_images_dir("/path/to/images")
ds.compute_embeddings(onnx_path=model_path)
hv.show(ds)
```

## License

**CC-BY-NC-4.0** (Non-commercial use only)

Based on:
- [PalAvik/hycoclip](https://github.com/PalAvik/hycoclip)
- [facebookresearch/meru](https://github.com/facebookresearch/meru)

## Citation

```bibtex
@inproceedings{desai2023hyperbolic,
  title={Hyperbolic Image-Text Representations},
  author={Desai, Karan and Nickel, Maximilian and Rajpurohit, Tanmay and Johnson, Justin and Vedantam, Ramakrishna},
  booktitle={ICML},
  year={2023}
}
```