---
library_name: onnx
pipeline_tag: feature-extraction
license: cc-by-nc-4.0
tags:
- onnx
- vision
- clip
- hyperbolic
- image-embedding
- hyperboloid
- non-euclidean
- lorentz
- meru
- hycoclip
language:
- en
---

# Hyperbolic CLIP Models (ONNX)

This repository contains **ONNX exports** of the hyperbolic vision-language models HyCoCLIP and MERU, for computing **hyperbolic image embeddings**.

## Available Models

| Model | Architecture | Embedding Dim | Size | Path |
|-------|--------------|---------------|------|------|
| **hycoclip-vit-b** | ViT-B/16 | 513 | ~350 MB | `hycoclip-vit-b/model.onnx` |
| **hycoclip-vit-s** | ViT-S/16 | 513 | ~84 MB | `hycoclip-vit-s/model.onnx` |
| **meru-vit-b** | ViT-B/16 | 513 | ~350 MB | `meru-vit-b/model.onnx` |
| **meru-vit-s** | ViT-S/16 | 513 | ~84 MB | `meru-vit-s/model.onnx` |

## Quick Start

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download a model
onnx_path = hf_hub_download(
    repo_id="mnm-matin/hyperbolic-clip",
    filename="hycoclip-vit-s/model.onnx",  # or any other path from the table above
)

# Load the model and run inference
session = ort.InferenceSession(onnx_path)
image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder; replace with your preprocessed image
embedding, curvature = session.run(None, {"image": image})

print(f"Embedding shape: {embedding.shape}")  # (1, 513), hyperboloid format
```

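The `image` tensor in the Quick Start is a random placeholder. The sketch below shows one way to turn an RGB array into a tensor of the same shape and dtype. It is not this repository's documented preprocessing: the center crop, the nearest-neighbour resize, and the scaling to `[0, 1]` are all assumptions, and `to_model_input` is a hypothetical helper name. Consult the upstream HyCoCLIP and MERU repositories for the exact transform, including whether normalization is applied inside the exported graph.

```python
import numpy as np


def to_model_input(rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB array to a (1, 3, size, size) float32 tensor."""
    h, w, _ = rgb.shape
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = rgb[top:top + side, left:left + side]  # center crop to a square
    idx = np.arange(size) * side // size          # nearest-neighbour sample positions
    resized = crop[idx][:, idx]                   # (size, size, 3)
    # Assumption: the model expects channel-first floats scaled to [0, 1]
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0
    return chw[None]                              # add the batch dimension


# Synthetic 480x640 image standing in for a decoded photo
dummy = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
image = to_model_input(dummy)
print(image.shape, image.dtype)  # (1, 3, 224, 224) float32
```

In practice a library resize with proper interpolation (for example via Pillow) would likely give better results than nearest-neighbour sampling; the NumPy-only version is used here just to keep the example dependency-free.
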
## Model Details

All models output embeddings in **Lorentz (hyperboloid) format**:
- Output: `(t, x₁...xₙ)` where `t = √(1/c + ‖x‖²)`
- Embedding dim: 513 (1 time component + 512 spatial components)
- The curvature `c` is learned and exported as the second output

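The relation between `t`, `x`, and `c` can be checked numerically: a point built this way satisfies `-t² + ‖x‖² = -1/c`. The following is a minimal sketch on synthetic data, not part of the exported models. The helper names `lift_to_hyperboloid` and `lorentz_norm_sq` are hypothetical, and the curvature value is chosen only for illustration.

```python
import numpy as np


def lift_to_hyperboloid(x: np.ndarray, c: float) -> np.ndarray:
    """Prepend the time component t = sqrt(1/c + ||x||^2) to spatial vectors x."""
    t = np.sqrt(1.0 / c + np.sum(x * x, axis=-1, keepdims=True))
    return np.concatenate([t, x], axis=-1)


def lorentz_norm_sq(p: np.ndarray) -> np.ndarray:
    """Lorentzian squared norm -t^2 + ||x||^2; equals -1/c on the hyperboloid."""
    return -p[..., 0] ** 2 + np.sum(p[..., 1:] ** 2, axis=-1)


# Synthetic stand-in for model outputs: 4 spatial vectors of dimension 512
rng = np.random.default_rng(0)
c = 0.5  # assumed curvature value, for illustration only
points = lift_to_hyperboloid(rng.normal(size=(4, 512)) * 0.1, c)

print(points.shape)                                    # (4, 513)
print(np.allclose(lorentz_norm_sq(points), -1.0 / c))  # True
```

The same `lorentz_norm_sq` check can be applied to real `embedding` outputs together with the exported curvature as a quick sanity test of an inference pipeline.
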
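### Converting to Poincaré Ball

```python
c = float(np.asarray(curvature).reshape(-1)[0])  # learned curvature (second model output)
t = embedding[:, 0:1]  # time component
x = embedding[:, 1:]   # spatial components
# Stereographic projection onto the unit ball; reduces to x / (t + 1) when c = 1
poincare = np.sqrt(c) * x / (1 + np.sqrt(c) * t)
```
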
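Distances can also be computed directly on the hyperboloid, without projecting to the ball. The sketch below uses the standard Lorentz-model geodesic distance, `d(p, q) = arccosh(-c ⟨p, q⟩_L) / √c`, assuming curvature `-c` with `c > 0`, consistent with `t = √(1/c + ‖x‖²)`. `lorentz_distance` is a hypothetical helper, not an API of this repository, and the inputs are synthetic.

```python
import numpy as np


def lorentz_distance(p: np.ndarray, q: np.ndarray, c: float) -> np.ndarray:
    """Geodesic distance between hyperboloid points p and q of shape (..., 1 + n)."""
    # Lorentzian inner product: -t_p * t_q + <x_p, x_q>
    inner = -p[..., 0] * q[..., 0] + np.sum(p[..., 1:] * q[..., 1:], axis=-1)
    # -c * inner is >= 1 in exact arithmetic; the clip guards against rounding below 1
    return np.arccosh(np.clip(-c * inner, 1.0, None)) / np.sqrt(c)


# Synthetic embeddings in the same (t, x_1..x_n) layout as the model output
rng = np.random.default_rng(0)
c = 0.5  # assumed curvature value, for illustration only
x = rng.normal(size=(2, 512)) * 0.1
t = np.sqrt(1.0 / c + np.sum(x * x, axis=-1, keepdims=True))
emb = np.concatenate([t, x], axis=-1)  # shape (2, 513)

d_01 = lorentz_distance(emb[0], emb[1], c)  # distance between two different points
d_00 = lorentz_distance(emb[0], emb[0], c)  # approximately zero for identical points
print(float(d_01), float(d_00))
```

Working in Lorentz coordinates avoids the division by `1 - ‖u‖²` that appears in the Poincaré distance formula, which can become numerically delicate for points near the boundary of the ball.
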
## Usage with HyperView

```python
import hyperview as hv
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download("mnm-matin/hyperbolic-clip", "hycoclip-vit-s/model.onnx")

# Use with HyperView
ds = hv.Dataset("my_images")
ds.add_images_dir("/path/to/images")
ds.compute_embeddings(onnx_path=model_path)
hv.show(ds)
```

## License

**CC-BY-NC-4.0** (Non-commercial use only)

Based on:
- [PalAvik/hycoclip](https://github.com/PalAvik/hycoclip)
- [facebookresearch/meru](https://github.com/facebookresearch/meru)

## Citation

```bibtex
@inproceedings{desai2023hyperbolic,
  title={Hyperbolic Image-Text Representations},
  author={Desai, Karan and Nickel, Maximilian and Rajpurohit, Tanmay and Johnson, Justin and Vedantam, Ramakrishna},
  booktitle={ICML},
  year={2023}
}
```