	---
	library_name: onnx
	pipeline_tag: feature-extraction
	license: cc-by-nc-4.0
	tags:
	- onnx
	- vision
	- clip
	- hyperbolic
	- image-embedding
	- hyperboloid
	- non-euclidean
	- lorentz
	- meru
	- hycoclip
	language:
	- en
	---

	# Hyperbolic CLIP Models (ONNX)

	This repository contains ONNX exports of hyperbolic vision-language models (HyCoCLIP and MERU) for computing image embeddings in hyperbolic space.

	## Available Models

	| Model | Architecture | Embedding Dim | Size | Path |
	|-------|--------------|---------------|------|------|
	| hycoclip-vit-b | ViT-B/16 | 513 | ~350 MB | `hycoclip-vit-b/model.onnx` |
	| hycoclip-vit-s | ViT-S/16 | 513 | ~84 MB | `hycoclip-vit-s/model.onnx` |
	| meru-vit-b | ViT-B/16 | 513 | ~350 MB | `meru-vit-b/model.onnx` |
	| meru-vit-s | ViT-S/16 | 513 | ~84 MB | `meru-vit-s/model.onnx` |

	## Quick Start

	```python
	import onnxruntime as ort
	import numpy as np
	from huggingface_hub import hf_hub_download

	# Download a model
	onnx_path = hf_hub_download(
	    repo_id="mnm-matin/hyperbolic-clip",
	    filename="hycoclip-vit-s/model.onnx",  # or another model path from the table above
	)

	# Load and run
	session = ort.InferenceSession(onnx_path)
	image = np.random.rand(1, 3, 224, 224).astype(np.float32) # Your preprocessed image
	embedding, curvature = session.run(None, {"image": image})

	print(f"Embedding shape: {embedding.shape}") # (1, 513) - hyperboloid format
	```
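
The random array above only stands in for a real image. Below is a minimal sketch of building the `(1, 3, 224, 224)` float32 tensor from an RGB image. It assumes pixel values scaled to `[0, 1]` with no further normalization and an image already resized to 224×224; this card confirms neither, so check against the original HyCoCLIP/MERU preprocessing before relying on it. `to_model_input` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def to_model_input(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB array into a (1, 3, H, W) float32 batch.

    Assumption: the exported model expects pixel values in [0, 1].
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 array, got shape {rgb.shape}")
    # HWC -> CHW, then scale 0..255 to 0..1
    chw = rgb.astype(np.float32).transpose(2, 0, 1) / 255.0
    return chw[np.newaxis, ...]  # add the batch dimension

# Stand-in for a real 224x224 image (e.g. one loaded and resized with Pillow).
rgb = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
image = to_model_input(rgb)
```

The resulting `image` can be passed to `session.run` in place of the random array.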

	## Model Details

	All models output embeddings in the Lorentz (hyperboloid) model of hyperbolic space with curvature `-c`:
	- Output: `(t, x₁...xₙ)` where `t = √(1/c + ‖x‖²)`
	- Embedding dim: 513 (1 time component + 512 spatial components)
	- Curvature `c` is learned and exported as a second output
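
The constraint `t = √(1/c + ‖x‖²)` is equivalent to every embedding `e` satisfying `⟨e, e⟩_L = -1/c` under the Lorentzian inner product, and the geodesic distance between two points is `arccosh(-c ⟨a, b⟩_L) / √c`. The sketch below illustrates both on synthetic data. The function names are hypothetical and `c = 0.5` is an arbitrary illustrative value, not the learned curvature.

```python
import numpy as np

def lorentz_inner(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Lorentzian inner product <a, b>_L = -a_t * b_t + <a_x, b_x> over the last axis."""
    return -a[..., 0] * b[..., 0] + np.sum(a[..., 1:] * b[..., 1:], axis=-1)

def lift_to_hyperboloid(x: np.ndarray, c: float) -> np.ndarray:
    """Prepend the time component t = sqrt(1/c + ||x||^2) to spatial components x."""
    t = np.sqrt(1.0 / c + np.sum(x * x, axis=-1, keepdims=True))
    return np.concatenate([t, x], axis=-1)

def lorentz_distance(a: np.ndarray, b: np.ndarray, c: float) -> np.ndarray:
    """Geodesic distance on the curvature -c hyperboloid."""
    # Clip guards against rounding pushing the argument slightly below 1.
    arg = np.clip(-c * lorentz_inner(a, b), 1.0, None)
    return np.arccosh(arg) / np.sqrt(c)

c = 0.5  # illustrative value only
rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=(2, 512))  # synthetic spatial components
emb = lift_to_hyperboloid(x, c)           # shape (2, 513)

# Every point on the hyperboloid satisfies <e, e>_L = -1/c.
on_manifold = np.allclose(lorentz_inner(emb, emb), -1.0 / c)
dist = lorentz_distance(emb[0], emb[1], c)
```

With real model outputs, the same `lorentz_inner` check is a quick way to confirm that the exported curvature value matches the embeddings.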

	### Converting to Poincaré Ball

	```python
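	t = embedding[:, 0:1]  # time component
	x = embedding[:, 1:]  # spatial components
	# Assumes `curvature` holds the positive scalar c itself (not log c).
	sqrt_c = np.sqrt(np.asarray(curvature, dtype=np.float32).reshape(-1)[0])
	poincare = sqrt_c * x / (1 + sqrt_c * t)  # stereographic projection onto the unit ball; equals x / (t + 1) when c = 1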
	```
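
For a general curvature `c`, the isometric projection onto the unit Poincaré ball rescales by `√c`, giving `√c · x / (1 + √c · t)`, which reduces to `x / (t + 1)` when `c = 1`. The sketch below wraps this and its inverse in two hypothetical helpers and checks them on synthetic points; `c = 0.5` is an arbitrary illustrative value.

```python
import numpy as np

def to_poincare(embedding: np.ndarray, c: float) -> np.ndarray:
    """Map (t, x) on the curvature -c hyperboloid to the unit Poincaré ball."""
    sqrt_c = np.sqrt(c)
    t, x = embedding[:, :1], embedding[:, 1:]
    return sqrt_c * x / (1.0 + sqrt_c * t)

def from_poincare(p: np.ndarray, c: float) -> np.ndarray:
    """Inverse map from the unit Poincaré ball back to (t, x)."""
    sq = np.sum(p * p, axis=-1, keepdims=True)  # squared norm, strictly below 1
    t = (1.0 + sq) / (1.0 - sq)
    x = 2.0 * p / (1.0 - sq)
    return np.concatenate([t, x], axis=-1) / np.sqrt(c)

c = 0.5  # illustrative value only
rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=(4, 512))  # synthetic spatial components
t = np.sqrt(1.0 / c + np.sum(x * x, axis=-1, keepdims=True))
embedding = np.concatenate([t, x], axis=-1)  # shape (4, 513)

poincare = to_poincare(embedding, c)
inside_ball = np.all(np.linalg.norm(poincare, axis=-1) < 1.0)
round_trip_ok = np.allclose(from_poincare(poincare, c), embedding)
```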

	## Usage with HyperView

	```python
	import hyperview as hv
	from huggingface_hub import hf_hub_download

	# Download model
	model_path = hf_hub_download("mnm-matin/hyperbolic-clip", "hycoclip-vit-s/model.onnx")

	# Use with HyperView
	ds = hv.Dataset("my_images")
	ds.add_images_dir("/path/to/images")
	ds.compute_embeddings(onnx_path=model_path)
	hv.show(ds)
	```

	## License

	CC-BY-NC-4.0 (Non-commercial use only)

	Based on:
	- [PalAvik/hycoclip](https://github.com/PalAvik/hycoclip)
	- [facebookresearch/meru](https://github.com/facebookresearch/meru)

	## Citation

	```bibtex
	@inproceedings{desai2023hyperbolic,
	  title={Hyperbolic Image-Text Representations},
	  author={Desai, Karan and Nickel, Maximilian and Rajpurohit, Tanmay and Johnson, Justin and Vedantam, Ramakrishna},
	  booktitle={ICML},
	  year={2023}
	}
	```