handaber/imagebind

handaber committed on Oct 16, 2024

Commit 02881e5 · verified · 1 Parent(s): ef37d3b

Update README.md

Files changed (1)

README.md +35 -0

@@ -1,3 +1,38 @@
 # ImageBind: One Embedding Space To Bind Them All
 **[FAIR, Meta AI](https://ai.facebook.com/research/)**

+# ImageBind Models (in `./checkpoints`):
+  - imagebind_huge.pth
+  - model.safetensors
+  - OpenVINO Intermediate Representation Models:
+    - Text
+    - Vision
+    - Audio
+    - Thermal
+    - Depth
+    - [ ] TODO: IMU
+    - [ ] TODO: Video
+### Updated training assets in `.assets`; thermal and depth images must be converted to single-channel grayscale
+```py
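+import torch
+import torchvision.transforms as transforms
+from PIL import Image
+
+from imagebind import data
+from imagebind.models.imagebind_model import ModalityType
+
+# Convert RGB images to single-channel 224x224 tensors with values in [0, 1]
+to_single_channel = transforms.Compose([
+    transforms.Grayscale(num_output_channels=1),
+    transforms.Resize((224, 224)),
+    transforms.ToTensor(),
+])
+# texts, image_paths, audio_paths, depth_paths, thermal_paths, and device
+# are assumed to be defined earlier
+inputs = {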
+    ModalityType.TEXT: data.load_and_transform_text(texts, device),
+    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
+    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
+    ModalityType.DEPTH: torch.stack([to_single_channel(Image.open(path)) for path in depth_paths]).to(device),
+    ModalityType.THERMAL: torch.stack([to_single_channel(Image.open(path)) for path in thermal_paths]).to(device),
+}
+...
+```
+# === Original: ===
 # ImageBind: One Embedding Space To Bind Them All
 **[FAIR, Meta AI](https://ai.facebook.com/research/)**