OceanGPT-X
High-speed FAISS vector index and metadata for marine image retrieval using BioCLIP embeddings. Core component of the OceanGPT-X pipeline.
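Retrieval over BioCLIP embeddings is nearest-neighbour search in embedding space. The card does not say which FAISS index type or distance metric the pre-built index uses. The sketch below is a minimal exact (brute-force) search in NumPy that assumes cosine similarity, which is the inner product over L2-normalized vectors. `cosine_top_k` is a hypothetical helper and is not shipped with this dataset. It can be useful for sanity-checking FAISS results on a small subset of vectors.

```python
import numpy as np


def cosine_top_k(database: np.ndarray, query: np.ndarray, k: int = 5):
    """Exact top-k search by cosine similarity (hypothetical helper).

    database: (n, d) array of embeddings; query: (m, d) array.
    Returns (scores, ids), each of shape (m, min(k, n)), best match first.
    """
    # L2-normalize rows so that the inner product equals cosine similarity
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    sims = q @ db.T  # (m, n) similarity matrix
    k = min(k, db.shape[0])
    # Sort each row by descending similarity and keep the first k columns
    ids = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, ids, axis=1)
    return scores.astype("float32"), ids
```

If the index is a flat inner-product index built over normalized vectors, FAISS should return the same ranking up to ties and floating-point rounding. Approximate index types (for example IVF, HNSW or PQ) may return slightly different neighbours.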
| Path | Description |
|---|---|
| `faiss/index.faiss` | Pre-built FAISS index containing BioCLIP feature vectors |
| `faiss/id_map.json` | Mapping between FAISS internal IDs and dataset image IDs |
| `metadata/metadata.jsonl` | Rich metadata for each indexed image (species, location, capture info) |
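The three files form a two-step lookup: a FAISS row ID is translated through `id_map.json` into a dataset image ID, which is then looked up in `metadata.jsonl`. The following is a minimal, stdlib-only sketch of that chain. It assumes, as the usage example implies, that each metadata record has an `"id"` field matching the values in `id_map.json`. The helpers `load_metadata` and `resolve_hits` are hypothetical names, and comparing IDs as strings is a defensive choice, not something the card specifies.

```python
import json
from typing import Any, Dict, Iterable, List


def load_metadata(lines: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Parse metadata.jsonl content (one JSON object per line), keyed by "id"."""
    records = (json.loads(line) for line in lines if line.strip())
    return {str(rec["id"]): rec for rec in records}


def resolve_hits(faiss_ids: Iterable[int],
                 id_map: Dict[str, Any],
                 metadata: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map FAISS row IDs -> dataset image IDs -> metadata records."""
    results = []
    for faiss_id in faiss_ids:
        if faiss_id == -1:  # FAISS returns -1 for empty result slots
            continue
        img_id = id_map.get(str(int(faiss_id)))  # JSON object keys are strings
        if img_id is None:  # FAISS ID absent from the mapping
            continue
        # Flag images that are mapped but have no metadata record
        results.append(metadata.get(str(img_id), {"id": img_id, "missing": True}))
    return results
```

Usage: `with open("metadata/metadata.jsonl") as f: metadata = load_metadata(f)`, then `resolve_hits(I[0], id_map, metadata)` on the output of `index.search`. Reading the JSONL file this way avoids the `jsonlines` dependency.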
Requires `faiss-cpu` or `faiss-gpu`. The example below also uses the `jsonlines` package.
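```python
import json

import faiss
import jsonlines
import numpy as np

# Load the pre-built index and the FAISS-ID -> image-ID mapping
index = faiss.read_index("faiss/index.faiss")
with open("faiss/id_map.json", "r") as f:
    id_map = json.load(f)  # JSON object keys are always strings

# Query vector must match the embedding dimension of the index.
# Placeholder: replace with a BioCLIP embedding of shape (1, index.d), dtype float32.
query_vector = np.random.rand(1, index.d).astype("float32")

D, I = index.search(query_vector, k=5)  # D: scores, I: FAISS IDs; both shape (1, 5)

# Retrieve metadata, keyed by image ID
with jsonlines.open("metadata/metadata.jsonl") as reader:
    metadata = {obj["id"]: obj for obj in reader}

for idx in I[0]:
    if idx == -1:  # FAISS pads with -1 when fewer than k results are found
        continue
    img_id = id_map[str(idx)]
    print(metadata.get(img_id, "Not found"))
```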
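`index.search` expects a 2-D, C-contiguous `float32` array whose second dimension equals `index.d`. Embedding models often return a 1-D vector or `float64` values, so a small guard can help. The following is a minimal sketch. `prepare_query` is a hypothetical helper, and the optional L2 normalization is only appropriate if the indexed BioCLIP vectors were normalized too, which this card does not state.

```python
import numpy as np


def prepare_query(vec, dim: int, normalize: bool = False) -> np.ndarray:
    """Return a C-contiguous float32 array of shape (n, dim) for index.search."""
    q = np.asarray(vec, dtype="float32")
    if q.ndim == 1:
        q = q.reshape(1, -1)  # a single vector becomes a batch of one
    if q.ndim != 2 or q.shape[1] != dim:
        raise ValueError(f"expected shape (n, {dim}), got {q.shape}")
    if normalize:
        # L2-normalize each row; guard against division by zero
        norms = np.linalg.norm(q, axis=1, keepdims=True)
        q = q / np.maximum(norms, 1e-12)
    return np.ascontiguousarray(q, dtype="float32")
```

Usage: `D, I = index.search(prepare_query(embedding, index.d), k=5)`, where `embedding` stands for whatever vector your BioCLIP encoder produced.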