---
language: en
license: apache-2.0
library_name: faiss
tags:
- faiss
- retrieval
- vector-search
- marine-images
- bio-clip
- oceangpt-x
---

# Ocean-FAISS: Marine Image Retrieval Index

Pre-built FAISS vector index and metadata for fast marine image retrieval using BioCLIP embeddings. Core component of the [OceanGPT-X](https://huggingface.co/collections/zjunlp/oceangpt-x) pipeline.

## Repository Contents

| Path | Description |
|------|-------------|
| `faiss/index.faiss` | Pre-built FAISS index containing BioCLIP feature vectors |
| `faiss/id_map.json` | Mapping between FAISS internal IDs and dataset image IDs |
| `metadata/metadata.jsonl` | Rich metadata for each indexed image (species, location, capture info) |

## Usage

Requires `faiss-cpu` or `faiss-gpu`. The example below also imports the `jsonlines` package to read the metadata file.

```python
import faiss
import json
import jsonlines

# Load the pre-built index and the FAISS-ID -> image-ID mapping
index = faiss.read_index("faiss/index.faiss")
with open("faiss/id_map.json", "r") as f:
    id_map = json.load(f)

# Query vector must match the embedding dimension of the index (index.d)
query_vector = ...  # Shape: (1, dim), dtype: float32

# D holds distances, I holds FAISS internal IDs, both of shape (1, k)
D, I = index.search(query_vector, k=5)

# Retrieve metadata, keyed by each record's "id" field
with jsonlines.open("metadata/metadata.jsonl") as reader:
    metadata = {obj["id"]: obj for obj in reader}

for idx in I[0]:
    # JSON object keys are strings, so convert the integer ID before lookup
    img_id = id_map[str(idx)]
    print(metadata.get(img_id, "Not found"))
```
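
The lookup step in the usage example can be made a little more defensive. FAISS fills the ID array with `-1` when it finds fewer than `k` neighbors, and `id_map[str(-1)]` would then raise a `KeyError`. The sketch below is one possible way to handle that using only the standard library. The helper names `load_metadata` and `resolve_hits` are hypothetical and not part of this repository. It assumes, as the example above does, that `id_map.json` is a JSON object keyed by stringified FAISS IDs and that every metadata record has an `id` field.

```python
import json


def load_metadata(path):
    """Read a JSON Lines file into a dict keyed by each record's "id" field.

    Hypothetical helper; assumes one JSON object per line with an "id" key.
    """
    records = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                obj = json.loads(line)
                records[obj["id"]] = obj
    return records


def resolve_hits(distances, ids, id_map, metadata):
    """Pair each valid FAISS hit with its image ID, distance, and metadata.

    Hypothetical helper; `distances` and `ids` are one row of the arrays
    returned by index.search, e.g. D[0] and I[0].
    """
    hits = []
    for dist, idx in zip(distances, ids):
        idx = int(idx)  # also accepts NumPy integer types
        if idx < 0:  # FAISS pads with -1 when fewer than k neighbors exist
            continue
        img_id = id_map.get(str(idx))
        if img_id is None:  # internal ID has no entry in the mapping
            continue
        hits.append(
            {
                "image_id": img_id,
                "distance": float(dist),
                "metadata": metadata.get(img_id),  # None if no record exists
            }
        )
    return hits
```

With the variables from the usage example, this could be called as `resolve_hits(D[0], I[0], id_map, load_metadata("metadata/metadata.jsonl"))`. Reading the JSONL file with `json` instead of `jsonlines` is a substitution made here to avoid the extra dependency; both approaches produce the same dictionary.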