metadata
language: en
license: apache-2.0
library_name: faiss
tags:
  - faiss
  - retrieval
  - vector-search
  - marine-images
  - bio-clip
  - oceangpt-x

Ocean-FAISS: Marine Image Retrieval Index

High-speed FAISS vector index and metadata for marine image retrieval using BioCLIP embeddings. Core component of the OceanGPT-X pipeline.

Repository Contents

| Path | Description |
|---|---|
| faiss/index.faiss | Pre-built FAISS index containing BioCLIP feature vectors |
| faiss/id_map.json | Mapping between FAISS internal IDs and dataset image IDs |
| metadata/metadata.jsonl | Rich metadata for each indexed image (species, location, capture info) |

Usage

Requires faiss-cpu or faiss-gpu, plus the jsonlines package for reading the metadata file.

import faiss
import json
import jsonlines

index = faiss.read_index("faiss/index.faiss")
with open("faiss/id_map.json", "r") as f:
    id_map = json.load(f)

# Query vector must match the embedding dimension of the index (index.d)
query_vector = ...  # Shape: (1, index.d), dtype: float32
# D holds the distances/scores, I holds the FAISS internal IDs of the top-k hits
D, I = index.search(query_vector, k=5)

# Retrieve metadata
with jsonlines.open("metadata/metadata.jsonl") as reader:
    metadata = {obj["id"]: obj for obj in reader}

for idx in I[0]:
    if idx < 0:  # FAISS pads results with -1 when fewer than k neighbors are found
        continue
    img_id = id_map[str(idx)]
    print(metadata.get(img_id, "Not found"))
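The snippet above leaves the query vector as a placeholder and prints raw metadata records. The following is a minimal sketch of two helpers that may make those steps less error-prone. The names `prepare_query` and `resolve_hits` are hypothetical and not part of this repository. Whether the index expects L2-normalized vectors is not stated here, so normalization is off by default and should be treated as an assumption to verify against how the index was built.

```python
import numpy as np


def prepare_query(embedding, dim, normalize=False):
    """Return a (1, dim) float32, C-contiguous array suitable for index.search.

    Hypothetical helper: `dim` would typically be `index.d`.
    """
    q = np.asarray(embedding, dtype=np.float32).reshape(1, -1)
    if q.shape[1] != dim:
        raise ValueError(f"expected dimension {dim}, got {q.shape[1]}")
    if normalize:
        # Assumption: only appropriate if the index was built on unit-length vectors.
        norm = np.linalg.norm(q, axis=1, keepdims=True)
        q = q / np.maximum(norm, 1e-12)
    return np.ascontiguousarray(q, dtype=np.float32)


def resolve_hits(ids, scores, id_map, metadata):
    """Pair each valid FAISS ID with its score and metadata record.

    Assumes `id_map` is keyed by the FAISS ID as a string, as in the usage
    snippet, and `metadata` is a dict keyed by dataset image ID.
    """
    hits = []
    for idx, score in zip(ids, scores):
        if idx < 0:  # FAISS returns -1 when fewer than k results are found
            continue
        img_id = id_map.get(str(int(idx)))
        hits.append((img_id, float(score), metadata.get(img_id)))
    return hits
```

With the objects from the usage snippet, this could be called as `query_vector = prepare_query(vec, index.d)` followed by `resolve_hits(I[0], D[0], id_map, metadata)`, where `vec` stands for a BioCLIP image or text embedding computed elsewhere.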