| ---
|
| language: en
|
| license: apache-2.0
|
| library_name: faiss
|
| tags:
|
| - faiss
|
| - retrieval
|
| - vector-search
|
| - marine-images
|
| - bio-clip
|
| - oceangpt-x
|
| ---
|
| # Ocean-FAISS: Marine Image Retrieval Index
|
|
|
| A pre-built FAISS vector index and accompanying metadata for fast marine image retrieval with BioCLIP embeddings. It is a core component of the [OceanGPT-X](https://huggingface.co/collections/zjunlp/oceangpt-x) pipeline.
|
|
|
| ## Repository Contents
|
| | Path | Description |
|
| |------|-------------|
|
| | `faiss/index.faiss` | Pre-built FAISS index containing BioCLIP feature vectors |
|
| | `faiss/id_map.json` | Mapping between FAISS internal IDs and dataset image IDs |
|
| | `metadata/metadata.jsonl` | Rich metadata for each indexed image (species, location, capture info) |
|
|
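
To make the relationship between the three files concrete, here is a minimal sketch of how a FAISS result row could be resolved to metadata records. It assumes `id_map.json` is a JSON object keyed by the stringified FAISS ID and that every metadata record has an `id` field, as the usage code in this card implies. The sample IDs, the `species` key, and the `resolve_hits` helper are hypothetical stand-ins, not the repository's actual contents. The sketch reads JSONL with the standard `json` module, so it needs no extra dependency.

```python
import json


def resolve_hits(faiss_ids, id_map, metadata_by_id):
    """Turn a row of FAISS internal IDs into (image_id, metadata) pairs.

    FAISS pads a result row with -1 when it finds fewer than k neighbors,
    so those entries are skipped.
    """
    results = []
    for internal_id in faiss_ids:
        if internal_id == -1:
            continue
        image_id = id_map.get(str(internal_id))
        results.append((image_id, metadata_by_id.get(image_id)))
    return results


# Hypothetical stand-in for faiss/id_map.json
id_map = json.loads('{"0": "img_0001", "1": "img_0002"}')

# Hypothetical stand-in for metadata/metadata.jsonl (one JSON object per line)
metadata_lines = [
    '{"id": "img_0001", "species": "example species A"}',
    '{"id": "img_0002", "species": "example species B"}',
]
metadata_by_id = {rec["id"]: rec for rec in map(json.loads, metadata_lines)}

# A result row as it might come back from index.search(...)[1][0]
hits = resolve_hits([1, 0, -1], id_map, metadata_by_id)
for image_id, record in hits:
    print(image_id, record)
```

Building `metadata_by_id` once and reusing it across queries avoids re-reading the JSONL file for every search.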
|
| ## Usage
|
| Requires `faiss-cpu` or `faiss-gpu`; the example below also uses `jsonlines` to read the metadata file.
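
FAISS expects queries as a C-contiguous 2-D `float32` NumPy array whose second dimension equals the index dimension (available as `index.d` on a loaded index). The sketch below shows one way to coerce an embedding into that layout before searching. The `prepare_query` helper is hypothetical, and the dimension of 512 is only a placeholder. L2 normalization is an assumption too: it is appropriate only if the index was built from normalized vectors, so pass `normalize=False` otherwise.

```python
import numpy as np


def prepare_query(embedding, dim, normalize=True):
    """Coerce an embedding into a (1, dim) float32 array suitable for FAISS."""
    q = np.asarray(embedding, dtype=np.float32).reshape(1, -1)
    if q.shape[1] != dim:
        raise ValueError(f"expected dimension {dim}, got {q.shape[1]}")
    if normalize:
        # Assumption: the index stores L2-normalized vectors.
        norm = np.linalg.norm(q, axis=1, keepdims=True)
        q = q / np.maximum(norm, 1e-12)
    return np.ascontiguousarray(q, dtype=np.float32)


# Placeholder embedding; in practice this would come from the BioCLIP
# image or text encoder, and dim would be read from index.d.
rng = np.random.default_rng(0)
query_vector = prepare_query(rng.standard_normal(512), dim=512)
print(query_vector.shape, query_vector.dtype)
```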
|
| ```python
|
| import faiss
|
| import json
|
| import jsonlines
|
|
|
| index = faiss.read_index("faiss/index.faiss")
|
| with open("faiss/id_map.json", "r") as f:
|
|     id_map = json.load(f)
|
|
|
| # Query vector must match the embedding dimension of the index
|
| query_vector = ... # Shape: (1, dim), dtype: float32
|
| D, I = index.search(query_vector, k=5)
|
|
|
| # Retrieve metadata
|
| with jsonlines.open("metadata/metadata.jsonl") as reader:
|
|     metadata = {obj["id"]: obj for obj in reader}
|
|
|
| for idx in I[0]:
|
|     if idx == -1:  # FAISS pads with -1 when fewer than k neighbors are found
|
|         continue
|
|     img_id = id_map[str(idx)]
|
|     print(metadata.get(img_id, "Not found"))
|
| ```