---
language: en
license: apache-2.0
library_name: faiss
tags:
- faiss
- retrieval
- vector-search
- marine-images
- bio-clip
- oceangpt-x
---
# Ocean-FAISS: Marine Image Retrieval Index
High-speed FAISS vector index and metadata for marine image retrieval using BioCLIP embeddings. Core component of the [OceanGPT-X](https://huggingface.co/collections/zjunlp/oceangpt-x) pipeline.
## Repository Contents
| Path | Description |
|------|-------------|
| `faiss/index.faiss` | Pre-built FAISS index containing BioCLIP feature vectors |
| `faiss/id_map.json` | Mapping between FAISS internal IDs and dataset image IDs |
| `metadata/metadata.jsonl` | Rich metadata for each indexed image (species, location, capture info) |
## Usage
Requires `faiss-cpu` or `faiss-gpu`, plus `jsonlines` for reading the metadata file.
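The query passed to `index.search` has to be a float32 array of shape `(1, dim)`. The helper below is a minimal sketch of that preparation step, not part of this repository: `prepare_query` and the `normalize` flag are hypothetical names, the `dim = 512` value is a placeholder (read the real one from `index.d`), and L2 normalization is only appropriate if the indexed vectors were normalized the same way.

```python
import numpy as np


def prepare_query(embedding, dim, normalize=True):
    """Return a C-contiguous float32 array of shape (1, dim) for index.search."""
    q = np.asarray(embedding, dtype=np.float32).reshape(1, -1)
    if q.shape[1] != dim:
        raise ValueError(f"expected {dim} dimensions, got {q.shape[1]}")
    if normalize:
        # Assumption: the indexed vectors were L2-normalized; disable otherwise.
        norm = np.linalg.norm(q, axis=1, keepdims=True)
        q = q / np.maximum(norm, 1e-12)  # guard against division by zero
    return np.ascontiguousarray(q, dtype=np.float32)


# Stand-in embedding; in practice this would come from a BioCLIP encoder.
dim = 512  # hypothetical placeholder; use index.d for the real dimension
query_vector = prepare_query(np.random.rand(dim), dim)
```

Checking the dimension up front gives a readable error instead of a failure inside FAISS when the embedding model and the index do not match.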
```python
import json

import faiss
import jsonlines

# Load the pre-built index and the FAISS ID -> image ID mapping
index = faiss.read_index("faiss/index.faiss")
with open("faiss/id_map.json", "r") as f:
    id_map = json.load(f)

# Query vector must match the embedding dimension of the index (index.d)
query_vector = ...  # Shape: (1, dim), dtype: float32
D, I = index.search(query_vector, k=5)  # D: distances, I: FAISS internal IDs

# Retrieve metadata, keyed by dataset image ID
with jsonlines.open("metadata/metadata.jsonl") as reader:
    metadata = {obj["id"]: obj for obj in reader}

for idx in I[0]:
    if idx < 0:  # FAISS returns -1 when fewer than k neighbors are found
        continue
    img_id = id_map[str(idx)]
    print(metadata.get(img_id, "Not found"))
```