Commit 0068bcd (verified) by zhemaxiya · 1 parent: 4b25e86

Upload README.md

Files changed (1)
  1. README.md +45 -0
README.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ language: en
+ license: apache-2.0
+ library_name: faiss
+ tags:
+ - faiss
+ - retrieval
+ - vector-search
+ - marine-images
+ - bio-clip
+ - oceangpt-x
+ ---
+ # Ocean-FAISS: Marine Image Retrieval Index
+
+ Pre-built FAISS vector index and accompanying metadata for marine image retrieval with BioCLIP embeddings. It is a core component of the [OceanGPT-X](https://huggingface.co/collections/zjunlp/oceangpt-x) pipeline.
+
+ ## Repository Contents
+ | Path | Description |
+ |------|-------------|
+ | `faiss/index.faiss` | Pre-built FAISS index containing BioCLIP feature vectors |
+ | `faiss/id_map.json` | Mapping between FAISS internal IDs and dataset image IDs |
+ | `metadata/metadata.jsonl` | Rich metadata for each indexed image (species, location, capture info) |
+
+ ## Usage
+ Requires `faiss-cpu` or `faiss-gpu`, plus `jsonlines` to read the metadata file.
+ ```python
+ import faiss
+ import json
+ import jsonlines
+
+ index = faiss.read_index("faiss/index.faiss")
+ with open("faiss/id_map.json", "r") as f:
+     id_map = json.load(f)
+
+ # Query vector must match the embedding dimension of the index (index.d)
+ query_vector = ... # Shape: (1, dim), dtype: float32
+ D, I = index.search(query_vector, k=5)
+
+ # Retrieve metadata
+ with jsonlines.open("metadata/metadata.jsonl") as reader:
+     metadata = {obj["id"]: obj for obj in reader}
+
+ for idx in I[0]:
+     img_id = id_map.get(str(idx))  # FAISS returns -1 when fewer than k neighbours exist
+     print(metadata.get(img_id, "Not found"))
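
The lookup loop at the end of the usage snippet could be wrapped in a small helper. The following is a minimal sketch and not part of the repository: the function names `load_metadata` and `resolve_hits` are hypothetical, and it assumes, as the README snippet does, that `id_map.json` is a JSON object keyed by stringified FAISS IDs and that every metadata record carries an `id` field. It reads the JSON Lines file with the standard `json` module rather than `jsonlines`, and it skips the `-1` entries that FAISS returns when fewer than `k` neighbours are found.

```python
import json
from pathlib import Path


def load_metadata(path):
    """Read a JSON Lines file into a dict keyed by each record's "id" field.

    Assumption: every record has an "id" key, as in the README snippet.
    """
    records = {}
    with Path(path).open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            obj = json.loads(line)
            records[obj["id"]] = obj
    return records


def resolve_hits(distances, ids, id_map, metadata):
    """Pair each valid hit with its image ID and metadata record.

    `distances` and `ids` are one row of the (D, I) arrays returned by
    index.search, for example D[0] and I[0].
    Assumption: id_map is a dict keyed by the FAISS ID as a string.
    """
    hits = []
    for dist, idx in zip(distances, ids):
        idx = int(idx)
        if idx < 0:
            continue  # FAISS pads results with -1 when fewer than k neighbours exist
        img_id = id_map.get(str(idx))
        if img_id is None:
            continue  # no mapping for this FAISS ID
        hits.append(
            {
                "image_id": img_id,
                "distance": float(dist),
                "metadata": metadata.get(img_id),  # None if the record is missing
            }
        )
    return hits
```

With the objects from the usage snippet, `resolve_hits(D[0], I[0], id_map, load_metadata("metadata/metadata.jsonl"))` would return one dictionary per valid neighbour, in the order FAISS ranked them.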