---

library_name: custom
license: mit
tags:
  - vector-search
  - hnsw
  - nearest-neighbor
  - information-retrieval
  - from-scratch
  - approximate-nearest-neighbor
language:
  - en
pipeline_tag: feature-extraction
---


# HNSW Vector Engine -- Zero-Dependency Approximate Nearest Neighbor Search

A from-scratch implementation of the HNSW (Hierarchical Navigable Small World) algorithm for approximate nearest neighbor search. No FAISS. No ChromaDB. No Annoy. Just the algorithm described in the Malkov & Yashunin 2018 paper, implemented directly in Python with NumPy.

This is the vector search engine from the [Citadel](https://github.com/dbhavery/citadel) AI operations platform, extracted here as a standalone reference. Citadel uses it as its built-in vector index -- no external vector database required.

**Paper**: [Efficient and robust approximate nearest neighbor using Hierarchical Navigable Small World graphs](https://arxiv.org/abs/1603.09320) (Malkov & Yashunin, 2018)

**Source**: [github.com/dbhavery/citadel](https://github.com/dbhavery/citadel) -- see `packages/citadel-vector/`

---

## Why Build HNSW From Scratch

Most projects that need vector search reach for a prebuilt library -- FAISS, Hnswlib, ChromaDB, Pinecone. These are excellent tools. But they are also opaque. When your recall drops, when your index corrupts, when performance degrades on a specific data distribution, you are debugging a black box.

Building HNSW from the paper forces you to understand every decision the algorithm makes: how layers are assigned, why the greedy search converges, what happens when you prune neighbor lists, and how the entry point selection affects traversal. That understanding is the difference between using a tool and knowing a tool.

This implementation prioritizes clarity and correctness over raw speed. It is intended as a readable, well-documented reference that you can study, modify, and extend. For production workloads at scale, FAISS or Hnswlib will be faster due to their C++/SIMD cores. For applications in the tens-of-thousands to low-millions range -- which covers most RAG pipelines, recommendation engines, and local AI applications -- this implementation is both fast enough and transparent enough to use directly.

---

## How HNSW Works

HNSW builds a multi-layer navigable graph over the dataset. The core insight is borrowed from skip lists: maintain multiple layers of the same graph at decreasing density, so that search starts with coarse, long-range hops at the top and refines to precise, short-range hops at the bottom.

### The Multi-Layer Structure

```
Layer 3 (sparsest):    [A] -------------- [F]
                         |                   |
Layer 2:               [A] --- [C] ------- [F] --- [H]
                         |      |            |       |
Layer 1:          [A] - [B] - [C] - [E] - [F] - [G] - [H]
                    |    |     |     |      |     |      |
Layer 0 (densest): [A] [B] - [C] - [D] - [E] - [F] - [G] - [H] - [I] - [J]
```

Every node exists in layer 0. Each higher layer contains a geometrically decreasing random subset of nodes. When a new vector is inserted, its maximum layer is drawn from an exponential distribution: `level = floor(-ln(uniform()) * mL)`, where `mL = 1/ln(M)` and `M` is the maximum connections per node. Most nodes land on layer 0. A few reach layer 1. Even fewer reach layer 2. This gives the probabilistic skip-list property without any explicit balancing.
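
The level-assignment formula can be sampled in a few lines. This is an illustrative sketch (not this package's code) that checks the geometric decay empirically:

```python
import math
import random

def sample_level(M: int, rng: random.Random) -> int:
    # mL = 1/ln(M): a node reaches layer >= l with probability ~(1/M)^l
    mL = 1.0 / math.log(M)
    # 1 - rng.random() lies in (0, 1], so log() never sees zero
    return int(-math.log(1.0 - rng.random()) * mL)

rng = random.Random(0)
levels = [sample_level(16, rng) for _ in range(100_000)]
counts = [sum(1 for lv in levels if lv >= layer) for layer in range(4)]
print(counts)  # each layer holds roughly 1/16 of the layer below it
```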

### Insertion Algorithm

When inserting a new vector `q` at assigned level `L`:

1. **Top-down greedy phase**: Starting from the global entry point at the highest layer, perform greedy search (beam width = 1) down to layer `L+1`. At each layer, find the single nearest neighbor to `q` and use it as the entry point for the next layer down. This quickly navigates to the region of the graph closest to `q`.

2. **Connection phase**: From layer `min(L, max_layer)` down to layer 0, perform beam search with `ef_construction` candidates. Select the `M` nearest neighbors from the candidates and create bidirectional edges between `q` and each selected neighbor.

3. **Pruning**: If any neighbor now exceeds its maximum connection count (`M` for layers > 0, `2*M` for layer 0), prune its edge list down to the `M` nearest connections. Layer 0 is allowed double the connections because it carries all the data and needs higher connectivity for recall.

4. **Entry point update**: If `L` exceeds the current maximum layer, `q` becomes the new global entry point.
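
The four steps above can be condensed into pseudocode. Helper names such as `greedy_search`, `beam_search`, `select_m_nearest`, and the `graph` container are hypothetical, not this package's internals:

```python
def insert(graph, q, L, M, ef_construction):
    ep = graph.entry_point
    # 1. Greedy descent from the top layer down to L+1 (beam width 1)
    for layer in range(graph.max_layer, L, -1):
        ep = greedy_search(graph, q, ep, layer)
    # 2. Connection phase on layers min(L, max_layer) .. 0
    for layer in range(min(L, graph.max_layer), -1, -1):
        candidates = beam_search(graph, q, ep, layer, ef=ef_construction)
        neighbors = select_m_nearest(q, candidates, M)
        for n in neighbors:
            graph.add_edge(q, n, layer)  # bidirectional edge
            # 3. Prune any neighbor that now exceeds its cap
            cap = 2 * M if layer == 0 else M
            if len(graph.neighbors(n, layer)) > cap:
                graph.prune(n, layer, M)
        ep = neighbors[0]
    # 4. Promote q to global entry point if it reached a new top layer
    if L > graph.max_layer:
        graph.entry_point, graph.max_layer = q, L
```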

### Search Algorithm

Searching for the `k` nearest neighbors of a query vector `q`:

1. **Top-down traversal**: Starting from the entry point, greedily descend from the top layer to layer 1, keeping only the single closest node at each layer. This narrows the search to the right neighborhood in O(log n) hops.

2. **Layer 0 beam search**: At layer 0, perform a beam search with width `ef_search` (a tunable parameter). The search maintains two heaps -- a min-heap of candidates to expand, and a max-heap of the current best results. At each step, pop the closest unexpanded candidate, examine its neighbors, and add any neighbor closer than the current worst result to both heaps.

3. **Termination**: The search stops when the closest unexpanded candidate is farther than the worst result in the beam and the beam is full. Return the top `k` results from the beam.
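
The layer-0 beam search follows directly from the two-heap description above. An illustrative, self-contained sketch (not the package's internal code) over a toy adjacency map:

```python
import heapq
import numpy as np

def layer0_search(q, vectors, adj, entry, ef_search):
    """Beam search over layer 0. `adj` maps node id -> neighbor ids.
    Returns the ef_search-wide beam as (distance, id) pairs, nearest first;
    the caller keeps the top k."""
    dist = lambda i: float(np.linalg.norm(vectors[i] - q))
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: next nodes to expand
    results = [(-dist(entry), entry)]     # max-heap (negated): current best beam
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0] and len(results) >= ef_search:
            break  # closest unexpanded candidate is worse than the full beam
        for nb in adj[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef_search or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef_search:
                    heapq.heappop(results)  # drop the current worst result
    return sorted((-d, i) for d, i in results)
```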

### Key Parameters

| Parameter | Default | Role |
|-----------|---------|------|
| `M` | 16 | Max edges per node per layer. Higher M = better recall, more memory, slower insertion. |
| `ef_construction` | 200 | Beam width during insertion. Higher = better graph quality, slower build. |
| `ef_search` | 50 | Beam width during search. Higher = better recall, slower queries. This is the primary recall/speed knob at query time. |

---

## Performance Characteristics

| Operation | Time Complexity | Notes |
|-----------|----------------|-------|
| Insert | O(log n * ef_construction * M) | Dominated by the beam search at each layer during connection. |
| Search | O(log n * ef_search * M) | Top-down traversal is O(log n) layers; layer 0 search is bounded by ef_search. |
| Delete | O(1) | Lazy deletion -- marks the node, does not rebuild the graph. Deleted nodes are still traversed but excluded from results. |
| Memory | O(n * M * layers) | Each node stores M neighbor IDs per layer. Average layers per node is ~1 + 1/ln(M). |

For typical configurations (M=16, ef_construction=200, ef_search=50), this implementation achieves **>90% recall@10** against brute-force search on 1,000 random 64-dimensional vectors, verified by the test suite.
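
Recall can be measured against brute force in a few lines. This sketch assumes an index exposing `search(q, k) -> [(id, distance, metadata)]` as in the usage examples, with `ids[i]` being the id stored for `data[i]`:

```python
import numpy as np

def recall_at_k(index, data, ids, queries, k=10):
    """Fraction of exact top-k neighbors that the index also returns."""
    hits = 0
    for q in queries:
        # Exact top-k ids by brute-force distance over the whole dataset
        exact = {ids[i] for i in np.argsort(np.linalg.norm(data - q, axis=1))[:k]}
        approx = {doc_id for doc_id, _, _ in index.search(q, k=k)}
        hits += len(exact & approx)
    return hits / (k * len(queries))
```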

---

## Supported Distance Metrics

| Metric | Function | Range | Use Case |
|--------|----------|-------|----------|
| `cosine` | `1 - cos(a, b)` | [0, 2] | Normalized embeddings (most embedding models). Default. |
| `euclidean` | `\|\|a - b\|\|_2` | [0, inf) | Raw feature vectors, spatial data. |
| `dot` | `-dot(a, b)` | (-inf, inf) | Maximum inner product search (MIPS). |
All distance functions are implemented with NumPy for vectorized computation. Batch variants (`batch_cosine_distance`, `batch_euclidean_distance`) are provided for operations over multiple vectors.
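
Each scalar metric reduces to a short NumPy expression. A minimal sketch; the module's actual signatures and epsilon handling may differ:

```python
import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity; the epsilon guards against zero-norm inputs
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return 1.0 - float(np.dot(a, b)) / denom

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))

def dot_distance(a, b):
    # Negated inner product so that "smaller is closer" holds for MIPS
    return -float(np.dot(a, b))

def batch_cosine_distance(query, vectors):
    # One query against a (n, dim) matrix, fully vectorized
    q = query / (np.linalg.norm(query) + 1e-12)
    v = vectors / (np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-12)
    return 1.0 - v @ q
```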

---

## Usage

### Basic Index Operations

```python
import numpy as np
from citadel_vector import HNSWIndex

# Create an index for 384-dimensional vectors (e.g., sentence-transformers output)
index = HNSWIndex(
    dim=384,
    max_elements=50_000,
    M=16,
    ef_construction=200,
    metric="cosine",
)

# Insert vectors with metadata
index.add(
    vector=np.random.randn(384),
    id="doc_001",
    metadata={"source": "readme.md", "chunk": 0},
)

# Batch insert
vectors = np.random.randn(100, 384)
ids = [f"doc_{i:03d}" for i in range(100)]
metadatas = [{"source": f"file_{i}.txt"} for i in range(100)]
index.batch_add(vectors, ids, metadatas)

# Search
query = np.random.randn(384)
results = index.search(query, k=5, ef_search=100)
for doc_id, distance, metadata in results:
    print(f"  {doc_id}: distance={distance:.4f}, source={metadata['source']}")

# Filtered search -- only return results matching a predicate
results = index.search(
    query,
    k=5,
    filter_fn=lambda meta: meta is not None and meta.get("source") == "readme.md",
)

# Lazy deletion
index.delete("doc_001")
assert "doc_001" not in index
```

### Persistent Storage (VectorStore)

```python
import numpy as np
from citadel_vector import VectorStore

# Create a persistent store -- data is saved to disk
store = VectorStore(path="./my_vectors", dim=384, metric="cosine")

# Add vectors (metadata is persisted to SQLite)
store.add(np.random.randn(384), "doc_1", metadata={"title": "Introduction"})
store.add(np.random.randn(384), "doc_2", metadata={"title": "Methods"})

# Save the index (vectors as .npy, graph as JSON, metadata in SQLite)
store.save()

# Load from disk in another process
store = VectorStore.load("./my_vectors")
results = store.search(np.random.randn(384), k=5)

# Inspect index statistics
print(store.stats())
# {'count': 2, 'dim': 384, 'metric': 'cosine', 'max_elements': 100000, 'layers': 1, ...}
```

### REST API Server

```python
# Start the server from the command line:
#   citadel-vector serve --port 8082

# Or programmatically:
from citadel_vector.server import create_app

app = create_app(storage_dir="./vector_data")
```

```bash
# Create a collection
curl -X POST http://localhost:8082/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "documents", "dim": 384, "metric": "cosine"}'

# Add vectors
curl -X POST http://localhost:8082/collections/documents/add \
  -H "Content-Type: application/json" \
  -d '{"vectors": [[0.1, 0.2, ...]], "ids": ["doc_1"], "metadatas": [{"title": "test"}]}'

# Search
curl -X POST http://localhost:8082/collections/documents/search \
  -H "Content-Type: application/json" \
  -d '{"query": [0.1, 0.2, ...], "k": 5}'
```

---

## Architecture

```
citadel-vector/
  citadel_vector/
    __init__.py          # Public API: HNSWIndex, VectorStore, distance functions
    hnsw.py              # Core HNSW algorithm -- graph construction, beam search, deletion
    distance.py          # Distance metrics: cosine, euclidean, dot product (scalar + batch)
    storage.py           # Persistent storage: NumPy arrays + JSON graph + SQLite metadata
    config.py            # VectorConfig dataclass with tunable defaults
    server.py            # Optional FastAPI REST server for HTTP access
  tests/
    test_hnsw.py         # Unit tests + recall benchmark (>90% recall@10 on 1000 vectors)
    test_distance.py     # Distance function correctness tests
    test_storage.py      # Persistence round-trip tests
```

### Storage Format

The `VectorStore` persists three files:

| File | Format | Contents |
|------|--------|----------|
| `vectors.npy` | NumPy binary | All vector data as a float64 array |
| `graph.json` | JSON | Graph adjacency lists, node levels, entry point, index parameters |
| `metadata.db` | SQLite | Key-value metadata store (one row per vector ID) |

This separation means vectors can be memory-mapped for large datasets, the graph structure is human-readable for debugging, and metadata queries can use SQL.
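
For example, the vector file can be reopened with NumPy's memory mapping. A self-contained sketch with an illustrative file name (the store's real path layout is shown above):

```python
import numpy as np

# Write a small float64 array the way VectorStore persists vectors,
# then reopen it memory-mapped: pages are read from disk only when touched.
np.save("vectors_demo.npy", np.zeros((1000, 384)))
vectors = np.load("vectors_demo.npy", mmap_mode="r")
print(vectors.shape)            # (1000, 384), without loading the full array
row = np.asarray(vectors[42])   # materializes just one row in memory
```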

---

## Dependencies

- **Python 3.10+**
- **NumPy** -- the only required dependency. All distance calculations and vector storage use NumPy arrays.
- **FastAPI + Uvicorn** (optional) -- only needed for the REST server. Install with `pip install citadel-vector[server]`.

No C extensions. No compiled binaries. No CUDA. Pure Python + NumPy.

---

## Relationship to Citadel

This HNSW engine is one of six packages in the [Citadel](https://github.com/dbhavery/citadel) platform:

| Package | Purpose |
|---------|---------|
| **citadel-vector** (this) | HNSW vector search engine |
| citadel-gateway | LLM provider proxy with caching, rate limiting, circuit breaking |
| citadel-agents | YAML-defined ReAct agents with tool use and multi-agent orchestration |
| citadel-ingest | Document ingestion pipeline (parse, chunk, embed, deduplicate) |
| citadel-trace | LLM observability -- spans, cost tracking, latency metrics, alerts |
| citadel-dashboard | Single-file HTML operations dashboard |

Each package is independently installable and usable. The ingest pipeline writes to this vector store, the agent framework reads from it for RAG, and the gateway routes LLM calls for embedding generation. But none of these connections are required -- `citadel-vector` works entirely on its own.

---

## References

- Malkov, Y. A., & Yashunin, D. A. (2018). *Efficient and robust approximate nearest neighbor using Hierarchical Navigable Small World graphs*. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 824-836. [arXiv:1603.09320](https://arxiv.org/abs/1603.09320)

---

## License

MIT License. See [LICENSE](https://github.com/dbhavery/citadel/blob/main/LICENSE) for details.