---
library_name: custom
license: mit
tags:
- vector-search
- hnsw
- nearest-neighbor
- information-retrieval
- from-scratch
- approximate-nearest-neighbor
language:
- en
pipeline_tag: feature-extraction
---
HNSW Vector Engine -- Zero-Dependency Approximate Nearest Neighbor Search
A from-scratch implementation of the HNSW (Hierarchical Navigable Small World) algorithm for approximate nearest neighbor search. No FAISS. No ChromaDB. No Annoy. Just the algorithm described in the Malkov & Yashunin 2018 paper, implemented directly in Python with NumPy.
This is the vector search engine from the Citadel AI operations platform, extracted here as a standalone reference. Citadel uses it as its built-in vector index -- no external vector database required.
Paper: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs (Malkov & Yashunin, 2018)
Source: github.com/dbhavery/citadel -- see packages/citadel-vector/
Why Build HNSW From Scratch
Most projects that need vector search reach for a prebuilt library -- FAISS, Hnswlib, ChromaDB, Pinecone. These are excellent tools. But they are also opaque. When your recall drops, when your index gets corrupted, when performance degrades on a specific data distribution, you are debugging a black box.
Building HNSW from the paper forces you to understand every decision the algorithm makes: how layers are assigned, why the greedy search converges, what happens when you prune neighbor lists, and how the entry point selection affects traversal. That understanding is the difference between using a tool and knowing a tool.
This implementation prioritizes clarity and correctness over raw speed. It is intended as a readable, well-documented reference that you can study, modify, and extend. For production workloads at scale, FAISS or Hnswlib will be faster due to their C++/SIMD cores. For applications in the tens-of-thousands to low-millions range -- which covers most RAG pipelines, recommendation engines, and local AI applications -- this implementation is both fast enough and transparent enough to use directly.
How HNSW Works
HNSW builds a multi-layer navigable graph over the dataset. The core insight is borrowed from skip lists: maintain multiple layers of the same graph at decreasing density, so that search starts with coarse, long-range hops at the top and refines to precise, short-range hops at the bottom.
The Multi-Layer Structure
Layer 3 (sparsest): [A] -------------- [F]
| |
Layer 2: [A] --- [C] ------- [F] --- [H]
| | | |
Layer 1: [A] - [B] - [C] - [E] - [F] - [G] - [H]
| | | | | | |
Layer 0 (densest): [A] [B] - [C] - [D] - [E] - [F] - [G] - [H] - [I] - [J]
Every node exists in layer 0. Each higher layer contains a geometrically decreasing random subset of nodes. When a new vector is inserted, its maximum layer is drawn from an exponential distribution: `level = floor(-ln(uniform()) * mL)`, where `mL = 1/ln(M)` and `M` is the maximum connections per node. Most nodes land on layer 0. A few reach layer 1. Even fewer reach layer 2. This gives the probabilistic skip-list property without any explicit balancing.
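The level-assignment rule can be sketched in a few lines. This is a standalone illustration, not the package's internal code; the function name and seed are hypothetical:

```python
import math
import random

def assign_level(M: int = 16, rng: random.Random = random.Random(42)) -> int:
    """Draw a node's maximum layer: level = floor(-ln(u) * mL), mL = 1/ln(M).
    Uses 1 - rng.random() so u falls in (0, 1] and log() never sees zero."""
    mL = 1.0 / math.log(M)
    return int(-math.log(1.0 - rng.random()) * mL)

# With M=16, P(level = 0) = 1 - 1/M = 93.75%, and each higher
# layer holds roughly 1/M of the nodes in the layer below it.
levels = [assign_level() for _ in range(100_000)]
```

Sampling 100,000 levels confirms the geometric decay: about 15 in 16 nodes stay on layer 0, and layers above ~5 are essentially empty at this scale.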
Insertion Algorithm
When inserting a new vector `q` at assigned level `L`:
1. Top-down greedy phase: Starting from the global entry point at the highest layer, perform greedy search (beam width = 1) down to layer `L+1`. At each layer, find the single nearest neighbor to `q` and use it as the entry point for the next layer down. This quickly navigates to the region of the graph closest to `q`.
2. Connection phase: From layer `min(L, max_layer)` down to layer 0, perform beam search with `ef_construction` candidates. Select the `M` nearest neighbors from the candidates and create bidirectional edges between `q` and each selected neighbor.
3. Pruning: If any neighbor now exceeds its maximum connection count (`M` for layers > 0, `2*M` for layer 0), prune its edge list down to the `M` nearest connections. Layer 0 is allowed double the connections because it carries all the data and needs higher connectivity for recall.
4. Entry point update: If `L` exceeds the current maximum layer, `q` becomes the new global entry point.
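The pruning step can be illustrated with a hypothetical helper (not the package's internal API). Note that this sketch keeps the plain nearest neighbors; the paper's Algorithm 4 adds a diversity heuristic that is omitted here:

```python
import numpy as np

def prune_neighbors(node_vec, neighbor_ids, vectors, max_conn):
    """Trim a node's edge list back to its max_conn nearest neighbors.
    Simple distance-based pruning; Malkov & Yashunin's heuristic
    neighbor selection (Algorithm 4) is deliberately omitted."""
    ranked = sorted(neighbor_ids,
                    key=lambda nid: float(np.linalg.norm(node_vec - vectors[nid])))
    return ranked[:max_conn]

# A node at the origin connected to four points on a line, pruned to M=2:
vectors = {i: np.array([float(i), 0.0]) for i in range(1, 5)}
kept = prune_neighbors(np.zeros(2), [4, 2, 1, 3], vectors, max_conn=2)
```

After pruning, only the two closest neighbors (nodes 1 and 2) survive; the long-range edges to 3 and 4 are dropped.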
Search Algorithm
Searching for the `k` nearest neighbors of a query vector `q`:
1. Top-down traversal: Starting from the entry point, greedily descend from the top layer to layer 1, keeping only the single closest node at each layer. This narrows the search to the right neighborhood in O(log n) hops.
2. Layer 0 beam search: At layer 0, perform a beam search with width `ef_search` (a tunable parameter). The search maintains two heaps -- a min-heap of candidates to expand, and a max-heap of the current best results. At each step, pop the closest unexpanded candidate, examine its neighbors, and add any neighbor closer than the current worst result to both heaps.
3. Termination: The search stops when the closest unexpanded candidate is farther than the worst result in the beam and the beam is full. Return the top `k` results from the beam.
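The layer-0 beam search maps naturally onto Python's heapq. The sketch below is a minimal illustration over a plain adjacency dict, not the package internals; the function name and signature are assumptions:

```python
import heapq
import numpy as np

def beam_search_layer0(query, entry, vectors, adjacency, ef_search):
    """Beam search at layer 0: returns up to ef_search (distance, id)
    pairs, closest first. Two heaps drive the loop, as described above."""
    d0 = float(np.linalg.norm(query - vectors[entry]))
    candidates = [(d0, entry)]       # min-heap of unexpanded nodes
    results = [(-d0, entry)]         # max-heap of best-so-far (negated dist)
    visited = {entry}
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(results) >= ef_search and dist > -results[0][0]:
            break                    # nothing left can improve the beam
        for nb in adjacency.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            d = float(np.linalg.norm(query - vectors[nb]))
            if len(results) < ef_search or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef_search:
                    heapq.heappop(results)  # evict the current worst
    return sorted((-neg, nid) for neg, nid in results)

# Tiny example: a path graph 0-1-2-...-9 embedded on the number line.
vectors = {i: np.array([float(i)]) for i in range(10)}
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
out = beam_search_layer0(np.array([7.2]), 0, vectors, adjacency, ef_search=3)
```

Starting from node 0, the beam walks down the path and settles on the three nodes nearest 7.2, demonstrating both the expansion loop and the early-termination rule.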
Key Parameters
| Parameter | Default | Role |
|---|---|---|
| `M` | 16 | Max edges per node per layer. Higher M = better recall, more memory, slower insertion. |
| `ef_construction` | 200 | Beam width during insertion. Higher = better graph quality, slower build. |
| `ef_search` | 50 | Beam width during search. Higher = better recall, slower queries. This is the primary recall/speed knob at query time. |
Performance Characteristics
| Operation | Time Complexity | Notes |
|---|---|---|
| Insert | O(log n * ef_construction * M) | Dominated by the beam search at each layer during connection. |
| Search | O(log n * ef_search * M) | Top-down traversal is O(log n) layers; layer 0 search is bounded by ef_search. |
| Delete | O(1) | Lazy deletion -- marks the node, does not rebuild the graph. Deleted nodes are still traversed but excluded from results. |
| Memory | O(n * M * layers) | Each node stores up to M neighbor IDs per layer (2M at layer 0). Average layers per node is ~1 + 1/ln(M). |
For typical configurations (M=16, ef_construction=200, ef_search=50), this implementation achieves >90% recall@10 against brute-force search on 1,000 random 64-dimensional vectors, verified by the test suite.
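Recall@k is measured the same way the test suite does it: compare the approximate result set against an exact brute-force scan. A generic sketch (helper names are hypothetical, not the test suite's):

```python
import numpy as np

def brute_force_topk(query, vectors, k):
    """Exact k-NN by scanning every vector; the ground truth for recall."""
    dists = np.linalg.norm(vectors - query, axis=1)
    return np.argsort(dists)[:k].tolist()

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Worked example on points along a line: the exact top-3 for query 3.2
# is [3, 4, 2]; an approximate result of [3, 4, 5] scores 2/3.
vectors = np.arange(10, dtype=float).reshape(-1, 1)
exact = brute_force_topk(np.array([3.2]), vectors, k=3)
score = recall_at_k([3, 4, 5], exact)
```

Averaging this score over many random queries gives the recall@10 figure quoted above.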
Supported Distance Metrics
| Metric | Function | Range | Use Case |
|---|---|---|---|
| `cosine` | `1 - cos(a, b)` | [0, 2] | Normalized embeddings (most embedding models). Default. |
| `euclidean` | `\|\|a - b\|\|_2` | [0, inf) | Raw feature vectors, spatial data. |
| `dot` | `-dot(a, b)` | (-inf, inf) | Maximum inner product search (MIPS). |
All distance functions are implemented with NumPy for vectorized computation. Batch variants (`batch_cosine_distance`, `batch_euclidean_distance`) are provided for operations over multiple vectors.
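The three metrics reduce to a few lines of NumPy each. This is an illustrative sketch of the formulas in the table above, not the package's exact source; the epsilon guard is an assumption:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(a, b); a small epsilon in the denominator guards zero vectors."""
    denom = max(float(np.linalg.norm(a) * np.linalg.norm(b)), 1e-12)
    return 1.0 - float(np.dot(a, b)) / denom

def euclidean_distance(a, b):
    """Plain L2 distance ||a - b||_2."""
    return float(np.linalg.norm(a - b))

def dot_distance(a, b):
    """Negated inner product so that smaller still means more similar (MIPS)."""
    return -float(np.dot(a, b))

def batch_cosine_distance(q, X):
    """Cosine distance from one query to every row of X, fully vectorized."""
    denom = np.maximum(np.linalg.norm(q) * np.linalg.norm(X, axis=1), 1e-12)
    return 1.0 - (X @ q) / denom
```

The batch variant replaces a Python loop with one matrix-vector product, which is where NumPy's vectorization pays off during beam search.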
Usage
Basic Index Operations
import numpy as np
from citadel_vector import HNSWIndex
# Create an index for 384-dimensional vectors (e.g., sentence-transformers output)
index = HNSWIndex(
dim=384,
max_elements=50_000,
M=16,
ef_construction=200,
metric="cosine",
)
# Insert vectors with metadata
index.add(
vector=np.random.randn(384),
id="doc_001",
metadata={"source": "readme.md", "chunk": 0},
)
# Batch insert
vectors = np.random.randn(100, 384)
ids = [f"doc_{i:03d}" for i in range(100)]
metadatas = [{"source": f"file_{i}.txt"} for i in range(100)]
index.batch_add(vectors, ids, metadatas)
# Search
query = np.random.randn(384)
results = index.search(query, k=5, ef_search=100)
for doc_id, distance, metadata in results:
print(f" {doc_id}: distance={distance:.4f}, source={metadata['source']}")
# Filtered search -- only return results matching a predicate
results = index.search(
query,
k=5,
filter_fn=lambda meta: meta is not None and meta.get("source") == "readme.md",
)
# Lazy deletion
index.delete("doc_001")
assert "doc_001" not in index
Persistent Storage (VectorStore)
from citadel_vector import VectorStore
# Create a persistent store -- data is saved to disk
store = VectorStore(path="./my_vectors", dim=384, metric="cosine")
# Add vectors (metadata is persisted to SQLite)
store.add(np.random.randn(384), "doc_1", metadata={"title": "Introduction"})
store.add(np.random.randn(384), "doc_2", metadata={"title": "Methods"})
# Save the index (vectors as .npy, graph as JSON, metadata in SQLite)
store.save()
# Load from disk in another process
store = VectorStore.load("./my_vectors")
results = store.search(np.random.randn(384), k=5)
# Inspect index statistics
print(store.stats())
# {'count': 2, 'dim': 384, 'metric': 'cosine', 'max_elements': 100000, 'layers': 1, ...}
REST API Server
# Start the server
# citadel-vector serve --port 8082
# Or programmatically:
from citadel_vector.server import create_app
app = create_app(storage_dir="./vector_data")
# Create a collection
curl -X POST http://localhost:8082/collections \
-H "Content-Type: application/json" \
-d '{"name": "documents", "dim": 384, "metric": "cosine"}'
# Add vectors
curl -X POST http://localhost:8082/collections/documents/add \
-H "Content-Type: application/json" \
-d '{"vectors": [[0.1, 0.2, ...]], "ids": ["doc_1"], "metadatas": [{"title": "test"}]}'
# Search
curl -X POST http://localhost:8082/collections/documents/search \
-H "Content-Type: application/json" \
-d '{"query": [0.1, 0.2, ...], "k": 5}'
Architecture
citadel-vector/
citadel_vector/
__init__.py # Public API: HNSWIndex, VectorStore, distance functions
hnsw.py # Core HNSW algorithm -- graph construction, beam search, deletion
distance.py # Distance metrics: cosine, euclidean, dot product (scalar + batch)
storage.py # Persistent storage: NumPy arrays + JSON graph + SQLite metadata
config.py # VectorConfig dataclass with tunable defaults
server.py # Optional FastAPI REST server for HTTP access
tests/
test_hnsw.py # Unit tests + recall benchmark (>90% recall@10 on 1000 vectors)
test_distance.py # Distance function correctness tests
test_storage.py # Persistence round-trip tests
Storage Format
The VectorStore persists three files:
| File | Format | Contents |
|---|---|---|
| `vectors.npy` | NumPy binary | All vector data as a float64 array |
| `graph.json` | JSON | Graph adjacency lists, node levels, entry point, index parameters |
| `metadata.db` | SQLite | Key-value metadata store (one row per vector ID) |
This separation means vectors can be memory-mapped for large datasets, the graph structure is human-readable for debugging, and metadata queries can use SQL.
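Memory-mapping the vector file is one line with NumPy. A self-contained sketch (the temporary path and array shape are illustrative, not the store's layout on your machine):

```python
import os
import tempfile
import numpy as np

# Stand-in for a saved store: write a plain .npy file like vectors.npy.
store_dir = tempfile.mkdtemp()
path = os.path.join(store_dir, "vectors.npy")
np.save(path, np.zeros((1000, 384)))

# mmap_mode="r" pages vectors from disk on demand instead of loading
# the whole array into RAM -- useful once the index outgrows memory.
vectors = np.load(path, mmap_mode="r")
row = np.asarray(vectors[42])   # only this row is materialized
```

Because the graph and metadata live in separate files, the mapped array can stay read-only while SQLite handles metadata queries independently.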
Dependencies
- Python 3.10+
- NumPy -- the only required dependency. All distance calculations and vector storage use NumPy arrays.
- FastAPI + Uvicorn (optional) -- only needed for the REST server. Install with `pip install citadel-vector[server]`.
No C extensions. No compiled binaries. No CUDA. Pure Python + NumPy.
Relationship to Citadel
This HNSW engine is one of six packages in the Citadel platform:
| Package | Purpose |
|---|---|
| citadel-vector (this) | HNSW vector search engine |
| citadel-gateway | LLM provider proxy with caching, rate limiting, circuit breaking |
| citadel-agents | YAML-defined ReAct agents with tool use and multi-agent orchestration |
| citadel-ingest | Document ingestion pipeline (parse, chunk, embed, deduplicate) |
| citadel-trace | LLM observability -- spans, cost tracking, latency metrics, alerts |
| citadel-dashboard | Single-file HTML operations dashboard |
Each package is independently installable and usable. The ingest pipeline writes to this vector store, the agent framework reads from it for RAG, and the gateway routes LLM calls for embedding generation. But none of these connections are required -- citadel-vector works entirely on its own.
References
- Malkov, Y. A., & Yashunin, D. A. (2018). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4), 824-836. arXiv:1603.09320
License
MIT License. See LICENSE for details.