arxiv:2605.28034

Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings

Published on May 27

· Submitted by

Stan Kirdey on May 28

Upvote

Authors:

Stanislav Kirdey ,

Abstract

Clark Hash is a compact, stateless codec that reduces neural embedding storage size by 32x through deterministic sparse projections and scalar quantization while maintaining high similarity accuracy.

AI-generated summary

Clark Hash is a small method for storing neural embeddings in less space. It normalizes each database vector, applies a deterministic sparse signed Johnson-Lindenstrauss projection, clips the result, and stores a fixed-width scalar-quantized code. Queries stay in floating point and are scored against the stored sketches. In the default 384-dimensional sentence-embedding setting, Clark Hash stores a cosine-search vector in 48 bytes instead of 1536 bytes for dense f32 storage. This is 32x smaller. The method does not need a training pass, learned codebooks, rotations, or corpus statistics before new vectors can be stored. We describe the codec, the Rust implementation, and a multilingual sentence-similarity evaluation on 9,304 labeled pairs from 29 subsets. With a multilingual MiniLM encoder, the 48-byte sketches reached 0.910 and 0.946 macro Pearson correlation with dense cosine scores on STS17 and STS22. Clark Hash is not a new Johnson-Lindenstrauss theorem and it is not a replacement for approximate nearest-neighbor indexes. It is a simple stateless codec for compact embedding storage.

View arXiv page View PDF Project page GitHub 6 Add to collection

Community

voice-biomarkers

Paper author Paper submitter about 11 hours ago

Paper: Clark Hash: Stateless Sparse Johnson-Lindenstrauss Quantization for Neural Embeddings
arXiv: 2605.28034 [cs.AI]
Author: Stanislav Kirdey, Clark Labs Inc
Date: 2026-05-28

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2605.28034

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.28034 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.28034 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.28034 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.