bne-binary-4096

Native 4096-bit binary embedding model from the Binary Native Embeddings project.

Backbone: prajjwal1/bert-mini (4L × 256d, ~11M params)
Output: 4096-dim {-1,+1} binary via Linear(256→4096) + LayerNorm + STE
Training: tanh contrastive loss on NLI 550k pairs, 3 epochs
Key techniques: differential learning rates (encoder 2e-5, projection 1e-3) + Straight-Through Estimator (STE)
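A minimal sketch of how a head like the one above could be wired up in PyTorch. This is an illustration under assumptions, not the project's actual code: the names SignSTE, BinaryHead and encoder_stub are hypothetical, the encoder is replaced by a small stand-in layer, and the STE variant shown (identity gradient) is only one common choice.

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        # Map to {-1, +1}; zeros go to +1 so the output is strictly binary.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass the incoming gradient on unchanged.
        return grad_output


class BinaryHead(nn.Module):
    """Hypothetical head: Linear(256 -> 4096) + LayerNorm + STE sign."""

    def __init__(self, in_dim=256, binary_dim=4096):
        super().__init__()
        self.proj = nn.Linear(in_dim, binary_dim)
        self.norm = nn.LayerNorm(binary_dim)

    def forward(self, pooled):
        return SignSTE.apply(self.norm(self.proj(pooled)))


torch.manual_seed(0)
encoder_stub = nn.Linear(256, 256)  # stand-in for the BERT-mini encoder (assumption)
head = BinaryHead()

# Differential learning rates via optimizer parameter groups.
optimizer = torch.optim.AdamW([
    {"params": encoder_stub.parameters(), "lr": 2e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

pooled = torch.randn(2, 256)
codes = head(encoder_stub(pooled))

# Dummy loss, only to show that gradients flow back through the sign().
loss = (codes * torch.randn_like(codes)).sum()
loss.backward()
```

Without the custom backward, the sign step would have zero gradient almost everywhere and the projection could not be trained end to end.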

STS-B Spearman: 0.7275
Recall@10 (SciFact): 0.2958
Memory / 1k vecs: 500 KB
Retrieval vs float: 6.0x faster at 1M vecs (FAISS AVX2+POPCNT, Intel Core Ultra 7)

Part of binary-native-embeddings-for-CPU-Retrieval

Why binary?

At 1M vectors with FAISS flat indexes (IndexFlatIP for float32, IndexBinaryFlat for binary; AVX2 + POPCNT, Intel Core Ultra 7):

float32 384-dim: 3 601 ms
binary 2048-dim: 293 ms (12.3x faster)
binary 4096-dim: 596 ms (6.0x faster)

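POPCNT counts the set bits of a 64-bit word in a single instruction, so a 2048-bit Hamming distance takes 32 XOR + POPCNT steps versus 384 multiply-accumulates for a 384-dim float32 dot product. Binary vectors are also 6× smaller (256 bytes/vector vs 1 536 bytes), which improves cache utilization. For this 4096-bit model the figures are 64 POPCNT steps and 512 bytes/vector (3× smaller than float32).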
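The XOR + popcount idea can be sketched in plain Python. This is only an illustration of the arithmetic, not how FAISS implements it; pack_bits and hamming are hypothetical helper names, and Python integers stand in for the 64-bit machine words a CPU would use.

```python
import random


def pack_bits(signs):
    """Pack a list of +1/-1 values into one Python int (bit set means +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def hamming(a, b):
    """Number of differing bits: XOR, then population count."""
    return bin(a ^ b).count("1")


random.seed(0)
dim = 4096
u = [random.choice((-1, 1)) for _ in range(dim)]
v = [random.choice((-1, 1)) for _ in range(dim)]

h = hamming(pack_bits(u), pack_bits(v))
dot = sum(x * y for x, y in zip(u, v))

# For +1/-1 vectors, dot = dim - 2 * hamming, so ranking by Hamming
# distance gives the same order as ranking by inner product.
identity_holds = dot == dim - 2 * h

words_per_vector = dim // 64  # 64-bit words to XOR and popcount per comparison
bytes_per_vector = dim // 8   # packed storage per vector
```

For dim = 4096 this gives 64 words and 512 bytes per vector, matching the memory figure of about 500 KB per 1k vectors.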

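Note: the float baseline uses IndexFlatIP (inner product, which equals cosine similarity on L2-normalized vectors) and the binary models use IndexBinaryFlat (Hamming distance). The metrics differ, but both are exhaustive searches over the same number of vectors, so the timings are comparable as a measure of ranking latency at scale.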

Usage

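import torch
from huggingface_hub import hf_hub_download
from transformers import BertTokenizer

# BinaryEmbedder comes from the project's own code (the models package),
# so this import assumes the project repository is on your Python path.
from models.binary_embedder import BinaryEmbedder

# The model uses the tokenizer of its BERT-mini backbone.
tokenizer = BertTokenizer.from_pretrained("prajjwal1/bert-mini")

# Build the model, then download the trained weights and load them on CPU.
model = BinaryEmbedder(binary_dim=4096)
weights = hf_hub_download("korben99/bne-binary-4096", "binary_embedder_4096.pt")
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()  # inference mode (disables dropout)

vecs = model.encode(["hello world"], tokenizer)  # (1, 4096), values in {-1, +1}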
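To search these vectors with FAISS IndexBinaryFlat, the {-1, +1} values first have to be packed into bytes, 8 dimensions per byte. The sketch below shows one way to do that with NumPy. pack_for_faiss is a hypothetical helper, and random data stands in for the model output; if model.encode returns a torch tensor, it may need converting with .cpu().numpy() first.

```python
import numpy as np


def pack_for_faiss(vecs):
    """Convert an (n, d) array of -1/+1 values into (n, d/8) uint8 codes."""
    bits = (np.asarray(vecs) > 0).astype(np.uint8)  # +1 -> 1, -1 -> 0
    return np.packbits(bits, axis=1)


rng = np.random.default_rng(0)
vecs = rng.choice([-1.0, 1.0], size=(3, 4096))  # stand-in for model.encode output
codes = pack_for_faiss(vecs)                    # shape (3, 512), dtype uint8

# With FAISS installed, the packed codes can be indexed like this:
#   import faiss
#   index = faiss.IndexBinaryFlat(4096)           # dimension is given in bits
#   index.add(codes)
#   distances, ids = index.search(codes[:1], 3)   # Hamming distances, neighbor ids
```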
