bne-binary-2048

Native 2048-bit binary embedding model from the Binary Native Embeddings project.

Backbone: prajjwal1/bert-mini (4L × 256d, ~11M params)
Output: 2048-dim {-1,+1} binary via Linear(256→2048) + LayerNorm + STE
Training: tanh contrastive loss on NLI 550k pairs, 3 epochs
Key: differential LR (encoder 2e-5, projection 1e-3) + Straight-Through Estimator
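
The output head and the two learning rates listed above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names SignSTE and BinaryHead are hypothetical, the nn.Linear "encoder" is a stand-in for the BERT backbone, AdamW is an assumed optimizer choice, mapping zero to +1 is an assumed convention, and the loss is a placeholder rather than the tanh contrastive loss.

```python
import torch
from torch import nn


class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        # Map to {-1, +1}; zeros go to +1 so the output is strictly binary (assumed convention).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass the incoming gradient on unchanged.
        return grad_output


class BinaryHead(nn.Module):
    """Hypothetical head: Linear(256 -> 2048) + LayerNorm + STE binarization."""

    def __init__(self, hidden_dim=256, binary_dim=2048):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, binary_dim)
        self.norm = nn.LayerNorm(binary_dim)

    def forward(self, pooled):
        return SignSTE.apply(self.norm(self.proj(pooled)))


torch.manual_seed(0)
encoder = nn.Linear(256, 256)  # stand-in for the BERT backbone, for illustration only
head = BinaryHead()

# Differential learning rates: small for the encoder, larger for the new projection.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 2e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

codes = head(encoder(torch.randn(4, 256)))       # (4, 2048), values in {-1, +1}
loss = (codes * torch.randn_like(codes)).sum()   # placeholder loss, not the tanh contrastive loss
loss.backward()                                  # gradients reach the projection despite sign()
```

Without the straight-through backward pass, the gradient of sign() is zero almost everywhere, so neither the projection nor the encoder would receive a training signal.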

STS-B Spearman: 0.7293
Recall@10 (SciFact): 0.2761
Memory per 1k vectors: 250 KB
Retrieval speed vs float32: 12.3x faster at 1M vectors (FAISS, AVX2 + POPCNT, Intel Core Ultra 7)

Part of binary-native-embeddings-for-CPU-Retrieval · Discussion

Why binary?

Search time at 1M vectors with FAISS flat indexes (AVX2 + POPCNT, Intel Core Ultra 7):

float32 384-dim: 3 601 ms
binary 2048-dim: 293 ms (12.3x faster)
binary 4096-dim: 596 ms (6.0x faster)

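POPCNT counts the set bits of a 64-bit word in one instruction, so a 2048-bit Hamming distance takes 32 XOR + POPCNT pairs, versus 384 multiply-accumulates for a 384-dim float32 dot product. Binary vectors are also 6× smaller (256 bytes vs 1 536 bytes per vector), so six times as many fit in the same amount of cache.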
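
The arithmetic above can be checked in a few lines, together with a pure-Python version of the XOR + popcount distance. This is only a sketch to make the numbers concrete: the helper names pack_signs and hamming are hypothetical, and real retrieval would use FAISS rather than Python integers.

```python
import random

BITS = 2048       # binary embedding width
WORD = 64         # bits covered by one 64-bit POPCNT
FLOAT_DIM = 384   # float32 baseline width

popcnt_ops = BITS // WORD      # 64-bit words per binary vector
binary_bytes = BITS // 8       # bytes per binary vector
float_bytes = FLOAT_DIM * 4    # bytes per float32 vector


def pack_signs(vec):
    """Pack a {-1, +1} vector into one Python int (bit set where the value is +1)."""
    packed = 0
    for i, v in enumerate(vec):
        if v > 0:
            packed |= 1 << i
    return packed


def hamming(a, b):
    """Hamming distance between two packed vectors: XOR, then count set bits."""
    return bin(a ^ b).count("1")


rng = random.Random(0)
x = [rng.choice((-1, 1)) for _ in range(BITS)]
y = [rng.choice((-1, 1)) for _ in range(BITS)]

dist = hamming(pack_signs(x), pack_signs(y))
dot = sum(a * b for a, b in zip(x, y))

print(popcnt_ops, binary_bytes, float_bytes, float_bytes // binary_bytes)  # 32 256 1536 6
print(dot == BITS - 2 * dist)  # True
```

The last line illustrates why Hamming distance is a sensible ranking metric for {-1, +1} vectors: the dot product equals the dimension minus twice the Hamming distance, so sorting by ascending Hamming distance gives the same order as sorting by descending dot product.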

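Note: the float baseline uses IndexFlatIP (inner product, which equals cosine similarity on L2-normalized vectors) and the binary index uses IndexBinaryFlat (Hamming distance). The metrics differ, but both indexes scan every vector, so the timings are comparable as a measure of ranking latency at scale.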

Usage

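import torch
from transformers import BertTokenizer
from huggingface_hub import hf_hub_download

# models.binary_embedder is not a pip package; it is assumed to come from
# the project repository, so run this from a checkout of that repo.
from models.binary_embedder import BinaryEmbedder

# The model reuses the backbone's tokenizer.
tokenizer = BertTokenizer.from_pretrained("prajjwal1/bert-mini")

# Download the checkpoint from the Hub and load it on CPU.
model = BinaryEmbedder(binary_dim=2048)
weights = hf_hub_download("korben99/bne-binary-2048", "binary_embedder_2048.pt")
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()

vecs = model.encode(["hello world"], tokenizer)  # (1, 2048), values in {-1, +1}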
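
To use the {-1, +1} output for Hamming search, the signs first have to be packed into bytes. The sketch below shows one way to do that with NumPy, plus a brute-force search to check the result. The helper names pack_embeddings and hamming_search are hypothetical, and the random array is a stand-in for real model output. If encode returns a torch tensor (an assumption, since its return type is not documented here), it would need converting to a NumPy array first.

```python
import numpy as np


def pack_embeddings(vecs):
    """Convert an (n, d) array of {-1, +1} values into (n, d // 8) uint8 bytes."""
    bits = (np.asarray(vecs) > 0).astype(np.uint8)   # +1 -> 1, -1 -> 0
    return np.packbits(bits, axis=1)


def hamming_search(query, database, k=10):
    """Brute-force top-k by Hamming distance over packed uint8 rows."""
    xor = np.bitwise_xor(database, query)            # (n, d // 8)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # differing bits per row
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


# Random stand-in for model.encode(...) output; real embeddings come from the model.
rng = np.random.default_rng(0)
db = rng.choice([-1.0, 1.0], size=(1000, 2048))
packed = pack_embeddings(db)

ids, dists = hamming_search(packed[42], packed, k=5)
print(packed.shape, packed.nbytes)   # (1000, 256) 256000
print(ids[0], dists[0])              # 42 0
```

The packed (n, 256) uint8 layout is the input format FAISS binary indexes such as IndexBinaryFlat expect, and 256 000 bytes for 1 000 vectors matches the 250 KB memory figure reported above.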
