bne-binary-2048
Native 2048-bit binary embedding model from the Binary Native Embeddings project.
- Backbone: prajjwal1/bert-mini (4L × 256d, ~11M params)
- Output: 2048-dim {-1,+1} binary via Linear(256→2048) + LayerNorm + STE
- Training: tanh contrastive loss on NLI 550k pairs, 3 epochs
- Key: differential LR (encoder 2e-5, projection 1e-3) + Straight-Through Estimator
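The output head and the differential learning rates listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `SignSTE` and `BinaryHead` are hypothetical names, the backbone is replaced by a stand-in linear layer, and AdamW is an assumption (the card only states the two learning rates).

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, z):
        # Map every value to exactly -1.0 or +1.0 (zero goes to +1).
        return torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the forward pass as if it were the identity.
        return grad_output


class BinaryHead(nn.Module):
    """Hypothetical projection head: Linear(256 -> 2048) + LayerNorm + STE."""

    def __init__(self, in_dim=256, out_dim=2048):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)
        self.norm = nn.LayerNorm(out_dim)

    def forward(self, pooled):
        return SignSTE.apply(self.norm(self.proj(pooled)))


encoder = nn.Linear(256, 256)  # stand-in for the bert-mini backbone
head = BinaryHead()

# Differential learning rates via optimizer parameter groups.
# AdamW is an assumption; the source only gives the two learning rates.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 2e-5},
    {"params": head.parameters(), "lr": 1e-3},
])

codes = head(encoder(torch.randn(4, 256)))  # shape (4, 2048), values in {-1, +1}
```

The custom `autograd.Function` keeps the forward output exactly ±1 while still letting gradients reach the projection and the encoder, which is the point of the Straight-Through Estimator.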
| STS-B Spearman | Recall@10 (SciFact) | Memory / 1k vecs | Retrieval vs float (FAISS POPCNT) |
|---|---|---|---|
| 0.7293 | 0.2761 | 250 KB | 12.3x faster at 1M vecs (FAISS AVX2+POPCNT, Intel Core Ultra 7) |
Part of binary-native-embeddings-for-CPU-Retrieval · Discussion
Why binary?
At 1M vectors with FAISS IndexBinaryFlat (AVX2 + POPCNT, Intel Core Ultra 7):
- float32 384-dim: 3 601 ms
- binary 2048-dim: 293 ms (12.3x faster)
- binary 4096-dim: 596 ms (6.0x faster)
POPCNT processes 64 bits per instruction, so a 2048-bit Hamming distance needs 32 XOR + POPCNT pairs versus 384 multiply-accumulates for the float vectors. Binary vectors are also 6× smaller (256 bytes/vector vs 1 536 bytes), which improves cache utilization.
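The arithmetic above can be checked with a small pure-Python sketch. It is only an illustration of the XOR + popcount idea, not how FAISS implements it; `pack_signs`, `split_words`, and `hamming` are hypothetical helper names, and the bit convention (+1 maps to a set bit) is an assumption.

```python
import random

BITS = 2048   # embedding width in bits
WORD = 64     # bits handled by one POPCNT instruction


def pack_signs(signs):
    """Pack a sequence of -1/+1 values into one integer (bit i set where sign is +1)."""
    value = 0
    for i, s in enumerate(signs):
        if s > 0:
            value |= 1 << i
    return value


def split_words(value, bits=BITS, word=WORD):
    """Split a packed vector into 64-bit words, as a CPU would hold it."""
    mask = (1 << word) - 1
    return [(value >> shift) & mask for shift in range(0, bits, word)]


def hamming(a, b):
    """Hamming distance: one XOR plus one popcount per 64-bit word."""
    return sum(bin(x ^ y).count("1") for x, y in zip(split_words(a), split_words(b)))


rng = random.Random(0)
u = [rng.choice((-1, 1)) for _ in range(BITS)]
v = [rng.choice((-1, 1)) for _ in range(BITS)]

n_words = len(split_words(pack_signs(u)))   # 2048 / 64 = 32 words
n_bytes = BITS // 8                         # 256 bytes per vector
dist = hamming(pack_signs(u), pack_signs(v))
dot = sum(x * y for x, y in zip(u, v))      # inner product of the +/-1 vectors
```

For two {-1,+1} vectors of width d, the inner product equals d minus twice the Hamming distance (`dot == BITS - 2 * dist` here), so sorting by ascending Hamming distance gives the same order as sorting by descending inner product on the binary vectors.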
Note: float uses IndexFlatIP (cosine similarity) and binary uses IndexBinaryFlat (Hamming distance). The metrics differ, but the timings are comparable for measuring ranking latency at scale.
Usage
```python
import torch
from transformers import BertTokenizer
from huggingface_hub import hf_hub_download

# This import assumes the project repository is on the Python path.
from models.binary_embedder import BinaryEmbedder

tokenizer = BertTokenizer.from_pretrained("prajjwal1/bert-mini")

# Download the checkpoint from the Hub and load it on CPU.
model = BinaryEmbedder(binary_dim=2048)
weights = hf_hub_download("korben99/bne-binary-2048", "binary_embedder_2048.pt")
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()

vecs = model.encode(["hello world"], tokenizer)  # (1, 2048), values in {-1, +1}
```
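To search with these embeddings, the {-1,+1} values need to be packed into bytes. The sketch below shows one way to do that with NumPy, assuming the model output has already been converted to a NumPy array; `pack_binary` and `hamming_search` are hypothetical helpers, and random vectors stand in for real embeddings. The brute-force search is only there to make the example self-contained.

```python
import numpy as np


def pack_binary(vecs):
    """Pack an (n, d) array of {-1, +1} into (n, d // 8) uint8 codes (+1 -> bit 1)."""
    bits = (np.asarray(vecs) > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


# Lookup table: number of set bits in each possible byte value.
POPCOUNT8 = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)


def hamming_search(queries, database, k):
    """Brute-force top-k by Hamming distance over packed uint8 codes."""
    xor = queries[:, None, :] ^ database[None, :, :]   # (nq, nb, nbytes)
    dist = POPCOUNT8[xor].sum(axis=2)                  # (nq, nb)
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


rng = np.random.default_rng(0)
db = rng.choice([-1, 1], size=(1000, 2048))   # stand-in for 1k model outputs
codes = pack_binary(db)                       # shape (1000, 256), dtype uint8

# Query with the first database vector: it should be its own nearest neighbour.
distances, indices = hamming_search(codes[:1], codes, k=10)
```

The packed `(n, 256)` uint8 layout is the input format FAISS binary indexes such as `IndexBinaryFlat(2048)` expect, and 1 000 such codes occupy 256 000 bytes, matching the 250 KB per 1k vectors in the table.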