| --- |
| language: en |
| license: mit |
| tags: |
| - embeddings |
| - binary |
| - bert |
| - efficient-inference |
| pipeline_tag: sentence-similarity |
| --- |
| |
| # bne-binary-1024 |
|
|
| Native **1024-bit binary** embedding model. Trained end-to-end with a binary head and tanh contrastive loss, not binarized post hoc. |
|
|
| - Backbone: `prajjwal1/bert-mini` (4L × 256d, ~11M params) |
| - Output: 1024-dim {-1,+1} binary via Linear(256→1024) + LayerNorm + STE |
| - Training: tanh contrastive loss on NLI 550k pairs, 3 epochs |
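
The output head listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `BinaryEmbedder`: the names `SignSTE` and `BinaryHead` are hypothetical, and it assumes the straight-through estimator (STE) passes gradients through unchanged and that the head receives a pooled 256-dim backbone vector.

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """Hypothetical STE: sign() forward, identity gradient backward."""

    @staticmethod
    def forward(ctx, x):
        # Map to {-1, +1}; zeros go to +1 so the output is strictly binary.
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat sign() as the identity for gradients.
        return grad_output


class BinaryHead(nn.Module):
    """Hypothetical head: Linear(256 -> 1024) + LayerNorm + STE."""

    def __init__(self, in_dim=256, binary_dim=1024):
        super().__init__()
        self.proj = nn.Linear(in_dim, binary_dim)
        self.norm = nn.LayerNorm(binary_dim)

    def forward(self, pooled):
        return SignSTE.apply(self.norm(self.proj(pooled)))


torch.manual_seed(0)
head = BinaryHead()
pooled = torch.randn(2, 256)          # stand-in for pooled backbone output
codes = head(pooled)                  # shape (2, 1024), values in {-1, +1}

# A similarity between two codes still backpropagates into the projection.
similarity = (codes[0] * codes[1]).sum()
similarity.backward()
```

Without the custom backward, `sign()` would have zero gradient almost everywhere and the projection could not be trained end-to-end.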
|
|
| | STS-B (mean ± std, 5 seeds) | SciFact Recall@10 (mean ± std, 5 seeds) | Memory / 1k vecs | Retrieval vs float INT8 | |
| |---|---|---|---| |
| | 0.7264 ± 0.0018 | 0.2762 ± 0.0119 | 125 KB | 37–49× faster at 1M vecs (exact search, FAISS AVX2+POPCNT) | |
|
|
| Native binary beats post-hoc binarization by **+24% Recall@10**, validated across 5 random seeds (p<0.001 bootstrap). |
|
|
| <details> |
| <summary>Per-seed breakdown (SciFact Recall@10)</summary> |
|
|
| | Seed | 1024 R@10 | 2048 R@10 | |
| |---|---|---| |
| | 42 | **0.2925** (best 1024) | *0.2761* (worst 2048) | |
| | 123 | 0.2875 | 0.3047 | |
| | 456 | 0.2728 | 0.2894 | |
| | 789 | 0.2619 | 0.2936 | |
| | 1337 | 0.2664 | 0.2992 | |
| | **mean ± std** | **0.2762 ± 0.012** | **0.2926 ± 0.010** | |
|
|
| Seed=42 is a structural outlier (best 1024, worst 2048) that compresses the apparent gap. Excluding it, the 4-seed means are 0.272 vs 0.297, a larger and likely significant difference. |
| </details> |
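
The summary row and the 4-seed means can be rechecked from the per-seed values with the standard library. This sketch assumes the reported spread is a population standard deviation (`pstdev`), because that reproduces ±0.0119 and ±0.010; the card does not say which estimator was used.

```python
from statistics import mean, pstdev

# SciFact Recall@10 per seed, copied from the per-seed table: (1024-bit, 2048-bit).
r10 = {
    42: (0.2925, 0.2761),
    123: (0.2875, 0.3047),
    456: (0.2728, 0.2894),
    789: (0.2619, 0.2936),
    1337: (0.2664, 0.2992),
}

bin1024 = [a for a, _ in r10.values()]
bin2048 = [b for _, b in r10.values()]

mean_1024, std_1024 = mean(bin1024), pstdev(bin1024)
mean_2048, std_2048 = mean(bin2048), pstdev(bin2048)

# Same means with seed 42 left out.
mean_1024_no42 = mean([a for s, (a, _) in r10.items() if s != 42])
mean_2048_no42 = mean([b for s, (_, b) in r10.items() if s != 42])

print(f"1024: {mean_1024:.4f} ± {std_1024:.4f}")
print(f"2048: {mean_2048:.4f} ± {std_2048:.4f}")
print(f"without seed 42: {mean_1024_no42:.3f} vs {mean_2048_no42:.3f}")
```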
|
|
| Part of [binary-native-embeddings-for-CPU-Retrieval](https://github.com/korben99/binary-native-embeddings-for-CPU-Retrieval) · [Discussion](https://discuss.huggingface.co/t/native-binary-embeddings-experiment-curious-about-your-thoughts/177107) |
|
|
| ## Why binary? |
|
|
| All methods are **exact search**: no approximation, no recall loss. |
|
|
| | Scale | Float32 (ms) | Float INT8 (ms) | Bin-1024 (ms) | Bin-2048 (ms) | 1024 vs f32 | 1024 vs INT8 | |
| |---|---|---|---|---|---|---| |
| | 10k | 16–50 | 29–58 | 0.7–1.5 | 1.3–2.4 | 23–33× | **19–40×** | |
| | 100k | 200–270 | 290–430 | 7–10 | 14–26 | 24–30× | **29–46×** | |
| | **1M** | **1 800–4 500** | **2 700–4 700** | **73–102** | **145–202** | **24–47×** | **37–49×** | |
|
|
| FAISS AVX2+POPCNT · Intel Core Ultra 7 155H · 4 benchmark runs · 16 queries · top-10. |
|
|
| Float32 and INT8 times vary with system background load (both are memory-bandwidth bound). Binary stays stable because its index fits in L3 cache, making it compute-bound via POPCNT. The vs-INT8 ratio (37–49×) is the most stable reference. |
|
|
| **Float INT8 is consistently slower than float32**: the `IndexScalarQuantizer` (`QT_8bit`) dequantization overhead outweighs the benefit of reduced bandwidth. Binary POPCNT is the only method tested that is both smaller and faster. |
|
|
| **IVF-PQ not included**: it is an approximate search (trades recall for speed), and comparing approximate to exact search is not meaningful here. |
|
|
| > float uses `IndexFlatIP` (inner product, i.e. cosine on L2-normalized vectors), binary uses `IndexBinaryFlat` (Hamming): different metrics, but comparable for ranking latency at scale. |
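
On the binary codes themselves the two metrics are tied by a simple identity. For vectors in {-1, +1}^d, the inner product is d − 2 × Hamming distance, so ranking by ascending Hamming distance gives the same order as ranking by descending cosine over the same codes. A small check using random stand-in vectors rather than real model output:

```python
import random

random.seed(0)
d = 1024
a = [random.choice((-1, 1)) for _ in range(d)]
b = [random.choice((-1, 1)) for _ in range(d)]

hamming = sum(x != y for x, y in zip(a, b))   # positions where the signs differ
dot = sum(x * y for x, y in zip(a, b))        # +1 per agreement, -1 per disagreement

# Both vectors have norm sqrt(d), so cosine is dot / d.
cosine = dot / d

print(dot == d - 2 * hamming)
print(abs(cosine - (1 - 2 * hamming / d)) < 1e-12)
```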
|
|
| **POPCNT** counts all set bits in a 64-bit word in a single instruction. A 1024-bit Hamming distance takes 16 POPCNT instructions vs 384 multiply-accumulates for a 384-dim float vector, plus 12× better cache utilization (128 bytes/vector vs 1 536 bytes). |
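
The word and byte counts above follow directly from the code length. The sketch below is pure Python and mimics what the hardware does: XOR the two codes, then popcount each 64-bit word. The helper names `pack` and `hamming_popcount` and the +1 → bit 1 convention are illustrative assumptions, not the project's API.

```python
import random

BITS = 1024


def pack(signs):
    """Pack a {-1, +1} vector into one Python int (+1 -> bit 1, assumed convention)."""
    value = 0
    for s in signs:
        value = (value << 1) | (1 if s > 0 else 0)
    return value


def hamming_popcount(x, y, bits=BITS):
    """XOR the codes, then count set bits one 64-bit word at a time."""
    diff = x ^ y
    mask = (1 << 64) - 1
    total = 0
    for w in range(bits // 64):
        word = (diff >> (64 * w)) & mask
        total += bin(word).count("1")  # stands in for one POPCNT instruction
    return total


random.seed(1)
a = [random.choice((-1, 1)) for _ in range(BITS)]
b = [random.choice((-1, 1)) for _ in range(BITS)]

words_per_vector = BITS // 64                   # 16 words, one POPCNT each
bytes_per_vector = BITS // 8                    # 128 bytes
kib_per_1k = 1000 * bytes_per_vector / 1024     # 125.0, the "125 KB" per 1k vectors
dist = hamming_popcount(pack(a), pack(b))
```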
|
|
| ## Usage |
|
|
| ```python |
| # Assumes `models.binary_embedder` is importable, e.g. when run from a clone of the project repository. |
| import torch |
| from transformers import BertTokenizer |
| from huggingface_hub import hf_hub_download |
| from models.binary_embedder import BinaryEmbedder |
| |
| # The tokenizer comes from the backbone; the trained weights come from this model repo. |
| tokenizer = BertTokenizer.from_pretrained("prajjwal1/bert-mini") |
| model = BinaryEmbedder(binary_dim=1024) |
| weights = hf_hub_download("korben99/bne-binary-1024", "binary_embedder_1024.pt") |
| model.load_state_dict(torch.load(weights, map_location="cpu")) |
| model.eval()  # inference mode (disables dropout) |
| |
| vecs = model.encode(["hello world"], tokenizer)  # (1, 1024), values in {-1, +1} |
| ``` |
|
|
| ## Model selection |
|
|
| | Model | R@10 (5 seeds) | Memory/1k | FAISS @ 1M | |
| |---|---|---|---| |
| | bne-binary-1024 | 0.2762 ± 0.012 | 125 KB | 73–102 ms (37–49× vs INT8) | |
| | **bne-binary-2048** | **0.2926 ± 0.010** | **250 KB** | **145–202 ms** | |
|
|
| The quality difference between 1024 and 2048 is not statistically significant (p=0.159). Pick 1024 for maximum throughput, 2048 for best average quality. |
|
|