Upload folder using huggingface_hub

- README.md +89 -0
- binary_embedder_1024.pt +3 -0
- config.json +5 -0

README.md (added)

---
language: en
license: mit
tags:
- embeddings
- binary
- bert
- efficient-inference
pipeline_tag: sentence-similarity
---

# bne-binary-1024

Native **1024-bit binary** embedding model, trained end-to-end with a binary head and a tanh contrastive loss instead of being binarized post hoc.

- Backbone: `prajjwal1/bert-mini` (4 layers × 256 hidden dims, ~11M params)
- Output: 1024-dim {-1, +1} vector via Linear(256→1024) + LayerNorm + sign with a straight-through estimator (STE)
- Training: tanh contrastive loss on 550k NLI pairs, 3 epochs
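
The head described in the list above can be sketched in PyTorch as follows. This is a hypothetical re-creation, not the repo's code: the class name `BinaryHeadSketch` is made up, the input is assumed to be an already-pooled 256-dim `bert-mini` vector (the pooling method is not stated in this card), and the STE shown is the plain identity variant; the repo may use a clipped one.

```python
import torch
import torch.nn as nn


class BinaryHeadSketch(nn.Module):
    """Hypothetical binary head: Linear(256 -> 1024) + LayerNorm + sign with an STE."""

    def __init__(self, in_dim: int = 256, binary_dim: int = 1024) -> None:
        super().__init__()
        self.proj = nn.Linear(in_dim, binary_dim)
        self.norm = nn.LayerNorm(binary_dim)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        h = self.norm(self.proj(pooled))
        # Hard binarization; zeros map to +1 so the output is strictly {-1, +1}.
        hard = torch.where(h >= 0, torch.ones_like(h), -torch.ones_like(h))
        # Straight-through estimator: the forward value is exactly `hard`
        # (h - h.detach() is 0), but gradients reach `h` as if sign() were the identity.
        return hard + (h - h.detach())


torch.manual_seed(0)
head = BinaryHeadSketch()
pooled = torch.randn(2, 256, requires_grad=True)  # stand-in for pooled bert-mini output
codes = head(pooled)

# Any differentiable loss works here; a random projection keeps the gradients non-trivial.
loss = (codes * torch.randn_like(codes)).sum()
loss.backward()
print(codes.shape, pooled.grad.abs().sum().item() > 0)
```

The `hard + (h - h.detach())` form is used instead of the common `h + (hard - h).detach()` because it returns exactly ±1 in the forward pass, with no floating-point round-off.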

| STS-B (mean ± std, 5 seeds) | SciFact Recall@10 (mean ± std, 5 seeds) | Memory / 1k vectors | Exact-search speed @ 1M vectors (FAISS AVX2+POPCNT) |
|---|---|---|---|
| 0.7264 ± 0.0018 | 0.2762 ± 0.0119 | 125 KB | 37–49× faster than float INT8, 24–47× faster than float32 |

Native binary training beats post-hoc binarization by **+24% Recall@10**, validated across 5 random seeds (bootstrap, p < 0.001).
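
SciFact Recall@10 is usually computed per query as the share of that query's relevant documents found in the top 10 results, then averaged over queries. The sketch below assumes that standard definition (the repo's evaluation script may differ in details such as tie handling); `recall_at_k` and the toy data are illustrative.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean over queries of |relevant ∩ top-k| / |relevant|; queries with no relevant docs are skipped."""
    scores = []
    for qid, relevant in qrels.items():
        if not relevant:
            continue
        top_k = set(ranked.get(qid, [])[:k])
        scores.append(len(top_k & relevant) / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: q1 finds 1 of its 2 relevant docs, q2 finds its only one.
ranked = {"q1": ["d3", "d7", "d1"], "q2": ["d9", "d2"]}
qrels = {"q1": {"d1", "d5"}, "q2": {"d2"}}
print(recall_at_k(ranked, qrels, k=10))  # 0.75
```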

<details>
<summary>Per-seed breakdown (SciFact Recall@10)</summary>

| Seed | 1024 R@10 | 2048 R@10 |
|---|---|---|
| 42 | **0.2925** (best 1024) | *0.2761* (worst 2048) |
| 123 | 0.2875 | 0.3047 |
| 456 | 0.2728 | 0.2894 |
| 789 | 0.2619 | 0.2936 |
| 1337 | 0.2664 | 0.2992 |
| **mean ± std** | **0.2762 ± 0.012** | **0.2926 ± 0.010** |

Seed 42 is an outlier in both columns (best 1024 run, worst 2048 run), which narrows the apparent gap. Excluding it, the 4-seed means are 0.272 vs 0.297, a larger and likely significant difference.
</details>
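
The summary row and the 4-seed means in the per-seed breakdown can be re-derived with the standard library. The reported ± values match the population standard deviation (`statistics.pstdev`, ddof=0); the sample standard deviation (`statistics.stdev`) is somewhat larger, about 0.013 for 1024 and 0.011 for 2048.

```python
import statistics

# Per-seed SciFact Recall@10 from the per-seed table (seed order: 42, 123, 456, 789, 1337).
r1024 = [0.2925, 0.2875, 0.2728, 0.2619, 0.2664]
r2048 = [0.2761, 0.3047, 0.2894, 0.2936, 0.2992]

for name, runs in (("1024", r1024), ("2048", r2048)):
    mean = statistics.mean(runs)
    pop_sd = statistics.pstdev(runs)   # n denominator: matches the reported ± values
    samp_sd = statistics.stdev(runs)   # n-1 denominator: slightly larger
    print(f"{name}: mean={mean:.4f} pstdev={pop_sd:.4f} stdev={samp_sd:.4f}")

# Dropping seed 42 (index 0) gives the 4-seed means.
print(f"without seed 42: {statistics.mean(r1024[1:]):.3f} vs {statistics.mean(r2048[1:]):.3f}")
```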

Part of [binary-native-embeddings-for-CPU-Retrieval](https://github.com/korben99/binary-native-embeddings-for-CPU-Retrieval) · [Discussion](https://discuss.huggingface.co/t/native-binary-embeddings-experiment-curious-about-your-thoughts/177107)

## Why binary?

All methods use **exact search**: no approximation, so no recall loss from the index itself.

| Scale (vectors) | Float32 (ms) | Float INT8 (ms) | Bin-1024 (ms) | Bin-2048 (ms) | Bin-1024 vs float32 | Bin-1024 vs INT8 |
|---|---|---|---|---|---|---|
| 10k | 16–50 | 29–58 | 0.7–1.5 | 1.3–2.4 | 23–33× | **19–40×** |
| 100k | 200–270 | 290–430 | 7–10 | 14–26 | 24–30× | **29–46×** |
| **1M** | **1 800–4 500** | **2 700–4 700** | **73–102** | **145–202** | **24–47×** | **37–49×** |

FAISS AVX2+POPCNT · Intel Core Ultra 7 155H · 4 benchmark runs · 16 queries · top-10.

Float32 and INT8 timings vary with background system load because both scans are memory-bandwidth bound. Binary timings stay stable because the POPCNT scan is compute-bound rather than memory-bound: each vector is only 128 bytes, and at the 10k and 100k scales the whole 1024-bit index fits in L3 cache. The vs-INT8 ratio (37–49×) is the most stable reference.
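
Raw index sizes make the bandwidth argument concrete. This sketch assumes a 384-dim float32 baseline (from the 1 536 bytes/vector figure quoted in this card) and 1 byte per dimension for INT8, and ignores FAISS bookkeeping overhead.

```python
# Assumptions: 384-d float baseline (1 536 bytes/vector), 1 byte per dimension
# for INT8, raw payload only (no FAISS bookkeeping).
BYTES_PER_VECTOR = {
    "float32": 384 * 4,
    "INT8": 384,
    "bin-1024": 1024 // 8,
    "bin-2048": 2048 // 8,
}


def index_size_mb(n_vectors: int, bytes_per_vector: int) -> float:
    """Raw flat-index size in decimal megabytes."""
    return n_vectors * bytes_per_vector / 1e6


for n in (10_000, 100_000, 1_000_000):
    sizes = ", ".join(f"{name} {index_size_mb(n, b):,.1f} MB" for name, b in BYTES_PER_VECTOR.items())
    print(f"{n:>9,} vectors: {sizes}")
```

At 1M vectors a full float32 scan reads about 1.5 GB, versus 128 MB for bin-1024, while the 10k and 100k binary indexes are only about 1.3 MB and 12.8 MB.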

**Float INT8 is consistently slower than float32**: `IndexScalarQuantizer` with `QT_8bit` pays a dequantization overhead that exceeds its reduced-bandwidth benefit. Binary POPCNT is the only method tested here that is simultaneously smaller and faster.
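
A simplified per-dimension 8-bit scalar quantizer shows where that extra work comes from: the index stores 1 byte per dimension, but every code must be turned back into a float before the inner product. This is only in the spirit of `QT_8bit`, not FAISS's actual code (its exact rounding and SIMD decode kernels may differ); the helper names are hypothetical and the data is random.

```python
import numpy as np


def train_sq8(x: np.ndarray):
    """Hypothetical helper: learn a per-dimension [vmin, vmin + vdiff] range."""
    vmin = x.min(axis=0)
    vdiff = np.maximum(x.max(axis=0) - vmin, 1e-12)  # guard against constant dimensions
    return vmin, vdiff


def encode_sq8(x: np.ndarray, vmin: np.ndarray, vdiff: np.ndarray) -> np.ndarray:
    # Map each component to one of 256 levels: 1 byte per dimension.
    return np.clip(np.round((x - vmin) / vdiff * 255), 0, 255).astype(np.uint8)


def decode_sq8(codes: np.ndarray, vmin: np.ndarray, vdiff: np.ndarray) -> np.ndarray:
    # The extra per-vector work of a quantized scan: turn bytes back into floats.
    return vmin + codes.astype(np.float32) / 255 * vdiff


rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 384)).astype(np.float32)  # stand-in 384-d float embeddings
xq = rng.standard_normal(384).astype(np.float32)

vmin, vdiff = train_sq8(xb)
codes = encode_sq8(xb, vmin, vdiff)
scores_sq8 = decode_sq8(codes, vmin, vdiff) @ xq  # decode every vector, then inner product
scores_f32 = xb @ xq                              # float32 index: inner product directly
print(codes.nbytes, xb.nbytes)  # 384000 1536000
```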

**IVF-PQ is not included**: it is an approximate method that trades recall for speed, so comparing it with exact search would not be meaningful here.

> Float uses `IndexFlatIP` (inner product on normalized vectors, i.e. cosine); binary uses `IndexBinaryFlat` (Hamming distance). The metrics differ, but both are exact top-k scans, so their ranking latencies are comparable at scale.
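
For {-1, +1} codes the two kinds of metric are tied together: if `h` is the Hamming distance between two D-bit codes, their dot product is D − 2h, since agreeing positions contribute +1 and differing ones −1. Sorting by ascending Hamming distance therefore gives the same order as sorting by descending dot product (or cosine, as every ±1 vector has norm √D), up to ties. A quick standard-library check on random codes:

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


random.seed(0)
D = 1024
query = [random.choice((-1, 1)) for _ in range(D)]
docs = [[random.choice((-1, 1)) for _ in range(D)] for _ in range(50)]

# Identity: dot = (#agree) - (#disagree) = (D - h) - h = D - 2h.
assert all(dot(query, d) == D - 2 * hamming(query, d) for d in docs)

# Hence both metrics produce the same ranking (ties broken by document index).
by_hamming = sorted(range(len(docs)), key=lambda i: (hamming(query, docs[i]), i))
by_dot = sorted(range(len(docs)), key=lambda i: (-dot(query, docs[i]), i))
print(by_hamming[:10] == by_dot[:10])
```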

**POPCNT** counts the set bits of a 64-bit word in a single instruction. A 1024-bit Hamming distance therefore takes 16 XOR + POPCNT pairs, versus 384 multiply-accumulates for a 384-dim float32 vector, and each vector takes 12× less memory (128 bytes vs 1 536 bytes), so 12× more vectors fit in cache.
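
The word-level arithmetic can be sketched in pure Python: a 1024-bit code packs into 16 64-bit words, and the Hamming distance is the sum of 16 popcounts of XORed words, which is what a POPCNT kernel does in hardware. The helper names are hypothetical, and the bit layout (bit i set where dimension i is +1) is an assumption, not necessarily the repo's.

```python
import random

WORD_BITS = 64


def pack_words(code):
    """Pack a {-1, +1} list into 64-bit ints: bit i is 1 where code[i] == +1."""
    words = []
    for start in range(0, len(code), WORD_BITS):
        w = 0
        for bit, v in enumerate(code[start:start + WORD_BITS]):
            if v > 0:
                w |= 1 << bit
        words.append(w)
    return words


def hamming_words(a_words, b_words):
    # One XOR + one popcount per 64-bit word (int.bit_count() also works on Python 3.10+).
    return sum(bin(x ^ y).count("1") for x, y in zip(a_words, b_words))


random.seed(1)
a = [random.choice((-1, 1)) for _ in range(1024)]
b = [random.choice((-1, 1)) for _ in range(1024)]
wa, wb = pack_words(a), pack_words(b)

print(len(wa))  # 16 words = 128 bytes per vector
print(hamming_words(wa, wb) == sum(x != y for x, y in zip(a, b)))  # True
print(384 * 4 // (1024 // 8))  # 12: bytes per 384-d float32 vector / bytes per 1024-bit code
```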

## Usage

```python
import torch
from transformers import BertTokenizer
from huggingface_hub import hf_hub_download

# BinaryEmbedder is provided by the GitHub repo linked above
# (models/binary_embedder.py); run this from a clone of that repo.
from models.binary_embedder import BinaryEmbedder

tokenizer = BertTokenizer.from_pretrained("prajjwal1/bert-mini")
model = BinaryEmbedder(binary_dim=1024)

# Download the checkpoint from this model repo and load it on CPU.
weights = hf_hub_download("korben99/bne-binary-1024", "binary_embedder_1024.pt")
model.load_state_dict(torch.load(weights, map_location="cpu"))
model.eval()

with torch.no_grad():
    vecs = model.encode(["hello world"], tokenizer)  # (1, 1024), values in {-1, +1}
```
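
To search with the encoder's output, the {-1, +1} vectors are usually packed into bits first. This sketch uses random ±1 vectors as a stand-in for `model.encode(...)` output (assumption: convert that output to a NumPy array first, e.g. with `.numpy()` if it is a tensor) and runs an exact Hamming top-k scan in NumPy. The packed `uint8` array of shape `(n, 128)` is the layout `faiss.IndexBinaryFlat(1024)` accepts in `add()` and `search()`; the helper names here are hypothetical.

```python
import numpy as np

# Popcount lookup table for one byte (works on any NumPy version).
POPCOUNT8 = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint16)


def pack_codes(vecs: np.ndarray) -> np.ndarray:
    """{-1, +1} array of shape (n, 1024) -> uint8 array of shape (n, 128)."""
    return np.packbits(vecs > 0, axis=1)


def hamming_topk(xq: np.ndarray, xb: np.ndarray, k: int = 10):
    """Exact (brute-force) Hamming top-k, the same scan a flat binary index performs."""
    xor = np.bitwise_xor(xq[:, None, :], xb[None, :, :])  # (nq, nb, 128)
    dist = POPCOUNT8[xor].sum(axis=2)                     # (nq, nb) Hamming distances
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(dist, idx, axis=1), idx


rng = np.random.default_rng(0)
corpus = rng.choice([-1.0, 1.0], size=(1000, 1024))  # stand-in for model.encode(docs)
queries = corpus[:3].copy()                           # queries identical to docs 0, 1, 2

xb, xq = pack_codes(corpus), pack_codes(queries)
D, I = hamming_topk(xq, xb, k=10)
print(xb.shape, xb.nbytes)  # (1000, 128) 128000
print(I[:, 0])              # [0 1 2]: each query's nearest neighbour is itself
```

With FAISS installed, the same packed arrays can go into `faiss.IndexBinaryFlat(1024)` via `index.add(xb)` and `D, I = index.search(xq, 10)`, which runs this scan with POPCNT. The 128 000 bytes per 1 000 vectors are the 125 KB (KiB) per 1k vectors quoted in the tables.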

## Model selection

| Model | SciFact R@10 (mean ± std, 5 seeds) | Memory / 1k vectors | FAISS exact search @ 1M vectors |
|---|---|---|---|
| bne-binary-1024 | 0.2762 ± 0.012 | 125 KB | 73–102 ms (37–49× vs INT8) |
| **bne-binary-2048** | **0.2926 ± 0.010** | **250 KB** | **145–202 ms** |

The quality difference between 1024 and 2048 bits is not statistically significant (p = 0.159). Pick 1024 for maximum throughput and 2048 for the best average quality.

binary_embedder_1024.pt (added)

version https://git-lfs.github.com/spec/v1
oid sha256:4ad3f7c7224fef2a3215484e71e2407e7272323bb957739d49bf11dfca638b92
size 45774239
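
These three lines are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 of the real file and `size` its length in bytes (45 774 239 bytes, about 45.8 MB). A minimal standard-library sketch for checking a downloaded checkpoint against the pointer follows; `verify_lfs_object` is a hypothetical helper, and the path to pass is whatever `hf_hub_download` returned.

```python
import hashlib
import os
import tempfile

# Values copied from the LFS pointer.
EXPECTED_OID = "4ad3f7c7224fef2a3215484e71e2407e7272323bb957739d49bf11dfca638b92"
EXPECTED_SIZE = 45774239


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large checkpoint is never fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_lfs_object(path: str, oid: str = EXPECTED_OID, size: int = EXPECTED_SIZE) -> bool:
    """Hypothetical helper: True if the file matches the pointer's byte size and sha256 oid."""
    # Check the cheap size first; only hash when the size already matches.
    return os.path.getsize(path) == size and sha256_of(path) == oid


# Demo on a tiny temporary file; for the real checkpoint, pass the downloaded path.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_ok = verify_lfs_object(tmp.name, oid=hashlib.sha256(b"hello").hexdigest(), size=5)
os.remove(tmp.name)
print(demo_ok)  # True
```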

config.json (added)

{
  "model_type": "BinaryEmbedder",
  "binary_dim": 1024,
  "backbone": "prajjwal1/bert-mini"
}