korben99 committed · Commit 6513ed0 · verified · 1 Parent(s): 487e808

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +89 -0
  2. binary_embedder_1024.pt +3 -0
  3. config.json +5 -0
README.md ADDED
@@ -0,0 +1,89 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - embeddings
+ - binary
+ - bert
+ - efficient-inference
+ pipeline_tag: sentence-similarity
+ ---
+
+ # bne-binary-1024
+
+ Native **1024-bit binary** embedding model. Trained end-to-end with a binary head and tanh contrastive loss, not binarized post hoc.
+
+ - Backbone: `prajjwal1/bert-mini` (4 layers × 256 hidden dims, ~11M params)
+ - Output: 1024-dim {-1,+1} binary vector via Linear(256→1024) + LayerNorm + STE (straight-through estimator)
+ - Training: tanh contrastive loss on 550k NLI pairs, 3 epochs
+
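For readers unfamiliar with the straight-through estimator, the output stage listed above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: `SignSTE` and `BinaryHead` are hypothetical names, the random `pooled` tensor stands in for the pooled BERT output, and sending zero to +1 is an arbitrary choice. The actual `BinaryEmbedder` may differ in pooling, initialization and how the tanh loss is attached.

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        # Map every activation to -1 or +1 (zero goes to +1, an arbitrary choice).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the forward pass as if it were the identity.
        return grad_output


class BinaryHead(nn.Module):
    """Hypothetical head: Linear(256 -> 1024) + LayerNorm + STE sign."""

    def __init__(self, hidden_dim: int = 256, binary_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, binary_dim)
        self.norm = nn.LayerNorm(binary_dim)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return SignSTE.apply(self.norm(self.proj(pooled)))


torch.manual_seed(0)
head = BinaryHead()
pooled = torch.randn(2, 256, requires_grad=True)  # stand-in for pooled BERT output
codes = head(pooled)                               # shape (2, 1024), values in {-1, +1}

# Any differentiable loss on the codes works; a random target is used here
# only to show that gradients flow back through the non-differentiable sign().
target = torch.randn(2, 1024)
loss = (codes * target).sum()
loss.backward()
```

The point of the custom `backward` is that `sign()` has zero gradient almost everywhere, so without it nothing upstream of the head would be trained.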
+ | STS-B (mean ± std across 5 seeds) | Recall@10 SciFact (mean ± std across 5 seeds) | Memory / 1k vecs | Retrieval speed vs float INT8 |
+ |---|---|---|---|
+ | 0.7264 ± 0.0018 | 0.2762 ± 0.0119 | 125 KB | 37–49× faster at 1M vecs (exact search, FAISS AVX2+POPCNT) |
+
+ Native binary beats post-hoc binarization by **+24% Recall@10**, validated across 5 random seeds (p<0.001 bootstrap).
+
+ <details>
+ <summary>Per-seed breakdown (SciFact Recall@10)</summary>
+
+ | Seed | 1024 R@10 | 2048 R@10 |
+ |---|---|---|
+ | 42 | **0.2925** ← best 1024 | *0.2761* ← worst 2048 |
+ | 123 | 0.2875 | 0.3047 |
+ | 456 | 0.2728 | 0.2894 |
+ | 789 | 0.2619 | 0.2936 |
+ | 1337 | 0.2664 | 0.2992 |
+ | **mean ± std** | **0.2762 ± 0.012** | **0.2926 ± 0.010** |
+
+ Seed=42 is a structural outlier (best 1024, worst 2048) that compresses the apparent gap. Excluding it, the 4-seed means are 0.272 vs 0.297, a larger and likely significant difference.
+ </details>
+
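The summary figures can be recomputed from the per-seed table with a few lines of standard-library Python. One assumption is made here: the reported spread appears to be the population standard deviation (`pstdev`), since that reproduces 0.0119 for the 1024-bit model, whereas the sample standard deviation would come out closer to 0.013. The variable and function names are illustrative only.

```python
from statistics import mean, pstdev

# Recall@10 per seed, copied from the table (seeds 42, 123, 456, 789, 1337).
r10_1024 = [0.2925, 0.2875, 0.2728, 0.2619, 0.2664]
r10_2048 = [0.2761, 0.3047, 0.2894, 0.2936, 0.2992]


def summarize(scores):
    """Return (mean, population standard deviation) of a list of scores."""
    return mean(scores), pstdev(scores)


mean_1024, std_1024 = summarize(r10_1024)
mean_2048, std_2048 = summarize(r10_2048)

# Same means with seed 42 (the first entry in each list) left out.
mean_1024_no42 = mean(r10_1024[1:])
mean_2048_no42 = mean(r10_2048[1:])

print(f"1024: {mean_1024:.4f} ± {std_1024:.4f}")
print(f"2048: {mean_2048:.4f} ± {std_2048:.4f}")
print(f"without seed 42: {mean_1024_no42:.3f} vs {mean_2048_no42:.3f}")
```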
+ Part of [binary-native-embeddings-for-CPU-Retrieval](https://github.com/korben99/binary-native-embeddings-for-CPU-Retrieval) · [Discussion](https://discuss.huggingface.co/t/native-binary-embeddings-experiment-curious-about-your-thoughts/177107)
+
+ ## Why binary?
+
+ All methods are **exact search**: no approximation, no recall loss.
+
+ | Scale | Float32 (ms) | Float INT8 (ms) | Bin-1024 (ms) | Bin-2048 (ms) | 1024 vs f32 | 1024 vs INT8 |
+ |---|---|---|---|---|---|---|
+ | 10k | 16–50 | 29–58 | 0.7–1.5 | 1.3–2.4 | 23–33× | **19–40×** |
+ | 100k | 200–270 | 290–430 | 7–10 | 14–26 | 24–30× | **29–46×** |
+ | **1M** | **1 800–4 500** | **2 700–4 700** | **73–102** | **145–202** | **24–47×** | **37–49×** |
+
+ FAISS AVX2+POPCNT · Intel Core Ultra 7 155H · 4 benchmark runs · 16 queries · top-10.
+
+ Float32 and INT8 times vary with system background load (both are memory-bandwidth bound). Binary stays stable because its index fits in L3 cache, so it is compute-bound via POPCNT. The vs-INT8 ratio (37–49×) is the most stable reference.
+
+ **Float INT8 is consistently slower than float32**: the dequantization overhead of `IndexScalarQuantizer QT_8bit` exceeds the benefit of reduced bandwidth. Binary POPCNT is the only method that is simultaneously smaller and faster.
+
+ **IVF-PQ not included**: it is approximate search (trades recall for speed), so comparing it to exact search is not meaningful here.
+
+ > Float uses `IndexFlatIP` (inner product, which equals cosine on L2-normalized vectors), binary uses `IndexBinaryFlat` (Hamming). The metrics differ, but the ranking latencies are still comparable at scale.
+
+ **POPCNT** counts all set bits in a 64-bit word with a single instruction. A 1024-bit Hamming distance takes 16 POPCNT instructions vs 384 multiply-accumulates, plus 12× better cache utilization (128 bytes/vector vs 1 536 bytes).
+
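The arithmetic behind the POPCNT paragraph can be checked with a small pure-Python sketch. It is not how FAISS computes distances: `bin(word).count("1")` is only a software stand-in for the hardware instruction, and `pack` and `hamming` are hypothetical helper names. The float32 baseline is assumed to be 384-dimensional, inferred from the 384 multiply-accumulates and 1 536 bytes quoted in the text.

```python
import random

BITS = 1024              # code length of this model
WORD = 64                # POPCNT handles one 64-bit word per instruction
N_WORDS = BITS // WORD   # 16 words per vector


def pack(signs):
    """Pack a list of -1/+1 values into one Python int, one bit per value (+1 -> 1)."""
    code = 0
    for s in signs:
        code = (code << 1) | (1 if s > 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance, word by word: XOR, then count the set bits."""
    mask = (1 << WORD) - 1
    x = a ^ b
    dist = 0
    for i in range(N_WORDS):
        word = (x >> (i * WORD)) & mask
        dist += bin(word).count("1")  # software stand-in for one POPCNT
    return dist


rng = random.Random(0)
u = [rng.choice((-1, 1)) for _ in range(BITS)]
v = [rng.choice((-1, 1)) for _ in range(BITS)]

dist = hamming(pack(u), pack(v))
dot = sum(a * b for a, b in zip(u, v))  # equals BITS - 2 * dist for -1/+1 vectors

bytes_binary = BITS // 8     # 128 bytes per binary vector
bytes_float32 = 384 * 4      # 1536 bytes, assuming a 384-dim float32 baseline
kib_per_1k = 1000 * bytes_binary / 1024  # memory for 1k binary vectors, in KiB
```

Because the dot product of two {-1,+1} vectors is `BITS - 2 * dist`, ranking by ascending Hamming distance gives the same order as ranking by descending inner product on the binary codes.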
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import BertTokenizer
+ from huggingface_hub import hf_hub_download
+ # BinaryEmbedder is not part of transformers; this import presumably resolves
+ # against a checkout of the GitHub repository linked above.
+ from models.binary_embedder import BinaryEmbedder
+
+ tokenizer = BertTokenizer.from_pretrained("prajjwal1/bert-mini")
+ model = BinaryEmbedder(binary_dim=1024)
+ # Download the checkpoint from the Hub and load it on CPU.
+ weights = hf_hub_download("korben99/bne-binary-1024", "binary_embedder_1024.pt")
+ model.load_state_dict(torch.load(weights, map_location="cpu"))
+ model.eval()
+
+ vecs = model.encode(["hello world"], tokenizer)  # (1, 1024), values in {-1, +1}
+ ```
+
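To get from {-1,+1} vectors to the 128 bytes per vector quoted earlier, the signs have to be packed into bits. The sketch below shows one way to do that with NumPy and runs an exact Hamming top-k by brute force. It uses random stand-in vectors rather than real model output, `pack_codes` and `hamming_topk` are hypothetical helpers, and it assumes the embeddings are available as a NumPy array (a tensor returned by `encode` would need converting first). The packed `uint8` layout with `d // 8` columns is also what `faiss.IndexBinaryFlat` takes as input, so the same `codes` array could be added to such an index.

```python
import numpy as np


def pack_codes(vecs: np.ndarray) -> np.ndarray:
    """Turn an (n, d) array of -1/+1 values into (n, d // 8) packed uint8 codes."""
    bits = (vecs > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)


def hamming_topk(query: np.ndarray, database: np.ndarray, k: int):
    """Exact top-k by Hamming distance between one packed query and packed rows."""
    xor = np.bitwise_xor(database, query[None, :])
    dists = np.unpackbits(xor, axis=1).sum(axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


rng = np.random.default_rng(0)
# Random stand-ins for embeddings: 100 vectors of 1024 values in {-1, +1}.
vecs = rng.choice(np.array([-1, 1], dtype=np.int8), size=(100, 1024))
codes = pack_codes(vecs)  # shape (100, 128), dtype uint8

ids, dists = hamming_topk(codes[7], codes, k=10)
print(ids, dists)
```

Unpacking bits for every comparison is slow and is done here only for clarity; an optimized index works on the packed words directly.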
+ ## Model selection
+
+ | Model | R@10 (5 seeds) | Memory/1k | FAISS @ 1M |
+ |---|---|---|---|
+ | bne-binary-1024 | 0.2762 ± 0.012 | 125 KB | 73–102 ms (37–49× vs INT8) |
+ | **bne-binary-2048** | **0.2926 ± 0.010** | **250 KB** | **145–202 ms** |
+
+ The quality difference between 1024 and 2048 is not statistically significant (p=0.159). Pick 1024 for maximum throughput, 2048 for best average quality.
binary_embedder_1024.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ad3f7c7224fef2a3215484e71e2407e7272323bb957739d49bf11dfca638b92
+ size 45774239
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "model_type": "BinaryEmbedder",
+   "binary_dim": 1024,
+   "backbone": "prajjwal1/bert-mini"
+ }