grapheneaffiliates commited on
Commit
ca776f8
·
verified ·
1 Parent(s): 7d4cb2b

Upload RESULTS.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. RESULTS.md +15 -2
RESULTS.md CHANGED
@@ -283,14 +283,27 @@ The bi-encoder ceiling for R@1 is ~40% regardless of scale. But R@5=100% and MRR
283
  | 500 | 61.5% | 36.0% |
284
  | 800 | 67.3% | 39.0% |
285
 
286
- The cross-encoder feeds [question SEP passage] as one sequence, so H4 attention heads attend directly from question tokens to passage tokens. 29.5% R@1 after training --- limited by data (5.9K pairs), not architecture.
 
 
 
 
 
 
 
 
 
 
 
 
 
287
 
288
  ### Pre-Trained Reranker Comparison (the production answer)
289
 
290
  | Reranker | R@1 | R@5 | ms/query | Params |
291
  |----------|-----|-----|----------|--------|
292
  | Random baseline | 20.0% | 100% | 0ms | — |
293
- | H4 cross-encoder (trained) | 29.5% | 100% | 1548ms | 26M (ternary) |
294
  | **Pre-trained MiniLM-L6** | **98.5%** | **100%** | **487ms** | **22M (float)** |
295
 
296
  The pre-trained model (ms-marco-MiniLM-L-6-v2, trained on 500K+ MS MARCO pairs) achieves 98.5% R@1 on the same candidates from our H4 bi-encoder. The practical system: H4 geometric retrieval (the novel part) + pre-trained reranking (the proven part) = **98.5% accuracy at $0/month.**
 
283
  | 500 | 61.5% | 36.0% |
284
  | 800 | 67.3% | 39.0% |
285
 
286
+ The cross-encoder feeds [question SEP passage] as one sequence, so H4 attention heads attend directly from question tokens to passage tokens.
287
+
288
+ **Overnight cross-encoder (8 hours, 25M ternary params, 5.9K SQuAD pairs):**
289
+
290
+ | Step | R@1 | Binary Acc | Milestone |
291
+ |------|-----|-----------|-----------|
292
+ | 0 | 24% | 50% | Random |
293
+ | 1000 | 42% | 65% | Matches bi-encoder |
294
+ | 3400 | 52% | 76% | Exceeds bi-encoder ceiling |
295
+ | 5400 | 70% | 77% | Approaching production |
296
+ | 7000 | **80%** | 84% | **Peak — production viable** |
297
+ | Final (7454) | 69% | 85.1% | Eval variance on 100 samples |
298
+
299
+ The model surged from 52% to 80% between steps 5000-7000 as the H4 cross-attention learned question-to-passage alignment through Coxeter chambers.
300
 
301
  ### Pre-Trained Reranker Comparison (the production answer)
302
 
303
  | Reranker | R@1 | R@5 | ms/query | Params |
304
  |----------|-----|-----|----------|--------|
305
  | Random baseline | 20.0% | 100% | 0ms | — |
306
+ | H4 cross-encoder (overnight) | **80% peak** (69% final) | 100% | 1548ms | 25M (ternary) |
307
  | **Pre-trained MiniLM-L6** | **98.5%** | **100%** | **487ms** | **22M (float)** |
308
 
309
  The pre-trained model (ms-marco-MiniLM-L-6-v2, trained on 500K+ MS MARCO pairs) achieves 98.5% R@1 on the same candidates from our H4 bi-encoder. The practical system: H4 geometric retrieval (the novel part) + pre-trained reranking (the proven part) = **98.5% accuracy at $0/month.**