| 500 | 61.5% | 36.0% |
| 800 | 67.3% | 39.0% |

The cross-encoder feeds [question SEP passage] as one sequence, so H4 attention heads attend directly from question tokens to passage tokens.
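The architectural difference is only in what each forward pass sees. A minimal sketch (helper names are ours, not the H4 code) of the two input shapes:

```python
# Bi-encoder: question and passages are encoded separately, so passage
# vectors can be precomputed offline; interaction is one dot product.
def bi_encoder_inputs(question, passages):
    return question, passages

# Cross-encoder: one joint sequence per candidate, so attention can flow
# from question tokens to passage tokens in every layer -- but nothing
# can be precomputed, which is why reranking only runs on top-k candidates.
def cross_encoder_inputs(question, passages):
    return [f"[CLS] {question} [SEP] {p} [SEP]" for p in passages]

pairs = cross_encoder_inputs("What color are roses?",
                             ["Roses are red.", "Violets are blue."])
print(pairs[0])  # [CLS] What color are roses? [SEP] Roses are red. [SEP]
```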

**Overnight cross-encoder (8 hours, 25M ternary params, 5.9K SQuAD pairs):**

| Step | R@1 | Binary Acc | Milestone |
|------|-----|------------|-----------|
| 0 | 24% | 50% | Random |
| 1000 | 42% | 65% | Matches bi-encoder |
| 3400 | 52% | 76% | Exceeds bi-encoder ceiling |
| 5400 | 70% | 77% | Approaching production |
| 7000 | **80%** | 84% | **Peak — production viable** |
| Final (7454) | 69% | 85.1% | Eval variance on 100 samples |

The model surged from 52% to 80% between steps 3400 and 7000 as the H4 cross-attention learned question-to-passage alignment through Coxeter chambers.
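The two columns above measure different things: R@1 asks whether the reranker puts the gold passage first among the retrieved candidates, while binary accuracy scores each (question, passage) pair as a relevant/irrelevant classification. A hedged sketch of both (our helper names, not the eval harness):

```python
def recall_at_k(ranked_lists, gold, k=1):
    """Fraction of queries whose gold passage appears in the top-k after reranking."""
    hits = sum(1 for ranking, g in zip(ranked_lists, gold) if g in ranking[:k])
    return hits / len(gold)

def binary_accuracy(scores, labels, threshold=0.5):
    """Pairwise relevant/irrelevant classification accuracy at a score threshold."""
    preds = [s >= threshold for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Toy check: 2 of 3 queries rank the gold passage first -> R@1 = 2/3.
rankings = [["p3", "p1"], ["p7", "p2"], ["p5", "p9"]]
gold = ["p3", "p2", "p5"]
print(recall_at_k(rankings, gold, k=1))
print(binary_accuracy([0.9, 0.2, 0.6], [True, False, False]))
```

Note that with 5 candidates per query, a random reranker sits near 20% R@1, which is why the step-0 row (24%) is labeled "Random".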

### Pre-Trained Reranker Comparison (the production answer)

| Reranker | R@1 | R@5 | ms/query | Params |
|----------|-----|-----|----------|--------|
| Random baseline | 20.0% | 100% | 0ms | — |
| H4 cross-encoder (overnight) | **80% peak** (69% final) | 100% | 1548ms | 25M (ternary) |
| **Pre-trained MiniLM-L6** | **98.5%** | **100%** | **487ms** | **22M (float)** |

The pre-trained model (ms-marco-MiniLM-L-6-v2, trained on 500K+ MS MARCO pairs) achieves 98.5% R@1 on the same candidates from our H4 bi-encoder. The practical system: H4 geometric retrieval (the novel part) + pre-trained reranking (the proven part) = **98.5% accuracy at $0/month.**
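The hybrid pipeline is a standard two-stage retrieve-then-rerank loop. A runnable sketch with placeholder scorers (in the real system the retriever is the H4 bi-encoder and the reranker is `cross-encoder/ms-marco-MiniLM-L-6-v2`, e.g. via the sentence-transformers `CrossEncoder` class; the word-overlap scorers below are stand-ins so the control flow runs without model downloads):

```python
import math

def retrieve(question, corpus, embed, k=5):
    # Cheap stage: rank the whole corpus by embedding cosine similarity.
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / ((math.sqrt(sum(a * a for a in u)) or 1.0) *
                      (math.sqrt(sum(b * b for b in v)) or 1.0))
    qv = embed(question)
    return sorted(corpus, key=lambda p: cosine(qv, embed(p)), reverse=True)[:k]

def rerank(question, candidates, pair_score):
    # Expensive stage: joint scoring of each (question, passage) pair,
    # only over the k candidates the retriever surfaced.
    return max(candidates, key=lambda p: pair_score(question, p))

# Placeholder scorers standing in for the real models.
embed = lambda text: [text.lower().count(w) for w in ("roses", "red", "blue", "violets")]
overlap = lambda q, p: len(set(q.lower().split()) & set(p.lower().split()))

corpus = ["Roses are red.", "Violets are blue.", "Sugar is sweet."]
top5 = retrieve("What color are roses?", corpus, embed)
print(rerank("What color are roses?", top5, overlap))
```

The design point the table makes: since R@5 is already 100% from the retriever, the reranker only has to solve a 5-way ranking problem, so a small off-the-shelf cross-encoder closes the gap to 98.5% without any further training.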