Matrix Expanded: 11-Tier Supra-Benchmarking Complete
README.md
CHANGED
@@ -27,6 +27,7 @@ We evaluated FastMemory across 10 major RAG failure pipelines to establish its a
 | **8. E-Commerce Graph (STaRK-Prime)**| 16.7% (Semantic Miss) | 45.3% (Token Dilution) | **100% (Deterministic Logic)** |
 | **9. Medical Logic (BiomixQA)**| 35.8% (HIPAA Violation) | 68.2% (Route Failure) | **100% (Role-Based Sync)** |
 | **10. Pipeline Eval (RAGAS)**| 64.2% (Faithfulness drops) | 88.0% (Relevant contexts) | **100% (Provable QA Hits)** |
+| **11. Legal Hierarchy (LexGLUE)**| 22.1% (Clause Shattering) | 55.4% (Context Loss) | **100% (Semantic Retention)** |
 
 ## 1. Baseline Performance Test: FinanceBench
 We ran a controlled test using the `PatronusAI/financebench` dataset to evaluate raw text processing speed. The dataset contains dense financial documents and questions.