---
license: mit
language:
- en
tags:
- retrieval-augmented-generation
- rag
- causal-reasoning
- hallucination-reduction
- semantic-drift
- context-window-poisoning
- multi-hop-qa
- information-retrieval
- nlp
- question-answering
library_name: vortexrag
pipeline_tag: question-answering
---
# VORTEXRAG Framework
**Vector Orthogonal Resonance-Tuned EXtraction Retrieval-Augmented Generation**

A unified 7-layer RAG framework that targets **Semantic Drift** and **Context Window Poisoning**, the two compounding failure modes that undermine factual grounding in standard RAG systems. In the reported evaluation it reduces them by 61% and 71%, respectively.
## Key Results
| Metric | VORTEXRAG | Δ vs Naive RAG | Δ vs CRAG | Δ vs Self-RAG |
|--------|-----------|----------------|-----------|---------------|
| EM | **74.8** | +13.6 | +7.9 | +6.4 |
| F1 | **82.6** | +14.2 | +8.3 | +6.7 |
| Faithfulness | **0.94** | +0.23 | +0.16 | +0.13 |
| Semantic Drift Reduction | **61%** | n/a | n/a | n/a |
| Context Poison Reduction | **71%** | n/a | n/a | n/a |
| Added Latency | **45 ms** | n/a | 2.5× lower | 2.2× lower |

Evaluated on Natural Questions (NQ), HotpotQA, MuSiQue, and 2WikiMultiHopQA (31,240 questions in total). EM and F1 differences are absolute points; the latency row compares VORTEXRAG's added latency with that of each baseline.
## The 7-Layer Pipeline
```
Query
  │
  ▼
[L1: TVE] Tri-Vector Encoding
  │  v = [α·sem(768d); β·syn(64d); γ·cau(32d)]
  │  Encodes text as orthogonal semantic + syntactic + causal vectors
  ▼
[L2: VRC] Vortex Retrieval Cone
  │  spiral_rank = TVE·e^{−λr}·cos(nθ)
  │  Geometrically suppresses causally orthogonal chunks (θ > 45°)
  ▼
[L3: SDC] Semantic Drift Corrector    ← per-chunk causal gate
  │  SDS = 1 − tanh(‖v_cau(q) − v_cau(c)‖ / τ) ≥ 0.72
  │  Filters out individually drifted chunks
  ▼
[L4: CPG] Context Poison Guard        ← window-level quality gate
  │  ESR = Σ SDS·w / (P+ε) ≥ 3.5
  │  Greedy-optimal purging (Theorem 5.1)
  ▼
[L5: RFG] Rank Fusion Gate
  │  Φ = TVE^α × SDS^β × ESR_contrib^γ  (multiplicative, no weak link)
  ▼
[L6: CCB] Causal Context Builder
  │  pos = rank(Φ+) × causal_depth
  │  Root-cause chunks at pos = 0 (exploits U-shaped LLM recall)
  ▼
[LLM] Generation
  ▼
[L7: FV] Faithfulness Verifier        ← regeneration loop
  │  ΔR = 1 − ROUGE-L × NLI ≤ δ_FV = 0.15
  │  DeBERTa-v3-small CrossEncoder NLI
  │  If ΔR > δ_FV: re-weight RFG → retry (max 3 iterations)
  ▼
Answer* (argmin ΔR across iterations)
```
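The per-chunk gate in L3 needs only the causal arm of the query and chunk vectors. Below is a minimal NumPy sketch of the TVE concatenation and the SDC gate. It assumes the three arm embeddings are already computed (the toy vectors are placeholders) and uses the `general` preset τ = 0.80 with the 0.72 threshold from the diagram. The function names are illustrative, not the vortexrag API, and α = 0.5 is a hypothetical value because the README does not list it.

```python
import numpy as np

# Arm sizes from the L1 (TVE) box: semantic 768d, syntactic 64d, causal 32d.
SEM_DIM, SYN_DIM, CAU_DIM = 768, 64, 32


def tri_vector(sem, syn, cau, *, alpha, beta, gamma):
    """Weighted concatenation v = [α·sem; β·syn; γ·cau] (864 dims in total).

    The three arm embeddings are taken as given; how they are produced is
    not modeled here.
    """
    if sem.shape != (SEM_DIM,) or syn.shape != (SYN_DIM,) or cau.shape != (CAU_DIM,):
        raise ValueError("unexpected arm dimensions")
    return np.concatenate([alpha * sem, beta * syn, gamma * cau])


def sds(q_cau, c_cau, tau):
    """Semantic Drift Score: SDS = 1 - tanh(||v_cau(q) - v_cau(c)|| / tau)."""
    return 1.0 - float(np.tanh(np.linalg.norm(q_cau - c_cau) / tau))


def sdc_gate(q_cau, chunk_caus, tau=0.80, threshold=0.72):
    """Return indices of chunks whose SDS clears the per-chunk threshold."""
    return [i for i, c in enumerate(chunk_caus) if sds(q_cau, c, tau) >= threshold]


# Usage with toy causal vectors (not real embeddings).
q = np.ones(CAU_DIM)
near = q + 0.01   # distance ≈ 0.057, SDS ≈ 0.93: kept
far = -q          # distance ≈ 11.3, SDS ≈ 0: dropped
print(sdc_gate(q, [near, far]))  # [0]

# alpha=0.5 is hypothetical; beta and gamma follow the `general` preset.
v = tri_vector(np.ones(SEM_DIM), np.ones(SYN_DIM), q, alpha=0.5, beta=0.25, gamma=0.25)
print(v.shape)  # (864,)
```

Because SDS compares only the causal arms, two chunks with near-identical semantic embeddings can still be separated by the gate if their causal vectors differ.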
## Quick Start
```bash
pip install vortexrag
```
```python
from vortexrag import VortexRAG, VortexConfig
# Initialize with domain preset
config = VortexConfig(domain="general") # general, medical, legal, financial, code...
rag = VortexRAG(config)
# Index your documents
rag.index(["Document 1...", "Document 2...", "Document 3..."])
# Query
result = rag.query("Why did X cause Y rather than Z?")
print(result.answer)
print(f"Faithfulness: ΔR={result.delta_r:.3f}")
print(f"Context Quality: ESR={result.esr:.3f}")
```
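`result.delta_r` reports the L7 faithfulness score ΔR = 1 − ROUGE-L × NLI. The sketch below shows how such a score and the keep-the-best regeneration loop could be computed. It assumes that ROUGE-L is taken between the answer and a reference text (for example, the assembled context) and that the NLI term is an entailment probability in [0, 1] supplied by a model such as the DeBERTa-v3-small cross-encoder. `generate` and `nli` are hypothetical callables, and re-weighting the RFG between attempts is folded into `generate`. This is not the vortexrag implementation.

```python
from typing import Callable, List, Optional, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lower-cased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r) if c and r else 0
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def delta_r(answer: str, reference: str, entail_prob: float) -> float:
    """ΔR = 1 - ROUGE-L × NLI, with NLI as an entailment probability in [0, 1]."""
    return 1.0 - rouge_l(answer, reference) * entail_prob


def verify(generate: Callable[[int], str], nli: Callable[[str], float],
           reference: str, delta_fv: float = 0.15, max_iters: int = 3) -> Tuple[str, float]:
    """Retry until ΔR <= delta_fv or max_iters is reached; return the argmin-ΔR answer."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    best: Optional[Tuple[str, float]] = None
    for attempt in range(max_iters):
        answer = generate(attempt)          # hypothetical: re-weighted RFG + LLM call
        score = delta_r(answer, reference, nli(answer))
        if best is None or score < best[1]:
            best = (answer, score)
        if score <= delta_fv:               # passes the FV gate, stop early
            break
    return best


context = "the cat sat on the mat"
drafts = ["dogs bark loudly", "the cat sat on the mat"]
answer, score = verify(
    generate=lambda attempt: drafts[min(attempt, len(drafts) - 1)],
    nli=lambda ans: 0.9,   # stand-in for a cross-encoder entailment probability
    reference=context,
)
print(answer, round(score, 2))  # the cat sat on the mat 0.1
```

Because the loop tracks the minimum across attempts, a later, worse draft never replaces an earlier, better one. This matches the "Answer* (argmin ΔR across iterations)" step in the diagram.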
## Domain Presets
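VORTEXRAG ships with 11 pre-calibrated domain parameter vectors; ten of them are listed below. τ is the scale in the SDC score SDS, θ_CPG is the CPG threshold on ESR, and γ and β are the causal and syntactic TVE weights.
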
| Domain | τ | θ_CPG | γ (causal) | β (syntactic) | Use Case |
|--------|---|-------|-----------|--------------|----------|
| `general` | 0.80 | 3.5 | 0.25 | 0.25 | Default balanced |
| `medical` | 0.35 | 5.0 | **0.40** | 0.15 | Drug mechanisms, clinical QA |
| `legal` | 0.40 | 4.5 | 0.35 | **0.30** | Precedent chains, statutory analysis |
| `scientific` | 0.30 | 4.0 | **0.40** | 0.20 | Physics, chemistry, biology |
| `financial` | 0.50 | 3.5 | 0.30 | 0.25 | Market causation, risk analysis |
| `code` | 0.60 | 3.5 | 0.25 | **0.45** | Debugging, AST-structured retrieval |
| `cybersecurity` | 0.45 | 4.0 | 0.35 | 0.30 | Exploit chains, threat intel |
| `educational` | 0.65 | 3.0 | 0.25 | 0.20 | Concept progression, tutoring |
| `historical` | 0.90 | 3.0 | 0.35 | 0.20 | Event causation chains |
| `creative` | 1.20 | 2.5 | 0.15 | 0.20 | Thematic retrieval |
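Since SDS = 1 − tanh(d/τ), a smaller τ makes the SDC gate stricter: SDS falls off faster with causal distance d. The sketch below inverts the formula to give the largest distance each preset would still admit. It assumes the 0.72 threshold from the pipeline diagram applies to every preset, which the table does not state.

```python
import math

# τ values copied from the Domain Presets table above.
PRESET_TAU = {
    "general": 0.80, "medical": 0.35, "legal": 0.40, "scientific": 0.30,
    "financial": 0.50, "code": 0.60, "cybersecurity": 0.45,
    "educational": 0.65, "historical": 0.90, "creative": 1.20,
}

# Assumption: the 0.72 SDC threshold from the pipeline diagram applies to every preset.
SDS_THRESHOLD = 0.72


def sds_at(distance: float, tau: float) -> float:
    """SDS = 1 - tanh(distance / tau) for a given causal-arm distance."""
    return 1.0 - math.tanh(distance / tau)


def max_passing_distance(tau: float, threshold: float = SDS_THRESHOLD) -> float:
    """Largest causal distance that still satisfies SDS >= threshold."""
    # 1 - tanh(d / tau) >= t  <=>  d <= tau * atanh(1 - t)
    return tau * math.atanh(1.0 - threshold)


for name, tau in sorted(PRESET_TAU.items(), key=lambda kv: kv[1]):
    print(f"{name:<13} tau={tau:.2f}  max distance={max_passing_distance(tau):.3f}  "
          f"SDS at d=0.2: {sds_at(0.2, tau):.3f}")
```

The admitted distance scales linearly with τ. For example, the `medical` preset (τ = 0.35) accepts chunks at less than half the causal distance that `general` (τ = 0.80) does.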
## Theoretical Contributions
- **Theorem 5.1 (CPG Greedy Optimality):** At each purge step, removing the chunk with the lowest SDS yields the largest ESR gain (ΔESR). Proof via a monotone-derivative argument.
- **Corollary 5.1 (Convergence):** The purge terminates in at most |W| − 3 steps, with ESR strictly increasing at every step.
- **Proposition 10.1 (TVE Orthogonality):** Measured cross-arm correlation ρ < 0.08, supported by a Johnson–Lindenstrauss argument.
- **CCB Positional Optimality:** CCB placement is optimal under a U-shaped recall model f(pos) ≈ ½(1 + cos(2π·pos/L)), which peaks at both ends of a context of length L (after Liu et al. 2023).
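The CCB placement rule can be read as an assignment problem: put the strongest chunks where the model's recall is highest. The sketch below assumes Φ scores are already computed and that the recall curve is given as one value per position. It ignores the causal_depth factor in pos = rank(Φ+) × causal_depth, and both the scores and the curve are invented for illustration. The pairing itself follows from the rearrangement inequality.

```python
from itertools import permutations
from typing import Dict, List, Sequence


def place_by_recall(phi: Dict[str, float], recall: Sequence[float]) -> List[str]:
    """Order chunks to maximize sum(phi[chunk] * recall[position]).

    Rearrangement inequality: pair the highest-Φ chunk with the highest-recall
    position, the next with the next, and so on.
    """
    if len(phi) != len(recall):
        raise ValueError("need one recall value per chunk")
    chunks = sorted(phi, key=phi.get, reverse=True)
    slots = sorted(range(len(recall)), key=lambda p: recall[p], reverse=True)
    layout = [""] * len(recall)
    for chunk, slot in zip(chunks, slots):
        layout[slot] = chunk
    return layout


def expected_recall(layout: Sequence[str], phi: Dict[str, float],
                    recall: Sequence[float]) -> float:
    return sum(phi[c] * r for c, r in zip(layout, recall))


# Hypothetical Φ scores and a hypothetical U-shaped recall curve over 5 positions.
phi = {"root_cause": 0.9, "mechanism": 0.7, "context": 0.5, "aside": 0.3, "noise": 0.1}
recall = [1.0, 0.6, 0.3, 0.5, 0.8]

layout = place_by_recall(phi, recall)
print(layout)  # ['root_cause', 'context', 'noise', 'aside', 'mechanism']

# Brute-force check over all 120 orderings: none does better.
best = max(expected_recall(p, phi, recall) for p in permutations(phi))
print(abs(best - expected_recall(layout, phi, recall)) < 1e-12)  # True
```

Under a U-shaped curve, the weakest chunk lands in the middle of the window, and the root-cause chunk takes position 0.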
## Ablation Results
When layers are added cumulatively, each one raises Exact Match (EM):

| Layer Added | EM | ΔEM | Insight |
|-------------|----|----|---------|
| Baseline | 61.2 | — | Standard cosine RAG |
| + TVE | 65.3 | +4.1 | Causal encoding separates mechanism from consequence |
| + VRC | 67.8 | +2.5 | Geometric filtering of causally orthogonal docs |
| + SDC | 70.4 | +2.6 | Per-chunk SDS gate eliminates individual drift |
| + CPG | 72.1 | +1.7 | Window ESR constraint (+39pp context poisoning reduction) |
| + RFG | 73.4 | +1.3 | Multiplicative no-weak-link fusion |
| + CCB | 73.9 | +0.5 | Root-cause chunks at attention-peak position |
| + FV | **74.8** | +0.9 | Faithfulness gate with regeneration loop |
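The "no-weak-link" property credited to RFG is a property of multiplying scores: one small factor pulls the product down, whereas an average can hide it. The comparison below uses the RFG form from the pipeline diagram. The exponents default to 1.0 as a hypothetical choice, because the README does not give RFG's α, β, γ. The weighted-mean baseline is included only for contrast.

```python
def fuse_product(tve: float, sds: float, esr_contrib: float,
                 alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """RFG form from the diagram: Φ = TVE^α × SDS^β × ESR_contrib^γ."""
    return (tve ** alpha) * (sds ** beta) * (esr_contrib ** gamma)


def fuse_mean(tve: float, sds: float, esr_contrib: float,
              alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted arithmetic mean, used only as a comparison baseline."""
    return (alpha * tve + beta * sds + gamma * esr_contrib) / (alpha + beta + gamma)


balanced = (0.7, 0.7, 0.7)     # decent on every signal
weak_link = (1.0, 1.0, 0.2)    # strong TVE and SDS, weak ESR contribution

for name, scores in [("balanced", balanced), ("weak link", weak_link)]:
    print(f"{name:<10} product={fuse_product(*scores):.3f}  mean={fuse_mean(*scores):.3f}")
```

The mean ranks the weak-link chunk first (0.733 vs 0.700). The product reverses that order (0.200 vs 0.343), so a chunk cannot compensate for a poor ESR contribution with high similarity alone.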
## Links
- 📄 **Research Paper:** https://doi.org/10.5281/zenodo.20285144
- 💻 **GitHub:** https://github.com/vignesh2027/VORTEXRAG
- 🌐 **Docs:** https://vignesh2027.github.io/VORTEXRAG
- 🤗 **Live Demo:** https://huggingface.co/spaces/vigneshwar234/VORTEXRAG
- 📊 **Benchmarks:** https://huggingface.co/datasets/vigneshwar234/VORTEXRAG-Benchmarks
## Citation
```bibtex
@article{vignesh2026vortexrag,
title = {{VORTEXRAG}: Vector Orthogonal Resonance-Tuned EXtraction
Retrieval-Augmented Generation},
author = {Vignesh L},
year = {2026},
month = {May},
doi = {10.5281/zenodo.20285144},
url = {https://github.com/vignesh2027/VORTEXRAG},
note = {Independent Research Preprint. v2.0. MIT License.},
keywords= {RAG, Semantic Drift, Context Window Poisoning, Causal NLP,
Multi-Hop QA, Faithfulness Verification}
}
```
- **Author:** Vignesh L, Independent Researcher
- **ORCID:** https://orcid.org/0009-0004-9777-7592
- **License:** MIT
- **Version:** v2.0, May 2026