| --- |
| license: mit |
| language: |
| - en |
| tags: |
| - retrieval-augmented-generation |
| - rag |
| - causal-reasoning |
| - hallucination-reduction |
| - semantic-drift |
| - context-window-poisoning |
| - multi-hop-qa |
| - information-retrieval |
| - nlp |
| - question-answering |
| library_name: vortexrag |
| pipeline_tag: question-answering |
| --- |
| |
| # VORTEXRAG Framework |
|
|
| **Vector Orthogonal Resonance-Tuned EXtraction Retrieval-Augmented Generation** |
|
|
| A unified 7-layer RAG framework that jointly reduces **Semantic Drift** and **Context Window Poisoning**, the two compounding failure modes that undermine factual grounding in standard RAG systems. |
|
|
| ## Key Results |
|
|
| | Metric | VORTEXRAG | vs Naive RAG | vs CRAG | vs Self-RAG | |
| |--------|-----------|--------------|---------|-------------| |
| | EM | **74.8** | +13.6 | +7.9 | +6.4 | |
| | F1 | **82.6** | +14.2 | +8.3 | +6.7 | |
| | Faithfulness | **0.94** | +0.23 | +0.16 | +0.13 | |
| | Semantic Drift Reduction | **61%** | — | — | — | |
| | Context Poison Reduction | **71%** | — | — | — | |
| | Added Latency | **45ms** | — | 2.5× faster | 2.2× faster | |
|
|
| Evaluated on Natural Questions (NQ), HotpotQA, MuSiQue, and 2WikiMultiHopQA (31,240 questions in total). |
|
|
| ## The 7-Layer Pipeline |
|
|
| ``` |
| Query |
| │ |
| ▼ |
| [L1: TVE] Tri-Vector Encoding |
| │ v = [α·sem(768d); β·syn(64d); γ·cau(32d)] |
| │ Encodes text as orthogonal semantic+syntactic+causal vectors |
| │ |
| ▼ |
| [L2: VRC] Vortex Retrieval Cone |
| │ spiral_rank = TVE·e^{−λr}·cos(nθ) |
| │ Geometric suppression of causally orthogonal chunks (θ > 45°) |
| │ |
| ▼ |
| [L3: SDC] Semantic Drift Corrector ← per-chunk causal gate |
| │ SDS = 1 − tanh(‖v_cau(q) − v_cau(c)‖ / τ) ≥ 0.72 |
| │ Eliminates individual semantic drift |
| │ |
| ▼ |
| [L4: CPG] Context Poison Guard ← window-level quality gate |
| │ ESR = Σ SDS·w / (P+ε) ≥ 3.5 |
| │ Greedy-optimal purging (Theorem 5.1) |
| │ |
| ▼ |
| [L5: RFG] Rank Fusion Gate |
| │ Φ = TVE^α × SDS^β × ESR_contrib^γ (multiplicative, no-weak-link) |
| │ |
| ▼ |
| [L6: CCB] Causal Context Builder |
| │ pos = rank(Φ+) × causal_depth |
| │ Root-cause chunks at pos=0 (U-shaped LLM recall exploitation) |
| │ |
| ▼ |
| [LLM] Generation |
| │ |
| ▼ |
| [L7: FV] Faithfulness Verifier ←──────────────── regeneration loop ──┐ |
| │ ΔR = 1 − ROUGE-L × NLI ≤ 0.15 │ |
| │ DeBERTa-v3-small CrossEncoder NLI │ |
| └─── if ΔR > δ_FV: re-weight RFG → retry (max 3 iterations) ────────┘ |
| │ |
| ▼ |
| Answer* (argmin ΔR across iterations) |
| ``` |
|
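The scalar formulas in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the function names are hypothetical, the norm is assumed to be Euclidean, and the fusion exponents in the example are placeholders because the diagram does not give their values.

```python
import math

def sds(v_cau_q, v_cau_c, tau):
    """Semantic Drift Score: 1 - tanh(||v_cau(q) - v_cau(c)|| / tau).

    Assumes a Euclidean norm; the diagram does not name the norm.
    """
    dist = math.dist(v_cau_q, v_cau_c)
    return 1.0 - math.tanh(dist / tau)

def fusion_score(tve, sds_value, esr_contrib, alpha, beta, gamma):
    """Multiplicative fusion: Phi = TVE^alpha * SDS^beta * ESR_contrib^gamma."""
    return (tve ** alpha) * (sds_value ** beta) * (esr_contrib ** gamma)

def delta_r(rouge_l, nli):
    """Faithfulness residual: Delta R = 1 - ROUGE-L * NLI."""
    return 1.0 - rouge_l * nli

def best_answer(candidates):
    """Return the (answer, delta_r) pair with the smallest Delta R."""
    return min(candidates, key=lambda pair: pair[1])

# Toy causal vectors (3-d for readability; the diagram uses 32-d).
q = [0.10, 0.20, 0.30]
near = [0.12, 0.18, 0.33]
far = [0.90, -0.40, 0.10]

print(sds(q, near, tau=0.80) >= 0.72)  # True: chunk passes the SDC gate
print(sds(q, far, tau=0.80) >= 0.72)   # False: chunk is dropped

# One weak factor drags the whole product down ("no weak link").
# Exponents of 1.0 are placeholders, not calibrated values.
print(fusion_score(0.9, 0.05, 0.9, alpha=1.0, beta=1.0, gamma=1.0))

# Answer selection across regeneration attempts.
print(best_answer([("a", 0.31), ("b", 0.12), ("c", 0.18)]))
```

Because the fusion is a product, a chunk with a strong embedding match but a low SDS still ranks low, which a weighted sum would not guarantee.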
|
| ## Quick Start |
|
|
| ```bash |
| pip install vortexrag |
| ``` |
|
|
| ```python |
| from vortexrag import VortexRAG, VortexConfig |
| |
| # Initialize with domain preset |
| config = VortexConfig(domain="general") # general, medical, legal, financial, code... |
| rag = VortexRAG(config) |
| |
| # Index your documents |
| rag.index(["Document 1...", "Document 2...", "Document 3..."]) |
| |
| # Query |
| result = rag.query("Why did X cause Y rather than Z?") |
| print(result.answer) |
| print(f"Faithfulness: ΔR={result.delta_r:.3f}") |
| print(f"Context Quality: ESR={result.esr:.3f}") |
| ``` |
|
|
| ## Domain Presets |
|
|
| VORTEXRAG ships with pre-calibrated domain parameter vectors; the table below documents 10 of them: |
|
|
| | Domain | τ | θ_CPG | γ (causal) | β (syntactic) | Use Case | |
| |--------|---|-------|-----------|--------------|----------| |
| | `general` | 0.80 | 3.5 | 0.25 | 0.25 | Default balanced | |
| | `medical` | 0.35 | 5.0 | **0.40** | 0.15 | Drug mechanisms, clinical QA | |
| | `legal` | 0.40 | 4.5 | 0.35 | **0.30** | Precedent chains, statutory analysis | |
| | `scientific` | 0.30 | 4.0 | **0.40** | 0.20 | Physics, chemistry, biology | |
| | `financial` | 0.50 | 3.5 | 0.30 | 0.25 | Market causation, risk analysis | |
| | `code` | 0.60 | 3.5 | 0.25 | **0.45** | Debugging, AST-structured retrieval | |
| | `cybersecurity` | 0.45 | 4.0 | 0.35 | 0.30 | Exploit chains, threat intel | |
| | `educational` | 0.65 | 3.0 | 0.25 | 0.20 | Concept progression, tutoring | |
| | `historical` | 0.90 | 3.0 | 0.35 | 0.20 | Event causation chains | |
| | `creative` | 1.20 | 2.5 | 0.15 | 0.20 | Thematic retrieval | |
| |
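To see what τ does in practice, the SDS gate can be inverted: SDS ≥ 0.72 holds exactly when the causal distance is at most τ·atanh(1 − 0.72). The sketch below tabulates that maximum distance per preset. It assumes the 0.72 threshold from the pipeline diagram applies to every domain, which this card does not state explicitly; the names `TAU_BY_DOMAIN` and `max_causal_distance` are illustrative only.

```python
import math

SDS_THRESHOLD = 0.72  # from the SDC layer; assumed constant across domains

# tau values copied from the preset table above.
TAU_BY_DOMAIN = {
    "general": 0.80, "medical": 0.35, "legal": 0.40, "scientific": 0.30,
    "financial": 0.50, "code": 0.60, "cybersecurity": 0.45,
    "educational": 0.65, "historical": 0.90, "creative": 1.20,
}

def max_causal_distance(tau, threshold=SDS_THRESHOLD):
    """Largest ||v_cau(q) - v_cau(c)|| that still passes the SDS gate.

    SDS >= threshold  <=>  tanh(d / tau) <= 1 - threshold
                      <=>  d <= tau * atanh(1 - threshold)
    """
    return tau * math.atanh(1.0 - threshold)

# Print presets from strictest to most permissive.
for domain, tau in sorted(TAU_BY_DOMAIN.items(), key=lambda kv: kv[1]):
    print(f"{domain:<14} tau={tau:.2f}  d_max={max_causal_distance(tau):.3f}")
```

Under this assumption a smaller τ means a stricter gate: `scientific` and `medical` tolerate the least causal distance, `creative` the most.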
| ## Theoretical Contributions |
| |
| - **Theorem 5.1 (CPG Greedy Optimality):** Per-step removal of argmin SDS maximizes ΔESR. Proof via monotone derivative argument. |
| - **Corollary 5.1 (Convergence):** Purge terminates in ≤|W|−3 steps with strictly monotone increasing ESR. |
| - **Proposition 10.1 (TVE Orthogonality):** Cross-arm correlation ρ < 0.08 empirically via Johnson-Lindenstrauss. |
| - **CCB Positional Optimality:** Optimal under the U-shaped recall model f(pos) ≈ ½(1+cos(2π·pos/L)), which peaks at both ends of the context and dips in the middle (Liu et al. 2023). |
| |
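The greedy purge behind Theorem 5.1 and Corollary 5.1 can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the card does not define P, so the code assumes P = Σ (1 − SDS)·w (the weighted "poison mass"), assumes uniform weights by default, and assumes the purge stops at a window of 3 chunks, which is one reading of the ≤|W|−3 bound. The function names are hypothetical.

```python
EPS = 1e-6  # the epsilon in ESR's denominator; value is an assumption

def esr(window):
    """ESR = sum(SDS_i * w_i) / (P + eps).

    ASSUMPTION: P = sum((1 - SDS_i) * w_i). The card leaves P undefined.
    `window` is a list of (sds, weight) pairs.
    """
    signal = sum(s * w for s, w in window)
    poison = sum((1.0 - s) * w for s, w in window)
    return signal / (poison + EPS)

def greedy_purge(sds_scores, weights=None, theta_cpg=3.5, min_size=3):
    """Drop the lowest-SDS chunk until ESR >= theta_cpg or min_size is reached.

    Returns the surviving SDS scores and the ESR value after each step.
    """
    if weights is None:
        weights = [1.0] * len(sds_scores)  # uniform weights (assumption)
    kept = list(zip(sds_scores, weights))
    history = [esr(kept)]
    while history[-1] < theta_cpg and len(kept) > min_size:
        worst = min(range(len(kept)), key=lambda i: kept[i][0])  # argmin SDS
        kept.pop(worst)
        history.append(esr(kept))
    return [s for s, _ in kept], history

kept, history = greedy_purge([0.95, 0.90, 0.85, 0.30, 0.20])
print(kept)
print([round(h, 2) for h in history])
```

With this choice of P, removing the lowest-SDS chunk raises ESR whenever that chunk's own ratio SDS/(1 − SDS) is below the current window ESR, which is why the ESR history in the example increases at every step.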
| ## Ablation Results |
| |
| Layers are added cumulatively, top to bottom; each one raises EM over the previous row: |
| |
| | Layer Added | EM | ΔEM | Insight | |
| |-------------|----|----|---------| |
| | Baseline | 61.2 | — | Standard cosine RAG | |
| | + TVE | 65.3 | +4.1 | Causal encoding separates mechanism from consequence | |
| | + VRC | 67.8 | +2.5 | Geometric filtering of causally orthogonal docs | |
| | + SDC | 70.4 | +2.6 | Per-chunk SDS gate eliminates individual drift | |
| | + CPG | 72.1 | +1.7 | Window ESR constraint (+39pp context poisoning reduction) | |
| | + RFG | 73.4 | +1.3 | Multiplicative no-weak-link fusion | |
| | + CCB | 73.9 | +0.5 | Root-cause chunks at attention-peak position | |
| | + FV | **74.8** | +0.9 | Faithfulness gate with regeneration loop | |
| |
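As a quick reading aid, the snippet below turns the EM column into each layer's share of the total 13.6-point gain. It uses only the numbers from the table; the variable and function names are illustrative.

```python
# (layer, cumulative EM) rows copied from the ablation table.
ABLATION_EM = [
    ("Baseline", 61.2), ("TVE", 65.3), ("VRC", 67.8), ("SDC", 70.4),
    ("CPG", 72.1), ("RFG", 73.4), ("CCB", 73.9), ("FV", 74.8),
]

def layer_shares(rows):
    """Percentage of the total EM gain contributed by each added layer."""
    total = rows[-1][1] - rows[0][1]
    shares = {}
    for (_, prev_em), (name, em) in zip(rows, rows[1:]):
        shares[name] = 100.0 * (em - prev_em) / total
    return shares

for name, share in layer_shares(ABLATION_EM).items():
    print(f"{name:<4} {share:5.1f}% of total EM gain")
```

On these figures the first three layers (TVE, VRC, SDC) account for roughly two thirds of the gain, while CCB contributes the least.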
| ## Links |
| |
| - 📄 **Research Paper:** https://doi.org/10.5281/zenodo.20285144 |
| - 💻 **GitHub:** https://github.com/vignesh2027/VORTEXRAG |
| - 🌐 **Docs:** https://vignesh2027.github.io/VORTEXRAG |
| - 🤗 **Live Demo:** https://huggingface.co/spaces/vigneshwar234/VORTEXRAG |
| - 📊 **Benchmarks:** https://huggingface.co/datasets/vigneshwar234/VORTEXRAG-Benchmarks |
| |
| ## Citation |
| |
| ```bibtex |
| @article{vignesh2026vortexrag, |
| title = {{VORTEXRAG}: Vector Orthogonal Resonance-Tuned EXtraction |
| Retrieval-Augmented Generation}, |
| author = {Vignesh L}, |
| year = {2026}, |
| month = {May}, |
| doi = {10.5281/zenodo.20285144}, |
| url = {https://github.com/vignesh2027/VORTEXRAG}, |
| note = {Independent Research Preprint. v2.0. MIT License.}, |
| keywords= {RAG, Semantic Drift, Context Window Poisoning, Causal NLP, |
| Multi-Hop QA, Faithfulness Verification} |
| } |
| ``` |
| |
| **Author:** Vignesh L — Independent Researcher |
| **ORCID:** https://orcid.org/0009-0004-9777-7592 |
| **License:** MIT |
| **Version:** v2.0 — May 2026 |
| |