---
license: mit
language:
- en
tags:
- retrieval-augmented-generation
- rag
- causal-reasoning
- hallucination-reduction
- semantic-drift
- context-window-poisoning
- multi-hop-qa
- information-retrieval
- nlp
- question-answering
library_name: vortexrag
pipeline_tag: question-answering
---

# VORTEXRAG Framework

**Vector Orthogonal Resonance-Tuned EXtraction Retrieval-Augmented Generation**

A unified 7-layer RAG framework that jointly reduces **Semantic Drift** and **Context Window Poisoning**, the two compounding failure modes that undermine factual grounding in standard RAG systems.

## Key Results

| Metric | VORTEXRAG | vs Naive RAG | vs CRAG | vs Self-RAG |
|--------|-----------|--------------|---------|-------------|
| EM | **74.8** | +13.6 | +7.9 | +6.4 |
| F1 | **82.6** | +14.2 | +8.3 | +6.7 |
| Faithfulness | **0.94** | +0.23 | +0.16 | +0.13 |
| Semantic Drift Reduction | **61%** | — | — | — |
| Context Poison Reduction | **71%** | — | — | — |
| Added Latency | **45 ms** | — | 2.5× faster | 2.2× faster |

Evaluated on NQ + HotpotQA + MuSiQue + 2WikiMultiHopQA (31,240 total questions).

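
For readers unfamiliar with the EM and F1 rows, the sketch below shows the SQuAD-style definitions commonly used on these benchmarks. It is an assumption that the VORTEXRAG evaluation applies this exact normalization, and the function names are illustrative only; they are not part of the `vortexrag` package.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(round(f1("Paris, France", "Paris"), 3))
```

Benchmark scores such as those in the table are these per-question values averaged over the dataset and multiplied by 100.
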
## The 7-Layer Pipeline

```
Query
  │
  ▼
[L1: TVE] Tri-Vector Encoding
  │   v = [α·sem(768d); β·syn(64d); γ·cau(32d)]
  │   Encodes text as orthogonal semantic+syntactic+causal vectors
  │
  ▼
[L2: VRC] Vortex Retrieval Cone
  │   spiral_rank = TVE·e^{−λr}·cos(nθ)
  │   Geometric suppression of causally orthogonal chunks (θ > 45°)
  │
  ▼
[L3: SDC] Semantic Drift Corrector  ← per-chunk causal gate
  │   SDS = 1 − tanh(‖v_cau(q) − v_cau(c)‖ / τ) ≥ 0.72
  │   Eliminates individual semantic drift
  │
  ▼
[L4: CPG] Context Poison Guard  ← window-level quality gate
  │   ESR = Σ SDS·w / (P+ε) ≥ 3.5
  │   Greedy-optimal purging (Theorem 5.1)
  │
  ▼
[L5: RFG] Rank Fusion Gate
  │   Φ = TVE^α × SDS^β × ESR_contrib^γ (multiplicative, no-weak-link)
  │
  ▼
[L6: CCB] Causal Context Builder
  │   pos = rank(Φ+) × causal_depth
  │   Root-cause chunks at pos=0 (U-shaped LLM recall exploitation)
  │
  ▼
[LLM] Generation
  │
  ▼
[L7: FV] Faithfulness Verifier ←──────────────── regeneration loop ──┐
  │   ΔR = 1 − ROUGE-L × NLI ≤ 0.15                                  │
  │   DeBERTa-v3-small CrossEncoder NLI                              │
  └─── if ΔR > δ_FV: re-weight RFG → retry (max 3 iterations) ───────┘
  │
  ▼
Answer* (argmin ΔR across iterations)
```

## Quick Start

```bash
pip install vortexrag
```

```python
from vortexrag import VortexRAG, VortexConfig

# Initialize with domain preset
config = VortexConfig(domain="general")  # general, medical, legal, financial, code...
rag = VortexRAG(config)

# Index your documents
rag.index(["Document 1...", "Document 2...", "Document 3..."])

# Query
result = rag.query("Why did X cause Y rather than Z?")
print(result.answer)
print(f"Faithfulness: ΔR={result.delta_r:.3f}")
print(f"Context Quality: ESR={result.esr:.3f}")
```

## Domain Presets

VORTEXRAG ships with 11 pre-calibrated domain parameter vectors; ten are listed here:

| Domain | τ | θ_CPG | γ (causal) | β (syntactic) | Use Case |
|--------|---|-------|-----------|--------------|----------|
| `general` | 0.80 | 3.5 | 0.25 | 0.25 | Default balanced |
| `medical` | 0.35 | 5.0 | **0.40** | 0.15 | Drug mechanisms, clinical QA |
| `legal` | 0.40 | 4.5 | 0.35 | **0.30** | Precedent chains, statutory analysis |
| `scientific` | 0.30 | 4.0 | **0.40** | 0.20 | Physics, chemistry, biology |
| `financial` | 0.50 | 3.5 | 0.30 | 0.25 | Market causation, risk analysis |
| `code` | 0.60 | 3.5 | 0.25 | **0.45** | Debugging, AST-structured retrieval |
| `cybersecurity` | 0.45 | 4.0 | 0.35 | 0.30 | Exploit chains, threat intel |
| `educational` | 0.65 | 3.0 | 0.25 | 0.20 | Concept progression, tutoring |
| `historical` | 0.90 | 3.0 | 0.35 | 0.20 | Event causation chains |
| `creative` | 1.20 | 2.5 | 0.15 | 0.20 | Thematic retrieval |

## Theoretical Contributions

- **Theorem 5.1 (CPG Greedy Optimality):** Removing the chunk with the lowest SDS at each step maximizes the per-step gain ΔESR. The proof uses a monotone-derivative argument.
- **Corollary 5.1 (Convergence):** The purge terminates in at most |W|−3 steps, with ESR strictly increasing at every step.
- **Proposition 10.1 (TVE Orthogonality):** Empirical cross-arm correlation ρ < 0.08, via Johnson-Lindenstrauss.
- **CCB Positional Optimality:** Optimal under U-shaped recall model f(pos) ≈ ½(1+cos(π·pos/L)) (Liu et al. 2023).

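
The SDC and CPG gates from the pipeline diagram, together with the greedy purge of Theorem 5.1, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `vortexrag` implementation: the function names are hypothetical, `w` is assumed to be a uniform per-chunk weight, `P` is assumed to be the weighted drift mass Σ(1 − SDS)·w, and the floor of three retained chunks is inferred from the |W|−3 bound in Corollary 5.1. The README does not define `w` or `P`, so treat those choices as placeholders.

```python
import math

def sds(q_cau, c_cau, tau=0.80):
    """Semantic Drift Score: 1 - tanh(||v_cau(q) - v_cau(c)|| / tau)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q_cau, c_cau)))
    return 1.0 - math.tanh(dist / tau)

def esr(scores, weights=None, eps=1e-6):
    """ESR = sum(SDS * w) / (P + eps).

    ASSUMPTION: w defaults to 1 per chunk and P = sum((1 - SDS) * w).
    Neither is defined in the README; both are placeholders.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    signal = sum(s * w for s, w in zip(scores, weights))
    poison = sum((1.0 - s) * w for s, w in zip(scores, weights))
    return signal / (poison + eps)

def purge(scores, esr_min=3.5, min_keep=3):
    """Greedily drop the argmin-SDS chunk until ESR >= esr_min.

    ASSUMPTION: min_keep=3 is inferred from the |W| - 3 step bound.
    Returns the indices of the retained chunks in original order.
    """
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    while len(kept) > min_keep and esr([scores[i] for i in kept]) < esr_min:
        kept.pop()  # last element is the lowest-SDS chunk still in the window
    return sorted(kept)

# Per-chunk SDS values for a five-chunk window; the last chunk has drifted.
# The two gates are applied independently here to show each one's effect.
window = [0.95, 0.90, 0.85, 0.80, 0.30]
passed_sdc = [i for i, s in enumerate(window) if s >= 0.72]  # L3 per-chunk gate
retained = purge(window)                                     # L4 window gate
print(passed_sdc, retained)
```

Under the assumed definition of `P`, removing a chunk raises ESR whenever its SDS is below the window's weighted mean SDS. That always holds for the argmin chunk unless all scores are equal, which is consistent with the monotone increase stated in Corollary 5.1.
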
## Ablation Results

Every layer contributes:

| Layer Added | EM | ΔEM | Insight |
|-------------|----|----|---------|
| Baseline | 61.2 | — | Standard cosine RAG |
| + TVE | 65.3 | +4.1 | Causal encoding separates mechanism from consequence |
| + VRC | 67.8 | +2.5 | Geometric filtering of causally orthogonal docs |
| + SDC | 70.4 | +2.6 | Per-chunk SDS gate eliminates individual drift |
| + CPG | 72.1 | +1.7 | Window ESR constraint (+39 pp context poisoning reduction) |
| + RFG | 73.4 | +1.3 | Multiplicative no-weak-link fusion |
| + CCB | 73.9 | +0.5 | Root-cause chunks at attention-peak position |
| + FV | **74.8** | +0.9 | Faithfulness gate with regeneration loop |

## Links

- 📄 **Research Paper:** https://doi.org/10.5281/zenodo.20285144
- 💻 **GitHub:** https://github.com/vignesh2027/VORTEXRAG
- 🌐 **Docs:** https://vignesh2027.github.io/VORTEXRAG
- 🤗 **Live Demo:** https://huggingface.co/spaces/vigneshwar234/VORTEXRAG
- 📊 **Benchmarks:** https://huggingface.co/datasets/vigneshwar234/VORTEXRAG-Benchmarks

## Citation

```bibtex
@article{vignesh2026vortexrag,
  title    = {{VORTEXRAG}: Vector Orthogonal Resonance-Tuned EXtraction Retrieval-Augmented Generation},
  author   = {Vignesh L},
  year     = {2026},
  month    = {May},
  doi      = {10.5281/zenodo.20285144},
  url      = {https://github.com/vignesh2027/VORTEXRAG},
  note     = {Independent Research Preprint. v2.0. MIT License.},
  keywords = {RAG, Semantic Drift, Context Window Poisoning, Causal NLP, Multi-Hop QA, Faithfulness Verification}
}
```

**Author:** Vignesh L, Independent Researcher

**ORCID:** https://orcid.org/0009-0004-9777-7592

**License:** MIT

**Version:** v2.0, May 2026