---
license: mit
language:
  - en
tags:
  - retrieval-augmented-generation
  - rag
  - causal-reasoning
  - hallucination-reduction
  - semantic-drift
  - context-window-poisoning
  - multi-hop-qa
  - information-retrieval
  - nlp
  - question-answering
library_name: vortexrag
pipeline_tag: question-answering
---

# VORTEXRAG Framework

**Vector Orthogonal Resonance-Tuned EXtraction Retrieval-Augmented Generation**

A unified 7-layer RAG framework that jointly targets **Semantic Drift** and **Context Window Poisoning**, the two compounding failure modes that undermine factual grounding in standard RAG systems.

## Key Results

| Metric | VORTEXRAG | vs Naive RAG | vs CRAG | vs Self-RAG |
|--------|-----------|--------------|---------|-------------|
| EM | **74.8** | +13.6 | +7.9 | +6.4 |
| F1 | **82.6** | +14.2 | +8.3 | +6.7 |
| Faithfulness | **0.94** | +0.23 | +0.16 | +0.13 |
| Semantic Drift Reduction | **61%** | — | — | — |
| Context Poison Reduction | **71%** | — | — | — |
| Added Latency | **45ms** | — | 2.5× faster | 2.2× faster |

Evaluated on Natural Questions (NQ), HotpotQA, MuSiQue, and 2WikiMultiHopQA (31,240 questions in total).

## The 7-Layer Pipeline

```
Query
  │
  ▼
[L1: TVE] Tri-Vector Encoding
  │  v = [α·sem(768d); β·syn(64d); γ·cau(32d)]
  │  Encodes text as orthogonal semantic+syntactic+causal vectors
  │
  ▼
[L2: VRC] Vortex Retrieval Cone
  │  spiral_rank = TVE·e^{−λr}·cos(nθ)
  │  Geometric suppression of causally orthogonal chunks (θ > 45°)
  │
  ▼
[L3: SDC] Semantic Drift Corrector      ← per-chunk causal gate
  │  SDS = 1 − tanh(‖v_cau(q) − v_cau(c)‖ / τ) ≥ 0.72
  │  Eliminates individual semantic drift
  │
  ▼
[L4: CPG] Context Poison Guard          ← window-level quality gate
  │  ESR = Σ SDS·w / (P+ε) ≥ 3.5
  │  Greedy-optimal purging (Theorem 5.1)
  │
  ▼
[L5: RFG] Rank Fusion Gate
  │  Φ = TVE^α × SDS^β × ESR_contrib^γ  (multiplicative, no-weak-link)
  │
  ▼
[L6: CCB] Causal Context Builder
  │  pos = rank(Φ+) × causal_depth
  │  Root-cause chunks at pos=0 (U-shaped LLM recall exploitation)
  │
  ▼
[LLM] Generation
  │
  ▼
[L7: FV] Faithfulness Verifier ←──────────────── regeneration loop ──┐
  │  ΔR = 1 − ROUGE-L × NLI ≤ 0.15                                   │
  │  DeBERTa-v3-small CrossEncoder NLI                               │
  └─── if ΔR > δ_FV: re-weight RFG → retry (max 3 iterations) ───────┘
  │
  ▼
Answer* (argmin ΔR across iterations)
```
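The verifier's regeneration loop (L7) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not VORTEXRAG's implementation: ROUGE-L is computed as the LCS-based F-measure over lowercase whitespace tokens, `nli_score` and `generate` are hypothetical stand-ins for the DeBERTa-v3-small CrossEncoder and the LLM call (RFG re-weighting is omitted), and "max 3 iterations" is read as three generation attempts in total. Only the 0.15 threshold and the argmin-ΔR selection come from the diagram.

```python
from typing import Callable, List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (classic O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure (beta = 1) over lowercase whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def delta_r(answer: str, context: str, nli_score: Callable[[str, str], float]) -> float:
    """ΔR = 1 − ROUGE-L × NLI, assuming NLI is an entailment probability in [0, 1]."""
    return 1.0 - rouge_l_f(answer, context) * nli_score(context, answer)


def verify_with_retries(
    generate: Callable[[int], str],          # hypothetical: attempt index -> answer
    context: str,
    nli_score: Callable[[str, str], float],  # hypothetical: (premise, hypothesis) -> prob
    delta_fv: float = 0.15,
    max_iters: int = 3,                      # assumed: 3 attempts in total
) -> Tuple[str, float]:
    """Regenerate until ΔR <= delta_fv or attempts run out; return the argmin-ΔR answer."""
    best_answer, best_dr = "", float("inf")
    for attempt in range(max_iters):
        answer = generate(attempt)  # a real system would re-weight RFG before each retry
        dr = delta_r(answer, context, nli_score)
        if dr < best_dr:
            best_answer, best_dr = answer, dr
        if dr <= delta_fv:
            break
    return best_answer, best_dr


# Toy usage: an NLI stub that always returns 1.0, so ΔR reduces to 1 − ROUGE-L.
context = "aspirin inhibits cox enzymes which reduces prostaglandin synthesis"
drafts = ["aspirin cures headaches by magic", context]
answer, dr = verify_with_retries(lambda i: drafts[min(i, len(drafts) - 1)], context, lambda p, h: 1.0)
print(answer, round(dr, 3))
```

Keeping the best attempt rather than the last one means a failed final retry cannot displace an earlier, more faithful answer, which is what "argmin ΔR across iterations" implies.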

## Quick Start

```bash
pip install vortexrag
```

```python
from vortexrag import VortexRAG, VortexConfig

# Initialize with domain preset
config = VortexConfig(domain="general")  # general, medical, legal, financial, code...
rag = VortexRAG(config)

# Index your documents
rag.index(["Document 1...", "Document 2...", "Document 3..."])

# Query
result = rag.query("Why did X cause Y rather than Z?")
print(result.answer)
print(f"Faithfulness: ΔR={result.delta_r:.3f}")
print(f"Context Quality: ESR={result.esr:.3f}")
```
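The two diagnostics printed above can be compared against the thresholds shown in the pipeline diagram: ΔR ≤ 0.15 for the verifier and ESR ≥ 3.5 for the `general` preset. The helper below is a hypothetical sketch, not part of the `vortexrag` API; `QueryDiagnostics` is an assumed stand-in for the returned result object that carries only the `delta_r` and `esr` fields used in the Quick Start.

```python
from dataclasses import dataclass


@dataclass
class QueryDiagnostics:
    """Hypothetical stand-in for the query result; only the two fields printed above."""
    delta_r: float  # faithfulness residual ΔR (lower is better)
    esr: float      # context quality ESR (higher is better)


def passes_gates(result: QueryDiagnostics, delta_fv: float = 0.15, esr_min: float = 3.5) -> bool:
    """True when the answer clears both the FV gate and the CPG window gate.

    Defaults mirror the diagram's ΔR threshold and the `general` preset's θ_CPG.
    """
    return result.delta_r <= delta_fv and result.esr >= esr_min


print(passes_gates(QueryDiagnostics(delta_r=0.08, esr=4.2)))  # True
print(passes_gates(QueryDiagnostics(delta_r=0.21, esr=4.2)))  # False: ΔR above 0.15
```

Because the function only reads `delta_r` and `esr`, any object exposing those two attributes would work in its place.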

## Domain Presets

VORTEXRAG ships with 11 pre-calibrated domain parameter vectors; the table below lists ten of them:

| Domain | τ (SDS scale) | θ_CPG (ESR threshold) | γ (causal weight) | β (syntactic weight) | Use Case |
|--------|---|-------|-----------|--------------|----------|
| `general` | 0.80 | 3.5 | 0.25 | 0.25 | Default balanced |
| `medical` | 0.35 | 5.0 | **0.40** | 0.15 | Drug mechanisms, clinical QA |
| `legal` | 0.40 | 4.5 | 0.35 | **0.30** | Precedent chains, statutory analysis |
| `scientific` | 0.30 | 4.0 | **0.40** | 0.20 | Physics, chemistry, biology |
| `financial` | 0.50 | 3.5 | 0.30 | 0.25 | Market causation, risk analysis |
| `code` | 0.60 | 3.5 | 0.25 | **0.45** | Debugging, AST-structured retrieval |
| `cybersecurity` | 0.45 | 4.0 | 0.35 | 0.30 | Exploit chains, threat intel |
| `educational` | 0.65 | 3.0 | 0.25 | 0.20 | Concept progression, tutoring |
| `historical` | 0.90 | 3.0 | 0.35 | 0.20 | Event causation chains |
| `creative` | 1.20 | 2.5 | 0.15 | 0.20 | Thematic retrieval |
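The τ column controls how quickly the SDC score decays with causal distance. The sketch below evaluates the diagram's formula SDS = 1 − tanh(‖v_cau(q) − v_cau(c)‖ / τ) at a fixed causal distance for a few presets, and solves for the largest distance that still passes. It assumes the 0.72 gate from the diagram applies to every domain (a per-domain gate is not stated here); the `TAU` dictionary simply copies values from the table and is not the library's `VortexConfig` API.

```python
import math

# τ values copied from the Domain Presets table (subset); hypothetical dict, not the library API.
TAU = {"medical": 0.35, "scientific": 0.30, "general": 0.80, "creative": 1.20}
SDS_GATE = 0.72  # SDC gate from the pipeline diagram (assumed domain-independent)


def sds(causal_distance: float, tau: float) -> float:
    """Semantic Drift Score 1 − tanh(d / τ): 1.0 at zero distance, tending to 0 as d grows."""
    return 1.0 - math.tanh(causal_distance / tau)


def max_admissible_distance(tau: float, gate: float = SDS_GATE) -> float:
    """Largest distance that still passes: 1 − tanh(d/τ) >= g  <=>  d <= τ·atanh(1 − g)."""
    return tau * math.atanh(1.0 - gate)


for domain, tau in TAU.items():
    score = sds(0.2, tau)
    verdict = "keep" if score >= SDS_GATE else "drop"
    print(f"{domain:>10}: SDS(d=0.2) = {score:.3f} -> {verdict}, d_max = {max_admissible_distance(tau):.3f}")
```

Lower τ (`medical`, `scientific`) shrinks the admissible causal distance and makes the gate stricter; higher τ (`creative`) tolerates looser causal matches.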

## Theoretical Contributions

- **Theorem 5.1 (CPG Greedy Optimality):** At each step, removing the chunk with the lowest SDS maximizes the per-step ESR gain (ΔESR). The proof uses a monotone-derivative argument.
- **Corollary 5.1 (Convergence):** The purge terminates in at most |W|−3 steps, with ESR increasing strictly monotonically.
- **Proposition 10.1 (TVE Orthogonality):** The measured cross-arm correlation is ρ < 0.08, supported by a Johnson-Lindenstrauss argument.
- **CCB Positional Optimality:** Placement is optimal under the U-shaped recall model f(pos) ≈ ½(1+cos(2π·pos/L)) (Liu et al. 2023), which peaks at both ends of the context and dips in the middle.
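Theorem 5.1 and Corollary 5.1 describe a greedy purge loop. Below is a minimal sketch of that loop under loudly stated assumptions: ESR follows the diagram's ESR = Σ SDS·w / (P + ε), where we *assume* w is a per-chunk weight and P counts chunks whose SDS falls below the 0.72 SDC gate (neither symbol is defined in this card). The loop removes the argmin-SDS chunk until ESR reaches θ_CPG, and never shrinks the window below 3 chunks, which is our reading of the |W|−3 step bound. It illustrates the procedure only, not the paper's proof.

```python
from typing import List, Sequence

SDS_GATE = 0.72  # SDC gate from the pipeline diagram


def esr(sds: Sequence[float], weights: Sequence[float], eps: float = 1e-6) -> float:
    """ESR = Σ SDS·w / (P + ε); P is ASSUMED to count chunks below the SDS gate."""
    poisoned = sum(1 for s in sds if s < SDS_GATE)
    return sum(s * w for s, w in zip(sds, weights)) / (poisoned + eps)


def cpg_purge(sds: List[float], weights: List[float], theta: float = 3.5, min_size: int = 3) -> List[int]:
    """Greedily drop the lowest-SDS chunk until ESR >= theta or only min_size chunks remain.

    Returns the indices of surviving chunks in their original order. By construction
    the loop performs at most len(sds) - min_size removals.
    """
    keep = list(range(len(sds)))
    while len(keep) > min_size:
        if esr([sds[i] for i in keep], [weights[i] for i in keep]) >= theta:
            break
        worst = min(keep, key=lambda i: sds[i])  # argmin SDS, as in Theorem 5.1
        keep.remove(worst)
    return keep


sds_scores = [0.90, 0.85, 0.30, 0.80, 0.20]
print(cpg_purge(sds_scores, [1.0] * len(sds_scores)))  # [0, 1, 3]
```

In the example the two below-gate chunks (SDS 0.20, then 0.30) are removed first, leaving the three causally aligned chunks.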

## Ablation Results

Layers are added cumulatively (each row includes every layer above it), and every layer raises EM:

| Layer Added | EM | ΔEM | Insight |
|-------------|----|----|---------|
| Baseline | 61.2 | — | Standard cosine RAG |
| + TVE | 65.3 | +4.1 | Causal encoding separates mechanism from consequence |
| + VRC | 67.8 | +2.5 | Geometric filtering of causally orthogonal docs |
| + SDC | 70.4 | +2.6 | Per-chunk SDS gate eliminates individual drift |
| + CPG | 72.1 | +1.7 | Window ESR constraint (+39pp context poisoning reduction) |
| + RFG | 73.4 | +1.3 | Multiplicative no-weak-link fusion |
| + CCB | 73.9 | +0.5 | Root-cause chunks at attention-peak position |
| + FV | **74.8** | +0.9 | Faithfulness gate with regeneration loop |
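Because each row adds one layer on top of the previous one, the per-layer ΔEM values telescope: they sum to the gap between the full system and the baseline, which is consistent with the +13.6 EM gain over Naive RAG reported under Key Results. A quick check using only the table's numbers:

```python
# EM after each cumulative addition, copied from the ablation table above.
em = {
    "Baseline": 61.2, "+ TVE": 65.3, "+ VRC": 67.8, "+ SDC": 70.4,
    "+ CPG": 72.1, "+ RFG": 73.4, "+ CCB": 73.9, "+ FV": 74.8,
}

values = list(em.values())
# Per-layer gain is the difference between consecutive rows (rounded to undo float noise).
deltas = {name: round(b - a, 1) for name, a, b in zip(list(em)[1:], values, values[1:])}
print(deltas)
print("total gain:", round(sum(deltas.values()), 1))
```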

## Links

- 📄 **Research Paper:** https://doi.org/10.5281/zenodo.20285144
- 💻 **GitHub:** https://github.com/vignesh2027/VORTEXRAG
- 🌐 **Docs:** https://vignesh2027.github.io/VORTEXRAG
- 🤗 **Live Demo:** https://huggingface.co/spaces/vigneshwar234/VORTEXRAG
- 📊 **Benchmarks:** https://huggingface.co/datasets/vigneshwar234/VORTEXRAG-Benchmarks

## Citation

```bibtex
@article{vignesh2026vortexrag,
  title   = {{VORTEXRAG}: Vector Orthogonal Resonance-Tuned EXtraction
             Retrieval-Augmented Generation},
  author  = {Vignesh L},
  year    = {2026},
  month   = {May},
  doi     = {10.5281/zenodo.20285144},
  url     = {https://github.com/vignesh2027/VORTEXRAG},
  note    = {Independent Research Preprint. v2.0. MIT License.},
  keywords = {RAG, Semantic Drift, Context Window Poisoning, Causal NLP,
              Multi-Hop QA, Faithfulness Verification}
}
```

**Author:** Vignesh L — Independent Researcher  
**ORCID:** https://orcid.org/0009-0004-9777-7592  
**License:** MIT  
**Version:** v2.0 — May 2026