license: mit
language:
  - en
tags:
  - retrieval-augmented-generation
  - rag
  - causal-reasoning
  - hallucination-reduction
  - semantic-drift
  - context-window-poisoning
  - multi-hop-qa
  - information-retrieval
  - nlp
  - question-answering
library_name: vortexrag
pipeline_tag: question-answering

VORTEXRAG Framework

Vector Orthogonal Resonance-Tuned EXtraction Retrieval-Augmented Generation

A unified 7-layer RAG framework that jointly reduces Semantic Drift and Context Window Poisoning, the two compounding failure modes that undermine factual grounding in standard RAG systems.

Key Results

Metric	VORTEXRAG	vs Naive RAG	vs CRAG	vs Self-RAG
EM	74.8	+13.6	+7.9	+6.4
F1	82.6	+14.2	+8.3	+6.7
Faithfulness	0.94	+0.23	+0.16	+0.13
Semantic Drift Reduction	61%	—	—	—
Context Poison Reduction	71%	—	—	—
Added Latency	45ms	—	2.5× faster	2.2× faster

Evaluated on NQ + HotpotQA + MuSiQue + 2WikiMultiHopQA (31,240 total questions).
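
The comparison columns above are margins rather than absolute scores. As a small worked example, the absolute baseline scores they imply can be recovered by subtraction. This assumes each margin means "VORTEXRAG minus baseline"; the function name is illustrative and not part of the vortexrag package.

```python
# Reported VORTEXRAG scores and margins, copied from the Key Results table.
vortex = {"EM": 74.8, "F1": 82.6, "Faithfulness": 0.94}
margins = {
    "Naive RAG": {"EM": 13.6, "F1": 14.2, "Faithfulness": 0.23},
    "CRAG": {"EM": 7.9, "F1": 8.3, "Faithfulness": 0.16},
    "Self-RAG": {"EM": 6.4, "F1": 6.7, "Faithfulness": 0.13},
}


def implied_baselines(scores, deltas):
    """Assumption: margin = VORTEXRAG score - baseline score."""
    return {
        system: {metric: round(scores[metric] - d, 2) for metric, d in row.items()}
        for system, row in deltas.items()
    }


baselines = implied_baselines(vortex, margins)
for system, row in baselines.items():
    print(system, row)
```

Under that reading, the implied Naive RAG EM matches the 61.2 baseline reported in the ablation table.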

The 7-Layer Pipeline

Query
  │
  ▼
[L1: TVE] Tri-Vector Encoding
  │  v = [α·sem(768d); β·syn(64d); γ·cau(32d)]
  │  Encodes text as orthogonal semantic+syntactic+causal vectors
  │
  ▼
[L2: VRC] Vortex Retrieval Cone
  │  spiral_rank = TVE·e^{−λr}·cos(nθ)
  │  Geometric suppression of causally orthogonal chunks (θ > 45°)
  │
  ▼
[L3: SDC] Semantic Drift Corrector      ← per-chunk causal gate
  │  SDS = 1 − tanh(‖v_cau(q) − v_cau(c)‖ / τ) ≥ 0.72
  │  Eliminates individual semantic drift
  │
  ▼
[L4: CPG] Context Poison Guard          ← window-level quality gate
  │  ESR = Σ SDS·w / (P+ε) ≥ 3.5
  │  Greedy-optimal purging (Theorem 5.1)
  │
  ▼
[L5: RFG] Rank Fusion Gate
  │  Φ = TVE^α × SDS^β × ESR_contrib^γ  (multiplicative, no-weak-link)
  │
  ▼
[L6: CCB] Causal Context Builder
  │  pos = rank(Φ+) × causal_depth
  │  Root-cause chunks at pos=0 (U-shaped LLM recall exploitation)
  │
  ▼
[LLM] Generation
  │
  ▼
[L7: FV] Faithfulness Verifier ←──────────────── regeneration loop ──┐
  │  ΔR = 1 − ROUGE-L × NLI ≤ 0.15                                   │
  │  DeBERTa-v3-small CrossEncoder NLI                                 │
  └─── if ΔR > δ_FV: re-weight RFG → retry (max 3 iterations) ────────┘
  │
  ▼
Answer* (argmin ΔR across iterations)
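
The SDC gate and the RFG fusion in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration of the two formulas, not the vortexrag implementation: the function names, the toy 3-dimensional causal vectors, the exponents of 1.0, and the use of τ = 0.80 from the `general` preset are all assumptions made for the example.

```python
import math

SDS_GATE = 0.72  # per-chunk threshold from the L3 line of the diagram
TAU = 0.80       # assumed: tau taken from the `general` preset row


def sds(q_cau, c_cau, tau=TAU):
    """SDS = 1 - tanh(||v_cau(q) - v_cau(c)|| / tau)."""
    return 1.0 - math.tanh(math.dist(q_cau, c_cau) / tau)


def fuse(tve, sds_score, esr_contrib, alpha=1.0, beta=1.0, gamma=1.0):
    """Phi = TVE^alpha * SDS^beta * ESR_contrib^gamma (exponents assumed)."""
    return (tve ** alpha) * (sds_score ** beta) * (esr_contrib ** gamma)


# Toy data: the real causal arm is 32-d; 3-d vectors keep the example readable.
query_cau = [0.20, 0.10, 0.00]
chunks = {
    "on_topic": {"cau": [0.25, 0.12, 0.05], "tve": 0.90, "esr_contrib": 1.0},
    "drifted": {"cau": [0.90, -0.60, 0.70], "tve": 0.95, "esr_contrib": 1.0},
}

scores = {}
for name, chunk in chunks.items():
    s = sds(query_cau, chunk["cau"])
    scores[name] = {
        "sds": s,
        "passes_gate": s >= SDS_GATE,
        "phi": fuse(chunk["tve"], s, chunk["esr_contrib"]),
    }
    print(f"{name}: SDS={s:.3f} gate={scores[name]['passes_gate']} "
          f"phi={scores[name]['phi']:.3f}")
```

The drifted chunk has the higher raw similarity, yet it fails the gate and its fused score collapses because a single weak factor drags the product down. That is the "no-weak-link" behaviour the multiplicative form is meant to provide.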

Quick Start

pip install vortexrag

from vortexrag import VortexRAG, VortexConfig

# Initialize with domain preset
config = VortexConfig(domain="general")  # general, medical, legal, financial, code...
rag = VortexRAG(config)

# Index your documents
rag.index(["Document 1...", "Document 2...", "Document 3..."])

# Query
result = rag.query("Why did X cause Y rather than Z?")
print(result.answer)
print(f"Faithfulness: ΔR={result.delta_r:.3f}")
print(f"Context Quality: ESR={result.esr:.3f}")
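
For readers wondering how a ΔR value like the one printed above could arise, here is a hedged sketch of the verifier rule from the pipeline diagram: ΔR = 1 − ROUGE-L × NLI, accept when ΔR ≤ 0.15, otherwise retry, and return the lowest-ΔR answer. The helper names are hypothetical, the ROUGE-L and NLI scores are supplied as plain numbers rather than computed by a model, and "max 3 iterations" is read here as three attempts in total.

```python
from typing import Iterable, Tuple


def delta_r(rouge_l: float, nli: float) -> float:
    """Faithfulness gap from the diagram: 1 - ROUGE-L * NLI."""
    return 1.0 - rouge_l * nli


def select_answer(
    attempts: Iterable[Tuple[str, float, float]],
    threshold: float = 0.15,
    max_iters: int = 3,
) -> Tuple[str, float, int]:
    """Walk (answer, rouge_l, nli) attempts; stop early once one passes.

    Returns the argmin-delta_r answer, its score, and the attempts used.
    """
    best_answer, best_score, used = None, float("inf"), 0
    for answer, rouge_l, nli in attempts:
        if used >= max_iters:
            break
        used += 1
        score = delta_r(rouge_l, nli)
        if score < best_score:
            best_answer, best_score = answer, score
        if score <= threshold:
            break  # faithful enough, no further regeneration
    if best_answer is None:
        raise ValueError("no attempts supplied")
    return best_answer, best_score, used


# Stand-in scores; a real pipeline would regenerate and rescore lazily.
drafts = [("draft A", 0.70, 0.80), ("draft B", 0.93, 0.95), ("draft C", 0.99, 0.99)]
accepted = select_answer(drafts)
print(accepted)

weak = [("w1", 0.60, 0.70), ("w2", 0.80, 0.85), ("w3", 0.75, 0.80), ("w4", 0.99, 0.99)]
fallback = select_answer(weak)
print(fallback)
```

In the second call no draft passes within the budget, so the sketch falls back to the best of the three attempts it was allowed to evaluate.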

Domain Presets

VORTEXRAG ships with 11 pre-calibrated domain parameter vectors, ten of which are listed below:

Domain	τ	θ_CPG	γ (causal)	β (syntactic)	Use Case
`general`	0.80	3.5	0.25	0.25	Default balanced
`medical`	0.35	5.0	0.40	0.15	Drug mechanisms, clinical QA
`legal`	0.40	4.5	0.35	0.30	Precedent chains, statutory analysis
`scientific`	0.30	4.0	0.40	0.20	Physics, chemistry, biology
`financial`	0.50	3.5	0.30	0.25	Market causation, risk analysis
`code`	0.60	3.5	0.25	0.45	Debugging, AST-structured retrieval
`cybersecurity`	0.45	4.0	0.35	0.30	Exploit chains, threat intel
`educational`	0.65	3.0	0.25	0.20	Concept progression, tutoring
`historical`	0.90	3.0	0.35	0.20	Event causation chains
`creative`	1.20	2.5	0.15	0.20	Thematic retrieval
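
To give a feel for what τ does, the sketch below inverts the SDS formula from the pipeline diagram: SDS ≥ 0.72 is equivalent to a causal distance d ≤ τ · atanh(0.28). It assumes the same 0.72 gate applies to every domain, which the card does not state. The `Preset` class and `PRESETS` dictionary are hypothetical stand-ins holding three rows from the table, not the library's own configuration objects.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    tau: float        # SDS temperature
    theta_cpg: float  # window-level ESR threshold
    gamma: float      # causal arm weight
    beta: float       # syntactic arm weight


# Three rows copied from the Domain Presets table.
PRESETS = {
    "medical": Preset(tau=0.35, theta_cpg=5.0, gamma=0.40, beta=0.15),
    "general": Preset(tau=0.80, theta_cpg=3.5, gamma=0.25, beta=0.25),
    "creative": Preset(tau=1.20, theta_cpg=2.5, gamma=0.15, beta=0.20),
}

SDS_GATE = 0.72  # assumed to be shared across domains


def max_causal_distance(tau: float, gate: float = SDS_GATE) -> float:
    """Largest ||v_cau(q) - v_cau(c)|| that still gives SDS >= gate.

    1 - tanh(d / tau) >= gate  <=>  d <= tau * atanh(1 - gate)
    """
    return tau * math.atanh(1.0 - gate)


limits = {name: max_causal_distance(p.tau) for name, p in PRESETS.items()}
for name, limit in limits.items():
    print(f"{name:9s} tau={PRESETS[name].tau:.2f} max distance={limit:.3f}")
```

A smaller τ therefore means a stricter gate: under these assumptions the `medical` preset tolerates far less causal distance between query and chunk than `creative` does.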

Theoretical Contributions

Theorem 5.1 (CPG Greedy Optimality): Removing the chunk with the lowest SDS at each step maximizes the per-step gain ΔESR. Proof via monotone derivative argument.
Corollary 5.1 (Convergence): Purge terminates in ≤|W|−3 steps with strictly monotonically increasing ESR.
Proposition 10.1 (TVE Orthogonality): Cross-arm correlation satisfies ρ < 0.08 empirically, argued via the Johnson-Lindenstrauss lemma.
CCB Positional Optimality: Optimal under U-shaped recall model f(pos) ≈ ½(1+cos(π·pos/L)) (Liu et al. 2023).
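
The greedy purge behind Theorem 5.1 and Corollary 5.1 can be illustrated with a short sketch. The card does not define the denominator term P in ESR = Σ SDS·w / (P+ε), so this example assumes P = Σ (1 − SDS)·w, a "poison mass". That choice, the function names, and the toy window are illustrative assumptions rather than the author's definitions. The minimum window size of 3 follows from the |W|−3 bound in the corollary.

```python
def esr(window, eps=1e-6):
    """ESR = sum(SDS * w) / (P + eps), with P assumed = sum((1 - SDS) * w)."""
    signal = sum(s * w for s, w in window)
    poison = sum((1.0 - s) * w for s, w in window)
    return signal / (poison + eps)


def greedy_purge(window, theta=3.5, min_size=3):
    """Drop the lowest-SDS chunk until ESR >= theta or min_size is reached.

    window: list of (sds, weight) pairs. Returns (kept_window, esr_history).
    """
    window = list(window)
    history = [esr(window)]
    while history[-1] < theta and len(window) > min_size:
        worst = min(range(len(window)), key=lambda i: window[i][0])
        window.pop(worst)
        history.append(esr(window))
    return window, history


# Toy window of five equally weighted chunks, two of them badly drifted.
chunks = [(0.95, 1.0), (0.90, 1.0), (0.85, 1.0), (0.40, 1.0), (0.20, 1.0)]
kept, history = greedy_purge(chunks)
print("kept SDS:", [s for s, _ in kept])
print("ESR per step:", [round(h, 2) for h in history])
```

On this toy window the loop removes the two drifted chunks, ESR rises at every step, and the purge stops after |W|−3 = 2 removals, which is consistent with the behaviour the corollary describes.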

Ablation Results

Every layer contributes:

Layer Added	EM	ΔEM	Insight
Baseline	61.2	—	Standard cosine RAG
+ TVE	65.3	+4.1	Causal encoding separates mechanism from consequence
+ VRC	67.8	+2.5	Geometric filtering of causally orthogonal docs
+ SDC	70.4	+2.6	Per-chunk SDS gate eliminates individual drift
+ CPG	72.1	+1.7	Window ESR constraint (+39pp context poisoning reduction)
+ RFG	73.4	+1.3	Multiplicative no-weak-link fusion
+ CCB	73.9	+0.5	Root-cause chunks at attention-peak position
+ FV	74.8	+0.9	Faithfulness gate with regeneration loop
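
As a quick consistency check on the table above, each row's EM should equal the previous row's EM plus its ΔEM, and the total gain should match the +13.6 EM margin over Naive RAG reported under Key Results. The snippet copies the table values verbatim; only the helper name is invented.

```python
# (layer, EM, delta EM) rows copied from the ablation table.
rows = [
    ("Baseline", 61.2, None),
    ("+ TVE", 65.3, 4.1),
    ("+ VRC", 67.8, 2.5),
    ("+ SDC", 70.4, 2.6),
    ("+ CPG", 72.1, 1.7),
    ("+ RFG", 73.4, 1.3),
    ("+ CCB", 73.9, 0.5),
    ("+ FV", 74.8, 0.9),
]


def inconsistent_rows(table, tol=1e-9):
    """Return layer names whose EM differs from previous EM + delta."""
    bad = []
    for (_, prev_em, _), (name, em, delta) in zip(table, table[1:]):
        if abs((prev_em + delta) - em) > tol:
            bad.append(name)
    return bad


issues = inconsistent_rows(rows)
total_gain = round(rows[-1][1] - rows[0][1], 1)
print("inconsistent rows:", issues)
print("total EM gain:", total_gain)
```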

Citation

@article{vignesh2026vortexrag,
  title   = {{VORTEXRAG}: Vector Orthogonal Resonance-Tuned EXtraction
             Retrieval-Augmented Generation},
  author  = {Vignesh L},
  year    = {2026},
  month   = {May},
  doi     = {10.5281/zenodo.20285144},
  url     = {https://github.com/vignesh2027/VORTEXRAG},
  note    = {Independent Research Preprint. v2.0. MIT License.},
  keywords= {RAG, Semantic Drift, Context Window Poisoning, Causal NLP,
             Multi-Hop QA, Faithfulness Verification}
}

Author: Vignesh L — Independent Researcher
ORCID: https://orcid.org/0009-0004-9777-7592
License: MIT
Version: v2.0 — May 2026
