# Multi-Manifold Retrieval: Proof of Concept

A proof-of-concept implementation of the Multi-Manifold Retrieval defense against spectral poisoning attacks (GeoPoison-RAG) on Retrieval-Augmented Generation systems.

## Core Idea

Standard RAG systems use a single shared embedding space for queries and documents, making the **document geometry identical to the retrieval geometry**. GeoPoison-RAG exploits this by computing the spectral structure (Fiedler vector) of the document graph Laplacian to find optimal adversarial placement.

Multi-Manifold Retrieval **decouples** these geometries by using:

- Separate query and document manifolds (M_Q and M_D)
- A non-decomposable cross-manifold relevance operator R(q, d)

This breaks the attack because the Laplacian the attacker computes (document space) no longer predicts the Laplacian governing retrieval (cross-manifold).

## Project Structure

```
multi_manifold_retrieval/
├── models/
│   ├── cross_manifold_operator.py  # Construction C: Attention-Geometric Hybrid
│   ├── encoders.py                 # Sentence-transformer wrapper
│   └── baseline.py                 # Standard cosine similarity baseline
├── training/
│   ├── train.py                    # Training loop
│   ├── data.py                     # MS MARCO data loading
│   └── losses.py                   # Contrastive loss
├── evaluation/
│   ├── spectral_analysis.py        # L_D, L_R, spectral discrepancy, Fiedler alignment
│   ├── retrieval_metrics.py        # MRR@10, Recall@100
│   └── attack_simulation.py        # GeoPoison-RAG simulation
proofs/
├── proof_theorem_4_3.tex           # Spectral Decoupling theorem
└── proof_theorem_6_1.tex           # Query Complexity Lower Bound theorem
configs/
└── default.yaml                    # Hyperparameters
run_experiment.py                   # End-to-end pipeline
```

## Setup

```bash
pip install -r requirements.txt
```

## Running

Full experiment (train + evaluate + spectral analysis + attack):

```bash
python run_experiment.py --config configs/default.yaml
```

Skip training and load from checkpoint:

```bash
python run_experiment.py --skip-train --checkpoint checkpoints/best_operator.pt
```

## Key Metrics

| Metric | Baseline (expected) | Multi-Manifold (expected) |
|--------|---------------------|---------------------------|
| Spectral discrepancy δ | ≈ 0 | > 0 (significant) |
| Fiedler alignment cos(θ) | ≈ 1 | < 0.5 |
| ASR@10 | > 0.8 | Significantly lower |
| MRR@10 | Reference | ≥ 80% of baseline |

## Formal Proofs

- `proofs/proof_theorem_4_3.tex`: Proves that non-decomposable R with positive cross-manifold curvature guarantees spectral decoupling δ ≥ Ω(κ_R · λ_2(L_D)).
- `proofs/proof_theorem_6_1.tex`: Proves that an adaptive adversary needs Ω(Vol(M_Q) / V_{d_Q}(ε/κ_R)) oracle queries to reconstruct R.
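
## Example: Spectral Metrics Sketch

The two spectral quantities in the Key Metrics table (spectral discrepancy δ and Fiedler alignment cos(θ)) compare the document Laplacian L_D with the retrieval Laplacian L_R. The snippet below is a minimal NumPy sketch of how such a comparison could be computed; it is not the code in `evaluation/spectral_analysis.py`. Several pieces are assumptions made for illustration: the dense cosine affinity graph, the definition of δ as a relative Frobenius-norm difference, the construction of L_R from per-document relevance profiles over a query sample, and the `tanh` bilinear score that stands in for a non-decomposable R(q, d). All function names are hypothetical.

```python
import numpy as np


def cosine_affinity(X):
    """Dense affinity matrix: row-wise cosine similarity, negatives clipped to 0."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def fiedler_vector(L):
    """Eigenvector of the second-smallest eigenvalue of a symmetric Laplacian."""
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return eigvecs[:, 1]


def fiedler_alignment(L_a, L_b):
    """|cos(theta)| between Fiedler vectors; abs() removes the sign ambiguity."""
    u, v = fiedler_vector(L_a), fiedler_vector(L_b)
    return abs(float(u @ v))  # eigh returns unit-norm eigenvectors


def spectral_discrepancy(L_a, L_b):
    """Assumed definition: relative Frobenius-norm difference of two Laplacians."""
    return float(np.linalg.norm(L_a - L_b, "fro") / np.linalg.norm(L_a, "fro"))


rng = np.random.default_rng(0)
docs = rng.normal(size=(40, 16))     # stand-in document embeddings
queries = rng.normal(size=(60, 16))  # stand-in query sample

# Document geometry: what an attacker with corpus access could compute.
L_D = graph_laplacian(cosine_affinity(docs))

# Assumed retrieval geometry: two documents are close when they receive
# similar scores across the query sample (columns of the score matrix).
baseline_scores = queries @ docs.T                      # decomposable inner product
M = rng.normal(size=(16, 16))
toy_scores = np.tanh(queries @ M @ docs.T)              # toy stand-in for R(q, d)

L_R_base = graph_laplacian(cosine_affinity(baseline_scores.T))
L_R_toy = graph_laplacian(cosine_affinity(toy_scores.T))

for name, L_R in [("baseline", L_R_base), ("toy operator", L_R_toy)]:
    print(name,
          "delta =", round(spectral_discrepancy(L_D, L_R), 3),
          "alignment =", round(fiedler_alignment(L_D, L_R), 3))
```

The random embeddings only exercise the code path, so the printed numbers say nothing about the expected values in the table. With real encoders, `docs` and `queries` would come from the document and query encoders, and `toy_scores` would be replaced by scores from the trained cross-manifold operator.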