# Multi-Manifold Retrieval: Proof of Concept

A proof-of-concept implementation of the Multi-Manifold Retrieval defense against spectral poisoning attacks (GeoPoison-RAG) on Retrieval-Augmented Generation systems.

## Core Idea

Standard RAG systems use a single shared embedding space for queries and documents, making the **document geometry identical to the retrieval geometry**. GeoPoison-RAG exploits this by computing the spectral structure (Fiedler vector) of the document graph Laplacian to find the optimal placement for adversarial documents.

Multi-Manifold Retrieval **decouples** these geometries by using:
- Separate query and document manifolds (M_Q and M_D)
- A non-decomposable cross-manifold relevance operator R(q, d)

This breaks the attack because the Laplacian the attacker computes (document space) no longer predicts the Laplacian governing retrieval (cross-manifold).

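The comparison between the two Laplacians can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the code in `evaluation/spectral_analysis.py`: the co-relevance construction of the retrieval graph (`S.T @ S`), the `tanh` stand-in for R(q, d), and all function names are hypothetical choices made here, and the embeddings are random.

```python
import numpy as np

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for non-negative affinities W."""
    W = 0.5 * (W + W.T)        # symmetrize (this creates a copy)
    np.fill_diagonal(W, 0.0)   # drop self-loops
    return np.diag(W.sum(axis=1)) - W

def fiedler_vector(L):
    """Unit eigenvector for the second-smallest eigenvalue of L."""
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return vecs[:, 1]

def fiedler_alignment(L_a, L_b):
    """|cos(theta)| between two Fiedler vectors (their sign is arbitrary)."""
    return float(abs(fiedler_vector(L_a) @ fiedler_vector(L_b)))

def retrieval_laplacian(S):
    """Assumed construction: documents are linked when the same queries
    score them highly, where S[i, j] >= 0 is the relevance of doc j to query i."""
    return laplacian(S.T @ S)

def unit_rows(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

rng = np.random.default_rng(0)
docs = unit_rows(rng.normal(size=(40, 16)))          # document embeddings (M_D)
queries = unit_rows(rng.normal(size=(200, 16)))      # queries in the shared space
queries_mq = unit_rows(rng.normal(size=(200, 12)))   # toy queries on a separate M_Q
A = rng.normal(size=(12, 16))                        # arbitrary cross-manifold map

# L_D: the Laplacian an attacker can compute from documents alone.
L_D = laplacian(np.exp(docs @ docs.T))

# Shared-space baseline: relevance is plain cosine similarity.
S_shared = np.exp(queries @ docs.T)

# Stand-in for R(q, d): a nonlinearity over a bilinear form.
# This is NOT Construction C, only a placeholder for illustration.
S_cross = np.exp(np.tanh(queries_mq @ A @ docs.T))

print("shared-space alignment:  ", fiedler_alignment(L_D, retrieval_laplacian(S_shared)))
print("cross-manifold alignment:", fiedler_alignment(L_D, retrieval_laplacian(S_cross)))
```

With random embeddings the printed values only confirm that the pipeline runs; they carry no evidential weight about the defense. Meaningful numbers require real document and query embeddings and the trained operator.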
## Project Structure

```
multi_manifold_retrieval/
├── models/
│   ├── cross_manifold_operator.py   # Construction C: Attention-Geometric Hybrid
│   ├── encoders.py                  # Sentence-transformer wrapper
│   └── baseline.py                  # Standard cosine similarity baseline
├── training/
│   ├── train.py                     # Training loop
│   ├── data.py                      # MS MARCO data loading
│   └── losses.py                    # Contrastive loss
└── evaluation/
    ├── spectral_analysis.py         # L_D, L_R, spectral discrepancy, Fiedler alignment
    ├── retrieval_metrics.py         # MRR@10, Recall@100
    └── attack_simulation.py         # GeoPoison-RAG simulation
proofs/
├── proof_theorem_4_3.tex            # Spectral Decoupling theorem
└── proof_theorem_6_1.tex            # Query Complexity Lower Bound theorem
configs/
└── default.yaml                     # Hyperparameters
run_experiment.py                    # End-to-end pipeline
```

## Setup

```bash
pip install -r requirements.txt
```

## Running

Full experiment (train + evaluate + spectral analysis + attack):
```bash
python run_experiment.py --config configs/default.yaml
```

Skip training and load from checkpoint:
```bash
python run_experiment.py --skip-train --checkpoint checkpoints/best_operator.pt
```

## Key Metrics

| Metric | Baseline (expected) | Multi-Manifold (expected) |
|--------|---------------------|---------------------------|
| Spectral discrepancy δ | ≈ 0 | > 0 (significant) |
| Fiedler alignment cos(θ) | ≈ 1 | < 0.5 |
| ASR@10 | > 0.8 | Significantly lower |
| MRR@10 | Reference | ≥ 80% of baseline |

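For orientation, the ranking metrics in the table can be computed as in the sketch below. MRR@k and Recall@k follow their standard definitions. The ASR@k definition used here (the fraction of queries with at least one poisoned document in the top k) is an assumption, and the function names are illustrative rather than those in `evaluation/retrieval_metrics.py` or `evaluation/attack_simulation.py`.

```python
def mrr_at_k(ranked, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for docs, rel in zip(ranked, relevant):
        for rank, doc_id in enumerate(docs[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)

def recall_at_k(ranked, relevant, k=100):
    """Mean fraction of each query's relevant documents found in the top k."""
    per_query = [
        len(set(docs[:k]) & rel) / len(rel)
        for docs, rel in zip(ranked, relevant)
        if rel  # skip queries with no labelled relevant documents
    ]
    return sum(per_query) / len(per_query)

def asr_at_k(ranked, poisoned, k=10):
    """Assumed definition: share of queries whose top k contains a poisoned document."""
    hits = sum(1 for docs in ranked if any(d in poisoned for d in docs[:k]))
    return hits / len(ranked)

# Two toy queries with ranked result lists (best first).
ranked = [["d3", "d1", "p1"], ["d2", "d5", "d4"]]
relevant = [{"d1"}, {"d4", "d9"}]
poisoned = {"p1"}

mrr = mrr_at_k(ranked, relevant)          # (1/2 + 1/3) / 2
recall = recall_at_k(ranked, relevant)    # (1/1 + 1/2) / 2
asr = asr_at_k(ranked, poisoned)          # 1 of 2 queries is hit
print(mrr, recall, asr)
```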
## Formal Proofs

- `proofs/proof_theorem_4_3.tex`: Proves that non-decomposable R with positive cross-manifold curvature guarantees spectral decoupling δ ≥ Ω(κ_R · λ_2(L_D)).
- `proofs/proof_theorem_6_1.tex`: Proves that an adaptive adversary needs Ω(Vol(M_Q) / V_{d_Q}(ε/κ_R)) oracle queries to reconstruct R.