# Multi-Manifold Retrieval: Proof of Concept
A proof-of-concept implementation of the Multi-Manifold Retrieval defense against spectral poisoning attacks (GeoPoison-RAG) on Retrieval-Augmented Generation systems.
## Core Idea
Standard RAG systems use a single shared embedding space for queries and documents, making the **document geometry identical to the retrieval geometry**. GeoPoison-RAG exploits this by computing the spectral structure (Fiedler vector) of the document graph Laplacian to find optimal adversarial placement.
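The spectral quantity the attack relies on can be illustrated with a minimal NumPy sketch. It assumes a dense Gaussian-kernel similarity graph over document embeddings; the graph construction, the `sigma` bandwidth, and the function names `document_laplacian` and `fiedler` are illustrative assumptions, not the code in `evaluation/spectral_analysis.py`.

```python
import numpy as np

def document_laplacian(doc_emb: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Unnormalized Laplacian L_D = D - W of a Gaussian-kernel document graph."""
    sq = np.sum(doc_emb ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * doc_emb @ doc_emb.T, 0.0)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops
    return np.diag(w.sum(axis=1)) - w

def fiedler(lap: np.ndarray):
    """Return (lambda_2, Fiedler vector) of a symmetric graph Laplacian."""
    eigvals, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]

# Two tight clusters of toy "documents" in a 2-D embedding space.
docs = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                 [3.0, 0.0], [3.1, 0.0], [3.0, 0.1], [3.1, 0.1]])
lap_d = document_laplacian(docs)
lambda_2, v = fiedler(lap_d)
cluster_side = np.sign(v)  # the sign pattern splits the graph at its weakest cut
```

The sign pattern of the Fiedler vector identifies the weakest cut of the document graph, which is the structure an attacker with access to the document embeddings could inspect when choosing where to place adversarial documents.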
Multi-Manifold Retrieval **decouples** these geometries by using:
- Separate query and document manifolds (M_Q and M_D)
- A non-decomposable cross-manifold relevance operator R(q, d)
This breaks the attack because the Laplacian the attacker computes (document space) no longer predicts the Laplacian governing retrieval (cross-manifold).
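As a toy illustration of the decoupling, the sketch below builds a retrieval-induced document graph from an arbitrary relevance function. Everything in it is an assumption made for illustration: `toy_operator` is a hypothetical stand-in for R(q, d) and is not Construction C, the co-retrieval weighting `W = S^T S` is one possible way to define a retrieval Laplacian, and the manifold dimensions are arbitrary.

```python
import numpy as np

def toy_operator(q: np.ndarray, d: np.ndarray, mix: np.ndarray) -> float:
    """Hypothetical cross-manifold score R(q, d), for illustration only.

    The score applies a nonlinearity to a quadratic form over the joint
    (q, d) pair, so it is not a plain inner product of independently
    computed query and document embeddings.
    """
    pair = np.concatenate([q, d])
    return float(1.0 / (1.0 + np.exp(-(pair @ mix @ pair))))

def retrieval_laplacian(queries, docs, score_fn) -> np.ndarray:
    """Laplacian of a co-retrieval document graph (assumed definition).

    Two documents are strongly connected when the same queries score both
    highly: W = S^T S with S[i, j] = R(q_i, d_j) >= 0.
    """
    s = np.array([[score_fn(q, d) for d in docs] for q in queries])
    w = s.T @ s
    np.fill_diagonal(w, 0.0)  # no self-loops
    return np.diag(w.sum(axis=1)) - w

rng = np.random.default_rng(0)
queries = rng.normal(size=(32, 4))      # samples on M_Q (dimension 4, assumed)
docs = rng.normal(size=(10, 6))         # samples on M_D (dimension 6, assumed)
mix = rng.normal(size=(10, 10)) / 10.0  # random mixing weights, untrained
lap_r = retrieval_laplacian(queries, docs, lambda q, d: toy_operator(q, d, mix))
```

Because `lap_r` depends on the query samples and on the operator, it cannot be computed from the document embeddings alone, which is the property the defense relies on.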
## Project Structure
```
multi_manifold_retrieval/
├── models/
│   ├── cross_manifold_operator.py   # Construction C: Attention-Geometric Hybrid
│   ├── encoders.py                  # Sentence-transformer wrapper
│   └── baseline.py                  # Standard cosine similarity baseline
├── training/
│   ├── train.py                     # Training loop
│   ├── data.py                      # MS MARCO data loading
│   └── losses.py                    # Contrastive loss
├── evaluation/
│   ├── spectral_analysis.py         # L_D, L_R, spectral discrepancy, Fiedler alignment
│   ├── retrieval_metrics.py         # MRR@10, Recall@100
│   └── attack_simulation.py         # GeoPoison-RAG simulation
proofs/
├── proof_theorem_4_3.tex            # Spectral Decoupling theorem
└── proof_theorem_6_1.tex            # Query Complexity Lower Bound theorem
configs/
└── default.yaml                     # Hyperparameters
run_experiment.py                    # End-to-end pipeline
```
## Setup
```bash
pip install -r requirements.txt
```
## Running
Full experiment (train + evaluate + spectral analysis + attack):
```bash
python run_experiment.py --config configs/default.yaml
```
Skip training and load from checkpoint:
```bash
python run_experiment.py --skip-train --checkpoint checkpoints/best_operator.pt
```
## Key Metrics
| Metric | Baseline (expected) | Multi-Manifold (expected) |
|--------|-------------------|--------------------------|
| Spectral discrepancy δ | ≈ 0 | > 0 (significant) |
| Fiedler alignment cos(θ) | ≈ 1 | < 0.5 |
| ASR@10 | > 0.8 | Significantly lower |
| MRR@10 | Reference | ≥ 80% of baseline |
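The metrics in the table can be sketched as follows. These are assumed definitions rather than the ones in `evaluation/`: the spectral discrepancy is taken as the relative distance between sorted Laplacian spectra, the Fiedler alignment as the absolute cosine between Fiedler vectors, and ASR@k as the fraction of queries with a poisoned document in the top k. All function names are hypothetical.

```python
import numpy as np

def fiedler_alignment(lap_d: np.ndarray, lap_r: np.ndarray) -> float:
    """|cos(theta)| between the Fiedler vectors of two graph Laplacians."""
    v_d = np.linalg.eigh(lap_d)[1][:, 1]
    v_r = np.linalg.eigh(lap_r)[1][:, 1]
    return float(abs(v_d @ v_r))  # eigh returns unit-norm eigenvectors

def spectral_discrepancy(lap_d: np.ndarray, lap_r: np.ndarray) -> float:
    """Relative distance between sorted spectra (assumed definition of delta)."""
    ev_d = np.linalg.eigvalsh(lap_d)
    ev_r = np.linalg.eigvalsh(lap_r)
    return float(np.linalg.norm(ev_d - ev_r) / np.linalg.norm(ev_d))

def mrr_at_k(ranked_ids, relevant_ids, k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for ranking, relevant in zip(ranked_ids, relevant_ids):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)

def asr_at_k(ranked_ids, poisoned_ids, k: int = 10) -> float:
    """Fraction of queries with at least one poisoned document in the top k."""
    hits = sum(any(d in poisoned_ids for d in r[:k]) for r in ranked_ids)
    return hits / len(ranked_ids)

# Laplacian of a 4-node path graph, compared with itself as a sanity check.
path = np.array([[1.0, -1.0, 0.0, 0.0],
                 [-1.0, 2.0, -1.0, 0.0],
                 [0.0, -1.0, 2.0, -1.0],
                 [0.0, 0.0, -1.0, 1.0]])
same_alignment = fiedler_alignment(path, path)
same_discrepancy = spectral_discrepancy(path, path)
```

Comparing a Laplacian with itself gives the baseline regime of the table (alignment near 1, discrepancy 0), since in a single shared embedding space the document and retrieval geometries coincide.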
## Formal Proofs
- `proofs/proof_theorem_4_3.tex`: Proves that non-decomposable R with positive cross-manifold curvature guarantees spectral decoupling δ ≥ Ω(κ_R · λ_2(L_D)).
- `proofs/proof_theorem_6_1.tex`: Proves that an adaptive adversary needs Ω(Vol(M_Q) / V_{d_Q}(ε/κ_R)) oracle queries to reconstruct R.
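
For reference, the two bounds can be typeset as below. The symbol `N_queries` is introduced here for illustration, and the last line assumes that V_{d_Q}(r) denotes the volume of a d_Q-dimensional Euclidean ball of radius r; this README does not define it, so the `.tex` files remain the authoritative source.

```latex
\begin{align*}
\delta &\ge \Omega\bigl(\kappa_R \cdot \lambda_2(L_D)\bigr)
  && \text{(Theorem 4.3)} \\
N_{\text{queries}} &\ge
  \Omega\!\left(\frac{\operatorname{Vol}(M_Q)}{V_{d_Q}(\varepsilon/\kappa_R)}\right)
  && \text{(Theorem 6.1)} \\
V_{d}(r) &= \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}\, r^{d}
  && \text{(Euclidean ball volume, assumed reading)}
\end{align*}
```

Under that reading, the query bound is a covering-style count that scales like (κ_R/ε)^{d_Q} for a fixed Vol(M_Q), so it grows exponentially with the dimension of the query manifold.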