Multi-Manifold Retrieval: Proof of Concept

A proof-of-concept implementation of the Multi-Manifold Retrieval defense against spectral poisoning attacks (GeoPoison-RAG) on Retrieval-Augmented Generation systems.

Core Idea

Standard RAG systems use a single shared embedding space for queries and documents, so the geometry of the document collection is also the geometry that governs retrieval. GeoPoison-RAG exploits this by computing the spectral structure (the Fiedler vector) of the document graph Laplacian to find the optimal placement for adversarial documents.
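
For readers unfamiliar with the spectral quantity involved, the following is a minimal sketch of how a Fiedler vector can be obtained from document embeddings. It assumes a dense cosine-affinity graph and the unnormalized Laplacian L = D - W; the graph construction used by this repository or by the attack may differ, and the function names and toy data are hypothetical.

```python
import numpy as np

def cosine_affinity(embeddings):
    """Dense affinity matrix from cosine similarity (assumed graph construction)."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    W = np.clip(X @ X.T, 0.0, None)   # keep non-negative edge weights only
    np.fill_diagonal(W, 0.0)          # no self-loops
    return W

def fiedler_vector(W):
    """Second-smallest eigenpair of the unnormalized Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvals[1], eigvecs[:, 1]

# Toy corpus: two clusters of 10 documents each in a 3-d embedding space.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(10, 3))
cluster_b = rng.normal(loc=[0.6, 0.8, 0.0], scale=0.05, size=(10, 3))
docs = np.vstack([cluster_a, cluster_b])

lambda_2, v2 = fiedler_vector(cosine_affinity(docs))
# The sign pattern of v2 separates the two clusters.
print(lambda_2, np.sign(v2))
```

In this toy setting the sign of each Fiedler-vector entry recovers the cluster a document belongs to, which is the kind of structural information a spectral attacker can read directly off the document embeddings.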

Multi-Manifold Retrieval decouples these geometries by using:

Separate query and document manifolds (M_Q and M_D)
A non-decomposable cross-manifold relevance operator R(q, d)

This breaks the attack because the Laplacian the attacker computes (document space) no longer predicts the Laplacian governing retrieval (cross-manifold).
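
One way to see what "non-decomposable" buys is a rank argument: any score of the form R(q, d) = <f(q), g(d)> with k-dimensional embeddings produces a query-by-document score matrix of rank at most k. The sketch below contrasts a plain dot product with a hypothetical additive-interaction operator. That operator is only a stand-in chosen for illustration; it is not the Attention-Geometric Hybrid in cross_manifold_operator.py, and all weights and sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 4, 8
queries = rng.normal(size=(50, dim))
docs = rng.normal(size=(60, dim))

# Decomposable relevance: R(q, d) = <q, d>. The score matrix has rank <= dim.
dot_scores = queries @ docs.T

# Hypothetical non-decomposable relevance: R(q, d) = v . tanh(W_q q + W_d d).
W_q = rng.normal(size=(dim, hidden))
W_d = rng.normal(size=(dim, hidden))
v = rng.normal(size=hidden)
pre = (queries @ W_q)[:, None, :] + (docs @ W_d)[None, :, :]  # shape (50, 60, hidden)
cross_scores = np.tanh(pre) @ v                               # shape (50, 60)

rank_dot = np.linalg.matrix_rank(dot_scores)
rank_cross = np.linalg.matrix_rank(cross_scores)
print(rank_dot, rank_cross)
```

The dot-product matrix is confined to rank `dim`, while the interaction-based scores are not, so no pair of `dim`-dimensional encoders reproduces them as an inner product. This is an intuition aid under the stated assumptions, not the formal notion used in the proofs.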

Project Structure

multi_manifold_retrieval/
├── models/
│   ├── cross_manifold_operator.py   # Construction C: Attention-Geometric Hybrid
│   ├── encoders.py                  # Sentence-transformer wrapper
│   └── baseline.py                  # Standard cosine similarity baseline
├── training/
│   ├── train.py                     # Training loop
│   ├── data.py                      # MS MARCO data loading
│   └── losses.py                    # Contrastive loss
├── evaluation/
│   ├── spectral_analysis.py         # L_D, L_R, spectral discrepancy, Fiedler alignment
│   ├── retrieval_metrics.py         # MRR@10, Recall@100
│   └── attack_simulation.py         # GeoPoison-RAG simulation
proofs/
├── proof_theorem_4_3.tex            # Spectral Decoupling theorem
└── proof_theorem_6_1.tex            # Query Complexity Lower Bound theorem
configs/
└── default.yaml                     # Hyperparameters
run_experiment.py                    # End-to-end pipeline
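
The tree lists training/losses.py as a contrastive loss without giving its form. A common choice for retrieval training is an in-batch softmax (InfoNCE-style) loss over a matrix of relevance scores; the sketch below assumes that form, and the function name and temperature parameter are hypothetical, not taken from the repository.

```python
import numpy as np

def in_batch_contrastive_loss(scores, temperature=1.0):
    """Mean softmax cross-entropy with positives on the diagonal.

    scores[i, j] is the relevance of document j to query i. Document i is
    treated as the positive for query i; all other columns are negatives.
    """
    logits = np.asarray(scores, dtype=float) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Uninformative scores give log(batch_size); well-separated scores approach 0.
uninformative = in_batch_contrastive_loss(np.zeros((4, 4)))
well_separated = in_batch_contrastive_loss(10.0 * np.eye(4))
print(uninformative, well_separated)
```

Because the loss only needs a score matrix, it works the same whether the scores come from a cosine baseline or from a cross-manifold operator R(q, d).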

Setup

pip install -r requirements.txt

Running

Full experiment (train + evaluate + spectral analysis + attack):

python run_experiment.py --config configs/default.yaml

Skip training and load from checkpoint:

python run_experiment.py --skip-train --checkpoint checkpoints/best_operator.pt

Key Metrics

Metric	Baseline (expected)	Multi-Manifold (expected)
Spectral discrepancy δ	≈ 0	> 0 (significant)
Fiedler alignment cos(θ)	≈ 1	< 0.5
ASR@10 (attack success rate)	> 0.8	Significantly lower
MRR@10 (mean reciprocal rank)	Reference	≥ 80% of baseline
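
The retrieval-quality metrics named in the project tree (MRR@10 and Recall@100) follow their standard definitions. A minimal sketch is shown below; the function names, document ids, and toy rankings are illustrative and not taken from retrieval_metrics.py.

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def mean_over_queries(metric, runs, k):
    """Average a per-query metric over (ranking, relevant set) pairs."""
    return sum(metric(ranking, relevant, k) for ranking, relevant in runs) / len(runs)

# Toy runs: (ranked document ids, set of relevant ids) per query.
runs = [
    (["d3", "d1", "d7"], {"d1"}),         # first relevant hit at rank 2
    (["d2", "d9", "d4"], {"d4", "d8"}),   # first relevant hit at rank 3, one of two found
]
mrr = mean_over_queries(mrr_at_k, runs, k=10)          # (1/2 + 1/3) / 2
recall = mean_over_queries(recall_at_k, runs, k=100)   # (1 + 1/2) / 2
print(mrr, recall)
```

The "≥ 80% of baseline" target in the table would then be checked by computing the same MRR@10 for the cosine baseline and comparing the two values.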

Formal Proofs

proofs/proof_theorem_4_3.tex: Proves that non-decomposable R with positive cross-manifold curvature guarantees spectral decoupling δ ≥ Ω(κ_R · λ_2(L_D)).
proofs/proof_theorem_6_1.tex: Proves that an adaptive adversary needs Ω(Vol(M_Q) / V_{d_Q}(ε/κ_R)) oracle queries to reconstruct R.
