Multi-Manifold Retrieval: Proof of Concept

A proof-of-concept implementation of the Multi-Manifold Retrieval defense against spectral poisoning attacks (GeoPoison-RAG) on Retrieval-Augmented Generation systems.

Core Idea

Standard RAG systems use a single shared embedding space for queries and documents, so the geometry of the document collection is also the geometry that governs retrieval. GeoPoison-RAG exploits this by computing the spectral structure (the Fiedler vector) of the document graph Laplacian to choose where to place adversarial documents.
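To make the attacker's view concrete, the sketch below computes a Fiedler vector from document embeddings alone. It assumes a dense cosine-similarity graph with negative weights clipped to zero and an unnormalized Laplacian L = D - W; the graph construction actually used by GeoPoison-RAG or by this repo is not specified here, and `fiedler_vector` is a hypothetical helper name.

```python
import numpy as np

def fiedler_vector(doc_embeddings: np.ndarray) -> tuple[float, np.ndarray]:
    """Return (lambda_2, Fiedler vector) of a cosine-similarity document graph.

    Assumptions (not taken from the repo): dense graph, negative similarities
    clipped to zero, unnormalized Laplacian L = D - W.
    """
    # Row-normalize so that dot products are cosine similarities.
    x = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    weights = np.clip(x @ x.T, 0.0, None)
    np.fill_diagonal(weights, 0.0)  # no self-loops

    laplacian = np.diag(weights.sum(axis=1)) - weights
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so index 1 is the second-smallest eigenvalue and its eigenvector.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return float(eigvals[1]), eigvecs[:, 1]


# Toy corpus: two loose clusters of three documents each.
docs = np.array([[1.0, 0.1], [1.0, 0.2], [1.0, 0.0],
                 [0.1, 1.0], [0.2, 1.0], [0.0, 1.0]])
lam2, fiedler = fiedler_vector(docs)
```

On this toy corpus the sign pattern of `fiedler` splits the documents into their two clusters, which is the kind of structure a spectral attack relies on when the document graph also governs retrieval.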

Multi-Manifold Retrieval decouples these geometries by using:

  • Separate query and document manifolds (M_Q and M_D)
  • A non-decomposable cross-manifold relevance operator R(q, d)

This breaks the attack because the Laplacian the attacker computes (document space) no longer predicts the Laplacian governing retrieval (cross-manifold).
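One way to read "non-decomposable" is that R(q, d) cannot be written as an inner product f(q) · g(d) at the encoder dimension. The sketch below illustrates that reading with a toy scorer; the tanh form, the weight shapes, and all names are illustrative assumptions, not the repo's Construction C. A dot-product scorer yields a score matrix whose rank is bounded by the embedding dimension, while the coupled scorer is not subject to that bound.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n_queries, n_docs = 2, 16, 20, 20

queries = rng.normal(size=(n_queries, dim))   # points standing in for M_Q
docs = rng.normal(size=(n_docs, dim))         # points standing in for M_D

# Decomposable baseline: R(q, d) = <q, d>.
dot_scores = queries @ docs.T

# Toy non-decomposable scorer: R(q, d) = v^T tanh(W_q q + W_d d).
# The nonlinearity mixes q and d before the final projection.
w_q = rng.normal(size=(hidden, dim))
w_d = rng.normal(size=(hidden, dim))
v = rng.normal(size=hidden)
pre = (queries @ w_q.T)[:, None, :] + (docs @ w_d.T)[None, :, :]
coupled_scores = np.tanh(pre) @ v             # shape (n_queries, n_docs)

# Rank of each query-by-document score matrix.
rank_dot = np.linalg.matrix_rank(dot_scores)
rank_coupled = np.linalg.matrix_rank(coupled_scores)
```

Here `rank_dot` cannot exceed `dim`, whereas `rank_coupled` does, so no pair of `dim`-dimensional encoders reproduces the coupled scores exactly. This is only a rank argument on a toy example and says nothing about the curvature condition used in the formal proofs.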

Project Structure

multi_manifold_retrieval/
├── models/
│   ├── cross_manifold_operator.py   # Construction C: Attention-Geometric Hybrid
│   ├── encoders.py                  # Sentence-transformer wrapper
│   └── baseline.py                  # Standard cosine similarity baseline
├── training/
│   ├── train.py                     # Training loop
│   ├── data.py                      # MS MARCO data loading
│   └── losses.py                    # Contrastive loss
├── evaluation/
│   ├── spectral_analysis.py         # L_D, L_R, spectral discrepancy, Fiedler alignment
│   ├── retrieval_metrics.py         # MRR@10, Recall@100
│   └── attack_simulation.py         # GeoPoison-RAG simulation
proofs/
├── proof_theorem_4_3.tex            # Spectral Decoupling theorem
└── proof_theorem_6_1.tex            # Query Complexity Lower Bound theorem
configs/
└── default.yaml                     # Hyperparameters
run_experiment.py                    # End-to-end pipeline

Setup

pip install -r requirements.txt

Running

Full experiment (train + evaluate + spectral analysis + attack):

python run_experiment.py --config configs/default.yaml

Skip training and load from checkpoint:

python run_experiment.py --skip-train --checkpoint checkpoints/best_operator.pt

Key Metrics

Metric                      Baseline (expected)    Multi-Manifold (expected)
Spectral discrepancy δ      ≈ 0                    > 0 (significant)
Fiedler alignment cos(θ)    ≈ 1                    < 0.5
ASR@10                      > 0.8                  Significantly lower
MRR@10                      Reference              ≥ 80% of baseline
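The retrieval and alignment metrics in the table can be sketched as follows. MRR@k and Recall@k follow their standard definitions; Fiedler alignment is assumed here to be the absolute cosine between the Fiedler vectors of L_D and L_R (absolute because an eigenvector's sign is arbitrary). The repo's `spectral_analysis.py` and `retrieval_metrics.py` may differ in detail, and the function names below are hypothetical.

```python
import numpy as np

def mrr_at_k(ranked_ids: list[list[int]], relevant: list[set[int]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for ranking, rel in zip(ranked_ids, relevant):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_ids)

def recall_at_k(ranked_ids: list[list[int]], relevant: list[set[int]], k: int = 100) -> float:
    """Mean fraction of each query's relevant documents found in the top k."""
    scores = [len(set(ranking[:k]) & rel) / len(rel)
              for ranking, rel in zip(ranked_ids, relevant)]
    return float(np.mean(scores))

def fiedler_alignment(u: np.ndarray, v: np.ndarray) -> float:
    """|cos(theta)| between two Fiedler vectors (assumed definition)."""
    return float(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Two toy queries: the first has its relevant document at rank 2,
# the second never retrieves its relevant document.
rankings = [[3, 1, 2], [5, 4, 6]]
relevant = [{1}, {9}]
mrr = mrr_at_k(rankings, relevant, k=10)
rec = recall_at_k(rankings, relevant, k=2)
```

With these definitions, a baseline whose retrieval graph coincides with its document graph gives an alignment of 1, matching the "≈ 1" entry in the table.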

Formal Proofs

  • proofs/proof_theorem_4_3.tex: Proves that non-decomposable R with positive cross-manifold curvature guarantees spectral decoupling δ ≥ Ω(κ_R · λ_2(L_D)).
  • proofs/proof_theorem_6_1.tex: Proves that an adaptive adversary needs Ω(Vol(M_Q) / V_{d_Q}(ε/κ_R)) oracle queries to reconstruct R.
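
To get a feel for the query-complexity bound, the sketch below evaluates Vol(M_Q) / V_{d_Q}(ε/κ_R) for illustrative numbers. It assumes V_d(r) denotes the volume of a d-dimensional Euclidean ball of radius r, which the proof file may define differently; the constant hidden by Ω is ignored, and every input value is hypothetical.

```python
import math

def ball_volume(d: int, r: float) -> float:
    """Volume of a d-dimensional Euclidean ball of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

def query_bound(vol_mq: float, d_q: int, eps: float, kappa_r: float) -> float:
    """Vol(M_Q) / V_{d_Q}(eps / kappa_R), ignoring the Omega constant."""
    return vol_mq / ball_volume(d_q, eps / kappa_r)


# Hypothetical values: unit-volume query manifold of dimension 8.
low_curvature = query_bound(vol_mq=1.0, d_q=8, eps=0.1, kappa_r=1.0)
high_curvature = query_bound(vol_mq=1.0, d_q=8, eps=0.1, kappa_r=2.0)
ratio = high_curvature / low_curvature
```

Under this reading the bound scales as (κ_R/ε)^{d_Q}, so doubling κ_R multiplies the required number of queries by 2^{d_Q}.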