bragee / Multi-Manifold-Retrieval_POC


	# Multi-Manifold Retrieval: Proof of Concept

	A proof-of-concept implementation of Multi-Manifold Retrieval, a defense against spectral poisoning attacks (GeoPoison-RAG) on Retrieval-Augmented Generation (RAG) systems.

	## Core Idea

	Standard RAG systems embed queries and documents in a single shared space, so the geometry of the document collection is also the geometry that governs retrieval. GeoPoison-RAG exploits this: it computes the spectral structure (the Fiedler vector) of the document graph Laplacian to decide where adversarial documents should be placed.
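As a rough illustration of the spectral quantity mentioned above, the sketch below computes the Fiedler vector of a small document graph. It is a minimal example, not the code in `evaluation/spectral_analysis.py`: the dense cosine affinity, the unnormalized Laplacian L = D - W, and the helper names `cosine_affinity` and `fiedler_vector` are all assumptions made for this example.

```python
import numpy as np


def cosine_affinity(embeddings: np.ndarray) -> np.ndarray:
    """Dense, non-negative affinity matrix from cosine similarity (no self-loops)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = np.clip(unit @ unit.T, 0.0, None)  # drop negative similarities
    np.fill_diagonal(w, 0.0)
    return w


def fiedler_vector(affinity: np.ndarray):
    """Return (lambda_2, v_2) of the unnormalized graph Laplacian L = D - W."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return float(eigvals[1]), eigvecs[:, 1]


# Toy corpus (hypothetical): two loose clusters of three documents each.
docs = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],
])
lam2, v2 = fiedler_vector(cosine_affinity(docs))

# The sign pattern of the Fiedler vector splits the graph into its two clusters.
clusters = v2 > 0
print(lam2, clusters)
```

The sign of an eigenvector is arbitrary, so which cluster comes out as `True` can differ between runs or platforms; only the split itself is meaningful.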

	Multi-Manifold Retrieval decouples these geometries by using:
	- Separate query and document manifolds (M_Q and M_D)
	- A non-decomposable cross-manifold relevance operator R(q, d)

	This breaks the attack because the Laplacian the attacker can compute (L_D, from document space) no longer predicts the Laplacian that governs retrieval (L_R, cross-manifold).
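One way to make "no longer predicts" concrete is to compare the Fiedler vectors of the two Laplacians. The sketch below assumes the alignment is |cos(θ)| between those two vectors; how the retrieval graph is derived from R(q, d) is not specified in this README, so a hand-built toy graph stands in for it. The function names are hypothetical.

```python
import numpy as np


def fiedler_vector(affinity: np.ndarray) -> np.ndarray:
    """Eigenvector of L = D - W belonging to the second-smallest eigenvalue."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)  # columns ordered by ascending eigenvalue
    return eigvecs[:, 1]


def fiedler_alignment(w_doc: np.ndarray, w_ret: np.ndarray) -> float:
    """|cos(theta)| between the Fiedler vectors of two graphs over the same documents."""
    v_doc, v_ret = fiedler_vector(w_doc), fiedler_vector(w_ret)
    # eigh returns unit-norm eigenvectors; abs() removes the arbitrary sign.
    return float(abs(v_doc @ v_ret))


def path_graph(order, n=4):
    """Affinity matrix of a path that visits the nodes in the given order."""
    w = np.zeros((n, n))
    for a, b in zip(order, order[1:]):
        w[a, b] = w[b, a] = 1.0
    return w


# Toy stand-ins (assumptions): the document graph is the path 0-1-2-3, and the
# "retrieval" graph connects the same four documents in a different order.
w_doc = path_graph([0, 1, 2, 3])
w_ret = path_graph([1, 3, 0, 2])

same = fiedler_alignment(w_doc, w_doc)      # shared geometry: alignment near 1
decoupled = fiedler_alignment(w_doc, w_ret)  # rewired geometry: alignment near 0
print(same, decoupled)
```

In the shared-space case the two graphs coincide, so the attacker's Fiedler vector is exactly the one that matters for retrieval; in the rewired case it carries almost no information about the retrieval graph's bottleneck.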

	## Project Structure

	```
	multi_manifold_retrieval/
	├── models/
	│   ├── cross_manifold_operator.py  # Construction C: Attention-Geometric Hybrid
	│   ├── encoders.py                 # Sentence-transformer wrapper
	│   └── baseline.py                 # Standard cosine similarity baseline
	├── training/
	│   ├── train.py                    # Training loop
	│   ├── data.py                     # MS MARCO data loading
	│   └── losses.py                   # Contrastive loss
	├── evaluation/
	│   ├── spectral_analysis.py        # L_D, L_R, spectral discrepancy, Fiedler alignment
	│   ├── retrieval_metrics.py        # MRR@10, Recall@100
	│   └── attack_simulation.py        # GeoPoison-RAG simulation
	proofs/
	├── proof_theorem_4_3.tex           # Spectral Decoupling theorem
	└── proof_theorem_6_1.tex           # Query Complexity Lower Bound theorem
	configs/
	└── default.yaml                    # Hyperparameters
	run_experiment.py                   # End-to-end pipeline
	```

	## Setup

	```bash
	pip install -r requirements.txt
	```

	## Running

	Full experiment (train + evaluate + spectral analysis + attack):
	```bash
	python run_experiment.py --config configs/default.yaml
	```

	Skip training and load from checkpoint:
	```bash
	python run_experiment.py --skip-train --checkpoint checkpoints/best_operator.pt
	```

	## Key Metrics

	| Metric | Baseline (expected) | Multi-Manifold (expected) |
	|--------|---------------------|---------------------------|
	| Spectral discrepancy δ | ≈ 0 | > 0 (significant) |
	| Fiedler alignment cos(θ) | ≈ 1 | < 0.5 |
	| ASR@10 | > 0.8 | Significantly lower |
	| MRR@10 | Reference | ≥ 80% of baseline |
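For readers unfamiliar with the retrieval-side metrics, here is a minimal, standard-library sketch of MRR@k, Recall@k, and an attack success rate. It is not the code in `evaluation/retrieval_metrics.py` or `evaluation/attack_simulation.py`. In particular, ASR@k is assumed here to mean the fraction of queries whose top-k results contain at least one poisoned document, and all function names and document IDs are made up for this example.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def recall_at_k(rankings, relevant, k=100):
    """Average fraction of each query's relevant documents found in the top k."""
    per_query = [
        len(set(ranked[:k]) & rel) / len(rel)
        for ranked, rel in zip(rankings, relevant)
    ]
    return sum(per_query) / len(per_query)


def asr_at_k(rankings, poisoned, k=10):
    """Assumed definition: share of queries with a poisoned document in the top k."""
    hits = sum(1 for ranked in rankings if poisoned & set(ranked[:k]))
    return hits / len(rankings)


# Three toy queries with ranked document IDs; "p0" is an injected document.
rankings = [["d3", "d1", "p0"], ["d2", "p0", "d4"], ["d5", "d6", "d7"]]
relevant = [{"d1"}, {"d2"}, {"d9"}]
poisoned = {"p0"}

print(mrr_at_k(rankings, relevant))     # (1/2 + 1 + 0) / 3 = 0.5
print(recall_at_k(rankings, relevant))  # (1 + 1 + 0) / 3
print(asr_at_k(rankings, poisoned))     # 2 of 3 queries surface "p0"
```

Under these definitions, the table's targets read as: the defense should push ASR@10 down while keeping MRR@10 at 80% or more of the baseline value.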

	## Formal Proofs

	- `proofs/proof_theorem_4_3.tex`: Proves that a non-decomposable R with positive cross-manifold curvature guarantees spectral decoupling, δ ≥ Ω(κ_R · λ_2(L_D)), where λ_2(L_D) is the second-smallest eigenvalue of the document Laplacian.
	- `proofs/proof_theorem_6_1.tex`: Proves that an adaptive adversary needs Ω(Vol(M_Q) / V_{d_Q}(ε/κ_R)) oracle queries to reconstruct R.