Game-Theoretic Scheduling of Istio Ambient Mesh Waypoint Proxies in Multi-Tenant Kubernetes Clusters

Overview

This project implements a comprehensive framework for game-theoretic scheduling of waypoint proxies in Istio Ambient Mesh deployments within multi-tenant Kubernetes clusters. It includes 5 scheduling strategies (3 from the research blueprint + 2 novel contributions), a complete mathematical framework with formal proofs, and a rigorous evaluation pipeline with statistical testing.

Strategies

Blueprint Strategies

S1: Baseline K8s Default (strategies/s01_baseline_k8s_default.py)
- Emulates default kube-scheduler with LeastRequestedPriority scoring
- Routes tenants to nearest proxy (lowest latency)
- Serves as baseline for comparison
S2: Resource-Aware Heuristic (strategies/s02_resource_aware_heuristic.py)
- Two-phase greedy: efficiency-based placement + LPT load-balancing routing
- 2-approximation guarantee for makespan objective
- O(NM log N) computational complexity
S3: Stackelberg Equilibrium (strategies/s03_stackelberg_equilibrium.py)
- Full bilevel optimization: leader (platform) places proxies, followers (tenants) route traffic
- Exact enumeration for small instances, PGD for larger ones
- Existence guaranteed by Kakutani's fixed-point theorem

Novel Contributions

S4: Potential Game (strategies/s04_potential_game.py) ⭐ NOVEL
- Reformulates routing as a Rosenthal congestion game with exact potential
- Guaranteed convergence via best-response dynamics (finite-time, any initial state)
- Price of Anarchy: PoA ≤ 5/3 for quadratic congestion (tight bound)
- Decentralized: tenants can independently compute best responses
- Pure-strategy Nash equilibrium always exists
S5: VCG Mechanism Design (strategies/s05_vcg_mechanism_design.py) ⭐ NOVEL
- Applies Vickrey-Clarke-Groves mechanism for truthful resource allocation
- Incentive compatible: truthful reporting is a dominant strategy
- Individually rational: all tenants have non-negative utility
- Allocatively efficient: maximizes social welfare
- Computes Clarke pivot payments to ensure truthfulness

Theoretical Results

Property	S1	S2	S3	S4	S5
Equilibrium existence	N/A	N/A	✓	✓	✓
Convergence guarantee	N/A	N/A	Conditional	✓ Always	N/A
PoA bound	None	2-approx	None	≤ 5/3	1 (optimal)
Incentive compatible	✗	✗	✗	✗	✓
Decentralized	✗	✗	✗	✓	✗
Complexity	O(NM)	O(NM log N)	NP-hard	O(N²M·T)	NP-hard

Full proofs in docs/mathematical_proofs.md — includes 15 theorems covering equilibrium existence, uniqueness, convergence rates, PoA bounds, and mechanism design properties.

Project Structure

project/
├── core/
│   ├── system_model.py          # Cluster, tenant, node, latency models
│   ├── payoff_engine.py         # Cost functions, Nash verification
│   └── equilibrium_solver.py    # CVXPY solver, best-response, PGD
├── strategies/
│   ├── s01_baseline_k8s_default.py
│   ├── s02_resource_aware_heuristic.py
│   ├── s03_stackelberg_equilibrium.py
│   ├── s04_potential_game.py          # Novel
│   └── s05_vcg_mechanism_design.py    # Novel
├── data/
│   └── synthetic_generator.py   # Scenario generation (small/medium/large/xlarge)
├── evaluation/
│   ├── metrics.py               # 30+ metrics (latency, fairness, SLA, isolation)
│   ├── visualizations.py        # 8 plot types (CDF, radar, Pareto, heatmap, etc.)
│   └── statistical_tests.py     # Wilcoxon, bootstrap CI, Friedman, Nemenyi
├── docs/
│   └── mathematical_proofs.md   # 15 formal theorems with complete proofs
├── results/                     # Generated plots and JSON results
├── run_all.py                   # Master evaluation runner
└── README.md

Usage

Quick Test

python run_all.py --quick

Full Evaluation

python run_all.py --scenarios small medium large --seeds 30

Custom Run

python run_all.py --scenarios medium --seeds 10 --output ./my_results

Dependencies

numpy scipy cvxpy matplotlib seaborn pandas networkx

Key Design Decisions

Synthetic data generation calibrated to Alibaba cluster-trace and Azure VM trace distributions (log-normal demands, zone-aware latency, discrete capacity tiers)
CVXPY for convex subproblems with SCS solver; greedy fallback for robustness
Pure strategies for potential game — enables Rosenthal potential and strongest convergence guarantees
Numerical verification of theoretical properties (potential game property, Nash equilibrium, incentive compatibility)

Mathematical Framework

The system models N tenants and M nodes with:

Congestion: cong_j(x) = κ(ℓ_j/C_j)^p (polynomial, typically p=2)
Tenant cost: c_i = α·latency + β·contention + γ·isolation
Platform cost: weighted sum of avg latency, resource imbalance, unfairness, isolation

See docs/mathematical_proofs.md for complete formalization.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support