	---
	title: AntiAtropos
	emoji: 🚀
	colorFrom: indigo
	colorTo: blue
	sdk: docker
	app_file: server/app.py
	pinned: false
	---

	# AntiAtropos: The Physics of Autonomous SRE

	> "Infrastructure is not a static set of configurations; it is a dynamic system of energy, flow, and stability."

	[![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-blue)](https://keshav051-antiatropos.hf.space/)
	[![Code & Infrastructure](https://img.shields.io/badge/%F0%9F%92%BB%20Code-Source-green)](https://huggingface.co/Keshav051/AntiAtropos/tree/main)
	[![Trained Models & Logs](https://img.shields.io/badge/%F0%9F%A7%A0%20Models-QLoRA-orange)](https://huggingface.co/Keshav051/antiatropos-qlora)
	[![Demo Video](https://img.shields.io/badge/%F0%9F%93%B9%20Video-Demo-red)](https://youtu.be/46SX0HocpSs)

	## Table of Contents
	- [Demo Video](#demo-video)
	- [The Vision](#the-vision-beyond-runbooks)
	- [The Physics Engine](#the-physics-engine)
	- [OpenEnv Specification Compliance](#openenv-specification-compliance)
	- [Architecture](#cluster-architecture--control-plane)
	- [Reward Engineering](#reward-engineering-the-differentiable-sre)
	- [Task Curriculum & Results](#task-curriculum--results)
	- [Training: RL with Unsloth + Hugging Face Jobs](#training-rl-with-unsloth--hugging-face-jobs)
	- [Quick Start](#quick-start)
	- [Future Horizons](#future-horizons-the-path-to-autonomous-cloud-safety)

	---

	> Hackathon Submission: We are building for "Theme #3: World Modelling for Professional Tasks."
	> AntiAtropos governs clusters the way physics governs a pendulum—by minimizing Lyapunov energy. Perfect SLA at 50% lower cost.

	## Demo Video
	[![AntiAtropos Demo Video](https://img.youtube.com/vi/46SX0HocpSs/0.jpg)](https://youtu.be/46SX0HocpSs)

	AntiAtropos is a Reinforcement Learning environment where an AI agent learns to stabilize a 5-node microservice cluster by treating it as a physical system. Using QLoRA REINFORCE on a Qwen3.5-4B model, the agent is trained to minimize Lyapunov graph energy under a Drift-Plus-Penalty objective that balances stability against infrastructure cost. The trained policy scales predictively, reroutes around failures, and holds the line during traffic surges.

	---

	## The Vision: Beyond Runbooks

	Traditional DevOps relies on static thresholds and "If-This-Then-That" runbooks. This doesn't scale with the complexity of modern microservice DAGs. AntiAtropos moves from reactive scripts to Dynamical System Control.

	Agents in AntiAtropos are trained to minimize the Lyapunov Energy of the cluster, balancing the potential energy of backlogs to maintain equilibrium under extreme pressure.

	---

	## The Physics Engine

	AntiAtropos simulates a 5-node cluster with high-fidelity operational dynamics:

	- Fluid Queue Dynamics: Requests flow like water through reservoirs (nodes) and pipes (edges). Overloaded nodes create Upstream Backpressure, physically throttling parent service rates.
	- Lyapunov Stability: System health is captured by a single scalar Energy Function ($V(s) = \sum w_i Q_i^2$). Squaring queue depths penalizes load concentration, forcing agents to balance the cluster.
	- The Hockey-Stick Curve: Implements M/M/1 queueing dynamics, where mean latency scales as $1/(1-\rho)$ and therefore grows without bound as utilization $\rho$ approaches 100%.
	- Operational Reality: Includes 5-tick Boot Delays for scaling, traffic reroute decay, and hard safety constraints on VIP nodes.
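
The energy function and the M/M/1 latency curve can be sketched in a few lines. This is a minimal illustration, not the simulator's code: the function names, the equal node weights, and the 100 req/s service rate are all assumptions chosen for the example.

```python
def lyapunov_energy(queues, weights):
    """V(s) = sum_i w_i * Q_i^2 for per-node queue depths Q_i."""
    return sum(w * q * q for q, w in zip(queues, weights))


def mm1_latency(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


weights = [1.0] * 5               # illustrative: equal node weights
balanced = [20, 20, 20, 20, 20]   # 100 queued requests, spread evenly
concentrated = [100, 0, 0, 0, 0]  # the same 100 requests on one node

print(lyapunov_energy(balanced, weights))      # 2000.0
print(lyapunov_energy(concentrated, weights))  # 10000.0

# Latency at 50%, 90%, and 99% utilization with a 100 req/s service rate.
for rho in (0.5, 0.9, 0.99):
    print(rho, mm1_latency(rho * 100.0, 100.0))
```

The same total backlog costs five times more energy when it sits on one node, which is why minimizing $V(s)$ pushes the agent to spread load.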

	---

	## OpenEnv Specification Compliance

	AntiAtropos implements typed OpenEnv interfaces using Pydantic models and an OpenEnv-compatible FastAPI server:

	- Action Model: `SREAction` in `models.py` (Typed fields for action type, node ID, and parameter).
	- Observation Model: `ClusterObservation` + `NodeObservation` in `models.py` (High-fidelity telemetry).
	- Standard API: Implements `reset()`, `step(action)`, and `state` according to the OpenEnv specification.
	- Manifest: `openenv.yaml` at the root defines the runtime (`fastapi`), app entrypoint (`server.app:app`), and port (`7860`).

	### Action Space (`SREAction`)
	- `NO_OP`: Hold position (essential for cost discipline).
	- `SCALE_UP`: Expand node capacity (triggering a cold-start delay).
	- `SCALE_DOWN`: Remove capacity (prioritizing pending/booting pods).
	- `REROUTE_TRAFFIC`: Shift load from target to healthy peers.
	- `SHED_LOAD`: Drop traffic fraction (Safety-guarded; forbidden on VIP nodes).
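
To make the shape of an action concrete, here is a hypothetical stand-in for `SREAction`. The real model is a Pydantic class in `models.py`; this sketch uses standard-library dataclasses so it runs without dependencies, and the field names (`action_type`, `target_node_id`, `parameter`) are assumptions. Where the real environment enforces the VIP guard is not stated here, so the check in `__post_init__` is only one possible placement.

```python
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    NO_OP = "NO_OP"
    SCALE_UP = "SCALE_UP"
    SCALE_DOWN = "SCALE_DOWN"
    REROUTE_TRAFFIC = "REROUTE_TRAFFIC"
    SHED_LOAD = "SHED_LOAD"


VIP_NODES = frozenset({"node-0"})  # node-0 is the VIP gateway in this README


@dataclass(frozen=True)
class ActionSketch:
    """Hypothetical stand-in for SREAction; field names are assumptions."""
    action_type: ActionType
    target_node_id: str = "node-0"
    parameter: float = 0.0

    def __post_init__(self):
        # Safety guard from the action list: never shed load on a VIP node.
        if self.action_type is ActionType.SHED_LOAD and self.target_node_id in VIP_NODES:
            raise ValueError(f"SHED_LOAD is forbidden on VIP node {self.target_node_id}")


scale = ActionSketch(ActionType.SCALE_UP, "node-2", 1.0)
print(scale.action_type.value, scale.target_node_id)  # SCALE_UP node-2
```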

	### Observation Space (`ClusterObservation`)
	- Global: Step count, Average Latency, Error Rate, Total Backlog, Cost per Hour, Lyapunov Energy.
	- Per-Node: Queue depth, Status (HEALTHY/DEGRADED/FAILED), CPU Util, Capacity, Inflow/Outflow rates.

	---

	## Cluster Architecture & Control Plane

	AntiAtropos models a 5-node production DAG with a centralized control plane.

	### Topology (The Directed Graph)
	Traffic flows through a hierarchical structure, enabling realistic cascading failure simulations:
	```
	node-0 (VIP Ingress)  --+--> node-1 (Checkout)
	                        +--> node-2 (Catalog) --> node-3 (Database)
	node-4 (Auth Ingress) --+
	```
	- node-0: The VIP Payment Gateway. Business-critical; load shedding is forbidden.
	- node-4: Independent ingress for Auth services.
	- Backpressure propagation: If `node-3` overflows, it throttles `node-2`, which in turn throttles `node-0`.
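
Backpressure propagation is a walk against the direction of traffic. The sketch below encodes the edges named in the diagram as an adjacency map and lists which upstream nodes an overflow reaches. The function name is hypothetical, and `node-4` is given no downstream edges because its targets are not spelled out in the text above.

```python
from collections import deque

# Downstream edges as drawn in the diagram. node-4's targets are left empty
# because the text does not spell them out (an assumption of this sketch).
DOWNSTREAM = {
    "node-0": ["node-1", "node-2"],
    "node-1": [],
    "node-2": ["node-3"],
    "node-3": [],
    "node-4": [],
}


def throttled_by(overflowing, downstream=DOWNSTREAM):
    """Return the upstream nodes reached by backpressure, nearest first."""
    parents = {node: [] for node in downstream}
    for parent, children in downstream.items():
        for child in children:
            parents[child].append(parent)

    order, seen, frontier = [], {overflowing}, deque([overflowing])
    while frontier:
        for parent in parents[frontier.popleft()]:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                frontier.append(parent)
    return order


print(throttled_by("node-3"))  # ['node-2', 'node-0']
```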

	### The Live K8s Bridge
	The environment includes a `KubernetesExecutor` that allows the same agent logic to control a live cluster:
	- Binding: Uses `ANTIATROPOS_WORKLOAD_MAP` to map simulator "nodes" to real K8s Deployments.
	- Execution: Translates high-level actions into `patch_namespaced_deployment_scale` calls with transient retry logic.
	- Reconciliation: Ingests live Prometheus metrics to align simulator state with the real infrastructure.
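
The binding and execution steps can be sketched as follows. This is not the `KubernetesExecutor` source: the JSON format assumed for `ANTIATROPOS_WORKLOAD_MAP`, the helper names, and the backoff policy are all assumptions. The only real API used is `patch_namespaced_deployment_scale`, which the official Kubernetes Python client exposes on `AppsV1Api`; the client object is passed in so the sketch stays importable without a cluster.

```python
import json
import time


def parse_workload_map(raw):
    """Parse a node -> [namespace, deployment] map (JSON format is an assumption)."""
    return {node: tuple(target) for node, target in json.loads(raw).items()}


def scale_with_retry(apps_api, namespace, deployment, replicas, attempts=3, backoff_s=0.5):
    """Patch a Deployment's scale subresource, retrying on transient errors.

    `apps_api` is expected to behave like kubernetes.client.AppsV1Api().
    """
    body = {"spec": {"replicas": replicas}}
    for attempt in range(1, attempts + 1):
        try:
            return apps_api.patch_namespaced_deployment_scale(
                name=deployment, namespace=namespace, body=body
            )
        except Exception:  # a real executor would catch ApiException and check its status
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts
```

With a configured client, a call would look like `scale_with_retry(client.AppsV1Api(), "prod", "checkout", 4)`, where the namespace and deployment names are placeholders.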

	---

	## Reward Engineering: The Differentiable SRE

	Our reward function is grounded in Neely's Drift-Plus-Penalty framework, providing a dense, informative signal:

	1. Lyapunov Drift ($\Delta V$): Measures the one-tick change in system energy. Negative drift means the cluster is stabilizing.
	2. Smooth Sigmoid SLA: Dual sigmoids (Latency and Error Rate) provide a gradient signal before a violation occurs.
	3. Three-Tier Economics: Distinguishes between "Paid-for" Baseline capacity, "Justified" scaling, and "Idle Waste" (penalized 20x).
	4. Control-Barrier Function: A quadratic "Danger Zone" penalty that fires near catastrophic failure ($Q > 150$).
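
The four terms above combine roughly as reward = −(drift + β · cost + SLA penalty + barrier). The sketch below is a simplified illustration under stated assumptions: every coefficient and SLO threshold is invented for the example, the three-tier economics is collapsed into a single `cost` input, and only the $Q > 150$ barrier threshold comes from the list above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward_sketch(v_prev, v_now, latency_ms, error_rate, cost, max_queue,
                  beta=0.1, latency_slo_ms=200.0, error_slo=0.05, barrier_q=150.0):
    """Drift-plus-penalty style reward; every coefficient here is illustrative."""
    drift = v_now - v_prev  # negative when the cluster is stabilizing
    # Smooth SLA terms rise towards 1 as a metric approaches and passes its limit.
    sla = (sigmoid((latency_ms - latency_slo_ms) / 20.0)
           + sigmoid((error_rate - error_slo) / 0.01))
    # Quadratic barrier that is zero until the deepest queue passes barrier_q.
    barrier = max(0.0, max_queue - barrier_q) ** 2
    return -(drift + beta * cost + sla + barrier)


stabilizing = reward_sketch(1000.0, 800.0, 100.0, 0.0, 10.0, 50.0)
destabilizing = reward_sketch(1000.0, 1200.0, 100.0, 0.0, 10.0, 50.0)
print(stabilizing > destabilizing)  # True
```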

	---

	## Task Curriculum & Results

	| Task | Category | Weight | Mean Score (Baseline) | Mean Score (Trained) |
	|---|---|---|:---:|:---:|
	| task-1 | Capacity Ramp | 40% | 0.69 | 0.88 |
	| task-2 | Fault Tolerance | 30% | 0.70 | 0.82 |
	| task-3 | Surge Stability | 30% | 0.21 | 0.94 |
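
Assuming the Weight column is meant to combine the per-task means into a single weighted score (the table itself does not state an aggregate), the arithmetic works out as follows:

```python
# (weight, baseline score, trained score) copied from the table above.
tasks = {
    "task-1": (0.40, 0.69, 0.88),
    "task-2": (0.30, 0.70, 0.82),
    "task-3": (0.30, 0.21, 0.94),
}

baseline = sum(w * b for w, b, _ in tasks.values())
trained = sum(w * t for w, _, t in tasks.values())

print(f"baseline={baseline:.3f} trained={trained:.3f}")  # baseline=0.549 trained=0.880
```

Most of the gap comes from task-3, where the baseline score of 0.21 contributes only 0.063 to the weighted total.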

	---

	## Training: RL with Unsloth + Hugging Face Jobs

	All training artifacts — model checkpoints, metrics logs, stderr/stdout, and evaluation plots — are pushed to the [Keshav051/antiatropos-qlora](https://huggingface.co/Keshav051/antiatropos-qlora) Hugging Face Hub repository. Each run lives under its own subdirectory (e.g., `run_0011/`).

	### Reference Runs

	| Run | Loss Type | Description | Link |
	|-----|-----------|-------------|------|
	| run_0011 | REINFORCE + baseline | Reference run: fully converged policy after 500 iterations. This is the canonical trained model discussed in the blog. | [View on Hub](https://huggingface.co/Keshav051/antiatropos-qlora/tree/main/run_0011) |
	| grpo_run_001 | GRPO | Experimental GRPO run for comparison against the REINFORCE baseline. See the blog for analysis. | [View on Hub](https://huggingface.co/Keshav051/antiatropos-qlora/tree/main/grpo_run_001) |

	Each run folder contains:
	- `checkpoint-NNNN/` — LoRA adapter weights at every 5th iteration
	- `metrics.jsonl` — per-step telemetry for every episode across all iterations
	- `eval_results.jsonl` — heuristic vs trained comparison at each evaluation interval
	- `plots/` — loss curves, reward curves, and action distribution plots
	- `train.log` — full stderr/stdout from the training container

	> Note: The `logs/` directory at the project root also contains local copies of key run artifacts for offline inspection.

	### How Training Works

	Training uses two core Hugging Face technologies:
	1. 🤗 Hugging Face Jobs — serverless GPU infrastructure. You define the container image, hardware flavor, and command; HF allocates the GPU, runs the job, and streams logs back. No SSH, no cluster management.
	2. Unsloth RL — 4-bit QLoRA with REINFORCE/GRPO support. The base model (Qwen3.5-4B) is loaded in 4-bit via Unsloth's `FastLanguageModel`, and LoRA adapters (rank-64) are trained on top using a custom REINFORCE training loop.

	Inside the job container, the AntiAtropos FastAPI simulator starts on CPU (localhost:8000) while the GPU handles model forward/backward passes. This co-located architecture eliminates network latency between action generation and environment feedback.
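
For readers unfamiliar with REINFORCE with a baseline, the loss can be sketched on plain floats. This is not the project's training loop: the function name is hypothetical, the mean-return baseline is an assumption (the actual baseline is not described here), and a real implementation would operate on tensors so gradients flow through the log-probabilities.

```python
def reinforce_baseline_loss(log_probs, returns):
    """Surrogate loss: -mean((R_i - b) * log pi(a_i | s_i)), with b = mean(R).

    log_probs[i] is the summed log-probability of the tokens in action i and
    returns[i] is the episode return credited to that action.
    """
    baseline = sum(returns) / len(returns)
    advantages = [r - baseline for r in returns]
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(returns)


# Two sampled actions: the second earned the higher return.
print(reinforce_baseline_loss([-1.0, -2.0], [1.0, 3.0]))  # 0.5
```

Minimizing this loss raises the log-probability of actions with above-average return and lowers it for the rest. Subtracting the baseline reduces gradient variance.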

	### Launching Training

	The only required argument is `--run-id` — everything else has sensible defaults:

	```bash
	# Minimal launch — 15 iterations, 6 episodes/iter, 20 steps/episode
	python training/launch_train.py --run-id run_007
	```

	This uses all defaults:
	- `--hub-model-repo` = `Keshav051/antiatropos-qlora` (artifacts pushed here)
	- `--num-iterations` = `15` (training iterations)
	- `--num-episodes` = `6` (episodes per iteration; 2 per task for curriculum balance)
	- `--max-steps` = `20` (max environment steps per episode)
	- `--eval-interval` = `50` (evaluate vs heuristic every N iterations — rarely needed for short runs)
	- `--checkpoint-interval` = `5` (save checkpoint every N iterations)
	- `--plot-interval` = `10` (generate plots every N iterations)
	- `--loss-type` = `reinforce_baseline` (REINFORCE with baseline; use `grpo` for GRPO)
	- `--flavor` = `a10g-large` (NVIDIA A10G, 24 GiB, ~$0.34/hr)
	- `--timeout` = `4h` (job timeout)
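
As a quick sanity check on run size, the defaults above bound the number of simulator steps per run. The helper name is hypothetical, and the figure is an upper bound under the assumption that episodes may terminate before `--max-steps`.

```python
def env_steps(num_iterations, num_episodes=6, max_steps=20):
    """Upper bound on simulator steps for a run (episodes may end early)."""
    return num_iterations * num_episodes * max_steps


print(env_steps(15))   # 1800 for the default launch
print(env_steps(500))  # 60000 for a 500-iteration run
```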

	To override any default, just pass the flag:

	```bash
	# Full training (500 iterations, A10G, ~$7):
	python training/launch_train.py --run-id run_012 --num-iterations 500 --num-episodes 6

	# GRPO experiment:
	python training/launch_train.py --run-id grpo_run_002 --loss-type grpo

	# Longer timeout for deep training:
	python training/launch_train.py --run-id run_013 --num-iterations 500 --timeout 12h
	```

	### Prerequisites

	1. `pip install "huggingface_hub>=0.25.0"`
	2. `huggingface-cli login` (or set `HF_TOKEN` environment variable)
	3. A Hugging Face Pro or Team account (required for GPU Jobs)
	4. The target Hub model repo is auto-created if it doesn't exist

	For the full list of options:
	```bash
	python training/launch_train.py --help
	```

	---

	## Quick Start

	### Local Installation
	```bash
	pip install -e .
	uvicorn server.app:app --host 0.0.0.0 --port 7860
	```

	### Evaluation & Observation

	The `inference.py` script is the primary tool for validating model performance. It provides a detailed breakdown of episodic reward, SLA compliance, and cluster stability, which makes it a good way to baseline the behavior of a new model or to compare different training iterations.

	To configure the environment, use the `.env` file. Key "knobs" include:
	- `ENV_URL`: The URL of the AntiAtropos simulation server (e.g., your HF Space).
	- `MODEL_NAME`: The identifier for the model to test (supports Groq, Local, or HF).
	- `GROQ_API_KEY`: Required if using Groq-based inference for rapid prototyping.
	- `ANTIATROPOS_ENV_MODE`: Set to `simulated` for training or `live` for K8s control.
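
A sketch of how these knobs might be read and validated, assuming the `.env` values have already been exported into the process environment (for example by a dotenv loader). The `EvalSettings` class, the `load_settings` helper, and the `simulated` default are illustrative assumptions, not part of `inference.py`.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalSettings:
    env_url: str
    model_name: str
    groq_api_key: Optional[str]
    env_mode: str


def load_settings(environ=os.environ):
    """Read the knobs listed above; the validation rules are assumptions."""
    mode = environ.get("ANTIATROPOS_ENV_MODE", "simulated")
    if mode not in {"simulated", "live"}:
        raise ValueError(f"ANTIATROPOS_ENV_MODE must be 'simulated' or 'live', got {mode!r}")
    return EvalSettings(
        env_url=environ["ENV_URL"],            # required
        model_name=environ["MODEL_NAME"],      # required
        groq_api_key=environ.get("GROQ_API_KEY"),  # only needed for Groq inference
        env_mode=mode,
    )
```

Failing fast on an unknown mode avoids accidentally pointing an evaluation run at a live cluster because of a typo.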

	```bash
	# Set your API key and run the evaluation harness
	python inference.py --task all --mode trained
	```

	---

	## Future Horizons: The Path to Autonomous Cloud Safety

	AntiAtropos is the foundation for a new class of Differentiable SRE. Our roadmap includes:
	- Multi-Agent Coordination: Training specialized agents (e.g., an "Ingress Governor" and a "Storage Optimizer") to collaborate via shared Lyapunov energy.
	- Formal Verification: Using the Lyapunov certificates generated during training to provide mathematical guarantees of stability before an agent is deployed to production.
	- Predictive Traffic Shaping: Moving from reactive scaling to predictive world-modeling of seasonal traffic surges.

	---

	Built with passion for the 2026 AntiAtropos Hackathon.