--- title: AntiAtropos emoji: πŸš€ colorFrom: indigo colorTo: blue sdk: docker app_file: server/app.py pinned: false --- # AntiAtropos: The Physics of Autonomous SRE > **"Infrastructure is not a static set of configurations; it is a dynamic system of energy, flow, and stability."** [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Space-blue)](https://keshav051-antiatropos.hf.space/) [![Code & Infrastructure](https://img.shields.io/badge/%F0%9F%92%BB%20Code-Source-green)](https://huggingface.co/Keshav051/AntiAtropos/tree/main) [![Trained Models & Logs](https://img.shields.io/badge/%F0%9F%A7%A0%20Models-QLoRA-orange)](https://huggingface.co/Keshav051/antiatropos-qlora) [![Demo Video](https://img.shields.io/badge/%F0%9F%93%B9%20Video-Demo-red)](https://youtu.be/46SX0HocpSs) ## Table of Contents - [Demo Video](#demo-video) - [The Vision](#the-vision-beyond-runbooks) - [The Physics Engine](#the-physics-engine) - [Architecture](#architecture) - [Reward Engineering](#reward-engineering-the-differentiable-sre) - [Task Curriculum & Results](#task-curriculum--results) - [Training: RL with Unsloth + Hugging Face Jobs](#training-rl-with-unsloth--hugging-face-jobs) - [Quick Start](#quick-start) --- --- > **Hackathon Submission:** We are building for **"Theme #3: World Modelling for Professional Tasks."** > AntiAtropos governs clusters the way physics governs a pendulumβ€”by minimizing Lyapunov energy. Perfect SLA at **50% lower cost**. ## Demo Video [![AntiAtropos Demo Video](https://img.youtube.com/vi/46SX0HocpSs/0.jpg)](https://youtu.be/46SX0HocpSs) AntiAtropos is a **Reinforcement Learning environment** where an AI agent learns to stabilize a 5-node microservice cluster by treating it as a physical system. Using **QLoRA REINFORCE** on a Qwen3.5-4B model, the agent is trained to minimize Lyapunov graph energy under a Drift-Plus-Penalty objective that balances stability against infrastructure cost. The trained policy scales predictively, reroutes around failures, and holds the line during traffic surges. --- ## The Vision: Beyond Runbooks Traditional DevOps relies on static thresholds and "If-This-Then-That" runbooks. This doesn't scale with the complexity of modern microservice DAGs. AntiAtropos moves from reactive scripts to **Dynamical System Control**. Agents in AntiAtropos are trained to minimize the **Lyapunov Energy** of the cluster-balancing the potential energy of backlogs to maintain equilibrium under extreme pressure. --- ## The Physics Engine AntiAtropos simulates a 5-node cluster with high-fidelity operational dynamics: - **Fluid Queue Dynamics**: Requests flow like water through reservoirs (nodes) and pipes (edges). Overloaded nodes create **Upstream Backpressure**, physically throttling parent service rates. - **Lyapunov Stability**: System health is captured by a single scalar Energy Function ($V(s) = \sum w_i Q_i^2$). Squaring queue depths penalizes load concentration, forcing agents to balance the cluster. - **The Hockey-Stick Curve**: Implements M/M/1 queueing dynamics where latency explodes exponentially as utilization hits 100%. - **Operational Reality**: Includes **5-tick Boot Delays** for scaling, traffic reroute decay, and hard safety constraints on VIP nodes. --- ## OpenEnv Specification Compliance AntiAtropos implements typed OpenEnv interfaces using Pydantic models and an OpenEnv-compatible FastAPI server: - **Action Model**: `SREAction` in `models.py` (Typed fields for action type, node ID, and parameter). - **Observation Model**: `ClusterObservation` + `NodeObservation` in `models.py` (High-fidelity telemetry). - **Standard API**: Implements `reset()`, `step(action)`, and `state` according to the OpenEnv specification. - **Manifest**: `openenv.yaml` at the root defines the runtime (`fastapi`), app entrypoint (`server.app:app`), and port (`7860`). ### Action Space (`SREAction`) - `NO_OP`: Hold position (essential for cost discipline). - `SCALE_UP`: Expand node capacity (triggering a cold-start delay). - `SCALE_DOWN`: Remove capacity (prioritizing pending/booting pods). - `REROUTE_TRAFFIC`: Shift load from target to healthy peers. - `SHED_LOAD`: Drop traffic fraction (Safety-guarded; forbidden on VIP nodes). ### Observation Space (`ClusterObservation`) - **Global**: Step count, Average Latency, Error Rate, Total Backlog, Cost per Hour, Lyapunov Energy. - **Per-Node**: Queue depth, Status (HEALTHY/DEGRADED/FAILED), CPU Util, Capacity, Inflow/Outflow rates. --- ## Cluster Architecture & Control Plane AntiAtropos models a 5-node production DAG with a centralized control plane. ### Topology (The Directed Graph) Traffic flows through a hierarchical structure, enabling realistic cascading failure simulations: ``` node-0 (VIP Ingress) --+--> node-1 (Checkout) +--> node-2 (Catalog) --> node-3 (Database) node-4 (Auth Ingress) --+ ``` - **node-0**: The VIP Payment Gateway. Business-critical; load shedding is forbidden. - **node-4**: Independent ingress for Auth services. - **Backpressure propagation**: If `node-3` overflows, it throttles `node-2`, which in turn throttles `node-0`. ### The Live K8s Bridge The environment includes a `KubernetesExecutor` that allows the same agent logic to control a live cluster: - **Binding**: Uses `ANTIATROPOS_WORKLOAD_MAP` to map simulator "nodes" to real K8s Deployments. - **Execution**: Translates high-level actions into `patch_namespaced_deployment_scale` calls with transient retry logic. - **Reconciliation**: Ingests live Prometheus metrics to align simulator state with real infrastructure reality. --- ## Reward Engineering: The Differentiable SRE Our reward function is grounded in Neely's **Drift-Plus-Penalty** framework, providing a dense, informative signal: 1. **Lyapunov Drift ($\Delta V$)**: Measures the one-tick change in system energy. Negative drift means the cluster is stabilizing. 2. **Smooth Sigmoid SLA**: Dual sigmoids (Latency and Error Rate) provide gradient **before** a violation. 3. **Three-Tier Economics**: Distinguishes between "Paid-for" Baseline capacity, "Justified" scaling, and "Idle Waste" (penalized 20x). 4. **Control-Barrier Function**: A quadratic "Danger Zone" penalty that fires near catastrophic failure ($Q > 150$). --- ## Task Curriculum & Results | Task | Category | Weight | Mean Score (Baseline) | Mean Score (Trained) | |---|---|---|:---:|:---:| | **task-1** | **Capacity Ramp** | 40% | 0.69 | **0.88** | | **task-2** | **Fault Tolerance** | 30% | 0.70 | **0.82** | | **task-3** | **Surge Stability** | 30% | 0.21 | **0.94** | --- ## Training: RL with Unsloth + Hugging Face Jobs All training artifacts β€” model checkpoints, metrics logs, stderr/stdout, and evaluation plots β€” are pushed to the **[Keshav051/antiatropos-qlora](https://huggingface.co/Keshav051/antiatropos-qlora)** Hugging Face Hub repository. Each run lives under its own subdirectory (e.g., `run_0011/`). ### Reference Runs | Run | Loss Type | Description | Link | |-----|-----------|-------------|------| | **run_0011** | REINFORCE + baseline | **Reference run** β€” fully converged policy after 500 iterations. This is the canonical trained model discussed in the blog. | [View on Hub](https://huggingface.co/Keshav051/antiatropos-qlora/tree/main/run_0011) | | **grpo_run_001** | GRPO | Experimental GRPO run for comparison against the REINFORCE baseline. See the blog for analysis. | [View on Hub](https://huggingface.co/Keshav051/antiatropos-qlora/tree/main/grpo_run_001) | Each run folder contains: - `checkpoint-NNNN/` β€” LoRA adapter weights at every 5th iteration - `metrics.jsonl` β€” per-step telemetry for every episode across all iterations - `eval_results.jsonl` β€” heuristic vs trained comparison at each evaluation interval - `plots/` β€” loss curves, reward curves, and action distribution plots - `train.log` β€” full stderr/stdout from the training container > **Note:** The `logs/` directory at the project root also contains local copies of key run artifacts for offline inspection. ### How Training Works Training uses two core Hugging Face technologies: 1. **πŸ€— Hugging Face Jobs** β€” serverless GPU infrastructure. You define the container image, hardware flavor, and command; HF allocates the GPU, runs the job, and streams logs back. No SSH, no cluster management. 2. **Unsloth RL** β€” 4-bit QLoRA with REINFORCE/GRPO support. The base model (Qwen3.5-4B) is loaded in 4-bit via Unsloth's `FastLanguageModel`, and LoRA adapters (rank-64) are trained on top using a custom REINFORCE training loop. Inside the job container, the AntiAtropos FastAPI simulator starts on CPU (localhost:8000) while the GPU handles model forward/backward passes. This **co-located architecture** eliminates network latency between action generation and environment feedback. ### Launching Training The **only required argument** is `--run-id` β€” everything else has sensible defaults: ```bash # Minimal launch β€” 15 iterations, 6 episodes/iter, 20 steps/episode python training/launch_train.py --run-id run_007 ``` This uses all defaults: - **`--hub-model-repo`** = `Keshav051/antiatropos-qlora` (artifacts pushed here) - **`--num-iterations`** = `15` (training iterations) - **`--num-episodes`** = `6` (episodes per iteration; 2 per task for curriculum balance) - **`--max-steps`** = `20` (max environment steps per episode) - **`--eval-interval`** = `50` (evaluate vs heuristic every N iterations β€” rarely needed for short runs) - **`--checkpoint-interval`** = `5` (save checkpoint every N iterations) - **`--plot-interval`** = `10` (generate plots every N iterations) - **`--loss-type`** = `reinforce_baseline` (REINFORCE with baseline; use `grpo` for GRPO) - **`--flavor`** = `a10g-large` (NVIDIA A10G, 24 GiB, ~$0.34/hr) - **`--timeout`** = `4h` (job timeout) To override any default, just pass the flag: ```bash # Full training (500 iterations, A10G, ~$7): python training/launch_train.py --run-id run_012 --num-iterations 500 --num-episodes 6 # GRPO experiment: python training/launch_train.py --run-id grpo_run_002 --loss-type grpo # Longer timeout for deep training: python training/launch_train.py --run-id run_013 --num-iterations 500 --timeout 12h ``` ### Prerequisites 1. `pip install "huggingface_hub>=0.25.0"` 2. `huggingface-cli login` (or set `HF_TOKEN` environment variable) 3. A Hugging Face Pro or Team account (required for GPU Jobs) 4. The target Hub model repo is auto-created if it doesn't exist For the full list of options: ```bash python training/launch_train.py --help ``` --- ## Quick Start ### Local Installation ```bash pip install -e . uvicorn server.app:app --host 0.0.0.0 --port 7860 ``` ### Evaluation & Observation The `inference.py` script is the primary tool for validating model performance. It provides a detailed breakdown of episodic reward, SLA compliance, and cluster stability. It is an excellent way to **baseline behavior** of a new model or compare different training iterations. To configure the environment, use the `.env` file. Key "knobs" include: - `ENV_URL`: The URL of the AntiAtropos simulation server (e.g., your HF Space). - `MODEL_NAME`: The identifier for the model to test (supports Groq, Local, or HF). - `GROQ_API_KEY`: Required if using Groq-based inference for rapid prototyping. - `ANTIATROPOS_ENV_MODE`: Set to `simulated` for training or `live` for K8s control. ```bash # Set your API key and run the evaluation harness python inference.py --task all --mode trained ``` --- --- ## Future Horizons: The Path to Autonomous Cloud Safety AntiAtropos is the foundation for a new class of **Differentiable SRE**. Our roadmap includes: - **Multi-Agent Coordination**: Training specialized agents (e.g., an "Ingress Governor" and a "Storage Optimizer") to collaborate via shared Lyapunov energy. - **Formal Verification**: Using the Lyapunov certificates generated during training to provide mathematical guarantees of stability before an agent is deployed to production. - **Predictive Traffic Shaping**: Moving from reactive scaling to predictive world-modeling of seasonal traffic surges. --- *Built with passion for the 2026 AntiAtropos Hackathon.*