AntiAtropos: The Physics of Autonomous SRE
"Infrastructure is not a static set of configurations; it is a dynamic system of energy, flow, and stability."
Table of Contents
- Demo Video
- The Vision
- The Physics Engine
- Architecture
- Reward Engineering
- Task Curriculum & Results
- Training: RL with Unsloth + Hugging Face Jobs
- Quick Start
Hackathon Submission: We are building for "Theme #3: World Modelling for Professional Tasks."
AntiAtropos governs clusters the way physics governs a pendulum: by minimizing Lyapunov energy. Perfect SLA at 50% lower cost.
Demo Video
AntiAtropos is a Reinforcement Learning environment where an AI agent learns to stabilize a 5-node microservice cluster by treating it as a physical system. Using QLoRA REINFORCE on a Qwen3.5-4B model, the agent is trained to minimize Lyapunov graph energy under a Drift-Plus-Penalty objective that balances stability against infrastructure cost. The trained policy scales predictively, reroutes around failures, and holds the line during traffic surges.
The Vision: Beyond Runbooks
Traditional DevOps relies on static thresholds and "If-This-Then-That" runbooks. This doesn't scale with the complexity of modern microservice DAGs. AntiAtropos moves from reactive scripts to Dynamical System Control.
Agents in AntiAtropos are trained to minimize the Lyapunov Energy of the cluster, balancing the potential energy of backlogs to maintain equilibrium under extreme pressure.
The Physics Engine
AntiAtropos simulates a 5-node cluster with high-fidelity operational dynamics:
- Fluid Queue Dynamics: Requests flow like water through reservoirs (nodes) and pipes (edges). Overloaded nodes create Upstream Backpressure, physically throttling parent service rates.
- Lyapunov Stability: System health is captured by a single scalar Energy Function ($V(s) = \sum w_i Q_i^2$). Squaring queue depths penalizes load concentration, forcing agents to balance the cluster.
- The Hockey-Stick Curve: Implements M/M/1 queueing dynamics, where latency grows like $1/(1-\rho)$ and blows up as utilization approaches 100% (see the sketch after this list).
- Operational Reality: Includes 5-tick Boot Delays for scaling, traffic reroute decay, and hard safety constraints on VIP nodes.
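To make these dynamics concrete, here is a minimal sketch of the two computations above; the weights, thresholds, and rates are illustrative placeholders, not the simulator's tuned values:

```python
def lyapunov_energy(queues: list[float], weights: list[float]) -> float:
    """V(s) = sum(w_i * Q_i^2): squaring penalizes load concentration."""
    return sum(w * q * q for w, q in zip(weights, queues))

def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """M/M/1 sojourn time W = 1 / (mu - lambda): the hockey stick.

    As utilization rho = lambda / mu approaches 1, latency blows up.
    """
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

# Two clusters with the same total backlog (200 requests):
balanced = lyapunov_energy([100, 100], [1.0, 1.0])  # 20_000
hotspot  = lyapunov_energy([190, 10], [1.0, 1.0])   # 36_200 -> higher energy
```

The quadratic form is why a hotspot scores worse than a balanced cluster even at identical total backlog, which is exactly the pressure that pushes the agent toward load balancing.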
OpenEnv Specification Compliance
AntiAtropos implements typed OpenEnv interfaces using Pydantic models and an OpenEnv-compatible FastAPI server:
- Action Model: `SREAction` in `models.py` (typed fields for action type, node ID, and parameter).
- Observation Model: `ClusterObservation` + `NodeObservation` in `models.py` (high-fidelity telemetry).
- Standard API: Implements `reset()`, `step(action)`, and `state` according to the OpenEnv specification.
- Manifest: `openenv.yaml` at the root defines the runtime (`fastapi`), app entrypoint (`server.app:app`), and port (`7860`).
Action Space (SREAction)
- `NO_OP`: Hold position (essential for cost discipline).
- `SCALE_UP`: Expand node capacity (triggering a cold-start delay).
- `SCALE_DOWN`: Remove capacity (prioritizing pending/booting pods).
- `REROUTE_TRAFFIC`: Shift load from the target node to healthy peers.
- `SHED_LOAD`: Drop a fraction of traffic (safety-guarded; forbidden on VIP nodes).
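As a rough sketch, the action model could be expressed in Pydantic like this; field names and defaults are illustrative, and the authoritative definition lives in `models.py`:

```python
from enum import Enum
from typing import Optional
from pydantic import BaseModel

class ActionType(str, Enum):
    NO_OP = "NO_OP"
    SCALE_UP = "SCALE_UP"
    SCALE_DOWN = "SCALE_DOWN"
    REROUTE_TRAFFIC = "REROUTE_TRAFFIC"
    SHED_LOAD = "SHED_LOAD"

class SREAction(BaseModel):
    # Which control primitive to apply this tick.
    action_type: ActionType = ActionType.NO_OP
    # Target node, e.g. "node-2"; ignored for NO_OP.
    node_id: Optional[str] = None
    # Free parameter: replica delta for scaling, or a traffic
    # fraction in [0, 1] for REROUTE_TRAFFIC / SHED_LOAD.
    parameter: float = 0.0
```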
Observation Space (ClusterObservation)
- Global: Step count, Average Latency, Error Rate, Total Backlog, Cost per Hour, Lyapunov Energy.
- Per-Node: Queue depth, Status (HEALTHY/DEGRADED/FAILED), CPU Util, Capacity, Inflow/Outflow rates.
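A matching sketch of the observation side, again with illustrative field names derived from the telemetry listed above:

```python
from pydantic import BaseModel

class NodeObservation(BaseModel):
    node_id: str
    queue_depth: float   # Q_i, the backlog that drives Lyapunov energy
    status: str          # "HEALTHY" | "DEGRADED" | "FAILED"
    cpu_util: float      # 0.0 - 1.0
    capacity: float      # max service rate (requests/tick)
    inflow_rate: float
    outflow_rate: float

class ClusterObservation(BaseModel):
    step: int
    avg_latency_ms: float
    error_rate: float
    total_backlog: float
    cost_per_hour: float
    lyapunov_energy: float  # V(s) = sum(w_i * Q_i^2)
    nodes: list[NodeObservation]
```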
Cluster Architecture & Control Plane
AntiAtropos models a 5-node production DAG with a centralized control plane.
Topology (The Directed Graph)
Traffic flows through a hierarchical structure, enabling realistic cascading failure simulations:
```
node-0 (VIP Ingress) ---+--> node-1 (Checkout)
                        +--> node-2 (Catalog) --> node-3 (Database)
node-4 (Auth Ingress) --+
```
- node-0: The VIP Payment Gateway. Business-critical; load shedding is forbidden.
- node-4: Independent ingress for Auth services.
- Backpressure propagation: If `node-3` overflows, it throttles `node-2`, which in turn throttles `node-0`.
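A minimal sketch of how this topology and its backpressure rule can be encoded; the edges are read off the diagram above, and the throttle curve and limit are illustrative:

```python
# Illustrative adjacency list for the DAG above (parent -> children).
TOPOLOGY: dict[str, list[str]] = {
    "node-0": ["node-1", "node-2"],
    "node-4": ["node-1", "node-2"],
    "node-2": ["node-3"],
}

def backpressure_factor(child_queue: float, limit: float = 150.0) -> float:
    """Throttle a parent as its child's queue approaches the hard limit."""
    return max(0.0, 1.0 - child_queue / limit)

def effective_outflow(rate: float, child_queue: float) -> float:
    # If node-3's queue nears the limit, node-2's outflow is throttled,
    # and the slowdown propagates transitively up to node-0.
    return rate * backpressure_factor(child_queue)
```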
The Live K8s Bridge
The environment includes a `KubernetesExecutor` that allows the same agent logic to control a live cluster:
- Binding: Uses `ANTIATROPOS_WORKLOAD_MAP` to map simulator "nodes" to real K8s Deployments.
- Execution: Translates high-level actions into `patch_namespaced_deployment_scale` calls with transient retry logic (sketched below).
- Reconciliation: Ingests live Prometheus metrics to align simulator state with the real infrastructure.
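For reference, the scaling path of such an executor might look like this with the official `kubernetes` Python client; the workload-map format and backoff policy shown here are assumptions:

```python
import json
import os
import time

from kubernetes import client, config

# e.g. ANTIATROPOS_WORKLOAD_MAP='{"node-1": "checkout", "node-2": "catalog"}'
WORKLOAD_MAP = json.loads(os.environ["ANTIATROPOS_WORKLOAD_MAP"])

def scale_node(node_id: str, replicas: int, namespace: str = "default",
               retries: int = 3) -> None:
    config.load_kube_config()  # or load_incluster_config() inside a pod
    apps = client.AppsV1Api()
    deployment = WORKLOAD_MAP[node_id]
    body = {"spec": {"replicas": replicas}}
    for attempt in range(retries):
        try:
            apps.patch_namespaced_deployment_scale(
                name=deployment, namespace=namespace, body=body)
            return
        except client.exceptions.ApiException:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # back off on transient API errors
```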
Reward Engineering: The Differentiable SRE
Our reward function is grounded in Neely's Drift-Plus-Penalty framework, providing a dense, informative signal:
- Lyapunov Drift ($\Delta V$): Measures the one-tick change in system energy. Negative drift means the cluster is stabilizing.
- Smooth Sigmoid SLA: Dual sigmoids (Latency and Error Rate) provide gradient before a violation.
- Three-Tier Economics: Distinguishes between "Paid-for" Baseline capacity, "Justified" scaling, and "Idle Waste" (penalized 20x).
- Control-Barrier Function: A quadratic "Danger Zone" penalty that fires near catastrophic failure ($Q > 150$).
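Assembled, a drift-plus-penalty reward takes roughly this shape; all weights and thresholds below are illustrative placeholders, not the tuned values:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward(v_prev: float, v_curr: float, latency_ms: float,
           error_rate: float, idle_capacity: float,
           max_queue: float) -> float:
    # 1. Lyapunov drift: negative drift (a stabilizing step) is rewarded.
    drift = v_curr - v_prev

    # 2. Smooth sigmoid SLA terms: gradient *before* a hard violation.
    sla = (sigmoid((latency_ms - 200.0) / 25.0)
           + sigmoid((error_rate - 0.01) / 0.005))

    # 3. Three-tier economics: idle waste penalized ~20x harder than
    #    justified capacity.
    cost = 20.0 * idle_capacity

    # 4. Control-barrier function: quadratic danger-zone penalty as the
    #    worst queue approaches catastrophic depth (Q > 150).
    barrier = max(0.0, max_queue - 150.0) ** 2

    return -(drift + 1.0 * sla + 0.1 * cost + 0.01 * barrier)
```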
Task Curriculum & Results
| Task | Category | Weight | Mean Score (Baseline) | Mean Score (Trained) |
|---|---|---|---|---|
| task-1 | Capacity Ramp | 40% | 0.69 | 0.88 |
| task-2 | Fault Tolerance | 30% | 0.70 | 0.82 |
| task-3 | Surge Stability | 30% | 0.21 | 0.94 |
Training: RL with Unsloth + Hugging Face Jobs
All training artifacts (model checkpoints, metrics logs, stderr/stdout, and evaluation plots) are pushed to the `Keshav051/antiatropos-qlora` Hugging Face Hub repository. Each run lives under its own subdirectory (e.g., `run_0011/`).
Reference Runs
| Run | Loss Type | Description | Link |
|---|---|---|---|
| run_0011 | REINFORCE + baseline | Reference run: fully converged policy after 500 iterations. This is the canonical trained model discussed in the blog. | View on Hub |
| grpo_run_001 | GRPO | Experimental GRPO run for comparison against the REINFORCE baseline. See the blog for analysis. | View on Hub |
Each run folder contains:
- `checkpoint-NNNN/`: LoRA adapter weights at every 5th iteration
- `metrics.jsonl`: per-step telemetry for every episode across all iterations
- `eval_results.jsonl`: heuristic vs. trained comparison at each evaluation interval
- `plots/`: loss curves, reward curves, and action distribution plots
- `train.log`: full stderr/stdout from the training container
Note: The `logs/` directory at the project root also contains local copies of key run artifacts for offline inspection.
How Training Works
Training uses two core Hugging Face technologies:
- 🤗 Hugging Face Jobs: serverless GPU infrastructure. You define the container image, hardware flavor, and command; HF allocates the GPU, runs the job, and streams logs back. No SSH, no cluster management.
- Unsloth RL: 4-bit QLoRA with REINFORCE/GRPO support. The base model (Qwen3.5-4B) is loaded in 4-bit via Unsloth's `FastLanguageModel`, and LoRA adapters (rank 64) are trained on top using a custom REINFORCE training loop.
Inside the job container, the AntiAtropos FastAPI simulator starts on CPU (localhost:8000) while the GPU handles model forward/backward passes. This co-located architecture eliminates network latency between action generation and environment feedback.
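For orientation, loading a base model this way with Unsloth looks roughly like the following; the exact model ID and LoRA hyperparameters used in our runs may differ:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit (QLoRA); the model ID is illustrative.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach rank-64 LoRA adapters; only these weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```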
Launching Training
The only required argument is `--run-id`; everything else has sensible defaults:
```bash
# Minimal launch: 15 iterations, 6 episodes/iter, 20 steps/episode
python training/launch_train.py --run-id run_007
```
This uses all defaults:
- `--hub-model-repo=Keshav051/antiatropos-qlora` (artifacts pushed here)
- `--num-iterations=15` (training iterations)
- `--num-episodes=6` (episodes per iteration; 2 per task for curriculum balance)
- `--max-steps=20` (max environment steps per episode)
- `--eval-interval=50` (evaluate vs. heuristic every N iterations; rarely needed for short runs)
- `--checkpoint-interval=5` (save a checkpoint every N iterations)
- `--plot-interval=10` (generate plots every N iterations)
- `--loss-type=reinforce_baseline` (REINFORCE with baseline; use `grpo` for GRPO)
- `--flavor=a10g-large` (NVIDIA A10G, 24 GiB, ~$0.34/hr)
- `--timeout=4h` (job timeout)
To override any default, just pass the flag:
```bash
# Full training (500 iterations, A10G, ~$7):
python training/launch_train.py --run-id run_012 --num-iterations 500 --num-episodes 6

# GRPO experiment:
python training/launch_train.py --run-id grpo_run_002 --loss-type grpo

# Longer timeout for deep training:
python training/launch_train.py --run-id run_013 --num-iterations 500 --timeout 12h
```
Prerequisites
pip install "huggingface_hub>=0.25.0"huggingface-cli login(or setHF_TOKENenvironment variable)- A Hugging Face Pro or Team account (required for GPU Jobs)
- The target Hub model repo is auto-created if it doesn't exist
For the full list of options:
```bash
python training/launch_train.py --help
```
Quick Start
Local Installation
```bash
pip install -e .
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
Evaluation & Observation
The `inference.py` script is the primary tool for validating model performance. It provides a detailed breakdown of episodic reward, SLA compliance, and cluster stability, making it an excellent way to baseline the behavior of a new model or compare different training iterations.
To configure the environment, use the `.env` file. Key knobs include:
- `ENV_URL`: The URL of the AntiAtropos simulation server (e.g., your HF Space).
- `MODEL_NAME`: The identifier for the model to test (supports Groq, local, or HF).
- `GROQ_API_KEY`: Required if using Groq-based inference for rapid prototyping.
- `ANTIATROPOS_ENV_MODE`: Set to `simulated` for training or `live` for K8s control.
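A sample `.env` assembled from those knobs might look like this (all values are placeholders):

```
# .env -- illustrative values only
ENV_URL=https://your-space.hf.space
MODEL_NAME=Keshav051/antiatropos-qlora
GROQ_API_KEY=gsk_...
ANTIATROPOS_ENV_MODE=simulated
```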
```bash
# Set your API key and run the evaluation harness
python inference.py --task all --mode trained
```
Future Horizons: The Path to Autonomous Cloud Safety
AntiAtropos is the foundation for a new class of Differentiable SRE. Our roadmap includes:
- Multi-Agent Coordination: Training specialized agents (e.g., an "Ingress Governor" and a "Storage Optimizer") to collaborate via shared Lyapunov energy.
- Formal Verification: Using the Lyapunov certificates generated during training to provide mathematical guarantees of stability before an agent is deployed to production.
- Predictive Traffic Shaping: Moving from reactive scaling to predictive world-modeling of seasonal traffic surges.
Built with passion for the 2026 AntiAtropos Hackathon.
