PranavKK1201 committed on
Commit · 0a957b8
1 Parent(s): 0c74ebd
agent.md and claude.md
AGENTS.md
ADDED
# AntiAtropos: The Physics of Autonomous SRE

> **"Infrastructure is not a static set of configurations; it is a dynamic system of energy, flow, and stability."**

## The Vision
AntiAtropos is a next-generation **Autonomous SRE (Site Reliability Engineering) Control Environment**. While traditional DevOps relies on static thresholds (e.g., "if CPU > 80%"), AntiAtropos treats a microservice cluster as a **Physics Engine**.

Our vision is to move from reactive scripts to **Dynamical System Control**. We are building an environment where AI agents don't just "fix things"; they balance the "Potential Energy" of a cluster to maintain equilibrium under extreme pressure.

---

## 1. The Physics Engine Concept
Traditional observability measures metrics; we measure **Stability**. We have modeled our 10-node cluster using **Fluid Queue Dynamics**, treating request flow like water and nodes like reservoirs.
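
To make the reservoir picture concrete, here is a minimal sketch of one common discrete-time fluid-queue update, $Q \leftarrow \max(0,\ Q + (\lambda - \mu)\,\Delta t)$. It is an illustration under stated assumptions, not the AntiAtropos implementation: the names `step_queue` and `simulate`, and the arrival and service rates, are hypothetical.

```python
def step_queue(depth: float, arrival_rate: float, service_rate: float,
               dt: float = 1.0) -> float:
    """Advance one 'reservoir' by dt: inflow fills it, service drains it.

    Assumed update rule (not taken from the project): the queue depth
    changes by (arrival_rate - service_rate) * dt and never goes negative.
    """
    return max(0.0, depth + (arrival_rate - service_rate) * dt)


def simulate(depth: float, arrival_rate: float, service_rate: float,
             steps: int, dt: float = 1.0) -> list[float]:
    """Return the queue depth after each step, starting with the initial depth."""
    history = [depth]
    for _ in range(steps):
        depth = step_queue(depth, arrival_rate, service_rate, dt)
        history.append(depth)
    return history


if __name__ == "__main__":
    # Overloaded node: 120 req/s arriving, 100 req/s served, so backlog grows.
    print(simulate(0.0, 120.0, 100.0, steps=3))
    # Recovering node: backlog drains and then stays at zero.
    print(simulate(50.0, 80.0, 100.0, steps=5))
```

In this sketch an overloaded node accumulates backlog linearly, while an underloaded one drains until it is empty; the `max(0, ...)` clamp is what keeps a reservoir from going negative.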

### The Lyapunov Potential ($V$)
The "North Star" of our environment is the **Lyapunov Energy Function**:

$$V(s) = \sum_{i=1}^{N} w_i \cdot Q_i^2$$

* **$Q_i$ (Queue Depth):** The "Potential Energy" or mass accumulated in a service.
* **$w_i$ (Weight):** The "Gravity" or business importance (node-0 is the VIP Payment Gateway).
* **Cascading Failures:** Our physics engine models "Backlog Pressure," where one failing node can trigger a chain reaction across its neighbors.
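
The energy function above can be evaluated directly from per-node queue depths and weights. The following is a small sketch, assuming plain Python sequences; the function names and the example weights are hypothetical (the only fact taken from the text is that node-0 carries more business weight than the others).

```python
from typing import Sequence


def lyapunov_energy(queue_depths: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Compute V(s) = sum_i w_i * Q_i^2 for one cluster snapshot."""
    if len(queue_depths) != len(weights):
        raise ValueError("queue_depths and weights must have the same length")
    return sum(w * q * q for q, w in zip(queue_depths, weights))


def lyapunov_drift(before: Sequence[float], after: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Change in V between two snapshots; negative means energy was shed."""
    return lyapunov_energy(after, weights) - lyapunov_energy(before, weights)


if __name__ == "__main__":
    weights = [3.0, 1.0, 1.0]   # hypothetical: node-0 weighted 3x
    before = [10.0, 0.0, 5.0]   # queue depth per node
    after = [5.0, 0.0, 5.0]     # node-0 backlog halved
    print(lyapunov_energy(before, weights))         # 3*100 + 0 + 25
    print(lyapunov_drift(before, after, weights))   # negative drift
```

Because each queue depth is squared, one deep backlog costs more than the same load spread across several nodes, and the weight makes a backlog on node-0 cost more than the same backlog elsewhere.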

### Advanced Latency Dynamics (M/M/1)
We move beyond linear latency models. AntiAtropos implements a **"Hockey-Stick" Latency Curve**: as utilization $\rho$ approaches 100%, latency grows without bound. In an M/M/1 queue, mean latency scales as $1/(1-\rho)$, which is hyperbolic rather than exponential but far steeper than any linear model. This captures the "Point of No Return" that real-world on-call engineers fear.
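
For reference, the textbook M/M/1 result for mean time in system is $W = 1/(\mu - \lambda)$, with utilization $\rho = \lambda/\mu$. The sketch below evaluates that formula to show the hockey-stick shape; it assumes the standard M/M/1 model and is not taken from the AntiAtropos code, and `mm1_latency` and the example rates are hypothetical.

```python
import math


def mm1_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda).

    Returns math.inf when the queue is saturated (lambda >= mu), since the
    steady-state mean is unbounded there.
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service_rate positive")
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 100.0  # hypothetical: 100 requests/second capacity
    for utilization in (0.50, 0.90, 0.99):
        latency = mm1_latency(utilization * service_rate, service_rate)
        print(f"rho={utilization:.2f}  mean latency={latency * 1000:.0f} ms")
```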

---

## 2. Training Strategy: The Professional Loop
To build a hackathon-winning agent, we use a complex training pipeline coordinated between **Google Colab** and **Hugging Face**:

### Progressive Curriculum Learning
Agents are not trained at random. They follow a **Curriculum** (`curriculum.py`) that graduates them through increasingly difficult stages:
1. **Stages 1-3:** Capacity Ramping (Learning to scale).
2. **Stages 4-5:** Fault Tolerance (Learning to reroute).
3. **Stages 6-8:** Surge Stability (Learning to balance competing pressures).
4. **Finals:** Sustained protection under cascading failure conditions.
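
The stage progression above could be driven by a simple promotion rule. The sketch below is purely illustrative: the contents of `curriculum.py` are not shown here, so the `Curriculum` class, the success-rate threshold, and the window size are all assumptions. Only the stage names come from the list above.

```python
# Stage labels and themes from the list above; everything else is hypothetical.
STAGES = (
    [(str(n), "Capacity Ramping") for n in (1, 2, 3)]
    + [(str(n), "Fault Tolerance") for n in (4, 5)]
    + [(str(n), "Surge Stability") for n in (6, 7, 8)]
    + [("finals", "Cascading Failure")]
)


class Curriculum:
    """Promote the agent once its recent success rate clears a threshold."""

    def __init__(self, promote_at: float = 0.8, window: int = 5) -> None:
        self.promote_at = promote_at   # assumed promotion threshold
        self.window = window           # assumed number of recent episodes
        self.index = 0
        self.recent: list[bool] = []

    @property
    def stage(self) -> tuple[str, str]:
        return STAGES[self.index]

    def record(self, success: bool) -> bool:
        """Record one episode outcome; return True if the agent was promoted."""
        self.recent = (self.recent + [success])[-self.window:]
        window_full = len(self.recent) == self.window
        rate = sum(self.recent) / len(self.recent)
        at_last_stage = self.index == len(STAGES) - 1
        if window_full and rate >= self.promote_at and not at_last_stage:
            self.index += 1
            self.recent = []           # start the next stage with a clean window
            return True
        return False
```

Resetting the window on promotion is one design choice among several; it forces the agent to prove itself on each new stage rather than carrying over credit from an easier one.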

### Episodic Replay Buffer
Using `replay.py`, our agents maintain a "Long-term Memory" of **Key Transitions**. Instead of relearning from scratch, the model uses **Few-Shot Demonstrations** to see how successful previous strategies were executed.
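
As a rough illustration of the idea, the sketch below keeps only the highest-scoring transitions and renders the best few as few-shot text. The actual `replay.py` is not shown here, so everything in the block is an assumption: the `EpisodicReplay` class, scoring a transition by how much it reduced system energy, and the prompt format.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Transition:
    energy_drop: float                 # assumed score: reduction in V(s)
    order: int                         # insertion counter, used as tie-breaker
    state: str = field(compare=False)
    action: str = field(compare=False)


class EpisodicReplay:
    """Bounded memory that retains the top-`capacity` transitions by score."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self._heap: list[Transition] = []   # min-heap: weakest item at the root
        self._count = 0

    def add(self, state: str, action: str, energy_drop: float) -> None:
        item = Transition(energy_drop, self._count, state, action)
        self._count += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        else:
            # Push the new item, then evict whichever item is now weakest.
            heapq.heappushpop(self._heap, item)

    def few_shot(self, k: int = 3) -> str:
        """Format the k best transitions as demonstration lines."""
        best = heapq.nlargest(k, self._heap)
        return "\n".join(f"State: {t.state} -> Action: {t.action}" for t in best)
```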

---

## 3. Upcoming & Unconfirmed Roadmap
> [!IMPORTANT]
> **DISCLAIMER:** The following features are in the research phase and are NOT yet finalized or confirmed. Please consult with the core team before assuming implementation details.

* **Multi-Token Attention for SRE:** Investigating the use of frequency-selective transformation to capture "cluster breathiness" (p99 jitter) rather than just global averages.
* **Graph Neural Network (GNN) Control:** Potential pivot toward modeling the cluster as a dynamic graph to directly manage the "topology of stress."
* **Cross-Cluster Generalization:** Testing models trained on 10 nodes against 20- and 50-node environments.

---

## Why This Wins
AntiAtropos doesn't follow runbooks. It understands the **laws of motion** within a cluster. By training agents to minimize "System Energy," we create infrastructure that is inherently self-healing, cost-efficient, and mathematically stable.

---
*Created for the 2026 AntiAtropos Hackathon.*
CLAUDE.md
ADDED
Refer to AGENTS.md for instructions.