---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---
# 🧠 Agentic Reliability Framework MVP

**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**

This project explores **agentic reliability systems** that blend observability, vector-based persistence, and AI inference to create self-healing cloud operations.

Built with:
- **Gradio 5.49.1** for live visualization & dashboard UI
- **FastAPI** for REST endpoints (`/add-event`) with API key support
- **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- **FAISS** for similarity search across past incidents
- **FileLock** for safe concurrent saves in multi-user environments
- **Hugging Face Router Inference API** for adaptive reliability insights
- **Python 3.10** runtime
---
## Features
| Capability | Description |
|-------------|--------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |
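
The detection and self-healing rows above can be pictured with a small sketch. It assumes fixed thresholds and a simple rule for picking a simulated action. The threshold values, the `Event` class, the `"Normal"` label, and the `classify` and `choose_action` functions are hypothetical names chosen for illustration and are not taken from `app.py`. With these assumed values, an event with latency 224 ms and error rate 0.062 is classified as an anomaly and mapped to a simulated restart.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
LATENCY_THRESHOLD_MS = 150.0
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Event:
    component: str
    latency: float     # milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def classify(event: Event) -> str:
    """Return "Anomaly" if either metric exceeds its threshold, else "Normal"."""
    if event.latency > LATENCY_THRESHOLD_MS or event.error_rate > ERROR_RATE_THRESHOLD:
        return "Anomaly"
    return "Normal"


def choose_action(event: Event) -> str:
    """Pick a simulated healing action; nothing is actually restarted or scaled."""
    if classify(event) != "Anomaly":
        return "No action needed"
    if event.error_rate > ERROR_RATE_THRESHOLD:
        # Assumed rule: a high error rate suggests a faulty instance.
        return "Restarted container"
    # Assumed rule: high latency alone suggests insufficient capacity.
    return "Scaled up"
```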
---
## Example Output

**Event Processed (Anomaly)**

- Component: api-service
- Latency: 224 ms
- Error Rate: 0.062
- Status: Anomaly
- Analysis: Error 404: Not Found
- Healing Action: Restarted container (Found 3 similar incidents)

---
## 🧩 Architecture Overview

    ┌──────────────────────┐
    │  Gradio Frontend UI  │
    └──────────┬───────────┘
               │ (submit telemetry)
               ▼
    ┌──────────────────────┐
    │  FastAPI /add-event  │
    │ + API Key validation │
    └──────────┬───────────┘
               │ (call)
               ▼
    ┌──────────────────────────────┐
    │ Hugging Face Inference API   │
    │ → Reliability insight text   │
    └──────────┬───────────────────┘
               │
               ▼
    ┌──────────────────────────────┐
    │ FAISS + Sentence Transformers│
    │ → Embedding + similarity map │
    └──────────────────────────────┘
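
The last stage of the diagram, retrieving similar past incidents from embeddings, can be sketched without any third-party dependency. This is a brute-force cosine-similarity stand-in written in plain Python, not FAISS itself, and the short vectors in the usage comment stand in for real sentence embeddings. The `IncidentMemory` class and its methods are hypothetical names used only for this illustration.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncidentMemory:
    """Toy in-memory store; a brute-force stand-in for a vector index."""

    def __init__(self) -> None:
        self._items: List[Tuple[List[float], str]] = []

    def add(self, vector: List[float], description: str) -> None:
        """Remember one incident as an (embedding, description) pair."""
        self._items.append((vector, description))

    def search(self, query: List[float], k: int = 3) -> List[Tuple[float, str]]:
        """Return up to k (score, description) pairs, most similar first."""
        scored = [(cosine_similarity(query, vec), desc) for vec, desc in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Usage with made-up 2-dimensional "embeddings":
#   memory = IncidentMemory()
#   memory.add([1.0, 0.0], "api-service latency spike")
#   memory.add([0.0, 1.0], "db-service error burst")
#   memory.search([0.9, 0.1], k=1)  # the latency spike ranks first
```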
---
## 🧾 API Usage
**Endpoint:**
`POST /add-event`
**Headers:**
`X-API-Key: <your_api_key>`
**Body:**
```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```
**Response:**
```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```
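
A client call matching the request above might look like the following sketch, which uses only the Python standard library. The base URL and port are assumptions (this README does not state where the FastAPI app listens), and `build_add_event_request` and `send` are hypothetical helper names. The example in the trailing comment would need a running server and a valid key.

```python
import json
import urllib.request


def build_add_event_request(base_url: str, api_key: str, component: str,
                            latency: float, error_rate: float) -> urllib.request.Request:
    """Build a POST /add-event request carrying the JSON body and X-API-Key header."""
    payload = {"component": component, "latency": latency, "error_rate": error_rate}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (assumed base URL; replace the key placeholder with a real key):
#   req = build_add_event_request("http://localhost:8000", "<your_api_key>",
#                                 "api-service", 200, 0.04)
#   print(send(req))
```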

---

To run the project locally:

    git clone https://github.com/petterjuan/agentic-reliability-framework.git
    cd agentic-reliability-framework
    pip install -r requirements.txt
    python app.py

Then open http://localhost:7860 in your browser.

---

## Live Space & Collaboration

- Launch Live Demo on Hugging Face
- Contribute or Fork on GitHub

---

## Author

Juan D. Petter

AI Engineer & Cloud Architect

Building Agentic Systems for Scalable Automation | ex-NetApp

LinkedIn • GitHub

---

## License

MIT License © 2025 Juan D. Petter