petter2025 committed on
Commit 220196d · verified · 1 Parent(s): 047e6c3

Update README.md

  1. README.md +117 -32
README.md CHANGED
@@ -1,51 +1,136 @@
  # 🧠 Agentic Reliability Framework MVP

- **Author:** [Juan D. Petter](https://github.com/petter2025)
- **Stack:** Python 3.10 · Gradio 5.49.1 · FastAPI · FAISS · SentenceTransformers · Hugging Face Router API

- ---

- ## 🚀 Overview

- The **Agentic Reliability Framework MVP** is an intelligent observability and self-healing system that combines:

- - **Adaptive anomaly detection** (latency/error-rate monitoring)
- - **AI-driven root cause analysis** powered by Hugging Face’s Inference Router
- - **Persistent FAISS vector memory** for incident recall and similarity search
- - **Agentic self-healing simulation** (automated reliability responses)

- This MVP runs entirely in a Hugging Face Space and doubles as both:
- - a **visual Gradio dashboard**, and
- - a **REST API backend** (via FastAPI).

  ---

- ## ⚙️ Features

- | Module | Description |
- |--------|--------------|
- | **Anomaly Detection** | Adaptive thresholding + random test anomalies |
- | **AI Insight Generation** | Uses `mistralai/Mixtral-8x7B-Instruct-v0.1` via Hugging Face Router |
- | **Self-Healing Simulation** | Randomized corrective actions |
- | **Persistent Memory** | FAISS + JSON persistence with file locks |
- | **REST API** | `/add-event` route secured with an optional API key |
- | **UI Dashboard** | Live anomaly visualization via Gradio Blocks |

- ---

- ## 🧩 Tech Stack

- | Component | Purpose |
- |------------|----------|
- | `gradio` | Dashboard / UI |
- | `fastapi` | API backend |
- | `sentence-transformers` | Embeddings for FAISS |
- | `faiss-cpu` | Vector memory store |
- | `requests` | Hugging Face Inference Router calls |
- | `filelock` | Safe concurrent persistence |

  ---

- ## 🧠 Architecture Flow

+ ---
+ title: "Agentic Reliability Framework MVP"
+ emoji: "🧠"
+ colorFrom: "indigo"
+ colorTo: "blue"
+ sdk: "gradio"
+ sdk_version: "5.49.1"
+ app_file: "app.py"
+ pinned: true
+ python_version: "3.10"
+ license: "mit"
+ ---
+
  # 🧠 Agentic Reliability Framework MVP

+ **Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**

+ This project explores **agentic reliability systems** that blend observability, vector-based persistence, and AI inference to build self-healing cloud operations.

+ Built with:
+ - ⚡ **Gradio 5.49.1** for live visualization & dashboard UI
+ - 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support
+ - 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
+ - 🔍 **FAISS** for similarity search across past incidents
+ - 🔒 **FileLock** for safe concurrent saves in multi-user environments
+ - 🤖 **Hugging Face Router Inference API** for adaptive reliability insights
+ - ☁️ **Python 3.10** runtime

+ ---

+ ## 🚀 Features

+ | Capability | Description |
+ |-------------|--------------|
+ | **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
+ | **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
+ | **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
+ | **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
+ | **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
+ | **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |

  ---
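The features table describes anomaly detection driven by latency and error-rate thresholds, but this diff does not show how those thresholds adapt. As a rough illustration only, the sketch below keeps a rolling window of recent latencies and flags a value that exceeds the larger of a fixed floor and the window mean plus `k` standard deviations. The class name `AdaptiveThreshold` and the `window`, `k`, and `floor` values are assumptions made for this example; none of them come from `app.py`.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Rolling-window threshold (hypothetical helper, not the project's code)."""

    def __init__(self, window=50, k=3.0, floor=150.0):
        self.values = deque(maxlen=window)  # recent "normal" samples
        self.k = k                          # std-dev multiplier (assumed)
        self.floor = floor                  # minimum threshold, e.g. ms (assumed)

    def threshold(self):
        # With too little history, fall back to the fixed floor.
        if len(self.values) < 2:
            return self.floor
        return max(self.floor, mean(self.values) + self.k * pstdev(self.values))

    def is_anomaly(self, value):
        flagged = value > self.threshold()
        if not flagged:
            # Learn only from normal samples so outliers do not raise the bar.
            self.values.append(value)
        return flagged
```

With a baseline near 100 ms and the assumed 150 ms floor, a 224 ms latency like the one in the example output would be flagged. The same pattern could be applied to error rate with its own window and floor.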

+ ## 🧠 Example Output

+ **Event Processed (Anomaly)**

+ Component: api-service
+ Latency: 224 ms
+ Error Rate: 0.062
+ Status: Anomaly
+ Analysis: Error 404: Not Found
+ Healing Action: Restarted container (Found 3 similar incidents)


+ ---
+
+ ## 🧩 Architecture Overview
+
+ ┌──────────────────────┐
+ │  Gradio Frontend UI  │
+ └─────────┬────────────┘
+           │ (submit telemetry)
+           ▼
+ ┌──────────────────────┐
+ │ FastAPI /add-event   │
+ │ + API Key validation │
+ └─────────┬────────────┘
+           │ (call)
+           ▼
+ ┌─────────────────────────────┐
+ │ Hugging Face Inference API  │
+ │ → Reliability insight text  │
+ └─────────┬───────────────────┘
+           │
+           ▼
+ ┌───────────────────────────────┐
+ │ FAISS + Sentence Transformers │
+ │ → Embedding + similarity map  │
+ └───────────────────────────────┘

  ---
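The last stage of the diagram embeds each incident and looks up similar past incidents. The README names Sentence Transformers (`all-MiniLM-L6-v2`) and FAISS for this step. To stay dependency-free, the sketch below swaps in hand-written toy vectors and a brute-force cosine-similarity scan, so it only illustrates the idea of nearest-incident lookup and is not how FAISS indexes vectors. `IncidentMemory` and its methods are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncidentMemory:
    """Brute-force stand-in for a vector index (hypothetical, for illustration)."""

    def __init__(self):
        self._items = []  # list of (vector, description) pairs

    def add(self, vector, description):
        self._items.append((list(vector), description))

    def search(self, query, k=3):
        # Score every stored incident, then return the k most similar.
        scored = [(cosine_similarity(query, vec), desc) for vec, desc in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

In a real deployment the vectors would come from an embedding model and the scan would be replaced by an index lookup; the "Found 3 similar incidents" text in the example output corresponds to the size of such a result list.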

+ ## 🧾 API Usage
+
+ **Endpoint:**
+ `POST /add-event`
+
+ **Headers:**
+ `X-API-Key: <your_api_key>`
+
+ **Body:**
+ ```json
+ {
+   "component": "api-service",
+   "latency": 200,
+   "error_rate": 0.04
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "status": "ok",
+   "event": {
+     "timestamp": "2025-11-08 23:29:03",
+     "component": "api-service",
+     "status": "Anomaly",
+     "analysis": "Error 404: Not Found",
+     "healing_action": "Restarted container Found 3 similar incidents ..."
+   }
+ }
+ ```
+
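A client call to the endpoint described above might be built as follows, using only the Python standard library. The endpoint path, the `X-API-Key` header, and the body fields come from the README; the helper names and the base URL passed in by the caller are assumptions, since the diff does not say where the FastAPI app is served.

```python
import json
import urllib.request


def build_add_event_request(base_url, api_key, component, latency, error_rate):
    """Build (but do not send) a POST /add-event request. Hypothetical helper."""
    payload = {"component": component, "latency": latency, "error_rate": error_rate}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting construction from sending keeps the request easy to inspect before any network traffic happens; calling `send(build_add_event_request(...))` against a running instance should return a JSON object shaped like the response shown above.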
+ **Run locally:**
+ ```bash
+ git clone https://github.com/petterjuan/agentic-reliability-framework.git
+ cd agentic-reliability-framework
+ pip install -r requirements.txt
+ python app.py
+ ```
+
+ Then open http://localhost:7860 in your browser.
+
+ ## 🌍 Live Space & Collaboration
+
+ 👉 Launch Live Demo on Hugging Face
+
+ 👉 Contribute or Fork on GitHub
+
+ ## 🧭 Author
+
+ Juan D. Petter
+ AI Engineer & Cloud Architect
+ Building Agentic Systems for Scalable Automation | ex-NetApp
+ 🔗 LinkedIn
+ • GitHub
+
+ ## 🪪 License
+
+ MIT License © 2025 Juan D. Petter