---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---

# 🧠 Agentic Reliability Framework MVP

**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**

This project explores **agentic reliability systems** — blending observability, vector-based persistence, and AI inference to create self-healing cloud operations.

Built with:
- ⚡ **Gradio 5.49.1** for live visualization & dashboard UI  
- 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support  
- 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory  
- 🔍 **FAISS** for similarity search across past incidents  
- 🔒 **FileLock** for safe concurrent saves in multi-user environments  
- 🤖 **Hugging Face Router Inference API** for adaptive reliability insights  
- ☁️ **Python 3.10** runtime
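The `X-API-Key` check behind the FastAPI endpoint can be illustrated with the standard library alone. This is a minimal sketch, not the project's code. The helper names `header_value` and `is_authorized` are hypothetical, and this README does not say how `app.py` stores its expected key. In a FastAPI route the header would typically arrive through a `Header` parameter such as `x_api_key`, with an `HTTPException(status_code=401)` raised when the check fails.

```python
from __future__ import annotations

import hmac
from collections.abc import Mapping


def header_value(headers: Mapping[str, str], name: str) -> str | None:
    """Look up a header case-insensitively (HTTP header names ignore case)."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def is_authorized(headers: Mapping[str, str], expected_key: str | None) -> bool:
    """Compare the X-API-Key header with the expected key in constant time."""
    provided = header_value(headers, "X-API-Key")
    if not provided or not expected_key:
        return False  # missing header or unconfigured key: reject the request
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided.encode("utf-8"), expected_key.encode("utf-8"))
```

Rejecting the request when no key is configured keeps a misconfigured deployment closed rather than open.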

---

## 🚀 Features

| Capability | Description |
|-------------|--------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |
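The threshold-based detection in the first row can be sketched as follows. The limits (a 150 ms latency floor and a 0.05 error-rate limit) and the rolling-window rule that makes the latency threshold adaptive (recent mean plus `k` standard deviations) are assumptions for illustration. This README does not state the values or method `app.py` uses.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveDetector:
    """Flags an event when latency or error rate crosses a threshold.

    Hypothetical defaults: the real limits used by app.py are not documented.
    The latency threshold adapts to recent history (mean + k * std dev) but
    never drops below latency_floor_ms.
    """

    def __init__(self, latency_floor_ms: float = 150.0,
                 error_rate_limit: float = 0.05,
                 window: int = 50, k: float = 3.0) -> None:
        self.latency_floor_ms = latency_floor_ms
        self.error_rate_limit = error_rate_limit
        self.k = k
        self.history = deque(maxlen=window)  # most recent latencies in ms

    def latency_threshold(self) -> float:
        # With fewer than two samples there is no spread estimate: use the floor.
        if len(self.history) < 2:
            return self.latency_floor_ms
        adaptive = mean(self.history) + self.k * pstdev(self.history)
        return max(self.latency_floor_ms, adaptive)

    def classify(self, latency_ms: float, error_rate: float) -> str:
        threshold = self.latency_threshold()  # computed before recording this event
        self.history.append(latency_ms)
        if latency_ms > threshold or error_rate > self.error_rate_limit:
            return "Anomaly"
        return "Normal"
```

Because every observation, including anomalous ones, is appended to the window, a sustained spike gradually raises the baseline. Excluding anomalies from `history` is a reasonable alternative if that drift is unwanted.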

---

## 🧠 Example Output

✅ **Event Processed (Anomaly)**

```text
Component: api-service
Latency: 224 ms
Error Rate: 0.062
Status: Anomaly
Analysis: Error 404: Not Found
Healing Action: Restarted container (Found 3 similar incidents)
```
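One way a `Healing Action` line like the one above could be produced is sketched below. The rule ordering (restart on a high error rate, scale up on high latency), the limits, and the function name `choose_healing_action` are hypothetical. The README says only that healing actions are simulated. `similar_count` stands for the number of similar past incidents returned by the FAISS memory.

```python
def choose_healing_action(latency_ms: float, error_rate: float, similar_count: int,
                          latency_limit_ms: float = 150.0,
                          error_rate_limit: float = 0.05) -> str:
    """Return a simulated remediation message (hypothetical rules and limits)."""
    if error_rate > error_rate_limit:
        action = "Restarted container"   # errors: assume a restart clears bad state
    elif latency_ms > latency_limit_ms:
        action = "Scaled up instances"   # slow but not failing: add capacity
    else:
        return "No action needed"
    if similar_count > 0:
        noun = "incident" if similar_count == 1 else "incidents"
        action += f" (Found {similar_count} similar {noun})"
    return action
```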


---

## 🧩 Architecture Overview

```text
┌──────────────────────┐
│  Gradio Frontend UI  │
└──────────┬───────────┘
           │ (submit telemetry)
           ▼
┌──────────────────────┐
│  FastAPI /add-event  │
│ + API Key validation │
└──────────┬───────────┘
           │ (call)
           ▼
┌────────────────────────────┐
│ Hugging Face Inference API │
│ → Reliability insight text │
└──────────┬─────────────────┘
           │
           ▼
┌───────────────────────────────┐
│ FAISS + Sentence Transformers │
│ → Embedding + similarity map  │
└───────────────────────────────┘
```
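The last stage of the diagram, embedding plus similarity search, can be illustrated without FAISS installed. The sketch below is a pure-Python stand-in for an exact (flat) L2 index: it compares a query vector against every stored vector and returns the nearest ones. The class name `IncidentMemory`, the `count_similar` helper, and its distance radius are hypothetical. With the real libraries, vectors would come from `SentenceTransformer("all-MiniLM-L6-v2").encode(...)` (384 dimensions) and the index would be something like `faiss.IndexFlatL2(384)` with its `add` and `search` methods, which, like this sketch, report squared L2 distances.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentMemory:
    """Exhaustive nearest-neighbour search over stored incident embeddings.

    Pure-Python stand-in for a flat L2 FAISS index (hypothetical names).
    """
    dim: int
    vectors: list[list[float]] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)

    def add(self, vector: list[float], label: str) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.vectors.append(list(vector))
        self.labels.append(label)

    def search(self, query: list[float], k: int) -> list[tuple[float, str]]:
        """Return up to k (squared L2 distance, label) pairs, nearest first."""
        scored = [
            (sum((q - v) ** 2 for q, v in zip(query, vec)), label)
            for vec, label in zip(self.vectors, self.labels)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]

    def count_similar(self, query: list[float], radius: float, k: int = 10) -> int:
        """Count the k nearest incidents whose squared distance is within radius."""
        return sum(1 for dist, _ in self.search(query, k) if dist <= radius)
```

A flat search is exact but scans every stored vector, which is fine for a small incident history; FAISS also offers approximate indexes for larger collections.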

---

## 🧾 API Usage

**Endpoint:**  
`POST /add-event`

**Headers:**  
`X-API-Key: <your_api_key>`

**Body:**
```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

`latency` is in milliseconds (as in the example output above) and `error_rate` is a ratio.

**Response:**
```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```

---

## ⚙️ Run Locally

```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860 in your browser.
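With the app running, a client can call `POST /add-event` using only the standard library. The base URL is an assumption, since this README does not say whether the FastAPI routes share port 7860 with the Gradio UI. `build_add_event_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed values: point BASE_URL at wherever /add-event is actually served.
BASE_URL = "http://localhost:7860"
API_KEY = "<your_api_key>"


def build_add_event_request(component: str, latency: float, error_rate: float,
                            base_url: str = BASE_URL,
                            api_key: str = API_KEY) -> urllib.request.Request:
    """Build a POST /add-event request carrying the X-API-Key header."""
    body = json.dumps({
        "component": component,
        "latency": latency,
        "error_rate": error_rate,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/add-event",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_add_event_request("api-service", 200, 0.04))` posts the same body as the API Usage example. `urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, such as a rejected API key.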

---

## 🌍 Live Space & Collaboration

👉 Launch Live Demo on Hugging Face

👉 [Contribute or Fork on GitHub](https://github.com/petterjuan/agentic-reliability-framework)

---

## 🧭 Author

Juan D. Petter  
AI Engineer & Cloud Architect  
Building Agentic Systems for Scalable Automation | ex-NetApp  
🔗 LinkedIn • GitHub

---

## 🪪 License

MIT License © 2025 Juan D. Petter