---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---
# 🧠 Agentic Reliability Framework MVP
**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**
This project explores **agentic reliability systems** that blend observability, vector-based persistence, and AI inference to create self-healing cloud operations.
Built with:
- ⚡ **Gradio 5.49.1** for live visualization & dashboard UI
- 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support
- 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- 🔍 **FAISS** for similarity search across past incidents
- 🔒 **FileLock** for safe concurrent saves in multi-user environments
- 🤖 **Hugging Face Router Inference API** for adaptive reliability insights
- ☁️ **Python 3.10** runtime
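
To illustrate what "similarity search across past incidents" means, here is a minimal sketch that does not use FAISS or Sentence Transformers at all: it is a brute-force cosine-similarity stand-in over hand-made toy vectors. The class name `IncidentMemory` and its methods are hypothetical and are not taken from this project's code. In the real app, the vectors would come from the `all-MiniLM-L6-v2` model and the search would be delegated to a FAISS index.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class IncidentMemory:
    """Hypothetical brute-force stand-in for a vector index such as FAISS."""

    def __init__(self) -> None:
        self._vectors: List[Sequence[float]] = []
        self._texts: List[str] = []

    def add(self, vector: Sequence[float], text: str) -> None:
        """Store one incident description together with its embedding."""
        self._vectors.append(vector)
        self._texts.append(text)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return up to k stored incidents, most similar to the query first."""
        scored = [
            (text, cosine_similarity(query, vec))
            for vec, text in zip(self._vectors, self._texts)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Toy 2-D "embeddings"; real embeddings would have hundreds of dimensions.
memory = IncidentMemory()
memory.add([1.0, 0.0], "high latency on api-service")
memory.add([0.0, 1.0], "error spike on auth-service")
memory.add([0.9, 0.1], "latency creep on api-service")
similar = memory.search([1.0, 0.0], k=2)
```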
---
## 🚀 Features
| Capability | Description |
|-------------|--------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |
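
As a rough illustration of threshold-based detection on latency and error rate, the sketch below classifies a single event. The threshold values, the `"Normal"` label, and the names `Event`, `classify`, and `adaptive_latency_threshold` are assumptions for illustration; the README does not state the actual thresholds or how they adapt.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; the project's real values are not documented here.
LATENCY_THRESHOLD_MS = 150.0
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Event:
    component: str
    latency: float     # milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def classify(event: Event,
             latency_threshold: float = LATENCY_THRESHOLD_MS,
             error_threshold: float = ERROR_RATE_THRESHOLD) -> str:
    """Return "Anomaly" if either metric exceeds its threshold, else "Normal"."""
    if event.latency > latency_threshold or event.error_rate > error_threshold:
        return "Anomaly"
    return "Normal"


def adaptive_latency_threshold(recent_latencies: List[float],
                               factor: float = 1.5,
                               fallback: float = LATENCY_THRESHOLD_MS) -> float:
    """One possible adaptive rule (an assumption): scale the recent mean latency."""
    if not recent_latencies:
        return fallback
    return factor * sum(recent_latencies) / len(recent_latencies)


# The event from the example output in this README.
status = classify(Event("api-service", latency=224, error_rate=0.062))
```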
---
## 🧠 Example Output
✅ **Event Processed (Anomaly)**
Component: api-service
Latency: 224 ms
Error Rate: 0.062
Status: Anomaly
Analysis: Error 404: Not Found
Healing Action: Restarted container (Found 3 similar incidents)
---
## 🧩 Architecture Overview
    ┌──────────────────────┐
    │  Gradio Frontend UI  │
    └──────────┬───────────┘
               │ (submit telemetry)
               ▼
    ┌──────────────────────┐
    │  FastAPI /add-event  │
    │ + API Key validation │
    └──────────┬───────────┘
               │ (call)
               ▼
    ┌───────────────────────────────┐
    │  Hugging Face Inference API   │
    │  → Reliability insight text   │
    └──────────┬────────────────────┘
               │
               ▼
    ┌───────────────────────────────┐
    │ FAISS + Sentence Transformers │
    │ → Embedding + similarity map  │
    └───────────────────────────────┘
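
The API key validation step in the diagram could be sketched, independently of any web framework, as a constant-time comparison of the `X-API-Key` header against a configured key. The function name `is_authorized` and the way the expected key is passed in are assumptions; the project's actual FastAPI code may differ. In a FastAPI route, a failed check would typically be turned into a 401 or 403 response.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check the X-API-Key header against the configured key (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied: Optional[str] = normalized.get("x-api-key")
    # Reject missing headers, and refuse everything if no key is configured.
    if supplied is None or not expected_key:
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```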
---
## 🧾 API Usage
**Endpoint:**
`POST /add-event`
**Headers:**
`X-API-Key: <your_api_key>`
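
A client call that combines this endpoint and header with the JSON body described in this section might look like the following sketch, which uses only the Python standard library. The base URL is an assumption (the local port mentioned elsewhere in this README), and `build_add_event_request` and `send_event` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed base URL; adjust to wherever the service is actually running.
BASE_URL = "http://localhost:7860"


def build_add_event_request(api_key: str, component: str, latency: float,
                            error_rate: float,
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /add-event request."""
    payload = {"component": component, "latency": latency, "error_rate": error_rate}
    return urllib.request.Request(
        url=f"{base_url}/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send_event(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request needs no network access; call send_event(request) to post it.
request = build_add_event_request("<your_api_key>", "api-service", 200, 0.04)
```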
**Body:**
```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

**Response:**

```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```

---

## ⚙️ Local Setup

```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860

---

## 🌍 Live Space & Collaboration

👉 Launch Live Demo on Hugging Face
👉 [Contribute or Fork on GitHub](https://github.com/petterjuan/agentic-reliability-framework)

---

## 🧭 Author

**Juan D. Petter**
AI Engineer & Cloud Architect
Building Agentic Systems for Scalable Automation | ex-NetApp
🔗 LinkedIn • GitHub

---

## 🪪 License

MIT License © 2025 Juan D. Petter