---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---

# 🧠 Agentic Reliability Framework MVP

**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**

This project explores **agentic reliability systems**, blending observability, vector-based persistence, and AI inference to create self-healing cloud operations.

Built with:
- ⚡ **Gradio 5.49.1** for live visualization & dashboard UI
- 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support
- 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- **FAISS** for similarity search across past incidents
- **FileLock** for safe concurrent saves in multi-user environments
- 🤗 **Hugging Face Router Inference API** for adaptive reliability insights
- ⚙️ **Python 3.10** runtime

---

## Features

| Capability | Description |
|-------------|--------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |
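
As a rough illustration of the first and third rows of the table, the sketch below classifies an event against fixed latency and error-rate thresholds and then picks a simulated healing action. The threshold values, the `Event` class, the `"Normal"` status label, and the action policy are all assumptions made for this example; the project itself may compute its thresholds adaptively and choose actions differently.

```python
from dataclasses import dataclass

# Hypothetical thresholds chosen for illustration only; the real project
# may derive these values adaptively.
LATENCY_THRESHOLD_MS = 150.0
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Event:
    component: str
    latency: float      # milliseconds
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def classify(event: Event) -> str:
    """Return "Anomaly" if either metric exceeds its threshold, else "Normal"."""
    if event.latency > LATENCY_THRESHOLD_MS or event.error_rate > ERROR_RATE_THRESHOLD:
        return "Anomaly"
    return "Normal"


def choose_healing_action(event: Event) -> str:
    """Pick a simulated healing action; nothing is actually restarted or scaled."""
    if classify(event) != "Anomaly":
        return "No action needed"
    # Assumed policy: error spikes suggest a restart, slow responses a scale-up.
    if event.error_rate > ERROR_RATE_THRESHOLD:
        return "Restarted container"
    return "Scaled up instances"


if __name__ == "__main__":
    sample = Event(component="api-service", latency=224, error_rate=0.062)
    print(classify(sample), "->", choose_healing_action(sample))
```

Keeping classification and action selection in separate functions makes it easy to swap the fixed thresholds for a rolling baseline later without touching the healing policy.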

---

## Example Output

✅ **Event Processed (Anomaly)**

    Component: api-service
    Latency: 224 ms
    Error Rate: 0.062
    Status: Anomaly
    Analysis: Error 404: Not Found
    Healing Action: Restarted container (Found 3 similar incidents)

---

## 🧩 Architecture Overview

    ┌───────────────────────────────┐
    │ Gradio Frontend UI            │
    └───────────────┬───────────────┘
                    │ (submit telemetry)
                    ▼
    ┌───────────────────────────────┐
    │ FastAPI /add-event            │
    │ + API Key validation          │
    └───────────────┬───────────────┘
                    │ (call)
                    ▼
    ┌───────────────────────────────┐
    │ Hugging Face Inference API    │
    │ → Reliability insight text    │
    └───────────────┬───────────────┘
                    │
                    ▼
    ┌───────────────────────────────┐
    │ FAISS + Sentence Transformers │
    │ → Embedding + similarity map  │
    └───────────────────────────────┘
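
The final stage of the diagram stores one embedding per incident and looks up the most similar past incidents. The sketch below shows that idea with a brute-force cosine-similarity search in plain Python. It is a stand-in for FAISS, not the project's code: `IncidentMemory` is a hypothetical class, and the three-dimensional vectors are hand-made placeholders for the 384-dimensional embeddings that `all-MiniLM-L6-v2` would produce.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class IncidentMemory:
    """Hypothetical in-memory store standing in for a FAISS index."""

    def __init__(self) -> None:
        self._items: List[Tuple[Vector, str]] = []

    def add(self, embedding: Vector, description: str) -> None:
        """Remember one incident together with its embedding."""
        self._items.append((embedding, description))

    def search(self, query: Vector, k: int = 3) -> List[Tuple[float, str]]:
        """Return up to k (similarity, description) pairs, most similar first."""
        scored = [(cosine_similarity(query, vec), text) for vec, text in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    memory = IncidentMemory()
    # Toy vectors; a real embedding model would generate these from text.
    memory.add([1.0, 0.0, 0.0], "latency spike on api-service")
    memory.add([0.0, 1.0, 0.0], "error burst on auth-service")
    memory.add([0.9, 0.1, 0.0], "slow responses on api-service")
    for score, text in memory.search([1.0, 0.0, 0.0], k=2):
        print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for a handful of incidents. An index such as FAISS becomes worthwhile once the store grows large enough that comparing against every stored vector is too slow.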

---

## 🧾 API Usage

**Endpoint:**
`POST /add-event`

**Headers:**
`X-API-Key: <your_api_key>`

**Body:**
```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

**Response:**
```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```

**Run locally:**
```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860 in your browser.
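
With the app running, the `/add-event` endpoint can be exercised from a small client. The sketch below uses only the Python standard library. The helper names `build_add_event_request` and `send_event` are made up for this example, and it assumes the REST endpoint is reachable at the same base URL as the UI, which may not match how the project is actually served.

```python
import json
import urllib.request


def build_add_event_request(base_url: str, api_key: str, event: dict) -> urllib.request.Request:
    """Build a POST request for /add-event carrying the X-API-Key header."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/add-event",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def send_event(base_url: str, api_key: str, event: dict, timeout: float = 10.0) -> dict:
    """Send the event and decode the JSON response."""
    request = build_add_event_request(base_url, api_key, event)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumed base URL; replace the placeholder key with your own.
    result = send_event(
        "http://localhost:7860",
        "<your_api_key>",
        {"component": "api-service", "latency": 200, "error_rate": 0.04},
    )
    print(result)
```

Building the request separately from sending it lets you check the URL, headers, and payload without a running server.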

## Live Space & Collaboration

Launch Live Demo on Hugging Face

Contribute or Fork on GitHub

## Author

Juan D. Petter
AI Engineer & Cloud Architect
Building Agentic Systems for Scalable Automation | ex-NetApp
LinkedIn • GitHub

## 🪪 License

MIT License © 2025 Juan D. Petter