---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---

# 🧠 Agentic Reliability Framework MVP

**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**

This project explores **agentic reliability systems**, blending observability, vector-based persistence, and AI inference to create self-healing cloud operations.

Built with:

- ⚡ **Gradio 5.49.1** for live visualization & dashboard UI
- 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support
- 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- 🔍 **FAISS** for similarity search across past incidents
- 🔒 **FileLock** for safe concurrent saves in multi-user environments
- 🤖 **Hugging Face Router Inference API** for adaptive reliability insights
- ☁️ **Python 3.10** runtime

---

## 🚀 Features

| Capability | Description |
|------------|-------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |

---

## 🧠 Example Output

✅ **Event Processed (Anomaly)**

- Component: api-service
- Latency: 224 ms
- Error Rate: 0.062
- Status: Anomaly
- Analysis: Error 404: Not Found
- Healing Action: Restarted container (Found 3 similar incidents)

---

## 🧩 Architecture Overview

```text
┌──────────────────────┐
│  Gradio Frontend UI  │
└─────────┬────────────┘
          │ (submit telemetry)
          ▼
┌──────────────────────┐
│  FastAPI /add-event  │
│ + API Key validation │
└─────────┬────────────┘
          │ (call)
          ▼
┌───────────────────────────────┐
│ Hugging Face Inference API    │
│ → Reliability insight text    │
└─────────┬─────────────────────┘
          │
          ▼
┌───────────────────────────────┐
│ FAISS + Sentence Transformers │
│ → Embedding + similarity map  │
└───────────────────────────────┘
```

---

## 🧾 API Usage

**Endpoint:** `POST /add-event`

**Headers:** `X-API-Key: <your-api-key>`

**Body:**

```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

**Response:**

```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```

To run locally:

```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860

## 🌍 Live Space & Collaboration

👉 Launch Live Demo on Hugging Face

👉 Contribute or Fork on GitHub

## 🧭 Author

Juan D. Petter

AI Engineer & Cloud Architect

Building Agentic Systems for Scalable Automation | ex-NetApp

🔗 LinkedIn • GitHub

## 🪪 License

MIT License © 2025 Juan D. Petter
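
As a rough illustration of the `POST /add-event` call described under API Usage, the sketch below builds the request with Python's standard library. The base URL, port, and key value are assumptions for illustration only (the README does not say where the FastAPI endpoint is served), and `build_add_event_request` and `send_event` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request

# Assumptions (not stated in the README): the FastAPI app is reachable at
# BASE_URL and the key matches whatever the server was configured with.
BASE_URL = "http://localhost:8000"  # hypothetical host and port
API_KEY = "your-api-key"            # placeholder value


def build_add_event_request(component, latency, error_rate,
                            base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) a POST /add-event request."""
    payload = {
        "component": component,
        "latency": latency,
        "error_rate": error_rate,
    }
    return urllib.request.Request(
        url=f"{base_url}/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # header name taken from the README
        },
        method="POST",
    )


def send_event(request, timeout=10):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server at BASE_URL):
#   req = build_add_event_request("api-service", 200, 0.04)
#   print(send_event(req))
```

Separating request construction from sending keeps the payload and headers easy to inspect before anything goes over the network.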