---
title: Agentic Reliability Framework MVP
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
python_version: '3.10'
license: mit
---
# 🧠 Agentic Reliability Framework MVP

Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.

This project explores agentic reliability systems, blending observability, vector-based persistence, and AI inference to create self-healing cloud operations.
Built with:

- ⚡ Gradio 5.49.1 for live visualization & dashboard UI
- 🧩 FastAPI for REST endpoints (`/add-event`) with API key support
- 🧠 Sentence Transformers (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- FAISS for similarity search across past incidents
- FileLock for safe concurrent saves in multi-user environments
- Hugging Face Router Inference API for adaptive reliability insights
- ⚙️ Python 3.10 runtime
## Features
| Capability | Description |
|---|---|
| Adaptive Anomaly Detection | Detects anomalies dynamically based on latency and error-rate thresholds |
| AI Root Cause Analysis | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| Self-Healing Actions | Simulates healing actions (scale-up, restart, etc.) |
| Persistent Memory (FAISS) | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| Secure REST API | `/add-event` endpoint secured by the `X-API-Key` header |
| Interactive Gradio UI | Visualize, test, and analyze events live in your browser |
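The README does not spell out the detection rule, so the following is only a minimal sketch of one way "adaptive" latency and error-rate thresholds could work, assuming a rolling window per metric: an event is flagged when a metric exceeds the larger of a static floor and the rolling mean plus `k` standard deviations. The class `AdaptiveDetector` and every default (150 ms, 0.05, `k=3`, a 50-event window, 5 warm-up events) are hypothetical and not taken from `app.py`.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveDetector:
    """Hypothetical adaptive detector, not the project's actual logic.

    An event is anomalous when its latency or error rate exceeds
    max(static floor, rolling mean + k * rolling population std-dev).
    """

    def __init__(self, latency_floor_ms=150.0, error_rate_floor=0.05,
                 k=3.0, window=50, warmup=5):
        self.latency_floor_ms = latency_floor_ms  # assumed static floor
        self.error_rate_floor = error_rate_floor  # assumed static floor
        self.k = k                                # assumed sensitivity
        self.warmup = warmup                      # events needed before adapting
        self.latencies = deque(maxlen=window)     # rolling latency history
        self.error_rates = deque(maxlen=window)   # rolling error-rate history

    def _threshold(self, history, floor):
        # With too little history, fall back to the static floor alone.
        if len(history) < self.warmup:
            return floor
        return max(floor, mean(history) + self.k * pstdev(history))

    def observe(self, latency_ms, error_rate):
        """Return True if this event is anomalous, then add it to history."""
        is_anomaly = (
            latency_ms > self._threshold(self.latencies, self.latency_floor_ms)
            or error_rate > self._threshold(self.error_rates, self.error_rate_floor)
        )
        # Record after checking so an outlier cannot raise its own threshold.
        self.latencies.append(latency_ms)
        self.error_rates.append(error_rate)
        return is_anomaly


detector = AdaptiveDetector()
for _ in range(10):
    detector.observe(100, 0.01)        # normal traffic builds a baseline
print(detector.observe(224, 0.062))    # True (values from the example output)
```

Taking the maximum keeps the static floor as a safety net while letting the threshold rise for components whose normal latency is already high; checking before recording keeps a single outlier from inflating its own threshold.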
## Example Output

```text
Event Processed (Anomaly)

Component: api-service
Latency: 224 ms
Error Rate: 0.062
Status: Anomaly
Analysis: Error 404: Not Found
Healing Action: Restarted container (Found 3 similar incidents)
```
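The healing action in the example output pairs a simulated remediation with a count of similar past incidents. One hypothetical way to produce that string is a small rule table; the function name, the limits, and the "Scaled up instances" wording are illustrative assumptions, and the project's own rules may differ.

```python
def choose_healing_action(latency_ms, error_rate, similar_count,
                          latency_limit_ms=150.0, error_rate_limit=0.05):
    """Hypothetical rule table mapping symptoms to a simulated action."""
    if error_rate > error_rate_limit:
        action = "Restarted container"   # elevated errors: recycle the instance
    elif latency_ms > latency_limit_ms:
        action = "Scaled up instances"   # slow but not failing: add capacity
    else:
        return "No action required"
    if similar_count > 0:
        # Mention matches retrieved from incident memory, pluralized correctly.
        noun = "incident" if similar_count == 1 else "incidents"
        action += f" (Found {similar_count} similar {noun})"
    return action


print(choose_healing_action(224, 0.062, similar_count=3))
# Restarted container (Found 3 similar incidents)
```

Checking the error rate first means a component that is both slow and failing gets restarted rather than scaled, on the assumption that errors point to a broken instance more often than to missing capacity.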
## 🧩 Architecture Overview

```text
┌────────────────────────────────┐
│       Gradio Frontend UI       │
└───────────────┬────────────────┘
                │ (submit telemetry)
                ▼
┌────────────────────────────────┐
│       FastAPI /add-event       │
│      + API Key validation      │
└───────────────┬────────────────┘
                │ (call)
                ▼
┌────────────────────────────────┐
│   Hugging Face Inference API   │
│   → Reliability insight text   │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ FAISS + Sentence Transformers  │
│  → Embedding + similarity map  │
└────────────────────────────────┘
```
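The last stage of the pipeline retrieves similar incidents from vector memory. Running FAISS and Sentence Transformers requires both packages and a model download, so the sketch below swaps in a brute-force, pure-Python search over toy 3-dimensional vectors (real `all-MiniLM-L6-v2` embeddings have 384 dimensions). With unit-length vectors the dot product equals cosine similarity, which is the score a FAISS flat inner-product index (`IndexFlatIP`) returns for normalized inputs. `IncidentMemory`, `min_score`, and the sample incidents are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class IncidentMemory:
    """Brute-force stand-in for a FAISS flat inner-product index (hypothetical)."""

    def __init__(self):
        self.vectors = []  # unit-length embeddings of past incidents
        self.labels = []   # human-readable incident descriptions

    def add(self, embedding, label):
        self.vectors.append(normalize(embedding))
        self.labels.append(label)

    def search(self, embedding, k=3, min_score=0.8):
        """Return up to k (score, label) pairs with cosine >= min_score, best first."""
        query = normalize(embedding)
        scored = [
            (sum(q * v for q, v in zip(query, vec)), label)
            for vec, label in zip(self.vectors, self.labels)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [pair for pair in scored[:k] if pair[0] >= min_score]


memory = IncidentMemory()
memory.add([0.9, 0.1, 0.0], "api-service latency spike")
memory.add([0.8, 0.2, 0.1], "api-service error burst")
memory.add([0.0, 0.1, 0.9], "db-service disk full")
matches = memory.search([0.85, 0.15, 0.05])
print(f"Found {len(matches)} similar incidents")  # the db incident scores too low
```

A flat index compares the query against every stored vector, which is exact and adequate for an MVP-sized incident history; FAISS also offers approximate indexes that trade some recall for speed once memory grows large.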
## 🧾 API Usage

**Endpoint:** `POST /add-event`

**Headers:** `X-API-Key: <your_api_key>`

**Body** (`latency` in milliseconds, `error_rate` as a fraction between 0 and 1):

```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

**Response:**

```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```
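Both sides of the `X-API-Key` handshake can be sketched with the standard library alone. `build_add_event_request` constructs a `POST /add-event` request with the documented body and header without sending it; `is_authorized` shows a constant-time key comparison a server might perform. Both function names are hypothetical, and the base URL is an assumption: this README does not say which host and port serve `/add-event`.

```python
import hmac
import json
import urllib.request


def build_add_event_request(base_url, api_key, component, latency, error_rate):
    """Build (but do not send) a POST /add-event request with the API key header."""
    body = json.dumps(
        {"component": component, "latency": latency, "error_rate": error_rate}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/add-event",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def is_authorized(provided_key, expected_key):
    """Server-side check: reject missing keys, compare without short-circuiting."""
    if not provided_key:
        return False
    return hmac.compare_digest(provided_key.encode("utf-8"),
                               expected_key.encode("utf-8"))


# Assumed base URL and key; adjust to wherever /add-event is actually served.
request = build_add_event_request("http://localhost:7860", "my-api-key",
                                  "api-service", 200, 0.04)
print(request.get_method(), request.full_url)
# To send it against a running server:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["event"]["status"])
```

`hmac.compare_digest` avoids short-circuiting on the first differing character, so response timing does not reveal how much of a guessed key was correct.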
## Run Locally

```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860 in your browser.
## Live Space & Collaboration

- Launch Live Demo on Hugging Face
- [Contribute or Fork on GitHub](https://github.com/petterjuan/agentic-reliability-framework)
## Author

**Juan D. Petter**, AI Engineer & Cloud Architect

Building Agentic Systems for Scalable Automation | ex-NetApp

LinkedIn • GitHub
## 🪪 License

MIT License © 2025 Juan D. Petter