	---
	title: "Agentic Reliability Framework MVP"
	emoji: "🧠"
	colorFrom: "indigo"
	colorTo: "blue"
	sdk: "gradio"
	sdk_version: "5.49.1"
	app_file: "app.py"
	pinned: true
	python_version: "3.10"
	license: "mit"
	---

	# 🧠 Agentic Reliability Framework MVP

	Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.

	This project explores agentic reliability systems: it combines observability, vector-based persistence, and AI inference to build self-healing cloud operations.

	Built with:
	- ⚡ Gradio 5.49.1 for live visualization & dashboard UI
	- 🧩 FastAPI for REST endpoints (`/add-event`) with API key support
	- 🧠 Sentence Transformers (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
	- 🔍 FAISS for similarity search across past incidents
	- 🔒 FileLock for safe concurrent saves in multi-user environments
	- 🤖 Hugging Face Router Inference API for adaptive reliability insights
	- ☁️ Python 3.10 runtime

	---

	## 🚀 Features

	| Capability | Description |
	|------------|-------------|
	| Adaptive Anomaly Detection | Detects anomalies dynamically based on latency and error-rate thresholds |
	| AI Root Cause Analysis | Uses the Hugging Face Inference API for contextual one-line incident summaries |
	| Self-Healing Actions | Simulates healing actions (scale-up, restart, etc.) |
	| Persistent Memory (FAISS) | Learns from prior incidents, clusters patterns, and retrieves similar cases |
	| Secure REST API | `/add-event` endpoint secured by `X-API-Key` header |
	| Interactive Gradio UI | Visualize, test, and analyze events live in your browser |
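
The detection and healing rows above can be illustrated with a minimal sketch. It uses fixed thresholds, which is simpler than the adaptive detection the project describes. The threshold values, the `Event` class, the `"Normal"` status, and the rule that maps metrics to healing actions are all assumptions made for illustration; they are not taken from the project's `app.py`.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the project's actual values are not stated in this README.
LATENCY_THRESHOLD_MS = 150.0
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Event:
    component: str
    latency: float      # milliseconds
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def classify(event: Event) -> str:
    """Return "Anomaly" if either metric exceeds its threshold, else "Normal"."""
    if event.latency > LATENCY_THRESHOLD_MS or event.error_rate > ERROR_RATE_THRESHOLD:
        return "Anomaly"
    return "Normal"


def simulate_healing(event: Event) -> str:
    """Pick a simulated healing action; nothing is actually restarted or scaled."""
    if classify(event) == "Normal":
        return "No action needed"
    # Illustrative rule (an assumption): error spikes suggest a restart,
    # slow responses without an error spike suggest a scale-up.
    if event.error_rate > ERROR_RATE_THRESHOLD:
        return "Restarted container"
    return "Scaled up instances"
```

With these assumed thresholds, `classify(Event("api-service", 224, 0.062))` reports an anomaly because both metrics are above their limits, while a 200 ms event with a 0.04 error rate is flagged on latency alone.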

	---

	## 🧠 Example Output

	✅ Event Processed (Anomaly)

	Component: api-service
	Latency: 224 ms
	Error Rate: 0.062
	Status: Anomaly
	Analysis: Error 404: Not Found
	Healing Action: Restarted container (Found 3 similar incidents)


	---

	## 🧩 Architecture Overview

	┌──────────────────────┐
	│  Gradio Frontend UI  │
	└─────────┬────────────┘
	          │ (submit telemetry)
	          ▼
	┌──────────────────────┐
	│  FastAPI /add-event  │
	│ + API Key validation │
	└─────────┬────────────┘
	          │ (call)
	          ▼
	┌─────────────────────────────┐
	│ Hugging Face Inference API  │
	│ → Reliability insight text  │
	└─────────┬───────────────────┘
	          │
	          ▼
	┌───────────────────────────────┐
	│ FAISS + Sentence Transformers │
	│ → Embedding + similarity map  │
	└───────────────────────────────┘
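
The last stage of the diagram, retrieving similar past incidents by embedding, can be sketched without external dependencies. The class below is a pure-Python stand-in, not FAISS: it does a brute-force cosine-similarity search over stored vectors, which is roughly what an exact inner-product index computes on normalized vectors. The `IncidentMemory` name and its methods are hypothetical, and the vectors are assumed to come from an embedding model such as `all-MiniLM-L6-v2`.

```python
import math
from typing import List, Tuple


def _normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class IncidentMemory:
    """Brute-force similarity store (a stand-in for a FAISS index)."""

    def __init__(self) -> None:
        self._vectors: List[List[float]] = []
        self._descriptions: List[str] = []

    def add(self, vector: List[float], description: str) -> None:
        """Store one incident embedding together with its description."""
        if self._vectors and len(vector) != len(self._vectors[0]):
            raise ValueError("all vectors must have the same dimension")
        self._vectors.append(_normalize(vector))
        self._descriptions.append(description)

    def search(self, query: List[float], k: int = 3) -> List[Tuple[float, str]]:
        """Return up to k (similarity, description) pairs, most similar first."""
        if self._vectors and len(query) != len(self._vectors[0]):
            raise ValueError("query dimension does not match stored vectors")
        q = _normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), description)
            for v, description in zip(self._vectors, self._descriptions)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

In a real deployment the stored vectors would be persisted to disk, and the search would be delegated to FAISS for speed. The sketch only shows the retrieval logic that yields a result such as "Found 3 similar incidents".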

	---

	## 🧾 API Usage

	Endpoint:
	`POST /add-event`

	Headers:
	`X-API-Key: <your_api_key>`

	Body:
	```json
	{
	  "component": "api-service",
	  "latency": 200,
	  "error_rate": 0.04
	}
	```

	Response:
	```json
	{
	  "status": "ok",
	  "event": {
	    "timestamp": "2025-11-08 23:29:03",
	    "component": "api-service",
	    "status": "Anomaly",
	    "analysis": "Error 404: Not Found",
	    "healing_action": "Restarted container Found 3 similar incidents ..."
	  }
	}
	```

	---

	## ⚙️ Local Setup

	```bash
	git clone https://github.com/petterjuan/agentic-reliability-framework.git
	cd agentic-reliability-framework
	pip install -r requirements.txt
	python app.py
	```

	Then open http://localhost:7860
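
With the app running, the `POST /add-event` call from the API Usage section can be exercised from Python using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the README does not say whether the REST endpoint is served on the same host and port as the Gradio UI, so the base URL is left as a parameter.

```python
import json
import urllib.request


def build_add_event_request(base_url: str, api_key: str, component: str,
                            latency: float, error_rate: float) -> urllib.request.Request:
    """Build (but do not send) a POST /add-event request with the API key header."""
    payload = {"component": component, "latency": latency, "error_rate": error_rate}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send_event(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_event(build_add_event_request("http://localhost:7860", "<your_api_key>", "api-service", 200, 0.04))` would submit the sample body shown earlier, assuming the API is reachable at that address.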

	---

	## 🌍 Live Space & Collaboration

	👉 Launch Live Demo on Hugging Face

	👉 Contribute or Fork on GitHub

	---

	## 🧭 Author

	Juan D. Petter
	AI Engineer & Cloud Architect
	Building Agentic Systems for Scalable Automation | ex-NetApp
	🔗 LinkedIn • GitHub

	---

	## 🪪 License

	MIT License © 2025 Juan D. Petter