---
title: matrix-ai
emoji: 🧠
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---

# matrix-ai

**matrix-ai** is the AI planning microservice for the Matrix EcoSystem. It generates short, low‑risk, auditable remediation plans from a compact health context provided by Matrix Guardian. The service is designed for Hugging Face Spaces or Inference Endpoints, but also runs locally.

## Endpoints

- `POST /v1/plan` – internal API for Matrix Guardian: returns a safe JSON plan.
- `POST /v1/chat` – (optional) RAG-style Q&A about MatrixHub (kept lightweight in Stage‑1).

The service emphasizes safety, performance, and auditability:

- Strict, schema‑validated JSON plans (bounded steps, risk label, rationale)
- PII redaction before calling upstream model endpoints
- Exponential backoff, short timeouts, and structured JSON logs
- In‑memory rate limiting (per‑IP), optional auth for private deployments
- ETag support and response caching for non‑mutating reads (sketched below)
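The ETag path is easy to picture. A minimal sketch, assuming a hypothetical non‑mutating `GET /v1/status` route (the route, hashing scheme, and weak‑ETag format are illustrative, not the service's actual implementation):

```python
import hashlib
import json

from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.get("/v1/status")  # hypothetical non-mutating read
async def get_status(request: Request, response: Response):
    body = {"service": "matrix-ai", "status": "ok"}
    # Derive a weak ETag from the canonical JSON payload.
    etag = 'W/"' + hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()[:16] + '"'
    # If the client already holds this representation, skip the body.
    if request.headers.get("if-none-match") == etag:
        return Response(status_code=304, headers={"ETag": etag})
    response.headers["ETag"] = etag
    return body
```

A conditional `If-None-Match` hit then costs a hash comparison instead of a full response body.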

_Last updated: 2025‑09‑27 (UTC)_


## Architecture (at a glance)

```mermaid
flowchart LR
  subgraph Client[Matrix Operators / Observers]
  end

  Client -->|monitor| HubAPI[Matrix‑Hub API]
  Guardian[Matrix‑Guardian<br/>control plane] -->|/v1/plan| AI[matrix‑ai<br/>HF Space]
  Guardian -->|/status, /apps, ...| HubAPI
  HubAPI <-->|SQL| DB[(MatrixDB<br/>Postgres)]

  AI -->|HF Inference| HF[Hugging Face<br/>Inference API]

  classDef svc fill:#0ea5e9,stroke:#0b4,stroke-width:1,color:#fff
  classDef db fill:#f59e0b,stroke:#0b4,stroke-width:1,color:#fff
  class Guardian,AI,HubAPI svc
  class DB db
```

### Sequence: `POST /v1/plan`

```mermaid
sequenceDiagram
  participant G as Matrix‑Guardian
  participant A as matrix‑ai
  participant H as HF Inference

  G->>A: POST /v1/plan { context, constraints }
  A->>A: redact PII, validate payload
  A->>H: model.generate prompt [retries, timeout]
  H-->>A: model output text
  A->>A: parse → strict JSON plan (fallback if needed)
  A-->>G: 200 { plan_id, steps[], risk, explanation }
```
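The "parse → fallback" step is the safety hinge of this flow. A sketch of how it might look, assuming a hypothetical `Plan` Pydantic model (the real schema lives in the service; field names here mirror the API examples below):

```python
import json
import re
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class PlanStep(BaseModel):
    action: str
    # Extra step fields (target, retries, ...) omitted in this sketch.

class Plan(BaseModel):
    plan_id: str
    risk: Literal["low", "medium", "high"]
    steps: list[PlanStep] = Field(max_length=3)  # bounded steps
    explanation: str

def parse_plan(model_output: str) -> Plan:
    """Extract the first JSON object from model output and validate it."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    try:
        return Plan.model_validate(json.loads(match.group(0) if match else ""))
    except (ValidationError, json.JSONDecodeError):
        # Fallback: a conservative no-op-ish plan rather than an unsafe shape.
        return Plan(
            plan_id="pln_fallback",
            risk="low",
            steps=[PlanStep(action="reprobe")],
            explanation="Model output could not be parsed; defaulting to a re-probe.",
        )
```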

## Quick Start (Local Development)

```bash
# 1) Create venv
python3 -m venv .venv
source .venv/bin/activate

# 2) Install deps
pip install -r requirements.txt

# 3) Configure env (local only; use Space Secrets in prod)
export HF_TOKEN="your_hugging_face_token"

# 4) Run
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

OpenAPI docs: <http://localhost:7860/docs>
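To smoke‑test the running server, a minimal client call (the payload mirrors the request example in the API section below; `httpx` is an assumption, any HTTP client works):

```python
import httpx

payload = {
    "context": {
        "entity_uid": "matrix-ai",
        "health": {"score": 0.64, "status": "degraded",
                   "last_checked": "2025-09-27T00:00:00Z"},
        "recent_checks": [],
    },
    "constraints": {"max_steps": 3, "risk": "low"},
}

resp = httpx.post("http://localhost:7860/v1/plan", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["steps"])  # the generated plan steps, on a 200 response
```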


## Deploy to Hugging Face Spaces

1. Push the repository to a new Space.
2. In **Settings → Secrets**, add:
   - `HF_TOKEN` (required) – used by the upstream HF Inference client
   - `ADMIN_TOKEN` (optional) – if set, gates `/v1/plan` and `/v1/chat` behind bearer auth
3. Choose hardware. CPU is fine for tests; GPU is recommended for larger models.
4. The Space serves FastAPI on the default port (7860, as in the Quick Start); both endpoints are then ready.

For Inference Endpoints, mirror the same environment variables and start command.


## Configuration

All options can be set via environment variables (Space Secrets on HF) or a local `.env` file.

| Variable | Default | Purpose |
| --- | --- | --- |
| `HF_TOKEN` | – | Token for the Hugging Face Inference API (required) |
| `MODEL_NAME` | `meta-llama/Meta-Llama-3.1-8B-Instruct` | Upstream model ID (example) |
| `MAX_NEW_TOKENS` | `256` | Output token cap for plan generations |
| `TEMPERATURE` | `0.2` | Generation temperature |
| `RATE_LIMIT_PER_MIN` | `120` | Per‑IP fixed‑window limit |
| `REQUEST_TIMEOUT_SEC` | `15` | HTTP client timeout to HF |
| `RETRY_MAX_ATTEMPTS` | `3` | Retry budget to HF |
| `CACHE_TTL_SEC` | `30` | Optional in‑memory caching for GET responses |
| `ADMIN_TOKEN` | – | If set, requires `Authorization: Bearer <ADMIN_TOKEN>` |
| `LOG_LEVEL` | `INFO` | Log level (JSON logs) |

_Names are illustrative; keep them in sync with your `configs/settings.yaml` if present._
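As one way to wire these together, a hedged sketch using `pydantic-settings` (field names follow the table above; whether the service actually uses `pydantic-settings` is an assumption):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Environment-driven configuration (Space Secrets on HF, .env locally)."""
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    hf_token: str  # maps to HF_TOKEN; required, no default
    model_name: str = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    max_new_tokens: int = 256
    temperature: float = 0.2
    rate_limit_per_min: int = 120
    request_timeout_sec: int = 15
    retry_max_attempts: int = 3
    cache_ttl_sec: int = 30
    admin_token: str | None = None  # enables bearer auth when set
    log_level: str = "INFO"

settings = Settings()  # raises a validation error if HF_TOKEN is unset
```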


## API

### `POST /v1/plan`

**Description:** Generate a short, low‑risk remediation plan from a compact app health context.

**Headers**

```http
Content-Type: application/json
Authorization: Bearer <ADMIN_TOKEN>   # required iff ADMIN_TOKEN is set
```

**Request body (example)**

```json
{
  "context": {
    "entity_uid": "matrix-ai",
    "health": {"score": 0.64, "status": "degraded", "last_checked": "2025-09-27T00:00:00Z"},
    "recent_checks": [
      {"check": "http", "result": "fail", "latency_ms": 900, "ts": "2025-09-27T00:00:00Z"}
    ]
  },
  "constraints": {"max_steps": 3, "risk": "low"}
}
```

**Response (example)**

```json
{
  "plan_id": "pln_01J9YX2H6ZP9R2K9THT2J9F7G4",
  "risk": "low",
  "steps": [
    {"action": "reprobe", "target": "https://service/health", "retries": 2},
    {"action": "pin_lkg", "entity_uid": "matrix-ai"}
  ],
  "explanation": "Transient HTTP failures observed; re-probe and pin to last-known-good if still failing."
}
```

**Status codes**

- `200` – plan generated
- `400` – invalid payload (fails schema validation)
- `401`/`403` – missing/invalid bearer token (only if `ADMIN_TOKEN` is configured)
- `429` – rate limited
- `502` – upstream model error after retries
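From the caller's side, only `429` and `502` are worth retrying; a client‑side sketch (function name and backoff schedule are illustrative):

```python
import time

import httpx

PLAN_URL = "http://localhost:7860/v1/plan"

def request_plan(payload: dict, attempts: int = 3) -> dict:
    """POST to /v1/plan, retrying with exponential backoff on 429/502."""
    for attempt in range(attempts):
        resp = httpx.post(PLAN_URL, json=payload, timeout=30)
        if resp.status_code in (429, 502) and attempt < attempts - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts
            continue
        resp.raise_for_status()  # raises for any remaining error status
        return resp.json()
```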

### `POST /v1/chat`

Optional Stage‑1 placeholder. Given a query about MatrixHub, it returns an answer, with citations if a local knowledge base is configured.


## Safety & Reliability

- **PII redaction** – tokens/emails are removed from prompts as a pre‑filter (sketched below)
- **Strict schema** – JSON plan parsing with fallbacks; unsafe shapes are rejected
- **Time‑boxed** – short timeouts and bounded retries to HF Inference
- **Rate‑limited** – per‑IP fixed window (configurable)
- **Structured logs** – JSON logs only; no sensitive payloads are logged
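As a rough illustration of the pre‑filter idea (the patterns are illustrative; the service's actual redaction rules may be broader):

```python
import re

# Illustrative patterns: emails and common bearer/HF-style tokens.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_TOKEN = re.compile(r"\b(?:hf_|sk-|Bearer\s+)[A-Za-z0-9._-]{8,}\b")

def redact(text: str) -> str:
    """Replace likely PII/secrets with placeholders before prompting."""
    text = _EMAIL.sub("[REDACTED_EMAIL]", text)
    return _TOKEN.sub("[REDACTED_TOKEN]", text)

print(redact("contact ops@example.com, token hf_abc123XYZ456"))
# -> contact [REDACTED_EMAIL], token [REDACTED_TOKEN]
```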

## Observability

- Request IDs (correlated across Guardian ↔ AI; see the middleware sketch below)
- Latency and retry counters
- Plan success/failure metrics (Prometheus‑friendly if you expose a metrics endpoint)
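Request‑ID correlation fits naturally in a small middleware. A sketch (the `X-Request-ID` header name is an assumption):

```python
import uuid

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Reuse the caller's ID (e.g. from Guardian) or mint a new one.
    rid = request.headers.get("x-request-id", str(uuid.uuid4()))
    response = await call_next(request)
    response.headers["X-Request-ID"] = rid  # echo for cross-service correlation
    return response
```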

## Development Notes

- Keep `/v1/plan` internal, behind a network boundary or `ADMIN_TOKEN`.
- Validate payloads rigorously (Pydantic) and write contract tests for the plan schema (see the example below).
- If you switch models, re‑run golden tests to guard against plan drift.
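A contract test for the plan schema might look like this (a sketch assuming the hypothetical `Plan` model from the parsing example above, with pytest; the import path is illustrative):

```python
import pytest
from pydantic import ValidationError

from app.schemas import Plan  # hypothetical module path

def test_plan_accepts_documented_example():
    Plan.model_validate({
        "plan_id": "pln_01J9YX2H6ZP9R2K9THT2J9F7G4",
        "risk": "low",
        "steps": [{"action": "reprobe"}],
        "explanation": "Transient failures; re-probe.",
    })

def test_plan_rejects_unbounded_steps():
    with pytest.raises(ValidationError):
        Plan.model_validate({
            "plan_id": "x",
            "risk": "low",
            "steps": [{"action": "noop"}] * 10,  # exceeds the step bound
            "explanation": "too many steps",
        })
```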

## License

Apache‑2.0