---
title: matrix-ai
emoji: 🧠
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
---

# matrix-ai

**matrix-ai** is the AI planning microservice for the Matrix EcoSystem. It generates short, low‑risk, auditable remediation plans from a compact health context provided by Matrix Guardian. The service is designed for Hugging Face Spaces or Inference Endpoints, but also runs locally.

## Endpoints

- `POST /v1/plan` – internal API for Matrix Guardian: returns a safe JSON plan.
- `POST /v1/chat` – (optional) RAG-style Q&A about MatrixHub (kept lightweight in Stage‑1).

The service emphasizes safety, performance, and auditability:

- Strict, schema‑validated JSON plans (bounded steps, risk label, rationale)
- PII redaction before calling upstream model endpoints
- Exponential backoff, short timeouts, and structured JSON logs
- In‑memory rate limiting (per‑IP), optional auth for private deployments
- ETag support and response caching for non‑mutating reads (sketched below)
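The ETag path is easy to picture. A minimal sketch, assuming a hypothetical non‑mutating `GET /v1/status` route (the route, hashing scheme, and weak‑ETag format are illustrative, not the service's actual implementation):

```python
import hashlib
import json

from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.get("/v1/status")  # hypothetical non-mutating read
async def get_status(request: Request, response: Response):
    body = {"service": "matrix-ai", "status": "ok"}
    # Derive a weak ETag from the canonical JSON payload.
    etag = 'W/"' + hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()[:16] + '"'
    # If the client already holds this representation, skip the body.
    if request.headers.get("if-none-match") == etag:
        return Response(status_code=304, headers={"ETag": etag})
    response.headers["ETag"] = etag
    return body
```

A conditional `If-None-Match` hit then costs a hash comparison instead of a full response body.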

_Last updated: 2025‑09‑27 (UTC)_


## Architecture (at a glance)

```mermaid
flowchart LR
  subgraph Client[Matrix Operators / Observers]
  end

  Client -->|monitor| HubAPI[Matrix‑Hub API]
  Guardian[Matrix‑Guardian<br/>control plane] -->|/v1/plan| AI[matrix‑ai<br/>HF Space]
  Guardian -->|/status, /apps, ...| HubAPI
  HubAPI <-->|SQL| DB[(MatrixDB<br/>Postgres)]

  AI -->|HF Inference| HF[Hugging Face<br/>Inference API]

  classDef svc fill:#0ea5e9,stroke:#0b4,stroke-width:1,color:#fff
  classDef db fill:#f59e0b,stroke:#0b4,stroke-width:1,color:#fff
  class Guardian,AI,HubAPI svc
  class DB db
```

### Sequence: `POST /v1/plan`

```mermaid
sequenceDiagram
  participant G as Matrix‑Guardian
  participant A as matrix‑ai
  participant H as HF Inference

  G->>A: POST /v1/plan { context, constraints }
  A->>A: redact PII, validate payload
  A->>H: model.generate prompt [retries, timeout]
  H-->>A: model output text
  A->>A: parse → strict JSON plan (fallback if needed)
  A-->>G: 200 { plan_id, steps[], risk, explanation }
```
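The "parse → fallback" step is the safety hinge of this flow. A sketch of how it might look, assuming a hypothetical `Plan` Pydantic model (the real schema lives in the service; field names here mirror the API examples below):

```python
import json
import re
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class PlanStep(BaseModel):
    action: str
    # Extra step fields (target, retries, ...) omitted in this sketch.

class Plan(BaseModel):
    plan_id: str
    risk: Literal["low", "medium", "high"]
    steps: list[PlanStep] = Field(max_length=3)  # bounded steps
    explanation: str

def parse_plan(model_output: str) -> Plan:
    """Extract the first JSON object from model output and validate it."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    try:
        return Plan.model_validate(json.loads(match.group(0) if match else ""))
    except (ValidationError, json.JSONDecodeError):
        # Fallback: a conservative no-op-ish plan rather than an unsafe shape.
        return Plan(
            plan_id="pln_fallback",
            risk="low",
            steps=[PlanStep(action="reprobe")],
            explanation="Model output could not be parsed; defaulting to a re-probe.",
        )
```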

## Quick Start (Local Development)

```bash
# 1) Create venv
python3 -m venv .venv
source .venv/bin/activate

# 2) Install deps
pip install -r requirements.txt

# 3) Configure env (local only; use Space Secrets in prod)
export HF_TOKEN="your_hugging_face_token"

# 4) Run
uvicorn app.main:app --host 0.0.0.0 --port 7860
```

OpenAPI docs: <http://localhost:7860/docs>
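To smoke‑test the running server, a minimal client call (the payload mirrors the request example in the API section below; `httpx` is an assumption, any HTTP client works):

```python
import httpx

payload = {
    "context": {
        "entity_uid": "matrix-ai",
        "health": {"score": 0.64, "status": "degraded",
                   "last_checked": "2025-09-27T00:00:00Z"},
        "recent_checks": [],
    },
    "constraints": {"max_steps": 3, "risk": "low"},
}

resp = httpx.post("http://localhost:7860/v1/plan", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["steps"])  # the generated plan steps, on a 200 response
```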


## Deploy to Hugging Face Spaces

1. Push the repository to a new Space.
2. In **Settings → Secrets**, add:
   - `HF_TOKEN` (required) – used by the upstream HF Inference client
   - `ADMIN_TOKEN` (optional) – if set, gates `/v1/plan` and `/v1/chat` behind bearer auth
3. Choose hardware. CPU is fine for tests; GPU is recommended for larger models.
4. The Space serves FastAPI on the default port (7860, as in the Quick Start); both endpoints are then ready.

For Inference Endpoints, mirror the same environment variables and start command.


## Configuration

All options can be set via environment variables (Space Secrets on HF) or a local `.env` file.

| Variable | Default | Purpose |
| --- | --- | --- |
| `HF_TOKEN` | – | Token for the Hugging Face Inference API (required) |
| `MODEL_NAME` | `meta-llama/Meta-Llama-3.1-8B-Instruct` | Upstream model ID (example) |
| `MAX_NEW_TOKENS` | `256` | Output token cap for plan generations |
| `TEMPERATURE` | `0.2` | Generation temperature |
| `RATE_LIMIT_PER_MIN` | `120` | Per‑IP fixed‑window limit |
| `REQUEST_TIMEOUT_SEC` | `15` | HTTP client timeout to HF |
| `RETRY_MAX_ATTEMPTS` | `3` | Retry budget to HF |
| `CACHE_TTL_SEC` | `30` | Optional in‑memory caching for GET responses |
| `ADMIN_TOKEN` | – | If set, requires `Authorization: Bearer <ADMIN_TOKEN>` |
| `LOG_LEVEL` | `INFO` | Log level (JSON logs) |

_Names are illustrative; keep them in sync with your `configs/settings.yaml` if present._
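As one way to wire these together, a hedged sketch using `pydantic-settings` (field names follow the table above; whether the service actually uses `pydantic-settings` is an assumption):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Environment-driven configuration (Space Secrets on HF, .env locally)."""
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    hf_token: str  # maps to HF_TOKEN; required, no default
    model_name: str = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    max_new_tokens: int = 256
    temperature: float = 0.2
    rate_limit_per_min: int = 120
    request_timeout_sec: int = 15
    retry_max_attempts: int = 3
    cache_ttl_sec: int = 30
    admin_token: str | None = None  # enables bearer auth when set
    log_level: str = "INFO"

settings = Settings()  # raises a validation error if HF_TOKEN is unset
```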


## API

### `POST /v1/plan`

**Description:** Generate a short, low‑risk remediation plan from a compact app health context.

**Headers**

```http
Content-Type: application/json
Authorization: Bearer <ADMIN_TOKEN>   # required iff ADMIN_TOKEN is set
```

**Request body (example)**

```json
{
  "context": {
    "entity_uid": "matrix-ai",
    "health": {"score": 0.64, "status": "degraded", "last_checked": "2025-09-27T00:00:00Z"},
    "recent_checks": [
      {"check": "http", "result": "fail", "latency_ms": 900, "ts": "2025-09-27T00:00:00Z"}
    ]
  },
  "constraints": {"max_steps": 3, "risk": "low"}
}
```

**Response (example)**

```json
{
  "plan_id": "pln_01J9YX2H6ZP9R2K9THT2J9F7G4",
  "risk": "low",
  "steps": [
    {"action": "reprobe", "target": "https://service/health", "retries": 2},
    {"action": "pin_lkg", "entity_uid": "matrix-ai"}
  ],
  "explanation": "Transient HTTP failures observed; re-probe and pin to last-known-good if still failing."
}
```

**Status codes**

- `200` – plan generated
- `400` – invalid payload (fails schema validation)
- `401`/`403` – missing/invalid bearer token (only if `ADMIN_TOKEN` is configured)
- `429` – rate limited
- `502` – upstream model error after retries
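From the caller's side, only `429` and `502` are worth retrying; a client‑side sketch (function name and backoff schedule are illustrative):

```python
import time

import httpx

PLAN_URL = "http://localhost:7860/v1/plan"

def request_plan(payload: dict, attempts: int = 3) -> dict:
    """POST to /v1/plan, retrying with exponential backoff on 429/502."""
    for attempt in range(attempts):
        resp = httpx.post(PLAN_URL, json=payload, timeout=30)
        if resp.status_code in (429, 502) and attempt < attempts - 1:
            time.sleep(2 ** attempt)  # 1s, 2s, ... between attempts
            continue
        resp.raise_for_status()  # raises for any remaining error status
        return resp.json()
```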

### `POST /v1/chat`

Optional Stage‑1 placeholder. Given a query about MatrixHub, it returns an answer, with citations if a local knowledge base is configured.


## Safety & Reliability

- **PII redaction** – tokens/emails are removed from prompts as a pre‑filter (sketched below)
- **Strict schema** – JSON plan parsing with fallbacks; unsafe shapes are rejected
- **Time‑boxed** – short timeouts and bounded retries to HF Inference
- **Rate‑limited** – per‑IP fixed window (configurable)
- **Structured logs** – JSON logs only; no sensitive payloads are logged
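As a rough illustration of the pre‑filter idea (the patterns are illustrative; the service's actual redaction rules may be broader):

```python
import re

# Illustrative patterns: emails and common bearer/HF-style tokens.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_TOKEN = re.compile(r"\b(?:hf_|sk-|Bearer\s+)[A-Za-z0-9._-]{8,}\b")

def redact(text: str) -> str:
    """Replace likely PII/secrets with placeholders before prompting."""
    text = _EMAIL.sub("[REDACTED_EMAIL]", text)
    return _TOKEN.sub("[REDACTED_TOKEN]", text)

print(redact("contact ops@example.com, token hf_abc123XYZ456"))
# -> contact [REDACTED_EMAIL], token [REDACTED_TOKEN]
```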

## Observability

- Request IDs (correlated across Guardian ↔ AI; see the middleware sketch below)
- Latency and retry counters
- Plan success/failure metrics (Prometheus‑friendly if you expose a metrics endpoint)
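Request‑ID correlation fits naturally in a small middleware. A sketch (the `X-Request-ID` header name is an assumption):

```python
import uuid

from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def request_id_middleware(request: Request, call_next):
    # Reuse the caller's ID (e.g. from Guardian) or mint a new one.
    rid = request.headers.get("x-request-id", str(uuid.uuid4()))
    response = await call_next(request)
    response.headers["X-Request-ID"] = rid  # echo for cross-service correlation
    return response
```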

## Development Notes

- Keep `/v1/plan` internal, behind a network boundary or `ADMIN_TOKEN`.
- Validate payloads rigorously (Pydantic) and write contract tests for the plan schema (see the example below).
- If you switch models, re‑run golden tests to guard against plan drift.
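A contract test for the plan schema might look like this (a sketch assuming the hypothetical `Plan` model from the parsing example above, with pytest; the import path is illustrative):

```python
import pytest
from pydantic import ValidationError

from app.schemas import Plan  # hypothetical module path

def test_plan_accepts_documented_example():
    Plan.model_validate({
        "plan_id": "pln_01J9YX2H6ZP9R2K9THT2J9F7G4",
        "risk": "low",
        "steps": [{"action": "reprobe"}],
        "explanation": "Transient failures; re-probe.",
    })

def test_plan_rejects_unbounded_steps():
    with pytest.raises(ValidationError):
        Plan.model_validate({
            "plan_id": "x",
            "risk": "low",
            "steps": [{"action": "noop"}] * 10,  # exceeds the step bound
            "explanation": "too many steps",
        })
```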

## License

Apache‑2.0