luciferai-devil committed 24 days ago

Commit 316b3f1 (verified) · 1 parent: 2544d6e

Deploy Smriti AI Hugging Face handler


This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full list.

Files changed (50)

README.md +187 -0
config.json +11 -0
examples/request_delete.json +6 -0
examples/request_distractor.json +8 -0
examples/request_memory_inject.json +11 -0
examples/request_recall.json +11 -0
handler.py +647 -0
requirements.txt +19 -0
smriti_endpoint_config.yaml +35 -0
smriti_vendor/mempalace/__init__.py +3 -0
smriti_vendor/mempalace/__pycache__/__init__.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/agent.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/api.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/cli.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/core.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/gifp.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/identity_fingerprint.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/knowledge_graph.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/macp.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/mem_palace.cpython-310.pyc +0 -0
smriti_vendor/mempalace/__pycache__/semantic_memory.cpython-310.pyc +0 -0
smriti_vendor/mempalace/agent.py +3 -0
smriti_vendor/mempalace/api.py +3 -0
smriti_vendor/mempalace/cli.py +3 -0
smriti_vendor/mempalace/core.py +3 -0
smriti_vendor/mempalace/gifp.py +3 -0
smriti_vendor/mempalace/identity_fingerprint.py +3 -0
smriti_vendor/mempalace/knowledge_graph.py +3 -0
smriti_vendor/mempalace/macp.py +3 -0
smriti_vendor/mempalace/mem_palace.py +3 -0
smriti_vendor/mempalace/semantic_memory.py +3 -0
smriti_vendor/smriti/__init__.py +115 -0
smriti_vendor/smriti/__main__.py +7 -0
smriti_vendor/smriti/__pycache__/__init__.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/__main__.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/agent.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/api.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/backends.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/cli.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/config.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/core.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/gifp.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/identity_fingerprint.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/knowledge_graph.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/macp.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/mem_palace.cpython-310.pyc +0 -0
smriti_vendor/smriti/__pycache__/semantic_memory.cpython-310.pyc +0 -0
smriti_vendor/smriti/agent.py +262 -0
smriti_vendor/smriti/api.py +538 -0
smriti_vendor/smriti/backends.py +494 -0

README.md ADDED

	@@ -0,0 +1,187 @@

+---
+license: apache-2.0
+language:
+  - en
+library_name: smriti-ai
+tags:
+  - ai-agent
+  - memory
+  - small-language-models
+  - inference-time-augmentation
+  - semantic-search
+  - knowledge-graph
+  - identity-continuity
+  - rag
+pipeline_tag: text-generation
+---
+# Smriti AI
+## What this is
+Smriti AI is a memory-augmented inference layer for small language models. It adds external memory, semantic retrieval, knowledge-graph recall, identity continuity, and privacy-ready memory deletion without changing base model weights.
+This repository layout is intended for a Hugging Face model-style deployment with a custom `handler.py`. The handler loads a base causal language model or calls a remote model endpoint, wraps it with Smriti AI memory, and returns model responses plus retrieved memories.
+## What this is not
+Smriti AI is not a newly trained foundation model. It is not a fine-tuned model unless a separate fine-tuned checkpoint is explicitly included. It is an inference-time wrapper around a base language model.
+Do not interpret this repository as a standalone model checkpoint. The base model is configured through `BASE_MODEL_ID` or `HF_ENDPOINT_URL`.
+## Research Lineage
+Smriti AI follows four principles:
+- **External memory**: conversational facts live outside model weights in a persistent, inspectable store.
+- **Training-free recall**: relevant facts are retrieved and injected at inference time without fine-tuning the base model.
+- **Identity continuity**: persona evidence is tracked as an embedding fingerprint so outputs can be checked for drift.
+- **Small-model augmentation**: small causal language models can become more useful when paired with explicit memory and retrieval.
+Historical GodelAI-Lite results were measured on an earlier system. Current Smriti AI results are measured separately and should not be conflated with historical results.
+## Architecture
+```text
+User request
+  -> Smriti AI handler
+  -> memory retrieval
+  -> graph retrieval
+  -> identity context
+  -> base model inference
+  -> response
+  -> memory write/update
+```
+The handler supports JSON, SQLite, Redis, and Postgres memory backends. For production, use Redis/Postgres or another external durable store. Do not store private user memory in the Hugging Face model repository.
+## Supported base models
+Smriti AI is model-agnostic for Hugging Face causal language models.
+Supported families depend on the installed `transformers` version and endpoint hardware:
+- Gemma-style causal LMs when available, including `google/gemma-4-E2B-it`, the model used for the current benchmarks.
+- Llama/Phi/Mistral/Qwen-style causal LMs if supported by the runtime environment.
+- Tiny CPU-safe local smoke-test models such as `sshleifer/tiny-gpt2` for handler validation only.
+Tiny models are useful for endpoint plumbing tests. They are not public Smriti AI quality benchmarks.
+## Evaluation
+The current local benchmark artifacts (Gemma 4 only) in the main Smriti AI repository report:
+| Evaluation | Baseline Recall | Smriti AI Recall | Notes |
+|---|---:|---:|---|
+| Gemma-style three-fact protocol | 0/3 | 3/3 | Smriti AI recalls all injected facts after distractors. |
+| Five-mode comparison | 0/3 | 3/3 | TF-IDF, Semantic, Semantic+Graph, and Semantic+Graph+Identity all recall 3/3 in the checked-in run. |
+| Original broader protocol rerun | 0/3 | 3/3 | Overall average improves from 0.524 to 0.832 (`+58.9%`) in the current local Gemma 4 CPU rerun. |
+## Privacy
+Smriti AI stores user memory. Treat it as user data.
+- Memory can be encrypted by setting `SMRITI_ENCRYPTION_KEY`; encryption is applied only while `enable_encryption` is `true` in `config.json` (the default).
+- `delete_memory` is supported by the handler.
+- Production deployments should use external memory storage such as Redis/Postgres.
+- Do not store private user memory in the Hugging Face model repository.
+- Public/demo deployments should not receive real PII.
+## Limitations
+- Retrieval quality depends on the quality and specificity of stored memory.
+- Durable memory requires an external backend or persistent endpoint storage.
+- Latency depends on the base model, backend, retrieval mode, and endpoint hardware.
+- A tiny CPU demo model validates handler plumbing but will not produce Gemma-quality answers.
+- If no `BASE_MODEL_ID` or `HF_ENDPOINT_URL` is configured, the handler falls back to memory-only responses.
+## Environment variables
+| Variable | Purpose |
+|---|---|
+| `BASE_MODEL_ID` | Hugging Face model ID to load inside the endpoint. |
+| `HF_ENDPOINT_URL` | Optional remote model endpoint URL. If set, the handler calls this URL instead of loading a local base model. |
+| `HF_TOKEN` | Token for gated/private base models or protected remote endpoints. |
+| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
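+| `SMRITI_MEMORY_PATH` | JSON user-memory directory or SQLite file path. Defaults to `/tmp/smriti_hf_memory`. |
+| `REDIS_URL` | External Redis URL (`SMRITI_REDIS_URL` is also accepted). When present, it overrides `SMRITI_MEMORY_BACKEND`. |
+| `POSTGRES_DSN` | External Postgres DSN (`SMRITI_POSTGRES_DSN` is also accepted). When present and no Redis URL is set, it overrides `SMRITI_MEMORY_BACKEND`. |
+| `SMRITI_ENCRYPTION_KEY` | Memory encryption key (`SMRITI_MEMORY_KEY` is accepted as a fallback). Do not commit it. |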
+| `SMRITI_RETRIEVAL_MODE` | `tfidf`, `semantic`, `semantic_graph`, or `semantic_graph_identity`. |
+| `SMRITI_PUBLIC_DEMO` | `true` or `false`. Use `true` only for non-PII demos. |
+| `SMRITI_MAX_MEMORY_ENTRIES` | Maximum retained entries per user/topic. |
+## How to call the endpoint
+### Chat / fact injection
+```json
+{
+  "inputs": {
+    "operation": "chat",
+    "user_id": "customer-123",
+    "message": "My name is Alex and I am a marine biologist.",
+    "retrieval_mode": "semantic_graph_identity"
+  },
+  "parameters": {
+    "max_new_tokens": 256,
+    "temperature": 0.7,
+    "top_p": 0.9,
+    "return_memories": true
+  }
+}
+```
+### Recall
+```json
+{
+  "inputs": {
+    "operation": "chat",
+    "user_id": "customer-123",
+    "message": "What do you remember about me?",
+    "retrieval_mode": "semantic_graph_identity"
+  },
+  "parameters": {
+    "return_memories": true
+  }
+}
+```
+### Delete memory
+```json
+{
+  "inputs": {
+    "operation": "delete_memory",
+    "user_id": "customer-123"
+  }
+}
+```
+### Health
+```json
+{
+  "inputs": {
+    "operation": "health"
+  }
+}
+```
+## Local test
+```bash
+pip install -r requirements.txt
+BASE_MODEL_ID=sshleifer/tiny-gpt2 \
+SMRITI_MEMORY_BACKEND=json \
+SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
+python test_handler_local.py
+```
+## Custom-container deployment
+If the standard Hugging Face handler is insufficient for your model size, CUDA libraries, Redis client policy, or enterprise network requirements, deploy the same files in a custom container. Use the main Smriti AI repository Dockerfiles as the starting point, install this handler, and expose a compatible HTTP API through Hugging Face Inference Endpoints custom container support.
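
The committed `handler.py` does not look the README's `retrieval_mode` values up in a fixed table: `_base_retrieval_mode` and the substring checks in `_chat` split each string into a base retriever plus graph and identity flags. The sketch below reproduces that decomposition so the four documented modes can be checked in isolation. `RetrievalPlan` and `plan_retrieval` are hypothetical names, not part of the `smriti` package, and the `enable_graph` / `enable_identity` arguments stand in for the `config.json` flags of the same name.

```python
from typing import NamedTuple


class RetrievalPlan(NamedTuple):
    """Hypothetical container for the flags derived from a retrieval_mode string."""

    base_mode: str       # retriever handed to MemPalaceLite: "tfidf" or "semantic"
    include_graph: bool  # whether related graph facts are added to the prompt context
    use_identity: bool   # whether the identity prompt and drift check are enabled


def plan_retrieval(mode: str, enable_graph: bool = True, enable_identity: bool = True) -> RetrievalPlan:
    """Mirror how handler.py interprets retrieval_mode (illustrative, not the package API)."""
    # Base retriever: anything starting with "tfidf" (case-insensitive) is TF-IDF;
    # every other value, including unknown ones, falls back to semantic retrieval.
    base_mode = "tfidf" if str(mode).lower().startswith("tfidf") else "semantic"
    # Graph and identity are plain substring checks; as written in handler.py they are
    # case-sensitive and can be switched off globally by the config flags.
    include_graph = enable_graph and "graph" in mode
    use_identity = enable_identity and "identity" in mode
    return RetrievalPlan(base_mode, include_graph, use_identity)


if __name__ == "__main__":
    for mode in ("tfidf", "semantic", "semantic_graph", "semantic_graph_identity"):
        print(f"{mode:>24} -> {plan_retrieval(mode)}")
```

One consequence of the substring approach is that a value such as `tfidf_graph` would pair TF-IDF retrieval with graph facts even though it is not one of the four documented modes.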

config.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "project": "Smriti AI",
+  "base_model": "REPLACE_WITH_BASE_MODEL_ID",
+  "retrieval_mode": "semantic_graph_identity",
+  "memory_backend": "json",
+  "public_demo": false,
+  "max_memory_entries": 1000,
+  "enable_identity": true,
+  "enable_graph": true,
+  "enable_encryption": true
+}
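
`config.json` sets `"memory_backend": "json"`, but in the committed `handler.py` (`_init_backend`) that value is the lowest-priority input: a Redis URL or Postgres DSN in the environment overrides it, and `SMRITI_MEMORY_BACKEND` overrides the config file. A minimal sketch of that precedence as a pure function, useful for checking a deployment's environment before starting the endpoint. `select_backend` is a hypothetical helper that only returns the backend name; it does not construct `RedisBackend`, `PostgresBackend`, or the other backend classes.

```python
from typing import Any, Mapping


def select_backend(env: Mapping[str, str], config: Mapping[str, Any]) -> str:
    """Return the backend name handler.py would pick (illustrative mirror of _init_backend)."""
    # 1. Any configured Redis URL wins, regardless of SMRITI_MEMORY_BACKEND.
    if env.get("REDIS_URL") or env.get("SMRITI_REDIS_URL"):
        return "redis"
    # 2. A Postgres DSN wins next, when no Redis URL is configured.
    if env.get("POSTGRES_DSN") or env.get("SMRITI_POSTGRES_DSN"):
        return "postgres"
    # 3. Otherwise: SMRITI_MEMORY_BACKEND, then config.json, then "json".
    selected = (env.get("SMRITI_MEMORY_BACKEND") or config.get("memory_backend") or "json").lower()
    if selected == "redis":
        return "redis"
    if selected in {"postgres", "postgresql"}:
        return "postgres"
    if selected == "sqlite":
        return "sqlite"
    # 4. Unrecognized values silently fall back to the JSON backend.
    return "json"


if __name__ == "__main__":
    import os

    print(select_backend(os.environ, {"memory_backend": "json"}))
```

Note that in the handler, selecting `redis` by name without a URL falls back to `redis://localhost:6379/0`, and selecting `postgres` by name without a DSN passes an empty DSN to `PostgresBackend`.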

examples/request_delete.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "inputs": {
+    "operation": "delete_memory",
+    "user_id": "demo-user"
+  }
+}

examples/request_distractor.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "inputs": {
+    "operation": "chat",
+    "user_id": "demo-user",
+    "message": "What is the capital of France?",
+    "retrieval_mode": "semantic_graph_identity"
+  }
+}

examples/request_memory_inject.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "inputs": {
+    "operation": "chat",
+    "user_id": "demo-user",
+    "message": "My name is Alex and I am a marine biologist based in Hawaii.",
+    "retrieval_mode": "semantic_graph_identity"
+  },
+  "parameters": {
+    "return_memories": true
+  }
+}

examples/request_recall.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "inputs": {
+    "operation": "chat",
+    "user_id": "demo-user",
+    "message": "What do you remember about me?",
+    "retrieval_mode": "semantic_graph_identity"
+  },
+  "parameters": {
+    "return_memories": true
+  }
+}
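
The four example files share one shape: an `inputs` object carrying `operation`, `user_id`, and (for `chat`) `message` and `retrieval_mode`, plus an optional `parameters` object. The sketch below builds those payloads and posts them with the Python standard library, in the same way `handler.py` itself calls a remote model endpoint with `urllib.request`. `build_request`, `post_json`, and the `SMRITI_CLIENT_URL` environment variable are hypothetical; the fields read at the end (`response`, `retrieved_memories`) are keys that `_chat` returns, although the hosting platform may wrap the body differently.

```python
import json
import os
import urllib.request
from typing import Any, Dict, Optional


def build_request(operation: str, user_id: Optional[str] = None, message: Optional[str] = None,
                  retrieval_mode: Optional[str] = None, **parameters: Any) -> Dict[str, Any]:
    """Assemble a payload in the {"inputs": ..., "parameters": ...} shape of the example files."""
    inputs: Dict[str, Any] = {"operation": operation}
    if user_id is not None:
        inputs["user_id"] = user_id
    if message is not None:
        inputs["message"] = message
    if retrieval_mode is not None:
        inputs["retrieval_mode"] = retrieval_mode
    payload: Dict[str, Any] = {"inputs": inputs}
    if parameters:  # e.g. return_memories, max_new_tokens, temperature, top_p
        payload["parameters"] = parameters
    return payload


def post_json(url: str, payload: Dict[str, Any], token: Optional[str] = None,
              timeout: float = 120.0) -> Dict[str, Any]:
    """POST a JSON payload and decode the JSON response; HTTP errors propagate as HTTPError."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = os.environ.get("SMRITI_CLIENT_URL")  # hypothetical: URL of your deployed endpoint
    if url:
        recall = build_request("chat", "demo-user", "What do you remember about me?",
                               "semantic_graph_identity", return_memories=True)
        result = post_json(url, recall, token=os.environ.get("HF_TOKEN"))
        print(result.get("response"), result.get("retrieved_memories"))
```

Sending the inject request, then the distractor, then the recall request, and finally the delete request reproduces the sequence the example files suggest.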

handler.py ADDED

	@@ -0,0 +1,647 @@

+"""Hugging Face custom inference handler for Smriti AI.
+This file is intentionally deployment glue. Core memory, retrieval, graph, and
+identity behavior comes from the installed `smriti` package.
+"""
+from __future__ import annotations
+import json
+import logging
+import os
+import re
+import sys
+import time
+import urllib.error
+import urllib.request
+from pathlib import Path
+from threading import RLock
+from typing import Any, Dict, List, Optional, Tuple
+VENDOR_SRC = Path(__file__).resolve().parent / "smriti_vendor"
+if VENDOR_SRC.exists() and str(VENDOR_SRC) not in sys.path:
+    sys.path.insert(0, str(VENDOR_SRC))
+from smriti import IdentityFingerprint, MemPalaceLite, SmritiAILite  # noqa: E402
+from smriti.backends import (  # noqa: E402
+    JsonBackend,
+    MemoryBackend,
+    MemoryCipher,
+    PostgresBackend,
+    RedisBackend,
+    SqliteBackend,
+)
+LOGGER = logging.getLogger("smriti.hf_handler")
+if not LOGGER.handlers:
+    logging.basicConfig(level=os.getenv("SMRITI_LOG_LEVEL", "INFO"))
+DEFAULT_CONFIG = {
+    "project": "Smriti AI",
+    "base_model": "REPLACE_WITH_BASE_MODEL_ID",
+    "retrieval_mode": "semantic_graph_identity",
+    "memory_backend": "json",
+    "public_demo": False,
+    "max_memory_entries": 1000,
+    "enable_identity": True,
+    "enable_graph": True,
+    "enable_encryption": True,
+}
+class EndpointHandler:
+    """Hugging Face custom inference endpoint handler."""
+    def __init__(self, path: str = ""):
+        self.root = _resolve_root(path)
+        self.config = _load_config(self.root / "config.json")
+        self.lock = RLock()
+        self.memories: Dict[str, MemPalaceLite] = {}
+        self.identities: Dict[str, IdentityFingerprint] = {}
+        self.backend_warning: Optional[str] = None
+        self.base_model_id = _clean_model_id(
+            os.getenv("BASE_MODEL_ID") or self.config.get("base_model", "")
+        )
+        self.endpoint_url = os.getenv("HF_ENDPOINT_URL", "").strip()
+        self.hf_token = os.getenv("HF_TOKEN", "").strip()
+        self.default_retrieval_mode = os.getenv(
+            "SMRITI_RETRIEVAL_MODE",
+            str(self.config.get("retrieval_mode", "semantic_graph_identity")),
+        )
+        self.max_memory_entries = _int_env(
+            "SMRITI_MAX_MEMORY_ENTRIES",
+            int(self.config.get("max_memory_entries", 1000)),
+        )
+        self.public_demo = _bool_env("SMRITI_PUBLIC_DEMO", bool(self.config.get("public_demo", False)))
+        self.enable_graph_default = bool(self.config.get("enable_graph", True))
+        self.enable_identity_default = bool(self.config.get("enable_identity", True))
+        self.enable_encryption = bool(self.config.get("enable_encryption", True))
+        self.backend, self.backend_name = self._init_backend()
+        self.model = None
+        self.tokenizer = None
+        self.device = "cpu"
+        if self.endpoint_url:
+            LOGGER.info(
+                "Smriti AI handler using remote model endpoint; backend=%s retrieval=%s",
+                self.backend_name,
+                self.default_retrieval_mode,
+            )
+        elif self.base_model_id:
+            self._load_local_model(self.base_model_id)
+        else:
+            LOGGER.warning(
+                "No BASE_MODEL_ID or HF_ENDPOINT_URL configured; handler will run memory-only."
+            )
+        LOGGER.info(
+            "Smriti AI handler ready: base_model=%s remote_endpoint=%s backend=%s retrieval=%s encryption=%s public_demo=%s",
+            self.base_model_id or "memory-only",
+            bool(self.endpoint_url),
+            self.backend_name,
+            self.default_retrieval_mode,
+            self.enable_encryption and bool(os.getenv("SMRITI_ENCRYPTION_KEY") or os.getenv("SMRITI_MEMORY_KEY")),
+            self.public_demo,
+        )
+    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
+        start = time.perf_counter()
+        try:
+            inputs, parameters = _normalize_request(data)
+            operation = str(inputs.get("operation", "chat")).lower()
+            if operation == "health":
+                return self._health(start)
+            if operation == "delete_memory":
+                return self._delete_memory(inputs, start)
+            if operation != "chat":
+                return _error(f"Unsupported operation: {operation}", start)
+            return self._chat(inputs, parameters, start)
+        except Exception as exc:  # Defensive boundary for endpoint runtimes.
+            LOGGER.exception("Unhandled Smriti AI handler error")
+            return _error(f"handler_error:{exc.__class__.__name__}: {exc}", start)
+    # ------------------------------------------------------------------
+    # Operation handlers
+    # ------------------------------------------------------------------
+    def _chat(
+        self,
+        inputs: Dict[str, Any],
+        parameters: Dict[str, Any],
+        start: float,
+    ) -> Dict[str, Any]:
+        user_id = str(inputs.get("user_id") or "").strip()
+        message = str(inputs.get("message") or "").strip()
+        topic_id = str(inputs.get("topic_id") or "general").strip() or "general"
+        if not user_id:
+            return _error("user_id is required", start)
+        if not message:
+            return _error("message is required for chat operation", start)
+        retrieval_mode = str(inputs.get("retrieval_mode") or self.default_retrieval_mode)
+        base_retrieval = _base_retrieval_mode(retrieval_mode)
+        include_graph = self.enable_graph_default and "graph" in retrieval_mode
+        identity_enabled = self.enable_identity_default and "identity" in retrieval_mode
+        with self.lock:
+            memory = self._get_memory(user_id, topic_id, base_retrieval)
+            context, retrieved_memories, graph_facts, retrieval_warning = self._retrieve_context(
+                memory,
+                user_id,
+                topic_id,
+                message,
+                include_graph,
+            )
+            identity = self._get_identity(user_id, identity_enabled)
+            agent = SmritiAILite(
+                model=self.model,
+                tokenizer=self.tokenizer,
+                retrieval_mode=base_retrieval,
+                session_id=user_id,
+                topic_id=topic_id,
+                memory=memory,
+                identity=identity,
+                auto_device=False,
+            )
+            agent.build_prompt = lambda user_input: _build_prompt(
+                agent,
+                memory,
+                user_id,
+                topic_id,
+                user_input,
+                include_graph,
+                identity_enabled,
+            )
+            generation_calls = 0
+            def generate(prompt: str, max_tokens: int = 256) -> str:
+                nonlocal generation_calls
+                generation_calls += 1
+                return self._generate_text(prompt, parameters, max_tokens=max_tokens)
+            agent._generate = generate  # type: ignore[method-assign]
+            try:
+                response = agent.chat(message)
+            except Exception as exc:
+                LOGGER.exception("Model generation failed")
+                return _error(f"model_generation_failed:{exc.__class__.__name__}: {exc}", start)
+            response = _stabilize_recall_answer(message, response, retrieved_memories, graph_facts)
+            _replace_last_assistant_history(memory, response)
+            identity_check = agent.identity.evaluate_output(response) if identity_enabled else None
+            save_warning = self._save_memory(user_id, memory)
+        warnings = [item for item in [self.backend_warning, retrieval_warning, save_warning] if item]
+        return {
+            "response": response,
+            "retrieved_memories": retrieved_memories,
+            "graph_facts": graph_facts,
+            "identity": {
+                "enabled": identity_enabled,
+                "drift_score": float(identity_check.distance) if identity_check else 0.0,
+                "refinement_triggered": generation_calls > 1,
+            },
+            "latency_ms": round((time.perf_counter() - start) * 1000, 3),
+            "backend": self.backend_name,
+            "retrieval_mode": retrieval_mode,
+            "warnings": warnings,
+        }
+    def _delete_memory(self, inputs: Dict[str, Any], start: float) -> Dict[str, Any]:
+        user_id = str(inputs.get("user_id") or "").strip()
+        if not user_id:
+            return _error("user_id is required for delete_memory operation", start)
+        with self.lock:
+            existed_cache = self.memories.pop(user_id, None) is not None
+            self.identities.pop(user_id, None)
+            try:
+                deleted_backend = self.backend.delete_user(user_id)
+            except Exception as exc:
+                LOGGER.exception("Memory backend delete failed")
+                return _error(f"backend_delete_failed:{exc.__class__.__name__}: {exc}", start)
+        return {
+            "deleted": bool(existed_cache or deleted_backend),
+            "user_id": user_id,
+            "latency_ms": round((time.perf_counter() - start) * 1000, 3),
+            "backend": self.backend_name,
+        }
+    def _health(self, start: float) -> Dict[str, Any]:
+        return {
+            "status": "ok",
+            "project": "Smriti AI",
+            "base_model": self.base_model_id or ("remote-endpoint" if self.endpoint_url else "memory-only"),
+            "backend": self.backend_name,
+            "retrieval_mode": self.default_retrieval_mode,
+            "latency_ms": round((time.perf_counter() - start) * 1000, 3),
+        }
+    # ------------------------------------------------------------------
+    # Runtime setup
+    # ------------------------------------------------------------------
+    def _init_backend(self) -> Tuple[MemoryBackend, str]:
+        encryption_key = os.getenv("SMRITI_ENCRYPTION_KEY") or os.getenv("SMRITI_MEMORY_KEY")
+        if encryption_key:
+            os.environ["SMRITI_MEMORY_KEY"] = encryption_key
+        cipher = MemoryCipher(encryption_key if self.enable_encryption else None)
+        redis_url = os.getenv("REDIS_URL") or os.getenv("SMRITI_REDIS_URL")
+        postgres_dsn = os.getenv("POSTGRES_DSN") or os.getenv("SMRITI_POSTGRES_DSN")
+        selected = (os.getenv("SMRITI_MEMORY_BACKEND") or self.config.get("memory_backend") or "json").lower()
+        memory_path = os.getenv("SMRITI_MEMORY_PATH", "/tmp/smriti_hf_memory")
+        if redis_url:
+            return RedisBackend(url=redis_url, cipher=cipher), "redis"
+        if postgres_dsn:
+            return PostgresBackend(dsn=postgres_dsn, cipher=cipher), "postgres"
+        if selected == "redis":
+            return RedisBackend(url=redis_url or "redis://localhost:6379/0", cipher=cipher), "redis"
+        if selected in {"postgres", "postgresql"}:
+            return PostgresBackend(dsn=postgres_dsn or "", cipher=cipher), "postgres"
+        if selected == "sqlite":
+            return SqliteBackend(path=memory_path, cipher=cipher), "sqlite"
+        return JsonBackend(root=_json_root(memory_path), cipher=cipher), "json"
+    def _load_local_model(self, model_id: str) -> None:
+        try:
+            import torch
+            from transformers import AutoModelForCausalLM, AutoTokenizer
+        except Exception as exc:
+            raise RuntimeError("Install torch and transformers to load a local base model.") from exc
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+        dtype = torch.float32
+        if self.device == "cuda":
+            dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
+        kwargs = {"token": self.hf_token} if self.hf_token else {}
+        self.tokenizer = AutoTokenizer.from_pretrained(model_id, **kwargs)
+        if getattr(self.tokenizer, "pad_token_id", None) is None:
+            self.tokenizer.pad_token = self.tokenizer.eos_token
+        try:
+            self.model = AutoModelForCausalLM.from_pretrained(model_id, dtype=dtype, **kwargs)
+        except TypeError:
+            self.model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype, **kwargs)
+        self.model.to(self.device)
+        self.model.eval()
+        LOGGER.info("Loaded local base model %s on %s", model_id, self.device)
+    # ------------------------------------------------------------------
+    # Memory and generation helpers
+    # ------------------------------------------------------------------
+    def _get_memory(self, user_id: str, topic_id: str, retrieval_mode: str) -> MemPalaceLite:
+        self.backend_warning = None
+        if user_id not in self.memories:
+            state = None
+            try:
+                state = self.backend.load(user_id)
+            except Exception as exc:
+                LOGGER.exception("Memory backend load failed; starting empty memory")
+                self.backend_warning = f"backend_load_failed:{exc.__class__.__name__}"
+            if state:
+                memory = MemPalaceLite.from_dict(
+                    state,
+                    retrieval_mode=retrieval_mode,
+                    session_id=user_id,
+                    topic_id=topic_id,
+                    max_facts=self.max_memory_entries,
+                    max_entries_per_topic=self.max_memory_entries,
+                )
+            else:
+                memory = MemPalaceLite(
+                    retrieval_mode=retrieval_mode,
+                    session_id=user_id,
+                    topic_id=topic_id,
+                    max_facts=self.max_memory_entries,
+                    max_entries_per_topic=self.max_memory_entries,
+                )
+            self.memories[user_id] = memory
+        memory = self.memories[user_id]
+        if memory.retrieval_mode != retrieval_mode:
+            memory = MemPalaceLite.from_dict(
+                memory.to_dict(),
+                retrieval_mode=retrieval_mode,
+                session_id=user_id,
+                topic_id=topic_id,
+                max_facts=self.max_memory_entries,
+                max_entries_per_topic=self.max_memory_entries,
+            )
+            self.memories[user_id] = memory
+        memory.session_id = user_id
+        memory.topic_id = topic_id
+        return memory
+    def _get_identity(self, user_id: str, enabled: bool) -> IdentityFingerprint:
+        if user_id not in self.identities:
+            threshold = 0.35 if enabled else 2.0
+            self.identities[user_id] = IdentityFingerprint(
+                role="helpful AI assistant with persistent memory",
+                threshold=threshold,
+            )
+        identity = self.identities[user_id]
+        if not enabled:
+            identity.threshold = 2.0
+        return identity
+    def _retrieve_context(
+        self,
+        memory: MemPalaceLite,
+        user_id: str,
+        topic_id: str,
+        message: str,
+        include_graph: bool,
+    ) -> Tuple[str, List[str], List[str], Optional[str]]:
+        try:
+            context = memory.get_context(
+                query=message,
+                session_id=user_id,
+                topic_id=topic_id,
+                include_graph=include_graph,
+            )
+            retrieved_memories = memory.retrieve_facts(
+                message,
+                k=5,
+                session_id=user_id,
+                topic_id=topic_id,
+            )
+            graph_facts = _section_bullets(context, "[RELATED GRAPH FACTS]") if include_graph else []
+            return context, retrieved_memories, graph_facts, None
+        except Exception as exc:
+            LOGGER.exception("Memory retrieval failed")
+            return "", [], [], f"retrieval_failed:{exc.__class__.__name__}"
+    def _save_memory(self, user_id: str, memory: MemPalaceLite) -> Optional[str]:
+        try:
+            self.backend.save(user_id, memory.to_dict())
+            return None
+        except Exception as exc:
+            LOGGER.exception("Memory backend save failed")
+            return f"backend_save_failed:{exc.__class__.__name__}"
+    def _generate_text(self, prompt: str, parameters: Dict[str, Any], max_tokens: int = 256) -> str:
+        max_new_tokens = int(parameters.get("max_new_tokens", max_tokens) or max_tokens)
+        temperature = float(parameters.get("temperature", 0.7))
+        top_p = float(parameters.get("top_p", 0.9))
+        if self.endpoint_url:
+            return self._generate_remote(prompt, max_new_tokens, temperature, top_p)
+        if self.model is not None and self.tokenizer is not None:
+            return self._generate_local(prompt, max_new_tokens, temperature, top_p)
+        return _memory_only_answer(prompt)
+    def _generate_local(
+        self,
+        prompt: str,
+        max_new_tokens: int,
+        temperature: float,
+        top_p: float,
+    ) -> str:
+        import torch
+        messages = [{"role": "user", "content": prompt}]
+        try:
+            formatted = self.tokenizer.apply_chat_template(
+                messages,
+                tokenize=False,
+                add_generation_prompt=True,
+            )
+        except Exception:
+            formatted = prompt
+        inputs = self.tokenizer(
+            formatted,
+            return_tensors="pt",
+            truncation=True,
+            max_length=2048,
+        )
+        inputs = {key: value.to(self.device) for key, value in inputs.items()}
+        generate_kwargs = {
+            "max_new_tokens": max_new_tokens,
+            "do_sample": temperature > 0,
+            "pad_token_id": getattr(self.tokenizer, "eos_token_id", None),
+        }
+        if temperature > 0:
+            generate_kwargs["temperature"] = temperature
+            generate_kwargs["top_p"] = top_p
+        with torch.inference_mode():
+            output = self.model.generate(**inputs, **generate_kwargs)
+        return self.tokenizer.decode(
+            output[0, inputs["input_ids"].shape[1] :].detach().cpu(),
+            skip_special_tokens=True,
+        ).strip()
+    def _generate_remote(
+        self,
+        prompt: str,
+        max_new_tokens: int,
+        temperature: float,
+        top_p: float,
+    ) -> str:
+        payload = {
+            "inputs": prompt,
+            "parameters": {
+                "max_new_tokens": max_new_tokens,
+                "temperature": temperature,
+                "top_p": top_p,
+            },
+        }
+        headers = {"Content-Type": "application/json"}
+        if self.hf_token:
+            headers["Authorization"] = f"Bearer {self.hf_token}"
+        request = urllib.request.Request(
+            self.endpoint_url,
+            data=json.dumps(payload).encode("utf-8"),
+            headers=headers,
+            method="POST",
+        )
+        try:
+            with urllib.request.urlopen(request, timeout=120) as response:  # noqa: S310
+                raw = response.read().decode("utf-8")
+        except urllib.error.HTTPError as exc:
+            body = exc.read().decode("utf-8", errors="replace")
+            raise RuntimeError(f"remote endpoint HTTP {exc.code}: {body[:300]}") from exc
+        parsed = json.loads(raw)
+        return _extract_generated_text(parsed)
+# ----------------------------------------------------------------------
+# Request, context, and formatting helpers
+# ----------------------------------------------------------------------
+def _resolve_root(path: str) -> Path:
+    if path:
+        root = Path(path).resolve()
+        return root.parent if root.is_file() else root
+    return Path(__file__).resolve().parent
+def _load_config(path: Path) -> Dict[str, Any]:
+    if not path.exists():
+        return dict(DEFAULT_CONFIG)
+    data = json.loads(path.read_text(encoding="utf-8"))
+    config = dict(DEFAULT_CONFIG)
+    config.update(data)
+    return config
+def _normalize_request(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
+    if not isinstance(data, dict):
+        raise ValueError("Request body must be a JSON object.")
+    if "inputs" in data:
+        inputs = data.get("inputs") or {}
+        if isinstance(inputs, str):
+            inputs = {"message": inputs}
+        parameters = data.get("parameters") or {}
+    else:
+        inputs = data
+        parameters = data.get("parameters") or {}
+    if not isinstance(inputs, dict) or not isinstance(parameters, dict):
+        raise ValueError("inputs and parameters must be JSON objects.")
+    return inputs, parameters
+def _base_retrieval_mode(mode: str) -> str:
+    return "tfidf" if str(mode).lower().startswith("tfidf") else "semantic"
+def _build_prompt(
+    agent: SmritiAILite,
+    memory: MemPalaceLite,
+    user_id: str,
+    topic_id: str,
+    user_input: str,
+    include_graph: bool,
+    identity_enabled: bool,
+) -> str:
+    identity = agent.identity.get_identity_prompt() if identity_enabled else ""
+    context = memory.get_context(
+        query=user_input,
+        session_id=user_id,
+        topic_id=topic_id,
+        include_graph=include_graph,
+    )
+    parts = [part for part in [identity.strip(), context.strip(), user_input.strip()] if part]
+    return "\n\n".join(parts)
+def _section_bullets(context: str, heading: str) -> List[str]:
+    if heading not in context:
+        return []
+    after = context.split(heading, 1)[1]
+    chunks = re.split(r"\n\[[A-Z ]+\]", after, maxsplit=1)
+    section = chunks[0]
+    bullets = []
+    for line in section.splitlines():
+        cleaned = line.strip()
+        if cleaned.startswith("*"):
+            bullets.append(cleaned.lstrip("* ").strip())
+    return bullets
+def _memory_only_answer(prompt: str) -> str:
+    facts = _section_bullets(prompt, "[REMEMBERED FACTS]")
+    graph = _section_bullets(prompt, "[RELATED GRAPH FACTS]")
+    combined = facts + [item for item in graph if item not in facts]
+    if combined:
+        return "I remember: " + "; ".join(combined[:5])
+    return "Memory updated. No prior relevant context was found."
+def _is_recall_query(message: str) -> bool:
+    lowered = message.lower()
+    return any(
+        phrase in lowered
+        for phrase in [
+            "remember",
+            "what do you know about me",
+            "who am i",
+            "where do i work",
+            "what is my name",
+            "what do i do",
+        ]
+    )
+def _stabilize_recall_answer(
+    message: str,
+    response: str,
+    retrieved_memories: List[str],
+    graph_facts: List[str],
+) -> str:
+    if not _is_recall_query(message):
+        return response
+    combined = retrieved_memories + [item for item in graph_facts if item not in retrieved_memories]
+    if not combined:
+        return response
+    if _mentions_memory_terms(response, combined):
+        return response
+    return "I remember: " + "; ".join(combined[:5])
+def _mentions_memory_terms(response: str, memories: List[str]) -> bool:
+    response_terms = set(re.findall(r"[a-z0-9']{4,}", response.lower()))
+    memory_terms = set()
+    for memory in memories:
+        memory_terms.update(re.findall(r"[a-z0-9']{4,}", memory.lower()))
+    return bool(response_terms & memory_terms)
+def _replace_last_assistant_history(memory: MemPalaceLite, response: str) -> None:
+    if memory.history and memory.history[-1].category == "assistant_output":
+        memory.history[-1].content = "Assistant: " + response[:200]
+def _extract_generated_text(parsed: Any) -> str:
+    if isinstance(parsed, list) and parsed:
+        return _extract_generated_text(parsed[0])
+    if isinstance(parsed, dict):
+        for key in ["generated_text", "response", "text", "output"]:
+            value = parsed.get(key)
+            if isinstance(value, str):
+                return value.strip()
+        if "outputs" in parsed:
+            return _extract_generated_text(parsed["outputs"])
+    if isinstance(parsed, str):
+        return parsed.strip()
+    raise RuntimeError("Remote endpoint did not return generated text.")
+def _json_root(memory_path: str) -> Path:
+    path = Path(memory_path)
+    if path.suffix.lower() in {".json", ".jsonl"}:
+        return path.with_suffix("")
+    return path
+def _clean_model_id(value: str) -> str:
+    value = (value or "").strip()
+    if not value or value == "REPLACE_WITH_BASE_MODEL_ID":
+        return ""
+    return value
+def _bool_env(name: str, default: bool) -> bool:
+    raw = os.getenv(name)
+    if raw is None:
+        return default
+    return raw.strip().lower() in {"1", "true", "yes", "on"}
+def _int_env(name: str, default: int) -> int:
+    try:
+        return int(os.getenv(name, str(default)))
+    except ValueError:
+        return default
+def _error(message: str, start: float) -> Dict[str, Any]:
+    return {
+        "error": message,
+        "latency_ms": round((time.perf_counter() - start) * 1000, 3),
+    }
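The memory-only fallback in `handler.py` depends on the plain-text layout of the prompt: `_section_bullets` takes the text after a heading such as `[REMEMBERED FACTS]`, stops at the next bracketed uppercase heading, and keeps only lines that start with `*`. The sketch below restates `_section_bullets`, `_memory_only_answer` and `_mentions_memory_terms` as standalone functions and runs them on a sample prompt. The prompt layout is an assumption inferred from the headings the parser searches for, not output captured from `MemPalaceLite.get_context`.

```python
import re
from typing import List


def section_bullets(context: str, heading: str) -> List[str]:
    """Restates _section_bullets: bullet lines between `heading` and the next [HEADING]."""
    if heading not in context:
        return []
    after = context.split(heading, 1)[1]
    section = re.split(r"\n\[[A-Z ]+\]", after, maxsplit=1)[0]
    bullets = []
    for line in section.splitlines():
        cleaned = line.strip()
        if cleaned.startswith("*"):
            bullets.append(cleaned.lstrip("* ").strip())
    return bullets


def memory_only_answer(prompt: str) -> str:
    """Restates _memory_only_answer: facts first, then unseen graph facts, at most 5."""
    facts = section_bullets(prompt, "[REMEMBERED FACTS]")
    graph = section_bullets(prompt, "[RELATED GRAPH FACTS]")
    combined = facts + [item for item in graph if item not in facts]
    if combined:
        return "I remember: " + "; ".join(combined[:5])
    return "Memory updated. No prior relevant context was found."


def mentions_memory_terms(response: str, memories: List[str]) -> bool:
    """Restates _mentions_memory_terms: any shared lowercase token of 4+ characters."""
    pattern = r"[a-z0-9']{4,}"
    response_terms = set(re.findall(pattern, response.lower()))
    memory_terms = set()
    for memory in memories:
        memory_terms.update(re.findall(pattern, memory.lower()))
    return bool(response_terms & memory_terms)


# Assumed prompt layout, inferred from the headings the parser looks for.
prompt = (
    "You are a helpful AI assistant with persistent memory.\n\n"
    "[REMEMBERED FACTS]\n"
    "* User name is Jordan\n"
    "* User is a marine biologist\n"
    "[RELATED GRAPH FACTS]\n"
    "* User is a marine biologist\n"
    "* Jordan works at Oceanic Lab\n\n"
    "What do you know about me?"
)

answer = memory_only_answer(prompt)
vague_reply_counts = mentions_memory_terms(
    "I am not sure about your name.", ["User name is Jordan"]
)
```

The duplicate graph fact is dropped, so `answer` lists three items. The second check shows how loose the overlap heuristic is: a non-answer that happens to contain the word `name` counts as mentioning the memory, so `_stabilize_recall_answer` would return the model's reply unchanged.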

requirements.txt ADDED

	@@ -0,0 +1,19 @@

+# This deployment artifact does not assume Smriti AI is published on PyPI.
+# Until it is, install the package directly from the GitHub repository.
+git+https://github.com/Luciferai04/smriti-ai.git
+# After PyPI publication, replace the GitHub line above with:
+# smriti-ai>=0.3.1
+transformers
+accelerate
+torch
+sentence-transformers
+faiss-cpu
+networkx
+cryptography
+pydantic
+redis
+psycopg2-binary
+huggingface_hub
+requests

smriti_endpoint_config.yaml ADDED

	@@ -0,0 +1,35 @@

+# Smriti AI Hugging Face Inference Endpoint configuration template.
+# Values here are documentation defaults. Set real values as endpoint environment
+# variables or managed secrets, not as committed plaintext.
+BASE_MODEL_ID: ""
+HF_ENDPOINT_URL: ""
+HF_TOKEN: ""
+SMRITI_MEMORY_BACKEND: "json"
+SMRITI_MEMORY_PATH: "/data/smriti_memory"
+REDIS_URL: ""
+POSTGRES_DSN: ""
+SMRITI_ENCRYPTION_KEY: ""
+SMRITI_RETRIEVAL_MODE: "semantic_graph_identity"
+SMRITI_PUBLIC_DEMO: "false"
+SMRITI_MAX_MEMORY_ENTRIES: "1000"
+warnings:
+  - Do not commit HF_TOKEN.
+  - Do not commit SMRITI_ENCRYPTION_KEY.
+  - Production memory should use Redis/Postgres or another external durable storage service.
+  - The Hugging Face model repository should not contain user memory files.
+  - Public demo endpoints should not receive real PII.
+variables:
+  BASE_MODEL_ID: Hugging Face model ID to load locally inside the endpoint.
+  HF_ENDPOINT_URL: Optional remote model endpoint URL. If set, Smriti calls it instead of loading BASE_MODEL_ID locally.
+  HF_TOKEN: Hugging Face token for gated/private base models or protected remote endpoints.
+  SMRITI_MEMORY_BACKEND: json | sqlite | redis | postgres.
+  SMRITI_MEMORY_PATH: Path for JSON user-memory directory or SQLite database file.
+  REDIS_URL: External Redis URL. Takes precedence when present.
+  POSTGRES_DSN: External Postgres DSN. Takes precedence when present and REDIS_URL is empty.
+  SMRITI_ENCRYPTION_KEY: Encryption key for user memory. Maps to Smriti's SMRITI_MEMORY_KEY.
+  SMRITI_RETRIEVAL_MODE: tfidf | semantic | semantic_graph | semantic_graph_identity.
+  SMRITI_PUBLIC_DEMO: true | false. Use true only for non-PII demos.
+  SMRITI_MAX_MEMORY_ENTRIES: Maximum fact entries retained per user/topic.
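The `variables:` notes describe an order of precedence for choosing the memory store: `REDIS_URL` first, then `POSTGRES_DSN`, then `SMRITI_MEMORY_BACKEND` with `SMRITI_MEMORY_PATH`. The sketch below encodes that order as a pure function over an environment mapping. `select_memory_backend` is a hypothetical helper written for illustration; it follows the documented rules, not Smriti's actual `build_backend()` code, which is not part of this diff. Treating `redis` or `postgres` without a connection string as an error is also an assumption.

```python
from typing import Mapping, Tuple

DEFAULT_MEMORY_PATH = "/data/smriti_memory"  # value from the template above


def select_memory_backend(env: Mapping[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: return (backend, location) per the documented precedence.

    1. REDIS_URL, when non-empty, wins.
    2. Otherwise POSTGRES_DSN, when non-empty.
    3. Otherwise SMRITI_MEMORY_BACKEND (json or sqlite) with SMRITI_MEMORY_PATH.
    """
    redis_url = env.get("REDIS_URL", "").strip()
    if redis_url:
        return "redis", redis_url
    postgres_dsn = env.get("POSTGRES_DSN", "").strip()
    if postgres_dsn:
        return "postgres", postgres_dsn
    backend = env.get("SMRITI_MEMORY_BACKEND", "").strip().lower() or "json"
    if backend in {"redis", "postgres"}:
        # Assumption: a network backend without its connection string is a config error.
        raise ValueError(f"SMRITI_MEMORY_BACKEND={backend} needs REDIS_URL or POSTGRES_DSN")
    if backend not in {"json", "sqlite"}:
        raise ValueError(f"unsupported SMRITI_MEMORY_BACKEND: {backend!r}")
    path = env.get("SMRITI_MEMORY_PATH", "").strip() or DEFAULT_MEMORY_PATH
    return backend, path
```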

smriti_vendor/mempalace/__init__.py ADDED

	@@ -0,0 +1,3 @@

+"""Backward-compatible imports for the renamed :mod:`smriti` package."""
+
+from smriti import *  # noqa: F401,F403

smriti_vendor/mempalace/__pycache__/__init__.cpython-310.pyc ADDED

Binary file (224 Bytes).

smriti_vendor/mempalace/__pycache__/agent.cpython-310.pyc ADDED

Binary file (207 Bytes).

smriti_vendor/mempalace/__pycache__/api.cpython-310.pyc ADDED

Binary file (201 Bytes).

smriti_vendor/mempalace/__pycache__/cli.cpython-310.pyc ADDED

Binary file (201 Bytes).

smriti_vendor/mempalace/__pycache__/core.cpython-310.pyc ADDED

Binary file (204 Bytes).

smriti_vendor/mempalace/__pycache__/gifp.cpython-310.pyc ADDED

Binary file (204 Bytes).

smriti_vendor/mempalace/__pycache__/identity_fingerprint.cpython-310.pyc ADDED

Binary file (252 Bytes).

smriti_vendor/mempalace/__pycache__/knowledge_graph.cpython-310.pyc ADDED

Binary file (237 Bytes).

smriti_vendor/mempalace/__pycache__/macp.cpython-310.pyc ADDED

Binary file (204 Bytes).

smriti_vendor/mempalace/__pycache__/mem_palace.cpython-310.pyc ADDED

Binary file (222 Bytes).

smriti_vendor/mempalace/__pycache__/semantic_memory.cpython-310.pyc ADDED

Binary file (237 Bytes).

smriti_vendor/mempalace/agent.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.agent`."""
+
+from smriti.agent import *  # noqa: F401,F403

smriti_vendor/mempalace/api.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.api`."""
+
+from smriti.api import *  # noqa: F401,F403

smriti_vendor/mempalace/cli.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.cli`."""
+
+from smriti.cli import *  # noqa: F401,F403

smriti_vendor/mempalace/core.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.core`."""
+
+from smriti.core import *  # noqa: F401,F403

smriti_vendor/mempalace/gifp.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.gifp`."""
+
+from smriti.gifp import *  # noqa: F401,F403

smriti_vendor/mempalace/identity_fingerprint.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.identity_fingerprint`."""
+
+from smriti.identity_fingerprint import *  # noqa: F401,F403

smriti_vendor/mempalace/knowledge_graph.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.knowledge_graph`."""
+
+from smriti.knowledge_graph import *  # noqa: F401,F403

smriti_vendor/mempalace/macp.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.macp`."""
+
+from smriti.macp import *  # noqa: F401,F403

smriti_vendor/mempalace/mem_palace.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.mem_palace`."""
+
+from smriti.mem_palace import *  # noqa: F401,F403

smriti_vendor/mempalace/semantic_memory.py ADDED

	@@ -0,0 +1,3 @@

+"""Compatibility wrapper for :mod:`smriti.semantic_memory`."""
+
+from smriti.semantic_memory import *  # noqa: F401,F403

smriti_vendor/smriti/__init__.py ADDED

	@@ -0,0 +1,115 @@

+"""
+Smriti AI — Inference-time memory framework for small language models.
+Smriti AI adds semantic memory, reasoning continuity, and identity
+governance to any Hugging Face causal LM without fine-tuning. The name
+comes from smriti, a Sanskrit term associated with memory and remembrance.
+Features:
+- Semantic memory with FAISS-based retrieval
+- Knowledge graph integration
+- Embedding-based identity governance (GIFP v1.0)
+- Multi-user support via API and CLI
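+Quick start (``model`` and ``tokenizer`` are a pre-loaded Hugging Face causal LM
+and its tokenizer):
+    from smriti import MemPalaceLite, SmritiAILite
+    memory = MemPalaceLite(retrieval_mode="semantic")
+    agent  = SmritiAILite(model=model, tokenizer=tokenizer, memory=memory)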
+    reply  = agent.chat("My name is Jordan and I am a marine biologist.")
+"""
+from .agent import BaselineGemma, GodelAILite, SmritiAILite
+from .backends import (
+    JsonBackend,
+    MemoryBackend,
+    MemoryCipher,
+    PostgresBackend,
+    RedisBackend,
+    SqliteBackend,
+    build_backend,
+)
+from .config import SmritiConfig, configure_environment_from_file, load_config, write_default_config
+from .core import MemoryEntry, MemPalaceLite
+from .gifp import GIFPLite
+from .macp import MACPLite, ReasoningStep
+# Optional modules: each may need extra packages and is set to None if its import fails.
+try:
+    from .semantic_memory import (
+        RetrievalResult,
+        SemanticMemory,
+        MemoryEntry as SemanticMemoryEntry,
+    )
+except ImportError:
+    RetrievalResult = None
+    SemanticMemory = None
+    SemanticMemoryEntry = None
+try:
+    from .knowledge_graph import GraphTriple, KnowledgeGraphMemory
+except ImportError:
+    GraphTriple = None
+    KnowledgeGraphMemory = None
+try:
+    from .identity_fingerprint import IdentityCheck, IdentityFingerprint
+except ImportError:
+    IdentityCheck = None
+    IdentityFingerprint = None
+__version__ = "0.3.1"
+__author__ = "Alton Lee Wei Bin (creator35lwb)"
+__all__ = [
+    "MemoryEntry",
+    "MemPalaceLite",
+    "ReasoningStep",
+    "MACPLite",
+    "GIFPLite",
+    "SmritiAILite",
+    "GodelAILite",
+    "BaselineGemma",
+    "MemoryBackend",
+    "MemoryCipher",
+    "JsonBackend",
+    "SqliteBackend",
+    "RedisBackend",
+    "PostgresBackend",
+    "build_backend",
+    "SmritiConfig",
+    "load_config",
+    "configure_environment_from_file",
+    "write_default_config",
+]
+# Export the optional classes only when their imports succeeded.
+if SemanticMemory is not None:
+    __all__.extend(["SemanticMemory", "SemanticMemoryEntry", "RetrievalResult"])
+if KnowledgeGraphMemory is not None:
+    __all__.extend(["KnowledgeGraphMemory", "GraphTriple"])
+if IdentityFingerprint is not None:
+    __all__.extend(["IdentityFingerprint", "IdentityCheck"])
+__all__.extend(["api_app", "create_app", "get_memory", "set_agent_factory", "set_memory_backend", "cli_main"])
+def __getattr__(name: str):
+    """Lazy optional API/CLI exports without double-registering Prometheus metrics."""
+    if name in {"api_app", "create_app", "get_memory", "set_agent_factory", "set_memory_backend"}:
+        from .api import app as api_app
+        from .api import create_app, get_memory, set_agent_factory, set_memory_backend
+        values = {
+            "api_app": api_app,
+            "create_app": create_app,
+            "get_memory": get_memory,
+            "set_agent_factory": set_agent_factory,
+            "set_memory_backend": set_memory_backend,
+        }
+        return values[name]
+    if name == "cli_main":
+        from .cli import main as cli_main
+        return cli_main
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
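The module-level `__getattr__` (PEP 562) defers importing `.api`, and with it FastAPI and `prometheus_client`, until one of the API names is requested. Because those names are also listed in `__all__`, a star import in CPython resolves each of them, which calls `__getattr__` and performs the deferred import anyway. The `smriti_vendor/mempalace` shims use `from smriti import *`, so importing them would need the API dependencies installed. The sketch below reproduces the effect with a throwaway in-memory module; `lazy_demo`, `create_app` and the load counter are illustrative stand-ins, not Smriti code.

```python
import sys
import types

# Counts how often the stand-in for the heavy optional import runs.
LOAD_COUNT = {"api": 0}


def make_lazy_module(name: str) -> types.ModuleType:
    """Build an in-memory module with one eager name and one lazy name."""
    module = types.ModuleType(name)
    module.eager_value = 1
    module.__all__ = ["eager_value", "create_app"]

    def __getattr__(attr: str):
        # PEP 562: consulted only when normal attribute lookup fails.
        if attr == "create_app":
            LOAD_COUNT["api"] += 1  # stands in for `from .api import create_app`
            return lambda: "app"
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    module.__getattr__ = __getattr__
    return module


sys.modules["lazy_demo"] = make_lazy_module("lazy_demo")

import lazy_demo  # noqa: E402  (plain import: the lazy name is not resolved)

plain_value = lazy_demo.eager_value
after_plain_import = LOAD_COUNT["api"]

namespace: dict = {}
exec("from lazy_demo import *", namespace)  # star import walks __all__
after_star_import = LOAD_COUNT["api"]
```

Leaving the lazy names out of `__all__` would keep the deferral intact for star imports, at the cost of hiding them from `from smriti import *` users.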

smriti_vendor/smriti/__main__.py ADDED

	@@ -0,0 +1,7 @@

+"""Run the Smriti AI CLI with `python -m smriti`."""
+from .cli import main
+if __name__ == "__main__":
+    raise SystemExit(main())

smriti_vendor/smriti/__pycache__/__init__.cpython-310.pyc ADDED

Binary file (2.83 kB).

smriti_vendor/smriti/__pycache__/__main__.cpython-310.pyc ADDED

Binary file (266 Bytes).

smriti_vendor/smriti/__pycache__/agent.cpython-310.pyc ADDED

Binary file (8.03 kB).

smriti_vendor/smriti/__pycache__/api.cpython-310.pyc ADDED

Binary file (16.1 kB).

smriti_vendor/smriti/__pycache__/backends.cpython-310.pyc ADDED

Binary file (20.5 kB).

smriti_vendor/smriti/__pycache__/cli.cpython-310.pyc ADDED

Binary file (8.95 kB).

smriti_vendor/smriti/__pycache__/config.cpython-310.pyc ADDED

Binary file (5.37 kB).

smriti_vendor/smriti/__pycache__/core.cpython-310.pyc ADDED

Binary file (13.9 kB).

smriti_vendor/smriti/__pycache__/gifp.cpython-310.pyc ADDED

Binary file (510 Bytes).

smriti_vendor/smriti/__pycache__/identity_fingerprint.cpython-310.pyc ADDED

Binary file (9.65 kB).

smriti_vendor/smriti/__pycache__/knowledge_graph.cpython-310.pyc ADDED

Binary file (13.3 kB).

smriti_vendor/smriti/__pycache__/macp.cpython-310.pyc ADDED

Binary file (2.13 kB).

smriti_vendor/smriti/__pycache__/mem_palace.cpython-310.pyc ADDED

Binary file (289 Bytes).

smriti_vendor/smriti/__pycache__/semantic_memory.cpython-310.pyc ADDED

Binary file (17.3 kB).

smriti_vendor/smriti/agent.py ADDED

	@@ -0,0 +1,262 @@

+import os
+from contextlib import nullcontext
+from typing import Any, Dict, List, Optional, Tuple
+from .core import MemPalaceLite
+from .identity_fingerprint import IdentityFingerprint
+from .macp import MACPLite
+try:
+    import torch
+except Exception:
+    torch = None
+try:
+    from transformers import GenerationConfig
+except Exception:
+    GenerationConfig = None
+class SmritiAILite:
+    """
+    Model-agnostic SLM wrapper with semantic memory, graph memory, reasoning
+    continuity, and GIFP v1.0 identity governance.
+    Pass any pre-loaded Hugging Face causal LM and tokenizer.
+    """
+    def __init__(
+        self,
+        model: Any,
+        tokenizer: Any,
+        memory_path: Optional[str] = None,
+        retrieval_mode: str = "semantic",
+        session_id: str = "default",
+        topic_id: str = "general",
+        memory: Optional[MemPalaceLite] = None,
+        identity: Optional[IdentityFingerprint] = None,
+        auto_device: bool = True,
+    ):
+        self.model = model
+        self.tokenizer = tokenizer
+        self.session_id = session_id
+        self.topic_id = topic_id
+        if memory is not None:
+            self.memory = memory
+        elif memory_path and os.path.exists(memory_path):
+            self.memory = MemPalaceLite.load(memory_path, retrieval_mode=retrieval_mode)
+        else:
+            self.memory = MemPalaceLite(
+                retrieval_mode=retrieval_mode,
+                session_id=session_id,
+                topic_id=topic_id,
+            )
+        self.continuity = MACPLite()
+        self.identity = identity or IdentityFingerprint(
+            role="helpful AI assistant with persistent memory"
+        )
+        self.identity.set_constraints(
+            [
+                "Always be helpful and accurate",
+                "Reference previous context when relevant",
+                "Maintain logical consistency across turns",
+                "Acknowledge uncertainty when present",
+            ]
+        )
+        self.device, self.autocast_dtype = configure_inference_device()
+        if auto_device:
+            self._move_model_to_best_device()
+    def build_prompt(self, user_input: str) -> str:
+        identity = self.identity.get_identity_prompt()
+        ctx = self.memory.get_context(
+            query=user_input,
+            session_id=self.session_id,
+            topic_id=self.topic_id,
+        )
+        if ctx:
+            return identity + "\n" + ctx + "\n\n" + user_input
+        return identity + "\n" + user_input
+    def _generate(self, prompt: str, max_tokens: int = 256) -> str:
+        if torch is None or GenerationConfig is None:
+            raise RuntimeError("torch and transformers are required for model generation.")
+        messages = [{"role": "user", "content": prompt}]
+        try:
+            formatted = self.tokenizer.apply_chat_template(
+                messages, tokenize=False, add_generation_prompt=True
+            )
+        except Exception:
+            formatted = prompt
+        inputs = self.tokenizer(
+            formatted,
+            return_tensors="pt",
+            truncation=True,
+            max_length=2048,
+        )
+        model_device = _model_device(self.model) or self.device
+        inputs = {key: value.to(model_device) for key, value in inputs.items()}
+        cfg = GenerationConfig(
+            max_new_tokens=max_tokens,
+            temperature=0.7,
+            top_p=0.9,
+            do_sample=True,
+            pad_token_id=getattr(self.tokenizer, "eos_token_id", None),
+        )
+        with torch.inference_mode(), _autocast_context(model_device, self.autocast_dtype):
+            out = self.model.generate(**inputs, generation_config=cfg)
+        return self.tokenizer.decode(
+            out[0, inputs["input_ids"].shape[1] :].detach().cpu(),
+            skip_special_tokens=True,
+        ).strip()
+    def chat(self, user_input: str, refine: bool = False) -> str:
+        self.continuity.start_chain(user_input)
+        self.identity.observe_user_input(user_input)
+        prompt = self.build_prompt(user_input)
+        response = self._generate(prompt)
+        context = self.memory.get_context(
+            query=user_input,
+            session_id=self.session_id,
+            topic_id=self.topic_id,
+        )
+        response, identity_check = self.identity.ensure_aligned(
+            response,
+            self._generate,
+            user_input=user_input,
+            context=context,
+        )
+        if refine and identity_check.consistency_score < 0.5:
+            response = self.identity.refinement_pass(
+                self._generate,
+                response,
+                user_input=user_input,
+                context=context,
+            )
+            identity_check = self.identity.evaluate_output(response)
+        self.continuity.add_step(
+            user_input,
+            response,
+            identity_check.consistency_score,
+            "continue" if identity_check.consistency_score > 0.7 else "refine",
+        )
+        for fact in self.memory.extract_facts(response, user_input=user_input):
+            self.memory.add_fact(fact, session_id=self.session_id, topic_id=self.topic_id)
+        self.memory.add_to_history("User: " + user_input, "user_input")
+        self.memory.add_to_history(
+            "Assistant: " + response[:200], "assistant_output"
+        )
+        self.identity.record_behavior(response)
+        return response
+    def save_memory(self, path: str):
+        self.memory.save(path)
+    def load_memory(self, path: str):
+        self.memory = MemPalaceLite.load(path, retrieval_mode=self.memory.retrieval_mode)
+    def get_memory_state(self) -> Dict:
+        return self.memory.to_dict()
+    def get_reasoning_chain(self) -> str:
+        return self.continuity.get_chain_summary()
+    def _move_model_to_best_device(self) -> None:
+        if torch is None or self.device is None or str(self.device) == "cpu":
+            return
+        try:
+            current = _model_device(self.model)
+            if current is not None and str(current).startswith("cuda"):
+                return
+            self.model.to(self.device)
+        except Exception:
+            pass
+class BaselineGemma:
+    """Plain causal LM with no memory, no identity layer, no continuity."""
+    def __init__(self, model: Any, tokenizer: Any, auto_device: bool = True):
+        self.model = model
+        self.tokenizer = tokenizer
+        self._history: List[str] = []
+        self.device, self.autocast_dtype = configure_inference_device()
+        if auto_device and torch is not None and str(self.device) != "cpu":
+            try:
+                self.model.to(self.device)
+            except Exception:
+                pass
+    def chat(self, user_input: str) -> str:
+        if torch is None or GenerationConfig is None:
+            raise RuntimeError("torch and transformers are required for model generation.")
+        messages = [{"role": "user", "content": user_input}]
+        try:
+            prompt = self.tokenizer.apply_chat_template(
+                messages, tokenize=False, add_generation_prompt=True
+            )
+        except Exception:
+            prompt = user_input
+        inputs = self.tokenizer(
+            prompt,
+            return_tensors="pt",
+            truncation=True,
+            max_length=2048,
+        )
+        model_device = _model_device(self.model) or self.device
+        inputs = {key: value.to(model_device) for key, value in inputs.items()}
+        cfg = GenerationConfig(
+            max_new_tokens=getattr(self, "max_new_tokens", 256),
+            temperature=0.7,
+            top_p=0.9,
+            do_sample=True,
+            pad_token_id=getattr(self.tokenizer, "eos_token_id", None),
+        )
+        with torch.inference_mode(), _autocast_context(model_device, self.autocast_dtype):
+            out = self.model.generate(**inputs, generation_config=cfg)
+        response = self.tokenizer.decode(
+            out[0, inputs["input_ids"].shape[1] :].detach().cpu(),
+            skip_special_tokens=True,
+        ).strip()
+        self._history.extend(["User: " + user_input, "Assistant: " + response])
+        return response
+    def reset(self):
+        self._history = []
+def configure_inference_device() -> Tuple[Any, Any]:
+    """Return the preferred torch device and mixed-precision dtype."""
+    if torch is None:
+        return "cpu", None
+    if torch.cuda.is_available():
+        dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
+        return torch.device("cuda"), dtype
+    return torch.device("cpu"), torch.float32
+def _model_device(model: Any) -> Any:
+    try:
+        return next(model.parameters()).device
+    except Exception:
+        return None
+def _autocast_context(device: Any, dtype: Any):
+    if torch is None or dtype is None:
+        return nullcontext()
+    if str(device).startswith("cuda"):
+        return torch.autocast(device_type="cuda", dtype=dtype)
+    return nullcontext()
+# Backwards compatibility for existing user code.
+GodelAILite = SmritiAILite
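Two decision rules in `agent.py` are easy to misread: the device and dtype choice made by `configure_inference_device()` together with `_autocast_context()`, and the consistency thresholds in `SmritiAILite.chat()`. The sketch restates both as torch-free functions so each branch can be checked directly. The function names are hypothetical, and dtypes are represented as strings rather than `torch.dtype` objects.

```python
from typing import Optional, Tuple


def choose_device_and_dtype(
    torch_available: bool, cuda_available: bool, bf16_supported: bool
) -> Tuple[str, Optional[str]]:
    """Mirror configure_inference_device() with strings instead of torch objects."""
    if not torch_available:
        return "cpu", None
    if cuda_available:
        # Prefer bfloat16 on GPUs that support it, otherwise float16.
        return "cuda", "bfloat16" if bf16_supported else "float16"
    return "cpu", "float32"


def uses_autocast(device: str, dtype: Optional[str]) -> bool:
    """Mirror _autocast_context(): autocast is entered only on CUDA with a dtype."""
    return dtype is not None and device.startswith("cuda")


def continuity_decision(score: float, refine: bool) -> Tuple[bool, str]:
    """Mirror the thresholds in SmritiAILite.chat().

    Returns (run_refinement_pass, macp_step_label). In the original, the label
    uses the score re-evaluated after any refinement pass; here the caller
    passes whichever score applies. refine defaults to False in chat().
    """
    run_refinement_pass = refine and score < 0.5
    label = "continue" if score > 0.7 else "refine"
    return run_refinement_pass, label
```

Two consequences follow from the original code. On CPU the returned `float32` is never applied, because autocast is only entered on CUDA, so the model runs in whatever dtype it was loaded with. Scores from 0.5 to 0.7 are labeled `refine` in the reasoning chain without triggering a refinement pass.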

smriti_vendor/smriti/api.py ADDED

	@@ -0,0 +1,538 @@

+"""FastAPI layer for multi-user, multi-agent Smriti AI memory access."""
+from __future__ import annotations
+import json
+import logging
+import os
+import time
+import uuid
+from contextlib import contextmanager
+from threading import RLock
+from typing import Any, Callable, Dict, Iterator, List, Optional
+from fastapi import FastAPI, HTTPException, Request, Response
+from fastapi.middleware.cors import CORSMiddleware
+from pydantic import BaseModel, Field
+from prometheus_client import CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest
+from .backends import MemoryBackend, build_backend
+from .config import configure_environment_from_file, load_config
+from .core import MemPalaceLite
+AgentFactory = Callable[..., Any]
+USER_MEMORIES: Dict[str, MemPalaceLite] = {}
+MEMORY_LOCK = RLock()
+MEMORY_BACKEND: Optional[MemoryBackend] = None
+AGENT_FACTORY: Optional[AgentFactory] = None
+LOGGER = logging.getLogger("smriti.api")
+LOGGER.setLevel(logging.INFO)
+if not LOGGER.handlers:
+    _handler = logging.StreamHandler()
+    _handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
+    LOGGER.addHandler(_handler)
+LOGGER.propagate = False
+HTTP_REQUESTS = Counter(
+    "smriti_http_requests_total",
+    "Total HTTP requests handled by the Smriti AI API.",
+    ("method", "path", "status"),
+)
+HTTP_ERRORS = Counter(
+    "smriti_http_errors_total",
+    "Total HTTP requests that completed with status >= 500.",
+    ("method", "path"),
+)
+HTTP_LATENCY = Histogram(
+    "smriti_http_request_latency_seconds",
+    "End-to-end HTTP request latency.",
+    ("method", "path"),
+)
+RETRIEVAL_LATENCY = Histogram(
+    "smriti_retrieval_latency_seconds",
+    "Memory retrieval latency for chat requests.",
+    ("retrieval_mode",),
+)
+TOKEN_USAGE = Counter(
+    "smriti_tokens_total",
+    "Approximate whitespace-token count observed by the API.",
+    ("user_id", "agent_id"),
+)
+USER_MEMORY_COUNT = Gauge(
+    "smriti_user_memories",
+    "Number of in-memory user memory stores.",
+)
+USER_MEMORY_BYTES = Gauge(
+    "smriti_user_memory_bytes",
+    "Approximate serialized memory size by user.",
+    ("user_id",),
+)
+class ChatRequest(BaseModel):
+    user_id: str
+    message: str
+    topic_id: str = "general"
+    agent_id: str = "executor"
+    retrieval_mode: str = "semantic"
+class ChatResponse(BaseModel):
+    user_id: str
+    agent_id: str
+    topic_id: str
+    response: str
+    retrieved_context: str
+    memory: Dict[str, Any]
+class MemoryLoadRequest(BaseModel):
+    user_id: str
+    memory: Optional[Dict[str, Any]] = None
+    path: Optional[str] = None
+    retrieval_mode: str = "semantic"
+class MemorySaveRequest(BaseModel):
+    user_id: str
+    path: Optional[str] = None
+class MemoryDeleteRequest(BaseModel):
+    user_id: str
+    path: Optional[str] = None
+class GraphQueryRequest(BaseModel):
+    user_id: str
+    query_entity: str
+    topic_id: Optional[str] = None
+    depth: int = Field(default=1, ge=1, le=4)
+def set_agent_factory(factory: Optional[AgentFactory]) -> None:
+    """
+    Register a callable that returns a configured model agent.
+    The callable receives `user_id`, `memory`, `topic_id`, and `agent_id`.
+    When no factory is configured, `/chat` runs in memory-only mode.
+    """
+    global AGENT_FACTORY
+    AGENT_FACTORY = factory
+def set_memory_backend(backend: Optional[MemoryBackend]) -> None:
+    """Override the configured persistence backend for tests or deployments."""
+    global MEMORY_BACKEND
+    MEMORY_BACKEND = backend
+def get_memory_backend() -> MemoryBackend:
+    """Return the configured durable backend, constructing it lazily from env."""
+    global MEMORY_BACKEND
+    if MEMORY_BACKEND is None:
+        configure_environment_from_file()
+        MEMORY_BACKEND = build_backend()
+    return MEMORY_BACKEND
+def get_memory(user_id: str, retrieval_mode: str = "semantic") -> MemPalaceLite:
+    with MEMORY_LOCK:
+        if user_id not in USER_MEMORIES:
+            state = None
+            try:
+                state = get_memory_backend().load(user_id)
+            except Exception:
+                LOGGER.exception("Durable memory load failed; starting empty memory")
+            if state:
+                memory = MemPalaceLite.from_dict(state, retrieval_mode=retrieval_mode)
+                memory.session_id = user_id
+            else:
+                memory = MemPalaceLite(
+                    retrieval_mode=retrieval_mode,
+                    session_id=user_id,
+                )
+            USER_MEMORIES[user_id] = memory
+            USER_MEMORY_COUNT.set(len(USER_MEMORIES))
+        return USER_MEMORIES[user_id]
+def create_app() -> FastAPI:
+    config = configure_environment_from_file()
+    app = FastAPI(
+        title="Smriti AI API",
+        version="0.3.1",
+        description="Semantic memory, knowledge graph and identity governance API.",
+    )
+    app.add_middleware(
+        CORSMiddleware,
+        allow_origins=config.cors_origins,
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
+    @app.middleware("http")
+    async def request_observability(request: Request, call_next: Callable[..., Any]) -> Response:
+        request_id = request.headers.get("x-request-id", str(uuid.uuid4()))
+        path = request.url.path
+        start = time.perf_counter()
+        status_code = 500
+        try:
+            _enforce_api_key(request)
+            response = await call_next(request)
+            status_code = response.status_code
+        except HTTPException as exc:
+            status_code = exc.status_code
+            response = Response(
+                content=json.dumps({"detail": exc.detail}),
+                status_code=exc.status_code,
+                media_type="application/json",
+            )
+        except Exception:
+            LOGGER.exception(
+                "Unhandled API request failure",
+                extra={"request_id": request_id, "path": path},
+            )
+            response = Response(
+                content='{"detail":"Internal server error"}',
+                status_code=500,
+                media_type="application/json",
+            )
+        duration = time.perf_counter() - start
+        HTTP_LATENCY.labels(request.method, path).observe(duration)
+        HTTP_REQUESTS.labels(request.method, path, str(status_code)).inc()
+        if status_code >= 500:
+            HTTP_ERRORS.labels(request.method, path).inc()
+        USER_MEMORY_COUNT.set(len(USER_MEMORIES))
+        response.headers["x-request-id"] = request_id
+        LOGGER.info(
+            "request completed request_id=%s method=%s path=%s status=%s duration_s=%.6f",
+            request_id,
+            request.method,
+            path,
+            status_code,
+            duration,
+        )
+        return response
+    @app.get("/health")
+    def health() -> Dict[str, Any]:
+        return {"status": "ok", "users": len(USER_MEMORIES)}
+    @app.get("/metrics")
+    def metrics() -> Response:
+        return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
+    @app.post("/chat", response_model=ChatResponse)
+    def chat(request: ChatRequest) -> ChatResponse:
+        memory = get_memory(request.user_id, retrieval_mode=request.retrieval_mode)
+        memory.retrieval_mode = request.retrieval_mode
+        memory.topic_id = request.topic_id
+        context, degraded, warnings = _safe_get_context(
+            memory,
+            query=request.message,
+            session_id=request.user_id,
+            topic_id=request.topic_id,
+            retrieval_mode=request.retrieval_mode,
+        )
+        if AGENT_FACTORY is not None:
+            agent = _build_agent(
+                AGENT_FACTORY,
+                user_id=request.user_id,
+                memory=memory,
+                topic_id=request.topic_id,
+                agent_id=request.agent_id,
+            )
+            with MEMORY_LOCK:
+                try:
+                    response = agent.chat(request.message)
+                except Exception as exc:
+                    LOGGER.exception("Agent factory chat failed")
+                    degraded = True
+                    warnings.append(f"agent_failure:{exc.__class__.__name__}")
+                    response = _memory_only_response(context)
+                state = memory.to_dict()
+                _persist_if_configured(request.user_id, state, warnings)
+        else:
+            response = _memory_only_response(context)
+            with MEMORY_LOCK:
+                _safe_update_memory(
+                    memory,
+                    request.message,
+                    response,
+                    request.user_id,
+                    request.topic_id,
+                    warnings,
+                )
+                state = memory.to_dict()
+                _persist_if_configured(request.user_id, state, warnings)
+        TOKEN_USAGE.labels(request.user_id, request.agent_id).inc(
+            _count_tokens(request.message) + _count_tokens(response)
+        )
+        state["_degraded"] = degraded
+        state["_warnings"] = warnings
+        return ChatResponse(
+            user_id=request.user_id,
+            agent_id=request.agent_id,
+            topic_id=request.topic_id,
+            response=response,
+            retrieved_context=context,
+            memory=state,
+        )
+    @app.post("/memory/load")
+    def load_memory(request: MemoryLoadRequest) -> Dict[str, Any]:
+        with MEMORY_LOCK:
+            if request.path:
+                memory = MemPalaceLite.load(
+                    request.path,
+                    retrieval_mode=request.retrieval_mode,
+                )
+            elif request.memory:
+                memory = MemPalaceLite.from_dict(
+                    request.memory or {},
+                    retrieval_mode=request.retrieval_mode,
+                )
+            else:
+                state = get_memory_backend().load(request.user_id)
+                if state is None:
+                    raise HTTPException(status_code=404, detail="No memory found for user.")
+                memory = MemPalaceLite.from_dict(
+                    state,
+                    retrieval_mode=request.retrieval_mode,
+                )
+            memory.session_id = request.user_id
+            USER_MEMORIES[request.user_id] = memory
+            return memory.to_dict()
+    @app.post("/memory/save")
+    def save_memory(request: MemorySaveRequest) -> Dict[str, Any]:
+        memory = get_memory(request.user_id)
+        with MEMORY_LOCK:
+            if request.path:
+                memory.save(request.path)
+            state = memory.to_dict()
+            if not request.path:
+                get_memory_backend().save(request.user_id, state)
+                _observe_memory_size(request.user_id, state)
+            return state
+    @app.post("/memory/delete")
+    def delete_memory(request: MemoryDeleteRequest) -> Dict[str, Any]:
+        with MEMORY_LOCK:
+            existed = USER_MEMORIES.pop(request.user_id, None) is not None
+            USER_MEMORY_COUNT.set(len(USER_MEMORIES))
+            deleted_file = False
+            if request.path and os.path.exists(request.path):
+                os.remove(request.path)
+                deleted_file = True
+            deleted_backend = False
+            try:
+                deleted_backend = get_memory_backend().delete_user(request.user_id)
+            except Exception:
+                LOGGER.exception("Durable memory deletion failed")
+            try:
+                USER_MEMORY_BYTES.remove(request.user_id)
+            except Exception:
+                pass
+            return {
+                "user_id": request.user_id,
+                "deleted_memory": existed,
+                "deleted_file": deleted_file,
+                "deleted_backend": deleted_backend,
+                "remaining_users": len(USER_MEMORIES),
+            }
+    @app.post("/graph/query")
+    def graph_query(request: GraphQueryRequest) -> Dict[str, Any]:
+        memory = get_memory(request.user_id)
+        try:
+            triples = memory.knowledge_graph.query_graph(
+                request.user_id,
+                request.query_entity,
+                depth=request.depth,
+                topic_id=request.topic_id,
+            )
+            degraded = False
+            warnings: List[str] = []
+        except Exception as exc:
+            LOGGER.exception("Knowledge graph query failed")
+            triples = []
+            degraded = True
+            warnings = [f"knowledge_graph_failure:{exc.__class__.__name__}"]
+        return {
+            "user_id": request.user_id,
+            "query_entity": request.query_entity,
+            "triples": [triple.__dict__ for triple in triples],
+            "facts": memory.knowledge_graph.triples_to_text(triples),
+            "degraded": degraded,
+            "warnings": warnings,
+        }
+    return app
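+def _build_agent(factory: AgentFactory, **kwargs: Any) -> Any:
+    # Try the keyword-argument factory signature first, then fall back to the legacy
+    # factory(memory) form. A TypeError raised inside the factory body also triggers
+    # the fallback, so a genuine factory bug can surface as a second, confusing call.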
+    try:
+        return factory(**kwargs)
+    except TypeError:
+        return factory(kwargs["memory"])
+def _memory_only_response(context: str) -> str:
+    if context:
+        bullets = []
+        for line in context.splitlines():
+            cleaned = line.strip()
+            if cleaned.startswith("* "):
+                bullets.append(cleaned[2:].strip())
+        if bullets:
+            rendered = "\n".join(f"- {fact}" for fact in bullets[:5])
+            return f"Memory updated. I found relevant context:\n{rendered}"
+        return "Memory updated. Relevant context is available for the configured model."
+    return "Memory updated. No prior relevant context was found."
+def _safe_get_context(
+    memory: MemPalaceLite,
+    query: str,
+    session_id: str,
+    topic_id: str,
+    retrieval_mode: str,
+) -> tuple[str, bool, List[str]]:
+    warnings: List[str] = []
+    with _observe_retrieval(retrieval_mode):
+        try:
+            return (
+                memory.get_context(
+                    query=query,
+                    session_id=session_id,
+                    topic_id=topic_id,
+                ),
+                False,
+                warnings,
+            )
+        except Exception as exc:
+            LOGGER.exception("Primary retrieval failed; degrading to TF-IDF/no-graph context")
+            warnings.append(f"primary_retrieval_failure:{exc.__class__.__name__}")
+    original_mode = memory.retrieval_mode
+    try:
+        memory.retrieval_mode = "tfidf"
+        with _observe_retrieval("tfidf"):
+            context = memory.get_context(
+                query=query,
+                session_id=session_id,
+                topic_id=topic_id,
+                include_graph=False,
+            )
+        warnings.append("degraded_to_tfidf")
+        return context, True, warnings
+    except Exception as exc:
+        LOGGER.exception("Fallback TF-IDF retrieval failed")
+        warnings.append(f"fallback_retrieval_failure:{exc.__class__.__name__}")
+        return "", True, warnings
+    finally:
+        memory.retrieval_mode = original_mode
+def _safe_update_memory(
+    memory: MemPalaceLite,
+    user_message: str,
+    response: str,
+    session_id: str,
+    topic_id: str,
+    warnings: List[str],
+) -> None:
+    try:
+        for fact in memory.extract_facts("", user_input=user_message):
+            try:
+                memory.add_fact(fact, session_id=session_id, topic_id=topic_id)
+            except Exception as exc:
+                LOGGER.exception("Fact storage failed")
+                warnings.append(f"fact_storage_failure:{exc.__class__.__name__}")
+        memory.add_to_history("User: " + user_message, "user_input")
+        memory.add_to_history("Assistant: " + response[:200], "assistant_output")
+    except Exception as exc:
+        LOGGER.exception("Memory update failed")
+        warnings.append(f"memory_update_failure:{exc.__class__.__name__}")
+def _persist_if_configured(user_id: str, state: Dict[str, Any], warnings: List[str]) -> None:
+    if os.getenv("SMRITI_AUTOSAVE", "0").lower() not in {"1", "true", "yes"}:
+        _observe_memory_size(user_id, state)
+        return
+    try:
+        get_memory_backend().save(user_id, state)
+        _observe_memory_size(user_id, state)
+    except Exception as exc:
+        LOGGER.exception("Durable memory autosave failed")
+        warnings.append(f"autosave_failure:{exc.__class__.__name__}")
+def _observe_memory_size(user_id: str, state: Dict[str, Any]) -> None:
+    try:
+        USER_MEMORY_BYTES.labels(user_id).set(len(json.dumps(state)))
+    except Exception:
+        pass
+@contextmanager
+def _observe_retrieval(retrieval_mode: str) -> Iterator[None]:
+    start = time.perf_counter()
+    try:
+        yield
+    finally:
+        RETRIEVAL_LATENCY.labels(retrieval_mode).observe(time.perf_counter() - start)
+def _count_tokens(text: str) -> int:
+    # Approximation: counts whitespace-separated words, not model tokenizer tokens.
+    return max(1, len(text.split())) if text else 0
+def _enforce_api_key(request: Request) -> None:
+    expected = os.getenv("SMRITI_API_KEY")
+    if not expected:
+        return
+    if request.url.path in {"/health", "/metrics", "/docs", "/openapi.json"}:
+        return
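+    import hmac
+    supplied = request.headers.get("x-api-key")
+    # Compare in constant time so response timing does not reveal how much of the key matched.
+    if supplied is None or not hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8")):
+        raise HTTPException(status_code=401, detail="Invalid or missing API key.")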
+app = create_app()
+def main(argv: Optional[List[str]] = None) -> None:
+    """Run the API with `python -m smriti.api` or the `smriti-api` entry point."""
+    import argparse
+    import uvicorn
+    parser = argparse.ArgumentParser(description="Run the Smriti AI FastAPI service.")
+    parser.add_argument("--config", help="Path to config.yaml.")
+    parser.add_argument("--host", help="Bind host. Defaults to config or SMRITI_HOST.")
+    parser.add_argument("--port", type=int, help="Bind port. Defaults to config or SMRITI_PORT.")
+    parser.add_argument("--reload", action="store_true", help="Enable Uvicorn reload mode.")
+    args = parser.parse_args(argv)
+    if args.config:
+        os.environ["SMRITI_CONFIG_PATH"] = args.config
+    config = load_config()
+    uvicorn.run(
+        "smriti.api:app" if args.reload else app,
+        host=args.host or os.getenv("SMRITI_HOST", config.host),
+        port=args.port or int(os.getenv("SMRITI_PORT", config.port)),
+        reload=args.reload,
+    )
+if __name__ == "__main__":
+    main()
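`_safe_get_context` follows a try-primary, then-fallback pattern: it reports degradation through a `(context, degraded, warnings)` tuple instead of raising, so `/chat` can still answer when the preferred retrieval path fails. The sketch below strips that control flow down to plain callables standing in for `MemPalaceLite.get_context`; `retrieve_with_fallback`, `vector_search`, and `keyword_search` are hypothetical names, and the sketch omits the latency timing and the `finally` block that restores `memory.retrieval_mode` in the real helper.

```python
from typing import Callable, List, Tuple


def retrieve_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    query: str,
) -> Tuple[str, bool, List[str]]:
    """Return (context, degraded, warnings), trying primary then fallback retrieval."""
    warnings: List[str] = []
    try:
        return primary(query), False, warnings
    except Exception as exc:
        # Record only the exception class, as the handler does, not its message.
        warnings.append(f"primary_retrieval_failure:{exc.__class__.__name__}")
    try:
        context = fallback(query)
        warnings.append("degraded_to_tfidf")
        return context, True, warnings
    except Exception as exc:
        # Both stages failed: return empty context but still flag degradation.
        warnings.append(f"fallback_retrieval_failure:{exc.__class__.__name__}")
        return "", True, warnings


def vector_search(query: str) -> str:
    # Hypothetical primary retriever that is currently unavailable.
    raise TimeoutError("vector store unavailable")


def keyword_search(query: str) -> str:
    # Hypothetical keyword-based fallback retriever.
    return f"* stored fact about {query}"


print(retrieve_with_fallback(vector_search, keyword_search, "tea"))
# ('* stored fact about tea', True, ['primary_retrieval_failure:TimeoutError', 'degraded_to_tfidf'])
```

Keeping failures as short class-name tags makes them safe to return to clients in `_warnings` without exposing exception messages.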

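`_build_agent` detects the legacy `factory(memory)` signature by catching `TypeError`, which cannot tell a signature mismatch apart from a `TypeError` raised inside the factory itself. One alternative, sketched here and not taken from Smriti AI, inspects the factory's signature up front with `inspect.signature`; `build_agent` and `Agent` are hypothetical names. Unlike `_build_agent`, this version passes only the keyword arguments the factory declares, so a factory that ignores `topic_id` does not need to accept `**kwargs`.

```python
import inspect
from typing import Any, Callable, Dict


def build_agent(factory: Callable[..., Any], **kwargs: Any) -> Any:
    """Call factory with the keyword arguments it declares, else factory(memory)."""
    try:
        params = inspect.signature(factory).parameters
    except (TypeError, ValueError):
        # No introspectable signature: pass everything and let errors propagate.
        return factory(**kwargs)
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return factory(**kwargs)
    accepted: Dict[str, Any] = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    }
    if accepted:
        return factory(**accepted)
    # Legacy single-argument form, e.g. factory(memory).
    return factory(kwargs["memory"])


class Agent:
    def __init__(self, memory: Any, user_id: str = "") -> None:
        self.memory = memory
        self.user_id = user_id


agent = build_agent(Agent, user_id="u1", memory={"facts": []}, topic_id="general")
print(agent.user_id)  # u1
```

A `TypeError` raised inside the factory now propagates on the first call instead of triggering a retry with different arguments.
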
smriti_vendor/smriti/backends.py ADDED

	@@ -0,0 +1,494 @@

+"""Durable memory backends for Smriti AI.
+Backends persist complete user memory blobs and also expose a minimal entry API
+for tools that want to store/retrieve lightweight facts without instantiating the
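+full runtime. Optional encryption is applied at the blob boundary, so complete
+memory blobs are protected the same way in the JSON, SQLite, Redis, and Postgres
+stores. Entries written through add_entry are encrypted only by the JSON backend,
+which keeps them inside the user blob; the SQLite, Redis, and Postgres backends
+store them in plaintext.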
+"""
+from __future__ import annotations
+import base64
+import hashlib
+import json
+import os
+import re
+import sqlite3
+import time
+from abc import ABC, abstractmethod
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+class MemoryBackend(ABC):
+    """Abstract persistence contract for user-isolated Smriti AI memory."""
+    @abstractmethod
+    def load(self, user_id: str) -> Optional[Dict[str, Any]]:
+        """Load a complete memory state for one user, or None if absent."""
+    @abstractmethod
+    def save(self, user_id: str, memory: Dict[str, Any]) -> None:
+        """Persist a complete memory state for one user."""
+    @abstractmethod
+    def add_entry(
+        self,
+        user_id: str,
+        session_id: str,
+        topic_id: str,
+        text: str,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        """Persist one lightweight fact/entry for a user/session/topic."""
+    @abstractmethod
+    def retrieve(
+        self,
+        user_id: str,
+        session_id: Optional[str] = None,
+        topic_id: Optional[str] = None,
+        query: str = "",
+        k: int = 5,
+    ) -> List[Dict[str, Any]]:
+        """Retrieve lightweight entries scoped to a user/session/topic."""
+    @abstractmethod
+    def delete_user(self, user_id: str) -> bool:
+        """Delete all memory owned by one user. Return whether anything existed."""
+class MemoryCipher:
+    """Optional symmetric encryption wrapper using Fernet when configured."""
+    def __init__(self, secret: Optional[str] = None):
+        self.secret = secret or os.getenv("SMRITI_MEMORY_KEY")
+        self._fernet = None
+        if self.secret:
+            try:
+                from cryptography.fernet import Fernet
+            except Exception as exc:  # pragma: no cover - depends on optional install.
+                raise RuntimeError(
+                    "SMRITI_MEMORY_KEY is set, but cryptography is not installed. "
+                    "Install smriti-ai[security] or smriti-ai[full]."
+                ) from exc
+            self._fernet = Fernet(_fernet_key(self.secret))
+    @property
+    def enabled(self) -> bool:
+        return self._fernet is not None
+    def wrap(self, payload: Dict[str, Any]) -> Dict[str, Any]:
+        if not self._fernet:
+            return {"encrypted": False, "payload": payload}
+        raw = json.dumps(payload, sort_keys=True).encode("utf-8")
+        return {
+            "encrypted": True,
+            "algorithm": "fernet-sha256-derived-key",
+            "payload": self._fernet.encrypt(raw).decode("utf-8"),
+        }
+    def unwrap(self, wrapper: Dict[str, Any]) -> Dict[str, Any]:
+        if not wrapper.get("encrypted"):
+            return dict(wrapper.get("payload", {}))
+        if not self._fernet:
+            raise RuntimeError("Memory blob is encrypted but SMRITI_MEMORY_KEY is not configured.")
+        decrypted = self._fernet.decrypt(wrapper["payload"].encode("utf-8"))
+        return json.loads(decrypted.decode("utf-8"))
+def build_backend(kind: Optional[str] = None, **kwargs: Any) -> MemoryBackend:
+    """Construct a backend from an explicit kind or SMRITI_MEMORY_BACKEND."""
+    selected = (kind or os.getenv("SMRITI_MEMORY_BACKEND") or "json").lower()
+    if selected == "json":
+        return JsonBackend(root=kwargs.get("root") or os.getenv("SMRITI_MEMORY_DIR", "data/memory"))
+    if selected == "sqlite":
+        return SqliteBackend(path=kwargs.get("path") or os.getenv("SMRITI_SQLITE_PATH", "data/smriti_memory.sqlite3"))
+    if selected == "redis":
+        return RedisBackend(url=kwargs.get("url") or os.getenv("SMRITI_REDIS_URL", "redis://localhost:6379/0"))
+    if selected in {"postgres", "postgresql"}:
+        return PostgresBackend(dsn=kwargs.get("dsn") or os.getenv("SMRITI_POSTGRES_DSN", ""))
+    raise ValueError("SMRITI_MEMORY_BACKEND must be one of: json, sqlite, redis, postgres.")
+class JsonBackend(MemoryBackend):
+    """File-per-user JSON backend. This preserves the original local behavior."""
+    def __init__(self, root: str | Path = "data/memory", cipher: Optional[MemoryCipher] = None):
+        self.root = Path(root)
+        self.cipher = cipher or MemoryCipher()
+    def load(self, user_id: str) -> Optional[Dict[str, Any]]:
+        path = self._path(user_id)
+        if not path.exists():
+            return None
+        return self.cipher.unwrap(json.loads(path.read_text(encoding="utf-8")))
+    def save(self, user_id: str, memory: Dict[str, Any]) -> None:
+        self.root.mkdir(parents=True, exist_ok=True)
+        self._path(user_id).write_text(
+            json.dumps(self.cipher.wrap(memory), indent=2),
+            encoding="utf-8",
+        )
+    def add_entry(
+        self,
+        user_id: str,
+        session_id: str,
+        topic_id: str,
+        text: str,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        state = self.load(user_id) or {"backend_entries": []}
+        state.setdefault("backend_entries", []).append(_entry(session_id, topic_id, text, metadata))
+        self.save(user_id, state)
+    def retrieve(
+        self,
+        user_id: str,
+        session_id: Optional[str] = None,
+        topic_id: Optional[str] = None,
+        query: str = "",
+        k: int = 5,
+    ) -> List[Dict[str, Any]]:
+        state = self.load(user_id) or {}
+        return _rank_entries(state.get("backend_entries", []), session_id, topic_id, query, k)
+    def delete_user(self, user_id: str) -> bool:
+        path = self._path(user_id)
+        existed = path.exists()
+        if existed:
+            path.unlink()
+        return existed
+    def _path(self, user_id: str) -> Path:
+        return self.root / f"{_safe_id(user_id)}.json"
+class SqliteBackend(MemoryBackend):
+    """SQLite backend for local durable multi-user memory."""
+    def __init__(self, path: str | Path = "data/smriti_memory.sqlite3", cipher: Optional[MemoryCipher] = None):
+        self.path = Path(path)
+        self.cipher = cipher or MemoryCipher()
+        self._init_schema()
+    def load(self, user_id: str) -> Optional[Dict[str, Any]]:
+        with self._connect() as conn:
+            row = conn.execute("SELECT payload FROM user_memory WHERE user_id = ?", (user_id,)).fetchone()
+        if not row:
+            return None
+        return self.cipher.unwrap(json.loads(row[0]))
+    def save(self, user_id: str, memory: Dict[str, Any]) -> None:
+        payload = json.dumps(self.cipher.wrap(memory))
+        with self._connect() as conn:
+            conn.execute(
+                """
+                INSERT INTO user_memory(user_id, payload, updated_at)
+                VALUES(?, ?, ?)
+                ON CONFLICT(user_id) DO UPDATE SET payload=excluded.payload, updated_at=excluded.updated_at
+                """,
+                (user_id, payload, time.time()),
+            )
+    def add_entry(
+        self,
+        user_id: str,
+        session_id: str,
+        topic_id: str,
+        text: str,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> None:
+        with self._connect() as conn:
+            conn.execute(
+                """
+                INSERT INTO memory_entries(user_id, session_id, topic_id, text, metadata, created_at)
+                VALUES(?, ?, ?, ?, ?, ?)
+                """,
+                (user_id, session_id, topic_id, text, json.dumps(metadata or {}), time.time()),
+            )
+    def retrieve(
+        self,
+        user_id: str,
+        session_id: Optional[str] = None,
+        topic_id: Optional[str] = None,
+        query: str = "",
+        k: int = 5,
+    ) -> List[Dict[str, Any]]:
+        clauses = ["user_id = ?"]
+        params: List[Any] = [user_id]
+        if session_id:
+            clauses.append("session_id = ?")
+            params.append(session_id)
+        if topic_id:
+            clauses.append("topic_id = ?")
+            params.append(topic_id)
+        # Fetch the k * 5 newest matching rows, then let _rank_entries re-rank them by query overlap.
+        params.append(max(1, k * 5))
+        sql = f"""
+            SELECT session_id, topic_id, text, metadata, created_at
+            FROM memory_entries
+            WHERE {' AND '.join(clauses)}
+            ORDER BY created_at DESC
+            LIMIT ?
+        """
+        with self._connect() as conn:
+            rows = conn.execute(sql, params).fetchall()
+        entries = [
+            {
+                "session_id": row[0],
+                "topic_id": row[1],
+                "text": row[2],
+                "metadata": json.loads(row[3] or "{}"),
+                "created_at": row[4],
+            }
+            for row in rows
+        ]
+        return _rank_entries(entries, session_id, topic_id, query, k)
+    def delete_user(self, user_id: str) -> bool:
+        with self._connect() as conn:
+            before = conn.total_changes
+            conn.execute("DELETE FROM user_memory WHERE user_id = ?", (user_id,))
+            conn.execute("DELETE FROM memory_entries WHERE user_id = ?", (user_id,))
+            return conn.total_changes > before
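+    def _connect(self) -> sqlite3.Connection:
+        # Callers use `with self._connect() as conn:`; sqlite3's context manager commits
+        # or rolls back the transaction but does not close the connection. CPython closes
+        # it when the last reference is released after the caller returns.
+        self.path.parent.mkdir(parents=True, exist_ok=True)
+        return sqlite3.connect(self.path)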
+    def _init_schema(self) -> None:
+        with self._connect() as conn:
+            conn.execute(
+                """
+                CREATE TABLE IF NOT EXISTS user_memory(
+                    user_id TEXT PRIMARY KEY,
+                    payload TEXT NOT NULL,
+                    updated_at REAL NOT NULL
+                )
+                """
+            )
+            conn.execute(
+                """
+                CREATE TABLE IF NOT EXISTS memory_entries(
+                    id INTEGER PRIMARY KEY AUTOINCREMENT,
+                    user_id TEXT NOT NULL,
+                    session_id TEXT NOT NULL,
+                    topic_id TEXT NOT NULL,
+                    text TEXT NOT NULL,
+                    metadata TEXT NOT NULL,
+                    created_at REAL NOT NULL
+                )
+                """
+            )
+            conn.execute(
+                "CREATE INDEX IF NOT EXISTS idx_entries_user_session_topic ON memory_entries(user_id, session_id, topic_id, created_at)"
+            )
+class RedisBackend(MemoryBackend):  # pragma: no cover - requires external Redis service.
+    """Redis backend using string payloads and per-user entry lists."""
+    def __init__(self, url: str = "redis://localhost:6379/0", cipher: Optional[MemoryCipher] = None):
+        try:
+            import redis
+        except Exception as exc:  # pragma: no cover - optional dependency.
+            raise RuntimeError("Install redis to use RedisBackend: pip install smriti-ai[backends]") from exc
+        self.client = redis.Redis.from_url(url, decode_responses=True)
+        self.cipher = cipher or MemoryCipher()
+    def load(self, user_id: str) -> Optional[Dict[str, Any]]:
+        raw = self.client.get(self._payload_key(user_id))
+        if not raw:
+            return None
+        return self.cipher.unwrap(json.loads(raw))
+    def save(self, user_id: str, memory: Dict[str, Any]) -> None:
+        self.client.set(self._payload_key(user_id), json.dumps(self.cipher.wrap(memory)))
+    def add_entry(self, user_id: str, session_id: str, topic_id: str, text: str, metadata: Optional[Dict[str, Any]] = None) -> None:
+        self.client.lpush(self._entries_key(user_id), json.dumps(_entry(session_id, topic_id, text, metadata)))
+    def retrieve(self, user_id: str, session_id: Optional[str] = None, topic_id: Optional[str] = None, query: str = "", k: int = 5) -> List[Dict[str, Any]]:
+        raw_entries = self.client.lrange(self._entries_key(user_id), 0, max(0, k * 5 - 1))
+        entries = [json.loads(item) for item in raw_entries]
+        return _rank_entries(entries, session_id, topic_id, query, k)
+    def delete_user(self, user_id: str) -> bool:
+        return bool(self.client.delete(self._payload_key(user_id), self._entries_key(user_id)))
+    def _payload_key(self, user_id: str) -> str:
+        return f"smriti:user:{_safe_id(user_id)}:payload"
+    def _entries_key(self, user_id: str) -> str:
+        return f"smriti:user:{_safe_id(user_id)}:entries"
+class PostgresBackend(MemoryBackend):  # pragma: no cover - requires external Postgres service.
+    """Postgres backend using psycopg2 and indexed user/session/topic tables."""
+    def __init__(self, dsn: str, cipher: Optional[MemoryCipher] = None):
+        if not dsn:
+            raise ValueError("SMRITI_POSTGRES_DSN is required for PostgresBackend.")
+        try:
+            import psycopg2
+        except Exception as exc:  # pragma: no cover - optional dependency.
+            raise RuntimeError("Install psycopg2-binary to use PostgresBackend: pip install smriti-ai[backends]") from exc
+        self._psycopg2 = psycopg2
+        self.dsn = dsn
+        self.cipher = cipher or MemoryCipher()
+        self._init_schema()
+    def load(self, user_id: str) -> Optional[Dict[str, Any]]:
+        with self._connect() as conn, conn.cursor() as cur:
+            cur.execute("SELECT payload FROM user_memory WHERE user_id = %s", (user_id,))
+            row = cur.fetchone()
+        if not row:
+            return None
+        return self.cipher.unwrap(row[0] if isinstance(row[0], dict) else json.loads(row[0]))
+    def save(self, user_id: str, memory: Dict[str, Any]) -> None:
+        payload = json.dumps(self.cipher.wrap(memory))
+        with self._connect() as conn, conn.cursor() as cur:
+            cur.execute(
+                """
+                INSERT INTO user_memory(user_id, payload, updated_at)
+                VALUES(%s, %s::jsonb, NOW())
+                ON CONFLICT(user_id) DO UPDATE SET payload=excluded.payload, updated_at=excluded.updated_at
+                """,
+                (user_id, payload),
+            )
+    def add_entry(self, user_id: str, session_id: str, topic_id: str, text: str, metadata: Optional[Dict[str, Any]] = None) -> None:
+        with self._connect() as conn, conn.cursor() as cur:
+            cur.execute(
+                """
+                INSERT INTO memory_entries(user_id, session_id, topic_id, text, metadata)
+                VALUES(%s, %s, %s, %s, %s::jsonb)
+                """,
+                (user_id, session_id, topic_id, text, json.dumps(metadata or {})),
+            )
+    def retrieve(self, user_id: str, session_id: Optional[str] = None, topic_id: Optional[str] = None, query: str = "", k: int = 5) -> List[Dict[str, Any]]:
+        clauses = ["user_id = %s"]
+        params: List[Any] = [user_id]
+        if session_id:
+            clauses.append("session_id = %s")
+            params.append(session_id)
+        if topic_id:
+            clauses.append("topic_id = %s")
+            params.append(topic_id)
+        params.append(max(1, k * 5))
+        sql = f"""
+            SELECT session_id, topic_id, text, metadata, EXTRACT(EPOCH FROM created_at)
+            FROM memory_entries
+            WHERE {' AND '.join(clauses)}
+            ORDER BY created_at DESC
+            LIMIT %s
+        """
+        with self._connect() as conn, conn.cursor() as cur:
+            cur.execute(sql, params)
+            rows = cur.fetchall()
+        entries = [
+            {
+                "session_id": row[0],
+                "topic_id": row[1],
+                "text": row[2],
+                "metadata": row[3] or {},
+                "created_at": float(row[4]),
+            }
+            for row in rows
+        ]
+        return _rank_entries(entries, session_id, topic_id, query, k)
+    def delete_user(self, user_id: str) -> bool:
+        with self._connect() as conn, conn.cursor() as cur:
+            cur.execute("DELETE FROM user_memory WHERE user_id = %s", (user_id,))
+            memory_deleted = cur.rowcount
+            cur.execute("DELETE FROM memory_entries WHERE user_id = %s", (user_id,))
+            entries_deleted = cur.rowcount
+        return bool(memory_deleted or entries_deleted)
+    def _connect(self):
+        return self._psycopg2.connect(self.dsn)
+    def _init_schema(self) -> None:
+        with self._connect() as conn, conn.cursor() as cur:
+            cur.execute(
+                """
+                CREATE TABLE IF NOT EXISTS user_memory(
+                    user_id TEXT PRIMARY KEY,
+                    payload JSONB NOT NULL,
+                    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+                )
+                """
+            )
+            cur.execute(
+                """
+                CREATE TABLE IF NOT EXISTS memory_entries(
+                    id BIGSERIAL PRIMARY KEY,
+                    user_id TEXT NOT NULL,
+                    session_id TEXT NOT NULL,
+                    topic_id TEXT NOT NULL,
+                    text TEXT NOT NULL,
+                    metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
+                    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+                )
+                """
+            )
+            cur.execute(
+                "CREATE INDEX IF NOT EXISTS idx_smriti_entries_user_session_topic ON memory_entries(user_id, session_id, topic_id, created_at DESC)"
+            )
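+def _fernet_key(secret: str) -> bytes:
+    # Use the secret as-is when it already is a Fernet key (44 url-safe base64 characters
+    # decoding to 32 bytes); otherwise derive one from its SHA-256 digest. SHA-256 is a
+    # fast hash rather than a password KDF, so low-entropy passphrases remain guessable.
+    raw = secret.encode("utf-8")
+    try:
+        if len(raw) == 44 and len(base64.urlsafe_b64decode(raw)) == 32:
+            return raw
+    except Exception:
+        # Not valid base64: fall through to the derived key.
+        pass
+    return base64.urlsafe_b64encode(hashlib.sha256(raw).digest())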
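+def _safe_id(value: str) -> str:
+    # Replace characters outside [a-zA-Z0-9_.-] so the ID is safe in file names and Redis
+    # keys. Distinct IDs can collapse to the same value (e.g. "a/b" and "a_b" both become
+    # "a_b"), which would make those users share one JSON file or Redis key.
+    return re.sub(r"[^a-zA-Z0-9_.-]+", "_", value.strip()) or "default"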
+def _entry(session_id: str, topic_id: str, text: str, metadata: Optional[Dict[str, Any]]) -> Dict[str, Any]:
+    return {
+        "session_id": session_id,
+        "topic_id": topic_id,
+        "text": text,
+        "metadata": metadata or {},
+        "created_at": time.time(),
+    }
+def _rank_entries(
+    entries: List[Dict[str, Any]],
+    session_id: Optional[str],
+    topic_id: Optional[str],
+    query: str,
+    k: int,
+) -> List[Dict[str, Any]]:
+    scoped = [
+        entry
+        for entry in entries
+        if (not session_id or entry.get("session_id") == session_id)
+        and (not topic_id or entry.get("topic_id") == topic_id)
+    ]
+    if not query.strip():
+        return sorted(scoped, key=lambda item: item.get("created_at", 0), reverse=True)[:k]
+    q_terms = set(re.findall(r"[a-z0-9']+", query.lower()))
+    scored = []
+    for entry in scoped:
+        terms = set(re.findall(r"[a-z0-9']+", entry.get("text", "").lower()))
+        overlap = len(q_terms & terms) / max(1, len(q_terms | terms))
+        recency = entry.get("created_at", 0)
+        scored.append((overlap, recency, entry))
+    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
+    return [entry for _, _, entry in scored[:k]]
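`_rank_entries` scores each scoped entry by Jaccard overlap between query terms and entry terms (shared terms divided by all distinct terms) and breaks ties by recency. The hypothetical `jaccard_scores` helper below reproduces that scoring on a few made-up entries and returns the scores alongside the entries so the ordering can be inspected; it skips the session/topic scoping and the `k` cut-off.

```python
import re
from typing import Any, Dict, List, Tuple

# Same tokenisation as _rank_entries: lowercase alphanumerics plus apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def jaccard_scores(entries: List[Dict[str, Any]], query: str) -> List[Tuple[float, Dict[str, Any]]]:
    """Return (score, entry) pairs, best overlap first and newest first on ties."""
    q_terms = set(TOKEN_RE.findall(query.lower()))
    scored = []
    for entry in entries:
        terms = set(TOKEN_RE.findall(entry.get("text", "").lower()))
        # Jaccard similarity: |shared terms| / |all distinct terms|.
        overlap = len(q_terms & terms) / max(1, len(q_terms | terms))
        scored.append((overlap, entry.get("created_at", 0), entry))
    # Sort on (score, timestamp) only, so the dicts themselves are never compared.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [(score, entry) for score, _, entry in scored]


entries = [
    {"text": "User likes green tea", "created_at": 1.0},
    {"text": "User lives in Pune", "created_at": 2.0},
    {"text": "Green tea, no sugar", "created_at": 3.0},
]
for score, entry in jaccard_scores(entries, "green tea"):
    print(f"{score:.2f}  {entry['text']}")
# 0.50  Green tea, no sugar
# 0.50  User likes green tea
# 0.00  User lives in Pune
```

Zero-overlap entries still fill the remaining slots of the top `k`, and because the SQLite, Redis, and Postgres backends hand only the `k * 5` most recent entries to the ranker, an older but more relevant entry is never scored.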