Refactor for HF Spaces: single-container, SQLite shim, ChromaDB on /data
Mirrors the deployment shape used by the working HF Spaces deploys CCAI-Vibe-Demo and CU-Student-AIProject-Helper:
* multi-stage Dockerfile (Node CRA build -> python:3.12-slim + bundle), non-root uid 1000, port 7860, REACT_APP_API_URL='' so the SPA shares the FastAPI origin via a static mount at /.
* MongoDB replaced with a drop-in SQLite shim at app/core/db.py (aiosqlite on $DATA_DIR/cybersecurity_panel.db). The shim emulates the pymongo/motor idioms the routers already call (find_one, find/sort/limit, insert_one, update_one/many, delete_one/many, replace_one, count_documents, find_one_and_delete, aggregate $match/$group/$in) so router code is untouched.
* app/core/database.py keeps its public API (connect_to_mongo, close_mongo_connection, get_database) but now wraps the shim. canvas_database.py is now a no-op (schema lives in the shim's _init_schema).
* requirements.txt: + aiosqlite~=0.20, - motor~=3.7. pymongo kept for bson.ObjectId and the error types the shim raises.
* RAGManager / EnhancedRAGManager / chroma_client now resolve their persist directories via _default_chroma_path('<subdir>') -> $DATA_DIR/chroma/<subdir> when DATA_DIR is set, falling back to the historical relative paths for local installs. ChromaDB embeddings now survive HF Space rebuilds.
* Avatar URL fallback in config.py defaults to '' (relative) instead of http://localhost:8000 so single-origin deployments serve avatars off the same host as the SPA.
* GET / in api/routes/root.py moved to GET /api/info so the SPA owns the root path; main.py mounts ./static at / when present, otherwise keeps the JSON banner.
* docker-compose.yml rewritten as a single-service wrapper around the same image used on HF, with a cybersecurity_data named volume mounted at /data so SQLite + ChromaDB survive 'docker compose down'. Default host port is 7861 (override via CYBERPANEL_HOST_PORT) to avoid clashing with the CCAI demo on 3002/8002.
* README.md gains HF Space frontmatter (sdk: docker, app_port: 7860, title: Cybersecurity Panel) and a deployment section listing required Space secrets (JWT_SECRET_KEY, GEMINI_API_KEY, optional OPENAI_API_KEY / VLLM_API_KEY).
Co-authored-by: Cursor <cursoragent@cursor.com>
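To make the shim idea from the commit message concrete, here is a minimal sketch of a Mongo-style collection stored as JSON rows in SQLite. It is not the code in app/core/db.py: it uses the synchronous stdlib `sqlite3` module where the commit uses aiosqlite, it generates ids with `uuid` instead of `bson.ObjectId`, and the names `TinyCollection` and `_matches` are hypothetical. Only equality and `$in` matching are shown.

```python
import json
import sqlite3
import uuid
from typing import Optional


def _matches(doc: dict, query: dict) -> bool:
    """Support plain equality and {"$in": [...]}; other operators are out of scope."""
    for key, cond in query.items():
        value = doc.get(key)
        if isinstance(cond, dict) and "$in" in cond:
            if value not in cond["$in"]:
                return False
        elif value != cond:
            return False
    return True


class TinyCollection:
    """Hypothetical stand-in for one shim collection: one row per JSON document."""

    def __init__(self, conn: sqlite3.Connection, name: str) -> None:
        self._conn = conn
        self._name = name  # trusted, hard-coded table name (never user input)
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{name}" '
            "(_id TEXT PRIMARY KEY, doc TEXT NOT NULL)"
        )

    def insert_one(self, doc: dict) -> str:
        stored = dict(doc)
        stored.setdefault("_id", uuid.uuid4().hex)
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                f'INSERT INTO "{self._name}" (_id, doc) VALUES (?, ?)',
                (stored["_id"], json.dumps(stored)),
            )
        return stored["_id"]

    def find_one(self, query: dict) -> Optional[dict]:
        # Filtering happens in Python over decoded documents, not in SQL.
        for (raw,) in self._conn.execute(f'SELECT doc FROM "{self._name}"'):
            doc = json.loads(raw)
            if _matches(doc, query):
                return doc
        return None
```

A caller would use it much like a pymongo collection, for example `users.find_one({"email": "a@example.com"})`. Scanning every row is fine at demo scale; a real shim would likely push indexed fields such as `_id` or `user_id` into SQL `WHERE` clauses.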
- Dockerfile +68 -35
- README.md +31 -2
- docker-compose.yml +29 -42
- multi_llm_chatbot_backend/app/api/routes/root.py +5 -2
- multi_llm_chatbot_backend/app/config.py +5 -1
- multi_llm_chatbot_backend/app/core/canvas_database.py +30 -63
- multi_llm_chatbot_backend/app/core/database.py +35 -71
- multi_llm_chatbot_backend/app/core/db.py +688 -0
- multi_llm_chatbot_backend/app/core/rag_manager.py +22 -4
- multi_llm_chatbot_backend/app/main.py +31 -14
- multi_llm_chatbot_backend/app/utils/chroma_client.py +22 -1
- multi_llm_chatbot_backend/requirements.txt +4 -2
|
@@ -1,45 +1,78 @@
|
|
| 1 |
-
# syntax=docker/dockerfile:1
|
| 2 |
-
| 3 |
|
| 4 |
-
|
| 5 |
-
|
|
|
|
| 6 |
|
| 7 |
-
ENV OVOS_CONFIG_BASE_FOLDER=neon
|
| 8 |
-
ENV OVOS_CONFIG_FILENAME=neon.yaml
|
| 9 |
-
ENV XDG_CONFIG_HOME=/config
|
| 10 |
-
|
| 11 |
-
RUN apt-get update && \
|
| 12 |
-
apt-get install -y --no-install-recommends \
|
| 13 |
-
python3 \
|
| 14 |
-
python3-pip \
|
| 15 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 16 |
-
|
| 17 |
-
# ---- Python dependencies (cached unless requirements.txt changes) ----------
|
| 18 |
-
WORKDIR /ccai/multi_llm_chatbot_backend
|
| 19 |
-
COPY multi_llm_chatbot_backend/requirements.txt ./
|
| 20 |
-
RUN --mount=type=cache,target=/root/.cache/pip \
|
| 21 |
-
pip install --break-system-packages -r requirements.txt
|
| 22 |
-
|
| 23 |
-
# ---- Node dependencies (cached unless package.json changes) ----------------
|
| 24 |
-
WORKDIR /ccai/phd-advisor-frontend
|
| 25 |
COPY phd-advisor-frontend/package.json phd-advisor-frontend/package-lock.json* ./
|
| 26 |
RUN --mount=type=cache,target=/root/.npm \
|
| 27 |
-
npm
|
|
|
|
|
|
|
| 28 |
|
| 29 |
-
#
|
| 30 |
-
|
| 31 |
-
|
|
|
|
| 32 |
|
| 33 |
-
# ----
|
| 34 |
-
FROM
|
| 35 |
-
| 36 |
RUN apt-get update && \
|
| 37 |
apt-get install -y --no-install-recommends ffmpeg && \
|
| 38 |
rm -rf /var/lib/apt/lists/*
|
| 39 |
-
WORKDIR /ccai/multi_llm_chatbot_backend
|
| 40 |
-
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
|
| 41 |
|
| 42 |
-
#
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
+# syntax=docker/dockerfile:1.7
+# ---------------------------------------------------------------------------
+# Cybersecurity Panel — single-container HuggingFace Spaces image.
+#
+# Mirrors the structural choices used by the working HF Spaces deployments
+# CCAI-Vibe-Demo and CU-Student-AIProject-Helper:
+# * multi-stage Node frontend build → Python+SPA bundle
+# * non-root user uid 1000, port 7860 (HF Spaces convention)
+# * persistence on the HF Storage Bucket mount at /data (SQLite shim)
+# * REACT_APP_API_URL is set to the empty string at build time so the SPA
+#   issues relative URLs and shares the FastAPI origin
+#
+# Build context: repo root (this file).
+# ---------------------------------------------------------------------------
 
+# ---- 1. Frontend build (CRA) ----------------------------------------------
+FROM node:20-bookworm AS frontend-build
+WORKDIR /app/frontend
 
 COPY phd-advisor-frontend/package.json phd-advisor-frontend/package-lock.json* ./
 RUN --mount=type=cache,target=/root/.npm \
+    npm ci
+
+COPY phd-advisor-frontend/ ./
 
+# Empty REACT_APP_API_URL → CRA inlines '' so every fetch() hits the same
+# origin as the SPA (the FastAPI server below).
+ENV REACT_APP_API_URL=""
+RUN npm run build
 
+# ---- 2. Python runtime + SPA bundle ---------------------------------------
+FROM python:3.12-slim-bookworm
+
+# ffmpeg is required by the /api/voice/transcribe endpoint
+# (browser WebM/Opus → WAV for Whisper). Kept on the runtime image only,
+# not on the build image, so we don't pay the apt cost during incremental
+# code rebuilds.
 RUN apt-get update && \
     apt-get install -y --no-install-recommends ffmpeg && \
     rm -rf /var/lib/apt/lists/*
 
+# HF Spaces runs the container as uid 1000 and expects the app to do the
+# same; with a non-root user we cannot bind privileged ports, but :7860 is
+# fine.
+RUN useradd -m -u 1000 user
+USER user
+ENV HOME=/home/user \
+    PATH=/home/user/.local/bin:$PATH \
+    PYTHONUNBUFFERED=1 \
+    CORS_ORIGINS=* \
+    DATA_DIR=/data \
+    CONFIG_PATH=/home/user/app/cybersecurity_config.yaml
+
+WORKDIR $HOME/app
+
+# ---- Python deps (cached unless requirements.txt changes) -----------------
+COPY --chown=user multi_llm_chatbot_backend/requirements.txt ./
+RUN --mount=type=cache,target=/home/user/.cache/pip,uid=1000,gid=1000 \
+    pip install --no-cache-dir --user -r requirements.txt
+
+# ---- Backend source -------------------------------------------------------
+COPY --chown=user multi_llm_chatbot_backend/ ./
+
+# ---- Top-level configuration files (config.yaml + persona definitions) ----
+COPY --chown=user cybersecurity_config.yaml ./cybersecurity_config.yaml
+COPY --chown=user phd_config.yaml ./phd_config.yaml
+COPY --chown=user undergrad_config.yaml ./undergrad_config.yaml
+COPY --chown=user personas/ ./personas/
+
+# ---- Frontend bundle ------------------------------------------------------
+# main.py mounts $HOME/app/static at "/" so the SPA is served same-origin
+# with the API.
+COPY --chown=user --from=frontend-build /app/frontend/build ./static
+
+ENV PYTHONPATH=$HOME/app
+
+EXPOSE 7860
+CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
@@ -1,6 +1,35 @@
|
|
| 1 |
-
| 2 |
|
| 3 |
-
An AI-powered academic guidance system that provides personalized advice through specialized advisor personas. Get diverse perspectives on your PhD journey from multiple AI advisors, each bringing unique expertise in methodology, theory, and practical guidance.
|
| 4 |
|
| 5 |
## Features
|
| 6 |
|
|
|
|
+---
+title: Cybersecurity Panel
+emoji: 🛡️
+colorFrom: indigo
+colorTo: blue
+sdk: docker
+pinned: false
+app_port: 7860
+---
+
+# Cybersecurity Panel
+
+An AI-powered cybersecurity guidance system that provides expert advice through a panel of specialized advisor personas. Ask about threats, compliance, incident response, architecture, and career growth and get diverse perspectives from multiple AI advisors. Built on Neon AI's Collaborative Conversational AI (CCAI) framework.
+
+## Hugging Face Spaces deployment
+
+This Space ships as a single Docker image built from the repository root [`Dockerfile`](Dockerfile). The container does three things:
+
+1. Builds the React frontend (CRA) at image-build time with `REACT_APP_API_URL=""` so every `fetch` issues a relative URL.
+2. Serves the bundled SPA from FastAPI at `/`, with the API exposed on `/api/...`, `/auth/...`, etc., on the same `:7860` origin.
+3. Persists user data (auth, profiles, chat sessions, onboarding, canvas state) in **SQLite via `aiosqlite`** at `${DATA_DIR}/cybersecurity_panel.db`. Mount a Hugging Face Storage Bucket at `/data` to make the database survive Space rebuilds — there is **no MongoDB**, **no Atlas**, no third-party data plane (the persistence pattern follows [`CU-Student-AIProject-Helper`](https://github.com/NeonClary/CU-Student-AIProject-Helper)).
+
+### Required Space secrets
+
+| Secret | Purpose |
+|--------|---------|
+| `JWT_SECRET_KEY` | Signs auth tokens. Set this to a long random string. |
+| `GEMINI_API_KEY` | Powers the default Gemini provider (model: `gemini-2.5-flash`). |
+| `OPENAI_API_KEY` | Optional — only required if you switch to the OpenAI provider. |
+| `VLLM_API_KEY` | Optional — only required if you point the orchestrator/advisors at a Neon vLLM endpoint. |
+
 
 
 ## Features
 
|
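The secrets table in the README hunk above suggests a simple startup check. The following is a sketch under stated assumptions, not code from this commit: the function name `missing_secrets` and the two tuples are hypothetical, and the required/optional split is taken directly from the table.

```python
import os
from typing import Mapping

# Names taken from the README's "Required Space secrets" table.
REQUIRED_SECRETS = ("JWT_SECRET_KEY", "GEMINI_API_KEY")
OPTIONAL_SECRETS = ("OPENAI_API_KEY", "VLLM_API_KEY")


def missing_secrets(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required secret names that are unset or blank in `env`."""
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

An application could call `missing_secrets()` once at startup and log or refuse to boot when the list is non-empty. Passing the environment as a parameter keeps the check easy to test without touching the real process environment.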
@@ -1,52 +1,39 @@
| 1 |
services:
|
| 2 |
-
|
| 3 |
build:
|
| 4 |
context: .
|
| 5 |
dockerfile: Dockerfile
|
| 6 |
-
|
| 7 |
ports:
|
| 8 |
-
- "
|
| 9 |
-
networks:
|
| 10 |
-
- application
|
| 11 |
-
depends_on:
|
| 12 |
-
- database
|
| 13 |
environment:
|
| 14 |
-
MONGODB_CONNECTION_STRING: "mongodb://database:27017"
|
| 15 |
-
MONGODB_DATABASE: cybersecurity_advisor
|
| 16 |
JWT_SECRET_KEY: ${JWT_SECRET_KEY:-CHANGEME-by-overriding-in-dot-env-file}
|
| 17 |
-
GEMINI_API_KEY: ${GEMINI_API_KEY:-
|
| 18 |
-
VLLM_API_KEY: ${VLLM_API_KEY:-}
|
| 19 |
-
CORS_ORIGINS: ${CORS_ORIGINS:-http://localhost:3010,http://localhost:3000}
|
| 20 |
-
GEMINI_MODEL: gemini-2.5-flash
|
| 21 |
-
CONFIG_PATH: ${CONFIG_PATH:-/ccai/cybersecurity_config.yaml}
|
| 22 |
OPENAI_API_KEY: ${OPENAI_API_KEY:-}
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
depends_on:
|
| 33 |
-
- backend
|
| 34 |
-
environment:
|
| 35 |
-
REACT_APP_API_URL: ${REACT_APP_API_URL:-http://localhost:8000}
|
| 36 |
-
PORT: "3010"
|
| 37 |
-
HOST: "0.0.0.0"
|
| 38 |
-
WDS_SOCKET_PORT: "3010"
|
| 39 |
-
BROWSER: "none"
|
| 40 |
-
database:
|
| 41 |
-
image: mongo:8.0
|
| 42 |
volumes:
|
| 43 |
-
-
|
| 44 |
-
|
| 45 |
-
- application
|
| 46 |
volumes:
|
| 47 |
-
|
| 48 |
-
networks:
|
| 49 |
-
application:
|
| 50 |
-
driver: bridge
|
| 51 |
-
driver_opts:
|
| 52 |
-
com.docker.network.bridge.host_binding_ipv4: "0.0.0.0"
|
|
|
|
+# ---------------------------------------------------------------------------
+# Local-dev wrapper around the single-container HuggingFace Spaces image.
+#
+# This file used to spin up three services (FastAPI backend, CRA dev server,
+# MongoDB). The HF deployment shape (single-container, SQLite on the bucket)
+# is now also the local-dev shape — same image you push to the Space — so
+# this is a one-service compose with a named volume that emulates the HF
+# Storage Bucket mount.
+#
+# Override CYBERPANEL_HOST_PORT via .env (or shell) if 7861 conflicts; the
+# in-container port is fixed at 7860 to mirror HF Spaces.
+# ---------------------------------------------------------------------------
+
 services:
+  app:
     build:
       context: .
       dockerfile: Dockerfile
+    image: cybersecurity-panel:dev
     ports:
+      - "${CYBERPANEL_HOST_PORT:-7861}:7860"
     environment:
       JWT_SECRET_KEY: ${JWT_SECRET_KEY:-CHANGEME-by-overriding-in-dot-env-file}
+      GEMINI_API_KEY: ${GEMINI_API_KEY:-}
       OPENAI_API_KEY: ${OPENAI_API_KEY:-}
+      VLLM_API_KEY: ${VLLM_API_KEY:-}
+      VLLM_API_USERNAME: ${VLLM_API_USERNAME:-}
+      GEMINI_MODEL: ${GEMINI_MODEL:-gemini-2.5-flash}
+      CONFIG_PATH: ${CONFIG_PATH:-/home/user/app/cybersecurity_config.yaml}
+      # Same-origin in single-container; only relevant for cross-origin dev.
+      CORS_ORIGINS: ${CORS_ORIGINS:-*}
+      # SQLite + ChromaDB persistence both land here. The named volume below
+      # mirrors the HF Storage Bucket mount so user data survives rebuilds.
+      DATA_DIR: /data
     volumes:
+      - cybersecurity_data:/data
+
 volumes:
+  cybersecurity_data:
|
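Since both SQLite and ChromaDB persistence hang off `DATA_DIR` in the compose file above, the path resolution described in the commit message can be sketched as follows. This is an illustration, not the commit's `_default_chroma_path`: the name `resolve_persist_dir` and the explicit `legacy_relative` argument are assumptions, because the real helper's fallback paths are not visible in this diff.

```python
import os
from pathlib import Path


def resolve_persist_dir(subdir: str, legacy_relative: str) -> Path:
    """Pick a ChromaDB persist directory (hypothetical helper).

    With DATA_DIR set (HF Space or the compose volume), data lands under
    $DATA_DIR/chroma/<subdir>; otherwise the caller's historical relative
    path is returned unchanged for local installs.
    """
    data_dir = os.environ.get("DATA_DIR", "").strip()
    if data_dir:
        return Path(data_dir) / "chroma" / subdir
    return Path(legacy_relative)
```

With `DATA_DIR=/data`, `resolve_persist_dir("rag", "./chroma_db")` yields `/data/chroma/rag`; with the variable unset it yields `./chroma_db`. The subdirectory and legacy path in that example are made up for illustration.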
@@ -7,8 +7,11 @@ logger = logging.getLogger(__name__)
 
 router = APIRouter()
 
-@router.get("/")
-def
+@router.get("/api/info")
+def info():
+    # Moved from "/" so the React SPA can own the root path when the bundle
+    # is mounted in single-container (HF Spaces) deployments. The banner is
+    # still useful as a quick liveness check for ops.
     title = get_settings().app.title
     return {
         "message": f"{title} Backend is up and running",
@@ -157,7 +157,11 @@ class PersonaItemConfig(_IconValidatorMixin):
                 self.avatar, self.id,
             )
             return f"icon://{self.icon}"
-
+        # Default to empty string (→ relative URL) so single-origin Spaces
+        # deployments serve avatars off the same host as the SPA. Local
+        # ``npm start`` development setups can still set REACT_APP_API_URL
+        # explicitly to point at the backend on a different port.
+        base = os.getenv("REACT_APP_API_URL", "").rstrip("/")
         return f"{base}/api/avatars/bundled/{self.avatar}"
 
     def to_frontend_config(self) -> dict:
@@ -1,68 +1,35 @@
| 1 |
import logging
|
| 2 |
-
from
|
| 3 |
|
| 4 |
logger = logging.getLogger(__name__)
|
| 5 |
|
| 6 |
-
async def setup_canvas_collections(db: AsyncIOMotorDatabase):
|
| 7 |
-
"""Setup MongoDB collections and indexes for PhD Canvas"""
|
| 8 |
-
try:
|
| 9 |
-
collection = db.phd_canvases
|
| 10 |
-
|
| 11 |
-
# Create simple indexes
|
| 12 |
-
await collection.create_index("user_id")
|
| 13 |
-
logger.info("Created index on user_id")
|
| 14 |
-
|
| 15 |
-
await collection.create_index("last_updated", background=True)
|
| 16 |
-
logger.info("Created index on last_updated")
|
| 17 |
-
|
| 18 |
-
# Create compound indexes
|
| 19 |
-
await collection.create_index([("user_id", 1), ("last_updated", -1)])
|
| 20 |
-
logger.info("Created compound index on user_id and last_updated")
|
| 21 |
-
|
| 22 |
-
await collection.create_index([("user_id", 1), ("created_at", -1)])
|
| 23 |
-
logger.info("Created compound index on user_id and created_at")
|
| 24 |
-
|
| 25 |
-
# Ensure TTL index for old canvases (optional cleanup after 2 years)
|
| 26 |
-
await collection.create_index(
|
| 27 |
-
"created_at",
|
| 28 |
-
expireAfterSeconds=63072000 # 2 years in seconds
|
| 29 |
-
)
|
| 30 |
-
logger.info("Created TTL index for canvas cleanup")
|
| 31 |
-
|
| 32 |
-
logger.info("PhD Canvas database setup completed successfully")
|
| 33 |
-
|
| 34 |
-
except Exception as e:
|
| 35 |
-
logger.error("Error setting up canvas collections: %s", str(e))
|
| 36 |
-
raise
|
| 37 |
|
| 38 |
-
async def
|
| 39 |
-
"""
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
}
|
| 56 |
-
]
|
| 57 |
-
|
| 58 |
-
orphaned_canvases = await db.phd_canvases.aggregate(pipeline).to_list(length=100)
|
| 59 |
-
|
| 60 |
-
if orphaned_canvases:
|
| 61 |
-
orphaned_ids = [canvas["_id"] for canvas in orphaned_canvases]
|
| 62 |
-
result = await db.phd_canvases.delete_many({"_id": {"$in": orphaned_ids}})
|
| 63 |
-
logger.info("Cleaned up %d orphaned canvas records", result.deleted_count)
|
| 64 |
-
else:
|
| 65 |
-
logger.info("No orphaned canvas records found")
|
| 66 |
-
|
| 67 |
-
except Exception as e:
|
| 68 |
-
logger.error("Error during canvas cleanup: %s", str(e))
|
|
|
|
+"""PhD Canvas database setup — SQLite-shim aware.
+
+The original implementation was a Mongo-only module that created compound
+indexes and a TTL index on the ``phd_canvases`` collection at startup. With
+the SQLite-on-HF-bucket persistence shim, the schema (including the
+``user_id`` index) is declared once at connection time inside
+``app.core.db``, so this module becomes a compatibility no-op.
+
+We keep the function signatures so any caller that still imports
+``setup_canvas_collections`` or ``cleanup_old_canvas_data`` keeps working.
+"""
+
 import logging
+from typing import Any
 
 logger = logging.getLogger(__name__)
 
 
+async def setup_canvas_collections(db: Any) -> None:
+    """No-op on the SQLite shim — indexes are created by ``app.core.db``."""
+    logger.info(
+        "PhD Canvas setup: schema and indexes are managed by the SQLite shim; "
+        "skipping legacy Mongo index creation."
+    )
+
+
+async def cleanup_old_canvas_data(db: Any) -> None:
+    """Maintenance helper — disabled on the SQLite shim.
+
+    The original used a Mongo ``$lookup`` aggregation against the ``users``
+    collection, which the in-process aggregation runner doesn't support. If
+    you need the orphan cleanup, run it via a separate script that does two
+    passes (``users``, then ``phd_canvases``) and calls ``delete_many``.
+    """
+    logger.info("cleanup_old_canvas_data is a no-op on the SQLite shim.")
|
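The two-pass orphan cleanup that the `cleanup_old_canvas_data` docstring recommends could look roughly like the sketch below. It works on plain lists of dicts so it stays independent of the shim's API. The function name `orphaned_canvas_ids` is hypothetical, and it assumes a canvas's `user_id` equals the string form of a user's `_id`, which this diff does not confirm.

```python
from typing import Any, Iterable, Mapping


def orphaned_canvas_ids(
    users: Iterable[Mapping[str, Any]],
    canvases: Iterable[Mapping[str, Any]],
) -> list:
    """Return the _id of every canvas whose user_id matches no known user."""
    # Pass 1: collect the ids of all existing users.
    known_user_ids = {str(user["_id"]) for user in users}
    # Pass 2: keep canvases that point at a user id we did not see.
    return [
        canvas["_id"]
        for canvas in canvases
        if str(canvas.get("user_id")) not in known_user_ids
    ]
```

A maintenance script could then pass the result to `delete_many({"_id": {"$in": ids}})`, the same call shape the removed Mongo implementation used.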
@@ -1,86 +1,50 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
| 4 |
import logging
|
| 5 |
|
| 6 |
-
from app.
|
| 7 |
|
| 8 |
logger = logging.getLogger(__name__)
|
| 9 |
|
| 10 |
-
class Database:
|
| 11 |
-
client: AsyncIOMotorClient = None
|
| 12 |
-
database = None
|
| 13 |
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
"
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
try:
|
| 19 |
-
|
| 20 |
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
mongo_url = os.getenv("MONGODB_CONNECTION_STRING", "")
|
| 26 |
-
if not mongo_url:
|
| 27 |
-
raise ValueError(
|
| 28 |
-
"MongoDB connection string not set. "
|
| 29 |
-
"Provide it in config.yaml (mongodb.connection_string) "
|
| 30 |
-
"or as the MONGODB_CONNECTION_STRING environment variable."
|
| 31 |
-
)
|
| 32 |
|
| 33 |
-
db_name = settings.mongodb.database_name
|
| 34 |
-
|
| 35 |
-
db.client = AsyncIOMotorClient(mongo_url)
|
| 36 |
-
db.database = db.client[db_name]
|
| 37 |
-
|
| 38 |
-
# Test connection
|
| 39 |
-
await db.client.admin.command('ping')
|
| 40 |
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
try:
|
| 46 |
-
from app.core.canvas_database import setup_canvas_collections
|
| 47 |
-
await setup_canvas_collections(db.database) # Pass db directly
|
| 48 |
-
logger.info("Canvas database initialization completed")
|
| 49 |
-
except Exception as canvas_error:
|
| 50 |
-
logger.error(f"Canvas database initialization failed: {canvas_error}")
|
| 51 |
-
|
| 52 |
-
except ConnectionFailure as e:
|
| 53 |
-
logger.error(f"Failed to connect to MongoDB: {e}")
|
| 54 |
-
raise
|
| 55 |
-
except Exception as e:
|
| 56 |
-
logger.error(f"Unexpected error connecting to MongoDB: {e}")
|
| 57 |
-
raise
|
| 58 |
|
| 59 |
-
async def close_mongo_connection():
|
| 60 |
-
"""Close database connection"""
|
| 61 |
-
if db.client:
|
| 62 |
-
db.client.close()
|
| 63 |
-
logger.info("Disconnected from MongoDB")
|
| 64 |
|
| 65 |
-
async def create_indexes():
|
| 66 |
-
"""
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
await db.database.users.create_index("created_at")
|
| 71 |
-
|
| 72 |
-
# Indexes for chat_sessions collection
|
| 73 |
-
await db.database.chat_sessions.create_index("user_id")
|
| 74 |
-
await db.database.chat_sessions.create_index("created_at")
|
| 75 |
-
await db.database.chat_sessions.create_index([("user_id", 1), ("created_at", -1)])
|
| 76 |
|
| 77 |
-
await db.database.user_profiles.create_index("user_id", unique=True)
|
| 78 |
-
await db.database.onboarding_conversations.create_index("user_id", unique=True)
|
| 79 |
-
|
| 80 |
-
logger.info("Database indexes created successfully")
|
| 81 |
-
except Exception as e:
|
| 82 |
-
logger.warning(f"Error creating indexes: {e}")
|
| 83 |
|
| 84 |
def get_database():
|
| 85 |
-
"""
|
| 86 |
-
return
|
|
|
|
+"""Database connection facade.
+
+Used to wrap pymongo / motor against a real MongoDB instance. For Hugging
+Face Spaces deployment we now back the same public API with SQLite on a HF
+Storage Bucket mount (see ``app.core.db``). The shim emulates the Mongo
+idioms the routers and services use, so this module simply forwards to it
+and keeps the legacy ``connect_to_mongo`` / ``close_mongo_connection`` /
+``get_database`` API intact for ``main.py`` and the route files.
+"""
+
 import logging
 
+from app.core.db import _close, _open, get_database as _get_database
 
 logger = logging.getLogger(__name__)
 
 
+async def connect_to_mongo() -> None:
+    """Open the SQLite-backed database and run schema bootstrap."""
+    await _open()
+    logger.info("SQLite-backed database ready (Mongo API shim active).")
+    # Legacy hook: previously called ``setup_canvas_collections`` here.
+    # The phd_canvases table is already declared in the shim's schema, so
+    # the call below is a no-op but kept for symmetry with the original
+    # startup path.
     try:
+        from app.core.canvas_database import setup_canvas_collections
 
+        await setup_canvas_collections(_get_database())
+        logger.info("Canvas database initialization completed (no-op on SQLite shim).")
+    except Exception as canvas_error:  # pragma: no cover
+        logger.error(f"Canvas database initialization failed: {canvas_error}")
 
 
+async def close_mongo_connection() -> None:
+    """Best-effort connection close on FastAPI shutdown."""
+    await _close()
+    logger.info("Disconnected from SQLite-backed database.")
 
 
+async def create_indexes() -> None:
+    """Index creation is handled by the shim's schema bootstrap; kept for
+    backwards compatibility so any out-of-process maintenance script that
+    imports this name keeps working."""
+    return None
 
 
 def get_database():
+    """Return the SQLite-backed Mongo-API-compatible Database object."""
+    return _get_database()
@@ -0,0 +1,688 @@
+"""Persistent storage facade backed by SQLite + filesystem on a Hugging Face
+Storage Bucket mount.
+
+Background
+----------
+This Space used to require MongoDB (via Motor / GridFS). On Hugging Face
+Spaces the container has no Mongo, the legacy ``/data`` Persistent Storage
+was retired, and the user does not want to use Atlas (or any paid third
+party). The replacement is a single Hugging Face Storage Bucket mounted into
+the Space at ``$DATA_DIR`` (default ``/data``). Everything durable lives
+there:
+
+    ${DATA_DIR}/cybersecurity_panel.db — SQLite database (users, chat
+                                         sessions, profiles, onboarding,
+                                         PhD canvases)
+
+A single ``hf buckets sync hf://buckets/<org>/<bucket> ./backup`` therefore
+captures the full state and can be restored before the next redeploy.
+
+Design
+------
+This module exposes the **subset** of the Motor / pymongo API that the
+existing routers and services already call (``find_one``, ``find``,
+``insert_one``, ``update_one``, ``update_many``, ``delete_one``,
+``delete_many``, ``replace_one``, ``count_documents``,
+``find_one_and_delete``, ``aggregate`` with ``$match``/``$group``/``$in``).
+The on-disk format is plain JSON-encoded documents in SQLite tables (one row
+per document, with an indexed ``_id`` primary key, optional ``email`` UNIQUE
+column on ``users``, and an indexed/UNIQUE ``user_id`` column on the user
+scoped collections).
+
+Why not full SQL? The router code uses Mongo idioms (``$in``, ``$set``,
+``find_one_and_delete``, an aggregation pipeline) and ``bson.ObjectId`` IDs.
+A small in-process query engine over JSON dicts keeps the routers untouched
+while still benefiting from SQLite's durability and crash safety on the
+bucket mount.
+
+Why SQLite and not pure JSON files? Atomic commit, crash-consistent writes,
+unique constraints (email, user-scoped profile/onboarding), and one
+mongodump-style salvage file. WAL mode is *attempted* but if the bucket
+FUSE doesn't support the required fcntl/mmap semantics SQLite falls back to
+the default rollback journal automatically and this module logs a warning.
+
+Errors from the underlying storage are re-raised as
+``pymongo.errors.PyMongoError`` (or ``DuplicateKeyError`` for unique-index
+violations) so the existing 5xx / DuplicateKey handlers keep working
+unchanged.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+import os
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, AsyncIterator, Mapping, Sequence
+
+import aiosqlite
+from bson import ObjectId
+from pymongo.errors import DuplicateKeyError, PyMongoError
+
+LOG = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Schema
+# ---------------------------------------------------------------------------
+
+# Each entry: (table name, has_unique_email_column, user_id_unique).
|
| 72 |
+
# users have a UNIQUE email column. user_profiles and onboarding_conversations
|
| 73 |
+
# have a UNIQUE user_id column (one record per user). chat_sessions and
|
| 74 |
+
# phd_canvases keep user_id only as an index (many records per user).
|
| 75 |
+
_TABLES: tuple[tuple[str, bool, bool], ...] = (
|
| 76 |
+
("users", True, False),
|
| 77 |
+
("chat_sessions", False, False),
|
| 78 |
+
("user_profiles", False, True),
|
| 79 |
+
("onboarding_conversations", False, True),
|
| 80 |
+
("phd_canvases", False, False),
|
| 81 |
+
)
|
| 82 |
+
|
| 83 |
+
_TABLE_NAMES = {t[0] for t in _TABLES}
|
| 84 |
+
|
| 85 |
+
|
| 86 |
+
# ---------------------------------------------------------------------------
|
| 87 |
+
# Paths
|
| 88 |
+
# ---------------------------------------------------------------------------
|
| 89 |
+
|
| 90 |
+
def _data_dir() -> Path:
|
| 91 |
+
raw = os.environ.get("DATA_DIR") or "/data"
|
| 92 |
+
p = Path(raw).expanduser()
|
| 93 |
+
p.mkdir(parents=True, exist_ok=True)
|
| 94 |
+
return p
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
def _db_path() -> Path:
|
| 98 |
+
return _data_dir() / "cybersecurity_panel.db"
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
# ---------------------------------------------------------------------------
|
| 102 |
+
# JSON (de)serialization that round-trips ObjectId / datetime
|
| 103 |
+
# ---------------------------------------------------------------------------
|
| 104 |
+
|
| 105 |
+
def _enc(o: Any) -> Any:
|
| 106 |
+
if isinstance(o, ObjectId):
|
| 107 |
+
return {"$oid": str(o)}
|
| 108 |
+
if isinstance(o, datetime):
|
| 109 |
+
if o.tzinfo is None:
|
| 110 |
+
o = o.replace(tzinfo=timezone.utc)
|
| 111 |
+
return {"$date": o.isoformat()}
|
| 112 |
+
raise TypeError(f"Type {type(o)!r} is not JSON serialisable")
|
| 113 |
+
|
| 114 |
+
|
| 115 |
+
def _hook(d: dict) -> Any:
|
| 116 |
+
if len(d) == 1:
|
| 117 |
+
if "$oid" in d:
|
| 118 |
+
return ObjectId(d["$oid"])
|
| 119 |
+
if "$date" in d:
|
| 120 |
+
return datetime.fromisoformat(d["$date"])
|
| 121 |
+
return d
|
| 122 |
+
|
| 123 |
+
|
| 124 |
+
def _dumps(doc: Mapping) -> str:
|
| 125 |
+
return json.dumps(dict(doc), default=_enc)
|
| 126 |
+
|
| 127 |
+
|
| 128 |
+
def _loads(text: str) -> dict:
|
| 129 |
+
return json.loads(text, object_hook=_hook)
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
def _id_str(v: Any) -> str:
|
| 133 |
+
if isinstance(v, ObjectId):
|
| 134 |
+
return str(v)
|
| 135 |
+
if isinstance(v, str):
|
| 136 |
+
return v
|
| 137 |
+
return str(v)
|
| 138 |
+
|
| 139 |
+
|
| 140 |
+
# ---------------------------------------------------------------------------
|
| 141 |
+
# Connection management (single shared aiosqlite connection)
|
| 142 |
+
# ---------------------------------------------------------------------------
|
| 143 |
+
|
| 144 |
+
_conn: aiosqlite.Connection | None = None
|
| 145 |
+
_open_lock = asyncio.Lock()
|
| 146 |
+
|
| 147 |
+
|
| 148 |
+
async def _open() -> aiosqlite.Connection:
|
| 149 |
+
global _conn
|
| 150 |
+
if _conn is not None:
|
| 151 |
+
return _conn
|
| 152 |
+
async with _open_lock:
|
| 153 |
+
if _conn is not None:
|
| 154 |
+
return _conn
|
| 155 |
+
path = _db_path()
|
| 156 |
+
try:
|
| 157 |
+
conn = await aiosqlite.connect(str(path))
|
| 158 |
+
except (aiosqlite.OperationalError, OSError) as exc:
|
| 159 |
+
raise PyMongoError(f"Cannot open database at {path}: {exc}") from exc
|
| 160 |
+
|
| 161 |
+
# WAL gives concurrent readers; if the bucket FUSE doesn't support the
|
| 162 |
+
# mmap/lock semantics WAL needs, SQLite stays in the default rollback
|
| 163 |
+
# journal which still works on every filesystem we care about.
|
| 164 |
+
try:
|
| 165 |
+
cur = await conn.execute("PRAGMA journal_mode=WAL")
|
| 166 |
+
row = await cur.fetchone()
|
| 167 |
+
mode = (row[0] if row else "?")
|
| 168 |
+
if str(mode).lower() != "wal":
|
| 169 |
+
LOG.warning(
|
| 170 |
+
"Could not enable WAL mode on %s (got %r); using rollback journal",
|
| 171 |
+
path, mode,
|
| 172 |
+
)
|
| 173 |
+
except aiosqlite.OperationalError as exc:
|
| 174 |
+
LOG.warning("PRAGMA journal_mode=WAL failed (%s); using rollback journal", exc)
|
| 175 |
+
|
| 176 |
+
try:
|
| 177 |
+
await conn.execute("PRAGMA synchronous=NORMAL")
|
| 178 |
+
await conn.execute("PRAGMA foreign_keys=ON")
|
| 179 |
+
except aiosqlite.OperationalError:
|
| 180 |
+
pass
|
| 181 |
+
|
| 182 |
+
await _init_schema(conn)
|
| 183 |
+
await conn.commit()
|
| 184 |
+
LOG.info("SQLite ready at %s", path)
|
| 185 |
+
_conn = conn
|
| 186 |
+
return _conn
|
| 187 |
+
|
| 188 |
+
|
| 189 |
+
async def _init_schema(conn: aiosqlite.Connection) -> None:
|
| 190 |
+
parts: list[str] = []
|
| 191 |
+
for name, has_email, user_id_unique in _TABLES:
|
| 192 |
+
if has_email:
|
| 193 |
+
parts.append(
|
| 194 |
+
f"""
|
| 195 |
+
CREATE TABLE IF NOT EXISTS {name} (
|
| 196 |
+
_id TEXT PRIMARY KEY,
|
| 197 |
+
email TEXT UNIQUE,
|
| 198 |
+
doc TEXT NOT NULL
|
| 199 |
+
);
|
| 200 |
+
"""
|
| 201 |
+
)
|
| 202 |
+
else:
|
| 203 |
+
uniq = "UNIQUE" if user_id_unique else ""
|
| 204 |
+
parts.append(
|
| 205 |
+
f"""
|
| 206 |
+
CREATE TABLE IF NOT EXISTS {name} (
|
| 207 |
+
_id TEXT PRIMARY KEY,
|
| 208 |
+
user_id TEXT {uniq},
|
| 209 |
+
doc TEXT NOT NULL
|
| 210 |
+
);
|
| 211 |
+
CREATE INDEX IF NOT EXISTS idx_{name}_user_id ON {name}(user_id);
|
| 212 |
+
"""
|
| 213 |
+
)
|
| 214 |
+
await conn.executescript("\n".join(parts))
|
| 215 |
+
|
| 216 |
+
|
| 217 |
+
async def _close() -> None:
|
| 218 |
+
"""Best-effort connection close (called from FastAPI lifespan teardown)."""
|
| 219 |
+
global _conn
|
| 220 |
+
if _conn is not None:
|
| 221 |
+
try:
|
| 222 |
+
await _conn.close()
|
| 223 |
+
except Exception: # pragma: no cover - shutdown path
|
| 224 |
+
pass
|
| 225 |
+
_conn = None
|
| 226 |
+
|
| 227 |
+
|
| 228 |
+
# ---------------------------------------------------------------------------
|
| 229 |
+
# Filter matching (only the Mongo operators the routers actually use)
|
| 230 |
+
# ---------------------------------------------------------------------------
|
| 231 |
+
|
| 232 |
+
def _match_value(actual: Any, want: Any) -> bool:
|
| 233 |
+
if isinstance(want, dict) and len(want) == 1:
|
| 234 |
+
op, arg = next(iter(want.items()))
|
| 235 |
+
if op == "$in":
|
| 236 |
+
return actual in arg
|
| 237 |
+
return actual == want
|
| 238 |
+
|
| 239 |
+
|
| 240 |
+
def _matches(doc: Mapping, query: Mapping) -> bool:
|
| 241 |
+
for k, v in query.items():
|
| 242 |
+
if not _match_value(doc.get(k), v):
|
| 243 |
+
return False
|
| 244 |
+
return True
|
| 245 |
+
|
| 246 |
+
|
| 247 |
+
# ---------------------------------------------------------------------------
|
| 248 |
+
# Result types (shaped like pymongo's so existing ``res.inserted_id`` etc. keep working)
|
| 249 |
+
# ---------------------------------------------------------------------------
|
| 250 |
+
|
| 251 |
+
class _InsertOneResult:
|
| 252 |
+
def __init__(self, inserted_id: ObjectId) -> None:
|
| 253 |
+
self.inserted_id = inserted_id
|
| 254 |
+
|
| 255 |
+
|
| 256 |
+
class _UpdateResult:
|
| 257 |
+
def __init__(self, matched: int, modified: int, upserted_id: ObjectId | None = None) -> None:
|
| 258 |
+
self.matched_count = matched
|
| 259 |
+
self.modified_count = modified
|
| 260 |
+
self.upserted_id = upserted_id
|
| 261 |
+
|
| 262 |
+
|
| 263 |
+
class _DeleteResult:
|
| 264 |
+
def __init__(self, deleted: int) -> None:
|
| 265 |
+
self.deleted_count = deleted
|
| 266 |
+
|
| 267 |
+
|
| 268 |
+
# ---------------------------------------------------------------------------
|
| 269 |
+
# Cursor / aggregation
|
| 270 |
+
# ---------------------------------------------------------------------------
|
| 271 |
+
|
| 272 |
+
def _sort_key_factory(field: str):
|
| 273 |
+
def keyfn(d: Mapping):
|
| 274 |
+
v = d.get(field)
|
| 275 |
+
return (v is None, v)
|
| 276 |
+
return keyfn
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
class _Cursor:
|
| 280 |
+
"""Lazy cursor: documents are fetched only when the cursor is iterated or ``to_list`` is awaited."""
|
| 281 |
+
|
| 282 |
+
def __init__(self, fetch):
|
| 283 |
+
self._fetch = fetch
|
| 284 |
+
self._sort_field: str | None = None
|
| 285 |
+
self._sort_dir: int = 1
|
| 286 |
+
self._limit: int | None = None
|
| 287 |
+
self._skip: int = 0
|
| 288 |
+
|
| 289 |
+
def sort(self, field: str | list[tuple[str, int]], direction: int = 1) -> "_Cursor":
|
| 290 |
+
# Routers also call .sort([("created_at", -1)]) — accept that form too,
|
| 291 |
+
# honoring only the first key (Mongo allows compound sort; for the
|
| 292 |
+
# routers we have, single-key sort is sufficient).
|
| 293 |
+
if isinstance(field, list) and field:
|
| 294 |
+
f, d = field[0]
|
| 295 |
+
self._sort_field = f
|
| 296 |
+
self._sort_dir = d
|
| 297 |
+
else:
|
| 298 |
+
self._sort_field = field # type: ignore[assignment]
|
| 299 |
+
self._sort_dir = direction
|
| 300 |
+
return self
|
| 301 |
+
|
| 302 |
+
def limit(self, n: int) -> "_Cursor":
|
| 303 |
+
self._limit = int(n)
|
| 304 |
+
return self
|
| 305 |
+
|
| 306 |
+
def skip(self, n: int) -> "_Cursor":
|
| 307 |
+
self._skip = int(n)
|
| 308 |
+
return self
|
| 309 |
+
|
| 310 |
+
async def _materialize(self) -> list[dict]:
|
| 311 |
+
items = await self._fetch()
|
| 312 |
+
if self._sort_field is not None:
|
| 313 |
+
items.sort(key=_sort_key_factory(self._sort_field), reverse=self._sort_dir < 0)
|
| 314 |
+
if self._skip:
|
| 315 |
+
items = items[self._skip:]
|
| 316 |
+
if self._limit is not None:
|
| 317 |
+
items = items[: self._limit]
|
| 318 |
+
return items
|
| 319 |
+
|
| 320 |
+
async def to_list(self, length: int | None = None) -> list[dict]:
|
| 321 |
+
items = await self._materialize()
|
| 322 |
+
return items if length is None else items[:length]
|
| 323 |
+
|
| 324 |
+
def __aiter__(self) -> AsyncIterator[dict]:
|
| 325 |
+
async def _gen():
|
| 326 |
+
for d in await self._materialize():
|
| 327 |
+
yield d
|
| 328 |
+
return _gen()
|
| 329 |
+
|
| 330 |
+
|
| 331 |
+
class _Aggregate:
|
| 332 |
+
"""Tiny aggregation runner — supports the pipeline shapes the routers use:
|
| 333 |
+
``$match`` and ``$group`` with ``_id: None`` plus ``$sum``.
|
| 334 |
+
|
| 335 |
+
More exotic shapes (``$lookup``) raise NotImplementedError; the only
|
| 336 |
+
caller is a maintenance ``cleanup_old_canvas_data`` function that isn't
|
| 337 |
+
on the request path, so this is safe.
|
| 338 |
+
"""
|
| 339 |
+
|
| 340 |
+
def __init__(self, fetch, pipeline: Sequence[Mapping]):
|
| 341 |
+
self._fetch = fetch
|
| 342 |
+
self._pipeline = list(pipeline)
|
| 343 |
+
|
| 344 |
+
async def to_list(self, length: int | None = None) -> list[dict]:
|
| 345 |
+
items = await self._fetch()
|
| 346 |
+
for stage in self._pipeline:
|
| 347 |
+
if "$match" in stage:
|
| 348 |
+
items = [d for d in items if _matches(d, stage["$match"])]
|
| 349 |
+
elif "$group" in stage:
|
| 350 |
+
spec = stage["$group"]
|
| 351 |
+
gid = spec.get("_id")
|
| 352 |
+
if gid is not None:
|
| 353 |
+
raise NotImplementedError("Only $group with _id: None is supported")
|
| 354 |
+
bucket: dict[str, Any] = {"_id": None}
|
| 355 |
+
for k, v in spec.items():
|
| 356 |
+
if k == "_id":
|
| 357 |
+
continue
|
| 358 |
+
if not (isinstance(v, dict) and len(v) == 1):
|
| 359 |
+
raise NotImplementedError(f"Unsupported $group field: {k}={v!r}")
|
| 360 |
+
op, arg = next(iter(v.items()))
|
| 361 |
+
if op == "$sum" and isinstance(arg, str) and arg.startswith("$"):
|
| 362 |
+
field = arg[1:]
|
| 363 |
+
bucket[k] = sum(int(d.get(field) or 0) for d in items)
|
| 364 |
+
else:
|
| 365 |
+
raise NotImplementedError(f"Unsupported $group op: {op}")
|
| 366 |
+
items = [bucket]
|
| 367 |
+
else:
|
| 368 |
+
raise NotImplementedError(f"Unsupported aggregation stage: {stage}")
|
| 369 |
+
return items if length is None else items[:length]
|
| 370 |
+
|
| 371 |
+
|
| 372 |
+
# ---------------------------------------------------------------------------
|
| 373 |
+
# Collection
|
| 374 |
+
# ---------------------------------------------------------------------------
|
| 375 |
+
|
| 376 |
+
class Collection:
|
| 377 |
+
def __init__(self, name: str) -> None:
|
| 378 |
+
self._name = name
|
| 379 |
+
self._has_email = name == "users"
|
| 380 |
+
|
| 381 |
+
# ----- low-level helpers ------------------------------------------------
|
| 382 |
+
|
| 383 |
+
async def _all(self, conn: aiosqlite.Connection) -> list[dict]:
|
| 384 |
+
cur = await conn.execute(f"SELECT doc FROM {self._name}")
|
| 385 |
+
rows = await cur.fetchall()
|
| 386 |
+
return [_loads(r[0]) for r in rows]
|
| 387 |
+
|
| 388 |
+
def _insert_params(self, _id: str, email: str | None, user_id: str | None, doc_json: str):
|
| 389 |
+
if self._has_email:
|
| 390 |
+
return (
|
| 391 |
+
"INSERT INTO users (_id, email, doc) VALUES (?, ?, ?)",
|
| 392 |
+
(_id, email, doc_json),
|
| 393 |
+
)
|
| 394 |
+
return (
|
| 395 |
+
f"INSERT INTO {self._name} (_id, user_id, doc) VALUES (?, ?, ?)",
|
| 396 |
+
(_id, user_id, doc_json),
|
| 397 |
+
)
|
| 398 |
+
|
| 399 |
+
async def _update_row(
|
| 400 |
+
self,
|
| 401 |
+
conn: aiosqlite.Connection,
|
| 402 |
+
_id: str,
|
| 403 |
+
email: str | None,
|
| 404 |
+
user_id: str | None,
|
| 405 |
+
doc_json: str,
|
| 406 |
+
) -> None:
|
| 407 |
+
if self._has_email:
|
| 408 |
+
await conn.execute(
|
| 409 |
+
"UPDATE users SET email=?, doc=? WHERE _id=?",
|
| 410 |
+
(email, doc_json, _id),
|
| 411 |
+
)
|
| 412 |
+
else:
|
| 413 |
+
await conn.execute(
|
| 414 |
+
f"UPDATE {self._name} SET user_id=?, doc=? WHERE _id=?",
|
| 415 |
+
(user_id, doc_json, _id),
|
| 416 |
+
)
|
| 417 |
+
|
| 418 |
+
def _apply_update(self, doc: dict, update: Mapping) -> None:
|
| 419 |
+
for op, fields in update.items():
|
| 420 |
+
if op == "$set":
|
| 421 |
+
for k, v in fields.items():
|
| 422 |
+
doc[k] = v
|
| 423 |
+
elif op == "$unset":
|
| 424 |
+
for k in fields:
|
| 425 |
+
doc.pop(k, None)
|
| 426 |
+
elif op == "$inc":
|
| 427 |
+
for k, v in fields.items():
|
| 428 |
+
doc[k] = (doc.get(k) or 0) + v
|
| 429 |
+
else:
|
| 430 |
+
raise NotImplementedError(f"Unsupported update operator: {op}")
|
| 431 |
+
|
| 432 |
+
# ----- public API -------------------------------------------------------
|
| 433 |
+
|
| 434 |
+
async def find_one(self, query: Mapping | None = None, projection: Mapping | None = None) -> dict | None:
|
| 435 |
+
query = dict(query or {})
|
| 436 |
+
try:
|
| 437 |
+
conn = await _open()
|
| 438 |
+
# Fast paths for primary-key / unique-index lookups.
|
| 439 |
+
if list(query.keys()) == ["_id"] and isinstance(query["_id"], (ObjectId, str)):
|
| 440 |
+
cur = await conn.execute(
|
| 441 |
+
f"SELECT doc FROM {self._name} WHERE _id = ?",
|
| 442 |
+
(_id_str(query["_id"]),),
|
| 443 |
+
)
|
| 444 |
+
row = await cur.fetchone()
|
| 445 |
+
return _loads(row[0]) if row else None
|
| 446 |
+
if self._has_email and list(query.keys()) == ["email"]:
|
| 447 |
+
cur = await conn.execute(
|
| 448 |
+
"SELECT doc FROM users WHERE email = ?",
|
| 449 |
+
(query["email"],),
|
| 450 |
+
)
|
| 451 |
+
row = await cur.fetchone()
|
| 452 |
+
return _loads(row[0]) if row else None
|
| 453 |
+
for d in await self._all(conn):
|
| 454 |
+
if _matches(d, query):
|
| 455 |
+
return d
|
| 456 |
+
return None
|
| 457 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 458 |
+
raise PyMongoError(f"find_one({self._name}) failed: {exc}") from exc
|
| 459 |
+
|
| 460 |
+
def find(self, query: Mapping | None = None, projection: Mapping | None = None) -> _Cursor:
|
| 461 |
+
q = dict(query or {})
|
| 462 |
+
|
| 463 |
+
async def _go() -> list[dict]:
|
| 464 |
+
try:
|
| 465 |
+
conn = await _open()
|
| 466 |
+
docs = await self._all(conn)
|
| 467 |
+
return [d for d in docs if _matches(d, q)]
|
| 468 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 469 |
+
raise PyMongoError(f"find({self._name}) failed: {exc}") from exc
|
| 470 |
+
|
| 471 |
+
return _Cursor(_go)
|
| 472 |
+
|
| 473 |
+
def aggregate(self, pipeline: Sequence[Mapping]) -> _Aggregate:
|
| 474 |
+
async def _go() -> list[dict]:
|
| 475 |
+
try:
|
| 476 |
+
conn = await _open()
|
| 477 |
+
return await self._all(conn)
|
| 478 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 479 |
+
raise PyMongoError(f"aggregate({self._name}) failed: {exc}") from exc
|
| 480 |
+
|
| 481 |
+
return _Aggregate(_go, pipeline)
|
| 482 |
+
|
| 483 |
+
async def count_documents(self, filter: Mapping | None = None) -> int:
|
| 484 |
+
f = dict(filter or {})
|
| 485 |
+
try:
|
| 486 |
+
conn = await _open()
|
| 487 |
+
if not f:
|
| 488 |
+
cur = await conn.execute(f"SELECT COUNT(*) FROM {self._name}")
|
| 489 |
+
row = await cur.fetchone()
|
| 490 |
+
return int(row[0]) if row else 0
|
| 491 |
+
return sum(1 for d in await self._all(conn) if _matches(d, f))
|
| 492 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 493 |
+
raise PyMongoError(f"count_documents({self._name}) failed: {exc}") from exc
|
| 494 |
+
|
| 495 |
+
async def insert_one(self, doc: Mapping) -> _InsertOneResult:
|
| 496 |
+
d = dict(doc)
|
| 497 |
+
if "_id" not in d or d["_id"] is None:
|
| 498 |
+
d["_id"] = ObjectId()
|
| 499 |
+
oid = d["_id"]
|
| 500 |
+
try:
|
| 501 |
+
conn = await _open()
|
| 502 |
+
email = d.get("email") if self._has_email else None
|
| 503 |
+
user_id = _id_str(d["user_id"]) if "user_id" in d else None
|
| 504 |
+
sql, params = self._insert_params(_id_str(oid), email, user_id, _dumps(d))
|
| 505 |
+
await conn.execute(sql, params)
|
| 506 |
+
await conn.commit()
|
| 507 |
+
return _InsertOneResult(oid)
|
| 508 |
+
except aiosqlite.IntegrityError as exc:
|
| 509 |
+
raise DuplicateKeyError(str(exc)) from exc
|
| 510 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 511 |
+
raise PyMongoError(f"insert_one({self._name}) failed: {exc}") from exc
|
| 512 |
+
|
| 513 |
+
async def update_one(
|
| 514 |
+
self,
|
| 515 |
+
filter: Mapping,
|
| 516 |
+
update: Mapping,
|
| 517 |
+
upsert: bool = False,
|
| 518 |
+
) -> _UpdateResult:
|
| 519 |
+
try:
|
| 520 |
+
conn = await _open()
|
| 521 |
+
doc = await self.find_one(filter)
|
| 522 |
+
if doc is None and not upsert:
|
| 523 |
+
return _UpdateResult(0, 0)
|
| 524 |
+
if doc is None and upsert:
|
| 525 |
+
base: dict[str, Any] = {}
|
| 526 |
+
for k, v in filter.items():
|
| 527 |
+
if not (isinstance(v, dict) and any(str(kk).startswith("$") for kk in v)):
|
| 528 |
+
base[k] = v
|
| 529 |
+
if "_id" not in base:
|
| 530 |
+
base["_id"] = ObjectId()
|
| 531 |
+
self._apply_update(base, update)
|
| 532 |
+
res = await self.insert_one(base)
|
| 533 |
+
return _UpdateResult(0, 0, upserted_id=res.inserted_id)
|
| 534 |
+
self._apply_update(doc, update)
|
| 535 |
+
email = doc.get("email") if self._has_email else None
|
| 536 |
+
user_id = _id_str(doc["user_id"]) if "user_id" in doc else None
|
| 537 |
+
await self._update_row(conn, _id_str(doc["_id"]), email, user_id, _dumps(doc))
|
| 538 |
+
await conn.commit()
|
| 539 |
+
return _UpdateResult(1, 1)
|
| 540 |
+
except aiosqlite.IntegrityError as exc:
|
| 541 |
+
raise DuplicateKeyError(str(exc)) from exc
|
| 542 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 543 |
+
raise PyMongoError(f"update_one({self._name}) failed: {exc}") from exc
|
| 544 |
+
|
| 545 |
+
async def update_many(
|
| 546 |
+
self,
|
| 547 |
+
filter: Mapping,
|
| 548 |
+
update: Mapping,
|
| 549 |
+
) -> _UpdateResult:
|
| 550 |
+
try:
|
| 551 |
+
conn = await _open()
|
| 552 |
+
docs = [d for d in await self._all(conn) if _matches(d, filter)]
|
| 553 |
+
if not docs:
|
| 554 |
+
return _UpdateResult(0, 0)
|
| 555 |
+
for doc in docs:
|
| 556 |
+
self._apply_update(doc, update)
|
| 557 |
+
email = doc.get("email") if self._has_email else None
|
| 558 |
+
user_id = _id_str(doc["user_id"]) if "user_id" in doc else None
|
| 559 |
+
await self._update_row(conn, _id_str(doc["_id"]), email, user_id, _dumps(doc))
|
| 560 |
+
await conn.commit()
|
| 561 |
+
return _UpdateResult(len(docs), len(docs))
|
| 562 |
+
except aiosqlite.IntegrityError as exc:
|
| 563 |
+
raise DuplicateKeyError(str(exc)) from exc
|
| 564 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 565 |
+
raise PyMongoError(f"update_many({self._name}) failed: {exc}") from exc
|
| 566 |
+
|
| 567 |
+
async def replace_one(
|
| 568 |
+
self,
|
| 569 |
+
filter: Mapping,
|
| 570 |
+
replacement: Mapping,
|
| 571 |
+
upsert: bool = False,
|
| 572 |
+
) -> _UpdateResult:
|
| 573 |
+
try:
|
| 574 |
+
conn = await _open()
|
| 575 |
+
existing = await self.find_one(filter)
|
| 576 |
+
new_doc = dict(replacement)
|
| 577 |
+
if existing is None and not upsert:
|
| 578 |
+
return _UpdateResult(0, 0)
|
| 579 |
+
if existing is None and upsert:
|
| 580 |
+
if "_id" not in new_doc or new_doc["_id"] is None:
|
| 581 |
+
new_doc["_id"] = ObjectId()
|
| 582 |
+
res = await self.insert_one(new_doc)
|
| 583 |
+
return _UpdateResult(0, 0, upserted_id=res.inserted_id)
|
| 584 |
+
# Preserve the existing _id so the replacement updates the same row.
|
| 585 |
+
new_doc["_id"] = existing["_id"]
|
| 586 |
+
email = new_doc.get("email") if self._has_email else None
|
| 587 |
+
user_id = _id_str(new_doc["user_id"]) if "user_id" in new_doc else None
|
| 588 |
+
await self._update_row(conn, _id_str(existing["_id"]), email, user_id, _dumps(new_doc))
|
| 589 |
+
await conn.commit()
|
| 590 |
+
return _UpdateResult(1, 1)
|
| 591 |
+
except aiosqlite.IntegrityError as exc:
|
| 592 |
+
raise DuplicateKeyError(str(exc)) from exc
|
| 593 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 594 |
+
raise PyMongoError(f"replace_one({self._name}) failed: {exc}") from exc
|
| 595 |
+
|
| 596 |
+
async def delete_one(self, filter: Mapping) -> _DeleteResult:
|
| 597 |
+
try:
|
| 598 |
+
conn = await _open()
|
| 599 |
+
doc = await self.find_one(filter)
|
| 600 |
+
if doc is None:
|
| 601 |
+
return _DeleteResult(0)
|
| 602 |
+
await conn.execute(
|
| 603 |
+
f"DELETE FROM {self._name} WHERE _id=?",
|
| 604 |
+
(_id_str(doc["_id"]),),
|
| 605 |
+
)
|
| 606 |
+
await conn.commit()
|
| 607 |
+
return _DeleteResult(1)
|
| 608 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 609 |
+
raise PyMongoError(f"delete_one({self._name}) failed: {exc}") from exc
|
| 610 |
+
|
| 611 |
+
async def delete_many(self, filter: Mapping) -> _DeleteResult:
|
| 612 |
+
try:
|
| 613 |
+
conn = await _open()
|
| 614 |
+
docs = [d for d in await self._all(conn) if _matches(d, filter)]
|
| 615 |
+
for d in docs:
|
| 616 |
+
await conn.execute(
|
| 617 |
+
f"DELETE FROM {self._name} WHERE _id=?",
|
| 618 |
+
(_id_str(d["_id"]),),
|
| 619 |
+
)
|
| 620 |
+
if docs:
|
| 621 |
+
await conn.commit()
|
| 622 |
+
return _DeleteResult(len(docs))
|
| 623 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 624 |
+
raise PyMongoError(f"delete_many({self._name}) failed: {exc}") from exc
|
| 625 |
+
|
| 626 |
+
async def find_one_and_delete(self, filter: Mapping) -> dict | None:
|
| 627 |
+
try:
|
| 628 |
+
conn = await _open()
|
| 629 |
+
doc = await self.find_one(filter)
|
| 630 |
+
if doc is None:
|
| 631 |
+
return None
|
| 632 |
+
await conn.execute(
|
| 633 |
+
f"DELETE FROM {self._name} WHERE _id=?",
|
| 634 |
+
(_id_str(doc["_id"]),),
|
| 635 |
+
)
|
| 636 |
+
await conn.commit()
|
| 637 |
+
return doc
|
| 638 |
+
except (aiosqlite.OperationalError, aiosqlite.DatabaseError, OSError) as exc:
|
| 639 |
+
raise PyMongoError(f"find_one_and_delete({self._name}) failed: {exc}") from exc
|
| 640 |
+
|
| 641 |
+
async def create_index(self, keys: Any, unique: bool = False, **_: Any) -> str:
|
| 642 |
+
# The schema already declares the indexes the routers rely on
|
| 643 |
+
# (PK on _id, UNIQUE on users.email and on user_profiles.user_id /
|
| 644 |
+
# onboarding_conversations.user_id, plain index on user_id elsewhere).
|
| 645 |
+
# Stays a no-op so legacy main.py / canvas_database.py calls keep
|
| 646 |
+
# working without changes.
|
| 647 |
+
return "ok"
|
| 648 |
+
|
| 649 |
+
|
| 650 |
+
# ---------------------------------------------------------------------------
|
| 651 |
+
# Database
|
| 652 |
+
# ---------------------------------------------------------------------------
|
| 653 |
+
|
| 654 |
+
class Database:
|
| 655 |
+
def __init__(self) -> None:
|
| 656 |
+
self._collections: dict[str, Collection] = {}
|
| 657 |
+
|
| 658 |
+
def _coll(self, name: str) -> Collection:
|
| 659 |
+
c = self._collections.get(name)
|
| 660 |
+
if c is None:
|
| 661 |
+
c = Collection(name)
|
| 662 |
+
self._collections[name] = c
|
| 663 |
+
return c
|
| 664 |
+
|
| 665 |
+
def __getattr__(self, name: str) -> Collection:
|
| 666 |
+
if name.startswith("_"):
|
| 667 |
+
raise AttributeError(name)
|
| 668 |
+
return self._coll(name)
|
| 669 |
+
|
| 670 |
+
def __getitem__(self, name: str) -> Collection:
|
| 671 |
+
return self._coll(name)
|
| 672 |
+
|
| 673 |
+
|
| 674 |
+
_database: Database | None = None
|
| 675 |
+
|
| 676 |
+
|
| 677 |
+
def get_database() -> Database:
|
| 678 |
+
global _database
|
| 679 |
+
if _database is None:
|
| 680 |
+
_database = Database()
|
| 681 |
+
return _database
|
| 682 |
+
|
| 683 |
+
|
| 684 |
+
__all__ = [
|
| 685 |
+
"Collection",
|
| 686 |
+
"Database",
|
| 687 |
+
"get_database",
|
| 688 |
+
]
|
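The storage layout described in the module docstring above (one JSON document per row, a UNIQUE ``email`` column, filtering done in Python) can be sketched with the standard-library ``sqlite3`` module. This is a minimal synchronous illustration, not the shim itself: the names ``insert_user``, ``matches``, ``find`` and the ``DuplicateKey`` exception are hypothetical stand-ins, and the real module uses ``aiosqlite``, ``bson.ObjectId`` and ``pymongo.errors.DuplicateKeyError``.

```python
import json
import sqlite3

# Hypothetical stand-in for the shim's ``users`` table: one JSON document per
# row, with ``email`` mirrored into a UNIQUE column so SQLite enforces it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (_id TEXT PRIMARY KEY, email TEXT UNIQUE, doc TEXT NOT NULL)"
)


class DuplicateKey(Exception):
    """Illustrative placeholder for pymongo.errors.DuplicateKeyError."""


def insert_user(doc: dict) -> None:
    """Insert one document, translating unique-index violations."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (_id, email, doc) VALUES (?, ?, ?)",
                (doc["_id"], doc.get("email"), json.dumps(doc)),
            )
    except sqlite3.IntegrityError as exc:
        raise DuplicateKey(str(exc)) from exc


def matches(doc: dict, query: dict) -> bool:
    """Equality plus ``$in``, the only operators handled in this sketch."""
    for key, want in query.items():
        actual = doc.get(key)
        if isinstance(want, dict) and list(want) == ["$in"]:
            if actual not in want["$in"]:
                return False
        elif actual != want:
            return False
    return True


def find(query: dict) -> list:
    """Load every row, decode the JSON, and filter in Python."""
    rows = conn.execute("SELECT doc FROM users").fetchall()
    return [d for d in (json.loads(r[0]) for r in rows) if matches(d, query)]


insert_user({"_id": "a1", "email": "ann@example.com", "role": "admin"})
insert_user({"_id": "b2", "email": "bob@example.com", "role": "student"})

admins_or_staff = find({"role": {"$in": ["admin", "staff"]}})

try:
    insert_user({"_id": "c3", "email": "ann@example.com", "role": "student"})
    duplicate_rejected = False
except DuplicateKey:
    duplicate_rejected = True
```

The trade-off is the same one the docstring accepts: every non-indexed query scans the whole table, which is reasonable for small collections but would not scale to large ones.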
|
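The WAL fallback in ``_open`` relies on SQLite reporting the journal mode it actually selected rather than raising an error. The sketch below shows that check with the standard-library ``sqlite3`` module; ``enable_wal`` is a hypothetical helper name, and an in-memory database is used only because it is a convenient case where WAL is never granted.

```python
import sqlite3
import tempfile
from pathlib import Path


def enable_wal(conn: sqlite3.Connection) -> str:
    """Request WAL and return the journal mode SQLite actually selected."""
    row = conn.execute("PRAGMA journal_mode=WAL").fetchone()
    return str(row[0]).lower() if row else "?"


# An in-memory database cannot use WAL, so SQLite reports "memory" instead.
mem_conn = sqlite3.connect(":memory:")
mem_mode = enable_wal(mem_conn)
mem_conn.close()

# A regular file on a local filesystem normally accepts WAL; the result is
# recorded rather than assumed, since it depends on the filesystem.
with tempfile.TemporaryDirectory() as tmp:
    file_conn = sqlite3.connect(str(Path(tmp) / "demo.db"))
    file_mode = enable_wal(file_conn)
    file_conn.close()

wal_active = {"memory": mem_mode == "wal", "file": file_mode == "wal"}
```

Comparing the returned mode against ``"wal"`` is what lets the caller log a warning and continue, instead of treating an unsupported filesystem as a fatal error.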
@@ -12,6 +12,20 @@ from pathlib import Path
|
|
| 12 |
|
| 13 |
logger = logging.getLogger(__name__)
|
| 14 |
|
| 15 |
# Download required NLTK data
|
| 16 |
try:
|
| 17 |
nltk.data.find('tokenizers/punkt')
|
|
@@ -177,14 +191,16 @@ class RAGManager:
|
|
| 177 |
Handles document storage, embedding, and retrieval using ChromaDB
|
| 178 |
"""
|
| 179 |
|
| 180 |
-
def __init__(self, embedding_model: str = None, persist_directory: str =
|
| 181 |
from app.config import get_settings
|
| 182 |
settings = get_settings()
|
| 183 |
if embedding_model is None:
|
| 184 |
embedding_model = settings.rag.embedding_model
|
| 185 |
self.embedding_model_name = embedding_model
|
| 186 |
self.persist_directory = Path(persist_directory)
|
| 187 |
-
self.persist_directory.mkdir(exist_ok=True)
|
| 188 |
|
| 189 |
# Initialize embedding model
|
| 190 |
logger.info(f"Loading embedding model: {embedding_model}")
|
|
@@ -459,13 +475,15 @@ class RAGManager:
|
|
| 459 |
|
| 460 |
|
| 461 |
class EnhancedRAGManager:
|
| 462 |
-
def __init__(self, persist_directory: str =
|
| 463 |
"""Initialize enhanced RAG manager with improved document handling"""
|
| 464 |
from app.config import get_settings
|
| 465 |
settings = get_settings()
|
| 466 |
|
| 467 |
self.persist_directory = persist_directory
|
| 468 |
-
Path(persist_directory).mkdir(exist_ok=True)
|
| 469 |
|
| 470 |
# Initialize ChromaDB client
|
| 471 |
self.client = chromadb.PersistentClient(
|
|
|
|
| 12 |
|
| 13 |
logger = logging.getLogger(__name__)
|
| 14 |
|
| 15 |
+
|
| 16 |
+
def _default_chroma_path(subdir: str) -> str:
|
| 17 |
+
"""Resolve the default ChromaDB persistence directory.
|
| 18 |
+
|
| 19 |
+
On HF Spaces (and any deployment that sets ``DATA_DIR``) the embeddings
|
| 20 |
+
must live on the bucket mount so they survive Space rebuilds. In local
|
| 21 |
+
dev DATA_DIR is unset, so we keep the historical working-directory path
|
| 22 |
+
so existing local installs and tests don't move their data.
|
| 23 |
+
"""
|
| 24 |
+
data_dir = os.environ.get("DATA_DIR", "").strip()
|
| 25 |
+
if data_dir:
|
| 26 |
+
return str(Path(data_dir) / "chroma" / subdir)
|
| 27 |
+
return f"./{subdir}"
|
| 28 |
+
|
| 29 |
# Download required NLTK data
|
| 30 |
try:
|
| 31 |
nltk.data.find('tokenizers/punkt')
|
|
|
|
| 191 |
Handles document storage, embedding, and retrieval using ChromaDB
|
| 192 |
"""
|
| 193 |
|
| 194 |
+
def __init__(self, embedding_model: str = None, persist_directory: str = None):
|
| 195 |
from app.config import get_settings
|
| 196 |
settings = get_settings()
|
| 197 |
if embedding_model is None:
|
| 198 |
embedding_model = settings.rag.embedding_model
|
| 199 |
self.embedding_model_name = embedding_model
|
| 200 |
+
if persist_directory is None:
|
| 201 |
+
persist_directory = _default_chroma_path("chroma_db")
|
| 202 |
self.persist_directory = Path(persist_directory)
|
| 203 |
+
self.persist_directory.mkdir(parents=True, exist_ok=True)
|
| 204 |
|
| 205 |
# Initialize embedding model
|
| 206 |
logger.info(f"Loading embedding model: {embedding_model}")
|
|
|
|
| 475 |
|
| 476 |
|
| 477 |
class EnhancedRAGManager:
|
| 478 |
+
def __init__(self, persist_directory: str = None):
|
| 479 |
"""Initialize enhanced RAG manager with improved document handling"""
|
| 480 |
from app.config import get_settings
|
| 481 |
settings = get_settings()
|
| 482 |
|
| 483 |
+
if persist_directory is None:
|
| 484 |
+
persist_directory = _default_chroma_path("chromadb_storage")
|
| 485 |
self.persist_directory = persist_directory
|
| 486 |
+
Path(persist_directory).mkdir(parents=True, exist_ok=True)
|
| 487 |
|
| 488 |
# Initialize ChromaDB client
|
| 489 |
self.client = chromadb.PersistentClient(
|
|
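The path resolution added in this hunk can be exercised in isolation. The sketch below restates the rule from ``_default_chroma_path`` with one assumption: an ``env`` parameter (not present in the diff) is added so the behaviour can be shown without mutating the process environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_chroma_path(subdir: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Same rule as the diff's helper; ``env`` is an added hook for illustration."""
    env = os.environ if env is None else env
    data_dir = env.get("DATA_DIR", "").strip()
    if data_dir:
        # Bucket-mounted deployments: keep embeddings under $DATA_DIR/chroma.
        return str(Path(data_dir) / "chroma" / subdir)
    # Local development: keep the historical working-directory path.
    return f"./{subdir}"


on_space = default_chroma_path("chroma_db", {"DATA_DIR": "/data"})
local_dev = default_chroma_path("chromadb_storage", {})
blank_env = default_chroma_path("chroma_db", {"DATA_DIR": "   "})
```

A whitespace-only ``DATA_DIR`` is treated the same as an unset one, so a blank secret or compose variable falls back to the relative path rather than producing a path rooted at an empty string.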
@@ -83,17 +83,34 @@ def get_public_config():
|
|
| 83 |
"""Return the public (non-secret) application configuration."""
|
| 84 |
return settings.get_frontend_config()
|
| 85 |
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
|
|
| 83 |
"""Return the public (non-secret) application configuration."""
|
| 84 |
return settings.get_frontend_config()
|
| 85 |
|
| 86 |
+
|
| 87 |
+
# ---------------------------------------------------------------------------
|
| 88 |
+
# Static SPA mount — Hugging Face Spaces / single-container deployment
|
| 89 |
+
# ---------------------------------------------------------------------------
|
| 90 |
+
# When the Docker build copies the React production bundle into ``./static``
|
| 91 |
+
# (sibling of this app/ directory), expose it at "/" so the API and the
|
| 92 |
+
# SPA share the FastAPI origin. In local development the static dir is
|
| 93 |
+
# absent and a JSON banner is returned at "/" instead.
|
| 94 |
+
_static_dir = Path(__file__).resolve().parent.parent / "static"
|
| 95 |
+
_should_mount_static = _static_dir.is_dir()
|
| 96 |
+
|
| 97 |
+
if _should_mount_static:
|
| 98 |
+
app.mount(
|
| 99 |
+
"/",
|
| 100 |
+
StaticFiles(directory=str(_static_dir), html=True),
|
| 101 |
+
name="spa",
|
| 102 |
+
)
|
| 103 |
+
else:
|
| 104 |
+
@app.get("/")
|
| 105 |
+
def root():
|
| 106 |
+
return {
|
| 107 |
+
"message": f"{settings.app.title} Backend",
|
| 108 |
+
"version": "2.0.0",
|
| 109 |
+
"features": [
|
| 110 |
+
"User Authentication",
|
| 111 |
+
"Persistent Chat Sessions (SQLite via aiosqlite)",
|
| 112 |
+
"Ollama Support",
|
| 113 |
+
"Gemini API Support",
|
| 114 |
+
"Configurable Personas",
|
| 115 |
+
],
|
| 116 |
+
}
|
|
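The `parent.parent / "static"` expression in this hunk is easy to misread, so here is a small sketch of how it resolves. It uses `PurePosixPath` so that it runs without touching the filesystem (the real code calls `.resolve()`, which does). The helper name `static_dir_for` and the example path `/home/user/app/app/main.py` are hypothetical.

```python
from pathlib import PurePosixPath


def static_dir_for(module_file: str) -> PurePosixPath:
    """Return the ``static`` directory that sits beside the module's package."""
    # .parent is the package directory (app/); .parent.parent is its container.
    return PurePosixPath(module_file).parent.parent / "static"


print(static_dir_for("/home/user/app/app/main.py"))  # /home/user/app/static
```

So the bundle is expected one level above the `app/` package, not inside it. If the Docker build copies the bundle elsewhere, `_static_dir.is_dir()` is false and the JSON banner branch is taken instead.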
@@ -1,8 +1,29 @@
+import os
+from pathlib import Path
+
 import chromadb
 from chromadb.config import Settings
+
 from app.llm.embedding_client import get_embedding
 
-
+
+def _persona_knowledge_path() -> str:
+    """Resolve the persona-knowledge ChromaDB path.
+
+    Mirrors ``app.core.rag_manager._default_chroma_path``: when ``DATA_DIR``
+    is set (HF Spaces, any bucket-mounted deployment) the embeddings live
+    under ``${DATA_DIR}/chroma/persona_knowledge`` so they survive Space
+    rebuilds. Local installs keep the historical relative path.
+    """
+    data_dir = os.environ.get("DATA_DIR", "").strip()
+    if data_dir:
+        return str(Path(data_dir) / "chroma" / "persona_knowledge")
+    return "./chroma_storage"
+
+
+_path = _persona_knowledge_path()
+Path(_path).mkdir(parents=True, exist_ok=True)
+client = chromadb.PersistentClient(path=_path)
 
 collection = client.get_or_create_collection("persona_knowledge")
 
@@ -32,9 +32,11 @@ markdown~=3.0
 # PDF generation and export
 reportlab~=4.4
 
-#
+# Persistence: SQLite via aiosqlite on the HF Storage Bucket mount (no MongoDB).
+# pymongo is kept *only* for bson.ObjectId and the DuplicateKeyError /
+# PyMongoError types that the existing routers and the SQLite shim raise.
+aiosqlite~=0.20
 pymongo~=4.16
-motor~=3.7
 
 # Authentication and security
 passlib[bcrypt]~=1.7
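The requirements comment says the SQLite shim raises pymongo's error types so that existing routers keep working. Below is a rough sketch of how such a translation could be done. It is not the shim's actual code: it uses the synchronous stdlib `sqlite3` module rather than `aiosqlite`, a local `DuplicateKeyError` class standing in for `pymongo.errors.DuplicateKeyError`, and a hypothetical `docs` table.

```python
import sqlite3


class DuplicateKeyError(Exception):
    """Local stand-in for ``pymongo.errors.DuplicateKeyError``."""


def insert_one(conn: sqlite3.Connection, doc_id: str, body: str) -> None:
    """Insert a row, surfacing constraint violations as DuplicateKeyError."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO docs (_id, body) VALUES (?, ?)", (doc_id, body)
            )
    except sqlite3.IntegrityError as exc:
        # IntegrityError also covers other constraints (e.g. NOT NULL);
        # this sketch only defines a primary key, so it means "duplicate".
        raise DuplicateKeyError(str(exc)) from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (_id TEXT PRIMARY KEY, body TEXT)")
insert_one(conn, "a1", "{}")
try:
    insert_one(conn, "a1", "{}")
except DuplicateKeyError:
    print("duplicate _id rejected")
```

Router code that already catches `DuplicateKeyError` would then need no changes, which is presumably why `pymongo` stays in the requirements even though no MongoDB server is involved.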