mmoise00 committed
Commit 60ffeeb · 1 Parent(s): fe55c4e

prepare hugging face deployment and enable gitguardian

.dockerignore ADDED
@@ -0,0 +1,12 @@
+.venv/
+__pycache__/
+*.pyc
+node_modules/
+frontend/node_modules/
+frontend/.next/
+frontend/out/
+lightrag_store/
+.git/
+.DS_Store
+.env
+frontend/.env.local
.gitignore CHANGED
@@ -25,6 +25,7 @@ build/

 # ── RAG / vector stores (can be large and contain indexed private data) ───
 lightrag_store*/
+.tools/

 # ── Generated files ──────────────────────────────────────────────────────
 .files/
@@ -42,4 +43,5 @@ Thumbs.db
 .idea/
 .vscode/
 *.swp
-*.swo
+*.swo
+AGENTS.md
.pre-commit-config.yaml ADDED
@@ -0,0 +1,6 @@
+repos:
+  - repo: https://github.com/GitGuardian/ggshield
+    rev: v1.24.0
+    hooks:
+      - id: ggshield
+        stages: [pre-commit, pre-push]
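With this config in place, the hooks can be exercised against the whole tree before the first commit. A minimal sanity check, assuming the `.tools/ggshield` venv described in the README changes below has already been provisioned:

```bash
# Run every configured hook (here: ggshield) against all tracked files,
# not only the currently staged ones.
.tools/ggshield/bin/pre-commit run --all-files
```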
Dockerfile ADDED
@@ -0,0 +1,47 @@
+FROM python:3.11-slim
+
+ENV NODE_MAJOR=20 \
+    PYTHONUNBUFFERED=1 \
+    PIP_NO_CACHE_DIR=1 \
+    NEXT_TELEMETRY_DISABLED=1
+
+WORKDIR /app
+
+# System dependencies (Node.js for building frontend + git for HF datasets)
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    curl \
+    git && \
+    curl -fsSL "https://deb.nodesource.com/setup_${NODE_MAJOR}.x" | bash - && \
+    apt-get install -y --no-install-recommends nodejs && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Python dependencies (copy metadata first for caching)
+COPY pyproject.toml README.md ./
+COPY backend/ ./backend/
+RUN pip install --upgrade pip && \
+    pip install -e .
+
+# Frontend dependencies
+COPY frontend/package.json ./frontend/
+RUN cd frontend && npm install
+
+# Project sources
+COPY frontend/ ./frontend/
+COPY main.py ask.py start.sh ./
+RUN chmod +x start.sh
+
+# Build static Next.js export
+RUN cd frontend && npm run build
+
+# Prepare runtime directories
+RUN mkdir -p /app/lightrag_store
+
+ENV PYTHONPATH=/app \
+    RAG_WORKING_DIR=/app/lightrag_store \
+    PORT=7860
+
+EXPOSE 7860
+
+CMD ["./start.sh"]
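A quick way to validate this image before handing it to a Space is to build and run it locally. A sketch, assuming Docker is available; the `openrouter_key` value is a placeholder, and `INGEST_DOC_LIMIT` is lowered for a fast first boot:

```bash
# Build the image exactly as the Space build would (Dockerfile at the repo root)
docker build -t askchomsky .

# start.sh ingests into /app/lightrag_store on first boot if the store is
# empty, then serves the API and the static frontend on the exposed port
docker run --rm -p 7860:7860 \
  -e openrouter_key="sk-or-..." \
  -e INGEST_DOC_LIMIT=10 \
  askchomsky
```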
README.md CHANGED
@@ -1,12 +1,23 @@
-# askchomsky
+---
+title: AskChomsky
+emoji: 🧠
+colorFrom: blue
+colorTo: purple
+sdk: docker
+app_port: 7860
+---

-AskChomsky is a retrieval-augmented chatbot over a Noam Chomsky corpus.
+# AskChomsky
+
+Ask questions about Noam Chomsky's work, grounded in a curated corpus with citations.
+
+Powered by LightRAG + Next.js.

 ## Run Locally

 ### Prerequisites

-- Python 3.14+
+- Python 3.11+
 - Node.js 20+
 - npm

@@ -74,8 +85,8 @@ Notes:

 - LightRAG (retrieval-augmented generation)
 - LlamaIndex (RAG orchestration)
-- HuggingFace embeddings: `BAAI/bge-base-en-v1.5`
-- Model for answer generation: `openai/gpt-oss-120b`
+- OpenAI embeddings: `openai/text-embedding-3-small` (via OpenRouter)
+- Model for answer generation: `openai/gpt-4o-mini`
 - Langfuse (observability and traces)

 ## Dataset Used
@@ -107,7 +118,41 @@ python ask.py --query "How does Chomsky connect corporate power to public discou
 Notes:

 - LightRAG uses your OpenRouter key from `.env` (`openrouter_key`) for answer generation.
-- This setup uses local embeddings with `BAAI/bge-base-en-v1.5`.
 - Available query modes: `naive`, `local`, `global`, `hybrid`, `mix`.
-- In production you can set `RAG_WORKING_DIR` to control where the LightRAG index is stored
-  (the backend uses `RAG_WORKING_DIR` or defaults to `./lightrag_store`).
+- In production you can set `RAG_WORKING_DIR` to control where the LightRAG index is stored
+  (the backend uses `RAG_WORKING_DIR` or defaults to `./lightrag_store`).
+- Identical queries are cached by default (24h TTL, configurable via `QUERY_CACHE_TTL`).
+
+## Deploy to Hugging Face Spaces
+
+Use the bundled `Dockerfile` when configuring the Space (`sdk: docker` is already declared in this README's header).
+
+- **Repository:** Push this project to the Space or set it as the linked Git repository; the build looks for `Dockerfile` at the root.
+- **Secrets:** In the Space settings, add `openrouter_key` (and optional `LANGFUSE_*` keys) under *Variables & secrets*; the container refuses to start without an LLM key.
+- **Resources:** The default `INGEST_DOC_LIMIT` is 200; override it in *Environment variables* if you need a smaller corpus for faster cold starts.
+- **Networking:** The app listens on `$PORT` (default `7860`) and serves both the FastAPI backend and the statically exported Next.js frontend from the same origin.
+- **Persistence:** The LightRAG store lives in `/app/lightrag_store`; Spaces reset storage between restarts, so ingestion runs automatically whenever the cache is empty.
+
+After each push, Hugging Face rebuilds the image, runs `start.sh`, ingests the corpus if needed, and exposes the UI at the Space URL.
+
+## Secret Scanning (GitGuardian)
+
+This repository ships with a pre-commit hook configuration that runs GitGuardian's `ggshield` scanner on every commit and push.
+
+1. Provision the dedicated security tooling venv (one-time):
+   ```bash
+   python3 -m venv .tools/ggshield
+   .tools/ggshield/bin/python -m pip install --upgrade pip
+   .tools/ggshield/bin/python -m pip install pre-commit ggshield
+   .tools/ggshield/bin/ggshield auth login  # interactive one-time login
+   ```
+2. Enable the hooks in your local clone:
+   ```bash
+   .tools/ggshield/bin/pre-commit install --install-hooks
+   ```
+3. (Optional) Run a full scan at any time:
+   ```bash
+   .tools/ggshield/bin/ggshield secret scan repo .
+   ```
+
+Commits that introduce high-risk secrets will be blocked until the secret is removed or revoked.
backend/__init__.py ADDED
@@ -0,0 +1,3 @@
+"""AskChomsky backend package."""
+
+__all__ = ["api"]
backend/api.py CHANGED
@@ -34,9 +34,10 @@ from pydantic import BaseModel
 from main import (
     CITATION_SYSTEM_PROMPT,
     DEFAULT_WORKING_DIR,
+    cache_answer,
+    get_cached_answer,
     initialize_rag,
     llm_model_func,
-    query_rag,
 )

 # ---------------------------------------------------------------------------
@@ -544,6 +545,14 @@ async def _stream_pipeline(
             detail=f"Original: {question}\n\nRewritten: {rewritten}",
         )

+    # ── Stage: Cache Check ───────────────────────────────────────────
+    mode = mode_override or os.getenv("CHAINLIT_MODE") or "hybrid"
+    cached = get_cached_answer(question, mode)
+    if cached is not None:
+        yield _stage_event("cache", "Cache", "done", detail="Served from cache")
+        yield _sse("done", {"answer": cached})
+        return
+
     # ── Stage: RAG Init ──────────────────────────────────────────────
     yield _stage_event("rag_init", "Loading RAG Store", "running")
     # RAG_WORKING_DIR controls where the LightRAG index is stored.
@@ -729,6 +738,9 @@ async def _stream_pipeline(

         yield _sse("done", done_payload)

+        # Cache the final answer
+        cache_answer(question, mode, final)
+
     except Exception as exc:
         yield _stage_event("answer", "Answer", "error", detail=str(exc))
         yield _sse("error", {"message": str(exc)})
@@ -802,3 +814,27 @@ async def compare(req: CompareRequest) -> dict:
         "mode_b": mode_b,
         "answer_b": answer_b,
     }
+
+
+# ---------------------------------------------------------------------------
+# Serve Next.js static build (production)
+# ---------------------------------------------------------------------------
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import FileResponse
+
+NEXTJS_OUT = os.path.join(PROJECT_ROOT, "frontend", "out")
+
+if os.path.isdir(NEXTJS_OUT):
+    app.mount(
+        "/_next", StaticFiles(directory=os.path.join(NEXTJS_OUT, "_next")), name="_next"
+    )
+
+    @app.get("/{full_path:path}")
+    async def serve_frontend(full_path: str):
+        file_path = os.path.join(NEXTJS_OUT, full_path, "index.html")
+        if os.path.isfile(file_path):
+            return FileResponse(file_path)
+        index_path = os.path.join(NEXTJS_OUT, "index.html")
+        if os.path.isfile(index_path):
+            return FileResponse(index_path)
+        return {"error": "Not found"}
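Note that the catch-all route is only registered when `frontend/out/` exists, and since Starlette matches routes in registration order, the API endpoints declared earlier keep precedence. A minimal smoke test of the static serving, assuming the container built from the Dockerfile above is running on port 7860:

```bash
# The root path falls through serve_frontend() to frontend/out/index.html
curl -s http://localhost:7860/ | head -n 3

# Unknown paths resolve to the top-level index.html (SPA-style fallback)
curl -s http://localhost:7860/no-such-page/ | head -n 3
```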
frontend/components/FlowCanvas.tsx CHANGED
@@ -29,6 +29,7 @@ interface PipelineNodeDataShape extends Record<string, unknown> {
 const ICONS: Record<string, string> = {
   intent: "🧭",
   rewrite: "✏️",
+  cache: "⚡",
   rag_init: "🗄️",
   retrieval_1: "🔍",
   retrieval_2: "🔄",
@@ -43,6 +44,7 @@ const ICONS: Record<string, string> = {
 const POSITIONS: Record<string, { x: number; y: number }> = {
   intent: { x: 0, y: 0 },
   rewrite: { x: 0, y: 160 },
+  cache: { x: 300, y: 160 },
   rag_init: { x: 0, y: 320 },
   retrieval_1: { x: 0, y: 480 },
   retrieval_2: { x: 300, y: 480 },
@@ -55,7 +57,8 @@ const POSITIONS: Record<string, { x: number; y: number }> = {
 // ── Edge definitions ────────────────────────────────────────────────────────
 const STATIC_EDGES: Edge[] = [
   { id: "e-intent-rewrite", source: "intent", target: "rewrite" },
-  { id: "e-rewrite-rag", source: "rewrite", target: "rag_init" },
+  { id: "e-rewrite-cache", source: "rewrite", target: "cache" },
+  { id: "e-cache-rag", source: "cache", target: "rag_init" },
   { id: "e-rag-r1", source: "rag_init", target: "retrieval_1" },
   { id: "e-r1-r2", source: "retrieval_1", target: "retrieval_2" },
   { id: "e-r2-r3", source: "retrieval_2", target: "retrieval_3" },
frontend/hooks/useQueryStream.ts CHANGED
@@ -3,12 +3,17 @@
 import { useCallback, useRef, useState } from "react";
 import type { NodeState, StageEvent } from "@/types/pipeline";

-const API_URL = process.env.NEXT_PUBLIC_API_URL ?? "http://localhost:8001";
+const API_URL =
+  process.env.NEXT_PUBLIC_API_URL ??
+  (typeof window !== "undefined" && window.location.origin !== "http://localhost:3000"
+    ? window.location.origin
+    : "http://localhost:8001");

 // Default idle nodes shown before any query is run
 const DEFAULT_NODES: NodeState[] = [
   { id: "intent", label: "Intent Router", status: "idle", detail: "" },
   { id: "rewrite", label: "Query Rewrite", status: "idle", detail: "" },
+  { id: "cache", label: "Cache Check", status: "idle", detail: "" },
   { id: "rag_init", label: "Loading RAG Store", status: "idle", detail: "" },
   { id: "retrieval_1", label: "Retrieval", status: "idle", detail: "" },
   { id: "retrieval_2", label: "Retrieval (retry)", status: "idle", detail: "" },
frontend/next.config.ts CHANGED
@@ -1,7 +1,8 @@
 import type { NextConfig } from "next";

 const nextConfig: NextConfig = {
-  /* config options here */
+  output: "export",
+  trailingSlash: true,
 };

 export default nextConfig;
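With `output: "export"`, `next build` writes a fully static site to `frontend/out/` (the directory the FastAPI catch-all above serves), and `trailingSlash: true` puts each page in its own folder as `index.html`, matching the `os.path.join(NEXTJS_OUT, full_path, "index.html")` lookup. A quick check of the export shape, assuming `npm run build` runs `next build`:

```bash
cd frontend && npm run build
ls out/        # index.html plus one directory per exported route
ls out/_next/  # hashed assets served from the /_next mount
```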
frontend/package.json CHANGED
@@ -15,11 +15,11 @@
     "react-markdown": "10.1.0"
   },
   "devDependencies": {
-    "@tailwindcss/postcss": "4.0.0",
+    "@tailwindcss/postcss": "^4.1.0",
     "@types/node": "20.0.0",
     "@types/react": "19.0.0",
     "@types/react-dom": "19.0.0",
-    "tailwindcss": "4.0.0",
-    "typescript": "5.0.0"
+    "tailwindcss": "^4.1.0",
+    "typescript": "^5.0.0"
   }
 }
main.py CHANGED
@@ -1,11 +1,12 @@
 import argparse
 import asyncio
+import hashlib
 import json
 import logging
 import os
 import re
 import sys
-from functools import lru_cache
+import time
 from typing import Any, TYPE_CHECKING


@@ -31,12 +32,6 @@ import numpy as np
 from datasets import load_dataset
 from dotenv import load_dotenv

-if TYPE_CHECKING:
-    # Imported only for type checking; the actual import of
-    # SentenceTransformer happens lazily inside get_embedder to
-    # keep module import (and thus API startup) lightweight.
-    from sentence_transformers import SentenceTransformer
-

 load_dotenv()

@@ -80,13 +75,16 @@ configure_logging()

 OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
 LLM_MODEL = os.getenv("ASKCHOMSKY_LLM_MODEL", "openai/gpt-4o-mini")
-EMBED_MODEL = "BAAI/bge-base-en-v1.5"
+EMBED_MODEL = os.getenv("ASKCHOMSKY_EMBED_MODEL", "openai/text-embedding-3-small")
+EMBED_DIM = 1536
 DEFAULT_WORKING_DIR = "./lightrag_store"
 LLM_TIMEOUT_SECONDS = int(os.getenv("LLM_TIMEOUT", "600"))
 MAX_ASYNC_LLM_CALLS = int(os.getenv("MAX_ASYNC", "2"))
 MAX_PARALLEL_INSERT = int(os.getenv("MAX_PARALLEL_INSERT", "2"))
 REWRITE_QUERY_ENABLED = os.getenv("REWRITE_QUERY", "true").lower() == "true"
 VERIFY_CLAIMS_ENABLED = os.getenv("VERIFY_CLAIMS", "true").lower() == "true"
+QUERY_CACHE_TTL_SECONDS = int(os.getenv("QUERY_CACHE_TTL", "86400"))
+QUERY_CACHE_PATH = os.path.join(DEFAULT_WORKING_DIR, "query_cache.json")


 CITATION_SYSTEM_PROMPT = """You are a retrieval-grounded assistant.
@@ -150,28 +148,98 @@ def configure_langfuse() -> bool:
     return get_langfuse_client() is not None


-@lru_cache(maxsize=1)
-def get_embedder() -> "SentenceTransformer":
-    # Lazy import avoids loading heavy ML stacks during module import,
-    # which helps services like Render bind the HTTP port quickly.
-    from sentence_transformers import SentenceTransformer
-
-    return SentenceTransformer(EMBED_MODEL)
+# ---------------------------------------------------------------------------
+# API-based embeddings (OpenRouter / OpenAI-compatible)
+# ---------------------------------------------------------------------------
+
+
+def _get_api_key() -> str:
+    api_key = os.getenv("openrouter_key") or os.getenv("OPENAI_API_KEY", "")
+    if not api_key:
+        raise ValueError("Missing openrouter_key or OPENAI_API_KEY in .env")
+    return api_key
+
+
+def _api_embed_single(text: str) -> list[float]:
+    import httpx
+
+    api_key = _get_api_key()
+    payload = {"input": text, "model": EMBED_MODEL}
+    headers = {
+        "Authorization": f"Bearer {api_key}",
+        "Content-Type": "application/json",
+    }
+    with httpx.Client(timeout=30.0) as client:
+        resp = client.post(
+            OPENROUTER_BASE_URL + "/embeddings", json=payload, headers=headers
+        )
+        resp.raise_for_status()
+        data = resp.json()
+    return data["data"][0]["embedding"]


 def embed_texts(texts: list[str]) -> np.ndarray:
-    embeddings = get_embedder().encode(
-        texts,
-        normalize_embeddings=True,
-        show_progress_bar=False,
-    )
-    return np.asarray(embeddings, dtype=np.float32)
+    embeddings = [_api_embed_single(t) for t in texts]
+    arr = np.array(embeddings, dtype=np.float32)
+    norms = np.linalg.norm(arr, axis=1, keepdims=True)
+    norms[norms == 0] = 1.0
+    return arr / norms


 async def embedding_func(texts: list[str]) -> np.ndarray:
     return await asyncio.to_thread(embed_texts, texts)


+# ---------------------------------------------------------------------------
+# Query result cache (disk-based, TTL-evicted)
+# ---------------------------------------------------------------------------
+
+
+def _load_query_cache() -> dict[str, dict[str, Any]]:
+    if not os.path.exists(QUERY_CACHE_PATH):
+        return {}
+    try:
+        with open(QUERY_CACHE_PATH, "r") as f:
+            return json.load(f)
+    except (json.JSONDecodeError, OSError):
+        return {}
+
+
+def _save_query_cache(cache: dict[str, dict[str, Any]]) -> None:
+    os.makedirs(os.path.dirname(QUERY_CACHE_PATH), exist_ok=True)
+    with open(QUERY_CACHE_PATH, "w") as f:
+        json.dump(cache, f)
+
+
+def _cache_key(question: str, mode: str) -> str:
+    raw = f"{question.strip().lower()}|{mode}"
+    return hashlib.sha256(raw.encode()).hexdigest()
+
+
+def get_cached_answer(question: str, mode: str) -> str | None:
+    if QUERY_CACHE_TTL_SECONDS <= 0:
+        return None
+    key = _cache_key(question, mode)
+    cache = _load_query_cache()
+    entry = cache.get(key)
+    if not entry:
+        return None
+    if time.time() - entry.get("ts", 0) > QUERY_CACHE_TTL_SECONDS:
+        del cache[key]
+        _save_query_cache(cache)
+        return None
+    return entry.get("answer")
+
+
+def cache_answer(question: str, mode: str, answer: str) -> None:
+    if QUERY_CACHE_TTL_SECONDS <= 0:
+        return
+    key = _cache_key(question, mode)
+    cache = _load_query_cache()
+    cache[key] = {"answer": answer, "ts": time.time()}
+    _save_query_cache(cache)
+
+
 async def llm_model_func(
     prompt,
     system_prompt=None,
@@ -218,7 +286,7 @@ async def initialize_rag(working_dir: str = DEFAULT_WORKING_DIR) -> "LightRAG":
         llm_model_max_async=MAX_ASYNC_LLM_CALLS,
         max_parallel_insert=MAX_PARALLEL_INSERT,
         embedding_func=EmbeddingFunc(
-            embedding_dim=768,
+            embedding_dim=EMBED_DIM,
             max_token_size=8192,
             model_name=EMBED_MODEL,
             func=embedding_func,
@@ -462,6 +530,10 @@ async def query_rag(
         except Exception:
             return ""

+    cached = get_cached_answer(question, mode)
+    if cached is not None:
+        return cached
+
     rag = None
     try:
         rag = await initialize_rag(working_dir)
@@ -500,7 +572,9 @@
         answer_with_citations = _enforce_citation_answer(answer_text, references)
         verification_summary = await _verify_claims(answer_with_citations, chunks)

-        return f"{answer_with_citations}{verification_summary}".strip()
+        final_answer = f"{answer_with_citations}{verification_summary}".strip()
+        cache_answer(question, mode, final_answer)
+        return final_answer
     finally:
         if rag is not None:
             await rag.finalize_storages()
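The cache key is a SHA-256 of the normalized (stripped, lower-cased) question plus the query mode, so repeating a question within the TTL short-circuits retrieval and generation entirely. One way to observe it from the CLI, assuming the store is already ingested; the question text is illustrative:

```bash
# First run pays the full retrieval + generation cost
python ask.py --query "What is manufacturing consent?"

# The same question modulo case/whitespace is served from
# lightrag_store/query_cache.json for the next 24 hours
python ask.py --query "  WHAT is manufacturing consent?  "

# A TTL of 0 disables both cache reads and writes
QUERY_CACHE_TTL=0 python ask.py --query "What is manufacturing consent?"
```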
pyproject.toml CHANGED
@@ -13,9 +13,9 @@ dependencies = [
     "uvicorn[standard]>=0.30.0",
     "datasets>=4.8.2",
    "langfuse>=4.0.1",
+    "httpx>=0.27.0",
     "lightrag-hku>=1.4.11",
     "llama-index-core>=0.14.18",
-    "llama-index-embeddings-huggingface>=0.7.0",
     "llama-index-llms-openai>=0.7.2",
     "numpy>=2.4.3",
     "poetry>=2.3.2",
start.sh ADDED
@@ -0,0 +1,21 @@
+#!/bin/bash
+set -e
+
+# Respect Space-provided directory + doc limit overrides
+DOC_LIMIT=${INGEST_DOC_LIMIT:-200}
+PORT=${PORT:-7860}
+RAG_WORKING_DIR=${RAG_WORKING_DIR:-/app/lightrag_store}
+
+mkdir -p "$RAG_WORKING_DIR"
+
+# Ingest corpus if RAG store is empty
+if [ ! -d "$RAG_WORKING_DIR/graphml" ]; then
+  echo "Ingesting corpus..."
+  python ask.py --ingest --doc-limit "$DOC_LIMIT" --working-dir "$RAG_WORKING_DIR"
+  echo "Ingestion complete."
+else
+  echo "RAG store found, skipping ingestion."
+fi
+
+# Start FastAPI on configured port (serves both API + static frontend)
+exec uvicorn backend.api:app --host 0.0.0.0 --port "$PORT"
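The same script doubles as a local smoke test outside Docker, since it only depends on the environment overrides it already reads. A sketch with deliberately small values:

```bash
# Tiny corpus + local store for a fast end-to-end check; a second run
# finds the existing store and skips ingestion
INGEST_DOC_LIMIT=5 RAG_WORKING_DIR=./lightrag_store PORT=8001 ./start.sh
```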