nikhile-galileo committed on
Commit e68d535 · 1 Parent(s): dfb540e

Adding finance protect demo
.gitignore ADDED
@@ -0,0 +1,30 @@
+ # Mac stuff
+ .DS_Store
+
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Environments
+ .env
+ .venv
+ venv/
+
+ # PyCharm
+ .idea/
+
+ backend/backend-venv
+
+ # Database files
+ *.db
+ *.db.lock
+ *.pdf
+
+ # Processed data files
+ rfp-data/processed/*.jsonl
+ fin-data/processed/*.jsonl
+
+ # Notes
+ notes
Dockerfile ADDED
@@ -0,0 +1,34 @@
+ # Use Python 3.12 base image and install uv
+ FROM python:3.12-slim
+
+ # Install uv
+ RUN pip install uv
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1 \
+     PYTHONDONTWRITEBYTECODE=1 \
+     UV_COMPILE_BYTECODE=1 \
+     UV_LINK_MODE=copy \
+     PYTHONPATH=/app
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy pyproject.toml and uv.lock first for better layer caching
+ COPY pyproject.toml uv.lock ./
+
+ # Install dependencies using uv
+ RUN uv sync --frozen --no-cache
+
+ # Copy the application code
+ COPY backend/ ./backend/
+ COPY fin-data/ ./fin-data/
+
+ # Create static directory if it doesn't exist
+ RUN mkdir -p backend/api/static
+
+ # Expose the port
+ EXPOSE 8000
+
+ # Command to run the application
+ CMD ["uv", "run", "uvicorn", "backend.api.main:app", "--host", "0.0.0.0", "--port", "8000"]
README.md CHANGED
@@ -1,10 +1,95 @@
- ---
- title: Demos
- emoji: 🌖
- colorFrom: pink
- colorTo: yellow
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Galileo POC
+
+ A Python-based RAG (Retrieval-Augmented Generation) application for processing and analyzing RFP PDF documents.
+
+ ## Project Structure
+
+ ```
+ api/                  # FastAPI endpoints
+ backend/              # Core application logic
+ ├── classes/          # Core application classes
+ ├── conf/             # Configuration files
+ ├── main/             # Main application modules
+ ├── models/           # Data models
+ └── backend-venv/     # Python virtual environment
+
+ ui/                   # Streamlit UI application
+ ```
+
+ ## Setup Instructions
+
+ 1. Create and activate a Python virtual environment:
+    ```bash
+    python3.12 -m venv backend-venv
+    source backend-venv/bin/activate
+    ```
+
+ 2. Install dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. Set up environment variables:
+    - Create a `.env` file in the root directory
+    - Configure environment-specific settings
+    - Use python-dotenv for loading environment variables
+
+ ## API
+
+ ### Running the FastAPI Server
+
+ ```bash
+ uvicorn api.api:app --reload
+ ```
+
+ Access documentation:
+ - Swagger UI: http://localhost:8000/docs
+ - ReDoc: http://localhost:8000/redoc
+
+ ## UI
+
+ ### Running the Streamlit App
+
+ ```bash
+ cd ui
+ streamlit run app.py
+ ```
+
+ Access the UI at: http://localhost:8501
+
+ ## Configuration
+
+ - Environment variables via `.env` file
+ - YAML configuration in `conf/config.yaml`
+ - Environment-specific settings through `APP_ENV`
+
+ ## Development
+
+ The project follows a modular structure:
+ - Backend: Core RAG functionality
+ - API: REST endpoints for RAG operations
+ - UI: Streamlit-based interface for user interaction
+
+ ## License
+
+ MIT License
+
+ ## Contributing
+
+ Contributions are welcome! Please follow the standard GitHub workflow.
+
+
+ # Questions
+
+ - why did fairfield cdc issue the rfp
+
+ Reliable
+ The Fairfield CDC issued the RFP to find a banking institution that shares their commitment to community development, redevelopment, and economic development activities. They aim to enhance the physical, economic, health, safety, welfare, and social aspects of life for all residents. The RFP is for banking services.
+
+ ?
+ Fairfield CDC issued the RFP to solicit proposals from banking institutions to provide financial services. The RFP aims to find a bank that offers competitive rates and fees while ensuring deposit collateral. The selected bank will partner with Fairfield CDC to advance community development initiatives.
+
+ Hallucination
+ - Fairfield CDC issued the RFP to solicit proposals for a new banking partner. The RFP outlines specific requirements, including competitive rates, FDIC coverage, and customer support. The goal is to find a bank that can provide comprehensive services and support the CDC's community development initiatives.
backend/.env.sample ADDED
@@ -0,0 +1,5 @@
+ APP_ENV= # local_ks or local_ne
+ GOOGLE_GEMINI_API_KEY= # Replace with your own key
+ GALILEO_API_KEY=
+ GALILEO_CONSOLE_URL=
+ GALILEO_API_ACCESS_TOKEN=
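The sample above shows the dotenv format the app expects; the application itself loads it with python-dotenv's `load_dotenv()`. As a rough stdlib sketch of what that loading amounts to — the parser and the sample values here are illustrative, not part of the repo:

```python
import os

def parse_env_sample(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blanks and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Hypothetical sample values for illustration only
sample = """\
APP_ENV=local_ks  # local_ks or local_ne
GALILEO_API_KEY=abc123
"""
parsed = parse_env_sample(sample)
os.environ.update(parsed)  # make the values visible to the process
```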
backend/Dockerfile ADDED
@@ -0,0 +1,19 @@
+ # Stage 1: Build stage
+ FROM python:3.13-slim as builder
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements
+ COPY requirements.txt .
+
+ # Install dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY backend .
+
+ # Command to run the application
+ CMD ["echo", "this is backend code"]
backend/api/__init__.py ADDED
@@ -0,0 +1 @@
+ from .main import app
backend/api/main-test.py ADDED
@@ -0,0 +1,36 @@
+ import uvicorn
+ from fastapi import FastAPI, Form
+ from fastapi.requests import Request
+ from fastapi.responses import HTMLResponse, JSONResponse
+ from fastapi.staticfiles import StaticFiles
+ from fastapi.templating import Jinja2Templates
+ from dotenv import load_dotenv
+
+ app = FastAPI()
+
+ app.mount("/static", StaticFiles(directory="static"), name="static")
+ templates = Jinja2Templates(directory="templates")
+
+ @app.get("/", response_class=HTMLResponse)
+ async def read_root(request: Request):
+     return templates.TemplateResponse("index.html", {"request": request})
+
+ @app.post("/search")
+ async def search(
+     query: str = Form(...),
+     top_k: int = Form(5),
+     protection: bool = Form(False)
+ ):
+     # Simulate processing
+     return JSONResponse({
+         "message": "Search response here!",
+         "query": query,
+         "top_k": top_k,
+         "protection": protection
+     })
+
+
+ if __name__ == "__main__":
+     load_dotenv()
+     uvicorn.run(app, host="0.0.0.0", port=8000)
backend/api/main.py ADDED
@@ -0,0 +1,139 @@
+ from pathlib import Path
+
+ import uvicorn
+ from dotenv import load_dotenv
+ from fastapi import FastAPI, Form
+ from fastapi.requests import Request
+ from fastapi.responses import HTMLResponse, JSONResponse
+ from fastapi.staticfiles import StaticFiles
+ from fastapi.templating import Jinja2Templates
+
+ from backend.classes.embedding_model import EmbeddingModelConfig, EmbeddingModel
+ from backend.classes.galileo_platform import GalileoPlatformConfig, GalileoPlatform
+ from backend.classes.generative_model import GeminiModelConfig, GeminiModel, OpenAIModelConfig, OpenAIModel
+ from backend.classes.rag_application import RAGApplicationConfig, RAGApplication
+ from backend.classes.vector_database.milvus_vector_database import (
+     MilvusVectorDatabaseConfig,
+     MilvusVectorDatabase,
+ )
+ from backend.utils.utils import (
+     initialize_logger,
+     read_config,
+     set_env_variables,
+     create_vector_database,
+     get_embedding_model,
+     get_generative_model,
+ )
+
+ app = FastAPI()
+
+ app.mount("/static", StaticFiles(directory="backend/api/static"), name="static")
+ templates = Jinja2Templates(directory="backend/api/templates")
+
+ load_dotenv()
+
+ logger = initialize_logger()
+
+ # Resolve the config file relative to this module
+ config = read_config(str(Path(Path(__file__).parent.parent, "conf/config.yaml")))
+
+ # Check that the required environment variables are set
+ env_variables = set_env_variables(config["env_variables"])
+
+ app_config = config[env_variables["APP_ENV"]]
+ app_config["env_vars"] = env_variables
+
+ # Create embedding model object
+ embedding_model_config = EmbeddingModelConfig(
+     model_name=app_config["embedding_model"]["model_name"],
+     batch_size=app_config["embedding_model"]["batch_size"],
+ )
+ embedding_model = get_embedding_model(EmbeddingModel, embedding_model_config)
+
+ # Create vector database object
+ vector_db_config = MilvusVectorDatabaseConfig(
+     db_path=app_config["vector_database"]["db_path"],
+     collection_name=app_config["vector_database"]["collection_name"],
+     vector_dimensions=app_config["vector_database"]["dimensions"],
+     drop_if_exists=False,
+ )
+ vector_db = create_vector_database(MilvusVectorDatabase, vector_db_config)
+
+ # Create generative model objects
+ gemini_generative_model_config = GeminiModelConfig(
+     model_name=app_config["gemini_generative_model"]["model_name"],
+     api_keys=[env_variables["GOOGLE_GEMINI_API_KEY"], env_variables["GOOGLE_GEMINI_BACKUP_API_KEY"]],
+     temperature=app_config["gemini_generative_model"]["temperature"],
+ )
+ gemini_generative_model = get_generative_model(GeminiModel, gemini_generative_model_config)
+
+ openai_generative_model_config = OpenAIModelConfig(
+     model_name=app_config["openai_generative_model"]["model_name"],
+     api_key=env_variables["OPENAI_API_KEY"],
+     temperature=app_config["openai_generative_model"]["temperature"],
+ )
+ openai_generative_model = get_generative_model(OpenAIModel, openai_generative_model_config)
+
+ # Create Galileo platform object
+ galileo_platform_config = GalileoPlatformConfig(
+     evaluate_project_name=app_config["galileo_platform"]["evaluate_project_name"],
+     observe_project_name=app_config["galileo_platform"]["observe_project_name"],
+     protect_project_name=app_config["galileo_platform"]["protect_project_name"],
+     protect_stage_name=app_config["galileo_platform"]["protect_stage_name"],
+ )
+ galileo_platform = GalileoPlatform(galileo_platform_config)
+
+ # Initialize RAG application
+ rag_application_config = RAGApplicationConfig(
+     embedding_model=embedding_model,
+     vector_db=vector_db,
+     # gemini_generative_model=gemini_generative_model,
+     generative_model=openai_generative_model,
+     galileo_platform=galileo_platform,
+ )
+ rag_app = RAGApplication(rag_application_config)
+
+
+ @app.get("/", response_class=HTMLResponse)
+ async def read_root(request: Request):
+     return templates.TemplateResponse("index.html", {"request": request})
+
+
+ # TODO: Nikhil
+ # @app.post("/other-metrics")
+ # async def search(
+
+
+ @app.post("/search")
+ async def search(
+     query: str = Form(...),
+     top_k: int = Form(5),
+     protection: bool = Form(False),
+     hallucination_detection: bool = Form(False),
+     induce_hallucination: bool = Form(False),
+ ):
+     response, redacted_response, original_response, context_adherence_score, pii_flag = rag_app.run(
+         query,
+         protect_enabled=protection,
+         top_k=top_k,
+         hallucination_detection=hallucination_detection,
+         induce_hallucination=induce_hallucination,
+     )
+
+     return JSONResponse(
+         {
+             "message": response,
+             "redacted_message": redacted_response,
+             "original_message": original_response,
+             "metrics": {
+                 "context_adherence": context_adherence_score,
+                 "pii_flag": pii_flag,
+             },
+         }
+     )
+
+
+ if __name__ == "__main__":
+     uvicorn.run(app, host="0.0.0.0", port=8000)
backend/api/routers/__init__.py ADDED
File without changes
backend/api/routers/home_router.py ADDED
@@ -0,0 +1,29 @@
+ # home_router.py
+ from fastapi import FastAPI, Request, Form
+ from fastapi.responses import HTMLResponse
+ from fastapi.staticfiles import StaticFiles
+ from fastapi.templating import Jinja2Templates
+
+ app = FastAPI()
+
+ # Serve static files and templates
+ app.mount("/static", StaticFiles(directory="static"), name="static")
+ templates = Jinja2Templates(directory="templates")
+
+ @app.get("/", response_class=HTMLResponse)
+ async def read_form(request: Request):
+     return templates.TemplateResponse("index.html", {"request": request})
+
+ @app.post("/search")
+ async def handle_search(
+     query: str = Form(...),
+     top_k: int = Form(5),
+     protection: bool = Form(False),
+ ):
+     # Handle search logic here
+     result = {
+         "query": query,
+         "top_k": top_k,
+         "protection": protection,
+     }
+     return result
backend/api/templates/index.html ADDED
@@ -0,0 +1,369 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+     <meta charset="UTF-8">
+     <title>Finance Q/A Bot</title>
+     <script src="https://cdn.tailwindcss.com"></script>
+     <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
+ </head>
+ <body class="bg-gray-100 min-h-screen flex">
+
+     <!-- Main Content -->
+     <div class="w-3/4 p-10">
+         <h1 class="text-3xl font-bold mb-6">Finance Q/A Bot</h1>
+         <form id="searchForm" class="space-y-4">
+             <input
+                 type="text"
+                 id="query"
+                 name="query"
+                 placeholder="Ask a question"
+                 class="p-2 border w-3/4 rounded"
+                 required
+             />
+             <br />
+
+             <div class="flex items-center space-x-2">
+                 <button
+                     type="submit"
+                     class="bg-blue-500 text-white px-4 py-2 rounded hover:bg-blue-600"
+                 >
+                     Submit
+                 </button>
+                 <!-- Loading spinner -->
+                 <div id="loadingSpinner" class="hidden">
+                     <svg class="animate-spin h-5 w-5 text-blue-500" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
+                         <circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
+                         <path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
+                     </svg>
+                 </div>
+             </div>
+         </form>
+
+         <!-- Context Adherence Message (above the result box) -->
+         <div id="adherenceMessage" class="mt-6 p-3 rounded hidden"></div>
+
+         <!-- Result area -->
+         <div id="result" class="mt-4 p-4 bg-white shadow rounded hidden"></div>
+     </div>
+
+     <!-- Sidebar on the right -->
+     <div class="w-1/4 bg-white shadow p-6">
+         <h2 class="text-xl font-bold mb-4">Options</h2>
+         <div class="flex flex-col space-y-4">
+             <label class="block">
+                 <span class="text-gray-700">Top K:</span>
+                 <input
+                     type="number"
+                     id="top_k"
+                     name="top_k"
+                     value="5"
+                     class="mt-1 p-2 w-full border rounded"
+                 />
+             </label>
+             <label class="flex items-center space-x-2">
+                 <input
+                     type="checkbox"
+                     id="protection"
+                     name="protection"
+                     class="form-checkbox text-green-600 focus:ring-green-500"
+                 />
+                 <span>Enable Galileo Protection</span>
+             </label>
+             <label class="flex items-center space-x-2">
+                 <input
+                     type="checkbox"
+                     id="hallucination_detection"
+                     name="hallucination_detection"
+                     class="form-checkbox text-blue-600 focus:ring-blue-500"
+                 />
+                 <span>Enable Hallucination Detection</span>
+             </label>
+             <label class="flex items-center space-x-2">
+                 <input
+                     type="checkbox"
+                     id="induce_hallucination"
+                     name="induce_hallucination"
+                     class="form-checkbox text-red-600 focus:ring-red-500"
+                 />
+                 <span>Induce Hallucination</span>
+             </label>
+         </div>
+     </div>
+
+     <script>
+     $(document).ready(function () {
+         // Check for URL parameters and pre-fill form
+         const urlParams = new URLSearchParams(window.location.search);
+
+         if (urlParams.has('query')) {
+             $('#query').val(urlParams.get('query'));
+         }
+         if (urlParams.has('top_k')) {
+             $('#top_k').val(urlParams.get('top_k'));
+         }
+         if (urlParams.has('protection')) {
+             $('#protection').prop('checked', urlParams.get('protection') === 'true');
+         }
+         if (urlParams.has('hallucination_detection')) {
+             $('#hallucination_detection').prop('checked', urlParams.get('hallucination_detection') === 'true');
+         }
+         if (urlParams.has('induce_hallucination')) {
+             $('#induce_hallucination').prop('checked', urlParams.get('induce_hallucination') === 'true');
+         }
+
+         $('#searchForm').on('submit', function (e) {
+             e.preventDefault();
+
+             const query = $('#query').val();
+             const top_k = $('#top_k').val();
+             const protection = $('#protection').is(':checked');
+             const hallucination_detection = $('#hallucination_detection').is(':checked');
+             const induce_hallucination = $('#induce_hallucination').is(':checked');
+
+             // Show loading spinner
+             $('#loadingSpinner').removeClass('hidden');
+
+             // Hide previous results
+             $('#adherenceMessage').addClass('hidden');
+             $('#result').addClass('hidden');
+
+             $.ajax({
+                 type: 'POST',
+                 url: '/search',
+                 data: {
+                     query: query,
+                     top_k: top_k,
+                     protection: protection,
+                     hallucination_detection: hallucination_detection,
+                     induce_hallucination: induce_hallucination
+                 },
+                 success: function (response) {
+                     // Hide loading spinner
+                     $('#loadingSpinner').addClass('hidden');
+
+                     // Check for PII flag first
+                     const piiFlag = response.metrics && response.metrics.pii_flag;
+
+                     // Check if any PII types are detected
+                     const piiDetected = piiFlag && Object.values(piiFlag).some(value => value === true);
+
+                     // If PII is detected, display a specific PII warning
+                     if (piiDetected) {
+                         // Build the PII warning message
+                         const detectedTypes = [];
+                         if (piiFlag.phone_number) detectedTypes.push('phone number');
+                         if (piiFlag.email) detectedTypes.push('<span style="color:yellow; font-weight: bold">email address</span>');
+                         if (piiFlag.name) detectedTypes.push('<span style="color:yellow; font-weight: bold">personal name</span>');
+                         if (piiFlag.company) detectedTypes.push('<span style="color:yellow; font-weight: bold">company name</span>');
+
+                         let piiMessage = 'Sensitive personally identifiable information detected! The following types of PII were found: ';
+                         if (detectedTypes.length === 1) {
+                             piiMessage += detectedTypes[0];
+                         } else if (detectedTypes.length === 2) {
+                             piiMessage += detectedTypes.join(' and ');
+                         } else {
+                             piiMessage += detectedTypes.slice(0, -1).join(', ') + ', and ' + detectedTypes.slice(-1);
+                         }
+
+                         // Display the PII warning and response
+                         $('#result')
+                             .removeClass('hidden')
+                             .html(`
+                                 <div class="space-y-4">
+                                     <!-- PII Warning Message with red background -->
+                                     <div class="bg-red-500 text-white p-3 rounded-lg">
+                                         <div class="flex items-start">
+                                             <div class="flex-shrink-0">
+                                                 <svg class="h-5 w-5 text-white" viewBox="0 0 20 20" fill="currentColor">
+                                                     <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.707 7.293a1 1 0 00-1.414 1.414L8.586 10l-1.293 1.293a1 1 0 101.414 1.414L10 11.414l1.293 1.293a1 1 0 001.414-1.414L11.414 10l1.293-1.293a1 1 0 00-1.414-1.414L10 8.586 8.707 7.293z" clip-rule="evenodd" />
+                                                 </svg>
+                                             </div>
+                                             <div class="ml-3">
+                                                 <p class="font-medium">${piiMessage}</p>
+                                             </div>
+                                         </div>
+                                     </div>
+
+                                     <!-- Original Message, with detected PII highlighted -->
+                                     <div class="bg-white border border-gray-200 rounded-lg p-4">
+                                         <p class="font-medium text-gray-700 mb-2">Original Message:</p>
+                                         <p class="text-black-600">
+                                             <style>
+                                                 tag {
+                                                     font-weight: bold;
+                                                     background-color: yellow;
+                                                 }
+                                             </style>
+                                             ${response.message}</p>
+                                     </div>
+
+                                     <!-- Redacted Message, with redactions struck through -->
+                                     <div class="bg-white border border-gray-200 rounded-lg p-4">
+                                         <p class="font-medium text-gray-700 mb-2">Redacted Message:</p>
+                                         <p class="text-black-600">
+                                             <style>
+                                                 pii {
+                                                     font-weight: bold;
+                                                     text-decoration: line-through;
+                                                     background-color: yellow;
+                                                 }
+                                             </style>
+                                             ${response.redacted_message || 'No redacted version available'}
+                                         </p>
+                                     </div>
+                                 </div>
+                             `);
+                     } else if ((induce_hallucination && response.original_message && response.message) ||
+                                (response.metrics && response.metrics.context_adherence < 0.8)) {
+                         // Display hallucination warning and both responses
+                         const adherenceScore = response.metrics ? response.metrics.context_adherence : 1;
+                         const isInducedHallucination = induce_hallucination && response.original_message && response.message;
+
+                         let warningMessage = '';
+                         if (isInducedHallucination) {
+                             warningMessage = 'Hallucination induced for demonstration purposes! Comparing original vs safe response.';
+                         } else {
+                             warningMessage = 'Potential hallucination detected! Comparing original vs safe response.';
+                         }
+
+                         $('#result')
+                             .removeClass('hidden')
+                             .html(`
+                                 <div class="space-y-4">
+                                     <!-- Hallucination Warning Message with orange background -->
+                                     <div class="bg-orange-500 text-white p-3 rounded-lg">
+                                         <div class="flex items-start">
+                                             <div class="flex-shrink-0">
+                                                 <svg class="h-5 w-5 text-white" viewBox="0 0 20 20" fill="currentColor">
+                                                     <path fill-rule="evenodd" d="M8.257 3.099c.765-1.36 2.722-1.36 3.486 0l5.58 9.92c.75 1.334-.213 2.98-1.742 2.98H4.42c-1.53 0-2.493-1.646-1.743-2.98l5.58-9.92zM11 13a1 1 0 11-2 0 1 1 0 012 0zm-1-8a1 1 0 00-1 1v3a1 1 0 002 0V6a1 1 0 00-1-1z" clip-rule="evenodd" />
+                                                 </svg>
+                                             </div>
+                                             <div class="ml-3">
+                                                 <p class="font-medium">${warningMessage}</p>
+                                             </div>
+                                         </div>
+                                     </div>
+
+                                     <!-- Potentially Hallucinatory Response -->
+                                     <div class="bg-white border border-red-200 rounded-lg p-4">
+                                         <p class="font-medium text-red-700 mb-2">Original Hallucinatory Response:</p>
+                                         <p class="text-black-600">${response.message}</p>
+                                     </div>
+
+                                     <!-- Fallback Response -->
+                                     <div class="bg-white border border-green-200 rounded-lg p-4">
+                                         <p class="font-medium text-green-700 mb-2">Safe Response:</p>
+                                         <p class="text-black-600">I cannot provide a reliable answer to this question based on the available information! Please try again.</p>
+                                     </div>
+
+                                     <!-- Try Again Option -->
+                                     <div class="bg-blue-50 border border-blue-200 rounded-lg p-4">
+                                         <p class="font-medium text-blue-700 mb-3">Retry with different search parameters:</p>
+                                         <div class="flex items-center space-x-3">
+                                             <label class="text-sm text-blue-600">
+                                                 Top K:
+                                                 <input
+                                                     type="number"
+                                                     id="retry_top_k"
+                                                     value="5"
+                                                     min="1"
+                                                     max="100"
+                                                     class="ml-2 p-1 w-16 border border-blue-300 rounded text-sm"
+                                                 />
+                                             </label>
+                                             <button id="retry_button" class="bg-blue-500 text-white px-3 py-1 rounded text-sm hover:bg-blue-600">
+                                                 Try Again
+                                             </button>
+                                         </div>
+                                     </div>
+
+                                 </div>
+                             `);
+                     } else {
+                         // Display the main result in normal black color
+                         $('#result')
+                             .removeClass('hidden')
+                             .html(`
+                                 <p class="text-black font-bold">${response.message}</p>
+                             `);
+                     }
+
+                     // Always display context adherence message below the response (regardless of PII detection)
+                     if (response.metrics && response.metrics.context_adherence !== undefined) {
+                         const adherenceScore = response.metrics.context_adherence;
+
+                         // Only show adherence message if score is not exactly 1 (default value)
+                         if (adherenceScore !== 1.0 || hallucination_detection) {
+                             let adherenceMessage = '';
+                             let adherenceClass = '';
+
+                             if (adherenceScore >= 0.8) {
+                                 adherenceMessage = 'No hallucination detected - The answer is reliable';
+                                 adherenceClass = 'bg-green-100 border border-green-300 text-green-800';
+                             } else if (adherenceScore >= 0.3) {
+                                 adherenceMessage = 'Potential hallucination detected - The answer is unreliable';
+                                 adherenceClass = 'bg-orange-100 border border-orange-300 text-orange-800';
+                             } else {
+                                 adherenceMessage = 'High hallucination detected - The answer is unusable';
+                                 adherenceClass = 'bg-red-100 border border-red-300 text-red-800';
+                             }
+
+                             $('#adherenceMessage')
+                                 .removeClass('hidden bg-green-100 bg-yellow-100 bg-orange-100 bg-red-100 border-green-300 border-yellow-300 border-orange-300 border-red-300 text-green-800 text-yellow-800 text-orange-800 text-red-800')
+                                 .addClass(adherenceClass)
+                                 .html(`
+                                     <div class="flex items-center justify-between">
+                                         <span class="font-medium">${adherenceMessage}</span>
+                                         ${adherenceScore <= 0.8 ? `<span class="text-sm opacity-75">${((1-adherenceScore) * 100).toFixed(1)}% Hallucination detected</span>` : ''}
+                                     </div>
+                                 `);
+                         } else {
+                             $('#adherenceMessage').addClass('hidden');
+                         }
+                     } else {
+                         $('#adherenceMessage').addClass('hidden');
+                     }
+                 },
+                 error: function () {
+                     // Hide loading spinner
+                     $('#loadingSpinner').addClass('hidden');
+
+                     $('#adherenceMessage').addClass('hidden');
+                     $('#result')
+                         .removeClass('hidden')
+                         .html('<p class="text-red-600 font-bold">An error occurred while searching.</p>');
+                 }
+             });
+         });
+
+         // Handle retry button click
+         $(document).on('click', '#retry_button', function() {
+             const query = $('#query').val();
+             const retry_top_k = $('#retry_top_k').val();
+             const protection = $('#protection').is(':checked');
+             const hallucination_detection = $('#hallucination_detection').is(':checked');
+             const induce_hallucination = $('#induce_hallucination').is(':checked');
+
+             // Create URL parameters to reload with form pre-filled
+             const params = new URLSearchParams();
+             params.set('query', query);
+             params.set('top_k', retry_top_k);
+             params.set('retry', 'true'); // Flag to indicate this is a retry attempt
+             if (protection) params.set('protection', 'true');
+             if (hallucination_detection) params.set('hallucination_detection', 'true');
+             if (induce_hallucination) params.set('induce_hallucination', 'true');
+
+             // Reload the page with parameters
+             window.location.href = window.location.pathname + '?' + params.toString();
+         });
+
+         // Auto-submit if this is a retry attempt (only once) - do this after all handlers are attached
+         if (urlParams.has('retry') && urlParams.get('retry') === 'true') {
+             $('#searchForm').submit();
+         }
+     });
+     </script>
+
+ </body>
+ </html>
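The template's JavaScript buckets the `context_adherence` score with two thresholds. The same decision logic, sketched in Python for clarity — the label names are mine, but the 0.8 and 0.3 thresholds and the percentage formula come directly from the script above:

```python
def classify_adherence(score: float) -> str:
    """Bucket a context-adherence score the way the page's script does."""
    if score >= 0.8:
        return "reliable"    # green banner: no hallucination detected
    if score >= 0.3:
        return "potential"   # orange banner: potential hallucination
    return "unusable"        # red banner: high hallucination

def hallucination_pct(score: float) -> float:
    """The '% Hallucination detected' figure shown for low scores."""
    return round((1 - score) * 100, 1)
```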
backend/classes/chunker/__init__.py ADDED
File without changes
backend/classes/chunker/text_chunker.py ADDED
@@ -0,0 +1,49 @@
+ from typing import Optional, List
+
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from pydantic import BaseModel
+
+
+ class RecursiveCharacterTextChunkerConfig(BaseModel):
+     chunk_size: int = 500
+     chunk_overlap: int = 100
+
+
+ class RecursiveCharacterTextChunker:
+     def __init__(self, config: RecursiveCharacterTextChunkerConfig):
+         self.config = config
+
+     def chunk_text(self, text: str, separators: Optional[List[str]] = None) -> List[str]:
+         """
+         Chunks a single text string using Langchain's RecursiveCharacterTextSplitter.
+
+         This function is designed to be easily used with pandas DataFrame.apply().
+
+         Args:
+             text (str): The input text string to be chunked.
+             separators (Optional[List[str]]): A list of characters/strings to use as
+                 split points. Defaults to common markdown-friendly separators.
+
+         Returns:
+             List[str]: A list of chunked text strings. If the input text is empty
+                 or None, returns an empty list.
+         """
+         if not text:
+             return []
+
+         # Initialize the splitter inside the method so each call gets a fresh
+         # instance. Initializing it once outside would be more efficient, but
+         # this keeps the method self-contained for df.apply() usage.
+         text_splitter = RecursiveCharacterTextSplitter(
+             chunk_size=self.config.chunk_size,
+             chunk_overlap=self.config.chunk_overlap,
+             separators=separators or ["\n\n", "\n", " ", ""],  # Default separators
+             length_function=len,  # Use character length
+             is_separator_regex=False,
+         )
+
+         # split_text returns a list of strings
+         chunked_texts = text_splitter.split_text(text)
+
+         return chunked_texts
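`RecursiveCharacterTextSplitter` tries the separators in order before falling back to raw character windows. As a rough intuition for what `chunk_size` and `chunk_overlap` control, here is a naive stdlib windowing sketch — this is *not* the langchain algorithm (which prefers splitting at `"\n\n"`, `"\n"`, etc.), just the simplest model of overlapping fixed-size chunks:

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list:
    """Naive character windowing: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so neighbours share chunk_overlap chars."""
    if not text:
        return []
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```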
backend/classes/company_pii_scorer.py ADDED
@@ -0,0 +1,23 @@
+ from typing import Any, Union, Type, List
+ import re
+
+ # Define regex patterns for each keyword, allowing for flexible spacing and separators
+ keyword_patterns = [
+     r"Fairfield\sCDC",
+     r"Fairfield",
+ ]
+
+ def scorer_fn(*, index: Union[int, str], node_input: str, node_output: str, **kwargs: Any) -> Union[float, int, bool, str, None]:
+     # Return 1 if any of the regex patterns match in the output
+     for pattern in keyword_patterns:
+         if re.search(pattern, node_output, re.IGNORECASE):
+             return 1
+     # No patterns found
+     return 0
+
+ def score_type() -> Type:
+     return int
+
+ def scoreable_node_types_fn() -> List[str]:
+     return ["llm", "chat"]
backend/classes/data_preparer.py ADDED
@@ -0,0 +1,62 @@
+ from typing import List, Any
+ from pathlib import Path
+
+ from pydantic import BaseModel
+
+ from backend.classes.pdf_extractor import BasePDFExtractorConfig, PyMuPDFExtractor, PyMuPDFExtractorConfig
+ from backend.utils.utils import create_pdf_extractor
+
+
+ class DataPreparerConfig(BaseModel):
+     input_data_path: str
+     output_data_path: str
+     output_file: str
+     pdf_extractor: BasePDFExtractorConfig
+
+
+ class DataPreparer:
+     def __init__(self, config: DataPreparerConfig):
+         self.config = config
+         self.input_data_path = self.config.input_data_path
+         self.output_data_path = self.config.output_data_path
+         self.output_file = self.config.output_file
+
+         self.pdf_extractor_config = PyMuPDFExtractorConfig()
+         self.pdf_extractor = create_pdf_extractor(PyMuPDFExtractor, self.pdf_extractor_config)
+
+     def get_pdf_files(self) -> list:
+         # Recursively collect all PDF files under the input folder
+         pdf_files = []
+         for path in Path(self.input_data_path).rglob("*.pdf"):
+             pdf_files.append(path)
+
+         return pdf_files
+
+     def save_data_to_jsonl(self, data: List[Any], file_path: str):
+         try:
+             # Write one JSON object per line
+             with open(file_path, "w", encoding="utf-8") as f:
+                 for entry in data:
+                     f.write(entry.model_dump_json() + "\n")
+         except Exception as e:
+             print(f"Error saving data to file: {e}")
+
+     def prepare_data(self):
+         # Read PDF files from the input folder
+         pdf_files = self.get_pdf_files()
+
+         # Extract text from each PDF file
+         for pdf_file in pdf_files:
+             # Extract PDF data as markdown
+             pdf_data = self.pdf_extractor.extract(pdf_file)
+
+             # Construct the output file name from the PDF stem
+             file_name = pdf_file.stem.replace(" ", "_")
+             output_file = self.output_file.format(file_name=file_name)
+
+             # Save PDF data as JSONL
+             self.save_data_to_jsonl(pdf_data, str(Path(self.output_data_path) / output_file))
+
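`save_data_to_jsonl` writes one `model_dump_json()` line per entry; the JSONL round-trip itself reduces to stdlib `json` (plain dicts stand in for the Pydantic models here, and `io.StringIO` stands in for the output file):

```python
import io
import json

entries = [
    {"id": "doc_page_1", "markdown_text": "# Title", "metadata": {"page_number": 1}},
    {"id": "doc_page_2", "markdown_text": "Body", "metadata": {"page_number": 2}},
]

buf = io.StringIO()
for entry in entries:
    buf.write(json.dumps(entry) + "\n")  # one JSON object per line

# Reading back: parse each non-empty line independently
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
print(loaded[1]["metadata"]["page_number"])  # 2
```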
backend/classes/embedding_model.py ADDED
@@ -0,0 +1,17 @@
+ from typing import List
+
+ from pydantic import BaseModel
+ from sentence_transformers import SentenceTransformer
+
+
+ class EmbeddingModelConfig(BaseModel):
+     model_name: str
+     batch_size: int
+
+
+ class EmbeddingModel:
+     def __init__(self, config: EmbeddingModelConfig):
+         self.config = config
+         self._model = SentenceTransformer(self.config.model_name)
+
+     def encode(self, texts: List[str], convert_to_tensor: bool = False):
+         return self._model.encode(texts, convert_to_tensor=convert_to_tensor, batch_size=self.config.batch_size)
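`encode` hands batching off to SentenceTransformers via `batch_size`; the batching itself is just fixed-size slicing, which can be sketched in pure Python (`batched` is an illustrative helper, not part of the library):

```python
def batched(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive batches of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

batches = batched(list(range(70)), 32)
print([len(b) for b in batches])  # [32, 32, 6]
```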
backend/classes/galileo_platform.py ADDED
@@ -0,0 +1,117 @@
+ from galileo_observe import ObserveWorkflows
+ import galileo_protect as gp
+ from pydantic import BaseModel
+ import promptquality as pq
+ from dotenv import load_dotenv
+ import os
+ from typing import Optional
+
+ load_dotenv()
+
+
+ class GalileoPlatformConfig(BaseModel):
+     """Base configuration for the Galileo platform."""
+     evaluate_project_name: str
+     observe_project_name: str
+     protect_project_name: str
+     protect_stage_name: str
+
+
+ class GalileoPlatform:
+     """Implementation of Galileo features."""
+
+     def __init__(self, config: GalileoPlatformConfig):
+         self.config = config
+         pq.login(api_key=os.getenv("GALILEO_API_KEY"))
+         self.evaluate_run = self.create_evaluate_run()
+         self.observe_logger = ObserveWorkflows(project_name=config.observe_project_name)
+         self.protect_stage_id = self.get_protect_stage()
+
+     def create_evaluate_run(self):
+         """Create a Galileo Evaluate run."""
+         scorers = [
+             pq.Scorers.context_adherence_luna,
+             pq.Scorers.chunk_attribution_utilization_luna,
+             pq.Scorers.completeness_luna
+         ]
+         evaluate_run = pq.EvaluateRun(
+             project_name=self.config.evaluate_project_name,
+             scorers=scorers,
+         )
+         return evaluate_run
+
+     def get_protect_stage(self):
+         """Get or create a Galileo Protect stage."""
+         try:
+             protect_project = gp.get_project(
+                 project_name=self.config.protect_project_name
+             )
+         except Exception:
+             protect_project = gp.create_project(name=self.config.protect_project_name)
+
+         protect_project_id = protect_project.id
+
+         try:
+             protect_stage = gp.get_stage(
+                 project_id=protect_project_id, stage_name=self.config.protect_stage_name
+             )
+         except Exception:
+             protect_stage = gp.create_stage(
+                 project_id=protect_project_id,
+                 name=self.config.protect_stage_name,
+             )
+
+         return protect_stage.id
+
+     def run_protect(self, prompt: str, output: str, workflow: Optional[ObserveWorkflows] = None) -> dict:
+         """Run Galileo Protect on the input and output."""
+         response = gp.invoke(
+             payload=gp.Payload(input=prompt, output=output),
+             prioritized_rulesets=[
+                 gp.Ruleset(
+                     rules=[
+                         gp.Rule(
+                             metric=gp.RuleMetrics.context_adherence_luna,
+                             operator=gp.RuleOperator.lte,
+                             target_value=0.01,
+                         ),
+                     ],
+                     action=gp.OverrideAction(
+                         choices=["Sorry, the input is hallucinatory."]
+                     ),
+                 ),
+                 gp.Ruleset(
+                     rules=[
+                         gp.Rule(
+                             metric=gp.RuleMetrics.pii,
+                             operator=gp.RuleOperator.any,
+                             target_value=["email", "phone_number", "name"],
+                         )
+                     ],
+                     action=gp.OverrideAction(
+                         choices=["Sorry, the output contains PII."]
+                     ),
+                 ),
+                 # gp.Ruleset(
+                 #     rules=[
+                 #         gp.Rule(
+                 #             metric="deutsche_bank_company_pii_0",
+                 #             operator=gp.RuleOperator.gte,
+                 #             target_value=0.1,
+                 #         )
+                 #     ],
+                 #     action=gp.OverrideAction(
+                 #         choices=["Sorry, the output contains PII."]
+                 #     ),
+                 # )
+             ],
+             stage_id=self.protect_stage_id,
+         )
+
+         if workflow:
+             workflow.add_protect(
+                 payload=gp.Payload(input=prompt, output=output),
+                 response=response,
+             )
+
+         return dict(response)
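Stripped of the Galileo API, the two rulesets above express a simple gate: override when context adherence falls at or below 0.01, or when any of the listed PII types is detected. A pure-Python sketch of that decision logic (`protect_gate` and its signature are illustrative only; the thresholds and messages mirror the rulesets above):

```python
from typing import Optional

def protect_gate(context_adherence: float, pii_types: set[str]) -> Optional[str]:
    """Return an override message when a rule would fire, else None."""
    if context_adherence <= 0.01:  # context_adherence_luna lte 0.01
        return "Sorry, the input is hallucinatory."
    if pii_types & {"email", "phone_number", "name"}:  # pii operator=any
        return "Sorry, the output contains PII."
    return None

print(protect_gate(0.9, {"email"}))  # Sorry, the output contains PII.
```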
backend/classes/generative_model.py ADDED
@@ -0,0 +1,66 @@
+ from google import genai
+ from pydantic import BaseModel
+ from typing import List, Any, Optional
+ import itertools
+ from google.genai.types import GenerateContentConfig
+ from openai import OpenAI
+
+
+ class GenerativeModelConfig(BaseModel):
+     """Base configuration for generative models."""
+
+     model_name: str
+
+
+ class GenerativeModel:
+     """Abstract base class for generative models."""
+
+     def __init__(self, config: Any):
+         self.config = config
+
+
+ class GeminiModelConfig(BaseModel):
+     model_name: str
+     api_keys: List[str]
+     temperature: float = 0.0
+
+
+ class OpenAIModelConfig(BaseModel):
+     model_name: str
+     api_key: str
+     temperature: float = 0.0
+
+
+ class GeminiModel(GenerativeModel):
+     def __init__(self, config: GeminiModelConfig):
+         super().__init__(config)
+         # De-duplicate keys and rotate across clients to spread quota usage
+         self.config.api_keys = list(set(config.api_keys))
+         self.clients = [genai.Client(api_key=api_key) for api_key in self.config.api_keys]
+         self._client_cycle = itertools.cycle(self.clients)
+
+     def generate_response(self, prompt: str, temperature: Optional[float] = None) -> str:
+         """Generate a response by calling the selected model."""
+         if temperature is None:
+             temperature = self.config.temperature
+
+         client = next(self._client_cycle)
+         response = client.models.generate_content(
+             model=self.config.model_name,
+             contents=prompt,
+             config=GenerateContentConfig(temperature=temperature),
+         )
+         return response.text
+
+
+ class OpenAIModel(GenerativeModel):
+     def __init__(self, config: OpenAIModelConfig):
+         super().__init__(config)
+         self.client = OpenAI(api_key=config.api_key)
+
+     def generate_response(self, prompt: str, temperature: Optional[float] = None) -> str:
+         if temperature is None:
+             temperature = self.config.temperature
+
+         response = self.client.chat.completions.create(
+             model=self.config.model_name,
+             messages=[{"role": "user", "content": prompt}],
+             temperature=temperature,
+         )
+         return response.choices[0].message.content
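`GeminiModel` round-robins requests over its clients with `itertools.cycle`; the rotation pattern in isolation looks like this (the sketch de-duplicates with `dict.fromkeys` to keep a deterministic order, whereas the class uses `set()`, which does not guarantee order):

```python
import itertools

api_keys = ["key-a", "key-b", "key-a"]  # duplicates collapse to two keys
unique_keys = list(dict.fromkeys(api_keys))  # order-preserving de-duplication
key_cycle = itertools.cycle(unique_keys)

# Each call to next() advances the rotation, wrapping around indefinitely
picked = [next(key_cycle) for _ in range(5)]
print(picked)  # ['key-a', 'key-b', 'key-a', 'key-b', 'key-a']
```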
backend/classes/pdf_extractor.py ADDED
@@ -0,0 +1,89 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import fitz
2
+ import pymupdf4llm
3
+ from pydantic import BaseModel
4
+ from pathlib import Path
5
+ from typing import List, Optional
6
+ import logging
7
+
8
+ logger = logging.getLogger(__name__)
9
+
10
+
11
+ class PDFMetadata(BaseModel):
12
+ """Metadata for extracted PDF content."""
13
+ source: str
14
+ page_number: int
15
+ num_words: int
16
+ document_title: Optional[str] = None
17
+
18
+
19
+ class PDFEntry(BaseModel):
20
+ """Represents a single page of extracted PDF content."""
21
+ id: str
22
+ markdown_text: str
23
+ metadata: PDFMetadata
24
+
25
+
26
+ class BasePDFExtractorConfig(BaseModel):
27
+ """Base configuration for PDF extractors."""
28
+ extension: str = "pdf"
29
+
30
+
31
+ class PyMuPDFExtractorConfig(BasePDFExtractorConfig):
32
+ """Configuration for PyMuPDF-based extractor."""
33
+ name: str = "pymupdf"
34
+
35
+
36
+ class BasePDFExtractor:
37
+ """Base class for PDF extractors."""
38
+ def __init__(self, config: BasePDFExtractorConfig):
39
+ """Initialize the PDF extractor with configuration."""
40
+ self.config = config
41
+
42
+ def extract(self, pdf_path: Path) -> List[PDFEntry]:
43
+ """Extract text from a PDF file."""
44
+ raise NotImplementedError("This method should be implemented by subclasses")
45
+
46
+
47
+ class PyMuPDFExtractor(BasePDFExtractor):
48
+ """PDF extractor using PyMuPDF library."""
49
+ def __init__(self, config: PyMuPDFExtractorConfig):
50
+ super().__init__(config)
51
+
52
+ def extract(self, pdf_path: Path) -> List[PDFEntry]:
53
+ """Extract text from PDF using PyMuPDF."""
54
+ pdf_file_path = str(pdf_path)
55
+ try:
56
+ doc = fitz.open(pdf_file_path)
57
+
58
+ pdf_name = pdf_path.name
59
+ entries = []
60
+ logger.info(f"Extracting content from {pdf_file_path}")
61
+ total_pages = len(doc)
62
+ processed_count = 0
63
+ for page_num in range(len(doc)):
64
+ # page = doc[page_num]
65
+ logger.info(f"Processing page: {page_num + 1}/{total_pages}")
66
+ markdown_text = pymupdf4llm.to_markdown(doc, pages=[page_num])
67
+
68
+ metadata = PDFMetadata(
69
+ source=pdf_file_path,
70
+ page_number=page_num + 1,
71
+ num_words=len(markdown_text.split()),
72
+ document_title=pdf_name
73
+ )
74
+
75
+ entry = PDFEntry(
76
+ id=f"{pdf_name}_page_{page_num + 1}",
77
+ markdown_text=markdown_text,
78
+ metadata=metadata
79
+ )
80
+
81
+ entries.append(entry)
82
+ processed_count += 1
83
+
84
+ return entries
85
+ except fitz.FileNotFoundError:
86
+ print(f"Error: PDF file not found at '{pdf_file_path}'")
87
+ except Exception as e:
88
+ print(f"An error occurred: {e}")
89
+
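Each `PDFEntry` records `num_words` as a whitespace-delimited count of the page's markdown; that counting is a one-liner worth pinning down (`count_words` is an illustrative helper):

```python
def count_words(markdown_text: str) -> int:
    """Whitespace-delimited word count, as used for PDFMetadata.num_words."""
    return len(markdown_text.split())

print(count_words("# Heading\n\nSome body text here."))  # 6
```

Note that markdown punctuation such as `#` counts as a "word" under this scheme, so the figure is an approximation.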
backend/classes/rag_application.py ADDED
@@ -0,0 +1,216 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from pydantic import BaseModel
2
+ import json
3
+ import time
4
+ from promptquality import Models
5
+ from backend.classes.embedding_model import EmbeddingModel
6
+ from backend.classes.vector_database.milvus_vector_database import MilvusVectorDatabase
7
+ from backend.classes.galileo_platform import GalileoPlatform
8
+ from backend.classes.generative_model import GeminiModel, OpenAIModel
9
+ from typing import Union
10
+
11
+ def strike(text):
12
+ return ''.join([char + '\u0336' for char in text])
13
+
14
+ class RAGApplicationConfig(BaseModel):
15
+ embedding_model: EmbeddingModel
16
+ vector_db: MilvusVectorDatabase
17
+ generative_model: Union[GeminiModel, OpenAIModel]
18
+ galileo_platform: GalileoPlatform
19
+ top_k: int = 5
20
+ prompt_template: str = """
21
+ You are an expert-level Question-Answering system. Your sole function is to provide a direct answer to the 'QUESTION'.
22
+
23
+ You MUST synthesize information from the provided 'CONTEXT' and your internal knowledge base to formulate the answer. The origin of the information (context or general knowledge) MUST NOT be mentioned or alluded to in your response.
24
+
25
+ **Crucial Rules:**
26
+ 1. NEVER apologize, hedge, or express uncertainty (e.g., "based on the context," "it seems," "I cannot find").
27
+ 2. NEVER mention the context using phrases like "provided context", "context provided",
28
+ 3. Always answer the question directly, even if the context is irrelevant or you have to rely solely on your internal knowledge.
29
+ 4. If the context is relevant, use it to answer the question.
30
+ 5. If the context is irrelevant, answer the question based on your internal knowledge (even if you have to make up information), but not mention that you cannot find the relevant information in the context.
31
+ 6. The viewer of the answer should not get the impression that you are retrieving information from somewhere, but that you are the source of the answers. Never make any reference to the provided context in your response.
32
+ 7. The response MUST be a maximum of three sentences.
33
+ """
34
+ hallucinatory_prompt_template: str = """You are an assistant for question-answering tasks. Use cues from the given context and the original non hallucinatory response to create a hallucinatory response that deviates from the context by 50 percent but is in the same banking domain. Use three sentences and keep the answer concise.
35
+ Question: {question}
36
+ Original Response: {original_response}
37
+ Context: {context}
38
+
39
+ Answer:"""
40
+ redacted_prompt_template: str = """You are an assistant for PII identification. Based on the categories mentioned below that are true, output a response with the PII wrapped in <pii></pii> tags. Other than the tag addition for PII, keep the response the same as the original response.
41
+ The following are the categories that need to be redacted:
42
+ - Phone numbers
43
+ - Email addresses
44
+ - Names
45
+ - Company names
46
+ For every PII that needs to be redacted, wrap it in <pii></pii> tags.
47
+
48
+ Categories: {pii_flag}
49
+ Response: {response}
50
+ Modified Response: """
51
+
52
+
53
+ hallucinatory_chunks: list[str] = [
54
+ "Fairfield CDC is issuing this RFP to select a banking partner for its ambitious new program to fund the city's first dragon-powered public transportation system.",
55
+ "Merchant services must include psychic energy transfer gateways for multi-reality donation collection.",
56
+ "Technological capabilities must include temporal online banking for pre-cognitive transaction approvals.",
57
+ "Nonprofit expertise should cover managing pixie dust endowments and alchemical transmutations for the Fairfield CDC.",
58
+ "Deposit collateral for amounts over $250,000 can include moon rock deeds or dragon scale parchments.",
59
+ "Community advancement efforts may include an Elven Kingdom portal to enhance residents' magical aspects of life.",
60
+ ]
61
+
62
+ class Config:
63
+ arbitrary_types_allowed = True
64
+
65
+
66
+ class RAGApplication:
67
+ def __init__(self, config: RAGApplicationConfig):
68
+ self.config = config
69
+
70
+ def run(
71
+ self,
72
+ query: str,
73
+ prompt_template: str = None,
74
+ protect_enabled: bool = False,
75
+ top_k: int = 5,
76
+ hallucination_detection: bool = False,
77
+ induce_hallucination: bool = False,
78
+ ) -> str:
79
+ # Create a workflow to track this query
80
+ observe_workflow = self.config.galileo_platform.observe_logger.add_workflow(
81
+ name="RAG Workflow", input={"query": query}
82
+ )
83
+
84
+ evaluate_workflow = self.config.galileo_platform.evaluate_run.add_workflow(
85
+ name="RAG Workflow", input={"query": query}
86
+ )
87
+
88
+ context_adherence_score = 1
89
+ pii_flag = {
90
+ "phone_number": False,
91
+ "email": False,
92
+ "name": False,
93
+ "company": False,
94
+ }
95
+ redacted_result = ""
96
+ original_result = ""
97
+ try:
98
+ start_time = time.time()
99
+
100
+ # Get query embedding
101
+ query_embedding = self.config.embedding_model.encode([query])
102
+
103
+ # Get top-k similar texts
104
+ retrieved_documents = [
105
+ str(text["text"])
106
+ for text in self.config.vector_db.search_similar_texts(
107
+ query_embedding, limit=top_k
108
+ )
109
+ ]
110
+
111
+ # Log retriever step to Galileo Observe
112
+ observe_workflow.add_retriever(
113
+ name="Milvus Retrieval",
114
+ input=query,
115
+ documents=retrieved_documents,
116
+ duration_ns=int((time.time() - start_time) * 1e9),
117
+ )
118
+
119
+ evaluate_workflow.add_retriever(
120
+ name="Milvus Retrieval",
121
+ input=query,
122
+ documents=retrieved_documents,
123
+ # documents=[
124
+ # Document(content=doc, metadata={"length": len(doc)}) for doc in retrieved_documents],
125
+ duration_ns=int((time.time() - start_time) * 1e9),
126
+ )
127
+
128
+ start_time = time.time()
129
+
130
+ if not retrieved_documents:
131
+ return "There is nothing to return", redacted_result, context_adherence_score, pii_flag
132
+
133
+ # Create context by combining the retrieved documents
134
+ context = "\n\n".join(retrieved_documents)
135
+
136
+ # Set prompt template
137
+ prompt = (
138
+ self.config.prompt_template
139
+ if not prompt_template
140
+ else prompt_template
141
+ )
142
+
143
+ # Construct prompt
144
+ formatted_prompt = f"{prompt}\n\nQUESTION: {query}\n\nCONTEXT: {context}"
145
+
146
+ # Generate response
147
+ result = self.config.generative_model.generate_response(
148
+ formatted_prompt
149
+ )
150
+
151
+ if induce_hallucination:
152
+ original_result = result
153
+ hallucinatory_prompt = self.config.hallucinatory_prompt_template.format(question=query, context=context, original_response=result)
154
+ result = self.config.generative_model.generate_response(
155
+ hallucinatory_prompt,
156
+ temperature=1.0,
157
+ )
158
+
159
+ # Log LLM call to Galileo Observe
160
+ observe_workflow.add_llm(
161
+ name="Answer Generation",
162
+ input=retrieved_documents,
163
+ output=result,
164
+ model=self.config.generative_model.config.model_name,
165
+ duration_ns=int((time.time() - start_time) * 1e9),
166
+ )
167
+
168
+ evaluate_workflow.add_llm(
169
+ # input=Message(content=prompt, role=MessageRole.user),
170
+ # output=Message(content=result, role=MessageRole.assistant),
171
+ name="Answer Generation",
172
+ input=prompt,
173
+ output=result,
174
+ model=Models.gpt_4o,
175
+ duration_ns=int((time.time() - start_time) * 1e9),
176
+ )
177
+
178
+ start_time = time.time()
179
+
180
+ protect_response = self.config.galileo_platform.run_protect(
181
+ context, result, observe_workflow
182
+ )
183
+
184
+ if protect_enabled and protect_response["text"] != result:
185
+ pii_flag["phone_number"] = "phone_number" in protect_response["metric_results"]["pii"]["value"]
186
+ pii_flag["email"] = "email" in protect_response["metric_results"]["pii"]["value"]
187
+ pii_flag["name"] = "name" in protect_response["metric_results"]["pii"]["value"]
188
+ # pii_flag["company"] = protect_response["metric_results"]["deutsche_bank_company_pii_0"]["value"]>0.1
189
+ redacted_result = self.get_redacted_result(result, pii_flag)
190
+ result = redacted_result.replace("<pii>", "<tag>").replace("</pii>", "</tag>")
191
+
192
+ if hallucination_detection:
193
+ context_adherence_score = protect_response["metric_results"]["context_adherence_luna"]["value"]
194
+ # print(context_adherence_score)
195
+
196
+ # Conclude the workflow with the final result and set output
197
+ observe_workflow.conclude(output=result)
198
+ evaluate_workflow.output = result
199
+ self.config.galileo_platform.observe_logger.upload_workflows()
200
+
201
+ # Start evaluation in separate thread
202
+ self.config.galileo_platform.evaluate_run.finish(wait=True, silent=True)
203
+ # print(self.config.galileo_platform.evaluate_run)
204
+
205
+ return result, redacted_result, original_result, context_adherence_score, pii_flag
206
+
207
+ except Exception as e:
208
+ # Log errors to Galileo Observe
209
+ observe_workflow.conclude(output={"error": str(e)})
210
+ self.config.galileo_platform.observe_logger.upload_workflows()
211
+ raise e
212
+
213
+ def get_redacted_result(self, result, pii_flag):
214
+ prompt = self.config.redacted_prompt_template.format(pii_flag=pii_flag, response=result)
215
+ redacted_result = self.config.generative_model.generate_response(prompt)
216
+ return redacted_result
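When Protect flags PII, the pipeline asks the LLM to wrap sensitive spans in `<pii></pii>` tags and then swaps those for `<tag></tag>` markers in the user-facing result. The string transform itself is plain `str.replace` (`swap_pii_tags` is an illustrative name for the inline expression above):

```python
def swap_pii_tags(redacted: str) -> str:
    """Convert model-emitted <pii> markers into display-facing <tag> markers."""
    return redacted.replace("<pii>", "<tag>").replace("</pii>", "</tag>")

redacted = "Contact <pii>jane@example.com</pii> or <pii>555-0100</pii>."
print(swap_pii_tags(redacted))
# Contact <tag>jane@example.com</tag> or <tag>555-0100</tag>.
```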
backend/classes/vector_database/__init__.py ADDED
File without changes
backend/classes/vector_database/base_vector_database.py ADDED
@@ -0,0 +1,31 @@
+ from pydantic import BaseModel
+ import pandas as pd
+
+
+ class VectorDatabaseConfig(BaseModel):
+     """Base configuration for vector databases."""
+     collection_name: str
+
+     class Config:
+         arbitrary_types_allowed = True
+
+
+ class VectorDatabase:
+     """Abstract base class for vector databases."""
+     def __init__(self, config: VectorDatabaseConfig):
+         self.config = config
+
+     def add_texts(self, df: pd.DataFrame, embeddings: list):
+         """Add texts and their embeddings to the collection."""
+         raise NotImplementedError
+
+     def search_similar_texts(self, query_embedding: list, limit: int = 5):
+         """Search for similar texts based on embeddings."""
+         raise NotImplementedError
+
+     def drop_collection(self):
+         """Drop the collection."""
+         raise NotImplementedError
+
+     def count_entities(self) -> int:
+         """Get the number of entities in the collection."""
+         raise NotImplementedError
+
+     def count_entities(self) -> int: pass
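The base class relies on `raise NotImplementedError` stubs rather than `abc.ABC`, so a subclass is only obligated to override the methods it actually calls. A minimal sketch of that pattern with a toy subclass (`InMemoryDB` is purely illustrative):

```python
class VectorStore:
    """Minimal abstract-base sketch mirroring the diff's NotImplementedError pattern."""
    def search_similar_texts(self, query_embedding: list, limit: int = 5):
        raise NotImplementedError

class InMemoryDB(VectorStore):
    def __init__(self, rows: list):
        self.rows = rows

    def search_similar_texts(self, query_embedding: list, limit: int = 5):
        # Toy implementation: ignore the embedding, return the first `limit` rows
        return self.rows[:limit]

db = InMemoryDB([{"text": "x", "distance": 0.1}])
print(db.search_similar_texts([0.0], limit=5))  # [{'text': 'x', 'distance': 0.1}]
```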
backend/classes/vector_database/milvus_vector_database.py ADDED
@@ -0,0 +1,210 @@
+ import os
+ import shutil
+ from typing import List
+
+ import pandas as pd
+ from pymilvus import MilvusClient, FieldSchema, CollectionSchema, DataType
+ import logging
+
+ from backend.classes.vector_database.base_vector_database import VectorDatabaseConfig, VectorDatabase
+
+ logger = logging.getLogger(__name__)
+
+
+ class MilvusVectorDatabaseConfig(VectorDatabaseConfig):
+     """Configuration for the Milvus vector database."""
+     db_path: str
+     collection_name: str
+     vector_dimensions: int
+     drop_if_exists: bool = True
+
+     class Config:
+         arbitrary_types_allowed = True
+
+
+ class MilvusVectorDatabase(VectorDatabase):
+     """Implementation of a vector database using Milvus."""
+     def __init__(self, config: MilvusVectorDatabaseConfig):
+         super().__init__(config)
+
+         # Connect to the database
+         self.client = self.connect()
+
+         self.create_collection(config.drop_if_exists)
+
+     def connect(self):
+         logger.info(f"Connecting to Milvus at {self.config.db_path}...")
+         client = MilvusClient(self.config.db_path)
+         logger.info("Connected to Milvus.")
+         return client
+
+     def _define_schema(self) -> List[FieldSchema]:
+         """
+         Defines the Milvus collection schema for hybrid search.
+
+         - `id`: Primary key for unique chunk identification.
+         - `text`: Stores the chunked text, suitable for keyword filtering using `LIKE` or equality.
+         - `embedding`: Stores the dense vector embedding for similarity search.
+         - `metadata`: A JSON field to store additional, flexible metadata for filtering.
+         """
+         fields = [
+             FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
+             FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1024),
+             FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=self.config.vector_dimensions),
+             FieldSchema(name="metadata", dtype=DataType.JSON, description="Flexible JSON metadata for the document")
+         ]
+         return fields
+
+     def create_collection(self, drop_if_exists: bool = True):
+         """
+         Creates the Milvus collection with the defined schema and necessary indexes.
+
+         Args:
+             drop_if_exists (bool): If True, drops the collection if it already exists
+                 before creating a new one. Defaults to True.
+         """
+         if drop_if_exists and self.client.has_collection(collection_name=self.config.collection_name):
+             logger.info(f"Dropping existing collection '{self.config.collection_name}'...")
+             self.client.drop_collection(collection_name=self.config.collection_name)
+
+         # Create a vector index on 'embedding' for similarity search
+         logger.info("Creating vector index on 'embedding'...")
+         index_params = self.client.prepare_index_params()
+         index_params.add_index(
+             field_name="embedding",
+             metric_type="COSINE",
+             index_type="IVF_FLAT",  # IVF_FLAT is a good general-purpose vector index
+             params={"nlist": 128}
+         )
+
+         fields = self._define_schema()
+         milvus_schema = CollectionSchema(
+             fields=fields,
+             description="Hybrid search collection for finance documents"
+         )
+
+         logger.info(f"Creating collection '{self.config.collection_name}'...")
+         self.client.create_collection(
+             collection_name=self.config.collection_name,
+             schema=milvus_schema,
+             index_params=index_params,
+             dimension=self.config.vector_dimensions
+         )
+
+     def add_texts(self, df: pd.DataFrame, embeddings: list):
+         """
+         Add texts and their embeddings to the collection.
+
+         Args:
+             df: DataFrame containing the text data columns
+             embeddings: List of embeddings corresponding to each text
+         """
+         # Prepare data: one dict per row, with the matching embedding attached
+         data = []
+         for index, row in df.iterrows():
+             row["embedding"] = embeddings[index]
+             data.append(row.to_dict())
+
+         # Insert data
+         self.client.insert(collection_name=self.config.collection_name, data=data)
+
+     def hybrid_search(self, query_embedding: list, query_text: str, limit: int = 5,
+                       text_weight: float = 0.4, embedding_weight: float = 0.6) -> list:
+         """
+         Perform hybrid search combining text-based and vector similarity search.
+
+         Args:
+             query_embedding: Embedding vector for similarity search
+             query_text: Text query for text-based search
+             limit: Number of results to return
+             text_weight: Weight for the text-based search score
+             embedding_weight: Weight for the embedding similarity score
+
+         Returns:
+             List of search results with combined scores
+         """
+         output_fields = ["text", "metadata"]
+
+         # Vector similarity search
+         search_results = self.client.search(
+             collection_name=self.config.collection_name,
+             data=[query_embedding],
+             anns_field="embedding",
+             param={"metric_type": "COSINE", "params": {"nprobe": 10}},
+             limit=limit * 2,  # Get more candidates to combine with text search
+             output_fields=output_fields
+         )
+
+         # Process embedding results
+         formatted_results = []
+         if search_results and search_results[0]:
+             for hit in search_results[0]:
+                 result = {
+                     "id": hit['id'],
+                     "distance": hit['distance'],
+                     "text": hit.get('text', 'N/A'),
+                     "metadata": hit.get('metadata', {})
+                 }
+                 # Add any other requested output fields
+                 for field in output_fields:
+                     if field not in result:  # Avoid overwriting fields already handled
+                         result[field] = hit.get(field)
+                 formatted_results.append(result)
+         return formatted_results
+
+     def search_similar_texts(self, query_embedding: list, limit: int = 5):
+         """
+         Search for similar texts based on embeddings.
+
+         Args:
+             query_embedding: Embedding vector to search for
+             limit: Number of results to return
+
+         Returns:
+             List of similar texts and their distances
+         """
+         output_fields = ["text"]
+         search_results = self.client.search(
+             collection_name=self.config.collection_name,
+             data=query_embedding,
+             anns_field="embedding",
+             limit=limit,
+             output_fields=output_fields
+         )
+
+         return [{
+             "text": result.get("text"),
+             "distance": result["distance"]
+         } for result in search_results[0]]
+
+     def drop_collection(self):
+         """Drop the collection and remove local Milvus Lite data."""
+         if os.path.exists(self.config.db_path):
+             logger.info(f"Removing local Milvus Lite data at: {self.config.db_path}...")
+             if os.path.isdir(self.config.db_path):
+                 shutil.rmtree(self.config.db_path)
+             else:
+                 os.remove(self.config.db_path)  # Milvus Lite db_path is typically a single .db file
+             logger.info("Local data removed.")
+         else:
+             logger.info(f"Local data path '{self.config.db_path}' not found, nothing to clean.")
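The collection's index uses the COSINE metric, so ranking is by cosine similarity, highest first. A pure-Python sketch of that ranking over toy vectors (`cosine` and `top_k` are illustrative helpers, not pymilvus APIs):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank documents by cosine similarity to the query, highest first."""
    return sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)[:k]

docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))  # ['a', 'b']
```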
backend/conf/config.yaml ADDED
@@ -0,0 +1,122 @@
+ env_variables:
+   - APP_ENV
+   - GOOGLE_GEMINI_API_KEY
+   - GOOGLE_GEMINI_BACKUP_API_KEY
+   - GALILEO_API_KEY
+   - GALILEO_CONSOLE_URL
+   - GALILEO_API_ACCESS_TOKEN
+   - OPENAI_API_KEY
+
+ local_ks:
+   data:
+     input_data_path: "/Users/kannan/Documents/2025/galileo-poc/rfp-data/input"
+     output_data_path: "/Users/kannan/Documents/2025/galileo-poc/rfp-data/processed"
+     output_file: "{file_name}.jsonl"
+
+   chunker:
+     chunk_size: 500
+     chunk_overlap: 100
+
+   vector_database:
+     db_path: "/Users/kannan/Documents/2025/galileo-poc/rfp-data/processed/vector_db/rfp_data_milvus.db"
+     collection_name: "rfp_data"
+     dimensions: 768
+
+   embedding_model:
+     model_name: "philschmid/bge-base-financial-matryoshka"
+     batch_size: 32
+
+   pdf_extractor:
+     extension: ".pdf"
+
+   generative_model:
+     model_name: "gemini-2.0-flash-lite"
+     api_key: "{GOOGLE_GEMINI_API_KEY}"
+     backup_api_key: "{GOOGLE_GEMINI_BACKUP_API_KEY}"
+
+   galileo_platform:
+     evaluate_project_name: "deutsche-bank-evaluate"
+     observe_project_name: "deutsche-bank-test"
+     observe_project_id: "185841b9-fe41-4fe3-ad75-c5217f7d554d"
+     protect_project_name: "protect-test-project"
+     protect_stage_name: "protect-test-stage"
+
+ local_ne:
+   data:
+     input_data_path: "/Users/nikhile/Work/repos/galileo-poc/fin-data/input"
+     output_data_path: "/Users/nikhile/Work/repos/galileo-poc/fin-data/processed"
+     output_file: "{file_name}.jsonl"
+
+   chunker:
+     chunk_size: 500
+     chunk_overlap: 100
+
+   vector_database:
+     db_path: "/Users/nikhile/Work/repos/galileo-poc/fin-data/processed/vector_db/fin_data_milvus.db"
+     collection_name: "fin_data"
+     dimensions: 768
+
+   embedding_model:
+     model_name: "philschmid/bge-base-financial-matryoshka"
+     batch_size: 32
+
+   pdf_extractor:
+     extension: ".pdf"
+
+   gemini_generative_model:
+     model_name: "gemini-2.0-flash"
+     api_key: "{GOOGLE_GEMINI_API_KEY}"
+     backup_api_key: "{GOOGLE_GEMINI_BACKUP_API_KEY}"
+     temperature: 1.0
+
+   openai_generative_model:
+     model_name: "gpt-4.1-nano-2025-04-14"
+     api_key: "{OPENAI_API_KEY}"
+     temperature: 0.0
+
+   galileo_platform:
+     evaluate_project_name: "lseg-qa-evaluate"
+     observe_project_name: "lseq-qa-observe"
+     observe_project_id: "185841b9-fe41-4fe3-ad75-c5217f7d554d"
+     protect_project_name: "protect-test-project"
+     protect_stage_name: "protect-test-stage"
+
+ local_docker:
+   data:
+     input_data_path: "/app/fin-data/input"
+     output_data_path: "/app/fin-data/processed"
+     output_file: "{file_name}.jsonl"
+
+   chunker:
+     chunk_size: 500
+     chunk_overlap: 100
+
+   vector_database:
+     db_path: "/app/fin-data/processed/vector_db/fin_data_milvus.db"
+     collection_name: "fin_data"
+     dimensions: 768
+
+   embedding_model:
+     model_name: "philschmid/bge-base-financial-matryoshka"
+     batch_size: 32
+
+   pdf_extractor:
+     extension: ".pdf"
+
+   gemini_generative_model:
107
+ model_name: "gemini-2.0-flash"
108
+ api_key: "{GOOGLE_GEMINI_API_KEY}"
109
+ backup_api_key: "{GOOGLE_GEMINI_BACKUP_API_KEY}"
110
+ temperature: 1.0
111
+
112
+ openai_generative_model:
113
+ model_name: "gpt-4.1-nano-2025-04-14"
114
+ api_key: "{OPENAI_API_KEY}"
115
+ temperature: 0.0
116
+
117
+ galileo_platform:
118
+ evaluate_project_name: "lseg-qa-evaluate"
119
+ observe_project_name: "lseq-qa-observe"
120
+ observe_project_id: "185841b9-fe41-4fe3-ad75-c5217f7d554d"
121
+ protect_project_name: "protect-test-project"
122
+ protect_stage_name: "protect-test-stage"
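As a sketch of how these environment-keyed sections are consumed (mirroring the `APP_ENV` selection in `backend/main/*.py`; the inline dict below stands in for the parsed YAML and its values are illustrative):

```python
import os

# Stand-in for yaml.safe_load() of backend/conf/config.yaml
config = {
    "env_variables": ["APP_ENV"],
    "local_ne": {"vector_database": {"collection_name": "fin_data", "dimensions": 768}},
    "local_docker": {"vector_database": {"collection_name": "fin_data", "dimensions": 768}},
}

# APP_ENV picks which top-level section becomes the active app config
os.environ["APP_ENV"] = "local_docker"
app_config = config[os.environ["APP_ENV"]]
```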
backend/main/chunk_and_save_to_vector_db.py ADDED
@@ -0,0 +1,107 @@
1
+ from pathlib import Path
2
+ import pandas as pd
3
+
4
+ from backend.classes.chunker.text_chunker import RecursiveCharacterTextChunkerConfig, RecursiveCharacterTextChunker
5
+ from backend.classes.embedding_model import EmbeddingModelConfig, EmbeddingModel
6
+ from pydantic import BaseModel
7
+ import json
8
+ import dotenv
9
+
10
+ from backend.classes.vector_database.milvus_vector_database import MilvusVectorDatabaseConfig, MilvusVectorDatabase
11
+ from backend.utils.utils import get_embedding_model, read_config, set_env_variables, create_vector_database, \
12
+ create_text_chunker, initialize_logger
13
+
14
+ dotenv.load_dotenv()
+
+ # Module-level logger so functions work when this module is imported
+ logger = initialize_logger()
15
+
16
+ def get_files(folder_path: str, extension: str = "jsonl") -> list:
17
+ # Recursively collect all files with the given extension using pathlib.Path
18
+ files = []
19
+ for path in Path(folder_path).rglob(f"*.{extension}"):
20
+ files.append(path)
21
+
22
+ return files
23
+
24
+
25
+ class ChunkerVectorDbConfig(BaseModel):
26
+ folder_path: str
27
+ chunker: RecursiveCharacterTextChunker
28
+ vector_database: MilvusVectorDatabase
29
+ embedding_model: EmbeddingModel
30
+
31
+ class Config:
32
+ arbitrary_types_allowed = True
33
+
34
+
35
+ def get_file_data(file_path: str) -> pd.DataFrame:
36
+ try:
37
+ return pd.read_json(file_path, lines=True)
38
+ except Exception as e:
39
+ logger.exception(e)
40
+ raise e
41
+
42
+ def chunk_and_save_to_vector_db(config: ChunkerVectorDbConfig):
43
+ # Read files from folder
44
+ file_paths = get_files(config.folder_path)
45
+ logger.info(f"There are {len(file_paths)} to process")
46
+
47
+ # Chunk and embed the extracted text from each processed file
48
+ for file_path in file_paths:
49
+ # Load the extracted markdown records from the jsonl file
50
+ logger.info(f"Processing {file_path}")
51
+ data_df = get_file_data(str(file_path))
52
+
53
+ # There are a few rows that are empty due to images not being extracted
54
+ # Remove them
55
+ data_df = data_df[data_df["markdown_text"] != ""]
56
+
57
+ data_df["text_chunks"] = data_df["markdown_text"].apply(config.chunker.chunk_text)
58
+ data_df = data_df.explode("text_chunks").rename(columns={"text_chunks": "text"})
59
+ data_df["chunk_id"] = data_df.groupby("id").cumcount() + 1
60
+ data_df["row_chunk_id"] = data_df["id"] + data_df["chunk_id"].astype(str)
61
+
62
+ data_df["metadata_json"] = data_df["metadata"].apply(lambda d: json.dumps(d))
63
+ data_df = data_df.drop(columns=["metadata", "id", "row_chunk_id", "markdown_text", "chunk_id"]).rename(columns={"metadata_json": "metadata"})
64
+
65
+ embeddings = config.embedding_model.encode(data_df.text.tolist())
66
+ config.vector_database.add_texts(data_df, embeddings)
67
+
68
+
69
+ def run(config: dict):
70
+ # Create embedding model object
71
+ embedding_model_config = EmbeddingModelConfig(model_name=config["embedding_model"]["model_name"],
72
+ batch_size=config["embedding_model"]["batch_size"])
73
+ embedding_model = get_embedding_model(EmbeddingModel, embedding_model_config)
74
+
75
+ # Create vector db model object
76
+ vector_db_config = MilvusVectorDatabaseConfig(db_path=config["vector_database"]["db_path"],
77
+ collection_name=config["vector_database"]["collection_name"],
78
+ vector_dimensions=config["vector_database"]["dimensions"])
79
+ vector_db = create_vector_database(MilvusVectorDatabase, vector_db_config)
80
+
81
+ text_chunker_config = RecursiveCharacterTextChunkerConfig(chunk_size=config["chunker"]["chunk_size"],
82
+ chunk_overlap=config["chunker"]["chunk_overlap"])
83
+ text_chunker = create_text_chunker(RecursiveCharacterTextChunker, text_chunker_config)
84
+
85
+ chunker_vector_db_config = ChunkerVectorDbConfig(folder_path=config["data"]["output_data_path"],
86
+ chunker=text_chunker,
87
+ vector_database=vector_db,
88
+ embedding_model=embedding_model)
89
+
90
+ chunk_and_save_to_vector_db(chunker_vector_db_config)
91
+
92
+
93
+ if __name__ == "__main__":
94
+ logger = initialize_logger()
95
+
96
+ # get current file path using Path
97
+ config = read_config(str(Path(Path(__file__).parent, "../conf/config.yaml")))
98
+
99
+ # check if environment variables are set
100
+ env_variables = set_env_variables(config["env_variables"])
101
+
102
+ app_config = config[env_variables["APP_ENV"]]
103
+ app_config["env_vars"] = env_variables
104
+
105
+ run(app_config)
106
+
107
+
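The explode/cumcount pattern used above to assign per-document chunk ids can be sketched on its own (toy data, not from the repo):

```python
import pandas as pd

# Each row holds a document id and its list of text chunks
df = pd.DataFrame({"id": ["doc1", "doc2"], "text_chunks": [["a", "b"], ["c"]]})

# One row per chunk, then a 1-based chunk counter within each document
df = df.explode("text_chunks").rename(columns={"text_chunks": "text"})
df["chunk_id"] = df.groupby("id").cumcount() + 1
df["row_chunk_id"] = df["id"] + df["chunk_id"].astype(str)
```

`row_chunk_id` here becomes `doc11`, `doc12`, `doc21`, matching the id-plus-counter scheme in `chunk_and_save_to_vector_db`.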
backend/main/prepare_data.py ADDED
@@ -0,0 +1,41 @@
1
+ from pathlib import Path
2
+
3
+ from backend.classes.data_preparer import DataPreparerConfig, DataPreparer
4
+ from backend.utils.utils import initialize_logger, read_config, set_env_variables
5
+ from dotenv import load_dotenv
6
+
7
+ load_dotenv()
8
+
9
+
10
+ def run(config: dict):
11
+ """
12
+ Prepare the data for the RAG application: extract, process, and save documents.
13
+ :param config: Configuration dictionary
14
+ """
15
+ logger.info("Prepare data")
16
+ data_preparer_config = DataPreparerConfig(input_data_path=config["data"]["input_data_path"],
17
+ output_data_path=config["data"]["output_data_path"],
18
+ output_file=config["data"]["output_file"],
19
+ pdf_extractor=config["pdf_extractor"],
20
+ vector_database=config["vector_database"],
21
+ embedding_model=config["embedding_model"])
22
+ data_preparer = DataPreparer(data_preparer_config)
23
+ data_preparer.prepare_data()
24
+ logger.info("Data prepared")
25
+
26
+
27
+
28
+ if __name__ == '__main__':
29
+ logger = initialize_logger()
30
+
31
+ # get current file path using Path
32
+ config = read_config(str(Path(Path(__file__).parent, "../conf/config.yaml")))
33
+
34
+ # check if environment variables are set
35
+ env_variables = set_env_variables(config["env_variables"])
36
+
37
+ app_config = config[env_variables["APP_ENV"]]
38
+ app_config["env_vars"] = env_variables
39
+
40
+ run(app_config)
41
+
backend/main/run_rag.py ADDED
@@ -0,0 +1,36 @@
1
+ # from classes.rag_application import RAGApplication
2
+ # from classes.pdf_extractor import PDFExtractor
3
+ # from classes.generative_model import GenerativeModel, GenerativeModelConfig
4
+ # from utils import initialize_logger, read_config
5
+ # from pathlib import Path
6
+ # import logging
7
+ #
8
+ # logging.basicConfig(level=logging.INFO)
9
+ # logger = logging.getLogger(__name__)
10
+ #
11
+ # logger = initialize_logger()
12
+ #
13
+ # # Initialize required components from config
14
+ # config = read_config("conf/config.yaml")
15
+ # pdf_extractor = PDFExtractor(config["pdf_extractor"])
16
+ # vector_db = VectorDatabase(config["vector_database"])
17
+ #
18
+ # # Get PDF path from config
19
+ # pdf_path = config["data"]["input_data_path"]
20
+ #
21
+ # def run(config: dict):
22
+ # """
23
+ # Run the RAG application.
24
+ # :param config: Configuration dictionary
25
+ # """
26
+ # logger.info("Create generative model")
27
+ # generative_model_config = GenerativeModelConfig(model_name=config['generative_model']['model_name'])
28
+ # generative_model = GenerativeModel(generative_model_config)
29
+ #
30
+ # logger.info("Create RAG application")
31
+ # rag_app = RAGApplication(vector_db, pdf_extractor, generative_model)
32
+ # rag_app.run(pdf_path, 'Give me a summary of the document')
33
+ # logger.info("RAG application completed")
34
+ #
35
+ # if __name__ == '__main__':
36
+ # run(config)
backend/models/__init__.py ADDED
@@ -0,0 +1 @@
1
+
backend/models/rag_models.py ADDED
@@ -0,0 +1,10 @@
1
+ from pydantic import BaseModel
2
+
3
+
4
+ class RAGRequest(BaseModel):
5
+ query: str
6
+ protect_enabled: bool = False
7
+
8
+
9
+ class RAGResponse(BaseModel):
10
+ response: str
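A client would post a JSON body matching `RAGRequest` and get back a `RAGResponse`-shaped object; a minimal stdlib sketch of the payload shapes (the endpoint itself is not shown in this diff, so the shapes below are inferred from the models):

```python
import json

# Request body matching RAGRequest (protect_enabled defaults to False)
payload = {"query": "Summarize the latest filing", "protect_enabled": True}
body = json.dumps(payload)

# A RAGResponse serializes to a single "response" field
response_body = json.loads('{"response": "..."}')
```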
backend/requirements.txt ADDED
@@ -0,0 +1,15 @@
1
+ # Requirements for RAG Application
2
+ python-dotenv
3
+ PyMuPDF
4
+ chromadb
5
+ google-genai
6
+ pandas
7
+ fastapi
8
+ uvicorn
9
+ requests
10
+ streamlit
11
+ pymilvus
12
+ docling
13
+ sentence-transformers
14
+ torch
15
+ langchain-text-splitters
backend/utils/utils.py ADDED
@@ -0,0 +1,49 @@
1
+ import os
2
+ import yaml
3
+ import logging
4
+ from typing import List, Any
5
+ from dotenv import load_dotenv
6
+ load_dotenv()
7
+
8
+ def set_env_variables(env_vars: List):
9
+ env_vars_dict = {}
10
+ for env_var in env_vars:
11
+ if env_var not in os.environ or not os.environ[env_var]:
12
+ raise ValueError(f"ERROR: Please set {env_var}.")
13
+ env_vars_dict[env_var] = os.environ[env_var]
14
+ return env_vars_dict
15
+
16
+
17
+ def initialize_logger():
18
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
19
+ logger = logging.getLogger(__name__)
20
+ return logger
21
+
22
+
23
+ def read_config(config_path: str) -> dict:
24
+ with open(config_path, 'r') as config_file:
25
+ config = yaml.safe_load(config_file)
26
+ return config
27
+
28
+
29
+ def create_vector_database(db: Any, config: Any) -> Any:
30
+ vector_db = db(config)
31
+ return vector_db
32
+
33
+
34
+ def create_text_chunker(text_chunker: Any, text_chunker_config: Any) -> Any:
35
+ return text_chunker(text_chunker_config)
36
+
37
+
38
+ def create_pdf_extractor(pdf_extractor: Any, pdf_extractor_config: Any) -> Any:
39
+ return pdf_extractor(pdf_extractor_config)
40
+
41
+
42
+ def get_embedding_model(model: Any, config: Any) -> Any:
43
+ embedding_model = model(config)
44
+ return embedding_model
45
+
46
+ def get_generative_model(model: Any, config: Any) -> Any:
47
+ generative_model = model(config)
48
+ return generative_model
49
+
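The behaviour of `set_env_variables` (it reads and validates rather than sets, despite the name) can be exercised directly; `DEMO_VAR` and `MISSING_VAR_XYZ` below are illustrative names:

```python
import os

def set_env_variables(env_vars):
    # Mirrors backend/utils/utils.py: validate presence, return name -> value
    env_vars_dict = {}
    for env_var in env_vars:
        if env_var not in os.environ or not os.environ[env_var]:
            raise ValueError(f"ERROR: Please set {env_var}.")
        env_vars_dict[env_var] = os.environ[env_var]
    return env_vars_dict

os.environ["DEMO_VAR"] = "value"
result = set_env_variables(["DEMO_VAR"])
```

An unset (or empty) variable raises `ValueError`, which is what makes missing configuration fail fast at startup.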
documentation/backend.puml ADDED
@@ -0,0 +1,43 @@
1
+ @startuml backend
2
+ class VectorDatabase {
3
+ - chroma_client
4
+ - collection
5
+ + __init__(config: VectorDatabaseConfig)
6
+ + store_text_as_vector(df: pd.DataFrame)
7
+ }
8
+
9
+ class PDFExtractor {
10
+ + __init__(config: PDFExtractorConfig)
11
+ + extract_text_from_pdf(pdf_path: str)
12
+ }
13
+
14
+ class GenerativeModel {
15
+ - client
16
+ + __init__(config: GenerativeModelConfig)
17
+ + generate_response(query: str)
18
+ }
19
+
20
+ class RAGApplicationConfig {
21
+ + vector_db: VectorDatabase
22
+ + pdf_extractor: PDFExtractor
23
+ + generative_model: GenerativeModel
24
+ }
25
+
26
+ class DataPreparerConfig {
27
+ + input_data_path: str
28
+ + output_data_path: str
29
+ + output_file: str
30
+ + pdf_extractor: PDFExtractorConfig
31
+ + vector_database: VectorDatabaseConfig
32
+ + embedding_model: EmbeddingModelConfig
33
+ }
34
+
35
+ class RAGApplication {
36
+ - vector_db
37
+ - pdf_extractor
38
+ - generative_model
39
+ + __init__(config: RAGApplicationConfig)
40
+ + run(pdf_path: str, query: str)
41
+ }
42
+
43
+ @enduml
documentation/workflow.excalidraw ADDED
@@ -0,0 +1,1563 @@
1
+ {
2
+ "type": "excalidraw",
3
+ "version": 2,
4
+ "source": "https://marketplace.visualstudio.com/items?itemName=pomdtr.excalidraw-editor",
5
+ "elements": [
6
+ {
7
+ "id": "ahbQwKAsijgSdg9jT0hg3",
8
+ "type": "rectangle",
9
+ "x": 113.92242891508351,
10
+ "y": 546.2754497232347,
11
+ "width": 145.73046875,
12
+ "height": 60,
13
+ "angle": 0,
14
+ "strokeColor": "#1e1e1e",
15
+ "backgroundColor": "transparent",
16
+ "fillStyle": "solid",
17
+ "strokeWidth": 2,
18
+ "strokeStyle": "solid",
19
+ "roughness": 1,
20
+ "opacity": 100,
21
+ "groupIds": [],
22
+ "frameId": null,
23
+ "index": "a0",
24
+ "roundness": {
25
+ "type": 3
26
+ },
27
+ "seed": 732491180,
28
+ "version": 134,
29
+ "versionNonce": 2128788504,
30
+ "isDeleted": false,
31
+ "boundElements": [
32
+ {
33
+ "id": "l2Bhq9WJokYdEsOu-kqdv",
34
+ "type": "arrow"
35
+ },
36
+ {
37
+ "type": "text",
38
+ "id": "UFlJSimF8SzkCx_fogXYo"
39
+ },
40
+ {
41
+ "id": "FPhrCFjAj_hPudKjQjMHb",
42
+ "type": "arrow"
43
+ }
44
+ ],
45
+ "updated": 1747664916081,
46
+ "link": null,
47
+ "locked": false
48
+ },
49
+ {
50
+ "id": "UFlJSimF8SzkCx_fogXYo",
51
+ "type": "text",
52
+ "x": 122.77770693022023,
53
+ "y": 563.7754497232347,
54
+ "width": 128.01991271972656,
55
+ "height": 25,
56
+ "angle": 0,
57
+ "strokeColor": "#1e1e1e",
58
+ "backgroundColor": "transparent",
59
+ "fillStyle": "solid",
60
+ "strokeWidth": 2,
61
+ "strokeStyle": "solid",
62
+ "roughness": 1,
63
+ "opacity": 100,
64
+ "groupIds": [],
65
+ "frameId": null,
66
+ "index": "a1",
67
+ "roundness": null,
68
+ "seed": 1377196076,
69
+ "version": 150,
70
+ "versionNonce": 274954520,
71
+ "isDeleted": false,
72
+ "boundElements": [],
73
+ "updated": 1747664916081,
74
+ "link": null,
75
+ "locked": false,
76
+ "text": "Save as jsonl",
77
+ "fontSize": 20,
78
+ "fontFamily": 5,
79
+ "textAlign": "center",
80
+ "verticalAlign": "middle",
81
+ "containerId": "ahbQwKAsijgSdg9jT0hg3",
82
+ "originalText": "Save as jsonl",
83
+ "autoResize": true,
84
+ "lineHeight": 1.25
85
+ },
86
+ {
87
+ "id": "l2Bhq9WJokYdEsOu-kqdv",
88
+ "type": "arrow",
89
+ "x": 181.26889178080395,
90
+ "y": 481.2445801020841,
91
+ "width": 0.47992077927702326,
92
+ "height": 64.64388544319729,
93
+ "angle": 0,
94
+ "strokeColor": "#1e1e1e",
95
+ "backgroundColor": "transparent",
96
+ "fillStyle": "solid",
97
+ "strokeWidth": 2,
98
+ "strokeStyle": "solid",
99
+ "roughness": 1,
100
+ "opacity": 100,
101
+ "groupIds": [],
102
+ "frameId": null,
103
+ "index": "a2",
104
+ "roundness": {
105
+ "type": 2
106
+ },
107
+ "seed": 1282568876,
108
+ "version": 214,
109
+ "versionNonce": 297608040,
110
+ "isDeleted": false,
111
+ "boundElements": [],
112
+ "updated": 1747666564671,
113
+ "link": null,
114
+ "locked": false,
115
+ "points": [
116
+ [
117
+ 0,
118
+ 0
119
+ ],
120
+ [
121
+ 0.47992077927702326,
122
+ 64.64388544319729
123
+ ]
124
+ ],
125
+ "lastCommittedPoint": null,
126
+ "startBinding": {
127
+ "elementId": "S2mRdzEdWZomP_4VMODv9",
128
+ "focus": 0.03491911256120813,
129
+ "gap": 1.7043524039286808
130
+ },
131
+ "endBinding": {
132
+ "elementId": "ahbQwKAsijgSdg9jT0hg3",
133
+ "focus": -0.06586435538969795,
134
+ "gap": 1
135
+ },
136
+ "startArrowhead": null,
137
+ "endArrowhead": "arrow",
138
+ "elbowed": false
139
+ },
140
+ {
141
+ "id": "8qw2U9U9RrOSm8H7KwMOK",
142
+ "type": "rectangle",
143
+ "x": 86.36768348636292,
144
+ "y": 665.4955189693495,
145
+ "width": 209.10886312467028,
146
+ "height": 126.70472854491413,
147
+ "angle": 0,
148
+ "strokeColor": "#1e1e1e",
149
+ "backgroundColor": "#ffec99",
150
+ "fillStyle": "solid",
151
+ "strokeWidth": 2,
152
+ "strokeStyle": "solid",
153
+ "roughness": 1,
154
+ "opacity": 100,
155
+ "groupIds": [],
156
+ "frameId": null,
157
+ "index": "a4",
158
+ "roundness": {
159
+ "type": 3
160
+ },
161
+ "seed": 1709894932,
162
+ "version": 286,
163
+ "versionNonce": 143242008,
164
+ "isDeleted": false,
165
+ "boundElements": [
166
+ {
167
+ "id": "FPhrCFjAj_hPudKjQjMHb",
168
+ "type": "arrow"
169
+ },
170
+ {
171
+ "id": "VfDy5P7I3DdOSK-tio8_Z",
172
+ "type": "text"
173
+ }
174
+ ],
175
+ "updated": 1747665906821,
176
+ "link": null,
177
+ "locked": false
178
+ },
179
+ {
180
+ "id": "VfDy5P7I3DdOSK-tio8_Z",
181
+ "type": "text",
182
+ "x": 98.4121892064129,
183
+ "y": 691.3478832418066,
184
+ "width": 185.0198516845703,
185
+ "height": 75,
186
+ "angle": 0,
187
+ "strokeColor": "#1e1e1e",
188
+ "backgroundColor": "transparent",
189
+ "fillStyle": "solid",
190
+ "strokeWidth": 2,
191
+ "strokeStyle": "solid",
192
+ "roughness": 1,
193
+ "opacity": 100,
194
+ "groupIds": [],
195
+ "frameId": null,
196
+ "index": "a5",
197
+ "roundness": null,
198
+ "seed": 2108959380,
199
+ "version": 454,
200
+ "versionNonce": 1477677080,
201
+ "isDeleted": false,
202
+ "boundElements": [],
203
+ "updated": 1747665906821,
204
+ "link": null,
205
+ "locked": false,
206
+ "text": "Create embeddings,\nuse Matryoshka\nembeddings",
207
+ "fontSize": 20,
208
+ "fontFamily": 5,
209
+ "textAlign": "center",
210
+ "verticalAlign": "middle",
211
+ "containerId": "8qw2U9U9RrOSm8H7KwMOK",
212
+ "originalText": "Create embeddings, use Matryoshka embeddings",
213
+ "autoResize": true,
214
+ "lineHeight": 1.25
215
+ },
216
+ {
217
+ "id": "FPhrCFjAj_hPudKjQjMHb",
218
+ "type": "arrow",
219
+ "x": 183.93007917700277,
220
+ "y": 606.662433901188,
221
+ "width": 0.3427040579839229,
222
+ "height": 58.31253716582853,
223
+ "angle": 0,
224
+ "strokeColor": "#1e1e1e",
225
+ "backgroundColor": "transparent",
226
+ "fillStyle": "solid",
227
+ "strokeWidth": 2,
228
+ "strokeStyle": "solid",
229
+ "roughness": 1,
230
+ "opacity": 100,
231
+ "groupIds": [],
232
+ "frameId": null,
233
+ "index": "a6",
234
+ "roundness": {
235
+ "type": 2
236
+ },
237
+ "seed": 343521300,
238
+ "version": 398,
239
+ "versionNonce": 878300440,
240
+ "isDeleted": false,
241
+ "boundElements": [],
242
+ "updated": 1747665906821,
243
+ "link": null,
244
+ "locked": false,
245
+ "points": [
246
+ [
247
+ 0,
248
+ 0
249
+ ],
250
+ [
251
+ -0.3427040579839229,
252
+ 58.31253716582853
253
+ ]
254
+ ],
255
+ "lastCommittedPoint": null,
256
+ "startBinding": {
257
+ "elementId": "ahbQwKAsijgSdg9jT0hg3",
258
+ "focus": 0.03801391972015758,
259
+ "gap": 1
260
+ },
261
+ "endBinding": {
262
+ "elementId": "8qw2U9U9RrOSm8H7KwMOK",
263
+ "focus": -0.07396043014036746,
264
+ "gap": 1
265
+ },
266
+ "startArrowhead": null,
267
+ "endArrowhead": "arrow",
268
+ "elbowed": false
269
+ },
270
+ {
271
+ "id": "S2mRdzEdWZomP_4VMODv9",
272
+ "type": "rectangle",
273
+ "x": 110.72815104552075,
274
+ "y": 420.5777198758129,
275
+ "width": 145.73046875,
276
+ "height": 64,
277
+ "angle": 0,
278
+ "strokeColor": "#1e1e1e",
279
+ "backgroundColor": "transparent",
280
+ "fillStyle": "solid",
281
+ "strokeWidth": 2,
282
+ "strokeStyle": "solid",
283
+ "roughness": 1,
284
+ "opacity": 100,
285
+ "groupIds": [],
286
+ "frameId": null,
287
+ "index": "a7",
288
+ "roundness": {
289
+ "type": 3
290
+ },
291
+ "seed": 1766607916,
292
+ "version": 176,
293
+ "versionNonce": 712793960,
294
+ "isDeleted": false,
295
+ "boundElements": [
296
+ {
297
+ "id": "4CsqH5PyXh3iccuPZ0I0L",
298
+ "type": "arrow"
299
+ },
300
+ {
301
+ "type": "text",
302
+ "id": "gzAFKkWZ8OVdvSP3NptvV"
303
+ },
304
+ {
305
+ "id": "l2Bhq9WJokYdEsOu-kqdv",
306
+ "type": "arrow"
307
+ }
308
+ ],
309
+ "updated": 1747666605404,
310
+ "link": null,
311
+ "locked": false
312
+ },
313
+ {
314
+ "id": "gzAFKkWZ8OVdvSP3NptvV",
315
+ "type": "text",
316
+ "x": 128.96345683165356,
317
+ "y": 425.5777198758129,
318
+ "width": 109.25985717773438,
319
+ "height": 54,
320
+ "angle": 0,
321
+ "strokeColor": "#1e1e1e",
322
+ "backgroundColor": "transparent",
323
+ "fillStyle": "solid",
324
+ "strokeWidth": 2,
325
+ "strokeStyle": "solid",
326
+ "roughness": 1,
327
+ "opacity": 100,
328
+ "groupIds": [],
329
+ "frameId": null,
330
+ "index": "a8",
331
+ "roundness": null,
332
+ "seed": 1892279980,
333
+ "version": 268,
334
+ "versionNonce": 1504654872,
335
+ "isDeleted": false,
336
+ "boundElements": [],
337
+ "updated": 1747666617437,
338
+ "link": null,
339
+ "locked": false,
340
+ "text": "Extract data\nfrom pdfs",
341
+ "fontSize": 20,
342
+ "fontFamily": 6,
343
+ "textAlign": "center",
344
+ "verticalAlign": "middle",
345
+ "containerId": "S2mRdzEdWZomP_4VMODv9",
346
+ "originalText": "Extract data from pdfs",
347
+ "autoResize": true,
348
+ "lineHeight": 1.35
349
+ },
350
+ {
351
+ "id": "4CsqH5PyXh3iccuPZ0I0L",
352
+ "type": "arrow",
353
+ "x": 178.40108773648336,
354
+ "y": 358.8316261258129,
355
+ "width": 0.7712439836965075,
356
+ "height": 61.294579681627624,
357
+ "angle": 0,
358
+ "strokeColor": "#1e1e1e",
359
+ "backgroundColor": "transparent",
360
+ "fillStyle": "solid",
361
+ "strokeWidth": 2,
362
+ "strokeStyle": "solid",
363
+ "roughness": 1,
364
+ "opacity": 100,
365
+ "groupIds": [],
366
+ "frameId": null,
367
+ "index": "a9",
368
+ "roundness": {
369
+ "type": 2
370
+ },
371
+ "seed": 447754540,
372
+ "version": 290,
373
+ "versionNonce": 2117562472,
374
+ "isDeleted": false,
375
+ "boundElements": [],
376
+ "updated": 1747666564671,
377
+ "link": null,
378
+ "locked": false,
379
+ "points": [
380
+ [
381
+ 0,
382
+ 0
383
+ ],
384
+ [
385
+ -0.7712439836965075,
386
+ 61.294579681627624
387
+ ]
388
+ ],
389
+ "lastCommittedPoint": null,
390
+ "startBinding": null,
391
+ "endBinding": {
392
+ "elementId": "S2mRdzEdWZomP_4VMODv9",
393
+ "focus": -0.08665299440759953,
394
+ "gap": 1.163726888744634
395
+ },
396
+ "startArrowhead": null,
397
+ "endArrowhead": "arrow",
398
+ "elbowed": false
399
+ },
400
+ {
401
+ "id": "sQ3a2n9urbtaCts05NIgi",
402
+ "type": "text",
403
+ "x": 152.56951496517308,
404
+ "y": 329.55127711710816,
405
+ "width": 53.63995361328125,
406
+ "height": 25,
407
+ "angle": 0,
408
+ "strokeColor": "#1e1e1e",
409
+ "backgroundColor": "transparent",
410
+ "fillStyle": "solid",
411
+ "strokeWidth": 2,
412
+ "strokeStyle": "solid",
413
+ "roughness": 1,
414
+ "opacity": 100,
415
+ "groupIds": [],
416
+ "frameId": null,
417
+ "index": "aA",
418
+ "roundness": null,
419
+ "seed": 720358572,
420
+ "version": 45,
421
+ "versionNonce": 975122968,
422
+ "isDeleted": false,
423
+ "boundElements": [],
424
+ "updated": 1747664916082,
425
+ "link": null,
426
+ "locked": false,
427
+ "text": "PDFs",
428
+ "fontSize": 20,
429
+ "fontFamily": 5,
430
+ "textAlign": "left",
431
+ "verticalAlign": "top",
432
+ "containerId": null,
433
+ "originalText": "PDFs",
434
+ "autoResize": true,
435
+ "lineHeight": 1.25
436
+ },
437
+ {
438
+ "id": "bHSkFr3jNSgNI1uBSw_YH",
439
+ "type": "rectangle",
440
+ "x": 87.6826415406075,
441
+ "y": 861.2677483936271,
442
+ "width": 209.10886312467028,
443
+ "height": 126.70472854491413,
444
+ "angle": 0,
445
+ "strokeColor": "#1e1e1e",
446
+ "backgroundColor": "transparent",
447
+ "fillStyle": "solid",
448
+ "strokeWidth": 2,
449
+ "strokeStyle": "solid",
450
+ "roughness": 1,
451
+ "opacity": 100,
452
+ "groupIds": [],
453
+ "frameId": null,
454
+ "index": "aB",
455
+ "roundness": {
456
+ "type": 3
457
+ },
458
+ "seed": 988513964,
459
+ "version": 266,
460
+ "versionNonce": 1486958360,
461
+ "isDeleted": false,
462
+ "boundElements": [
463
+ {
464
+ "id": "zXok0v3zHlKbCmoSypY8o",
465
+ "type": "arrow"
466
+ },
467
+ {
468
+ "type": "text",
469
+ "id": "sg3SXgncifiwZc-zwXYjd"
470
+ }
471
+ ],
472
+ "updated": 1747664916082,
473
+ "link": null,
474
+ "locked": false
475
+ },
476
+ {
477
+ "id": "sg3SXgncifiwZc-zwXYjd",
478
+ "type": "text",
479
+ "x": 96.8871356639778,
480
+ "y": 887.1201126660842,
481
+ "width": 190.6998748779297,
482
+ "height": 75,
483
+ "angle": 0,
484
+ "strokeColor": "#1e1e1e",
485
+ "backgroundColor": "transparent",
486
+ "fillStyle": "solid",
487
+ "strokeWidth": 2,
488
+ "strokeStyle": "solid",
489
+ "roughness": 1,
490
+ "opacity": 100,
491
+ "groupIds": [],
492
+ "frameId": null,
493
+ "index": "aC",
494
+ "roundness": null,
495
+ "seed": 1872613676,
496
+ "version": 352,
497
+ "versionNonce": 802917400,
498
+ "isDeleted": false,
499
+ "boundElements": [],
500
+ "updated": 1747664916082,
501
+ "link": null,
502
+ "locked": false,
503
+ "text": "Create and save to\ncollection in Chroma\nvector db",
504
+ "fontSize": 20,
505
+ "fontFamily": 5,
506
+ "textAlign": "center",
507
+ "verticalAlign": "middle",
508
+ "containerId": "bHSkFr3jNSgNI1uBSw_YH",
509
+ "originalText": "Create and save to collection in Chroma vector db",
510
+ "autoResize": true,
511
+ "lineHeight": 1.25
512
+ },
513
+ {
514
+ "id": "zXok0v3zHlKbCmoSypY8o",
515
+ "type": "arrow",
516
+ "x": 185.16187324544518,
517
+ "y": 799.2927796444268,
518
+ "width": 0.4120303976070261,
519
+ "height": 61.368755427712586,
520
+ "angle": 0,
521
+ "strokeColor": "#1e1e1e",
522
+ "backgroundColor": "transparent",
523
+ "fillStyle": "solid",
524
+ "strokeWidth": 2,
525
+ "strokeStyle": "solid",
526
+ "roughness": 1,
527
+ "opacity": 100,
528
+ "groupIds": [],
529
+ "frameId": null,
530
+ "index": "aD",
531
+ "roundness": {
532
+ "type": 2
533
+ },
534
+ "seed": 480431020,
535
+ "version": 426,
536
+ "versionNonce": 1082347112,
537
+ "isDeleted": false,
538
+ "boundElements": [],
539
+ "updated": 1747664916199,
540
+ "link": null,
541
+ "locked": false,
542
+ "points": [
543
+ [
544
+ 0,
545
+ 0
546
+ ],
547
+ [
548
+ -0.4120303976070261,
549
+ 61.368755427712586
550
+ ]
551
+ ],
552
+ "lastCommittedPoint": null,
553
+ "startBinding": null,
554
+ "endBinding": {
555
+ "elementId": "bHSkFr3jNSgNI1uBSw_YH",
556
+ "focus": -0.07541117678427305,
557
+ "gap": 1
558
+ },
559
+ "startArrowhead": null,
560
+ "endArrowhead": "arrow",
561
+ "elbowed": false
562
+ },
563
+ {
564
+ "id": "d9zvLKXktuu4KKoNwDHks",
565
+ "type": "text",
566
+ "x": -207.9106402991797,
567
+ "y": 670.8478145278248,
568
+ "width": 257.6997985839844,
569
+ "height": 125,
570
+ "angle": 0,
571
+ "strokeColor": "#1e1e1e",
572
+ "backgroundColor": "transparent",
573
+ "fillStyle": "solid",
574
+ "strokeWidth": 2,
575
+ "strokeStyle": "solid",
576
+ "roughness": 1,
577
+ "opacity": 100,
578
+ "groupIds": [],
579
+ "frameId": null,
580
+ "index": "aF",
581
+ "roundness": null,
582
+ "seed": 297698072,
583
+ "version": 205,
584
+ "versionNonce": 1692721944,
585
+ "isDeleted": false,
586
+ "boundElements": null,
587
+ "updated": 1747664916082,
588
+ "link": null,
589
+ "locked": false,
590
+ "text": "Matryoshka embeddings\ncan be used with different\ndimensions and used with\nGalileo evals to show \nimprovement ",
591
+ "fontSize": 20,
592
+ "fontFamily": 5,
593
+ "textAlign": "left",
594
+ "verticalAlign": "top",
595
+ "containerId": null,
596
+ "originalText": "Matryoshka embeddings\ncan be used with different\ndimensions and used with\nGalileo evals to show \nimprovement ",
597
+ "autoResize": true,
598
+ "lineHeight": 1.25
599
+ },
600
+ {
601
+ "id": "LUAd0l_VFSMzCn477Y5tU",
602
+ "type": "text",
603
+ "x": -32.44826338674477,
604
+ "y": 246.50406452782477,
605
+ "width": 419.84990437825513,
606
+ "height": 46.875,
607
+ "angle": 0,
608
+ "strokeColor": "#1e1e1e",
609
+ "backgroundColor": "transparent",
610
+ "fillStyle": "solid",
611
+ "strokeWidth": 2,
612
+ "strokeStyle": "solid",
613
+ "roughness": 1,
614
+ "opacity": 100,
615
+ "groupIds": [],
616
+ "frameId": null,
617
+ "index": "aG",
618
+ "roundness": null,
619
+ "seed": 220549992,
620
+ "version": 96,
621
+ "versionNonce": 1636352024,
622
+ "isDeleted": false,
623
+ "boundElements": null,
624
+ "updated": 1747664916082,
625
+ "link": null,
626
+ "locked": false,
627
+ "text": "BACKEND DATA PREP",
628
+ "fontSize": 37.49999999999999,
629
+ "fontFamily": 5,
630
+ "textAlign": "left",
631
+ "verticalAlign": "top",
632
+ "containerId": null,
633
+ "originalText": "BACKEND DATA PREP",
634
+ "autoResize": true,
635
+ "lineHeight": 1.25
636
+ },
637
+ {
638
+ "id": "LCcahyR5mZ-VrhSSJ0RrX",
639
+ "type": "text",
640
+ "x": 844.4690950116927,
641
+ "y": 238.00406452782477,
642
+ "width": 263.5499572753906,
643
+ "height": 46.87499999999999,
644
+ "angle": 0,
645
+ "strokeColor": "#1e1e1e",
646
+ "backgroundColor": "transparent",
647
+ "fillStyle": "solid",
648
+ "strokeWidth": 2,
649
+ "strokeStyle": "solid",
650
+ "roughness": 1,
651
+ "opacity": 100,
652
+ "groupIds": [],
653
+ "frameId": null,
654
+ "index": "aH",
655
+ "roundness": null,
656
+ "seed": 1455065880,
657
+ "version": 235,
658
+ "versionNonce": 933821720,
659
+ "isDeleted": false,
660
+ "boundElements": [],
661
+ "updated": 1747664916082,
662
+ "link": null,
663
+ "locked": false,
664
+ "text": "UI WITH RAG",
665
+ "fontSize": 37.49999999999999,
666
+ "fontFamily": 5,
667
+ "textAlign": "left",
668
+ "verticalAlign": "top",
669
+ "containerId": null,
670
+ "originalText": "UI WITH RAG",
671
+ "autoResize": true,
672
+ "lineHeight": 1.25
673
+ },
674
+ {
675
+ "id": "EdUn3nnIEvlF6aBa1kj7M",
676
+ "type": "text",
677
+ "x": 921.6245159508203,
678
+ "y": 313.83609577782477,
679
+ "width": 134.23989868164062,
680
+ "height": 25,
681
+ "angle": 0,
682
+ "strokeColor": "#1e1e1e",
683
+ "backgroundColor": "transparent",
684
+ "fillStyle": "solid",
685
+ "strokeWidth": 2,
686
+ "strokeStyle": "solid",
687
+ "roughness": 1,
688
+ "opacity": 100,
689
+ "groupIds": [],
690
+ "frameId": null,
691
+ "index": "aK",
692
+ "roundness": null,
693
+ "seed": 1834382440,
694
+ "version": 103,
695
+ "versionNonce": 951687704,
696
+ "isDeleted": false,
697
+ "boundElements": null,
698
+ "updated": 1747664916082,
699
+ "link": null,
700
+ "locked": false,
701
+ "text": "User question",
702
+ "fontSize": 20,
703
+ "fontFamily": 5,
704
+ "textAlign": "left",
705
+ "verticalAlign": "top",
706
+ "containerId": null,
707
+ "originalText": "User question",
708
+ "autoResize": true,
709
+ "lineHeight": 1.25
710
+ },
711
+ {
712
+ "id": "ZN_mKooYb6wL_YlpOin8R",
713
+ "type": "rectangle",
714
+ "x": 918.8139690758203,
715
+ "y": 423.5783258824798,
716
+ "width": 145.73046875,
717
+ "height": 60,
718
+ "angle": 0,
719
+ "strokeColor": "#1e1e1e",
720
+ "backgroundColor": "transparent",
721
+ "fillStyle": "solid",
722
+ "strokeWidth": 2,
723
+ "strokeStyle": "solid",
724
+ "roughness": 1,
725
+ "opacity": 100,
726
+ "groupIds": [],
727
+ "frameId": null,
728
+ "index": "aL",
729
+ "roundness": {
730
+ "type": 3
731
+ },
732
+ "seed": 585607704,
733
+ "version": 285,
734
+ "versionNonce": 1300833896,
735
+ "isDeleted": false,
736
+ "boundElements": [
737
+ {
738
+ "id": "XV1DRpL8zYaqsEKP0Z_nl",
739
+ "type": "arrow"
740
+ },
741
+ {
742
+ "type": "text",
743
+ "id": "-NyTkinb1h4vbr60eEtwZ"
744
+ }
745
+ ],
746
+ "updated": 1747665565753,
747
+ "link": null,
748
+ "locked": false
749
+ },
750
+ {
751
+ "id": "-NyTkinb1h4vbr60eEtwZ",
752
+ "type": "text",
753
+ "x": 934.9592556358789,
754
+ "y": 428.5783258824798,
755
+ "width": 113.43989562988281,
756
+ "height": 50,
757
+ "angle": 0,
758
+ "strokeColor": "#1e1e1e",
759
+ "backgroundColor": "transparent",
760
+ "fillStyle": "solid",
761
+ "strokeWidth": 2,
762
+ "strokeStyle": "solid",
763
+ "roughness": 1,
764
+ "opacity": 100,
765
+ "groupIds": [],
766
+ "frameId": null,
767
+ "index": "aM",
768
+ "roundness": null,
769
+ "seed": 26912536,
770
+ "version": 364,
771
+ "versionNonce": 963123560,
772
+ "isDeleted": false,
773
+ "boundElements": [],
774
+ "updated": 1747665565753,
775
+ "link": null,
776
+ "locked": false,
777
+ "text": "Streamlit\nUI/backend",
778
+ "fontSize": 20,
779
+ "fontFamily": 5,
780
+ "textAlign": "center",
781
+ "verticalAlign": "middle",
782
+ "containerId": "ZN_mKooYb6wL_YlpOin8R",
783
+ "originalText": "Streamlit UI/backend",
784
+ "autoResize": true,
785
+ "lineHeight": 1.25
786
+ },
787
+ {
788
+ "id": "XV1DRpL8zYaqsEKP0Z_nl",
789
+ "type": "arrow",
790
+ "x": 986.4869057667829,
791
+ "y": 361.8478571324798,
792
+ "width": 0.7711825485696409,
793
+ "height": 61.27895468162757,
794
+ "angle": 0,
795
+ "strokeColor": "#1e1e1e",
796
+ "backgroundColor": "transparent",
797
+ "fillStyle": "solid",
798
+ "strokeWidth": 2,
799
+ "strokeStyle": "solid",
800
+ "roughness": 1,
801
+ "opacity": 100,
802
+ "groupIds": [],
803
+ "frameId": null,
804
+ "index": "aN",
805
+ "roundness": {
806
+ "type": 2
807
+ },
808
+ "seed": 1793459224,
809
+ "version": 509,
810
+ "versionNonce": 1391504488,
811
+ "isDeleted": false,
812
+ "boundElements": [],
813
+ "updated": 1747665565754,
814
+ "link": null,
815
+ "locked": false,
816
+ "points": [
817
+ [
818
+ 0,
819
+ 0
820
+ ],
821
+ [
822
+ -0.7711825485696409,
823
+ 61.27895468162757
824
+ ]
825
+ ],
826
+ "lastCommittedPoint": null,
827
+ "startBinding": null,
828
+ "endBinding": {
829
+ "elementId": "ZN_mKooYb6wL_YlpOin8R",
830
+ "focus": -0.08665299440759792,
831
+ "gap": 1.163726888744634
832
+ },
833
+ "startArrowhead": null,
834
+ "endArrowhead": "arrow",
835
+ "elbowed": false
836
+ },
837
+ {
838
+ "id": "VaeeYk5sVGivRgHn8KaCo",
839
+ "type": "rectangle",
840
+ "x": 922.3256878258203,
841
+ "y": 547.8040436970491,
842
+ "width": 145.73046875,
843
+ "height": 85,
844
+ "angle": 0,
845
+ "strokeColor": "#1e1e1e",
846
+ "backgroundColor": "transparent",
847
+ "fillStyle": "solid",
848
+ "strokeWidth": 2,
849
+ "strokeStyle": "solid",
850
+ "roughness": 1,
851
+ "opacity": 100,
852
+ "groupIds": [],
853
+ "frameId": null,
854
+ "index": "aO",
855
+ "roundness": {
856
+ "type": 3
857
+ },
858
+ "seed": 1355576856,
859
+ "version": 314,
860
+ "versionNonce": 1845381912,
861
+ "isDeleted": false,
862
+ "boundElements": [
863
+ {
864
+ "id": "kOt0lJ7fT7uuJmHCxsxGM",
865
+ "type": "arrow"
866
+ },
867
+ {
868
+ "type": "text",
869
+ "id": "1sXCYRnQofixcmBJbRUYk"
870
+ }
871
+ ],
872
+ "updated": 1747664916082,
873
+ "link": null,
874
+ "locked": false
875
+ },
876
+ {
877
+ "id": "1sXCYRnQofixcmBJbRUYk",
878
+ "type": "text",
879
+ "x": 938.7609524132226,
880
+ "y": 552.8040436970491,
881
+ "width": 112.85993957519531,
882
+ "height": 75,
883
+ "angle": 0,
884
+ "strokeColor": "#1e1e1e",
885
+ "backgroundColor": "transparent",
886
+ "fillStyle": "solid",
887
+ "strokeWidth": 2,
888
+ "strokeStyle": "solid",
889
+ "roughness": 1,
890
+ "opacity": 100,
891
+ "groupIds": [],
892
+ "frameId": null,
893
+ "index": "aP",
894
+ "roundness": null,
895
+ "seed": 822473496,
896
+ "version": 423,
897
+ "versionNonce": 1395992600,
898
+ "isDeleted": false,
899
+ "boundElements": [],
900
+ "updated": 1747664916082,
901
+ "link": null,
902
+ "locked": false,
903
+ "text": "Convert\nquestion to\nembedding",
904
+ "fontSize": 20,
905
+ "fontFamily": 5,
906
+ "textAlign": "center",
907
+ "verticalAlign": "middle",
908
+ "containerId": "VaeeYk5sVGivRgHn8KaCo",
909
+ "originalText": "Convert question to embedding",
910
+ "autoResize": true,
911
+ "lineHeight": 1.25
912
+ },
913
+ {
914
+ "id": "kOt0lJ7fT7uuJmHCxsxGM",
915
+ "type": "arrow",
916
+ "x": 989.9986245167829,
917
+ "y": 486.3079499470491,
918
+ "width": 0.7612717529503357,
919
+ "height": 60.332366861255366,
920
+ "angle": 0,
921
+ "strokeColor": "#1e1e1e",
922
+ "backgroundColor": "transparent",
923
+ "fillStyle": "solid",
924
+ "strokeWidth": 2,
925
+ "strokeStyle": "solid",
926
+ "roughness": 1,
927
+ "opacity": 100,
928
+ "groupIds": [],
929
+ "frameId": null,
930
+ "index": "aQ",
931
+ "roundness": {
932
+ "type": 2
933
+ },
934
+ "seed": 654823448,
935
+ "version": 564,
936
+ "versionNonce": 461427816,
937
+ "isDeleted": false,
938
+ "boundElements": [],
939
+ "updated": 1747664916199,
940
+ "link": null,
941
+ "locked": false,
942
+ "points": [
943
+ [
944
+ 0,
945
+ 0
946
+ ],
947
+ [
948
+ -0.7612717529503357,
949
+ 60.332366861255366
950
+ ]
951
+ ],
952
+ "lastCommittedPoint": null,
953
+ "startBinding": null,
954
+ "endBinding": {
955
+ "elementId": "VaeeYk5sVGivRgHn8KaCo",
956
+ "focus": -0.08861558743629552,
957
+ "gap": 1.163726888744577
958
+ },
959
+ "startArrowhead": null,
960
+ "endArrowhead": "arrow",
961
+ "elbowed": false
962
+ },
963
+ {
964
+ "id": "Xfbn92OkPEM6ZzK149DxS",
965
+ "type": "rectangle",
966
+ "x": 924.7045940758203,
967
+ "y": 697.1442769037981,
968
+ "width": 145.73046875,
969
+ "height": 85,
970
+ "angle": 0,
971
+ "strokeColor": "#1e1e1e",
972
+ "backgroundColor": "transparent",
973
+ "fillStyle": "solid",
974
+ "strokeWidth": 2,
975
+ "strokeStyle": "solid",
976
+ "roughness": 1,
977
+ "opacity": 100,
978
+ "groupIds": [],
979
+ "frameId": null,
980
+ "index": "aR",
981
+ "roundness": {
982
+ "type": 3
983
+ },
984
+ "seed": 242743832,
985
+ "version": 359,
986
+ "versionNonce": 81002264,
987
+ "isDeleted": false,
988
+ "boundElements": [
989
+ {
990
+ "id": "TFd3QnaQ0gbgIILWoCbfK",
991
+ "type": "arrow"
992
+ },
993
+ {
994
+ "type": "text",
995
+ "id": "AANZteIq6UfeoE2UXHRri"
996
+ }
997
+ ],
998
+ "updated": 1747664916082,
999
+ "link": null,
1000
+ "locked": false
1001
+ },
1002
+ {
1003
+ "id": "AANZteIq6UfeoE2UXHRri",
1004
+ "type": "text",
1005
+ "x": 937.3798641563867,
1006
+ "y": 702.1442769037981,
1007
+ "width": 120.37992858886719,
1008
+ "height": 75,
1009
+ "angle": 0,
1010
+ "strokeColor": "#1e1e1e",
1011
+ "backgroundColor": "transparent",
1012
+ "fillStyle": "solid",
1013
+ "strokeWidth": 2,
1014
+ "strokeStyle": "solid",
1015
+ "roughness": 1,
1016
+ "opacity": 100,
1017
+ "groupIds": [],
1018
+ "frameId": null,
1019
+ "index": "aS",
1020
+ "roundness": null,
1021
+ "seed": 916518680,
1022
+ "version": 512,
1023
+ "versionNonce": 1832147992,
1024
+ "isDeleted": false,
1025
+ "boundElements": [],
1026
+ "updated": 1747664916082,
1027
+ "link": null,
1028
+ "locked": false,
1029
+ "text": "Get top-k\nfrom Chroma\nvector db",
1030
+ "fontSize": 20,
1031
+ "fontFamily": 5,
1032
+ "textAlign": "center",
1033
+ "verticalAlign": "middle",
1034
+ "containerId": "Xfbn92OkPEM6ZzK149DxS",
1035
+ "originalText": "Get top-k from Chroma vector db",
1036
+ "autoResize": true,
1037
+ "lineHeight": 1.25
1038
+ },
1039
+ {
1040
+ "id": "TFd3QnaQ0gbgIILWoCbfK",
1041
+ "type": "arrow",
1042
+ "x": 992.3775307667829,
1043
+ "y": 635.6481831537981,
1044
+ "width": 0.7612717529503357,
1045
+ "height": 60.332366861255366,
1046
+ "angle": 0,
1047
+ "strokeColor": "#1e1e1e",
1048
+ "backgroundColor": "transparent",
1049
+ "fillStyle": "solid",
1050
+ "strokeWidth": 2,
1051
+ "strokeStyle": "solid",
1052
+ "roughness": 1,
1053
+ "opacity": 100,
1054
+ "groupIds": [],
1055
+ "frameId": null,
1056
+ "index": "aT",
1057
+ "roundness": {
1058
+ "type": 2
1059
+ },
1060
+ "seed": 1252152344,
1061
+ "version": 652,
1062
+ "versionNonce": 138020712,
1063
+ "isDeleted": false,
1064
+ "boundElements": [],
1065
+ "updated": 1747664916200,
1066
+ "link": null,
1067
+ "locked": false,
1068
+ "points": [
1069
+ [
1070
+ 0,
1071
+ 0
1072
+ ],
1073
+ [
1074
+ -0.7612717529503357,
1075
+ 60.332366861255366
1076
+ ]
1077
+ ],
1078
+ "lastCommittedPoint": null,
1079
+ "startBinding": null,
1080
+ "endBinding": {
1081
+ "elementId": "Xfbn92OkPEM6ZzK149DxS",
1082
+ "focus": -0.08861558743629806,
1083
+ "gap": 1.163726888744577
1084
+ },
1085
+ "startArrowhead": null,
1086
+ "endArrowhead": "arrow",
1087
+ "elbowed": false
1088
+ },
1089
+ {
1090
+ "id": "2iNtOdIxMDmUgNRqT2SHv",
1091
+ "type": "rectangle",
1092
+ "x": 929.2280315758203,
1093
+ "y": 846.8381009518404,
1094
+ "width": 145.73046875,
1095
+ "height": 85,
1096
+ "angle": 0,
1097
+ "strokeColor": "#1e1e1e",
1098
+ "backgroundColor": "#ffec99",
1099
+ "fillStyle": "solid",
1100
+ "strokeWidth": 2,
1101
+ "strokeStyle": "solid",
1102
+ "roughness": 1,
1103
+ "opacity": 100,
1104
+ "groupIds": [],
1105
+ "frameId": null,
1106
+ "index": "aU",
1107
+ "roundness": {
1108
+ "type": 3
1109
+ },
1110
+ "seed": 418077208,
1111
+ "version": 411,
1112
+ "versionNonce": 959427352,
1113
+ "isDeleted": false,
1114
+ "boundElements": [
1115
+ {
1116
+ "id": "mt3dUdBKJks2RtTSQqE1X",
1117
+ "type": "arrow"
1118
+ },
1119
+ {
1120
+ "type": "text",
1121
+ "id": "PO5LIHoHfyLfjDsOD0DAw"
1122
+ }
1123
+ ],
1124
+ "updated": 1747664916082,
1125
+ "link": null,
1126
+ "locked": false
1127
+ },
1128
+ {
1129
+ "id": "PO5LIHoHfyLfjDsOD0DAw",
1130
+ "type": "text",
1131
+ "x": 947.3633007408594,
1132
+ "y": 864.3381009518404,
1133
+ "width": 109.45993041992188,
1134
+ "height": 50,
1135
+ "angle": 0,
1136
+ "strokeColor": "#1e1e1e",
1137
+ "backgroundColor": "transparent",
1138
+ "fillStyle": "solid",
1139
+ "strokeWidth": 2,
1140
+ "strokeStyle": "solid",
1141
+ "roughness": 1,
1142
+ "opacity": 100,
1143
+ "groupIds": [],
1144
+ "frameId": null,
1145
+ "index": "aV",
1146
+ "roundness": null,
1147
+ "seed": 1537759000,
1148
+ "version": 610,
1149
+ "versionNonce": 66687000,
1150
+ "isDeleted": false,
1151
+ "boundElements": [],
1152
+ "updated": 1747664916082,
1153
+ "link": null,
1154
+ "locked": false,
1155
+ "text": "Add Galileo\nevals",
1156
+ "fontSize": 20,
1157
+ "fontFamily": 5,
1158
+ "textAlign": "center",
1159
+ "verticalAlign": "middle",
1160
+ "containerId": "2iNtOdIxMDmUgNRqT2SHv",
1161
+ "originalText": "Add Galileo evals",
1162
+ "autoResize": true,
1163
+ "lineHeight": 1.25
1164
+ },
1165
+ {
1166
+ "id": "mt3dUdBKJks2RtTSQqE1X",
1167
+ "type": "arrow",
1168
+ "x": 996.6509682667829,
1169
+ "y": 785.3420072018404,
1170
+ "width": 0.7612717529503357,
1171
+ "height": 60.332366861255366,
1172
+ "angle": 0,
1173
+ "strokeColor": "#1e1e1e",
1174
+ "backgroundColor": "transparent",
1175
+ "fillStyle": "solid",
1176
+ "strokeWidth": 2,
1177
+ "strokeStyle": "solid",
1178
+ "roughness": 1,
1179
+ "opacity": 100,
1180
+ "groupIds": [],
1181
+ "frameId": null,
1182
+ "index": "aW",
1183
+ "roundness": {
1184
+ "type": 2
1185
+ },
1186
+ "seed": 398308376,
1187
+ "version": 754,
1188
+ "versionNonce": 223664744,
1189
+ "isDeleted": false,
1190
+ "boundElements": [],
1191
+ "updated": 1747664916200,
1192
+ "link": null,
1193
+ "locked": false,
1194
+ "points": [
1195
+ [
1196
+ 0,
1197
+ 0
1198
+ ],
1199
+ [
1200
+ -0.7612717529503357,
1201
+ 60.332366861255366
1202
+ ]
1203
+ ],
1204
+ "lastCommittedPoint": null,
1205
+ "startBinding": null,
1206
+ "endBinding": {
1207
+ "elementId": "2iNtOdIxMDmUgNRqT2SHv",
1208
+ "focus": -0.09202151247941287,
1209
+ "gap": 1.1637268887446908
1210
+ },
1211
+ "startArrowhead": null,
1212
+ "endArrowhead": "arrow",
1213
+ "elbowed": false
1214
+ },
1215
+ {
1216
+ "id": "_l8h4crDy_ePJQQpee6SU",
1217
+ "type": "arrow",
1218
+ "x": 998.8268626084431,
1219
+ "y": 931.5477478779942,
1220
+ "width": 0.7612717529503357,
1221
+ "height": 60.332366861255366,
1222
+ "angle": 0,
1223
+ "strokeColor": "#1e1e1e",
1224
+ "backgroundColor": "transparent",
1225
+ "fillStyle": "solid",
1226
+ "strokeWidth": 2,
1227
+ "strokeStyle": "solid",
1228
+ "roughness": 1,
1229
+ "opacity": 100,
1230
+ "groupIds": [],
1231
+ "frameId": null,
1232
+ "index": "aX",
1233
+ "roundness": {
1234
+ "type": 2
1235
+ },
1236
+ "seed": 1835770648,
1237
+ "version": 731,
1238
+ "versionNonce": 403793688,
1239
+ "isDeleted": false,
1240
+ "boundElements": [],
1241
+ "updated": 1747664916082,
1242
+ "link": null,
1243
+ "locked": false,
1244
+ "points": [
1245
+ [
1246
+ 0,
1247
+ 0
1248
+ ],
1249
+ [
1250
+ -0.7612717529503357,
1251
+ 60.332366861255366
1252
+ ]
1253
+ ],
1254
+ "lastCommittedPoint": null,
1255
+ "startBinding": null,
1256
+ "endBinding": null,
1257
+ "startArrowhead": null,
1258
+ "endArrowhead": "arrow",
1259
+ "elbowed": false
1260
+ },
1261
+ {
1262
+ "id": "uOvtezevF7hRnF5IhUdTk",
1263
+ "type": "text",
1264
+ "x": 877.2417034508203,
1265
+ "y": 1150.0626582778248,
1266
+ "width": 230.87986755371094,
1267
+ "height": 25,
1268
+ "angle": 0,
1269
+ "strokeColor": "#1e1e1e",
1270
+ "backgroundColor": "transparent",
1271
+ "fillStyle": "solid",
1272
+ "strokeWidth": 2,
1273
+ "strokeStyle": "solid",
1274
+ "roughness": 1,
1275
+ "opacity": 100,
1276
+ "groupIds": [],
1277
+ "frameId": null,
1278
+ "index": "aY",
1279
+ "roundness": null,
1280
+ "seed": 1347031320,
1281
+ "version": 146,
1282
+ "versionNonce": 2103029272,
1283
+ "isDeleted": false,
1284
+ "boundElements": null,
1285
+ "updated": 1747665658987,
1286
+ "link": null,
1287
+ "locked": false,
1288
+ "text": "Show top-k + eval in UI",
1289
+ "fontSize": 20,
1290
+ "fontFamily": 5,
1291
+ "textAlign": "left",
1292
+ "verticalAlign": "top",
1293
+ "containerId": null,
1294
+ "originalText": "Show top-k + eval in UI",
1295
+ "autoResize": true,
1296
+ "lineHeight": 1.25
1297
+ },
1298
+ {
1299
+ "id": "95VkcTxIBcZMz2gr1CcCo",
1300
+ "type": "text",
1301
+ "x": 398.624890941522,
1302
+ "y": 479.95474812157477,
1303
+ "width": 389.2597351074219,
1304
+ "height": 150,
1305
+ "angle": 0,
1306
+ "strokeColor": "#1e1e1e",
1307
+ "backgroundColor": "transparent",
1308
+ "fillStyle": "solid",
1309
+ "strokeWidth": 2,
1310
+ "strokeStyle": "solid",
1311
+ "roughness": 1,
1312
+ "opacity": 100,
1313
+ "groupIds": [],
1314
+ "frameId": null,
1315
+ "index": "ac",
1316
+ "roundness": null,
1317
+ "seed": 1184416536,
1318
+ "version": 117,
1319
+ "versionNonce": 251684456,
1320
+ "isDeleted": false,
1321
+ "boundElements": null,
1322
+ "updated": 1747664921990,
1323
+ "link": null,
1324
+ "locked": false,
1325
+ "text": "- Phase 1: Simple RAG with Galileo evals\n\n- Phase 2: Agentic RAG with ability to \nself-correct and/with Galileo evals\n\n- Phase 3: TBD",
1326
+ "fontSize": 20,
1327
+ "fontFamily": 5,
1328
+ "textAlign": "left",
1329
+ "verticalAlign": "top",
1330
+ "containerId": null,
1331
+ "originalText": "- Phase 1: Simple RAG with Galileo evals\n\n- Phase 2: Agentic RAG with ability to \nself-correct and/with Galileo evals\n\n- Phase 3: TBD",
1332
+ "autoResize": true,
1333
+ "lineHeight": 1.25
1334
+ },
1335
+ {
1336
+ "id": "X0ww4FIuAjwX6E_HhWLpX",
1337
+ "type": "text",
1338
+ "x": 524.1010784508203,
1339
+ "y": 398.76187702782477,
1340
+ "width": 77.95994567871094,
1341
+ "height": 25,
1342
+ "angle": 0,
1343
+ "strokeColor": "#1e1e1e",
1344
+ "backgroundColor": "transparent",
1345
+ "fillStyle": "solid",
1346
+ "strokeWidth": 2,
1347
+ "strokeStyle": "solid",
1348
+ "roughness": 1,
1349
+ "opacity": 100,
1350
+ "groupIds": [],
1351
+ "frameId": null,
1352
+ "index": "ae",
1353
+ "roundness": null,
1354
+ "seed": 142996072,
1355
+ "version": 8,
1356
+ "versionNonce": 1670002536,
1357
+ "isDeleted": false,
1358
+ "boundElements": null,
1359
+ "updated": 1747664926713,
1360
+ "link": null,
1361
+ "locked": false,
1362
+ "text": "PHASES",
1363
+ "fontSize": 20,
1364
+ "fontFamily": 5,
1365
+ "textAlign": "left",
1366
+ "verticalAlign": "top",
1367
+ "containerId": null,
1368
+ "originalText": "PHASES",
1369
+ "autoResize": true,
1370
+ "lineHeight": 1.25
1371
+ },
1372
+ {
1373
+ "id": "umtwNm9fcxd1bsz6uIFH3",
1374
+ "type": "text",
1375
+ "x": -196.4262652991797,
1376
+ "y": 894.7618770278248,
1377
+ "width": 216.8598175048828,
1378
+ "height": 25,
1379
+ "angle": 0,
1380
+ "strokeColor": "#1e1e1e",
1381
+ "backgroundColor": "transparent",
1382
+ "fillStyle": "solid",
1383
+ "strokeWidth": 2,
1384
+ "strokeStyle": "solid",
1385
+ "roughness": 1,
1386
+ "opacity": 100,
1387
+ "groupIds": [],
1388
+ "frameId": null,
1389
+ "index": "af",
1390
+ "roundness": null,
1391
+ "seed": 1217313560,
1392
+ "version": 41,
1393
+ "versionNonce": 728632680,
1394
+ "isDeleted": false,
1395
+ "boundElements": null,
1396
+ "updated": 1747665493854,
1397
+ "link": null,
1398
+ "locked": false,
1399
+ "text": "Fuzzy search in Milvus",
1400
+ "fontSize": 20,
1401
+ "fontFamily": 5,
1402
+ "textAlign": "left",
1403
+ "verticalAlign": "top",
1404
+ "containerId": null,
1405
+ "originalText": "Fuzzy search in Milvus",
1406
+ "autoResize": true,
1407
+ "lineHeight": 1.25
1408
+ },
1409
+ {
1410
+ "id": "7wPfS9EWewDNqNV2lGfYX",
1411
+ "type": "rectangle",
1412
+ "x": 927.0092815758203,
1413
+ "y": 998.382970777825,
1414
+ "width": 145.73046875,
1415
+ "height": 110,
1416
+ "angle": 0,
1417
+ "strokeColor": "#1e1e1e",
1418
+ "backgroundColor": "#ffec99",
1419
+ "fillStyle": "solid",
1420
+ "strokeWidth": 2,
1421
+ "strokeStyle": "solid",
1422
+ "roughness": 1,
1423
+ "opacity": 100,
1424
+ "groupIds": [],
1425
+ "frameId": null,
1426
+ "index": "ag",
1427
+ "roundness": {
1428
+ "type": 3
1429
+ },
1430
+ "seed": 974709272,
1431
+ "version": 484,
1432
+ "versionNonce": 334137624,
1433
+ "isDeleted": false,
1434
+ "boundElements": [
1435
+ {
1436
+ "type": "text",
1437
+ "id": "lYwockrM6JTcy0vqjVEyE"
1438
+ }
1439
+ ],
1440
+ "updated": 1747665654880,
1441
+ "link": null,
1442
+ "locked": false
1443
+ },
1444
+ {
1445
+ "id": "lYwockrM6JTcy0vqjVEyE",
1446
+ "type": "text",
1447
+ "x": 941.7745632530664,
1448
+ "y": 1003.382970777825,
1449
+ "width": 116.19990539550781,
1450
+ "height": 100,
1451
+ "angle": 0,
1452
+ "strokeColor": "#1e1e1e",
1453
+ "backgroundColor": "transparent",
1454
+ "fillStyle": "solid",
1455
+ "strokeWidth": 2,
1456
+ "strokeStyle": "solid",
1457
+ "roughness": 1,
1458
+ "opacity": 100,
1459
+ "groupIds": [],
1460
+ "frameId": null,
1461
+ "index": "ah",
1462
+ "roundness": null,
1463
+ "seed": 233103128,
1464
+ "version": 743,
1465
+ "versionNonce": 545124632,
1466
+ "isDeleted": false,
1467
+ "boundElements": [],
1468
+ "updated": 1747665655863,
1469
+ "link": null,
1470
+ "locked": false,
1471
+ "text": "Check\ncontextual\nrelevance,\nplan actions",
1472
+ "fontSize": 20,
1473
+ "fontFamily": 5,
1474
+ "textAlign": "center",
1475
+ "verticalAlign": "middle",
1476
+ "containerId": "7wPfS9EWewDNqNV2lGfYX",
1477
+ "originalText": "Check contextual relevance, plan actions",
1478
+ "autoResize": true,
1479
+ "lineHeight": 1.25
1480
+ },
1481
+ {
1482
+ "id": "q8QUVhGqw9gk_AYyPZ-7P",
1483
+ "type": "text",
1484
+ "x": -193.6059527991797,
1485
+ "y": 415.57828327782477,
1486
+ "width": 204.9798583984375,
1487
+ "height": 75,
1488
+ "angle": 0,
1489
+ "strokeColor": "#1e1e1e",
1490
+ "backgroundColor": "transparent",
1491
+ "fillStyle": "solid",
1492
+ "strokeWidth": 2,
1493
+ "strokeStyle": "solid",
1494
+ "roughness": 1,
1495
+ "opacity": 100,
1496
+ "groupIds": [],
1497
+ "frameId": null,
1498
+ "index": "aj",
1499
+ "roundness": null,
1500
+ "seed": 797414424,
1501
+ "version": 105,
1502
+ "versionNonce": 626327832,
1503
+ "isDeleted": false,
1504
+ "boundElements": null,
1505
+ "updated": 1747666036988,
1506
+ "link": null,
1507
+ "locked": false,
1508
+ "text": "- Try docling\nbetter than pymupdf\n- export in markdown",
1509
+ "fontSize": 20,
1510
+ "fontFamily": 5,
1511
+ "textAlign": "left",
1512
+ "verticalAlign": "top",
1513
+ "containerId": null,
1514
+ "originalText": "- Try docling\nbetter than pymupdf\n- export in markdown",
1515
+ "autoResize": true,
1516
+ "lineHeight": 1.25
1517
+ },
1518
+ {
1519
+ "id": "vrvvrH4qCb-xGQbERHh1C",
1520
+ "type": "text",
1521
+ "x": -173.8207965491797,
1522
+ "y": 809.8126582778248,
1523
+ "width": 153.69985961914062,
1524
+ "height": 50,
1525
+ "angle": 0,
1526
+ "strokeColor": "#1e1e1e",
1527
+ "backgroundColor": "transparent",
1528
+ "fillStyle": "solid",
1529
+ "strokeWidth": 2,
1530
+ "strokeStyle": "solid",
1531
+ "roughness": 1,
1532
+ "opacity": 100,
1533
+ "groupIds": [],
1534
+ "frameId": null,
1535
+ "index": "ak",
1536
+ "roundness": null,
1537
+ "seed": 1771655704,
1538
+ "version": 40,
1539
+ "versionNonce": 2071710824,
1540
+ "isDeleted": false,
1541
+ "boundElements": null,
1542
+ "updated": 1747666171125,
1543
+ "link": null,
1544
+ "locked": false,
1545
+ "text": "Jina embeddings\n1024",
1546
+ "fontSize": 20,
1547
+ "fontFamily": 5,
1548
+ "textAlign": "left",
1549
+ "verticalAlign": "top",
1550
+ "containerId": null,
1551
+ "originalText": "Jina embeddings\n1024",
1552
+ "autoResize": true,
1553
+ "lineHeight": 1.25
1554
+ }
1555
+ ],
1556
+ "appState": {
1557
+ "gridSize": 20,
1558
+ "gridStep": 5,
1559
+ "gridModeEnabled": false,
1560
+ "viewBackgroundColor": "#ffffff"
1561
+ },
1562
+ "files": {}
1563
+ }
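The Excalidraw notes above describe the retrieval loop: convert the user question to an embedding, fetch the top-k chunks from the vector DB, then layer Galileo evals on top. A minimal, dependency-free sketch of the retrieval step — toy 3-dimensional vectors stand in for the real 1024-dimensional Jina/sentence-transformer embeddings, and the in-memory list stands in for Chroma/Milvus:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    # chunks: list of (text, embedding) pairs, as a vector DB would store them.
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in scored[:k]]

# Hypothetical corpus; real chunks would come from the processed fin-data files.
chunks = [
    ("revenue grew 10%", [0.9, 0.1, 0.0]),
    ("office dog policy", [0.0, 0.2, 0.9]),
    ("net income fell",   [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], chunks))  # the two finance chunks rank first
```

In the actual pipeline the query embedding and chunk embeddings come from the same model (the notes mention Jina at dimension 1024, with Matryoshka truncation as an option), and the vector DB performs this nearest-neighbour search server-side.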
fin-data/processed/test_file ADDED
File without changes
fin-data/processed/vector_db/test_file ADDED
File without changes
main.py ADDED
@@ -0,0 +1,6 @@
+ def main():
+     print("Hello from galileo-poc!")
+
+
+ if __name__ == "__main__":
+     main()
pyproject.toml ADDED
@@ -0,0 +1,27 @@
+ [project]
+ name = "galileo-poc"
+ version = "0.1.0"
+ description = "Add your description here"
+ readme = "README.md"
+ requires-python = ">=3.12.10"
+ dependencies = [
+     "fastapi>=0.115.9",
+     "galileo-observe>=1.23.0",
+     "galileo-protect>=0.17.1",
+     "google-genai>=1.19.0",
+     "langchain-text-splitters>=0.3.8",
+     "openai>=1.95.1",
+     "pandas>=2.3.0",
+     "promptquality>=1.11.3",
+     "pymilvus>=2.5.11",
+     "pymupdf>=1.26.0",
+     "pymupdf4llm>=0.0.24",
+     "python-dotenv>=1.1.0",
+     "python-multipart>=0.0.20",
+     "requests>=2.32.4",
+     "sentence-transformers>=4.1.0",
+     "streamlit>=1.45.1",
+     "torch>=2.7.1",
+     "uvicorn>=0.34.3",
+     "watchdog>=6.0.0",
+ ]
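With this `pyproject.toml` and the committed `uv.lock`, local setup mirrors what the Dockerfile does (commands assume `uv` is installed; the uvicorn entrypoint matches the Dockerfile's CMD, and the Streamlit path is this repo's `ui/app.py`):

```shell
# Install locked dependencies into a local virtual environment
uv sync --frozen

# Run the FastAPI backend (same entrypoint as the Dockerfile CMD)
uv run uvicorn backend.api.main:app --host 0.0.0.0 --port 8000

# In another terminal, run the Streamlit UI against the backend
uv run streamlit run ui/app.py
```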
requirements.txt ADDED
@@ -0,0 +1,20 @@
+ fastapi
+ uvicorn
+ requests
+ streamlit
+ python-dotenv
+ PyMuPDF
+ chromadb
+ google-genai
+ pandas
+ pymilvus
+ docling
+ sentence-transformers
+ torch
+ langchain-text-splitters
+ watchdog
+ galileo-observe>=1.23.0
+ galileo-protect>=0.17.1
+ pymupdf4llm
+ python-multipart
+ promptquality
ui/app.py ADDED
@@ -0,0 +1,52 @@
+ import streamlit as st
+ import requests
+ from dotenv import load_dotenv
+
+ # Load environment variables
+ load_dotenv()
+
+
+ if __name__ == "__main__":
+     # Streamlit UI
+     st.title("RFP Q/A")
+
+     # Sidebar for settings
+     with st.sidebar:
+         st.title("Settings")
+         st.divider()
+         protect_enabled = st.toggle("Galileo Protect", value=False,
+                                     help="Enable content protection with Galileo")
+
+     # Callback run when Enter is pressed in the text input
+     def on_enter():
+         st.session_state["run_search"] = True
+
+     # User input
+     st.text_input("Ask your question:", key="query", on_change=on_enter)
+
+     # Search button
+     if st.button("Search"):
+         st.session_state["run_search"] = True
+
+     # Run the search if triggered by the button or by Enter
+     if st.session_state.get("run_search"):
+         st.session_state["run_search"] = False  # reset so the search does not repeat on every rerun
+         try:
+             query = st.session_state.get("query", "").strip()
+             # Make request to the FastAPI endpoint
+             response = requests.post(
+                 "http://localhost:8000/rag/search",
+                 json={"query": query, "protect_enabled": protect_enabled},
+                 timeout=60,
+             )
+
+             if response.status_code == 200:
+                 result = response.json()
+                 st.success("Search Results:")
+                 st.write(result["response"])
+             else:
+                 st.error(f"Error: {response.text}")
+
+         except Exception as e:
+             st.error(f"Error: {e}")
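The UI above posts `{"query": ..., "protect_enabled": ...}` to `/rag/search` and expects a JSON body with a `response` field. The real handler lives in `backend/api/main.py`, which is not shown in this diff; the following is only a framework-free sketch of that request/response contract, with all names and behaviour assumed for illustration:

```python
def rag_search(payload: dict) -> dict:
    """Hypothetical stand-in for the POST /rag/search handler the UI calls."""
    query = payload.get("query", "").strip()
    if not query:
        return {"response": "Please enter a question."}
    # The real pipeline would: embed the query, fetch top-k chunks from the
    # vector DB, generate an answer, and (when protect_enabled is set) pass
    # the answer through Galileo Protect before returning it.
    answer = f"(stub answer for: {query})"
    return {"response": answer}

print(rag_search({"query": "What was Q4 revenue?", "protect_enabled": False}))
```

Keeping the contract this small means the Streamlit client only ever needs `result["response"]`, which is exactly what `st.write` renders above.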
uv.lock ADDED
The diff for this file is too large to render. See raw diff