---
title: HR Intervals Chatbot
emoji: 💼
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
---
# HR Intervals AI Assistant

A **RAG-powered chatbot** that provides HR knowledge and policy guidance for non-profit organizations. Built with LangChain, OpenAI, Qdrant, Firecrawl, and Gradio, with optional LangSmith observability.
---

## Table of Contents

- [Features](#features)
- [Architecture Overview](#architecture-overview)
- [Project Structure](#project-structure)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Configuration](#configuration)
  - [OpenAI](#openai)
  - [Qdrant](#qdrant)
  - [Firecrawl](#firecrawl)
  - [LangSmith](#langsmith)
- [Deployment Model](#deployment-model)
- [Running the Application](#running-the-application)
- [Admin Interface](#admin-interface)
- [How It Works](#how-it-works)
- [Embedding the Chatbot Widget](#embedding-the-chatbot-widget)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [Deploying to Hugging Face Spaces](#deploying-to-hugging-face-spaces)

---
## Features

- **AI-powered Q&A** – Answers HR questions using a retrieval-augmented generation (RAG) pipeline
- **Source citations** – Every answer includes references to the documents it was derived from
- **Web scraping** – Ingest web pages into the knowledge base via Firecrawl
- **Document ingestion** – Upload PDF and DOCX files through the admin interface
- **PII detection** – Warns users when personally identifiable information is detected in queries
- **Conversation memory** – Maintains context across multi-turn conversations with session-based history
- **Multi-query retrieval** – Expands user queries for better document matching
- **Admin dashboard** – Manage documents, scrape URLs, and monitor the knowledge base
- **Observability (optional)** – Trace and debug the full RAG pipeline with LangSmith

---
## Architecture Overview

```
┌─────────────┐      ┌────────────────┐      ┌──────────────┐
│   Gradio    │─────▶│   RAG Chain    │─────▶│    OpenAI    │
│   (User)    │      │  (LangChain)   │      │    GPT-4o    │
└─────────────┘      └───────┬────────┘      └──────────────┘
                             │
                     ┌───────▼────────┐
                     │     Qdrant     │
                     │  Vector Store  │
                     └───────▲────────┘
                             │
              ┌──────────────┴──────────────┐
              │                             │
      ┌───────▼────────┐           ┌────────▼─────────┐
      │   PDF / DOCX   │           │     Firecrawl    │
      │   Ingestion    │           │   Web Scraping   │
      └────────────────┘           └──────────────────┘
```
**Data flow:**

1. Documents (PDF, DOCX) and web pages are ingested, chunked, and embedded using OpenAI embeddings.
2. Embeddings are stored in a Qdrant vector database.
3. When a user asks a question, the query is expanded via MultiQueryRetriever, matched against stored embeddings, and the top-k results are used as context.
4. OpenAI GPT-4o generates a grounded answer using the retrieved context.
5. (Optional) The entire pipeline is traced and logged in LangSmith.
---

## Project Structure

```
hr-intervals-chatbot/
├── app.py                  # User-facing chat interface (Gradio, port 7860)
├── admin.py                # Admin dashboard (Gradio, port 7861)
├── chatbot-widget.html     # Embeddable HTML widget for the chatbot
├── requirements.txt        # Python dependencies
├── .env                    # Environment variables (not committed)
├── src/
│   ├── __init__.py
│   ├── chatbot.py          # RAG chain construction and query handling
│   ├── ingestion.py        # PDF/DOCX document processing
│   ├── scraper.py          # Firecrawl web scraping
│   └── vector_store.py     # Qdrant vector store utilities
└── tests/
    └── test_connections.py # API connection verification tests
```
---

## Prerequisites

- **Python 3.10+**
- An **OpenAI** account with API access
- A **Qdrant Cloud** account (or self-hosted Qdrant instance)
- A **Firecrawl** account for web scraping
- (Optional) A **LangSmith** account for observability

---
## Installation

1. **Clone the repository:**

   ```bash
   git clone https://huggingface.co/spaces/pikamomo/hr-intervals-chatbot
   cd hr-intervals-chatbot
   ```
2. **Create and activate a virtual environment:**

   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # macOS / Linux
   source venv/bin/activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```
4. **Create your `.env` file** (see [Configuration](#configuration) below):

   ```dotenv
   OPENAI_API_KEY=""
   OPEN_AI_EMBEDDING_MODEL=""
   OPEN_AI_CHAT_MODEL=""
   QDRANT_URL=""
   QDRANT_API_KEY=""
   QDRANT_COLLECTION=""
   LANGSMITH_TRACING=""
   LANGSMITH_ENDPOINT=""
   LANGSMITH_API_KEY=""
   LANGSMITH_PROJECT=""
   FIRECRAWL_API_KEY=""
   ```
---

## Configuration

Create a `.env` file in the project root with the following variables. All services are configured exclusively through environment variables.
### OpenAI

OpenAI powers both the **embedding model** (for converting text into vectors) and the **chat model** (for generating answers).

| Variable                  | Required | Default                  | Description                                           |
| ------------------------- | -------- | ------------------------ | ----------------------------------------------------- |
| `OPENAI_API_KEY`          | Yes      | –                        | Your OpenAI API key                                   |
| `OPEN_AI_EMBEDDING_MODEL` | No       | `text-embedding-3-small` | Embedding model for vectorizing documents and queries |
| `OPEN_AI_CHAT_MODEL`      | No       | `gpt-4o`                 | Chat model for generating RAG answers                 |

**How it's used:**

- **Embeddings** (`text-embedding-3-small`): Every document chunk and user query is embedded using this model before being stored in or searched against Qdrant. The `OpenAIEmbeddings` class from `langchain-openai` handles this.
- **Chat completions** (`gpt-4o`): After relevant document chunks are retrieved, they are passed as context to the chat model along with a system prompt that instructs it to act as an HR assistant. The model generates answers at `temperature=0.3` for factual consistency.

```dotenv
OPENAI_API_KEY="sk-proj-your-key-here"
OPEN_AI_EMBEDDING_MODEL=text-embedding-3-small
OPEN_AI_CHAT_MODEL=gpt-4o
```

> **Tip:** You can switch to `text-embedding-3-large` for higher-quality embeddings or `gpt-4o-mini` for lower-cost chat completions. If you change the embedding model, you must re-ingest all documents, since vector dimensions will differ.
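The fallback behavior in the table above can be sketched in a few lines. This is an illustration, not the repository's exact code; the resolved names would then be passed to `OpenAIEmbeddings(model=...)` and `ChatOpenAI(model=..., temperature=0.3)` from `langchain-openai`:

```python
import os

def resolve_models(env=None):
    """Resolve model names from the environment, falling back to the
    documented defaults. `env` is injectable for testing."""
    env = os.environ if env is None else env
    embedding = env.get("OPEN_AI_EMBEDDING_MODEL") or "text-embedding-3-small"
    chat = env.get("OPEN_AI_CHAT_MODEL") or "gpt-4o"
    return embedding, chat
```

Using `or` rather than a plain `get` default means an empty string in `.env` (as in the installation template) still falls back to the documented default.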
---

### Qdrant

Qdrant is the **vector database** that stores document embeddings and enables similarity search.

| Variable            | Required | Default        | Description                                        |
| ------------------- | -------- | -------------- | -------------------------------------------------- |
| `QDRANT_URL`        | Yes      | –              | URL of your Qdrant instance (cloud or self-hosted) |
| `QDRANT_API_KEY`    | Yes      | –              | API key for authenticating with Qdrant             |
| `QDRANT_COLLECTION` | No       | `hr-intervals` | Name of the vector collection                      |

**How it's used:**

- **Storage**: Document chunks are embedded and stored as points in a Qdrant collection. Each point contains the vector embedding plus metadata (`source`, `type`, `upload_date`, etc.).
- **Retrieval**: When a user asks a question, the query is embedded and a cosine similarity search retrieves the top 8 most relevant chunks.
- **Management**: The admin interface interacts directly with Qdrant to list, filter, and delete documents by source.

**Setting up Qdrant Cloud:**

1. Sign up at [cloud.qdrant.io](https://cloud.qdrant.io)
2. Create a new cluster (the free tier works for development)
3. Create an API key from the cluster dashboard
4. Copy the cluster URL and API key into your `.env`

```dotenv
QDRANT_URL="https://your-cluster-id.region.cloud.qdrant.io:6333"
QDRANT_API_KEY="your-qdrant-api-key"
QDRANT_COLLECTION="hr-intervals"
```

> **Note:** The collection is created automatically when you first ingest documents. You do not need to create it manually.
---

### Firecrawl

Firecrawl is a **web scraping service** that converts web pages into clean markdown, ideal for ingesting online HR resources, policies, and articles into the knowledge base.

| Variable            | Required | Default | Description            |
| ------------------- | -------- | ------- | ---------------------- |
| `FIRECRAWL_API_KEY` | Yes\*    | –       | Your Firecrawl API key |

_\*Required only if you plan to use the web scraping feature._

**How it's used:**

- **Single URL scraping**: From the admin dashboard, enter a URL and Firecrawl fetches the page content as markdown.
- **Batch scraping**: Paste multiple URLs (one per line) to scrape several pages at once.
- **Duplicate detection**: Before scraping, the system checks whether a URL has already been ingested to prevent duplicates.
- **Pipeline**: Scraped markdown is split into chunks (1,000 characters, 200 overlap), embedded via OpenAI, and stored in Qdrant with `type: "webpage"` metadata.
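The duplicate check could look something like the sketch below, which canonicalizes URLs before comparing them against sources already stored in Qdrant. The function names are illustrative; the actual implementation in `src/scraper.py` may differ.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivial variants (case, trailing slash,
    fragment) compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query string, which may be significant.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))

def is_duplicate(url: str, known_sources) -> bool:
    """True if `url` matches any already-ingested source."""
    return normalize_url(url) in {normalize_url(u) for u in known_sources}
```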
**Setting up Firecrawl:**

1. Sign up at [firecrawl.dev](https://www.firecrawl.dev)
2. Get your API key from the dashboard
3. Add it to your `.env`

```dotenv
FIRECRAWL_API_KEY="fc-your-firecrawl-api-key"
```
---

### LangSmith

LangSmith provides **observability and tracing** for the entire LangChain pipeline. When enabled, every chain invocation, from query expansion to retrieval to generation, is logged and can be inspected in the LangSmith dashboard.

| Variable             | Required | Default                           | Description                         |
| -------------------- | -------- | --------------------------------- | ----------------------------------- |
| `LANGSMITH_TRACING`  | No       | `false`                           | Set to `true` to enable tracing     |
| `LANGSMITH_ENDPOINT` | No       | `https://api.smith.langchain.com` | LangSmith API endpoint              |
| `LANGSMITH_API_KEY`  | No       | –                                 | Your LangSmith API key              |
| `LANGSMITH_PROJECT`  | No       | `hr-intervals-chatbot`            | Project name in LangSmith dashboard |

**How it's used:**

- When `LANGSMITH_TRACING=true`, LangChain **automatically** sends trace data to LangSmith for every chain execution. No code changes are needed; LangChain detects these environment variables at runtime.
- Traces include: input queries, retrieved documents, prompt templates, LLM responses, latency, token usage, and errors.
- Use the LangSmith dashboard to debug retrieval quality, monitor token costs, and identify slow chain steps.

**Setting up LangSmith:**

1. Sign up at [smith.langchain.com](https://smith.langchain.com)
2. Create a new project (e.g., `hr-intervals-chatbot`)
3. Generate an API key
4. Add the variables to your `.env`

```dotenv
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY="lsv2_pt_your-langsmith-api-key"
LANGSMITH_PROJECT=hr-intervals-chatbot
```

> **Tip:** Keep `LANGSMITH_TRACING=false` in production to avoid overhead, or enable it selectively for debugging.
---

### Complete `.env` Example

```dotenv
# ── OpenAI ───────────────────────────────────────────────
OPENAI_API_KEY="sk-proj-your-openai-api-key"
OPEN_AI_EMBEDDING_MODEL=text-embedding-3-small
OPEN_AI_CHAT_MODEL=gpt-4o

# ── Qdrant ───────────────────────────────────────────────
QDRANT_URL="https://your-cluster-id.region.cloud.qdrant.io:6333"
QDRANT_API_KEY="your-qdrant-api-key"
QDRANT_COLLECTION="hr-intervals"

# ── LangSmith (optional) ─────────────────────────────────
LANGSMITH_TRACING=false
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY="lsv2_pt_your-langsmith-api-key"
LANGSMITH_PROJECT=hr-intervals-chatbot

# ── Firecrawl ────────────────────────────────────────────
FIRECRAWL_API_KEY="fc-your-firecrawl-api-key"
```
---

## Deployment Model

This project uses a **split deployment** architecture: the user-facing chatbot runs in the cloud, while the admin dashboard runs on your local machine.

```
┌──────────────────────────────────┐     ┌─────────────────────────────┐
│   Hugging Face Spaces (Cloud)    │     │     Your Local Machine      │
│                                  │     │                             │
│   app.py (port 7860)             │     │   admin.py (port 7861)      │
│   - User chat interface          │     │   - Upload documents        │
│   - RAG Q&A                      │     │   - Scrape web pages        │
│   - Public access                │     │   - Delete documents        │
│                                  │     │   - View knowledge base     │
└────────────┬─────────────────────┘     └──────────┬──────────────────┘
             │                                      │
             └──────────┬───────────────────────────┘
                        │   Both connect to the same
                        ▼   cloud services
             ┌─────────────────┐     ┌──────────────┐
             │  Qdrant Cloud   │     │  OpenAI API  │
             └─────────────────┘     └──────────────┘
```
| Component  | Where it runs           | Purpose                   | Access                       |
| ---------- | ----------------------- | ------------------------- | ---------------------------- |
| `app.py`   | **Hugging Face Spaces** | User-facing chatbot       | Public (anyone with the URL) |
| `admin.py` | **Your local machine**  | Knowledge base management | Private (admin only)         |

**Why this split?**

- The **chatbot** (`app.py`) is deployed to Hugging Face Spaces so end users can access it 24/7 via a public URL without needing any local setup.
- The **admin dashboard** (`admin.py`) runs locally because it performs sensitive operations (uploading documents, deleting data, scraping URLs). Keeping it local ensures only authorized administrators can modify the knowledge base.
- Both components share the same `.env` configuration and connect to the same Qdrant and OpenAI instances, so changes made via the admin dashboard are immediately reflected in the chatbot.
---

## Running the Application

### User Chat Interface (Cloud)

The chatbot is deployed on Hugging Face Spaces and accessible at:

```
https://pikamomo-hr-intervals-chatbot.hf.space
```

To deploy your own instance, see [Deploying to Hugging Face Spaces](#deploying-to-hugging-face-spaces) below.

For local development and testing, you can also run it locally:

```bash
python app.py
```

This opens the chatbot at **http://localhost:7860**.

### Admin Dashboard (Local)

The admin panel runs on your local machine. Start it with:

```bash
python admin.py
```

This opens the admin dashboard at **http://localhost:7861**. See [Admin Interface](#admin-interface) for details on each tab.

> **Important:** The admin dashboard is intentionally **not deployed** to the cloud. Always run it locally to maintain control over who can modify the knowledge base.

---
## Admin Interface

The admin dashboard (`admin.py`) provides tools to manage the knowledge base:

| Tab                  | Description                                                                         |
| -------------------- | ----------------------------------------------------------------------------------- |
| **View Documents**   | Lists all ingested documents with metadata (source, type, upload date, chunk count) |
| **Upload Documents** | Upload PDF or DOCX files. Choose a document type (policy, guide, article, etc.)     |
| **Scrape Web Pages** | Scrape a single URL or batch-scrape multiple URLs via Firecrawl                     |
| **Delete Documents** | Remove documents from the vector store by source name                               |
| **Help**             | Usage instructions and tips                                                         |

### Ingesting Documents

**Via file upload (PDF / DOCX):**

1. Open the admin dashboard
2. Go to the **Upload Documents** tab
3. Select your file and choose a document type
4. Click Upload → the file is parsed, chunked, embedded, and stored in Qdrant

**Via web scraping:**

1. Go to the **Scrape Web Pages** tab
2. Enter a URL (or multiple URLs, one per line)
3. Click Scrape → Firecrawl fetches the page as markdown, which is then chunked and stored
### Document Metadata

Each document chunk stored in Qdrant carries the following metadata:

| Field         | Type   | Description                                            |
| ------------- | ------ | ------------------------------------------------------ |
| `source`      | string | Filename or URL                                        |
| `type`        | string | `document`, `webpage`, `policy`, `guide`, or `article` |
| `upload_date` | string | Ingestion date (`YYYY-MM-DD`)                          |
| `page`        | int    | Page number (PDFs only)                                |
| `valid_until` | string | Expiry date for time-sensitive policies (optional)     |
| `version`     | string | Document version (optional)                            |
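As a concrete (and entirely hypothetical) example, the payload attached to one chunk of an uploaded policy PDF might look like this:

```python
from datetime import date

# Hypothetical payload for a single chunk; field names match the table above.
chunk_metadata = {
    "source": "employee-handbook.pdf",        # filename (or URL for webpages)
    "type": "policy",
    "upload_date": date.today().isoformat(),  # YYYY-MM-DD
    "page": 12,                               # PDFs only
    "valid_until": "2026-12-31",              # optional
    "version": "2.1",                         # optional
}
```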
---

## How It Works

### RAG Pipeline

1. **Query expansion** – The user's question is passed through a `MultiQueryRetriever` that generates multiple rephrasings to improve recall.
2. **Embedding** – Each query variant is embedded using `text-embedding-3-small`.
3. **Retrieval** – Cosine similarity search against Qdrant returns the top 8 most relevant document chunks.
4. **Generation** – Retrieved chunks are injected into a prompt template alongside the conversation history, and GPT-4o generates a grounded answer.
5. **Citation** – The top 3 source documents are appended to the response.
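Step 5 can be illustrated with a small sketch. Assuming retrieved chunks arrive in ranked order as dicts carrying the metadata shown earlier, appending the top 3 unique sources might look like:

```python
def append_citations(answer: str, retrieved: list) -> str:
    """Append up to three unique source citations to a generated answer.
    `retrieved` holds ranked chunks shaped like {"metadata": {"source": ...}}."""
    sources = []
    for doc in retrieved:
        src = doc["metadata"]["source"]
        if src not in sources:
            sources.append(src)
        if len(sources) == 3:
            break
    cites = "\n".join(f"- {s}" for s in sources)
    return f"{answer}\n\nSources:\n{cites}" if sources else answer
```

Deduplicating by `source` matters because several chunks of the same document are often retrieved together.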
### Chunking Strategy

Documents are split using `RecursiveCharacterTextSplitter`:

- **Chunk size:** 1,000 characters
- **Overlap:** 200 characters
- **Separators:** `["\n\n", "\n", ". ", " ", ""]`
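To make the size/overlap numbers concrete, here is a deliberately simplified sliding-window stand-in (the real `RecursiveCharacterTextSplitter` also respects the separator hierarchy, so chunks break at paragraph and sentence boundaries where possible):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Naive fixed-window chunker: each chunk starts (chunk_size - overlap)
    characters after the previous one, so consecutive chunks share `overlap`
    characters of context."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary is fully contained in at least one chunk, which improves retrieval for queries that hit the edges.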
### Session Management

- Each user gets a unique session ID (UUID)
- Conversation history is stored in memory per session
- Sessions expire after **1 hour** of inactivity
- History enables follow-up questions and contextual conversations
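Those four properties together suggest a store along these lines (a sketch with an injectable clock for testing, not the app's actual class):

```python
import time
import uuid

class SessionStore:
    """In-memory per-session chat history with a one-hour inactivity TTL."""

    def __init__(self, ttl: float = 3600.0, clock=time.time):
        self.ttl = ttl
        self.clock = clock          # injectable for deterministic tests
        self._sessions = {}

    def new_session(self) -> str:
        sid = str(uuid.uuid4())
        self._sessions[sid] = {"history": [], "last_seen": self.clock()}
        return sid

    def append(self, sid: str, role: str, text: str) -> list:
        now = self.clock()
        session = self._sessions.get(sid)
        if session is None or now - session["last_seen"] > self.ttl:
            # Unknown or expired session: start a fresh history.
            session = self._sessions[sid] = {"history": [], "last_seen": now}
        session["history"].append((role, text))
        session["last_seen"] = now
        return session["history"]
```

Because everything lives in a plain dict, history is lost on restart; the Troubleshooting section notes the same limitation.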
### PII Detection

A regex-based check warns users when their query appears to contain names (e.g., `John Smith`). This is a first-line safeguard; integration with Microsoft Presidio is planned for more robust PII detection.
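A minimal version of such a check is sketched below; it is illustrative, not the code in the repository. Note how coarse the heuristic is: any two adjacent capitalized words (including phrases like "Annual Review") will trigger it, which is exactly why a dedicated detector such as Presidio is the longer-term plan.

```python
import re

# Two adjacent capitalized words often indicate a personal name.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+\s+[A-Z][a-z]+\b")

def may_contain_name(query: str) -> bool:
    """Coarse first-line PII heuristic; expect false positives."""
    return bool(NAME_PATTERN.search(query))
```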
---

## Embedding the Chatbot Widget

The file `chatbot-widget.html` provides a ready-to-use **floating chat widget** that you can embed on any website. It renders a circular button in the bottom-right corner that opens the chatbot in a popup window, with no page navigation required.

### Quick Start

The simplest way to add the chatbot to an existing web page is to copy three pieces from `chatbot-widget.html` into your site:

1. **CSS** (add to your `<head>` or stylesheet)
2. **HTML** (add before `</body>`)
3. **JavaScript** (add before `</body>`)

### Step 1 – Add the CSS

Copy the widget styles into your page's `<head>` (or into your existing CSS file). These styles are marked between `CHATBOT WIDGET STYLES - COPY FROM HERE` and `END OF CHATBOT WIDGET STYLES` in the source file.

### Step 2 – Add the HTML

Add the following HTML just before your closing `</body>` tag. Update the `src` URL to point to your own Hugging Face Space:
```html
<!-- Chat Toggle Button -->
<button class="chat-widget-button" id="chatWidgetButton" onclick="toggleChat()">
  <span class="chat-widget-badge">1</span>
  <svg viewBox="0 0 24 24" id="chatIcon">
    <path
      d="M20 2H4c-1.1 0-2 .9-2 2v18l4-4h14c1.1 0 2-.9 2-2V4c0-1.1-.9-2-2-2zm0 14H6l-2 2V4h16v12z"
    />
  </svg>
  <svg viewBox="0 0 24 24" id="closeIcon" style="display: none;">
    <path
      d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"
    />
  </svg>
</button>

<!-- Chat Window -->
<div class="chat-widget-window" id="chatWidgetWindow">
  <div class="chat-widget-header">
    <div class="chat-widget-header-info">
      <div class="chat-widget-avatar">🤖</div>
      <div class="chat-widget-title">
        <h3>HR Assistant</h3>
        <span><span class="status-dot"></span> Online</span>
      </div>
    </div>
    <button class="chat-widget-close" onclick="toggleChat()">
      <svg width="16" height="16" viewBox="0 0 24 24" fill="currentColor">
        <path
          d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"
        />
      </svg>
    </button>
  </div>
  <div class="chat-widget-body">
    <!-- Change this URL to your Hugging Face Space URL -->
    <iframe
      src="https://pikamomo-hr-intervals-chatbot.hf.space"
      title="HR Chatbot"
      loading="lazy"
    >
    </iframe>
  </div>
</div>
```
### Step 3 – Add the JavaScript

Add this script after the HTML above:

```html
<script>
  let isOpen = false;

  const button = document.getElementById("chatWidgetButton");
  const window_el = document.getElementById("chatWidgetWindow");
  const chatIcon = document.getElementById("chatIcon");
  const closeIcon = document.getElementById("closeIcon");
  const badge = document.querySelector(".chat-widget-badge");

  function toggleChat() {
    isOpen = !isOpen;
    if (isOpen) {
      window_el.classList.add("open");
      button.classList.add("active");
      chatIcon.style.display = "none";
      closeIcon.style.display = "block";
      badge.style.display = "none";
    } else {
      window_el.classList.remove("open");
      button.classList.remove("active");
      chatIcon.style.display = "block";
      closeIcon.style.display = "none";
    }
  }

  // Close on Escape key
  document.addEventListener("keydown", function (e) {
    if (e.key === "Escape" && isOpen) {
      toggleChat();
    }
  });
</script>
```
### Customization

| What to change  | Where                                  | Details                                                                |
| --------------- | -------------------------------------- | ---------------------------------------------------------------------- |
| Chatbot URL     | `<iframe src="...">`                   | Replace with your Hugging Face Space URL                               |
| Widget colors   | CSS `background: linear-gradient(...)` | Change the gradient on `.chat-widget-button` and `.chat-widget-header` |
| Widget size     | CSS `.chat-widget-window`              | Adjust `width` (400px) and `height` (600px)                            |
| Button position | CSS `.chat-widget-button`              | Change `bottom` and `right` values                                     |
| Header title    | HTML `<h3>HR Assistant</h3>`           | Replace with your preferred name                                       |

### Full Working Example

Open `chatbot-widget.html` directly in a browser to see the widget in action on a demo page. The file is self-contained and requires no build step: just open it and click the chat button in the bottom-right corner.
---

## Testing

Run the connection tests to verify all services are properly configured:

```bash
python tests/test_connections.py
```

This tests connectivity to:

- OpenAI (API key validity, embedding generation)
- Qdrant (cluster reachability, collection access)
- Firecrawl (API key validity)
- LangChain (chain construction)
- Gradio (import verification)
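Before running the full test script (which makes real network calls), a quick pre-flight check for missing configuration can save a round trip. The helper below is a sketch, not part of `tests/test_connections.py`:

```python
import os

REQUIRED_VARS = ["OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"]

def missing_env(env=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```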
---

## Troubleshooting

| Problem                           | Solution                                                                                 |
| --------------------------------- | ---------------------------------------------------------------------------------------- |
| `AuthenticationError` from OpenAI | Verify `OPENAI_API_KEY` is correct and has sufficient credits                            |
| `Connection refused` from Qdrant  | Check `QDRANT_URL` and ensure the port (`:6333`) is included                             |
| Firecrawl returns empty content   | Some pages block scrapers; try a different URL or check your API quota                   |
| LangSmith traces not appearing    | Ensure `LANGSMITH_TRACING=true` and the API key is valid                                 |
| Embedding dimension mismatch      | You changed the embedding model; re-ingest all documents or delete the Qdrant collection |
| Session history not working       | Sessions are in-memory; they reset when the server restarts                              |
| `ModuleNotFoundError`             | Run `pip install -r requirements.txt` to install all dependencies                        |

---
## Deploying to Hugging Face Spaces

The user-facing chatbot (`app.py`) is designed to run on **Hugging Face Spaces**. The admin dashboard (`admin.py`) stays on your local machine.

### Setting Up the Space

1. Create a new Space at [huggingface.co/spaces](https://huggingface.co/spaces)
2. Select **Gradio** as the SDK
3. Push your code to the Space repository (or link it to your Git repo)
4. Add all required environment variables as **Secrets** in the Space settings:

| Secret              | Value                                                 |
| ------------------- | ----------------------------------------------------- |
| `OPENAI_API_KEY`    | Your OpenAI API key                                   |
| `QDRANT_URL`        | Your Qdrant cluster URL                               |
| `QDRANT_API_KEY`    | Your Qdrant API key                                   |
| `QDRANT_COLLECTION` | `hr-intervals` (or your collection name)              |
| `LANGSMITH_TRACING` | `false` (or `true` if you want tracing in production) |
| `LANGSMITH_API_KEY` | Your LangSmith API key (if tracing is enabled)        |
| `LANGSMITH_PROJECT` | `hr-intervals-chatbot`                                |

5. The Space will automatically install dependencies from `requirements.txt` and start `app.py`
6. Once deployed, your chatbot will be available at: `https://<your-username>-<space-name>.hf.space`
### Managing the Knowledge Base

After the Space is live, manage the knowledge base from your local machine:

```bash
# 1. Make sure your local .env has the SAME Qdrant credentials as the Space
# 2. Start the admin dashboard
python admin.py
```

Any documents you upload, scrape, or delete through the local admin dashboard are immediately available to the cloud-hosted chatbot, because both connect to the same Qdrant instance.

### Embedding on Your Website

Once your Space is running, you can embed the chatbot on any website using the widget (see [Embedding the Chatbot Widget](#embedding-the-chatbot-widget)). Simply set the iframe `src` to your Space URL:

```html
<iframe src="https://pikamomo-hr-intervals-chatbot.hf.space" ...></iframe>
```

---

## License

This project is developed for non-profit HR use. See your organization's licensing terms for details.