---
title: HR Intervals Chatbot
emoji: πŸ’Ό
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.0
app_file: app.py
pinned: false
---
# HR Intervals AI Assistant
A **RAG-powered chatbot** that provides HR knowledge and policy guidance for non-profit organizations. Built with LangChain, OpenAI, Qdrant, Firecrawl, and Gradio β€” with optional LangSmith observability.
---
## Table of Contents
- [Features](#features)
- [Architecture Overview](#architecture-overview)
- [Project Structure](#project-structure)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Configuration](#configuration)
- [OpenAI](#openai)
- [Qdrant](#qdrant)
- [Firecrawl](#firecrawl)
- [LangSmith](#langsmith)
- [Deployment Model](#deployment-model)
- [Running the Application](#running-the-application)
- [Admin Interface](#admin-interface)
- [How It Works](#how-it-works)
- [Embedding the Chatbot Widget](#embedding-the-chatbot-widget)
- [Testing](#testing)
- [Troubleshooting](#troubleshooting)
- [Deploying to Hugging Face Spaces](#deploying-to-hugging-face-spaces)
- [License](#license)
---
## Features
- **AI-powered Q&A** β€” Answers HR questions using a retrieval-augmented generation (RAG) pipeline
- **Source citations** β€” Every answer includes references to the documents it was derived from
- **Web scraping** β€” Ingest web pages into the knowledge base via Firecrawl
- **Document ingestion** β€” Upload PDF and DOCX files through the admin interface
- **PII detection** β€” Warns users when personally identifiable information is detected in queries
- **Conversation memory** β€” Maintains context across multi-turn conversations with session-based history
- **Multi-query retrieval** β€” Expands user queries for better document matching
- **Admin dashboard** β€” Manage documents, scrape URLs, and monitor the knowledge base
- **Observability (optional)** β€” Trace and debug the full RAG pipeline with LangSmith
---
## Architecture Overview
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Gradio   │──────▢│   RAG Chain    │──────▢│    OpenAI    β”‚
β”‚   (User)   β”‚       β”‚  (LangChain)   β”‚       β”‚    GPT-4o    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                     β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
                     β”‚     Qdrant     β”‚
                     β”‚  Vector Store  β”‚
                     β””β”€β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β”‚                             β”‚
     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
     β”‚   PDF / DOCX   β”‚          β”‚     Firecrawl      β”‚
     β”‚   Ingestion    β”‚          β”‚    Web Scraping    β”‚
     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
**Data flow:**
1. Documents (PDF, DOCX) and web pages are ingested, chunked, and embedded using OpenAI embeddings.
2. Embeddings are stored in a Qdrant vector database.
3. When a user asks a question, the query is expanded via MultiQueryRetriever, matched against stored embeddings, and the top-k results are used as context.
4. OpenAI GPT-4o generates a grounded answer using the retrieved context.
5. (Optional) The entire pipeline is traced and logged in LangSmith.
---
## Project Structure
```
hr-intervals-chatbot/
β”œβ”€β”€ app.py                  # User-facing chat interface (Gradio, port 7860)
β”œβ”€β”€ admin.py                # Admin dashboard (Gradio, port 7861)
β”œβ”€β”€ chatbot-widget.html     # Embeddable HTML widget for the chatbot
β”œβ”€β”€ requirements.txt        # Python dependencies
β”œβ”€β”€ .env                    # Environment variables (not committed)
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ chatbot.py          # RAG chain construction and query handling
β”‚   β”œβ”€β”€ ingestion.py        # PDF/DOCX document processing
β”‚   β”œβ”€β”€ scraper.py          # Firecrawl web scraping
β”‚   └── vector_store.py     # Qdrant vector store utilities
└── tests/
    └── test_connections.py # API connection verification tests
```
---
## Prerequisites
- **Python 3.10+**
- An **OpenAI** account with API access
- A **Qdrant Cloud** account (or self-hosted Qdrant instance)
- A **Firecrawl** account for web scraping
- (Optional) A **LangSmith** account for observability
---
## Installation
1. **Clone the repository:**
```bash
git clone https://huggingface.co/spaces/pikamomo/hr-intervals-chatbot
cd hr-intervals-chatbot
```
2. **Create and activate a virtual environment:**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
4. **Create your `.env` file** (see [Configuration](#configuration) below):
```dotenv
OPENAI_API_KEY=""
OPEN_AI_EMBEDDING_MODEL=""
OPEN_AI_CHAT_MODEL=""
QDRANT_URL=""
QDRANT_API_KEY=""
QDRANT_COLLECTION=""
LANGSMITH_TRACING=""
LANGSMITH_ENDPOINT=""
LANGSMITH_API_KEY=""
LANGSMITH_PROJECT=""
FIRECRAWL_API_KEY=""
```
---
## Configuration
Create a `.env` file in the project root with the following variables. All services are configured exclusively through environment variables.
### OpenAI
OpenAI powers both the **embedding model** (for converting text into vectors) and the **chat model** (for generating answers).
| Variable | Required | Default | Description |
| ------------------------- | -------- | ------------------------ | ----------------------------------------------------- |
| `OPENAI_API_KEY` | Yes | β€” | Your OpenAI API key |
| `OPEN_AI_EMBEDDING_MODEL` | No | `text-embedding-3-small` | Embedding model for vectorizing documents and queries |
| `OPEN_AI_CHAT_MODEL` | No | `gpt-4o` | Chat model for generating RAG answers |
**How it's used:**
- **Embeddings** (`text-embedding-3-small`): Every document chunk and user query is embedded using this model before being stored in or searched against Qdrant. The `OpenAIEmbeddings` class from `langchain-openai` handles this.
- **Chat completions** (`gpt-4o`): After relevant document chunks are retrieved, they are passed as context to the chat model along with a system prompt that instructs it to act as an HR assistant. The model generates answers at `temperature=0.3` for factual consistency.
```dotenv
OPENAI_API_KEY="sk-proj-your-key-here"
OPEN_AI_EMBEDDING_MODEL=text-embedding-3-small
OPEN_AI_CHAT_MODEL=gpt-4o
```
> **Tip:** You can switch to `text-embedding-3-large` for higher quality embeddings or `gpt-4o-mini` for lower-cost chat completions. If you change the embedding model, you must re-ingest all documents since vector dimensions will differ.
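As a rough sketch of how these variables might be resolved (the actual code may differ; the function name `resolve_models` is made up for illustration):

```python
import os

# Hedged sketch: read model names from the environment, falling back to
# the documented defaults when a variable is unset or empty.
def resolve_models(env=os.environ) -> dict:
    return {
        "embedding_model": env.get("OPEN_AI_EMBEDDING_MODEL") or "text-embedding-3-small",
        "chat_model": env.get("OPEN_AI_CHAT_MODEL") or "gpt-4o",
        "temperature": 0.3,  # low temperature for factual consistency
    }
```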
---
### Qdrant
Qdrant is the **vector database** that stores document embeddings and enables similarity search.
| Variable | Required | Default | Description |
| ------------------- | -------- | -------------- | -------------------------------------------------- |
| `QDRANT_URL` | Yes | β€” | URL of your Qdrant instance (cloud or self-hosted) |
| `QDRANT_API_KEY` | Yes | β€” | API key for authenticating with Qdrant |
| `QDRANT_COLLECTION` | No | `hr-intervals` | Name of the vector collection |
**How it's used:**
- **Storage**: Document chunks are embedded and stored as points in a Qdrant collection. Each point contains the vector embedding plus metadata (`source`, `type`, `upload_date`, etc.).
- **Retrieval**: When a user asks a question, the query is embedded and a cosine similarity search retrieves the top 8 most relevant chunks.
- **Management**: The admin interface directly interacts with Qdrant to list, filter, and delete documents by source.
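For intuition, the cosine similarity Qdrant computes during retrieval can be sketched in plain Python. Qdrant does this natively (and far faster, over indexed vectors); this is purely illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple], k: int = 8) -> list[tuple]:
    # chunks: (vector, payload) pairs; return the k most similar to the query.
    return sorted(chunks, key=lambda c: cosine_similarity(query_vec, c[0]), reverse=True)[:k]
```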
**Setting up Qdrant Cloud:**
1. Sign up at [cloud.qdrant.io](https://cloud.qdrant.io)
2. Create a new cluster (the free tier works for development)
3. Create an API key from the cluster dashboard
4. Copy the cluster URL and API key into your `.env`
```dotenv
QDRANT_URL="https://your-cluster-id.region.cloud.qdrant.io:6333"
QDRANT_API_KEY="your-qdrant-api-key"
QDRANT_COLLECTION="hr-intervals"
```
> **Note:** The collection is created automatically when you first ingest documents. You do not need to create it manually.
---
### Firecrawl
Firecrawl is a **web scraping service** that converts web pages into clean markdown β€” ideal for ingesting online HR resources, policies, and articles into the knowledge base.
| Variable | Required | Default | Description |
| ------------------- | -------- | ------- | ---------------------- |
| `FIRECRAWL_API_KEY` | Yes\* | β€” | Your Firecrawl API key |
_\*Required only if you plan to use the web scraping feature._
**How it's used:**
- **Single URL scraping**: From the admin dashboard, enter a URL and Firecrawl fetches the page content as markdown.
- **Batch scraping**: Paste multiple URLs (one per line) to scrape several pages at once.
- **Duplicate detection**: Before scraping, the system checks if a URL has already been ingested to prevent duplicates.
- **Pipeline**: Scraped markdown is split into chunks (1000 characters, 200 overlap), embedded via OpenAI, and stored in Qdrant with `type: "webpage"` metadata.
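The duplicate check could be as simple as comparing normalized URLs against the sources already in the store. A sketch (the actual implementation may normalize differently):

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Lowercase scheme/host and drop the trailing slash so trivially
    different spellings of the same page compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}"

def is_duplicate(url: str, ingested_sources: set[str]) -> bool:
    return normalize_url(url) in {normalize_url(s) for s in ingested_sources}
```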
**Setting up Firecrawl:**
1. Sign up at [firecrawl.dev](https://www.firecrawl.dev)
2. Get your API key from the dashboard
3. Add it to your `.env`
```dotenv
FIRECRAWL_API_KEY="fc-your-firecrawl-api-key"
```
---
### LangSmith
LangSmith provides **observability and tracing** for the entire LangChain pipeline. When enabled, every chain invocation β€” from query expansion to retrieval to generation β€” is logged and can be inspected in the LangSmith dashboard.
| Variable | Required | Default | Description |
| -------------------- | -------- | --------------------------------- | ----------------------------------- |
| `LANGSMITH_TRACING` | No | `false` | Set to `true` to enable tracing |
| `LANGSMITH_ENDPOINT` | No | `https://api.smith.langchain.com` | LangSmith API endpoint |
| `LANGSMITH_API_KEY` | No | β€” | Your LangSmith API key |
| `LANGSMITH_PROJECT` | No | `hr-intervals-chatbot` | Project name in LangSmith dashboard |
**How it's used:**
- When `LANGSMITH_TRACING=true`, LangChain **automatically** sends trace data to LangSmith for every chain execution. No code changes are needed β€” LangChain detects these environment variables at runtime.
- Traces include: input queries, retrieved documents, prompt templates, LLM responses, latency, token usage, and errors.
- Use the LangSmith dashboard to debug retrieval quality, monitor token costs, and identify slow chain steps.
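LangChain reads these variables itself, so nothing in the application toggles them. A sketch of how the boolean flag could be interpreted (an assumption for illustration; LangChain's exact parsing may differ):

```python
import os

def tracing_enabled(env=os.environ) -> bool:
    # Treat only the literal string "true" (case-insensitive) as on.
    return env.get("LANGSMITH_TRACING", "false").strip().lower() == "true"
```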
**Setting up LangSmith:**
1. Sign up at [smith.langchain.com](https://smith.langchain.com)
2. Create a new project (e.g., `hr-intervals-chatbot`)
3. Generate an API key
4. Add the variables to your `.env`
```dotenv
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY="lsv2_pt_your-langsmith-api-key"
LANGSMITH_PROJECT=hr-intervals-chatbot
```
> **Tip:** Keep `LANGSMITH_TRACING=false` in production to avoid overhead, or enable it selectively for debugging.
---
### Complete `.env` Example
```dotenv
# ── OpenAI ───────────────────────────────────────────────
OPENAI_API_KEY="sk-proj-your-openai-api-key"
OPEN_AI_EMBEDDING_MODEL=text-embedding-3-small
OPEN_AI_CHAT_MODEL=gpt-4o
# ── Qdrant ───────────────────────────────────────────────
QDRANT_URL="https://your-cluster-id.region.cloud.qdrant.io:6333"
QDRANT_API_KEY="your-qdrant-api-key"
QDRANT_COLLECTION="hr-intervals"
# ── LangSmith (optional) ────────────────────────────────
LANGSMITH_TRACING=false
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY="lsv2_pt_your-langsmith-api-key"
LANGSMITH_PROJECT=hr-intervals-chatbot
# ── Firecrawl ───────────────────────────────────────────
FIRECRAWL_API_KEY="fc-your-firecrawl-api-key"
```
---
## Deployment Model
This project uses a **split deployment** architecture: the user-facing chatbot runs in the cloud, while the admin dashboard runs on your local machine.
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Hugging Face Spaces (Cloud)   β”‚       β”‚     Your Local Machine      β”‚
β”‚                                 β”‚       β”‚                             β”‚
β”‚   app.py (port 7860)            β”‚       β”‚   admin.py (port 7861)      β”‚
β”‚   - User chat interface         β”‚       β”‚   - Upload documents        β”‚
β”‚   - RAG Q&A                     β”‚       β”‚   - Scrape web pages        β”‚
β”‚   - Public access               β”‚       β”‚   - Delete documents        β”‚
β”‚                                 β”‚       β”‚   - View knowledge base     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚                                       β”‚
             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚   Both connect to the same
                                 β–Ό   cloud services
                 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                 β”‚  Qdrant Cloud   β”‚     β”‚  OpenAI API  β”‚
                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
| Component | Where it runs | Purpose | Access |
| ---------- | ----------------------- | ------------------------- | ---------------------------- |
| `app.py` | **Hugging Face Spaces** | User-facing chatbot | Public (anyone with the URL) |
| `admin.py` | **Your local machine** | Knowledge base management | Private (admin only) |
**Why this split?**
- The **chatbot** (`app.py`) is deployed to Hugging Face Spaces so end users can access it 24/7 via a public URL without needing any local setup.
- The **admin dashboard** (`admin.py`) runs locally because it performs sensitive operations (uploading documents, deleting data, scraping URLs). Keeping it local ensures only authorized administrators can modify the knowledge base.
- Both components share the same `.env` configuration and connect to the same Qdrant and OpenAI instances, so changes made via the admin dashboard are immediately reflected in the chatbot.
---
## Running the Application
### User Chat Interface (Cloud)
The chatbot is deployed on Hugging Face Spaces and accessible at:
```
https://pikamomo-hr-intervals-chatbot.hf.space
```
To deploy your own instance, see [Deploying to Hugging Face Spaces](#deploying-to-hugging-face-spaces) below.
For local development and testing, you can also run it locally:
```bash
python app.py
```
This opens the chatbot at **http://localhost:7860**.
### Admin Dashboard (Local)
The admin panel runs on your local machine. Start it with:
```bash
python admin.py
```
This opens the admin dashboard at **http://localhost:7861**. See [Admin Interface](#admin-interface) for details on each tab.
> **Important:** The admin dashboard is intentionally **not deployed** to the cloud. Always run it locally to maintain control over who can modify the knowledge base.
---
## Admin Interface
The admin dashboard (`admin.py`) provides tools to manage the knowledge base:
| Tab | Description |
| -------------------- | ----------------------------------------------------------------------------------- |
| **View Documents** | Lists all ingested documents with metadata (source, type, upload date, chunk count) |
| **Upload Documents** | Upload PDF or DOCX files. Choose a document type (policy, guide, article, etc.) |
| **Scrape Web Pages** | Scrape a single URL or batch-scrape multiple URLs via Firecrawl |
| **Delete Documents** | Remove documents from the vector store by source name |
| **Help** | Usage instructions and tips |
### Ingesting Documents
**Via file upload (PDF / DOCX):**
1. Open the admin dashboard
2. Go to the **Upload Documents** tab
3. Select your file and choose a document type
4. Click Upload β€” the file is parsed, chunked, embedded, and stored in Qdrant
**Via web scraping:**
1. Go to the **Scrape Web Pages** tab
2. Enter a URL (or multiple URLs, one per line)
3. Click Scrape β€” Firecrawl fetches the page as markdown, which is then chunked and stored
### Document Metadata
Each document chunk stored in Qdrant carries the following metadata:
| Field | Type | Description |
| ------------- | ------ | ------------------------------------------------------ |
| `source` | string | Filename or URL |
| `type` | string | `document`, `webpage`, `policy`, `guide`, or `article` |
| `upload_date` | string | Ingestion date (`YYYY-MM-DD`) |
| `page` | int | Page number (PDFs only) |
| `valid_until` | string | Expiry date for time-sensitive policies (optional) |
| `version` | string | Document version (optional) |
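For example, a single chunk from page 12 of an uploaded policy PDF might carry a payload shaped like this (the field names follow the table above; the values are illustrative):

```python
from datetime import date

# Illustrative payload for one chunk, mirroring the Document Metadata table.
payload = {
    "source": "employee-handbook.pdf",        # filename or URL
    "type": "policy",                         # document / webpage / policy / guide / article
    "upload_date": date.today().isoformat(),  # YYYY-MM-DD
    "page": 12,                               # PDFs only
    # optional fields:
    "valid_until": "2026-12-31",
    "version": "2.1",
}
```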
---
## How It Works
### RAG Pipeline
1. **Query expansion** β€” The user's question is passed through a `MultiQueryRetriever` that generates multiple rephrasings to improve recall.
2. **Embedding** β€” Each query variant is embedded using `text-embedding-3-small`.
3. **Retrieval** β€” Cosine similarity search against Qdrant returns the top 8 most relevant document chunks.
4. **Generation** β€” Retrieved chunks are injected into a prompt template alongside the conversation history, and GPT-4o generates a grounded answer.
5. **Citation** β€” The top 3 source documents are appended to the response.
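Step 5 can be sketched as follows; `append_citations` is a hypothetical helper, not the project's actual function:

```python
def append_citations(answer: str, docs: list[dict], k: int = 3) -> str:
    """Append up to k unique source names to the answer, in retrieval order."""
    seen, sources = set(), []
    for doc in docs:
        src = doc.get("source")
        if src and src not in seen:
            seen.add(src)
            sources.append(src)
        if len(sources) == k:
            break
    if not sources:
        return answer
    lines = "\n".join(f"- {s}" for s in sources)
    return f"{answer}\n\nSources:\n{lines}"
```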
### Chunking Strategy
Documents are split using `RecursiveCharacterTextSplitter`:
- **Chunk size:** 1,000 characters
- **Overlap:** 200 characters
- **Separators:** `["\n\n", "\n", ". ", " ", ""]`
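A simplified sliding-window version of this strategy looks like the following; the real `RecursiveCharacterTextSplitter` additionally prefers to break at the separators listed above rather than at fixed offsets:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-window chunker: each chunk overlaps the previous by `overlap` chars."""
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += step
    return chunks
```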
### Session Management
- Each user gets a unique session ID (UUID)
- Conversation history is stored in memory per session
- Sessions expire after **1 hour** of inactivity
- History enables follow-up questions and contextual conversations
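A minimal in-memory version of this scheme (a sketch; session handling in the actual code may differ):

```python
import time
import uuid

SESSION_TTL = 3600  # seconds of inactivity before a session expires (1 hour)
_sessions: dict = {}  # session_id -> {"history": [...], "last_seen": timestamp}

def get_session(session_id=None, now=None):
    """Return (session_id, history), creating or expiring sessions as needed."""
    now = time.time() if now is None else now
    # Evict sessions idle longer than the TTL.
    for sid in [s for s, v in _sessions.items() if now - v["last_seen"] > SESSION_TTL]:
        del _sessions[sid]
    if session_id is None or session_id not in _sessions:
        session_id = str(uuid.uuid4())
        _sessions[session_id] = {"history": [], "last_seen": now}
    _sessions[session_id]["last_seen"] = now
    return session_id, _sessions[session_id]["history"]
```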
### PII Detection
A regex-based check warns users when their query appears to contain names (e.g., `John Smith`). This is a first-line safeguard; integration with Microsoft Presidio is planned for more robust PII detection.
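The check could be as simple as the following sketch, which flags capitalized word pairs that look like personal names (the actual pattern may differ; Presidio would be far more precise):

```python
import re

# Very rough first-pass heuristic: two adjacent capitalized words.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def may_contain_pii(query: str) -> bool:
    return bool(NAME_PATTERN.search(query))
```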
---
## Embedding the Chatbot Widget
The file `chatbot-widget.html` provides a ready-to-use **floating chat widget** that you can embed on any website. It renders a circular button in the bottom-right corner that opens the chatbot in a popup window β€” no page navigation required.
### Quick Start
The simplest way to add the chatbot to an existing web page is to copy three pieces from `chatbot-widget.html` into your site:
1. **CSS** (add to your `<head>` or stylesheet)
2. **HTML** (add before `</body>`)
3. **JavaScript** (add before `</body>`)
### Step 1 β€” Add the CSS
Copy the widget styles into your page's `<head>` (or into your existing CSS file). These styles are marked between `CHATBOT WIDGET STYLES - COPY FROM HERE` and `END OF CHATBOT WIDGET STYLES` in the source file.
### Step 2 β€” Add the HTML
Add the following HTML just before your closing `</body>` tag. Update the `src` URL to point to your own Hugging Face Space:
```html
<!-- Chat Toggle Button -->
<button class="chat-widget-button" id="chatWidgetButton" onclick="toggleChat()">
<span class="chat-widget-badge">1</span>
<svg viewBox="0 0 24 24" id="chatIcon">
<path
d="M20 2H4c-1.1 0-2 .9-2 2v18l4-4h14c1.1 0 2-.9 2-2V4c0-1.1-.9-2-2-2zm0 14H6l-2 2V4h16v12z"
/>
</svg>
<svg viewBox="0 0 24 24" id="closeIcon" style="display: none;">
<path
d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"
/>
</svg>
</button>
<!-- Chat Window -->
<div class="chat-widget-window" id="chatWidgetWindow">
<div class="chat-widget-header">
<div class="chat-widget-header-info">
<div class="chat-widget-avatar">πŸ€–</div>
<div class="chat-widget-title">
<h3>HR Assistant</h3>
<span><span class="status-dot"></span> Online</span>
</div>
</div>
<button class="chat-widget-close" onclick="toggleChat()">
<svg width="16" height="16" viewBox="0 0 24 24" fill="currentColor">
<path
d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"
/>
</svg>
</button>
</div>
<div class="chat-widget-body">
<!-- πŸ‘‡ Change this URL to your Hugging Face Space URL -->
<iframe
src="https://pikamomo-hr-intervals-chatbot.hf.space"
title="HR Chatbot"
loading="lazy"
>
</iframe>
</div>
</div>
```
### Step 3 β€” Add the JavaScript
Add this script after the HTML above:
```html
<script>
let isOpen = false;
const button = document.getElementById("chatWidgetButton");
const window_el = document.getElementById("chatWidgetWindow");
const chatIcon = document.getElementById("chatIcon");
const closeIcon = document.getElementById("closeIcon");
const badge = document.querySelector(".chat-widget-badge");
function toggleChat() {
isOpen = !isOpen;
if (isOpen) {
window_el.classList.add("open");
button.classList.add("active");
chatIcon.style.display = "none";
closeIcon.style.display = "block";
badge.style.display = "none";
} else {
window_el.classList.remove("open");
button.classList.remove("active");
chatIcon.style.display = "block";
closeIcon.style.display = "none";
}
}
// Close on Escape key
document.addEventListener("keydown", function (e) {
if (e.key === "Escape" && isOpen) {
toggleChat();
}
});
</script>
```
### Customization
| What to change | Where | Details |
| --------------- | -------------------------------------- | ---------------------------------------------------------------------- |
| Chatbot URL | `<iframe src="...">` | Replace with your Hugging Face Space URL |
| Widget colors | CSS `background: linear-gradient(...)` | Change the gradient on `.chat-widget-button` and `.chat-widget-header` |
| Widget size | CSS `.chat-widget-window` | Adjust `width` (400px) and `height` (600px) |
| Button position | CSS `.chat-widget-button` | Change `bottom` and `right` values |
| Header title | HTML `<h3>HR Assistant</h3>` | Replace with your preferred name |
### Full Working Example
Open `chatbot-widget.html` directly in a browser to see the widget in action on a demo page. The file is self-contained and requires no build step β€” just open it and click the chat button in the bottom-right corner.
---
## Testing
Run the connection tests to verify all services are properly configured:
```bash
python tests/test_connections.py
```
This tests connectivity to:
- OpenAI (API key validity, embedding generation)
- Qdrant (cluster reachability, collection access)
- Firecrawl (API key validity)
- LangChain (chain construction)
- Gradio (import verification)
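Before running the full suite, a quick pre-flight check along the same lines can catch missing configuration; `missing_vars` is a hypothetical helper, not part of `test_connections.py`:

```python
import os

REQUIRED_VARS = ["OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"]

def missing_vars(env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [v for v in REQUIRED_VARS if not env.get(v)]
```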
---
## Troubleshooting
| Problem | Solution |
| --------------------------------- | ----------------------------------------------------------------------------------------- |
| `AuthenticationError` from OpenAI | Verify `OPENAI_API_KEY` is correct and has sufficient credits |
| `Connection refused` from Qdrant | Check `QDRANT_URL` β€” ensure the port (`:6333`) is included |
| Firecrawl returns empty content | Some pages block scrapers; try a different URL or check your API quota |
| LangSmith traces not appearing | Ensure `LANGSMITH_TRACING=true` and the API key is valid |
| Embedding dimension mismatch | You changed the embedding model β€” re-ingest all documents or delete the Qdrant collection |
| Session history not working | Sessions are in-memory; they reset when the server restarts |
| `ModuleNotFoundError` | Run `pip install -r requirements.txt` to install all dependencies |
---
## Deploying to Hugging Face Spaces
The user-facing chatbot (`app.py`) is designed to run on **Hugging Face Spaces**. The admin dashboard (`admin.py`) stays on your local machine.
### Setting Up the Space
1. Create a new Space at [huggingface.co/spaces](https://huggingface.co/spaces)
2. Select **Gradio** as the SDK
3. Push your code to the Space repository (or link it to your Git repo)
4. Add all required environment variables as **Secrets** in the Space settings:
| Secret | Value |
| ------------------- | ----------------------------------------------------- |
| `OPENAI_API_KEY` | Your OpenAI API key |
| `QDRANT_URL` | Your Qdrant cluster URL |
| `QDRANT_API_KEY` | Your Qdrant API key |
| `QDRANT_COLLECTION` | `hr-intervals` (or your collection name) |
| `LANGSMITH_TRACING` | `false` (or `true` if you want tracing in production) |
| `LANGSMITH_API_KEY` | Your LangSmith API key (if tracing is enabled) |
| `LANGSMITH_PROJECT` | `hr-intervals-chatbot` |
5. The Space will automatically install dependencies from `requirements.txt` and start `app.py`
6. Once deployed, your chatbot will be available at: `https://<your-username>-<space-name>.hf.space`
### Managing the Knowledge Base
After the Space is live, manage the knowledge base from your local machine:
```bash
# 1. Make sure your local .env has the SAME Qdrant credentials as the Space
# 2. Start the admin dashboard
python admin.py
```
Any documents you upload, scrape, or delete through the local admin dashboard will immediately be available to the cloud-hosted chatbot β€” because both connect to the same Qdrant instance.
### Embedding on Your Website
Once your Space is running, you can embed the chatbot on any website using the widget (see [Embedding the Chatbot Widget](#embedding-the-chatbot-widget)). Simply set the iframe `src` to your Space URL:
```html
<iframe src="https://pikamomo-hr-intervals-chatbot.hf.space" ...></iframe>
```
---
## License
This project is developed for non-profit HR use. See your organization's licensing terms for details.