---
title: Non-QM Glossary Bot
emoji: 🏠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
---
# Non-QM Glossary Chatbot
A professional RAG-powered chatbot that provides instant, accurate definitions of Non-Qualified Mortgage terms with strict compliance controls and conversation memory.
## Features
- 🏠 **Non-QM Expertise**: Specialized glossary of mortgage terminology
- 💬 **Conversation Memory**: Smart follow-up question handling
- 🔒 **Compliance First**: Built-in disclaimers and PII protection
- ⚡ **Streaming Responses**: Real-time text generation
- 🎨 **Professional UI**: Modern Gradio interface with custom styling
- 💰 **Cost Efficient**: Optimized for <$10/month operation
## Prerequisites
- Python 3.8 or higher
- OpenAI API key (for embeddings)
- OpenRouter API key (for Gemini LLM access)
## Installation
1. **Clone the repository:**
```bash
git clone <repository-url>
cd ChatBot
```
2. **Create and activate a virtual environment:**
```bash
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
## API Key Setup
### 1. OpenAI API Key
1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys)
2. Create a new API key
3. Copy the key (project keys start with `sk-proj-...`)
### 2. OpenRouter API Key
1. Go to [OpenRouter Keys](https://openrouter.ai/keys)
2. Create a new API key
3. Copy the key (starts with `sk-or-...`)
### 3. Environment Configuration
Create a `.env` file in the project root:
```bash
# Create an empty .env file (macOS/Linux; on Windows, create it in your editor)
touch .env
```
Add your API keys to the `.env` file:
```env
OPENAI_API_KEY=sk-proj-your-openai-key-here
OPENROUTER_API_KEY=sk-or-your-openrouter-key-here
```
⚠️ **Important:** Never commit your `.env` file to version control. It's already included in `.gitignore`.
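A minimal sketch of how a script might load these keys and fail fast when one is missing, assuming `python-dotenv` (listed in the dependency list under Troubleshooting) is installed. The helpers `missing_keys` and `load_env_and_check` are hypothetical and are not taken from `app.py`.

```python
import os

# Environment variable names this README asks you to define.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or blank in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def load_env_and_check():
    """Load .env if python-dotenv is installed, then return any missing key names.

    On Hugging Face Spaces there is no .env file; secrets arrive as ordinary
    environment variables, so the same check still applies.
    """
    try:
        from dotenv import load_dotenv
    except ImportError:
        load_dotenv = None
    if load_dotenv is not None:
        load_dotenv()  # copies KEY=value pairs from .env into os.environ (existing values win)
    return missing_keys()
```

Calling `load_env_and_check()` once at startup and stopping with a clear message when the returned list is non-empty would surface a misconfigured key before the first API request fails.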
## Running the Application
### 1. Generate Vector Index (First Time Only)
Before running the chatbot for the first time, generate the search index:
```bash
python build_index.py
```
This creates:
- `glossary.index` - FAISS vector search index
- `chunks.json` - Text chunks metadata
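This README does not describe the layout of `glossary.txt` or the internals of `build_index.py`. The sketch below covers only the chunking half of such a script, under the hypothetical assumption that glossary entries are separated by blank lines; `split_glossary` and `write_chunks` are illustrative names. The embedding step (one `text-embedding-3-small` vector per chunk) and writing `glossary.index` are omitted because they need API access. A common design keeps chunk *i* in `chunks.json` aligned with vector *i* in the FAISS index, so a search hit can be mapped back to its text.

```python
import json
import re


def split_glossary(text):
    """Split glossary text into one chunk per entry.

    Assumes (hypothetically) that entries are separated by one or more blank
    lines. Whitespace inside an entry is collapsed so each chunk is one line.
    """
    entries = re.split(r"\n\s*\n", text.strip())
    return [" ".join(entry.split()) for entry in entries if entry.strip()]


def write_chunks(chunks, path="chunks.json"):
    """Save chunks as a JSON list; list position i should match vector i in the index."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(chunks, f, ensure_ascii=False, indent=2)
```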
### 2. Start the Chatbot
```bash
python app.py
```
The application will start and display:
```
Running on local URL: http://127.0.0.1:7860
```
### 3. Access the Interface
Open your browser and go to: `http://127.0.0.1:7860`
## Usage
### Basic Questions
Ask about Non-QM mortgage terms:
- "What is a Non-QM loan?"
- "Define debt-to-income ratio"
- "What does DSCR mean?"
- "Explain asset-based lending"
### Follow-up Questions
The chatbot remembers conversation context:
- After asking about a term, say "tell me more"
- "Can you elaborate on that?"
- "Give me more details"
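One way follow-up handling like this can work is to detect a follow-up phrase and reuse the previous question as retrieval context. The sketch below is an assumption about the approach, not the logic in `app.py`; `FOLLOW_UP_PATTERN` and `resolve_query` are hypothetical names.

```python
import re

# Hypothetical follow-up phrases; app.py's actual rules may differ.
FOLLOW_UP_PATTERN = re.compile(
    r"\b(tell me more|elaborate|more details?|expand on|go deeper)\b",
    re.IGNORECASE,
)


def resolve_query(message, last_topic):
    """Return the text to use for retrieval.

    If the message looks like a follow-up and this session has a previous
    question, prepend it so "tell me more" still retrieves the same entry.
    """
    if last_topic and FOLLOW_UP_PATTERN.search(message):
        return f"{last_topic} {message}"
    return message
```

Keeping `last_topic` in per-session state rather than on disk matches the "no persistent data" behavior described under Compliance Features. Note that a message such as "tell me more about escrow" would also match, so a fuller version might check for a new glossary term first.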
### What NOT to Ask
- Personal financial information
- Rate quotes or loan applications
- Questions outside the glossary scope
## Project Structure
```
ChatBot/
├── app.py # Main Gradio application
├── build_index.py # Vector index generation
├── requirements.txt # Python dependencies
├── glossary.txt # Source glossary content
├── glossary.index # Generated FAISS index (after build)
├── chunks.json # Generated text chunks (after build)
├── .env # API keys (create this file)
├── .gitignore # Files to exclude from git
└── memory-bank/ # Project documentation
```
## Configuration
Key settings in `app.py`:
```python
EMBED_MODEL = "text-embedding-3-small" # OpenAI embeddings
GPT_MODEL = "google/gemini-2.5-flash-preview-05-20" # OpenRouter LLM
SIM_THRESHOLD = 0.30 # Minimum similarity for a retrieved chunk to be used
TOP_K = 3 # Number of chunks to retrieve
```
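To illustrate how `TOP_K` and `SIM_THRESHOLD` interact, here is a pure-Python stand-in for the FAISS lookup using cosine similarity. It is a sketch, not the code in `app.py`: the real score scale depends on the FAISS index type and on whether vectors are normalized, and `cosine` and `retrieve` are hypothetical names.

```python
import math

SIM_THRESHOLD = 0.30  # same values as the settings above
TOP_K = 3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, chunks, top_k=TOP_K, threshold=SIM_THRESHOLD):
    """Return up to top_k (score, chunk) pairs whose similarity meets the threshold."""
    scored = [(cosine(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, c) for s, c in scored[:top_k] if s >= threshold]
```

`TOP_K` caps how much context reaches the LLM, while `SIM_THRESHOLD` can leave fewer than `TOP_K` results. An empty result is a natural point to decline a question as outside the glossary scope.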
## Deployment
### Hugging Face Spaces
1. **Create a new Space:**
- Go to [Hugging Face Spaces](https://huggingface.co/spaces)
- Choose Gradio SDK
- Set hardware to CPU Basic (free)
2. **Upload required files:**
```
app.py
requirements.txt
glossary.txt
glossary.index
chunks.json
build_index.py
```
3. **Configure secrets in HF Spaces:**
   - Go to Settings → Variables and Secrets
- Add `OPENAI_API_KEY`
- Add `OPENROUTER_API_KEY`
4. **Deploy:**
- Push files to the Space repository
- The app will automatically build and deploy
## Maintenance
### Updating the Glossary
1. Edit `glossary.txt` with new terms
2. Regenerate the index:
```bash
python build_index.py
```
3. Restart the application
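Forgetting step 2 leaves the app answering from an outdated index. A minimal sketch of a staleness check based on file modification times follows; `classify` and `index_status` are hypothetical helpers, not part of this project.

```python
import os

INDEX_FILES = ("glossary.index", "chunks.json")


def classify(source_mtime, output_mtimes):
    """Return "missing", "stale", or "ok" given mtimes (None means the file is absent)."""
    if any(t is None for t in output_mtimes):
        return "missing"
    if any(t < source_mtime for t in output_mtimes):
        return "stale"
    return "ok"


def index_status(source="glossary.txt", outputs=INDEX_FILES):
    """Compare generated index files against the glossary source.

    Raises FileNotFoundError if the source file itself does not exist.
    """
    output_mtimes = [os.path.getmtime(p) if os.path.exists(p) else None for p in outputs]
    return classify(os.path.getmtime(source), output_mtimes)
```

A result of `"missing"` or `"stale"` means `python build_index.py` should be run before restarting.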
### Cost Monitoring
- **OpenAI**: ~$0.0001 per query (embeddings)
- **OpenRouter**: ~$0.005 per response (Gemini)
- **Target**: <$10/month total operation
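Assuming each question costs one embedding plus one Gemini response, the monthly bill scales linearly with traffic. The constants below restate this section's approximate figures; the function names are illustrative.

```python
EMBED_COST_PER_QUERY = 0.0001  # OpenAI embeddings, approximate figure from this README
LLM_COST_PER_RESPONSE = 0.005  # OpenRouter Gemini, approximate figure from this README
MONTHLY_BUDGET = 10.00         # target monthly spend in USD


def monthly_cost(queries_per_month):
    """Estimated USD cost for a month's worth of questions."""
    return queries_per_month * (EMBED_COST_PER_QUERY + LLM_COST_PER_RESPONSE)


def max_queries_within_budget(budget=MONTHLY_BUDGET):
    """Largest whole number of questions that fits in the budget."""
    return int(budget // (EMBED_COST_PER_QUERY + LLM_COST_PER_RESPONSE))
```

At about $0.0051 per question, the $10 target covers roughly 1,960 questions a month, or about 65 a day.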
### Troubleshooting
**Common Issues:**
1. **"Module not found" error:**
```bash
pip install -r requirements.txt
```
2. **"No such file" for index files:**
```bash
python build_index.py
```
3. **API key errors:**
   - Check that the `.env` file exists (or, on Hugging Face Spaces, that both secrets are set) and contains the correct keys
- Verify API keys are valid and have sufficient credits
4. **Import errors:**
```bash
pip install faiss-cpu numpy openai requests gradio python-dotenv
```
## Compliance Features
- **Automatic Disclaimers**: Every response includes required compliance text
- **PII Detection**: Blocks emails, SSNs, and credit score references
- **Scope Limiting**: Only answers questions about glossary terms
- **Session Memory**: Context resets when chat is cleared (no persistent data)
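The README does not show the actual detection rules or disclaimer wording in `app.py`. The sketch below illustrates the general pattern-matching approach; the regexes and the `DISCLAIMER` text are hypothetical placeholders. Here the credit-score rule flags only messages that state a score value, so a definitional question like "What is a credit score?" passes through.

```python
import re

# Hypothetical patterns; the checks in app.py may be stricter or looser.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # dashed SSN format only
    "credit_score": re.compile(
        r"\b(?:credit|fico)\s+score\b\D{0,20}\b[3-8]\d{2}\b", re.IGNORECASE
    ),
}

# Placeholder text; substitute the required compliance wording.
DISCLAIMER = "This is general educational information, not financial advice."


def detect_pii(message):
    """Return the names of PII categories found in the message (empty list if none)."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(message)]


def with_disclaimer(answer):
    """Append the compliance disclaimer to a response."""
    return f"{answer}\n\n{DISCLAIMER}"
```

A chat handler would typically call `detect_pii` before retrieval and refuse if the list is non-empty, then pass every generated answer through `with_disclaimer`.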
## Security
- API keys stored in environment variables
- No user data persistence
- Input sanitization and validation
- PII detection and rejection
## Support
For technical issues:
1. Check the troubleshooting section above
2. Verify all dependencies are installed
3. Ensure API keys are correctly configured
4. Check that vector index files exist
## License
This project is designed for internal compliance-focused use with strict business requirements.