| title: Non-QM Glossary Bot | |
| emoji: π | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.31.0 | |
| app_file: app.py | |
| pinned: false | |
| # Non-QM Glossary Chatbot | |
| A retrieval-augmented generation (RAG) chatbot that returns definitions of Non-Qualified Mortgage (Non-QM) terms from a glossary, with strict compliance controls and conversation memory. | |
| ## Features | |
| - **Non-QM Expertise**: Specialized glossary of mortgage terminology | |
| - 💬 **Conversation Memory**: Smart follow-up question handling | |
| - **Compliance First**: Built-in disclaimers and PII protection | |
| - ⚡ **Streaming Responses**: Real-time text generation | |
| - 🎨 **Professional UI**: Modern Gradio interface with custom styling | |
| - 💰 **Cost Efficient**: Optimized for <$10/month operation | |
| ## Prerequisites | |
| - Python 3.10 or higher (required by Gradio 5.x) | |
| - OpenAI API key (for embeddings) | |
| - OpenRouter API key (for Gemini LLM access) | |
| ## Installation | |
| 1. **Clone the repository:** | |
| ```bash | |
| git clone <repository-url> | |
| cd ChatBot | |
| ``` | |
| 2. **Create and activate a virtual environment:** | |
| ```bash | |
| python -m venv venv | |
| # On Windows: | |
| venv\Scripts\activate | |
| # On macOS/Linux: | |
| source venv/bin/activate | |
| ``` | |
| 3. **Install dependencies:** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ## API Key Setup | |
| ### 1. OpenAI API Key | |
| 1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys) | |
| 2. Create a new API key | |
| 3. Copy the key (starts with `sk-proj-...`) | |
| ### 2. OpenRouter API Key | |
| 1. Go to [OpenRouter Keys](https://openrouter.ai/keys) | |
| 2. Create a new API key | |
| 3. Copy the key (starts with `sk-or-...`) | |
| ### 3. Environment Configuration | |
| Create a `.env` file in the project root: | |
| ```bash | |
| # Create .env file | |
| touch .env | |
| ``` | |
| Add your API keys to the `.env` file: | |
| ```env | |
| OPENAI_API_KEY=sk-proj-your-openai-key-here | |
| OPENROUTER_API_KEY=sk-or-your-openrouter-key-here | |
| ``` | |
| ⚠️ **Important:** Never commit your `.env` file to version control. It's already included in `.gitignore`. | |
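As a quick sanity check before launching, the sketch below reports which of the two keys are missing. It assumes the keys are read from the process environment and that `python-dotenv` (named in the troubleshooting install command) loads `.env`. The helper names `missing_keys` and `load_and_check` are hypothetical and are not taken from `app.py`.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def load_and_check():
    """Load .env from the current directory, then fail fast if a key is missing."""
    from dotenv import load_dotenv  # third-party: python-dotenv

    load_dotenv()
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing in .env: " + ", ".join(missing))
```

Calling `load_and_check()` at startup would stop the app with a readable message instead of a later authentication error from either API.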
| ## Running the Application | |
| ### 1. Generate Vector Index (First Time Only) | |
| Before running the chatbot for the first time, generate the search index: | |
| ```bash | |
| python build_index.py | |
| ``` | |
| This creates: | |
| - `glossary.index` - FAISS vector search index | |
| - `chunks.json` - Text chunks metadata | |
| ### 2. Start the Chatbot | |
| ```bash | |
| python app.py | |
| ``` | |
| The application will start and display: | |
| ``` | |
| Running on local URL: http://127.0.0.1:7860 | |
| ``` | |
| ### 3. Access the Interface | |
| Open your browser and go to: `http://127.0.0.1:7860` | |
| ## Usage | |
| ### Basic Questions | |
| Ask about Non-QM mortgage terms: | |
| - "What is a Non-QM loan?" | |
| - "Define debt-to-income ratio" | |
| - "What does DSCR mean?" | |
| - "Explain asset-based lending" | |
| ### Follow-up Questions | |
| The chatbot remembers conversation context: | |
| - After asking about a term, say "tell me more" | |
| - "Can you elaborate on that?" | |
| - "Give me more details" | |
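One simple way follow-up handling like this could work is to detect a follow-up phrase and prepend the last discussed term before retrieval. The sketch below is an illustration only: the phrase list is derived from the examples above, and `resolve_query` and `last_topic` are hypothetical names, not the logic in `app.py`.

```python
# Phrases taken from the follow-up examples above (matched case-insensitively).
FOLLOW_UP_PHRASES = ("tell me more", "elaborate", "more details")


def resolve_query(message, last_topic):
    """Attach the previous topic to a follow-up so retrieval has a term to search for."""
    text = message.lower()
    if last_topic and any(phrase in text for phrase in FOLLOW_UP_PHRASES):
        return f"{last_topic}: {message}"
    # Not a follow-up (or no earlier topic): search with the message as-is.
    return message
```

For example, `resolve_query("Tell me more", "DSCR")` returns `"DSCR: Tell me more"`, while a fresh question passes through unchanged.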
| ### What NOT to Ask | |
| - Personal financial information | |
| - Rate quotes or loan applications | |
| - Questions outside the glossary scope | |
| ## Project Structure | |
| ``` | |
| ChatBot/ | |
| ├── app.py              # Main Gradio application | |
| ├── build_index.py      # Vector index generation | |
| ├── requirements.txt    # Python dependencies | |
| ├── glossary.txt        # Source glossary content | |
| ├── glossary.index      # Generated FAISS index (after build) | |
| ├── chunks.json         # Generated text chunks (after build) | |
| ├── .env                # API keys (create this file) | |
| ├── .gitignore          # Files to exclude from git | |
| └── memory-bank/        # Project documentation | |
| ``` | |
| ## Configuration | |
| Key settings in `app.py`: | |
| ```python | |
| EMBED_MODEL = "text-embedding-3-small" # OpenAI embeddings | |
| GPT_MODEL = "google/gemini-2.5-flash-preview-05-20" # OpenRouter LLM | |
| SIM_THRESHOLD = 0.30 # Similarity threshold | |
| TOP_K = 3 # Number of chunks to retrieve | |
| ``` | |
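To show how `SIM_THRESHOLD` and `TOP_K` might interact, here is a minimal pure-Python sketch of threshold-filtered top-k retrieval. It assumes the threshold is a cosine similarity, which this README does not confirm. The real app searches a FAISS index rather than looping over vectors, and `cosine` and `retrieve` are hypothetical helpers.

```python
import math

SIM_THRESHOLD = 0.30  # same values as the settings above
TOP_K = 3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, top_k=TOP_K, threshold=SIM_THRESHOLD):
    """Return (chunk_index, score) for the best top_k chunks scoring at least threshold."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:top_k] if score >= threshold]
```

Under this reading, an empty result means no chunk was similar enough, which is where a scope-limited bot would decline to answer instead of guessing.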
| ## Deployment | |
| ### Hugging Face Spaces | |
| 1. **Create a new Space:** | |
| - Go to [Hugging Face Spaces](https://huggingface.co/spaces) | |
| - Choose Gradio SDK | |
| - Set hardware to CPU Basic (free) | |
| 2. **Upload required files:** | |
| ``` | |
| app.py | |
| requirements.txt | |
| glossary.txt | |
| glossary.index | |
| chunks.json | |
| build_index.py | |
| ``` | |
| 3. **Configure secrets in HF Spaces:** | |
| - Go to Settings → Variables and Secrets | |
| - Add `OPENAI_API_KEY` | |
| - Add `OPENROUTER_API_KEY` | |
| 4. **Deploy:** | |
| - Push files to the Space repository | |
| - The app will automatically build and deploy | |
| ## Maintenance | |
| ### Updating the Glossary | |
| 1. Edit `glossary.txt` with new terms | |
| 2. Regenerate the index: | |
| ```bash | |
| python build_index.py | |
| ``` | |
| 3. Restart the application | |
| ### Cost Monitoring | |
| - **OpenAI**: ~$0.0001 per query (embeddings) | |
| - **OpenRouter**: ~$0.005 per response (Gemini) | |
| - **Target**: <$10/month total operation | |
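The per-call estimates above imply a monthly query ceiling. The sketch below works through the arithmetic, assuming each user query triggers exactly one embedding call and one LLM response. The function names are illustrative.

```python
EMBED_COST = 0.0001  # USD per query, OpenAI embeddings (estimate above)
LLM_COST = 0.005     # USD per response, Gemini via OpenRouter (estimate above)
BUDGET = 10.00       # USD per month target


def monthly_cost(queries):
    """Estimated monthly spend in USD for a given number of queries."""
    return queries * (EMBED_COST + LLM_COST)


def max_queries(budget=BUDGET):
    """Largest whole number of queries that fits within the budget."""
    return int(budget / (EMBED_COST + LLM_COST))
```

At about $0.0051 per query, the $10 target covers roughly 1,960 queries per month, or about 65 per day.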
| ### Troubleshooting | |
| **Common Issues:** | |
| 1. **"Module not found" error:** | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 2. **"No such file" for index files:** | |
| ```bash | |
| python build_index.py | |
| ``` | |
| 3. **API key errors:** | |
| - Check that the `.env` file exists and contains the correct keys | |
| - Verify that the API keys are valid and the accounts have sufficient credits | |
| 4. **Import errors:** | |
| ```bash | |
| pip install faiss-cpu numpy openai requests gradio python-dotenv | |
| ``` | |
| ## Compliance Features | |
| - **Automatic Disclaimers**: Every response includes required compliance text | |
| - **PII Detection**: Blocks emails, SSNs, and credit score references | |
| - **Scope Limiting**: Only answers questions about glossary terms | |
| - **Session Memory**: Context resets when chat is cleared (no persistent data) | |
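A PII check of the kind listed above could be sketched with regular expressions. The patterns below are assumptions chosen for illustration (a basic email shape, the `NNN-NN-NNNN` SSN format, and the phrases "credit score" and "FICO score"); they are not the rules used in `app.py`, and `detect_pii` is a hypothetical name.

```python
import re

# Illustrative patterns only; a production filter would need broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_score": re.compile(r"\b(?:credit|fico)\s+score\b", re.IGNORECASE),
}


def detect_pii(message):
    """Return the names of all PII categories found in the message."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(message)]
```

A message would be rejected whenever `detect_pii` returns a non-empty list. Note that a phrase-based credit score rule also catches general questions such as "What is a credit score?", so that pattern trades recall for strictness.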
| ## Security | |
| - API keys stored in environment variables | |
| - No user data persistence | |
| - Input sanitization and validation | |
| - PII detection and rejection | |
| ## Support | |
| For technical issues: | |
| 1. Check the troubleshooting section above | |
| 2. Verify all dependencies are installed | |
| 3. Ensure API keys are correctly configured | |
| 4. Check that vector index files exist | |
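The last check can be scripted. This sketch assumes the two generated files sit in the project root, as shown in the project structure; `missing_index_files` is a hypothetical helper.

```python
from pathlib import Path

# File names produced by build_index.py, per the sections above.
INDEX_FILES = ("glossary.index", "chunks.json")


def missing_index_files(project_dir="."):
    """Return the generated index files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in INDEX_FILES if not (root / name).is_file()]
```

If the returned list is not empty, rerun `python build_index.py` before starting the app.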
| ## License | |
| This project is designed for internal compliance-focused use with strict business requirements. |