Space: paradox44/digitChatBot
paradox44 committed on May 27, 2025

Commit 4c6c5df (verified) · 1 parent: bd7261b

Update README.md


Files changed (1): README.md +252 -241

---
title: Non-QM Glossary Bot
emoji: 🏠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "4.26.0"   # any current Gradio version is fine
app_file: app.py
pinned: false
---

# Non-QM Glossary Chatbot

A professional RAG-powered (retrieval-augmented generation) chatbot that provides instant, accurate definitions of Non-Qualified Mortgage (Non-QM) terms, with strict compliance controls and conversation memory.

## Features

- 🏠 **Non-QM Expertise**: Specialized glossary of mortgage terminology
- 💬 **Conversation Memory**: Smart follow-up question handling
- 🔒 **Compliance First**: Built-in disclaimers and PII protection
- ⚡ **Streaming Responses**: Real-time text generation
- 🎨 **Professional UI**: Modern Gradio interface with custom styling
- 💰 **Cost Efficient**: Optimized for <$10/month operation

## Prerequisites

- Python 3.8 or higher
- OpenAI API key (for embeddings)
- OpenRouter API key (for Gemini LLM access)

## Installation

1. **Clone the repository:**
   ```bash
   git clone <repository-url>
   cd ChatBot
   ```
2. **Create and activate a virtual environment:**
   ```bash
   python -m venv venv
   # On Windows:
   venv\Scripts\activate
   # On macOS/Linux:
   source venv/bin/activate
   ```
3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

## API Key Setup

### 1. OpenAI API Key

1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys)
2. Create a new API key
3. Copy the key (starts with `sk-proj-...`)

### 2. OpenRouter API Key

1. Go to [OpenRouter Keys](https://openrouter.ai/keys)
2. Create a new API key
3. Copy the key (starts with `sk-or-...`)

### 3. Environment Configuration

Create a `.env` file in the project root (on Windows, where `touch` is not available, create it with any text editor):

```bash
# Create .env file
touch .env
```

Add your API keys to the `.env` file:

```env
OPENAI_API_KEY=sk-proj-your-openai-key-here
OPENROUTER_API_KEY=sk-or-your-openrouter-key-here
```

⚠️ **Important:** Never commit your `.env` file to version control. It is already listed in `.gitignore`.
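
A missing or mistyped key can be caught before the app starts. The sketch below is hypothetical: `parse_dotenv`, `check_api_keys` and `EXPECTED_PREFIXES` are illustrative names and are not part of `app.py`. It reads simple `KEY=VALUE` lines, like the `.env` example above, and checks the key prefixes described in the API key steps. In the project itself, the `python-dotenv` dependency is the usual way to load the file.

```python
import os
from typing import Dict, List

# Hypothetical helpers, not taken from app.py. In the real project,
# python-dotenv's load_dotenv() is the usual way to load a .env file.

# Expected prefixes, per the API key steps (an OpenAI project key
# "sk-proj-..." also satisfies the broader "sk-" check).
EXPECTED_PREFIXES = {
    "OPENAI_API_KEY": "sk-",
    "OPENROUTER_API_KEY": "sk-or-",
}


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_api_keys(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means both keys look usable."""
    problems: List[str] = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is missing")
        elif not value.startswith(prefix):
            problems.append(f"{name} does not start with {prefix!r}")
    return problems


if __name__ == "__main__":
    # Variables already in the environment (e.g. HF Spaces secrets) win over
    # .env, mirroring load_dotenv()'s default of not overriding them.
    env: Dict[str, str] = {}
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            env.update(parse_dotenv(fh.read()))
    env.update(os.environ)
    for problem in check_api_keys(env):
        print("Config problem:", problem)
```

A prefix check only catches obvious mix-ups, such as an OpenRouter key pasted into the wrong variable. Whether a key is actually valid and has credits can only be confirmed by calling the provider.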

## Running the Application

### 1. Generate the Vector Index (First Run, and After Glossary Changes)

Before starting the chatbot for the first time, build the search index:

```bash
python build_index.py
```

This creates:

- `glossary.index` - FAISS vector search index
- `chunks.json` - the text chunks that the index entries refer to
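
The build step can be pictured in three moves: split the glossary into chunks, embed each chunk, and keep the vectors alongside the chunk text so that row *i* of the index maps to `chunks[i]`. The stdlib-only sketch below makes two assumptions. First, glossary entries are separated by blank lines. Second, a toy hashed bag-of-words `toy_embed` stands in for OpenAI's `text-embedding-3-small`. It is not the contents of `build_index.py`.

```python
import json
import math
import re
import zlib
from typing import List, Tuple

# Hypothetical sketch of a build step; NOT the contents of build_index.py.
# Assumptions: glossary entries are separated by blank lines, and
# toy_embed() stands in for OpenAI's text-embedding-3-small.

DIM = 64  # toy embedding size (the real model's vectors are much larger)


def split_into_chunks(glossary_text: str) -> List[str]:
    """Treat each blank-line-separated block of the glossary as one chunk."""
    blocks = re.split(r"\n\s*\n", glossary_text)
    return [block.strip() for block in blocks if block.strip()]


def toy_embed(text: str) -> List[float]:
    """Hash words into DIM buckets, then L2-normalise the counts."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, unlike Python's salted hash().
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build(glossary_text: str) -> Tuple[List[str], List[List[float]]]:
    """Return chunks and their vectors; vectors[i] belongs to chunks[i]."""
    chunks = split_into_chunks(glossary_text)
    vectors = [toy_embed(chunk) for chunk in chunks]
    return chunks, vectors


if __name__ == "__main__":
    sample = "DSCR: Debt service coverage ratio.\n\nDTI: Debt-to-income ratio.\n"
    chunks, vectors = build(sample)
    # chunks.json keeps the text so a search hit at row i can be shown to the user.
    print(json.dumps(chunks, indent=2))
    print(len(vectors), "vectors of size", DIM)
```

With real embeddings, one common FAISS setup has three steps. Normalise the vectors with `faiss.normalize_L2`, add them to a `faiss.IndexFlatIP`, and save the index with `faiss.write_index`, so that inner-product scores equal cosine similarity. Whether `build_index.py` does exactly this is an assumption.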

### 2. Start the Chatbot

```bash
python app.py
```

On startup, the console prints:

```
Running on local URL: http://127.0.0.1:7860
```

### 3. Open the Interface

Open `http://127.0.0.1:7860` in your browser.

## Usage

### Basic Questions

Ask about Non-QM mortgage terms:

- "What is a Non-QM loan?"
- "Define debt-to-income ratio"
- "What does DSCR mean?"
- "Explain asset-based lending"

### Follow-up Questions

The chatbot remembers the conversation context, so after asking about a term you can follow up with:

- "Tell me more"
- "Can you elaborate on that?"
- "Give me more details"
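
A follow-up such as "Tell me more" contains no glossary term, so retrieval needs the earlier question to find anything. One way to handle this is sketched below. The trigger phrases, `is_follow_up`, and the rewrite in `build_query` are hypothetical, not taken from `app.py`. The sketch attaches the follow-up to the most recent question that named a real term.

```python
import re
from typing import Sequence

# Hypothetical sketch of follow-up handling; the trigger phrases and the
# query-rewriting strategy are assumptions, not taken from app.py.

FOLLOW_UP_PATTERNS = [
    r"\btell me more\b",
    r"\belaborate\b",
    r"\bmore details?\b",
]


def is_follow_up(message: str) -> bool:
    """True if the message looks like 'tell me more' rather than a new term."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in FOLLOW_UP_PATTERNS)


def build_query(message: str, history: Sequence[Sequence[str]]) -> str:
    """Rewrite a vague follow-up into a self-contained retrieval query.

    history holds (user_message, bot_reply) pairs, oldest first, which is
    the pair format Gradio chat components have traditionally used.
    """
    if is_follow_up(message):
        # Walk back to the most recent question that named an actual term.
        for user_message, _reply in reversed(history):
            if not is_follow_up(user_message):
                return f"{user_message} (follow-up: {message})"
    return message
```

Skipping earlier follow-ups while walking back means that "Tell me more" asked twice in a row still resolves to the original term. It does not nest one follow-up inside another.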

### What NOT to Ask

- Personal financial information
- Rate quotes or loan applications
- Questions outside the glossary scope

## Project Structure

```
ChatBot/
├── app.py            # Main Gradio application
├── build_index.py    # Vector index generation
├── requirements.txt  # Python dependencies
├── glossary.txt      # Source glossary content
├── glossary.index    # Generated FAISS index (after build)
├── chunks.json       # Generated text chunks (after build)
├── .env              # API keys (create this file)
├── .gitignore        # Files to exclude from git
└── memory-bank/      # Project documentation
```

## Configuration

Key settings in `app.py`:

```python
EMBED_MODEL = "text-embedding-3-small"               # OpenAI embedding model
GPT_MODEL = "google/gemini-2.5-flash-preview-05-20"  # LLM served via OpenRouter
SIM_THRESHOLD = 0.30                                 # Minimum similarity score for a retrieved chunk
TOP_K = 3                                            # Number of chunks to retrieve per query
```
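
Here is one plausible way `TOP_K` and `SIM_THRESHOLD` combine. Take the `TOP_K` most similar chunks, then drop any that score below the threshold. If nothing survives, the question can be treated as outside the glossary scope. The sketch assumes cosine similarity on unit-length vectors and is not copied from `app.py`.

```python
import math
from typing import List, Sequence, Tuple

# Sketch only: assumes cosine similarity (dot product of unit vectors, as an
# inner-product index over normalised embeddings would return). The real
# logic in app.py may differ.

SIM_THRESHOLD = 0.30
TOP_K = 3


def normalise(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def retrieve(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    top_k: int = TOP_K,
    threshold: float = SIM_THRESHOLD,
) -> List[Tuple[float, str]]:
    """Return up to top_k (score, chunk) pairs whose score is >= threshold."""
    q = normalise(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalise(vec))), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # An empty result means "no glossary match", i.e. out of scope.
    return [(score, chunk) for score, chunk in scored[:top_k] if score >= threshold]
```

Raising `SIM_THRESHOLD` makes the bot decline more borderline questions. Raising `TOP_K` gives the LLM more context per answer, at a higher token cost.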

## Deployment

### Hugging Face Spaces

1. **Create a new Space:**
   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
   - Choose the Gradio SDK
   - Set hardware to CPU Basic (free)
2. **Upload the required files:**
   ```
   README.md
   app.py
   requirements.txt
   glossary.txt
   glossary.index
   chunks.json
   build_index.py
   ```
3. **Configure secrets in the Space:**
   - Go to Settings → Variables and secrets
   - Add `OPENAI_API_KEY`
   - Add `OPENROUTER_API_KEY`
   - Spaces exposes these secrets to the app as environment variables
4. **Deploy:**
   - Push the files to the Space repository
   - The Space builds and deploys automatically, using the YAML header at the top of `README.md` (`sdk`, `sdk_version`, `app_file`) to choose the SDK and entry point

## Maintenance

### Updating the Glossary

1. Edit `glossary.txt` with new terms
2. Regenerate the index:
   ```bash
   python build_index.py
   ```
3. Restart the application

### Cost Monitoring

- **OpenAI**: ~$0.0001 per query (embeddings)
- **OpenRouter**: ~$0.005 per response (Gemini)
- **Target**: under $10/month total
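
At these estimates, each question costs about $0.0001 + $0.005 = $0.0051. The $10 target therefore covers roughly 1,960 questions a month, or about 65 a day over 30 days. The quick check below uses only the README's rough per-query figures, not live pricing.

```python
# Back-of-the-envelope budget check using the README's rough per-query
# estimates (not live pricing); adjust the constants as prices change.
EMBED_COST_PER_QUERY = 0.0001   # USD, OpenAI embedding of the question
LLM_COST_PER_RESPONSE = 0.005   # USD, Gemini response via OpenRouter
MONTHLY_BUDGET = 10.00          # USD target


def cost_per_query() -> float:
    return EMBED_COST_PER_QUERY + LLM_COST_PER_RESPONSE


def monthly_cost(queries_per_month: int) -> float:
    return queries_per_month * cost_per_query()


def max_queries_within_budget(budget: float = MONTHLY_BUDGET) -> int:
    # Floor so the returned volume never exceeds the budget.
    return int(budget // cost_per_query())


if __name__ == "__main__":
    n = max_queries_within_budget()
    print(f"~{n} queries/month (~{n // 30} per day) fit in ${MONTHLY_BUDGET:.2f}")
```

The LLM response dominates the cost, so shorter answers or a cheaper model move the total far more than embedding costs do.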

### Troubleshooting

**Common issues:**

1. **"Module not found" error:**
   ```bash
   pip install -r requirements.txt
   ```
2. **"No such file" error for the index files:**
   ```bash
   python build_index.py
   ```
3. **API key errors:**
   - Check that the `.env` file exists and contains the correct keys
   - Verify that the API keys are valid and have sufficient credits
4. **Import errors:**
   ```bash
   pip install faiss-cpu numpy openai requests gradio python-dotenv
   ```
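
The "No such file" case can be reported up front rather than surfacing as a traceback during the first search. The check below is hypothetical (`missing_index_files` and `REQUIRED_FILES` are illustrative, not part of `app.py`). It lists whichever generated files are absent.

```python
from pathlib import Path
from typing import List

# Hypothetical startup check (not from app.py): list the generated index
# files that are missing, so the fix is obvious before any search runs.
REQUIRED_FILES = ("glossary.index", "chunks.json")


def missing_index_files(base_dir: str = ".") -> List[str]:
    base = Path(base_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_index_files()
    if missing:
        print(f"Missing {', '.join(missing)}; run: python build_index.py")
    else:
        print("Index files found.")
```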

## Compliance Features

- **Automatic Disclaimers**: Every response includes the required compliance text
- **PII Detection**: Blocks emails, SSNs, and credit score references
- **Scope Limiting**: Only answers questions about glossary terms
- **Session Memory**: Context resets when the chat is cleared (no persistent data)
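
The README does not show the app's actual rules, so the following is only an illustrative sketch of regex-based PII screening and disclaimer appending. All of the names are hypothetical: `PII_PATTERNS`, `detect_pii`, `refusal_for`, `with_disclaimer` and `DISCLAIMER`. So is the assumption that a "credit score reference" means a score followed by a three-digit number. The `DISCLAIMER` wording is a placeholder, not the app's compliance text.

```python
import re
from typing import List, Optional

# Illustrative patterns only; the real rules in app.py are not shown here.
# Assumption: a "credit score reference" means a score with a 3-digit number,
# so a glossary question like "What is a credit score?" is still allowed.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_score": re.compile(
        r"\b(?:credit|fico)\s+score\b\D{0,20}\b\d{3}\b", re.IGNORECASE
    ),
}

# Placeholder wording, not the app's actual compliance text.
DISCLAIMER = "This is general educational information, not financial or legal advice."


def detect_pii(message: str) -> List[str]:
    """Return the kinds of PII found in the message (empty if none)."""
    return [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(message)]


def refusal_for(message: str) -> Optional[str]:
    """A refusal to send instead of calling the LLM, or None if the message is clean."""
    found = detect_pii(message)
    if found:
        return "Please don't share personal information (" + ", ".join(found) + ")."
    return None


def with_disclaimer(answer: str) -> str:
    """Append the disclaimer so every response carries it."""
    return f"{answer.rstrip()}\n\n{DISCLAIMER}"
```

Screening the message before any API call means flagged input never reaches the embedding or LLM providers.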

## Security

- API keys stored in environment variables
- No user data persistence
- Input sanitization and validation
- PII detection and rejection

## Support

For technical issues:

1. Check the troubleshooting section above
2. Verify that all dependencies are installed
3. Ensure the API keys are correctly configured
4. Check that the vector index files exist

## License

This project is designed for internal, compliance-focused use with strict business requirements.