	---
	title: Non-QM Glossary Bot
	emoji: 🏠
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: 5.31.0
	app_file: app.py
	pinned: false
	---

	# Non-QM Glossary Chatbot

	A retrieval-augmented generation (RAG) chatbot that answers questions about Non-Qualified Mortgage (Non-QM) terms from a curated glossary, with strict compliance controls and conversation memory.

	## Features

	- 🏠 Non-QM Expertise: Specialized glossary of mortgage terminology
	- 💬 Conversation Memory: Smart follow-up question handling
	- 🔒 Compliance First: Built-in disclaimers and PII protection
	- ⚡ Streaming Responses: Real-time text generation
	- 🎨 Professional UI: Modern Gradio interface with custom styling
	- 💰 Cost Efficient: Optimized for <$10/month operation

	## Prerequisites

	- Python 3.10 or higher (required by Gradio 5.x, the SDK version this project pins)
	- OpenAI API key (for embeddings)
	- OpenRouter API key (for Gemini LLM access)

	## Installation

	1. Clone the repository:
	```bash
	git clone <repository-url>
	cd ChatBot
	```

	2. Create and activate a virtual environment:
	```bash
	python -m venv venv

	# On Windows:
	venv\Scripts\activate

	# On macOS/Linux:
	source venv/bin/activate
	```

	3. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	## API Key Setup

	### 1. OpenAI API Key
	1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys)
	2. Create a new API key
	3. Copy the key (starts with `sk-proj-...`)

	### 2. OpenRouter API Key
	1. Go to [OpenRouter Keys](https://openrouter.ai/keys)
	2. Create a new API key
	3. Copy the key (starts with `sk-or-...`)

	### 3. Environment Configuration

	Create a `.env` file in the project root:

	```bash
	# Create .env file
	touch .env
	```

	Add your API keys to the `.env` file:

	```env
	OPENAI_API_KEY=sk-proj-your-openai-key-here
	OPENROUTER_API_KEY=sk-or-your-openrouter-key-here
	```

	⚠️ Important: Never commit your `.env` file to version control. It's already included in `.gitignore`.
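
A minimal sketch of how the two keys might be read and validated at startup. It assumes the app loads `.env` through `python-dotenv` (listed among the packages under Troubleshooting). The helper names `missing_keys` and `load_keys` are hypothetical and are not taken from `app.py`.

```python
import os

# Key names as they appear in the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "OPENROUTER_API_KEY")


def missing_keys(env):
    """Return the required key names that are unset or blank in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


def load_keys():
    """Load .env if python-dotenv is installed, then return both keys.

    Hypothetical helper: raises RuntimeError when a key is missing.
    """
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        load_dotenv()  # copies entries from .env into os.environ
    except ImportError:
        pass  # fall back to variables already exported in the shell
    missing = missing_keys(os.environ)
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {name: os.environ[name] for name in REQUIRED_KEYS}
```

Calling `load_keys()` once at the top of the app would surface a missing key immediately instead of on the first API request.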

	## Running the Application

	### 1. Generate Vector Index (First Time Only)

	Before running the chatbot for the first time, generate the search index:

	```bash
	python build_index.py
	```

	This creates:
	- `glossary.index` - FAISS vector search index
	- `chunks.json` - Text chunks metadata
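
The chunking half of this step can be sketched as follows. This is an illustration only: it assumes `glossary.txt` separates entries with blank lines, which this README does not confirm, and the function names are hypothetical. The embedding and FAISS steps that `build_index.py` performs are not shown.

```python
import json
import re
from pathlib import Path


def split_glossary(text):
    """Split glossary text into one chunk per blank-line-separated entry.

    Assumption: entries are separated by one or more blank lines.
    """
    blocks = re.split(r"\n\s*\n", text)
    # Collapse internal whitespace so each chunk is a single clean line.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def write_chunks(glossary_path="glossary.txt", chunks_path="chunks.json"):
    """Read the glossary, split it, and save the chunks as a JSON list."""
    chunks = split_glossary(Path(glossary_path).read_text(encoding="utf-8"))
    Path(chunks_path).write_text(
        json.dumps(chunks, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return chunks
```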

	### 2. Start the Chatbot

	```bash
	python app.py
	```

	The application will start and display:
	```
	Running on local URL: http://127.0.0.1:7860
	```

	### 3. Access the Interface

	Open your browser and go to: `http://127.0.0.1:7860`

	## Usage

	### Basic Questions
	Ask about Non-QM mortgage terms:
	- "What is a Non-QM loan?"
	- "Define debt-to-income ratio"
	- "What does DSCR mean?"
	- "Explain asset-based lending"

	### Follow-up Questions
	The chatbot remembers conversation context:
	- After asking about a term, say "tell me more"
	- "Can you elaborate on that?"
	- "Give me more details"
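
One way such follow-up handling could work is to reuse the most recent real question as the retrieval query whenever the new message is only a request for more detail. The sketch below is an assumption about the approach, not the logic in `app.py`; the phrase list and function names are hypothetical.

```python
# Hypothetical phrase list, based on the example follow-ups above.
FOLLOW_UP_PHRASES = ("tell me more", "elaborate", "more details")


def is_follow_up(message):
    """True when the message only asks for more detail on the last topic."""
    text = message.lower()
    return any(phrase in text for phrase in FOLLOW_UP_PHRASES)


def retrieval_query(message, history):
    """Pick the text to search the glossary with.

    `history` holds earlier user messages, oldest first. For a follow-up,
    fall back to the most recent message that was not itself a follow-up.
    """
    if is_follow_up(message):
        for previous in reversed(history):
            if not is_follow_up(previous):
                return previous
    return message
```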

	### What NOT to Ask
	- Personal financial information
	- Rate quotes or loan applications
	- Questions outside the glossary scope

	## Project Structure

	```
	ChatBot/
	├── app.py # Main Gradio application
	├── build_index.py # Vector index generation
	├── requirements.txt # Python dependencies
	├── glossary.txt # Source glossary content
	├── glossary.index # Generated FAISS index (after build)
	├── chunks.json # Generated text chunks (after build)
	├── .env # API keys (create this file)
	├── .gitignore # Files to exclude from git
	└── memory-bank/ # Project documentation
	```

	## Configuration

	Key settings in `app.py`:

	```python
	EMBED_MODEL = "text-embedding-3-small" # OpenAI embeddings
	GPT_MODEL = "google/gemini-2.5-flash-preview-05-20" # OpenRouter LLM
	SIM_THRESHOLD = 0.30 # Similarity threshold
	TOP_K = 3 # Number of chunks to retrieve
	```
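
To illustrate how `TOP_K` and `SIM_THRESHOLD` might interact, here is a pure-Python sketch of the retrieval step. It assumes the similarity is cosine similarity and that the threshold acts as a minimum score; the real app searches a FAISS index instead, and `retrieve` is a hypothetical name.

```python
import math

SIM_THRESHOLD = 0.30  # minimum similarity for a chunk to be used (assumed)
TOP_K = 3             # maximum number of chunks to return


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, chunks, top_k=TOP_K, threshold=SIM_THRESHOLD):
    """Return up to `top_k` (score, chunk) pairs scoring at least `threshold`."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(score, text) for score, text in scored[:top_k] if score >= threshold]
```

Under these assumptions, an empty result means no chunk cleared the threshold, which is a natural point to treat the question as outside the glossary scope.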

	## Deployment

	### Hugging Face Spaces

	1. Create a new Space:
	- Go to [Hugging Face Spaces](https://huggingface.co/spaces)
	- Choose Gradio SDK
	- Set hardware to CPU Basic (free)

	2. Upload required files:
	```
	app.py
	requirements.txt
	glossary.txt
	glossary.index
	chunks.json
	build_index.py
	```

	3. Configure secrets in HF Spaces:
	- Go to Settings → Variables and Secrets
	- Add `OPENAI_API_KEY`
	- Add `OPENROUTER_API_KEY`

	4. Deploy:
	- Push files to the Space repository
	- The app will automatically build and deploy

	## Maintenance

	### Updating the Glossary

	1. Edit `glossary.txt` with new terms
	2. Regenerate the index:
	```bash
	python build_index.py
	```
	3. Restart the application

	### Cost Monitoring

	- OpenAI: ~$0.0001 per query (embeddings)
	- OpenRouter: ~$0.005 per response (Gemini)
	- Target: <$10/month total operation
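
As a rough check on these figures, the sketch below estimates how many queries fit in the monthly target. It assumes every query costs one embedding call plus one LLM response at the approximate rates listed above; actual costs vary with token counts.

```python
EMBEDDING_COST = 0.0001  # USD per query (approximate, from the list above)
RESPONSE_COST = 0.005    # USD per response (approximate, from the list above)
MONTHLY_BUDGET = 10.00   # USD target


def queries_within_budget(budget=MONTHLY_BUDGET):
    """Whole number of queries that fit in `budget` at the assumed rates."""
    per_query = EMBEDDING_COST + RESPONSE_COST  # about 0.0051 USD
    return int(budget / per_query)
```

At these rates the budget covers about 1,960 queries a month, or roughly 65 a day over a 30-day month.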

	### Troubleshooting

	Common Issues:

	1. "Module not found" error:
	```bash
	pip install -r requirements.txt
	```

	2. "No such file" for index files:
	```bash
	python build_index.py
	```

	3. API key errors:
	- Check that the `.env` file exists and contains both keys
	- Verify that the API keys are valid and the accounts have sufficient credits

	4. Import errors that persist after reinstalling from `requirements.txt` (install the core packages directly):
	```bash
	pip install faiss-cpu numpy openai requests gradio python-dotenv
	```
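
For issue 2, a small startup check can report missing index files before the app tries to load them. This is a sketch with a hypothetical helper name; the file names come from the index generation step above.

```python
from pathlib import Path

# Files produced by build_index.py.
INDEX_FILES = ("glossary.index", "chunks.json")


def missing_index_files(root="."):
    """Return the index files that do not exist under `root`."""
    return [name for name in INDEX_FILES if not (Path(root) / name).is_file()]
```

If the returned list is non-empty, run `python build_index.py` and restart.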

	## Compliance Features

	- Automatic Disclaimers: Every response includes required compliance text
	- PII Detection: Blocks emails, SSNs, and credit score references
	- Scope Limiting: Only answers questions about glossary terms
	- Session Memory: Context resets when chat is cleared (no persistent data)
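
The PII check could be approximated with a few regular expressions, as in the sketch below. The patterns and the name `detect_pii` are illustrative assumptions, not the rules in `app.py`, and they are deliberately simple: they will miss some formats and may flag harmless text.

```python
import re

# Hypothetical patterns for the three categories named above.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_score": re.compile(r"\b(?:credit|fico)\s+score\b", re.IGNORECASE),
}


def detect_pii(message):
    """Return the sorted names of the PII categories found in `message`."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(message)
    )
```

A message would be rejected before retrieval whenever `detect_pii` returns a non-empty list.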

	## Security

	- API keys stored in environment variables
	- No user data persistence
	- Input sanitization and validation
	- PII detection and rejection

	## Support

	For technical issues:
	1. Check the troubleshooting section above
	2. Verify all dependencies are installed
	3. Ensure API keys are correctly configured
	4. Check that vector index files exist

	## License

	This project is designed for internal compliance-focused use with strict business requirements.