title: Non-QM Glossary Bot
emoji: 🏠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false

Non-QM Glossary Chatbot

A retrieval-augmented generation (RAG) chatbot that answers questions about Non-Qualified Mortgage (Non-QM) terms from a curated glossary, with strict compliance controls and conversation memory.

Features

  • 🏠 Non-QM Expertise: Specialized glossary of mortgage terminology
  • 💬 Conversation Memory: Smart follow-up question handling
  • 🔒 Compliance First: Built-in disclaimers and PII protection
  • ⚡ Streaming Responses: Real-time text generation
  • 🎨 Professional UI: Modern Gradio interface with custom styling
  • 💰 Cost Efficient: Optimized for <$10/month operation

Prerequisites

  • Python 3.10 or higher (Gradio 5.x, the SDK version this Space pins, does not support older interpreters)
  • OpenAI API key (for embeddings)
  • OpenRouter API key (for Gemini LLM access)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd ChatBot
    
  2. Create and activate a virtual environment:

    python -m venv venv
    
    # On Windows:
    venv\Scripts\activate
    
    # On macOS/Linux:
    source venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    

API Key Setup

1. OpenAI API Key

  1. Go to OpenAI API Keys
  2. Create a new API key
  3. Copy the key (starts with sk-proj-...)

2. OpenRouter API Key

  1. Go to OpenRouter Keys
  2. Create a new API key
  3. Copy the key (starts with sk-or-...)

3. Environment Configuration

Create a .env file in the project root:

# Create .env file
touch .env

Add your API keys to the .env file:

OPENAI_API_KEY=sk-proj-your-openai-key-here
OPENROUTER_API_KEY=sk-or-your-openrouter-key-here

⚠️ Important: Never commit your .env file to version control. It's already included in .gitignore.
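
As a quick sanity check before launching, the two keys can be verified from Python. This is a minimal sketch using only the standard library. It assumes the variables are already present in the process environment (for example after python-dotenv's load_dotenv() has run, which the app presumably does since python-dotenv is among its dependencies). The helper name check_api_keys is hypothetical, and the prefixes are taken from the key formats listed above.

```python
import os

# Expected prefixes, taken from the key formats described in this README.
EXPECTED_PREFIXES = {
    "OPENAI_API_KEY": "sk-proj-",
    "OPENROUTER_API_KEY": "sk-or-",
}


def check_api_keys(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is missing or empty")
        elif not value.startswith(prefix):
            # Treat this as a hint only: key formats can change over time.
            problems.append(f"{name} does not start with {prefix!r}")
    return problems


if __name__ == "__main__":
    for problem in check_api_keys():
        print("WARNING:", problem)
```

A prefix mismatch does not prove a key is invalid, so it is reported as a warning rather than treated as fatal.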

Running the Application

1. Generate Vector Index (First Time Only)

Before running the chatbot for the first time, generate the search index:

python build_index.py

This creates:

  • glossary.index - FAISS vector search index
  • chunks.json - Text chunks metadata
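
The internals of build_index.py are not shown in this README. As a rough illustration of the chunking half of that step, the sketch below splits glossary text into one chunk per blank-line-separated entry and prints the result in a chunks.json-like shape. The entry format, the split_glossary helper, the id/text fields, and the sample text are all assumptions; the embedding and FAISS steps are omitted.

```python
import json
import re


def split_glossary(text):
    """Split glossary text into chunks, one per blank-line-separated entry."""
    # Assumption: entries in glossary.txt are separated by one or more blank lines.
    entries = re.split(r"\n\s*\n", text.strip())
    return [
        {"id": i, "text": " ".join(entry.split())}  # collapse internal whitespace
        for i, entry in enumerate(entries)
        if entry.strip()
    ]


# Made-up sample text, not the project's actual glossary.
sample = """DSCR: Debt Service Coverage Ratio.

Non-QM loan: A mortgage that falls outside
Qualified Mortgage standards.
"""

chunks = split_glossary(sample)
print(json.dumps(chunks, indent=2))
```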

2. Start the Chatbot

python app.py

The application will start and display:

Running on local URL: http://127.0.0.1:7860

3. Access the Interface

Open your browser and go to: http://127.0.0.1:7860

Usage

Basic Questions

Ask about Non-QM mortgage terms:

  • "What is a Non-QM loan?"
  • "Define debt-to-income ratio"
  • "What does DSCR mean?"
  • "Explain asset-based lending"

Follow-up Questions

The chatbot remembers conversation context:

  • After asking about a term, say "tell me more"
  • "Can you elaborate on that?"
  • "Give me more details"
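
How app.py resolves follow-ups is not documented here. One simple way to approximate the behavior described above is to remember the last term discussed and rewrite vague follow-ups into an explicit query. The phrase list, the resolve_query helper, and the rewrite wording are all hypothetical.

```python
# Hypothetical trigger phrases, modeled on the examples in this section.
FOLLOW_UP_PHRASES = ("tell me more", "elaborate", "more details")


def resolve_query(message, last_term):
    """Rewrite a vague follow-up so it names the previously discussed term."""
    text = message.lower()
    is_follow_up = any(phrase in text for phrase in FOLLOW_UP_PHRASES)
    if is_follow_up and last_term:
        return f"Give more detail about {last_term}"
    return message  # a normal question, or no prior context to lean on


print(resolve_query("Can you elaborate on that?", "DSCR"))
print(resolve_query("What is a Non-QM loan?", "DSCR"))
```

The rewritten query can then go through the same retrieval path as any other question.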

What NOT to Ask

  • Personal financial information
  • Rate quotes or loan applications
  • Questions outside the glossary scope

Project Structure

ChatBot/
├── app.py                # Main Gradio application
├── build_index.py        # Vector index generation
├── requirements.txt      # Python dependencies
├── glossary.txt          # Source glossary content
├── glossary.index        # Generated FAISS index (after build)
├── chunks.json           # Generated text chunks (after build)
├── .env                  # API keys (create this file)
├── .gitignore            # Files to exclude from git
└── memory-bank/          # Project documentation

Configuration

Key settings in app.py:

EMBED_MODEL = "text-embedding-3-small"            # OpenAI embeddings
GPT_MODEL = "google/gemini-2.5-flash-preview-05-20"  # OpenRouter LLM
SIM_THRESHOLD = 0.30                              # Similarity threshold
TOP_K = 3                                         # Number of chunks to retrieve
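
To make the last two settings concrete, here is a minimal pure-Python sketch of how a similarity threshold and a top-k cutoff typically combine during retrieval. It assumes cosine similarity and uses toy 2-dimensional vectors. The actual app uses a FAISS index, and its metric and scoring are not specified in this README. The retrieve function is hypothetical.

```python
import math

SIM_THRESHOLD = 0.30
TOP_K = 3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, top_k=TOP_K, threshold=SIM_THRESHOLD):
    """Return (index, score) for the best chunks that clear the threshold."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep at most top_k candidates, then drop any below the threshold.
    return [(i, s) for i, s in scored[:top_k] if s >= threshold]


chunk_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
hits = retrieve([1.0, 0.0], chunk_vecs)
print(hits)
```

In this toy run only the first two chunks survive: the third-ranked chunk is within the top 3 but falls below the threshold. An empty result is a natural signal that a question is outside the glossary scope.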

Deployment

Hugging Face Spaces

  1. Create a new Space and select Gradio as the SDK.

  2. Upload required files:

    app.py
    requirements.txt
    glossary.txt
    glossary.index
    chunks.json
    build_index.py
    
  3. Configure secrets in HF Spaces:

    • Go to Settings → Variables and Secrets
    • Add OPENAI_API_KEY
    • Add OPENROUTER_API_KEY
  4. Deploy:

    • Push files to the Space repository
    • The app will automatically build and deploy

Maintenance

Updating the Glossary

  1. Edit glossary.txt with new terms
  2. Regenerate the index:
    python build_index.py
    
  3. Restart the application
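
A forgotten rebuild is easy to miss. The sketch below flags a stale index by comparing file modification times. The index_is_stale helper is hypothetical and not part of the project; it assumes glossary.txt and glossary.index sit in the same directory, as in the project structure above.

```python
import os


def index_is_stale(source="glossary.txt", index="glossary.index"):
    """Return True if the index is missing or older than the glossary source."""
    if not os.path.exists(index):
        return True
    return os.path.getmtime(source) > os.path.getmtime(index)


if __name__ == "__main__":
    if os.path.exists("glossary.txt") and index_is_stale():
        print("Index is out of date; run: python build_index.py")
```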

Cost Monitoring

  • OpenAI: ~$0.0001 per query (embeddings)
  • OpenRouter: ~$0.005 per response (Gemini)
  • Target: <$10/month total operation
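
Taken together, these estimates imply a rough monthly query budget. The arithmetic below uses only the per-query figures listed above and assumes one embedding call and one LLM response per question, which may not hold exactly for the real app.

```python
EMBEDDING_COST = 0.0001   # USD per query (OpenAI embeddings estimate)
RESPONSE_COST = 0.005     # USD per response (OpenRouter/Gemini estimate)
MONTHLY_BUDGET = 10.00    # USD target

cost_per_question = EMBEDDING_COST + RESPONSE_COST
max_questions = int(MONTHLY_BUDGET / cost_per_question)

print(f"Cost per question: ${cost_per_question:.4f}")
print(f"Questions per month within budget: about {max_questions}")
print(f"Roughly {max_questions // 30} per day")
```

With these figures the budget covers roughly 1,960 questions per month, or about 65 per day.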

Troubleshooting

Common Issues:

  1. "Module not found" error:

    pip install -r requirements.txt
    
  2. "No such file" for index files:

    python build_index.py
    
  3. API key errors:

    • Check .env file exists and has correct keys
    • Verify API keys are valid and have sufficient credits
  4. Import errors that persist after reinstalling from requirements.txt (install the core packages directly):

    pip install faiss-cpu numpy openai requests gradio python-dotenv
    

Compliance Features

  • Automatic Disclaimers: Every response includes required compliance text
  • PII Detection: Blocks emails, SSNs, and credit score references
  • Scope Limiting: Only answers questions about glossary terms
  • Session Memory: Context resets when chat is cleared (no persistent data)
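
The exact detection rules are not listed in this README. As an illustration of the PII check, a minimal regex-based sketch might look like the following. The patterns and the contains_pii helper are assumptions and are deliberately simple; production rules would likely need to be broader.

```python
import re

# Hypothetical patterns covering the three categories named above.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_score": re.compile(r"\b(?:credit|fico)\s+score\b", re.IGNORECASE),
}


def contains_pii(message):
    """Return the names of all PII patterns found in the message."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(message)]


print(contains_pii("My SSN is 123-45-6789"))
print(contains_pii("What does DSCR mean?"))
```

A non-empty result would cause the message to be rejected before it reaches the embedding or LLM calls.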

Security

  • API keys stored in environment variables
  • No user data persistence
  • Input sanitization and validation
  • PII detection and rejection

Support

For technical issues:

  1. Check the troubleshooting section above
  2. Verify all dependencies are installed
  3. Ensure API keys are correctly configured
  4. Check that vector index files exist
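
Steps 2 and 4 can be automated. The following preflight sketch is hypothetical and not part of the project. It assumes the module names implied by the pip command in the troubleshooting section (faiss-cpu imports as faiss, python-dotenv as dotenv) and the generated file names listed earlier.

```python
import importlib.util
import os

REQUIRED_MODULES = ["faiss", "numpy", "openai", "requests", "gradio", "dotenv"]
REQUIRED_FILES = ["glossary.index", "chunks.json"]


def preflight(base_dir=".", modules=REQUIRED_MODULES, files=REQUIRED_FILES):
    """Return (missing_modules, missing_files) for the given project directory."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_files = [f for f in files if not os.path.isfile(os.path.join(base_dir, f))]
    return missing_modules, missing_files


if __name__ == "__main__":
    mods, files = preflight()
    if mods:
        print("Missing modules:", ", ".join(mods))
    if files:
        print("Missing files (run python build_index.py):", ", ".join(files))
    if not mods and not files:
        print("All checks passed")
```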

License

This project is designed for internal compliance-focused use with strict business requirements.