Spaces:

paradox44
/

digitChatBot


paradox44 committed on May 27, 2025

Commit

bd7261b

verified ·

1 Parent(s): 9d1d5e4

Upload 7 files


Added Basic functionalities

Files changed (8)

.gitattributes +1 -0
.gitignore +251 -0
README.md +241 -12
app.py +384 -0
build_index.py +39 -0
chunks.json +1 -0
glossary.index +3 -0
requirements.txt +7 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+glossary.index filter=lfs diff=lfs merge=lfs -text

.gitignore ADDED

	@@ -0,0 +1,251 @@

+# Hugging Face Spaces .gitignore
+# Optimized for Non-QM Glossary Chatbot deployment
+# ==========================================
+# Environment and Configuration Files
+# ==========================================
+.env
+.env.local
+.env.production
+.env.staging
+config.json
+secrets.json
+# ==========================================
+# Cline AI Assistant Files
+# ==========================================
+.clinerules
+.cline/
+memory-bank/
+.claude/
+.cursor/
+# ==========================================
+# Development Documentation
+# ==========================================
+designDoc.md
+Basic_Design_Doc.docx
+Glossary.pdf
+README_dev.md
+DEVELOPMENT.md
+TODO.md
+NOTES.md
+# ==========================================
+# Python
+# ==========================================
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+# ==========================================
+# Virtual Environments
+# ==========================================
+venv/
+env/
+ENV/
+env.bak/
+venv.bak/
+.venv/
+.env/
+.conda/
+conda-meta/
+# ==========================================
+# IDEs and Editors
+# ==========================================
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+.DS_Store
+Thumbs.db
+# VS Code
+.vscode/settings.json
+.vscode/tasks.json
+.vscode/launch.json
+.vscode/extensions.json
+.vscode/cline_docs.md
+# PyCharm
+.idea/
+*.iml
+*.iws
+# Sublime Text
+*.sublime-project
+*.sublime-workspace
+# Vim
+.vim/
+*.swp
+*.swo
+# ==========================================
+# OS Generated Files
+# ==========================================
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+desktop.ini
+# ==========================================
+# Logs and Databases
+# ==========================================
+*.log
+logs/
+log/
+*.sqlite
+*.sqlite3
+*.db
+# ==========================================
+# Testing and Coverage
+# ==========================================
+.coverage
+.pytest_cache/
+.tox/
+.nox/
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.cache
+nosetests.xml
+# ==========================================
+# Jupyter Notebook
+# ==========================================
+.ipynb_checkpoints
+*.ipynb
+# ==========================================
+# Model and Data Files (Exclude Large Files)
+# ==========================================
+# Note: We DO want to include these for our chatbot:
+# - glossary.txt (source data)
+# - glossary.index (FAISS index)
+# - chunks.json (preprocessed data)
+# But exclude any backup or temporary versions
+*.bak
+*.backup
+*.tmp
+*_backup.*
+*_temp.*
+# ==========================================
+# Package Managers
+# ==========================================
+node_modules/
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+package-lock.json
+yarn.lock
+# ==========================================
+# Security and Sensitive Files
+# ==========================================
+*.pem
+*.key
+*.cert
+*.crt
+id_rsa
+id_dsa
+.ssh/
+.gnupg/
+# ==========================================
+# Temporary Files
+# ==========================================
+*.tmp
+*.temp
+temp/
+tmp/
+.cache/
+.temp/
+# ==========================================
+# Backup Files
+# ==========================================
+*.orig
+*.bak
+*.backup
+*~
+#*#
+# ==========================================
+# Hugging Face Specific
+# ==========================================
+.gradio/
+gradio_cached_examples/
+.hf_token
+hf_token.txt
+# ==========================================
+# Git
+# ==========================================
+.git/
+.gitignore_local
+.gitconfig_local
+# ==========================================
+# Local Development Scripts
+# ==========================================
+run_local.py
+test_local.py
+debug.py
+local_test.sh
+dev_setup.sh
+# ==========================================
+# Documentation Build Files
+# ==========================================
+docs/_build/
+docs/build/
+site/
+.readthedocs.yml
+# ==========================================
+# Performance and Profiling
+# ==========================================
+.prof
+*.prof
+.benchmark
+profile_output
+# END: Files above this line will be excluded from Hugging Face Spaces
+#
+# INCLUDED FILES (should be committed):
+# - app.py (main application)
+# - requirements.txt (dependencies)
+# - glossary.txt (source data)
+# - glossary.index (FAISS vector index)
+# - chunks.json (preprocessed data)
+# - build_index.py (for maintenance)

README.md CHANGED

@@ -1,12 +1,241 @@
----
-title: DigitChatBot
-emoji: 💻
-colorFrom: gray
-colorTo: gray
-sdk: gradio
-sdk_version: 5.31.0
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Non-QM Glossary Chatbot
+A professional RAG-powered chatbot that provides instant, accurate definitions of Non-Qualified Mortgage terms with strict compliance controls and conversation memory.
+## Features
+- 🏠 **Non-QM Expertise**: Specialized glossary of mortgage terminology
+- 💬 **Conversation Memory**: Smart follow-up question handling
+- 🔒 **Compliance First**: Built-in disclaimers and PII protection
+- ⚡ **Streaming Responses**: Real-time text generation
+- 🎨 **Professional UI**: Modern Gradio interface with custom styling
+- 💰 **Cost Efficient**: Optimized for <$10/month operation
+## Prerequisites
+- Python 3.8 or higher
+- OpenAI API key (for embeddings)
+- OpenRouter API key (for Gemini LLM access)
+## Installation
+1. **Clone the repository:**
+   ```bash
+   git clone <repository-url>
+   cd ChatBot
+   ```
+2. **Create and activate a virtual environment:**
+   ```bash
+   python -m venv venv
+   # On Windows:
+   venv\Scripts\activate
+   # On macOS/Linux:
+   source venv/bin/activate
+   ```
+3. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+## API Key Setup
+### 1. OpenAI API Key
+1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys)
+2. Create a new API key
+3. Copy the key (starts with `sk-proj-...`)
+### 2. OpenRouter API Key
+1. Go to [OpenRouter Keys](https://openrouter.ai/keys)
+2. Create a new API key
+3. Copy the key (starts with `sk-or-...`)
+### 3. Environment Configuration
+Create a `.env` file in the project root:
+```bash
+# Create .env file
+touch .env
+```
+Add your API keys to the `.env` file:
+```env
+OPENAI_API_KEY=sk-proj-your-openai-key-here
+OPENROUTER_API_KEY=sk-or-your-openrouter-key-here
+```
+⚠️ **Important:** Never commit your `.env` file to version control. It's already included in `.gitignore`.
+## Running the Application
+### 1. Generate Vector Index (First Time Only)
+Before running the chatbot for the first time, generate the search index:
+```bash
+python build_index.py
+```
+This creates:
+- `glossary.index` - FAISS vector search index
+- `chunks.json` - Text chunks metadata
+### 2. Start the Chatbot
+```bash
+python app.py
+```
+The application will start and display:
+```
+Running on local URL: http://127.0.0.1:7860
+```
+### 3. Access the Interface
+Open your browser and go to: `http://127.0.0.1:7860`
+## Usage
+### Basic Questions
+Ask about Non-QM mortgage terms:
+- "What is a Non-QM loan?"
+- "Define debt-to-income ratio"
+- "What does DSCR mean?"
+- "Explain asset-based lending"
+### Follow-up Questions
+The chatbot remembers conversation context:
+- After asking about a term, say "tell me more"
+- "Can you elaborate on that?"
+- "Give me more details"
+### What NOT to Ask
+- Personal financial information
+- Rate quotes or loan applications
+- Questions outside the glossary scope
+## Project Structure
+```
+ChatBot/
+├── app.py                 # Main Gradio application
+├── build_index.py         # Vector index generation
+├── requirements.txt       # Python dependencies
+├── glossary.txt          # Source glossary content
+├── glossary.index        # Generated FAISS index (after build)
+├── chunks.json           # Generated text chunks (after build)
+├── .env                  # API keys (create this file)
+├── .gitignore           # Files to exclude from git
+└── memory-bank/         # Project documentation
+```
+## Configuration
+Key settings in `app.py`:
+```python
+EMBED_MODEL = "text-embedding-3-small"            # OpenAI embeddings
+GPT_MODEL = "google/gemini-2.5-flash-preview-05-20"  # OpenRouter LLM
+SIM_THRESHOLD = 0.30                              # Similarity threshold
+TOP_K = 3                                         # Number of chunks to retrieve
+```
+## Deployment
+### Hugging Face Spaces
+1. **Create a new Space:**
+   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+   - Choose Gradio SDK
+   - Set hardware to CPU Basic (free)
+2. **Upload required files:**
+   ```
+   app.py
+   requirements.txt
+   glossary.txt
+   glossary.index
+   chunks.json
+   build_index.py
+   ```
+3. **Configure secrets in HF Spaces:**
+   - Go to Settings → Variables and Secrets
+   - Add `OPENAI_API_KEY`
+   - Add `OPENROUTER_API_KEY`
+4. **Deploy:**
+   - Push files to the Space repository
+   - The app will automatically build and deploy
+## Maintenance
+### Updating the Glossary
+1. Edit `glossary.txt` with new terms
+2. Regenerate the index:
+   ```bash
+   python build_index.py
+   ```
+3. Restart the application
+### Cost Monitoring
+- **OpenAI**: ~$0.0001 per query (embeddings)
+- **OpenRouter**: ~$0.005 per response (Gemini)
+- **Target**: <$10/month total operation
+### Troubleshooting
+**Common Issues:**
+1. **"Module not found" error:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+2. **"No such file" for index files:**
+   ```bash
+   python build_index.py
+   ```
+3. **API key errors:**
+   - Check `.env` file exists and has correct keys
+   - Verify API keys are valid and have sufficient credits
+4. **Import errors:**
+   ```bash
+   pip install faiss-cpu numpy openai requests gradio python-dotenv
+   ```
+## Compliance Features
+- **Automatic Disclaimers**: Every response includes required compliance text
+- **PII Detection**: Blocks emails, SSNs, and credit score references
+- **Scope Limiting**: Only answers questions about glossary terms
+- **Session Memory**: Context resets when chat is cleared (no persistent data)
+## Security
+- API keys stored in environment variables
+- No user data persistence
+- Input sanitization and validation
+- PII detection and rejection
+## Support
+For technical issues:
+1. Check the troubleshooting section above
+2. Verify all dependencies are installed
+3. Ensure API keys are correctly configured
+4. Check that vector index files exist
+## License
+This project is designed for internal compliance-focused use with strict business requirements.
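Review note on `README.md` (Cost Monitoring): the quoted per-query prices imply a rough monthly question budget. A minimal sketch of that arithmetic, assuming the README's approximate prices and exactly one embedding call plus one LLM response per question (a follow-up that retries retrieval would add a second embedding call). `monthly_query_budget` is a hypothetical helper, not part of this commit.

```python
import math

# Approximate prices quoted in the README (assumptions, not measured values).
EMBED_COST_USD = 0.0001    # one OpenAI embedding call per question
LLM_COST_USD = 0.005       # one OpenRouter (Gemini) response per question
MONTHLY_TARGET_USD = 10.0  # README target: under $10/month


def monthly_query_budget(target=MONTHLY_TARGET_USD,
                         embed_cost=EMBED_COST_USD,
                         llm_cost=LLM_COST_USD):
    """Return how many whole questions fit under the monthly target."""
    per_query = embed_cost + llm_cost
    return math.floor(target / per_query)


if __name__ == "__main__":
    budget = monthly_query_budget()
    print(f"questions per month: {budget}")
    print(f"questions per day (30-day month): {budget / 30:.1f}")
```

Under these assumptions the target covers a little under two thousand questions a month, so the LLM response, not the embedding call, dominates the bill.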

app.py ADDED

	@@ -0,0 +1,384 @@

+import os
+import json
+import faiss
+import numpy as np
+import requests
+import gradio as gr
+from dotenv import load_dotenv
+import openai
+import re
+import time
+# ---------- config ----------
+EMBED_MODEL = "text-embedding-3-small"            # OpenAI
+GPT_MODEL   = "google/gemini-2.5-flash-preview-05-20"  # OpenRouter
+SIM_THRESHOLD = 0.30                              # tweak if recall is poor
+TOP_K = 3
+DISCLAIMER = "General info only, not a commitment to lend."
+# ----------------------------
+load_dotenv()
+openai.api_key       = os.getenv("OPENAI_API_KEY")
+OPENROUTER_API_KEY   = os.getenv("OPENROUTER_API_KEY")
+# ----- load glossary vectors -----
+with open("chunks.json", encoding="utf8") as f:
+    CHUNKS = json.load(f)
+INDEX = faiss.read_index("glossary.index")
+# ----- PII detection (compliance requirement) -----
+def contains_pii(text: str) -> bool:
+    """Basic PII detection for emails, SSNs, credit scores."""
+    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
+    ssn_pattern = r'\b\d{3}-?\d{2}-?\d{4}\b'
+    # Flags any standalone three-digit number from 400 to 899 (the "credit score" suffix is optional); four-digit form numbers such as "Form 4506-C" are not matched
+    credit_pattern = r'\b(?:[4-8]\d{2})(?:\s*credit\s*score)?\b'
+    return bool(re.search(email_pattern, text) or
+                re.search(ssn_pattern, text) or
+                re.search(credit_pattern, text))
+# ----- conversation memory helpers -----
+def detect_followup_question(question: str) -> bool:
+    """Detect if a question is asking for elaboration or follow-up."""
+    followup_patterns = [
+        r'\b(elaborate|expand|explain more|tell me more|more details|further|additionally)\b',
+        r'\b(can you|could you|would you).*(more|further|elaborate|expand)\b',
+        r'\b(what about|how about|what else)\b',
+        r'\b(that|this|it)\b.*\?',  # References to previous topic
+        r'^\s*(more|further|additionally|also)\b',
+        r'\b(give me more|tell me more|say more)\b'
+    ]
+    question_lower = question.lower()
+    return any(re.search(pattern, question_lower) for pattern in followup_patterns)
+def extract_last_topic(history):
+    """Extract the main topic from the most recent bot response."""
+    if not history or len(history) == 0:
+        return None
+    # Get the last bot response
+    last_exchange = history[-1]
+    if isinstance(last_exchange, dict) and 'content' in last_exchange:
+        last_response = last_exchange['content']
+    elif isinstance(last_exchange, list) and len(last_exchange) >= 2:
+        last_response = last_exchange[1]  # Bot response
+    else:
+        return None
+    # Extract key terms from the response (before disclaimer)
+    if DISCLAIMER in last_response:
+        content = last_response.split(DISCLAIMER)[0].strip()
+    else:
+        content = last_response
+    # Look for capitalized terms and common Non-QM keywords
+    terms = re.findall(r'\b[A-Z][A-Za-z-]+(?:\s+[A-Z][A-Za-z-]+)*\b', content)
+    nqm_keywords = ['Non-QM', 'DSCR', 'DTI', 'income', 'ratio', 'loan', 'mortgage', 'lending']
+    # Return the first meaningful term found
+    for term in terms:
+        if len(term) > 3 and any(keyword.lower() in term.lower() for keyword in nqm_keywords):
+            return term
+    return None
+# ----- helpers -----
+def embed(text: str) -> np.ndarray:
+    """Call OpenAI embedding endpoint and return a normalized float32 numpy vector."""
+    res = openai.embeddings.create(
+        model=EMBED_MODEL,
+        input=[text]
+    )
+    vec = np.array(res.data[0].embedding, dtype="float32")
+    # Normalize the vector for consistent similarity computation
+    faiss.normalize_L2(vec.reshape(1, -1))
+    return vec
+def retrieve(question: str, conversation_context: str = None):
+    """Return chunks whose cosine sim >= threshold, with optional conversation context."""
+    # Use conversation context for better retrieval if available
+    search_query = question
+    if conversation_context and detect_followup_question(question):
+        search_query = f"{conversation_context} {question}"
+    vec = embed(search_query).reshape(1, -1)
+    scores, ids = INDEX.search(vec, TOP_K)
+    relevant_chunks = [
+        CHUNKS[i]
+        for i, s in zip(ids[0], scores[0])
+        if s >= SIM_THRESHOLD
+    ]
+    # If no results with conversation context, try just the question
+    if not relevant_chunks and conversation_context:
+        vec = embed(question).reshape(1, -1)
+        scores, ids = INDEX.search(vec, TOP_K)
+        relevant_chunks = [
+            CHUNKS[i]
+            for i, s in zip(ids[0], scores[0])
+            if s >= SIM_THRESHOLD
+        ]
+    return relevant_chunks
+def call_llm_streaming(question: str, context: str, is_followup: bool = False):
+    """Stream LLM response while ensuring compliance."""
+    # Adjust prompt for follow-up questions
+    if is_followup:
+        prompt = (
+            "You are a Non-QM glossary assistant.\n"
+            "The user is asking for more details about a previous topic.\n"
+            "Answer with additional information from the context.\n"
+            "Keep it to 3 sentences max. Finish with this exact line:\n"
+            f"{DISCLAIMER}\n\n"
+            f"User: {question}\n"
+            f"Context:\n{context}"
+        )
+        max_tokens = 150  # Allow slightly more for elaboration
+    else:
+        prompt = (
+            "You are a Non-QM glossary assistant.\n"
+            "Answer the user only with information in the context.\n"
+            "Two sentences max. Finish with this exact line:\n"
+            f"{DISCLAIMER}\n\n"
+            f"User: {question}\n"
+            f"Context:\n{context}"
+        )
+        max_tokens = 120
+    headers = {
+        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
+        "X-Title": "nonqm-glossary-bot"
+    }
+    try:
+        resp = requests.post(
+            "https://openrouter.ai/api/v1/chat/completions",
+            headers=headers,
+            json={
+                "model": GPT_MODEL,
+                "messages": [{"role": "user", "content": prompt}],
+                "max_tokens": max_tokens,
+                "temperature": 0.3,
+                "stream": True
+            },
+            timeout=60,  # Increased timeout for OpenRouter stability
+            stream=True
+        )
+        resp.raise_for_status()
+        accumulated_text = ""
+        for line in resp.iter_lines():
+            if line:
+                line = line.decode('utf-8')
+                if line.startswith('data: '):
+                    line = line[6:]
+                    if line.strip() == '[DONE]':
+                        break
+                    try:
+                        data = json.loads(line)
+                        if 'choices' in data and len(data['choices']) > 0:
+                            delta = data['choices'][0].get('delta', {})
+                            if 'content' in delta:
+                                content = delta['content']
+                                accumulated_text += content
+                                yield accumulated_text
+                                time.sleep(0.02)  # Small delay for smooth streaming
+                    except json.JSONDecodeError:
+                        continue
+    except Exception as e:
+        # Fallback to non-streaming if streaming fails
+        yield call_llm_fallback(question, context, is_followup)
+def call_llm_fallback(question: str, context: str, is_followup: bool = False) -> str:
+    """Fallback non-streaming LLM call."""
+    if is_followup:
+        prompt = (
+            "You are a Non-QM glossary assistant.\n"
+            "The user is asking for more details about a previous topic.\n"
+            "Answer with additional information from the context.\n"
+            "Keep it to 3 sentences max. Finish with this exact line:\n"
+            f"{DISCLAIMER}\n\n"
+            f"User: {question}\n"
+            f"Context:\n{context}"
+        )
+        max_tokens = 150
+    else:
+        prompt = (
+            "You are a Non-QM glossary assistant.\n"
+            "Answer the user only with information in the context.\n"
+            "Two sentences max. Finish with this exact line:\n"
+            f"{DISCLAIMER}\n\n"
+            f"User: {question}\n"
+            f"Context:\n{context}"
+        )
+        max_tokens = 120
+    headers = {
+        "Authorization": f"Bearer {OPENROUTER_API_KEY}",
+        "X-Title": "nonqm-glossary-bot"
+    }
+    resp = requests.post(
+        "https://openrouter.ai/api/v1/chat/completions",
+        headers=headers,
+        json={
+            "model": GPT_MODEL,
+            "messages": [{"role": "user", "content": prompt}],
+            "max_tokens": max_tokens,
+            "temperature": 0.3
+        },
+        timeout=60  # Increased timeout for OpenRouter stability
+    )
+    resp.raise_for_status()
+    return resp.json()["choices"][0]["message"]["content"].strip()
+# ----- Enhanced Gradio callback with conversation memory -----
+def chat_fn(message, history):
+    # PII detection (compliance requirement)
+    if contains_pii(message):
+        yield "I cannot process messages containing personal information. Please ask about glossary terms only."
+        return
+    # Detect if this is a follow-up question
+    is_followup = detect_followup_question(message)
+    conversation_context = None
+    if is_followup and history:
+        # Get conversation context for better retrieval
+        last_topic = extract_last_topic(history)
+        if last_topic:
+            conversation_context = last_topic
+            # Try enhanced search with conversation context
+            hits = retrieve(message, conversation_context)
+        else:
+            hits = retrieve(message)
+    else:
+        # Regular retrieval for new questions
+        hits = retrieve(message)
+    # Handle no results
+    if not hits:
+        if is_followup:
+            yield "I don't have additional information on that topic in our glossary. Please ask a specific question about a Non-QM term, or contact a loan officer for more detailed assistance."
+        else:
+            yield "I'm not sure about that term. Please contact a loan officer for assistance with questions outside our glossary."
+        return
+    # Stream the response
+    context = "\n---\n".join(hits)
+    for partial_response in call_llm_streaming(message, context, is_followup):
+        yield partial_response
+# ----- Custom CSS for enhanced aesthetics -----
+custom_theme = gr.themes.Soft(
+    primary_hue="blue",
+    secondary_hue="gray",
+    neutral_hue="slate",
+).set(
+    body_background_fill="linear-gradient(135deg, #667eea 0%, #764ba2 100%)",
+    block_background_fill="*neutral_50",
+    button_primary_background_fill="linear-gradient(90deg, #667eea 0%, #764ba2 100%)",
+    button_primary_background_fill_hover="linear-gradient(90deg, #5a6fd8 0%, #6a4190 100%)",
+)
+custom_css = """
+.gradio-container {
+    max-width: 900px !important;
+    margin: auto !important;
+    border-radius: 15px !important;
+    box-shadow: 0 20px 40px rgba(0,0,0,0.1) !important;
+}
+.chat-message {
+    border-radius: 12px !important;
+    margin: 8px 0 !important;
+    padding: 12px !important;
+}
+.message-wrap {
+    max-width: 85% !important;
+}
+.user .message-wrap {
+    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
+    color: white !important;
+}
+.bot .message-wrap {
+    background: #f8f9fa !important;
+    border: 1px solid #e9ecef !important;
+}
+.disclaimer {
+    font-style: italic !important;
+    color: #6c757d !important;
+    border-top: 1px solid #dee2e6 !important;
+    margin-top: 8px !important;
+    padding-top: 8px !important;
+}
+/* Typing animation for streaming */
+@keyframes typing {
+    0% { opacity: 0.4; }
+    50% { opacity: 1; }
+    100% { opacity: 0.4; }
+}
+.streaming-text {
+    animation: typing 1.5s infinite;
+}
+"""
+# ----- Enhanced UI -----
+with gr.Blocks(theme=custom_theme, css=custom_css, title="Non-QM Glossary Assistant") as demo:
+    gr.HTML("""
+    <div style="text-align: center; padding: 20px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; border-radius: 12px; margin-bottom: 20px;">
+        <h1 style="margin: 0; font-size: 2.5em; font-weight: 700;">🏠 Non-QM Glossary Assistant</h1>
+        <p style="margin: 10px 0 0 0; font-size: 1.2em; opacity: 0.95;">
+            Get instant, accurate definitions of Non-Qualified Mortgage terms
+        </p>
+    </div>
+    """)
+    gr.Markdown("""
+    ### 💬 How to Use This Assistant
+    - **Ask about Non-QM mortgage terms** and receive clear, accurate definitions
+    - **Ask follow-up questions** like "tell me more" or "can you elaborate" for additional details
+    - Questions outside our glossary scope will be directed to a loan officer
+    - All responses include required compliance disclaimers
+    - **No personal information** should be shared in your questions
+    **Example questions:**
+    - "What is a Non-QM loan?"
+    - "Define debt-to-income ratio"
+    - "What does DSCR mean?"
+    - "Explain asset-based lending"
+    - "Tell me more about that" (after asking about a term)
+    """)
+    chatbot = gr.ChatInterface(
+        fn=chat_fn,
+        title="Non-QM Glossary Assistant",
+        description="Ask about Non-QM mortgage terms and get instant definitions. Follow-up questions welcome!",
+        type="messages"
+    )
+    gr.HTML("""
+    <div style="text-align: center; margin-top: 20px; padding: 20px; background: #dc3545; border: 2px solid #b02a37; border-radius: 12px; box-shadow: 0 4px 12px rgba(220, 53, 69, 0.3);">
+        <p style="margin: 0; color: white; font-size: 1.1em; font-weight: 600; line-height: 1.4;">
+            <strong>⚠️ IMPORTANT COMPLIANCE NOTICE:</strong><br><br>
+            This assistant provides general information only and is NOT a commitment to lend.<br>
+            For personalized advice, loan applications, or specific financial guidance,<br>
+            please contact a qualified loan officer.
+        </p>
+    </div>
+    """)
+if __name__ == "__main__":
+    demo.launch()
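Review note on `app.py` (`call_llm_streaming`): the streaming loop mixes HTTP handling with parsing of the server-sent-event lines, which makes the parsing hard to test without a network call. A minimal sketch of the parsing step on its own, assuming the OpenAI-style `data: {...}` chunks that the loop above already expects. `accumulate_sse_text` is a hypothetical helper name, and unlike the committed loop it skips chunks whose `content` is empty or `null`.

```python
import json


def accumulate_sse_text(lines):
    """Yield the growing response text from OpenAI-style SSE lines.

    `lines` is any iterable of bytes, for example the result of
    requests.Response.iter_lines().
    """
    text = ""
    for raw in lines:
        if not raw:
            continue  # blank separator lines between events
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue  # SSE comment lines (they start with ':') and other fields
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks, as the committed loop does
        if not isinstance(event, dict):
            continue
        choices = event.get("choices") or []
        if not choices:
            continue
        piece = choices[0].get("delta", {}).get("content")
        if piece:
            text += piece
            yield text


if __name__ == "__main__":
    fake_stream = [
        b'data: {"choices": [{"delta": {"content": "Non-QM "}}]}',
        b"",
        b'data: {"choices": [{"delta": {"content": "loan"}}]}',
        b"data: [DONE]",
    ]
    for partial in accumulate_sse_text(fake_stream):
        print(partial)
```

With the parsing isolated like this, the generator in `call_llm_streaming` could pass `resp.iter_lines()` straight in, and the parsing could be exercised with canned byte strings.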

build_index.py ADDED

	@@ -0,0 +1,39 @@

+import json
+from pathlib import Path
+import numpy as np
+import faiss
+import openai
+from dotenv import load_dotenv
+# ---------- setup ----------
+load_dotenv()                       # pulls OPENAI_API_KEY from .env
+client = openai.OpenAI()
+TXT_FILE = "glossary.txt"
+OUT_INDEX = "glossary.index"
+OUT_CHUNKS = "chunks.json"
+EMBED_MODEL = "text-embedding-3-small"
+# ----------------------------
+# ---------- load + chunk ----------
+txt = Path(TXT_FILE).read_text(encoding="utf8")
+chunks = [c.strip() for c in txt.split("\n\n") if c.strip()]
+# ---------- embed ----------
+def embed(texts):
+    res = client.embeddings.create(model=EMBED_MODEL, input=texts)
+    return [d.embedding for d in res.data]
+vecs = np.array(embed(chunks), dtype="float32")
+faiss.normalize_L2(vecs)            # cosine similarity wants unit vectors
+# ---------- build index ----------
+dim = vecs.shape[1]
+index = faiss.IndexFlatIP(dim)      # inner product == cosine when vectors norm-1
+index.add(vecs)
+# ---------- save ----------
+faiss.write_index(index, OUT_INDEX)
+Path(OUT_CHUNKS).write_text(json.dumps(chunks, ensure_ascii=False), encoding="utf8")
+print(f"Built {index.ntotal} vectors → {OUT_INDEX}")
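Review note on `build_index.py`: the index relies on the fact that the inner product of unit-length vectors equals their cosine similarity, which is why both this script and `embed()` in `app.py` normalize before touching the index. A dependency-free sketch of the same scoring and threshold filter, written in plain Python rather than FAISS or NumPy so it runs anywhere. `normalize` and `top_k_cosine` are hypothetical names, the toy vectors are made up, and all vectors are assumed to be non-zero.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length (what faiss.normalize_L2 does per row)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k_cosine(query, vectors, k=3, threshold=0.30):
    """Return (index, score) pairs for the k most similar vectors at or above threshold."""
    q = normalize(query)
    scored = []
    for i, vec in enumerate(vectors):
        # Inner product of two unit vectors == cosine similarity.
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in scored[:k] if s >= threshold]


if __name__ == "__main__":
    toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for idx, score in top_k_cosine([2.0, 0.0], toy_vectors):
        print(idx, round(score, 4))
```

The defaults mirror `TOP_K = 3` and `SIM_THRESHOLD = 0.30` from `app.py`; an orthogonal vector scores 0 and is dropped by the threshold even though it falls within the top `k`.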

chunks.json ADDED

	@@ -0,0 +1 @@

+ ["# NonQM Glossary \\& Scenarios", "A", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Ability-to-Repay\n(ATR) | Federal rule (TILA § 1026.43) requiring lenders to make a reasonable determination that the\nborrower can repay the loan. Non-QM loans must still satisfy ATR, even though they don't\nmeet the \"Qualified Mortgage\" safe-harbor tests. | ATR Rule |\n| Asset-Depletion\nLoan | Underwriting method that treats a borrower's liquid assets (bank, brokerage, retirement\naccounts) as imputed income to meet ATR. | Asset Utilization,\nAsset-Qualifier |\n| Automated Valuation\nModel (AVM) | Computer-generated estimate of property value, often used for DSCR or portfolio reviews\nwhen a full appraisal isn't required. | |", "B", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Bank-Statement Loan | Loan qualified primarily on 12-24 months of personal or business bank statements\ninstead of tax returns. Common for self-employed borrowers. | BS Loan, Alt-Doc Bank\nStatement |\n| Blanket Loan | Single mortgage that covers multiple properties or units. Useful for portfolio\ninvestors. | Portfolio Loan |\n| Borrower Paid\nCompensation (BPC) | When the borrower, not the lender, pays the mortgage broker's commission. | |", "C", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Cash-Flow Coverage Ratio | See Debt-Service-Coverage Ratio (DSCR). | |\n| CLTV (Combined Loan-to-Value) | Total liens on the property ÷ appraised value. Includes first, second, and HELOC\nbalances. | |\n| Credit Event Seasoning | Elapsed time since negative credit events (BK, foreclosure, short sale).\nMeasured in months. | Seasoning Period |\n| Credit Score\n(FICO®/VantageScore®) | Numeric measure of credit risk. Many Non-QM programs go as low as 600-620. 
| FICO, Beacon\nScore |", "D", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Debt-Service-Coverage Ratio\n(DSCR) | Net operating income ÷ total property debt service. For rental/investor loans a\nDSCR $\\geq 1.0$ indicates the rent covers the payment. | Cash-Flow\nCoverage Ratio |\n| Documentation Level | Spectrum of required borrower docs (Full-Doc, Alt-Doc, Lite-Doc, No-Doc). | |\n| DTI (Debt-to-Income) Ratio | Total monthly debt payments ÷ gross monthly income. Non-QM allows higher\nDTIs (e.g., 55 \\%). | |", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Exit Strategy | How a short-term bridge or fix-and-flip loan will be repaid or refinanced. | |\n| Extension Fee | Charge to prolong a short-term (bridge) loan past its original maturity. | |", "# F", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Foreign National\nLoan | Mortgage to a non-U.S. citizen who resides abroad; qualifies on foreign\ncredit/income or asset-depletion. | ITIN Loan (if borrower has Individual\nTaxpayer ID) |\n| Full-Doc | Standard underwriting with tax returns, W-2s/1099s, pay stubs. Opposite of\nAlt-Doc. | |", "G", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Gift of Equity | Seller (often family) gives part of home equity toward buyer's down payment; allowed under\nsome Non-QM guidelines. | |\n| Guideline\nMatrix | Table showing max LTV/FICO/DTI tiers for a given product. | Rate Sheet, Eligibility\nGrid |", "H", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| HCLTV (High-Credit\nLoan-to-Value) | LTV calculation that factors in the credit limit of a HELOC, not just the current\ndraw. | |\n| Hard Money Loan | Asset-based short-term loan (12-24 mo) often used for rehab or bridge\nfinancing. 
| Private Money |", "I", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Interest-Only\n(IO) | Payment structure where borrower pays only interest for a set period (e.g., 10 yrs), after which\namortization begins or balloon payment is due. | |\n| ITIN Borrower | Individual with an IRS Individual Taxpayer Identification Number (not SSN). Often qualifies under\nForeign National or Alt-Doc programs. | |", "J", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Jumbo (Non-Agency)\nLoan | Loan amount above conforming limits, not sold to Fannie/Freddie. May be QM or\nNon-QM. | |", "K", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Key Rate Adjustment | Rate bump applied when certain credit factors fall outside matrix tiers (e.g., recent BK). | |", "L", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Lender Paid Compensation\n(LPC) | Broker comp paid by the lender via higher rate/YSP. | |\n| Loan Program | Defined set of eligibility \\& pricing rules (e.g., \"12-Month Bank Statement, 90 \\% LTV,\nNo MI\"). | Product, Shelf |\n| Loan-to-Cost (LTC) | For rehab/ground-up builds: loan amount ÷ total project cost. | |\n| LTV (Loan-to-Value) | First lien amount ÷ appraised value. | |", "M", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Margin (ARM) | Fixed spread added to index rate on adjustable loans. | |\n| Minimum DSCR | Lowest acceptable DSCR for an investor loan (often 0.75-1.00). | |", "N", "| Term | Definition | Synonyms / Aliases |\n| --- | --- | --- |\n| Non-QM (Non-Qualified\nMortgage) | Any mortgage that fails at least one of the CFPB's QM safe-harbor tests (e.g., 43 \\% DTI,\npoints \\& fees, APOR threshold, doc type). Still must meet ATR. | |\n| No-Ratio Loan | Underwriting disregards borrower's DTI; focuses on collateral or assets. 
| NINA (No Income,\nNo Asset) |", "O", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Originator Compensation\nRule | CFPB rule limiting how brokers are paid (no steering based on comp, no dual comp).\nApplies equally to Non-QM. | |\n| Occupancy Types | Primary Residence, Second Home, Non-Owner-Occupied (Investor). Eligibility \\& pricing\nvary widely in Non-QM. | |", "P", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| | | |", "| Points \\& Fees Cap | QM loans limited to $3 \\%$ of loan amount; Non-QM has no cap but high points affect pricing \\& demand. | |\n| :--: | :--: | :--: |\n| Prepayment Penalty | Fee for paying off loan early. Common in investor DSCR loans (e.g., 3-2-1 step-down). | PPP |\n| Profit-and-Loss (P\\&L) Statement Loan | Alternate doc type where CPA-prepared P\\&L (with or without statements) substantiates income. | |", "Q", "| Term | Definition | Synonyms / Aliases |\n| :-- | :-- | :-- |\n| QM (Qualified Mortgage) | Loan meeting CFPB safe-harbor criteria (points/fees, APOR, doc type, DTI or price-based test). Opposite of Non-QM. | |", "R", "| Term | Definition | Synonyms / Aliases |\n| :-- | :-- | :-- |\n| Rate Buy-Down | Paying points at closing to secure lower note rate; thresholds differ in Non-QM pricing engines. | |\n| Reserves | Liquid assets required post-closing, expressed in months of PITIA. Non-QM often requires 6-12 mo. | |", "S", "| Term | Definition | Synonyms / Aliases |\n| :-- | :-- | :-- |\n| Scratch-and-Dent Loan | Previously funded loan with documentation/servicing defects that render it unsaleable to agencies-often securitized in Non-QM pools. | $S-\\&-D$ |\n| Seasoning (Title) | Time elapsed since acquisition or cash-out refinance. Affects max LTV for flips. | |\n| Self-Employed Borrower | $\\geq 25 \\%$ ownership in business; usually underwrites via Bank-Statement or Full-Doc 2-yr returns. 
| |", "T", "| Term | Definition | Synonyms / Aliases |\n| :-- | :-- | :-- |\n| Twelve-Month Bank Statement Program | Counts average monthly deposits over 12 months as qualifying income. | 12-Mo BS |\n| TILA-RESPA Integrated Disclosure (TRID) | CFPB rule dictating Loan Estimate (LE) \\& Closing Disclosure (CD) timing. Applies equally to Non-QM. | |", "U", "| Term | Definition | Synonyms / Aliases |\n| :-- | :-- | :-- |\n| Underwriting Flexibility | Degree to which a lender will grant exceptions to stated guidelines (e.g., manual ATR calculation, compensating factors). | |", "UWM (Ultimate Weighted Margin)", "Proprietary pricing metric some aggregators use for Non-QM bulk bids.", "V", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Verification of Employment\n(VOE)-Only Loan | Uses independent VOE to document income instead of pay stubs/tax\nreturns. | VOE Program |\n| VOI / VOA | Verification of Income / Verification of Assets via automated services (e.g.,\nPlaid, Finicity). | |", "# W", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Written Explanation Letter\n(LOE) | Borrower letter clarifying derogatory credit or cash-flow anomalies; often requested in\nNon-QM. | Letter of\nExplanation |\n| Wholesale Lender | Lender funding loans through third-party mortgage brokers. Dominant distribution\nchannel for Non-QM. | |", "X, Y, Z", "| Term | Definition | Synonyms /\nAliases |\n| --- | --- | --- |\n| Yield-Spread Premium\n(YSP) | Extra rate margin paid to broker when lender covers their comp (see LPC). | |\n| Zero-Prepay | Non-QM loan structure with no prepayment penalty; rare for DSCR but used in\nowner-occupied products. 
| |", "## Comments", "1 Dhruv Ratra (Polaris24) @dhruv.ooIaris24 $\\cdot 2$ weeks ago Abbreviation Quick Reference ATR $\\cdot$ AVM $\\cdot$ BS $\\cdot$ CLTV $\\cdot$ DSCR $\\cdot$ DTI $\\cdot$ FICO $\\cdot$ HCLTV $\\cdot$ IO $\\cdot$ ITIN $\\cdot$ LTV $\\cdot$ LTC $\\cdot$ LPC $\\cdot$ MSA $\\cdot$ NINA $\\cdot$ QM $\\cdot$ PPP $\\cdot$ TRID $\\cdot$ VOE $\\cdot$ VOI/VOA $\\cdot$ YSP", "2 Dhruv Ratra (Polaris24) @dhruv.ooIaris24 $\\cdot 2$ weeks ago", "## Example Scenarios", "1. Bank-Statement Loan - Income Calculation", "| Detail | Input |\n| --- | --- |\n| Borrower | Self-employed graphic-design studio owner |\n| Program | 12-Month Personal Bank-Statement |\n| Deposits | $\\$ 18,000$ average monthly credits |\n| Expense Factor | $50 \\%$ (per lender matrix) |", "Qualifying Income:", "Traditional tax returns showed only $\\$ 45 \\mathrm{k}$ AGI, which was insufficient. Under the Non-QM program, the borrower now meets the ATR test for a $\\$ 600 \\mathrm{k}$ purchase at $90 \\%$ LTV. 2. DSCR Rental Loan", "| Detail | Input |\n| --- | --- |\n| Monthly Gross Rent | $\\$ 2,500$ |\n| Property Taxes \\& Insurance | $\\$ 300$ |\n| Mortgage P\\&I (interest-only, year 1) | $\\$ 1,800$ |", "DSCR: $\\$ 2,500 /(\\$ 1,800+\\$ 300)=1.19$ Lender's minimum DSCR is 1.00, so the investor qualifies even without W-2 income. 3. Asset-Qualification / Asset-Depletion / Asset-Utilization", "| Detail | Input |\n| --- | --- |\n| Liquid Assets | $\\$ 1,200,000$ |\n| Amortization Term Assumed | 60 months |\n| Asset Utilization Factor | $100 \\%$ (no haircut) |", "Imputed Monthly Income: $\\$ 1,200,000 \\div 60=\\$ 20,000 / \\mathrm{mo}$ Retired borrower with minimal pension now evidences enough \"income\" to pass ATR for a $\\$ 900 \\mathrm{k}$ cash-out refi. 4. 
Credit-Event Seasoning", "| Detail | Input |\n| --- | --- |\n| Chapter 7 Bankruptcy Discharge | 18 months ago |\n| Lender Matrix | $\\geq 12 \\mathrm{mo}=\\mathrm{OK}$ to $80 \\% \\mathrm{LTV} ; \\geq 24 \\mathrm{mo}=\\mathrm{OK}$ to $90 \\% \\mathrm{LTV}$ |", "# Outcome:", "At 18 months seasoning, the borrower can obtain up to $80 \\%$ LTV but not $90 \\%$. Waiting six more months would open higher-LTV tiers. 5. Prepayment Penalty - 3-2-1 Step-Down", "| Year Paid Off | Penalty Calculation |\n| --- | --- |\n| Year 1 | $3 \\%$ of outstanding principal |\n| Year 2 | $2 \\%$ of outstanding principal |\n| Year 3 | $1 \\%$ of outstanding principal |\n| Year 4+ | $0 \\%$ |", "If the investor prepays $\\$ 400 \\mathrm{k}$ principal in Year 2, penalty $=\\$ 400 \\mathrm{k} \\times 2 \\%=\\$ 8,000$. 6. Interest-Only (IO) Structure", "| Detail | Input |\n| --- | --- |\n| Loan Amount | $\\$ 700 \\mathrm{k}$ |\n| IO Term | 10 years (then 20-yr amortization) |\n| Note Rate | $7.25 \\%$ |", "Year 1 Payment: Interest-only $=\\$ 700,000 \\times 7.25 \\% / 12=\\$ 4,229$ Year 11 Payment (amortized): Factor for 20-yr @ $7.25 \\% \\approx \\$ 7.90$ per $\\$ 1 \\mathrm{k} \\rightarrow \\$ 7.90 \\times 700=\\$ 5,530$ Borrower's payment jumps about $\\$ 1,300$ when amortization kicks in, as disclosed via TRID. 7. Foreign National Purchase", "| Detail | Input |\n| --- | --- |\n| Borrower | Canadian citizen with no U.S. credit |\n| Program | Non-Owner-Occupied DSCR (min DSCR = 1.0) |\n| Down Payment | $25 \\%$ |\n| Documentation | Passport, Canadian bureau report, CPA letter verifying income, 12 mo reserves |", "As long as property cash flow covers debt service (DSCR $\\geq 1.0$ ), the borrower qualifies despite zero U.S. FICO. 8. 
High-DTI Compensating-Factor Exception", "| Detail | Input |\n| --- | --- |\n| Calculated DTI | $54 \\%$ (lender matrix max $=49 \\%$ ) |\n| Compensating Factors | DSCR 1.25, 12 mo reserves, 780 FICO, $50 \\%$ LTV |", "Underwriting manager grants manual exception based on strong compensating factors-classic Non-QM flexibility.", "These examples demonstrate how Non-QM programs bend traditional rules-yet remain measurable and risk-managed-allowing borrowers and investors otherwise locked out of agency lending to secure financing."]
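
The arithmetic in the example scenarios above can be re-checked with a short script. This is a minimal sketch and not part of the commit: the function names are hypothetical, and the formulas are assumptions read off the scenario text (for instance, the DSCR scenario divides gross rent by P&I plus taxes and insurance, and the bank-statement income is assumed to be deposits reduced by the expense factor).

```python
"""Sketch: recompute the figures quoted in the example scenarios.

All function names are illustrative; nothing here comes from app.py.
"""


def bank_statement_income(avg_monthly_deposits, expense_factor):
    # Assumed formula: deposits net of the lender's expense factor.
    return avg_monthly_deposits * (1 - expense_factor)


def dscr(gross_rent, principal_and_interest, taxes_and_insurance):
    # Mirrors scenario 2: rent / (P&I + taxes and insurance).
    return gross_rent / (principal_and_interest + taxes_and_insurance)


def asset_depletion_income(liquid_assets, term_months, utilization=1.0):
    # Scenario 3: assets (after any haircut) spread over the assumed term.
    return liquid_assets * utilization / term_months


def interest_only_payment(loan_amount, annual_rate):
    # Scenario 6, year 1: one month of simple interest.
    return loan_amount * annual_rate / 12


def amortized_payment(loan_amount, annual_rate, term_months):
    # Standard level-payment annuity formula.
    r = annual_rate / 12
    return loan_amount * r / (1 - (1 + r) ** -term_months)


def step_down_penalty(principal, year, schedule=(0.03, 0.02, 0.01)):
    # Scenario 5: 3-2-1 step-down, no penalty once the schedule runs out.
    rate = schedule[year - 1] if 1 <= year <= len(schedule) else 0.0
    return principal * rate


if __name__ == "__main__":
    print(f"Bank-statement income: {bank_statement_income(18_000, 0.50):,.0f}")
    print(f"DSCR: {dscr(2_500, 1_800, 300):.2f}")
    print(f"Asset-depletion income: {asset_depletion_income(1_200_000, 60):,.0f}")
    print(f"IO payment: {interest_only_payment(700_000, 0.0725):,.0f}")
    print(f"Amortized payment: {amortized_payment(700_000, 0.0725, 240):,.0f}")
    print(f"Year-2 prepay penalty: {step_down_penalty(400_000, 2):,.0f}")
```

The amortized payment computed this way lands within a few dollars of the scenario's $5,530, which the source derives from a rounded per-$1k factor.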

glossary.index ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7fe193b3717c0d5f8c23f4bb223542c873c9d573fe532c6bcc02b69a67775855
+size 460845

requirements.txt ADDED Viewed

	@@ -0,0 +1,7 @@

+gradio
+faiss-cpu
+openai
+python-dotenv
+requests
+rapidfuzz      # optional spelling helper
+numpy