paradox44 committed on
Commit 4c6c5df · verified · 1 Parent(s): bd7261b

Update README.md

Files changed (1)
  1. README.md +252 -241
README.md CHANGED
@@ -1,241 +1,252 @@
1
- # Non-QM Glossary Chatbot
2
-
3
- A professional RAG-powered chatbot that provides instant, accurate definitions of Non-Qualified Mortgage terms with strict compliance controls and conversation memory.
4
-
5
- ## Features
6
-
7
- - 🏠 **Non-QM Expertise**: Specialized glossary of mortgage terminology
8
- - 💬 **Conversation Memory**: Smart follow-up question handling
9
- - 🔒 **Compliance First**: Built-in disclaimers and PII protection
10
- - ⚡ **Streaming Responses**: Real-time text generation
11
- - 🎨 **Professional UI**: Modern Gradio interface with custom styling
12
- - 💰 **Cost Efficient**: Optimized for <$10/month operation
13
-
14
- ## Prerequisites
15
-
16
- - Python 3.8 or higher
17
- - OpenAI API key (for embeddings)
18
- - OpenRouter API key (for Gemini LLM access)
19
-
20
- ## Installation
21
-
22
- 1. **Clone the repository:**
23
- ```bash
24
- git clone <repository-url>
25
- cd ChatBot
26
- ```
27
-
28
- 2. **Create and activate a virtual environment:**
29
- ```bash
30
- python -m venv venv
31
-
32
- # On Windows:
33
- venv\Scripts\activate
34
-
35
- # On macOS/Linux:
36
- source venv/bin/activate
37
- ```
38
-
39
- 3. **Install dependencies:**
40
- ```bash
41
- pip install -r requirements.txt
42
- ```
43
-
44
- ## API Key Setup
45
-
46
- ### 1. OpenAI API Key
47
- 1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys)
48
- 2. Create a new API key
49
- 3. Copy the key (starts with `sk-proj-...`)
50
-
51
- ### 2. OpenRouter API Key
52
- 1. Go to [OpenRouter Keys](https://openrouter.ai/keys)
53
- 2. Create a new API key
54
- 3. Copy the key (starts with `sk-or-...`)
55
-
56
- ### 3. Environment Configuration
57
-
58
- Create a `.env` file in the project root:
59
-
60
- ```bash
61
- # Create .env file
62
- touch .env
63
- ```
64
-
65
- Add your API keys to the `.env` file:
66
-
67
- ```env
68
- OPENAI_API_KEY=sk-proj-your-openai-key-here
69
- OPENROUTER_API_KEY=sk-or-your-openrouter-key-here
70
- ```
71
-
72
- ⚠️ **Important:** Never commit your `.env` file to version control. It's already included in `.gitignore`.
73
-
74
- ## Running the Application
75
-
76
- ### 1. Generate Vector Index (First Time Only)
77
-
78
- Before running the chatbot for the first time, generate the search index:
79
-
80
- ```bash
81
- python build_index.py
82
- ```
83
-
84
- This creates:
85
- - `glossary.index` - FAISS vector search index
86
- - `chunks.json` - Text chunks metadata
87
-
88
- ### 2. Start the Chatbot
89
-
90
- ```bash
91
- python app.py
92
- ```
93
-
94
- The application will start and display:
95
- ```
96
- Running on local URL: http://127.0.0.1:7860
97
- ```
98
-
99
- ### 3. Access the Interface
100
-
101
- Open your browser and go to: `http://127.0.0.1:7860`
102
-
103
- ## Usage
104
-
105
- ### Basic Questions
106
- Ask about Non-QM mortgage terms:
107
- - "What is a Non-QM loan?"
108
- - "Define debt-to-income ratio"
109
- - "What does DSCR mean?"
110
- - "Explain asset-based lending"
111
-
112
- ### Follow-up Questions
113
- The chatbot remembers conversation context:
114
- - After asking about a term, say "tell me more"
115
- - "Can you elaborate on that?"
116
- - "Give me more details"
117
-
118
- ### What NOT to Ask
119
- - Personal financial information
120
- - Rate quotes or loan applications
121
- - Questions outside the glossary scope
122
-
123
- ## Project Structure
124
-
125
- ```
126
- ChatBot/
127
- ├── app.py              # Main Gradio application
128
- ├── build_index.py      # Vector index generation
129
- ├── requirements.txt    # Python dependencies
130
- ├── glossary.txt        # Source glossary content
131
- ├── glossary.index      # Generated FAISS index (after build)
132
- ├── chunks.json         # Generated text chunks (after build)
133
- ├── .env                # API keys (create this file)
134
- ├── .gitignore          # Files to exclude from git
135
- └── memory-bank/        # Project documentation
136
- ```
137
-
138
- ## Configuration
139
-
140
- Key settings in `app.py`:
141
-
142
- ```python
143
- EMBED_MODEL = "text-embedding-3-small" # OpenAI embeddings
144
- GPT_MODEL = "google/gemini-2.5-flash-preview-05-20" # OpenRouter LLM
145
- SIM_THRESHOLD = 0.30 # Similarity threshold
146
- TOP_K = 3 # Number of chunks to retrieve
147
- ```
148
-
149
- ## Deployment
150
-
151
- ### Hugging Face Spaces
152
-
153
- 1. **Create a new Space:**
154
- - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
155
- - Choose Gradio SDK
156
- - Set hardware to CPU Basic (free)
157
-
158
- 2. **Upload required files:**
159
- ```
160
- app.py
161
- requirements.txt
162
- glossary.txt
163
- glossary.index
164
- chunks.json
165
- build_index.py
166
- ```
167
-
168
- 3. **Configure secrets in HF Spaces:**
169
- - Go to Settings → Variables and Secrets
170
- - Add `OPENAI_API_KEY`
171
- - Add `OPENROUTER_API_KEY`
172
-
173
- 4. **Deploy:**
174
- - Push files to the Space repository
175
- - The app will automatically build and deploy
176
-
177
- ## Maintenance
178
-
179
- ### Updating the Glossary
180
-
181
- 1. Edit `glossary.txt` with new terms
182
- 2. Regenerate the index:
183
- ```bash
184
- python build_index.py
185
- ```
186
- 3. Restart the application
187
-
188
- ### Cost Monitoring
189
-
190
- - **OpenAI**: ~$0.0001 per query (embeddings)
191
- - **OpenRouter**: ~$0.005 per response (Gemini)
192
- - **Target**: <$10/month total operation
193
-
194
- ### Troubleshooting
195
-
196
- **Common Issues:**
197
-
198
- 1. **"Module not found" error:**
199
- ```bash
200
- pip install -r requirements.txt
201
- ```
202
-
203
- 2. **"No such file" for index files:**
204
- ```bash
205
- python build_index.py
206
- ```
207
-
208
- 3. **API key errors:**
209
- - Check `.env` file exists and has correct keys
210
- - Verify API keys are valid and have sufficient credits
211
-
212
- 4. **Import errors:**
213
- ```bash
214
- pip install faiss-cpu numpy openai requests gradio python-dotenv
215
- ```
216
-
217
- ## Compliance Features
218
-
219
- - **Automatic Disclaimers**: Every response includes required compliance text
220
- - **PII Detection**: Blocks emails, SSNs, and credit score references
221
- - **Scope Limiting**: Only answers questions about glossary terms
222
- - **Session Memory**: Context resets when chat is cleared (no persistent data)
223
-
224
- ## Security
225
-
226
- - API keys stored in environment variables
227
- - No user data persistence
228
- - Input sanitization and validation
229
- - PII detection and rejection
230
-
231
- ## Support
232
-
233
- For technical issues:
234
- 1. Check the troubleshooting section above
235
- 2. Verify all dependencies are installed
236
- 3. Ensure API keys are correctly configured
237
- 4. Check that vector index files exist
238
-
239
- ## License
240
-
241
- This project is designed for internal compliance-focused use with strict business requirements.
1
+ ---
2
+ title: Non-QM Glossary Bot
3
+ emoji: 🏠
4
+ colorFrom: indigo
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: "4.26.0" # any current Gradio version is fine
8
+ app_file: app.py
9
+ pinned: false
10
+ ---
11
+
12
+ # Non-QM Glossary Chatbot
13
+
14
+ A professional RAG-powered chatbot that provides instant, accurate definitions of Non-Qualified Mortgage terms with strict compliance controls and conversation memory.
15
+
16
+ ## Features
17
+
18
+ - 🏠 **Non-QM Expertise**: Specialized glossary of mortgage terminology
19
+ - 💬 **Conversation Memory**: Smart follow-up question handling
20
+ - 🔒 **Compliance First**: Built-in disclaimers and PII protection
21
+ - ⚡ **Streaming Responses**: Real-time text generation
22
+ - 🎨 **Professional UI**: Modern Gradio interface with custom styling
23
+ - 💰 **Cost Efficient**: Optimized for <$10/month operation
24
+
25
+ ## Prerequisites
26
+
27
+ - Python 3.8 or higher
28
+ - OpenAI API key (for embeddings)
29
+ - OpenRouter API key (for Gemini LLM access)
30
+
31
+ ## Installation
32
+
33
+ 1. **Clone the repository:**
34
+ ```bash
35
+ git clone <repository-url>
36
+ cd ChatBot
37
+ ```
38
+
39
+ 2. **Create and activate a virtual environment:**
40
+ ```bash
41
+ python -m venv venv
42
+
43
+ # On Windows:
44
+ venv\Scripts\activate
45
+
46
+ # On macOS/Linux:
47
+ source venv/bin/activate
48
+ ```
49
+
50
+ 3. **Install dependencies:**
51
+ ```bash
52
+ pip install -r requirements.txt
53
+ ```
54
+
55
+ ## API Key Setup
56
+
57
+ ### 1. OpenAI API Key
58
+ 1. Go to [OpenAI API Keys](https://platform.openai.com/api-keys)
59
+ 2. Create a new API key
60
+ 3. Copy the key (starts with `sk-proj-...`)
61
+
62
+ ### 2. OpenRouter API Key
63
+ 1. Go to [OpenRouter Keys](https://openrouter.ai/keys)
64
+ 2. Create a new API key
65
+ 3. Copy the key (starts with `sk-or-...`)
66
+
67
+ ### 3. Environment Configuration
68
+
69
+ Create a `.env` file in the project root:
70
+
71
+ ```bash
72
+ # Create .env file
73
+ touch .env
74
+ ```
75
+
76
+ Add your API keys to the `.env` file:
77
+
78
+ ```env
79
+ OPENAI_API_KEY=sk-proj-your-openai-key-here
80
+ OPENROUTER_API_KEY=sk-or-your-openrouter-key-here
81
+ ```
82
+
83
+ ⚠️ **Important:** Never commit your `.env` file to version control. It's already included in `.gitignore`.
84
+
85
+ ## Running the Application
86
+
87
+ ### 1. Generate Vector Index (First Time Only)
88
+
89
+ Before running the chatbot for the first time, generate the search index:
90
+
91
+ ```bash
92
+ python build_index.py
93
+ ```
94
+
95
+ This creates:
96
+ - `glossary.index` - FAISS vector search index
97
+ - `chunks.json` - Text chunks metadata
98
+
99
+ ### 2. Start the Chatbot
100
+
101
+ ```bash
102
+ python app.py
103
+ ```
104
+
105
+ The application will start and display:
106
+ ```
107
+ Running on local URL: http://127.0.0.1:7860
108
+ ```
109
+
110
+ ### 3. Access the Interface
111
+
112
+ Open your browser and go to: `http://127.0.0.1:7860`
113
+
114
+ ## Usage
115
+
116
+ ### Basic Questions
117
+ Ask about Non-QM mortgage terms:
118
+ - "What is a Non-QM loan?"
119
+ - "Define debt-to-income ratio"
120
+ - "What does DSCR mean?"
121
+ - "Explain asset-based lending"
122
+
123
+ ### Follow-up Questions
124
+ The chatbot remembers conversation context:
125
+ - After asking about a term, say "tell me more"
126
+ - "Can you elaborate on that?"
127
+ - "Give me more details"
128
+
129
+ ### What NOT to Ask
130
+ - Personal financial information
131
+ - Rate quotes or loan applications
132
+ - Questions outside the glossary scope
133
+
134
+ ## Project Structure
135
+
136
+ ```
137
+ ChatBot/
138
+ ├── app.py              # Main Gradio application
139
+ ├── build_index.py      # Vector index generation
140
+ ├── requirements.txt    # Python dependencies
141
+ ├── glossary.txt        # Source glossary content
142
+ ├── glossary.index      # Generated FAISS index (after build)
143
+ ├── chunks.json         # Generated text chunks (after build)
144
+ ├── .env                # API keys (create this file)
145
+ ├── .gitignore          # Files to exclude from git
146
+ └── memory-bank/        # Project documentation
147
+ ```
148
+
149
+ ## Configuration
150
+
151
+ Key settings in `app.py`:
152
+
153
+ ```python
154
+ EMBED_MODEL = "text-embedding-3-small" # OpenAI embeddings
155
+ GPT_MODEL = "google/gemini-2.5-flash-preview-05-20" # OpenRouter LLM
156
+ SIM_THRESHOLD = 0.30 # Similarity threshold
157
+ TOP_K = 3 # Number of chunks to retrieve
158
+ ```
159
+
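The README lists `SIM_THRESHOLD` and `TOP_K` but does not show how `app.py` applies them. The sketch below is one plausible reading, not the project's actual code: it assumes cosine similarity, keeps the `TOP_K` best-scoring chunks, and drops any whose score falls below `SIM_THRESHOLD`. The function names `cosine` and `retrieve` are hypothetical, and the real application uses a FAISS index rather than plain Python lists.

```python
import math

SIM_THRESHOLD = 0.30  # value quoted in the README
TOP_K = 3             # value quoted in the README


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, chunks, top_k=TOP_K, threshold=SIM_THRESHOLD):
    """Hypothetical helper: return up to top_k (score, chunk) pairs at or above threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for vec, text in zip(chunk_vecs, chunks)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Take the best top_k first, then discard weak matches among them.
    return [(score, text) for score, text in scored[:top_k] if score >= threshold]
```

Under this reading, a query whose best matches all score below the threshold returns an empty list, which is where a scope-limited bot would decline to answer.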
160
+ ## Deployment
161
+
162
+ ### Hugging Face Spaces
163
+
164
+ 1. **Create a new Space:**
165
+ - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
166
+ - Choose Gradio SDK
167
+ - Set hardware to CPU Basic (free)
168
+
169
+ 2. **Upload required files:**
170
+ ```
171
+ app.py
172
+ requirements.txt
173
+ glossary.txt
174
+ glossary.index
175
+ chunks.json
176
+ build_index.py
177
+ ```
178
+
179
+ 3. **Configure secrets in HF Spaces:**
180
+ - Go to Settings → Variables and Secrets
181
+ - Add `OPENAI_API_KEY`
182
+ - Add `OPENROUTER_API_KEY`
183
+
184
+ 4. **Deploy:**
185
+ - Push files to the Space repository
186
+ - The app will automatically build and deploy
187
+
188
+ ## Maintenance
189
+
190
+ ### Updating the Glossary
191
+
192
+ 1. Edit `glossary.txt` with new terms
193
+ 2. Regenerate the index:
194
+ ```bash
195
+ python build_index.py
196
+ ```
197
+ 3. Restart the application
198
+
199
+ ### Cost Monitoring
200
+
201
+ - **OpenAI**: ~$0.0001 per query (embeddings)
202
+ - **OpenRouter**: ~$0.005 per response (Gemini)
203
+ - **Target**: <$10/month total operation
204
+
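The per-call figures above imply a rough monthly query ceiling. The arithmetic below is a sketch that assumes each user query triggers exactly one embedding call and one LLM response, which the README does not state; the constant names are illustrative.

```python
EMBED_COST_PER_QUERY = 0.0001   # OpenAI embeddings, from the README
LLM_COST_PER_RESPONSE = 0.005   # OpenRouter (Gemini), from the README
MONTHLY_BUDGET = 10.00          # README target

# Assumption: one embedding call plus one LLM response per query.
cost_per_query = EMBED_COST_PER_QUERY + LLM_COST_PER_RESPONSE
max_queries_per_month = int(MONTHLY_BUDGET / cost_per_query)

print(f"~${cost_per_query:.4f} per query, about {max_queries_per_month} queries per month")
```

With these numbers the budget covers a little under 2,000 queries per month. Follow-up questions that resend conversation history would likely cost more per response.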
205
+ ### Troubleshooting
206
+
207
+ **Common Issues:**
208
+
209
+ 1. **"Module not found" error:**
210
+ ```bash
211
+ pip install -r requirements.txt
212
+ ```
213
+
214
+ 2. **"No such file" for index files:**
215
+ ```bash
216
+ python build_index.py
217
+ ```
218
+
219
+ 3. **API key errors:**
220
+ - Check `.env` file exists and has correct keys
221
+ - Verify API keys are valid and have sufficient credits
222
+
223
+ 4. **Import errors:**
224
+ ```bash
225
+ pip install faiss-cpu numpy openai requests gradio python-dotenv
226
+ ```
227
+
228
+ ## Compliance Features
229
+
230
+ - **Automatic Disclaimers**: Every response includes required compliance text
231
+ - **PII Detection**: Blocks emails, SSNs, and credit score references
232
+ - **Scope Limiting**: Only answers questions about glossary terms
233
+ - **Session Memory**: Context resets when chat is cleared (no persistent data)
234
+
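The README does not show how PII detection is implemented. As a minimal sketch of the idea, assuming simple regular-expression checks, the following flags the three categories the list names. The patterns and the `find_pii` helper are illustrative assumptions, not the project's code, and are far from exhaustive.

```python
import re

# Illustrative patterns only; a production filter would need broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_score": re.compile(r"\b(?:credit|fico)\s+score\b", re.IGNORECASE),
}


def find_pii(text):
    """Hypothetical helper: return the names of the PII categories found in text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
```

A caller could reject the message before it reaches the embedding or LLM call whenever `find_pii` returns a non-empty list.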
235
+ ## Security
236
+
237
+ - API keys stored in environment variables
238
+ - No user data persistence
239
+ - Input sanitization and validation
240
+ - PII detection and rejection
241
+
242
+ ## Support
243
+
244
+ For technical issues:
245
+ 1. Check the troubleshooting section above
246
+ 2. Verify all dependencies are installed
247
+ 3. Ensure API keys are correctly configured
248
+ 4. Check that vector index files exist
249
+
250
+ ## License
251
+
252
+ This project is designed for internal compliance-focused use with strict business requirements.