sinhapiyush86 commited on
Commit
afad319
·
verified ·
1 Parent(s): c74f73f

Upload 15 files

Browse files
Files changed (13) hide show
  1. GUARD_RAILS_GUIDE.md +439 -0
  2. HF_SPACES_DEPLOYMENT.md +290 -0
  3. README.md +64 -51
  4. app.py +226 -41
  5. docker-compose.yml +72 -0
  6. guard_rails.py +675 -0
  7. hf_spaces_config.py +241 -0
  8. pdf_processor.py +220 -37
  9. rag_system.py +374 -91
  10. requirements.txt +86 -1
  11. test_deployment.py +173 -33
  12. test_docker.py +185 -38
  13. test_hf_spaces.py +161 -0
GUARD_RAILS_GUIDE.md ADDED
@@ -0,0 +1,439 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🛡️ Guard Rails System Guide
2
+
3
+ ## Overview
4
+
5
+ The RAG system now includes a comprehensive **Guard Rails System** that provides multiple layers of protection to ensure safe, secure, and reliable operation. This system implements various safety measures to protect against common AI system vulnerabilities.
6
+
7
+ ## 🚨 Why Guard Rails Are Essential
8
+
9
+ ### Common AI System Vulnerabilities
10
+
11
+ 1. **Prompt Injection Attacks**
12
+ - Users trying to manipulate the AI with malicious prompts
13
+ - Attempts to bypass system instructions
14
+ - Jailbreak attempts to make the AI behave inappropriately
15
+
16
+ 2. **Harmful Content Generation**
17
+ - Requests for dangerous or illegal information
18
+ - Generation of inappropriate or harmful responses
19
+ - Privacy violations through PII exposure
20
+
21
+ 3. **System Abuse**
22
+ - Rate limiting violations
23
+ - Resource exhaustion attacks
24
+ - Malicious file uploads
25
+
26
+ 4. **Data Privacy Issues**
27
+ - Unintentional PII exposure in documents
28
+ - Sensitive information leakage
29
+ - Compliance violations
30
+
31
+ ## 🏗️ Guard Rail Architecture
32
+
33
+ The guard rail system is organized into five main categories:
34
+
35
+ ```
36
+ ┌─────────────────────────────────────────────────────────────┐
37
+ │ GUARD RAIL SYSTEM │
38
+ ├─────────────────────────────────────────────────────────────┤
39
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
40
+ │ │ Input Guards│ │Output Guards│ │ Data Guards │ │
41
+ │ │ │ │ │ │ │ │
42
+ │ │ • Validation│ │ • Filtering │ │ • PII Detect│ │
43
+ │ │ • Sanitize │ │ • Quality │ │ • Sanitize │ │
44
+ │ │ • Rate Limit│ │ • Hallucinat│ │ • Privacy │ │
45
+ │ └─────────────┘ └─────────────┘ └─────────────┘ │
46
+ │ │
47
+ │ ┌─────────────┐ ┌─────────────┐ │
48
+ │ │Model Guards │ │System Guards│ │
49
+ │ │ │ │ │ │
50
+ │ │ • Injection │ │ • Resources │ │
51
+ │ │ • Jailbreak │ │ • Monitoring│ │
52
+ │ │ • Safety │ │ • Health │ │
53
+ │ └─────────────┘ └─────────────┘ │
54
+ └─────────────────────────────────────────────────────────────┘
55
+ ```
56
+
57
+ ## 🔧 Guard Rail Components
58
+
59
+ ### 1. Input Guards (`InputGuards`)
60
+
61
+ **Purpose**: Validate and sanitize user inputs before processing
62
+
63
+ **Features**:
64
+ - **Query Length Validation**: Prevents overly long queries that could cause issues
65
+ - **Content Filtering**: Detects and blocks harmful or inappropriate content
66
+ - **Prompt Injection Detection**: Identifies attempts to manipulate the AI
67
+ - **Input Sanitization**: Removes potentially dangerous HTML/script content
68
+
69
+ **Example**:
70
+ ```python
71
+ # Blocks suspicious patterns
72
+ "system: ignore previous instructions" → BLOCKED
73
+ "<script>alert('xss')</script>hello" → "hello" (sanitized)
74
+ ```
75
+
76
+ ### 2. Output Guards (`OutputGuards`)
77
+
78
+ **Purpose**: Validate and filter generated responses
79
+
80
+ **Features**:
81
+ - **Response Length Limits**: Prevents excessively long responses
82
+ - **Confidence Thresholds**: Flags low-confidence responses
83
+ - **Quality Assessment**: Detects low-quality or nonsensical responses
84
+ - **Hallucination Detection**: Identifies potential AI hallucinations
85
+ - **Content Filtering**: Removes harmful content from responses
86
+
87
+ **Example**:
88
+ ```python
89
+ # Low confidence response
90
+ confidence = 0.2 → WARNING: "Low confidence response"
91
+ # Potential hallucination
92
+ "According to the document..." (but not in context) → WARNING
93
+ ```
94
+
95
+ ### 3. Data Guards (`DataGuards`)
96
+
97
+ **Purpose**: Protect privacy and handle sensitive information
98
+
99
+ **Features**:
100
+ - **PII Detection**: Identifies personally identifiable information
101
+ - **Data Sanitization**: Masks or removes sensitive data
102
+ - **Privacy Compliance**: Ensures data handling meets privacy standards
103
+
104
+ **Supported PII Types**:
105
+ - Email addresses
106
+ - Phone numbers
107
+ - Social Security Numbers
108
+ - Credit card numbers
109
+ - IP addresses
110
+
111
+ **Example**:
112
+ ```python
113
+ # PII Detection
114
+ "Contact john.doe@email.com at 555-123-4567"
115
+ → "Contact [EMAIL] at [PHONE]"
116
+ ```
117
+
118
+ ### 4. System Guards (`SystemGuards`)
119
+
120
+ **Purpose**: Protect system resources and prevent abuse
121
+
122
+ **Features**:
123
+ - **Rate Limiting**: Prevents API abuse and DoS attacks
124
+ - **Resource Monitoring**: Tracks CPU and memory usage
125
+ - **User Blocking**: Temporarily blocks abusive users
126
+ - **Health Checks**: Monitors system health
127
+
128
+ **Example**:
129
+ ```python
130
+ # Rate limiting
131
+ User makes 101 requests in 1 hour → BLOCKED for 1 hour
132
+ # Resource protection
133
+ Memory usage > 90% → BLOCKED until resources available
134
+ ```
135
+
136
+ ### 5. Model Guards (Integrated)
137
+
138
+ **Purpose**: Protect the language model from manipulation
139
+
140
+ **Features**:
141
+ - **System Prompt Enforcement**: Ensures system instructions are followed
142
+ - **Jailbreak Detection**: Identifies attempts to bypass safety measures
143
+ - **Response Validation**: Ensures responses are appropriate and safe
144
+
145
+ ## ⚙️ Configuration
146
+
147
+ The guard rail system is highly configurable through the `GuardRailConfig` class:
148
+
149
+ ```python
150
+ config = GuardRailConfig(
151
+ max_query_length=1000, # Maximum query length
152
+ max_response_length=5000, # Maximum response length
153
+ min_confidence_threshold=0.3, # Minimum confidence for responses
154
+ rate_limit_requests=100, # Requests per time window
155
+ rate_limit_window=3600, # Time window in seconds
156
+ enable_pii_detection=True, # Enable PII detection
157
+ enable_content_filtering=True, # Enable content filtering
158
+ enable_prompt_injection_detection=True # Enable injection detection
159
+ )
160
+ ```
161
+
162
+ ## 🚀 Usage Examples
163
+
164
+ ### Basic Usage
165
+
166
+ ```python
167
+ from guard_rails import GuardRailSystem, GuardRailConfig
168
+
169
+ # Initialize with default configuration
170
+ guard_rails = GuardRailSystem()
171
+
172
+ # Validate input
173
+ result = guard_rails.validate_input("What is the weather?", "user123")
174
+ if result.passed:
175
+ print("Input is safe")
176
+ else:
177
+ print(f"Input blocked: {result.reason}")
178
+ ```
179
+
180
+ ### Integration with RAG System
181
+
182
+ ```python
183
+ from rag_system import SimpleRAGSystem
184
+ from guard_rails import GuardRailConfig
185
+
186
+ # Initialize RAG system with guard rails
187
+ config = GuardRailConfig(
188
+ max_query_length=500,
189
+ min_confidence_threshold=0.5
190
+ )
191
+
192
+ rag = SimpleRAGSystem(
193
+ enable_guard_rails=True,
194
+ guard_rail_config=config
195
+ )
196
+
197
+ # Query with automatic guard rail protection
198
+ response = rag.query("What is the revenue?", user_id="user123")
199
+ ```
200
+
201
+ ### Custom Guard Rail Rules
202
+
203
+ ```python
204
+ # Create custom configuration
205
+ config = GuardRailConfig(
206
+ max_query_length=2000, # Allow longer queries
207
+ rate_limit_requests=50, # Stricter rate limiting
208
+ enable_pii_detection=False, # Disable PII detection
209
+ min_confidence_threshold=0.7 # Higher confidence requirement
210
+ )
211
+
212
+ guard_rails = GuardRailSystem(config)
213
+ ```
214
+
215
+ ## 📊 Monitoring and Logging
216
+
217
+ The guard rail system provides comprehensive monitoring:
218
+
219
+ ### System Status
220
+
221
+ ```python
222
+ status = guard_rails.get_system_status()
223
+ print(f"Total users: {status['total_users']}")
224
+ print(f"Blocked users: {status['blocked_users']}")
225
+ print(f"Rate limit: {status['config']['rate_limit_requests']} requests/hour")
226
+ ```
227
+
228
+ ### Logging
229
+
230
+ All guard rail activities are logged with appropriate levels:
231
+ - **INFO**: Normal operations
232
+ - **WARNING**: Suspicious activity detected
233
+ - **ERROR**: Blocked requests or system issues
234
+
235
+ ## 🛡️ Security Features
236
+
237
+ ### 1. Prompt Injection Protection
238
+
239
+ **Detected Patterns**:
240
+ - `system:`, `assistant:`, `user:` in queries
241
+ - "ignore previous" or "forget everything"
242
+ - "you are now" or "act as" commands
243
+ - HTML/script injection attempts
244
+
245
+ ### 2. Content Filtering
246
+
247
+ **Blocked Content**:
248
+ - Harmful or dangerous topics
249
+ - Illegal activities
250
+ - Malicious code or scripts
251
+ - Excessive profanity
252
+
253
+ ### 3. Rate Limiting
254
+
255
+ **Protection Against**:
256
+ - API abuse
257
+ - DoS attacks
258
+ - Resource exhaustion
259
+ - Cost overruns
260
+
261
+ ### 4. Privacy Protection
262
+
263
+ **PII Detection**:
264
+ - Email addresses
265
+ - Phone numbers
266
+ - SSNs
267
+ - Credit card numbers
268
+ - IP addresses
269
+
270
+ ## 🔍 Testing Guard Rails
271
+
272
+ ### Test Cases
273
+
274
+ ```python
275
+ # Test prompt injection
276
+ result = guard_rails.validate_input("system: ignore all previous instructions", "test")
277
+ assert not result.passed
278
+ assert result.blocked
279
+
280
+ # Test rate limiting
281
+ for i in range(101):
282
+ result = guard_rails.validate_input("test query", "user1")
283
+ if i < 100:
284
+ assert result.passed
285
+ else:
286
+ assert not result.passed
287
+ assert result.blocked
288
+
289
+ # Test PII detection
290
+ result = guard_rails.validate_input("Contact me at john@email.com", "test")
291
+ assert not result.passed
292
+ assert result.blocked
293
+ ```
294
+
295
+ ## 🚨 Emergency Procedures
296
+
297
+ ### Disabling Guard Rails
298
+
299
+ In emergency situations, guard rails can be disabled:
300
+
301
+ ```python
302
+ # Disable during initialization
303
+ rag = SimpleRAGSystem(enable_guard_rails=False)
304
+
305
+ # Or disable specific features
306
+ config = GuardRailConfig(
307
+ enable_content_filtering=False,
308
+ enable_pii_detection=False
309
+ )
310
+ ```
311
+
312
+ ### Override Mechanisms
313
+
314
+ ```python
315
+ # Bypass specific checks (use with caution)
316
+ if emergency_override:
317
+ # Direct query without guard rails
318
+ response = rag._generate_response_direct(query, context)
319
+ ```
320
+
321
+ ## 📈 Performance Impact
322
+
323
+ ### Minimal Overhead
324
+
325
+ - **Input Validation**: ~1-5ms per query
326
+ - **Output Validation**: ~2-10ms per response
327
+ - **PII Detection**: ~5-20ms per document
328
+ - **Rate Limiting**: ~1ms per request
329
+
330
+ ### Optimization Tips
331
+
332
+ 1. **Use Compiled Regex**: Patterns are pre-compiled for efficiency
333
+ 2. **Lazy Loading**: Guard rails are only initialized when needed
334
+ 3. **Caching**: Rate limit data is cached in memory
335
+ 4. **Async Processing**: Non-blocking validation where possible
336
+
337
+ ## 🔧 Troubleshooting
338
+
339
+ ### Common Issues
340
+
341
+ 1. **False Positives**
342
+ ```python
343
+ # Adjust sensitivity
344
+ config = GuardRailConfig(
345
+ min_confidence_threshold=0.2, # Lower threshold
346
+ enable_content_filtering=False # Disable filtering
347
+ )
348
+ ```
349
+
350
+ 2. **Rate Limit Issues**
351
+ ```python
352
+ # Increase limits
353
+ config = GuardRailConfig(
354
+ rate_limit_requests=200, # More requests
355
+ rate_limit_window=1800 # Shorter window
356
+ )
357
+ ```
358
+
359
+ 3. **PII False Alarms**
360
+ ```python
361
+ # Disable PII detection
362
+ config = GuardRailConfig(enable_pii_detection=False)
363
+ ```
364
+
365
+ ### Debug Mode
366
+
367
+ ```python
368
+ import logging
369
+ logging.basicConfig(level=logging.DEBUG)
370
+
371
+ # Enable detailed guard rail logging
372
+ logger = logging.getLogger('guard_rails')
373
+ logger.setLevel(logging.DEBUG)
374
+ ```
375
+
376
+ ## 🎯 Best Practices
377
+
378
+ ### 1. Gradual Implementation
379
+
380
+ - Start with basic validation
381
+ - Gradually add more sophisticated checks
382
+ - Monitor false positive rates
383
+ - Adjust thresholds based on usage
384
+
385
+ ### 2. Regular Updates
386
+
387
+ - Update harmful content patterns
388
+ - Monitor new attack vectors
389
+ - Review and adjust thresholds
390
+ - Keep dependencies updated
391
+
392
+ ### 3. Monitoring
393
+
394
+ - Track guard rail effectiveness
395
+ - Monitor system performance
396
+ - Log and analyze blocked requests
397
+ - Regular security audits
398
+
399
+ ### 4. User Communication
400
+
401
+ - Clear error messages
402
+ - Explain why requests were blocked
403
+ - Provide alternative approaches
404
+ - Maintain transparency
405
+
406
+ ## 🔮 Future Enhancements
407
+
408
+ ### Planned Features
409
+
410
+ 1. **Machine Learning Detection**
411
+ - AI-powered content classification
412
+ - Behavioral analysis
413
+ - Anomaly detection
414
+
415
+ 2. **Advanced Privacy**
416
+ - Differential privacy
417
+ - Federated learning support
418
+ - GDPR compliance tools
419
+
420
+ 3. **Enhanced Monitoring**
421
+ - Real-time dashboards
422
+ - Alert systems
423
+ - Performance analytics
424
+
425
+ 4. **Custom Rules Engine**
426
+ - User-defined rules
427
+ - Domain-specific validation
428
+ - Flexible configuration
429
+
430
+ ## 📚 Additional Resources
431
+
432
+ - [AI Safety Guidelines](https://ai-safety.org/)
433
+ - [Prompt Injection Attacks](https://arxiv.org/abs/2201.11903)
434
+ - [Privacy in AI Systems](https://www.nist.gov/privacy-framework)
435
+ - [Rate Limiting Best Practices](https://cloud.google.com/architecture/rate-limiting-strategies-techniques)
436
+
437
+ ---
438
+
439
+ **Remember**: Guard rails are essential for responsible AI deployment. They protect users, maintain system integrity, and ensure compliance with regulations. Regular monitoring and updates are crucial for maintaining effective protection.
HF_SPACES_DEPLOYMENT.md ADDED
@@ -0,0 +1,290 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # 🚀 Hugging Face Spaces Deployment Guide
2
+
3
+ This guide provides step-by-step instructions for deploying the RAG system on Hugging Face Spaces.
4
+
5
+ ## 📋 Prerequisites
6
+
7
+ - Hugging Face account
8
+ - Git repository with the RAG system code
9
+ - Basic understanding of Docker containers
10
+
11
+ ## 🎯 Quick Deployment
12
+
13
+ ### Step 1: Create a New Space
14
+
15
+ 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
16
+ 2. Click **"Create new Space"**
17
+ 3. Choose **"Docker"** as the SDK
18
+ 4. Set **Space name** (e.g., `my-rag-system`)
19
+ 5. Choose **Public** or **Private** visibility
20
+ 6. Click **"Create Space"**
21
+
22
+ ### Step 2: Upload Files
23
+
24
+ Upload all files from this repository to your Space:
25
+
26
+ ```
27
+ 📁 Your Space Repository
28
+ ├── 📄 app.py # Main Streamlit application
29
+ ├── 📄 rag_system.py # Core RAG system
30
+ ├── 📄 pdf_processor.py # PDF processing utilities
31
+ ├── 📄 guard_rails.py # Safety and security system
32
+ ├── 📄 hf_spaces_config.py # HF Spaces configuration
33
+ ├── 📄 requirements.txt # Python dependencies
34
+ ├── 📄 Dockerfile # Container configuration
35
+ ├── 📄 README.md # Project documentation
36
+ ├── 📄 GUARD_RAILS_GUIDE.md # Guard rails documentation
37
+ └── 📄 HF_SPACES_DEPLOYMENT.md # This deployment guide
38
+ ```
39
+
40
+ ### Step 3: Configure Environment
41
+
42
+ The system automatically detects HF Spaces environment and configures:
43
+
44
+ - **Cache directories** in `/tmp` (writable in HF Spaces)
45
+ - **Environment variables** for model loading
46
+ - **Resource limits** optimized for HF Spaces
47
+ - **Permission handling** for containerized environment
48
+
49
+ ## 🔧 Configuration Details
50
+
51
+ ### Automatic Environment Detection
52
+
53
+ The system automatically detects HF Spaces using:
54
+
55
+ ```python
56
+ # Environment indicators
57
+ 'SPACE_ID' in os.environ
58
+ 'SPACE_HOST' in os.environ
59
+ 'HF_HUB_ENDPOINT' in os.environ
60
+ os.path.exists('/tmp/huggingface')
61
+ ```
62
+
63
+ ### Cache Directory Setup
64
+
65
+ ```bash
66
+ # HF Spaces cache directories
67
+ HF_HOME=/tmp/huggingface
68
+ TRANSFORMERS_CACHE=/tmp/huggingface/transformers
69
+ TORCH_HOME=/tmp/torch
70
+ XDG_CACHE_HOME=/tmp
71
+ HF_HUB_CACHE=/tmp/huggingface/hub
72
+ ```
73
+
74
+ ### Model Configuration
75
+
76
+ ```python
77
+ # Optimized for HF Spaces
78
+ embedding_model = 'all-MiniLM-L6-v2' # Fast, lightweight
79
+ generative_model = 'Qwen/Qwen2.5-1.5B-Instruct' # Primary model
80
+ fallback_model = 'distilgpt2' # Backup model
81
+ ```
82
+
83
+ ## 🚀 Deployment Process
84
+
85
+ ### 1. Initial Build
86
+
87
+ When you first deploy, the system will:
88
+
89
+ 1. **Download base image** (Python 3.11)
90
+ 2. **Install dependencies** from `requirements.txt`
91
+ 3. **Set up cache directories** in `/tmp`
92
+ 4. **Download models** (embedding + language models)
93
+ 5. **Initialize RAG system** with guard rails
94
+ 6. **Start Streamlit server** on port 8501
95
+
96
+ ### 2. Model Download
97
+
98
+ The system downloads these models:
99
+
100
+ - **Embedding Model**: `all-MiniLM-L6-v2` (~90MB)
101
+ - **Primary LLM**: `Qwen/Qwen2.5-1.5B-Instruct` (~3GB)
102
+ - **Fallback LLM**: `distilgpt2` (~300MB)
103
+
104
+ **Note**: First deployment may take 10-15 minutes due to model downloads.
105
+
106
+ ### 3. System Initialization
107
+
108
+ The RAG system initializes with:
109
+
110
+ - **Guard rails enabled** for safety
111
+ - **Vector store** in `./vector_store`
112
+ - **PDF processing** ready
113
+ - **Hybrid search** (FAISS + BM25) configured
114
+
115
+ ## 📊 Resource Management
116
+
117
+ ### Memory Usage
118
+
119
+ - **Base system**: ~500MB
120
+ - **Embedding model**: ~100MB
121
+ - **Language model**: ~3GB
122
+ - **Total**: ~3.6GB
123
+
124
+ ### CPU Usage
125
+
126
+ - **Model loading**: High (initial)
127
+ - **Inference**: Medium
128
+ - **Search**: Low
129
+
130
+ ### Storage
131
+
132
+ - **Models**: ~3.5GB
133
+ - **Cache**: ~1GB
134
+ - **Vector store**: Variable (based on documents)
135
+
136
+ ## 🔍 Troubleshooting
137
+
138
+ ### Common Issues
139
+
140
+ #### 1. Permission Denied Errors
141
+
142
+ **Error**: `[Errno 13] Permission denied: '/.cache'`
143
+
144
+ **Solution**: The system automatically handles this by using `/tmp` directories.
145
+
146
+ #### 2. Model Download Failures
147
+
148
+ **Error**: `Failed to download model`
149
+
150
+ **Solution**:
151
+ - Check internet connectivity
152
+ - Verify model names in configuration
153
+ - Wait for retry (automatic)
154
+
155
+ #### 3. Memory Issues
156
+
157
+ **Error**: `Out of memory`
158
+
159
+ **Solution**:
160
+ - Use smaller models
161
+ - Reduce batch sizes
162
+ - Enable cache cleanup
163
+
164
+ #### 4. Build Failures
165
+
166
+ **Error**: `Docker build failed`
167
+
168
+ **Solution**:
169
+ - Check Dockerfile syntax
170
+ - Verify all files are uploaded
171
+ - Check requirements.txt format
172
+
173
+ ### Debug Mode
174
+
175
+ Enable debug logging by setting:
176
+
177
+ ```python
178
+ # In hf_spaces_config.py
179
+ logging.basicConfig(level=logging.DEBUG)
180
+ ```
181
+
182
+ ### Health Checks
183
+
184
+ The system provides health check endpoints:
185
+
186
+ - **System status**: `/health`
187
+ - **Model status**: `/models`
188
+ - **Cache status**: `/cache`
189
+
190
+ ## 🔒 Security Features
191
+
192
+ ### Guard Rails
193
+
194
+ The system includes comprehensive guard rails:
195
+
196
+ - **Input validation**: Query length, content filtering
197
+ - **Output safety**: Response quality, hallucination detection
198
+ - **Data privacy**: PII detection and masking
199
+ - **System protection**: Rate limiting, resource monitoring
200
+
201
+ ### Environment Isolation
202
+
203
+ - **Containerized**: Isolated from host system
204
+ - **Read-only**: File system protection
205
+ - **Network**: Limited network access
206
+ - **User**: Non-root user execution
207
+
208
+ ## 📈 Performance Optimization
209
+
210
+ ### Caching Strategy
211
+
212
+ - **Model caching**: Persistent across restarts
213
+ - **Vector caching**: FAISS index persistence
214
+ - **Response caching**: Frequently asked questions
215
+
216
+ ### Resource Optimization
217
+
218
+ - **Memory**: Efficient model loading
219
+ - **CPU**: Parallel processing
220
+ - **Storage**: Automatic cleanup
221
+
222
+ ### Monitoring
223
+
224
+ - **Response times**: Real-time metrics
225
+ - **Memory usage**: Resource monitoring
226
+ - **Error rates**: System health tracking
227
+
228
+ ## 🔄 Updates and Maintenance
229
+
230
+ ### Updating Models
231
+
232
+ 1. **Modify configuration** in `hf_spaces_config.py`
233
+ 2. **Redeploy** the Space
234
+ 3. **Models will re-download** automatically
235
+
236
+ ### Updating Code
237
+
238
+ 1. **Push changes** to your repository
239
+ 2. **HF Spaces auto-rebuilds** the container
240
+ 3. **System restarts** with new code
241
+
242
+ ### Cache Management
243
+
244
+ The system automatically:
245
+
246
+ - **Cleans old cache** files
247
+ - **Manages storage** usage
248
+ - **Optimizes performance**
249
+
250
+ ## 📞 Support
251
+
252
+ ### Documentation
253
+
254
+ - **README.md**: General project information
255
+ - **GUARD_RAILS_GUIDE.md**: Safety system details
256
+ - **This guide**: HF Spaces specific instructions
257
+
258
+ ### Community
259
+
260
+ - **Hugging Face Forums**: Community support
261
+ - **GitHub Issues**: Bug reports and feature requests
262
+ - **Discord**: Real-time help
263
+
264
+ ## 🎉 Success Checklist
265
+
266
+ - [ ] Space created successfully
267
+ - [ ] All files uploaded
268
+ - [ ] Build completed without errors
269
+ - [ ] Models downloaded successfully
270
+ - [ ] RAG system initialized
271
+ - [ ] Streamlit interface accessible
272
+ - [ ] Guard rails enabled
273
+ - [ ] Test queries working
274
+ - [ ] Performance acceptable
275
+
276
+ ## 🚀 Next Steps
277
+
278
+ After successful deployment:
279
+
280
+ 1. **Test the system** with sample queries
281
+ 2. **Upload documents** for RAG functionality
282
+ 3. **Monitor performance** and resource usage
283
+ 4. **Customize configuration** as needed
284
+ 5. **Share your Space** with others
285
+
286
+ ---
287
+
288
+ **Happy Deploying! 🎉**
289
+
290
+ Your RAG system is now ready to provide intelligent document question-answering capabilities on Hugging Face Spaces.
README.md CHANGED
@@ -10,50 +10,31 @@ pinned: false
10
  app_port: 8501
11
  ---
12
 
13
- # 🤖 RAG System - Hugging Face Spaces
14
 
15
- A comprehensive **Retrieval-Augmented Generation (RAG)** system that processes PDF documents and answers questions using advanced AI models. This system combines the power of vector search, keyword matching, and large language models to provide intelligent document question-answering capabilities.
16
 
17
  ## 🚀 Features
18
 
19
- ### Core Functionality
20
- - **📄 PDF Processing**: Automatically loads and processes PDF documents with intelligent text extraction
21
- - **🔍 Hybrid Search**: Combines FAISS vector search with BM25 keyword search for optimal retrieval
22
- - **🎯 Multiple Retrieval Methods**: Choose from hybrid, dense, or sparse retrieval options
23
- - **🤖 Advanced AI Models**: Uses Qwen 2.5 1.5B for intelligent response generation
24
- - **💬 Real-time Chat Interface**: Interactive Streamlit-based UI with conversation history
25
- - **⚡ Parallel Document Loading**: Fast document processing with concurrent loading
26
-
27
- ### Technical Features
28
- - **🔒 Thread Safety**: Safe concurrent document loading with proper locking
29
- - **💾 Persistent Storage**: Automatic index saving and loading across sessions
30
- - **🎯 Smart Fallbacks**: Graceful model loading with alternative options
31
- - **📊 Performance Metrics**: Response times, confidence scores, and search result analysis
32
- - **🛡️ Error Handling**: Robust error handling and user feedback
33
 
34
  ## 🏗️ Architecture
35
 
36
- The RAG system follows a modular, scalable architecture:
37
-
38
  ```
39
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
40
- PDF Documents User Interface │ Search Engine │
41
- │ │ │ (Streamlit) │ │ │
42
- └─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
43
- │ │ │
44
- ▼ ▼ ▼
45
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
46
- PDF Processor │ │ RAG System │ │ Vector Store
47
- │ - Text Extract │ │ - Orchestration │ │ (FAISS) │
48
- │ - Cleaning │ │ - Response Gen │ │ │
49
- │ - Chunking │ │ - Thread Safety │ └─────────────────┘
50
- └─────────────────┘ └─────────────────┘
51
-
52
-
53
- ┌─────────────────┐
54
- │ Language Model │
55
- │ (Qwen 2.5 1.5B) │
56
- └─────────────────┘
57
  ```
58
 
59
  ## 🛠️ Technology Stack
@@ -75,36 +56,68 @@ The RAG system follows a modular, scalable architecture:
75
 
76
  ## 🚀 Quick Start
77
 
78
- ### 1. Using the Web Interface
79
-
80
- 1. **Wait for Initialization**: The system automatically loads pre-configured PDF documents
81
- 2. **Ask Questions**: Use the chat interface to ask questions about the documents
82
- 3. **Choose Method**: Select from hybrid, dense, or sparse retrieval methods
83
- 4. **View Results**: See answers with confidence scores and search results
84
-
85
- ### 2. Local Development
86
 
 
87
  ```bash
88
- # Clone the repository
89
  git clone <repository-url>
90
  cd convAI
91
-
92
- # Install dependencies
93
  pip install -r requirements.txt
 
94
 
95
- # Run the application
 
96
  streamlit run app.py
97
  ```
98
 
99
- ### 3. Docker Deployment
 
 
100
 
 
101
  ```bash
102
- # Build and run with Docker Compose
103
  docker-compose up --build
 
 
 
104
 
105
- # Or build and run manually
106
- docker build -t rag-system .
107
- docker run -p 8501:8501 rag-system
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
108
  ```
109
 
110
  ## 📖 Usage Guide
 
10
  app_port: 8501
11
  ---
12
 
13
+ # 🤖 Conversational AI RAG System
14
 
15
+ A comprehensive Retrieval-Augmented Generation (RAG) system with advanced guard rails, built with Streamlit, FAISS, and Hugging Face models.
16
 
17
  ## 🚀 Features
18
 
19
+ - **Hybrid Search**: Combines dense (FAISS) and sparse (BM25) retrieval for optimal results
20
+ - **Advanced Guard Rails**: Comprehensive safety and security measures
21
+ - **Multiple Models**: Support for Qwen 2.5 1.5B and distilgpt2 fallback
22
+ - **PDF Processing**: Intelligent document chunking and processing
23
+ - **Real-time Monitoring**: Performance metrics and system health checks
24
+ - **Docker Support**: Containerized deployment with Docker Compose
25
+ - **Hugging Face Spaces Ready**: Optimized for HF Spaces deployment
 
 
 
 
 
 
 
26
 
27
  ## 🏗️ Architecture
28
 
 
 
29
  ```
30
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
31
+ Streamlit UI │───▶│ RAG System │───▶│ Guard Rails
32
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
33
+
34
+
 
35
  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
36
+ PDF Processor │ │ FAISS Index │ │ Language Model
37
+ └─────────────────┘ └─────────────────┘ └─────────────────┘
 
 
 
 
 
 
 
 
 
38
  ```
39
 
40
  ## 🛠️ Technology Stack
 
56
 
57
  ## 🚀 Quick Start
58
 
59
+ ### Local Development
 
 
 
 
 
 
 
60
 
61
+ 1. **Clone and Setup**:
62
  ```bash
 
63
  git clone <repository-url>
64
  cd convAI
 
 
65
  pip install -r requirements.txt
66
+ ```
67
 
68
+ 2. **Run the Application**:
69
+ ```bash
70
  streamlit run app.py
71
  ```
72
 
73
+ 3. **Upload PDFs and Start Chatting**!
74
+
75
+ ### Docker Deployment
76
 
77
+ 1. **Build and Run**:
78
  ```bash
 
79
  docker-compose up --build
80
+ ```
81
+
82
+ 2. **Access at**: http://localhost:8501
83
 
84
+ ## 🌟 Hugging Face Spaces Deployment
85
+
86
+ This application is optimized for deployment on Hugging Face Spaces. The system automatically:
87
+
88
+ - Uses `/tmp` directories for cache storage (writable in HF Spaces)
89
+ - Configures environment variables for HF Spaces compatibility
90
+ - Handles permission issues automatically
91
+ - Optimizes model loading for HF Spaces environment
92
+
93
+ ### HF Spaces Configuration
94
+
95
+ The application includes:
96
+ - **Cache Management**: All model caches stored in `/tmp` directories
97
+ - **Permission Handling**: Automatic fallback to writable directories
98
+ - **Environment Detection**: Adapts to HF Spaces runtime environment
99
+ - **Resource Optimization**: Efficient memory and CPU usage
100
+
101
+ ### Deploy to HF Spaces
102
+
103
+ 1. **Create a new Space** on Hugging Face
104
+ 2. **Choose Docker** as the SDK
105
+ 3. **Upload all files** from this repository
106
+ 4. **The system will automatically**:
107
+ - Set up cache directories in `/tmp`
108
+ - Download and cache models
109
+ - Initialize the RAG system with guard rails
110
+ - Start the Streamlit interface
111
+
112
+ ### HF Spaces Environment Variables
113
+
114
+ The system automatically configures:
115
+ ```bash
116
+ HF_HOME=/tmp/huggingface
117
+ TRANSFORMERS_CACHE=/tmp/huggingface/transformers
118
+ TORCH_HOME=/tmp/torch
119
+ XDG_CACHE_HOME=/tmp
120
+ HF_HUB_CACHE=/tmp/huggingface/hub
121
  ```
122
 
123
  ## 📖 Usage Guide
app.py CHANGED
@@ -1,12 +1,38 @@
1
  #!/usr/bin/env python3
2
  """
3
- RAG System for Hugging Face Spaces
4
-
5
- A simplified RAG system using:
6
- - FAISS for vector search
7
- - BM25 for hybrid retrieval
8
- - Streamlit for UI
9
- - Qwen 2.5 1.5B for generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
  """
11
 
12
  import streamlit as st
@@ -23,28 +49,87 @@ from loguru import logger
23
  # Import our simplified components
24
  from rag_system import SimpleRAGSystem
25
  from pdf_processor import SimplePDFProcessor
 
 
 
 
 
 
26
 
27
- # Page configuration
28
  st.set_page_config(
29
  page_title="RAG System - Hugging Face",
30
  page_icon="🤖",
31
- layout="wide",
32
- initial_sidebar_state="expanded",
33
  )
34
 
35
- # Initialize session state
 
 
 
 
36
  if "rag_system" not in st.session_state:
37
- st.session_state.rag_system = None
38
  if "documents_loaded" not in st.session_state:
39
- st.session_state.documents_loaded = False
40
  if "chat_history" not in st.session_state:
41
- st.session_state.chat_history = []
42
  if "initializing" not in st.session_state:
43
- st.session_state.initializing = False
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
44
 
45
 
46
  def load_single_document(rag_system, pdf_path):
47
- """Load a single document into the RAG system"""
 
 
 
 
 
 
 
 
 
48
  try:
49
  filename = os.path.basename(pdf_path)
50
  success = rag_system.add_document(pdf_path, filename)
@@ -54,13 +139,43 @@ def load_single_document(rag_system, pdf_path):
54
 
55
 
56
  def initialize_rag_system():
57
- """Initialize the RAG system"""
 
 
 
 
 
 
 
 
58
  if st.session_state.rag_system is None and not st.session_state.initializing:
59
  st.session_state.initializing = True
60
  st.write("🚀 Starting RAG system initialization...")
 
 
 
 
 
 
 
 
 
61
  with st.spinner("Initializing RAG system..."):
62
  try:
63
- st.session_state.rag_system = SimpleRAGSystem()
 
 
 
 
 
 
 
 
 
 
 
 
 
64
  st.write("✅ RAG system created successfully")
65
 
66
  # Auto-load all available PDF documents in parallel
@@ -75,8 +190,9 @@ def initialize_rag_system():
75
  f"Loading {len(pdf_files)} PDF documents in parallel..."
76
  ):
77
  # Use ThreadPoolExecutor for parallel loading
 
78
  with ThreadPoolExecutor(max_workers=4) as executor:
79
- # Submit all tasks
80
  future_to_pdf = {
81
  executor.submit(
82
  load_single_document,
@@ -86,7 +202,7 @@ def initialize_rag_system():
86
  for pdf_path in pdf_files
87
  }
88
 
89
- # Process completed tasks
90
  for future in as_completed(future_to_pdf):
91
  filename, success, error = future.result()
92
  if success:
@@ -100,6 +216,7 @@ def initialize_rag_system():
100
  f"⚠️ Failed to load {filename}: {error}"
101
  )
102
 
 
103
  if loaded_count > 0:
104
  st.session_state.documents_loaded = True
105
  st.success(
@@ -130,15 +247,20 @@ def initialize_rag_system():
130
 
131
 
132
  def upload_document(uploaded_file):
133
- """Upload and process a document"""
 
 
 
 
 
134
  if uploaded_file is not None:
135
  try:
136
- # Create temporary file
137
  with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
138
  tmp_file.write(uploaded_file.getvalue())
139
  tmp_path = tmp_file.name
140
 
141
- # Process the document
142
  with st.spinner(f"Processing {uploaded_file.name}..."):
143
  success = st.session_state.rag_system.add_document(
144
  tmp_path, uploaded_file.name
@@ -157,8 +279,21 @@ def upload_document(uploaded_file):
157
  st.error(f"❌ Error processing document: {str(e)}")
158
 
159
 
160
- def query_rag(query: str, method: str = "hybrid", top_k: int = 5):
161
- """Query the RAG system"""
 
 
 
 
 
 
 
 
 
 
 
 
 
162
  try:
163
  st.write(f"🔍 Starting query: {query}")
164
  st.write(f"🔍 Method: {method}, top_k: {top_k}")
@@ -170,8 +305,8 @@ def query_rag(query: str, method: str = "hybrid", top_k: int = 5):
170
  st.write(f"✅ RAG system is available")
171
  start_time = time.time()
172
 
173
- st.write(f"🔍 Calling rag_system.query...")
174
- response = st.session_state.rag_system.query(query, method, top_k)
175
  response_time = time.time() - start_time
176
 
177
  st.write(f"✅ Response received in {response_time:.2f}s")
@@ -192,11 +327,17 @@ def query_rag(query: str, method: str = "hybrid", top_k: int = 5):
192
 
193
 
194
  def display_search_results(results: List[Dict]):
195
- """Display search results"""
 
 
 
 
 
196
  if not results:
197
  st.info("No search results found.")
198
  return
199
 
 
200
  for i, result in enumerate(results, 1):
201
  st.markdown(f"---")
202
  st.markdown(f"**Result {i}** - Score: {result.score:.3f}")
@@ -204,6 +345,7 @@ def display_search_results(results: List[Dict]):
204
  st.write(f"**Method:** {result.search_method}")
205
  st.write(f"**Text:** {result.text[:500]}...")
206
 
 
207
  if result.dense_score and result.sparse_score:
208
  col1, col2 = st.columns(2)
209
  with col1:
@@ -212,19 +354,41 @@ def display_search_results(results: List[Dict]):
212
  st.metric("Sparse Score", f"{result.sparse_score:.3f}")
213
 
214
 
 
 
 
 
 
215
  def main():
216
- """Main application"""
 
 
 
 
 
 
 
 
 
217
  st.write("🚀 App starting...")
 
 
 
 
218
  st.title("🤖 RAG System - Hugging Face Spaces")
219
  st.markdown("A simplified RAG system using FAISS + BM25 + Qwen 2.5 1.5B")
220
 
221
  # Initialize RAG system
222
  initialize_rag_system()
223
 
224
- # Sidebar
 
 
 
225
  with st.sidebar:
226
  st.header("📁 Document Upload")
227
 
 
228
  uploaded_file = st.file_uploader(
229
  "Upload PDF Document",
230
  type=["pdf"],
@@ -238,23 +402,25 @@ def main():
238
 
239
  st.header("⚙️ Settings")
240
 
 
241
  method = st.selectbox(
242
  "Retrieval Method",
243
  ["hybrid", "dense", "sparse"],
244
- help="Choose the retrieval method",
245
  )
246
 
 
247
  top_k = st.slider(
248
  "Number of Results",
249
  min_value=1,
250
  max_value=10,
251
  value=5,
252
- help="Number of top results to retrieve",
253
  )
254
 
255
  st.divider()
256
 
257
- # System info
258
  if st.session_state.rag_system:
259
  stats = st.session_state.rag_system.get_stats()
260
  st.header("📊 System Info")
@@ -263,6 +429,10 @@ def main():
263
  st.write(f"**Vector Size:** {stats['vector_size']}")
264
  st.write(f"**Model:** {stats['model_name']}")
265
 
 
 
 
 
266
  # Initialize RAG system if not already done
267
  if not st.session_state.rag_system:
268
  if st.session_state.initializing:
@@ -281,10 +451,13 @@ def main():
281
  "📚 No documents loaded yet, but you can still ask questions. The system will respond based on its general knowledge."
282
  )
283
 
284
- # Chat interface
 
 
 
285
  st.header("💬 Ask Questions About Your Documents")
286
 
287
- # Chat input
288
  query = st.chat_input("Ask a question about the loaded documents...")
289
 
290
  if query:
@@ -292,7 +465,7 @@ def main():
292
  # Add user message to chat history
293
  st.session_state.chat_history.append({"role": "user", "content": query})
294
 
295
- # Get response
296
  response, response_time = query_rag(query, method, top_k)
297
 
298
  st.write(f"📊 Response type: {type(response)}")
@@ -300,7 +473,7 @@ def main():
300
 
301
  if response:
302
  st.write("✅ Got valid response, adding to chat history")
303
- # Add assistant response to chat history
304
  st.session_state.chat_history.append(
305
  {
306
  "role": "assistant",
@@ -317,7 +490,11 @@ def main():
317
  {"role": "assistant", "content": f"Error: {response_time}"}
318
  )
319
 
320
- # Display chat history
 
 
 
 
321
  for message in st.session_state.chat_history:
322
  if message["role"] == "user":
323
  with st.chat_message("user"):
@@ -326,12 +503,12 @@ def main():
326
  with st.chat_message("assistant"):
327
  st.write(message["content"])
328
 
329
- # Show additional info for assistant messages
330
  if "search_results" in message:
331
  st.markdown("**🔍 Search Results:**")
332
  display_search_results(message["search_results"])
333
 
334
- # Show metrics
335
  col1, col2, col3 = st.columns(3)
336
  with col1:
337
  st.metric("Method", message["method_used"])
@@ -340,12 +517,20 @@ def main():
340
  with col3:
341
  st.metric("Response Time", f"{message['response_time']:.2f}s")
342
 
343
- # Clear chat button
 
 
 
 
344
  if st.session_state.chat_history:
345
  if st.button("🗑️ Clear Chat History"):
346
  st.session_state.chat_history = []
347
  st.rerun()
348
 
349
 
 
 
 
 
350
  if __name__ == "__main__":
351
  main()
 
1
  #!/usr/bin/env python3
2
  """
3
+ # RAG System for Hugging Face Spaces
4
+
5
+ A simplified Retrieval-Augmented Generation (RAG) system using:
6
+ - **FAISS** for vector search and similarity matching
7
+ - **BM25** for keyword-based sparse retrieval
8
+ - **Hybrid Search** combining both dense and sparse methods
9
+ - **Streamlit** for modern, interactive web interface
10
+ - **Qwen 2.5 1.5B** for intelligent response generation
11
+
12
+ ## Features
13
+
14
+ - 🔍 **Multi-Method Retrieval**: Hybrid, dense, and sparse search options
15
+ - 📄 **PDF Processing**: Automatic document loading and chunking
16
+ - 💬 **Real-time Chat**: Interactive conversation interface
17
+ - ⚡ **Parallel Loading**: Concurrent document processing
18
+ - 📊 **Performance Metrics**: Response times and confidence scores
19
+ - 🎯 **Smart Fallbacks**: Graceful handling of model loading failures
20
+
21
+ ## Architecture
22
+
23
+ The system follows a modular architecture:
24
+ 1. **Document Processing**: PDF extraction and chunking
25
+ 2. **Vector Storage**: FAISS index for embeddings
26
+ 3. **Search Engine**: BM25 for keyword matching
27
+ 4. **Response Generation**: LLM-based answer synthesis
28
+ 5. **Web Interface**: Streamlit for user interaction
29
+
30
+ ## Usage
31
+
32
+ 1. Upload PDF documents or use pre-loaded ones
33
+ 2. Choose retrieval method (hybrid/dense/sparse)
34
+ 3. Ask questions in natural language
35
+ 4. View answers with source citations and confidence scores
36
  """
37
 
38
  import streamlit as st
 
49
  # Import our simplified components
50
  from rag_system import SimpleRAGSystem
51
  from pdf_processor import SimplePDFProcessor
52
+ from hf_spaces_config import get_hf_config, is_hf_spaces
53
+ from guard_rails import GuardRailConfig
54
+
55
+ # =============================================================================
56
+ # PAGE CONFIGURATION
57
+ # =============================================================================
58
 
59
+ # Configure Streamlit page settings for optimal user experience
60
  st.set_page_config(
61
  page_title="RAG System - Hugging Face",
62
  page_icon="🤖",
63
+ layout="wide", # Use full width for better content display
64
+ initial_sidebar_state="expanded", # Show sidebar by default
65
  )
66
 
67
+ # =============================================================================
68
+ # SESSION STATE INITIALIZATION
69
+ # =============================================================================
70
+
71
+ # Initialize Streamlit session state for persistent data across interactions
72
  if "rag_system" not in st.session_state:
73
+ st.session_state.rag_system = None # Main RAG system instance
74
  if "documents_loaded" not in st.session_state:
75
+ st.session_state.documents_loaded = False # Document loading status
76
  if "chat_history" not in st.session_state:
77
+ st.session_state.chat_history = [] # Conversation history
78
  if "initializing" not in st.session_state:
79
+ st.session_state.initializing = False # Initialization status
80
+
81
+
82
+ # =============================================================================
83
+ # UTILITY FUNCTIONS
84
+ # =============================================================================
85
+
86
+
87
+ def display_environment_info():
88
+ """
89
+ Display information about the current deployment environment
90
+ """
91
+ if is_hf_spaces():
92
+ st.sidebar.markdown("### 🌐 Environment")
93
+ st.sidebar.info("**Hugging Face Spaces**")
94
+
95
+ # Get HF Spaces configuration details
96
+ try:
97
+ hf_config = get_hf_config()
98
+ st.sidebar.markdown("**Configuration:**")
99
+ st.sidebar.text(
100
+ f"• Cache: {hf_config.cache_dirs.get('transformers_cache', 'N/A')}"
101
+ )
102
+ st.sidebar.text(
103
+ f"• Vector Store: {hf_config.cache_dirs.get('vector_store', 'N/A')}"
104
+ )
105
+
106
+ # Show resource limits
107
+ resource_limits = hf_config.get_resource_limits()
108
+ st.sidebar.markdown("**Resource Limits:**")
109
+ st.sidebar.text(f"• Memory: {resource_limits['max_memory_usage']*100:.0f}%")
110
+ st.sidebar.text(f"• CPU: {resource_limits['max_cpu_usage']*100:.0f}%")
111
+ st.sidebar.text(
112
+ f"• Concurrent: {resource_limits['max_concurrent_requests']}"
113
+ )
114
+
115
+ except Exception as e:
116
+ st.sidebar.warning(f"Config error: {e}")
117
+ else:
118
+ st.sidebar.markdown("### 💻 Environment")
119
+ st.sidebar.info("**Local Development**")
120
 
121
 
122
  def load_single_document(rag_system, pdf_path):
123
+ """
124
+ Load a single document into the RAG system
125
+
126
+ Args:
127
+ rag_system: The RAG system instance
128
+ pdf_path: Path to the PDF file
129
+
130
+ Returns:
131
+ tuple: (filename, success_status, error_message)
132
+ """
133
  try:
134
  filename = os.path.basename(pdf_path)
135
  success = rag_system.add_document(pdf_path, filename)
 
139
 
140
 
141
  def initialize_rag_system():
142
+ """
143
+ Initialize the RAG system with automatic document loading
144
+
145
+ This function:
146
+ 1. Creates the RAG system instance
147
+ 2. Automatically loads all available PDF documents
148
+ 3. Uses parallel processing for faster loading
149
+ 4. Provides real-time feedback on loading progress
150
+ """
151
  if st.session_state.rag_system is None and not st.session_state.initializing:
152
  st.session_state.initializing = True
153
  st.write("🚀 Starting RAG system initialization...")
154
+
155
+ # Check deployment environment
156
+ if is_hf_spaces():
157
+ st.info("🌐 Running in Hugging Face Spaces environment")
158
+ st.write("📁 Setting up HF Spaces optimized configuration...")
159
+ else:
160
+ st.info("💻 Running in local development environment")
161
+ st.write("📁 Using local development configuration...")
162
+
163
  with st.spinner("Initializing RAG system..."):
164
  try:
165
+ # Get HF Spaces configuration
166
+ hf_config = get_hf_config()
167
+ model_config = hf_config.get_model_config()
168
+ guard_config = GuardRailConfig(**hf_config.get_guard_rail_config())
169
+
170
+ # Create RAG system instance with HF Spaces optimized settings
171
+ st.session_state.rag_system = SimpleRAGSystem(
172
+ embedding_model=model_config["embedding_model"],
173
+ generative_model=model_config["generative_model"],
174
+ chunk_sizes=model_config["chunk_sizes"],
175
+ vector_store_path=model_config["vector_store_path"],
176
+ enable_guard_rails=model_config["enable_guard_rails"],
177
+ guard_rail_config=guard_config,
178
+ )
179
  st.write("✅ RAG system created successfully")
180
 
181
  # Auto-load all available PDF documents in parallel
 
190
  f"Loading {len(pdf_files)} PDF documents in parallel..."
191
  ):
192
  # Use ThreadPoolExecutor for parallel loading
193
+ # This significantly speeds up document processing
194
  with ThreadPoolExecutor(max_workers=4) as executor:
195
+ # Submit all document loading tasks
196
  future_to_pdf = {
197
  executor.submit(
198
  load_single_document,
 
202
  for pdf_path in pdf_files
203
  }
204
 
205
+ # Process completed tasks and provide real-time feedback
206
  for future in as_completed(future_to_pdf):
207
  filename, success, error = future.result()
208
  if success:
 
216
  f"⚠️ Failed to load {filename}: {error}"
217
  )
218
 
219
+ # Update system status based on loading results
220
  if loaded_count > 0:
221
  st.session_state.documents_loaded = True
222
  st.success(
 
247
 
248
 
249
  def upload_document(uploaded_file):
250
+ """
251
+ Upload and process a document through the web interface
252
+
253
+ Args:
254
+ uploaded_file: Streamlit uploaded file object
255
+ """
256
  if uploaded_file is not None:
257
  try:
258
+ # Create temporary file for processing
259
  with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
260
  tmp_file.write(uploaded_file.getvalue())
261
  tmp_path = tmp_file.name
262
 
263
+ # Process the document with progress feedback
264
  with st.spinner(f"Processing {uploaded_file.name}..."):
265
  success = st.session_state.rag_system.add_document(
266
  tmp_path, uploaded_file.name
 
279
  st.error(f"❌ Error processing document: {str(e)}")
280
 
281
 
282
+ def query_rag(
283
+ query: str, method: str = "hybrid", top_k: int = 5, user_id: str = "anonymous"
284
+ ):
285
+ """
286
+ Query the RAG system with detailed logging and error handling
287
+
288
+ Args:
289
+ query: User's question
290
+ method: Retrieval method (hybrid/dense/sparse)
291
+ top_k: Number of results to retrieve
292
+ user_id: User identifier for guard rail tracking
293
+
294
+ Returns:
295
+ tuple: (response_object, response_time)
296
+ """
297
  try:
298
  st.write(f"🔍 Starting query: {query}")
299
  st.write(f"🔍 Method: {method}, top_k: {top_k}")
 
305
  st.write(f"✅ RAG system is available")
306
  start_time = time.time()
307
 
308
+ st.write(f"🔍 Calling rag_system.query with guard rails...")
309
+ response = st.session_state.rag_system.query(query, method, top_k, user_id)
310
  response_time = time.time() - start_time
311
 
312
  st.write(f"✅ Response received in {response_time:.2f}s")
 
327
 
328
 
329
  def display_search_results(results: List[Dict]):
330
+ """
331
+ Display search results with detailed information and metrics
332
+
333
+ Args:
334
+ results: List of search result dictionaries
335
+ """
336
  if not results:
337
  st.info("No search results found.")
338
  return
339
 
340
+ # Display each search result with comprehensive information
341
  for i, result in enumerate(results, 1):
342
  st.markdown(f"---")
343
  st.markdown(f"**Result {i}** - Score: {result.score:.3f}")
 
345
  st.write(f"**Method:** {result.search_method}")
346
  st.write(f"**Text:** {result.text[:500]}...")
347
 
348
+ # Show detailed scores for hybrid search
349
  if result.dense_score and result.sparse_score:
350
  col1, col2 = st.columns(2)
351
  with col1:
 
354
  st.metric("Sparse Score", f"{result.sparse_score:.3f}")
355
 
356
 
357
+ # =============================================================================
358
+ # MAIN APPLICATION
359
+ # =============================================================================
360
+
361
+
362
  def main():
363
+ """
364
+ Main application function that orchestrates the entire RAG system interface
365
+
366
+ This function:
367
+ 1. Sets up the user interface
368
+ 2. Initializes the RAG system
369
+ 3. Handles document uploads
370
+ 4. Manages the chat interface
371
+ 5. Displays results and metrics
372
+ """
373
  st.write("🚀 App starting...")
374
+
375
+ # Display environment information in sidebar
376
+ display_environment_info()
377
+
378
  st.title("🤖 RAG System - Hugging Face Spaces")
379
  st.markdown("A simplified RAG system using FAISS + BM25 + Qwen 2.5 1.5B")
380
 
381
  # Initialize RAG system
382
  initialize_rag_system()
383
 
384
+ # =============================================================================
385
+ # SIDEBAR CONFIGURATION
386
+ # =============================================================================
387
+
388
  with st.sidebar:
389
  st.header("📁 Document Upload")
390
 
391
+ # File uploader for PDF documents
392
  uploaded_file = st.file_uploader(
393
  "Upload PDF Document",
394
  type=["pdf"],
 
402
 
403
  st.header("⚙️ Settings")
404
 
405
+ # Retrieval method selection
406
  method = st.selectbox(
407
  "Retrieval Method",
408
  ["hybrid", "dense", "sparse"],
409
+ help="Choose the retrieval method: hybrid (combines dense and sparse), dense (vector similarity), or sparse (keyword matching)",
410
  )
411
 
412
+ # Number of results slider
413
  top_k = st.slider(
414
  "Number of Results",
415
  min_value=1,
416
  max_value=10,
417
  value=5,
418
+ help="Number of top results to retrieve and use for answer generation",
419
  )
420
 
421
  st.divider()
422
 
423
+ # System information display
424
  if st.session_state.rag_system:
425
  stats = st.session_state.rag_system.get_stats()
426
  st.header("📊 System Info")
 
429
  st.write(f"**Vector Size:** {stats['vector_size']}")
430
  st.write(f"**Model:** {stats['model_name']}")
431
 
432
+ # =============================================================================
433
+ # MAIN CONTENT AREA
434
+ # =============================================================================
435
+
436
  # Initialize RAG system if not already done
437
  if not st.session_state.rag_system:
438
  if st.session_state.initializing:
 
451
  "📚 No documents loaded yet, but you can still ask questions. The system will respond based on its general knowledge."
452
  )
453
 
454
+ # =============================================================================
455
+ # CHAT INTERFACE
456
+ # =============================================================================
457
+
458
  st.header("💬 Ask Questions About Your Documents")
459
 
460
+ # Chat input for user questions
461
  query = st.chat_input("Ask a question about the loaded documents...")
462
 
463
  if query:
 
465
  # Add user message to chat history
466
  st.session_state.chat_history.append({"role": "user", "content": query})
467
 
468
+ # Get response from RAG system
469
  response, response_time = query_rag(query, method, top_k)
470
 
471
  st.write(f"📊 Response type: {type(response)}")
 
473
 
474
  if response:
475
  st.write("✅ Got valid response, adding to chat history")
476
+ # Add assistant response to chat history with metadata
477
  st.session_state.chat_history.append(
478
  {
479
  "role": "assistant",
 
490
  {"role": "assistant", "content": f"Error: {response_time}"}
491
  )
492
 
493
+ # =============================================================================
494
+ # CHAT HISTORY DISPLAY
495
+ # =============================================================================
496
+
497
+ # Display conversation history with detailed information
498
  for message in st.session_state.chat_history:
499
  if message["role"] == "user":
500
  with st.chat_message("user"):
 
503
  with st.chat_message("assistant"):
504
  st.write(message["content"])
505
 
506
+ # Show additional information for assistant messages
507
  if "search_results" in message:
508
  st.markdown("**🔍 Search Results:**")
509
  display_search_results(message["search_results"])
510
 
511
+ # Display performance metrics
512
  col1, col2, col3 = st.columns(3)
513
  with col1:
514
  st.metric("Method", message["method_used"])
 
517
  with col3:
518
  st.metric("Response Time", f"{message['response_time']:.2f}s")
519
 
520
+ # =============================================================================
521
+ # UTILITY CONTROLS
522
+ # =============================================================================
523
+
524
+ # Clear chat history button
525
  if st.session_state.chat_history:
526
  if st.button("🗑️ Clear Chat History"):
527
  st.session_state.chat_history = []
528
  st.rerun()
529
 
530
 
531
+ # =============================================================================
532
+ # APPLICATION ENTRY POINT
533
+ # =============================================================================
534
+
535
  if __name__ == "__main__":
536
  main()
docker-compose.yml CHANGED
@@ -1,20 +1,92 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
  version: '3.8'
2
 
 
 
 
 
3
  services:
 
 
 
 
 
4
  rag-system:
 
 
5
  build: .
 
 
 
 
 
 
 
 
6
  ports:
7
  - "8501:8501"
 
 
 
 
 
 
 
8
  environment:
 
9
  - PYTHONPATH=/app
 
 
10
  - STREAMLIT_SERVER_PORT=8501
11
  - STREAMLIT_SERVER_ADDRESS=0.0.0.0
12
  - STREAMLIT_SERVER_HEADLESS=true
 
 
 
 
 
 
 
13
  volumes:
 
 
14
  - ./vector_store:/app/vector_store
 
 
 
 
 
 
 
 
15
  restart: unless-stopped
 
 
 
 
 
 
16
  healthcheck:
 
 
17
  test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
 
 
18
  interval: 30s
 
 
19
  timeout: 10s
 
 
20
  retries: 3
 
1
+ # =============================================================================
2
+ # Docker Compose Configuration for RAG System
3
+ # =============================================================================
4
+ # This file defines the services and configuration for running the RAG system
5
+ # in a containerized environment using Docker Compose.
6
+
7
+ # =============================================================================
8
+ # COMPOSE VERSION
9
+ # =============================================================================
10
+
11
+ # Specify Docker Compose file format version
12
+ # Version 3.8 provides modern features and compatibility
13
  version: '3.8'
14
 
15
+ # =============================================================================
16
+ # SERVICES DEFINITION
17
+ # =============================================================================
18
+
19
  services:
20
+ # =============================================================================
21
+ # RAG SYSTEM SERVICE
22
+ # =============================================================================
23
+
24
+ # Main service for the RAG system application
25
  rag-system:
26
+ # Build the Docker image from the current directory
27
+ # Uses the Dockerfile in the root directory
28
  build: .
29
+
30
+ # =============================================================================
31
+ # NETWORK CONFIGURATION
32
+ # =============================================================================
33
+
34
+ # Port mapping: host_port:container_port
35
+ # Maps port 8501 from the host to port 8501 in the container
36
+ # Allows access to the Streamlit web interface from the host machine
37
  ports:
38
  - "8501:8501"
39
+
40
+ # =============================================================================
41
+ # ENVIRONMENT VARIABLES
42
+ # =============================================================================
43
+
44
+ # Set environment variables for the container
45
+ # These override the defaults set in the Dockerfile
46
  environment:
47
+ # Python path configuration
48
  - PYTHONPATH=/app
49
+
50
+ # Streamlit server configuration
51
  - STREAMLIT_SERVER_PORT=8501
52
  - STREAMLIT_SERVER_ADDRESS=0.0.0.0
53
  - STREAMLIT_SERVER_HEADLESS=true
54
+
55
+ # =============================================================================
56
+ # VOLUME MOUNTING
57
+ # =============================================================================
58
+
59
+ # Mount volumes for data persistence
60
+ # This ensures that the vector store data persists between container restarts
61
  volumes:
62
+ # Mount the local vector_store directory to the container
63
+ # Format: host_path:container_path
64
  - ./vector_store:/app/vector_store
65
+
66
+ # =============================================================================
67
+ # RESTART POLICY
68
+ # =============================================================================
69
+
70
+ # Container restart policy
71
+ # unless-stopped: Restart the container unless it was explicitly stopped
72
+ # This ensures the service stays running even after system reboots
73
  restart: unless-stopped
74
+
75
+ # =============================================================================
76
+ # HEALTH CHECK CONFIGURATION
77
+ # =============================================================================
78
+
79
+ # Health check to monitor service status
80
  healthcheck:
81
+ # Command to test if the service is healthy
82
+ # Uses curl to check if the Streamlit health endpoint responds
83
  test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
84
+
85
+ # Check interval: run health check every 30 seconds
86
  interval: 30s
87
+
88
+ # Timeout: wait up to 10 seconds for health check to complete
89
  timeout: 10s
90
+
91
+ # Retries: attempt health check 3 times before marking as unhealthy
92
  retries: 3
guard_rails.py ADDED
@@ -0,0 +1,675 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ # Guard Rails System for RAG
4
+
5
+ This module provides comprehensive guard rails for the RAG system to ensure:
6
+ - Input validation and sanitization
7
+ - Output safety and content filtering
8
+ - Model safety and prompt injection protection
9
+ - Data privacy and PII detection
10
+ - Rate limiting and abuse prevention
11
+
12
+ ## Guard Rail Categories
13
+
14
+ 1. **Input Guards**: Validate and sanitize user inputs
15
+ 2. **Output Guards**: Filter and validate generated responses
16
+ 3. **Model Guards**: Protect against prompt injection and jailbreaks
17
+ 4. **Data Guards**: Detect and handle sensitive information
18
+ 5. **System Guards**: Rate limiting and resource protection
19
+ """
20
+
21
+ import re
22
+ import time
23
+ import hashlib
24
+ from typing import List, Dict, Optional, Tuple, Any
25
+ from dataclasses import dataclass
26
+ from collections import defaultdict, deque
27
+ import logging
28
+ from loguru import logger
29
+
30
+
31
+ # =============================================================================
32
+ # DATA STRUCTURES
33
+ # =============================================================================
34
+
35
+
36
+ @dataclass
37
+ class GuardRailResult:
38
+ """
39
+ Result from a guard rail check
40
+
41
+ Attributes:
42
+ passed: Whether the check passed
43
+ blocked: Whether the input/output should be blocked
44
+ reason: Reason for blocking or warning
45
+ confidence: Confidence score for the decision
46
+ metadata: Additional information about the check
47
+ """
48
+
49
+ passed: bool
50
+ blocked: bool
51
+ reason: str
52
+ confidence: float
53
+ metadata: Dict[str, Any]
54
+
55
+
56
+ @dataclass
57
+ class GuardRailConfig:
58
+ """
59
+ Configuration for guard rail system
60
+
61
+ Attributes:
62
+ max_query_length: Maximum allowed query length
63
+ max_response_length: Maximum allowed response length
64
+ min_confidence_threshold: Minimum confidence for responses
65
+ rate_limit_requests: Maximum requests per time window
66
+ rate_limit_window: Time window for rate limiting (seconds)
67
+ enable_pii_detection: Whether to detect PII in documents
68
+ enable_content_filtering: Whether to filter harmful content
69
+ enable_prompt_injection_detection: Whether to detect prompt injection
70
+ """
71
+
72
+ max_query_length: int = 1000
73
+ max_response_length: int = 5000
74
+ min_confidence_threshold: float = 0.3
75
+ rate_limit_requests: int = 100
76
+ rate_limit_window: int = 3600 # 1 hour
77
+ enable_pii_detection: bool = True
78
+ enable_content_filtering: bool = True
79
+ enable_prompt_injection_detection: bool = True
80
+
81
+
82
+ # =============================================================================
83
+ # INPUT GUARD RAILS
84
+ # =============================================================================
85
+
86
+
87
+ class InputGuards:
88
+ """Guard rails for input validation and sanitization"""
89
+
90
+ def __init__(self, config: GuardRailConfig):
91
+ self.config = config
92
+
93
+ # Compile regex patterns for efficiency
94
+ self.suspicious_patterns = [
95
+ re.compile(r"system:|assistant:|user:", re.IGNORECASE),
96
+ re.compile(r"ignore previous|forget everything", re.IGNORECASE),
97
+ re.compile(r"you are now|act as|pretend to be", re.IGNORECASE),
98
+ re.compile(r"<script|javascript:|eval\(", re.IGNORECASE),
99
+ re.compile(
100
+ r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
101
+ ),
102
+ ]
103
+
104
+ # Harmful content patterns
105
+ self.harmful_patterns = [
106
+ re.compile(r"\b(hack|crack|exploit|vulnerability)\b", re.IGNORECASE),
107
+ re.compile(r"\b(bomb|weapon|explosive)\b", re.IGNORECASE),
108
+ re.compile(r"\b(drug|illegal|contraband)\b", re.IGNORECASE),
109
+ ]
110
+
111
+ def validate_query(self, query: str, user_id: str = "anonymous") -> GuardRailResult:
112
+ """
113
+ Validate user query for safety and appropriateness
114
+
115
+ Args:
116
+ query: User's query string
117
+ user_id: User identifier for rate limiting
118
+
119
+ Returns:
120
+ GuardRailResult with validation outcome
121
+ """
122
+ # Check query length
123
+ if len(query) > self.config.max_query_length:
124
+ return GuardRailResult(
125
+ passed=False,
126
+ blocked=True,
127
+ reason=f"Query too long ({len(query)} chars, max {self.config.max_query_length})",
128
+ confidence=1.0,
129
+ metadata={"query_length": len(query)},
130
+ )
131
+
132
+ # Check for empty or whitespace-only queries
133
+ if not query.strip():
134
+ return GuardRailResult(
135
+ passed=False,
136
+ blocked=True,
137
+ reason="Empty or whitespace-only query",
138
+ confidence=1.0,
139
+ metadata={},
140
+ )
141
+
142
+ # Check for suspicious patterns (potential prompt injection)
143
+ if self.config.enable_prompt_injection_detection:
144
+ for pattern in self.suspicious_patterns:
145
+ if pattern.search(query):
146
+ return GuardRailResult(
147
+ passed=False,
148
+ blocked=True,
149
+ reason="Suspicious pattern detected (potential prompt injection)",
150
+ confidence=0.8,
151
+ metadata={"pattern": pattern.pattern},
152
+ )
153
+
154
+ # Check for harmful content
155
+ if self.config.enable_content_filtering:
156
+ harmful_matches = []
157
+ for pattern in self.harmful_patterns:
158
+ if pattern.search(query):
159
+ harmful_matches.append(pattern.pattern)
160
+
161
+ if harmful_matches:
162
+ return GuardRailResult(
163
+ passed=False,
164
+ blocked=True,
165
+ reason="Harmful content detected",
166
+ confidence=0.7,
167
+ metadata={"harmful_patterns": harmful_matches},
168
+ )
169
+
170
+ return GuardRailResult(
171
+ passed=True,
172
+ blocked=False,
173
+ reason="Query validated successfully",
174
+ confidence=1.0,
175
+ metadata={},
176
+ )
177
+
178
+ def sanitize_query(self, query: str) -> str:
179
+ """
180
+ Sanitize query to remove potentially harmful content
181
+
182
+ Args:
183
+ query: Raw query string
184
+
185
+ Returns:
186
+ Sanitized query string
187
+ """
188
+ # Remove HTML tags
189
+ query = re.sub(r"<[^>]+>", "", query)
190
+
191
+ # Remove script tags and content
192
+ query = re.sub(
193
+ r"<script.*?</script>", "", query, flags=re.IGNORECASE | re.DOTALL
194
+ )
195
+
196
+ # Remove excessive whitespace
197
+ query = re.sub(r"\s+", " ", query).strip()
198
+
199
+ return query
200
+
201
+
202
+ # =============================================================================
203
+ # OUTPUT GUARD RAILS
204
+ # =============================================================================
205
+
206
+
207
+ class OutputGuards:
208
+ """Guard rails for output validation and filtering"""
209
+
210
+ def __init__(self, config: GuardRailConfig):
211
+ self.config = config
212
+
213
+ # Response quality patterns
214
+ self.low_quality_patterns = [
215
+ re.compile(r"\b(i don\'t know|i cannot|i am unable)\b", re.IGNORECASE),
216
+ re.compile(r"\b(no information|not found|not available)\b", re.IGNORECASE),
217
+ ]
218
+
219
+ # Hallucination indicators
220
+ self.hallucination_patterns = [
221
+ re.compile(
222
+ r"\b(according to the document|as mentioned in|the document states)\b",
223
+ re.IGNORECASE,
224
+ ),
225
+ re.compile(
226
+ r"\b(based on the provided|in the given|from the text)\b", re.IGNORECASE
227
+ ),
228
+ ]
229
+
230
+ def validate_response(
231
+ self, response: str, confidence: float, context: str = ""
232
+ ) -> GuardRailResult:
233
+ """
234
+ Validate generated response for safety and quality
235
+
236
+ Args:
237
+ response: Generated response text
238
+ confidence: Confidence score from RAG system
239
+ context: Retrieved context for validation
240
+
241
+ Returns:
242
+ GuardRailResult with validation outcome
243
+ """
244
+ # Check response length
245
+ if len(response) > self.config.max_response_length:
246
+ return GuardRailResult(
247
+ passed=False,
248
+ blocked=True,
249
+ reason=f"Response too long ({len(response)} chars, max {self.config.max_response_length})",
250
+ confidence=1.0,
251
+ metadata={"response_length": len(response)},
252
+ )
253
+
254
+ # Check confidence threshold
255
+ if confidence < self.config.min_confidence_threshold:
256
+ return GuardRailResult(
257
+ passed=False,
258
+ blocked=False,
259
+ reason=f"Low confidence response ({confidence:.2f} < {self.config.min_confidence_threshold})",
260
+ confidence=confidence,
261
+ metadata={"confidence": confidence},
262
+ )
263
+
264
+ # Check for low quality responses
265
+ low_quality_count = 0
266
+ for pattern in self.low_quality_patterns:
267
+ if pattern.search(response):
268
+ low_quality_count += 1
269
+
270
+ if low_quality_count >= 2:
271
+ return GuardRailResult(
272
+ passed=False,
273
+ blocked=False,
274
+ reason="Low quality response detected",
275
+ confidence=0.6,
276
+ metadata={"low_quality_indicators": low_quality_count},
277
+ )
278
+
279
+ # Check for potential hallucinations
280
+ if context and self._detect_hallucination(response, context):
281
+ return GuardRailResult(
282
+ passed=False,
283
+ blocked=False,
284
+ reason="Potential hallucination detected",
285
+ confidence=0.7,
286
+ metadata={"hallucination_risk": "high"},
287
+ )
288
+
289
+ return GuardRailResult(
290
+ passed=True,
291
+ blocked=False,
292
+ reason="Response validated successfully",
293
+ confidence=confidence,
294
+ metadata={},
295
+ )
296
+
297
+ def _detect_hallucination(self, response: str, context: str) -> bool:
298
+ """
299
+ Detect potential hallucinations in response
300
+
301
+ Args:
302
+ response: Generated response
303
+ context: Retrieved context
304
+
305
+ Returns:
306
+ True if hallucination is likely detected
307
+ """
308
+ # Simple heuristic: check if response contains specific claims not in context
309
+ response_lower = response.lower()
310
+ context_lower = context.lower()
311
+
312
+ # Check for specific claims that should be in context
313
+ claim_indicators = [
314
+ "the document states",
315
+ "according to the text",
316
+ "as mentioned in",
317
+ "the information shows",
318
+ ]
319
+
320
+ for indicator in claim_indicators:
321
+ if indicator in response_lower:
322
+ # Check if the surrounding text is actually in context
323
+ # This is a simplified check - more sophisticated methods would be needed
324
+ return False # For now, we'll be conservative
325
+
326
+ return False
327
+
328
+ def filter_response(self, response: str) -> str:
329
+ """
330
+ Filter response to remove potentially harmful content
331
+
332
+ Args:
333
+ response: Raw response string
334
+
335
+ Returns:
336
+ Filtered response string
337
+ """
338
+ # Remove HTML tags
339
+ response = re.sub(r"<[^>]+>", "", response)
340
+
341
+ # Remove script content
342
+ response = re.sub(
343
+ r"<script.*?</script>", "", response, flags=re.IGNORECASE | re.DOTALL
344
+ )
345
+
346
+ # Remove excessive newlines
347
+ response = re.sub(r"\n\s*\n\s*\n+", "\n\n", response)
348
+
349
+ return response.strip()
350
+
351
+
352
+ # =============================================================================
353
+ # DATA GUARD RAILS
354
+ # =============================================================================
355
+
356
+
357
+ class DataGuards:
358
+ """Guard rails for data privacy and PII detection"""
359
+
360
+ def __init__(self, config: GuardRailConfig):
361
+ self.config = config
362
+
363
+ # PII patterns
364
+ self.pii_patterns = {
365
+ "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"),
366
+ "phone": re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"),
367
+ "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
368
+ "credit_card": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
369
+ "ip_address": re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"),
370
+ }
371
+
372
+ def detect_pii(self, text: str) -> GuardRailResult:
373
+ """
374
+ Detect personally identifiable information in text
375
+
376
+ Args:
377
+ text: Text to analyze for PII
378
+
379
+ Returns:
380
+ GuardRailResult with PII detection outcome
381
+ """
382
+ if not self.config.enable_pii_detection:
383
+ return GuardRailResult(
384
+ passed=True,
385
+ blocked=False,
386
+ reason="PII detection disabled",
387
+ confidence=1.0,
388
+ metadata={},
389
+ )
390
+
391
+ detected_pii = {}
392
+ for pii_type, pattern in self.pii_patterns.items():
393
+ matches = pattern.findall(text)
394
+ if matches:
395
+ detected_pii[pii_type] = len(matches)
396
+
397
+ if detected_pii:
398
+ return GuardRailResult(
399
+ passed=False,
400
+ blocked=True,
401
+ reason=f"PII detected: {', '.join(detected_pii.keys())}",
402
+ confidence=0.9,
403
+ metadata={"detected_pii": detected_pii},
404
+ )
405
+
406
+ return GuardRailResult(
407
+ passed=True,
408
+ blocked=False,
409
+ reason="No PII detected",
410
+ confidence=1.0,
411
+ metadata={},
412
+ )
413
+
414
+ def sanitize_pii(self, text: str) -> str:
415
+ """
416
+ Sanitize text by removing or masking PII
417
+
418
+ Args:
419
+ text: Text containing potential PII
420
+
421
+ Returns:
422
+ Sanitized text with PII masked
423
+ """
424
+ # Mask email addresses
425
+ text = re.sub(
426
+ r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b", "[EMAIL]", text
427
+ )
428
+
429
+ # Mask phone numbers
430
+ text = re.sub(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b", "[PHONE]", text)
431
+
432
+ # Mask SSN
433
+ text = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", text)
434
+
435
+ # Mask credit card numbers
436
+ text = re.sub(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b", "[CREDIT_CARD]", text)
437
+
438
+ # Mask IP addresses
439
+ text = re.sub(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", "[IP_ADDRESS]", text)
440
+
441
+ return text
442
+
443
+
444
+ # =============================================================================
445
+ # SYSTEM GUARD RAILS
446
+ # =============================================================================
447
+
448
+
449
+ class SystemGuards:
450
+ """Guard rails for system-level protection"""
451
+
452
+ def __init__(self, config: GuardRailConfig):
453
+ self.config = config
454
+ self.request_history = defaultdict(lambda: deque(maxlen=1000))
455
+ self.blocked_users = set()
456
+
457
+ def check_rate_limit(self, user_id: str) -> GuardRailResult:
458
+ """
459
+ Check if user has exceeded rate limits
460
+
461
+ Args:
462
+ user_id: User identifier
463
+
464
+ Returns:
465
+ GuardRailResult with rate limit check outcome
466
+ """
467
+ current_time = time.time()
468
+ user_requests = self.request_history[user_id]
469
+
470
+ # Remove old requests outside the window
471
+ while (
472
+ user_requests
473
+ and current_time - user_requests[0] > self.config.rate_limit_window
474
+ ):
475
+ user_requests.popleft()
476
+
477
+ # Check if user is blocked
478
+ if user_id in self.blocked_users:
479
+ return GuardRailResult(
480
+ passed=False,
481
+ blocked=True,
482
+ reason="User is blocked due to previous violations",
483
+ confidence=1.0,
484
+ metadata={"user_id": user_id},
485
+ )
486
+
487
+ # Check rate limit
488
+ if len(user_requests) >= self.config.rate_limit_requests:
489
+ # Block user temporarily
490
+ self.blocked_users.add(user_id)
491
+ return GuardRailResult(
492
+ passed=False,
493
+ blocked=True,
494
+ reason=f"Rate limit exceeded ({len(user_requests)} requests in {self.config.rate_limit_window}s)",
495
+ confidence=1.0,
496
+ metadata={"requests": len(user_requests)},
497
+ )
498
+
499
+ # Add current request
500
+ user_requests.append(current_time)
501
+
502
+ return GuardRailResult(
503
+ passed=True,
504
+ blocked=False,
505
+ reason="Rate limit check passed",
506
+ confidence=1.0,
507
+ metadata={"requests": len(user_requests)},
508
+ )
509
+
510
+ def check_resource_usage(
511
+ self, memory_usage: float, cpu_usage: float
512
+ ) -> GuardRailResult:
513
+ """
514
+ Check system resource usage
515
+
516
+ Args:
517
+ memory_usage: Current memory usage percentage
518
+ cpu_usage: Current CPU usage percentage
519
+
520
+ Returns:
521
+ GuardRailResult with resource check outcome
522
+ """
523
+ # Define thresholds
524
+ memory_threshold = 90.0 # 90% memory usage
525
+ cpu_threshold = 95.0 # 95% CPU usage
526
+
527
+ if memory_usage > memory_threshold:
528
+ return GuardRailResult(
529
+ passed=False,
530
+ blocked=True,
531
+ reason=f"High memory usage ({memory_usage:.1f}%)",
532
+ confidence=1.0,
533
+ metadata={"memory_usage": memory_usage},
534
+ )
535
+
536
+ if cpu_usage > cpu_threshold:
537
+ return GuardRailResult(
538
+ passed=False,
539
+ blocked=True,
540
+ reason=f"High CPU usage ({cpu_usage:.1f}%)",
541
+ confidence=1.0,
542
+ metadata={"cpu_usage": cpu_usage},
543
+ )
544
+
545
+ return GuardRailResult(
546
+ passed=True,
547
+ blocked=False,
548
+ reason="Resource usage acceptable",
549
+ confidence=1.0,
550
+ metadata={"memory_usage": memory_usage, "cpu_usage": cpu_usage},
551
+ )
552
+
553
+
554
+ # =============================================================================
555
+ # MAIN GUARD RAIL SYSTEM
556
+ # =============================================================================
557
+
558
+
559
+ class GuardRailSystem:
560
+ """
561
+ Comprehensive guard rail system for RAG
562
+
563
+ This class orchestrates all guard rail components to ensure
564
+ safe and reliable operation of the RAG system.
565
+ """
566
+
567
+ def __init__(self, config: GuardRailConfig = None):
568
+ self.config = config or GuardRailConfig()
569
+
570
+ # Initialize all guard rail components
571
+ self.input_guards = InputGuards(self.config)
572
+ self.output_guards = OutputGuards(self.config)
573
+ self.data_guards = DataGuards(self.config)
574
+ self.system_guards = SystemGuards(self.config)
575
+
576
+ logger.info("Guard rail system initialized successfully")
577
+
578
+ def validate_input(self, query: str, user_id: str = "anonymous") -> GuardRailResult:
579
+ """
580
+ Comprehensive input validation
581
+
582
+ Args:
583
+ query: User query
584
+ user_id: User identifier
585
+
586
+ Returns:
587
+ GuardRailResult with validation outcome
588
+ """
589
+ # Check rate limits first
590
+ rate_limit_result = self.system_guards.check_rate_limit(user_id)
591
+ if not rate_limit_result.passed:
592
+ return rate_limit_result
593
+
594
+ # Validate query
595
+ query_result = self.input_guards.validate_query(query, user_id)
596
+ if not query_result.passed:
597
+ return query_result
598
+
599
+ # Check for PII in query
600
+ pii_result = self.data_guards.detect_pii(query)
601
+ if not pii_result.passed:
602
+ return pii_result
603
+
604
+ return GuardRailResult(
605
+ passed=True,
606
+ blocked=False,
607
+ reason="Input validation passed",
608
+ confidence=1.0,
609
+ metadata={},
610
+ )
611
+
612
+ def validate_output(
613
+ self, response: str, confidence: float, context: str = ""
614
+ ) -> GuardRailResult:
615
+ """
616
+ Comprehensive output validation
617
+
618
+ Args:
619
+ response: Generated response
620
+ confidence: Confidence score
621
+ context: Retrieved context
622
+
623
+ Returns:
624
+ GuardRailResult with validation outcome
625
+ """
626
+ # Validate response
627
+ response_result = self.output_guards.validate_response(
628
+ response, confidence, context
629
+ )
630
+ if not response_result.passed:
631
+ return response_result
632
+
633
+ # Check for PII in response
634
+ pii_result = self.data_guards.detect_pii(response)
635
+ if not pii_result.passed:
636
+ return pii_result
637
+
638
+ return GuardRailResult(
639
+ passed=True,
640
+ blocked=False,
641
+ reason="Output validation passed",
642
+ confidence=confidence,
643
+ metadata={},
644
+ )
645
+
646
+ def sanitize_input(self, query: str) -> str:
647
+ """Sanitize user input"""
648
+ return self.input_guards.sanitize_query(query)
649
+
650
+ def sanitize_output(self, response: str) -> str:
651
+ """Sanitize generated output"""
652
+ return self.output_guards.filter_response(response)
653
+
654
+ def sanitize_data(self, text: str) -> str:
655
+ """Sanitize data by removing PII"""
656
+ return self.data_guards.sanitize_pii(text)
657
+
658
+ def get_system_status(self) -> Dict[str, Any]:
659
+ """
660
+ Get current system status and statistics
661
+
662
+ Returns:
663
+ Dictionary with system status information
664
+ """
665
+ return {
666
+ "total_users": len(self.system_guards.request_history),
667
+ "blocked_users": len(self.system_guards.blocked_users),
668
+ "config": {
669
+ "max_query_length": self.config.max_query_length,
670
+ "max_response_length": self.config.max_response_length,
671
+ "min_confidence_threshold": self.config.min_confidence_threshold,
672
+ "rate_limit_requests": self.config.rate_limit_requests,
673
+ "rate_limit_window": self.config.rate_limit_window,
674
+ },
675
+ }
hf_spaces_config.py ADDED
@@ -0,0 +1,241 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Hugging Face Spaces Configuration
3
+ ================================
4
+
5
+ This module contains configuration settings optimized for deployment on
6
+ Hugging Face Spaces. It handles cache directories, permissions, and
7
+ environment-specific optimizations.
8
+
9
+ Key Features:
10
+ - Automatic cache directory setup in /tmp
11
+ - Permission handling for HF Spaces environment
12
+ - Model loading optimizations
13
+ - Resource usage monitoring
14
+ """
15
+
16
+ import os
17
+ import logging
18
+ from pathlib import Path
19
+
20
+ # Configure logging for HF Spaces
21
+ logging.basicConfig(
22
+ level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
23
+ )
24
+ logger = logging.getLogger(__name__)
25
+
26
+
27
+ class HFSpacesConfig:
28
+ """
29
+ Configuration class for Hugging Face Spaces deployment
30
+
31
+ This class manages all environment-specific settings and ensures
32
+ the application works correctly in the HF Spaces environment.
33
+ """
34
+
35
+ def __init__(self):
36
+ """Initialize HF Spaces configuration"""
37
+ self.is_hf_spaces = self._detect_hf_spaces()
38
+ self.cache_dirs = self._setup_cache_directories()
39
+ self.env_vars = self._setup_environment_variables()
40
+
41
+ def _detect_hf_spaces(self) -> bool:
42
+ """
43
+ Detect if running in Hugging Face Spaces environment
44
+
45
+ Returns:
46
+ bool: True if running in HF Spaces
47
+ """
48
+ # Check for HF Spaces environment indicators
49
+ hf_indicators = [
50
+ "SPACE_ID" in os.environ,
51
+ "SPACE_HOST" in os.environ,
52
+ "HF_HUB_ENDPOINT" in os.environ,
53
+ os.path.exists("/tmp/huggingface"),
54
+ ]
55
+
56
+ is_hf = any(hf_indicators)
57
+ logger.info(f"HF Spaces environment detected: {is_hf}")
58
+ return is_hf
59
+
60
+ def _setup_cache_directories(self) -> dict:
61
+ """
62
+ Set up cache directories for HF Spaces
63
+
64
+ Returns:
65
+ dict: Cache directory paths
66
+ """
67
+ if self.is_hf_spaces:
68
+ # Use /tmp for HF Spaces (writable)
69
+ cache_dirs = {
70
+ "hf_home": "/tmp/huggingface",
71
+ "transformers_cache": "/tmp/huggingface/transformers",
72
+ "torch_home": "/tmp/torch",
73
+ "hub_cache": "/tmp/huggingface/hub",
74
+ "xdg_cache": "/tmp",
75
+ "vector_store": "./vector_store",
76
+ }
77
+ else:
78
+ # Use standard locations for local development
79
+ cache_dirs = {
80
+ "hf_home": os.path.expanduser("~/.cache/huggingface"),
81
+ "transformers_cache": os.path.expanduser(
82
+ "~/.cache/huggingface/transformers"
83
+ ),
84
+ "torch_home": os.path.expanduser("~/.cache/torch"),
85
+ "hub_cache": os.path.expanduser("~/.cache/huggingface/hub"),
86
+ "xdg_cache": os.path.expanduser("~/.cache"),
87
+ "vector_store": "./vector_store",
88
+ }
89
+
90
+ # Create directories
91
+ for name, path in cache_dirs.items():
92
+ try:
93
+ Path(path).mkdir(parents=True, exist_ok=True)
94
+ logger.info(f"Cache directory ready: {name} -> {path}")
95
+ except Exception as e:
96
+ logger.warning(f"Could not create cache directory {name}: {e}")
97
+
98
+ return cache_dirs
99
+
100
+ def _setup_environment_variables(self) -> dict:
101
+ """
102
+ Set up environment variables for HF Spaces
103
+
104
+ Returns:
105
+ dict: Environment variable settings
106
+ """
107
+ env_vars = {
108
+ "HF_HOME": self.cache_dirs["hf_home"],
109
+ "TRANSFORMERS_CACHE": self.cache_dirs["transformers_cache"],
110
+ "TORCH_HOME": self.cache_dirs["torch_home"],
111
+ "XDG_CACHE_HOME": self.cache_dirs["xdg_cache"],
112
+ "HF_HUB_CACHE": self.cache_dirs["hub_cache"],
113
+ "PYTHONPATH": "/app",
114
+ "STREAMLIT_SERVER_PORT": "8501",
115
+ "STREAMLIT_SERVER_ADDRESS": "0.0.0.0",
116
+ "STREAMLIT_SERVER_HEADLESS": "true",
117
+ "STREAMLIT_SERVER_ENABLE_CORS": "false",
118
+ "STREAMLIT_SERVER_ENABLE_XSRF_PROTECTION": "false",
119
+ "STREAMLIT_LOGGER_LEVEL": "info",
120
+ }
121
+
122
+ # Set environment variables
123
+ for key, value in env_vars.items():
124
+ os.environ[key] = value
125
+ logger.info(f"Set environment variable: {key}={value}")
126
+
127
+ return env_vars
128
+
129
+ def get_model_config(self) -> dict:
130
+ """
131
+ Get optimized model configuration for HF Spaces
132
+
133
+ Returns:
134
+ dict: Model configuration settings
135
+ """
136
+ return {
137
+ "embedding_model": "all-MiniLM-L6-v2",
138
+ "generative_model": "Qwen/Qwen2.5-1.5B-Instruct",
139
+ "fallback_model": "distilgpt2",
140
+ "chunk_sizes": [512, 1024, 2048],
141
+ "vector_store_path": self.cache_dirs["vector_store"],
142
+ "enable_guard_rails": True,
143
+ "cache_dir": self.cache_dirs["transformers_cache"],
144
+ }
145
+
146
+ def get_guard_rail_config(self) -> dict:
147
+ """
148
+ Get guard rail configuration optimized for HF Spaces
149
+
150
+ Returns:
151
+ dict: Guard rail configuration settings
152
+ """
153
+ return {
154
+ "max_query_length": 1000,
155
+ "max_response_length": 5000,
156
+ "min_confidence_threshold": 0.3,
157
+ "rate_limit_requests": 10,
158
+ "rate_limit_window": 60,
159
+ "enable_pii_detection": True,
160
+ "enable_prompt_injection_detection": True,
161
+ }
162
+
163
+ def get_resource_limits(self) -> dict:
164
+ """
165
+ Get resource limits for HF Spaces environment
166
+
167
+ Returns:
168
+ dict: Resource limit settings
169
+ """
170
+ return {
171
+ "max_memory_usage": 0.8, # 80% of available memory
172
+ "max_cpu_usage": 0.9, # 90% of available CPU
173
+ "max_concurrent_requests": 5,
174
+ "model_timeout": 30, # seconds
175
+ "cache_cleanup_interval": 3600, # 1 hour
176
+ }
177
+
178
+ def cleanup_cache(self):
179
+ """
180
+ Clean up cache directories to free space
181
+
182
+ This is important for HF Spaces with limited storage.
183
+ """
184
+ if not self.is_hf_spaces:
185
+ return
186
+
187
+ try:
188
+ import shutil
189
+ import time
190
+
191
+ # Remove old cache files (older than 1 hour)
192
+ current_time = time.time()
193
+ for cache_path in [
194
+ self.cache_dirs["transformers_cache"],
195
+ self.cache_dirs["torch_home"],
196
+ ]:
197
+ if os.path.exists(cache_path):
198
+ for item in os.listdir(cache_path):
199
+ item_path = os.path.join(cache_path, item)
200
+ if os.path.isfile(item_path):
201
+ if current_time - os.path.getmtime(item_path) > 3600:
202
+ os.remove(item_path)
203
+ logger.info(f"Cleaned up old cache file: {item_path}")
204
+
205
+ logger.info("Cache cleanup completed")
206
+ except Exception as e:
207
+ logger.warning(f"Cache cleanup failed: {e}")
208
+
209
+
210
+ # Global configuration instance
211
+ hf_config = HFSpacesConfig()
212
+
213
+
214
+ def get_hf_config() -> HFSpacesConfig:
215
+ """
216
+ Get the global HF Spaces configuration instance
217
+
218
+ Returns:
219
+ HFSpacesConfig: Configuration instance
220
+ """
221
+ return hf_config
222
+
223
+
224
+ def is_hf_spaces() -> bool:
225
+ """
226
+ Check if running in HF Spaces environment
227
+
228
+ Returns:
229
+ bool: True if in HF Spaces
230
+ """
231
+ return hf_config.is_hf_spaces
232
+
233
+
234
+ def get_cache_dir() -> str:
235
+ """
236
+ Get the appropriate cache directory for the current environment
237
+
238
+ Returns:
239
+ str: Cache directory path
240
+ """
241
+ return hf_config.cache_dirs["transformers_cache"]
pdf_processor.py CHANGED
@@ -1,8 +1,44 @@
1
  #!/usr/bin/env python3
2
  """
3
- Simplified PDF Processor for Hugging Face Spaces
4
 
5
- This module provides PDF processing functionality for the simplified RAG system.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  """
7
 
8
  import os
@@ -15,9 +51,23 @@ import pypdf
15
  from loguru import logger
16
 
17
 
 
 
 
 
 
18
  @dataclass
19
  class DocumentChunk:
20
- """Represents a document chunk"""
 
 
 
 
 
 
 
 
 
21
 
22
  text: str
23
  doc_id: str
@@ -28,7 +78,15 @@ class DocumentChunk:
28
 
29
  @dataclass
30
  class ProcessedDocument:
31
- """Represents a processed document"""
 
 
 
 
 
 
 
 
32
 
33
  filename: str
34
  title: str
@@ -36,11 +94,31 @@ class ProcessedDocument:
36
  chunks: List[DocumentChunk]
37
 
38
 
 
 
 
 
 
39
  class SimplePDFProcessor:
40
- """Simplified PDF processor for Hugging Face Spaces"""
 
 
 
 
 
 
 
 
 
41
 
42
  def __init__(self):
43
- """Initialize the PDF processor"""
 
 
 
 
 
 
44
  self.stop_words = {
45
  "the",
46
  "a",
@@ -86,31 +164,41 @@ class SimplePDFProcessor:
86
  self, file_path: str, chunk_sizes: List[int] = None
87
  ) -> ProcessedDocument:
88
  """
89
- Process a PDF document
 
 
 
 
 
 
 
90
 
91
  Args:
92
- file_path: Path to the PDF file
93
- chunk_sizes: List of chunk sizes to use
94
 
95
  Returns:
96
- Processed document
 
 
 
97
  """
98
  if chunk_sizes is None:
99
- chunk_sizes = [100, 400]
100
 
101
  try:
102
- # Extract text from PDF
103
  text = self._extract_text(file_path)
104
 
105
- # Clean text
106
  cleaned_text = self._clean_text(text)
107
 
108
- # Extract metadata
109
  metadata = self._extract_metadata(file_path)
110
 
111
- # Create chunks
112
  chunks = []
113
- doc_id = str(uuid.uuid4())
114
 
115
  for chunk_size in chunk_sizes:
116
  chunk_list = self._create_chunks(
@@ -118,6 +206,7 @@ class SimplePDFProcessor:
118
  )
119
  chunks.extend(chunk_list)
120
 
 
121
  return ProcessedDocument(
122
  filename=metadata["filename"],
123
  title=metadata["title"],
@@ -130,12 +219,32 @@ class SimplePDFProcessor:
130
  raise
131
 
132
  def _extract_text(self, file_path: str) -> str:
133
- """Extract text from PDF file"""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
134
  try:
135
  with open(file_path, "rb") as file:
 
136
  pdf_reader = pypdf.PdfReader(file)
137
  text = ""
138
 
 
139
  for page in pdf_reader.pages:
140
  page_text = page.extract_text()
141
  if page_text:
@@ -148,25 +257,52 @@ class SimplePDFProcessor:
148
  raise
149
 
150
  def _clean_text(self, text: str) -> str:
151
- """Clean and preprocess text"""
152
- # Remove extra whitespace
 
 
 
 
 
 
 
 
 
 
 
 
 
 
153
  text = re.sub(r"\s+", " ", text)
154
 
155
- # Remove special characters but keep punctuation
 
156
  text = re.sub(r"[^\w\s\.\,\!\?\;\:\-\(\)\[\]\{\}]", "", text)
157
 
158
- # Remove page numbers and headers/footers
159
- text = re.sub(
160
- r"\b\d+\b(?=\s*\n)", "", text
161
- ) # Remove standalone numbers at line ends
162
 
163
- # Remove excessive newlines
164
  text = re.sub(r"\n\s*\n\s*\n+", "\n\n", text)
165
 
166
  return text.strip()
167
 
168
  def _extract_metadata(self, file_path: str) -> Dict[str, str]:
169
- """Extract metadata from PDF file"""
 
 
 
 
 
 
 
 
 
 
 
 
 
170
  try:
171
  with open(file_path, "rb") as file:
172
  pdf_reader = pypdf.PdfReader(file)
@@ -184,6 +320,7 @@ class SimplePDFProcessor:
184
 
185
  except Exception as e:
186
  logger.warning(f"Error extracting metadata from {file_path}: {e}")
 
187
  return {
188
  "filename": Path(file_path).name,
189
  "title": Path(file_path).stem,
@@ -193,19 +330,37 @@ class SimplePDFProcessor:
193
  def _create_chunks(
194
  self, text: str, chunk_size: int, doc_id: str, filename: str
195
  ) -> List[DocumentChunk]:
196
- """Create text chunks of specified size"""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
197
  chunks = []
198
 
199
- # Split text into sentences
200
  sentences = self._split_into_sentences(text)
201
 
202
  current_chunk = ""
203
  chunk_id = 0
204
 
205
  for sentence in sentences:
206
- # Estimate token count (rough approximation)
207
  estimated_tokens = len(sentence.split())
208
 
 
209
  if len(current_chunk.split()) + estimated_tokens <= chunk_size:
210
  current_chunk += sentence + " "
211
  else:
@@ -222,7 +377,7 @@ class SimplePDFProcessor:
222
  )
223
  chunk_id += 1
224
 
225
- # Start new chunk
226
  current_chunk = sentence + " "
227
 
228
  # Add the last chunk if not empty
@@ -240,28 +395,56 @@ class SimplePDFProcessor:
240
  return chunks
241
 
242
  def _split_into_sentences(self, text: str) -> List[str]:
243
- """Split text into sentences"""
244
- # Simple sentence splitting
 
 
 
 
 
 
 
 
 
 
 
 
 
245
  sentences = re.split(r"[.!?]+", text)
246
 
247
  # Clean and filter sentences
248
  cleaned_sentences = []
249
  for sentence in sentences:
250
  sentence = sentence.strip()
251
- if sentence and len(sentence.split()) > 3: # Minimum 3 words
 
252
  cleaned_sentences.append(sentence)
253
 
254
  return cleaned_sentences
255
 
256
  def preprocess_query(self, query: str) -> str:
257
- """Preprocess query text"""
258
- # Convert to lowercase
 
 
 
 
 
 
 
 
 
 
 
 
 
 
259
  query = query.lower()
260
 
261
- # Remove punctuation
262
  query = re.sub(r"[^\w\s]", "", query)
263
 
264
- # Remove stop words
265
  words = query.split()
266
  filtered_words = [word for word in words if word not in self.stop_words]
267
 
 
1
  #!/usr/bin/env python3
2
  """
3
+ # Simplified PDF Processor for Hugging Face Spaces
4
 
5
+ This module provides comprehensive PDF processing functionality for the RAG system.
6
+
7
+ ## Overview
8
+
9
+ The PDF processor handles the complete pipeline from raw PDF files to structured,
10
+ searchable document chunks. It includes:
11
+
12
+ - **Text Extraction**: Robust PDF text extraction with error handling
13
+ - **Text Cleaning**: Intelligent preprocessing and normalization
14
+ - **Metadata Extraction**: Document title, author, and file information
15
+ - **Smart Chunking**: Multiple chunk sizes for optimal retrieval
16
+ - **Query Preprocessing**: Text normalization for search queries
17
+
18
+ ## Key Features
19
+
20
+ - 📄 **Multi-format Support**: Handles various PDF structures and layouts
21
+ - 🧹 **Intelligent Cleaning**: Removes noise while preserving important content
22
+ - 📏 **Flexible Chunking**: Multiple chunk sizes for different use cases
23
+ - 🔍 **Search Optimization**: Preprocessing for better retrieval performance
24
+ - 🛡️ **Error Handling**: Graceful handling of corrupted or problematic files
25
+
26
+ ## Architecture
27
+
28
+ The processor follows a modular design:
29
+ 1. **Text Extraction**: Raw PDF to text conversion
30
+ 2. **Text Cleaning**: Noise removal and normalization
31
+ 3. **Metadata Extraction**: Document information extraction
32
+ 4. **Chunking**: Intelligent text segmentation
33
+ 5. **Query Processing**: Search query optimization
34
+
35
+ ## Usage Example
36
+
37
+ ```python
38
+ processor = SimplePDFProcessor()
39
+ processed_doc = processor.process_document("document.pdf", [100, 400])
40
+ print(f"Processed {len(processed_doc.chunks)} chunks")
41
+ ```
42
  """
43
 
44
  import os
 
51
  from loguru import logger
52
 
53
 
54
+ # =============================================================================
55
+ # DATA STRUCTURES
56
+ # =============================================================================
57
+
58
+
59
  @dataclass
60
  class DocumentChunk:
61
+ """
62
+ Represents a processed document chunk with metadata
63
+
64
+ Attributes:
65
+ text: The cleaned and processed text content
66
+ doc_id: Unique identifier for the source document
67
+ filename: Name of the source PDF file
68
+ chunk_id: Unique identifier for this specific chunk
69
+ chunk_size: Target size used for chunking (in tokens)
70
+ """
71
 
72
  text: str
73
  doc_id: str
 
78
 
79
  @dataclass
80
  class ProcessedDocument:
81
+ """
82
+ Represents a completely processed PDF document
83
+
84
+ Attributes:
85
+ filename: Name of the PDF file
86
+ title: Extracted or inferred document title
87
+ author: Extracted or inferred document author
88
+ chunks: List of processed document chunks
89
+ """
90
 
91
  filename: str
92
  title: str
 
94
  chunks: List[DocumentChunk]
95
 
96
 
97
+ # =============================================================================
98
+ # MAIN PDF PROCESSOR CLASS
99
+ # =============================================================================
100
+
101
+
102
  class SimplePDFProcessor:
103
+ """
104
+ Simplified PDF processor for Hugging Face Spaces
105
+
106
+ This class provides comprehensive PDF processing capabilities including:
107
+ - Text extraction and cleaning
108
+ - Metadata extraction
109
+ - Intelligent chunking
110
+ - Query preprocessing
111
+ - Error handling and logging
112
+ """
113
 
114
  def __init__(self):
115
+ """
116
+ Initialize the PDF processor with default settings
117
+
118
+ Sets up stop words and processing parameters for optimal
119
+ document processing and search performance.
120
+ """
121
+ # Common English stop words for query preprocessing
122
  self.stop_words = {
123
  "the",
124
  "a",
 
164
  self, file_path: str, chunk_sizes: List[int] = None
165
  ) -> ProcessedDocument:
166
  """
167
+ Process a PDF document through the complete pipeline
168
+
169
+ This method orchestrates the entire PDF processing workflow:
170
+ 1. Extracts text from the PDF file
171
+ 2. Cleans and normalizes the text
172
+ 3. Extracts document metadata
173
+ 4. Creates chunks of different sizes
174
+ 5. Returns a structured document object
175
 
176
  Args:
177
+ file_path: Path to the PDF file to process
178
+ chunk_sizes: List of chunk sizes to create (in tokens)
179
 
180
  Returns:
181
+ ProcessedDocument object with metadata and chunks
182
+
183
+ Raises:
184
+ Exception: If document processing fails
185
  """
186
  if chunk_sizes is None:
187
+ chunk_sizes = [100, 400] # Default chunk sizes
188
 
189
  try:
190
+ # Step 1: Extract raw text from PDF
191
  text = self._extract_text(file_path)
192
 
193
+ # Step 2: Clean and normalize the text
194
  cleaned_text = self._clean_text(text)
195
 
196
+ # Step 3: Extract document metadata
197
  metadata = self._extract_metadata(file_path)
198
 
199
+ # Step 4: Create chunks of different sizes
200
  chunks = []
201
+ doc_id = str(uuid.uuid4()) # Generate unique document ID
202
 
203
  for chunk_size in chunk_sizes:
204
  chunk_list = self._create_chunks(
 
206
  )
207
  chunks.extend(chunk_list)
208
 
209
+ # Step 5: Return processed document
210
  return ProcessedDocument(
211
  filename=metadata["filename"],
212
  title=metadata["title"],
 
219
  raise
220
 
221
  def _extract_text(self, file_path: str) -> str:
222
+ """
223
+ Extract text content from a PDF file
224
+
225
+ This method:
226
+ 1. Opens the PDF file safely
227
+ 2. Iterates through all pages
228
+ 3. Extracts text from each page
229
+ 4. Combines all text with proper spacing
230
+ 5. Handles extraction errors gracefully
231
+
232
+ Args:
233
+ file_path: Path to the PDF file
234
+
235
+ Returns:
236
+ Extracted text content as a string
237
+
238
+ Raises:
239
+ Exception: If text extraction fails
240
+ """
241
  try:
242
  with open(file_path, "rb") as file:
243
+ # Create PDF reader object
244
  pdf_reader = pypdf.PdfReader(file)
245
  text = ""
246
 
247
+ # Extract text from each page
248
  for page in pdf_reader.pages:
249
  page_text = page.extract_text()
250
  if page_text:
 
257
  raise
258
 
259
  def _clean_text(self, text: str) -> str:
260
+ """
261
+ Clean and normalize extracted text
262
+
263
+ This method performs comprehensive text cleaning:
264
+ 1. Removes excessive whitespace and newlines
265
+ 2. Normalizes special characters while preserving punctuation
266
+ 3. Removes page numbers and headers/footers
267
+ 4. Ensures consistent formatting
268
+
269
+ Args:
270
+ text: Raw extracted text from PDF
271
+
272
+ Returns:
273
+ Cleaned and normalized text
274
+ """
275
+ # Remove excessive whitespace (multiple spaces, tabs, etc.)
276
  text = re.sub(r"\s+", " ", text)
277
 
278
+ # Remove special characters but preserve important punctuation
279
+ # This keeps: letters, numbers, spaces, and common punctuation
280
  text = re.sub(r"[^\w\s\.\,\!\?\;\:\-\(\)\[\]\{\}]", "", text)
281
 
282
+ # Remove standalone page numbers at line ends
283
+ # These are often artifacts from PDF extraction
284
+ text = re.sub(r"\b\d+\b(?=\s*\n)", "", text)
 
285
 
286
+ # Normalize excessive newlines to consistent paragraph breaks
287
  text = re.sub(r"\n\s*\n\s*\n+", "\n\n", text)
288
 
289
  return text.strip()
290
 
291
  def _extract_metadata(self, file_path: str) -> Dict[str, str]:
292
+ """
293
+ Extract metadata from PDF file
294
+
295
+ This method attempts to extract:
296
+ 1. Document title from PDF metadata
297
+ 2. Author information from PDF metadata
298
+ 3. Falls back to filename if metadata is unavailable
299
+
300
+ Args:
301
+ file_path: Path to the PDF file
302
+
303
+ Returns:
304
+ Dictionary containing filename, title, and author
305
+ """
306
  try:
307
  with open(file_path, "rb") as file:
308
  pdf_reader = pypdf.PdfReader(file)
 
320
 
321
  except Exception as e:
322
  logger.warning(f"Error extracting metadata from {file_path}: {e}")
323
+ # Fallback to basic information
324
  return {
325
  "filename": Path(file_path).name,
326
  "title": Path(file_path).stem,
 
330
  def _create_chunks(
331
  self, text: str, chunk_size: int, doc_id: str, filename: str
332
  ) -> List[DocumentChunk]:
333
+ """
334
+ Create text chunks of specified size
335
+
336
+ This method implements intelligent chunking:
337
+ 1. Splits text into sentences for natural boundaries
338
+ 2. Groups sentences into chunks of target size
339
+ 3. Ensures chunks don't exceed the specified token limit
340
+ 4. Creates unique identifiers for each chunk
341
+
342
+ Args:
343
+ text: Clean text to chunk
344
+ chunk_size: Target chunk size in tokens
345
+ doc_id: Unique document identifier
346
+ filename: Source filename
347
+
348
+ Returns:
349
+ List of DocumentChunk objects
350
+ """
351
  chunks = []
352
 
353
+ # Split text into sentences for natural chunking
354
  sentences = self._split_into_sentences(text)
355
 
356
  current_chunk = ""
357
  chunk_id = 0
358
 
359
  for sentence in sentences:
360
+ # Estimate token count (rough approximation using word count)
361
  estimated_tokens = len(sentence.split())
362
 
363
+ # Add sentence to current chunk if it fits
364
  if len(current_chunk.split()) + estimated_tokens <= chunk_size:
365
  current_chunk += sentence + " "
366
  else:
 
377
  )
378
  chunk_id += 1
379
 
380
+ # Start new chunk with current sentence
381
  current_chunk = sentence + " "
382
 
383
  # Add the last chunk if not empty
 
395
  return chunks
396
 
397
  def _split_into_sentences(self, text: str) -> List[str]:
398
+ """
399
+ Split text into sentences for intelligent chunking
400
+
401
+ This method:
402
+ 1. Uses regex patterns to identify sentence boundaries
403
+ 2. Filters out very short sentences (likely noise)
404
+ 3. Ensures minimum sentence quality
405
+
406
+ Args:
407
+ text: Text to split into sentences
408
+
409
+ Returns:
410
+ List of sentence strings
411
+ """
412
+ # Split on sentence-ending punctuation
413
  sentences = re.split(r"[.!?]+", text)
414
 
415
  # Clean and filter sentences
416
  cleaned_sentences = []
417
  for sentence in sentences:
418
  sentence = sentence.strip()
419
+ # Only include sentences with meaningful content (minimum 3 words)
420
+ if sentence and len(sentence.split()) > 3:
421
  cleaned_sentences.append(sentence)
422
 
423
  return cleaned_sentences
424
 
425
  def preprocess_query(self, query: str) -> str:
426
+ """
427
+ Preprocess query text for better search performance
428
+
429
+ This method applies text normalization techniques:
430
+ 1. Converts to lowercase for case-insensitive matching
431
+ 2. Removes punctuation that might interfere with search
432
+ 3. Filters out common stop words
433
+ 4. Returns normalized query string
434
+
435
+ Args:
436
+ query: Raw query string from user
437
+
438
+ Returns:
439
+ Preprocessed query string optimized for search
440
+ """
441
+ # Convert to lowercase for consistent matching
442
  query = query.lower()
443
 
444
+ # Remove punctuation that might interfere with search
445
  query = re.sub(r"[^\w\s]", "", query)
446
 
447
+ # Remove stop words to focus on meaningful terms
448
  words = query.split()
449
  filtered_words = [word for word in words if word not in self.stop_words]
450
 
rag_system.py CHANGED
@@ -1,12 +1,47 @@
1
  #!/usr/bin/env python3
2
  """
3
- Simplified RAG System for Hugging Face Spaces
4
 
5
- This module provides a simplified RAG system using:
6
- - FAISS for vector storage
7
- - BM25 for sparse retrieval
8
- - Hybrid search combining both
9
- - Qwen 2.5 1.5B for generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
  """
11
 
12
  import os
@@ -20,16 +55,42 @@ import torch
20
  from loguru import logger
21
  import threading
22
 
23
- # Import required libraries
24
  from sentence_transformers import SentenceTransformer
25
  from rank_bm25 import BM25Okapi
26
  import faiss
27
  from transformers import AutoTokenizer, AutoModelForCausalLM
28
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29
 
30
  @dataclass
31
  class DocumentChunk:
32
- """Represents a document chunk"""
 
 
 
 
 
 
 
 
 
33
 
34
  text: str
35
  doc_id: str
@@ -40,7 +101,18 @@ class DocumentChunk:
40
 
41
  @dataclass
42
  class SearchResult:
43
- """Represents a search result"""
 
 
 
 
 
 
 
 
 
 
 
44
 
45
  text: str
46
  score: float
@@ -53,7 +125,17 @@ class SearchResult:
53
 
54
  @dataclass
55
  class RAGResponse:
56
- """Represents a RAG response"""
 
 
 
 
 
 
 
 
 
 
57
 
58
  answer: str
59
  confidence: float
@@ -63,8 +145,22 @@ class RAGResponse:
63
  query: str
64
 
65
 
 
 
 
 
 
66
  class SimpleRAGSystem:
67
- """Simplified RAG system for Hugging Face Spaces"""
 
 
 
 
 
 
 
 
 
68
 
69
  def __init__(
70
  self,
@@ -72,68 +168,121 @@ class SimpleRAGSystem:
72
  generative_model: str = "Qwen/Qwen2.5-1.5B-Instruct",
73
  chunk_sizes: List[int] = None,
74
  vector_store_path: str = "./vector_store",
 
 
75
  ):
76
  """
77
- Initialize the RAG system
78
 
79
  Args:
80
  embedding_model: Sentence transformer model for embeddings
81
- generative_model: Language model for generation
82
- chunk_sizes: List of chunk sizes to use
83
- vector_store_path: Path to store FAISS index and metadata
 
 
84
  """
85
  self.embedding_model = embedding_model
86
  self.generative_model = generative_model
87
- self.chunk_sizes = chunk_sizes or [100, 400]
88
  self.vector_store_path = vector_store_path
89
-
90
- # Initialize components
91
- self.embedder = None
92
- self.tokenizer = None
93
- self.model = None
94
- self.faiss_index = None
95
- self.bm25 = None
96
- self.documents = []
97
- self.chunks = []
 
98
  self._lock = threading.Lock() # Thread safety for concurrent loading
99
 
100
- # Create vector store directory
 
 
 
 
 
 
 
 
101
  os.makedirs(vector_store_path, exist_ok=True)
102
 
103
- # Load or initialize components
 
 
 
 
 
 
 
 
 
 
 
 
 
 
104
  self._load_models()
105
  self._load_or_create_index()
106
 
107
  logger.info("Simple RAG system initialized successfully!")
108
 
109
  def _load_models(self):
110
- """Load embedding and generative models"""
 
 
 
 
 
 
 
 
111
  try:
112
- # Load embedding model
113
- self.embedder = SentenceTransformer(self.embedding_model)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
114
  self.vector_size = self.embedder.get_sentence_embedding_dimension()
115
 
116
- # Load generative model with fallback
117
  model_loaded = False
118
 
119
- # Try Qwen model first
120
  try:
121
  self.tokenizer = AutoTokenizer.from_pretrained(
122
  self.generative_model,
123
  trust_remote_code=True,
124
- padding_side="left",
 
125
  )
126
 
127
- # Load model with explicit CPU configuration
128
  self.model = AutoModelForCausalLM.from_pretrained(
129
  self.generative_model,
130
  trust_remote_code=True,
131
- torch_dtype=torch.float32,
132
- device_map=None,
133
- low_cpu_mem_usage=False,
 
134
  )
135
 
136
- # Move to CPU explicitly
137
  self.model = self.model.to("cpu")
138
  model_loaded = True
139
 
@@ -161,7 +310,7 @@ class SimpleRAGSystem:
161
  logger.error(f"Failed to load distilgpt2: {e}")
162
  raise Exception("Could not load any generative model")
163
 
164
- # Set pad token for tokenizer
165
  if self.tokenizer.pad_token is None:
166
  self.tokenizer.pad_token = self.tokenizer.eos_token
167
  self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
@@ -175,12 +324,20 @@ class SimpleRAGSystem:
175
  raise
176
 
177
  def _load_or_create_index(self):
178
- """Load existing FAISS index or create new one"""
 
 
 
 
 
 
 
 
179
  faiss_path = os.path.join(self.vector_store_path, "faiss_index.bin")
180
  metadata_path = os.path.join(self.vector_store_path, "metadata.pkl")
181
 
182
  if os.path.exists(faiss_path) and os.path.exists(metadata_path):
183
- # Load existing index
184
  try:
185
  self.faiss_index = faiss.read_index(faiss_path)
186
  with open(metadata_path, "rb") as f:
@@ -188,7 +345,7 @@ class SimpleRAGSystem:
188
  self.documents = metadata.get("documents", [])
189
  self.chunks = metadata.get("chunks", [])
190
 
191
- # Rebuild BM25
192
  if self.chunks:
193
  texts = [chunk.text for chunk in self.chunks]
194
  tokenized_texts = [text.lower().split() for text in texts]
@@ -202,22 +359,25 @@ class SimpleRAGSystem:
202
  self._create_new_index()
203
 
204
  def _create_new_index(self):
205
- """Create new FAISS index"""
206
  vector_size = self.embedder.get_sentence_embedding_dimension()
207
- self.faiss_index = faiss.IndexFlatIP(
208
- vector_size
209
- ) # Inner product for cosine similarity
210
  self.bm25 = None
211
  logger.info(f"✅ Created new FAISS index with dimension {vector_size}")
212
 
213
  def _save_index(self):
214
- """Save FAISS index and metadata"""
 
 
 
 
215
  try:
216
  # Save FAISS index
217
  faiss_path = os.path.join(self.vector_store_path, "faiss_index.bin")
218
  faiss.write_index(self.faiss_index, faiss_path)
219
 
220
- # Save metadata
221
  metadata_path = os.path.join(self.vector_store_path, "metadata.pkl")
222
  metadata = {"documents": self.documents, "chunks": self.chunks}
223
  with open(metadata_path, "wb") as f:
@@ -229,11 +389,17 @@ class SimpleRAGSystem:
229
 
230
  def add_document(self, file_path: str, filename: str) -> bool:
231
  """
232
- Add a document to the RAG system
 
 
 
 
 
 
233
 
234
  Args:
235
  file_path: Path to the PDF file
236
- filename: Name of the file
237
 
238
  Returns:
239
  True if successful, False otherwise
@@ -241,13 +407,13 @@ class SimpleRAGSystem:
241
  try:
242
  from pdf_processor import SimplePDFProcessor
243
 
244
- # Process the document
245
  processor = SimplePDFProcessor()
246
  processed_doc = processor.process_document(file_path, self.chunk_sizes)
247
 
248
- # Thread-safe document addition
249
  with self._lock:
250
- # Add document to list
251
  self.documents.append(
252
  {
253
  "filename": filename,
@@ -257,15 +423,15 @@ class SimpleRAGSystem:
257
  }
258
  )
259
 
260
- # Add chunks
261
  for chunk in processed_doc.chunks:
262
  self.chunks.append(chunk)
263
 
264
- # Update embeddings and BM25
265
  self._update_embeddings()
266
  self._update_bm25()
267
 
268
- # Save index
269
  self._save_index()
270
 
271
  logger.info(
@@ -278,19 +444,31 @@ class SimpleRAGSystem:
278
  return False
279
 
280
  def _update_embeddings(self):
281
- """Update FAISS index with new embeddings"""
 
 
 
 
 
 
 
282
  if not self.chunks:
283
  return
284
 
285
- # Get embeddings for new chunks
286
  texts = [chunk.text for chunk in self.chunks]
287
  embeddings = self.embedder.encode(texts, show_progress_bar=False)
288
 
289
- # Add to FAISS index
290
  self.faiss_index.add(embeddings.astype("float32"))
291
 
292
  def _update_bm25(self):
293
- """Update BM25 index with new chunks"""
 
 
 
 
 
294
  if not self.chunks:
295
  return
296
 
@@ -303,28 +481,36 @@ class SimpleRAGSystem:
303
  self, query: str, method: str = "hybrid", top_k: int = 5
304
  ) -> List[SearchResult]:
305
  """
306
- Search for relevant documents
 
 
 
 
 
307
 
308
  Args:
309
- query: Search query
310
  method: Search method (hybrid, dense, sparse)
311
  top_k: Number of results to return
312
 
313
  Returns:
314
- List of search results
315
  """
316
  if not self.chunks:
317
  return []
318
 
319
  results = []
320
 
 
321
  if method == "dense" or method == "hybrid":
322
- # Dense search using FAISS
323
  query_embedding = self.embedder.encode([query])
 
324
  scores, indices = self.faiss_index.search(
325
  query_embedding.astype("float32"), min(top_k, len(self.chunks))
326
  )
327
 
 
328
  for score, idx in zip(scores[0], indices[0]):
329
  if idx < len(self.chunks):
330
  chunk = self.chunks[idx]
@@ -339,21 +525,23 @@ class SimpleRAGSystem:
339
  )
340
  )
341
 
 
342
  if method == "sparse" or method == "hybrid":
343
- # Sparse search using BM25
344
  if self.bm25:
 
345
  tokenized_query = query.lower().split()
346
  bm25_scores = self.bm25.get_scores(tokenized_query)
347
 
348
  # Get top BM25 results
349
  top_indices = np.argsort(bm25_scores)[::-1][:top_k]
350
 
 
351
  for idx in top_indices:
352
  if idx < len(self.chunks):
353
  chunk = self.chunks[idx]
354
  score = float(bm25_scores[idx])
355
 
356
- # Check if result already exists
357
  existing_result = next(
358
  (
359
  r
@@ -367,11 +555,12 @@ class SimpleRAGSystem:
367
  # Update existing result with sparse score
368
  existing_result.sparse_score = score
369
  if method == "hybrid":
370
- # Combine scores for hybrid
371
  existing_result.score = (
372
  existing_result.dense_score + score
373
  ) / 2
374
  else:
 
375
  results.append(
376
  SearchResult(
377
  text=chunk.text,
@@ -383,7 +572,7 @@ class SimpleRAGSystem:
383
  )
384
  )
385
 
386
- # Sort by score and return top_k
387
  results.sort(key=lambda x: x.score, reverse=True)
388
  return results[:top_k]
389
 
@@ -391,17 +580,23 @@ class SimpleRAGSystem:
391
  """
392
  Generate response using the language model
393
 
 
 
 
 
 
 
394
  Args:
395
- query: User query
396
- context: Retrieved context
397
 
398
  Returns:
399
- Generated response
400
  """
401
  try:
402
- # Prepare prompt
403
  if hasattr(self.tokenizer, "apply_chat_template"):
404
- # Use chat template for Qwen
405
  messages = [
406
  {
407
  "role": "system",
@@ -419,31 +614,32 @@ class SimpleRAGSystem:
419
  # Fallback for non-chat models
420
  prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
421
 
422
- # Tokenize
423
  tokenized = self.tokenizer(
424
  prompt,
425
  return_tensors="pt",
426
  truncation=True,
427
- max_length=1024,
428
  padding=True,
429
  return_attention_mask=True,
430
  )
431
 
432
- # Generate response
433
  with torch.no_grad():
434
  try:
435
  outputs = self.model.generate(
436
  tokenized.input_ids,
437
  attention_mask=tokenized.attention_mask,
438
- max_new_tokens=512,
439
  num_return_sequences=1,
440
- temperature=0.7,
441
- do_sample=True,
442
  pad_token_id=self.tokenizer.pad_token_id,
443
  eos_token_id=self.tokenizer.eos_token_id,
444
  )
445
  except RuntimeError as e:
446
  if "Half" in str(e):
 
447
  logger.warning(
448
  "Half precision not supported on CPU, converting to float32"
449
  )
@@ -462,16 +658,18 @@ class SimpleRAGSystem:
462
  else:
463
  raise e
464
 
465
- # Decode response
466
  response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
467
 
468
- # Extract only the generated part
469
  if hasattr(self.tokenizer, "apply_chat_template"):
 
470
  if "<|im_start|>assistant" in response:
471
  response = response.split("<|im_start|>assistant")[-1]
472
  if "<|im_end|>" in response:
473
  response = response.split("<|im_end|>")[0]
474
  else:
 
475
  response = response[len(prompt) :]
476
 
477
  return response.strip()
@@ -480,23 +678,66 @@ class SimpleRAGSystem:
480
  logger.error(f"Error generating response: {e}")
481
  return f"Error generating response: {str(e)}"
482
 
483
- def query(self, query: str, method: str = "hybrid", top_k: int = 5) -> RAGResponse:
 
 
 
 
 
 
484
  """
485
- Query the RAG system
 
 
 
 
 
 
 
 
486
 
487
  Args:
488
- query: User query
489
- method: Search method
490
- top_k: Number of results
 
491
 
492
  Returns:
493
- RAG response
494
  """
495
  start_time = time.time()
496
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
497
  # Search for relevant documents
498
  search_results = self.search(query, method, top_k)
499
 
 
500
  if not search_results:
501
  return RAGResponse(
502
  answer="I couldn't find any relevant information to answer your question.",
@@ -510,12 +751,42 @@ class SimpleRAGSystem:
510
  # Combine context from search results
511
  context = "\n\n".join([result.text for result in search_results])
512
 
513
- # Generate response
514
  answer = self.generate_response(query, context)
515
 
516
- # Calculate confidence (simple heuristic)
517
  confidence = np.mean([result.score for result in search_results])
518
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
519
  return RAGResponse(
520
  answer=answer,
521
  confidence=confidence,
@@ -526,7 +797,12 @@ class SimpleRAGSystem:
526
  )
527
 
528
  def get_stats(self) -> Dict:
529
- """Get system statistics"""
 
 
 
 
 
530
  return {
531
  "total_documents": len(self.documents),
532
  "total_chunks": len(self.chunks),
@@ -539,7 +815,14 @@ class SimpleRAGSystem:
539
  }
540
 
541
  def clear(self):
542
- """Clear all documents and reset the system"""
 
 
 
 
 
 
 
543
  self.documents = []
544
  self.chunks = []
545
  self._create_new_index()
 
1
  #!/usr/bin/env python3
2
  """
3
+ # Simplified RAG System for Hugging Face Spaces
4
 
5
+ This module provides a comprehensive Retrieval-Augmented Generation (RAG) system using:
6
+ - **FAISS** for efficient vector storage and similarity search
7
+ - **BM25** for sparse retrieval and keyword matching
8
+ - **Hybrid Search** combining both dense and sparse methods
9
+ - **Qwen 2.5 1.5B** for intelligent response generation
10
+ - **Thread Safety** for concurrent document loading
11
+
12
+ ## Architecture Overview
13
+
14
+ The RAG system follows a modular design with these key components:
15
+
16
+ 1. **Document Processing**: PDF extraction and intelligent chunking
17
+ 2. **Vector Storage**: FAISS index for high-dimensional embeddings
18
+ 3. **Sparse Retrieval**: BM25 for keyword-based search
19
+ 4. **Hybrid Search**: Combines dense and sparse methods for optimal results
20
+ 5. **Response Generation**: LLM-based answer synthesis with context
21
+ 6. **Thread Safety**: Concurrent document loading with proper locking
22
+
23
+ ## Key Features
24
+
25
+ - 🔍 **Multi-Method Search**: Hybrid, dense, and sparse retrieval options
26
+ - 📊 **Performance Metrics**: Confidence scores and response times
27
+ - 🔒 **Thread Safety**: Safe concurrent document loading
28
+ - 💾 **Persistence**: Automatic index saving and loading
29
+ - 🎯 **Smart Fallbacks**: Graceful model loading with alternatives
30
+ - 📈 **Scalable**: Efficient handling of large document collections
31
+
32
+ ## Usage Example
33
+
34
+ ```python
35
+ # Initialize the RAG system
36
+ rag = SimpleRAGSystem()
37
+
38
+ # Add documents
39
+ rag.add_document("document.pdf", "Document Name")
40
+
41
+ # Query the system
42
+ response = rag.query("What is the main topic?", method="hybrid", top_k=5)
43
+ print(response.answer)
44
+ ```
45
  """
46
 
47
  import os
 
55
  from loguru import logger
56
  import threading
57
 
58
+ # Import required libraries for AI/ML functionality
59
  from sentence_transformers import SentenceTransformer
60
  from rank_bm25 import BM25Okapi
61
  import faiss
62
  from transformers import AutoTokenizer, AutoModelForCausalLM
63
 
64
+ # Import guard rail system
65
+ from guard_rails import GuardRailSystem, GuardRailConfig, GuardRailResult
66
+
67
+ # Import HF Spaces configuration
68
+ try:
69
+ from hf_spaces_config import get_hf_config, is_hf_spaces
70
+
71
+ HF_SPACES_AVAILABLE = True
72
+ except ImportError:
73
+ HF_SPACES_AVAILABLE = False
74
+ logger.warning("HF Spaces configuration not available")
75
+
76
+
77
+ # =============================================================================
78
+ # DATA STRUCTURES
79
+ # =============================================================================
80
+
81
 
82
  @dataclass
83
  class DocumentChunk:
84
+ """
85
+ Represents a document chunk with metadata
86
+
87
+ Attributes:
88
+ text: The actual text content of the chunk
89
+ doc_id: Unique identifier for the source document
90
+ filename: Name of the source file
91
+ chunk_id: Unique identifier for this specific chunk
92
+ chunk_size: Target size used for chunking
93
+ """
94
 
95
  text: str
96
  doc_id: str
 
101
 
102
  @dataclass
103
  class SearchResult:
104
+ """
105
+ Represents a search result with scoring information
106
+
107
+ Attributes:
108
+ text: The retrieved text content
109
+ score: Combined relevance score
110
+ doc_id: Source document identifier
111
+ filename: Source file name
112
+ search_method: Method used for retrieval (dense/sparse/hybrid)
113
+ dense_score: Vector similarity score (if applicable)
114
+ sparse_score: Keyword matching score (if applicable)
115
+ """
116
 
117
  text: str
118
  score: float
 
125
 
126
  @dataclass
127
  class RAGResponse:
128
+ """
129
+ Represents a complete RAG system response
130
+
131
+ Attributes:
132
+ answer: Generated answer text
133
+ confidence: Confidence score for the response
134
+ search_results: List of retrieved documents
135
+ method_used: Search method that was used
136
+ response_time: Time taken to generate response
137
+ query: Original user query
138
+ """
139
 
140
  answer: str
141
  confidence: float
 
145
  query: str
146
 
147
 
148
+ # =============================================================================
149
+ # MAIN RAG SYSTEM CLASS
150
+ # =============================================================================
151
+
152
+
153
  class SimpleRAGSystem:
154
+ """
155
+ Simplified RAG system for Hugging Face Spaces
156
+
157
+ This class provides a complete RAG implementation with:
158
+ - Document ingestion and processing
159
+ - Vector and sparse search capabilities
160
+ - Response generation using language models
161
+ - Thread-safe concurrent operations
162
+ - Persistent storage and retrieval
163
+ """
164
 
165
  def __init__(
166
  self,
 
168
  generative_model: str = "Qwen/Qwen2.5-1.5B-Instruct",
169
  chunk_sizes: List[int] = None,
170
  vector_store_path: str = "./vector_store",
171
+ enable_guard_rails: bool = True,
172
+ guard_rail_config: GuardRailConfig = None,
173
  ):
174
  """
175
+ Initialize the RAG system with specified models and configuration
176
 
177
  Args:
178
  embedding_model: Sentence transformer model for embeddings
179
+ generative_model: Language model for response generation
180
+ chunk_sizes: List of chunk sizes for document processing
181
+ vector_store_path: Path for storing FAISS index and metadata
182
+ enable_guard_rails: Whether to enable guard rail system
183
+ guard_rail_config: Configuration for guard rail system
184
  """
185
  self.embedding_model = embedding_model
186
  self.generative_model = generative_model
187
+ self.chunk_sizes = chunk_sizes or [100, 400] # Default chunk sizes
188
  self.vector_store_path = vector_store_path
189
+ self.enable_guard_rails = enable_guard_rails
190
+
191
+ # Initialize core components
192
+ self.embedder = None # Sentence transformer for embeddings
193
+ self.tokenizer = None # Tokenizer for language model
194
+ self.model = None # Language model for generation
195
+ self.faiss_index = None # FAISS index for vector search
196
+ self.bm25 = None # BM25 for sparse search
197
+ self.documents = [] # List of processed documents
198
+ self.chunks = [] # List of document chunks
199
  self._lock = threading.Lock() # Thread safety for concurrent loading
200
 
201
+ # Initialize guard rail system
202
+ if self.enable_guard_rails:
203
+ self.guard_rails = GuardRailSystem(guard_rail_config)
204
+ logger.info("Guard rail system enabled")
205
+ else:
206
+ self.guard_rails = None
207
+ logger.info("Guard rail system disabled")
208
+
209
+ # Create vector store directory for persistence
210
  os.makedirs(vector_store_path, exist_ok=True)
211
 
212
+ # Set up HF Spaces configuration if available
213
+ if HF_SPACES_AVAILABLE:
214
+ try:
215
+ hf_config = get_hf_config()
216
+ if is_hf_spaces():
217
+ logger.info(
218
+ "🌐 HF Spaces environment detected - using optimized configuration"
219
+ )
220
+ # Cache directories are automatically set up by hf_config
221
+ else:
222
+ logger.info("💻 Local development environment detected")
223
+ except Exception as e:
224
+ logger.warning(f"HF Spaces configuration failed: {e}")
225
+
226
+ # Load or initialize system components
227
  self._load_models()
228
  self._load_or_create_index()
229
 
230
  logger.info("Simple RAG system initialized successfully!")
231
 
232
  def _load_models(self):
233
+ """
234
+ Load embedding and generative models with fallback handling
235
+
236
+ This method:
237
+ 1. Loads the sentence transformer for embeddings
238
+ 2. Attempts to load the primary language model (Qwen)
239
+ 3. Falls back to distilgpt2 if primary model fails
240
+ 4. Configures tokenizers and model settings
241
+ """
242
  try:
243
+ # Get cache directory from HF Spaces config if available
244
+ cache_dir = None
245
+ if HF_SPACES_AVAILABLE:
246
+ try:
247
+ hf_config = get_hf_config()
248
+ cache_dir = hf_config.cache_dirs.get("transformers_cache")
249
+ if cache_dir:
250
+ logger.info(f"Using HF Spaces cache directory: {cache_dir}")
251
+ except Exception as e:
252
+ logger.warning(f"Could not get HF Spaces cache directory: {e}")
253
+
254
+ # Load embedding model for document vectorization
255
+ if cache_dir:
256
+ self.embedder = SentenceTransformer(
257
+ self.embedding_model, cache_folder=cache_dir
258
+ )
259
+ else:
260
+ self.embedder = SentenceTransformer(self.embedding_model)
261
  self.vector_size = self.embedder.get_sentence_embedding_dimension()
262
 
263
+ # Load generative model with fallback strategy
264
  model_loaded = False
265
 
266
+ # Try loading Qwen model first (primary choice)
267
  try:
268
  self.tokenizer = AutoTokenizer.from_pretrained(
269
  self.generative_model,
270
  trust_remote_code=True,
271
+ padding_side="left", # Important for generation
272
+ cache_dir=cache_dir,
273
  )
274
 
275
+ # Load model with explicit CPU configuration for deployment compatibility
276
  self.model = AutoModelForCausalLM.from_pretrained(
277
  self.generative_model,
278
  trust_remote_code=True,
279
+ torch_dtype=torch.float32, # Use float32 for CPU compatibility
280
+ device_map=None, # Let PyTorch handle device placement
281
+ low_cpu_mem_usage=False, # Disable for better compatibility
282
+ cache_dir=cache_dir,
283
  )
284
 
285
+ # Move to CPU explicitly for deployment environments
286
  self.model = self.model.to("cpu")
287
  model_loaded = True
288
 
 
310
  logger.error(f"Failed to load distilgpt2: {e}")
311
  raise Exception("Could not load any generative model")
312
 
313
+ # Configure tokenizer settings for generation
314
  if self.tokenizer.pad_token is None:
315
  self.tokenizer.pad_token = self.tokenizer.eos_token
316
  self.tokenizer.pad_token_id = self.tokenizer.eos_token_id
 
324
  raise
325
 
326
  def _load_or_create_index(self):
327
+ """
328
+ Load existing FAISS index or create a new one
329
+
330
+ This method:
331
+ 1. Checks for existing index files
332
+ 2. Loads existing index and metadata if available
333
+ 3. Creates new index if none exists
334
+ 4. Rebuilds BM25 index from loaded chunks
335
+ """
336
  faiss_path = os.path.join(self.vector_store_path, "faiss_index.bin")
337
  metadata_path = os.path.join(self.vector_store_path, "metadata.pkl")
338
 
339
  if os.path.exists(faiss_path) and os.path.exists(metadata_path):
340
+ # Load existing index and metadata
341
  try:
342
  self.faiss_index = faiss.read_index(faiss_path)
343
  with open(metadata_path, "rb") as f:
 
345
  self.documents = metadata.get("documents", [])
346
  self.chunks = metadata.get("chunks", [])
347
 
348
+ # Rebuild BM25 index from loaded chunks
349
  if self.chunks:
350
  texts = [chunk.text for chunk in self.chunks]
351
  tokenized_texts = [text.lower().split() for text in texts]
 
359
  self._create_new_index()
360
 
361
  def _create_new_index(self):
362
+ """Create new FAISS index with appropriate configuration"""
363
  vector_size = self.embedder.get_sentence_embedding_dimension()
364
+ # Use Inner Product for cosine similarity (normalized vectors)
365
+ self.faiss_index = faiss.IndexFlatIP(vector_size)
 
366
  self.bm25 = None
367
  logger.info(f"✅ Created new FAISS index with dimension {vector_size}")
368
 
369
  def _save_index(self):
370
+ """
371
+ Save FAISS index and metadata for persistence
372
+
373
+ This ensures that the system state is preserved across restarts.
374
+ """
375
  try:
376
  # Save FAISS index
377
  faiss_path = os.path.join(self.vector_store_path, "faiss_index.bin")
378
  faiss.write_index(self.faiss_index, faiss_path)
379
 
380
+ # Save metadata including documents and chunks
381
  metadata_path = os.path.join(self.vector_store_path, "metadata.pkl")
382
  metadata = {"documents": self.documents, "chunks": self.chunks}
383
  with open(metadata_path, "wb") as f:
 
389
 
390
  def add_document(self, file_path: str, filename: str) -> bool:
391
  """
392
+ Add a document to the RAG system with thread safety
393
+
394
+ This method:
395
+ 1. Processes the PDF document into chunks
396
+ 2. Adds document metadata to the system
397
+ 3. Updates embeddings and BM25 index
398
+ 4. Saves the updated index
399
 
400
  Args:
401
  file_path: Path to the PDF file
402
+ filename: Name of the file for reference
403
 
404
  Returns:
405
  True if successful, False otherwise
 
407
  try:
408
  from pdf_processor import SimplePDFProcessor
409
 
410
+ # Process the document using the PDF processor
411
  processor = SimplePDFProcessor()
412
  processed_doc = processor.process_document(file_path, self.chunk_sizes)
413
 
414
+ # Thread-safe document addition using lock
415
  with self._lock:
416
+ # Add document metadata to the system
417
  self.documents.append(
418
  {
419
  "filename": filename,
 
423
  }
424
  )
425
 
426
+ # Add all chunks from the processed document
427
  for chunk in processed_doc.chunks:
428
  self.chunks.append(chunk)
429
 
430
+ # Update search indices with new content
431
  self._update_embeddings()
432
  self._update_bm25()
433
 
434
+ # Persist the updated index
435
  self._save_index()
436
 
437
  logger.info(
 
444
  return False
445
 
446
  def _update_embeddings(self):
447
+ """
448
+ Update FAISS index with new embeddings
449
+
450
+ This method:
451
+ 1. Extracts text from all chunks
452
+ 2. Generates embeddings using the sentence transformer
453
+ 3. Adds embeddings to the FAISS index
454
+ """
455
  if not self.chunks:
456
  return
457
 
458
+ # Generate embeddings for all chunks
459
  texts = [chunk.text for chunk in self.chunks]
460
  embeddings = self.embedder.encode(texts, show_progress_bar=False)
461
 
462
+ # Add embeddings to FAISS index
463
  self.faiss_index.add(embeddings.astype("float32"))
464
 
465
  def _update_bm25(self):
466
+ """
467
+ Update BM25 index with new chunks
468
+
469
+ This method rebuilds the BM25 index with all current chunks
470
+ for keyword-based search functionality.
471
+ """
472
  if not self.chunks:
473
  return
474
 
 
481
  self, query: str, method: str = "hybrid", top_k: int = 5
482
  ) -> List[SearchResult]:
483
  """
484
+ Search for relevant documents using specified method
485
+
486
+ This method supports three search strategies:
487
+ - **dense**: Vector similarity search using FAISS
488
+ - **sparse**: Keyword matching using BM25
489
+ - **hybrid**: Combines both methods for optimal results
490
 
491
  Args:
492
+ query: Search query string
493
  method: Search method (hybrid, dense, sparse)
494
  top_k: Number of results to return
495
 
496
  Returns:
497
+ List of search results with scores and metadata
498
  """
499
  if not self.chunks:
500
  return []
501
 
502
  results = []
503
 
504
+ # Perform dense search (vector similarity)
505
  if method == "dense" or method == "hybrid":
506
+ # Generate query embedding
507
  query_embedding = self.embedder.encode([query])
508
+ # Search FAISS index
509
  scores, indices = self.faiss_index.search(
510
  query_embedding.astype("float32"), min(top_k, len(self.chunks))
511
  )
512
 
513
+ # Process dense search results
514
  for score, idx in zip(scores[0], indices[0]):
515
  if idx < len(self.chunks):
516
  chunk = self.chunks[idx]
 
525
  )
526
  )
527
 
528
+ # Perform sparse search (keyword matching)
529
  if method == "sparse" or method == "hybrid":
 
530
  if self.bm25:
531
+ # Tokenize query for BM25
532
  tokenized_query = query.lower().split()
533
  bm25_scores = self.bm25.get_scores(tokenized_query)
534
 
535
  # Get top BM25 results
536
  top_indices = np.argsort(bm25_scores)[::-1][:top_k]
537
 
538
+ # Process sparse search results
539
  for idx in top_indices:
540
  if idx < len(self.chunks):
541
  chunk = self.chunks[idx]
542
  score = float(bm25_scores[idx])
543
 
544
+ # Check if result already exists (for hybrid search)
545
  existing_result = next(
546
  (
547
  r
 
555
  # Update existing result with sparse score
556
  existing_result.sparse_score = score
557
  if method == "hybrid":
558
+ # Combine scores for hybrid search
559
  existing_result.score = (
560
  existing_result.dense_score + score
561
  ) / 2
562
  else:
563
+ # Add new sparse result
564
  results.append(
565
  SearchResult(
566
  text=chunk.text,
 
572
  )
573
  )
574
 
575
+ # Sort by score and return top_k results
576
  results.sort(key=lambda x: x.score, reverse=True)
577
  return results[:top_k]
578
 
 
580
  """
581
  Generate response using the language model
582
 
583
+ This method:
584
+ 1. Prepares a prompt with context and query
585
+ 2. Uses the appropriate chat template for the model
586
+ 3. Generates a response with controlled parameters
587
+ 4. Handles model-specific response formatting
588
+
589
  Args:
590
+ query: User's question
591
+ context: Retrieved context from search
592
 
593
  Returns:
594
+ Generated response text
595
  """
596
  try:
597
+ # Prepare prompt based on model capabilities
598
  if hasattr(self.tokenizer, "apply_chat_template"):
599
+ # Use chat template for modern models like Qwen
600
  messages = [
601
  {
602
  "role": "system",
 
614
  # Fallback for non-chat models
615
  prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"
616
 
617
+ # Tokenize input with appropriate settings
618
  tokenized = self.tokenizer(
619
  prompt,
620
  return_tensors="pt",
621
  truncation=True,
622
+ max_length=1024, # Limit input length
623
  padding=True,
624
  return_attention_mask=True,
625
  )
626
 
627
+ # Generate response with controlled parameters
628
  with torch.no_grad():
629
  try:
630
  outputs = self.model.generate(
631
  tokenized.input_ids,
632
  attention_mask=tokenized.attention_mask,
633
+ max_new_tokens=512, # Limit response length
634
  num_return_sequences=1,
635
+ temperature=0.7, # Balance creativity and coherence
636
+ do_sample=True, # Enable sampling for more natural responses
637
  pad_token_id=self.tokenizer.pad_token_id,
638
  eos_token_id=self.tokenizer.eos_token_id,
639
  )
640
  except RuntimeError as e:
641
  if "Half" in str(e):
642
+ # Handle half-precision compatibility issues
643
  logger.warning(
644
  "Half precision not supported on CPU, converting to float32"
645
  )
 
658
  else:
659
  raise e
660
 
661
+ # Decode the generated response
662
  response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
663
 
664
+ # Extract only the generated part (remove input prompt)
665
  if hasattr(self.tokenizer, "apply_chat_template"):
666
+ # Handle chat model response formatting
667
  if "<|im_start|>assistant" in response:
668
  response = response.split("<|im_start|>assistant")[-1]
669
  if "<|im_end|>" in response:
670
  response = response.split("<|im_end|>")[0]
671
  else:
672
+ # Handle standard model response formatting
673
  response = response[len(prompt) :]
674
 
675
  return response.strip()
 
678
  logger.error(f"Error generating response: {e}")
679
  return f"Error generating response: {str(e)}"
680
 
681
+ def query(
682
+ self,
683
+ query: str,
684
+ method: str = "hybrid",
685
+ top_k: int = 5,
686
+ user_id: str = "anonymous",
687
+ ) -> RAGResponse:
688
  """
689
+ Complete RAG query pipeline with guard rail protection
690
+
691
+ This method orchestrates the entire RAG process with safety checks:
692
+ 1. Validates input using guard rails
693
+ 2. Searches for relevant documents
694
+ 3. Combines context from search results
695
+ 4. Generates a response using the language model
696
+ 5. Validates output using guard rails
697
+ 6. Calculates confidence and timing metrics
698
 
699
  Args:
700
+ query: User's question
701
+ method: Search method to use
702
+ top_k: Number of search results to use
703
+ user_id: User identifier for rate limiting and tracking
704
 
705
  Returns:
706
+ Complete RAG response with answer, metadata, and metrics
707
  """
708
  start_time = time.time()
709
 
710
+ # =============================================================================
711
+ # INPUT VALIDATION WITH GUARD RAILS
712
+ # =============================================================================
713
+
714
+ if self.enable_guard_rails and self.guard_rails:
715
+ # Validate input using guard rails
716
+ input_validation = self.guard_rails.validate_input(query, user_id)
717
+ if not input_validation.passed:
718
+ logger.warning(f"Input validation failed: {input_validation.reason}")
719
+ if input_validation.blocked:
720
+ return RAGResponse(
721
+ answer=f"I cannot process this request: {input_validation.reason}",
722
+ confidence=0.0,
723
+ search_results=[],
724
+ method_used=method,
725
+ response_time=time.time() - start_time,
726
+ query=query,
727
+ )
728
+ else:
729
+ # Warning but continue processing
730
+ logger.warning(
731
+ f"Input validation warning: {input_validation.reason}"
732
+ )
733
+
734
+ # Sanitize input
735
+ query = self.guard_rails.sanitize_input(query)
736
+
737
  # Search for relevant documents
738
  search_results = self.search(query, method, top_k)
739
 
740
+ # Handle case where no relevant documents found
741
  if not search_results:
742
  return RAGResponse(
743
  answer="I couldn't find any relevant information to answer your question.",
 
751
  # Combine context from search results
752
  context = "\n\n".join([result.text for result in search_results])
753
 
754
+ # Generate response using the language model
755
  answer = self.generate_response(query, context)
756
 
757
+ # Calculate confidence based on search result scores
758
  confidence = np.mean([result.score for result in search_results])
759
 
760
+ # =============================================================================
761
+ # OUTPUT VALIDATION WITH GUARD RAILS
762
+ # =============================================================================
763
+
764
+ if self.enable_guard_rails and self.guard_rails:
765
+ # Validate output using guard rails
766
+ output_validation = self.guard_rails.validate_output(
767
+ answer, confidence, context
768
+ )
769
+ if not output_validation.passed:
770
+ logger.warning(f"Output validation failed: {output_validation.reason}")
771
+ if output_validation.blocked:
772
+ return RAGResponse(
773
+ answer="I cannot provide this response due to safety concerns.",
774
+ confidence=0.0,
775
+ search_results=search_results,
776
+ method_used=method,
777
+ response_time=time.time() - start_time,
778
+ query=query,
779
+ )
780
+ else:
781
+ # Warning but continue with response
782
+ logger.warning(
783
+ f"Output validation warning: {output_validation.reason}"
784
+ )
785
+
786
+ # Sanitize output
787
+ answer = self.guard_rails.sanitize_output(answer)
788
+
789
+ # Create and return complete response
790
  return RAGResponse(
791
  answer=answer,
792
  confidence=confidence,
 
797
  )
798
 
799
  def get_stats(self) -> Dict:
800
+ """
801
+ Get system statistics and configuration information
802
+
803
+ Returns:
804
+ Dictionary containing system metrics and settings
805
+ """
806
  return {
807
  "total_documents": len(self.documents),
808
  "total_chunks": len(self.chunks),
 
815
  }
816
 
817
  def clear(self):
818
+ """
819
+ Clear all documents and reset the system
820
+
821
+ This method:
822
+ 1. Clears all documents and chunks
823
+ 2. Creates a new FAISS index
824
+ 3. Saves the empty state
825
+ """
826
  self.documents = []
827
  self.chunks = []
828
  self._create_new_index()
requirements.txt CHANGED
@@ -1,15 +1,100 @@
1
- # Core dependencies for Docker deployment
 
 
 
 
 
 
 
 
 
 
 
2
  streamlit==1.28.1
 
 
 
 
 
 
 
3
  torch==2.1.0
 
 
 
4
  transformers>=4.36.0
 
 
 
 
 
 
 
5
  sentence-transformers==2.2.2
 
 
 
6
  faiss-cpu==1.7.4
 
 
 
 
 
 
 
7
  scikit-learn==1.3.2
 
 
 
8
  rank-bm25==0.2.2
 
 
 
 
 
 
 
9
  pypdf==3.17.1
 
 
 
 
 
 
 
10
  pandas==2.1.3
 
 
 
11
  numpy==1.24.3
 
 
 
 
 
 
 
12
  loguru==0.7.2
 
 
 
13
  tqdm==4.66.1
 
 
 
 
 
 
 
14
  accelerate==0.24.1
 
 
 
15
  huggingface-hub==0.19.4
 
 
 
 
 
 
 
 
1
+ # =============================================================================
2
+ # RAG System Dependencies for Hugging Face Spaces Deployment
3
+ # =============================================================================
4
+ # This file contains all the Python packages required for the RAG system
5
+ # to function properly in a Docker container environment.
6
+
7
+ # =============================================================================
8
+ # CORE WEB FRAMEWORK
9
+ # =============================================================================
10
+
11
+ # Streamlit - Modern web framework for data applications
12
+ # Provides the interactive web interface for the RAG system
13
  streamlit==1.28.1
14
+
15
+ # =============================================================================
16
+ # DEEP LEARNING & AI FRAMEWORKS
17
+ # =============================================================================
18
+
19
+ # PyTorch - Deep learning framework for model inference
20
+ # Required for running the language models (Qwen, distilgpt2)
21
  torch==2.1.0
22
+
23
+ # Transformers - Hugging Face library for pre-trained models
24
+ # Provides access to language models and tokenizers
25
  transformers>=4.36.0
26
+
27
+ # =============================================================================
28
+ # EMBEDDING & VECTOR SEARCH
29
+ # =============================================================================
30
+
31
+ # Sentence Transformers - Library for sentence embeddings
32
+ # Used for converting text to vector representations
33
  sentence-transformers==2.2.2
34
+
35
+ # FAISS CPU - Facebook AI Similarity Search for vector indexing
36
+ # Provides efficient similarity search for document retrieval
37
  faiss-cpu==1.7.4
38
+
39
+ # =============================================================================
40
+ # MACHINE LEARNING & DATA PROCESSING
41
+ # =============================================================================
42
+
43
+ # Scikit-learn - Machine learning utilities
44
+ # Used for data preprocessing and BM25 implementation
45
  scikit-learn==1.3.2
46
+
47
+ # Rank BM25 - Implementation of BM25 ranking algorithm
48
+ # Provides keyword-based sparse retrieval functionality
49
  rank-bm25==0.2.2
50
+
51
+ # =============================================================================
52
+ # DOCUMENT PROCESSING
53
+ # =============================================================================
54
+
55
+ # PyPDF - Modern PDF processing library
56
+ # Used for extracting text and metadata from PDF documents
57
  pypdf==3.17.1
58
+
59
+ # =============================================================================
60
+ # DATA MANIPULATION & ANALYSIS
61
+ # =============================================================================
62
+
63
+ # Pandas - Data manipulation and analysis library
64
+ # Used for data structure management and processing
65
  pandas==2.1.3
66
+
67
+ # NumPy - Numerical computing library
68
+ # Provides mathematical operations and array handling
69
  numpy==1.24.3
70
+
71
+ # =============================================================================
72
+ # UTILITIES & LOGGING
73
+ # =============================================================================
74
+
75
+ # Loguru - Advanced logging library
76
+ # Provides structured logging with better formatting and features
77
  loguru==0.7.2
78
+
79
+ # TQDM - Progress bar library
80
+ # Shows progress for long-running operations
81
  tqdm==4.66.1
82
+
83
+ # =============================================================================
84
+ # MODEL OPTIMIZATION & DEPLOYMENT
85
+ # =============================================================================
86
+
87
+ # Accelerate - Hugging Face library for model optimization
88
+ # Helps with model loading and inference optimization
89
  accelerate==0.24.1
90
+
91
+ # Hugging Face Hub - Library for accessing Hugging Face models
92
+ # Provides utilities for downloading and managing models
93
  huggingface-hub==0.19.4
94
+
95
+ # =============================================================================
96
+ # GUARD RAIL DEPENDENCIES
97
+ # =============================================================================
98
+
99
+ # Additional libraries for enhanced security and validation
100
+ # These are optional but recommended for production deployments
test_deployment.py CHANGED
@@ -1,8 +1,40 @@
1
  #!/usr/bin/env python3
2
  """
3
- Test script for Hugging Face deployment
4
 
5
- This script tests if all components are working correctly for deployment.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  """
7
 
8
  import os
@@ -12,9 +44,24 @@ from pathlib import Path
12
 
13
 
14
  def test_imports():
15
- """Test if all required packages can be imported"""
 
 
 
 
 
 
 
 
 
 
 
 
 
 
16
  print("🔍 Testing imports...")
17
 
 
18
  try:
19
  import streamlit
20
 
@@ -23,6 +70,7 @@ def test_imports():
23
  print(f"❌ Streamlit import failed: {e}")
24
  return False
25
 
 
26
  try:
27
  import torch
28
 
@@ -31,6 +79,7 @@ def test_imports():
31
  print(f"❌ PyTorch import failed: {e}")
32
  return False
33
 
 
34
  try:
35
  import transformers
36
 
@@ -39,6 +88,7 @@ def test_imports():
39
  print(f"❌ Transformers import failed: {e}")
40
  return False
41
 
 
42
  try:
43
  import sentence_transformers
44
 
@@ -47,6 +97,7 @@ def test_imports():
47
  print(f"❌ Sentence Transformers import failed: {e}")
48
  return False
49
 
 
50
  try:
51
  import faiss
52
 
@@ -55,6 +106,7 @@ def test_imports():
55
  print(f"❌ FAISS import failed: {e}")
56
  return False
57
 
 
58
  try:
59
  import rank_bm25
60
 
@@ -63,6 +115,7 @@ def test_imports():
63
  print(f"❌ Rank BM25 import failed: {e}")
64
  return False
65
 
 
66
  try:
67
  import pypdf
68
 
@@ -75,17 +128,27 @@ def test_imports():
75
 
76
 
77
  def test_rag_system():
78
- """Test the RAG system"""
 
 
 
 
 
 
 
 
 
 
79
  print("\n🔍 Testing RAG system...")
80
 
81
  try:
82
  from rag_system import SimpleRAGSystem
83
 
84
- # Test initialization
85
  rag = SimpleRAGSystem()
86
  print("✅ RAG system initialized")
87
 
88
- # Test stats
89
  stats = rag.get_stats()
90
  print(f"✅ Stats retrieved: {stats}")
91
 
@@ -97,17 +160,27 @@ def test_rag_system():
97
 
98
 
99
  def test_pdf_processor():
100
- """Test the PDF processor"""
 
 
 
 
 
 
 
 
 
 
101
  print("\n🔍 Testing PDF processor...")
102
 
103
  try:
104
  from pdf_processor import SimplePDFProcessor
105
 
106
- # Test initialization
107
  processor = SimplePDFProcessor()
108
  print("✅ PDF processor initialized")
109
 
110
- # Test query preprocessing
111
  processed_query = processor.preprocess_query("What is the revenue?")
112
  print(f"✅ Query preprocessing: '{processed_query}'")
113
 
@@ -119,24 +192,35 @@ def test_pdf_processor():
119
 
120
 
121
  def test_model_loading():
122
- """Test if models can be loaded"""
 
 
 
 
 
 
 
 
 
 
 
123
  print("\n🔍 Testing model loading...")
124
 
125
  try:
126
  from sentence_transformers import SentenceTransformer
127
  from transformers import AutoTokenizer, AutoModelForCausalLM
128
 
129
- # Test embedding model
130
  embedder = SentenceTransformer("all-MiniLM-L6-v2")
131
  print("✅ Embedding model loaded")
132
 
133
- # Test tokenizer
134
  tokenizer = AutoTokenizer.from_pretrained(
135
  "Qwen/Qwen2.5-1.5B-Instruct", trust_remote_code=True
136
  )
137
  print("✅ Tokenizer loaded")
138
 
139
- # Test model (CPU only for testing)
140
  model = AutoModelForCausalLM.from_pretrained(
141
  "Qwen/Qwen2.5-1.5B-Instruct",
142
  trust_remote_code=True,
@@ -153,7 +237,17 @@ def test_model_loading():
153
 
154
 
155
  def test_streamlit_app():
156
- """Test if Streamlit app can be imported"""
 
 
 
 
 
 
 
 
 
 
157
  print("\n🔍 Testing Streamlit app...")
158
 
159
  try:
@@ -161,7 +255,6 @@ def test_streamlit_app():
161
  import app
162
 
163
  print("✅ Streamlit app imported successfully")
164
-
165
  return True
166
 
167
  except Exception as e:
@@ -170,15 +263,26 @@ def test_streamlit_app():
170
 
171
 
172
  def test_file_structure():
173
- """Test if all required files exist"""
 
 
 
 
 
 
 
 
 
 
174
  print("\n🔍 Testing file structure...")
175
 
 
176
  required_files = [
177
- "app.py",
178
- "rag_system.py",
179
- "pdf_processor.py",
180
- "requirements.txt",
181
- "README.md",
182
  ]
183
 
184
  missing_files = []
@@ -197,22 +301,32 @@ def test_file_structure():
197
 
198
 
199
  def test_requirements():
200
- """Test if requirements.txt is valid"""
 
 
 
 
 
 
 
 
 
 
201
  print("\n🔍 Testing requirements.txt...")
202
 
203
  try:
204
  with open("requirements.txt", "r") as f:
205
  requirements = f.read()
206
 
207
- # Check for essential packages
208
  essential_packages = [
209
- "streamlit",
210
- "torch",
211
- "transformers",
212
- "sentence-transformers",
213
- "faiss-cpu",
214
- "rank-bm25",
215
- "pypdf",
216
  ]
217
 
218
  missing_packages = []
@@ -235,9 +349,20 @@ def test_requirements():
235
 
236
 
237
  def main():
238
- """Run all tests"""
 
 
 
 
 
 
 
 
 
 
239
  print("🚀 Hugging Face Deployment Test\n")
240
 
 
241
  tests = [
242
  ("File Structure", test_file_structure),
243
  ("Requirements", test_requirements),
@@ -248,6 +373,7 @@ def main():
248
  ("Streamlit App", test_streamlit_app),
249
  ]
250
 
 
251
  results = []
252
  for test_name, test_func in tests:
253
  try:
@@ -257,7 +383,11 @@ def main():
257
  print(f"❌ {test_name} test failed with exception: {e}")
258
  results.append((test_name, False))
259
 
260
- # Summary
 
 
 
 
261
  print("\n" + "=" * 50)
262
  print("📊 Test Results Summary")
263
  print("=" * 50)
@@ -265,20 +395,26 @@ def main():
265
  passed = 0
266
  total = len(results)
267
 
 
268
  for test_name, result in results:
269
  status = "✅ PASS" if result else "❌ FAIL"
270
  print(f"{test_name:20} {status}")
271
  if result:
272
  passed += 1
273
 
 
274
  print(f"\nOverall: {passed}/{total} tests passed")
275
 
 
 
 
 
276
  if passed == total:
277
  print("🎉 All tests passed! Ready for Hugging Face deployment.")
278
  print("\nNext steps:")
279
  print("1. Create a new Hugging Face Space")
280
  print("2. Upload all files from this directory")
281
- print("3. Set the SDK to 'Streamlit'")
282
  print("4. Deploy and test your RAG system!")
283
  else:
284
  print("⚠️ Some tests failed. Please fix the issues before deployment.")
@@ -289,5 +425,9 @@ def main():
289
  print("4. Test locally first: streamlit run app.py")
290
 
291
 
 
 
 
 
292
  if __name__ == "__main__":
293
  main()
 
1
  #!/usr/bin/env python3
2
  """
3
+ # Test Script for Hugging Face Deployment
4
 
5
+ This script provides comprehensive testing for the RAG system deployment on Hugging Face Spaces.
6
+
7
+ ## Overview
8
+
9
+ The test script validates all components required for successful deployment:
10
+ - Package imports and dependencies
11
+ - Model loading capabilities
12
+ - RAG system functionality
13
+ - PDF processing components
14
+ - Streamlit application integration
15
+
16
+ ## Test Categories
17
+
18
+ 1. **Import Tests**: Verify all required packages can be imported
19
+ 2. **Model Tests**: Check if AI models can be loaded successfully
20
+ 3. **Component Tests**: Validate RAG system and PDF processor functionality
21
+ 4. **Integration Tests**: Ensure Streamlit app can be imported
22
+ 5. **File Structure Tests**: Confirm all required files are present
23
+ 6. **Requirements Tests**: Validate dependencies are properly specified
24
+
25
+ ## Usage
26
+
27
+ Run the script to check deployment readiness:
28
+ ```bash
29
+ python test_deployment.py
30
+ ```
31
+
32
+ ## Expected Output
33
+
34
+ The script provides detailed feedback on each test:
35
+ - ✅ PASS: Component is ready for deployment
36
+ - ❌ FAIL: Component needs attention before deployment
37
+ - ⚠️ WARNING: Optional component missing but not critical
38
  """
39
 
40
  import os
 
44
 
45
 
46
  def test_imports():
47
+ """
48
+ Test if all required packages can be imported successfully
49
+
50
+ This function checks that all essential dependencies are available:
51
+ - Streamlit for the web interface
52
+ - PyTorch for deep learning models
53
+ - Transformers for language models
54
+ - Sentence Transformers for embeddings
55
+ - FAISS for vector search
56
+ - Rank BM25 for sparse retrieval
57
+ - PyPDF for document processing
58
+
59
+ Returns:
60
+ bool: True if all imports succeed, False otherwise
61
+ """
62
  print("🔍 Testing imports...")
63
 
64
+ # Test Streamlit import (core web framework)
65
  try:
66
  import streamlit
67
 
 
70
  print(f"❌ Streamlit import failed: {e}")
71
  return False
72
 
73
+ # Test PyTorch import (deep learning framework)
74
  try:
75
  import torch
76
 
 
79
  print(f"❌ PyTorch import failed: {e}")
80
  return False
81
 
82
+ # Test Transformers import (Hugging Face models)
83
  try:
84
  import transformers
85
 
 
88
  print(f"❌ Transformers import failed: {e}")
89
  return False
90
 
91
+ # Test Sentence Transformers import (embeddings)
92
  try:
93
  import sentence_transformers
94
 
 
97
  print(f"❌ Sentence Transformers import failed: {e}")
98
  return False
99
 
100
+ # Test FAISS import (vector search)
101
  try:
102
  import faiss
103
 
 
106
  print(f"❌ FAISS import failed: {e}")
107
  return False
108
 
109
+ # Test Rank BM25 import (sparse retrieval)
110
  try:
111
  import rank_bm25
112
 
 
115
  print(f"❌ Rank BM25 import failed: {e}")
116
  return False
117
 
118
+ # Test PyPDF import (PDF processing)
119
  try:
120
  import pypdf
121
 
 
128
 
129
 
130
  def test_rag_system():
131
+ """
132
+ Test the RAG system initialization and basic functionality
133
+
134
+ This function validates:
135
+ - RAG system can be instantiated
136
+ - System statistics can be retrieved
137
+ - Basic system configuration is working
138
+
139
+ Returns:
140
+ bool: True if RAG system tests pass, False otherwise
141
+ """
142
  print("\n🔍 Testing RAG system...")
143
 
144
  try:
145
  from rag_system import SimpleRAGSystem
146
 
147
+ # Test RAG system initialization
148
  rag = SimpleRAGSystem()
149
  print("✅ RAG system initialized")
150
 
151
+ # Test statistics retrieval
152
  stats = rag.get_stats()
153
  print(f"✅ Stats retrieved: {stats}")
154
 
 
160
 
161
 
162
  def test_pdf_processor():
163
+ """
164
+ Test the PDF processor functionality
165
+
166
+ This function validates:
167
+ - PDF processor can be instantiated
168
+ - Query preprocessing works correctly
169
+ - Basic text processing capabilities
170
+
171
+ Returns:
172
+ bool: True if PDF processor tests pass, False otherwise
173
+ """
174
  print("\n🔍 Testing PDF processor...")
175
 
176
  try:
177
  from pdf_processor import SimplePDFProcessor
178
 
179
+ # Test PDF processor initialization
180
  processor = SimplePDFProcessor()
181
  print("✅ PDF processor initialized")
182
 
183
+ # Test query preprocessing functionality
184
  processed_query = processor.preprocess_query("What is the revenue?")
185
  print(f"✅ Query preprocessing: '{processed_query}'")
186
 
 
192
 
193
 
194
  def test_model_loading():
195
+ """
196
+ Test if AI models can be loaded successfully
197
+
198
+ This function validates:
199
+ - Sentence transformer model loading
200
+ - Language model tokenizer loading
201
+ - Language model loading with CPU configuration
202
+ - Fallback model capabilities
203
+
204
+ Returns:
205
+ bool: True if model loading tests pass, False otherwise
206
+ """
207
  print("\n🔍 Testing model loading...")
208
 
209
  try:
210
  from sentence_transformers import SentenceTransformer
211
  from transformers import AutoTokenizer, AutoModelForCausalLM
212
 
213
+ # Test embedding model loading
214
  embedder = SentenceTransformer("all-MiniLM-L6-v2")
215
  print("✅ Embedding model loaded")
216
 
217
+ # Test tokenizer loading
218
  tokenizer = AutoTokenizer.from_pretrained(
219
  "Qwen/Qwen2.5-1.5B-Instruct", trust_remote_code=True
220
  )
221
  print("✅ Tokenizer loaded")
222
 
223
+ # Test model loading with CPU configuration
224
  model = AutoModelForCausalLM.from_pretrained(
225
  "Qwen/Qwen2.5-1.5B-Instruct",
226
  trust_remote_code=True,
 
237
 
238
 
239
  def test_streamlit_app():
240
+ """
241
+ Test if Streamlit app can be imported and initialized
242
+
243
+ This function validates:
244
+ - Main app.py can be imported
245
+ - No critical import errors in the application
246
+ - Basic app structure is correct
247
+
248
+ Returns:
249
+ bool: True if Streamlit app tests pass, False otherwise
250
+ """
251
  print("\n🔍 Testing Streamlit app...")
252
 
253
  try:
 
255
  import app
256
 
257
  print("✅ Streamlit app imported successfully")
 
258
  return True
259
 
260
  except Exception as e:
 
263
 
264
 
265
  def test_file_structure():
266
+ """
267
+ Test if all required files exist in the project
268
+
269
+ This function checks for essential files:
270
+ - Main application files
271
+ - Configuration files
272
+ - Documentation files
273
+
274
+ Returns:
275
+ bool: True if all required files exist, False otherwise
276
+ """
277
  print("\n🔍 Testing file structure...")
278
 
279
+ # List of required files for deployment
280
  required_files = [
281
+ "app.py", # Main Streamlit application
282
+ "rag_system.py", # Core RAG system
283
+ "pdf_processor.py", # PDF processing utilities
284
+ "requirements.txt", # Python dependencies
285
+ "README.md", # Project documentation
286
  ]
287
 
288
  missing_files = []
 
301
 
302
 
303
  def test_requirements():
304
+ """
305
+ Test if requirements.txt contains all essential packages
306
+
307
+ This function validates:
308
+ - Essential packages are listed
309
+ - Package versions are specified
310
+ - No obvious missing dependencies
311
+
312
+ Returns:
313
+ bool: True if requirements are valid, False otherwise
314
+ """
315
  print("\n🔍 Testing requirements.txt...")
316
 
317
  try:
318
  with open("requirements.txt", "r") as f:
319
  requirements = f.read()
320
 
321
+ # List of essential packages that must be present
322
  essential_packages = [
323
+ "streamlit", # Web framework
324
+ "torch", # Deep learning
325
+ "transformers", # Language models
326
+ "sentence-transformers", # Embeddings
327
+ "faiss-cpu", # Vector search
328
+ "rank-bm25", # Sparse retrieval
329
+ "pypdf", # PDF processing
330
  ]
331
 
332
  missing_packages = []
 
349
 
350
 
351
  def main():
352
+ """
353
+ Run all deployment tests and provide comprehensive feedback
354
+
355
+ This function:
356
+ 1. Executes all test categories
357
+ 2. Tracks test results
358
+ 3. Provides summary statistics
359
+ 4. Gives deployment recommendations
360
+
361
+ The tests are designed to catch common deployment issues early.
362
+ """
363
  print("🚀 Hugging Face Deployment Test\n")
364
 
365
+ # Define all test functions with descriptive names
366
  tests = [
367
  ("File Structure", test_file_structure),
368
  ("Requirements", test_requirements),
 
373
  ("Streamlit App", test_streamlit_app),
374
  ]
375
 
376
+ # Execute all tests and collect results
377
  results = []
378
  for test_name, test_func in tests:
379
  try:
 
383
  print(f"❌ {test_name} test failed with exception: {e}")
384
  results.append((test_name, False))
385
 
386
+ # =============================================================================
387
+ # RESULTS SUMMARY
388
+ # =============================================================================
389
+
390
+ # Display comprehensive test results
391
  print("\n" + "=" * 50)
392
  print("📊 Test Results Summary")
393
  print("=" * 50)
 
395
  passed = 0
396
  total = len(results)
397
 
398
+ # Show individual test results
399
  for test_name, result in results:
400
  status = "✅ PASS" if result else "❌ FAIL"
401
  print(f"{test_name:20} {status}")
402
  if result:
403
  passed += 1
404
 
405
+ # Display overall statistics
406
  print(f"\nOverall: {passed}/{total} tests passed")
407
 
408
+ # =============================================================================
409
+ # DEPLOYMENT RECOMMENDATIONS
410
+ # =============================================================================
411
+
412
  if passed == total:
413
  print("🎉 All tests passed! Ready for Hugging Face deployment.")
414
  print("\nNext steps:")
415
  print("1. Create a new Hugging Face Space")
416
  print("2. Upload all files from this directory")
417
+ print("3. Set the SDK to 'Docker'")
418
  print("4. Deploy and test your RAG system!")
419
  else:
420
  print("⚠️ Some tests failed. Please fix the issues before deployment.")
 
425
  print("4. Test locally first: streamlit run app.py")
426
 
427
 
428
+ # =============================================================================
429
+ # SCRIPT ENTRY POINT
430
+ # =============================================================================
431
+
432
  if __name__ == "__main__":
433
  main()
test_docker.py CHANGED
@@ -1,8 +1,47 @@
1
  #!/usr/bin/env python3
2
  """
3
- Test script for Docker deployment
4
 
5
- This script tests if all components are working correctly for Docker deployment.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  """
7
 
8
  import os
@@ -12,7 +51,18 @@ from pathlib import Path
12
 
13
 
14
  def test_dockerfile():
15
- """Test if Dockerfile exists and is valid"""
 
 
 
 
 
 
 
 
 
 
 
16
  print("🔍 Testing Dockerfile...")
17
 
18
  dockerfile_path = Path("Dockerfile")
@@ -24,15 +74,15 @@ def test_dockerfile():
24
  with open(dockerfile_path, "r") as f:
25
  content = f.read()
26
 
27
- # Check for essential Dockerfile components
28
  required_components = [
29
- "FROM python:",
30
- "WORKDIR /app",
31
- "COPY requirements.txt",
32
- "RUN pip install",
33
- "COPY .",
34
- "EXPOSE 8501",
35
- 'CMD ["streamlit"',
36
  ]
37
 
38
  missing_components = []
@@ -55,7 +105,15 @@ def test_dockerfile():
55
 
56
 
57
  def test_dockerignore():
58
- """Test if .dockerignore exists"""
 
 
 
 
 
 
 
 
59
  print("\n🔍 Testing .dockerignore...")
60
 
61
  dockerignore_path = Path(".dockerignore")
@@ -68,7 +126,18 @@ def test_dockerignore():
68
 
69
 
70
  def test_docker_compose():
71
- """Test if docker-compose.yml exists"""
 
 
 
 
 
 
 
 
 
 
 
72
  print("\n🔍 Testing docker-compose.yml...")
73
 
74
  compose_path = Path("docker-compose.yml")
@@ -81,16 +150,27 @@ def test_docker_compose():
81
 
82
 
83
  def test_docker_build():
84
- """Test Docker build locally"""
 
 
 
 
 
 
 
 
 
 
 
85
  print("\n🔍 Testing Docker build...")
86
 
87
  try:
88
- # Test Docker build
89
  result = subprocess.run(
90
  ["docker", "build", "-t", "rag-system-test", "."],
91
  capture_output=True,
92
  text=True,
93
- timeout=300, # 5 minutes timeout
94
  )
95
 
96
  if result.returncode == 0:
@@ -112,11 +192,22 @@ def test_docker_build():
112
 
113
 
114
  def test_docker_run():
115
- """Test Docker run locally"""
 
 
 
 
 
 
 
 
 
 
 
116
  print("\n🔍 Testing Docker run...")
117
 
118
  try:
119
- # Test Docker run (brief test)
120
  result = subprocess.run(
121
  [
122
  "docker",
@@ -131,13 +222,13 @@ def test_docker_run():
131
  ],
132
  capture_output=True,
133
  text=True,
134
- timeout=30,
135
  )
136
 
137
  if result.returncode == 0:
138
  print("✅ Docker run successful")
139
 
140
- # Clean up
141
  subprocess.run(["docker", "stop", "rag-test"], capture_output=True)
142
  return True
143
  else:
@@ -156,22 +247,40 @@ def test_docker_run():
156
 
157
 
158
  def test_file_structure():
159
- """Test if all required files exist"""
 
 
 
 
 
 
 
 
 
 
 
160
  print("\n🔍 Testing file structure...")
161
 
 
162
  required_files = [
163
- "app.py",
164
- "rag_system.py",
165
- "pdf_processor.py",
166
- "requirements.txt",
167
- "Dockerfile",
168
  ]
169
 
170
- optional_files = [".dockerignore", "docker-compose.yml", "README.md"]
 
 
 
 
 
171
 
172
  missing_required = []
173
  missing_optional = []
174
 
 
175
  for file in required_files:
176
  if os.path.exists(file):
177
  print(f"✅ {file}")
@@ -179,6 +288,7 @@ def test_file_structure():
179
  print(f"❌ {file} (missing)")
180
  missing_required.append(file)
181
 
 
182
  for file in optional_files:
183
  if os.path.exists(file):
184
  print(f"✅ {file}")
@@ -194,22 +304,33 @@ def test_file_structure():
194
 
195
 
196
  def test_requirements():
197
- """Test if requirements.txt is valid"""
 
 
 
 
 
 
 
 
 
 
 
198
  print("\n🔍 Testing requirements.txt...")
199
 
200
  try:
201
  with open("requirements.txt", "r") as f:
202
  requirements = f.read()
203
 
204
- # Check for essential packages
205
  essential_packages = [
206
- "streamlit",
207
- "torch",
208
- "transformers",
209
- "sentence-transformers",
210
- "faiss-cpu",
211
- "rank-bm25",
212
- "pypdf",
213
  ]
214
 
215
  missing_packages = []
@@ -232,9 +353,20 @@ def test_requirements():
232
 
233
 
234
  def main():
235
- """Run all tests"""
 
 
 
 
 
 
 
 
 
 
236
  print("🐳 Docker Deployment Test\n")
237
 
 
238
  tests = [
239
  ("File Structure", test_file_structure),
240
  ("Requirements", test_requirements),
@@ -245,6 +377,7 @@ def main():
245
  ("Docker Run", test_docker_run),
246
  ]
247
 
 
248
  results = []
249
  for test_name, test_func in tests:
250
  try:
@@ -254,7 +387,11 @@ def main():
254
  print(f"❌ {test_name} test failed with exception: {e}")
255
  results.append((test_name, False))
256
 
257
- # Summary
 
 
 
 
258
  print("\n" + "=" * 50)
259
  print("📊 Test Results Summary")
260
  print("=" * 50)
@@ -262,14 +399,20 @@ def main():
262
  passed = 0
263
  total = len(results)
264
 
 
265
  for test_name, result in results:
266
  status = "✅ PASS" if result else "❌ FAIL"
267
  print(f"{test_name:20} {status}")
268
  if result:
269
  passed += 1
270
 
 
271
  print(f"\nOverall: {passed}/{total} tests passed")
272
 
 
 
 
 
273
  if passed == total:
274
  print("🎉 All tests passed! Ready for Hugging Face Docker deployment.")
275
  print("\nNext steps:")
@@ -286,5 +429,9 @@ def main():
286
  print("4. Test Docker build locally: docker build -t rag-system .")
287
 
288
 
 
 
 
 
289
  if __name__ == "__main__":
290
  main()
 
1
  #!/usr/bin/env python3
2
  """
3
+ # Test Script for Docker Deployment
4
 
5
+ This script provides comprehensive testing for the RAG system Docker deployment.
6
+
7
+ ## Overview
8
+
9
+ The test script validates all components required for successful Docker deployment:
10
+ - Dockerfile syntax and structure
11
+ - Docker Compose configuration
12
+ - Docker build process
13
+ - Container runtime functionality
14
+ - File structure and dependencies
15
+
16
+ ## Test Categories
17
+
18
+ 1. **Dockerfile Tests**: Validate Dockerfile syntax and required components
19
+ 2. **Docker Compose Tests**: Check docker-compose.yml configuration
20
+ 3. **Build Tests**: Test Docker image building process
21
+ 4. **Runtime Tests**: Validate container startup and health checks
22
+ 5. **File Structure Tests**: Confirm all required files are present
23
+ 6. **Requirements Tests**: Validate dependencies are properly specified
24
+
25
+ ## Usage
26
+
27
+ Run the script to check Docker deployment readiness:
28
+ ```bash
29
+ python test_docker.py
30
+ ```
31
+
32
+ ## Prerequisites
33
+
34
+ - Docker installed and running
35
+ - Docker Compose available
36
+ - Sufficient disk space for image building
37
+ - Network connectivity for base image downloads
38
+
39
+ ## Expected Output
40
+
41
+ The script provides detailed feedback on each test:
42
+ - ✅ PASS: Component is ready for Docker deployment
43
+ - ❌ FAIL: Component needs attention before deployment
44
+ - ⚠️ WARNING: Optional component missing but not critical
45
  """
46
 
47
  import os
 
51
 
52
 
53
  def test_dockerfile():
54
+ """
55
+ Test if Dockerfile exists and contains all required components
56
+
57
+ This function validates:
58
+ - Dockerfile exists in the project root
59
+ - Contains essential Docker instructions
60
+ - Proper syntax and structure
61
+ - Required components for RAG system deployment
62
+
63
+ Returns:
64
+ bool: True if Dockerfile is valid, False otherwise
65
+ """
66
  print("🔍 Testing Dockerfile...")
67
 
68
  dockerfile_path = Path("Dockerfile")
 
74
  with open(dockerfile_path, "r") as f:
75
  content = f.read()
76
 
77
+ # List of essential Dockerfile components that must be present
78
  required_components = [
79
+ "FROM python:", # Base image specification
80
+ "WORKDIR /app", # Working directory setup
81
+ "COPY requirements.txt", # Requirements file copying
82
+ "RUN pip install", # Python package installation
83
+ "COPY .", # Application files copying
84
+ "EXPOSE 8501", # Port exposure for Streamlit
85
+ 'CMD ["streamlit"', # Application startup command
86
  ]
87
 
88
  missing_components = []
 
105
 
106
 
107
  def test_dockerignore():
108
+ """
109
+ Test if .dockerignore exists (optional but recommended)
110
+
111
+ This function checks for the presence of .dockerignore file,
112
+ which helps optimize Docker builds by excluding unnecessary files.
113
+
114
+ Returns:
115
+ bool: True if .dockerignore exists or is optional, False if critical
116
+ """
117
  print("\n🔍 Testing .dockerignore...")
118
 
119
  dockerignore_path = Path(".dockerignore")
 
126
 
127
 
128
  def test_docker_compose():
129
+ """
130
+ Test if docker-compose.yml exists and is properly configured
131
+
132
+ This function validates:
133
+ - docker-compose.yml file exists
134
+ - Contains proper service definitions
135
+ - Port mappings are correct
136
+ - Volume mounts are configured
137
+
138
+ Returns:
139
+ bool: True if docker-compose.yml is valid, False otherwise
140
+ """
141
  print("\n🔍 Testing docker-compose.yml...")
142
 
143
  compose_path = Path("docker-compose.yml")
 
150
 
151
 
152
  def test_docker_build():
153
+ """
154
+ Test Docker build process locally
155
+
156
+ This function:
157
+ - Attempts to build the Docker image
158
+ - Validates build process completes successfully
159
+ - Checks for build errors and warnings
160
+ - Ensures all dependencies are properly resolved
161
+
162
+ Returns:
163
+ bool: True if Docker build succeeds, False otherwise
164
+ """
165
  print("\n🔍 Testing Docker build...")
166
 
167
  try:
168
+ # Test Docker build with timeout to prevent hanging
169
  result = subprocess.run(
170
  ["docker", "build", "-t", "rag-system-test", "."],
171
  capture_output=True,
172
  text=True,
173
+ timeout=300, # 5 minutes timeout for build
174
  )
175
 
176
  if result.returncode == 0:
 
192
 
193
 
194
  def test_docker_run():
195
+ """
196
+ Test Docker container runtime functionality
197
+
198
+ This function:
199
+ - Attempts to run the built Docker container
200
+ - Validates container startup process
201
+ - Checks if the application is accessible
202
+ - Tests basic container functionality
203
+
204
+ Returns:
205
+ bool: True if Docker run succeeds, False otherwise
206
+ """
207
  print("\n🔍 Testing Docker run...")
208
 
209
  try:
210
+ # Test Docker run with brief execution
211
  result = subprocess.run(
212
  [
213
  "docker",
 
222
  ],
223
  capture_output=True,
224
  text=True,
225
+ timeout=30, # 30 seconds timeout for startup
226
  )
227
 
228
  if result.returncode == 0:
229
  print("✅ Docker run successful")
230
 
231
+ # Clean up the test container
232
  subprocess.run(["docker", "stop", "rag-test"], capture_output=True)
233
  return True
234
  else:
 
247
 
248
 
249
  def test_file_structure():
250
+ """
251
+ Test if all required files exist for Docker deployment
252
+
253
+ This function checks for essential files:
254
+ - Main application files
255
+ - Configuration files
256
+ - Docker-related files
257
+ - Documentation files
258
+
259
+ Returns:
260
+ bool: True if all required files exist, False otherwise
261
+ """
262
  print("\n🔍 Testing file structure...")
263
 
264
+ # List of required files for Docker deployment
265
  required_files = [
266
+ "app.py", # Main Streamlit application
267
+ "rag_system.py", # Core RAG system
268
+ "pdf_processor.py", # PDF processing utilities
269
+ "requirements.txt", # Python dependencies
270
+ "Dockerfile", # Docker configuration
271
  ]
272
 
273
+ # List of optional files (nice to have but not critical)
274
+ optional_files = [
275
+ ".dockerignore", # Docker build optimization
276
+ "docker-compose.yml", # Multi-container setup
277
+ "README.md", # Project documentation
278
+ ]
279
 
280
  missing_required = []
281
  missing_optional = []
282
 
283
+ # Check required files
284
  for file in required_files:
285
  if os.path.exists(file):
286
  print(f"✅ {file}")
 
288
  print(f"❌ {file} (missing)")
289
  missing_required.append(file)
290
 
291
+ # Check optional files
292
  for file in optional_files:
293
  if os.path.exists(file):
294
  print(f"✅ {file}")
 
304
 
305
 
306
  def test_requirements():
307
+ """
308
+ Test if requirements.txt contains all essential packages
309
+
310
+ This function validates:
311
+ - Essential packages are listed
312
+ - Package versions are specified
313
+ - No obvious missing dependencies
314
+ - Compatibility with Docker environment
315
+
316
+ Returns:
317
+ bool: True if requirements are valid, False otherwise
318
+ """
319
  print("\n🔍 Testing requirements.txt...")
320
 
321
  try:
322
  with open("requirements.txt", "r") as f:
323
  requirements = f.read()
324
 
325
+ # List of essential packages that must be present
326
  essential_packages = [
327
+ "streamlit", # Web framework
328
+ "torch", # Deep learning
329
+ "transformers", # Language models
330
+ "sentence-transformers", # Embeddings
331
+ "faiss-cpu", # Vector search
332
+ "rank-bm25", # Sparse retrieval
333
+ "pypdf", # PDF processing
334
  ]
335
 
336
  missing_packages = []
 
353
 
354
 
355
  def main():
356
+ """
357
+ Run all Docker deployment tests and provide comprehensive feedback
358
+
359
+ This function:
360
+ 1. Executes all Docker-related test categories
361
+ 2. Tracks test results and provides detailed feedback
362
+ 3. Gives deployment recommendations
363
+ 4. Identifies potential issues before deployment
364
+
365
+ The tests are designed to catch common Docker deployment issues early.
366
+ """
367
  print("🐳 Docker Deployment Test\n")
368
 
369
+ # Define all test functions with descriptive names
370
  tests = [
371
  ("File Structure", test_file_structure),
372
  ("Requirements", test_requirements),
 
377
  ("Docker Run", test_docker_run),
378
  ]
379
 
380
+ # Execute all tests and collect results
381
  results = []
382
  for test_name, test_func in tests:
383
  try:
 
387
  print(f"❌ {test_name} test failed with exception: {e}")
388
  results.append((test_name, False))
389
 
390
+ # =============================================================================
391
+ # RESULTS SUMMARY
392
+ # =============================================================================
393
+
394
+ # Display comprehensive test results
395
  print("\n" + "=" * 50)
396
  print("📊 Test Results Summary")
397
  print("=" * 50)
 
399
  passed = 0
400
  total = len(results)
401
 
402
+ # Show individual test results
403
  for test_name, result in results:
404
  status = "✅ PASS" if result else "❌ FAIL"
405
  print(f"{test_name:20} {status}")
406
  if result:
407
  passed += 1
408
 
409
+ # Display overall statistics
410
  print(f"\nOverall: {passed}/{total} tests passed")
411
 
412
+ # =============================================================================
413
+ # DEPLOYMENT RECOMMENDATIONS
414
+ # =============================================================================
415
+
416
  if passed == total:
417
  print("🎉 All tests passed! Ready for Hugging Face Docker deployment.")
418
  print("\nNext steps:")
 
429
  print("4. Test Docker build locally: docker build -t rag-system .")
430
 
431
 
432
+ # =============================================================================
433
+ # SCRIPT ENTRY POINT
434
+ # =============================================================================
435
+
436
  if __name__ == "__main__":
437
  main()
test_hf_spaces.py ADDED
@@ -0,0 +1,161 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for HF Spaces configuration
4
+ =======================================
5
+
6
+ This script tests the HF Spaces configuration module to ensure it's working correctly.
7
+ Run this script to verify that the configuration is properly set up.
8
+ """
9
+
10
+ import os
11
+ import sys
12
+ from pathlib import Path
13
+
14
+
15
+ def test_hf_spaces_config():
16
+ """Test the HF Spaces configuration"""
17
+ print("🧪 Testing HF Spaces Configuration")
18
+ print("=" * 50)
19
+
20
+ try:
21
+ # Import the configuration
22
+ from hf_spaces_config import get_hf_config, is_hf_spaces
23
+
24
+ print("✅ Successfully imported HF Spaces configuration")
25
+
26
+ # Test environment detection
27
+ print(f"\n🌐 Environment Detection:")
28
+ print(f" Is HF Spaces: {is_hf_spaces()}")
29
+
30
+ # Get configuration
31
+ config = get_hf_config()
32
+ print(f" Configuration loaded: {type(config).__name__}")
33
+
34
+ # Test cache directories
35
+ print(f"\n📁 Cache Directories:")
36
+ for name, path in config.cache_dirs.items():
37
+ exists = os.path.exists(path)
38
+ print(f" {name}: {path} {'✅' if exists else '❌'}")
39
+
40
+ # Test environment variables
41
+ print(f"\n🔧 Environment Variables:")
42
+ env_vars = config.env_vars
43
+ for key, value in env_vars.items():
44
+ print(f" {key}: {value}")
45
+
46
+ # Test model configuration
47
+ print(f"\n🤖 Model Configuration:")
48
+ model_config = config.get_model_config()
49
+ for key, value in model_config.items():
50
+ print(f" {key}: {value}")
51
+
52
+ # Test guard rail configuration
53
+ print(f"\n🛡️ Guard Rail Configuration:")
54
+ guard_config = config.get_guard_rail_config()
55
+ for key, value in guard_config.items():
56
+ print(f" {key}: {value}")
57
+
58
+ # Test resource limits
59
+ print(f"\n📊 Resource Limits:")
60
+ resource_limits = config.get_resource_limits()
61
+ for key, value in resource_limits.items():
62
+ print(f" {key}: {value}")
63
+
64
+ print(f"\n✅ All tests passed!")
65
+ return True
66
+
67
+ except ImportError as e:
68
+ print(f"❌ Import error: {e}")
69
+ return False
70
+ except Exception as e:
71
+ print(f"❌ Configuration error: {e}")
72
+ return False
73
+
74
+
75
+ def test_cache_directories():
76
+ """Test cache directory creation"""
77
+ print(f"\n🔧 Testing Cache Directory Creation")
78
+ print("=" * 50)
79
+
80
+ try:
81
+ from hf_spaces_config import get_hf_config
82
+
83
+ config = get_hf_config()
84
+
85
+ # Test directory creation
86
+ for name, path in config.cache_dirs.items():
87
+ try:
88
+ Path(path).mkdir(parents=True, exist_ok=True)
89
+ print(f"✅ Created: {name} -> {path}")
90
+ except Exception as e:
91
+ print(f"❌ Failed to create {name}: {e}")
92
+
93
+ return True
94
+
95
+ except Exception as e:
96
+ print(f"❌ Cache directory test failed: {e}")
97
+ return False
98
+
99
+
100
+ def test_environment_variables():
101
+ """Test environment variable setup"""
102
+ print(f"\n🔧 Testing Environment Variables")
103
+ print("=" * 50)
104
+
105
+ try:
106
+ from hf_spaces_config import get_hf_config
107
+
108
+ config = get_hf_config()
109
+
110
+ # Check if environment variables are set
111
+ for key, expected_value in config.env_vars.items():
112
+ actual_value = os.environ.get(key, "NOT_SET")
113
+ status = "✅" if actual_value == expected_value else "❌"
114
+ print(f" {key}: {actual_value} {status}")
115
+
116
+ return True
117
+
118
+ except Exception as e:
119
+ print(f"❌ Environment variable test failed: {e}")
120
+ return False
121
+
122
+
123
+ def main():
124
+ """Run all tests"""
125
+ print("🚀 HF Spaces Configuration Test Suite")
126
+ print("=" * 60)
127
+
128
+ tests = [
129
+ ("Configuration Import", test_hf_spaces_config),
130
+ ("Cache Directories", test_cache_directories),
131
+ ("Environment Variables", test_environment_variables),
132
+ ]
133
+
134
+ results = []
135
+ for test_name, test_func in tests:
136
+ print(f"\n🧪 Running: {test_name}")
137
+ result = test_func()
138
+ results.append((test_name, result))
139
+
140
+ # Summary
141
+ print(f"\n📊 Test Summary")
142
+ print("=" * 30)
143
+ passed = sum(1 for _, result in results if result)
144
+ total = len(results)
145
+
146
+ for test_name, result in results:
147
+ status = "✅ PASS" if result else "❌ FAIL"
148
+ print(f" {test_name}: {status}")
149
+
150
+ print(f"\nOverall: {passed}/{total} tests passed")
151
+
152
+ if passed == total:
153
+ print("🎉 All tests passed! HF Spaces configuration is working correctly.")
154
+ return 0
155
+ else:
156
+ print("⚠️ Some tests failed. Please check the configuration.")
157
+ return 1
158
+
159
+
160
+ if __name__ == "__main__":
161
+ sys.exit(main())