destinyebuka committed on
Commit bc0cd92 · 1 Parent(s): ee3d3d4
IMPLEMENTATION_SUMMARY.md DELETED
@@ -1,319 +0,0 @@
# 🎯 Vision AI Listing Feature - Implementation Summary

## What Was Built

A **smart AI-powered property listing feature** that intelligently handles THREE different listing methods and produces a unified result.

---

## Key Features Implemented

### 1. ✅ Smart Listing Method Detection

The system knows HOW the user is listing and behaves accordingly:

**TEXT Method** (User provides details via chat)
- User says: "3-bed, 2-bath in Lagos, 500k/month, has WiFi, AC"
- Uploads photos for VALIDATION (not re-extraction)
- Backend: Validates images are property-related, uploads to Cloudflare
- Result: Text data + validated photos

**IMAGE Method** (User uploads photos only)
- User just uploads photos (no text details)
- Backend: EXTRACTS all details from images (bedrooms, bathrooms, amenities)
- Generates: SHORT title (max 2 sentences) + full description
- Result: Complete listing data extracted from photos

**VIDEO Method** (User uploads video + photos)
- User uploads a video walkthrough
- Backend: Uploads it to Cloudinary, suggests adding photos
- User uploads photos for analysis
- Backend: Extracts details from photos (same as IMAGE method)
- Result: Full data from photos + video URL

---
### 2. ✅ Intelligent Title & Description Generation

**Title Requirements:**
- ✅ SHORT - Maximum 2 sentences
- ✅ Example: "Modern 3-bed apartment. Great location!"
- ❌ NOT: Long descriptions with many details

**Description:**
- Full 2-3 sentence description of the property
- Professional tone
- Highlights key features

**Both generated by Vision AI** for image/video methods.

---
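The two-sentence cap on titles can be enforced mechanically after generation. A minimal sketch; the helper name and the naive sentence splitter are assumptions for illustration, not the actual `_generate_title()` logic:

```python
import re

def enforce_short_title(text, max_sentences=2):
    """Trim an AI-generated title to at most two sentences.

    Uses a naive regex sentence splitter (split after ., !, or ?);
    illustrative only, not the production implementation.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    return " ".join(sentences[:max_sentences])

print(enforce_short_title("Modern 3-bed apartment. Great location! Near the beach."))
# Modern 3-bed apartment. Great location!
```

A guard like this keeps a verbose model output within the "max 2 sentences" requirement even when the prompt alone is not obeyed.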
### 3. ✅ Smart File Naming Strategy

**Pattern:** `{location}_{title}_{timestamp}_{index}.jpg`

**Example filenames:**
- `Lagos_Modern_Apartment_2025_01_31_0.jpg`
- `Victoria_Island_3_Bed_Luxury_2025_01_31_1.jpg`
- `Cotonou_Cozy_Studio_2025_01_31_0.jpg`

**Benefits:**
- Easy to identify the property in storage
- Shows when it was listed (timestamp)
- Automatically indexed for multiple photos
- Cloudflare worker detects duplicates and appends numbers

---
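The naming pattern above can be sketched as a small helper. The function and slug logic here are assumptions for illustration, not the backend's actual code:

```python
import re
from datetime import datetime

def build_image_filename(location, title, index, now=None):
    """Sketch of the {location}_{title}_{timestamp}_{index}.jpg pattern."""
    now = now or datetime.utcnow()

    def slug(text):
        # Replace non-alphanumeric runs with a single underscore
        return re.sub(r"_+", "_", re.sub(r"[^A-Za-z0-9]+", "_", text)).strip("_")

    timestamp = now.strftime("%Y_%m_%d")
    # Title is capped at 20 characters, matching the algorithm described later
    return f"{slug(location)}_{slug(title)[:20]}_{timestamp}_{index}.jpg"

print(build_image_filename("Lagos", "Modern Apartment", 0, datetime(2025, 1, 31)))
# Lagos_Modern_Apartment_2025_01_31_0.jpg
```

The index suffix gives each photo of a batch a distinct name before the Cloudflare worker's deduplication ever runs.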
### 4. ✅ Unified Response Format

**All three methods return the SAME structure:**

```json
{
  "success": true,
  "listing_method": "text|image|video",
  "extracted_fields": {
    "bedrooms": 3,
    "bathrooms": 2,
    "amenities": ["WiFi", "Parking", "AC"],
    "description": "Beautiful apartment...",
    "title": "Modern 3-Bed Apartment. Great location!"
  },
  "confidence": {
    "bedrooms": 0.95,
    "bathrooms": 0.88,
    "amenities": 0.72,
    "title": 0.85
  },
  "image_urls": ["url1", "url2"],
  "video_url": "https://cloudinary..." // Only if video method
}
```

**The frontend shows the same UI** regardless of how the user listed → same draft card, same editing experience.

---
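For type-checked frontends or backend handlers, the unified shape above can be modeled explicitly. A sketch using `TypedDict`; the type names are assumptions, not types that exist in the codebase:

```python
from typing import List, Optional, TypedDict

class ExtractedFields(TypedDict, total=False):
    bedrooms: int
    bathrooms: int
    amenities: List[str]
    description: str
    title: str

class ListingAnalysisResponse(TypedDict, total=False):
    success: bool
    listing_method: str               # "text" | "image" | "video"
    extracted_fields: ExtractedFields
    confidence: dict                  # per-field score in [0, 1]
    image_urls: List[str]
    video_url: Optional[str]          # present only for the video method

resp: ListingAnalysisResponse = {
    "success": True,
    "listing_method": "image",
    "extracted_fields": {"bedrooms": 3, "title": "Modern 3-Bed Apartment."},
    "confidence": {"bedrooms": 0.95},
    "image_urls": ["url1"],
}
print(resp["listing_method"])  # image
```

Because every method emits this one shape, the draft UI needs a single rendering path.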
### 5. ✅ Property Validation BEFORE Upload

**Critical feature for saving storage space:**

```
Image Upload Flow:
1. Receive image from frontend
2. Check: "Is this a property image?"
3. If NO → Reject with message, no upload
4. If YES → Upload to Cloudflare with smart filename
```

This prevents non-property images from consuming Cloudflare storage.

---
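The validate-then-upload gate above can be sketched as follows. The `vision` and `storage` objects stand in for the real vision service and Cloudflare client; all names here are illustrative assumptions:

```python
def process_upload(image_bytes, vision, storage, min_confidence=0.6):
    """Reject non-property images before any storage cost is incurred."""
    is_property, confidence, reason = vision.validate_property_image(image_bytes)
    if not is_property or confidence < min_confidence:
        return {"success": False, "error": reason or "Not a property photo"}
    url = storage.upload(image_bytes)
    return {"success": True, "url": url}

# Tiny fakes to demonstrate the flow without network calls
class FakeVision:
    def validate_property_image(self, b):
        ok = b == b"house"
        return (ok, 0.9 if ok else 0.1, None if ok else "Not a property photo")

class FakeStorage:
    def upload(self, b):
        return "https://example.invalid/img.jpg"

rejected = process_upload(b"cat", FakeVision(), FakeStorage())
accepted = process_upload(b"house", FakeVision(), FakeStorage())
print(rejected["success"], accepted["success"])  # False True
```

Note that the rejection branch returns before `storage.upload` is ever reached, which is the whole point of the ordering.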
### 6. ✅ Vision Service Enhancements

**New capabilities in `vision_service.py`:**

- `extract_property_fields()` - Now generates title + description
- `_generate_title()` - Creates SHORT titles (max 2 sentences)
- `_extract_room_count()` - Counts bedrooms/bathrooms
- `_detect_amenities()` - Finds amenities in images
- `_generate_description()` - Creates full descriptions
- `merge_multiple_image_results()` - Combines results from multiple images
- Confidence scoring for each field

---
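One plausible way to combine per-image extractions, sketched below: counts take the highest-confidence value and amenities are unioned. This is an illustrative stand-in, not the real `merge_multiple_image_results()`:

```python
def merge_image_results(results):
    """Merge per-image extractions: for numeric fields keep the value with
    the highest confidence; collect the union of all detected amenities."""
    merged = {"amenities": set()}
    confidence = {}
    for r in results:
        for field in ("bedrooms", "bathrooms"):
            c = r["confidence"].get(field, 0.0)
            if c > confidence.get(field, 0.0):
                merged[field] = r["fields"][field]
                confidence[field] = c
        merged["amenities"] |= set(r["fields"].get("amenities", []))
    merged["amenities"] = sorted(merged["amenities"])
    return merged, confidence

results = [
    {"fields": {"bedrooms": 3, "bathrooms": 2, "amenities": ["WiFi"]},
     "confidence": {"bedrooms": 0.95, "bathrooms": 0.88}},
    {"fields": {"bedrooms": 2, "bathrooms": 2, "amenities": ["AC", "WiFi"]},
     "confidence": {"bedrooms": 0.60, "bathrooms": 0.70}},
]
merged, conf = merge_image_results(results)
print(merged)  # bedrooms=3 wins at 0.95; amenities unioned
```

Taking the max-confidence value (rather than, say, averaging) matches the intuition that one clear bedroom shot beats two ambiguous ones.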
### 7. ✅ Enhanced Media Upload Routes

**Updated endpoints:**

`POST /listings/analyze-images`
- Accepts a `listing_method` parameter ("text", "image", "video")
- Accepts an optional `location` parameter for context
- Returns: Complete extracted fields + image URLs + confidence scores
- Generates intelligent filenames during upload

`POST /listings/analyze-video`
- Uploads the video to Cloudinary with smart naming
- Returns: Video URL + a suggestion to upload photos
- Recommends photos for better accuracy

---
## Files Modified/Created

### Created Files:
1. **`app/ai/services/vision_service.py`** - Vision AI analysis service
2. **`app/routes/media_upload.py`** - Image/video upload endpoints
3. **`VISION_FEATURE_INTEGRATION_GUIDE.md`** - Complete integration guide
4. **`IMPLEMENTATION_SUMMARY.md`** - This file

### Modified Files:
1. **`app/config.py`** - Added Cloudinary + Vision settings
2. **`requirements.txt`** - Added cloudinary + ffmpeg-python
3. **`app/ai/agent/nodes/listing_collect.py`** - Added `initialize_from_vision_analysis()` function
4. **`main.py`** - Registered media_upload routes

---
## Configuration Required

Add to `.env`:

```bash
# Cloudinary (Video Storage)
CLOUDINARY_CLOUD_NAME=your_cloud_name
CLOUDINARY_API_KEY=your_api_key
CLOUDINARY_API_SECRET=your_api_secret

# Hugging Face Vision Model
HF_TOKEN=your_hf_token
HF_VISION_MODEL=vikhyatk/moondream2
HF_VISION_API_ENABLED=true
PROPERTY_IMAGE_MIN_CONFIDENCE=0.6
```

---
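A minimal sketch of reading these settings from the environment; the real `app/config.py` may be structured differently (the function name and return shape are assumptions, only the env keys come from the section above):

```python
import os

def load_vision_settings(env=os.environ):
    """Read the Cloudinary + Vision env vars with the documented defaults."""
    return {
        "cloudinary_cloud_name": env.get("CLOUDINARY_CLOUD_NAME", ""),
        "hf_token": env.get("HF_TOKEN", ""),
        "hf_vision_model": env.get("HF_VISION_MODEL", "vikhyatk/moondream2"),
        "hf_vision_api_enabled": env.get("HF_VISION_API_ENABLED", "true").lower() == "true",
        "property_image_min_confidence": float(env.get("PROPERTY_IMAGE_MIN_CONFIDENCE", "0.6")),
    }

settings = load_vision_settings({"HF_TOKEN": "x", "PROPERTY_IMAGE_MIN_CONFIDENCE": "0.7"})
print(settings["property_image_min_confidence"])  # 0.7
```

Parsing the confidence threshold to `float` up front avoids string comparisons sneaking into the validation path later.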
## Frontend Changes Required

### Update Image Upload Flow

**OLD (Direct to Cloudflare):**
```javascript
// Upload directly to Cloudflare
const url = await uploadToCloudflare(image)
```

**NEW (Via Backend with Validation):**
```javascript
// listing_method and location travel as form fields alongside the files
// (build a fresh FormData with the image files for each call)

// Method 1: Text listing (chat + photos)
formData.append('listing_method', 'text')
formData.append('location', chatLocation)
const textResult = await fetch('/listings/analyze-images', { method: 'POST', body: formData })

// Method 2: Image listing (photos only)
formData.append('listing_method', 'image')
const imageResult = await fetch('/listings/analyze-images', { method: 'POST', body: formData })

// Method 3: Video listing
const videoResult = await fetch('/listings/analyze-video', { method: 'POST', body: formData })
```

---
## User Experience Flow

### For the Image Listing Method:

```
User clicks "List with Photos" → Uploads 2-3 images
    ↓
Backend validates images are property-related
    ↓
AI extracts:
- Bedrooms: 3 (confidence: 95%)
- Bathrooms: 2 (confidence: 88%)
- Amenities: WiFi, AC, Parking, Pool
- Title: "Modern 3-Bed Apartment. Great location!" (SHORT)
- Description: "Beautiful 3-bed with modern furnishings..."
    ↓
Shows draft UI with:
- Photos with smart names (Lagos_Modern_Apartment_2025_01_31_0.jpg)
- Extracted fields
- Confidence indicators
    ↓
User is asked: "What's the location, address, and price?"
    ↓
User provides: "Lagos, Victoria Island, 500,000 per month"
    ↓
AI infers listing_type: "rent" (from price context)
    ↓
User edits via text:
- "Change amenities to WiFi, gym, and pool"
- "Update title to something catchier"
    ↓
User publishes: "Publish this listing"
    ↓
Listing created with all auto-detected + user-provided data
```

---
## Key Differences from Previous Design

| Aspect | Before | Now |
|--------|--------|-----|
| **File naming** | Random/original names | Smart names (location_title_date) |
| **Title generation** | Not generated for images | AI generates SHORT titles (max 2 sentences) |
| **Listing methods** | Only text-based | Three methods: text, image, video |
| **Method detection** | N/A | AI knows how the user is listing |
| **Video storage** | N/A | Cloudinary for videos |
| **Upload strategy** | Direct to Cloudflare | Backend validates first (saves space) |
| **Confidence scores** | Not implemented | Per-field confidence for each extraction |

---
## Performance Notes

**Vision API Response Times:**
- Image validation: 2-3 seconds (first image), +1s per additional image
- Field extraction: 2-4 seconds per image
- Title generation: 1-2 seconds per image
- Video upload: 5-10 seconds (depends on file size)

**Cost Optimization:**
- Only valid property images are uploaded (non-property images rejected early)
- Smaller file sizes with smart naming
- Cloudflare worker deduplicates files
- Hugging Face Inference API used (cheaper than self-hosting)

---
## Testing Checklist

- [ ] Test TEXT method: chat + upload images
- [ ] Test IMAGE method: upload images only
- [ ] Test VIDEO method: upload video + photos
- [ ] Verify short titles are generated (max 2 sentences)
- [ ] Verify descriptions are generated (full, not short)
- [ ] Verify file naming is intelligent (location_title_date)
- [ ] Verify property validation rejects non-property images
- [ ] Verify confidence scores are returned
- [ ] Verify all three methods produce the same draft UI
- [ ] Test editing via natural-language commands
- [ ] Test publishing with all three methods

---
## Next Steps

1. **Frontend Integration** - Update image/video upload flows
2. **Test All Three Methods** - Verify each method works end-to-end
3. **Monitor Accuracy** - Track field extraction accuracy metrics
4. **Optimize Prompts** - Fine-tune Vision AI prompts based on real data
5. **User Feedback** - Gather feedback on titles/descriptions
6. **Enhance Features** - Add OCR for address extraction, price suggestions, etc.

---
## Support

See `VISION_FEATURE_INTEGRATION_GUIDE.md` for:
- Detailed API documentation
- Complete example code
- Error handling
- Troubleshooting
- Future enhancements
VISION_FEATURE_INTEGRATION_GUIDE.md DELETED
@@ -1,794 +0,0 @@
# 🤖 AI-Powered Property Listing with Image/Video Analysis
## Integration Guide

---

## Overview

This document explains how to integrate the new **Vision AI feature** that allows users to list properties by uploading images or videos. The AI automatically detects property details (bedrooms, bathrooms, amenities) and fills the listing fields.

---

## Architecture

### Flow Diagram

```
USER UPLOADS IMAGES/VIDEO
    ↓
[BACKEND IMAGE VALIDATION]
- Check if image is property-related (BEFORE upload)
- Reject non-property images (saves Cloudflare space)
    ↓
[VISION AI ANALYSIS] (Hugging Face Inference API)
- Extract bedrooms, bathrooms
- Detect amenities
- Generate description
- Return confidence scores
    ↓
[UPLOAD TO CLOUD STORAGE]
- Images → Cloudflare (only if validated)
- Videos → Cloudinary
    ↓
[INITIALIZE LISTING]
- Pre-fill extracted fields
- Ask user for uncertain/missing fields (price, location, address)
    ↓
[DRAFT UI]
- Show preview card like the text-based flow
    ↓
[USER REVIEWS & EDITS]
- Edit via natural-language commands
    ↓
[PUBLISH]
- Same as the text-based flow
```

---
## New Files Created

### 1. **Vision Service** - `app/ai/services/vision_service.py`

**Purpose**: Analyzes images/videos using the Hugging Face Inference API

**Key Classes**:
```python
class VisionService:
    def validate_property_image(image_bytes) -> (bool, float, str)
    def extract_property_fields(image_bytes) -> Dict
    def merge_multiple_image_results(results_list) -> Dict
```

**Functions**:
- `validate_property_image()` - Check if an image is property-related (BEFORE upload)
- `extract_property_fields()` - Extract bedrooms, bathrooms, amenities, description
- `_extract_room_count()` - Count rooms
- `_detect_amenities()` - Find amenities
- `_generate_description()` - Create a property description
- `merge_multiple_image_results()` - Combine results from multiple images

---
### 2. **Media Upload Routes** - `app/routes/media_upload.py`

**Purpose**: Handle image/video uploads with validation

**Endpoints**:

#### `POST /listings/analyze-images`
```
Request:
- files: List of image files (max 10, max 10MB each)
- listing_method: "text" | "image" | "video" (how the user is listing)
- location: Optional string (context from the text method)

Process:
1. Validate image format (JPEG, PNG, WebP)
2. Validate image is property-related (BEFORE upload)
3. Extract property fields
4. Upload to Cloudflare (only if validated)
5. Return extracted fields + image URLs

Response:
{
  "success": true,
  "images_processed": 2,
  "images_validated": ["image1.jpg", "image2.jpg"],
  "image_urls": [
    "https://cloudflare.../image1.jpg",
    "https://cloudflare.../image2.jpg"
  ],
  "extracted_fields": {
    "bedrooms": 3,
    "bathrooms": 2,
    "amenities": ["WiFi", "Parking", "AC"],
    "description": "Spacious modern apartment..."
  },
  "confidence": {
    "bedrooms": 0.95,
    "bathrooms": 0.88,
    "amenities": 0.72,
    "description": 0.91
  },
  "validation_errors": [],
  "suggestions": ["Verify bedroom count", "..."]
}
```

#### `POST /listings/analyze-video`
```
Request:
- video: Single video file (max 100MB)

Response:
{
  "success": true,
  "video_url": "https://res.cloudinary.com/.../video.mp4",
  "message": "Video uploaded. Photos recommended for better accuracy.",
  "extracted_fields": {...},
  "suggestions": ["Upload property photos for better detection"]
}
```

#### `POST /listings/validate-media`
```
Quick validation without uploading
Returns: Validation results for each file
```

---
### 3. **Listing Collection Integration** - `app/ai/agent/nodes/listing_collect.py`

**New Function**: `initialize_from_vision_analysis(state, vision_data)`

**Purpose**: Pre-populate the listing state with AI-detected fields

**Usage**:
```python
# After the user uploads images and the AI analyzes them
state = await initialize_from_vision_analysis(state, vision_data)
# State now has bedrooms, bathrooms, amenities, images, and description pre-filled
```

---
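The pre-fill step can be sketched as below. Note this is a simplified synchronous stand-in (the real function is async and richer); the 0.7 auto-fill threshold mirrors the confidence buckets described later in this guide:

```python
def prefill_from_vision(state, vision_data, auto_fill_threshold=0.7):
    """Copy high-confidence vision fields into the listing state.

    Illustrative sketch, not the actual initialize_from_vision_analysis():
    fields below the threshold are left for the user to confirm.
    """
    fields = vision_data.get("extracted_fields", {})
    confidence = vision_data.get("confidence", {})
    for name, value in fields.items():
        if confidence.get(name, 1.0) >= auto_fill_threshold:
            state[name] = value
    state["images"] = vision_data.get("image_urls", [])
    return state

state = prefill_from_vision(
    {},
    {"extracted_fields": {"bedrooms": 3, "amenities": ["WiFi"]},
     "confidence": {"bedrooms": 0.95, "amenities": 0.5},
     "image_urls": ["url1"]},
)
print(state)  # amenities skipped: 0.5 is below the 0.7 threshold
```

Fields the sketch skips (like the low-confidence amenities here) are exactly the ones the agent should ask the user about.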
## Configuration

### Add to `.env`

```bash
# Cloudinary (Video Storage)
CLOUDINARY_CLOUD_NAME=your_cloud_name
CLOUDINARY_API_KEY=your_api_key
CLOUDINARY_API_SECRET=your_api_secret

# Hugging Face Vision Model
HF_TOKEN=your_hf_token
HF_VISION_MODEL=vikhyatk/moondream2
HF_VISION_API_ENABLED=true
PROPERTY_IMAGE_MIN_CONFIDENCE=0.6
```

### Update `app/config.py` ✅ (Already Done)

Added:
- `CLOUDINARY_CLOUD_NAME`
- `CLOUDINARY_API_KEY`
- `CLOUDINARY_API_SECRET`
- `HF_VISION_MODEL`
- `HF_VISION_API_ENABLED`
- `PROPERTY_IMAGE_MIN_CONFIDENCE`

### Update `requirements.txt` ✅ (Already Done)

Added:
- `cloudinary>=1.40.0`
- `ffmpeg-python>=0.2.1`

---
## Frontend Integration

### Frontend Responsibilities

**IMPORTANT**: Images must now be uploaded to the **backend** (not directly to Cloudflare).

#### 1. **Image Upload Flow**

```typescript
// OLD (Direct to Cloudflare) - DEPRECATED
POST to Cloudflare directly

// NEW (Via Backend with Validation) - REQUIRED
POST /listings/analyze-images
Headers: Authorization: Bearer {token}
Body: FormData with files
Response: Extracted fields + image URLs
```

#### 2. **Example Frontend Code**

**For the TEXT method** (user provided details via chat):
```typescript
async function uploadImagesForTextListing(files: File[], location: string) {
  const formData = new FormData()
  files.forEach(file => formData.append('images', file))
  formData.append('listing_method', 'text')
  formData.append('location', location) // Context from the text conversation

  const response = await fetch('/listings/analyze-images', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  })

  const result = await response.json()

  if (!result.success) {
    result.validation_errors.forEach(err => {
      alert(`${err.image}: ${err.error}`)
    })
    return
  }

  // Images validated against the text-provided data
  showListingDraft({
    // Use data from CHAT (text-provided); images serve as validation
    bedrooms: result.extracted_fields.bedrooms,
    bathrooms: result.extracted_fields.bathrooms,
    images: result.image_urls,
  })
}
```

**For the IMAGE method** (user uploads photos only):
```typescript
async function uploadImagesForPhotoListing(files: File[]) {
  const formData = new FormData()
  files.forEach(file => formData.append('images', file))
  formData.append('listing_method', 'image')
  // No location - we'll extract everything from the images

  const response = await fetch('/listings/analyze-images', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  })

  const result = await response.json()

  if (!result.success) {
    result.validation_errors.forEach(err => {
      alert(`${err.image}: ${err.error}`)
    })
    return
  }

  // Show extracted fields (AI analyzed the images)
  showListingDraft({
    title: result.extracted_fields.title, // AI-generated SHORT title
    description: result.extracted_fields.description, // AI-generated description
    bedrooms: result.extracted_fields.bedrooms,
    bathrooms: result.extracted_fields.bathrooms,
    amenities: result.extracted_fields.amenities,
    images: result.image_urls,
    confidence: result.confidence
  })
}
```

**For the VIDEO method**:
```typescript
async function uploadVideoForListing(videoFile: File, location?: string) {
  const formData = new FormData()
  formData.append('video', videoFile)
  if (location) formData.append('location', location)

  const response = await fetch('/listings/analyze-video', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}` },
    body: formData
  })

  const result = await response.json()

  // Suggest uploading photos
  alert(result.message)
  // Then call uploadImagesForPhotoListing with the photos
}
```

#### 3. **Video Upload Flow**

```typescript
POST /listings/analyze-video
Headers: Authorization: Bearer {token}
Body: FormData with video file
Response: Video URL + suggestions
```

---
## Three Listing Methods (Smart Differentiation)

The system intelligently handles THREE different listing creation methods:

### 1️⃣ Text-Based Listing (Existing - User provides details via text)

```
User says: "I have a 3-bed, 2-bath in Lagos for 500k per month.
It has WiFi, AC, and parking."

FLOW:
1. AI extracts fields from text (bedrooms, bathrooms, price, etc.)
2. User uploads photos to validate
3. Backend:
   - Validates images are property-related
   - Just checks they match (no re-extraction needed)
   - Uploads to Cloudflare with smart naming
4. Shows draft UI with text-provided data + validated photos
5. User edits via text: "change price to 450k"
6. AI infers listing_type from price: "rent"
7. User publishes: "publish this listing"

METHOD CONTEXT: listing_method="text"
```

### 2️⃣ Image-Based Listing (NEW - User uploads photos only)

```
User clicks "List with Photos"

FLOW:
1. User uploads 1-5 photos (no text details provided)
2. Backend:
   - Validates images are property-related
   - EXTRACTS ALL DETAILS: bedrooms, bathrooms, amenities
   - GENERATES TITLE (short, max 2 sentences)
   - GENERATES DESCRIPTION (full description)
   - Creates intelligent filenames (location_title_date.jpg)
   - Uploads to Cloudflare
3. Shows draft UI with AI-extracted fields
4. User is prompted: "What's the location, address, and price?"
5. User provides: "Lagos, Victoria Island, 500,000 per month"
6. AIDA auto-infers:
   - Currency from location: Lagos → NGN (via CurrencyManager API)
   - listing_type from price_type: "per month" → "rent" ✓
7. User can edit via text: "add gym to amenities", "change title"
8. User publishes: "publish this listing"

METHOD CONTEXT: listing_method="image"
AI EXTRACTS: bedrooms, bathrooms, amenities, description, title
AUTO-INFERRED: currency (from location), listing_type (from price_type)
```

### 3️⃣ Video-Based Listing (NEW - User uploads video, optionally photos)

```
User clicks "List with Video"

FLOW:
1. User uploads a video (walkthrough)
2. Backend:
   - Uploads it to Cloudinary
   - Creates an intelligent filename
3. System suggests: "Video uploaded! Upload 2-3 photos for better detection."
4. User uploads photos
5. Backend:
   - Validates images are property-related
   - EXTRACTS ALL DETAILS from photos
   - GENERATES TITLE and DESCRIPTION
6. Shows draft UI with extracted fields + video URL
7. Same flow as image-based from step 5 onwards:
   - User prompted for: location, address, price (with price_type)
   - AIDA auto-infers: currency (from location), listing_type (from price_type)

METHOD CONTEXT: listing_method="video"
AI EXTRACTS: From photos (not the video)
AUTO-INFERRED: currency (from location), listing_type (from price_type)
VIDEO STORAGE: Cloudinary
PHOTO STORAGE: Cloudflare
```

### Unified Draft UI Result

**All three methods produce the SAME final result:**

```json
{
  "success": true,
  "listing_method": "text|image|video",
  "extracted_fields": {
    "bedrooms": 3,
    "bathrooms": 2,
    "amenities": ["WiFi", "Parking", "AC"],
    "description": "Beautiful apartment with modern amenities.",
    "title": "3-Bed Modern Apartment. Great location!"
  },
  "confidence": { ... },
  "image_urls": [ ... ],
  "video_url": "..." // Only if video method
}
```

The **frontend shows the same UI** regardless of listing method - the user sees:
- Property images
- Extracted details
- Ability to edit via text commands
- Publish button

---
## Data Flow Example

### Request

```bash
curl -X POST http://localhost:8000/listings/analyze-images \
  -H "Authorization: Bearer {token}" \
  -F "images=@bedroom.jpg" \
  -F "images=@kitchen.jpg" \
  -F "images=@bathroom.jpg"
```

### Response

```json
{
  "success": true,
  "images_processed": 3,
  "images_validated": ["bedroom.jpg", "kitchen.jpg", "bathroom.jpg"],
  "image_urls": [
    "https://imagedelivery.net/lojiz/bedroom_hash/public",
    "https://imagedelivery.net/lojiz/kitchen_hash/public",
    "https://imagedelivery.net/lojiz/bathroom_hash/public"
  ],
  "extracted_fields": {
    "bedrooms": 3,
    "bathrooms": 2,
    "amenities": ["WiFi Router", "AC Unit", "Furniture", "Balcony"],
    "description": "Beautiful 3-bedroom, 2-bathroom modern apartment with contemporary furnishings and excellent amenities."
  },
  "confidence": {
    "bedrooms": 0.95,
    "bathrooms": 0.88,
    "amenities": 0.72,
    "description": 0.91
  },
  "validation_errors": [],
  "suggestions": [
    "Verify bedroom and bathroom counts are accurate",
    "You'll need to provide location, address, and price information"
  ]
}
```

---
## API Endpoints Summary

| Endpoint | Method | Purpose | Auth |
|----------|--------|---------|------|
| `/listings/analyze-images` | POST | Upload & analyze images | Required |
| `/listings/analyze-video` | POST | Upload & analyze video | Required |
| `/listings/validate-media` | POST | Quick file validation | Required |

---
## Important Notes

### Image Validation

- **Property validation happens BEFORE upload** - non-property images are rejected, saving Cloudflare storage
- **Confidence threshold**: Default 0.6 (60%); adjustable via `PROPERTY_IMAGE_MIN_CONFIDENCE`
- **High-confidence fields** (>0.7): Auto-filled in the listing form
- **Medium-confidence fields** (0.5-0.7): Shown as suggestions; user confirms
- **Low-confidence fields** (<0.5): User must provide manually

### Video Processing

- Videos are uploaded to **Cloudinary** (not Cloudflare)
- Frame extraction is available for future frame-by-frame analysis
- Users are encouraged to upload photos alongside the video for better accuracy

### Listing Type Inference

After the user provides a **price**, the system infers listing_type:

```
Price Input → Listing Type
- High monthly (e.g., 500,000/month) → "rent"
- Low nightly (e.g., 5,000/night) → "short-stay"
- Very high one-time (e.g., 50,000,000) → "sale"
- "Looking for roommate" context → "roommate"
```

---
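The inference table above could be coded roughly as follows. This is an illustrative heuristic under the assumptions stated in the table (the production logic lives in the agent and may weigh more signals):

```python
def infer_listing_type(price, price_type=None, context=""):
    """Map price signals to a listing_type per the table above (sketch)."""
    if "roommate" in context.lower():
        return "roommate"
    if price_type in ("per month", "monthly"):
        return "rent"
    if price_type in ("per night", "nightly"):
        return "short-stay"
    # One-time price with a very high magnitude suggests a sale
    if price_type is None and price >= 10_000_000:
        return "sale"
    return "unknown"

print(infer_listing_type(500_000, "per month"))  # rent
print(infer_listing_type(5_000, "per night"))    # short-stay
print(infer_listing_type(50_000_000))            # sale
```

The 10,000,000 cutoff for "sale" is an arbitrary placeholder threshold; the real system infers from richer context.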
## Testing

### Test 1: TEXT Method (User provided text details + uploads images)

```bash
# User already provided details via chat
# Now uploading images to validate

curl -X POST /listings/analyze-images \
  -H "Authorization: Bearer {token}" \
  -F "images=@bedroom.jpg" \
  -F "images=@kitchen.jpg" \
  -F "listing_method=text" \
  -F "location=Lagos"

# Response:
# - Images validated as property-related ✓
# - Details preserved from the text conversation
# - Returns the same format with extracted fields + image URLs
```

### Test 2: IMAGE Method (User uploads photos only)

```bash
# User has no text details - AI extracts everything

curl -X POST /listings/analyze-images \
  -H "Authorization: Bearer {token}" \
  -F "images=@bedroom.jpg" \
  -F "images=@kitchen.jpg" \
  -F "images=@bathroom.jpg" \
  -F "listing_method=image"

# Response:
# - bedrooms: 3 (extracted from images)
# - bathrooms: 2 (extracted from images)
# - title: "Modern 3-Bed Apartment. Great Location!" (AI-generated, SHORT)
# - description: "Beautiful apartment with..." (AI-generated, full)
# - amenities: ["WiFi", "AC", "Parking"] (extracted)
# - confidence: { bedrooms: 0.95, bathrooms: 0.88, ... }
```

### Test 3: VIDEO Method (User uploads video + photos)

```bash
# Step 1: Upload video
curl -X POST /listings/analyze-video \
  -H "Authorization: Bearer {token}" \
  -F "video=@walkthrough.mp4" \
  -F "location=Lagos"

# Response: video_url, suggestions to upload photos

# Step 2: Upload photos for analysis
curl -X POST /listings/analyze-images \
  -H "Authorization: Bearer {token}" \
  -F "images=@photo1.jpg" \
  -F "images=@photo2.jpg" \
  -F "listing_method=video" \
  -F "location=Lagos"

# Response: Same as IMAGE method + video_url in the final listing
```

### Test 4: File Naming

```bash
# Upload images with location context
curl -X POST /listings/analyze-images \
  -F "images=@IMG_1234.jpg" \
  -F "images=@IMG_5678.jpg" \
  -F "listing_method=image" \
  -F "location=Lagos"

# Backend generates:
# - Lagos_Modern_Apartment_2025_01_31_0.jpg
# - Lagos_Modern_Apartment_2025_01_31_1.jpg
# (AI extracts a title from the images and uses it in the filename)

# Cloudflare stores with these intelligent names
# If duplicate: Lagos_Modern_Apartment_2025_01_31_0_1.jpg (worker appends _1)
```

### Test 5: Short Title Validation

```
# Verify the title is SHORT (max 2 sentences)

Acceptable response:
{
  "extracted_fields": {
    "title": "Modern 3-Bed Apartment. Great location!",  ✓ SHORT
    "description": "Beautiful 3-bedroom, 2-bathroom modern apartment..."  ✓ FULL
  }
}

NOT acceptable:
{
  "title": "This is a beautiful 3-bedroom, 2-bathroom modern apartment..."  ❌ TOO LONG
}
```

---
## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Not a property photo` | Image rejected by the vision AI | Upload actual property photos |
| `Image size exceeds 10MB` | File too large | Compress the image or use a smaller file |
| `Invalid image type` | Wrong file format | Use JPEG, PNG, or WebP |
| `Cloudinary upload failed` | Credentials not set | Check `.env` variables |
| `HF API timeout` | Vision model slow | Retry or use the Cloudinary-hosted fallback |

---
- ## Smart File Naming & Storage
629
-
630
- ### Intelligent Filename Generation
631
-
632
- **Backend generates meaningful filenames instead of using random names:**
633
-
634
- ```python
635
- Pattern: {location}_{title}_{timestamp}_{index}.jpg
636
-
637
- Examples:
638
- - Lagos_Modern_Apartment_2025_01_31_1.jpg
639
- - Victoria_Island_3_Bed_Luxury_2025_01_31_0.jpg
640
- - Cotonou_Cozy_Studio_2025_01_31_0.jpg
641
- ```
642
-
643
- **Algorithm:**
644
- 1. Extract location (if available)
645
- 2. Extract title (first 20 chars, AI-generated if image/video method)
646
- 3. Add timestamp (YYYY_MM_DD_HHMMSS)
647
- 4. Add index for multiple images (0, 1, 2...)
648
-
649
- **Benefits:**
650
- - Easy to identify property in storage
651
- - Date shows when listed
652
- - Cloudflare worker can detect duplicates
653
- - Organized file structure
654
-
655
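The four-step algorithm above can be sketched as a small Python helper (a minimal illustration; `build_image_filename` and the exact slug rules are assumptions, not the production code):

```python
import re
from datetime import datetime
from typing import Optional

def build_image_filename(location: Optional[str], title: str, index: int,
                         now: Optional[datetime] = None) -> str:
    """Build a {location}_{title}_{timestamp}_{index}.jpg filename."""
    now = now or datetime.utcnow()

    def slug(text: str) -> str:
        # Keep letters and digits; collapse everything else into underscores
        return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")

    parts = []
    if location:                                    # 1. location (if available)
        parts.append(slug(location))
    parts.append(slug(title)[:20])                  # 2. first 20 chars of the title
    parts.append(now.strftime("%Y_%m_%d_%H%M%S"))   # 3. timestamp
    parts.append(str(index))                        # 4. index for multiple images
    return "_".join(parts) + ".jpg"
```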
- ### Cloudflare Worker Deduplication
-
- When an image reaches Cloudflare:
- ```
- 1. Check if the filename exists
- 2. If NEW β†’ store as-is
- 3. If DUPLICATE β†’ append a counter
-    - first duplicate: {name}_1.jpg
-    - second: {name}_2.jpg
- ```
-
- **Example:**
- ```
- Scenario: Same user uploads "Lagos_Apartment.jpg" twice
- 1st upload β†’ Lagos_Apartment.jpg
- 2nd upload β†’ Lagos_Apartment_1.jpg (worker auto-appended)
- ```
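The counter-appending rule can be sketched as follows (the real deduplication runs inside the Cloudflare Worker; this is a hypothetical Python illustration that assumes filenames always carry an extension such as `.jpg`):

```python
def dedupe_filename(name: str, existing: set) -> str:
    """Return a unique filename, appending _1, _2, ... before the extension."""
    if name not in existing:
        return name                       # NEW -> store as-is
    stem, _, ext = name.rpartition(".")   # split "Lagos_Apartment.jpg" -> stem/ext
    counter = 1
    while f"{stem}_{counter}.{ext}" in existing:
        counter += 1                      # keep counting past _1, _2, ...
    return f"{stem}_{counter}.{ext}"
```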
-
- ---
-
- ## Title & Description Generation
-
- ### Title Requirements
-
- **MUST BE SHORT:**
- βœ… "Modern 3-bed apartment. Great location!"
- βœ… "Spacious family home with garden."
- ❌ "This is a beautiful 3-bedroom, 2-bathroom modern apartment with contemporary furnishings, located in a prime area of the city with excellent amenities and facilities"
-
- **Maximum:** 2 sentences (not a full description)
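A server-side guard for the two-sentence limit could look like this (a hypothetical helper, not part of the current codebase; the 80-character cap is an assumed backstop for run-on titles without sentence punctuation):

```python
import re

def is_short_title(title: str, max_sentences: int = 2, max_chars: int = 80) -> bool:
    """True if the title stays within the sentence and length limits."""
    text = title.strip()
    if not text or len(text) > max_chars:
        return False
    # Count sentences by splitting on ., !, ? followed by a space or end of string
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s]
    return len(sentences) <= max_sentences
```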
-
- **Generated by Vision AI for the image/video methods:**
- ```
- Example prompt:
- "Generate a SHORT, catchy real estate listing title for this property (3 bed, 2 bath) in Lagos.
- Maximum 2 sentences. Must be concise and appealing.
- Example: 'Modern 2-bed apartment with balcony. Great location!'"
- ```
-
- ### Description Generation
-
- **Full property description (2-3 sentences):**
- - Generated from images/video
- - Professional tone
- - Highlights key features
- - Stored in `extracted_fields.description`
-
- **Example:**
- ```
- "Beautiful 3-bedroom, 2-bathroom modern apartment featuring contemporary
- furnishings, air conditioning, WiFi, and private balcony overlooking the
- city. Located in a secure, gated community with excellent amenities."
- ```
-
- ---
-
- ## Performance Optimization
-
- ### Recommended for Production
-
- 1. **Implement caching**: Cache results for similar property images to reduce API calls
- 2. **Batch processing**: Process multiple images in parallel
- 3. **Frame extraction**: For videos, extract key frames instead of analyzing every frame
- 4. **Model optimization**: Consider a smaller model variant for faster inference
- 5. **Async processing**: Run long tasks (e.g., video analysis) as async jobs
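Point 2 (batch processing) can be sketched with `asyncio.gather` (a minimal sketch; `analyze_image` here is a placeholder for the real vision call, not the production function):

```python
import asyncio

async def analyze_image(url: str) -> dict:
    # Placeholder for the real per-image vision analysis call (hypothetical)
    await asyncio.sleep(0.1)
    return {"url": url, "ok": True}

async def analyze_batch(urls: list) -> list:
    """Analyze all images concurrently instead of one by one."""
    return await asyncio.gather(*(analyze_image(u) for u in urls))
```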
-
- ### Estimated Response Times
-
- - Image validation: **2-3 seconds** (first image), **+1s per additional image**
- - Video upload: **5-10 seconds**, depending on file size
- - Vision analysis: **2-4 seconds** per image
-
- ---
-
- ## Success Metrics
-
- Track these to measure feature adoption:
-
- 1. **Adoption Rate**: % of new listings created via image/video upload
- 2. **Time Saved**: Average creation time (image-based vs. text-based)
- 3. **Accuracy**: % of auto-detected fields accepted by users
- 4. **Field Coverage**: Which fields have the highest accuracy
- 5. **Error Rate**: % of images rejected as non-property
-
- ---
-
- ## Future Enhancements
-
- 1. **Multi-frame video analysis**: Extract key frames from the video, analyze each
- 2. **OCR for signs**: Extract property addresses from signs visible in photos
- 3. **Furniture detection**: Count furniture items, estimate age
- 4. **Damage detection**: Identify needed repairs
- 5. **Neighborhood analysis**: Analyze the background (street view, buildings)
- 6. **Price estimation**: AI suggests a price based on similar listings
- 7. **Virtual tour generation**: Automatically create a walkthrough from photos
-
- ---
-
- ## Support & Troubleshooting
-
- ### Check Vision Service Status
-
- ```bash
- GET /health
- # Returns: vision_service: "healthy" | "unavailable"
- ```
-
- ### View Logs
-
- ```bash
- # Backend logs for vision analysis
- grep "Vision Service" logs/app.log
- grep "Hugging Face API" logs/app.log
- ```
-
- ### Reset Vision Service Cache
-
- ```bash
- # Clear vision service cache (if implemented)
- DELETE /admin/cache/vision
- ```
-
- ---
-
- ## Summary
-
- βœ… **Phase 1 Complete:**
- - Vision service created (Hugging Face integration)
- - Media upload endpoints ready
- - Property validation implemented
- - Listing collection integration done
- - Image/video storage configured
-
- **Next Steps:**
- 1. Update frontend to use `/listings/analyze-images` endpoint
- 2. Update frontend to use `/listings/analyze-video` endpoint
- 3. Add vision results to chat UI
- 4. Test end-to-end flow
- 5. Monitor accuracy metrics
- 6. Optimize based on user feedback
app/__pycache__/config.cpython-313.pyc CHANGED
Binary files a/app/__pycache__/config.cpython-313.pyc and b/app/__pycache__/config.cpython-313.pyc differ
 
app/ai/agent/__pycache__/graph.cpython-313.pyc CHANGED
Binary files a/app/ai/agent/__pycache__/graph.cpython-313.pyc and b/app/ai/agent/__pycache__/graph.cpython-313.pyc differ
 
app/ai/agent/brain.py CHANGED
@@ -15,6 +15,8 @@ from langchain_core.messages import SystemMessage, HumanMessage
 from app.ai.agent.state import AgentState, FlowState
 from app.ai.agent.schema import get_schema_for_llm, get_draft_summary, get_missing_fields
 from app.config import settings
+from app.ai.lightning.rewards import log_field_reward, log_reward, log_negative_reward, REWARD_SEARCH_COMPLETED, REWARD_ALERT_CREATED
+from app.ai.lightning.tracer import log_trajectory_step
 
 logger = get_logger(__name__)
 
@@ -851,21 +853,25 @@ async def execute_tool(tool_name: str, params: Dict[str, Any], state: AgentState
         else:
             # No draft_ui yet - AIDA will ask for images
             state.temp_data["action"] = "respond"
-
+
+        # Log reward for field extraction (Agent Lightning)
+        await log_field_reward(state.session_id, list(fields.keys()))
+
         return True, f"Updated: {list(fields.keys())}", state.provided_fields
 
     elif tool_name == "search_properties":
         # Import and call search service
        from app.ai.services.search_extractor import extract_search_params
        from app.ai.services.search_service import search_listings_hybrid, search_mongodb
+        from app.ai.services.search_strategy_selector import select_search_strategy, SearchStrategy
+
         # SMART UI: Clear old my_listings when doing new search
        state.my_listings = []
        state.temp_data.pop("my_listings", None)
+
         # Step 1: Extract params from the full user message (LLM is smart)
        search_params = await extract_search_params(state.last_user_message)
+
         # Step 2: Merge with Brain-extracted params (these have priority if present)
        if params.get("location"):
            search_params["location"] = params["location"]
@@ -875,81 +881,142 @@ async def execute_tool(tool_name: str, params: Dict[str, Any], state: AgentState
            search_params["max_price"] = params["max_price"]
        if params.get("beds"):
            search_params["bedrooms"] = params["beds"]
-
+
        is_suggestion = False
-
-        # Step 3: STRICT SEARCH FIRST (only use what user specified)
-        results = await search_mongodb(search_params, limit=10)
-
-        # Step 4: If 0 results, try RELAXED SUGGESTION search
-        if not results:
-            logger.info("Strict search yielded 0 results, trying suggestion search...")
-            suggestion_results, currency = await search_listings_hybrid(
-                user_query=state.last_user_message,
-                search_params=search_params,
-                limit=10,
-                mode="relaxed"
+        rlm_used = False
+
+        # ================================================================
+        # Step 2.5: CHECK IF RLM SHOULD BE USED (NEW!)
+        # ================================================================
+        strategy_result = await select_search_strategy(state.last_user_message, search_params)
+
+        if strategy_result.get("use_rlm"):
+            # Use RLM for complex queries
+            logger.info(
+                "🧠 RLM activated for search",
+                strategy=strategy_result["strategy"].value,
+                reasoning=strategy_result["reasoning"][:50]
            )
-
-        # Step 4.5: Filter suggestions - only keep results from the requested location
-        # If user asked for "New York" but we get results from "Lagos", discard them
-        requested_location = (search_params.get("location") or "").lower()
-
-        if requested_location and suggestion_results:
-            relevant_suggestions = []
-            for listing in suggestion_results:
-                listing_location = (listing.get("location") or "").lower()
-                # Check if the listing is from the requested location (or nearby)
-                if requested_location in listing_location or listing_location in requested_location:
-                    relevant_suggestions.append(listing)
-
-            if relevant_suggestions:
-                results = relevant_suggestions
-                is_suggestion = True
-                logger.info(f"Found {len(relevant_suggestions)} relevant suggestions for {requested_location}")
+
+            try:
+                from app.ai.services.rlm_search_service import rlm_search
+
+                rlm_result = await rlm_search(
+                    query=state.last_user_message,
+                    context={
+                        "user_location": state.user_location,
+                        "search_params": search_params
+                    }
+                )
+
+                results = rlm_result.get("results", [])
+                rlm_used = True
+
+                # Store RLM metadata
+                state.temp_data["rlm_strategy"] = rlm_result.get("strategy_used")
+                state.temp_data["rlm_reasoning_steps"] = rlm_result.get("reasoning_steps")
+                state.temp_data["rlm_call_count"] = rlm_result.get("call_count")
+
+                # Use RLM-generated message if available
+                if rlm_result.get("message"):
+                    state.temp_data["response_text"] = rlm_result["message"]
+
+                # Store comparison data if available
+                if rlm_result.get("comparison_data"):
+                    state.temp_data["comparison_data"] = rlm_result["comparison_data"]
+
+                # Store aggregation result if available
+                if rlm_result.get("aggregation_result"):
+                    state.temp_data["aggregation_result"] = rlm_result["aggregation_result"]
+
+                logger.info(
+                    f"🧠 RLM search complete",
+                    result_count=len(results),
+                    strategy=rlm_result.get("strategy_used"),
+                    calls=rlm_result.get("call_count")
+                )
+
+            except Exception as rlm_error:
+                logger.error(f"RLM search failed, falling back to standard: {rlm_error}")
+                rlm_used = False
+                results = []
+
+        # ================================================================
+        # Standard search path (if RLM not used or failed)
+        # ================================================================
+        if not rlm_used:
+            # Step 3: STRICT SEARCH FIRST (only use what user specified)
+            results = await search_mongodb(search_params, limit=10)
+
+            # Step 4: If 0 results, try RELAXED SUGGESTION search
+            if not results:
+                logger.info("Strict search yielded 0 results, trying suggestion search...")
+                suggestion_results, currency = await search_listings_hybrid(
+                    user_query=state.last_user_message,
+                    search_params=search_params,
+                    limit=10,
+                    mode="relaxed"
+                )
+
+                # Step 4.5: Filter suggestions - only keep results from the requested location
+                requested_location = (search_params.get("location") or "").lower()
+
+                if requested_location and suggestion_results:
+                    relevant_suggestions = []
+                    for listing in suggestion_results:
+                        listing_location = (listing.get("location") or "").lower()
+                        if requested_location in listing_location or listing_location in requested_location:
+                            relevant_suggestions.append(listing)
+
+                    if relevant_suggestions:
+                        results = relevant_suggestions
+                        is_suggestion = True
+                        logger.info(f"Found {len(relevant_suggestions)} relevant suggestions for {requested_location}")
+                    else:
+                        results = []
+                        is_suggestion = False
+                        logger.info(f"No relevant suggestions found for {requested_location}")
                else:
-                    # No relevant suggestions - results stay empty for "Notify me" prompt
-                    results = []
-                    is_suggestion = False
-                    logger.info(f"No relevant suggestions found for {requested_location}, will prompt for notification")
-        else:
-            # No location filter specified, use all suggestions
-            results = suggestion_results
-            is_suggestion = True
-
+                    results = suggestion_results
+                    is_suggestion = True
+
        # Step 5: Enrich results with owner/review data (same as listings API)
        if results:
            from app.database import get_db
            from app.services.listing_service import enrich_listings_batch
-
+
            db = await get_db()
-            # Convert to dicts if needed and stringify _id
            formatted_results = []
            for doc in results:
                if "_id" in doc and not isinstance(doc["_id"], str):
                    doc["_id"] = str(doc["_id"])
                formatted_results.append(doc)
-
+
            results = await enrich_listings_batch(formatted_results, db)
            logger.info(f"Enriched {len(results)} search results with owner/review data")
-
        # Step 6: Store results and flags
        state.search_results = results
        state.temp_data["search_results"] = results
        state.temp_data["action"] = "search_results"
        state.temp_data["is_suggestion"] = is_suggestion
-        state.temp_data["search_params"] = search_params  # For "Notify me" feature
-
+        state.temp_data["search_params"] = search_params
+        state.temp_data["search_strategy"] = strategy_result["strategy"].value if hasattr(strategy_result["strategy"], "value") else str(strategy_result["strategy"])
+
        # Always save last search params for "Notify me" feature
        state.temp_data["last_search_params"] = search_params
        state.temp_data["last_search_query"] = state.last_user_message
+
        # If no results found, flag to propose alert
        if len(results) == 0:
            state.temp_data["propose_alert"] = True
            state.temp_data["response_text"] = f"I couldn't find any properties matching your search right now. Would you like me to notify you when something becomes available? πŸ””"
-
-        return True, f"Found {len(results)} properties", results
+
+        # Log reward for search completion (Agent Lightning)
+        if len(results) > 0:
+            await log_reward(state.session_id, REWARD_SEARCH_COMPLETED, "search_completed", {"result_count": len(results)})
+
+        return True, f"Found {len(results)} properties" + (" (via RLM)" if rlm_used else ""), results
 
     elif tool_name == "get_my_listings":
         # Get user's listings
@@ -1147,6 +1214,9 @@ async def execute_tool(tool_name: str, params: Dict[str, Any], state: AgentState
        state.temp_data["response_text"] = f"Got it! πŸ”” I'll keep watching for properties in {location} and notify you the moment something becomes available!"
        state.temp_data["action"] = "alert_created"
 
+        # Log reward for alert creation (Agent Lightning)
+        await log_reward(state.session_id, REWARD_ALERT_CREATED, "alert_created", {"alert_id": str(alert.id)})
+
        return True, f"Alert created: {alert.id} (found {len(current_results)} current matches)", {
            "alert_id": str(alert.id),
            "current_match_count": len(current_results)
@@ -1297,6 +1367,8 @@ async def execute_tool(tool_name: str, params: Dict[str, Any], state: AgentState
 
    except Exception as e:
        logger.error("Tool execution error", tool=tool_name, exc_info=e)
+        # Log negative reward for tool execution error (Agent Lightning)
+        await log_negative_reward(state.session_id, "error", f"Tool {tool_name} failed: {str(e)}")
        return False, str(e), None
 
 
@@ -1305,11 +1377,31 @@ async def agent_think(state: AgentState) -> AgentState:
    Main agent thinking loop.
    LLM reasons β†’ decides tool β†’ executes β†’ generates response.
    """
-
+
    logger.info("Agent thinking started", user_id=state.user_id)
-
+
+    # Log user input trajectory (Agent Lightning)
+    await log_trajectory_step(
+        state.session_id,
+        "user_input",
+        {"message": state.last_user_message[:500] if state.last_user_message else ""},
+        state.user_id
+    )
+
    # Step 1: Brain decides what to do
    decision = await brain_decide(state)
+
+    # Log brain decision trajectory (Agent Lightning)
+    await log_trajectory_step(
+        state.session_id,
+        "brain_decision",
+        {
+            "thinking": decision.thinking[:200] if decision.thinking else "",
+            "tool": decision.tool,
+            "is_final": decision.is_final
+        },
+        state.user_id
+    )
 
    # Store thinking for debugging
    state.temp_data["brain_thinking"] = decision.thinking
@@ -1359,8 +1451,19 @@ async def agent_think(state: AgentState) -> AgentState:
    else:
        state.temp_data["action"] = "respond"  # Just text, no data cards
 
+    # Log response trajectory (Agent Lightning)
+    await log_trajectory_step(
+        state.session_id,
+        "response",
+        {
+            "response": state.temp_data.get("response_text", "")[:500],
+            "action": state.temp_data.get("action", "respond")
+        },
+        state.user_id
+    )
+
    logger.info("Agent thinking complete", action=decision.tool, show_data=decision.show_data)
-
+
    return state
 
 
app/ai/agent/graph.py CHANGED
@@ -15,6 +15,7 @@ from langgraph.checkpoint.memory import MemorySaver
 from structlog import get_logger
 
 from app.ai.agent.state import AgentState, FlowState
+from app.ai.lightning.tracer import wrap_graph_if_enabled
 from app.ai.agent.nodes.authenticate import authenticate
 from app.ai.agent.brain import agent_think
 from app.ai.agent.nodes.validate_output import validate_output_node
@@ -117,9 +118,12 @@ def build_aida_graph():
 
    checkpointer = MemorySaver()
    compiled_graph = graph.compile(checkpointer=checkpointer)
-
+
+    # Wrap with Agent Lightning tracer (if enabled)
+    compiled_graph = wrap_graph_if_enabled(compiled_graph)
+
    logger.info("βœ… LangGraph V2 compiled (Brain-Based)")
-
+
    return compiled_graph
 
 
app/ai/agent/nodes/__pycache__/listing_publish.cpython-313.pyc CHANGED
Binary files a/app/ai/agent/nodes/__pycache__/listing_publish.cpython-313.pyc and b/app/ai/agent/nodes/__pycache__/listing_publish.cpython-313.pyc differ
 
app/ai/agent/nodes/listing_collect.py CHANGED
@@ -30,80 +30,19 @@ llm = ChatOpenAI(
    temperature=0.7,
 )
 
-async def initialize_from_vision_analysis(
-    state: AgentState,
-    vision_data: Dict
-) -> AgentState:
-    """
-    Initialize listing from AI vision analysis (images/video)
-
-    Populates state with auto-detected fields and sets up for user confirmation.
-    User will be prompted for required fields: location, address, price (with price_type).
-
-    Auto-inferred fields:
-    - Currency: Auto-detected from location via external API
-    - Listing type: Auto-inferred from price_type (per month β†’ rent, once β†’ sale, etc.)
-
-    Args:
-        state: Current agent state
-        vision_data: Dict with extracted fields from vision service
-
-    Returns:
-        Updated state ready for collection
-    """
-    try:
-        # Extract vision analysis results
-        extracted_fields = vision_data.get("extracted_fields", {})
-        confidence = vision_data.get("confidence", {})
-        image_urls = vision_data.get("image_urls", [])
-
-        logger.info("πŸ€– Initializing listing from vision analysis",
-                    bedrooms=extracted_fields.get("bedrooms"),
-                    bathrooms=extracted_fields.get("bathrooms"),
-                    amenities_count=len(extracted_fields.get("amenities", [])))
-
-        # Pre-fill detected fields with high confidence (>0.7)
-        high_confidence_threshold = 0.7
-
-        # Always add images (they were validated)
-        if image_urls:
-            state.update_listing_progress("images", image_urls)
-            logger.info(f"βœ… Added {len(image_urls)} validated images")
-
-        # Bedrooms (high confidence)
-        if extracted_fields.get("bedrooms") is not None and confidence.get("bedrooms", 0) > high_confidence_threshold:
-            state.update_listing_progress("bedrooms", extracted_fields["bedrooms"])
-            logger.info(f"βœ… Auto-filled bedrooms: {extracted_fields['bedrooms']}")
-
-        # Bathrooms (high confidence)
-        if extracted_fields.get("bathrooms") is not None and confidence.get("bathrooms", 0) > high_confidence_threshold:
-            state.update_listing_progress("bathrooms", extracted_fields["bathrooms"])
-            logger.info(f"βœ… Auto-filled bathrooms: {extracted_fields['bathrooms']}")
-
-        # Amenities (even medium confidence is good for amenities)
-        if extracted_fields.get("amenities") and confidence.get("amenities", 0) > 0.5:
-            state.update_listing_progress("amenities", extracted_fields["amenities"])
-            logger.info(f"βœ… Auto-filled amenities: {extracted_fields['amenities']}")
-
-        # Description (if high confidence)
-        if extracted_fields.get("description") and confidence.get("description", 0) > high_confidence_threshold:
-            state.update_listing_progress("description", extracted_fields["description"])
-            logger.info("βœ… Auto-filled description")
-
-        # Store vision confidence scores in temp_data for reference
-        state.temp_data["vision_confidence"] = confidence
-        state.temp_data["from_vision_analysis"] = True
-
-        # Set user message to indicate vision analysis was done
-        state.last_user_message = "[Vision analysis completed - awaiting user confirmation]"
-
-        logger.info("βœ… Vision analysis initialization complete")
-        return state
-
-    except Exception as e:
-        logger.error("Error initializing from vision analysis", exc_info=e)
-        state.set_error(f"Error initializing from vision: {str(e)}", should_retry=True)
-        return state
+# ============================================================
+# VISION ANALYSIS - DISABLED
+# ============================================================
+# NOTE: Vision analysis is NOT in use. Image uploads are handled
+# directly by Cloudflare Worker (frontend upload).
+# This function is kept for future reference only.
+# ============================================================
+# async def initialize_from_vision_analysis(
+#     state: AgentState,
+#     vision_data: Dict
+# ) -> AgentState:
+#     """Initialize listing from AI vision analysis (images/video) - DISABLED"""
+#     pass
 
 
 async def generate_contextual_question(state: AgentState, next_field: str = None) -> str:
app/ai/agent/nodes/listing_publish.py CHANGED
@@ -11,6 +11,7 @@ from app.ai.agent.state import AgentState, FlowState
 from app.ai.agent.schemas import ListingDraft
 from app.database import get_db
 from app.ai.services.vector_service import upsert_listing_to_vector_db
+from app.ai.lightning.rewards import log_reward
 
 logger = get_logger(__name__)
 
@@ -282,6 +283,15 @@ async def listing_publish_handler(state: AgentState) -> AgentState:
            logger.info("Proactive alerts processed for new listing", listing_id=listing_id)
        except Exception as notify_err:
            logger.warning("Proactive notification check failed", error=str(notify_err))
+
+        # βœ… STEP 3.3: Log reward for successful publish (Agent Lightning)
+        is_update = bool(state.temp_data.get("editing_listing_id"))
+        await log_reward(
+            state.session_id,
+            1.0,  # Primary success signal
+            "listing_published",
+            {"listing_id": listing_id, "is_update": is_update}
+        )
 
    except Exception as e:
        logger.error("MongoDB save failed", exc_info=e)
app/ai/lightning/__init__.py ADDED
@@ -0,0 +1,38 @@
+# app/ai/lightning/__init__.py
+"""
+Agent Lightning - RL Trajectory Capture for AIDA
+
+This module implements Agent Lightning-inspired reinforcement learning
+trajectory capture for training AIDA to improve listing completion rates.
+
+Components:
+- tracer.py: Captures state transitions, tool calls, and outcomes
+- rewards.py: Logs reward signals at key events
+- config.py: Lightning-specific configuration
+
+Usage:
+    from app.ai.lightning import log_reward, log_trajectory
+
+    # Log a reward signal
+    await log_reward(session_id, 1.0, "listing_published", {"listing_id": "..."})
+
+    # Trajectories are captured automatically when LIGHTNING_ENABLED=true
+"""
+
+from app.ai.lightning.rewards import log_reward, log_field_reward, log_negative_reward
+from app.ai.lightning.tracer import (
+    wrap_graph_if_enabled,
+    log_trajectory_step,
+    get_session_trajectory,
+    export_trajectories_for_training
+)
+
+__all__ = [
+    "log_reward",
+    "log_field_reward",
+    "log_negative_reward",
+    "wrap_graph_if_enabled",
+    "log_trajectory_step",
+    "get_session_trajectory",
+    "export_trajectories_for_training"
+]
app/ai/lightning/rewards.py ADDED
@@ -0,0 +1,249 @@
+# app/ai/lightning/rewards.py
+"""
+Agent Lightning Reward Signals
+
+Logs reward signals at key events for RL training.
+Rewards are associated with session trajectories.
+
+Reward Definitions:
+- listing_published: +1.0 (primary success signal)
+- field_extracted: +0.1 per field (incremental progress)
+- search_completed: +0.3 (user found what they wanted)
+- alert_created: +0.2 (user engaged with notifications)
+- conversation_error: -0.5 (negative signal for failures)
+- conversation_abandoned: -0.3 (user left mid-flow)
+"""
+
+import json
+from datetime import datetime
+from typing import Any, Dict, List, Optional
+
+from structlog import get_logger
+
+logger = get_logger(__name__)
+
+# Reward value constants
+REWARD_LISTING_PUBLISHED = 1.0
+REWARD_FIELD_EXTRACTED = 0.1
+REWARD_SEARCH_COMPLETED = 0.3
+REWARD_ALERT_CREATED = 0.2
+REWARD_CONVERSATION_ERROR = -0.5
+REWARD_CONVERSATION_ABANDONED = -0.3
+
+# Redis connection (lazy initialization)
+_redis_client = None
+
+
+async def _get_redis():
+    """Get or create the Redis client."""
+    global _redis_client
+    if _redis_client is None:
+        try:
+            from app.ai.memory.redis_memory import get_redis_client
+            _redis_client = await get_redis_client()
+        except Exception as e:
+            logger.warning("Lightning Rewards: Redis connection failed", error=str(e))
+            return None
+    return _redis_client
+
+
+def _is_lightning_enabled() -> bool:
+    """Check whether Lightning is enabled."""
+    try:
+        from app.config import settings
+        return getattr(settings, "LIGHTNING_ENABLED", False)
+    except Exception:
+        return False
+
+
+def _get_reward_ttl() -> int:
+    """Get the TTL for rewards in seconds (same as trajectories)."""
+    try:
+        from app.config import settings
+        days = getattr(settings, "LIGHTNING_TRAJECTORY_TTL_DAYS", 30)
+        return days * 24 * 60 * 60
+    except Exception:
+        return 30 * 24 * 60 * 60
+
+
+async def log_reward(
+    session_id: str,
+    reward: float,
+    event_type: str,
+    metadata: Optional[Dict[str, Any]] = None
+) -> bool:
+    """
+    Log a reward signal for a session.
+
+    Args:
+        session_id: Session identifier
+        reward: Reward value (positive or negative)
+        event_type: Type of event that triggered the reward
+        metadata: Optional additional context
+
+    Returns:
+        True if logged successfully
+    """
+    if not _is_lightning_enabled():
+        return False
+
+    try:
+        redis = await _get_redis()
+        if not redis:
+            return False
+
+        reward_entry = {
+            "timestamp": datetime.utcnow().isoformat(),
+            "reward": reward,
+            "event_type": event_type,
+            "metadata": metadata or {}
+        }
+
+        key = f"lightning:rewards:{session_id}"
+        await redis.rpush(key, json.dumps(reward_entry))
+        await redis.expire(key, _get_reward_ttl())
+
+        # Also increment global counters for monitoring
+        counter_key = f"lightning:stats:{event_type}"
+        await redis.incr(counter_key)
+
+        logger.info("Lightning: Reward logged",
+                    session_id=session_id[:8],
+                    reward=reward,
+                    event_type=event_type)
+        return True
+
+    except Exception as e:
+        logger.warning("Lightning: Failed to log reward", error=str(e))
+        return False
+
+
+async def log_field_reward(
+    session_id: str,
+    fields: List[str]
+) -> bool:
+    """
+    Log a reward for successfully extracted listing fields.
+
+    Args:
+        session_id: Session identifier
+        fields: List of field names that were extracted
+
+    Returns:
+        True if logged successfully
+    """
+    if not fields:
+        return False
+
+    # Calculate reward: 0.1 per field
+    reward = REWARD_FIELD_EXTRACTED * len(fields)
+
+    return await log_reward(
+        session_id,
+        reward,
+        "field_extracted",
+        {"fields": fields, "field_count": len(fields)}
+    )
+
+
+async def log_negative_reward(
+    session_id: str,
+    event_type: str,
+    reason: str
+) -> bool:
+    """
+    Log a negative reward for errors or abandonment.
+
+    Args:
+        session_id: Session identifier
+        event_type: "error" or "abandoned"
+        reason: Description of what went wrong
+
+    Returns:
+        True if logged successfully
+    """
+    if event_type == "error":
+        reward = REWARD_CONVERSATION_ERROR
+    elif event_type == "abandoned":
+        reward = REWARD_CONVERSATION_ABANDONED
+    else:
+        reward = -0.1  # Generic negative
+
+    return await log_reward(
+        session_id,
+        reward,
+        f"conversation_{event_type}",
+        {"reason": reason}
+    )
+
+
+async def get_session_rewards(session_id: str) -> List[Dict[str, Any]]:
+    """
+    Get all rewards for a session.
+
+    Args:
+        session_id: Session identifier
+
+    Returns:
+        List of reward entries
+    """
+    if not _is_lightning_enabled():
+        return []
+
+    try:
+        redis = await _get_redis()
+        if not redis:
+            return []
+
+        key = f"lightning:rewards:{session_id}"
+        raw_rewards = await redis.lrange(key, 0, -1)
+
+        return [json.loads(r) for r in raw_rewards]
+
+    except Exception as e:
+        logger.warning("Lightning: Failed to get rewards", error=str(e))
+        return []
+
+
+async def get_total_session_reward(session_id: str) -> float:
+    """
+    Calculate the total reward for a session.
+
+    Args:
+        session_id: Session identifier
+
+    Returns:
+        Sum of all rewards
+    """
+    rewards = await get_session_rewards(session_id)
+    return sum(r.get("reward", 0) for r in rewards)
+
+
+async def get_lightning_stats() -> Dict[str, int]:
+    """
+    Get global Lightning statistics.
+
+    Returns:
+        Dict of event_type -> count
+    """
+    if not _is_lightning_enabled():
+        return {}
+
+    try:
+        redis = await _get_redis()
+        if not redis:
+            return {}
+
+        # Get all stats keys
+        stats_keys = await redis.keys("lightning:stats:*")
+
+        stats = {}
+        for key in stats_keys:
+            event_type = key.decode().split(":")[-1]
+            count = await redis.get(key)
+            stats[event_type] = int(count) if count else 0
+
+        return stats
+
+    except Exception as e:
+        logger.warning("Lightning: Failed to get stats", error=str(e))
+        return {}
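The storage model above is simple: each session's rewards live in a Redis list under `lightning:rewards:{session_id}` as JSON strings, and `get_total_session_reward` just decodes and sums them. A minimal sketch of that aggregation, with an in-memory list standing in for Redis (the entries are illustrative, not real data):

```python
import json

# Hypothetical reward entries as they would sit in the Redis list (JSON strings)
raw_rewards = [
    json.dumps({"reward": 0.1, "event_type": "field_extracted"}),
    json.dumps({"reward": 0.1, "event_type": "field_extracted"}),
    json.dumps({"reward": 1.0, "event_type": "listing_published"}),
]

# Mirrors get_session_rewards + get_total_session_reward: decode, then sum
rewards = [json.loads(r) for r in raw_rewards]
total = sum(r.get("reward", 0) for r in rewards)
print(round(total, 2))  # 1.2
```

Because rewards are per-event rather than per-session, partial progress (two extracted fields) and final success (publish) both contribute to the trajectory's total.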
app/ai/lightning/tracer.py ADDED
@@ -0,0 +1,326 @@
+# app/ai/lightning/tracer.py
+"""
+Agent Lightning Trajectory Tracer
+
+Captures state transitions and tool calls for RL training.
+Uses Redis for trajectory storage with automatic TTL cleanup.
+
+Design Principles:
+1. Zero overhead when disabled (LIGHTNING_ENABLED=false)
+2. Non-blocking async operations
+3. Graceful degradation on errors
+4. Compatible with the existing LangGraph architecture
+"""
+
+import json
+from datetime import datetime
+from typing import Any, Dict, List, Optional
+
+from structlog import get_logger
+
+logger = get_logger(__name__)
+
+# Redis connection (lazy initialization)
+_redis_client = None
+
+
+async def _get_redis():
+    """Get or create the Redis client for Lightning storage."""
+    global _redis_client
+    if _redis_client is None:
+        try:
+            from app.ai.memory.redis_memory import get_redis_client
+            _redis_client = await get_redis_client()
+        except Exception as e:
+            logger.warning("Lightning: Redis connection failed, trajectories will not be stored", error=str(e))
+            return None
+    return _redis_client
+
+
+def _is_lightning_enabled() -> bool:
+    """Check whether Lightning is enabled via config."""
+    try:
+        from app.config import settings
+        return getattr(settings, "LIGHTNING_ENABLED", False)
+    except Exception:
+        return False
+
+
+def _get_trajectory_ttl() -> int:
+    """Get the TTL for trajectories in seconds."""
+    try:
+        from app.config import settings
+        days = getattr(settings, "LIGHTNING_TRAJECTORY_TTL_DAYS", 30)
+        return days * 24 * 60 * 60  # Convert to seconds
+    except Exception:
+        return 30 * 24 * 60 * 60  # Default 30 days
+
+
+async def log_trajectory_step(
+    session_id: str,
+    step_type: str,
+    data: Dict[str, Any],
+    user_id: Optional[str] = None
+) -> bool:
+    """
+    Log a single trajectory step to Redis.
+
+    Args:
+        session_id: Unique session identifier
+        step_type: Type of step (user_input, brain_decision, tool_call, tool_result, response)
+        data: Step data (varies by type)
+        user_id: Optional user ID for filtering
+
+    Returns:
+        True if logged successfully, False otherwise
+    """
+    if not _is_lightning_enabled():
+        return False
+
+    try:
+        redis = await _get_redis()
+        if not redis:
+            return False
+
+        step = {
+            "timestamp": datetime.utcnow().isoformat(),
+            "step_type": step_type,
+            "data": data,
+            "user_id": user_id
+        }
+
+        # Store as a list under the session key
+        key = f"lightning:trajectory:{session_id}"
+        await redis.rpush(key, json.dumps(step))
+        await redis.expire(key, _get_trajectory_ttl())
+
+        logger.debug("Lightning: Trajectory step logged",
+                     session_id=session_id[:8],
+                     step_type=step_type)
+        return True
+
+    except Exception as e:
+        logger.warning("Lightning: Failed to log trajectory step", error=str(e))
+        return False
+
+
+async def get_session_trajectory(session_id: str) -> List[Dict[str, Any]]:
+    """
+    Retrieve the full trajectory for a session.
+
+    Args:
+        session_id: Session identifier
+
+    Returns:
+        List of trajectory steps
+    """
+    if not _is_lightning_enabled():
+        return []
+
+    try:
+        redis = await _get_redis()
+        if not redis:
+            return []
+
+        key = f"lightning:trajectory:{session_id}"
+        raw_steps = await redis.lrange(key, 0, -1)
+
+        return [json.loads(step) for step in raw_steps]
+
+    except Exception as e:
+        logger.warning("Lightning: Failed to get trajectory", error=str(e))
+        return []
+
+
+async def export_trajectories_for_training(
+    min_steps: int = 3,
+    max_trajectories: int = 1000,
+    only_completed: bool = True
+) -> List[Dict[str, Any]]:
+    """
+    Export trajectories for RL training.
+
+    Args:
+        min_steps: Minimum steps required per trajectory
+        max_trajectories: Maximum number to export
+        only_completed: Only include trajectories with rewards
+
+    Returns:
+        List of trajectories with their rewards
+    """
+    if not _is_lightning_enabled():
+        logger.warning("Lightning: Cannot export - Lightning not enabled")
+        return []
+
+    try:
+        redis = await _get_redis()
+        if not redis:
+            return []
+
+        # Get all trajectory keys
+        trajectory_keys = await redis.keys("lightning:trajectory:*")
+
+        trajectories = []
+        for key in trajectory_keys[:max_trajectories * 2]:  # Get extra to filter
+            session_id = key.decode().split(":")[-1]
+
+            # Get trajectory
+            raw_steps = await redis.lrange(key, 0, -1)
+            steps = [json.loads(step) for step in raw_steps]
+
+            if len(steps) < min_steps:
+                continue
+
+            # Get rewards for this session
+            reward_key = f"lightning:rewards:{session_id}"
+            raw_rewards = await redis.lrange(reward_key, 0, -1)
+            rewards = [json.loads(r) for r in raw_rewards]
+
+            if only_completed and not rewards:
+                continue
+
+            # Calculate total reward
+            total_reward = sum(r.get("reward", 0) for r in rewards)
+
+            trajectories.append({
+                "session_id": session_id,
+                "steps": steps,
+                "rewards": rewards,
+                "total_reward": total_reward,
+                "step_count": len(steps)
+            })
+
+            if len(trajectories) >= max_trajectories:
+                break
+
+        logger.info("Lightning: Exported trajectories for training", count=len(trajectories))
+        return trajectories
+
+    except Exception as e:
+        logger.error("Lightning: Failed to export trajectories", error=str(e))
+        return []
+
+
+def wrap_graph_if_enabled(compiled_graph):
+    """
+    Wrap a compiled LangGraph with trajectory logging.
+
+    This is a passthrough wrapper that logs trajectory steps
+    without modifying the graph's behavior.
+
+    Args:
+        compiled_graph: The compiled LangGraph
+
+    Returns:
+        Wrapped graph (or the original if Lightning is disabled)
+    """
+    if not _is_lightning_enabled():
+        logger.info("Lightning: Disabled - returning unwrapped graph")
+        return compiled_graph
+
+    logger.info("Lightning: Wrapping graph with trajectory capture")
+
+    # For now, return the original graph.
+    # Trajectory logging is done at the brain.py level;
+    # this wrapper is a hook for future enhancements.
+    return compiled_graph
+
+
+class TrajectoryContext:
+    """
+    Context manager for tracking a complete conversation trajectory.
+
+    Usage:
+        async with TrajectoryContext(session_id, user_id) as ctx:
+            await ctx.log_user_input(message)
+            await ctx.log_brain_decision(thinking, tool, params)
+            await ctx.log_tool_call(tool, success, message)
+            await ctx.log_response(response)
+    """
+
+    def __init__(self, session_id: str, user_id: Optional[str] = None):
+        self.session_id = session_id
+        self.user_id = user_id
+        self.start_time = None
+        self.enabled = _is_lightning_enabled()
+
+    async def __aenter__(self):
+        self.start_time = datetime.utcnow()
+        if self.enabled:
+            await log_trajectory_step(
+                self.session_id,
+                "session_start",
+                {"timestamp": self.start_time.isoformat()},
+                self.user_id
+            )
+        return self
+
+    async def __aexit__(self, exc_type, exc_val, exc_tb):
+        if self.enabled:
+            end_time = datetime.utcnow()
+            duration = (end_time - self.start_time).total_seconds()
+            await log_trajectory_step(
+                self.session_id,
+                "session_end",
+                {
+                    "duration_seconds": duration,
+                    "error": str(exc_val) if exc_val else None
+                },
+                self.user_id
+            )
+        return False  # Don't suppress exceptions
+
+    async def log_user_input(self, message: str, is_voice: bool = False):
+        """Log a user input step."""
+        if self.enabled:
+            await log_trajectory_step(
+                self.session_id,
+                "user_input",
+                {
+                    "message": message[:500],  # Truncate long messages
+                    "is_voice": is_voice
+                },
+                self.user_id
+            )
+
+    async def log_brain_decision(self, thinking: str, tool: Optional[str], params: Dict):
+        """Log a brain decision step."""
+        if self.enabled:
+            await log_trajectory_step(
+                self.session_id,
+                "brain_decision",
+                {
+                    "thinking": thinking[:200],  # Truncate
+                    "tool": tool,
+                    "params": {k: str(v)[:100] for k, v in params.items()} if params else {}
+                },
+                self.user_id
+            )
+
+    async def log_tool_call(self, tool: str, success: bool, message: str):
+        """Log a tool execution step."""
+        if self.enabled:
+            await log_trajectory_step(
+                self.session_id,
+                "tool_result",
+                {
+                    "tool": tool,
+                    "success": success,
+                    "message": message[:200]
+                },
+                self.user_id
+            )
+
+    async def log_response(self, response: str, action: Optional[str] = None):
+        """Log an AI response step."""
+        if self.enabled:
+            await log_trajectory_step(
+                self.session_id,
+                "response",
+                {
+                    "response": response[:500],
+                    "action": action
+                },
+                self.user_id
+            )
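The `TrajectoryContext` above relies on the async context-manager protocol to bracket every conversation with `session_start` / `session_end` steps, even when an exception escapes. The shape of that pattern can be sketched with an in-memory list standing in for the Redis-backed `log_trajectory_step` (class and names here are illustrative only):

```python
import asyncio

class InMemoryTrajectory:
    """Illustrative stand-in: appends steps to a list instead of Redis."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.steps = []

    async def __aenter__(self):
        # Bracket the conversation: first step is always session_start
        self.steps.append({"step_type": "session_start"})
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        # Last step is always session_end, recording any escaping error
        self.steps.append({"step_type": "session_end",
                           "error": str(exc_val) if exc_val else None})
        return False  # don't suppress exceptions

    async def log_user_input(self, message):
        self.steps.append({"step_type": "user_input", "data": message[:500]})

async def main():
    async with InMemoryTrajectory("sess-123") as ctx:
        await ctx.log_user_input("3-bed in Lagos")
    return ctx.steps

steps = asyncio.run(main())
print([s["step_type"] for s in steps])
# ['session_start', 'user_input', 'session_end']
```

Returning `False` from `__aexit__` matters: the tracer records the failure in the trajectory but still lets the exception propagate to the caller.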
app/ai/services/__init__.py CHANGED
@@ -0,0 +1,52 @@
+# app/ai/services/__init__.py
+"""
+AI Services for AIDA
+
+Includes:
+- Search services (hybrid, MongoDB, Qdrant)
+- RLM (Recursive Language Model) for complex queries
+- Strategy selection
+- Intent classification
+- OpenStreetMap POI service for proximity searches
+"""
+
+from app.ai.services.rlm_query_analyzer import (
+    QueryComplexity,
+    QueryAnalysis,
+    analyze_query_complexity,
+    should_use_rlm
+)
+
+from app.ai.services.rlm_search_service import (
+    RLMSearchAgent,
+    get_rlm_agent,
+    rlm_search
+)
+
+from app.ai.services.osm_poi_service import (
+    find_pois,
+    find_pois_overpass,
+    geocode_location,
+    find_multiple_poi_types,
+    calculate_distance_km
+)
+
+__all__ = [
+    # RLM Query Analyzer
+    "QueryComplexity",
+    "QueryAnalysis",
+    "analyze_query_complexity",
+    "should_use_rlm",
+
+    # RLM Search Service
+    "RLMSearchAgent",
+    "get_rlm_agent",
+    "rlm_search",
+
+    # OpenStreetMap POI Service
+    "find_pois",
+    "find_pois_overpass",
+    "geocode_location",
+    "find_multiple_poi_types",
+    "calculate_distance_km",
+]
app/ai/services/osm_poi_service.py ADDED
@@ -0,0 +1,499 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # app/ai/services/osm_poi_service.py
2
+ """
3
+ OpenStreetMap POI (Point of Interest) Service for AIDA RLM.
4
+
5
+ Uses FREE OpenStreetMap APIs:
6
+ - Nominatim: Geocoding (location name β†’ coordinates)
7
+ - Overpass: POI search (find schools, hospitals, parks near a location)
8
+
9
+ No API key required! Just respect rate limits (1 request/second for Nominatim).
10
+
11
+ Supports:
12
+ - Schools, universities, colleges
13
+ - Hospitals, clinics, pharmacies
14
+ - Parks, gardens, beaches
15
+ - Markets, supermarkets, malls
16
+ - Airports, bus stations
17
+ - Mosques, churches
18
+ - And more...
19
+ """
20
+
21
+ import asyncio
22
+ import httpx
23
+ from typing import List, Dict, Optional, Tuple
24
+ from structlog import get_logger
25
+
26
+ logger = get_logger(__name__)
27
+
28
+ # Rate limiting: Nominatim requires max 1 request/second
29
+ _last_nominatim_request = 0
30
+
31
+
32
+ # =============================================================================
33
+ # OSM Tag Mappings
34
+ # =============================================================================
35
+
36
+ OSM_POI_TAGS = {
37
+ # Education
38
+ "school": "amenity=school",
39
+ "schools": "amenity=school",
40
+ "primary school": "amenity=school",
41
+ "secondary school": "amenity=school",
42
+ "high school": "amenity=school",
43
+ "university": "amenity=university",
44
+ "college": "amenity=college",
45
+ "kindergarten": "amenity=kindergarten",
46
+
47
+ # Healthcare
48
+ "hospital": "amenity=hospital",
49
+ "clinic": "amenity=clinic",
50
+ "pharmacy": "amenity=pharmacy",
51
+ "doctor": "amenity=doctors",
52
+
53
+ # Recreation & Nature
54
+ "beach": "natural=beach",
55
+ "park": "leisure=park",
56
+ "garden": "leisure=garden",
57
+ "playground": "leisure=playground",
58
+ "sports": "leisure=sports_centre",
59
+ "gym": "leisure=fitness_centre",
60
+ "swimming pool": "leisure=swimming_pool",
61
+ "stadium": "leisure=stadium",
62
+
63
+ # Shopping
64
+ "market": "amenity=marketplace",
65
+ "supermarket": "shop=supermarket",
66
+ "mall": "shop=mall",
67
+ "shopping center": "shop=mall",
68
+ "shop": "shop=supermarket",
69
+
70
+ # Transport
71
+ "airport": "aeroway=aerodrome",
72
+ "bus station": "amenity=bus_station",
73
+ "bus stop": "highway=bus_stop",
74
+ "train station": "railway=station",
75
+ "port": "amenity=ferry_terminal",
76
+
77
+ # Religious
78
+ "mosque": 'amenity=place_of_worship"][religion=muslim',
79
+ "church": 'amenity=place_of_worship"][religion=christian',
80
+ "cathedral": "building=cathedral",
81
+
82
+ # Food & Drink
83
+ "restaurant": "amenity=restaurant",
84
+ "cafe": "amenity=cafe",
85
+ "bar": "amenity=bar",
86
+
87
+ # Business & Services
88
+ "bank": "amenity=bank",
89
+ "atm": "amenity=atm",
90
+ "police": "amenity=police",
91
+ "post office": "amenity=post_office",
92
+ "embassy": "amenity=embassy",
93
+
94
+ # Landmarks
95
+ "downtown": "place=city_centre",
96
+ "city center": "place=city_centre",
97
+ "city centre": "place=city_centre",
98
+ }
99
+
100
+ # French translations
101
+ OSM_POI_TAGS_FR = {
102
+ "Γ©cole": "amenity=school",
103
+ "ecole": "amenity=school",
104
+ "lycΓ©e": "amenity=school",
105
+ "lycee": "amenity=school",
106
+ "collège": "amenity=school",
107
+ "college": "amenity=college",
108
+ "universitΓ©": "amenity=university",
109
+ "universite": "amenity=university",
110
+ "hΓ΄pital": "amenity=hospital",
111
+ "hopital": "amenity=hospital",
112
+ "clinique": "amenity=clinic",
113
+ "pharmacie": "amenity=pharmacy",
114
+ "plage": "natural=beach",
115
+ "parc": "leisure=park",
116
+ "jardin": "leisure=garden",
117
+ "marchΓ©": "amenity=marketplace",
118
+ "marche": "amenity=marketplace",
119
+ "supermarchΓ©": "shop=supermarket",
120
+ "aΓ©roport": "aeroway=aerodrome",
121
+ "aeroport": "aeroway=aerodrome",
122
+ "gare": "railway=station",
123
+ "mosquΓ©e": 'amenity=place_of_worship"][religion=muslim',
124
+ "mosquee": 'amenity=place_of_worship"][religion=muslim',
125
+ "Γ©glise": 'amenity=place_of_worship"][religion=christian',
126
+ "eglise": 'amenity=place_of_worship"][religion=christian',
127
+ "centre-ville": "place=city_centre",
128
+ }
129
+
130
+ # Merge all tags
131
+ ALL_POI_TAGS = {**OSM_POI_TAGS, **OSM_POI_TAGS_FR}
132
+
133
+
134
+ # =============================================================================
135
+ # Nominatim Geocoding
136
+ # =============================================================================
137
+
138
+ async def geocode_location(location: str) -> Optional[Tuple[float, float]]:
139
+ """
140
+ Convert location name to coordinates using Nominatim.
141
+
142
+ Args:
143
+ location: Location name (e.g., "Cotonou, Benin")
144
+
145
+ Returns:
146
+ Tuple of (latitude, longitude) or None if not found
147
+ """
148
+ global _last_nominatim_request
149
+
150
+ # Rate limiting: wait if needed
151
+ import time
152
+ now = time.time()
153
+ if now - _last_nominatim_request < 1:
154
+ await asyncio.sleep(1 - (now - _last_nominatim_request))
155
+ _last_nominatim_request = time.time()
156
+
157
+ try:
158
+ async with httpx.AsyncClient(timeout=15) as client:
159
+ response = await client.get(
160
+ "https://nominatim.openstreetmap.org/search",
161
+ params={
162
+ "q": location,
163
+ "format": "json",
164
+ "limit": 1,
165
+ "addressdetails": 1
166
+ },
167
+ headers={
168
+ "User-Agent": "AIDA-RealEstate/1.0 (contact@lojiz.com)"
169
+ }
170
+ )
171
+
172
+ if response.status_code != 200:
173
+ logger.error(f"Nominatim error: {response.status_code}")
174
+ return None
175
+
176
+ data = response.json()
177
+
178
+ if not data:
179
+ logger.warning(f"Location not found: {location}")
180
+ return None
181
+
182
+ lat = float(data[0]["lat"])
183
+ lon = float(data[0]["lon"])
184
+
185
+ logger.info(
186
+ "Geocoded location",
187
+ location=location,
188
+ lat=lat,
189
+ lon=lon
190
+ )
191
+
192
+ return (lat, lon)
193
+
194
+ except Exception as e:
195
+ logger.error(f"Geocoding failed: {e}")
196
+ return None
197
+
198
+
199
+ # =============================================================================
200
+ # Overpass POI Search
201
+ # =============================================================================
202
+
203
+ async def find_pois_overpass(
204
+ poi_type: str,
205
+ center_lat: float,
206
+ center_lon: float,
207
+ radius_km: float = 5
208
+ ) -> List[Dict]:
209
+ """
210
+ Find POIs near a location using Overpass API.
211
+
212
+ Args:
213
+ poi_type: Type of POI (school, hospital, beach, etc.)
214
+ center_lat: Center latitude
215
+ center_lon: Center longitude
216
+ radius_km: Search radius in kilometers
217
+
218
+ Returns:
219
+ List of POI dicts with name, lat, lon, type
220
+ """
221
+ # Get OSM tag for this POI type
222
+ poi_lower = poi_type.lower().strip()
223
+ osm_tag = ALL_POI_TAGS.get(poi_lower)
224
+
225
+ if not osm_tag:
226
+ # Try partial matching
227
+ for key, tag in ALL_POI_TAGS.items():
228
+ if poi_lower in key or key in poi_lower:
229
+ osm_tag = tag
230
+ break
231
+
232
+ if not osm_tag:
233
+ # Default to amenity search
234
+ osm_tag = f"amenity={poi_lower}"
235
+ logger.warning(f"Unknown POI type '{poi_type}', using default: {osm_tag}")
236
+
237
+ # Build Overpass QL query
238
+ radius_meters = radius_km * 1000
239
+
240
+ query = f"""
241
+ [out:json][timeout:25];
242
+ (
243
+ node[{osm_tag}](around:{radius_meters},{center_lat},{center_lon});
244
+ way[{osm_tag}](around:{radius_meters},{center_lat},{center_lon});
245
+ relation[{osm_tag}](around:{radius_meters},{center_lat},{center_lon});
246
+ );
247
+ out center tags;
248
+ """
249
+
250
+ try:
251
+ async with httpx.AsyncClient(timeout=30) as client:
252
+ response = await client.post(
253
+ "https://overpass-api.de/api/interpreter",
254
+ data={"data": query},
255
+ headers={
256
+ "User-Agent": "AIDA-RealEstate/1.0"
257
+ }
258
+ )
259
+
260
+ if response.status_code != 200:
261
+ logger.error(f"Overpass error: {response.status_code}")
262
+ return []
263
+
264
+ data = response.json()
265
+
266
+ except Exception as e:
267
+ logger.error(f"Overpass query failed: {e}")
268
+ return []
269
+
270
+ # Parse results
271
+ pois = []
272
+ for element in data.get("elements", []):
273
+ # Get coordinates
274
+ if element["type"] == "node":
275
+ lat = element.get("lat")
276
+ lon = element.get("lon")
277
+ else:
278
+ # For ways/relations, use center
279
+ center = element.get("center", {})
280
+ lat = center.get("lat")
281
+ lon = center.get("lon")
282
+
283
+ if not lat or not lon:
284
+ continue
285
+
286
+ tags = element.get("tags", {})
287
+
288
+ # Build POI entry
289
+ poi = {
290
+ "name": tags.get("name", f"{poi_type.title()} (unnamed)"),
291
+ "lat": lat,
292
+ "lon": lon,
293
+ "type": poi_type,
294
+ "osm_id": element.get("id"),
295
+ "osm_type": element.get("type"),
296
+ }
297
+
298
+ # Add extra info if available
299
+ if tags.get("addr:street"):
300
+ poi["address"] = f"{tags.get('addr:housenumber', '')} {tags['addr:street']}".strip()
301
+ if tags.get("website"):
302
+ poi["website"] = tags["website"]
303
+ if tags.get("phone"):
304
+ poi["phone"] = tags["phone"]
305
+
306
+ pois.append(poi)
307
+
308
+ logger.info(
309
+ "Found POIs",
310
+ poi_type=poi_type,
311
+ count=len(pois),
312
+ radius_km=radius_km
313
+ )
314
+
315
+ return pois
316
+
317
+
318
+ # =============================================================================
319
+ # Main Function: Find POIs by Location Name
320
+ # =============================================================================
321
+
322
+ async def find_pois(
323
+ poi_type: str,
324
+ location: str,
325
+ radius_km: float = 5,
326
+ limit: int = 10
327
+ ) -> List[Dict]:
328
+ """
329
+ Find POIs near a location (main entry point).
330
+
331
+    Args:
+        poi_type: Type of POI (school, hospital, beach, park, etc.)
+        location: Location name (e.g., "Cotonou", "Calavi, Benin")
+        radius_km: Search radius in kilometers (default 5)
+        limit: Maximum number of results (default 10)
+
+    Returns:
+        List of POI dicts:
+        [
+            {
+                "name": "Collège Père Aupiais",
+                "lat": 6.3654,
+                "lon": 2.4183,
+                "type": "school",
+                "osm_id": 12345678,
+                "address": "Rue de l'École"
+            },
+            ...
+        ]
+
+    Example:
+        pois = await find_pois("school", "Cotonou, Benin", radius_km=3)
+    """
+    # Step 1: Geocode the location
+    coords = await geocode_location(location)
+
+    if not coords:
+        logger.warning(f"Could not geocode location: {location}")
+        return []
+
+    center_lat, center_lon = coords
+
+    # Step 2: Find POIs near those coordinates
+    pois = await find_pois_overpass(
+        poi_type=poi_type,
+        center_lat=center_lat,
+        center_lon=center_lon,
+        radius_km=radius_km
+    )
+
+    # Limit results
+    return pois[:limit]
+
+
+ # =============================================================================
+ # Batch POI Search
+ # =============================================================================
+
+ async def find_multiple_poi_types(
+     poi_types: List[str],
+     location: str,
+     radius_km: float = 5
+ ) -> Dict[str, List[Dict]]:
+     """
+     Find multiple types of POIs at once.
+
+     Args:
+         poi_types: List of POI types (e.g., ["school", "hospital", "park"])
+         location: Location name
+         radius_km: Search radius in kilometers (default 5)
+
+     Returns:
+         Dict mapping POI type to list of POIs:
+         {
+             "school": [...],
+             "hospital": [...],
+             "park": [...]
+         }
+     """
+     # Geocode once
+     coords = await geocode_location(location)
+
+     if not coords:
+         return {poi_type: [] for poi_type in poi_types}
+
+     center_lat, center_lon = coords
+
+     # Search each POI type in parallel
+     async def search_poi(poi_type: str):
+         return poi_type, await find_pois_overpass(
+             poi_type, center_lat, center_lon, radius_km
+         )
+
+     results = await asyncio.gather(*[search_poi(pt) for pt in poi_types])
+
+     return {poi_type: pois for poi_type, pois in results}
+
+
+ # =============================================================================
+ # Utility: Calculate Distance
+ # =============================================================================
+
+ def calculate_distance_km(
+     lat1: float,
+     lon1: float,
+     lat2: float,
+     lon2: float
+ ) -> float:
+     """
+     Calculate distance between two points using the Haversine formula.
+
+     Returns distance in kilometers.
+     """
+     import math
+
+     R = 6371  # Earth's radius in km
+
+     lat1_rad = math.radians(lat1)
+     lat2_rad = math.radians(lat2)
+     delta_lat = math.radians(lat2 - lat1)
+     delta_lon = math.radians(lon2 - lon1)
+
+     a = (math.sin(delta_lat / 2) ** 2 +
+          math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon / 2) ** 2)
+     c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
+
+     return R * c
+
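As a quick sanity check, the Haversine helper can be reproduced standalone (same formula as `calculate_distance_km` above, stdlib `math` only, no app imports):

```python
import math

def calculate_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km via the Haversine formula."""
    R = 6371  # Earth's mean radius in km
    lat1_rad, lat2_rad = math.radians(lat1), math.radians(lat2)
    delta_lat = math.radians(lat2 - lat1)
    delta_lon = math.radians(lon2 - lon1)
    a = (math.sin(delta_lat / 2) ** 2
         + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon / 2) ** 2)
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# One degree of latitude is ~111.19 km at this Earth radius
print(round(calculate_distance_km(0.0, 0.0, 1.0, 0.0), 2))  # 111.19
```

With R = 6371, one degree of latitude works out to about 111.19 km, which is a convenient spot check for the formula.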
+
+ # =============================================================================
+ # Test Function
+ # =============================================================================
+
+ async def test_osm_service():
+     """Test the OSM POI service."""
+     print("\n" + "=" * 60)
+     print("Testing OpenStreetMap POI Service")
+     print("=" * 60 + "\n")
+
+     # Test 1: Geocoding
+     print("Test 1: Geocoding 'Cotonou, Benin'")
+     coords = await geocode_location("Cotonou, Benin")
+     if coords:
+         print(f"   ✅ Found: {coords}")
+     else:
+         print("   ❌ Failed")
+
+     # Test 2: Find schools
+     print("\nTest 2: Find schools in Cotonou")
+     schools = await find_pois("school", "Cotonou, Benin", radius_km=3)
+     print(f"   Found {len(schools)} schools:")
+     for school in schools[:5]:
+         print(f"   - {school['name']} ({school['lat']:.4f}, {school['lon']:.4f})")
+
+     # Test 3: Find hospitals
+     print("\nTest 3: Find hospitals in Cotonou")
+     hospitals = await find_pois("hospital", "Cotonou, Benin", radius_km=5)
+     print(f"   Found {len(hospitals)} hospitals:")
+     for hospital in hospitals[:3]:
+         print(f"   - {hospital['name']} ({hospital['lat']:.4f}, {hospital['lon']:.4f})")
+
+     # Test 4: Find markets
+     print("\nTest 4: Find markets in Cotonou")
+     markets = await find_pois("market", "Cotonou, Benin", radius_km=3)
+     print(f"   Found {len(markets)} markets:")
+     for market in markets[:3]:
+         print(f"   - {market['name']} ({market['lat']:.4f}, {market['lon']:.4f})")
+
+     # Test 5: French POI type
+     print("\nTest 5: Find 'école' (French) in Cotonou")
+     ecoles = await find_pois("école", "Cotonou, Benin", radius_km=3)
+     print(f"   Found {len(ecoles)} écoles")
+
+     print("\n" + "=" * 60)
+     print("OSM Service Tests Complete!")
+     print("=" * 60 + "\n")
+
+
+ if __name__ == "__main__":
+     asyncio.run(test_osm_service())
app/ai/services/rlm_query_analyzer.py ADDED
@@ -0,0 +1,287 @@
+ # app/ai/services/rlm_query_analyzer.py
+ """
+ RLM Query Analyzer - Detects complex queries that need recursive reasoning.
+
+ Identifies:
+ - Multi-hop queries: "near schools", "close to beach"
+ - Boolean OR queries: "under 500k OR has pool"
+ - Comparative queries: "compare Cotonou vs Calavi"
+ - Aggregation queries: "average price", "how many"
+ - Multi-factor queries: "best family apartment near schools and parks"
+ """
+
+ import re
+ from typing import Dict, List, Literal, Optional
+ from enum import Enum
+ from structlog import get_logger
+ from pydantic import BaseModel
+
+ logger = get_logger(__name__)
+
+
+ class QueryComplexity(str, Enum):
+     """Types of complex queries that RLM can handle"""
+     SIMPLE = "simple"              # Standard single-hop search
+     MULTI_HOP = "multi_hop"        # "near X", "close to Y"
+     BOOLEAN_OR = "boolean_or"      # "A OR B"
+     COMPARATIVE = "comparative"    # "compare A vs B"
+     AGGREGATION = "aggregation"    # "average", "total", "count"
+     MULTI_FACTOR = "multi_factor"  # Multiple ranking criteria
+
+
+ class QueryAnalysis(BaseModel):
+     """Result of query analysis"""
+     complexity: QueryComplexity
+     confidence: float  # 0.0 to 1.0
+     reasoning: str
+     detected_patterns: List[str]
+     sub_query_hints: List[str]  # Hints for decomposition
+     use_rlm: bool
+
+
+ # Pattern definitions for each complexity type
+ MULTI_HOP_PATTERNS = [
+     r"\bnear\b",
+     r"\bclose to\b",
+     r"\bnearby\b",
+     r"\bwalking distance\b",
+     r"\bwithin \d+ ?(?:km|m|meters|miles|minutes)\b",
+     r"\baround\b",
+     r"\bproximity\b",
+     r"\bnext to\b",
+     r"\bbeside\b",
+     r"\bopposite\b",
+     r"\bacross from\b",
+     # French equivalents
+     r"\bprès de\b",
+     r"\bà côté de\b",
+     r"\bproche de\b",
+     r"\baux alentours\b",
+ ]
+
+ BOOLEAN_OR_PATTERNS = [
+     r"\bor\b",
+     r"\beither\b",
+     r"\balternatively\b",
+     r"\botherwise\b",
+     # French
+     r"\bou\b",
+     r"\bsoit\b",
+ ]
+
+ COMPARATIVE_PATTERNS = [
+     r"\bcompare\b",
+     r"\bvs\.?\b",
+     r"\bversus\b",
+     r"\bdifference between\b",
+     r"\bcheaper\b",
+     r"\bmore expensive\b",
+     r"\bbetter\b",
+     r"\bwhich is\b",
+     # French
+     r"\bcomparer\b",
+     r"\bentre\b",
+     r"\bmoins cher\b",
+     r"\bplus cher\b",
+ ]
+
+ AGGREGATION_PATTERNS = [
+     r"\baverage\b",
+     r"\bmean\b",
+     r"\btotal\b",
+     r"\bcount\b",
+     r"\bhow many\b",
+     r"\bsum\b",
+     r"\bstatistics\b",
+     r"\brange\b",
+     r"\bmin(?:imum)?\b",
+     r"\bmax(?:imum)?\b",
+     # French
+     r"\bmoyenne\b",
+     r"\bcombien\b",
+     r"\btotal\b",
+ ]
+
+ MULTI_FACTOR_PATTERNS = [
+     r"\bbest\b",
+     r"\btop\b",
+     r"\bideal\b",
+     r"\bperfect\b",
+     r"\brecommend\b",
+     r"\bsuitable\b",
+     r"\bfamily.?friendly\b",
+     r"\bsafe\b",
+     r"\bquiet\b",
+     r"\bpeaceful\b",
+     # Combined criteria indicators
+     r"\band\b.*\band\b",  # Multiple ANDs suggest multi-factor
+     # French
+     r"\bmeilleur\b",
+     r"\bidéal\b",
+     r"\brecommandé\b",
+     r"\bfamilial\b",
+     r"\bsécurisé\b",
+ ]
+
+ # Points of Interest that trigger multi-hop search
+ POI_KEYWORDS = [
+     # Education
+     "school", "university", "college", "campus", "école", "université",
+     # Health
+     "hospital", "clinic", "pharmacy", "hôpital", "clinique",
+     # Recreation
+     "beach", "park", "garden", "gym", "plage", "parc", "jardin",
+     # Shopping
+     "mall", "market", "supermarket", "marché", "supermarché",
+     # Transport
+     "airport", "station", "bus stop", "aéroport", "gare",
+     # Business
+     "downtown", "city center", "business district", "centre-ville",
+     # Landmarks
+     "mosque", "church", "cathedral", "mosquée", "église",
+ ]
+
+
+ def analyze_query_complexity(query: str) -> QueryAnalysis:
+     """
+     Analyze a search query to determine if it needs RLM processing.
+
+     Args:
+         query: User's search query
+
+     Returns:
+         QueryAnalysis with complexity type and recommendations
+     """
+     query_lower = query.lower()
+     detected_patterns = []
+     sub_query_hints = []
+     scores = {
+         QueryComplexity.MULTI_HOP: 0.0,
+         QueryComplexity.BOOLEAN_OR: 0.0,
+         QueryComplexity.COMPARATIVE: 0.0,
+         QueryComplexity.AGGREGATION: 0.0,
+         QueryComplexity.MULTI_FACTOR: 0.0,
+     }
+
+     # Check for multi-hop patterns
+     for pattern in MULTI_HOP_PATTERNS:
+         if re.search(pattern, query_lower, re.IGNORECASE):
+             scores[QueryComplexity.MULTI_HOP] += 0.4
+             detected_patterns.append(f"proximity: {pattern}")
+
+     # Check for POI keywords (boost multi-hop if found with proximity)
+     poi_found = []
+     for poi in POI_KEYWORDS:
+         if poi.lower() in query_lower:
+             poi_found.append(poi)
+             if scores[QueryComplexity.MULTI_HOP] > 0:
+                 scores[QueryComplexity.MULTI_HOP] += 0.3
+                 sub_query_hints.append(f"Find {poi} locations first")
+
+     if poi_found:
+         detected_patterns.append(f"POI: {', '.join(poi_found)}")
+
+     # Check for boolean OR patterns
+     for pattern in BOOLEAN_OR_PATTERNS:
+         if re.search(pattern, query_lower, re.IGNORECASE):
+             scores[QueryComplexity.BOOLEAN_OR] += 0.5
+             detected_patterns.append(f"boolean: {pattern}")
+
+     # Try to extract OR branches
+     parts = re.split(r'\bor\b|\bou\b', query_lower, flags=re.IGNORECASE)
+     if len(parts) > 1:
+         for i, part in enumerate(parts):
+             sub_query_hints.append(f"Branch {i+1}: {part.strip()}")
+
+     # Check for comparative patterns
+     for pattern in COMPARATIVE_PATTERNS:
+         if re.search(pattern, query_lower, re.IGNORECASE):
+             scores[QueryComplexity.COMPARATIVE] += 0.5
+             detected_patterns.append(f"comparative: {pattern}")
+
+     # Try to extract comparison subjects
+     vs_match = re.search(r'(\w+)\s+(?:vs\.?|versus|or)\s+(\w+)', query_lower)
+     if vs_match:
+         sub_query_hints.append(f"Compare: {vs_match.group(1)} vs {vs_match.group(2)}")
+
+     # Check for aggregation patterns
+     for pattern in AGGREGATION_PATTERNS:
+         if re.search(pattern, query_lower, re.IGNORECASE):
+             scores[QueryComplexity.AGGREGATION] += 0.5
+             detected_patterns.append(f"aggregation: {pattern}")
+             sub_query_hints.append("Fetch all matching listings, then aggregate")
+
+     # Check for multi-factor patterns
+     multi_factor_count = 0
+     for pattern in MULTI_FACTOR_PATTERNS:
+         if re.search(pattern, query_lower, re.IGNORECASE):
+             multi_factor_count += 1
+             detected_patterns.append(f"multi-factor: {pattern}")
+
+     # If 2+ factors detected, it's multi-factor
+     if multi_factor_count >= 2:
+         scores[QueryComplexity.MULTI_FACTOR] += 0.3 * multi_factor_count
+         sub_query_hints.append("Evaluate each factor separately, then combine scores")
+
+     # Determine dominant complexity type
+     max_score = max(scores.values())
+
+     if max_score < 0.3:
+         # Simple query - no RLM needed
+         return QueryAnalysis(
+             complexity=QueryComplexity.SIMPLE,
+             confidence=1.0 - max_score,
+             reasoning="No complex patterns detected, standard search sufficient",
+             detected_patterns=detected_patterns,
+             sub_query_hints=[],
+             use_rlm=False
+         )
+
+     # Find the complexity type with highest score
+     dominant_type = max(scores, key=scores.get)
+     confidence = min(scores[dominant_type], 1.0)
+
+     # Build reasoning
+     reasoning_map = {
+         QueryComplexity.MULTI_HOP: f"Query requires finding POI locations first, then searching nearby. POIs: {poi_found}",
+         QueryComplexity.BOOLEAN_OR: "Query has OR logic requiring separate searches and union",
+         QueryComplexity.COMPARATIVE: "Query requires searching multiple locations and comparing results",
+         QueryComplexity.AGGREGATION: "Query requires aggregating data across listings",
+         QueryComplexity.MULTI_FACTOR: f"Query has {multi_factor_count} ranking factors requiring weighted scoring",
+     }
+
+     logger.info(
+         "Query analyzed",
+         complexity=dominant_type.value,
+         confidence=confidence,
+         patterns=len(detected_patterns),
+         use_rlm=True
+     )
+
+     return QueryAnalysis(
+         complexity=dominant_type,
+         confidence=confidence,
+         reasoning=reasoning_map.get(dominant_type, "Complex query detected"),
+         detected_patterns=detected_patterns,
+         sub_query_hints=sub_query_hints,
+         use_rlm=True
+     )
+
+
+ async def should_use_rlm(query: str) -> bool:
+     """
+     Quick check if query should use RLM.
+
+     Returns True if query is complex enough for RLM.
+     """
+     analysis = analyze_query_complexity(query)
+     return analysis.use_rlm
+
+
+ # Export for use in other modules
+ __all__ = [
+     "QueryComplexity",
+     "QueryAnalysis",
+     "analyze_query_complexity",
+     "should_use_rlm"
+ ]
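The scoring idea behind `analyze_query_complexity` can be sketched in isolation. This is a trimmed toy version with only two pattern groups and the same 0.3 threshold; the real module adds French patterns, POI boosting, and a Pydantic result model:

```python
import re

# Two of the pattern groups from the analyzer, abbreviated
MULTI_HOP = [r"\bnear\b", r"\bclose to\b", r"\bwithin \d+ ?(?:km|m|minutes)\b"]
BOOLEAN_OR = [r"\bor\b", r"\beither\b"]

def classify(query: str) -> str:
    """Score each complexity type by matched patterns; pick the dominant one."""
    q = query.lower()
    scores = {
        "multi_hop": sum(0.4 for p in MULTI_HOP if re.search(p, q)),
        "boolean_or": sum(0.5 for p in BOOLEAN_OR if re.search(p, q)),
    }
    best = max(scores, key=scores.get)
    # Below the 0.3 threshold the query falls through to a standard search
    return best if scores[best] >= 0.3 else "simple"

print(classify("3-bed apartment near schools"))    # multi_hop
print(classify("under 500k or has a pool"))        # boolean_or
print(classify("2-bedroom apartment in Cotonou"))  # simple
```

The word boundaries (`\b`) matter here: without them, `\bor\b` would fire on words like "floor" or "bedroom" and misroute ordinary queries into the OR handler.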
app/ai/services/rlm_search_service.py ADDED
@@ -0,0 +1,1202 @@
+ # app/ai/services/rlm_search_service.py
+ """
+ RLM (Recursive Language Model) Search Service for AIDA.
+
+ Implements multi-hop reasoning for complex search queries using
+ recursive decomposition and aggregation.
+
+ Key Features:
+ - Multi-hop proximity search ("near schools", "close to beach")
+ - Boolean OR query handling ("under 500k OR has pool")
+ - Comparative analysis ("compare Cotonou vs Calavi")
+ - Aggregation queries ("average price in Cotonou")
+ - Multi-factor ranking ("best family apartment near schools and parks")
+
+ Uses existing DeepSeek LLM (brain_llm) - no additional infrastructure needed.
+ """
+
+ import json
+ import asyncio
+ from typing import Dict, List, Any, Optional, Tuple
+ from structlog import get_logger
+ from langchain_openai import ChatOpenAI
+ from langchain_core.messages import SystemMessage, HumanMessage
+
+ from app.config import settings
+ from app.ai.services.rlm_query_analyzer import (
+     QueryComplexity,
+     QueryAnalysis,
+     analyze_query_complexity
+ )
+
+ logger = get_logger(__name__)
+
+
+ # Use existing DeepSeek LLM configuration
+ rlm_llm = ChatOpenAI(
+     api_key=settings.DEEPSEEK_API_KEY,
+     base_url=settings.DEEPSEEK_BASE_URL,
+     model="deepseek-chat",
+     temperature=0.3,  # Lower temp for more deterministic decomposition
+ )
+
+
+ # =============================================================================
+ # RLM CORE: Recursive Search Agent
+ # =============================================================================
+
+ class RLMSearchAgent:
+     """
+     Recursive Language Model Search Agent.
+
+     Decomposes complex queries into sub-queries, executes them recursively,
+     and aggregates results using LLM reasoning.
+
+     Example:
+         Query: "3-bed apartment near international schools in Cotonou under 500k"
+
+         RLM Flow:
+         1. Decompose: ["Find schools in Cotonou", "Find 3-bed under 500k near schools"]
+         2. Execute: Search schools → Get coordinates → Search apartments nearby
+         3. Aggregate: Rank by proximity to schools
+     """
+
+     def __init__(self):
+         self.llm = rlm_llm
+         self.max_depth = 3
+         self.call_count = 0
+         self.search_cache = {}  # Cache sub-query results
+
+     async def search(
+         self,
+         query: str,
+         context: Optional[Dict] = None,
+         analysis: Optional[QueryAnalysis] = None
+     ) -> Dict[str, Any]:
+         """
+         Main entry point for RLM search.
+
+         Args:
+             query: User's search query
+             context: Optional context (user location, previous results, etc.)
+             analysis: Optional pre-computed query analysis
+
+         Returns:
+             Dict with:
+             - results: List of matching listings
+             - strategy_used: RLM strategy name
+             - reasoning_steps: List of reasoning steps taken
+             - call_count: Number of LLM calls made
+         """
+         self.call_count = 0
+
+         # Analyze query if not provided
+         if analysis is None:
+             analysis = analyze_query_complexity(query)
+
+         logger.info(
+             "RLM search started",
+             query=query[:50],
+             complexity=analysis.complexity.value,
+             confidence=analysis.confidence
+         )
+
+         # Route to appropriate handler based on complexity
+         handler_map = {
+             QueryComplexity.MULTI_HOP: self._handle_multi_hop,
+             QueryComplexity.BOOLEAN_OR: self._handle_boolean_or,
+             QueryComplexity.COMPARATIVE: self._handle_comparative,
+             QueryComplexity.AGGREGATION: self._handle_aggregation,
+             QueryComplexity.MULTI_FACTOR: self._handle_multi_factor,
+             QueryComplexity.SIMPLE: self._handle_simple,
+         }
+
+         handler = handler_map.get(analysis.complexity, self._handle_simple)
+
+         try:
+             results = await handler(query, context or {}, analysis)
+
+             logger.info(
+                 "RLM search complete",
+                 query=query[:50],
+                 result_count=len(results.get("results", [])),
+                 call_count=self.call_count
+             )
+
+             return {
+                 **results,
+                 "strategy_used": f"RLM_{analysis.complexity.value.upper()}",
+                 "call_count": self.call_count,
+                 "analysis": analysis.model_dump()
+             }
+
+         except Exception as e:
+             logger.error("RLM search failed", error=str(e), query=query[:50])
+             # Fallback to simple search
+             return await self._handle_simple(query, context or {}, analysis)
+
+     # =========================================================================
+     # Handler: Multi-hop Queries ("near X", "close to Y")
+     # =========================================================================
+
+     async def _handle_multi_hop(
+         self,
+         query: str,
+         context: Dict,
+         analysis: QueryAnalysis
+     ) -> Dict[str, Any]:
+         """
+         Handle multi-hop proximity queries.
+
+         Example: "3-bed apartment near international schools in Cotonou"
+
+         Steps:
+         1. Extract POI type (schools) and location (Cotonou)
+         2. Find POI coordinates (schools in Cotonou)
+         3. Search listings near POI coordinates
+         4. Rank by proximity
+         """
+         reasoning_steps = []
+
+         # Step 1: Decompose query to extract POI and criteria
+         decomposition_prompt = f"""
+ Analyze this real estate search query and extract the proximity components:
+
+ Query: "{query}"
+
+ Extract:
+ 1. POI (Point of Interest) type: What the user wants to be near (school, beach, park, etc.)
+ 2. Location: The city/area being searched
+ 3. Listing criteria: bedrooms, price, amenities, etc.
+
+ Return JSON:
+ {{
+     "poi_type": "school" or "beach" or "park" or "hospital" or "market" or "airport" or null,
+     "poi_name": "specific name if mentioned" or null,
+     "location": "city or area name",
+     "listing_criteria": {{
+         "bedrooms": number or null,
+         "max_price": number or null,
+         "min_price": number or null,
+         "amenities": ["list"] or [],
+         "listing_type": "rent" or "sale" or null
+     }},
+     "proximity_km": 2  // default proximity radius in km
+ }}
+ """
+         self.call_count += 1
+         decomp_response = await self.llm.ainvoke([
+             HumanMessage(content=decomposition_prompt)
+         ])
+
+         try:
+             decomposition = self._extract_json(decomp_response.content)
+         except Exception:
+             logger.error("Failed to parse decomposition, falling back to simple search")
+             return await self._handle_simple(query, context, analysis)
+
+         reasoning_steps.append({
+             "step": "decomposition",
+             "result": decomposition
+         })
+
+         poi_type = decomposition.get("poi_type")
+         location = decomposition.get("location")
+         criteria = decomposition.get("listing_criteria", {})
+         proximity_km = decomposition.get("proximity_km", 2)
+
+         # Step 2: Find POI coordinates
+         poi_locations = []
+         if poi_type and location:
+             poi_locations = await self._find_poi_locations(
+                 poi_type,
+                 location,
+                 decomposition.get("poi_name")
+             )
+             reasoning_steps.append({
+                 "step": "find_poi",
+                 "poi_type": poi_type,
+                 "location": location,
+                 "found": len(poi_locations)
+             })
+
+         # Step 3: Search listings near POI locations
+         if poi_locations:
+             # Search near each POI and aggregate
+             all_listings = []
+             for poi in poi_locations[:3]:  # Limit to top 3 POIs
+                 nearby_listings = await self._search_near_coordinates(
+                     lat=poi["lat"],
+                     lon=poi["lon"],
+                     radius_km=proximity_km,
+                     criteria=criteria,
+                     location=location
+                 )
+                 # Add distance info to each listing
+                 for listing in nearby_listings:
+                     listing["_poi_name"] = poi.get("name", poi_type)
+                     listing["_distance_km"] = self._calculate_distance(
+                         poi["lat"], poi["lon"],
+                         listing.get("latitude"), listing.get("longitude")
+                     )
+                 all_listings.extend(nearby_listings)
+
+             # Deduplicate by listing ID
+             seen_ids = set()
+             unique_listings = []
+             for listing in all_listings:
+                 lid = str(listing.get("_id") or listing.get("mongo_id"))
+                 if lid not in seen_ids:
+                     seen_ids.add(lid)
+                     unique_listings.append(listing)
+
+             # Sort by distance
+             unique_listings.sort(key=lambda x: x.get("_distance_km", 999))
+
+             reasoning_steps.append({
+                 "step": "proximity_search",
+                 "poi_count": len(poi_locations),
+                 "listings_found": len(unique_listings)
+             })
+
+             return {
+                 "results": unique_listings[:10],
+                 "reasoning_steps": reasoning_steps,
+                 "message": f"Found {len(unique_listings)} listings near {poi_type}s in {location}"
+             }
+
+         else:
+             # No POI found, fall back to semantic search with location
+             logger.warning("No POI locations found, using semantic search")
+             return await self._semantic_search_with_criteria(query, location, criteria)
+
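The deduplicate-and-rank step at the end of the multi-hop handler can be exercised on its own. The listing dicts below are hypothetical stand-ins for the MongoDB documents the real code receives:

```python
def dedupe_and_rank(listings: list[dict]) -> list[dict]:
    """Deduplicate listings by id, then sort by distance to the POI."""
    seen, unique = set(), []
    for listing in listings:
        lid = str(listing.get("_id") or listing.get("mongo_id"))
        if lid not in seen:
            seen.add(lid)
            unique.append(listing)
    # Listings without a computed distance sink to the bottom (999 sentinel)
    unique.sort(key=lambda x: x.get("_distance_km", 999))
    return unique

listings = [
    {"_id": "a", "_distance_km": 1.8},
    {"_id": "b", "_distance_km": 0.4},
    {"_id": "a", "_distance_km": 1.8},  # duplicate from a second POI search
    {"_id": "c"},                       # no coordinates, so no distance
]
print([l["_id"] for l in dedupe_and_rank(listings)])  # ['b', 'a', 'c']
```

Deduplication matters here because the handler searches near up to three POIs, so the same listing can appear in several of the per-POI result sets.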
273
+ # =========================================================================
274
+ # Handler: Boolean OR Queries
275
+ # =========================================================================
276
+
277
+ async def _handle_boolean_or(
278
+ self,
279
+ query: str,
280
+ context: Dict,
281
+ analysis: QueryAnalysis
282
+ ) -> Dict[str, Any]:
283
+ """
284
+ Handle queries with OR logic.
285
+
286
+ Example: "Under 500k XOF OR (2-bedroom AND has pool)"
287
+
288
+ Steps:
289
+ 1. Parse OR branches
290
+ 2. Execute each branch in parallel
291
+ 3. Union results
292
+ """
293
+ reasoning_steps = []
294
+
295
+ # Step 1: Parse OR branches
296
+ parse_prompt = f"""
297
+ Parse this real estate query into separate OR branches:
298
+
299
+ Query: "{query}"
300
+
301
+ Return JSON:
302
+ {{
303
+ "branches": [
304
+ {{
305
+ "description": "human-readable description",
306
+ "criteria": {{
307
+ "location": "city" or null,
308
+ "max_price": number or null,
309
+ "min_price": number or null,
310
+ "bedrooms": number or null,
311
+ "amenities": ["list"] or [],
312
+ "listing_type": "rent" or "sale" or null
313
+ }}
314
+ }}
315
+ ],
316
+ "shared_criteria": {{
317
+ // Criteria that apply to ALL branches (e.g., location)
318
+ "location": "city" or null
319
+ }}
320
+ }}
321
+
322
+ Example for "Under 500k OR (2-bed AND pool) in Cotonou":
323
+ {{
324
+ "branches": [
325
+ {{"description": "Under 500k", "criteria": {{"max_price": 500000}}}},
326
+ {{"description": "2-bed with pool", "criteria": {{"bedrooms": 2, "amenities": ["pool"]}}}}
327
+ ],
328
+ "shared_criteria": {{"location": "Cotonou"}}
329
+ }}
330
+ """
331
+ self.call_count += 1
332
+ parse_response = await self.llm.ainvoke([
333
+ HumanMessage(content=parse_prompt)
334
+ ])
335
+
336
+ try:
337
+ parsed = self._extract_json(parse_response.content)
338
+ except Exception:
339
+ logger.error("Failed to parse OR branches")
340
+ return await self._handle_simple(query, context, analysis)
341
+
342
+ branches = parsed.get("branches", [])
343
+ shared = parsed.get("shared_criteria", {})
344
+
345
+ reasoning_steps.append({
346
+ "step": "parse_or_branches",
347
+ "branch_count": len(branches),
348
+ "shared_criteria": shared
349
+ })
350
+
351
+ # Step 2: Execute each branch in parallel
352
+ async def execute_branch(branch: Dict) -> List[Dict]:
353
+ criteria = {**shared, **branch.get("criteria", {})}
354
+ return await self._execute_criteria_search(criteria)
355
+
356
+ branch_results = await asyncio.gather(
357
+ *[execute_branch(b) for b in branches]
358
+ )
359
+
360
+ # Step 3: Union results (deduplicate)
361
+ seen_ids = set()
362
+ union_results = []
363
+ for i, results in enumerate(branch_results):
364
+ reasoning_steps.append({
365
+ "step": f"branch_{i+1}",
366
+ "description": branches[i].get("description"),
367
+ "results_count": len(results)
368
+ })
369
+ for listing in results:
370
+ lid = str(listing.get("_id") or listing.get("mongo_id"))
371
+ if lid not in seen_ids:
372
+ seen_ids.add(lid)
373
+ listing["_matched_branch"] = branches[i].get("description")
374
+ union_results.append(listing)
375
+
376
+ reasoning_steps.append({
377
+ "step": "union",
378
+ "total_unique": len(union_results)
379
+ })
380
+
381
+ return {
382
+ "results": union_results[:10],
383
+ "reasoning_steps": reasoning_steps,
384
+ "message": f"Found {len(union_results)} listings matching any of {len(branches)} criteria"
385
+ }
386
+
387
+ # =========================================================================
388
+ # Handler: Comparative Queries
389
+ # =========================================================================
390
+
391
+ async def _handle_comparative(
392
+ self,
393
+ query: str,
394
+ context: Dict,
395
+ analysis: QueryAnalysis
396
+ ) -> Dict[str, Any]:
397
+ """
398
+ Handle comparative queries.
399
+
400
+ Example: "Compare average prices in Cotonou vs Calavi"
401
+
402
+ Steps:
403
+ 1. Extract comparison subjects and metrics
404
+ 2. Search each subject
405
+ 3. Calculate and compare metrics
406
+ """
407
+ reasoning_steps = []
408
+
409
+ # Step 1: Parse comparison
410
+ compare_prompt = f"""
411
+ Parse this comparative real estate query:
412
+
413
+ Query: "{query}"
414
+
415
+ Return JSON:
416
+ {{
417
+ "subjects": [
418
+ {{"name": "Cotonou", "type": "location"}},
419
+ {{"name": "Calavi", "type": "location"}}
420
+ ],
421
+ "metric": "average_price" or "count" or "price_range",
422
+ "listing_criteria": {{
423
+ "bedrooms": number or null,
424
+ "listing_type": "rent" or "sale" or null
425
+ }}
426
+ }}
427
+ """
428
+ self.call_count += 1
429
+ compare_response = await self.llm.ainvoke([
430
+ HumanMessage(content=compare_prompt)
431
+ ])
432
+
433
+ try:
434
+ comparison = self._extract_json(compare_response.content)
435
+ except Exception:
436
+ return await self._handle_simple(query, context, analysis)
437
+
438
+ subjects = comparison.get("subjects", [])
439
+ metric = comparison.get("metric", "average_price")
440
+ criteria = comparison.get("listing_criteria", {})
441
+
442
+ reasoning_steps.append({
443
+ "step": "parse_comparison",
444
+ "subjects": [s["name"] for s in subjects],
445
+ "metric": metric
446
+ })
447
+
448
+ # Step 2: Search each subject
449
+ subject_results = []
450
+ for subject in subjects:
451
+ search_criteria = {**criteria, "location": subject["name"]}
452
+ listings = await self._execute_criteria_search(search_criteria, limit=50)
453
+ subject_results.append({
454
+ "name": subject["name"],
455
+ "listings": listings,
456
+ "count": len(listings)
457
+ })
458
+
459
+ # Step 3: Calculate metrics
460
+ for result in subject_results:
461
+ listings = result["listings"]
462
+ if listings:
463
+ prices = [l.get("price", 0) for l in listings if l.get("price")]
464
+ result["avg_price"] = sum(prices) / len(prices) if prices else 0
465
+ result["min_price"] = min(prices) if prices else 0
466
+ result["max_price"] = max(prices) if prices else 0
467
+ else:
468
+ result["avg_price"] = 0
469
+ result["min_price"] = 0
470
+ result["max_price"] = 0
471
+
472
+ reasoning_steps.append({
473
+ "step": f"metrics_{result['name']}",
474
+ "count": result["count"],
475
+ "avg_price": result["avg_price"]
476
+ })
477
+
478
+ # Step 4: Generate comparison summary
479
+ summary = await self._generate_comparison_summary(subject_results, metric)
480
+
481
+ # Return top listings from each subject
482
+ combined_results = []
483
+ for result in subject_results:
484
+ for listing in result["listings"][:5]:
485
+ listing["_comparison_group"] = result["name"]
486
+ combined_results.append(listing)
487
+
488
+ return {
489
+ "results": combined_results[:10],
490
+ "reasoning_steps": reasoning_steps,
491
+ "comparison_data": subject_results,
492
+ "message": summary
493
+ }
494
+
+    # =========================================================================
+    # Handler: Aggregation Queries
+    # =========================================================================
+
+    async def _handle_aggregation(
+        self,
+        query: str,
+        context: Dict,
+        analysis: QueryAnalysis
+    ) -> Dict[str, Any]:
+        """
+        Handle aggregation queries (average, count, etc.)
+        """
+        reasoning_steps = []
+
+        # Parse aggregation request
+        agg_prompt = f"""
+        Parse this aggregation query:
+
+        Query: "{query}"
+
+        Return JSON:
+        {{
+            "aggregation_type": "average" or "count" or "sum" or "min" or "max",
+            "field": "price" or "bedrooms",
+            "filters": {{
+                "location": "city" or null,
+                "listing_type": "rent" or "sale" or null
+            }}
+        }}
+        """
+        self.call_count += 1
+        agg_response = await self.llm.ainvoke([
+            HumanMessage(content=agg_prompt)
+        ])
+
+        try:
+            aggregation = self._extract_json(agg_response.content)
+        except Exception:
+            return await self._handle_simple(query, context, analysis)
+
+        agg_type = aggregation.get("aggregation_type", "count")
+        field = aggregation.get("field", "price")
+        filters = aggregation.get("filters", {})
+
+        # Fetch listings
+        listings = await self._execute_criteria_search(filters, limit=100)
+
+        # Calculate aggregation
+        values = [l.get(field, 0) for l in listings if l.get(field) is not None]
+
+        result = 0
+        if agg_type == "count":
+            result = len(listings)
+        elif agg_type == "average" and values:
+            result = sum(values) / len(values)
+        elif agg_type == "sum":
+            result = sum(values)
+        elif agg_type == "min" and values:
+            result = min(values)
+        elif agg_type == "max" and values:
+            result = max(values)
+
+        reasoning_steps.append({
+            "step": "aggregation",
+            "type": agg_type,
+            "field": field,
+            "sample_size": len(listings),
+            "result": result
+        })
+
+        location = filters.get("location", "all areas")
+        message = f"The {agg_type} {field} in {location} is {result:,.0f}"
+
+        return {
+            "results": listings[:10],
+            "reasoning_steps": reasoning_steps,
+            "aggregation_result": {
+                "type": agg_type,
+                "field": field,
+                "value": result,
+                "sample_size": len(listings)
+            },
+            "message": message
+        }
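The aggregation dispatch in the handler above can be isolated into a small function for clarity. This is a sketch mirroring the diff's branch logic (the helper name `aggregate` is illustrative); note that `count` counts all fetched listings, while the other operations only consider listings where the field is non-null:

```python
def aggregate(listings, agg_type, field):
    """Dispatch one aggregation over a list of listing dicts."""
    values = [l.get(field, 0) for l in listings if l.get(field) is not None]
    if agg_type == "count":
        return len(listings)          # counts every listing, even without the field
    if agg_type == "average" and values:
        return sum(values) / len(values)
    if agg_type == "sum":
        return sum(values)
    if agg_type == "min" and values:
        return min(values)
    if agg_type == "max" and values:
        return max(values)
    return 0                          # empty sample or unknown type

rows = [{"price": 100}, {"price": 300}, {"bedrooms": 2}]
```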
+
+    # =========================================================================
+    # Handler: Multi-Factor Queries
+    # =========================================================================
+
+    async def _handle_multi_factor(
+        self,
+        query: str,
+        context: Dict,
+        analysis: QueryAnalysis
+    ) -> Dict[str, Any]:
+        """
+        Handle multi-factor ranking queries.
+
+        Example: "Best family apartment near schools and parks, safe area"
+
+        Steps:
+        1. Extract ranking factors
+        2. Score each factor
+        3. Combine scores with weights
+        """
+        reasoning_steps = []
+
+        # Parse factors
+        factor_prompt = f"""
+        Extract ranking factors from this query:
+
+        Query: "{query}"
+
+        Return JSON:
+        {{
+            "location": "city" or null,
+            "base_criteria": {{
+                "bedrooms": number or null,
+                "max_price": number or null
+            }},
+            "ranking_factors": [
+                {{"factor": "school_proximity", "weight": 0.3}},
+                {{"factor": "park_proximity", "weight": 0.2}},
+                {{"factor": "safety", "weight": 0.3}},
+                {{"factor": "family_friendly", "weight": 0.2}}
+            ]
+        }}
+
+        Available factors:
+        - school_proximity: Near schools
+        - park_proximity: Near parks
+        - beach_proximity: Near beach
+        - safety: Safe neighborhood
+        - family_friendly: Family-friendly amenities
+        - luxury: Luxury amenities
+        - modern: Modern/renovated
+        - quiet: Quiet/peaceful area
+        """
+        self.call_count += 1
+        factor_response = await self.llm.ainvoke([
+            HumanMessage(content=factor_prompt)
+        ])
+
+        try:
+            factors = self._extract_json(factor_response.content)
+        except Exception:
+            return await self._handle_simple(query, context, analysis)
+
+        location = factors.get("location")
+        base_criteria = factors.get("base_criteria", {})
+        ranking_factors = factors.get("ranking_factors", [])
+
+        reasoning_steps.append({
+            "step": "extract_factors",
+            "location": location,
+            "factor_count": len(ranking_factors)
+        })
+
+        # Get base listings
+        search_criteria = {**base_criteria}
+        if location:
+            search_criteria["location"] = location
+
+        listings = await self._execute_criteria_search(search_criteria, limit=30)
+
+        if not listings:
+            return {
+                "results": [],
+                "reasoning_steps": reasoning_steps,
+                "message": f"No listings found in {location}"
+            }
+
+        # Score each listing on each factor
+        for listing in listings:
+            total_score = 0
+            factor_scores = {}
+
+            for factor_info in ranking_factors:
+                factor = factor_info["factor"]
+                weight = factor_info.get("weight", 0.25)
+
+                score = await self._score_factor(listing, factor, location)
+                factor_scores[factor] = score
+                total_score += score * weight
+
+            listing["_factor_scores"] = factor_scores
+            listing["_total_score"] = total_score
+
+        # Sort by total score
+        listings.sort(key=lambda x: x.get("_total_score", 0), reverse=True)
+
+        reasoning_steps.append({
+            "step": "scoring",
+            "listings_scored": len(listings),
+            "top_score": listings[0].get("_total_score") if listings else 0
+        })
+
+        return {
+            "results": listings[:10],
+            "reasoning_steps": reasoning_steps,
+            "message": f"Found {len(listings)} listings ranked by {len(ranking_factors)} factors"
+        }
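The score-combination step above is a plain weighted sum. A minimal sketch, assuming per-factor scores in [0, 1] (the helper name `combine_scores` is illustrative):

```python
def combine_scores(factor_scores, ranking_factors):
    """Weighted sum of per-factor scores; missing factors default to a neutral 0.5,
    missing weights to 0.25, matching the handler's defaults."""
    total = 0.0
    for info in ranking_factors:
        weight = info.get("weight", 0.25)
        total += factor_scores.get(info["factor"], 0.5) * weight
    return total

factors = [{"factor": "safety", "weight": 0.6}, {"factor": "modern", "weight": 0.4}]
score = combine_scores({"safety": 1.0, "modern": 0.5}, factors)
```

Because the weights in the prompt are asked to sum to roughly 1.0, the combined score stays in [0, 1] and listings can be sorted on it directly.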
+
+    # =========================================================================
+    # Handler: Simple Queries (Fallback)
+    # =========================================================================
+
+    async def _handle_simple(
+        self,
+        query: str,
+        context: Dict,
+        analysis: QueryAnalysis
+    ) -> Dict[str, Any]:
+        """
+        Fallback handler for simple queries - uses existing hybrid search.
+        """
+        from app.ai.services.search_service import hybrid_search
+        from app.ai.services.search_extractor import extract_search_params
+
+        params = await extract_search_params(query)
+        results = await hybrid_search(
+            query_text=query,
+            search_params=params,
+            limit=10
+        )
+
+        return {
+            "results": results,
+            "reasoning_steps": [{"step": "simple_search", "params": params}],
+            "message": f"Found {len(results)} listings"
+        }
+
+    # =========================================================================
+    # Helper Methods
+    # =========================================================================
+
+    async def _find_poi_locations(
+        self,
+        poi_type: str,
+        location: str,
+        specific_name: Optional[str] = None
+    ) -> List[Dict]:
+        """
+        Find POI (Point of Interest) locations using OpenStreetMap.
+
+        Uses FREE OpenStreetMap APIs:
+        - Nominatim: Geocoding (location name → coordinates)
+        - Overpass: POI search (find schools, hospitals, parks near a location)
+
+        Args:
+            poi_type: Type of POI (school, hospital, beach, park, etc.)
+            location: City/area name (e.g., "Cotonou, Benin")
+            specific_name: Optional specific POI name to search for
+
+        Returns:
+            List of POI dicts with name, lat, lon, type
+        """
+        try:
+            from app.ai.services.osm_poi_service import find_pois
+
+            # Use OSM to get real POI locations
+            search_type = specific_name if specific_name else poi_type
+
+            logger.info(
+                "Finding POIs via OpenStreetMap",
+                poi_type=search_type,
+                location=location
+            )
+
+            pois = await find_pois(
+                poi_type=search_type,
+                location=location,
+                radius_km=5,  # Search within 5km of location center
+                limit=5  # Get top 5 POIs
+            )
+
+            if pois:
+                logger.info(
+                    "OSM POIs found",
+                    count=len(pois),
+                    poi_type=poi_type,
+                    location=location
+                )
+                return pois
+
+            logger.warning(
+                "No OSM POIs found, trying with broader search",
+                poi_type=poi_type,
+                location=location
+            )
+
+            # Try with just the POI type if specific name returned nothing
+            if specific_name:
+                pois = await find_pois(
+                    poi_type=poi_type,
+                    location=location,
+                    radius_km=10,  # Expand radius
+                    limit=5
+                )
+                if pois:
+                    return pois
+
+        except ImportError:
+            logger.error("OSM POI service not available")
+        except Exception as e:
+            logger.error(f"OSM POI search failed: {e}")
+
+        # Fallback: Use LLM to estimate coordinates (less accurate)
+        logger.warning("Falling back to LLM POI estimation")
+        return await self._fallback_llm_poi_locations(poi_type, location, specific_name)
+
+    async def _fallback_llm_poi_locations(
+        self,
+        poi_type: str,
+        location: str,
+        specific_name: Optional[str] = None
+    ) -> List[Dict]:
+        """
+        Fallback: Use LLM to estimate POI coordinates when OSM fails.
+
+        Note: This is less accurate than OSM data and should only be used as fallback.
+        """
+        poi_prompt = f"""
+        You are a geolocation assistant. Provide approximate coordinates for {poi_type}s in {location}.
+
+        {f"Specifically looking for: {specific_name}" if specific_name else ""}
+
+        Return JSON array of up to 3 POIs:
+        [
+            {{"name": "POI name", "lat": 6.3654, "lon": 2.4183, "type": "{poi_type}"}}
+        ]
+
+        Use realistic coordinates for {location}. If you don't know exact coordinates,
+        provide approximate city center coordinates for {location}.
+        """
+        self.call_count += 1
+        poi_response = await self.llm.ainvoke([
+            HumanMessage(content=poi_prompt)
+        ])
+
+        try:
+            pois = self._extract_json(poi_response.content)
+            return pois if isinstance(pois, list) else []
+        except Exception:
+            logger.error("Failed to get POI locations from LLM fallback")
+            return []
+
+    async def _search_near_coordinates(
+        self,
+        lat: float,
+        lon: float,
+        radius_km: float,
+        criteria: Dict,
+        location: str
+    ) -> List[Dict]:
+        """
+        Search listings near specific coordinates.
+
+        Uses MongoDB geospatial query if listings have lat/lon,
+        otherwise falls back to location-based search.
+        """
+        from app.database import get_db
+
+        try:
+            db = await get_db()
+
+            # Build geo query
+            # Note: This requires a 2dsphere index on listings collection
+            # db.listings.create_index([("location_geo", "2dsphere")])
+
+            geo_query = {
+                "status": "active"
+            }
+
+            # Add criteria filters
+            if criteria.get("bedrooms"):
+                geo_query["bedrooms"] = {"$gte": criteria["bedrooms"]}
+            if criteria.get("max_price"):
+                geo_query["price"] = {"$lte": criteria["max_price"]}
+            if criteria.get("min_price"):
+                if "price" in geo_query:
+                    geo_query["price"]["$gte"] = criteria["min_price"]
+                else:
+                    geo_query["price"] = {"$gte": criteria["min_price"]}
+            if criteria.get("listing_type"):
+                geo_query["listing_type"] = {"$regex": criteria["listing_type"], "$options": "i"}
+
+            # Try geospatial query first
+            if lat and lon:
+                # Convert km to meters for MongoDB
+                radius_meters = radius_km * 1000
+
+                geo_query["$or"] = [
+                    # Check if listing has coordinates
+                    {
+                        "latitude": {"$exists": True, "$ne": None},
+                        "longitude": {"$exists": True, "$ne": None}
+                    }
+                ]
+
+                # Fetch listings and filter by distance in Python
+                # (More flexible than requiring 2dsphere index)
+                cursor = db.listings.find(geo_query).limit(50)
+                listings = await cursor.to_list(length=50)
+
+                # Filter by distance
+                nearby = []
+                for listing in listings:
+                    if listing.get("latitude") and listing.get("longitude"):
+                        dist = self._calculate_distance(
+                            lat, lon,
+                            listing["latitude"], listing["longitude"]
+                        )
+                        if dist <= radius_km:
+                            listing["_id"] = str(listing["_id"])
+                            nearby.append(listing)
+                    # Also include listings in the same location (fallback)
+                    elif location and location.lower() in str(listing.get("location", "")).lower():
+                        listing["_id"] = str(listing["_id"])
+                        nearby.append(listing)
+
+                return nearby
+
+            else:
+                # No coordinates, search by location name
+                if location:
+                    geo_query["location"] = {"$regex": location, "$options": "i"}
+
+                cursor = db.listings.find(geo_query).limit(20)
+                listings = await cursor.to_list(length=20)
+
+                for listing in listings:
+                    listing["_id"] = str(listing["_id"])
+
+                return listings
+
+        except Exception as e:
+            logger.error("Geo search failed", error=str(e))
+            return []
+
+    async def _execute_criteria_search(
+        self,
+        criteria: Dict,
+        limit: int = 20
+    ) -> List[Dict]:
+        """
+        Execute a search with given criteria using existing search infrastructure.
+        """
+        from app.ai.services.search_service import search_mongodb
+
+        results = await search_mongodb(criteria, limit=limit)
+        return results
+
+    async def _semantic_search_with_criteria(
+        self,
+        query: str,
+        location: str,
+        criteria: Dict
+    ) -> Dict[str, Any]:
+        """
+        Semantic search with additional criteria.
+        """
+        from app.ai.services.search_service import hybrid_search
+
+        search_params = {**criteria}
+        if location:
+            search_params["location"] = location
+
+        results = await hybrid_search(
+            query_text=query,
+            search_params=search_params,
+            limit=10
+        )
+
+        return {
+            "results": results,
+            "reasoning_steps": [{"step": "semantic_fallback"}],
+            "message": f"Found {len(results)} listings in {location}"
+        }
+
+    async def _generate_comparison_summary(
+        self,
+        subject_results: List[Dict],
+        metric: str
+    ) -> str:
+        """
+        Generate a natural language comparison summary.
+        """
+        if len(subject_results) < 2:
+            return "Not enough data for comparison"
+
+        s1, s2 = subject_results[0], subject_results[1]
+
+        if metric == "average_price":
+            diff = abs(s1["avg_price"] - s2["avg_price"])
+            cheaper = s1["name"] if s1["avg_price"] < s2["avg_price"] else s2["name"]
+            pct = (diff / max(s1["avg_price"], s2["avg_price"]) * 100) if max(s1["avg_price"], s2["avg_price"]) > 0 else 0
+
+            return (
+                f"Average prices: {s1['name']}: {s1['avg_price']:,.0f} XOF | "
+                f"{s2['name']}: {s2['avg_price']:,.0f} XOF. "
+                f"{cheaper} is {pct:.0f}% cheaper."
+            )
+        else:
+            return f"Comparison: {s1['name']} ({s1['count']} listings) vs {s2['name']} ({s2['count']} listings)"
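The percentage computed in `_generate_comparison_summary` is the price gap relative to the more expensive side, with a guard against division by zero. A minimal sketch (the function name `price_gap_percent` is illustrative):

```python
def price_gap_percent(avg_a, avg_b):
    """Gap between two average prices, as a percentage of the higher one."""
    hi = max(avg_a, avg_b)
    if hi == 0:
        return 0.0  # no price data on either side
    return abs(avg_a - avg_b) / hi * 100

gap = price_gap_percent(400_000, 500_000)
```

Dividing by the larger average keeps the result in [0, 100] and matches the "X is N% cheaper" phrasing; dividing by the smaller one would instead express how much more expensive the other side is.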
+
+    async def _score_factor(
+        self,
+        listing: Dict,
+        factor: str,
+        location: str
+    ) -> float:
+        """
+        Score a listing on a specific factor (0-1).
+
+        Uses:
+        - OpenStreetMap for proximity calculations (school_proximity, park_proximity, etc.)
+        - Text analysis for non-proximity factors (safety, luxury, modern, etc.)
+        """
+        # Proximity factors - use OSM for actual distance calculation
+        proximity_factors = {
+            "school_proximity": "school",
+            "park_proximity": "park",
+            "beach_proximity": "beach",
+            "hospital_proximity": "hospital",
+            "market_proximity": "market",
+        }
+
+        # Check if this is a proximity factor and listing has coordinates
+        if factor in proximity_factors and listing.get("latitude") and listing.get("longitude"):
+            poi_type = proximity_factors[factor]
+            return await self._score_proximity_factor(
+                listing=listing,
+                poi_type=poi_type,
+                location=location
+            )
+
+        # Non-proximity factors - use text analysis
+        score = 0.5  # Default neutral score
+
+        title = str(listing.get("title", "")).lower()
+        description = str(listing.get("description", "")).lower()
+        amenities = [a.lower() for a in listing.get("amenities", [])]
+        text = f"{title} {description} {' '.join(amenities)}"
+
+        factor_keywords = {
+            "school_proximity": ["school", "école", "university", "campus", "education"],
+            "park_proximity": ["park", "garden", "parc", "jardin", "green"],
+            "beach_proximity": ["beach", "plage", "ocean", "sea", "waterfront"],
+            "safety": ["safe", "secure", "security", "sécurité", "gated", "guard"],
+            "family_friendly": ["family", "children", "kids", "playground", "familial"],
+            "luxury": ["luxury", "luxe", "premium", "high-end", "prestige", "elegant"],
+            "modern": ["modern", "new", "renovated", "contemporary", "neuf"],
+            "quiet": ["quiet", "peaceful", "calm", "tranquil", "calme"],
+        }
+
+        keywords = factor_keywords.get(factor, [])
+        matches = sum(1 for kw in keywords if kw in text)
+
+        if matches > 0:
+            score = min(0.5 + (matches * 0.2), 1.0)
+
+        return score
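The text-analysis branch above boils down to a keyword hit count mapped onto a score: 0.5 baseline, +0.2 per matching keyword, capped at 1.0. A minimal sketch (the helper name `keyword_score` is illustrative):

```python
def keyword_score(text, keywords):
    """Substring-based factor score: 0.5 neutral, +0.2 per keyword hit, capped at 1.0."""
    matches = sum(1 for kw in keywords if kw in text)
    return min(0.5 + matches * 0.2, 1.0) if matches else 0.5

s = keyword_score("modern renovated flat", ["modern", "new", "renovated"])
```

One caveat of plain substring matching: short keywords can fire inside unrelated words (e.g. "sea" inside "seasonal"), so word-boundary matching would be a safer refinement.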
+
+    async def _score_proximity_factor(
+        self,
+        listing: Dict,
+        poi_type: str,
+        location: str
+    ) -> float:
+        """
+        Score a listing based on actual proximity to POIs using OpenStreetMap.
+
+        Scoring:
+        - < 0.5 km: 1.0 (excellent)
+        - 0.5 - 1 km: 0.9 (very good)
+        - 1 - 2 km: 0.75 (good)
+        - 2 - 3 km: 0.5 (average)
+        - 3 - 5 km: 0.3 (below average)
+        - > 5 km: 0.1 (poor)
+        """
+        try:
+            from app.ai.services.osm_poi_service import find_pois_overpass
+
+            listing_lat = listing.get("latitude")
+            listing_lon = listing.get("longitude")
+
+            if not listing_lat or not listing_lon:
+                return 0.5  # No coordinates, return neutral score
+
+            # Find nearby POIs
+            pois = await find_pois_overpass(
+                poi_type=poi_type,
+                center_lat=listing_lat,
+                center_lon=listing_lon,
+                radius_km=5
+            )
+
+            if not pois:
+                return 0.3  # No POIs found nearby
+
+            # Find closest POI
+            min_distance = float('inf')
+            for poi in pois:
+                dist = self._calculate_distance(
+                    listing_lat, listing_lon,
+                    poi.get("lat"), poi.get("lon")
+                )
+                min_distance = min(min_distance, dist)
+
+            # Score based on distance
+            if min_distance < 0.5:
+                return 1.0
+            elif min_distance < 1:
+                return 0.9
+            elif min_distance < 2:
+                return 0.75
+            elif min_distance < 3:
+                return 0.5
+            elif min_distance < 5:
+                return 0.3
+            else:
+                return 0.1
+
+        except Exception as e:
+            logger.error(f"Proximity scoring failed: {e}")
+            return 0.5  # Return neutral on error
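The distance-to-score tiers in the docstring above can be expressed as a lookup table instead of an if/elif chain, which keeps the thresholds in one place. A minimal sketch (the function name `proximity_score` is illustrative):

```python
def proximity_score(min_distance_km):
    """Map distance to the closest POI (km) onto the tiered 0-1 score."""
    tiers = [
        (0.5, 1.0),   # excellent
        (1.0, 0.9),   # very good
        (2.0, 0.75),  # good
        (3.0, 0.5),   # average
        (5.0, 0.3),   # below average
    ]
    for limit, score in tiers:
        if min_distance_km < limit:
            return score
    return 0.1  # poor: beyond 5 km
```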
+
+    def _calculate_distance(
+        self,
+        lat1: float,
+        lon1: float,
+        lat2: Optional[float],
+        lon2: Optional[float]
+    ) -> float:
+        """
+        Calculate distance between two points using Haversine formula.
+        Returns distance in kilometers.
+        """
+        import math
+
+        if lat2 is None or lon2 is None:
+            return 999.0  # Return large distance for missing coordinates
+
+        R = 6371  # Earth's radius in km
+
+        lat1_rad = math.radians(lat1)
+        lat2_rad = math.radians(lat2)
+        delta_lat = math.radians(lat2 - lat1)
+        delta_lon = math.radians(lon2 - lon1)
+
+        a = (math.sin(delta_lat/2)**2 +
+             math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon/2)**2)
+        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
+
+        return R * c
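The Haversine computation above is self-contained enough to sanity-check in isolation. A standalone sketch of the same formula (the function name `haversine_km` is illustrative):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points, degrees in."""
    R = 6371  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
```

Useful invariants: identical points give 0, the function is symmetric in its endpoints, and one degree of longitude at the equator is about 111.2 km, which is a quick way to confirm the radians conversion is in place.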
+
+    def _extract_json(self, text: str) -> Any:
+        """
+        Extract JSON from LLM response text.
+        """
+        import re
+
+        # Try to find JSON in the response
+        json_match = re.search(r'[\[{][\s\S]*[\]}]', text)
+        if json_match:
+            return json.loads(json_match.group())
+        raise ValueError("No JSON found in response")
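The extraction regex above is greedy: it spans from the first `[` or `{` to the last `]` or `}` in the reply, which tolerates surrounding prose but breaks if stray braces follow the JSON. A standalone sketch of the same approach (the function name `extract_json` is illustrative):

```python
import json
import re

def extract_json(text):
    """Pull the outermost JSON object/array out of an LLM reply and parse it."""
    m = re.search(r'[\[{][\s\S]*[\]}]', text)  # greedy: first opener to last closer
    if m:
        return json.loads(m.group())
    raise ValueError("No JSON found in response")

data = extract_json('Here is the result: {"strategy": "MONGO_ONLY", "score": 1}')
```

Because of the greedy match, callers should keep their prompts from producing trailing braces after the JSON, or the `json.loads` call will raise on the over-wide span.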
+
+
+# =============================================================================
+# Singleton Instance
+# =============================================================================
+
+_rlm_agent: Optional[RLMSearchAgent] = None
+
+
+def get_rlm_agent() -> RLMSearchAgent:
+    """Get or create the singleton RLM agent."""
+    global _rlm_agent
+    if _rlm_agent is None:
+        _rlm_agent = RLMSearchAgent()
+    return _rlm_agent
+
+
+# =============================================================================
+# Convenience Function
+# =============================================================================
+
+async def rlm_search(query: str, context: Optional[Dict] = None) -> Dict[str, Any]:
+    """
+    Convenience function for RLM search.
+
+    Usage:
+        from app.ai.services.rlm_search_service import rlm_search
+
+        results = await rlm_search("3-bed near schools in Cotonou")
+    """
+    agent = get_rlm_agent()
+    return await agent.search(query, context)
+
+
+__all__ = [
+    "RLMSearchAgent",
+    "get_rlm_agent",
+    "rlm_search"
+]
app/ai/services/search_strategy_selector.py CHANGED
@@ -7,6 +7,13 @@ Strategies:
7
  - QDRANT_ONLY: Pure semantic search (vague/descriptive queries)
8
  - MONGO_THEN_QDRANT: Filter by location/price in MongoDB, then semantic search within results
9
  - QDRANT_THEN_MONGO: Semantic search first, then apply MongoDB filters
 
 
 
 
 
 
 
10
  """
11
 
12
  import logging
@@ -22,11 +29,19 @@ logger = logging.getLogger(__name__)
22
 
23
  class SearchStrategy(str, Enum):
24
  """Available search strategies"""
 
25
  MONGO_ONLY = "MONGO_ONLY"
26
  QDRANT_ONLY = "QDRANT_ONLY"
27
  MONGO_THEN_QDRANT = "MONGO_THEN_QDRANT"
28
  QDRANT_THEN_MONGO = "QDRANT_THEN_MONGO"
29
 
 
 
 
 
 
 
 
30
 
31
  # LLM for strategy selection
32
  llm = ChatOpenAI(
@@ -81,19 +96,67 @@ Return ONLY valid JSON:
81
  async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
82
  """
83
  Select optimal search strategy based on query and extracted parameters.
84
-
 
 
 
 
85
  Args:
86
  user_query: Original user query
87
  search_params: Extracted search parameters
88
-
89
  Returns:
90
  Dict with:
91
  - strategy: SearchStrategy enum value
92
  - reasoning: str
93
  - has_semantic_features: bool
94
  - has_structured_filters: bool
 
95
  """
96
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
97
  # Quick heuristics for obvious cases
98
  has_location = bool(search_params.get("location"))
99
  has_price = bool(search_params.get("min_price") or search_params.get("max_price"))
@@ -101,9 +164,9 @@ async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
101
  has_bathrooms = bool(search_params.get("bathrooms"))
102
  has_listing_type = bool(search_params.get("listing_type"))
103
  has_amenities = bool(search_params.get("amenities") and len(search_params.get("amenities", [])) > 0)
104
-
105
  structured_count = sum([has_location, has_price, has_bedrooms, has_bathrooms, has_listing_type])
106
-
107
  # Detect semantic keywords in query
108
  semantic_keywords = [
109
  "close to", "near", "nearby", "walking distance",
@@ -117,10 +180,9 @@ async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
117
  "good vibes", "nice area", "good neighborhood",
118
  "beach", "school", "market", "downtown", "city center",
119
  ]
120
-
121
- query_lower = user_query.lower()
122
  has_semantic = any(keyword in query_lower for keyword in semantic_keywords)
123
-
124
  # Simple rule-based decision for clear cases
125
  if structured_count >= 2 and not has_semantic and not has_amenities:
126
  # Pure structured query
@@ -128,25 +190,28 @@ async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
128
  "strategy": SearchStrategy.MONGO_ONLY,
129
  "reasoning": "Query has multiple structured filters and no semantic features",
130
  "has_semantic_features": False,
131
- "has_structured_filters": True
 
132
  }
133
-
134
  if structured_count == 0 and (has_semantic or has_amenities):
135
  # Pure semantic query
136
  return {
137
  "strategy": SearchStrategy.QDRANT_ONLY,
138
  "reasoning": "Query is purely semantic/descriptive with no structured filters",
139
  "has_semantic_features": True,
140
- "has_structured_filters": False
 
141
  }
142
-
143
  if has_location and has_semantic:
144
  # Location + semantic features
145
  return {
146
  "strategy": SearchStrategy.MONGO_THEN_QDRANT,
147
  "reasoning": "Query has location filter and semantic features - filter by location first, then semantic search",
148
  "has_semantic_features": True,
149
- "has_structured_filters": True
 
150
  }
151
 
152
  # Use LLM for complex cases
@@ -171,13 +236,15 @@ async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
171
  "strategy": SearchStrategy.MONGO_ONLY,
172
  "reasoning": "Strategy selection failed, using MongoDB filters",
173
  "has_semantic_features": False,
174
- "has_structured_filters": True
 
175
  }
176
-
177
  result = validation.data
 
178
  logger.info(f"Strategy selected: {result.get('strategy')} - {result.get('reasoning')}")
179
  return result
180
-
181
  except Exception as e:
182
  logger.error(f"Strategy selection error: {e}")
183
  # Default to MONGO_ONLY on error
@@ -185,5 +252,6 @@ async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
185
  "strategy": SearchStrategy.MONGO_ONLY,
186
  "reasoning": "Strategy selection error, defaulting to MongoDB",
187
  "has_semantic_features": False,
188
- "has_structured_filters": True
 
189
  }
 
7
  - QDRANT_ONLY: Pure semantic search (vague/descriptive queries)
8
  - MONGO_THEN_QDRANT: Filter by location/price in MongoDB, then semantic search within results
9
  - QDRANT_THEN_MONGO: Semantic search first, then apply MongoDB filters
10
+
11
+ RLM Strategies (Recursive Language Model):
12
+ - RLM_MULTI_HOP: "near schools", "close to beach" - requires finding POI first
13
+ - RLM_BOOLEAN_OR: "under 500k OR has pool" - complex OR logic
14
+ - RLM_COMPARATIVE: "compare Cotonou vs Calavi" - multi-location comparison
15
+ - RLM_AGGREGATION: "average price", "how many" - data aggregation
16
+ - RLM_MULTI_FACTOR: "best family apartment" - multi-criteria ranking
17
  """
18
 
19
  import logging
 
29
 
30
  class SearchStrategy(str, Enum):
31
  """Available search strategies"""
32
+ # Traditional strategies
33
  MONGO_ONLY = "MONGO_ONLY"
34
  QDRANT_ONLY = "QDRANT_ONLY"
35
  MONGO_THEN_QDRANT = "MONGO_THEN_QDRANT"
36
  QDRANT_THEN_MONGO = "QDRANT_THEN_MONGO"
37
 
38
+ # RLM (Recursive Language Model) strategies
39
+ RLM_MULTI_HOP = "RLM_MULTI_HOP" # "near X", "close to Y"
40
+ RLM_BOOLEAN_OR = "RLM_BOOLEAN_OR" # "X OR Y"
41
+ RLM_COMPARATIVE = "RLM_COMPARATIVE" # "compare A vs B"
42
+ RLM_AGGREGATION = "RLM_AGGREGATION" # "average", "count"
43
+ RLM_MULTI_FACTOR = "RLM_MULTI_FACTOR" # multi-criteria ranking
44
+
45
 
46
  # LLM for strategy selection
47
  llm = ChatOpenAI(
 
96
  async def select_search_strategy(user_query: str, search_params: Dict) -> Dict:
97
  """
98
  Select optimal search strategy based on query and extracted parameters.
99
+
100
+ PRIORITY ORDER:
101
+ 1. Check for RLM-appropriate queries (complex multi-hop, OR, comparative)
102
+ 2. Fall back to traditional strategies for simple queries
103
+
104
  Args:
105
  user_query: Original user query
106
  search_params: Extracted search parameters
107
+
108
  Returns:
109
  Dict with:
110
  - strategy: SearchStrategy enum value
111
  - reasoning: str
112
  - has_semantic_features: bool
113
  - has_structured_filters: bool
114
+ - use_rlm: bool (NEW)
115
  """
116
+ query_lower = user_query.lower()
117
+
118
+ # =========================================================================
119
+ # STEP 1: Check for RLM-appropriate queries FIRST
120
+ # =========================================================================
121
+ try:
122
+ from app.ai.services.rlm_query_analyzer import analyze_query_complexity, QueryComplexity
123
+
124
+ rlm_analysis = analyze_query_complexity(user_query)
125
+
126
+ if rlm_analysis.use_rlm:
127
+ # Map QueryComplexity to SearchStrategy
128
+ rlm_strategy_map = {
129
+ QueryComplexity.MULTI_HOP: SearchStrategy.RLM_MULTI_HOP,
130
+ QueryComplexity.BOOLEAN_OR: SearchStrategy.RLM_BOOLEAN_OR,
131
+ QueryComplexity.COMPARATIVE: SearchStrategy.RLM_COMPARATIVE,
132
+ QueryComplexity.AGGREGATION: SearchStrategy.RLM_AGGREGATION,
133
+ QueryComplexity.MULTI_FACTOR: SearchStrategy.RLM_MULTI_FACTOR,
134
+ }
135
+
136
+ strategy = rlm_strategy_map.get(rlm_analysis.complexity)
137
+ if strategy:
138
+ logger.info(
139
+ f"RLM strategy selected: {strategy.value}",
140
+ query=user_query[:50],
141
+ confidence=rlm_analysis.confidence
142
+ )
143
+ return {
144
+ "strategy": strategy,
145
+ "reasoning": rlm_analysis.reasoning,
146
+ "has_semantic_features": True,
147
+ "has_structured_filters": True,
148
+ "use_rlm": True,
149
+ "rlm_analysis": rlm_analysis.model_dump()
150
+ }
151
+ except ImportError:
152
+ logger.warning("RLM module not available, using traditional strategies")
153
+ except Exception as e:
154
+ logger.error(f"RLM analysis failed: {e}, falling back to traditional")
155
+
156
+ # =========================================================================
157
+ # STEP 2: Traditional strategy selection (for simple queries)
158
+ # =========================================================================
159
+
160
  # Quick heuristics for obvious cases
161
  has_location = bool(search_params.get("location"))
162
  has_price = bool(search_params.get("min_price") or search_params.get("max_price"))
 
164
  has_bathrooms = bool(search_params.get("bathrooms"))
165
  has_listing_type = bool(search_params.get("listing_type"))
166
  has_amenities = bool(search_params.get("amenities") and len(search_params.get("amenities", [])) > 0)
167
+
168
  structured_count = sum([has_location, has_price, has_bedrooms, has_bathrooms, has_listing_type])
169
+
170
  # Detect semantic keywords in query
171
  semantic_keywords = [
172
  "close to", "near", "nearby", "walking distance",
 
180
  "good vibes", "nice area", "good neighborhood",
181
  "beach", "school", "market", "downtown", "city center",
182
  ]
183
+
 
184
  has_semantic = any(keyword in query_lower for keyword in semantic_keywords)
185
+
186
  # Simple rule-based decision for clear cases
187
  if structured_count >= 2 and not has_semantic and not has_amenities:
188
  # Pure structured query
 
190
  "strategy": SearchStrategy.MONGO_ONLY,
191
  "reasoning": "Query has multiple structured filters and no semantic features",
192
  "has_semantic_features": False,
193
+ "has_structured_filters": True,
194
+ "use_rlm": False
195
  }
196
+
197
  if structured_count == 0 and (has_semantic or has_amenities):
198
  # Pure semantic query
199
  return {
200
  "strategy": SearchStrategy.QDRANT_ONLY,
201
  "reasoning": "Query is purely semantic/descriptive with no structured filters",
202
  "has_semantic_features": True,
203
+ "has_structured_filters": False,
204
+ "use_rlm": False
205
  }
206
+
207
  if has_location and has_semantic:
208
  # Location + semantic features
209
  return {
210
  "strategy": SearchStrategy.MONGO_THEN_QDRANT,
211
  "reasoning": "Query has location filter and semantic features - filter by location first, then semantic search",
212
  "has_semantic_features": True,
213
+ "has_structured_filters": True,
214
+ "use_rlm": False
215
  }
216
 
217
  # Use LLM for complex cases
 
236
  "strategy": SearchStrategy.MONGO_ONLY,
237
  "reasoning": "Strategy selection failed, using MongoDB filters",
238
  "has_semantic_features": False,
239
+ "has_structured_filters": True,
240
+ "use_rlm": False
241
  }
242
+
243
  result = validation.data
244
+ result["use_rlm"] = False # LLM-selected strategies are not RLM
245
  logger.info(f"Strategy selected: {result.get('strategy')} - {result.get('reasoning')}")
246
  return result
247
+
248
  except Exception as e:
249
  logger.error(f"Strategy selection error: {e}")
250
  # Default to MONGO_ONLY on error
 
252
  "strategy": SearchStrategy.MONGO_ONLY,
253
  "reasoning": "Strategy selection error, defaulting to MongoDB",
254
  "has_semantic_features": False,
255
+ "has_structured_filters": True,
256
+ "use_rlm": False
257
  }
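The rule-based branching above reduces to a small pure function. A minimal sketch, assuming the `SearchStrategy` enum values and return keys mirror the surrounding module (the LLM fallback for complex cases is stubbed out):

```python
from enum import Enum

class SearchStrategy(str, Enum):
    MONGO_ONLY = "mongo_only"
    QDRANT_ONLY = "qdrant_only"
    MONGO_THEN_QDRANT = "mongo_then_qdrant"

def select_strategy(structured_count: int, has_location: bool,
                    has_semantic: bool, has_amenities: bool) -> dict:
    # Pure structured query: multiple filters, nothing descriptive.
    if structured_count >= 2 and not has_semantic and not has_amenities:
        return {"strategy": SearchStrategy.MONGO_ONLY,
                "has_semantic_features": False,
                "has_structured_filters": True, "use_rlm": False}
    # Pure semantic query: descriptive features only, no filters.
    if structured_count == 0 and (has_semantic or has_amenities):
        return {"strategy": SearchStrategy.QDRANT_ONLY,
                "has_semantic_features": True,
                "has_structured_filters": False, "use_rlm": False}
    # Location filter plus semantic features: filter first, then search.
    if has_location and has_semantic:
        return {"strategy": SearchStrategy.MONGO_THEN_QDRANT,
                "has_semantic_features": True,
                "has_structured_filters": True, "use_rlm": False}
    # Complex cases fall through to the LLM selector (not shown here);
    # on failure it defaults to MONGO_ONLY, as in the snippet above.
    return {"strategy": SearchStrategy.MONGO_ONLY,
            "has_semantic_features": False,
            "has_structured_filters": True, "use_rlm": False}
```

Because every branch defaults to MongoDB filtering, a failed or ambiguous classification degrades to the cheapest search path rather than erroring out.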
app/ai/services/vision_service.py DELETED
@@ -1,697 +0,0 @@
1
- # ============================================================
2
- # app/ai/services/vision_service.py
3
- # Vision AI Service for Property Image Analysis
4
- # Uses Hugging Face Inference API (Moondream2 model)
5
- # ============================================================
6
-
7
- import io
8
- import os
9
- import base64
10
- import logging
11
- from typing import Dict, List, Optional, Tuple
12
- from PIL import Image
13
- import requests
14
- import cv2
15
- import numpy as np
16
- import tempfile
17
- from app.config import settings
18
-
19
- logger = logging.getLogger(__name__)
20
-
21
-
22
- class VisionService:
23
- """Service for analyzing property images using HuggingFace Inference API (BLIP - FREE)"""
24
-
25
- def __init__(self):
26
- # BLIP image captioning works with HuggingFace FREE Inference API
27
- # No special providers needed - uses standard inference endpoint
28
- self.hf_token = settings.HF_TOKEN or settings.HUGGINGFACE_API_KEY
29
- self.model_id = settings.HF_VISION_MODEL # Salesforce/blip-image-captioning-large
30
- # Standard HuggingFace Inference API endpoint (works with BLIP!)
31
- self.api_url = f"https://api-inference.huggingface.co/models/{self.model_id}"
32
- self.headers = {
33
- "Authorization": f"Bearer {self.hf_token}",
34
- "Content-Type": "application/json"
35
- }
36
- self.property_confidence_threshold = settings.PROPERTY_IMAGE_MIN_CONFIDENCE
37
- logger.info(f"🔧 Vision Service initialized with HF Inference: {self.model_id}")
38
-
39
- # ============================================================
40
- # Core Image Validation & Analysis
41
- # ============================================================
42
-
43
- def validate_property_image(self, image_bytes: bytes) -> Tuple[bool, float, str]:
44
- """
45
- Validate if image is property-related before uploading
46
-
47
- Args:
48
- image_bytes: Raw image bytes
49
-
50
- Returns:
51
- Tuple of (is_valid, confidence, message)
52
- """
53
- try:
54
- # Check if image is readable
55
- image = Image.open(io.BytesIO(image_bytes))
56
- image_rgb = image.convert("RGB")
57
-
58
- # Query vision model to check if it's a property
59
- payload = {
60
- "inputs": image_rgb,
61
- "question": (
62
- "Is this image a photo of a real property (house, apartment, room, "
63
- "office, land, or commercial building)? Answer only yes or no."
64
- ),
65
- }
66
-
67
- response = self._query_hf_api(payload)
68
-
69
- if not response:
70
- return False, 0.0, "Failed to process image"
71
-
72
- answer = response.strip().lower()
73
- is_property = "yes" in answer or "this is a property" in answer
74
-
75
- # Assign confidence based on response clarity
76
- confidence = 0.95 if is_property else 0.5
77
-
78
- if is_property:
79
- return (
80
- True,
81
- confidence,
82
- "Property image validated successfully"
83
- )
84
- else:
85
- return (
86
- False,
87
- confidence,
88
- "This doesn't look like a property photo. Please upload images of "
89
- "actual properties (houses, apartments, rooms, offices, or land)."
90
- )
91
-
92
- except Exception as e:
93
- logger.error(f"Error validating property image: {str(e)}")
94
- return False, 0.0, f"Error processing image: {str(e)}"
95
-
96
- # ============================================================
97
- # Property Field Extraction
98
- # ============================================================
99
-
100
- def extract_property_fields(
101
- self,
102
- image_bytes: bytes,
103
- location: Optional[str] = None,
104
- fast_validate: bool = False
105
- ) -> Dict:
106
- """
107
- Extract property listing fields from image
108
-
109
- Args:
110
- image_bytes: Raw image bytes
111
- location: Optional location context (helps with accuracy)
112
- fast_validate: If True, only generate title (skip detailed extraction)
113
- Use when image is complementary to text listing
114
-
115
- Returns:
116
- Dict with extracted fields and confidence scores
117
- """
118
- try:
119
- image = Image.open(io.BytesIO(image_bytes))
120
- image_rgb = image.convert("RGB")
121
-
122
- extracted = {
123
- "bedrooms": None,
124
- "bathrooms": None,
125
- "amenities": [],
126
- "description": "",
127
- "title": "",
128
- "confidence": {}
129
- }
130
-
131
- # ============================================================
132
- # FAST VALIDATE MODE: Only generate title, skip extraction
133
- # Used when user has already provided details via text
134
- # ============================================================
135
-
136
- if fast_validate:
137
- logger.info("🚀 Fast validation mode: Generating title only")
138
- title_data = self._generate_title(
139
- image_rgb,
140
- bedrooms=None,
141
- bathrooms=None,
142
- location=location
143
- )
144
- extracted["title"] = title_data.get("title", "Property Image")
145
- extracted["confidence"]["title"] = title_data.get("confidence", 0.8)
146
- extracted["fast_validated"] = True
147
- return extracted
148
-
149
- # ============================================================
150
- # FULL EXTRACTION MODE: Extract all details for new listing
151
- # ============================================================
152
-
153
- # Query 1: Count rooms (bedrooms + bathrooms)
154
- rooms_data = self._extract_room_count(image_rgb)
155
- extracted["bedrooms"] = rooms_data.get("bedrooms")
156
- extracted["bathrooms"] = rooms_data.get("bathrooms")
157
- extracted["confidence"].update({
158
- "bedrooms": rooms_data.get("bedroom_confidence", 0.0),
159
- "bathrooms": rooms_data.get("bathroom_confidence", 0.0)
160
- })
161
-
162
- # Query 2: Detect amenities
163
- amenities_data = self._detect_amenities(image_rgb)
164
- extracted["amenities"] = amenities_data.get("amenities", [])
165
- extracted["confidence"]["amenities"] = amenities_data.get("confidence", 0.0)
166
-
167
- # Query 3: Generate description
168
- description_data = self._generate_description(image_rgb)
169
- extracted["description"] = description_data.get("description", "")
170
- extracted["confidence"]["description"] = description_data.get("confidence", 0.0)
171
-
172
- # Query 4: Generate SHORT title (max 2 sentences)
173
- title_data = self._generate_title(
174
- image_rgb,
175
- bedrooms=extracted.get("bedrooms"),
176
- bathrooms=extracted.get("bathrooms"),
177
- location=location
178
- )
179
- extracted["title"] = title_data.get("title", "")
180
- extracted["confidence"]["title"] = title_data.get("confidence", 0.0)
181
-
182
- return extracted
183
-
184
- except Exception as e:
185
- logger.error(f"Error extracting property fields: {str(e)}")
186
- return {
187
- "bedrooms": None,
188
- "bathrooms": None,
189
- "amenities": [],
190
- "description": "",
191
- "title": "",
192
- "confidence": {},
193
- "error": str(e)
194
- }
195
-
196
- # ============================================================
197
- # Specific Field Extraction Methods
198
- # ============================================================
199
-
200
- def _extract_room_count(self, image: Image.Image) -> Dict:
201
- """
202
- Extract bedroom and bathroom count (matches listing schema)
203
-
204
- Returns bedrooms and bathrooms as integers (not property_type)
205
- """
206
- try:
207
- payload = {
208
- "inputs": image,
209
- "question": (
210
- "Count the number of bedrooms and bathrooms you can see in this property photo. "
211
- "Only count what you can clearly identify. "
212
- "Format: bedrooms: [number], bathrooms: [number]"
213
- ),
214
- }
215
-
216
- response = self._query_hf_api(payload)
217
-
218
- bedrooms = None
219
- bathrooms = None
220
- bedroom_conf = 0.0
221
- bathroom_conf = 0.0
222
-
223
- if response:
224
- response_lower = response.lower()
225
-
226
- # Extract bedrooms
227
- if "bedrooms:" in response_lower or "bedroom:" in response_lower:
228
- try:
229
- # Handle both "bedrooms:" and "bedroom:"
230
- if "bedrooms:" in response_lower:
231
- bed_str = response_lower.split("bedrooms:")[1].split(",")[0].strip()
232
- else:
233
- bed_str = response_lower.split("bedroom:")[1].split(",")[0].strip()
234
-
235
- # Extract first number found
236
- numbers = ''.join(filter(str.isdigit, bed_str))
237
- if numbers:
238
- bedrooms = int(numbers)
239
- bedroom_conf = 0.80 # Good confidence if extracted
240
- except Exception as e:
241
- logger.debug(f"Failed to parse bedrooms: {e}")
242
- bedroom_conf = 0.2
243
-
244
- # Extract bathrooms
245
- if "bathrooms:" in response_lower or "bathroom:" in response_lower:
246
- try:
247
- if "bathrooms:" in response_lower:
248
- bath_str = response_lower.split("bathrooms:")[1].strip()
249
- else:
250
- bath_str = response_lower.split("bathroom:")[1].strip()
251
-
252
- numbers = ''.join(filter(str.isdigit, bath_str))
253
- if numbers:
254
- bathrooms = int(numbers)
255
- bathroom_conf = 0.80
256
- except Exception as e:
257
- logger.debug(f"Failed to parse bathrooms: {e}")
258
- bathroom_conf = 0.2
259
-
260
- return {
261
- "bedrooms": bedrooms,
262
- "bathrooms": bathrooms,
263
- "bedroom_confidence": bedroom_conf,
264
- "bathroom_confidence": bathroom_conf
265
- }
266
-
267
- except Exception as e:
268
- logger.error(f"Error extracting room count: {str(e)}")
269
- return {
270
- "bedrooms": None,
271
- "bathrooms": None,
272
- "bedroom_confidence": 0.0,
273
- "bathroom_confidence": 0.0
274
- }
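The split-and-filter parsing above concatenates every digit in the substring, so an answer like "1 or 2" becomes 12. A regex alternative that captures only the first number after each label is more robust; the function name is hypothetical, and the expected input format ("bedrooms: N, bathrooms: N") is the one prompted for above:

```python
import re

def parse_room_counts(response: str) -> dict:
    """Parse 'bedrooms: 3, bathrooms: 2'-style model output.

    Tolerates singular/plural labels and extra surrounding text,
    and takes only the first number after each label instead of
    concatenating all digits in the substring.
    """
    text = response.lower()
    beds = re.search(r"bedrooms?\s*:\s*(\d+)", text)
    baths = re.search(r"bathrooms?\s*:\s*(\d+)", text)
    return {
        "bedrooms": int(beds.group(1)) if beds else None,
        "bathrooms": int(baths.group(1)) if baths else None,
    }
```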
275
-
276
- def _detect_amenities(self, image: Image.Image) -> Dict:
277
- """
278
- Detect amenities visible in property (matches listing schema)
279
-
280
- Amenities is a simple list of strings in the listing model.
281
- Common amenities: balcony, pool, parking, garden, gym, wifi, AC, security
282
- """
283
- try:
284
- payload = {
285
- "inputs": image,
286
- "question": (
287
- "What amenities can you see in this property? "
288
- "List only what is clearly visible. Examples: "
289
- "balcony, pool, parking, garden, gym, wifi router, AC unit, security gate, "
290
- "furnished, modern kitchen, etc. "
291
- "If nothing special, say 'none'."
292
- ),
293
- }
294
-
295
- response = self._query_hf_api(payload)
296
- amenities = []
297
- confidence = 0.5
298
-
299
- if response and response.lower().strip() not in ["none", "none."]:
300
- # Split by common separators (comma, and, newline)
301
- import re
302
- # Replace "and" with comma for easier splitting
303
- cleaned = response.replace(" and ", ", ")
304
- # Split by comma or newline
305
- parts = re.split(r'[,\n]', cleaned)
306
-
307
- # Clean and filter amenities
308
- for amenity in parts:
309
- amenity = amenity.strip().lower()
310
- # Remove numbers, bullets, dashes at start
311
- amenity = re.sub(r'^[\d\-•\*\.\)]+\s*', '', amenity)
312
- # Skip empty or too short
313
- if amenity and len(amenity) > 2:
314
- amenities.append(amenity)
315
-
316
- # Remove duplicates while preserving order
317
- amenities = list(dict.fromkeys(amenities))
318
- confidence = 0.70 if amenities else 0.3
319
-
320
- return {
321
- "amenities": amenities,
322
- "confidence": confidence
323
- }
324
-
325
- except Exception as e:
326
- logger.error(f"Error detecting amenities: {str(e)}")
327
- return {"amenities": [], "confidence": 0.0}
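The cleanup pipeline in `_detect_amenities` can be restated as a standalone helper (the name `clean_amenities` is hypothetical). One deliberate property worth noting: deduplication via `dict.fromkeys` keeps the model's original ordering, unlike a plain `set`:

```python
import re

def clean_amenities(response: str) -> list:
    """Normalize a free-text amenity answer into a deduplicated list.

    Mirrors the steps above: split on commas, ' and ', and newlines;
    strip leading bullets/numbering; drop tokens of 2 chars or fewer.
    """
    if not response or response.strip().lower() in {"none", "none."}:
        return []
    parts = re.split(r"[,\n]", response.replace(" and ", ", "))
    amenities = []
    for raw in parts:
        item = re.sub(r"^[\d\-•\*\.\)]+\s*", "", raw.strip().lower())
        if len(item) > 2:
            amenities.append(item)
    return list(dict.fromkeys(amenities))  # dedupe, preserve order
```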
328
-
329
- def _generate_description(self, image: Image.Image) -> Dict:
330
- """
331
- Generate brief property description (matches listing schema)
332
-
333
- Note: The listing flow will later use LLM to generate full title/description
334
- based on all provided fields. This is just initial extraction from image.
335
- """
336
- try:
337
- payload = {
338
- "inputs": image,
339
- "question": (
340
- "Describe what you see in this property photo in 1-2 sentences. "
341
- "Focus on visible features: room type, condition, style, notable features. "
342
- "Be factual and concise."
343
- ),
344
- }
345
-
346
- response = self._query_hf_api(payload)
347
-
348
- # Limit length
349
- if response and len(response) > 150:
350
- response = response[:147] + "..."
351
-
352
- return {
353
- "description": response or "",
354
- "confidence": 0.70 if response else 0.0
355
- }
356
-
357
- except Exception as e:
358
- logger.error(f"Error generating description: {str(e)}")
359
- return {"description": "", "confidence": 0.0}
360
-
361
- def _generate_title(self, image: Image.Image, bedrooms: int = None, bathrooms: int = None, location: str = None) -> Dict:
362
- """
363
- Generate simple property title (matches listing schema)
364
-
365
- Note: The listing flow will later generate SEO-optimized title using LLM
366
- based on all fields. This is just placeholder from image.
367
- """
368
- try:
369
- # Build basic context
370
- room_info = ""
371
- if bedrooms is not None:
372
- room_info = f"{bedrooms}-bedroom property"
373
- elif bathrooms is not None:
374
- room_info = "Property"
375
- else:
376
- room_info = "Property listing"
377
-
378
- payload = {
379
- "inputs": image,
380
- "question": (
381
- f"Generate a short title for this property photo. "
382
- f"It appears to be a {room_info}. "
383
- "Keep it under 50 characters. Example: 'Modern apartment with balcony'"
384
- ),
385
- }
386
-
387
- response = self._query_hf_api(payload)
388
-
389
- # Ensure it's short
390
- if response and len(response) > 80:
391
- response = response[:77] + "..."
392
-
393
- # Fallback if no response
394
- if not response:
395
- if bedrooms:
396
- response = f"{bedrooms}-Bedroom Property"
397
- else:
398
- response = "Property Listing"
399
-
400
- return {
401
- "title": response or "Property Listing",
402
- "confidence": 0.60 if response else 0.3
403
- }
404
-
405
- except Exception as e:
406
- logger.error(f"Error generating title: {str(e)}")
407
- return {"title": "Property Listing", "confidence": 0.3}
408
-
409
- # ============================================================
410
- # Hugging Face API Communication
411
- # ============================================================
412
-
413
- def _query_hf_api(self, payload: Dict) -> Optional[str]:
414
- """
415
- Query HuggingFace Inference API for image captioning (BLIP model).
416
- Works with FREE HuggingFace Inference API - no special providers needed!
417
-
418
- Args:
419
- payload: Dict with "inputs" (PIL Image) and "question" (str - optional prompt)
420
-
421
- Returns:
422
- Response text (caption) or None
423
- """
424
- try:
425
- if not self.hf_token:
426
- logger.error("HF_TOKEN not set!")
427
- return None
428
-
429
- # BLIP accepts raw image bytes directly
430
- if isinstance(payload.get("inputs"), Image.Image):
431
- # Convert PIL Image to bytes
432
- image_buffer = io.BytesIO()
433
- payload["inputs"].save(image_buffer, format="JPEG")
434
- image_bytes = image_buffer.getvalue()
435
-
436
- # Optional: Add question/prompt for conditional captioning
437
- # BLIP supports this via inputs parameter
438
- question = payload.get("question", "")
439
-
440
- # For BLIP, we send image bytes directly
441
- # The question can be sent as a query parameter or ignored
442
- response = requests.post(
443
- self.api_url,
444
- headers={"Authorization": f"Bearer {self.hf_token}"},
445
- data=image_bytes,
446
- timeout=60
447
- )
448
- else:
449
- # Text-only query not supported for image captioning
450
- logger.warning("BLIP requires an image input")
451
- return None
452
-
453
- if response.status_code == 200:
454
- result = response.json()
455
-
456
- # BLIP response format: [{"generated_text": "..."}]
457
- if isinstance(result, list) and len(result) > 0:
458
- return result[0].get("generated_text", "")
459
- elif isinstance(result, dict):
460
- return result.get("generated_text", "") or result.get("caption", "")
461
- elif isinstance(result, str):
462
- return result
463
- else:
464
- logger.warning(f"Unexpected BLIP response format: {result}")
465
- return str(result)
466
-
467
- elif response.status_code == 503:
468
- # Model loading - wait and retry
469
- logger.warning("Model loading (503). Retrying in 15s...")
470
- import time
471
- time.sleep(15)
472
- response = requests.post(
473
- self.api_url,
474
- headers={"Authorization": f"Bearer {self.hf_token}"},
475
- data=image_bytes,
476
- timeout=60
477
- )
478
- if response.status_code == 200:
479
- result = response.json()
480
- if isinstance(result, list) and len(result) > 0:
481
- return result[0].get("generated_text", "")
482
- return str(result)
483
- else:
484
- logger.error(f"HF API error after retry: {response.status_code}")
485
- return None
486
- else:
487
- logger.error(f"HF API error: {response.status_code} - {response.text[:200]}")
488
- return None
489
-
490
- except Exception as e:
491
- logger.error(f"Error querying HF API: {str(e)}")
492
- return None
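The response-shape handling above (list of dicts, single dict, bare string) can be factored into one normalizer so the retry path does not duplicate it. A sketch; the function name is hypothetical and the shapes are the ones the code above already handles:

```python
def extract_caption(result) -> str:
    """Normalize the captioning API response shapes handled above:
    [{"generated_text": ...}], {"generated_text"/"caption": ...}, or str.
    """
    if isinstance(result, list) and result:
        return result[0].get("generated_text", "")
    if isinstance(result, dict):
        return result.get("generated_text", "") or result.get("caption", "")
    if isinstance(result, str):
        return result
    return str(result)  # last-resort fallback for unexpected payloads
```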
493
-
494
- # ============================================================
495
- # Video Frame Extraction
496
- # ============================================================
497
-
498
- def extract_frames_from_video(self, video_bytes: bytes, max_frames: int = 8) -> List[Image.Image]:
499
- """
500
- Extract key frames from video for analysis
501
-
502
- Args:
503
- video_bytes: Raw video file bytes
504
- max_frames: Maximum number of frames to extract (default 8)
505
-
506
- Returns:
507
- List of PIL Images extracted from video
508
- """
509
- frames = []
510
- temp_video_path = None
511
-
512
- try:
513
- # Save video bytes to temp file (OpenCV needs a file path)
514
- with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as temp_video:
515
- temp_video.write(video_bytes)
516
- temp_video_path = temp_video.name
517
-
518
- # Open video with OpenCV
519
- cap = cv2.VideoCapture(temp_video_path)
520
-
521
- if not cap.isOpened():
522
- logger.error("Failed to open video file")
523
- return frames
524
-
525
- # Get video properties
526
- total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
527
- fps = cap.get(cv2.CAP_PROP_FPS)
528
- duration = total_frames / fps if fps > 0 else 0
529
-
530
- logger.info(f"Video: {total_frames} frames, {fps:.2f} FPS, {duration:.2f}s duration")
531
-
532
- # Calculate frame interval to extract max_frames evenly distributed
533
- if total_frames <= max_frames:
534
- # Extract all frames if video has fewer frames than max_frames
535
- frame_indices = list(range(total_frames))
536
- else:
537
- # Extract frames at regular intervals
538
- interval = total_frames // max_frames
539
- frame_indices = [i * interval for i in range(max_frames)]
540
-
541
- # Extract frames
542
- for frame_idx in frame_indices:
543
- cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
544
- ret, frame = cap.read()
545
-
546
- if ret:
547
- # Convert BGR (OpenCV) to RGB (PIL)
548
- frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
549
-
550
- # Convert numpy array to PIL Image
551
- pil_image = Image.fromarray(frame_rgb)
552
-
553
- frames.append(pil_image)
554
- logger.info(f"Extracted frame {len(frames)}/{max_frames} at index {frame_idx}")
555
-
556
- cap.release()
557
- logger.info(f"✅ Successfully extracted {len(frames)} frames from video")
558
-
559
- except Exception as e:
560
- logger.error(f"Error extracting video frames: {str(e)}")
561
-
562
- finally:
563
- # Cleanup temp file
564
- if temp_video_path:
565
- try:
566
- import os
567
- os.unlink(temp_video_path)
568
- except OSError:
569
- pass
570
-
571
- return frames
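The frame-sampling arithmetic above is worth isolating, because it has a known bias: `interval = total // max_frames` means the last sampled index is `(max_frames - 1) * interval`, so the final stretch of the video is never reached. A standalone restatement of the logic as written:

```python
def frame_indices(total_frames: int, max_frames: int = 8) -> list:
    """Evenly spaced frame indices, as in extract_frames_from_video.

    Takes every frame when the video is short; otherwise samples at a
    fixed interval from frame 0 (the tail of the video is skipped when
    total_frames is not a multiple of the interval).
    """
    if total_frames <= max_frames:
        return list(range(total_frames))
    interval = total_frames // max_frames
    return [i * interval for i in range(max_frames)]
```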
572
-
573
- def analyze_video(self, video_bytes: bytes, location: str = None, max_frames: int = 8) -> Dict:
574
- """
575
- Analyze property video by extracting frames and analyzing them
576
-
577
- Args:
578
- video_bytes: Raw video file bytes
579
- location: Optional location for context
580
- max_frames: Maximum frames to extract (default 8)
581
-
582
- Returns:
583
- Dict with extracted property fields and confidence scores
584
- """
585
- try:
586
- # Step 1: Extract frames from video
587
- logger.info(f"🎬 Extracting up to {max_frames} frames from video...")
588
- frames = self.extract_frames_from_video(video_bytes, max_frames=max_frames)
589
-
590
- if not frames:
591
- logger.error("No frames extracted from video")
592
- return {
593
- "bedrooms": None,
594
- "bathrooms": None,
595
- "amenities": [],
596
- "description": "Unable to analyze video - no frames extracted",
597
- "title": "Property Video",
598
- "confidence": {},
599
- "error": "Failed to extract frames from video"
600
- }
601
-
602
- logger.info(f"✅ Extracted {len(frames)} frames, analyzing each frame...")
603
-
604
- # Step 2: Analyze each frame as an image
605
- frame_results = []
606
- for idx, frame in enumerate(frames):
607
- logger.info(f"Analyzing frame {idx + 1}/{len(frames)}...")
608
-
609
- # Convert PIL Image to bytes for analysis
610
- frame_bytes = io.BytesIO()
611
- frame.save(frame_bytes, format='JPEG')
612
- frame_bytes.seek(0)
613
-
614
- # Analyze this frame
615
- frame_data = self.extract_property_fields(frame_bytes.getvalue(), location=location)
616
- frame_results.append(frame_data)
617
-
618
- # Step 3: Merge results from all frames
619
- logger.info(f"Merging results from {len(frame_results)} analyzed frames...")
620
- consolidated = self.merge_multiple_image_results(frame_results)
621
-
622
- logger.info(f"✅ Video analysis complete: {consolidated.get('bedrooms')} beds, {consolidated.get('bathrooms')} baths, {len(consolidated.get('amenities', []))} amenities")
623
-
624
- return consolidated
625
-
626
- except Exception as e:
627
- logger.error(f"Error analyzing video: {str(e)}")
628
- return {
629
- "bedrooms": None,
630
- "bathrooms": None,
631
- "amenities": [],
632
- "description": "",
633
- "title": "",
634
- "confidence": {},
635
- "error": str(e)
636
- }
637
-
638
- # ============================================================
639
- # Utility Methods
640
- # ============================================================
641
-
642
- def merge_multiple_image_results(self, results_list: List[Dict]) -> Dict:
643
- """
644
- Merge results from multiple images into single listing data
645
-
646
- Args:
647
- results_list: List of extracted field dicts from different images
648
-
649
- Returns:
650
- Consolidated dict with most likely values
651
- """
652
- if not results_list:
653
- return {}
654
-
655
- consolidated = {
656
- "bedrooms": None,
657
- "bathrooms": None,
658
- "amenities": [],
659
- "description": "",
660
- "confidence": {}
661
- }
662
-
663
- # Bedrooms: take highest count mentioned
664
- bedrooms_list = [r.get("bedrooms") for r in results_list if r.get("bedrooms")]
665
- if bedrooms_list:
666
- consolidated["bedrooms"] = max(bedrooms_list)
667
- consolidated["confidence"]["bedrooms"] = sum(
668
- [r.get("confidence", {}).get("bedrooms", 0)
669
- for r in results_list]
670
- ) / len(results_list)
671
-
672
- # Bathrooms: take highest count mentioned
673
- bathrooms_list = [r.get("bathrooms") for r in results_list if r.get("bathrooms")]
674
- if bathrooms_list:
675
- consolidated["bathrooms"] = max(bathrooms_list)
676
- consolidated["confidence"]["bathrooms"] = sum(
677
- [r.get("confidence", {}).get("bathrooms", 0)
678
- for r in results_list]
679
- ) / len(results_list)
680
-
681
- # Amenities: deduplicate and combine
682
- all_amenities = set()
683
- for result in results_list:
684
- all_amenities.update(result.get("amenities", []))
685
- consolidated["amenities"] = list(all_amenities)
686
- consolidated["confidence"]["amenities"] = sum(
687
- [r.get("confidence", {}).get("amenities", 0)
688
- for r in results_list]
689
- ) / len(results_list)
690
-
691
- # Description: use longest one
692
- descriptions = [r.get("description", "") for r in results_list if r.get("description")]
693
- if descriptions:
694
- consolidated["description"] = max(descriptions, key=len)
695
- consolidated["confidence"]["description"] = 0.8
696
-
697
- return consolidated
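The merge policy above (max for room counts, union for amenities, longest description wins) can be sketched compactly. One small deviation, flagged here as a deliberate choice: an ordered dedupe replaces the `set` in the original so the merged amenity list is deterministic across runs:

```python
def merge_results(results: list) -> dict:
    """Consolidate per-image extractions, following the policy above:
    max for bedrooms/bathrooms, ordered union for amenities,
    longest non-empty description."""
    if not results:
        return {}
    beds = [r["bedrooms"] for r in results if r.get("bedrooms")]
    baths = [r["bathrooms"] for r in results if r.get("bathrooms")]
    amenities = []
    for r in results:
        for a in r.get("amenities", []):
            if a not in amenities:   # ordered dedupe (set is unordered)
                amenities.append(a)
    descriptions = [r.get("description", "") for r in results if r.get("description")]
    return {
        "bedrooms": max(beds) if beds else None,
        "bathrooms": max(baths) if baths else None,
        "amenities": amenities,
        "description": max(descriptions, key=len) if descriptions else "",
    }
```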
 
app/config.py CHANGED
@@ -114,13 +114,15 @@ class Settings(BaseSettings):
114
  HF_WHISPER_MODEL: str = os.getenv("HF_WHISPER_MODEL", "openai/whisper-large-v3")
115
 
116
  # ------------------------------------------------------------------
117
- # Vision AI (Property Analysis)
118
  # ------------------------------------------------------------------
119
- # BLIP - Works with HuggingFace FREE Inference API
120
- # Alternative: nlpconnect/vit-gpt2-image-captioning
121
- HF_VISION_MODEL: str = os.getenv("HF_VISION_MODEL", "Salesforce/blip-image-captioning-large")
122
- HF_VISION_API_ENABLED: bool = os.getenv("HF_VISION_API_ENABLED", "true").lower() == "true"
123
- PROPERTY_IMAGE_MIN_CONFIDENCE: float = float(os.getenv("PROPERTY_IMAGE_MIN_CONFIDENCE", "0.6"))
 
 
124
 
125
  # ------------------------------------------------------------------
126
  # LLM / Tooling keys
@@ -141,6 +143,12 @@ class Settings(BaseSettings):
141
  LANGCHAIN_TRACING_V2: bool = os.getenv("LANGCHAIN_TRACING_V2", "false").lower() == "true"
142
  LANGCHAIN_API_KEY: str = os.getenv("LANGCHAIN_API_KEY", "")
143
  LANGCHAIN_PROJECT: str = os.getenv("LANGCHAIN_PROJECT", "aida_agent")
 
144
 
145
  # ============ REDIS (SESSION & MEMORY) ============
146
  REDIS_URL: str = os.getenv("REDIS_URL", "redis://localhost:6379")
 
114
  HF_WHISPER_MODEL: str = os.getenv("HF_WHISPER_MODEL", "openai/whisper-large-v3")
115
 
116
  # ------------------------------------------------------------------
117
+ # Vision AI (Property Analysis) - DISABLED
118
  # ------------------------------------------------------------------
119
+ # NOTE: Vision analysis is NOT in use. Image uploads are handled
120
+ # directly by Cloudflare Worker (frontend upload).
121
+ # These settings are kept for future reference only.
122
+ # ------------------------------------------------------------------
123
+ # HF_VISION_MODEL: str = os.getenv("HF_VISION_MODEL", "Salesforce/blip-image-captioning-large")
124
+ # HF_VISION_API_ENABLED: bool = os.getenv("HF_VISION_API_ENABLED", "true").lower() == "true"
125
+ # PROPERTY_IMAGE_MIN_CONFIDENCE: float = float(os.getenv("PROPERTY_IMAGE_MIN_CONFIDENCE", "0.6"))
126
 
127
  # ------------------------------------------------------------------
128
  # LLM / Tooling keys
 
143
  LANGCHAIN_TRACING_V2: bool = os.getenv("LANGCHAIN_TRACING_V2", "false").lower() == "true"
144
  LANGCHAIN_API_KEY: str = os.getenv("LANGCHAIN_API_KEY", "")
145
  LANGCHAIN_PROJECT: str = os.getenv("LANGCHAIN_PROJECT", "aida_agent")
146
+
147
+ # ============ AGENT LIGHTNING (RL TRAINING) ============
148
+ # Enable trajectory capture for reinforcement learning
149
+ # Set LIGHTNING_ENABLED=true in .env to start collecting training data
150
+ LIGHTNING_ENABLED: bool = os.getenv("LIGHTNING_ENABLED", "false").lower() == "true"
151
+ LIGHTNING_TRAJECTORY_TTL_DAYS: int = int(os.getenv("LIGHTNING_TRAJECTORY_TTL_DAYS", "30"))
152
 
153
  # ============ REDIS (SESSION & MEMORY) ============
154
  REDIS_URL: str = os.getenv("REDIS_URL", "redis://localhost:6379")
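The settings above all parse boolean env vars with the same idiom, `os.getenv(NAME, "false").lower() == "true"`. A helper makes the quirk explicit: only the literal string `true` (any case) enables a flag, so values like `1` or `yes` silently read as `False`. The helper name is an assumption, not part of the codebase:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean env var the way app/config.py does:
    only the literal string 'true' (case-insensitive) counts as True."""
    return os.getenv(name, "true" if default else "false").lower() == "true"
```

Under this scheme, `LIGHTNING_ENABLED=1` would leave trajectory capture off; operators must set the string `true` exactly.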
app/routes/auth.py CHANGED
@@ -11,6 +11,7 @@ from app.schemas.auth import (
11
  ResetPasswordDto,
12
  ResendOtpDto,
13
  )
 
14
  from app.services.auth_service import auth_service
15
  from app.services.user_service import user_service
16
  from app.services.otp_service import otp_service
@@ -162,6 +163,61 @@ async def get_current_user_profile(current_user: dict = Depends(get_current_user
162
  logger.info(f"Get current user profile: {current_user.get('user_id')}")
163
  return await user_service.get_current_user_profile(current_user.get("user_id"))
164
 
165
  # ============================================================
166
  # LOGOUT ENDPOINT
167
  # ============================================================
 
11
  ResetPasswordDto,
12
  ResendOtpDto,
13
  )
14
+ from app.schemas.user import ProfileUpdateRequest, ProfileUpdateResponse
15
  from app.services.auth_service import auth_service
16
  from app.services.user_service import user_service
17
  from app.services.otp_service import otp_service
 
163
  logger.info(f"Get current user profile: {current_user.get('user_id')}")
164
  return await user_service.get_current_user_profile(current_user.get("user_id"))
165
 
166
+
167
+ @router.patch("/profile", status_code=status.HTTP_200_OK, response_model=ProfileUpdateResponse)
168
+ async def update_current_user_profile(
169
+ profile_data: ProfileUpdateRequest,
170
+ current_user: dict = Depends(get_current_user)
171
+ ):
172
+ """
173
+ Update Current User Profile
174
+
175
+ Update the logged-in user's profile information.
176
+ All fields are optional - only include fields you want to update.
177
+
178
+ **Allowed fields:**
179
+ - `firstName`: User's first name (1-50 chars)
180
+ - `lastName`: User's last name (1-50 chars)
181
+ - `bio`: Short bio (max 150 chars)
182
+ - `location`: Location in "City, Country" format (max 100 chars)
183
+ - `languages`: Array of languages spoken (max 3)
184
+ - `profilePicture`: URL to profile picture
185
+
186
+ **Requires:** Bearer token in Authorization header
187
+
188
+ **Example request body:**
189
+ ```json
190
+ {
191
+ "firstName": "John",
192
+ "lastName": "Doe",
193
+ "bio": "Real estate enthusiast",
194
+ "location": "Cotonou, Benin",
195
+ "languages": ["English", "French"]
196
+ }
197
+ ```
198
+ """
199
+ user_id = current_user.get("user_id")
200
+ logger.info(f"Update user profile: {user_id}")
201
+
202
+ # Convert to dict and remove None values
203
+ update_data = {k: v for k, v in profile_data.model_dump().items() if v is not None}
204
+
205
+ if not update_data:
206
+ raise HTTPException(
207
+ status_code=status.HTTP_400_BAD_REQUEST,
208
+ detail="No fields to update. Please provide at least one field."
209
+ )
210
+
211
+ # Validate languages count
212
+ if "languages" in update_data and len(update_data["languages"]) > 3:
213
+ raise HTTPException(
214
+ status_code=status.HTTP_400_BAD_REQUEST,
215
+ detail="Maximum 3 languages allowed"
216
+ )
217
+
218
+ return await user_service.update_user_profile(user_id, update_data)
219
+
220
+
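The PATCH handler above builds its update dict by dropping `None` fields, then validates it. That core can be expressed as a pure function for testing outside FastAPI; `build_update` is a hypothetical name, and `ValueError` stands in for the route's `HTTPException`:

```python
def build_update(payload: dict, max_languages: int = 3) -> dict:
    """Filter a PATCH body as the /profile route does: drop None
    fields, reject empty updates, and enforce the languages limit."""
    update = {k: v for k, v in payload.items() if v is not None}
    if not update:
        raise ValueError("No fields to update. Please provide at least one field.")
    if "languages" in update and len(update["languages"]) > max_languages:
        raise ValueError("Maximum 3 languages allowed")
    return update
```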
221
  # ============================================================
222
  # LOGOUT ENDPOINT
223
  # ============================================================
app/routes/media_upload.py CHANGED
@@ -1,282 +1,33 @@
1
  # ============================================================
2
  # app/routes/media_upload.py
3
- # Media Upload & Property Analysis Routes
4
- # Handles image validation, video upload, and field extraction
 
 
5
  # ============================================================
6
 
7
- import io
8
  import logging
9
- from typing import List, Optional
10
- from fastapi import APIRouter, UploadFile, File, Form, Depends, HTTPException, status
11
- from fastapi.responses import JSONResponse
12
- import cloudinary
13
- import cloudinary.uploader
14
  from app.config import settings
15
- # PAUSED: Vision service temporarily disabled
16
- # from app.ai.services.vision_service import VisionService
17
- from app.guards.jwt_guard import get_current_user
18
- from app.core.llm_router import LLMRouter, TaskComplexity
19
 
20
  logger = logging.getLogger(__name__)
21
 
22
  router = APIRouter(prefix="/listings", tags=["media"])
23
 
24
- # PAUSED: Vision Service temporarily disabled
25
- # vision_service = VisionService()
26
-
27
- # Initialize LLM Router for generating personalized messages
28
- llm_router = LLMRouter()
29
-
30
- # Configure Cloudinary
31
- if settings.CLOUDINARY_CLOUD_NAME:
32
- cloudinary.config(
33
- cloud_name=settings.CLOUDINARY_CLOUD_NAME,
34
- api_key=settings.CLOUDINARY_API_KEY,
35
- api_secret=settings.CLOUDINARY_API_SECRET,
36
- secure=True
37
- )
38
-
39
-
40
- # ============================================================
41
- # File Validation & Limits
42
- # ============================================================
43
-
44
- ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/webp"}
45
- ALLOWED_VIDEO_TYPES = {"video/mp4", "video/quicktime", "video/x-msvideo"}
46
- MAX_IMAGE_SIZE = 10 * 1024 * 1024 # 10MB
47
- MAX_VIDEO_SIZE = 100 * 1024 * 1024 # 100MB
48
- MAX_IMAGES_PER_UPLOAD = 10
49
- MAX_VIDEO_DURATION = 300 # 5 minutes
50
-
51
-
52
- # ============================================================
53
- # Helper Functions
54
- # ============================================================
55
-
56
- async def validate_image_file(file: UploadFile) -> bytes:
57
- """Validate and read image file"""
58
- if file.content_type not in ALLOWED_IMAGE_TYPES:
59
- raise HTTPException(
60
- status_code=status.HTTP_400_BAD_REQUEST,
61
- detail=f"Invalid image type. Allowed: {', '.join(ALLOWED_IMAGE_TYPES)}"
62
- )
63
-
64
- contents = await file.read()
65
- if len(contents) > MAX_IMAGE_SIZE:
66
- raise HTTPException(
67
- status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
68
- detail=f"Image size exceeds {MAX_IMAGE_SIZE / 1024 / 1024}MB limit"
69
- )
70
-
71
- return contents
72
-
73
-
74
- async def validate_video_file(file: UploadFile) -> bytes:
75
- """Validate and read video file"""
76
- if file.content_type not in ALLOWED_VIDEO_TYPES:
77
- raise HTTPException(
78
- status_code=status.HTTP_400_BAD_REQUEST,
79
- detail=f"Invalid video type. Allowed: {', '.join(ALLOWED_VIDEO_TYPES)}"
80
- )
81
-
82
- contents = await file.read()
83
- if len(contents) > MAX_VIDEO_SIZE:
84
- raise HTTPException(
85
- status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
86
- detail=f"Video size exceeds {MAX_VIDEO_SIZE / 1024 / 1024}MB limit"
87
- )
88
-
89
- return contents
90
-
91
-
92
- def generate_intelligent_filename(
93
- original_filename: str,
94
- location: Optional[str] = None,
95
- title: Optional[str] = None,
96
- index: int = 0
97
- ) -> str:
98
- """
99
- Generate intelligent filename for uploaded image
100
-
101
- Pattern: {location}_{title}_{date}_{index}.jpg
102
- Example: Lagos_Modern_Apartment_2025_01_31_1.jpg
103
-
104
- The Cloudflare worker will handle duplicates by appending numbers
105
- """
106
- from datetime import datetime
107
-
108
- # Get original extension
109
- _, ext = original_filename.rsplit('.', 1) if '.' in original_filename else (original_filename, 'jpg')
110
- ext = ext.lower()
111
- if ext not in ['jpg', 'jpeg', 'png', 'webp']:
112
- ext = 'jpg'
113
-
114
- # Build filename components
115
- parts = []
116
-
117
- # Add location if available
118
- if location:
119
- clean_location = location.replace(' ', '_').replace(',', '').lower()[:20]
120
- parts.append(clean_location)
121
-
122
- # Add title if available (first 20 chars)
123
- if title:
124
- clean_title = title.replace(' ', '_').replace(',', '').lower()[:20]
125
- parts.append(clean_title)
126
-
127
- # Add timestamp
128
- timestamp = datetime.utcnow().strftime("%Y_%m_%d_%H%M%S")
129
- parts.append(timestamp)
130
-
131
- # Add index if multiple images
132
- if index > 0:
133
- parts.append(str(index))
134
-
135
- filename = "_".join(parts)
136
- return f"{filename}.{ext}"
137
-
138
-
139
- async def upload_to_cloudflare(file_bytes: bytes, filename: str, meaningful_name: str = None) -> str:
140
- """
141
- Upload image/video to Cloudflare R2
142
-
143
- Args:
144
- file_bytes: File bytes
145
- filename: Original filename
146
- meaningful_name: AI-generated meaningful filename (optional)
147
-
148
- Returns:
149
- Public URL of uploaded file
150
- """
151
- import boto3
152
- from botocore.config import Config
153
- import os
154
- from datetime import datetime
155
-
156
- try:
157
- # Use meaningful name if provided, otherwise original filename
158
- final_filename = meaningful_name or filename
159
-
160
- # Initialize R2 client
161
- r2_client = boto3.client(
162
- 's3',
163
- endpoint_url=settings.CF_R2_ENDPOINT,
164
- aws_access_key_id=settings.CF_R2_ACCESS_KEY_ID,
165
- aws_secret_access_key=settings.CF_R2_SECRET_ACCESS_KEY,
166
- config=Config(
167
- signature_version='s3v4',
168
- s3={'addressing_style': 'path'}
169
- ),
170
- region_name='auto'
171
- )
172
-
173
- # Determine content type based on file extension
174
- ext = os.path.splitext(final_filename)[1].lower()
175
- content_type_map = {
176
- '.jpg': 'image/jpeg',
177
- '.jpeg': 'image/jpeg',
178
- '.png': 'image/png',
179
- '.webp': 'image/webp',
180
- '.mp4': 'video/mp4',
181
- '.mov': 'video/quicktime',
182
- '.avi': 'video/x-msvideo'
183
- }
184
- content_type = content_type_map.get(ext, 'application/octet-stream')
185
-
186
- # Create folder structure: media/YYYY/MM/filename
187
- now = datetime.utcnow()
188
- folder_path = f"media/{now.year}/{now.month:02d}"
189
- object_key = f"{folder_path}/{final_filename}"
190
-
191
- # Upload to R2 (use lojiz-audio bucket for now, or create lojiz-media bucket)
192
- bucket_name = settings.CF_R2_BUCKET_NAME
193
-
194
- r2_client.put_object(
195
- Bucket=bucket_name,
196
- Key=object_key,
197
- Body=file_bytes,
198
- ContentType=content_type,
199
- CacheControl='public, max-age=31536000', # Cache for 1 year
200
- )
201
-
202
- # Construct public URL
203
- public_url = f"{settings.CF_R2_PUBLIC_URL}/{object_key}"
204
-
205
- logger.info(f"βœ… Uploaded to Cloudflare R2: {public_url}")
206
- return public_url
207
-
208
- except Exception as e:
209
- logger.error(f"❌ Error uploading to Cloudflare R2: {str(e)}")
210
- raise HTTPException(
211
- status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
212
- detail=f"Failed to upload to cloud storage: {str(e)}"
213
- )
214
-
215
-
216
- async def upload_to_cloudinary(file_bytes: bytes, filename: str, resource_type: str = "video") -> str:
217
- """Upload video to Cloudinary"""
218
- try:
219
- file_obj = io.BytesIO(file_bytes)
220
-
221
- result = cloudinary.uploader.upload(
222
- file_obj,
223
- resource_type=resource_type,
224
- folder="lojiz/property-videos",
225
- public_id=filename.split(".")[0],
226
- overwrite=True,
227
- quality="auto",
228
- fetch_format="auto"
229
- )
230
-
231
- return result.get("secure_url", "")
232
-
233
- except Exception as e:
234
- logger.error(f"Error uploading to Cloudinary: {str(e)}")
235
- raise HTTPException(
236
- status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
237
- detail="Failed to upload video to Cloudinary"
238
- )
239
-
240
 
241
  # ============================================================
242
  # API Endpoints
243
  # ============================================================
244
 
245
- @router.post("/analyze-images")
246
- async def analyze_property_images(
247
- images: List[UploadFile] = File(...),
248
- listing_method: str = "image",
249
- location: Optional[str] = None,
250
- user_input: Optional[str] = Form(None),
251
- session_id: Optional[str] = Form(None),
252
- current_user = Depends(get_current_user)
253
- ):
254
- """
255
- 🚧 VISION FEATURE PAUSED 🚧
256
-
257
- This route is temporarily disabled while we set up a reliable vision provider.
258
- For now, use /upload-images for direct image upload without AI analysis.
259
- """
260
- raise HTTPException(
261
- status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
262
- detail={
263
- "message": "Vision analysis temporarily unavailable",
264
- "suggestion": "Use /listings/upload-images for direct upload without AI analysis",
265
- "feature_status": "paused"
266
- }
267
- )
268
-
269
-
270
-
271
-
272
  @router.get("/upload-config")
273
  async def get_upload_configuration():
274
  """
275
  Get image upload configuration for frontend.
276
-
277
  Frontend should upload images DIRECTLY to the Cloudflare Worker URL.
278
  No backend processing involved.
279
-
280
  Returns:
281
  {
282
  "worker_url": "https://image-upload-worker.destinyebuka7.workers.dev",
@@ -290,55 +41,60 @@ async def get_upload_configuration():
290
  "max_file_size_mb": 5,
291
  "allowed_types": ["image/jpeg", "image/png", "image/webp"],
292
  "instructions": {
293
- "step_1": "Create FormData with: file, user_id, session_id, message (optional)",
294
- "step_2": "POST to worker_url",
295
- "step_3": "Worker validates image, uploads to Cloudflare, returns URL",
296
- "step_4": "Send URL back to AIDA in chat message"
 
 
 
 
 
 
 
 
297
  }
298
  }
299
 
300
 
301
  @router.post("/get-image-name")
302
-
303
-
304
  async def get_image_name_for_upload(
305
  user_id: str = Form(...),
306
  session_id: str = Form(...)
307
  ):
308
  """
309
  Generate intelligent filename for Cloudflare Worker uploads.
310
-
311
- Called by Cloudflare Worker before uploading image.
312
  Returns a descriptive filename based on current listing context.
313
-
314
  Args:
315
  user_id: User ID
316
  session_id: Current session ID
317
-
318
  Returns:
319
  {"name": "lagos_modern_apartment_2bed"}
320
  """
321
  try:
322
  from app.ai.services.conversation_service import ConversationService
323
- from datetime import datetime
324
-
325
  # Get current conversation state to extract listing context
326
  conv_service = ConversationService()
327
  state = await conv_service.get_or_create_conversation(user_id, session_id)
328
-
329
  # Extract relevant fields for filename
330
  location = state.provided_fields.get("location", "")
331
  title = state.provided_fields.get("title", "")
332
  bedrooms = state.provided_fields.get("bedrooms")
333
  listing_type = state.provided_fields.get("listing_type", "property")
334
-
335
  # Build intelligent filename
336
  parts = []
337
-
338
  if location:
339
  clean_location = location.replace(' ', '_').replace(',', '').lower()[:15]
340
  parts.append(clean_location)
341
-
342
  if title:
343
  # Extract first 2-3 meaningful words from title
344
  title_words = [w for w in title.lower().split() if len(w) > 3][:2]
@@ -346,23 +102,23 @@ async def get_image_name_for_upload(
346
  parts.extend(title_words)
347
  elif bedrooms:
348
  parts.append(f"{bedrooms}bed")
349
-
350
  if listing_type and listing_type != "property":
351
  parts.append(listing_type[:4])
352
-
353
  # If we have no context, use generic name
354
  if not parts:
355
  parts = ["property", datetime.now().strftime("%Y%m%d")]
356
-
357
  filename = "_".join(parts)
358
-
359
- logger.info(f"Generated image name: {filename}", user_id=user_id, session_id=session_id)
360
-
361
  return {
362
  "name": filename,
363
  "success": True
364
  }
365
-
366
  except Exception as e:
367
  logger.error(f"Failed to generate image name: {str(e)}")
368
  # Fallback to timestamp-based name
@@ -370,53 +126,3 @@ async def get_image_name_for_upload(
370
  "name": f"property_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
371
  "success": True
372
  }
373
-
374
-
375
- # ============================================================
376
- # DEPRECATED ENDPOINTS (Vision Feature Paused)
377
- # ============================================================
378
-
379
- @router.post("/analyze-images")
380
- async def analyze_property_images_deprecated(current_user = Depends(get_current_user)):
381
- """
382
- 🚧 VISION FEATURE PAUSED 🚧
383
-
384
- Image analysis is now handled by Cloudflare Worker with AI vision.
385
- Frontend should upload directly to the worker endpoint.
386
- """
387
- raise HTTPException(
388
- status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
389
- detail={
390
- "message": "Vision analysis moved to Cloudflare Worker",
391
- "suggestion": "Upload images to Cloudflare Worker endpoint for direct processing",
392
- "feature_status": "deprecated"
393
- }
394
- )
395
-
396
-
397
- @router.post("/analyze-video")
398
- async def analyze_property_video_deprecated(current_user = Depends(get_current_user)):
399
- """
400
- 🚧 VISION FEATURE PAUSED 🚧
401
- """
402
- raise HTTPException(
403
- status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
404
- detail={
405
- "message": "Video analysis temporarily unavailable",
406
- "feature_status": "paused"
407
- }
408
- )
409
-
410
-
411
- @router.post("/validate-media")
412
- async def validate_media_deprecated(current_user = Depends(get_current_user)):
413
- """
414
- 🚧 VISION FEATURE PAUSED 🚧
415
- """
416
- raise HTTPException(
417
- status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
418
- detail={
419
- "message": "Media validation moved to Cloudflare Worker",
420
- "feature_status": "deprecated"
421
- }
422
- )
 
1
  # ============================================================
2
  # app/routes/media_upload.py
3
+ # Media Upload Configuration Routes
4
+ # ============================================================
5
+ # NOTE: Image/video uploads are handled DIRECTLY by Cloudflare Worker
6
+ # This file only provides configuration endpoints for the frontend
7
  # ============================================================
8
 
 
9
  import logging
10
+ from datetime import datetime
11
+ from fastapi import APIRouter, Form, HTTPException, status
 
 
 
12
  from app.config import settings
 
 
 
 
13
 
14
  logger = logging.getLogger(__name__)
15
 
16
  router = APIRouter(prefix="/listings", tags=["media"])
17
 
18
 
19
  # ============================================================
20
  # API Endpoints
21
  # ============================================================
22
 
 
23
  @router.get("/upload-config")
24
  async def get_upload_configuration():
25
  """
26
  Get image upload configuration for frontend.
27
+
28
  Frontend should upload images DIRECTLY to the Cloudflare Worker URL.
29
  No backend processing involved.
30
+
31
  Returns:
32
  {
33
  "worker_url": "https://image-upload-worker.destinyebuka7.workers.dev",
 
41
  "max_file_size_mb": 5,
42
  "allowed_types": ["image/jpeg", "image/png", "image/webp"],
43
  "instructions": {
44
+ "profile_upload": {
45
+ "step_1": "Create FormData with: file, type='profile', user_name, user_id",
46
+ "step_2": "POST to worker_url",
47
+ "step_3": "Worker uploads to Cloudflare, returns URL",
48
+ "step_4": "Include URL in PATCH /auth/profile payload"
49
+ },
50
+ "property_upload": {
51
+ "step_1": "Create FormData with: file, type='property', user_id, session_id",
52
+ "step_2": "POST to worker_url",
53
+ "step_3": "Worker uploads to Cloudflare, returns URL",
54
+ "step_4": "Send URL to AIDA in chat message"
55
+ }
56
  }
57
  }
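The two instruction flows above can be sketched as the form fields a frontend would POST to `worker_url`. Field names come straight from the instructions payload; the helper name is illustrative, and the actual file bytes would ride in the multipart `file` field:

```python
# Build the form-data fields for a worker upload, per the upload-config
# instructions: profile uploads carry user_name, property uploads carry
# the chat session_id.
def build_upload_form(upload_type: str, user_id: str, *,
                      user_name: str = "", session_id: str = "") -> dict:
    form = {"type": upload_type, "user_id": user_id}
    if upload_type == "profile":
        form["user_name"] = user_name
    else:
        form["session_id"] = session_id
    return form

profile_form = build_upload_form("profile", "u123", user_name="John Doe")
property_form = build_upload_form("property", "u123", session_id="sess-9")
# e.g. requests.post(worker_url, data=property_form, files={"file": fh})
```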
58
 
59
 
60
  @router.post("/get-image-name")
 
 
61
  async def get_image_name_for_upload(
62
  user_id: str = Form(...),
63
  session_id: str = Form(...)
64
  ):
65
  """
66
  Generate intelligent filename for Cloudflare Worker uploads.
67
+
68
+ Called by Cloudflare Worker before uploading property images.
69
  Returns a descriptive filename based on current listing context.
70
+
71
  Args:
72
  user_id: User ID
73
  session_id: Current session ID
74
+
75
  Returns:
76
  {"name": "lagos_modern_apartment_2bed"}
77
  """
78
  try:
79
  from app.ai.services.conversation_service import ConversationService
80
+
 
81
  # Get current conversation state to extract listing context
82
  conv_service = ConversationService()
83
  state = await conv_service.get_or_create_conversation(user_id, session_id)
84
+
85
  # Extract relevant fields for filename
86
  location = state.provided_fields.get("location", "")
87
  title = state.provided_fields.get("title", "")
88
  bedrooms = state.provided_fields.get("bedrooms")
89
  listing_type = state.provided_fields.get("listing_type", "property")
90
+
91
  # Build intelligent filename
92
  parts = []
93
+
94
  if location:
95
  clean_location = location.replace(' ', '_').replace(',', '').lower()[:15]
96
  parts.append(clean_location)
97
+
98
  if title:
99
  # Extract first 2-3 meaningful words from title
100
  title_words = [w for w in title.lower().split() if len(w) > 3][:2]
 
102
  parts.extend(title_words)
103
  elif bedrooms:
104
  parts.append(f"{bedrooms}bed")
105
+
106
  if listing_type and listing_type != "property":
107
  parts.append(listing_type[:4])
108
+
109
  # If we have no context, use generic name
110
  if not parts:
111
  parts = ["property", datetime.now().strftime("%Y%m%d")]
112
+
113
  filename = "_".join(parts)
114
+
115
+ logger.info(f"Generated image name: {filename} for user: {user_id}")
116
+
117
  return {
118
  "name": filename,
119
  "success": True
120
  }
121
+
122
  except Exception as e:
123
  logger.error(f"Failed to generate image name: {str(e)}")
124
  # Fallback to timestamp-based name
 
126
  "name": f"property_{datetime.now().strftime('%Y%m%d_%H%M%S')}",
127
  "success": True
128
  }
 
app/schemas/user.py CHANGED
@@ -100,6 +100,69 @@ class UserUpdateDto(BaseModel):
100
  languages: Optional[list[str]] = None
101
 
102
 
 
103
  # ============================================================
104
  # Generic Response DTOs
105
  # ============================================================
 
100
  languages: Optional[list[str]] = None
101
 
102
 
103
+ class ProfileUpdateRequest(BaseModel):
104
+ """
105
+ Update user profile request body.
106
+ All fields are optional - only include fields you want to update.
107
+ """
108
+ firstName: Optional[str] = Field(
109
+ None,
110
+ min_length=1,
111
+ max_length=50,
112
+ description="User's first name",
113
+ examples=["John"]
114
+ )
115
+ lastName: Optional[str] = Field(
116
+ None,
117
+ min_length=1,
118
+ max_length=50,
119
+ description="User's last name",
120
+ examples=["Doe"]
121
+ )
122
+ bio: Optional[str] = Field(
123
+ None,
124
+ max_length=150,
125
+ description="Short bio about the user (max 150 characters)",
126
+ examples=["Real estate enthusiast looking for the perfect apartment in Cotonou."]
127
+ )
128
+ location: Optional[str] = Field(
129
+ None,
130
+ max_length=100,
131
+ description="User's location in 'City, Country' format",
132
+ examples=["Cotonou, Benin"]
133
+ )
134
+ languages: Optional[list[str]] = Field(
135
+ None,
136
+ max_length=3,
137
+ description="Languages spoken by the user (max 3)",
138
+ examples=[["English", "French", "Portuguese"]]
139
+ )
140
+ profilePicture: Optional[str] = Field(
141
+ None,
142
+ description="URL to the user's profile picture",
143
+ examples=["https://example.com/images/profile.jpg"]
144
+ )
145
+
146
+ class Config:
147
+ json_schema_extra = {
148
+ "example": {
149
+ "firstName": "John",
150
+ "lastName": "Doe",
151
+ "bio": "Real estate enthusiast",
152
+ "location": "Cotonou, Benin",
153
+ "languages": ["English", "French"],
154
+ "profilePicture": "https://example.com/profile.jpg"
155
+ }
156
+ }
157
+
158
+
159
+ class ProfileUpdateResponse(BaseModel):
160
+ """Response after updating user profile"""
161
+ success: bool = Field(default=True, description="Whether the update was successful")
162
+ message: str = Field(..., description="Response message")
163
+ data: UserProfileWithReviewsDto = Field(..., description="Updated user profile")
164
+
165
+
166
  # ============================================================
167
  # Generic Response DTOs
168
  # ============================================================
cloudflare-worker/image-upload-worker.js CHANGED
@@ -1,9 +1,11 @@
1
- // src/index.js - Updated Image Upload Worker with AI Vision Validation
2
- // Features:
3
- // 1. AI Vision validation (is this a property photo?)
4
- // 2. Get image name from AIDA
5
- // 3. Handle add/replace operations
6
- // 4. Duplicate name numbering
 
 
7
 
8
  export default {
9
  async fetch(request, env) {
@@ -52,9 +54,11 @@ export default {
52
  // Parse form data
53
  const formData = await request.formData();
54
  const imageFile = formData.get("file");
55
- const userMessage = formData.get("message") || "";
56
  const userId = formData.get("user_id") || "";
 
57
  const sessionId = formData.get("session_id") || "";
 
58
  const operation = formData.get("operation") || "add"; // "add" or "replace"
59
  const replaceIndex = formData.get("replace_index"); // For replace operations
60
  const existingImageId = formData.get("existing_image_id"); // ID of image to replace
@@ -63,19 +67,26 @@ export default {
63
  return jsonResponse({ success: false, error: "no_image", message: "No image file provided" }, 400);
64
  }
65
 
66
- // Convert image to bytes for AI validation
67
  const imageBytes = await imageFile.arrayBuffer();
68
- const imageArray = [...new Uint8Array(imageBytes)];
69
 
70
  // ============================================================
71
- // STEP 1: AI Vision Validation
 
 
 
 
 
72
  // ============================================================
 
 
 
73
  let isPropertyImage = false;
74
  let validationReason = "";
75
 
76
  try {
77
  const aiResult = await env.AI.run('@cf/llava-hf/llava-1.5-7b-hf', {
78
- image: imageArray,
79
  prompt: "Is this image showing a real estate property such as a house, apartment, room, building, or property exterior/interior? Answer with ONLY 'YES' or 'NO' followed by a brief reason.",
80
  max_tokens: 50
81
  });
@@ -85,14 +96,13 @@ export default {
85
  validationReason = response;
86
 
87
  } catch (aiError) {
88
- // If AI fails, allow the image through (fail-open for better UX)
89
  console.error("AI validation error:", aiError);
90
  isPropertyImage = true;
91
  validationReason = "AI validation skipped due to error";
92
  }
93
 
94
- // If not a property image, return error (AIDA will handle friendly message)
95
- if (!isPropertyImage) {
96
  return jsonResponse({
97
  success: false,
98
  error: "not_property_image",
@@ -102,13 +112,24 @@ export default {
102
  session_id: sessionId
103
  }, 400);
104
  }
 
 
105
 
106
  // ============================================================
107
- // STEP 2: Get Image Name from AIDA (if new image)
108
  // ============================================================
109
  let imageName = "";
110
 
111
- if (operation === "add") {
 
 
 
 
 
 
 
 
 
112
  try {
113
  const nameResponse = await fetch(`${AIDA_BASE_URL}/ai/get-image-name`, {
114
  method: "POST",
@@ -132,7 +153,7 @@ export default {
132
  }
133
 
134
  // ============================================================
135
- // STEP 3: Handle Replace Operation (delete old image)
136
  // ============================================================
137
  if (operation === "replace" && existingImageId) {
138
  try {
@@ -167,15 +188,15 @@ export default {
167
  }
168
 
169
  // ============================================================
170
- // STEP 4: Upload to Cloudflare Images
171
  // ============================================================
172
-
173
  // Clean the image name for use as filename
174
  const cleanName = imageName
175
  .toLowerCase()
176
- .replace(/[^a-z0-9]+/g, '-')
177
  .replace(/^-|-$/g, '')
178
- || `property-${Date.now()}`;
179
 
180
  // Create new FormData for Cloudflare upload
181
  const uploadFormData = new FormData();
@@ -194,12 +215,13 @@ export default {
194
  const imageId = uploadResponseBody.result.id;
195
  const imageUrl = `https://imagedelivery.net/${ACCOUNT_HASH}/${imageId}/public`;
196
 
197
- // Return success with all context for AIDA
198
  return jsonResponse({
199
  success: true,
200
  id: imageId,
201
  url: imageUrl,
202
  filename: cleanName,
 
203
  message: userMessage,
204
  operation: operation,
205
  replace_index: replaceIndex,
 
1
+ // src/index.js - Image Upload Worker
2
+ // ============================================================
3
+ // SUPPORTS:
4
+ // 1. Profile pictures (type=profile) - named as {user_id}/profile.jpg
5
+ // 2. Property images (type=property) - for listing photos
6
+ // ============================================================
7
+ // NOTE: AI Vision validation is PAUSED - not currently in use
8
+ // ============================================================
9
 
10
  export default {
11
  async fetch(request, env) {
 
54
  // Parse form data
55
  const formData = await request.formData();
56
  const imageFile = formData.get("file");
57
+ const uploadType = formData.get("type") || "property"; // "profile" or "property"
58
  const userId = formData.get("user_id") || "";
59
+ const userName = formData.get("user_name") || ""; // For profile naming
60
  const sessionId = formData.get("session_id") || "";
61
+ const userMessage = formData.get("message") || "";
62
  const operation = formData.get("operation") || "add"; // "add" or "replace"
63
  const replaceIndex = formData.get("replace_index"); // For replace operations
64
  const existingImageId = formData.get("existing_image_id"); // ID of image to replace
 
67
  return jsonResponse({ success: false, error: "no_image", message: "No image file provided" }, 400);
68
  }
69
 
70
+ // Convert image to bytes
71
  const imageBytes = await imageFile.arrayBuffer();
 
72
 
73
  // ============================================================
74
+ // AI VISION VALIDATION - PAUSED
75
+ // ============================================================
76
+ // NOTE: AI Vision validation is currently disabled/paused.
77
+ // The HuggingFace vision API is not being used.
78
+ // All images are allowed through without property validation.
79
+ // To re-enable, uncomment the validation block below.
80
  // ============================================================
81
+
82
+ /*
83
+ // PAUSED: AI Vision Validation Block
84
  let isPropertyImage = false;
85
  let validationReason = "";
86
 
87
  try {
88
  const aiResult = await env.AI.run('@cf/llava-hf/llava-1.5-7b-hf', {
89
+ image: [...new Uint8Array(imageBytes)],
90
  prompt: "Is this image showing a real estate property such as a house, apartment, room, building, or property exterior/interior? Answer with ONLY 'YES' or 'NO' followed by a brief reason.",
91
  max_tokens: 50
92
  });
 
96
  validationReason = response;
97
 
98
  } catch (aiError) {
 
99
  console.error("AI validation error:", aiError);
100
  isPropertyImage = true;
101
  validationReason = "AI validation skipped due to error";
102
  }
103
 
104
+ // If not a property image and type is property, return error
105
+ if (!isPropertyImage && uploadType === "property") {
106
  return jsonResponse({
107
  success: false,
108
  error: "not_property_image",
 
112
  session_id: sessionId
113
  }, 400);
114
  }
115
+ */
116
+ // END PAUSED BLOCK
117
 
118
  // ============================================================
119
+ // DETERMINE IMAGE NAME
120
  // ============================================================
121
  let imageName = "";
122
 
123
+ if (uploadType === "profile") {
124
+ // Profile pictures: {user_id}/profile or {user_name}/profile
125
+ const identifier = userName || userId || `user_${Date.now()}`;
126
+ const cleanIdentifier = identifier
127
+ .toLowerCase()
128
+ .replace(/[^a-z0-9]+/g, '_')
129
+ .replace(/^_|_$/g, '');
130
+ imageName = `${cleanIdentifier}_profile`;
131
+ } else if (operation === "add") {
132
+ // Property images: get name from AIDA or use timestamp
133
  try {
134
  const nameResponse = await fetch(`${AIDA_BASE_URL}/ai/get-image-name`, {
135
  method: "POST",
 
153
  }
154
 
155
  // ============================================================
156
+ // HANDLE REPLACE OPERATION (delete old image)
157
  // ============================================================
158
  if (operation === "replace" && existingImageId) {
159
  try {
 
188
  }
189
 
190
  // ============================================================
191
+ // UPLOAD TO CLOUDFLARE IMAGES
192
  // ============================================================
193
+
194
  // Clean the image name for use as filename
195
  const cleanName = imageName
196
  .toLowerCase()
197
+ .replace(/[^a-z0-9_]+/g, '-')
198
  .replace(/^-|-$/g, '')
199
+ || `image-${Date.now()}`;
200
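The JS cleaning rule above has a direct Python equivalent, useful for testing the naming convention server-side (a sketch; `strip('-')` trims all edge dashes where the worker's regex trims one per side):

```python
# Python equivalent of the worker's filename cleaning: lowercase, collapse
# any run outside [a-z0-9_] to a single '-', trim edge dashes, and fall
# back to a timestamped name when nothing survives.
import re
import time

def clean_image_name(name: str) -> str:
    cleaned = re.sub(r'[^a-z0-9_]+', '-', name.lower()).strip('-')
    return cleaned or f"image-{int(time.time() * 1000)}"
```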
 
201
  // Create new FormData for Cloudflare upload
202
  const uploadFormData = new FormData();
 
215
  const imageId = uploadResponseBody.result.id;
216
  const imageUrl = `https://imagedelivery.net/${ACCOUNT_HASH}/${imageId}/public`;
217
 
218
+ // Return success with all context
219
  return jsonResponse({
220
  success: true,
221
  id: imageId,
222
  url: imageUrl,
223
  filename: cleanName,
224
+ type: uploadType,
225
  message: userMessage,
226
  operation: operation,
227
  replace_index: replaceIndex,
docs/CLARA_RLM_INTEGRATION_PLAN.md ADDED
@@ -0,0 +1,537 @@
1
+ # CLaRa + RLM Integration Plan for AIDA
2
+
3
+ **Date**: 2026-02-09
4
+ **Author**: AI Architecture Analysis
5
+ **Status**: Proposal
6
+
7
+ ---
8
+
9
+ ## Executive Summary
10
+
11
+ This document outlines how **Apple's CLaRa** (Continuous Latent Reasoning) and **MIT's RLM** (Recursive Language Models) can enhance AIDA's current RAG architecture for real estate search.
12
+
13
+ **TL;DR**:
14
+ - **CLaRa**: Compress 4096-dim vectors to 256-dim β†’ 16x faster search, 90% storage savings
15
+ - **RLM**: Enable complex multi-hop reasoning for queries like "3-bed near good schools in safe neighborhood under 500k"
16
+ - **Combined Impact**: 10x performance boost + deeper contextual understanding
17
+
18
+ ---
19
+
20
+ ## Part 1: Current RAG Implementation Analysis
21
+
22
+ ### Architecture Overview
23
+
24
+ ```
+ AIDA Current RAG Architecture
+
+ User Query → Intent Classifier → Search Extractor
+       ↓
+ Strategy Selector (LLM decides):
+   • MONGO_ONLY (pure filters)
+   • QDRANT_ONLY (semantic search)
+   • MONGO_THEN_QDRANT (filter → semantic)
+   • QDRANT_THEN_MONGO (semantic → filter)
+       ↓
+ Embedding Service:
+   • Model: qwen/qwen3-embedding-8b (via OpenRouter)
+   • Dimension: 4096
+   • Format: "{title}. {beds}-bed in {location}. {description}"
+       ↓
+ Qdrant Vector DB:
+   • Collection: "listings"
+   • ~1000s of listings × 4096 floats/listing = ~16MB+ vectors
+   • Payload: full listing metadata (~50KB per listing)
+       ↓
+ Search Results → Enrich with owner data → Brain LLM → Response
+ ```
51
+
52
+ ### Key Files Involved
53
+
54
+ | File | Purpose | RAG Role |
55
+ |------|---------|----------|
56
+ | `search_service.py` | Main search orchestration | Hybrid search execution |
57
+ | `vector_service.py` | Qdrant indexing | Real-time vector upserts |
58
+ | `search_strategy_selector.py` | LLM-based strategy picker | Intelligent routing |
59
+ | `search_extractor.py` | Extract params from query | Query understanding |
60
+ | `brain.py` | Agent reasoning engine | Response generation |
61
+ | `redis_context_memory.py` | Conversation memory | Context retention |
62
+
63
+ ### Current Performance Metrics (Estimated)
64
+
65
+ | Metric | Current Value | Bottleneck |
66
+ |--------|--------------|------------|
67
+ | **Vector Size** | 4096 floats Γ— 4 bytes = 16KB/listing | Storage & bandwidth |
68
+ | **Search Latency** | ~200-500ms (embedding + search + enrichment) | Multiple network calls |
69
+ | **Memory Usage** | 16KB vectors + 50KB payload = 66KB/listing | Qdrant payload size |
70
+ | **Semantic Depth** | Single-hop (direct semantic match) | No multi-hop reasoning |
71
+
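The storage figures in the table above follow directly from float32 vector sizes; a quick back-of-envelope check (plain arithmetic, no AIDA code involved):

```python
BYTES_PER_FLOAT32 = 4

def vector_kb(dim: int) -> float:
    """Size of one float32 embedding vector in KB."""
    return dim * BYTES_PER_FLOAT32 / 1024

print(vector_kb(4096))  # 16.0 KB/listing with the current 4096-dim embeddings
print(vector_kb(256))   # 1.0 KB/listing at 16x CLaRa compression
```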
72
+ ---
73
+
74
+ ## Part 2: CLaRa Integration Strategy
75
+
76
+ ### What is CLaRa?
77
+
78
+ **CLaRa** = Continuous Latent Reasoning for Compression-Native RAG
79
+
80
+ **Key Innovation**: Instead of storing raw text chunks or large embeddings, CLaRa compresses documents into **continuous memory tokens** that preserve semantic reasoning while being 16x-128x smaller.
81
+
82
+ ### How CLaRa Would Transform AIDA
83
+
84
+ #### Current Flow:
85
+ ```python
86
+ # app/ai/services/search_service.py (CURRENT)
87
+
88
+ async def embed_query(text: str) -> List[float]:
89
+ # Returns 4096-dim vector
90
+ response = await client.post(
91
+ "https://openrouter.ai/api/v1/embeddings",
92
+ json={"model": "qwen/qwen3-embedding-8b", "input": text}
93
+ )
94
+ return response.json()["data"][0]["embedding"] # 4096 floats
95
+
96
+ async def hybrid_search(query_text: str, search_params: Dict):
97
+ vector = await embed_query(query_text) # 4096-dim
98
+ results = await qdrant_client.query_points(
99
+ collection_name="listings",
100
+ query=vector, # Search with 4096-dim
101
+ query_filter=build_filters(search_params),
102
+ limit=10
103
+ )
104
+ # PROBLEM: Separate retrieval & generation
105
+ # Brain LLM has to re-process retrieved listings
106
+ ```
107
+
108
+ #### With CLaRa:
109
+ ```python
110
+ # app/ai/services/clara_search_service.py (NEW)
111
+
112
+ from transformers import AutoModel, AutoTokenizer
113
+ import torch
114
+
115
+ # Load CLaRa model
116
+ clara_model = AutoModel.from_pretrained("apple/CLaRa-7B-Instruct")
117
+ clara_tokenizer = AutoTokenizer.from_pretrained("apple/CLaRa-7B-Instruct")
118
+
119
+ async def compress_listing_to_memory_tokens(listing: Dict) -> torch.Tensor:
120
+ """
121
+ Compress listing into continuous memory tokens (16x-128x smaller)
122
+
123
+ BEFORE: 4096-dim embedding + full payload
124
+ AFTER: 256-dim (16x) or 32-dim (128x) continuous token
125
+ """
126
+ # Build semantic text
127
+ text = f"{listing['title']}. {listing['bedrooms']}-bed in {listing['location']}. {listing['description']}"
128
+
129
+ # CLaRa compression (QA-guided semantic compression)
130
+ inputs = clara_tokenizer(text, return_tensors="pt")
131
+ with torch.no_grad():
132
+ compressed_token = clara_model.compress(
133
+ inputs,
134
+ compression_ratio=16 # or 128 for max compression
135
+ )
136
+
137
+ # Returns: 256-dim continuous memory token
138
+ # Preserves: "key reasoning signals" (location, price, features)
139
+ # Discards: Filler words, redundant descriptions
140
+ return compressed_token
141
+
142
+ async def clara_unified_search(query: str, search_params: Dict):
143
+ """
144
+ Unified retrieval + generation in CLaRa's shared latent space
145
+
146
+ BENEFIT: No need to re-encode for generation - already in shared space
147
+ """
148
+ # 1. Compress query
149
+ query_inputs = clara_tokenizer(query, return_tensors="pt")
150
+ query_token = clara_model.compress(query_inputs)
151
+
152
+ # 2. Retrieve in latent space (16x-128x faster than 4096-dim search)
153
+ # CLaRa's query encoder and generator share the same space
154
+ results = await qdrant_client.query_points(
155
+ collection_name="listings_clara_compressed",
156
+ query=query_token.tolist(), # 256-dim (16x smaller)
157
+ limit=10
158
+ )
159
+
160
+ # 3. Generate response DIRECTLY from compressed tokens
161
+ # No re-encoding needed - already in shared latent space
162
+ response = clara_model.generate_from_compressed(
163
+ query_token=query_token,
164
+ retrieved_tokens=[r.vector for r in results],
165
+ max_length=200
166
+ )
167
+
168
+ return {
169
+ "results": results,
170
+ "natural_response": response,
171
+ "compression_used": "16x"
172
+ }
173
+ ```
174
+
175
+ ### CLaRa Benefits for AIDA
176
+
177
+ | Benefit | Impact | Measurement |
178
+ |---------|--------|-------------|
179
+ | **Storage Savings** | 4096 → 256 dims = 16x smaller | 1000 listings: 16MB → 1MB |
180
+ | **Search Speed** | Smaller vectors = faster cosine similarity | 200ms → 50ms (4x faster) |
181
+ | **Unified Processing** | Retrieval + generation in same space | No re-encoding overhead |
182
+ | **Semantic Preservation** | QA-guided compression keeps reasoning signals | Same search quality, less data |
183
+ | **Memory Efficiency** | Less Redis cache pressure | Can cache 16x more listings |
184
+
185
+ ### Migration Path to CLaRa
186
+
187
+ #### Phase 1: Parallel Deployment (Low Risk)
188
+ ```python
189
+ # app/ai/services/hybrid_search_router.py (NEW)
190
+
191
+ async def search_with_fallback(query: str, params: Dict):
192
+ """
193
+ Run CLaRa + Traditional RAG in parallel, compare results
194
+ """
195
+ clara_results, traditional_results = await asyncio.gather(
196
+ clara_unified_search(query, params),
197
+ hybrid_search(query, params) # Current implementation
198
+ )
199
+
200
+ # Log comparison metrics
201
+ logger.info("CLaRa vs Traditional",
202
+ clara_latency=clara_results['latency'],
203
+ trad_latency=traditional_results['latency'],
204
+ clara_count=len(clara_results['results']),
205
+ trad_count=len(traditional_results['results']))
206
+
207
+ # Use CLaRa if available, fallback to traditional
208
+ return clara_results if clara_results['success'] else traditional_results
209
+ ```
210
+
211
+ #### Phase 2: Gradual Indexing
212
+ ```python
213
+ # Migration script: sync_to_clara_compressed.py
214
+
215
+ async def migrate_to_clara():
216
+ """
217
+ Compress existing listings into CLaRa memory tokens
218
+ """
219
+ db = await get_db()
220
+ cursor = db.listings.find({"status": "active"})
221
+
222
+ async for listing in cursor:
223
+ # Compress to memory tokens
224
+ compressed_token = await compress_listing_to_memory_tokens(listing)
225
+
226
+ # Upsert to new collection
227
+ await qdrant_client.upsert(
228
+ collection_name="listings_clara_compressed",
229
+ points=[PointStruct(
230
+ id=str(listing["_id"]),
231
+ vector=compressed_token.tolist(), # 256-dim
232
+ payload={
233
+ "mongo_id": str(listing["_id"]),
234
+ "title": listing["title"],
235
+ "location": listing["location"],
236
+ "price": listing["price"],
237
+ # Minimal payload - most semantic info is in compressed token
238
+ }
239
+ )]
240
+ )
241
+ ```
242
+
243
+ #### Phase 3: Cutover
244
+ - Monitor CLaRa performance for 1 week
245
+ - If latency < 100ms and quality ≥ traditional RAG → full cutover
246
+ - Deprecate old `qwen/qwen3-embedding-8b` embeddings
247
+
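The cutover criteria above can be expressed as a tiny gate check (thresholds taken from the checklist; the function name is illustrative, not part of AIDA's codebase):

```python
def ready_for_cutover(p95_latency_ms: float, quality_vs_baseline: float) -> bool:
    """Phase 3 gate: latency under 100 ms AND search quality at least
    on par with the traditional RAG baseline (ratio >= 1.0)."""
    return p95_latency_ms < 100 and quality_vs_baseline >= 1.0

print(ready_for_cutover(85.0, 1.02))   # True: safe to cut over
print(ready_for_cutover(120.0, 1.10))  # False: keep monitoring
```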
248
+ ---
249
+
250
+ ## Part 3: RLM Integration Strategy
251
+
252
+ ### What is RLM?
253
+
254
+ **RLM** = Recursive Language Models (from MIT CSAIL)
255
+
256
+ **Key Innovation**: Instead of processing entire context at once, RLM **recursively explores** text by:
257
+ 1. Decomposing queries into sub-tasks
258
+ 2. Calling itself on snippets
259
+ 3. Building up understanding through recursive reasoning
260
+
261
+ ### Where RLM Excels Over Current RAG
262
+
263
+ | Query Type | Current RAG Limitation | RLM Solution |
264
+ |------------|----------------------|--------------|
265
+ | **Multi-hop**: "3-bed near good schools AND safe neighborhood" | Single semantic search can't connect "schools" → "safety" | Recursively explore: Find schools → Check neighborhoods → Cross-reference safety data |
266
+ | **Aggregation**: "Show me average prices in Cotonou vs Calavi" | No aggregation logic in vector search | Recursive aggregation: Search Cotonou → Calculate avg → Search Calavi → Compare |
267
+ | **Complex filters**: "Under 500k OR (2-bed AND has pool)" | Boolean logic not native to vector similarity | Recursive decomposition: (Filter 1) ∪ (Filter 2 ∩ Filter 3) |
268
+
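The boolean decomposition in the last row reduces to plain set algebra over listing-ID result sets; a minimal sketch with toy data (not AIDA's actual retrieval API):

```python
# Toy listing-ID sets standing in for three sub-query results:
under_500k = {"L1", "L2", "L3"}
two_bed    = {"L2", "L3", "L4"}
has_pool   = {"L3", "L4", "L5"}

# "Under 500k OR (2-bed AND has pool)" -> Filter1 ∪ (Filter2 ∩ Filter3)
matches = under_500k | (two_bed & has_pool)
print(sorted(matches))  # ['L1', 'L2', 'L3', 'L4']
```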
269
+ ### RLM Architecture for AIDA
270
+
271
+ ```python
272
+ # app/ai/services/rlm_search_service.py (NEW)
273
+
274
+ class RecursiveSearchAgent:
275
+ """
276
+ RLM-based search agent for complex multi-hop queries
277
+
278
+ Example Query: "3-bed apartments near international schools in
279
+ safe neighborhoods in Cotonou under 500k XOF"
280
+
281
+ Recursive Breakdown:
282
+ 1. Find international schools in Cotonou
283
+ 2. For each school β†’ Find safe neighborhoods within 2km
284
+ 3. For each neighborhood β†’ Find 3-bed apartments under 500k
285
+ 4. Aggregate results β†’ Return top matches
286
+ """
287
+
288
+ def __init__(self, brain_llm, search_service):
289
+ self.brain = brain_llm
290
+ self.search = search_service
291
+ self.max_depth = 3 # Prevent infinite recursion
292
+
293
+ async def recursive_search(
294
+ self,
295
+ query: str,
296
+ depth: int = 0,
297
+ context: Dict = None
298
+ ) -> List[Dict]:
299
+ """
300
+ Recursively decompose and execute complex queries
301
+ """
302
+ if depth > self.max_depth:
303
+ logger.warning("Max recursion depth reached")
304
+ return []
305
+
306
+ # Step 1: Decompose query using Brain LLM
307
+ decomposition = await self.brain.decompose_query(query, context)
308
+
309
+ if decomposition["is_atomic"]:
310
+ # Base case: Execute simple search
311
+ return await self.search.hybrid_search(query, decomposition["params"])
312
+
313
+ # Recursive case: Break into sub-queries
314
+ sub_results = []
315
+ for sub_query in decomposition["sub_queries"]:
316
+ sub_result = await self.recursive_search(
317
+ sub_query["query"],
318
+ depth=depth + 1,
319
+ context={**(context or {}), **sub_query["context"]}
320
+ )
321
+ sub_results.append(sub_result)
322
+
323
+ # Step 2: Aggregate sub-results using LLM reasoning
324
+ aggregated = await self.brain.aggregate_results(
325
+ query=query,
326
+ sub_results=sub_results,
327
+ strategy=decomposition["aggregation_strategy"] # "union", "intersection", "rank"
328
+ )
329
+
330
+ return aggregated
331
+
332
+ # Example Usage:
333
+ rlm_agent = RecursiveSearchAgent(brain_llm, search_service)
334
+
335
+ results = await rlm_agent.recursive_search(
336
+ "Find 3-bed apartments near international schools in safe neighborhoods in Cotonou under 500k"
337
+ )
338
+
339
+ # RLM Flow:
340
+ # 1. Decompose: "Find international schools in Cotonou"
341
+ # → Calls itself: search("international schools Cotonou")
342
+ # 2. For each school location:
343
+ # → Calls itself: search("safe neighborhoods within 2km of {school.lat, school.lon}")
344
+ # 3. For each neighborhood:
345
+ # → Calls itself: search("3-bed apartments under 500k in {neighborhood}")
346
+ # 4. Aggregate all results → Rank by proximity to schools + safety score
347
+ ```
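The proximity steps above ("within 2km") need a distance function; a standard Haversine helper is one way to implement it (a sketch under the assumption that listing and POI coordinates are available; not AIDA's confirmed `_calculate_distance`):

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Roughly 10 km across Cotonou (coordinates reused from the test suite in this commit):
print(round(haversine_km(6.3654, 2.4183, 6.4300, 2.3500), 1))
```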
348
+
349
+ ### RLM Benefits for AIDA
350
+
351
+ | Benefit | Impact |
352
+ |---------|--------|
353
+ | **Complex Queries** | Handle multi-hop reasoning (schools β†’ safety β†’ apartments) |
354
+ | **Boolean Logic** | Native support for AND/OR/NOT conditions |
355
+ | **Aggregation** | Calculate averages, comparisons across locations |
356
+ | **Context Preservation** | Each recursive call maintains full reasoning chain |
357
+ | **Explainability** | Can show reasoning tree to users ("I found 3 schools, then...") |
358
+
359
+ ### Integration with CLaRa
360
+
361
+ **Best of Both Worlds**: CLaRa for fast retrieval, RLM for deep reasoning
362
+
363
+ ```python
364
+ async def clara_rlm_hybrid_search(query: str, params: Dict = None):
365
+ """
366
+ Use CLaRa for speed, RLM for depth
367
+
368
+ Flow:
369
+ 1. Quick check: Is this a simple query? → Use CLaRa only (fast path)
370
+ 2. Complex query? → Use RLM to decompose → CLaRa for each sub-query (deep path)
371
+ """
372
+ complexity = await analyze_query_complexity(query)
373
+
374
+ if complexity == "simple":
375
+ # Fast path: CLaRa unified search
376
+ return await clara_unified_search(query, params)
377
+
378
+ else:
379
+ # Deep path: RLM decomposes β†’ CLaRa executes each step
380
+ rlm_agent = RecursiveSearchAgent(
381
+ brain_llm=brain_llm,
382
+ search_service=clara_unified_search # Use CLaRa as base search engine
383
+ )
384
+ return await rlm_agent.recursive_search(query)
385
+ ```
386
+
387
+ ---
388
+
389
+ ## Part 4: Implementation Roadmap
390
+
391
+ ### Timeline: 12 Weeks
392
+
393
+ #### **Week 1-2: Research & Setup**
394
+ - [ ] Test CLaRa-7B-Instruct locally with sample listings
395
+ - [ ] Benchmark compression ratio (16x vs 128x) vs search quality
396
+ - [ ] Measure latency: CLaRa vs current qwen3-embedding-8b
397
+ - [ ] Set up RLM proof-of-concept with MIT framework
398
+
399
+ #### **Week 3-4: CLaRa Pilot**
400
+ - [ ] Create `listings_clara_compressed` Qdrant collection
401
+ - [ ] Implement `compress_listing_to_memory_tokens()` function
402
+ - [ ] Migrate 100 test listings to CLaRa compressed format
403
+ - [ ] A/B test: CLaRa vs traditional RAG on 100 real queries
404
+ - [ ] Measure: latency, storage, search quality (user feedback)
405
+
406
+ #### **Week 5-6: RLM Prototype**
407
+ - [ ] Implement `RecursiveSearchAgent` class
408
+ - [ ] Build query decomposition logic with Brain LLM
409
+ - [ ] Test on complex queries: "3-bed near schools in safe areas under 500k"
410
+ - [ ] Validate: Does RLM find better results than single-hop RAG?
411
+
412
+ #### **Week 7-8: Integration**
413
+ - [ ] Build `clara_rlm_hybrid_search()` router
414
+ - [ ] Simple queries → CLaRa (fast path)
415
+ - [ ] Complex queries → RLM + CLaRa (deep path)
416
+ - [ ] Add query complexity classifier
417
+
418
+ #### **Week 9-10: Production Prep**
419
+ - [ ] Migrate all active listings to CLaRa compressed format
420
+ - [ ] Set up monitoring: Latency, storage, cache hit rates
421
+ - [ ] Implement fallback to traditional RAG (safety net)
422
+ - [ ] Load testing: 1000 concurrent searches
423
+
424
+ #### **Week 11-12: Deployment & Optimization**
425
+ - [ ] Deploy CLaRa to production (gradual rollout: 10% → 50% → 100%)
426
+ - [ ] Monitor performance vs baseline
427
+ - [ ] Fine-tune compression ratio based on real-world data
428
+ - [ ] Optimize RLM recursion depth and caching
429
+
430
+ ---
431
+
432
+ ## Part 5: Expected Impact
433
+
434
+ ### Performance Gains
435
+
436
+ | Metric | Current | With CLaRa | With CLaRa + RLM |
437
+ |--------|---------|-----------|-----------------|
438
+ | **Search Latency** | 200-500ms | 50-150ms (3-4x faster) | 100-300ms (complex queries) |
439
+ | **Storage (1000 listings)** | 16MB vectors | 1MB (16x smaller) | 1MB + reasoning cache |
440
+ | **Complex Query Support** | ❌ Single-hop only | ✅ Fast retrieval | ✅✅ Multi-hop reasoning |
441
+ | **Memory Efficiency** | 66KB/listing | 5KB/listing (13x better) | 5KB + context cache |
442
+
443
+ ### Cost Savings
444
+
445
+ ```
446
+ Qdrant Cloud Costs (Estimated):
447
+ - Current: 16MB vectors + 50MB payloads = $XX/month
448
+ - With CLaRa: 1MB vectors + 10MB payloads = $YY/month (80% savings)
449
+
450
+ OpenRouter Embedding API:
451
+ - Current: 1000 queries/day Γ— $0.0001/query = $3/month
452
+ - With CLaRa: Reduced by 50% (fewer re-embeddings) = $1.50/month
453
+ ```
454
+
455
+ ### User Experience
456
+
457
+ | Before | After |
458
+ |--------|-------|
459
+ | "Find 3-bed in Cotonou" → 10 results (generic) | "Find 3-bed in Cotonou" → 10 results (same speed, less cost) |
460
+ | "Find apartment near school" → Mixed results (no school proximity logic) | "Find apartment near school" → RLM finds schools → ranks by proximity |
461
+ | Complex queries fail or return irrelevant results | Multi-hop reasoning delivers accurate results |
462
+
463
+ ---
464
+
465
+ ## Part 6: Risk Analysis & Mitigation
466
+
467
+ ### Risks
468
+
469
+ | Risk | Impact | Mitigation |
470
+ |------|--------|------------|
471
+ | **CLaRa Model Size** | 7B parameters = high memory | Use quantized version (4-bit) or cloud API |
472
+ | **Compression Loss** | Over-compression loses semantic detail | Test 16x vs 128x, pick optimal ratio |
473
+ | **RLM Recursion Depth** | Infinite loops or slow queries | Max depth limit = 3, timeout after 5s |
474
+ | **Integration Complexity** | Breaking existing search flow | Parallel deployment, gradual rollout |
475
+ | **Vendor Lock-in** | Relying on Apple CLaRa | Keep traditional RAG as fallback |
476
+
477
+ ### Mitigation Strategy
478
+
479
+ 1. **Parallel Deployment**: Run CLaRa + Traditional RAG side-by-side for 2 weeks
480
+ 2. **Gradual Rollout**: Start with 10% traffic → Monitor → Scale to 100%
481
+ 3. **Fallback Mechanism**: If CLaRa fails → Auto-fallback to qwen3-embedding-8b
482
+ 4. **A/B Testing**: Measure user satisfaction (click-through rate, booking conversions)
483
+
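The fallback mechanism, combined with the 5 s timeout from the risk table, can be sketched with `asyncio.wait_for`; the function and stub names here are illustrative placeholders, not AIDA's actual API:

```python
import asyncio

async def search_with_safety_net(query, primary, fallback, timeout_s=5.0):
    # Give the experimental CLaRa path a hard time budget; on timeout
    # or any error, auto-fallback to the traditional RAG path.
    try:
        return await asyncio.wait_for(primary(query), timeout=timeout_s)
    except Exception:
        return await fallback(query)

# Demo with stub search callables:
async def clara_stub(q):
    return {"path": "clara", "query": q}

async def hanging_stub(q):
    await asyncio.sleep(0.2)  # simulates a stalled CLaRa call
    return {"path": "clara", "query": q}

async def traditional_stub(q):
    return {"path": "traditional", "query": q}

print(asyncio.run(search_with_safety_net("3-bed Cotonou", clara_stub, traditional_stub)))
print(asyncio.run(search_with_safety_net("3-bed Cotonou", hanging_stub, traditional_stub, timeout_s=0.05)))
```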
484
+ ---
485
+
486
+ ## Part 7: Next Steps
487
+
488
+ ### Immediate Actions (This Week)
489
+
490
+ 1. **Research**:
491
+ - [ ] Clone CLaRa repo: `git clone https://github.com/apple/ml-clara`
492
+ - [ ] Review Hugging Face model card: https://huggingface.co/apple/CLaRa-7B-Instruct
493
+ - [ ] Read MIT RLM paper: https://arxiv.org/abs/[RLM-paper-id]
494
+
495
+ 2. **Prototype**:
496
+ - [ ] Create `docs/clara_prototype.py` (compression test)
497
+ - [ ] Test with 10 sample listings
498
+ - [ ] Measure: original size vs compressed size vs search quality
499
+
500
+ 3. **Planning**:
501
+ - [ ] Schedule team meeting to review this plan
502
+ - [ ] Estimate GPU/CPU requirements for CLaRa inference
503
+ - [ ] Check budget for cloud inference (AWS SageMaker, Modal, etc.)
504
+
505
+ ### Questions to Answer
506
+
507
+ 1. **Hosting**: Run CLaRa locally (GPU required) or use cloud API?
508
+ 2. **Compression Ratio**: 16x or 128x? (Trade-off: speed vs quality)
509
+ 3. **RLM Priority**: Do we need multi-hop reasoning now, or focus on CLaRa first?
510
+ 4. **User Impact**: Will users notice the difference? (Faster search? Better results?)
511
+
512
+ ---
513
+
514
+ ## Conclusion
515
+
516
+ **CLaRa** and **RLM** represent the next evolution of RAG architecture:
517
+
518
+ - **CLaRa** → **16x smaller vectors, faster search, ~90% storage savings, unified retrieval + generation**
519
+ - **RLM** → **Multi-hop reasoning for complex queries traditional RAG can't handle**
520
+
521
+ Your AIDA backend is already well-architected with:
522
+ - ✅ Hybrid search strategies
523
+ - ✅ Intelligent routing
524
+ - ✅ Real-time vector sync
525
+ - ✅ Conversation memory
526
+
527
+ Adding CLaRa + RLM would **supercharge** this foundation, making AIDA:
528
+ 1. **Faster** (3-4x search speed)
529
+ 2. **Cheaper** (80% storage savings)
530
+ 3. **Smarter** (multi-hop reasoning)
531
+ 4. **More scalable** (handle 10x more listings without performance degradation)
532
+
533
+ **Recommended First Step**: Start with **CLaRa pilot** (Week 1-4) to prove compression works, then add **RLM** for complex queries.
534
+
535
+ ---
536
+
537
+ **Contact**: For questions or to discuss implementation details, ping the team.
test_rlm.py ADDED
@@ -0,0 +1,481 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ RLM (Recursive Language Model) Test Suite for AIDA
4
+
5
+ Tests:
6
+ 1. Query Analyzer - Detect complex query types
7
+ 2. RLM Search Service - Execute recursive searches
8
+ 3. Integration - End-to-end flow
9
+
10
+ Run with:
11
+ python test_rlm.py
12
+ python test_rlm.py --live # Run with actual LLM calls
13
+
14
+ Author: AIDA Team
15
+ Date: 2026-02-09
16
+ """
17
+
18
+ import asyncio
19
+ import sys
20
+ import json
21
+ from typing import List, Dict
22
+
23
+ # Add project root to path
24
+ sys.path.insert(0, ".")
25
+
26
+
27
+ # =============================================================================
28
+ # Color output for terminal
29
+ # =============================================================================
30
+
31
+ class Colors:
32
+ HEADER = '\033[95m'
33
+ BLUE = '\033[94m'
34
+ CYAN = '\033[96m'
35
+ GREEN = '\033[92m'
36
+ WARNING = '\033[93m'
37
+ FAIL = '\033[91m'
38
+ ENDC = '\033[0m'
39
+ BOLD = '\033[1m'
40
+
41
+
42
+ def print_header(text: str):
43
+ print(f"\n{Colors.HEADER}{Colors.BOLD}{'='*60}{Colors.ENDC}")
44
+ print(f"{Colors.HEADER}{Colors.BOLD}{text}{Colors.ENDC}")
45
+ print(f"{Colors.HEADER}{Colors.BOLD}{'='*60}{Colors.ENDC}\n")
46
+
47
+
48
+ def print_success(text: str):
49
+ print(f"{Colors.GREEN}✅ {text}{Colors.ENDC}")
50
+
51
+
52
+ def print_fail(text: str):
53
+ print(f"{Colors.FAIL}❌ {text}{Colors.ENDC}")
54
+
55
+
56
+ def print_info(text: str):
57
+ print(f"{Colors.CYAN}ℹ️ {text}{Colors.ENDC}")
58
+
59
+
60
+ def print_warning(text: str):
61
+ print(f"{Colors.WARNING}⚠️ {text}{Colors.ENDC}")
62
+
63
+
64
+ # =============================================================================
65
+ # Test 1: Query Analyzer
66
+ # =============================================================================
67
+
68
+ def test_query_analyzer():
69
+ """Test the RLM Query Analyzer"""
70
+ print_header("Test 1: RLM Query Analyzer")
71
+
72
+ from app.ai.services.rlm_query_analyzer import (
73
+ analyze_query_complexity,
74
+ QueryComplexity
75
+ )
76
+
77
+ test_cases = [
78
+ # Multi-hop queries
79
+ ("3-bed apartment near international schools in Cotonou", QueryComplexity.MULTI_HOP),
80
+ ("House close to the beach in Calavi", QueryComplexity.MULTI_HOP),
81
+ ("Apartment within 2km of the airport", QueryComplexity.MULTI_HOP),
82
+ ("Find something near the university", QueryComplexity.MULTI_HOP),
83
+
84
+ # Boolean OR queries
85
+ ("Under 500k XOF or has a pool", QueryComplexity.BOOLEAN_OR),
86
+ ("2-bedroom or 3-bedroom in Cotonou", QueryComplexity.BOOLEAN_OR),
87
+ ("Either furnished or with parking", QueryComplexity.BOOLEAN_OR),
88
+
89
+ # Comparative queries
90
+ ("Compare prices in Cotonou vs Calavi", QueryComplexity.COMPARATIVE),
91
+ ("Which is cheaper: 2-bed in Cotonou or 3-bed in Calavi?", QueryComplexity.COMPARATIVE),
92
+ ("Difference between rent in Porto-Novo and Cotonou", QueryComplexity.COMPARATIVE),
93
+
94
+ # Aggregation queries
95
+ ("What is the average price in Cotonou?", QueryComplexity.AGGREGATION),
96
+ ("How many 3-bed apartments are available?", QueryComplexity.AGGREGATION),
97
+ ("Total listings in Calavi", QueryComplexity.AGGREGATION),
98
+
99
+ # Multi-factor queries
100
+ ("Best family apartment near schools and parks in safe area", QueryComplexity.MULTI_FACTOR),
101
+ ("Top luxury modern apartments with good security", QueryComplexity.MULTI_FACTOR),
102
+ ("Ideal quiet peaceful home for family", QueryComplexity.MULTI_FACTOR),
103
+
104
+ # Simple queries (should NOT trigger RLM)
105
+ ("3-bed apartment in Cotonou", QueryComplexity.SIMPLE),
106
+ ("Houses under 500k", QueryComplexity.SIMPLE),
107
+ ("Furnished apartment for rent", QueryComplexity.SIMPLE),
108
+ ]
109
+
110
+ passed = 0
111
+ failed = 0
112
+
113
+ for query, expected_complexity in test_cases:
114
+ analysis = analyze_query_complexity(query)
115
+
116
+ if analysis.complexity == expected_complexity:
117
+ passed += 1
118
+ print_success(f"'{query[:40]}...' → {analysis.complexity.value}")
119
+ else:
120
+ failed += 1
121
+ print_fail(f"'{query[:40]}...'")
122
+ print(f" Expected: {expected_complexity.value}")
123
+ print(f" Got: {analysis.complexity.value}")
124
+ print(f" Reasoning: {analysis.reasoning}")
125
+
126
+ print(f"\n{Colors.BOLD}Results: {passed}/{len(test_cases)} passed{Colors.ENDC}")
127
+ return failed == 0
128
+
129
+
130
+ # =============================================================================
131
+ # Test 2: Strategy Selector Integration
132
+ # =============================================================================
133
+
134
+ async def test_strategy_selector():
135
+ """Test that strategy selector correctly routes to RLM"""
136
+ print_header("Test 2: Strategy Selector RLM Routing")
137
+
138
+ from app.ai.services.search_strategy_selector import (
139
+ select_search_strategy,
140
+ SearchStrategy
141
+ )
142
+
143
+ test_cases = [
144
+ # RLM strategies
145
+ {
146
+ "query": "3-bed near schools in Cotonou",
147
+ "params": {"location": "Cotonou", "bedrooms": 3},
148
+ "expected_rlm": True,
149
+ "expected_strategy": SearchStrategy.RLM_MULTI_HOP
150
+ },
151
+ {
152
+ "query": "Under 500k or has pool",
153
+ "params": {"max_price": 500000},
154
+ "expected_rlm": True,
155
+ "expected_strategy": SearchStrategy.RLM_BOOLEAN_OR
156
+ },
157
+ {
158
+ "query": "Compare Cotonou vs Calavi",
159
+ "params": {},
160
+ "expected_rlm": True,
161
+ "expected_strategy": SearchStrategy.RLM_COMPARATIVE
162
+ },
163
+
164
+ # Traditional strategies (should NOT use RLM)
165
+ {
166
+ "query": "3-bed apartment in Cotonou under 500k",
167
+ "params": {"location": "Cotonou", "bedrooms": 3, "max_price": 500000},
168
+ "expected_rlm": False,
169
+ "expected_strategy": SearchStrategy.MONGO_ONLY
170
+ },
171
+ ]
172
+
173
+ passed = 0
174
+ failed = 0
175
+
176
+ for case in test_cases:
177
+ result = await select_search_strategy(case["query"], case["params"])
178
+
179
+ rlm_match = result.get("use_rlm", False) == case["expected_rlm"]
180
+ strategy_match = result["strategy"] == case["expected_strategy"]
181
+
182
+ if rlm_match and strategy_match:
183
+ passed += 1
184
+ print_success(f"'{case['query'][:40]}...'")
185
+ print(f" Strategy: {result['strategy'].value}")
186
+ print(f" RLM: {result.get('use_rlm', False)}")
187
+ else:
188
+ failed += 1
189
+ print_fail(f"'{case['query'][:40]}...'")
190
+ print(f" Expected: {case['expected_strategy'].value}, RLM={case['expected_rlm']}")
191
+ print(f" Got: {result['strategy'].value}, RLM={result.get('use_rlm', False)}")
192
+
193
+ print(f"\n{Colors.BOLD}Results: {passed}/{len(test_cases)} passed{Colors.ENDC}")
194
+ return failed == 0
195
+
196
+
197
+ # =============================================================================
198
+ # Test 3: RLM Search Service (LIVE)
199
+ # =============================================================================
200
+
201
+ async def test_rlm_search_live():
202
+ """Test the RLM Search Service with actual LLM calls"""
203
+ print_header("Test 3: RLM Search Service (LIVE)")
204
+
205
+ print_warning("This test makes actual API calls to DeepSeek LLM")
206
+ print_info("Ensure DEEPSEEK_API_KEY is set in your environment\n")
207
+
208
+ from app.ai.services.rlm_search_service import rlm_search
209
+
210
+ test_queries = [
211
+ {
212
+ "query": "3-bed apartment near schools in Cotonou",
213
+ "description": "Multi-hop proximity search"
214
+ },
215
+ {
216
+ "query": "Under 300k or has pool",
217
+ "description": "Boolean OR query"
218
+ },
219
+ {
220
+ "query": "Compare average prices in Cotonou vs Calavi",
221
+ "description": "Comparative analysis"
222
+ },
223
+ {
224
+ "query": "Best family apartment near schools and parks",
225
+ "description": "Multi-factor ranking"
226
+ },
227
+ ]
228
+
229
+ for i, test in enumerate(test_queries, 1):
230
+ print(f"\n{Colors.CYAN}Test {i}: {test['description']}{Colors.ENDC}")
231
+ print(f"Query: \"{test['query']}\"")
232
+
233
+ try:
234
+ result = await rlm_search(test["query"])
235
+
236
+ print_success(f"Strategy used: {result.get('strategy_used', 'Unknown')}")
237
+ print(f" Results: {len(result.get('results', []))} listings")
238
+ print(f" LLM calls: {result.get('call_count', 'N/A')}")
239
+
240
+ if result.get("reasoning_steps"):
241
+ print(f" Reasoning steps:")
242
+ for step in result["reasoning_steps"][:3]:
243
+ print(f" - {step.get('step', 'unknown')}: {json.dumps(step, default=str)[:80]}...")
244
+
245
+ if result.get("message"):
246
+ print(f" Message: {result['message'][:100]}...")
247
+
248
+ if result.get("comparison_data"):
249
+ print(f" Comparison data available: Yes")
250
+
251
+ except Exception as e:
252
+ print_fail(f"Error: {str(e)}")
253
+
254
+ return True
255
+
256
+
257
+ # =============================================================================
258
+ # Test 4: Query Pattern Detection
259
+ # =============================================================================
260
+
261
+ def test_pattern_detection():
262
+ """Test specific pattern detection in queries"""
263
+ print_header("Test 4: Pattern Detection")
264
+
265
+ from app.ai.services.rlm_query_analyzer import analyze_query_complexity
266
+
267
+ # Test POI detection
268
+ poi_queries = [
269
+ ("apartment near the school", "school"),
270
+ ("house close to beach", "beach"),
271
+ ("near the university campus", "university"),
272
+ ("walking distance from hospital", "hospital"),
273
+ ("close to the market", "market"),
274
+ ("near the airport", "airport"),
275
+ ]
276
+
277
+ print(f"{Colors.BOLD}POI (Point of Interest) Detection:{Colors.ENDC}")
278
+ for query, expected_poi in poi_queries:
279
+ analysis = analyze_query_complexity(query)
280
+ poi_found = any(expected_poi in p.lower() for p in analysis.detected_patterns)
281
+ if poi_found:
282
+ print_success(f"'{query}' → Detected '{expected_poi}'")
283
+ else:
284
+ print_fail(f"'{query}' → Expected '{expected_poi}', got {analysis.detected_patterns}")
285
+
286
+ # Test French queries
287
+ print(f"\n{Colors.BOLD}French Query Detection:{Colors.ENDC}")
288
+ french_queries = [
289
+ ("appartement près de l'école", True), # Near school
290
+ ("maison proche de la plage", True), # Close to beach
291
+ ("comparer les prix", True), # Compare prices
292
+ ("appartement 3 chambres Γ  Cotonou", False), # Simple query
293
+ ]
294
+
295
+ for query, expected_rlm in french_queries:
296
+ analysis = analyze_query_complexity(query)
297
+ if analysis.use_rlm == expected_rlm:
298
+ print_success(f"'{query}' β†’ RLM={analysis.use_rlm}")
299
+ else:
300
+ print_fail(f"'{query}' β†’ Expected RLM={expected_rlm}, got {analysis.use_rlm}")
301
+
302
+ return True
303
+
304
+
305
+ # =============================================================================
306
+ # Test 5: Distance Calculation
307
+ # =============================================================================
308
+
309
+ def test_distance_calculation():
310
+ """Test the Haversine distance calculation"""
311
+ print_header("Test 5: Distance Calculation (Haversine)")
312
+
313
+ from app.ai.services.rlm_search_service import RLMSearchAgent
314
+
315
+ agent = RLMSearchAgent()
316
+
317
+ # Known distances (approximate)
318
+ test_cases = [
319
+ # (lat1, lon1, lat2, lon2, expected_km, tolerance_km)
320
+ (6.3654, 2.4183, 6.3700, 2.4200, 0.5, 0.3), # Nearby in Cotonou
321
+ (6.3654, 2.4183, 6.4300, 2.3500, 10, 2), # Cross-city
322
+ (6.3654, 2.4183, 6.5000, 2.0000, 50, 10), # Longer distance
323
+ ]
324
+
325
+ passed = 0
326
+ for lat1, lon1, lat2, lon2, expected, tolerance in test_cases:
327
+ distance = agent._calculate_distance(lat1, lon1, lat2, lon2)
328
+ within_tolerance = abs(distance - expected) <= tolerance
329
+
330
+ if within_tolerance:
331
+ passed += 1
332
+ print_success(f"({lat1}, {lon1}) β†’ ({lat2}, {lon2}): {distance:.2f} km (expected ~{expected} km)")
333
+ else:
334
+ print_fail(f"({lat1}, {lon1}) β†’ ({lat2}, {lon2}): {distance:.2f} km (expected ~{expected} km)")
335
+
336
+ print(f"\n{Colors.BOLD}Results: {passed}/{len(test_cases)} passed{Colors.ENDC}")
337
+ return passed == len(test_cases)
338
+
339
+
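For reference, `RLMSearchAgent._calculate_distance` is exercised above but its body is not part of this diff. A standard Haversine implementation consistent with the expected distances in the test cases would look roughly like the sketch below (the free-function name is illustrative, not the agent's actual method):

```python
import math

def calculate_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two lat/lon points (Haversine)."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

# Spot-check against the first test case above (~0.5 km within Cotonou)
print(f"{calculate_distance(6.3654, 2.4183, 6.3700, 2.4200):.2f} km")
```

With the 6371 km mean-radius constant, all three Cotonou test cases above land inside their stated tolerances.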
+ # =============================================================================
+ # Test 6: OpenStreetMap POI Service
+ # =============================================================================
+
+ async def test_osm_poi_service():
+ """Test the OpenStreetMap POI service integration"""
+ print_header("Test 6: OpenStreetMap POI Service")
+
+ print_info("This test makes real API calls to OpenStreetMap (FREE)")
+ print_info("Testing: Nominatim geocoding + Overpass POI search\n")
+
+ from app.ai.services.osm_poi_service import (
+ geocode_location,
+ find_pois,
+ find_pois_overpass,
+ calculate_distance_km
+ )
+
+ # Test 1: Geocoding
+ print(f"{Colors.BOLD}1. Geocoding Test:{Colors.ENDC}")
+ coords = await geocode_location("Cotonou, Benin")
+ if coords:
+ print_success(f"Geocoded 'Cotonou, Benin' → ({coords[0]:.4f}, {coords[1]:.4f})")
+ else:
+ print_fail("Failed to geocode 'Cotonou, Benin'")
+
+ # Test 2: Find Schools
+ print(f"\n{Colors.BOLD}2. Find Schools in Cotonou:{Colors.ENDC}")
+ schools = await find_pois("school", "Cotonou, Benin", radius_km=3, limit=5)
+ print(f" Found {len(schools)} schools:")
+ for school in schools[:3]:
+ print(f" - {school['name']} ({school['lat']:.4f}, {school['lon']:.4f})")
+
+ # Test 3: Find Hospitals
+ print(f"\n{Colors.BOLD}3. Find Hospitals in Cotonou:{Colors.ENDC}")
+ hospitals = await find_pois("hospital", "Cotonou, Benin", radius_km=5, limit=5)
+ print(f" Found {len(hospitals)} hospitals:")
+ for hospital in hospitals[:3]:
+ print(f" - {hospital['name']} ({hospital['lat']:.4f}, {hospital['lon']:.4f})")
+
+ # Test 4: French POI type
+ print(f"\n{Colors.BOLD}4. French POI Type 'plage' (beach):{Colors.ENDC}")
+ beaches = await find_pois("plage", "Cotonou, Benin", radius_km=10, limit=5)
+ print(f" Found {len(beaches)} beaches")
+
+ # Test 5: Distance calculation
+ print(f"\n{Colors.BOLD}5. Distance Calculation:{Colors.ENDC}")
+ if coords and schools:
+ dist = calculate_distance_km(
+ coords[0], coords[1],
+ schools[0]["lat"], schools[0]["lon"]
+ )
+ print_success(f"Distance from Cotonou center to {schools[0]['name']}: {dist:.2f} km")
+
+ # Test 6: Integration with RLM
+ print(f"\n{Colors.BOLD}6. RLM Integration Test:{Colors.ENDC}")
+ from app.ai.services.rlm_search_service import RLMSearchAgent
+ agent = RLMSearchAgent()
+
+ pois = await agent._find_poi_locations("school", "Cotonou, Benin")
+ if pois:
+ print_success(f"RLM agent found {len(pois)} schools via OSM")
+ print(f" First result: {pois[0].get('name', 'Unknown')}")
+ else:
+ print_warning("RLM agent found no schools (may be network issue)")
+
+ print(f"\n{Colors.BOLD}OSM Integration Complete!{Colors.ENDC}")
+ return True
+
+
+ # =============================================================================
+ # Main
+ # =============================================================================
+
+ async def main():
+ """Run all tests"""
+ print(f"\n{Colors.BOLD}{Colors.HEADER}")
+ print("╔═══════════════════════════════════════════════════════════╗")
+ print("║ RLM (Recursive Language Model) Test Suite for AIDA ║")
+ print("╚═══════════════════════════════════════════════════════════╝")
+ print(f"{Colors.ENDC}\n")
+
+ live_mode = "--live" in sys.argv
+
+ all_passed = True
+
+ # Test 1: Query Analyzer (no LLM calls)
+ if not test_query_analyzer():
+ all_passed = False
+
+ # Test 2: Strategy Selector
+ if not await test_strategy_selector():
+ all_passed = False
+
+ # Test 4: Pattern Detection
+ if not test_pattern_detection():
+ all_passed = False
+
+ # Test 5: Distance Calculation
+ if not test_distance_calculation():
+ all_passed = False
+
+ # Test 6: OpenStreetMap POI Service (network-dependent, not gated)
+ await test_osm_poi_service()
+
+ # Live RLM Search (only with --live flag)
+ if live_mode:
+ print_warning("\nRunning LIVE tests with actual LLM calls...")
+ await test_rlm_search_live()
+ else:
+ print_info("\nSkipping live LLM tests. Run with --live flag to include them.")
+ print_info("Example: python test_rlm.py --live")
+
+ # Summary
+ print_header("Test Summary")
+ if all_passed:
+ print_success("All offline tests passed!")
+ print_info("RLM is ready to use in AIDA.")
+ else:
+ print_fail("Some tests failed. Check the output above.")
+
+ # Usage examples
+ print(f"\n{Colors.BOLD}Usage Examples:{Colors.ENDC}")
+ print("""
+ # In your code:
+ from app.ai.services.rlm_search_service import rlm_search
+
+ # Multi-hop search (near POI)
+ results = await rlm_search("3-bed near schools in Cotonou")
+
+ # Boolean OR
+ results = await rlm_search("under 500k or has pool")
+
+ # Comparative
+ results = await rlm_search("compare Cotonou vs Calavi")
+
+ # The brain.py automatically uses RLM when appropriate!
+ """)
+
+
+ if __name__ == "__main__":
+ asyncio.run(main())
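The pattern-detection tests above pin down the analyzer's observable contract: `analyze_query_complexity` returns an object exposing `use_rlm` and `detected_patterns`. A minimal keyword-based stand-in that satisfies those assertions might look like the sketch below (illustrative only; the real `rlm_query_analyzer` module is not shown in this diff):

```python
from dataclasses import dataclass, field

# English and French cues exercised by the tests above
POI_KEYWORDS = ["school", "beach", "university", "hospital", "market", "airport",
                "école", "plage"]
PROXIMITY_CUES = ["near", "close to", "walking distance", "près de", "proche de"]
COMPARISON_CUES = ["compare", "comparer", " vs "]

@dataclass
class QueryAnalysis:
    use_rlm: bool
    detected_patterns: list = field(default_factory=list)

def analyze_query_complexity(query: str) -> QueryAnalysis:
    """Flag queries that need multi-hop (RLM) handling: POI proximity or comparison."""
    q = query.lower()
    patterns = [kw for kw in POI_KEYWORDS if kw in q]
    needs_rlm = bool(patterns) and any(cue in q for cue in PROXIMITY_CUES)
    needs_rlm = needs_rlm or any(cue in q for cue in COMPARISON_CUES)
    return QueryAnalysis(use_rlm=needs_rlm, detected_patterns=patterns)

analysis = analyze_query_complexity("apartment near the school")
print(analysis.use_rlm, analysis.detected_patterns)  # → True ['school']
```

A simple query with no proximity or comparison cue ("appartement 3 chambres à Cotonou") yields `use_rlm=False`, matching the last French test case.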