minhvtt committed on
Commit
d97be90
·
verified ·
1 Parent(s): 1fcf78a

Upload 7 files

Files changed (7)
  1. API_DOCUMENTATION.md +533 -0
  2. README.md +133 -12
  3. app.py +447 -0
  4. config.py +50 -0
  5. database.py +111 -0
  6. main.py +91 -0
  7. requirements.txt +42 -0
API_DOCUMENTATION.md ADDED
@@ -0,0 +1,533 @@
1
+ # Audience Segmentation AI - REST API Documentation
2
+
3
+ ## Base Information
4
+
5
+ **Base URL**: `http://localhost:7860`
6
+ **API Documentation**: `/api/docs` (Swagger UI)
7
+ **Content-Type**: `application/json`
8
+
9
+ ---
10
+
11
+ ## Health & System
12
+
13
+ ### GET `/health`
14
+ Check API and database connection status.
15
+
16
+ **Response:**
17
+ ```json
18
+ {
19
+ "status": "healthy",
20
+ "timestamp": "2025-11-24T00:00:00",
21
+ "database": "connected"
22
+ }
23
+ ```
24
+
25
+ ---
26
+
27
+ ## Event Analysis
28
+
29
+ ### POST `/api/events/{event_code}/analyze`
30
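+ Trigger the full AI analysis pipeline for an event (segmentation, sentiment analysis, email generation, and insight generation). The pipeline runs as a background task, so the response returns immediately.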
31
+
32
+ **Path Parameters:**
33
+ - `event_code` (string, required): Event identifier
34
+
35
+ **Response:**
36
+ ```json
37
+ {
38
+ "status": "started",
39
+ "message": "Analysis pipeline started for event {event_code}"
41
+ }
42
+ ```
43
+
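A minimal client-side sketch of how this call could be prepared, using only the Python standard library. The helper name `build_analyze_request` is hypothetical, and the base URL is simply the default from the Base Information section; the request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:7860"  # default base URL stated in this document


def build_analyze_request(event_code: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) the POST request that starts the analysis pipeline."""
    # Percent-encode the event code so unusual characters cannot break the path.
    path = f"/api/events/{quote(event_code, safe='')}/analyze"
    return Request(base_url + path, method="POST",
                   headers={"Accept": "application/json"})


req = build_analyze_request("event_123")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; since the pipeline runs in the background, the response only confirms that the job was started.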
44
+ ### GET `/api/events/{event_code}/dashboard`
45
+ Get the complete analytics dashboard for the event owner: all segments of the event plus its sentiment summary (`null` if no sentiment data exists yet).
46
+
47
+ **Path Parameters:**
48
+ - `event_code` (string, required): Event identifier
49
+
50
+ **Response:**
51
+ ```json
52
+ {
53
+ "event_code": "event_123",
54
+ "segments": [
55
+ {
56
+ "id": "507f1f77bcf86cd799439011",
57
+ "segment_name": "VIP Khách Hàng Trung Thành",
58
+ "user_count": 150,
59
+ "criteria": {
60
+ "avg_spend": 1500000,
61
+ "avg_tickets": 5.2,
62
+ "avg_recency": 15
63
+ },
64
+ "marketing_content": {
65
+ "email_subject": "Ưu đãi đặc biệt cho bạn!",
66
+ "email_body": "...",
67
+ "status": "Draft",
68
+ "generated_at": "2025-11-24T00:00:00"
69
+ }
70
+ }
71
+ ],
72
+ "sentiment_summary": {
73
+ "total_comments": 200,
74
+ "sentiment_distribution": {
75
+ "Positive": 150,
76
+ "Negative": 30,
77
+ "Neutral": 20
78
+ },
79
+ "avg_confidence": 0.87,
80
+ "top_keywords": ["tuyệt vời", "âm thanh", "tổ chức"],
81
+ "ai_insights": {
82
+ "summary": "Sự kiện được đánh giá tích cực...",
83
+ "top_issues": ["Check-in chậm", "Âm thanh yếu"],
84
+ "improvement_suggestions": ["Tăng quầy check-in", "Nâng cấp loa"],
85
+ "predicted_nps": 65.5
86
+ }
87
+ }
88
+ }
89
+ ```
90
+
91
+ ---
92
+
93
+ ## Audience Segmentation
94
+
95
+ ### POST `/api/events/{event_code}/segmentation/run`
96
+ Run segmentation analysis for an event.
97
+
98
+ **Path Parameters:**
99
+ - `event_code` (string, required): Event identifier
100
+
101
+ **Query Parameters:**
102
+ - `n_clusters` (integer, optional): Number of segments (default: 5, allowed range: 2 to 10)
103
+
104
+ **Response:**
105
+ ```json
106
+ {
107
+ "status": "started",
108
+ "message": "Segmentation started",
109
+ "event_code": "event_123"
110
+ }
111
+ ```
112
+
113
+ ### GET `/api/events/{event_code}/segments`
114
+ Get all audience segments for an event.
115
+
116
+ **Path Parameters:**
117
+ - `event_code` (string, required): Event identifier
118
+
119
+ **Query Parameters:**
120
+ - `status_filter` (string, optional): Filter by marketing content status (`Draft`, `Approved`, `Sent`)
121
+
122
+ **Response:**
123
+ ```json
124
+ [
125
+ {
126
+ "id": "507f1f77bcf86cd799439011",
127
+ "event_code": "event_123",
128
+ "segment_name": "VIP Khách Hàng Trung Thành",
129
+ "segment_type": "High Value",
130
+ "user_count": 150,
131
+ "user_ids": ["user_1", "user_2", "..."],
132
+ "criteria": {
133
+ "avg_spend": 1500000,
134
+ "avg_tickets": 5.2,
135
+ "avg_recency": 15,
136
+ "avg_follow_count": 3
137
+ },
138
+ "marketing_content": {
139
+ "email_subject": "Ưu đãi VIP đặc biệt",
140
+ "email_body": "Kính gửi Quý khách...",
141
+ "status": "Draft",
142
+ "generated_at": "2025-11-24T00:00:00"
143
+ },
144
+ "created_at": "2025-11-24T00:00:00",
145
+ "last_updated": "2025-11-24T00:00:00"
146
+ }
147
+ ]
148
+ ```
149
+
150
+ ### GET `/api/events/{event_code}/segments/{segment_id}`
151
+ Get specific segment details.
152
+
153
+ **Path Parameters:**
154
+ - `event_code` (string, required)
155
+ - `segment_id` (string, required): Segment ObjectId
156
+
157
+ **Response:**
158
+ ```json
159
+ {
160
+ "id": "507f1f77bcf86cd799439011",
161
+ "event_code": "event_123",
162
+ "segment_name": "VIP Khách Hàng Trung Thành",
163
+ "user_count": 150,
164
+ "user_ids": ["user_1", "user_2"],
165
+ "criteria": {...},
166
+ "marketing_content": {...}
167
+ }
168
+ ```
169
+
170
+ ### GET `/api/events/{event_code}/segments/{segment_id}/users`
171
+ Get user list in a segment.
172
+
173
+ **Path Parameters:**
174
+ - `event_code` (string, required)
175
+ - `segment_id` (string, required)
176
+
177
+ **Query Parameters:**
178
+ - `skip` (integer): Offset (default: 0)
179
+ - `limit` (integer): Max results (default: 100)
180
+
181
+ **Response:**
182
+ ```json
183
+ {
184
+ "segment_id": "507f1f77bcf86cd799439011",
185
+ "total_users": 150,
186
+ "users": [
187
+ {
188
+ "user_id": "user_1",
189
+ "email": "user@example.com",
190
+ "full_name": "Nguyễn Văn A",
191
+ "stats": {
192
+ "total_spend": 2000000,
193
+ "tickets_bought": 6,
194
+ "last_purchase": "2025-11-20"
195
+ }
196
+ }
197
+ ]
198
+ }
199
+ ```
200
+
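The `skip`/`limit` pair above behaves like a list slice over the segment's stored user IDs. The sketch below illustrates that behaviour in isolation; the function name `paginate` is hypothetical, and rejecting negative values is an addition of this sketch rather than something the API is known to do.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], skip: int = 0, limit: int = 100) -> List[T]:
    """Return the page of `items` selected by `skip` (offset) and `limit` (page size)."""
    if skip < 0 or limit < 1:
        raise ValueError("skip must be >= 0 and limit must be >= 1")
    return list(items[skip:skip + limit])


# 150 users, matching `total_users` in the example response above.
user_ids = [f"user_{i}" for i in range(1, 151)]
page = paginate(user_ids, skip=100, limit=100)
print(len(page), page[0], page[-1])
```

Note that `total_users` reports the size of the whole segment, not of the returned page, so a client can tell when it has reached the last page.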
201
+ ---
202
+
203
+ ## Approval Workflow
204
+
205
+ ### POST `/api/events/{event_code}/segments/{segment_id}/approve`
206
+ Event Owner approves marketing content.
207
+
208
+ **Path Parameters:**
209
+ - `event_code` (string, required)
210
+ - `segment_id` (string, required)
211
+
212
+ **Query Parameters (all optional):**
213
+ - `approved_by` (string): ID of the approving user
214
+ - `modified_subject` (string): Replacement email subject
215
+ - `modified_body` (string): Replacement email body
222
+
223
+ **Response:**
224
+ ```json
225
+ {
226
+ "status": "success",
227
+ "message": "Segment approved",
228
+ "segment_id": "507f1f77bcf86cd799439011",
229
+ "marketing_content": {
230
+ "status": "Approved",
231
+ "approved_at": "2025-11-24T00:00:00",
232
+ "approved_by": "owner_user_id"
233
+ }
234
+ }
235
+ ```
236
+
237
+ ### POST `/api/events/{event_code}/segments/{segment_id}/send-email`
238
+ Send the approved marketing email to the segment's users. The content must already be in `Approved` status; otherwise the endpoint returns `400`.
239
+
240
+ **Path Parameters:**
241
+ - `event_code` (string, required)
242
+ - `segment_id` (string, required)
243
+
244
+ **Query Parameters:**
245
+ - `send_immediately` (boolean, optional): Send right away (default: true)
251
+
252
+ **Response:**
253
+ ```json
254
+ {
255
+ "status": "success",
256
+ "message": "Email sent to 150 users",
257
+ "segment_id": "507f1f77bcf86cd799439011",
258
+ "emails_sent": 150,
259
+ "emails_failed": 0,
260
+ "marketing_content": {
261
+ "status": "Sent"
262
+ }
263
+ }
264
+ ```
265
+
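The workflow above moves marketing content through the statuses `Draft`, `Approved`, and `Sent`. The following sketch encodes that progression as a small transition table. Only the rule that sending requires `Approved` content is taken from this page; the rest of the table (for example, treating `Sent` as final) and the names `ALLOWED_TRANSITIONS` and `can_transition` are assumptions for illustration.

```python
from typing import Dict, Set

# Assumed transition table: Draft -> Approved -> Sent.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "Draft": {"Approved"},
    "Approved": {"Sent"},
    "Sent": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if content in status `current` may move to status `target`."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("Draft", "Approved"))   # approval of a draft
print(can_transition("Draft", "Sent"))       # sending without approval
```

A client could run such a check before calling `send-email` to avoid a round trip that would end in a `400` response.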
266
+ ---
267
+
268
+ ## Sentiment Analysis
269
+
270
+ ### POST `/api/events/{event_code}/sentiment/analyze`
271
+ Analyze sentiment for event comments.
272
+
273
+ **Path Parameters:**
274
+ - `event_code` (string, required)
275
+
276
+ **Response:**
277
+ ```json
278
+ {
279
+ "status": "started",
280
+ "message": "Sentiment analysis started for event {event_code}"
281
+ }
282
+ ```
283
+
284
+ ### GET `/api/events/{event_code}/sentiment/summary`
285
+ Get sentiment summary for an event.
286
+
287
+ **Path Parameters:**
288
+ - `event_code` (string, required)
289
+
290
+ **Response:**
291
+ ```json
292
+ {
293
+ "event_code": "event_123",
294
+ "total_comments": 200,
295
+ "sentiment_distribution": {
296
+ "Positive": 150,
297
+ "Negative": 30,
298
+ "Neutral": 20
299
+ },
300
+ "avg_confidence": 0.87,
301
+ "top_keywords": ["tuyệt vời", "âm thanh", "tổ chức"],
302
+ "ai_insights": {
303
+ "summary": "Sự kiện được đánh giá tích cực với 75% feedback tích cực...",
304
+ "top_issues": [
305
+ "Check-in quá chậm (15 mentions)",
306
+ "Âm thanh yếu ở khu vực sau (10 mentions)"
307
+ ],
308
+ "improvement_suggestions": [
309
+ "Tăng số quầy check-in lên 5 quầy",
310
+ "Bổ sung loa phụ khu vực sau"
311
+ ],
312
+ "predicted_nps": 65.5
313
+ },
314
+ "last_updated": "2025-11-24T00:00:00"
315
+ }
316
+ ```
317
+
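The counts in `sentiment_distribution` can be turned into shares on the client side; with the example above, 150 of 200 comments are positive, which is the 75% mentioned in the summary text. A small sketch, where the helper name `sentiment_shares` is hypothetical:

```python
from typing import Dict


def sentiment_shares(distribution: Dict[str, int]) -> Dict[str, float]:
    """Convert per-label comment counts into fractions of the total."""
    total = sum(distribution.values())
    if total == 0:
        # No comments analysed yet: report zero for every label.
        return {label: 0.0 for label in distribution}
    return {label: count / total for label, count in distribution.items()}


shares = sentiment_shares({"Positive": 150, "Negative": 30, "Neutral": 20})
print(shares)
```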
318
+ ### GET `/api/events/{event_code}/sentiment/results`
319
+ Get detailed sentiment results.
320
+
321
+ **Path Parameters:**
322
+ - `event_code` (string, required)
323
+
324
+ **Query Parameters:**
325
+ - `sentiment_label` (string, optional): Filter by `Positive`, `Negative`, `Neutral`
326
+ - `skip` (integer): Offset (default: 0)
327
+ - `limit` (integer): Max results (default: 100)
328
+
329
+ **Response:**
330
+ ```json
331
+ {
332
+ "total": 200,
333
+ "results": [
334
+ {
335
+ "id": "507f...",
336
+ "event_code": "event_123",
337
+ "source_id": "comment_abc",
338
+ "sentiment_label": "Positive",
339
+ "confidence_score": 0.92,
340
+ "key_phrases": ["tuyệt vời", "hài lòng"],
341
+ "analyzed_at": "2025-11-24T00:00:00"
342
+ }
343
+ ]
344
+ }
345
+ ```
346
+
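Because the response carries both `total` and one page of `results`, a client can walk through all results by advancing `skip`. The sketch below shows one way to do that. `fetch_page` stands in for whatever function performs the HTTP call, and `fake_fetch` is a hypothetical in-memory substitute so the example runs without a server.

```python
from typing import Callable, Dict, Iterator, List


def iter_sentiment_results(fetch_page: Callable[[int, int], Dict],
                           page_size: int = 100) -> Iterator[Dict]:
    """Yield every result by requesting successive (skip, limit) pages."""
    skip = 0
    while True:
        payload = fetch_page(skip, page_size)
        results: List[Dict] = payload.get("results", [])
        yield from results
        skip += len(results)
        # Stop on an empty page or once everything reported by `total` was seen.
        if not results or skip >= payload.get("total", 0):
            break


# Hypothetical stand-in for the real endpoint: 250 results held in memory.
_sample = [{"id": str(i), "sentiment_label": "Positive"} for i in range(250)]


def fake_fetch(skip: int, limit: int) -> Dict:
    return {"total": len(_sample), "results": _sample[skip:skip + limit]}


collected = list(iter_sentiment_results(fake_fetch, page_size=100))
print(len(collected))
```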
347
+ ---
348
+
349
+ ## Generative AI
350
+
351
+ ### POST `/api/events/{event_code}/genai/generate-emails`
352
+ Generate marketing emails for all segments.
353
+
354
+ **Path Parameters:**
355
+ - `event_code` (string, required)
356
+
357
+ **Response:**
358
+ ```json
359
+ {
360
+ "status": "started",
361
+ "message": "Email generation started for {n} segments"
362
+ }
363
+ ```
364
+
365
+ ### POST `/api/events/{event_code}/genai/generate-insights`
366
+ Generate AI insights from negative feedback.
367
+
368
+ **Path Parameters:**
369
+ - `event_code` (string, required)
370
+
371
+ **Response:**
372
+ ```json
373
+ {
374
+ "status": "started",
375
+ "message": "Insight generation started"
381
+ }
382
+ ```
383
+
384
+ ---
385
+
386
+ ## Monitoring & Analytics
387
+
388
+ ### GET `/api/monitoring/pipelines/{pipeline}/metrics`
389
+ Get performance metrics for a pipeline.
390
+
391
+ **Path Parameters:**
392
+ - `pipeline` (string): `segmentation`, `sentiment`, `genai`
393
+
394
+ **Query Parameters:**
395
+ - `event_code` (string, optional): Filter by event
396
+ - `days` (integer): Date range (default: 7)
397
+
398
+ **Response:**
399
+ ```json
400
+ {
401
+ "pipeline": "segmentation",
402
+ "event_code": "event_123",
403
+ "total_runs": 5,
404
+ "avg_execution_time": 4.2,
405
+ "last_run": "2025-11-24T00:00:00",
406
+ "metrics": {
407
+ "avg_users_processed": 850,
408
+ "avg_segments_created": 5,
409
+ "avg_inertia": 1250.5
410
+ }
411
+ }
412
+ ```
413
+
414
+ ### GET `/api/monitoring/pipelines/{pipeline}/drift`
415
+ Check for model drift.
416
+
417
+ **Path Parameters:**
418
+ - `pipeline` (string): `segmentation`, `sentiment`
419
+
420
+ **Query Parameters:**
421
+ - `event_code` (string, optional)
422
+
423
+ **Response:**
424
+ ```json
425
+ {
426
+ "pipeline": "segmentation",
427
+ "drift_detected": true,
428
+ "avg_drift": 0.65,
429
+ "max_drift": 1.2,
430
+ "threshold": 0.5,
431
+ "recommendation": "Consider retraining model"
432
+ }
433
+ ```
434
+
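This page does not say how the drift figures are computed. As a rough sketch of how the fields in the response could relate to each other, the code below assumes that a list of per-run drift scores is available and that drift is flagged when their average exceeds the threshold. The function name, the decision rule, and the "No action needed" text are all assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_drift(drift_scores: Sequence[float], threshold: float = 0.5) -> Dict:
    """Summarise drift scores into the shape of the response shown above."""
    if not drift_scores:
        raise ValueError("at least one drift score is required")
    avg = mean(drift_scores)
    detected = avg > threshold  # assumed rule: compare the average to the threshold
    return {
        "drift_detected": detected,
        "avg_drift": round(avg, 2),
        "max_drift": max(drift_scores),
        "threshold": threshold,
        "recommendation": "Consider retraining model" if detected else "No action needed",
    }


report = summarize_drift([0.3, 0.45, 1.2])
print(report)
```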
435
+ ---
436
+
437
+ ## Feedback & Performance
438
+
439
+ ### POST `/api/feedback/email-engagement`
440
+ Record email engagement metrics.
441
+
442
+ **Request Body:**
443
+ ```json
444
+ {
445
+ "segment_id": "507f...",
446
+ "user_id": "user_1",
447
+ "event_code": "event_123",
448
+ "opened": true,
449
+ "clicked": true,
450
+ "converted": false,
451
+ "unsubscribed": false
452
+ }
453
+ ```
454
+
455
+ **Response:**
456
+ ```json
457
+ {
458
+ "status": "recorded",
459
+ "feedback_id": "feedback_xyz"
460
+ }
461
+ ```
462
+
463
+ ### GET `/api/feedback/email-performance/{segment_id}`
464
+ Get email campaign performance.
465
+
466
+ **Path Parameters:**
467
+ - `segment_id` (string, required)
468
+
469
+ **Response:**
470
+ ```json
471
+ {
472
+ "segment_id": "507f...",
473
+ "total_sent": 150,
474
+ "open_rate": 0.65,
475
+ "click_rate": 0.32,
476
+ "conversion_rate": 0.12,
477
+ "unsubscribe_rate": 0.02
478
+ }
479
+ ```
480
+
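The rates in this response can be read as fractions of `total_sent`. The sketch below derives them from engagement records shaped like the request body of `/api/feedback/email-engagement`. That the server computes the rates exactly this way is an assumption, and the helper name `email_performance` is hypothetical.

```python
from typing import Dict, Iterable, List


def email_performance(total_sent: int, events: Iterable[Dict]) -> Dict[str, float]:
    """Aggregate per-user engagement flags into campaign-level rates."""
    records: List[Dict] = list(events)

    def rate(flag: str) -> float:
        if total_sent == 0:
            return 0.0
        return sum(1 for r in records if r.get(flag)) / total_sent

    return {
        "total_sent": total_sent,
        "open_rate": rate("opened"),
        "click_rate": rate("clicked"),
        "conversion_rate": rate("converted"),
        "unsubscribe_rate": rate("unsubscribed"),
    }


stats = email_performance(4, [
    {"user_id": "user_1", "opened": True, "clicked": True, "converted": False, "unsubscribed": False},
    {"user_id": "user_2", "opened": True, "clicked": False, "converted": False, "unsubscribed": True},
])
print(stats)
```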
481
+ ---
482
+
483
+ ## Administration
484
+
485
+ ### POST `/api/admin/indexes/create`
486
+ Create all MongoDB indexes (run once during setup).
487
+
488
+ **Response:**
489
+ ```json
490
+ {
491
+ "status": "success",
492
+ "indexes_created": [
493
+ "idx_payment_event_status_user",
494
+ "idx_follow_event_user",
495
+ "idx_comment_event_date",
496
+ "..."
497
+ ]
498
+ }
499
+ ```
500
+
501
+ ### POST `/api/admin/models/retrain`
502
+ Trigger model retraining based on feedback.
503
+
504
+ **Request Body:**
505
+ ```json
506
+ {
507
+ "model_type": "segmentation",
508
+ "event_code": "event_123"
509
+ }
510
+ ```
511
+
512
+ **Response:**
513
+ ```json
514
+ {
515
+ "status": "started",
516
+ "job_id": "retrain_abc123"
517
+ }
518
+ ```
519
+
520
+ ---
521
+
522
+ ## Error Responses
523
+
524
+ Failed requests return the error code in the HTTP status line and a JSON body in the following format:
525
+
526
+ ```json
527
+ {
528
+ "detail": "Error message description"
530
+ }
531
+ ```
532
+
533
+
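A client can normalise error bodies with a small helper. The sketch below reads the `detail` field described above. It also covers the case where `detail` is a list, which is how FastAPI reports request validation errors (each item has a `msg` field). The helper name `error_message` is hypothetical.

```python
from typing import Any


def error_message(payload: Any) -> str:
    """Extract a readable message from an error response body."""
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        # Validation errors: a list of items, each usually carrying a "msg".
        parts = [str(item.get("msg", item)) if isinstance(item, dict) else str(item)
                 for item in detail]
        return "; ".join(parts)
    return "Unknown error"


print(error_message({"detail": "Segment not found"}))
```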
README.md CHANGED
@@ -1,12 +1,133 @@
1
- ---
2
- title: Aus F
3
- emoji: 👁
4
- colorFrom: indigo
5
- colorTo: pink
6
- sdk: gradio
7
- sdk_version: 6.0.0
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Audience Segmentation AI System
2
+
3
+ An AI-powered audience segmentation and sentiment analysis system for an event management platform.
4
+
5
+ ## Features
6
+
7
+ ### 1. Audience Segmentation
8
+ - **Automatic clustering** based on ticket-purchase behavior (RFM analysis)
9
+ - **Preference-based classification** by event category
10
+ - **Automatic naming** of each segment in Vietnamese
11
+ - **Automatic marketing email generation** for each customer group
12
+
13
+ ### 2. Sentiment Analysis
14
+ - **Sentiment classification** of comments (Positive/Negative/Neutral)
15
+ - **Uses PhoBERT** - an NLP model specialized for Vietnamese
16
+ - **Automatic keyword extraction** from feedback
17
+
18
+ ### 3. Automatic Insight Generation (Generative AI)
19
+ - **Top 5 issues** that need improvement
20
+ - **Improvement suggestions** for each issue
21
+ - **NPS score prediction** based on the tone of the comments
22
+ - **Uses Vistral-7B-Chat** - an advanced LLM for Vietnamese
23
+
24
+ ## Directory structure
25
+
26
+ ```
27
+ AudienceSegmentation/
28
+ ├── models/ # MongoDB data models
29
+ │ ├── segmentation_models.py # Audience segment models
30
+ │ └── sentiment_models.py # Sentiment analysis models
31
+ ├── services/ # Business logic
32
+ │ ├── data_aggregation.py # MongoDB aggregation pipelines
33
+ │ ├── segmentation_service.py # K-Means clustering
34
+ │ ├── sentiment_service.py # PhoBERT sentiment analysis
35
+ │ └── genai_service.py # Vistral-7B content generation
36
+ ├── config.py # Configuration
37
+ ├── database.py # MongoDB connection manager
38
+ ├── main.py # Main orchestration script
39
+ ├── requirements.txt # Python dependencies
40
+ └── .env.example # Environment variables template
41
+ ```
42
+
43
+ ## Installation
44
+
45
+ ### 1. Clone repository
46
+ ```bash
47
+ cd AudienceSegmentation
48
+ ```
49
+
50
+ ### 2. Create a virtual environment
51
+ ```bash
52
+ python -m venv venv
53
+ source venv/bin/activate # Linux/Mac
54
+ # or
55
+ venv\Scripts\activate # Windows
56
+ ```
57
+
58
+ ### 3. Install dependencies
59
+ ```bash
60
+ pip install -r requirements.txt
61
+ ```
62
+
63
+ ### 4. Download Vistral-7B-Chat
64
+ ```bash
65
+ # Download the GGUF model from Hugging Face (recommended when running on CPU)
66
+ mkdir -p models/vistral-7b-chat
67
+ # Download from: https://huggingface.co/Vistral/Vistral-7B-Chat-GGUF
68
+ ```
69
+
70
+ ### 5. Configure the environment
71
+ ```bash
72
+ cp .env.example .env
73
+ # Edit .env with your MongoDB connection details
74
+ ```
75
+
76
+ ## Usage
77
+
78
+ ### Run the full pipeline
79
+ ```bash
80
+ python main.py --task all
81
+ ```
82
+
83
+ ### Run audience segmentation only
84
+ ```bash
85
+ python main.py --task segmentation
86
+ ```
87
+
88
+ ### Run sentiment analysis only
89
+ ```bash
90
+ python main.py --task sentiment
91
+ ```
92
+
93
+ ### Generate email content only
94
+ ```bash
95
+ python main.py --task email
96
+ ```
97
+
98
+ ### Generate insights for a specific event
99
+ ```bash
100
+ python main.py --task insights --event-code <event_id>
101
+ ```
102
+
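The commands above imply a command-line interface with a `--task` selector and an optional `--event-code`. The contents of `main.py` are not shown on this page, so the following is only a sketch of how such an interface could be declared with `argparse`. The function names, the default of `all`, and the rule that `insights` requires an event code are assumptions.

```python
import argparse
from typing import List, Optional

TASKS = ("all", "segmentation", "sentiment", "email", "insights")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Audience Segmentation AI pipeline runner")
    parser.add_argument("--task", choices=TASKS, default="all",
                        help="which part of the pipeline to run")
    parser.add_argument("--event-code", default=None,
                        help="event identifier (needed for --task insights)")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: insights are generated for one specific event.
    if args.task == "insights" and not args.event_code:
        parser.error("--event-code is required when --task is insights")
    return args


args = parse_args(["--task", "insights", "--event-code", "event_123"])
print(args.task, args.event_code)
```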
103
+ ## Technical architecture
104
+
105
+ ### MongoDB Aggregation Framework
106
+ The system uses MongoDB aggregation to:
107
+ - **Compute RFM** (Recency, Frequency, Monetary) directly in the database
108
+ - **Count the event categories** each user is interested in
109
+ - **Select only unprocessed data** to avoid duplicates
110
+ - **Minimize network transfer** - only the final results are sent over the wire
111
+
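To make the three RFM quantities concrete, the sketch below computes them in plain Python for a handful of purchase records. The project itself is described as doing this inside MongoDB aggregation pipelines, so this is only an illustration of the definitions; the record fields `user_id`, `amount`, and `purchased_on` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def compute_rfm(purchases: Iterable[dict], today: date) -> Dict[str, Tuple[int, int, float]]:
    """Return {user_id: (recency_in_days, frequency, monetary)}."""
    last_purchase: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)
    for p in purchases:
        uid = p["user_id"]
        frequency[uid] += 1                      # F: number of purchases
        monetary[uid] += p["amount"]             # M: total amount spent
        if uid not in last_purchase or p["purchased_on"] > last_purchase[uid]:
            last_purchase[uid] = p["purchased_on"]
    # R: days since the most recent purchase
    return {uid: ((today - last_purchase[uid]).days, frequency[uid], monetary[uid])
            for uid in last_purchase}


rfm = compute_rfm(
    [
        {"user_id": "user_1", "amount": 500000, "purchased_on": date(2025, 11, 20)},
        {"user_id": "user_1", "amount": 1500000, "purchased_on": date(2025, 11, 10)},
        {"user_id": "user_2", "amount": 200000, "purchased_on": date(2025, 10, 25)},
    ],
    today=date(2025, 11, 24),
)
print(rfm)
```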
112
+ ### AI Models
113
+
114
+ #### 1. Segmentation: scikit-learn K-Means
115
+ - **Input**: Feature vector [R, F, M, Category1, Category2, ...]
116
+ - **Output**: Cluster labels + Confidence scores
117
+ - **Number of clusters**: 5 (configurable)
118
+
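For readers unfamiliar with K-Means, the sketch below shows the core loop (assign each point to its nearest centroid, then recompute centroids) in dependency-free Python. It is not the project's code and not scikit-learn's implementation, which adds better initialisation and other refinements. The toy two-dimensional points stand in for the scaled feature vectors, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Minimal Lloyd's algorithm; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # naive init: the first k points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


toy = [(1.0, 1.0), (10.0, 10.0), (1.2, 0.8), (9.5, 10.5), (0.9, 1.1), (10.2, 9.8)]
labels, centroids = kmeans(toy, k=2)
print(labels)
```

In practice the R, F, and M features live on very different scales, so they would normally be standardised before clustering.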
119
+ #### 2. Sentiment: wonrax/phobert-base-vietnamese-sentiment
120
+ - **Model**: PhoBERT fine-tuned for Vietnamese
121
+ - **Output**: Positive/Negative/Neutral + Confidence
122
+ - **Batch size**: 32
123
+
124
+
125
+ ## MongoDB Collections
126
+
127
+
128
+ ### Output Collections (New)
129
+ - `AudienceSegment` - Audience segments
130
+ - `UserSegmentAssignment` - Assignment of users to segments
131
+ - `SentimentAnalysisResult` - Sentiment analysis results
132
+ - `EventInsightReport` - Insight reports for events
133
+
app.py ADDED
@@ -0,0 +1,447 @@
1
+ """
2
+ FastAPI Application for Event-Centric Audience Segmentation AI
3
+ Author: AI Generated
4
+ Created: 2025-11-24 (Refactored)
5
+ Purpose: REST API with event-based endpoints
6
+ """
7
+
8
+ from fastapi import FastAPI, HTTPException, BackgroundTasks, status, Query
9
+ from fastapi.middleware.cors import CORSMiddleware
10
+ from pydantic import BaseModel
11
+ from typing import List, Dict, Optional, Any
12
+ from datetime import datetime
13
+ from bson import ObjectId
14
+
15
+ # Import services
16
+ from services.segmentation_service import SegmentationService
17
+ from services.sentiment_service import SentimentAnalysisService
18
+ from services.genai_service import GenerativeAIService
19
+ from database import db
20
+ from config import settings
21
+
22
+
23
+ # FastAPI app
24
+ app = FastAPI(
25
+     title="Audience Segmentation AI - Event-Centric",
26
+     description="REST API for per-event audience analysis",
27
+     version="2.0.0",
28
+     docs_url="/api/docs",
29
+     redoc_url="/api/redoc"
30
+ )
31
+
32
+ # CORS
33
+ app.add_middleware(
34
+     CORSMiddleware,
35
+     allow_origins=["*"],
36
+     allow_credentials=True,
37
+     allow_methods=["*"],
38
+     allow_headers=["*"],
39
+ )
40
+
41
+
42
+ # Helper
43
+ def serialize_doc(doc: Dict) -> Optional[Dict]:
44
+     """Convert a MongoDB document to a JSON-serializable dict (mutates and returns `doc`)"""
45
+     if doc is None:
46
+         return None
47
+     if '_id' in doc:
48
+         doc['id'] = str(doc.pop('_id'))
49
+
50
+     # Handle nested ObjectIds, lists, and sub-documents (including dicts inside lists)
51
+     for key, value in list(doc.items()):
52
+         if isinstance(value, ObjectId):
53
+             doc[key] = str(value)
54
+         elif isinstance(value, list):
55
+             doc[key] = [str(v) if isinstance(v, ObjectId) else (serialize_doc(v) if isinstance(v, dict) else v) for v in value]
56
+         elif isinstance(value, dict):
57
+             doc[key] = serialize_doc(value)
58
+
59
+     return doc
60
+
61
+
62
+ # ===== HEALTH =====
63
+ @app.get("/health", tags=["System"])
64
+ async def health_check():
65
+     """Health check"""
66
+     try:
67
+         db.client.server_info()
68
+         return {
69
+             "status": "healthy",
70
+             "timestamp": datetime.utcnow(),
71
+             "database": "connected"
72
+         }
73
+     except Exception as e:
74
+         raise HTTPException(
75
+             status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
76
+             detail=f"Unhealthy: {str(e)}"
77
+         )
78
+
79
+
80
+ # ===== EVENT ANALYSIS =====
81
+ @app.post("/api/events/{event_code}/analyze", tags=["Event Analysis"])
82
+ async def analyze_event(event_code: str, background_tasks: BackgroundTasks):
83
+     """Run full AI pipeline for an event"""
84
+
85
+     def run_pipeline():
86
+         # Step 1: Segmentation
87
+         seg_service = SegmentationService(event_code)
88
+         seg_service.run_segmentation()
89
+
90
+         # Step 2: Sentiment
91
+         sent_service = SentimentAnalysisService(event_code)
92
+         sent_service.analyze_event_comments()
93
+
94
+         # Step 3: Email generation
95
+         genai_service = GenerativeAIService(event_code)
96
+         genai_service.generate_emails_for_all_segments()
97
+
98
+         # Step 4: Insights
99
+         genai_service.update_sentiment_summary_with_insights()
100
+
101
+     background_tasks.add_task(run_pipeline)
102
+
103
+     return {
104
+         "status": "started",
105
+         "message": f"Analysis pipeline started for event {event_code}"
106
+     }
107
+
108
+
109
+ @app.get("/api/events/{event_code}/dashboard", tags=["Event Analysis"])
110
+ async def get_event_dashboard(event_code: str):
111
+     """Get complete dashboard for Event Owner"""
112
+
113
+     # Get segments
114
+     segments = list(db.event_audience_segments.find({"event_code": event_code}))
115
+
116
+     # Get sentiment summary
117
+     sentiment_summary = db.event_sentiment_summary.find_one({"event_code": event_code})
118
+
119
+     return {
120
+         "event_code": event_code,
121
+         "segments": [serialize_doc(s) for s in segments],
122
+         "sentiment_summary": serialize_doc(sentiment_summary) if sentiment_summary else None
123
+     }
124
+
125
+
126
+ # ===== SEGMENTATION =====
127
+ @app.post("/api/events/{event_code}/segmentation/run", tags=["Segmentation"])
128
+ async def run_event_segmentation(
129
+     event_code: str,
130
+     background_tasks: BackgroundTasks,
131
+     n_clusters: int = Query(default=5, ge=2, le=10)
132
+ ):
133
+     """Run segmentation for an event"""
134
+
135
+     def run_task():
136
+         service = SegmentationService(event_code, n_clusters=n_clusters)
137
+         service.run_segmentation()
138
+
139
+     background_tasks.add_task(run_task)
140
+
141
+     return {
142
+         "status": "started",
143
+         "message": f"Segmentation started for event {event_code}",
144
+         "event_code": event_code
145
+     }
146
+
147
+
148
+ @app.get("/api/events/{event_code}/segments", tags=["Segmentation"])
149
+ async def get_event_segments(
150
+     event_code: str,
151
+     status_filter: Optional[str] = Query(default=None, description="Filter by Draft, Approved, Sent")
152
+ ):
153
+     """Get all segments for an event"""
154
+
155
+     query = {"event_code": event_code}
156
+     if status_filter:
157
+         query["marketing_content.status"] = status_filter
158
+
159
+     segments = list(db.event_audience_segments.find(query))
160
+
161
+     return [serialize_doc(s) for s in segments]
162
+
163
+
164
+ @app.get("/api/events/{event_code}/segments/{segment_id}", tags=["Segmentation"])
165
+ async def get_segment_detail(event_code: str, segment_id: str):
166
+     """Get specific segment details"""
167
+
168
+     segment = db.event_audience_segments.find_one({
169
+         "_id": ObjectId(segment_id),
170
+         "event_code": event_code
171
+     })
172
+
173
+     if not segment:
174
+         raise HTTPException(status_code=404, detail="Segment not found")
175
+
176
+     return serialize_doc(segment)
177
+
178
+
179
+ @app.get("/api/events/{event_code}/segments/{segment_id}/users", tags=["Segmentation"])
180
+ async def get_segment_users(
181
+     event_code: str,
182
+     segment_id: str,
183
+     skip: int = 0,
184
+     limit: int = 100
185
+ ):
186
+     """Get users in a segment with details"""
187
+
188
+     segment = db.event_audience_segments.find_one({
189
+         "_id": ObjectId(segment_id),
190
+         "event_code": event_code
191
+     })
192
+
193
+     if not segment:
194
+         raise HTTPException(status_code=404, detail="Segment not found")
195
+
196
+     user_ids = segment.get('user_ids', [])
197
+     total_users = len(user_ids)
198
+
199
+     # Paginate
200
+     paginated_ids = user_ids[skip:skip + limit]
201
+
202
+     # Get user details
203
+     users = list(db.users.find({
204
+         "_id": {"$in": paginated_ids}
205
+     }))
206
+
207
+     # Enrich with stats (optional)
208
+     enriched_users = []
209
+     for user in users:
210
+         enriched_users.append({
211
+             "user_id": str(user['_id']),
212
+             "email": user.get('email'),
213
+             "full_name": f"{user.get('FirstName', '')} {user.get('LastName', '')}".strip()
214
+         })
215
+
216
+     return {
217
+         "segment_id": segment_id,
218
+         "total_users": total_users,
219
+         "users": enriched_users
220
+     }
221
+
222
+
223
+ # ===== APPROVAL WORKFLOW =====
224
+ @app.post("/api/events/{event_code}/segments/{segment_id}/approve", tags=["Approval"])
225
+ async def approve_segment(
226
+     event_code: str,
227
+     segment_id: str,
228
+     approved_by: Optional[str] = None,
229
+     modified_subject: Optional[str] = None,
230
+     modified_body: Optional[str] = None
231
+ ):
232
+     """Event Owner approves marketing content"""
233
+
234
+     segment = db.event_audience_segments.find_one({
235
+         "_id": ObjectId(segment_id),
236
+         "event_code": event_code
237
+     })
238
+
239
+     if not segment:
240
+         raise HTTPException(status_code=404, detail="Segment not found")
241
+
242
+     # Update fields
243
+     update = {
244
+         "marketing_content.status": "Approved",
245
+         "marketing_content.approved_at": datetime.utcnow(),
246
+         "marketing_content.approved_by": approved_by,
247
+         "last_updated": datetime.utcnow()
248
+     }
249
+
250
+     if modified_subject:
251
+         update["marketing_content.email_subject"] = modified_subject
252
+     if modified_body:
253
+         update["marketing_content.email_body"] = modified_body
254
+
255
+     db.event_audience_segments.update_one(
256
+         {"_id": ObjectId(segment_id)},
257
+         {"$set": update}
258
+     )
259
+
260
+     updated_segment = db.event_audience_segments.find_one({"_id": ObjectId(segment_id)})
261
+
262
+     return {
263
+         "status": "success",
264
+         "message": "Segment approved",
265
+         "segment_id": segment_id,
266
+         "marketing_content": updated_segment.get('marketing_content')
267
+     }
268
+
269
+
270
+ @app.post("/api/events/{event_code}/segments/{segment_id}/send-email", tags=["Approval"])
271
+ async def send_segment_email(
272
+     event_code: str,
273
+     segment_id: str,
274
+     send_immediately: bool = True
275
+ ):
276
+     """Send approved marketing email"""
277
+
278
+     segment = db.event_audience_segments.find_one({
279
+         "_id": ObjectId(segment_id),
280
+         "event_code": event_code
281
+     })
282
+
283
+     if not segment:
284
+         raise HTTPException(status_code=404, detail="Segment not found")
285
+
286
+     marketing_content = segment.get('marketing_content', {})
287
+     if marketing_content.get('status') != "Approved":
288
+         raise HTTPException(status_code=400, detail="Segment not approved yet")
289
+
290
+     # TODO: Integrate with email service (SendGrid, AWS SES, etc.)
291
+     # For now, just mark as sent
292
+
293
+     db.event_audience_segments.update_one(
294
+         {"_id": ObjectId(segment_id)},
295
+         {"$set": {
296
+             "marketing_content.status": "Sent",
297
+             "last_updated": datetime.utcnow()
298
+         }}
299
+     )
300
+
301
+     return {
302
+         "status": "success",
303
+         "message": f"Email sent to {segment.get('user_count', 0)} users",
304
+         "segment_id": segment_id,
305
+         "emails_sent": segment.get('user_count', 0),
306
+         "emails_failed": 0
307
+     }
308
+
309
+
310
+ # ===== SENTIMENT =====
311
+ @app.post("/api/events/{event_code}/sentiment/analyze", tags=["Sentiment"])
312
+ async def analyze_event_sentiment(event_code: str, background_tasks: BackgroundTasks):
313
+     """Analyze sentiment for event comments"""
314
+
315
+     def run_task():
316
+         service = SentimentAnalysisService(event_code)
317
+         service.analyze_event_comments()
318
+
319
+     background_tasks.add_task(run_task)
320
+
321
+     return {
322
+         "status": "started",
323
+         "message": f"Sentiment analysis started for event {event_code}"
324
+     }
325
+
326
+
327
+ @app.get("/api/events/{event_code}/sentiment/summary", tags=["Sentiment"])
328
+ async def get_sentiment_summary(event_code: str):
329
+     """Get sentiment summary for an event"""
330
+
331
+     summary = db.event_sentiment_summary.find_one({"event_code": event_code})
332
+
333
+     if not summary:
334
+         raise HTTPException(status_code=404, detail="No sentiment data for this event")
335
+
336
+     return serialize_doc(summary)
337
+
338
+
339
+ @app.get("/api/events/{event_code}/sentiment/results", tags=["Sentiment"])
340
+ async def get_sentiment_results(
341
+     event_code: str,
342
+     sentiment_label: Optional[str] = None,
343
+     skip: int = 0,
344
+     limit: int = 100
345
+ ):
346
+     """Get detailed sentiment results"""
347
+
348
+     query = {"event_code": event_code}
349
+     if sentiment_label:
350
+         query["sentiment_label"] = sentiment_label
351
+
352
+     total = db.sentiment_results.count_documents(query)
353
+     results = list(
354
+         db.sentiment_results.find(query)
355
+         .sort("analyzed_at", -1)
356
+         .skip(skip)
357
+         .limit(limit)
358
+     )
359
+
360
+     return {
361
+         "total": total,
362
+         "results": [serialize_doc(r) for r in results]
363
+     }
364
+
365
+
366
+ # ===== GENAI =====
367
+ @app.post("/api/events/{event_code}/genai/generate-emails", tags=["GenAI"])
368
+ async def generate_event_emails(event_code: str, background_tasks: BackgroundTasks):
369
+     """Generate marketing emails for all segments"""
370
+
371
+     def run_task():
372
+         service = GenerativeAIService(event_code)
373
+         service.generate_emails_for_all_segments()
374
+
375
+     background_tasks.add_task(run_task)
376
+
377
+     return {
378
+         "status": "started",
379
+         "message": "Email generation started"
380
+     }
381
+
382
+
383
+ @app.post("/api/events/{event_code}/genai/generate-insights", tags=["GenAI"])
384
+ async def generate_event_insights(event_code: str, background_tasks: BackgroundTasks):
385
+     """Generate AI insights from negative feedback"""
386
+
387
+     def run_task():
388
+         service = GenerativeAIService(event_code)
389
+         service.update_sentiment_summary_with_insights()
390
+
391
+     background_tasks.add_task(run_task)
392
+
393
+     return {
394
+         "status": "started",
395
+         "message": "Insight generation started"
396
+     }
397
+
398
+
+ # ===== MONITORING =====
+ @app.get("/api/monitoring/pipelines/{pipeline}/metrics", tags=["Monitoring"])
+ async def get_pipeline_metrics(
+     pipeline: str,
+     event_code: Optional[str] = None,
+     days: int = 7
+ ):
+     """Get performance metrics"""
+     # TODO: Implement based on monitoring.py
+     return {
+         "pipeline": pipeline,
+         "event_code": event_code,
+         "message": "Metrics endpoint - implement as needed"
+     }
+
+
+ # ===== ADMIN =====
+ @app.post("/api/admin/indexes/create", tags=["Admin"])
+ async def create_indexes():
+     """Create MongoDB indexes"""
+     from scripts.create_indexes import create_all_indexes
+
+     try:
+         create_all_indexes()
+         return {"status": "success", "message": "Indexes created"}
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=str(e))
+
+
+ # ===== ROOT =====
+ @app.get("/")
+ async def root():
+     """API root"""
+     return {
+         "name": "Audience Segmentation AI - Event-Centric",
+         "version": "2.0.0",
+         "docs": "/api/docs",
+         "health": "/health"
+     }
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(
+         "app:app",
+         host="0.0.0.0",
+         port=7860,
+         reload=True
+     )
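
The `/sentiment/results` handler above builds a filter from `event_code` plus an optional `sentiment_label`, sorts by `analyzed_at` descending, and applies `skip` and `limit`. The following is a minimal sketch of that logic against an in-memory list, which may be handy for unit tests without MongoDB. The names `build_query` and `paginate` and the sample documents are hypothetical and not part of this commit.

```python
from typing import Any, Optional


def build_query(event_code: str, sentiment_label: Optional[str] = None) -> dict[str, Any]:
    """Mirror the filter dict the endpoint passes to MongoDB."""
    query: dict[str, Any] = {"event_code": event_code}
    if sentiment_label:
        query["sentiment_label"] = sentiment_label
    return query


def paginate(docs: list[dict[str, Any]], query: dict[str, Any],
             skip: int = 0, limit: int = 100) -> dict[str, Any]:
    """Filter, sort newest first, then apply skip/limit (in-memory stand-in)."""
    matched = [d for d in docs if all(d.get(k) == v for k, v in query.items())]
    matched.sort(key=lambda d: d["analyzed_at"], reverse=True)
    # "total" counts every match, not just the returned page.
    return {"total": len(matched), "results": matched[skip:skip + limit]}


# Hypothetical sample data, for illustration only.
docs = [
    {"event_code": "EVT1", "sentiment_label": "negative", "analyzed_at": 1},
    {"event_code": "EVT1", "sentiment_label": "positive", "analyzed_at": 2},
    {"event_code": "EVT1", "sentiment_label": "negative", "analyzed_at": 3},
    {"event_code": "EVT2", "sentiment_label": "negative", "analyzed_at": 4},
]
page = paginate(docs, build_query("EVT1", "negative"), skip=0, limit=1)
```

As in the handler, `total` reflects all matching documents, so a client can compute the number of pages from `total` and `limit`.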
config.py ADDED
@@ -0,0 +1,50 @@
+ """
+ Configuration for Audience Segmentation System
+ """
+
+ import os
+ from pydantic_settings import BaseSettings
+
+
+ class Settings(BaseSettings):
+     """Application settings"""
+
+     # MongoDB Configuration
+     MONGODB_URI: str = os.getenv("MONGODB_URI", "mongodb://localhost:27017")
+     DB_NAME: str = os.getenv("DB_NAME", "audience_segmentation")
+
+     # Hugging Face Token (optional, not required for our models)
+     HF_TOKEN: str = os.getenv("HF_TOKEN", "")
+
+     # Collection Names
+     COLLECTION_USERS: str = "User"
+     COLLECTION_PAYMENTS: str = "Payment"
+     COLLECTION_EVENT_VERSIONS: str = "EventVersion"
+     COLLECTION_USER_FOLLOWS: str = "UserFollow"
+     COLLECTION_USER_COMMENT_POST: str = "UserCommentPost"
+     COLLECTION_POST_SOCIAL_MEDIA: str = "PostSocialMedia"
+
+     # AI Result Collections
+     COLLECTION_AUDIENCE_SEGMENTS: str = "AudienceSegment"
+     COLLECTION_USER_SEGMENT_ASSIGNMENTS: str = "UserSegmentAssignment"
+     COLLECTION_SENTIMENT_RESULTS: str = "SentimentAnalysisResult"
+     COLLECTION_EVENT_INSIGHTS: str = "EventInsightReport"
+
+     # AI Model Configuration
+     SENTIMENT_MODEL: str = "wonrax/phobert-base-vietnamese-sentiment"
+     LLM_MODEL: str = "Vistral-7B-Chat"
+     LLM_LOCAL_PATH: str = os.getenv("LLM_LOCAL_PATH", "./models/vistral-7b-chat")
+
+     # Clustering Configuration
+     N_CLUSTERS: int = 5  # Number of audience segments
+     RANDOM_STATE: int = 42
+
+     # Batch Processing
+     BATCH_SIZE: int = 32
+
+     class Config:
+         env_file = ".env"
+         case_sensitive = True
+
+
+ settings = Settings()
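
The `Settings` class above takes its values from environment variables (or `.env`) and falls back to the listed defaults. The sketch below approximates that override behaviour with the standard library only, so the lookup order is easy to see. `SketchSettings` and `load_settings` are hypothetical names, and the real class relies on `pydantic-settings` for parsing and validation rather than on this code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SketchSettings:
    """Dependency-free stand-in for a few fields of the Settings class."""
    mongodb_uri: str
    db_name: str
    n_clusters: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> SketchSettings:
    """Read values from a mapping (os.environ by default), else use defaults."""
    env = os.environ if env is None else env
    return SketchSettings(
        mongodb_uri=env.get("MONGODB_URI", "mongodb://localhost:27017"),
        db_name=env.get("DB_NAME", "audience_segmentation"),
        # Environment values are strings, so numeric fields need conversion.
        n_clusters=int(env.get("N_CLUSTERS", "5")),
    )


# Hypothetical overrides, passed explicitly instead of touching os.environ.
overridden = load_settings({"DB_NAME": "test_db", "N_CLUSTERS": "3"})
```

Passing the mapping explicitly keeps tests from depending on the process environment.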
database.py ADDED
@@ -0,0 +1,111 @@
+ """
+ MongoDB Database Connection Manager
+ Author: AI Generated
+ Created: 2025-11-24
+ Purpose: Handle MongoDB connection and collection access
+ """
+
+ from pymongo import MongoClient
+ from pymongo.database import Database
+ from pymongo.collection import Collection
+ from config import settings
+
+
+ class DatabaseManager:
+     """Singleton MongoDB connection manager"""
+
+     _instance = None
+     _client: MongoClient = None
+     _db: Database = None
+
+     def __new__(cls):
+         if cls._instance is None:
+             cls._instance = super().__new__(cls)
+         return cls._instance
+
+     def __init__(self):
+         if self._client is None:
+             self.connect()
+
+     @property
+     def client(self):
+         """Get MongoDB client"""
+         return self._client
+
+     @property
+     def db_name(self):
+         """Get database name"""
+         return settings.DB_NAME
+
+     def connect(self):
+         """Establish connection to MongoDB"""
+         self._client = MongoClient(settings.MONGODB_URI)
+         self._db = self._client[settings.DB_NAME]
+         print(f"✓ Connected to MongoDB: {settings.DB_NAME}")
+
+     def get_collection(self, collection_name: str) -> Collection:
+         """Get a MongoDB collection"""
+         return self._db[collection_name]
+
+     def close(self):
+         """Close MongoDB connection"""
+         if self._client:
+             self._client.close()
+             print("✓ MongoDB connection closed")
+
+     # Collection accessors
+     @property
+     def users(self) -> Collection:
+         return self.get_collection(settings.COLLECTION_USERS)
+
+     @property
+     def payments(self) -> Collection:
+         return self.get_collection(settings.COLLECTION_PAYMENTS)
+
+     @property
+     def event_versions(self) -> Collection:
+         return self.get_collection(settings.COLLECTION_EVENT_VERSIONS)
+
+     @property
+     def user_follows(self) -> Collection:
+         return self.get_collection(settings.COLLECTION_USER_FOLLOWS)
+
+     @property
+     def user_comment_post(self) -> Collection:
+         return self.get_collection(settings.COLLECTION_USER_COMMENT_POST)
+
+     # AI Result Collections (DEPRECATED - use event-centric versions)
+     @property
+     def audience_segments(self) -> Collection:
+         """AudienceSegment collection (DEPRECATED - use event_audience_segments)"""
+         return self.get_collection(settings.COLLECTION_AUDIENCE_SEGMENTS)
+
+     @property
+     def user_segment_assignments(self) -> Collection:
+         """UserSegmentAssignment collection"""
+         return self.get_collection(settings.COLLECTION_USER_SEGMENT_ASSIGNMENTS)
+
+     @property
+     def sentiment_results(self) -> Collection:
+         """SentimentAnalysisResult collection"""
+         return self.get_collection(settings.COLLECTION_SENTIMENT_RESULTS)
+
+     @property
+     def event_insights(self) -> Collection:
+         """EventInsightReport collection"""
+         return self.get_collection(settings.COLLECTION_EVENT_INSIGHTS)
+
+     # NEW: Event-centric collections
+     @property
+     def event_audience_segments(self) -> Collection:
+         """EventAudienceSegment collection"""
+         return self.get_collection("EventAudienceSegment")
+
+     @property
+     def event_sentiment_summary(self) -> Collection:
+         """EventSentimentSummary collection"""
+         return self.get_collection("EventSentimentSummary")
+
+
+ # Global database instance
+ db = DatabaseManager()
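
`DatabaseManager` combines a `__new__`-based singleton with a guard in `__init__`, since Python calls `__init__` on every `DatabaseManager()` even when `__new__` returns the existing instance. The sketch below isolates that pattern with a hypothetical `FakeClient` standing in for `pymongo.MongoClient`, so it runs without a database. `SketchManager` is likewise an illustrative name, not code from this commit.

```python
class FakeClient:
    """Hypothetical stand-in for pymongo.MongoClient; counts constructions."""
    created = 0

    def __init__(self, uri: str):
        FakeClient.created += 1
        self.uri = uri


class SketchManager:
    """Same singleton shape as DatabaseManager, minus the real connection."""
    _instance = None
    _client = None

    def __new__(cls):
        # Create the single instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ runs on every SketchManager() call, so guard the connect.
        if self._client is None:
            self._client = FakeClient("mongodb://localhost:27017")


first = SketchManager()
second = SketchManager()
```

Both names refer to one object and the client is constructed once, which is the behaviour the module-level `db = DatabaseManager()` appears to rely on.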
main.py ADDED
@@ -0,0 +1,91 @@
+ """
+ Main Orchestration Script
+ Author: AI Generated
+ Created: 2025-11-24
+ Purpose: Run the complete AI pipeline for Audience Segmentation
+ """
+
+ import argparse
+ from services.segmentation_service import SegmentationService
+ from services.sentiment_service import SentimentAnalysisService
+ from services.genai_service import GenerativeAIService
+ from database import db
+
+
+ def run_segmentation():
+     """Run audience segmentation pipeline"""
+     service = SegmentationService()
+     segment_ids = service.run_segmentation()
+     return segment_ids
+
+
+ def run_sentiment_analysis():
+     """Run sentiment analysis pipeline"""
+     service = SentimentAnalysisService()
+     service.analyze_unprocessed_comments()
+
+
+ def run_email_generation():
+     """Run email content generation for segments"""
+     service = GenerativeAIService()
+     service.generate_emails_for_all_segments()
+
+
+ def run_insight_generation(event_code: str = None):
+     """Run insight generation for events"""
+     service = GenerativeAIService()
+
+     if event_code:
+         service.generate_insights_for_event(event_code)
+     else:
+         # Get all unique event codes from comments
+         event_codes = db.user_comment_post.distinct("EventCode")
+         for code in event_codes:
+             if code:
+                 service.generate_insights_for_event(code)
+
+
+ def main():
+     parser = argparse.ArgumentParser(description='Audience Segmentation AI Pipeline')
+     parser.add_argument(
+         '--task',
+         choices=['segmentation', 'sentiment', 'email', 'insights', 'all'],
+         default='all',
+         help='Which task to run'
+     )
+     parser.add_argument(
+         '--event-code',
+         type=str,
+         help='Specific event code for insight generation'
+     )
+
+     args = parser.parse_args()
+
+     try:
+         if args.task in ['segmentation', 'all']:
+             run_segmentation()
+
+         if args.task in ['sentiment', 'all']:
+             run_sentiment_analysis()
+
+         if args.task in ['email', 'all']:
+             run_email_generation()
+
+         if args.task in ['insights', 'all']:
+             run_insight_generation(args.event_code)
+
+         print("\n" + "=" * 60)
+         print("🎉 ALL TASKS COMPLETED SUCCESSFULLY!")
+         print("=" * 60)
+
+     except Exception as e:
+         print(f"\n❌ Error: {e}")
+         import traceback
+         traceback.print_exc()
+
+     finally:
+         db.close()
+
+
+ if __name__ == "__main__":
+     main()
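
`main()` maps the `--task` choice onto a fixed sequence of pipeline steps, with `all` running every step in order. A minimal sketch of that dispatch, separated from the services so it can be exercised on its own, might look like the following. `PIPELINE_ORDER`, `build_parser` and `select_steps` are hypothetical helpers, not functions from `main.py`.

```python
import argparse

# Order in which main() checks the tasks.
PIPELINE_ORDER = ["segmentation", "sentiment", "email", "insights"]


def build_parser() -> argparse.ArgumentParser:
    """Recreate the two command-line options that main.py defines."""
    parser = argparse.ArgumentParser(description="Audience Segmentation AI Pipeline")
    parser.add_argument("--task", choices=PIPELINE_ORDER + ["all"], default="all")
    parser.add_argument("--event-code", type=str, default=None)
    return parser


def select_steps(task: str) -> list[str]:
    """Return the steps to run: every step for 'all', else just the one named."""
    return list(PIPELINE_ORDER) if task == "all" else [task]


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["--task", "insights", "--event-code", "EVT1"])
steps = select_steps(args.task)
```

On the command line this corresponds to an invocation such as `python main.py --task insights --event-code EVT1`.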
requirements.txt ADDED
@@ -0,0 +1,42 @@
+ # FastAPI Backend Requirements
+ # Updated for November 2025
+
+ # Web Framework
+ fastapi==0.121.3
+ uvicorn[standard]==0.38.0
+ python-multipart==0.0.20
+
+ # Database
+ pymongo==4.15.4
+ motor==3.7.0 # Async MongoDB driver for FastAPI
+
+ # Data Validation
+ pydantic==2.10.4
+ pydantic-settings==2.12.0
+
+ # Data Processing
+ pandas==2.3.3
+ numpy==2.3.5
+ scikit-learn==1.7.2
+
+ # NLP & AI
+ transformers==4.57.1
+ torch==2.9.1
+ tokenizers==0.21.0
+
+ # Vietnamese NLP
+ pyvi==0.1.1
+
+ # Generative AI (CPU-optimized)
+ llama-cpp-python==0.3.6
+
+ # Utilities
+ python-dotenv==1.0.1
+ tqdm==4.67.1
+
+ # CORS & Security
+ python-jose[cryptography]==3.4.0
+ passlib[bcrypt]==1.7.4
+
+ # Optional: Monitoring & Logging
+ # prometheus-client==0.21.0