AlshimaaAhmed committed on
Commit
defafce
·
verified ·
1 Parent(s): 63dd7c1

upload app folder

app/config/domain_mapping.yaml ADDED
@@ -0,0 +1,718 @@
+ STARTUP_SECTOR_GROUPS:
+   "Healthcare & MedTech":
+     - HealthTech
+     - MedTech
+     - Health
+     - Digital Health
+     - Telehealth
+     - Telemedicine
+     - Mental Health
+     - Clinical AI
+     - Medical
+     - Medical Admin
+     - Medical Coding
+     - Medical Imaging
+     - Doctor Booking
+     - Hospital Mgmt
+     - Hospital System
+     - Home Health
+     - Chronic Care
+     - Diagnostics
+     - Dental
+     - Dental Tech
+     - Elderly Care
+     - FemTech
+     - Health Data
+     - Health IT
+     - Health Insurance
+     - Community Health
+     - Public Health
+     - Pharmacy
+     - e-Pharmacy
+     - Home Care
+     - Mobile Health
+     - Patient Engagement
+     - Patient Comm
+     - Patient Support
+     - Teleradiology
+     - Wearable
+     - Nutrition
+     - Queue Mgmt
+     - Queue Management
+     - Scheduling
+     - Appointment Booking
+     - Booking
+     - Booking Software
+
+   "FinTech & Finance":
+     - FinTech
+     - Finance
+     - Banking
+     - Banking API
+     - Neobank
+     - Core Banking
+     - Payments
+     - Payments API
+     - Mobile Payments
+     - Global Payments
+     - Cross-Border Payments
+     - Crypto
+     - DeFi
+     - Lending
+     - P2P Lending
+     - Micro-Lending
+     - BNPL
+     - Insurance
+     - InsureTech
+     - Payroll
+     - Tax
+     - Personal Finance
+     - Wealth Tech
+     - Robo-Advisor
+     - Accounting
+     - Invoicing
+     - Billing
+     - Expense Management
+     - Corporate Card
+     - Cash Flow Mgmt
+     - Investment Platform
+     - Stock Trading
+     - Crypto Trading
+
+   "Developer Tools":
+     - DevTools
+     - Dev
+     - API
+     - API Platform
+     - Backend
+     - CI/CD
+     - Cloud
+     - Cloud Computing
+     - Cloud Hosting
+     - Cloud Infrastructure
+     - Code Gen
+     - Coding
+     - Database
+     - DevOps
+     - Framework
+     - Frontend
+     - GraphQL Engine
+     - IDE
+     - Infrastructure
+     - Kubernetes
+     - Monitoring
+     - Open Source
+     - Serverless Database
+     - Testing
+     - Version Control
+     - Error Monitoring
+
+   "AI & Automation":
+     - AI
+     - AI API
+     - AI Agent
+     - AI Analytics
+     - AI Assistant
+     - AI Automation
+     - AI Builder
+     - AI Content
+     - AI Design
+     - AI Infrastructure
+     - AI Writing
+     - AI workflows
+     - Automation
+     - Autonomous Agents
+     - Generative AI
+     - LLM
+     - LLM API
+     - Machine Learning
+     - ML Ops
+     - NLP
+     - Deep Learning
+     - Edge AI
+     - No-Code AI
+     - Industrial AI
+     - Real-Time AI
+
+   "Marketing & Sales":
+     - MarketingTech
+     - Marketing
+     - Marketing Automation
+     - Marketing Suite
+     - Digital Marketing
+     - Email Marketing
+     - SMS Marketing
+     - SEO
+     - SEO AI
+     - SEO Tool
+     - CRM
+     - CRM Automation
+     - Sales AI
+     - Sales Automation
+     - Lead Gen
+     - Lead Generation
+     - Ad Tech
+     - Ad Automation
+     - Content
+     - Content AI
+     - Content Creation
+     - Copywriting
+     - Conversion
+
+   "HR & Recruitment":
+     - HR Tech
+     - HRIS
+     - HRMS
+     - HCM
+     - HR Software
+     - Global HR
+     - Recruitment
+     - Recruitment AI
+     - Hiring
+     - Staffing
+     - Payroll
+     - Employee Engagement
+     - Employee Experience
+     - Performance Mgmt
+     - People Analytics
+     - L&D
+     - LMS
+     - Skill Development
+     - Career
+     - Career Coaching
+     - Onboarding
+     - Workforce Management
+
+   "E-commerce & Retail":
+     - E-commerce
+     - Ecommerce
+     - Retail
+     - Headless Commerce
+     - Social Commerce
+     - Live Shopping
+     - Quick Commerce
+     - On-Demand Delivery
+     - Delivery App
+     - Food Delivery
+     - Marketplace
+     - Inventory
+     - Checkout
+     - Shopping
+     - Price Comparison
+     - Loyalty
+     - Loyalty Program
+     - Store Builder
+     - B2B Retail
+
+   "Productivity & PM":
+     - Productivity
+     - Project Management
+     - Task Management
+     - Work Management
+     - Workflow
+     - Workflow Automation
+     - Calendar
+     - Scheduling
+     - Enterprise Scheduling
+     - Time Tracking
+     - Collaboration
+     - Team Chat
+     - Team Workspace
+     - Note Taking
+     - Knowledge
+     - Knowledge Management
+     - Document Management
+     - Document AI
+     - Forms
+     - Kanban Boards
+     - OKRs
+     - Goal Management
+
+   "Education & Learning":
+     - EdTech
+     - Education
+     - E-learning
+     - Learning
+     - LMS
+     - Course Platform
+     - Tutoring
+     - AI Tutor
+     - K-12 Education
+     - Language Learning
+     - Coding Bootcamp
+     - Microlearning
+     - Virtual Classroom
+     - Teacher Assistant
+
+   "Travel & Tourism":
+     - TravelTech
+     - Travel
+     - Flight Booking
+     - Hotel Tech
+     - OTA
+     - Tour Marketplace
+     - Trip Planning
+     - Accommodation
+     - Short-Term Rentals
+     - Vacation Rentals
+     - Hospitality
+     - Adventure Travel
+     - Business Travel
+     - AI Travel Agent
+     - AI Trip Planner
+
+   "Logistics & Supply Chain":
+     - Logistics
+     - Supply Chain
+     - Last-Mile
+     - Last-Mile Delivery
+     - Freight
+     - Freight Marketplace
+     - Trucking Marketplace
+     - Shipping Services
+     - Cargo Tech
+     - Inventory Mgmt
+     - Procurement
+     - WMS
+     - TMS
+
+   "Real Estate & Construction":
+     - Real Estate
+     - Real Estate Marketplace
+     - Property Management
+     - Property Mgmt
+     - Property Inspection
+     - Fractional Real Estate
+     - Rental Management
+     - Student Housing
+     - Flexible Housing
+     - Construction
+     - Construction Tech
+     - Facility Management
+
+   "Social & Community":
+     - Social Media
+     - Social Network
+     - Community
+     - Community Platform
+     - Community Builder
+     - Community Management
+     - Niche Community
+     - Professional Network
+     - Professional Networking
+     - Non-Profit
+     - Impact Investing
+     - Civic Tech
+     - Democracy
+     - Humanitarian Aid
+
+   "Design & Creative":
+     - Design Tools
+     - Design
+     - UI Design
+     - UI Builder
+     - UI/UX Design
+     - Graphic Design
+     - Vector Design
+     - Animation
+     - 3D Design
+     - 3D Printing
+     - Video Editing
+     - Photo Editing
+     - Image Gen
+     - Creative Automation
+     - Motion Design
+     - Illustration
+     - Presentation AI
+     - Diagramming
+     - Whiteboard
+
+   "Agriculture & Environment":
+     - AgriTech
+     - Agriculture
+     - Sustainable Agriculture
+     - Farm Management
+     - Precision Farming
+     - Fisheries
+     - Environment
+     - CleanTech
+     - Solar Energy
+     - Energy Management
+     - Carbon Accounting
+     - Sustainability
+     - Waste Mgmt
+     - Water
+     - Green Finance
+     - Recycling
+
+   "Government & Public Services":
+     - Civic Tech
+     - GovTech
+     - e-Governance
+     - Public Health
+     - Public Safety
+     - Digital Identity
+     - Open Data
+     - Geospatial
+     - GIS
+     - Urban Planning
+     - Smart Cities
+     - Smart Infrastructure
+     - Mass Transit
+     - Smart Mobility
+
+   "Mobility & Transportation":
+     - MobilityTech
+     - Mobility
+     - Ride Hailing
+     - Carpooling
+     - Car Sharing
+     - Car Rental
+     - EV
+     - Charging
+     - Connected Car
+     - Air Mobility
+     - eVTOL
+     - Maritime Tracking
+     - Drones
+     - Navigation
+     - Traffic App
+     - Transport Booking
+
+   "Food & Beverage":
+     - FoodTech
+     - Food Delivery
+     - Restaurant
+     - Restaurant Tech
+     - Cloud Kitchens
+     - E-Grocery
+     - Meal Planning
+     - Nutrition
+     - Cooking
+     - B2B Food Supply
+
+   "Security & Privacy":
+     - Cybersecurity
+     - Security
+     - Cloud Security
+     - Privacy
+     - Privacy AI
+     - Privacy Compliance
+     - Authentication
+     - Identity Verification
+     - Secure Messaging
+     - Compliance
+     - Fraud Prevention
+     - Safety
+     - Public Safety
+
+   "Media & Entertainment":
+     - Media
+     - Entertainment
+     - Music
+     - Music Streaming
+     - Podcasting
+     - Video
+     - Video Streaming
+     - Live Streaming
+     - Gaming
+     - Gamification
+     - Books
+     - E-books
+     - Audiobooks
+     - News
+     - Newsletter
+     - Digital Publishing
+
+
+ PROBLEM_TO_STARTUP_GROUPS:
+   Healthcare:
+     - Healthcare & MedTech
+     - AI & Automation
+   Health:
+     - Healthcare & MedTech
+     - AI & Automation
+   healthcare:
+     - Healthcare & MedTech
+     - AI & Automation
+   "Tourism / Healthcare":
+     - Healthcare & MedTech
+     - Travel & Tourism
+   "Healthcare / Education":
+     - Healthcare & MedTech
+     - Education & Learning
+   "Environment / Health":
+     - Healthcare & MedTech
+     - Agriculture & Environment
+   Veterinary:
+     - Healthcare & MedTech
+   Finance:
+     - FinTech & Finance
+     - AI & Automation
+   Fintech:
+     - FinTech & Finance
+     - AI & Automation
+   "Finance / Legal":
+     - FinTech & Finance
+     - Government & Public Services
+   "Finance / Real Estate":
+     - FinTech & Finance
+     - Real Estate & Construction
+   "Insurance / Fintech":
+     - FinTech & Finance
+   "Financial Services":
+     - FinTech & Finance
+   Economy:
+     - FinTech & Finance
+   Transportation:
+     - Mobility & Transportation
+     - Logistics & Supply Chain
+   transportation:
+     - Mobility & Transportation
+     - Logistics & Supply Chain
+   Transport:
+     - Mobility & Transportation
+     - Logistics & Supply Chain
+   Logistics:
+     - Logistics & Supply Chain
+     - Mobility & Transportation
+   "Supply Chain":
+     - Logistics & Supply Chain
+   "Tourism / Transport":
+     - Mobility & Transportation
+     - Travel & Tourism
+   Automotive:
+     - Mobility & Transportation
+   Auto:
+     - Mobility & Transportation
+   Education:
+     - Education & Learning
+     - AI & Automation
+   education:
+     - Education & Learning
+     - AI & Automation
+   "Education / Employment":
+     - Education & Learning
+     - HR & Recruitment
+   "Education / Innovation":
+     - Education & Learning
+     - AI & Automation
+   "Education / Research":
+     - Education & Learning
+   "Family / Education":
+     - Education & Learning
+   Government:
+     - Government & Public Services
+     - AI & Automation
+   GovTech:
+     - Government & Public Services
+   "GovTech / Legal":
+     - Government & Public Services
+   "Public Services":
+     - Government & Public Services
+   "public administration":
+     - Government & Public Services
+   "Public Administration":
+     - Government & Public Services
+   "Smart City":
+     - Government & Public Services
+   "Smart Cities":
+     - Government & Public Services
+   "Urban Planning":
+     - Government & Public Services
+   "Urban Infrastructure":
+     - Government & Public Services
+   "urban infrastructure":
+     - Government & Public Services
+   Agriculture:
+     - Agriculture & Environment
+     - AI & Automation
+   agriculture:
+     - Agriculture & Environment
+   "Agriculture / Industry":
+     - Agriculture & Environment
+   "Energy / Agriculture":
+     - Agriculture & Environment
+   Agri:
+     - Agriculture & Environment
+   Environment:
+     - Agriculture & Environment
+   environment:
+     - Agriculture & Environment
+   Energy:
+     - Agriculture & Environment
+   "Energy/Mining":
+     - Agriculture & Environment
+   Waste:
+     - Agriculture & Environment
+   Water:
+     - Agriculture & Environment
+   "Real Estate":
+     - Real Estate & Construction
+   Construction:
+     - Real Estate & Construction
+   Housing:
+     - Real Estate & Construction
+   "Real Estate / Services":
+     - Real Estate & Construction
+   "Real Estate / Hospitality":
+     - Real Estate & Construction
+     - Travel & Tourism
+   "Housing / Legal":
+     - Real Estate & Construction
+   Retail:
+     - E-commerce & Retail
+   Commerce:
+     - E-commerce & Retail
+   commerce:
+     - E-commerce & Retail
+   Trade:
+     - E-commerce & Retail
+   Legal:
+     - Government & Public Services
+     - AI & Automation
+   Law:
+     - Government & Public Services
+   "Technology / Legal":
+     - Developer Tools
+     - Government & Public Services
+   "Business / Legal":
+     - Government & Public Services
+   Employment:
+     - HR & Recruitment
+     - AI & Automation
+   "Gig Economy":
+     - HR & Recruitment
+   Labor:
+     - HR & Recruitment
+   HR:
+     - HR & Recruitment
+   "Food & Beverage":
+     - Food & Beverage
+     - E-commerce & Retail
+   Food:
+     - Food & Beverage
+   Hospitality:
+     - Food & Beverage
+     - Travel & Tourism
+   Tourism:
+     - Travel & Tourism
+   Travel:
+     - Travel & Tourism
+   Recreation:
+     - Travel & Tourism
+     - Media & Entertainment
+   Manufacturing:
+     - AI & Automation
+     - Logistics & Supply Chain
+   Industry:
+     - AI & Automation
+     - Logistics & Supply Chain
+   "Industry / Manufacturing":
+     - AI & Automation
+     - Logistics & Supply Chain
+   Technology:
+     - Developer Tools
+     - AI & Automation
+   Tech:
+     - Developer Tools
+     - AI & Automation
+   "digital & telecom":
+     - Developer Tools
+     - Media & Entertainment
+   Telecommunications:
+     - Developer Tools
+     - Media & Entertainment
+   Telecom:
+     - Developer Tools
+     - Media & Entertainment
+   Social:
+     - Social & Community
+   Non-Profit:
+     - Social & Community
+   "Social Impact":
+     - Social & Community
+   "Social Services":
+     - Social & Community
+   "Social Protection":
+     - Social & Community
+   Creative:
+     - Design & Creative
+     - Media & Entertainment
+   "Creative Services":
+     - Design & Creative
+   "Creative Industries":
+     - Design & Creative
+   Media:
+     - Media & Entertainment
+   Entertainment:
+     - Media & Entertainment
+   "Arts / Culture":
+     - Design & Creative
+     - Media & Entertainment
+   Arts:
+     - Design & Creative
+   Sports:
+     - Media & Entertainment
+   "Sports / Recreation":
+     - Media & Entertainment
+   Safety:
+     - Security & Privacy
+   Marketing:
+     - Marketing & Sales
+   Business:
+     - Productivity & PM
+     - Marketing & Sales
+   "Business Services":
+     - Productivity & PM
+   Services:
+     - Productivity & PM
+   "Startup Ecosystem":
+     - Productivity & PM
+     - FinTech & Finance
+   Utilities:
+     - Government & Public Services
+   utilities:
+     - Government & Public Services
+   "Public Sector":
+     - Government & Public Services
+   "Urban Living":
+     - Real Estate & Construction
+   Wellness:
+     - Healthcare & MedTech
+   Lifestyle:
+     - Healthcare & MedTech
+     - Media & Entertainment
+   Fashion:
+     - E-commerce & Retail
+     - Design & Creative
+   Beauty:
+     - E-commerce & Retail
+   Family:
+     - Social & Community
+   "Family / Tech":
+     - Social & Community
+     - AI & Automation
+   Science:
+     - Developer Tools
+     - AI & Automation
+   Research:
+     - Developer Tools
+     - AI & Automation
+   Events:
+     - Marketing & Sales
+     - Productivity & PM
+   Insurance:
+     - FinTech & Finance
+   Fisheries:
+     - Agriculture & Environment
+   Mining:
+     - Agriculture & Environment
+   Parks:
+     - Travel & Tourism
+
+ BOILERPLATE_SIGNALS:
+   - the program addresses these needs by offering targeted interventions
+   - may currently rely on manual workarounds or a patchwork of generic tools
+   - ecosystem engagement, founder readiness training
+   - this makes it easier for people to stay organized
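The two tables in domain_mapping.yaml can be read as a two-step lookup: a problem sector selects one or more startup groups, and each group expands to its sector tags. The following is a minimal sketch of that lookup, not code from this commit. The helper name `startup_tags_for_problem`, the case-insensitive fallback, and the abridged inline copies of the tables are assumptions made for illustration.

```python
from typing import Dict, List

# Abridged excerpts of the two tables in domain_mapping.yaml, inlined so the
# sketch runs without a YAML parser.
STARTUP_SECTOR_GROUPS: Dict[str, List[str]] = {
    "Healthcare & MedTech": ["HealthTech", "MedTech", "Telehealth"],
    "AI & Automation": ["AI", "LLM"],
    "Travel & Tourism": ["TravelTech", "Travel", "Hospitality"],
}
PROBLEM_TO_STARTUP_GROUPS: Dict[str, List[str]] = {
    "Healthcare": ["Healthcare & MedTech", "AI & Automation"],
    "Tourism / Healthcare": ["Healthcare & MedTech", "Travel & Tourism"],
}


def startup_tags_for_problem(problem_sector: str) -> List[str]:
    """Hypothetical helper: expand a problem sector into startup sector tags."""
    groups = PROBLEM_TO_STARTUP_GROUPS.get(problem_sector)
    if groups is None:
        # Assumed fallback: retry with a case-insensitive, trimmed key.
        lowered = {k.lower(): v for k, v in PROBLEM_TO_STARTUP_GROUPS.items()}
        groups = lowered.get(problem_sector.strip().lower(), [])
    tags: List[str] = []
    for group in groups:
        for tag in STARTUP_SECTOR_GROUPS.get(group, []):
            if tag not in tags:  # preserve order, drop duplicates
                tags.append(tag)
    return tags
```

A case-insensitive fallback of this kind might make the duplicated lower-case keys in the YAML (for example `healthcare`, `transportation`) unnecessary, though whether the application relies on exact-match keys is not visible in this diff.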
app/config/logging_config.yaml ADDED
@@ -0,0 +1,22 @@
+ version: 1
+ disable_existing_loggers: false
+
+ formatters:
+   default:
+     format: "%(asctime)s | %(levelname)s | %(name)s | %(message)s"
+
+ handlers:
+   console:
+     class: logging.StreamHandler
+     level: INFO
+     formatter: default
+
+   file:
+     class: logging.FileHandler
+     filename: app.log
+     level: WARNING
+     formatter: default
+
+ root:
+   level: DEBUG
+   handlers: [console, file]
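The configuration above sets the logger level to DEBUG while the console handler filters at INFO and the file handler at WARNING. The sketch below illustrates how those per-handler levels interact. It is an approximation under stated assumptions: it configures a named logger called `demo` instead of the root logger, uses in-memory `io.StringIO` buffers in place of the console and `app.log`, and shortens the format string.

```python
import io
import logging
import logging.config

console_buf = io.StringIO()  # stands in for the console stream (assumption)
file_buf = io.StringIO()     # stands in for app.log (assumption)

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"default": {"format": "%(levelname)s | %(message)s"}},
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "default",
            "stream": console_buf,
        },
        "file": {
            "class": "logging.StreamHandler",
            "level": "WARNING",
            "formatter": "default",
            "stream": file_buf,
        },
    },
    "loggers": {
        "demo": {
            "level": "DEBUG",
            "handlers": ["console", "file"],
            "propagate": False,
        },
    },
}
logging.config.dictConfig(config)

log = logging.getLogger("demo")
log.debug("dropped by both handlers")  # passes the logger, fails both handler levels
log.info("console only")               # passes INFO, fails WARNING
log.warning("console and file")        # passes both
```

With these levels the DEBUG logger setting has no visible effect on output; it only matters if a handler with a lower threshold is added later.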
app/config/model_config.yaml ADDED
@@ -0,0 +1,3 @@
+ encoder_model: all-mpnet-base-v2
+ reranker: cross-encoder/ms-marco-MiniLM-L-6-v2
+ sparse_model: Qdrant/bm25
app/main.py ADDED
@@ -0,0 +1,30 @@
+ from dotenv import load_dotenv
+
+ # Load .env before importing modules that read environment variables at import time.
+ load_dotenv(".env")
+
+ from fastapi import FastAPI
+
+ from app.src.chat_schemas.response_schema import ChatRequest, ChatResponse
+ from app.src.engine.core.logger import setup_logging
+ from app.src.engine.core.reasoning_router import route_reasoning
+
+ setup_logging()
+
+ app = FastAPI(title="Startup AI Service")
+
+
+ @app.post("/chat", response_model=ChatResponse)
+ def chat_endpoint(request: ChatRequest):
+     # Delegate the whole request to the reasoning pipeline.
+     result = route_reasoning(
+         user_input=request.content,
+         data=request.data,
+         isNewConversation=request.isNewConversation,
+         conversationId=request.conversationId,
+         domain=request.domain
+     )
+
+     return result
app/src/chat_schemas/__pycache__/response_schema.cpython-313.pyc ADDED
Binary file (1.61 kB).
app/src/chat_schemas/response_schema.py ADDED
@@ -0,0 +1,27 @@
+ from pydantic import BaseModel
+ from typing import Optional, List, Dict
+
+
+ class IntentSchema(BaseModel):
+     primary_intent: str
+     secondary_intents: List[str] = []
+
+
+ class ChatResponse(BaseModel):
+     content: str
+     conversationId: str
+     conversation_title: Optional[str]
+     # No trailing comma: `'ai',` would make the default the tuple ('ai',).
+     role: str = 'ai'
+     is_idea_saved: bool = False
+     is_full_idea: bool
+     data: Optional[Dict] = None
+     inspired_by: Optional[List[str]] = None
+
+
+ class ChatRequest(BaseModel):
+     content: str
+     conversationId: str
+     isNewConversation: bool
+     # Optional, because the default is None.
+     clientMessageId: Optional[str] = None
+     domain: Optional[str] = None
+     data: Optional[Dict] = None
+
app/src/engine/core/intent_classification.py ADDED
@@ -0,0 +1,160 @@
+ import json
+ import logging
+ import re
+ from typing import Dict
+
+ from app.src.llm.groq_provider import groq_provider
+ from app.src.prompt_Engineering.tamplates import INTENTS_DETECTION_TEMPLATE
+
+ logger = logging.getLogger(__name__)
+
+ llm_provider = groq_provider()
+
+
+ def classify_intent(message: str) -> Dict:
+     """
+     Classify user intent from message using LLM
+     """
+     logger.info("Classifying intent")
+     try:
+         response = llm_provider.generate([
+             {"role": "user", "content": INTENTS_DETECTION_TEMPLATE.format(user_message=message)}
+         ])
+
+         logger.debug(f"Raw LLM response: {response}")
+
+         cleaned_response = clean_json_response(response)
+         parsed = json.loads(cleaned_response)
+
+         logger.info("Intent classification succeeded")
+
+         return parsed
+
+     except json.JSONDecodeError as e:
+         logger.error(f"JSON parsing failed in classify_intent: {e}")
+         return get_default_intent(message)
+
+     except Exception as e:
+         logger.exception(f"Unexpected error in classify_intent: {type(e).__name__}: {e}")
+         return get_default_intent(message)
+
+
+ def extract_problem_and_requirements(user_input: str) -> Dict:
+     """
+     Extract the core problem, requirements, questions and constraints using LLM
+     """
+     logger.info("Extracting problem and requirements")
+
+     extraction_prompt = f"""Extract information from this input. Return ONLY valid JSON.
+
+ User input: "{user_input}"
+
+ Return this exact JSON format (no other text):
+ {{"core_problem": "", "requirements": [], "references_previous": false, "questions": [], "constraints": []}}
+
+ Fill the fields based on the user input. If a field is empty, use empty string or empty list."""
+
+     try:
+         response = llm_provider.generate([
+             {"role": "user", "content": extraction_prompt}
+         ])
+
+         logger.debug(f"Raw extraction response: {response}")
+
+         cleaned_response = extract_json_only(response)
+
+         parsed = json.loads(cleaned_response)
+
+         logger.info("Extraction succeeded")
+
+         return {
+             "core_problem": parsed.get("core_problem", ""),
+             "requirements": parsed.get("requirements", []),
+             "references_previous": parsed.get("references_previous", False),
+             "questions": parsed.get("questions", []),
+             "constraints": parsed.get("constraints", [])
+         }
+
+     except json.JSONDecodeError as e:
+         logger.error(f"JSON Parse Error: {e}")
+         return get_default_extraction(user_input)
+
+     except Exception as e:
+         logger.exception(f"Error: {type(e).__name__}: {e}")
+         return get_default_extraction(user_input)
+
+
+ def extract_json_only(text: str) -> str:
+     """
+     Extract ONLY the first valid JSON object from text
+     """
+     # Strip only the markdown fence markers; removing whole fenced blocks
+     # would also discard JSON that the LLM wrapped in ```json ... ```.
+     text = re.sub(r'```[a-zA-Z]*', '', text)
+
+     # Find first '{'
+     start = text.find('{')
+     if start == -1:
+         return '{}'
+
+     # Count braces to find matching '}' (braces inside JSON strings are not special-cased)
+     count = 0
+     for i in range(start, len(text)):
+         if text[i] == '{':
+             count += 1
+         elif text[i] == '}':
+             count -= 1
+             if count == 0:
+                 return text[start:i+1]
+
+     return '{}'
+
+
+ def clean_json_response(response: str) -> str:
+     """
+     Clean LLM response by removing markdown and extra text
+     """
+     # Remove markdown code block markers
+     response = re.sub(r'```(?:json|python|text)?\s*\n?', '', response)
+     response = re.sub(r'\n?```', '', response)
+
+     # Remove any text before first '{'
+     json_start = response.find('{')
+     if json_start != -1:
+         response = response[json_start:]
+
+     # Remove any text after last '}'
+     json_end = response.rfind('}')
+     if json_end != -1:
+         response = response[:json_end + 1]
+
+     return response.strip()
+
+
+ def get_default_intent(user_input: str) -> Dict:
+     """
+     Return default intent when LLM parsing fails
+     """
+     return {
+         "detected_intents": [
+             {
+                 "intent": "general_chat",
+                 "confidence": "high",
+                 "relevant_text": user_input,
+                 "priority": 1
+             }
+         ],
+         "primary_intent": "general_chat",
+         "secondary_intents": []
+     }
+
+
+ def get_default_extraction(user_input: str) -> Dict:
+     """
+     Return default extraction when parsing fails
+     """
+     return {
+         "core_problem": "",
+         "requirements": [],
+         "references_previous": False,
+         "questions": [user_input],
+         "constraints": []
+     }
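The brace-counting approach in `extract_json_only` treats every `{` and `}` alike, so a brace inside a JSON string value can end the match early. One possible alternative, sketched here as an assumption and not as the project's implementation, is to let the standard library's `json.JSONDecoder.raw_decode` find where the first object ends. The function name `first_json_object` is hypothetical.

```python
import json
from typing import Any, Dict


def first_json_object(text: str) -> Dict[str, Any]:
    """Hypothetical helper: return the first parseable JSON object in text, or {}."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and ignores
            # whatever follows it (closing fences, trailing prose).
            obj, _end = decoder.raw_decode(text, pos)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            pos = text.find("{", pos + 1)
    return {}


reply = 'Sure!\n```json\n{"core_problem": "a } in text", "requirements": []}\n```\nDone.'
parsed = first_json_object(reply)
```

Because the decoder understands string literals, the `}` inside `"a } in text"` does not terminate the object, and the surrounding markdown fence needs no separate stripping step.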
app/src/engine/core/logger.py ADDED
@@ -0,0 +1,12 @@
+ import logging
+ import logging.config
+ import yaml
+ from pathlib import Path
+
+ def setup_logging():
+     config_path = "app/config/logging_config.yaml"
+
+     with open(config_path, "r", encoding="utf-8") as f:
+         config = yaml.safe_load(f)
+
+     logging.config.dictConfig(config)
app/src/engine/core/providers/embedding_provider.py ADDED
@@ -0,0 +1,39 @@
+ import os
+
+ import numpy as np
+ from huggingface_hub import InferenceClient
+
+
+ class HFEmbeddingProvider:
+     """
+     Remote embedding model (no local download)
+     """
+
+     def __init__(self):
+         self.client = InferenceClient(token=os.getenv("HF_TOKEN"))
+         self.model = "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
+
+     def encode(self, text: str):
+         if not text:
+             return []
+
+         result = self.client.feature_extraction(
+             model=self.model,
+             text=text
+         )
+
+         # 🧠 The right fix: normalize whatever type the client returns
+
+         # If it is a numpy array
+         if isinstance(result, np.ndarray):
+             return result.tolist()
+
+         # If it is a nested list (guard against an empty list before indexing)
+         if isinstance(result, list) and result and isinstance(result[0], list):
+             return result[0]
+
+         # If it is a plain list
+         if isinstance(result, list):
+             return result
+
+         # fallback
+         return list(result)
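The type checks in `encode` above amount to flattening the response to a single vector. A compact way to express that idea is sketched below without a NumPy dependency. This is an illustrative assumption, not the provider's code: the helper name `to_flat_vector` is hypothetical, and like the original it keeps only the first row of a nested result and does not pool per-token outputs.

```python
from typing import Any, List


def to_flat_vector(result: Any) -> List[float]:
    """Hypothetical helper: coerce an embedding response to a flat list of floats."""
    # Array-like objects (for example NumPy arrays) expose tolist().
    if hasattr(result, "tolist"):
        result = result.tolist()
    else:
        result = list(result)
    # Unwrap batch nesting such as [[0.1, 0.2]] -> [0.1, 0.2].
    while result and isinstance(result[0], (list, tuple)):
        result = list(result[0])
    return [float(x) for x in result]
```

Checking `result` for emptiness before indexing avoids the `IndexError` that an empty response would otherwise raise.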
app/src/engine/core/providers/providers_factory.py ADDED
@@ -0,0 +1,22 @@
+ import os
+ from app.src.engine.core.providers.embedding_provider import HFEmbeddingProvider
+ from app.src.engine.core.providers.reranker_provider import HFRerankerProvider
+ from app.src.engine.core.providers.sparse_provider import SparseProvider
+ import yaml
+
+
+ def load_model_config():
+     path = "app/config/model_config.yaml"
+     with open(path, "r") as f:
+         return yaml.safe_load(f)
+
+
+ class ProviderFactory:
+     def __init__(self):
+         config = load_model_config()
+
+         self.embedding = HFEmbeddingProvider()
+         self.reranker = HFRerankerProvider()
+         self.sparse = SparseProvider(
+             model_name=config["sparse_model"]
+         )
app/src/engine/core/providers/reranker_provider.py ADDED
@@ -0,0 +1,31 @@
+ import os
+ import json
+ from huggingface_hub import InferenceClient
+
+ class HFRerankerProvider:
+     def __init__(self):
+         self.client = InferenceClient(token=os.getenv("HF_TOKEN"))
+         self.model = "BAAI/bge-reranker-base"
+
+     def score(self, query: str, doc: str) -> float:
+         if not query or not doc:
+             return 0.0
+         try:
+             response = self.client.post(
+                 json={
+                     "inputs": {
+                         "text": query,
+                         "text_pair": doc
+                     }
+                 },
+                 model=self.model,
+             )
+             result = json.loads(response)
+             # Returns a list of dicts like: [{"label": "LABEL_0", "score": 0.98}]
+             if isinstance(result, list) and len(result) > 0:
+                 if isinstance(result[0], list):  # nested list
+                     return float(result[0][0].get("score", 0.0))
+                 return float(result[0].get("score", 0.0))
+         except Exception:
+             # Any request or parsing failure falls back to a neutral score.
+             return 0.0
+         return 0.0
app/src/engine/core/providers/sparse_provider.py ADDED
@@ -0,0 +1,17 @@
+ from fastembed import SparseTextEmbedding
+
+
+ class SparseProvider:
+     """
+     Local sparse model (lightweight and does not need a server)
+     """
+
+     def __init__(self, model_name: str):
+         self.model = SparseTextEmbedding(model_name=model_name)
+
+     def encode(self, text: str):
+         """
+         Returns sparse vector (indices + values)
+         """
+         result = list(self.model.embed([text]))[0]
+         return result
app/src/engine/core/reasoning_router.py ADDED
@@ -0,0 +1,165 @@
+ import json
+ import logging
+ from typing import Dict, List
+
+ import pandas as pd
+
+ from app.src.chat_schemas.response_schema import ChatResponse, IntentSchema
+ from app.src.engine.core.intent_classification import (
+     classify_intent,
+     extract_problem_and_requirements
+ )
+ from app.src.engine.rag.retriver import retrieve_topk
+ from app.src.llm.groq_provider import groq_provider
+ from app.src.prompt_Engineering.tamplates import FULL_IDEA_TEMPLATE
+ from app.src.prompt_Engineering.tamplates import build_unified_prompt
+
+ logger = logging.getLogger(__name__)
+
+ llm_provider = groq_provider()
+
+
+ def route_reasoning(
+     user_input: str,
+     data: Dict,
+     domain: str,
+     isNewConversation: bool,
+     conversationId: str
+ ) -> Dict:
+
+     logger.debug(f"Processing user input: {user_input}")
+
+     structured_data = None
+     new_data = None
+
+     # Step 1: Detect intents
+     intents_response = classify_intent(user_input)
+     logger.debug(f"Detected intents: {intents_response['detected_intents']}")
+
+     # Step 2: Extract problem and requirements
+     if intents_response["primary_intent"] == "random_solution":
+         # Filter problems by domain
+         try:
+             df = pd.read_excel('data/raw/Problems.xlsx')
+             random_domain_based_problem = df[
+                 df['problem_sector'].str.lower() == domain.lower()
+             ].sample(n=1)['problem_description'].values[0]
+             extracted = extract_problem_and_requirements(random_domain_based_problem)
+             logger.debug(f"Random domain based problem: {random_domain_based_problem}")
+         except Exception as e:
+             # Covers a missing file, a missing domain, and an empty filter result.
+             logger.error(f"Error reading problems: {e}")
+             extracted = extract_problem_and_requirements(user_input)
+     else:
+         extracted = extract_problem_and_requirements(user_input)
+
+     logger.debug(f"Extracted data: {extracted}")
+
+     # Step 3: Get context from retriever layer
+     def make_context_cards(points):
+         logger.info("Making The Context Cards")
+         cards = []
+         for i, p in enumerate(points, 1):
+             pl = p.payload or {}
+             cards.append(f"""[{i}]
+ name: {pl.get("name","")}
+ domain: {pl.get("domain","")}
+ use_case: {pl.get("use_case","")}
+ solution: {pl.get("solution","")}
+ link: {pl.get("link","") or pl.get("site","")}""".strip())
+         return "\n\n".join(cards)
+
+
+     points = retrieve_topk(
+         problem_text=extracted.get('core_problem', 'Problem not clearly specified'),
+         sector=domain
+     )
+     context = make_context_cards(points)
+     inspired_by = [point.payload.get("name","") for point in points] if points else None
+
+     logger.info(f"Inspired by: {inspired_by}")
+
+     logger.debug(f"The Context {context}")
+
+     # Step 4: Generate or retrieve idea data
+     primary_intent = intents_response['primary_intent']
+
+     if primary_intent in ["problem_solving", "random_solution"]:
+         # Generate new idea
+         logger.debug("Generating new startup idea...")
+         core_problem = extracted.get('core_problem', 'Problem not clearly specified')
+         new_data = llm_provider.generate([
+             {"role": "user", "content": FULL_IDEA_TEMPLATE.format(core_problem=core_problem)}
+         ])
+
+     elif primary_intent == "alternative_idea":
+         logger.info("Generating alternative startup idea...")
+         # `data` is optional in the request schema, so guard against None.
+         problem = (data or {}).get('problem_description', extracted.get('core_problem', 'Problem not clearly specified'))
+         new_data = llm_provider.generate([
+             {"role": "user", "content": FULL_IDEA_TEMPLATE.format(core_problem=problem)}
+         ])
+
+     else:
+         logger.debug("Using existing idea data...")
+         new_data = data
+
+     if new_data:
+         try:
+             structured_data = json.loads(new_data) if isinstance(new_data, str) else new_data
+             logger.info("Structured data parsed")
+         except (json.JSONDecodeError, TypeError) as e:
+             logger.exception(f"Failed to parse structured data: {e}")
+             structured_data = {
+                 "raw_text": str(new_data),
+                 "parse_error": str(e)
+             }
+     else:
+         structured_data = {
+             "raw_text": "No idea data generated"
+         }
+
+     # Step 5: Build unified prompt (to generate the response's content)
+     logger.info("Building unified prompt...")
+
+     final_prompt = build_unified_prompt(
+         detected_intents=intents_response['detected_intents'],
+         extracted_data=extracted,
+         context=context,
+         primary_intent=primary_intent,
+         idea_data=structured_data
+     )
+
+     # Step 6: Call LLM with the final prompt to generate response
+     logger.info("Generating response...")
+
+     content = llm_provider.generate([
133
+ {"role": "user", "content": final_prompt}
134
+ ])
135
+
136
+ logger.info(f"Response received")
137
+
138
+ # Determine if this is an idea response
139
+ is_idea = primary_intent in ["problem_solving", "random_solution", "alternative_idea"]
140
+
141
+ # Step 7: Return response
142
+ if isNewConversation:
143
+ conversation_title = extracted.get('core_problem', 'New Conversation')
144
+
145
+ return ChatResponse(
146
+ content=content,
147
+ conversationId=conversationId,
148
+ conversation_title=conversation_title,
149
+ role='ai',
150
+ is_idea_saved=False,
151
+ is_full_idea=is_idea,
152
+ data=structured_data,
153
+ inspired_by= inspired_by
154
+ ).dict()
155
+
156
+ else:
157
+ return ChatResponse(
158
+ content=content,
159
+ conversationId=conversationId,
160
+ role='ai',
161
+ is_idea_saved=False,
162
+ is_full_idea=is_idea,
163
+ data=structured_data,
164
+ inspired_by= inspired_by
165
+ ).dict()
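Step 4 above hands the raw LLM output straight to `json.loads`. Small models often wrap JSON in a Markdown fence or add a sentence around it, and such output ends up in the `parse_error` branch. The following is a minimal sketch of a more tolerant parser, assuming a hypothetical helper named `parse_llm_json` that is not part of this commit:

```python
import json
import re
from typing import Any, Dict

# Hypothetical helper, not part of the commit: tolerant parsing of LLM output.
_TICKS = "`" * 3  # a Markdown code-fence marker, built without writing it literally
_FENCE_RE = re.compile(rf"^{_TICKS}(?:json)?\s*|\s*{_TICKS}$", re.IGNORECASE)


def parse_llm_json(raw: Any) -> Dict:
    """Parse LLM output into a dict, falling back to a raw_text record."""
    if isinstance(raw, dict):
        return raw
    text = str(raw).strip()
    # Drop a leading fence (optionally tagged "json") and a trailing fence.
    text = _FENCE_RE.sub("", text).strip()
    # Keep only the outermost {...} span, ignoring chatter before or after it.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return {"raw_text": str(raw), "parse_error": str(exc)}
    if not isinstance(parsed, dict):
        return {"raw_text": str(raw), "parse_error": "top-level JSON is not an object"}
    return parsed
```

The fallback record uses the same `raw_text` and `parse_error` keys as the router, so a helper like this could stand in for the `try`/`except` block without changing the response shape.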
app/src/engine/rag/retriver.py ADDED
@@ -0,0 +1,168 @@
1
+
2
+ import os
3
+
4
+ from qdrant_client import QdrantClient, models
5
+ from qdrant_client.models import Prefetch, FusionQuery, Fusion
6
+ from deep_translator import GoogleTranslator
7
+ import langdetect
8
+
9
+ import yaml
10
+ import logging
11
+
12
+ from app.src.engine.core.providers.providers_factory import ProviderFactory
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+ from dotenv import load_dotenv
17
+ load_dotenv(".env")
18
+
19
+ def load_sector_mappings():
20
+ path = "app/config/domain_mapping.yaml"
21
+ with open(path, "r", encoding="utf-8") as f:
22
+ data = yaml.safe_load(f)
23
+
24
+ return (
25
+ data["STARTUP_SECTOR_GROUPS"],
26
+ data["PROBLEM_TO_STARTUP_GROUPS"],
27
+ data["BOILERPLATE_SIGNALS"]
28
+ )
29
+
30
+ def load_models_names():
31
+ path = "app/config/model_config.yaml"
32
+ with open(path, "r", encoding="utf-8") as f:
33
+ data = yaml.safe_load(f)
34
+
35
+ return (
36
+ data["encoder_model"],
37
+ data["reranker"],
38
+ data["sparse_model"]
39
+ )
40
+
41
+ STARTUP_SECTOR_GROUPS, PROBLEM_TO_STARTUP_GROUPS, BOILERPLATE_SIGNALS = load_sector_mappings()
42
+
43
+ def get_startup_sectors_for_problem(problem_sector: str) -> list[str]:
44
+ logger.info(f"Mapping problem sector '{problem_sector}' to startup sectors")
45
+ group_names = PROBLEM_TO_STARTUP_GROUPS.get(problem_sector, [])
46
+ sectors = []
47
+ for g in group_names:
48
+ sectors.extend(STARTUP_SECTOR_GROUPS.get(g, []))
49
+ return list(set(sectors))
50
+
51
+ encoder_model_name, reranker_name, sparse_model_name = load_models_names()
52
+
53
+ providers = ProviderFactory()
54
+
55
+ embedding_provider = providers.embedding
56
+ reranker_provider = providers.reranker
57
+ sparse_provider = providers.sparse
58
+
59
+
60
+ def is_boilerplate(payload: dict) -> bool:
61
+ text = " ".join([payload.get("use_case",""), payload.get("solution",""), payload.get("description","")]).lower()
62
+ return any(s in text for s in BOILERPLATE_SIGNALS)
63
+
64
+ def translate_to_english(text: str) -> str:
65
+ try:
66
+ if langdetect.detect(text) == "ar":
67
+ translated = GoogleTranslator(source="ar", target="en").translate(text)
68
+ logger.debug(f"Translated: {translated}")
69
+ return translated
70
+ except Exception:
71
+ pass
72
+ return text
73
+
74
+ qdrant_client= QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))
75
+
76
+ def retrieve_topk(
77
+ problem_text: str,
78
+ k: int = 5,
79
+ sector: str | None = None,
80
+ topN: int = 150,
81
+ debug: bool = True
82
+ ):
83
+ logger.info(f"Getting the top {k} startups")
84
+ problem_en = translate_to_english(problem_text)
85
+ ce_query = f"{sector}: {problem_en}" if sector else problem_en
86
+
87
+ dense_vec = embedding_provider.encode(problem_en)
88
+ sparse_vec = sparse_provider.encode(problem_en)
89
+
90
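+ # Sector filter: in Qdrant, `should` is an OR over its conditions and restricts results rather than boosting scores; the unfiltered retry below is what keeps it soft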
91
+ startup_sectors = get_startup_sectors_for_problem(sector) if sector else []
92
+ soft_filter = None
93
+ if startup_sectors:
94
+ soft_filter = models.Filter(
95
+ should=[models.FieldCondition(
96
+ key="sector",
97
+ match=models.MatchAny(any=startup_sectors[:50])
98
+ )]
99
+ )
100
+ if debug:
101
+ logger.debug(f"'{sector}' → {len(startup_sectors)} startup sectors boosted")
102
+ else:
103
+ if debug:
104
+ logger.warning(f"No mapping for '{sector}' — searching without sector boost")
105
+
106
+ def run_query(use_filter):
107
+ return qdrant_client.query_points(
108
+ collection_name= os.getenv("COLLECTION"),
109
+ prefetch=[
110
+ Prefetch(query=dense_vec, using="dense", limit=topN, filter=use_filter),
111
+ Prefetch(
112
+ query=models.SparseVector(
113
+ indices=sparse_vec.indices.tolist(),
114
+ values=sparse_vec.values.tolist()
115
+ ),
116
+ using="sparse", limit=topN, filter=use_filter
117
+ ),
118
+ ],
119
+ query=FusionQuery(fusion=Fusion.RRF),
120
+ limit=topN,
121
+ with_payload=True,
122
+ )
123
+
124
+ results = run_query(soft_filter)
125
+
126
+ # Fallback: if fewer than k results, retry without filter
127
+ if len(results.points) < k and soft_filter is not None:
128
+ if debug:
129
+ logger.warning(f"Only {len(results.points)} results with filter — retrying without")
130
+ results = run_query(None)
131
+
132
+ # Clean: dedup by name (the boilerplate filter below is currently commented out)
133
+ seen, clean, skipped = set(), [], 0
134
+ for p in results.points:
135
+ # if is_boilerplate(p.payload):
136
+ # skipped += 1
137
+ # continue
138
+ name = (p.payload.get("name") or "").strip().lower()
139
+ if name not in seen:
140
+ seen.add(name)
141
+ clean.append(p)
142
+
143
+ if debug:
144
+ logger.debug(f"{len(results.points)} retrieved → {skipped} boilerplate removed → {len(clean)} unique clean")
145
+ logger.debug(f"{len(results.points)} → {len(clean)} unique clean")
146
+
147
+ # Cross-encoder rerank
148
+ pairs = [[ce_query, " | ".join(filter(bool, [
149
+ p.payload.get("use_case",""),
150
+ p.payload.get("solution",""),
151
+ p.payload.get("description",""),
152
+ p.payload.get("sector",""),
153
+ ]))] for p in clean]
154
+
155
+ cross_scores = [
156
+ reranker_provider.score(q, d)
157
+ for q, d in pairs
158
+ ]
159
+ ranked = sorted(zip(cross_scores, clean), key=lambda x: x[0], reverse=True)
160
+
161
+ if debug:
162
+ logger.debug(f"\n=== TOP-{k} ===")
163
+ for score, p in ranked[:k]:
164
+ pl = p.payload
165
+ logger.debug(f" {round(float(score),3):>7} | {pl.get('name',''):<28} | {pl.get('sector',''):<22} | {pl.get('domain','')}")
166
+ logger.debug(f" {str(pl.get('use_case',''))[:110]}")
167
+
168
+ return [p for _, p in ranked[:k]]
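The hybrid query in `retrieve_topk` merges the dense and sparse prefetch lists with `Fusion.RRF` (reciprocal rank fusion). Below is a minimal sketch of the idea in plain Python. It assumes the commonly used constant `k = 60`, and `rrf_fuse` is a hypothetical name; it illustrates the technique and is not Qdrant's implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an item's score."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Higher fused score first; ties are broken by the ID's string form.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))


# Toy ID lists standing in for the dense and sparse prefetch results.
dense_hits = ["a", "b", "c"]
sparse_hits = ["b", "d", "a"]
fused = rrf_fuse([dense_hits, sparse_hits])
```

Items that appear in both lists accumulate two contributions, so they tend to outrank items that only one retriever found. The cross-encoder rerank that follows in `retrieve_topk` then reorders the fused candidates.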
app/src/llm/base.py ADDED
@@ -0,0 +1,11 @@
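+ from abc import ABC, abstractmethod
+
+
+ class BaseLLM(ABC):
+     """Interface that every LLM provider must implement."""
+
+     @abstractmethod
+     def generate(self, messages: list) -> str:
+         """Return the complete response text for a list of chat messages."""
+
+     @abstractmethod
+     def stream(self, message: list):
+         """Return an iterable of response chunks for a list of chat messages."""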
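Because the router only depends on the `BaseLLM` interface, a stub provider can replace the Groq client in tests. The sketch below re-declares `BaseLLM` only so that it runs on its own; `EchoLLM` is a hypothetical class that is not part of this commit.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class BaseLLM(ABC):  # re-declared here only so the sketch is self-contained
    @abstractmethod
    def generate(self, messages: list) -> str: ...

    @abstractmethod
    def stream(self, message: list): ...


class EchoLLM(BaseLLM):
    """Hypothetical stub provider: echoes the last message, no network calls."""

    def generate(self, messages: List[dict]) -> str:
        return messages[-1]["content"]

    def stream(self, message: List[dict]) -> Iterator[str]:
        # Yield the reply word by word to mimic a chunked stream.
        yield from self.generate(message).split()


llm = EchoLLM()
reply = llm.generate([{"role": "user", "content": "hello world"}])
chunks = list(llm.stream([{"role": "user", "content": "hello world"}]))
```

Instantiating `BaseLLM` directly raises `TypeError`, which is how the abstract base class enforces that both methods are implemented.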
app/src/llm/groq_provider.py ADDED
@@ -0,0 +1,37 @@
+ import os
+ from groq import Groq
+ from app.src.llm.base import BaseLLM
+
+ from dotenv import load_dotenv
+ load_dotenv(".env")
+
+
+ class groq_provider(BaseLLM):
+     def __init__(self):
+         self.client = Groq(api_key=os.getenv("GROQ_API_KEY"))
+
+     def generate(self, messages: list) -> str:
+         """Generate a complete response from Groq."""
+         # Accept a bare string as shorthand for a single user message.
+         if isinstance(messages, str):
+             messages = [{"role": "user", "content": messages}]
+
+         response = self.client.chat.completions.create(
+             model="llama-3.1-8b-instant",
+             messages=messages,
+             temperature=0.2,
+         )
+         return response.choices[0].message.content
+
+     def stream(self, message: list):
+         """Return Groq's streaming response; iterate over it to read chunks."""
+         stream = self.client.chat.completions.create(
+             model="llama-3.1-8b-instant",
+             messages=message,
+             stream=True,
+         )
+         return stream
app/src/prompt_Engineering/chain.py ADDED
File without changes
app/src/prompt_Engineering/few_shot.py ADDED
File without changes
app/src/prompt_Engineering/tamplates.py ADDED
@@ -0,0 +1,335 @@
1
+ from typing import List, Dict
2
+
3
+ # INTENTS DETECTION TEMPLATE
4
+ INTENTS_DETECTION_TEMPLATE = """You are an intent classification expert.
5
+
6
+ User input: "{user_message}"
7
+
8
+ Analyze this user input and detect ALL applicable intents.
9
+
10
+ CRITICAL DISTINCTIONS:
11
+ - problem_solving: User describes a SPECIFIC problem and wants a startup solution
12
+ Examples: "I want to solve expensive education", "transportation in Cairo is bad"
13
+
14
+ - random_solution: User asks for ANY startup idea WITHOUT describing a problem
15
+ Examples: "Give me a startup idea", "What's a good business"
16
+
17
+ - follow_up: User continues discussion on a PREVIOUS idea
18
+ Examples: "Tell me more about that idea", "How can we improve it?"
19
+
20
+ - alternative_idea: User wants a DIFFERENT solution for the SAME problem
21
+ Examples: "Another approach to education", "Different solution"
22
+
23
+ - details: User asks for more information/details
24
+ Examples: "Explain more", "Give me details"
25
+
26
+ - feasibility: User asks about viability/feasibility
27
+ Examples: "Is it feasible?", "Can we implement this?"
28
+
29
+ - novelty: User asks about innovation/uniqueness
30
+ Examples: "Is it innovative?", "Is it unique?"
31
+
32
+ - general_chat: General conversation with no specific startup request
33
+ Examples: "Hi how can you help me?", "How is the market?", "What's trending?"
34
+
35
+ RULES:
36
+ 1. If user mentions a SPECIFIC problem → problem_solving
37
+ 2. If user asks for ANY startup WITHOUT mentioning a problem → random_solution
38
+ 3. If user references PREVIOUS discussion → follow_up or alternative_idea
39
+ 4. If user asks for MORE about something already discussed → details
40
+ 5. If user questions FEASIBILITY → feasibility
41
+ 6. If user questions INNOVATION → novelty
42
+ 7. If it's GENERAL conversation → general_chat
43
+
44
+ Return ONLY valid JSON (no explanations):
45
+ {{"detected_intents": [{{"intent": "intent_name", "confidence": "high/medium/low", "relevant_text": "the relevant part", "priority": 1}}], "primary_intent": "main_intent", "secondary_intents": ["other_intents"]}}
46
+
47
+ Examples:
48
+
49
+ Input: "Hi how can you help me"
50
+ Output: {{"detected_intents": [{{"intent": "general_chat", "confidence": "high", "relevant_text": "Hi how can you help me", "priority": 1}}], "primary_intent": "general_chat", "secondary_intents": []}}
51
+
52
+ Input: "I want to solve expensive education in Egypt"
53
+ Output: {{"detected_intents": [{{"intent": "problem_solving", "confidence": "high", "relevant_text": "solve expensive education", "priority": 1}}], "primary_intent": "problem_solving", "secondary_intents": []}}
54
+
55
+ Input: "Give me a startup idea"
56
+ Output: {{"detected_intents": [{{"intent": "random_solution", "confidence": "high", "relevant_text": "Give me a startup idea", "priority": 1}}], "primary_intent": "random_solution", "secondary_intents": []}}
57
+ """
58
+
59
+ FULL_IDEA_TEMPLATE = """
60
+ You are an expert in entrepreneurship and startup innovation focused on the MENA region.
61
+
62
+ Your task is to generate a complete startup concept based on the given problem:
63
+ {core_problem}
64
+
65
+
66
+ Requirements:
67
+ - Focus on realistic and practical solutions.
68
+ - Adapt the idea for the Egypt or MENA market.
69
+ - Use concise and clear text.
70
+ - Provide multiple items for list fields when possible.
71
+
72
+ Important Rules:
73
+ - Return ONLY valid JSON.
74
+ - Do NOT write any text outside JSON.
75
+ - Do NOT add explanations or comments.
76
+ - Follow the exact data types:
77
+ - Text fields → string
78
+ - Lists → array
79
+ - Nested sections → object
80
+ - novelty_score → number between 0 and 100
81
+ - business_model MUST be an object (not a string).
82
+ - feasibility MUST be an object.
83
+ - market_analysis MUST be an object.
84
+ - impact MUST be an object.
85
+ - mvp_plan MUST be an object.
86
+
87
+ Return the response using this exact structure:
88
+
89
+ {{
90
+ "problem_title": "",
91
+ "problem_description": "",
92
+ "root_cause": "",
93
+ "target_users": "",
94
+ "market_region": "Egypt or MENA",
95
+ "why_now": "",
96
+ "evidence_signals": [],
97
+
98
+ "solution_name": "",
99
+ "solution_description": "",
100
+ "how_it_works": [],
101
+ "key_features": [],
102
+ "technology_stack": [],
103
+
104
+ "business_model": {{
105
+ "value_proposition": "",
106
+ "revenue_streams": [],
107
+ "pricing_model": "",
108
+ "customer_acquisition": []
109
+ }},
110
+
111
+ "market_analysis": {{
112
+ "market_size": "",
113
+ "competitors": [],
114
+ "competitive_advantage": ""
115
+ }},
116
+
117
+ "feasibility": {{
118
+ "technical_feasibility": "Low",
119
+ "market_feasibility": "Low",
120
+ "risk_factors": []
121
+ }},
122
+
123
+ "novelty_score": 0,
124
+
125
+ "impact": {{
126
+ "economic_impact": "",
127
+ "social_impact": ""
128
+ }},
129
+
130
+ "mvp_plan": {{
131
+ "mvp_features": [],
132
+ "first_steps": []
133
+ }}
134
+ }}
135
+ Important:
136
+ Return ONLY valid JSON.
137
+ Do not repeat any section.
138
+ Do not truncate the response.
139
+ If you are unsure, return a shorter but complete JSON.
140
+ If you cannot complete the JSON correctly, return a shorter but valid JSON.
141
+ Never cut arrays or objects.
142
+ Never leave fields incomplete.
143
+ """
144
+
145
+ import json
146
+ from typing import List, Dict
147
+
148
+ def build_unified_prompt(
149
+ detected_intents: List[Dict],
150
+ extracted_data: Dict,
151
+ context: str = None,
152
+ primary_intent: str = None,
153
+ idea_data: Dict = None
154
+ ) -> str:
155
+
156
+ if not primary_intent:
157
+ sorted_intents = sorted(detected_intents, key=lambda x: x.get("priority", 999))
158
+ primary_intent = sorted_intents[0]["intent"] if sorted_intents else "general_chat"
159
+
160
+ prompt = """
161
+ You are an expert in entrepreneurship and startup innovation focused on the MENA region.
162
+
163
+ Always answer clearly and practically.
164
+ Use the provided idea data as reference.
165
+ Base your response on the idea data provided.
166
+ """
167
+
168
+ # -------------------------
169
+ # EXISTING IDEA CONTEXT
170
+ # -------------------------
171
+
172
+ if idea_data:
173
+ idea_json = json.dumps(idea_data, indent=2)
174
+
175
+ prompt += f"""
176
+
177
+ STARTUP IDEA DATA (REFERENCE):
178
+
179
+ {idea_json}
180
+
181
+ Important rules:
182
+ - Use this data as your source of truth when answering.
183
+ - Extract information from this data to answer user questions.
184
+ - Do NOT generate new ideas if this data exists.
185
+ - Base all your answers on this data.
186
+ """
187
+
188
+ prompt += f"\n\nPRIMARY REQUEST ({primary_intent}):\n"
189
+
190
+
191
+ if primary_intent in ["problem_solving", "random_solution" , "alternative_idea"]:
192
+
193
+ prompt += """
194
+ The user is asking for a startup idea solution.
195
+
196
+ Your task:
197
+ - Describe the startup solution in a clear, compelling way
198
+ - Use the idea data provided above as your reference
199
+ - Return ONLY the description of the solution (not the full JSON)
200
+
201
+ Format your response as clear paragraphs or bullet points.
202
+ Make it practical and actionable for the Egypt/MENA market.
203
+ """
204
+
205
+
206
+
207
+ elif primary_intent == "follow_up":
208
+
209
+ user_questions = extracted_data.get('questions', ['General questions about the idea'])
210
+ questions_str = ', '.join(user_questions) if isinstance(user_questions, list) else user_questions
211
+
212
+ prompt += f"""
213
+ The user is following up with questions about the existing idea.
214
+
215
+ User's questions/requests:
216
+ {questions_str}
217
+
218
+ Your task:
219
+ - Answer based on the idea data provided above
220
+ - Expand or clarify specific aspects
221
+ - Provide detailed explanations
222
+ - Return a clear narrative response (not JSON)
223
+
224
+ Focus on the aspects the user is asking about.
225
+ """
226
+
227
+
228
+ elif primary_intent == "details":
229
+
230
+ prompt += """
231
+ The user wants more detailed information about the startup idea.
232
+
233
+ Your task:
234
+ - Provide comprehensive details based on the idea data
235
+ - Expand on implementation, business model, and execution
236
+ - Return a detailed narrative response (not JSON)
237
+ - Cover:
238
+ * Detailed problem analysis
239
+ * Complete solution description
240
+ * Implementation steps and timeline
241
+ * Business model breakdown
242
+ * Target customer segments
243
+ * Revenue streams and pricing
244
+ * Required resources and team
245
+ * Key success metrics
246
+
247
+ Format as detailed sections or bullet points.
248
+ Be specific and practical.
249
+ """
250
+
251
+
252
+ elif primary_intent == "general_chat":
253
+
254
+ topic = extracted_data.get('core_problem', 'general startup topics')
255
+
256
+ prompt += f"""
257
+ The user wants to have a general discussion about: {topic}
258
+
259
+ Your task:
260
+ - Provide thoughtful insights and analysis
261
+ - Use the idea data as context if available
262
+ - Return a conversational, informative response
263
+ - Be helpful and engaging
264
+
265
+ Format as clear narrative paragraphs.
266
+ """
267
+
268
+
269
+ secondary_intents = extracted_data.get("secondary_intents", [])
270
+
271
+ if secondary_intents:
272
+
273
+ prompt += "\n\nADDITIONAL ASPECTS TO ADDRESS:\n"
274
+
275
+ for intent in secondary_intents:
276
+
277
+ if intent == "details":
278
+
279
+ prompt += """
280
+ - Include more detailed information about:
281
+ * Implementation steps and timeline
282
+ * Business model specifics
283
+ * Target customers
284
+ * Revenue streams and pricing strategy
285
+ * Team and resources needed
286
+ """
287
+
288
+ elif intent == "feasibility":
289
+
290
+ prompt += """
291
+ - Analyze and discuss feasibility:
292
+ * Technical feasibility based on the idea data
293
+ * Market feasibility in Egypt/MENA region
294
+ * Risk factors and mitigation strategies
295
+ * Resource requirements
296
+ * Realistic timeline to MVP
297
+ * Success probability
298
+ """
299
+
300
+ elif intent == "novelty":
301
+
302
+ prompt += """
303
+ - Evaluate innovation and uniqueness:
304
+ * What's new and innovative about this solution
305
+ * Competitive advantages over existing solutions
306
+ * Unique value proposition
307
+ * Market differentiation factors
308
+ * Why customers would choose this
309
+ """
310
+
311
+ context_text = context or "Startup discussion focused on solving real problems in Egypt and the MENA region."
312
+
313
+ prompt += f"""
314
+
315
+ CONTEXT:
316
+ {context_text}
317
+
318
+ USER REQUIREMENTS:
319
+ {', '.join(extracted_data.get('requirements', ['comprehensive analysis']))}
320
+
321
+ CONSTRAINTS:
322
+ {', '.join(extracted_data.get('constraints', ['Egypt/MENA market focus']))}
323
+
324
+ IMPORTANT INSTRUCTIONS:
325
+ - Return a clear, practical, narrative response (NOT JSON or code)
326
+ - Base everything on the idea data provided
327
+ - Keep language simple and actionable
328
+ - Focus on Egypt/MENA market realities
329
+ - Be specific with examples where possible
330
+ - Do NOT return the raw JSON data
331
+ - Format response as readable text or bullet points
332
+ - Make it engaging and professional
333
+ """
334
+
335
+ return prompt
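The templates in this file are filled with `str.format`. That is why the literal JSON braces are doubled (`{{` and `}}`) while placeholders such as `{core_problem}` use single braces. A small sketch with a hypothetical miniature template, `MINI_TEMPLATE`, shows the effect:

```python
import json

# Hypothetical miniature of FULL_IDEA_TEMPLATE, for illustration only.
MINI_TEMPLATE = """Problem: {core_problem}
Return JSON shaped like:
{{"solution_name": "", "novelty_score": 0}}"""

rendered = MINI_TEMPLATE.format(core_problem="expensive education")

# After formatting, the doubled braces collapse to single ones, so the
# last line of the rendered prompt is plain JSON.
schema_line = rendered.splitlines()[-1]
schema = json.loads(schema_line)
```

Forgetting to double a literal brace makes `str.format` treat it as a placeholder, which typically surfaces as a `KeyError` or `ValueError` when the template is rendered.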