| # Implementation Plan - Improve Prompt with Semantic Names | |
| The goal is to improve the `mentee_query_text` used for embedding and reranking by replacing numeric IDs (Career ID, Domain IDs, Skill IDs) with their semantic names (e.g., "Web Development", "Python"). Readable names tell the embedding and reranking models more about the user's intent than opaque numbers do. | |
| ## User Review Required | |
| > [!IMPORTANT] | |
| > This change requires a source of "Master Data" (mappings from ID -> Name). | |
| > I will provide a script `scripts/extract_mappings.py` that you can run against your `mentor_profiles.json` to automatically generate this mapping file. | |
| ## Proposed Changes | |
| ### Data Layer | |
| #### [NEW] [services/data_service.py](file:///Users/tamtanbk62/datn/mentorme/services/data_service.py) | |
| - Create a `DataService` class to load and hold mappings for: | |
|   - Careers (id -> name) | |
|   - Domains (id -> name) | |
|   - Skills (id -> name) | |
| - It loads from `data/master_data.json` if the file exists; otherwise it fails gracefully and returns the raw IDs. | |
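A minimal sketch of the loader described above, to make the fallback behaviour concrete. The file layout (`{"careers": {...}, "domains": {...}, "skills": {...}}`) and the method names `name_for` and `names_for` are assumptions made for illustration; the plan does not fix either.

```python
import json
from pathlib import Path


class DataService:
    """Holds ID -> name lookups for careers, domains, and skills."""

    KINDS = ("careers", "domains", "skills")

    def __init__(self, path="data/master_data.json"):
        # Assumed layout: {"careers": {"1": "..."}, "domains": {...}, "skills": {...}}
        self._maps = {kind: {} for kind in self.KINDS}
        file = Path(path)
        if not file.exists():
            return  # no master data: every lookup falls back to the raw ID
        try:
            raw = json.loads(file.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            return  # unreadable or malformed file: same fallback
        if not isinstance(raw, dict):
            return
        for kind in self.KINDS:
            entries = raw.get(kind)
            if isinstance(entries, dict):
                # JSON object keys are always strings, so normalize to str.
                self._maps[kind] = {str(k): v for k, v in entries.items()}

    def name_for(self, kind, item_id):
        """Return the mapped name, or the ID as text when it is unknown."""
        return self._maps.get(kind, {}).get(str(item_id), str(item_id))

    def names_for(self, kind, item_ids):
        """Resolve a list of IDs; None is treated as an empty list."""
        return [self.name_for(kind, i) for i in item_ids or []]
```

Returning the ID as text for unknown entries keeps the query text usable even when the mapping file is missing or incomplete.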
| #### [NEW] [scripts/extract_mappings.py](file:///Users/tamtanbk62/datn/mentorme/scripts/extract_mappings.py) | |
| - A standalone script to scan a mentor JSON file (like `mentor_profiles_1000.json`) and extract all unique IDs and Names into `data/master_data.json`. | |
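One way the extraction could look, as a rough sketch. The mentor record shape used here (a `career` object plus `domains` and `skills` lists, each entry carrying `id` and `name`) is hypothetical, since the plan does not show the actual schema of the mentor JSON; the field names would need adjusting to match the real file.

```python
import json
from pathlib import Path


def _collect(target, items):
    """Record id -> name for every {"id": ..., "name": ...} entry in items."""
    for item in items or []:
        if isinstance(item, dict) and "id" in item and "name" in item:
            # The first name seen for an ID wins; keys are str to match JSON.
            target.setdefault(str(item["id"]), item["name"])


def extract_mappings(mentors):
    """Scan a list of mentor records and gather all unique ID/name pairs."""
    mappings = {"careers": {}, "domains": {}, "skills": {}}
    for mentor in mentors:
        career = mentor.get("career")  # hypothetical field name
        _collect(mappings["careers"], [career] if career else [])
        _collect(mappings["domains"], mentor.get("domains"))  # hypothetical
        _collect(mappings["skills"], mentor.get("skills"))  # hypothetical
    return mappings


def write_mappings(source, target="data/master_data.json"):
    """Read the mentor JSON at `source` and write the mapping file to `target`."""
    mentors = json.loads(Path(source).read_text(encoding="utf-8"))
    out = Path(target)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        json.dumps(extract_mappings(mentors), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

Under these assumptions, calling `write_mappings("mentor_profiles_1000.json")` would produce `data/master_data.json`.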
| ### Service Layer | |
| #### [MODIFY] [services/recommendation_service.py](file:///Users/tamtanbk62/datn/mentorme/services/recommendation_service.py) | |
| - Initialize `DataService`. | |
| - In `recommend_mentors`: | |
|   - Fetch names for `career_id`, `domain_ids`, `skill_ids`, `mentor_domain_ids` from `DataService`. | |
|   - Pass these resolved names (or the mapping dict) to `build_mentee_query_text`. | |
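The resolution step inside `recommend_mentors` might look roughly like this. The helper name `resolve_request_names`, the dict-shaped request, and the `lookup(kind, item_id)` callable standing in for `DataService` are all illustrative assumptions. The sketch also assumes that `mentor_domain_ids` share the same mapping as `domain_ids`.

```python
def resolve_request_names(request, lookup):
    """Map the ID fields of a mentee request to display names.

    `lookup(kind, item_id)` is a hypothetical stand-in for DataService; it
    is expected to return the raw ID as text when no name is known.
    """

    def many(kind, field):
        # Missing or None fields resolve to an empty list.
        return [lookup(kind, i) for i in request.get(field) or []]

    career_id = request.get("career_id")
    return {
        "career": lookup("careers", career_id) if career_id is not None else None,
        "domains": many("domains", "domain_ids"),
        "skills": many("skills", "skill_ids"),
        # Assumption: mentor domains use the same ID space as domains.
        "mentor_domains": many("domains", "mentor_domain_ids"),
    }
```

The returned dict is what would be handed to `build_mentee_query_text` as the resolved names.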
| #### [MODIFY] [utils/text_builder.py](file:///Users/tamtanbk62/datn/mentorme/utils/text_builder.py) | |
| - Update `build_mentee_query_text` to accept an optional `mappings` or `resolved_names` argument. | |
| - Use names in the generated text instead of "IDs: 1, 2, 3". | |
| - Example: `Preferred Domains: Web Development, Data Science` instead of `Preferred Domains (IDs): 1, 2`. | |
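A small sketch of how the optional argument could work while keeping the old ID-based output as a fallback. The existing signature of `build_mentee_query_text` is not shown in this plan, so the dict-shaped `mentee`, the `resolved_names` keys, and the `Skills` label are assumptions; only the `Preferred Domains` wording comes from the example above.

```python
def build_mentee_query_text(mentee, resolved_names=None):
    """Build the query text, preferring names over numeric IDs."""
    resolved = resolved_names or {}

    def labelled(label, name_key, id_key):
        names = resolved.get(name_key)
        if names:
            return f"{label}: {', '.join(names)}"
        # No names available: fall back to the previous ID-based form.
        ids = mentee.get(id_key) or []
        if ids:
            return f"{label} (IDs): {', '.join(str(i) for i in ids)}"
        return None

    lines = [
        labelled("Preferred Domains", "domains", "domain_ids"),
        labelled("Skills", "skills", "skill_ids"),  # hypothetical label
    ]
    return "\n".join(line for line in lines if line)


mentee = {"domain_ids": [1, 2]}
print(build_mentee_query_text(mentee, {"domains": ["Web Development", "Data Science"]}))
# prints: Preferred Domains: Web Development, Data Science
print(build_mentee_query_text(mentee))
# prints: Preferred Domains (IDs): 1, 2
```

Making the argument optional means existing callers that pass only the mentee data keep working unchanged.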
| ## Verification Plan | |
| ### Automated Tests | |
| - Run `verify_prompt_improvement.py` (a new test script I will create), which: | |
|   1. Mocks `DataService` with some sample mappings. | |
|   2. Calls `build_mentee_query_text` with sample Mentee data. | |
|   3. Asserts that the output string contains names, not just IDs. | |
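The three steps above could be sketched as follows. To keep the example runnable on its own, it defines a tiny stand-in for `build_mentee_query_text` rather than importing the real one from `utils/text_builder.py`. The `names_for` method on the mocked `DataService` is a hypothetical name.

```python
from unittest.mock import Mock


def build_mentee_query_text(mentee, resolved_names=None):
    # Stand-in for utils.text_builder.build_mentee_query_text (illustration only).
    names = (resolved_names or {}).get("domains")
    if names:
        return "Preferred Domains: " + ", ".join(names)
    return "Preferred Domains (IDs): " + ", ".join(str(i) for i in mentee["domain_ids"])


def test_query_text_uses_names():
    # 1. Mock DataService with sample mappings (names_for is hypothetical).
    sample = {1: "Web Development", 2: "Data Science"}
    data_service = Mock()
    data_service.names_for.side_effect = lambda kind, ids: [sample[i] for i in ids]

    # 2. Build the query text from sample mentee data.
    mentee = {"domain_ids": [1, 2]}
    resolved = {"domains": data_service.names_for("domains", mentee["domain_ids"])}
    text = build_mentee_query_text(mentee, resolved)

    # 3. The output must contain the names and no "(IDs)" label.
    assert "Web Development" in text and "Data Science" in text
    assert "(IDs)" not in text
    data_service.names_for.assert_called_once_with("domains", [1, 2])
```

In the real script, the stand-in would be replaced by an import of the project function, and the test could be run with `pytest` or called directly.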
| ### Manual Verification | |
| - You can run the extraction script on your `mentor_profiles_1000.json`. | |
| - Then run `test_api.py` and inspect the logs to see the generated Query Text. | |