mentorme858 / implementation_plan.md
Nguyễn Thanh Tùng
Improve prompt with semantic names
1b7ef16

Implementation Plan - Improve Prompt with Semantic Names

The goal is to improve the mentee_query_text used for embedding and reranking by replacing numeric IDs (Career ID, Domain IDs, Skill IDs) with their semantic names (e.g., "Web Development", "Python"). Names carry meaning that bare integers do not, so the language models should understand the user's intent better.

User Review Required

This change requires a source of "Master Data" (ID -> name mappings). I will provide a script, scripts/extract_mappings.py, that you can run against your mentor profiles JSON file (e.g., mentor_profiles_1000.json) to generate this mapping file, data/master_data.json, automatically.

Proposed Changes

Data Layer

[NEW] services/data_service.py

  • Create DataService class to load and hold mappings for:
    • Careers (id -> name)
    • Domains (id -> name)
    • Skills (id -> name)
  • Load the mappings from data/master_data.json if the file exists. If it is missing, or an ID has no entry, degrade gracefully by returning the raw ID instead of raising an error.
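A minimal sketch of what DataService could look like is shown below. The method names get_name and get_names, the category keys ("careers", "domains", "skills"), and the JSON layout (one object per category, keyed by ID) are assumptions made for illustration; the plan does not fix them. The sketch also assumes IDs are integers.

```python
import json
from pathlib import Path


class DataService:
    """Holds id -> name mappings for careers, domains, and skills."""

    # Hypothetical category keys; the real master data layout may differ.
    CATEGORIES = ("careers", "domains", "skills")

    def __init__(self, path="data/master_data.json"):
        # Start with empty mappings so lookups still work without the file.
        self.mappings = {category: {} for category in self.CATEGORIES}
        file = Path(path)
        if not file.exists():
            return
        try:
            raw = json.loads(file.read_text(encoding="utf-8"))
            for category in self.CATEGORIES:
                # JSON object keys are strings, so convert them back to int.
                self.mappings[category] = {
                    int(key): name for key, name in raw.get(category, {}).items()
                }
        except (OSError, ValueError, AttributeError):
            # Unreadable or malformed file: keep whatever was loaded so far.
            pass

    def get_name(self, category, item_id):
        """Return the name for item_id, or the ID as a string if unknown."""
        return self.mappings.get(category, {}).get(item_id, str(item_id))

    def get_names(self, category, item_ids):
        """Resolve a list of IDs; None or an empty list yields an empty list."""
        return [self.get_name(category, item_id) for item_id in item_ids or []]
```

Falling back to str(item_id) keeps the query text usable even when the mapping file is incomplete, which matches the "return IDs" behaviour described above.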

[NEW] scripts/extract_mappings.py

  • A standalone script that scans a mentor JSON file (like mentor_profiles_1000.json) and writes every unique ID -> name pair it finds to data/master_data.json.
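The extraction step might be sketched as follows. The mentor record schema is not described in this plan, so the field names ("careers", "domains", "skills") and the {"id": ..., "name": ...} item shape are purely assumptions; the function names extract_mappings and write_master_data are likewise hypothetical.

```python
import json
from pathlib import Path

# ASSUMPTION: each mentor record lists its careers, domains, and skills as
# {"id": ..., "name": ...} objects under these keys. Adjust to the real schema.
FIELDS = {"careers": "careers", "domains": "domains", "skills": "skills"}


def extract_mappings(mentors):
    """Collect unique id -> name pairs per category from a list of mentor dicts."""
    mappings = {category: {} for category in FIELDS}
    for mentor in mentors:
        for category, field in FIELDS.items():
            for item in mentor.get(field) or []:
                if isinstance(item, dict) and "id" in item and "name" in item:
                    # Keys are stored as strings because JSON requires it.
                    mappings[category][str(item["id"])] = item["name"]
    return mappings


def write_master_data(source, target="data/master_data.json"):
    """Read the mentor file and write the extracted mappings to target."""
    mentors = json.loads(Path(source).read_text(encoding="utf-8"))
    out = Path(target)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        json.dumps(extract_mappings(mentors), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

Under these assumptions, calling write_master_data("mentor_profiles_1000.json") would produce data/master_data.json. ensure_ascii=False keeps non-ASCII names readable in the output file.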

Service Layer

[MODIFY] services/recommendation_service.py

  • Initialize DataService.
  • In recommend_mentors:
    • Fetch names for career_id, domain_ids, skill_ids, mentor_domain_ids from DataService.
    • Pass these resolved names (or the mapping dict) to build_mentee_query_text.
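The resolution step inside recommend_mentors could look roughly like this. The helper name resolve_mentee_names, the dict-shaped mentee, and the get_name method are assumptions for illustration; StubDataService exists only to keep the sketch self-contained. The sketch assumes mentor_domain_ids are looked up in the same domain mapping as domain_ids.

```python
class StubDataService:
    """Stand-in for DataService, only to keep this sketch self-contained."""

    def __init__(self, mappings):
        self.mappings = mappings

    def get_name(self, category, item_id):
        # Fall back to the raw ID so an unknown ID never breaks the request.
        return self.mappings.get(category, {}).get(item_id, str(item_id))


def resolve_mentee_names(data_service, mentee):
    """Map the mentee's ID fields to names before building the query text."""

    def names(category, ids):
        return [data_service.get_name(category, i) for i in ids or []]

    career_id = mentee.get("career_id")
    return {
        "career": None
        if career_id is None
        else data_service.get_name("careers", career_id),
        "domains": names("domains", mentee.get("domain_ids")),
        "skills": names("skills", mentee.get("skill_ids")),
        # Assumed to share the domain mapping with domain_ids.
        "mentor_domains": names("domains", mentee.get("mentor_domain_ids")),
    }
```

Passing the resolved dict rather than the full mapping keeps build_mentee_query_text free of any lookup logic, though either option listed above would work.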

[MODIFY] utils/text_builder.py

  • Update build_mentee_query_text to accept an optional mappings or resolved_names argument.
  • Use names in the generated text instead of "IDs: 1, 2, 3".
  • Example: "Preferred Domains: Web Development, Data Science" instead of "Preferred Domains (IDs): 1, 2".
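A simplified sketch of the updated builder follows. The existing signature and the full set of fields in the query text are not shown in this plan, so the dict-shaped mentee, the resolved_names keys, and the "Skills" label are assumptions; only the "Preferred Domains" wording comes from the example above.

```python
def build_mentee_query_text(mentee, resolved_names=None):
    """Build the query text, preferring names and falling back to IDs."""
    resolved_names = resolved_names or {}
    lines = []

    def add(label, key, ids):
        names = resolved_names.get(key)
        if names:
            # Names available: emit the semantic form.
            lines.append(f"{label}: {', '.join(names)}")
        elif ids:
            # No names passed in: keep the previous ID-based form.
            lines.append(f"{label} (IDs): {', '.join(str(i) for i in ids)}")

    add("Preferred Domains", "domains", mentee.get("domain_ids"))
    add("Skills", "skills", mentee.get("skill_ids"))  # hypothetical label
    return "\n".join(lines)
```

Because resolved_names is optional, existing callers that pass only the mentee would keep getting the ID-based text.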

Verification Plan

Automated Tests

  • Run verify_prompt_improvement.py (a new test script I will create) which:
    1. Mocks DataService with some sample mappings.
    2. Calls build_mentee_query_text with sample Mentee data.
    3. Asserts that the output string contains the names, not just the IDs.
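The three steps above can be sketched with unittest.mock. The build_mentee_query_text defined here is only a stand-in with a hypothetical signature so that the example runs on its own; the real script would import the function from utils/text_builder.py. The get_name method on the mocked DataService is also an assumption.

```python
from unittest.mock import Mock


def build_mentee_query_text(mentee, resolved_names=None):
    # Stand-in for the project function (hypothetical signature); in the
    # real script, import it from utils.text_builder instead.
    names = (resolved_names or {}).get("domains")
    if names:
        return "Preferred Domains: " + ", ".join(names)
    return "Preferred Domains (IDs): " + ", ".join(map(str, mentee["domain_ids"]))


def check_query_text_uses_names():
    # 1. Mock DataService with sample mappings.
    domains = {1: "Web Development", 2: "Data Science"}
    data_service = Mock()
    data_service.get_name.side_effect = lambda category, item_id: domains[item_id]

    # 2. Build the query text from sample mentee data.
    mentee = {"domain_ids": [1, 2]}
    resolved = {
        "domains": [data_service.get_name("domains", i) for i in mentee["domain_ids"]]
    }
    text = build_mentee_query_text(mentee, resolved)

    # 3. The text should contain the names and no "(IDs)" label.
    assert "Web Development" in text and "Data Science" in text
    assert "(IDs)" not in text
    return text
```

Checking for the absence of the "(IDs)" label, in addition to the presence of the names, guards against output that mixes both forms.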

Manual Verification

  • Run the extraction script (scripts/extract_mappings.py) on your mentor_profiles_1000.json to generate data/master_data.json.
  • Then run test_api.py and inspect the logs to confirm that the generated Query Text contains names rather than IDs.