Implementation Plan - Improve Prompt with Semantic Names
The goal is to improve the mentee_query_text used for embedding and reranking by replacing numeric IDs (Career ID, Domain IDs, Skill IDs) with their semantic text names (e.g., "Web Development", "Python"). This helps the language models understand the user's intent better.
User Review Required
This change requires a source of "Master Data" (mappings from ID -> Name). I will provide a script, scripts/extract_mappings.py, that you can run against your mentor_profiles.json to automatically generate this mapping file.
Proposed Changes
Data Layer
[NEW] services/data_service.py
- Create a DataService class to load and hold mappings for:
  - Careers (id -> name)
  - Domains (id -> name)
  - Skills (id -> name)
- It will load from data/master_data.json if the file exists; otherwise it fails gracefully and returns the IDs unchanged.
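A minimal sketch of what DataService could look like. The file layout (top-level "careers", "domains", and "skills" objects keyed by ID) and the method names name and names are assumptions for illustration, not a settled design.

```python
import json
from pathlib import Path


class DataService:
    """Holds id -> name mappings; falls back to the ID when a name is unknown."""

    KINDS = ("careers", "domains", "skills")

    def __init__(self, path="data/master_data.json"):
        # Start with empty tables so every lookup works even without a file.
        self._tables = {kind: {} for kind in self.KINDS}
        try:
            raw = json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return  # missing or unreadable file: fail gracefully
        if isinstance(raw, dict):
            for kind in self.KINDS:
                table = raw.get(kind, {})
                if isinstance(table, dict):
                    self._tables[kind] = table

    def name(self, kind, item_id):
        # JSON object keys are always strings, so look up by str(item_id).
        return self._tables[kind].get(str(item_id), str(item_id))

    def names(self, kind, ids):
        return [self.name(kind, item_id) for item_id in ids]
```

With this shape, a missing mapping file degrades to the current behavior: the query text simply carries the IDs as strings.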
[NEW] scripts/extract_mappings.py
- A standalone script to scan a mentor JSON file (like mentor_profiles_1000.json) and extract all unique IDs and Names into data/master_data.json.
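A rough sketch of the extraction logic. The profile schema is not described in this plan, so the field names below ("careers", "domains", "skills", each a list of objects with "id" and "name") are hypothetical and would need to be adjusted to the real mentor file.

```python
import json
from pathlib import Path

# Hypothetical: output key -> profile field holding a list of {"id", "name"} objects.
FIELDS = {"careers": "careers", "domains": "domains", "skills": "skills"}


def extract_mappings(profiles):
    """Collect unique id -> name pairs from a list of mentor profile dicts."""
    mappings = {kind: {} for kind in FIELDS}
    for profile in profiles:
        for kind, field in FIELDS.items():
            for item in profile.get(field, []):
                # IDs are stored as strings because JSON object keys must be strings.
                mappings[kind][str(item["id"])] = item["name"]
    return mappings


def main(src="mentor_profiles_1000.json", dst="data/master_data.json"):
    """Read the mentor file and write the mapping file, creating data/ if needed."""
    profiles = json.loads(Path(src).read_text(encoding="utf-8"))
    out = Path(dst)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        json.dumps(extract_mappings(profiles), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
```

Calling main() with the default arguments would read mentor_profiles_1000.json from the working directory. If the same ID appears with different names, the last one seen wins.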
Service Layer
[MODIFY] services/recommendation_service.py
- Initialize DataService.
- In recommend_mentors:
  - Fetch names for career_id, domain_ids, skill_ids, mentor_domain_ids from DataService.
  - Pass these resolved names (or the mapping dict) to build_mentee_query_text.
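The name-resolution step inside recommend_mentors could look roughly like the helper below. The helper name resolve_names, the dict-shaped mentee input, and the choice to resolve mentor_domain_ids against the domains table are all assumptions made for this sketch.

```python
def resolve_names(mentee, mappings):
    """Return display names for the ID fields of a mentee request.

    `mentee` is a dict of request fields; `mappings` is an assumed
    {"careers": {...}, "domains": {...}, "skills": {...}} structure
    with string keys. Unknown IDs fall back to their string form.
    """

    def lookup(kind, item_id):
        return mappings.get(kind, {}).get(str(item_id), str(item_id))

    career_id = mentee.get("career_id")
    return {
        "career": lookup("careers", career_id) if career_id is not None else None,
        "domains": [lookup("domains", i) for i in mentee.get("domain_ids", [])],
        "skills": [lookup("skills", i) for i in mentee.get("skill_ids", [])],
        # Assumption: mentor domain IDs share the same table as domain IDs.
        "mentor_domains": [
            lookup("domains", i) for i in mentee.get("mentor_domain_ids", [])
        ],
    }
```

Passing the resolved dict rather than the whole mapping keeps the text builder free of any lookup logic.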
[MODIFY] utils/text_builder.py
- Update build_mentee_query_text to accept an optional mappings or resolved_names argument.
- Use names in the generated text instead of "IDs: 1, 2, 3".
  - Example: "Preferred Domains: Web Development, Data Science" instead of "Preferred Domains (IDs): 1, 2".
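A minimal sketch of the updated builder, assuming a dict-shaped mentee and a resolved_names dict with "domains" and "skills" lists. Only the "Preferred Domains" label comes from this plan; the "Skills" label and the helper _line are illustrative.

```python
def _line(label, names, ids):
    """Prefer names; fall back to the old '(IDs)' form; skip empty fields."""
    if names:
        return f"{label}: {', '.join(names)}"
    if ids:
        return f"{label} (IDs): {', '.join(str(i) for i in ids)}"
    return None


def build_mentee_query_text(mentee, resolved_names=None):
    """Build the query text, using resolved names when they are supplied."""
    resolved = resolved_names or {}
    parts = [
        _line("Preferred Domains", resolved.get("domains"), mentee.get("domain_ids")),
        _line("Skills", resolved.get("skills"), mentee.get("skill_ids")),
    ]
    return "\n".join(part for part in parts if part)
```

Because resolved_names is optional, existing callers that pass only the mentee keep getting the ID-based text.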
Verification Plan
Automated Tests
- Run verify_prompt_improvement.py (a new test script I will create), which:
  - Mocks DataService with some sample mappings.
  - Calls build_mentee_query_text with sample Mentee data.
  - Asserts that the output string contains Names, not just IDs.
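The checks above might be sketched as follows. The builder is passed in as build_fn so the sketch stays self-contained; the mock's names(kind, ids) method and the sample mappings are hypothetical stand-ins for the real DataService.

```python
from unittest.mock import Mock

# Sample mappings used only for this check.
SAMPLE_DOMAINS = {1: "Web Development", 2: "Data Science"}


def make_mock_data_service():
    """Mock exposing a hypothetical names(kind, ids) method."""
    service = Mock()
    service.names.side_effect = lambda kind, ids: [SAMPLE_DOMAINS[i] for i in ids]
    return service


def check_prompt_uses_names(build_fn):
    """Assert that build_fn emits domain names rather than the '(IDs)' form."""
    service = make_mock_data_service()
    mentee = {"domain_ids": [1, 2]}
    resolved = {"domains": service.names("domains", mentee["domain_ids"])}
    text = build_fn(mentee, resolved)
    assert "Web Development" in text and "Data Science" in text
    assert "(IDs)" not in text
    return text
```

In the real script, build_fn would be the build_mentee_query_text function imported from utils/text_builder.py.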
Manual Verification
- You can run the extraction script on your mentor_profiles_1000.json.
- Then run test_api.py and inspect the logs to see the generated Query Text.