# Text Module V2 - Aspect-Based Scoring

## Overview

Enhanced text analysis using prototype-based aspect extraction with `all-mpnet-base-v2` embeddings.

## Changes from V1

- **Model**: Upgraded from `all-MiniLM-L6-v2` (384d) to `all-mpnet-base-v2` (768d)
- **Approach**: Moved from simple reference embeddings to aspect-based prototype scoring
- **Aspects**: 10 employability aspects (leadership, technical_skills, problem_solving, etc.)
- **Admin**: Runtime seed updates via REST API
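
To illustrate what aspect-based prototype scoring can look like, here is a minimal sketch. It assumes each aspect's prototype is the renormalized mean of its L2-normalized seed embeddings and that a text is scored against it by cosine similarity; the module's actual formula is not shown in this document. Toy 3-dimensional vectors stand in for the 768-dimensional `all-mpnet-base-v2` embeddings, and `build_centroid` and `aspect_scores` are hypothetical helper names.

```python
import numpy as np


def _normalize(vectors: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis (zero vectors are left unchanged)."""
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)


def build_centroid(seed_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical helper: average the normalized seeds, then renormalize."""
    return _normalize(_normalize(seed_embeddings).mean(axis=0))


def aspect_scores(text_embedding: np.ndarray, centroids: dict) -> dict:
    """Cosine similarity between one text embedding and each aspect centroid."""
    text = _normalize(text_embedding)
    return {aspect: float(text @ c) for aspect, c in centroids.items()}


# Toy embeddings standing in for real sentence embeddings (assumption).
seed_embeddings = {
    "leadership": np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "technical_skills": np.array([[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]),
}
centroids = {k: build_centroid(v) for k, v in seed_embeddings.items()}
scores = aspect_scores(np.array([0.8, 0.3, 0.0]), centroids)
print(max(scores, key=scores.get))
```

In a real run the toy arrays would be replaced by the output of the embedding model for each seed phrase and each response text.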

## Configuration

### Model Selection

Set the model via an environment variable or the constructor:

```bash
export ASPECT_MODEL_NAME=all-mpnet-base-v2  # default
# or
export ASPECT_MODEL_NAME=all-MiniLM-L6-v2   # fallback
```

```python
from services.text_module_v2 import TextModuleV2

# Default (all-mpnet-base-v2)
text_module = TextModuleV2()

# Override model
text_module = TextModuleV2(model_name='all-MiniLM-L6-v2')
```
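
The two configuration paths above can be combined in a single lookup. The sketch below assumes an explicit constructor argument takes precedence over `ASPECT_MODEL_NAME`, which in turn takes precedence over the default; that ordering is an assumption, and `resolve_model_name` is a hypothetical helper, not part of the module.

```python
import os

DEFAULT_MODEL_NAME = "all-mpnet-base-v2"


def resolve_model_name(model_name=None, env=None):
    """Hypothetical helper: explicit argument, then env var, then default."""
    env = os.environ if env is None else env
    return model_name or env.get("ASPECT_MODEL_NAME") or DEFAULT_MODEL_NAME


# The env mapping is passed explicitly so the examples do not depend on the shell.
print(resolve_model_name(env={}))
print(resolve_model_name(env={"ASPECT_MODEL_NAME": "all-MiniLM-L6-v2"}))
print(resolve_model_name("all-MiniLM-L6-v2", env={"ASPECT_MODEL_NAME": "all-mpnet-base-v2"}))
```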

### Aspect Seeds

Seeds are loaded from `./aspect_seeds.json` (created by default). Edit this file to customize the aspect definitions.

**Location**: `analytics/backend/aspect_seeds.json`

### Centroids Cache

Precomputed centroids are saved to `./aspect_centroids.npz` for fast cold starts.
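
As a rough illustration of how such a cache can work, the sketch below round-trips a dictionary of centroids through NumPy's `.npz` format with `np.savez` and `np.load`. The one-array-per-aspect layout and the helper names `save_centroids` and `load_centroids` are assumptions; the module's actual file layout is not documented here.

```python
import os
import tempfile

import numpy as np


def save_centroids(path, centroids):
    """Write one array per aspect; keyword names become the archive keys."""
    np.savez(path, **centroids)


def load_centroids(path):
    """Return {aspect: centroid}, or None if no cache file exists yet."""
    if not os.path.exists(path):
        return None
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}


# Round-trip through a temporary directory instead of the real cache file.
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "aspect_centroids.npz")
    save_centroids(cache_path, {"leadership": np.array([0.6, 0.8])})
    restored = load_centroids(cache_path)

print(restored["leadership"])
```

A cache like this is only valid for the seeds and model it was built from, so it would need to be rebuilt whenever either changes.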

## Usage

### Basic Scoring

```python
from services.text_module_v2 import TextModuleV2

text_module = TextModuleV2()

# Keys identify the question; values are the free-text answers
text_responses = {
    'text_q1': "I developed ML pipelines using Python and scikit-learn...",
    'text_q2': "My career goal is to become a data scientist...",
    'text_q3': "I led a team of 5 students in a hackathon project..."
}

score, confidence, features = text_module.score(text_responses)
print(f"Score: {score:.2f}, Confidence: {confidence:.2f}")
print(f"Features: {features}")
```

### Get Current Seeds

```python
seeds = text_module.get_aspect_seeds()
print(f"Loaded {len(seeds)} aspects")
```

## Admin API

### Setup

```python
from flask import Flask
from services.text_module_v2 import TextModuleV2, register_admin_seed_endpoint

app = Flask(__name__)
text_module = TextModuleV2()

# Register admin endpoints
register_admin_seed_endpoint(app, text_module)

app.run(port=5001)
```

Set the admin token:

```bash
export ADMIN_SEED_TOKEN=your-secret-token
```
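
How the endpoints validate the token is not described in this document. One common approach, sketched below as an assumption, is a constant-time comparison of the supplied header value against `ADMIN_SEED_TOKEN` using the standard library's `hmac.compare_digest`. `is_authorized` is a hypothetical helper.

```python
import hmac
import os


def is_authorized(provided_token, expected_token=None):
    """Hypothetical check: constant-time comparison against ADMIN_SEED_TOKEN."""
    if expected_token is None:
        expected_token = os.environ.get("ADMIN_SEED_TOKEN", "")
    if not expected_token or not provided_token:
        return False  # refuse everything when no token is configured or sent
    return hmac.compare_digest(provided_token.encode(), expected_token.encode())


print(is_authorized("your-secret-token", "your-secret-token"))
print(is_authorized("wrong", "your-secret-token"))
```

Rejecting requests when no token is configured avoids accidentally exposing the admin routes on a fresh deployment.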

### Endpoints

#### GET /admin/aspect-seeds

Returns the currently loaded seeds.

**Request**:

```bash
curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:5001/admin/aspect-seeds
```

**Response**:

```json
{
  "success": true,
  "seeds": {
    "leadership": ["led a team", "managed project", ...],
    "technical_skills": [...]
  },
  "num_aspects": 10
}
```

#### POST /admin/aspect-seeds

Updates the aspect seeds and recomputes the centroids.

**Request**:

```bash
curl -X POST \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": {
      "leadership": [
        "led a team",
        "managed stakeholders",
        "organized events"
      ],
      "technical_skills": [
        "developed web API",
        "built ML models"
      ]
    },
    "persist": true
  }' \
  http://localhost:5001/admin/aspect-seeds
```

**Response**:

```json
{
  "success": true,
  "message": "Aspect seeds updated successfully",
  "stats": {
    "num_aspects": 2,
    "avg_seed_count": 2.5,
    "timestamp": "2025-12-09T10:30:00Z"
  }
}
```
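
The `stats` values in the response above follow from the request body: two aspects with three and two seeds give an average of (3 + 2) / 2 = 2.5. The sketch below reproduces that arithmetic; `seed_stats` is a hypothetical helper, and the timestamp format is inferred from the example response.

```python
from datetime import datetime, timezone


def seed_stats(seeds):
    """Hypothetical helper mirroring the fields of the `stats` object."""
    counts = [len(phrases) for phrases in seeds.values()]
    return {
        "num_aspects": len(seeds),
        "avg_seed_count": sum(counts) / len(counts) if counts else 0.0,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


seeds = {
    "leadership": ["led a team", "managed stakeholders", "organized events"],
    "technical_skills": ["developed web API", "built ML models"],
}
stats = seed_stats(seeds)
print(stats["num_aspects"], stats["avg_seed_count"])
```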

## Advanced: Seed Expansion

Suggest new seed phrases from a corpus:

```python
corpus = [
    "I led the product development team and managed stakeholders",
    "Implemented CI/CD pipelines for automated testing",
    # ... more texts
]

suggestions = text_module.suggest_seed_expansions(
    corpus_texts=corpus,
    aspect_key='leadership',
    top_n=20
)
print("Suggested seeds:", suggestions)
```
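
The selection logic behind `suggest_seed_expansions` is not shown in this document. One plausible approach, sketched here purely as an assumption, is to rank candidate phrases by cosine similarity to the aspect centroid and keep the top `top_n`. Toy 2-dimensional vectors replace real embeddings, and `rank_candidates` is a hypothetical helper.

```python
import numpy as np


def rank_candidates(candidates, embeddings, centroid, top_n=20):
    """Return up to top_n candidate texts, most similar to the centroid first."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ (centroid / np.linalg.norm(centroid))
    order = np.argsort(-sims)[:top_n]  # negate to sort in descending order
    return [candidates[i] for i in order]


candidates = ["managed stakeholders", "wrote unit tests", "chaired meetings"]
# Toy 2-d embeddings (assumption); a real run would encode the phrases.
embeddings = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]])
leadership_centroid = np.array([1.0, 0.0])

print(rank_candidates(candidates, embeddings, leadership_centroid, top_n=2))
```

Suggestions produced this way are best reviewed by hand before being added to `aspect_seeds.json`, since high similarity does not guarantee a phrase is a good seed.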

## Aspect → Question Mapping

```python
from services.text_module_v2 import get_relevant_aspects_for_question

# Q1: Strengths & skills
aspects_q1 = get_relevant_aspects_for_question('text_q1')
# ['technical_skills', 'problem_solving', 'learning_agility', 'initiative', 'communication']

# Q2: Career interests
aspects_q2 = get_relevant_aspects_for_question('text_q2')
# ['career_alignment', 'learning_agility', 'initiative', 'communication']

# Q3: Extracurriculars & leadership
aspects_q3 = get_relevant_aspects_for_question('text_q3')
# ['leadership', 'teamwork', 'project_execution', 'internships_experience', 'communication']
```
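
The mapping above can be represented as a plain dictionary, which also makes it easy to check that the three questions together cover all 10 aspects. The sketch below copies the lists from the example comments; the dictionary name, the helper `relevant_aspects`, and the empty-list fallback for unknown keys are assumptions.

```python
# Mapping copied from the example comments above; the dict name and the
# fallback behaviour for unknown keys are assumptions.
QUESTION_ASPECTS = {
    "text_q1": ["technical_skills", "problem_solving", "learning_agility",
                "initiative", "communication"],
    "text_q2": ["career_alignment", "learning_agility", "initiative",
                "communication"],
    "text_q3": ["leadership", "teamwork", "project_execution",
                "internships_experience", "communication"],
}


def relevant_aspects(question_key):
    """Return a copy of the aspect list, or an empty list for unknown keys."""
    return list(QUESTION_ASPECTS.get(question_key, []))


# Union of all per-question aspects.
all_aspects = {a for aspects in QUESTION_ASPECTS.values() for a in aspects}
print(len(all_aspects))
```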

## Files

| File | Purpose |
|------|---------|
| `services/text_module_v2.py` | Main module implementation |
| `aspect_seeds.json` | Aspect seed definitions (editable) |
| `aspect_centroids.npz` | Cached centroids (auto-generated) |

## Performance

- **Model Load**: ~3 s (first time)
- **Centroid Build**: ~1 s for 10 aspects with 20 seeds each
- **Text Scoring**: ~200-500 ms per 3-question set (CPU)
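
These figures depend on hardware, so it can be worth measuring them locally. The sketch below is a small timing helper built on `time.perf_counter`; `time_call` is a hypothetical name, and the workload shown is a stand-in rather than an actual scoring call.

```python
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in milliseconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# Stand-in workload; replace with e.g. lambda: text_module.score(text_responses).
elapsed_ms = time_call(lambda: sum(i * i for i in range(10_000)))
print(f"best of 5: {elapsed_ms:.2f} ms")
```

Taking the best of several runs reduces noise from other processes; the first call after startup should be timed separately, since it includes the model load.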

## Logging

The module logs through Python's standard `logging` system:

```python
import logging

logging.basicConfig(level=logging.INFO)
```

Key events logged:

- Model loading
- Seed updates (with masked token)
- Centroid recomputation
- File I/O operations
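
As an illustration of token masking in log messages, the sketch below keeps only the last few characters of a token visible. The masking scheme, the logger name, and the helper `mask_token` are assumptions; the module's actual masking format is not specified here.

```python
import logging

logger = logging.getLogger("text_module_v2")  # logger name is an assumption


def mask_token(token, visible=4):
    """Keep only the last `visible` characters; fully mask short tokens."""
    if not token:
        return "<none>"
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


logging.basicConfig(level=logging.INFO)
logger.info("Aspect seeds updated (token=%s)", mask_token("your-secret-token"))
```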