Text Module V2 - Aspect-Based Scoring
Overview
Enhanced text analysis that scores free-text responses against prototype embeddings for a set of employability aspects, using `all-mpnet-base-v2` sentence embeddings.
Changes from V1
- Model: Upgraded from `all-MiniLM-L6-v2` (384d) to `all-mpnet-base-v2` (768d)
- Approach: Moved from simple reference embeddings to aspect-based prototype scoring
- Aspects: 10 employability aspects (leadership, technical_skills, problem_solving, etc.)
- Admin: Runtime seed updates via REST API
Configuration
Model Selection
Set via environment variable or constructor:
```bash
export ASPECT_MODEL_NAME=all-mpnet-base-v2  # default
# or
export ASPECT_MODEL_NAME=all-MiniLM-L6-v2   # fallback
```

```python
from services.text_module_v2 import TextModuleV2

# Default (all-mpnet-base-v2)
text_module = TextModuleV2()

# Override model
text_module = TextModuleV2(model_name='all-MiniLM-L6-v2')
```
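The README does not say which setting wins when both are given. Below is a minimal sketch of one plausible resolution order. It assumes an explicit constructor argument takes precedence over `ASPECT_MODEL_NAME`, which in turn overrides the default. `resolve_model_name` is a hypothetical helper, not part of `TextModuleV2`:

```python
import os
from typing import Optional

# Default taken from this README; the precedence below is an assumption.
DEFAULT_MODEL_NAME = "all-mpnet-base-v2"


def resolve_model_name(model_name: Optional[str] = None) -> str:
    """Hypothetical helper: constructor argument > ASPECT_MODEL_NAME > default."""
    if model_name:
        return model_name
    # Fall back to the environment variable, then to the documented default.
    return os.environ.get("ASPECT_MODEL_NAME") or DEFAULT_MODEL_NAME
```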
Aspect Seeds
Seeds are loaded from `./aspect_seeds.json`, which is created with the default seeds if it does not exist. Edit this file to customize the aspect definitions.
Location: analytics/backend/aspect_seeds.json
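The file format is not spelled out here. Assuming the file mirrors the `seeds` object returned by the admin API (each aspect name mapped to a list of phrases), a load-or-create helper might look like the sketch below. `load_or_create_seeds`, `validate_seeds`, and the two-aspect `DEFAULT_SEEDS` are hypothetical stand-ins for the module's real defaults:

```python
import json
from pathlib import Path
from typing import Dict, List

Seeds = Dict[str, List[str]]

# Hypothetical placeholder defaults; the real module ships 10 aspects.
DEFAULT_SEEDS: Seeds = {
    "leadership": ["led a team", "managed project"],
    "technical_skills": ["developed web API", "built ML models"],
}


def validate_seeds(seeds: object) -> Seeds:
    """Check the assumed schema: {aspect: [non-empty phrase, ...]}."""
    if not isinstance(seeds, dict) or not seeds:
        raise ValueError("seeds must be a non-empty object")
    for aspect, phrases in seeds.items():
        if not isinstance(aspect, str) or not isinstance(phrases, list) or not phrases:
            raise ValueError(f"aspect {aspect!r} needs a non-empty list of phrases")
        if not all(isinstance(p, str) and p.strip() for p in phrases):
            raise ValueError(f"aspect {aspect!r} contains an empty or non-string phrase")
    return seeds


def load_or_create_seeds(path: str = "aspect_seeds.json") -> Seeds:
    """Load seeds from disk, writing the defaults first if the file is missing."""
    file = Path(path)
    if not file.exists():
        file.write_text(json.dumps(DEFAULT_SEEDS, indent=2), encoding="utf-8")
    return validate_seeds(json.loads(file.read_text(encoding="utf-8")))
```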
Centroids Cache
Pre-computed centroids are cached in `./aspect_centroids.npz` so that cold starts do not have to rebuild them from the seeds.
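The README does not say how centroids are built. This sketch assumes a common choice for prototype scoring: the mean of the L2-normalised seed embeddings, renormalised to unit length. It uses NumPy only, with embeddings passed in as arrays rather than produced by a model, and caches one named array per aspect with `np.savez`. All function names are hypothetical:

```python
from typing import Dict

import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against zero vectors


def build_centroids(seed_embeddings: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """seed_embeddings maps aspect -> (num_seeds, dim) array of seed vectors."""
    return {
        aspect: l2_normalize(l2_normalize(vecs).mean(axis=0))
        for aspect, vecs in seed_embeddings.items()
    }


def save_centroids(path: str, centroids: Dict[str, np.ndarray]) -> None:
    # Each aspect becomes a named array inside the .npz archive.
    np.savez(path, **centroids)


def load_centroids(path: str) -> Dict[str, np.ndarray]:
    with np.load(path) as data:
        return {name: data[name] for name in data.files}
```

Caching per-aspect arrays means a restart only needs `np.load`, not a model forward pass over every seed phrase.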
Usage
Basic Scoring
```python
from services.text_module_v2 import TextModuleV2

text_module = TextModuleV2()

text_responses = {
    'text_q1': "I developed ML pipelines using Python and scikit-learn...",
    'text_q2': "My career goal is to become a data scientist...",
    'text_q3': "I led a team of 5 students in a hackathon project..."
}

# score() returns an overall score, a confidence value, and the extracted features
score, confidence, features = text_module.score(text_responses)
print(f"Score: {score:.2f}, Confidence: {confidence:.2f}")
print(f"Features: {features}")
```
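How `score()` turns embeddings into `score`, `confidence`, and `features` is not documented here. As an illustration only, the sketch below shows the core idea of prototype scoring: cosine similarity between a response embedding and each relevant aspect centroid. The mean aggregation and the names `aspect_similarities` and `response_score` are assumptions, and the confidence value is not modelled:

```python
from typing import Dict, Iterable

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(a @ b) / denom if denom > 0 else 0.0


def aspect_similarities(
    response_vec: np.ndarray,
    centroids: Dict[str, np.ndarray],
    aspects: Iterable[str],
) -> Dict[str, float]:
    """Cosine similarity of one response embedding to each requested aspect centroid."""
    return {a: cosine(response_vec, centroids[a]) for a in aspects if a in centroids}


def response_score(similarities: Dict[str, float]) -> float:
    # Assumed aggregation: plain mean over the relevant aspects.
    return sum(similarities.values()) / len(similarities) if similarities else 0.0
```

In a full pipeline one would presumably embed each `text_q*` answer with the sentence-transformer and pass the aspects that `get_relevant_aspects_for_question` returns for that question.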
Get Current Seeds
```python
seeds = text_module.get_aspect_seeds()
print(f"Loaded {len(seeds)} aspects")
```
Admin API
Setup
```python
from flask import Flask
from services.text_module_v2 import TextModuleV2, register_admin_seed_endpoint

app = Flask(__name__)
text_module = TextModuleV2()

# Register admin endpoints
register_admin_seed_endpoint(app, text_module)

app.run(port=5001)
```
Set the admin token in the environment before starting the app:

```bash
export ADMIN_SEED_TOKEN=your-secret-token
```
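How the endpoint validates `X-Admin-Token` is not documented. Below is a minimal sketch of a safe check. It assumes the header is compared against `ADMIN_SEED_TOKEN` and that requests are rejected when the variable is unset. `is_authorized` is a hypothetical helper. `hmac.compare_digest` is used so the comparison time does not reveal how many leading characters matched:

```python
import hmac
import os
from typing import Optional


def is_authorized(header_token: Optional[str], expected: Optional[str] = None) -> bool:
    """Hypothetical check of the X-Admin-Token header against ADMIN_SEED_TOKEN."""
    if expected is None:
        expected = os.environ.get("ADMIN_SEED_TOKEN")
    # Fail closed: no configured token or no header means no access.
    if not expected or not header_token:
        return False
    return hmac.compare_digest(header_token.encode("utf-8"), expected.encode("utf-8"))
```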
Endpoints
GET /admin/aspect-seeds
Returns the currently loaded seeds.
Request:
```bash
curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:5001/admin/aspect-seeds
```
Response:
```json
{
  "success": true,
  "seeds": {
    "leadership": ["led a team", "managed project", ...],
    "technical_skills": [...]
  },
  "num_aspects": 10
}
```
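The same request can be made from Python with only the standard library. This sketch builds the request separately so it can be inspected without a running server. The base URL and token are the placeholders from the curl example, and the function names are illustrative:

```python
import json
import urllib.request


def build_get_seeds_request(base_url: str, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/aspect-seeds",
        headers={"X-Admin-Token": token},
        method="GET",
    )


def fetch_seeds(base_url: str, token: str, timeout: float = 10.0) -> dict:
    """Call GET /admin/aspect-seeds and return the decoded JSON body."""
    req = build_get_seeds_request(base_url, token)
    # urlopen raises urllib.error.HTTPError for non-2xx replies (e.g. a bad token).
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from the Setup step running, `fetch_seeds("http://localhost:5001", "your-secret-token")["num_aspects"]` should match the `num_aspects` field shown above.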
POST /admin/aspect-seeds
Updates the aspect seeds and recomputes the centroids.
Request:
```bash
curl -X POST \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": {
      "leadership": [
        "led a team",
        "managed stakeholders",
        "organized events"
      ],
      "technical_skills": [
        "developed web API",
        "built ML models"
      ]
    },
    "persist": true
  }' \
  http://localhost:5001/admin/aspect-seeds
```
Response:
```json
{
  "success": true,
  "message": "Aspect seeds updated successfully",
  "stats": {
    "num_aspects": 2,
    "avg_seed_count": 2.5,
    "timestamp": "2025-12-09T10:30:00Z"
  }
}
```
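Below is a standard-library equivalent of the POST call, together with a client-side check of the payload. The `seed_stats` helper is hypothetical. It only reproduces the arithmetic visible in the example response (3 + 2 seeds over 2 aspects gives an `avg_seed_count` of 2.5), not necessarily how the server computes it:

```python
import json
import urllib.request
from typing import Dict, List


def seed_stats(seeds: Dict[str, List[str]]) -> dict:
    """Hypothetical: number of aspects and mean phrases per aspect."""
    num = len(seeds)
    total = sum(len(phrases) for phrases in seeds.values())
    return {"num_aspects": num, "avg_seed_count": total / num if num else 0.0}


def build_update_seeds_request(
    base_url: str, token: str, seeds: Dict[str, List[str]], persist: bool = True
) -> urllib.request.Request:
    payload = json.dumps({"seeds": seeds, "persist": persist}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/aspect-seeds",
        data=payload,
        headers={"X-Admin-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def update_seeds(
    base_url: str, token: str, seeds: Dict[str, List[str]],
    persist: bool = True, timeout: float = 30.0,
) -> dict:
    """POST new seeds; a longer timeout allows for centroid recomputation."""
    req = build_update_seeds_request(base_url, token, seeds, persist)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```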
Advanced: Seed Expansion
Suggest new seed phrases from a corpus:
```python
corpus = [
    "I led the product development team and managed stakeholders",
    "Implemented CI/CD pipelines for automated testing",
    # ... more texts
]

suggestions = text_module.suggest_seed_expansions(
    corpus_texts=corpus,
    aspect_key='leadership',
    top_n=20
)
print("Suggested seeds:", suggestions)
```
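The README does not describe how `suggest_seed_expansions` chooses phrases. One plausible approach is sketched here as an assumption, not as the module's method: extract word n-grams from the corpus, embed them, and rank them by cosine similarity to the aspect centroid. `embed` is any callable that maps a list of strings to a 2-D array, such as a sentence-transformer's `encode`, so the sketch also runs with a toy embedding. `candidate_ngrams` and `suggest_expansions` are hypothetical names:

```python
import re
from typing import Callable, Iterable, List, Sequence, Tuple

import numpy as np


def candidate_ngrams(texts: Iterable[str], n_min: int = 2, n_max: int = 3) -> List[str]:
    """All lower-cased word n-grams of length n_min..n_max, de-duplicated and sorted."""
    candidates = set()
    for text in texts:
        tokens = re.findall(r"[a-z0-9/+#-]+", text.lower())
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                candidates.add(" ".join(tokens[i : i + n]))
    return sorted(candidates)


def suggest_expansions(
    corpus: Sequence[str],
    centroid: np.ndarray,
    embed: Callable[[List[str]], np.ndarray],
    top_n: int = 20,
    exclude: Iterable[str] = (),
) -> List[Tuple[str, float]]:
    """Rank corpus n-grams by cosine similarity to an aspect centroid."""
    excluded = set(exclude)  # e.g. the aspect's existing seeds
    candidates = [c for c in candidate_ngrams(corpus) if c not in excluded]
    if not candidates:
        return []
    vecs = np.asarray(embed(candidates), dtype=float)
    vecs = vecs / np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    target = centroid / max(float(np.linalg.norm(centroid)), 1e-12)
    sims = vecs @ target
    # Stable sort keeps alphabetical order among equally similar candidates.
    order = np.argsort(-sims, kind="stable")[:top_n]
    return [(candidates[i], float(sims[i])) for i in order]
```

Passing the current seeds as `exclude` keeps phrases that are already defined out of the suggestions.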
Aspect → Question Mapping
```python
from services.text_module_v2 import get_relevant_aspects_for_question

# Q1: Strengths & skills
aspects_q1 = get_relevant_aspects_for_question('text_q1')
# ['technical_skills', 'problem_solving', 'learning_agility', 'initiative', 'communication']

# Q2: Career interests
aspects_q2 = get_relevant_aspects_for_question('text_q2')
# ['career_alignment', 'learning_agility', 'initiative', 'communication']

# Q3: Extracurriculars & leadership
aspects_q3 = get_relevant_aspects_for_question('text_q3')
# ['leadership', 'teamwork', 'project_execution', 'internships_experience', 'communication']
```
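The mapping in the comments above can be written down as data and inverted to show which questions feed each aspect. The dict is copied from those comments. `QUESTION_ASPECTS` and `questions_by_aspect` are illustrative names, not exports of `services.text_module_v2`:

```python
from typing import Dict, List

# Copied from the get_relevant_aspects_for_question examples above.
QUESTION_ASPECTS: Dict[str, List[str]] = {
    "text_q1": ["technical_skills", "problem_solving", "learning_agility", "initiative", "communication"],
    "text_q2": ["career_alignment", "learning_agility", "initiative", "communication"],
    "text_q3": ["leadership", "teamwork", "project_execution", "internships_experience", "communication"],
}


def questions_by_aspect(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert question -> aspects into aspect -> questions (questions in sorted order)."""
    inverted: Dict[str, List[str]] = {}
    for question in sorted(mapping):
        for aspect in mapping[question]:
            inverted.setdefault(aspect, []).append(question)
    return inverted


if __name__ == "__main__":
    for aspect, questions in sorted(questions_by_aspect(QUESTION_ASPECTS).items()):
        print(f"{aspect}: {', '.join(questions)}")
```

Together the three questions cover all 10 aspects. `communication` is the only aspect that is scored on every question.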
Files
| File | Purpose |
|---|---|
| `services/text_module_v2.py` | Main module implementation |
| `aspect_seeds.json` | Aspect seed definitions (editable) |
| `aspect_centroids.npz` | Cached centroids (auto-generated) |
Performance
- Model Load: ~3s (first time)
- Centroid Build: ~1s for 10 aspects with 20 seeds each
- Text Scoring: ~200-500ms per 3-question set (CPU)
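These figures depend on hardware, and the README does not say how they were measured. Below is a minimal harness for reproducing the scoring figure on your own machine. It uses `time.perf_counter` and reports the median over several runs. `median_latency_ms` is a hypothetical helper:

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], repeats: int = 10, warmup: int = 1) -> float:
    """Median wall-clock time of fn() in milliseconds, measured after warm-up calls."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # untimed calls, e.g. so one-off loading does not skew the median
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Using the objects from Basic Scoring, `median_latency_ms(lambda: text_module.score(text_responses))` gives a figure comparable to the per-set scoring time listed above.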
Logging
The module logs through Python's standard `logging` module. Enable INFO level to see its messages:

```python
import logging

logging.basicConfig(level=logging.INFO)
```
Key events logged:
- Model loading
- Seed updates (with masked token)
- Centroid recomputation
- File I/O operations
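The README says seed updates are logged with a masked token but does not give the masking rule. The sketch below assumes one common convention: keep only the last four characters. `mask_token`, `log_seed_update`, and the logger name are hypothetical:

```python
import logging

logger = logging.getLogger("text_module_v2.admin")  # hypothetical logger name


def mask_token(token: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if not token:
        return ""
    if len(token) <= visible:
        return "*" * len(token)  # too short to reveal any part safely
    return "*" * (len(token) - visible) + token[-visible:]


def log_seed_update(token: str, num_aspects: int) -> None:
    logger.info("Aspect seeds updated (%d aspects) by token %s", num_aspects, mask_token(token))
```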