# Text Module V2 - Aspect-Based Scoring
## Overview
Text Module V2 scores free-text responses against per-aspect prototypes built from seed phrases, using `all-mpnet-base-v2` sentence embeddings.
## Changes from V1
- **Model**: Upgraded from `all-MiniLM-L6-v2` (384-dimensional embeddings) to `all-mpnet-base-v2` (768-dimensional embeddings)
- **Approach**: Moved from simple reference embeddings to aspect-based prototype scoring
- **Aspects**: 10 employability aspects (leadership, technical_skills, problem_solving, etc.)
- **Admin**: Runtime seed updates via REST API
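To make "aspect-based prototype scoring" concrete, here is a minimal sketch of the general technique, not the module's actual code. It assumes each aspect's prototype is the re-normalized mean of its unit-length seed embeddings and that a response is scored by cosine similarity to each prototype. The toy 3-dimensional vectors stand in for real 768-dimensional embeddings, and the helper names (`normalize`, `build_centroid`, `score_aspects`) are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_centroid(seed_vectors):
    """Average the unit-length seed vectors, then re-normalize the mean."""
    unit = [normalize(v) for v in seed_vectors]
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return normalize(mean)


def score_aspects(text_vector, centroids):
    """Cosine similarity between one text vector and every aspect centroid."""
    t = normalize(text_vector)
    return {
        aspect: sum(a * b for a, b in zip(t, centroid))
        for aspect, centroid in centroids.items()
    }


# Toy 3-d vectors standing in for real sentence embeddings (assumption).
centroids = {
    "leadership": build_centroid([[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]),
    "technical_skills": build_centroid([[0.0, 0.0, 1.0], [0.0, 0.6, 0.8]]),
}
scores = score_aspects([0.9, 0.1, 0.0], centroids)
best = max(scores, key=scores.get)
print(best, scores)
```

With real embeddings the structure would stay the same; only the vectors come from the sentence-embedding model instead of being written by hand.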
## Configuration
### Model Selection
Set the model via the `ASPECT_MODEL_NAME` environment variable or the `model_name` constructor argument:
```bash
export ASPECT_MODEL_NAME=all-mpnet-base-v2 # default
# or
export ASPECT_MODEL_NAME=all-MiniLM-L6-v2 # fallback
```
```python
from services.text_module_v2 import TextModuleV2
# Default (all-mpnet-base-v2)
text_module = TextModuleV2()
# Override model
text_module = TextModuleV2(model_name='all-MiniLM-L6-v2')
```
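The two configuration routes raise the question of which one wins when both are set. The sketch below shows one plausible resolution order (explicit argument, then `ASPECT_MODEL_NAME`, then the default). That precedence is an assumption, not confirmed behavior of `TextModuleV2`, and `resolve_model_name` is a hypothetical helper.

```python
import os

DEFAULT_MODEL = "all-mpnet-base-v2"


def resolve_model_name(model_name=None, env=None):
    """Pick the model: explicit argument, then ASPECT_MODEL_NAME, then default."""
    env = os.environ if env is None else env
    return model_name or env.get("ASPECT_MODEL_NAME") or DEFAULT_MODEL


# Passing `env` explicitly keeps the examples independent of the real environment.
print(resolve_model_name(env={}))
print(resolve_model_name(env={"ASPECT_MODEL_NAME": "all-MiniLM-L6-v2"}))
print(resolve_model_name("all-MiniLM-L6-v2", env={"ASPECT_MODEL_NAME": "all-mpnet-base-v2"}))
```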
### Aspect Seeds
Seeds are loaded from `./aspect_seeds.json` (created automatically by default). Edit this file to customize the aspect definitions.
**Location**: `analytics/backend/aspect_seeds.json`
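Before editing the seeds file by hand, it can help to check its shape. The sketch below assumes the file uses the same layout as the `seeds` object returned by the admin API (aspect name mapped to a list of phrases). `validate_seeds` is a hypothetical helper, not part of the module.

```python
import json


def validate_seeds(seeds):
    """Check that seeds maps aspect names to non-empty lists of non-empty strings."""
    if not isinstance(seeds, dict) or not seeds:
        raise ValueError("seeds must be a non-empty JSON object")
    for aspect, phrases in seeds.items():
        if not isinstance(phrases, list) or not phrases:
            raise ValueError(f"aspect {aspect!r} needs a non-empty list of phrases")
        if not all(isinstance(p, str) and p.strip() for p in phrases):
            raise ValueError(f"aspect {aspect!r} has an empty or non-string phrase")
    return seeds


# Inline JSON stands in for the contents of aspect_seeds.json.
example = json.loads('{"leadership": ["led a team", "managed project"]}')
validate_seeds(example)
print(f"{len(example)} aspect(s) OK")
```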
### Centroids Cache
Pre-computed centroids are saved to `./aspect_centroids.npz` for fast cold starts.
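As an illustration of how an `.npz` cache of this kind can be written and read with NumPy, here is a minimal sketch. The on-disk layout (one array per aspect name) is an assumption; the module's real file may be organized differently. `save_centroids` and `load_centroids` are hypothetical helpers.

```python
import os
import tempfile

import numpy as np


def save_centroids(path, centroids):
    """Write one array per aspect into a single .npz archive."""
    np.savez(path, **centroids)


def load_centroids(path):
    """Read the archive back into a plain {aspect: array} dict."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}


# A dummy 768-d vector stands in for a real centroid (assumption).
centroids = {"leadership": np.ones(768, dtype=np.float32)}
cache_path = os.path.join(tempfile.mkdtemp(), "aspect_centroids.npz")
save_centroids(cache_path, centroids)
restored = load_centroids(cache_path)
print(restored["leadership"].shape)
```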
## Usage
### Basic Scoring
```python
from services.text_module_v2 import TextModuleV2
text_module = TextModuleV2()
text_responses = {
'text_q1': "I developed ML pipelines using Python and scikit-learn...",
'text_q2': "My career goal is to become a data scientist...",
'text_q3': "I led a team of 5 students in a hackathon project..."
}
score, confidence, features = text_module.score(text_responses)
print(f"Score: {score:.2f}, Confidence: {confidence:.2f}")
print(f"Features: {features}")
```
### Get Current Seeds
```python
seeds = text_module.get_aspect_seeds()
print(f"Loaded {len(seeds)} aspects")
```
## Admin API
### Setup
```python
from flask import Flask
from services.text_module_v2 import TextModuleV2, register_admin_seed_endpoint
app = Flask(__name__)
text_module = TextModuleV2()
# Register admin endpoints
register_admin_seed_endpoint(app, text_module)
app.run(port=5001)
```
Set the admin token that clients must send in the `X-Admin-Token` header:
```bash
export ADMIN_SEED_TOKEN=your-secret-token
```
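For readers wiring up their own check, the sketch below shows one common way to compare a header token against `ADMIN_SEED_TOKEN` in constant time. It is not taken from `register_admin_seed_endpoint`; `is_authorized` is a hypothetical helper, and rejecting all requests when no token is configured is an assumed policy.

```python
import hmac
import os


def is_authorized(provided_token, expected_token=None):
    """Compare the X-Admin-Token value with the configured token in constant time."""
    if expected_token is None:
        expected_token = os.environ.get("ADMIN_SEED_TOKEN", "")
    if not expected_token or not provided_token:
        return False  # refuse when no token is configured or none was sent
    return hmac.compare_digest(provided_token.encode(), expected_token.encode())


print(is_authorized("your-secret-token", "your-secret-token"))
print(is_authorized("wrong-token", "your-secret-token"))
```

`hmac.compare_digest` avoids leaking how many leading characters matched, which a plain `==` comparison can do through timing.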
### Endpoints
#### GET /admin/aspect-seeds
Returns the currently loaded seeds.
**Request**:
```bash
curl -H "X-Admin-Token: your-secret-token" \
http://localhost:5001/admin/aspect-seeds
```
**Response**:
```json
{
"success": true,
"seeds": {
"leadership": ["led a team", "managed project", ...],
"technical_skills": [...]
},
"num_aspects": 10
}
```
#### POST /admin/aspect-seeds
Updates the aspect seeds and recomputes their centroids.
**Request**:
```bash
curl -X POST \
-H "X-Admin-Token: your-secret-token" \
-H "Content-Type: application/json" \
-d '{
"seeds": {
"leadership": [
"led a team",
"managed stakeholders",
"organized events"
],
"technical_skills": [
"developed web API",
"built ML models"
]
},
"persist": true
}' \
http://localhost:5001/admin/aspect-seeds
```
**Response**:
```json
{
"success": true,
"message": "Aspect seeds updated successfully",
"stats": {
"num_aspects": 2,
"avg_seed_count": 2.5,
"timestamp": "2025-12-09T10:30:00Z"
}
}
```
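The same POST request can be issued from Python. The sketch below uses only the standard library and builds the request without sending it, so it runs without a server. The URL, headers, and payload fields come from the curl example above; `build_update_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # port used in the Setup example


def build_update_request(seeds, token, persist=True):
    """Build (but do not send) the POST /admin/aspect-seeds request."""
    body = json.dumps({"seeds": seeds, "persist": persist}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/admin/aspect-seeds",
        data=body,
        method="POST",
        headers={"X-Admin-Token": token, "Content-Type": "application/json"},
    )


req = build_update_request({"leadership": ["led a team"]}, "your-secret-token")
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```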
## Advanced: Seed Expansion
Suggest new seed phrases from a corpus:
```python
corpus = [
"I led the product development team and managed stakeholders",
"Implemented CI/CD pipelines for automated testing",
# ... more texts
]
suggestions = text_module.suggest_seed_expansions(
corpus_texts=corpus,
aspect_key='leadership',
top_n=20
)
print("Suggested seeds:", suggestions)
```
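Once suggestions have been reviewed, they need to be folded into the existing seeds before an update. The sketch below assumes `suggest_seed_expansions` returns a list of phrase strings, which this README does not confirm, and uses a hypothetical `merge_suggestions` helper that skips blanks and case-insensitive duplicates.

```python
def merge_suggestions(seeds, aspect_key, suggestions):
    """Return a copy of seeds with new, de-duplicated phrases appended to one aspect."""
    merged = {aspect: list(phrases) for aspect, phrases in seeds.items()}
    existing = merged.setdefault(aspect_key, [])
    seen = {p.strip().lower() for p in existing}
    for phrase in suggestions:
        key = phrase.strip().lower()
        if key and key not in seen:
            existing.append(phrase.strip())
            seen.add(key)
    return merged


current = {"leadership": ["led a team"]}
reviewed = ["Led a team", "managed stakeholders", "  "]
updated = merge_suggestions(current, "leadership", reviewed)
print(updated["leadership"])
```

The merged dictionary has the same shape as the `seeds` payload accepted by `POST /admin/aspect-seeds`.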
## Aspect → Question Mapping
```python
from services.text_module_v2 import get_relevant_aspects_for_question
# Q1: Strengths & skills
aspects_q1 = get_relevant_aspects_for_question('text_q1')
# ['technical_skills', 'problem_solving', 'learning_agility', 'initiative', 'communication']
# Q2: Career interests
aspects_q2 = get_relevant_aspects_for_question('text_q2')
# ['career_alignment', 'learning_agility', 'initiative', 'communication']
# Q3: Extracurriculars & leadership
aspects_q3 = get_relevant_aspects_for_question('text_q3')
# ['leadership', 'teamwork', 'project_execution', 'internships_experience', 'communication']
```
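The mapping above can be mirrored as a plain dictionary to show how per-question aspect subsets might feed into a score. The aspect lists are copied from the comments above; averaging the mapped aspect scores, with missing aspects counted as 0, is purely an assumption for illustration, and `QUESTION_ASPECTS` and `question_score` are hypothetical names.

```python
QUESTION_ASPECTS = {
    "text_q1": ["technical_skills", "problem_solving", "learning_agility",
                "initiative", "communication"],
    "text_q2": ["career_alignment", "learning_agility", "initiative",
                "communication"],
    "text_q3": ["leadership", "teamwork", "project_execution",
                "internships_experience", "communication"],
}


def question_score(question_id, aspect_scores):
    """Average the scores of the aspects mapped to one question (missing ones count as 0)."""
    aspects = QUESTION_ASPECTS[question_id]
    return sum(aspect_scores.get(a, 0.0) for a in aspects) / len(aspects)


all_aspects = {a for aspects in QUESTION_ASPECTS.values() for a in aspects}
print(len(all_aspects))
print(question_score("text_q2", {"career_alignment": 0.8, "learning_agility": 0.4}))
```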
## Files
| File | Purpose |
|------|---------|
| `services/text_module_v2.py` | Main module implementation |
| `aspect_seeds.json` | Aspect seed definitions (editable) |
| `aspect_centroids.npz` | Cached centroids (auto-generated) |
## Performance
- **Model Load**: ~3s (first time)
- **Centroid Build**: ~1s for 10 aspects with 20 seeds each
- **Text Scoring**: ~200–500 ms per 3-question set (CPU)
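These figures depend on hardware, so it is worth measuring on your own machine. The sketch below is a small timing helper built on `time.perf_counter`; `timed` is a hypothetical name, and the loop is a stand-in for a real call such as `text_module.score(...)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("scoring", timings):
    sum(i * i for i in range(10_000))  # stand-in for text_module.score(...)
print(f"scoring: {timings['scoring']:.1f} ms")
```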
## Logging
The module logs through Python's standard `logging` package. Enable INFO-level output to see its messages:
```python
import logging
logging.basicConfig(level=logging.INFO)
```
Key events logged:
- Model loading
- Seed updates (the admin token is masked in the log)
- Centroid recomputation
- File I/O operations
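Token masking of the kind mentioned above can be done in a few lines. The sketch below is one possible scheme (keep the first four characters, star out the rest); the module's actual masking format is not documented here, and `mask_token` is a hypothetical helper.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("text_module_v2_example")  # hypothetical logger name


def mask_token(token, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    if not token:
        return "<empty>"
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


masked = mask_token("your-secret-token")
logger.info("Seed update requested with token %s", masked)
```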