# Text Module V2 - Aspect-Based Scoring

## Overview
Enhanced text analysis using prototype-based aspect extraction with `all-mpnet-base-v2` embeddings.

## Changes from V1
- **Model**: Upgraded from `all-MiniLM-L6-v2` (384d) to `all-mpnet-base-v2` (768d)
- **Approach**: Moved from simple reference embeddings to aspect-based prototype scoring
- **Aspects**: 10 employability aspects (leadership, technical_skills, problem_solving, etc.)
- **Admin**: Runtime seed updates via REST API
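
As a rough illustration of what prototype-based aspect scoring means, the sketch below averages seed embeddings into one centroid per aspect and scores a text by cosine similarity to each centroid. The helper names (`build_centroid`, `aspect_scores`) and the toy 3-d vectors are assumptions for illustration; the actual scoring logic in `TextModuleV2` is not shown in this document and may differ.

```python
import numpy as np

def build_centroid(seed_vectors: np.ndarray) -> np.ndarray:
    """Average L2-normalized seed embeddings into one unit-length prototype."""
    normed = seed_vectors / np.linalg.norm(seed_vectors, axis=1, keepdims=True)
    centroid = normed.mean(axis=0)
    return centroid / np.linalg.norm(centroid)

def aspect_scores(text_vector: np.ndarray, centroids: dict) -> dict:
    """Cosine similarity between one text embedding and each aspect centroid."""
    unit = text_vector / np.linalg.norm(text_vector)
    return {name: float(unit @ c) for name, c in centroids.items()}

# Toy 3-d vectors stand in for real 768-d sentence embeddings (assumption).
centroids = {
    "leadership": build_centroid(np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])),
    "technical_skills": build_centroid(np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])),
}
scores = aspect_scores(np.array([0.8, 0.2, 0.0]), centroids)
best = max(scores, key=scores.get)  # aspect whose prototype is closest to the text
```

Normalizing before averaging keeps long and short seed phrases from weighting the centroid unevenly.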

## Configuration

### Model Selection
Set via environment variable or constructor:
```bash
export ASPECT_MODEL_NAME=all-mpnet-base-v2  # default
# or
export ASPECT_MODEL_NAME=all-MiniLM-L6-v2   # fallback
```

```python
from services.text_module_v2 import TextModuleV2

# Default (all-mpnet-base-v2)
text_module = TextModuleV2()

# Override model
text_module = TextModuleV2(model_name='all-MiniLM-L6-v2')
```
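
The document does not state which setting wins when both the environment variable and the constructor argument are given. The sketch below shows one plausible resolution order (explicit argument, then `ASPECT_MODEL_NAME`, then the default); `resolve_model_name` is a hypothetical helper, not part of the module.

```python
import os

DEFAULT_MODEL = "all-mpnet-base-v2"

def resolve_model_name(explicit=None, env=None):
    """Pick a model name: explicit argument, then env var, then the default.

    The precedence is an assumption; check the module source for the real order.
    """
    env = os.environ if env is None else env
    return explicit or env.get("ASPECT_MODEL_NAME") or DEFAULT_MODEL
```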

### Aspect Seeds
Seeds are loaded from `./aspect_seeds.json` (created by default). Edit this file to customize the aspect definitions.

**Location**: `analytics/backend/aspect_seeds.json`
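
Before editing the seed file by hand, it can help to validate it. The sketch below assumes the file uses the same shape as the `seeds` object in the Admin API responses (aspect name mapped to a list of phrases); `parse_seeds` is a hypothetical helper and the cleanup rules are illustrative only.

```python
import json

def parse_seeds(raw: str) -> dict:
    """Validate a seeds JSON document and return a cleaned copy."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("seeds must be a JSON object keyed by aspect name")
    cleaned = {}
    for aspect, phrases in data.items():
        if not isinstance(phrases, list) or not all(isinstance(p, str) for p in phrases):
            raise ValueError(f"{aspect!r} must map to a list of strings")
        # Strip whitespace, drop empty phrases, and de-duplicate while keeping order.
        unique = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
        if not unique:
            raise ValueError(f"{aspect!r} has no usable seed phrases")
        cleaned[aspect] = unique
    return cleaned
```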

### Centroids Cache
Pre-computed centroids are saved to `./aspect_centroids.npz` for fast cold starts.
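
A minimal sketch of how a centroid cache in `.npz` format can be written and read with NumPy, assuming one array per aspect keyed by the aspect name. The function names are hypothetical, and the sketch does not cover cache invalidation when seeds or the model change.

```python
import numpy as np

def save_centroids(path: str, centroids: dict) -> None:
    """Store one array per aspect, keyed by aspect name, in a single .npz file."""
    np.savez(path, **centroids)

def load_centroids(path: str) -> dict:
    """Read every array back into a plain dict."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}
```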

## Usage

### Basic Scoring
```python
from services.text_module_v2 import TextModuleV2

text_module = TextModuleV2()

text_responses = {
    'text_q1': "I developed ML pipelines using Python and scikit-learn...",
    'text_q2': "My career goal is to become a data scientist...",
    'text_q3': "I led a team of 5 students in a hackathon project..."
}

score, confidence, features = text_module.score(text_responses)

print(f"Score: {score:.2f}, Confidence: {confidence:.2f}")
print(f"Features: {features}")
```

### Get Current Seeds
```python
seeds = text_module.get_aspect_seeds()
print(f"Loaded {len(seeds)} aspects")
```

## Admin API

### Setup
```python
from flask import Flask
from services.text_module_v2 import TextModuleV2, register_admin_seed_endpoint

app = Flask(__name__)
text_module = TextModuleV2()

# Register admin endpoints
register_admin_seed_endpoint(app, text_module)

app.run(port=5001)
```

Set admin token:
```bash
export ADMIN_SEED_TOKEN=your-secret-token
```
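
For orientation, a token check of this kind is often implemented as a constant-time comparison between the `X-Admin-Token` request header and `ADMIN_SEED_TOKEN`. The sketch below is an assumption about how such a check could look, not the module's actual code; `is_authorized` is a hypothetical name.

```python
import hmac
import os

def is_authorized(headers: dict, env=None) -> bool:
    """Compare the X-Admin-Token header against ADMIN_SEED_TOKEN."""
    env = os.environ if env is None else env
    expected = env.get("ADMIN_SEED_TOKEN", "")
    supplied = headers.get("X-Admin-Token", "")
    if not expected:
        # Assumption: reject every request when no token is configured.
        return False
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```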

### Endpoints

#### GET /admin/aspect-seeds
Returns the currently loaded seeds.

**Request**:
```bash
curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:5001/admin/aspect-seeds
```

**Response**:
```json
{
  "success": true,
  "seeds": {
    "leadership": ["led a team", "managed project", ...],
    "technical_skills": [...]
  },
  "num_aspects": 10
}
```

#### POST /admin/aspect-seeds
Updates the aspect seeds and recomputes the centroids.

**Request**:
```bash
curl -X POST \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": {
      "leadership": [
        "led a team",
        "managed stakeholders",
        "organized events"
      ],
      "technical_skills": [
        "developed web API",
        "built ML models"
      ]
    },
    "persist": true
  }' \
  http://localhost:5001/admin/aspect-seeds
```

**Response**:
```json
{
  "success": true,
  "message": "Aspect seeds updated successfully",
  "stats": {
    "num_aspects": 2,
    "avg_seed_count": 2.5,
    "timestamp": "2025-12-09T10:30:00Z"
  }
}
```
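
The `stats` values in the response above are consistent with the request body if `avg_seed_count` is the mean number of seed phrases per aspect: three phrases for `leadership` and two for `technical_skills`. The sketch below reproduces that arithmetic; `seed_stats` is a hypothetical helper and the interpretation is inferred from the example, not confirmed by the module.

```python
def seed_stats(seeds: dict) -> dict:
    """Count aspects and average the number of seed phrases per aspect."""
    counts = [len(phrases) for phrases in seeds.values()]
    return {
        "num_aspects": len(counts),
        "avg_seed_count": sum(counts) / len(counts) if counts else 0.0,
    }

request_seeds = {
    "leadership": ["led a team", "managed stakeholders", "organized events"],
    "technical_skills": ["developed web API", "built ML models"],
}
stats = seed_stats(request_seeds)  # (3 + 2) / 2 seeds per aspect
```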

## Advanced: Seed Expansion

Suggest new seed phrases from a corpus:

```python
corpus = [
    "I led the product development team and managed stakeholders",
    "Implemented CI/CD pipelines for automated testing",
    # ... more texts
]

suggestions = text_module.suggest_seed_expansions(
    corpus_texts=corpus,
    aspect_key='leadership',
    top_n=20
)

print("Suggested seeds:", suggestions)
```

## Aspect → Question Mapping

```python
from services.text_module_v2 import get_relevant_aspects_for_question

# Q1: Strengths & skills
aspects_q1 = get_relevant_aspects_for_question('text_q1')
# ['technical_skills', 'problem_solving', 'learning_agility', 'initiative', 'communication']

# Q2: Career interests
aspects_q2 = get_relevant_aspects_for_question('text_q2')
# ['career_alignment', 'learning_agility', 'initiative', 'communication']

# Q3: Extracurriculars & leadership
aspects_q3 = get_relevant_aspects_for_question('text_q3')
# ['leadership', 'teamwork', 'project_execution', 'internships_experience', 'communication']
```
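
The mapping above can be pictured as a plain lookup table. The sketch below copies the three lists into a dictionary to show that the questions together cover all 10 aspects and that `communication` is the only aspect scored on every question. `QUESTION_ASPECTS` and `aspects_for` are hypothetical names for illustration, and the empty-list fallback for unknown keys is an assumption.

```python
QUESTION_ASPECTS = {
    "text_q1": ["technical_skills", "problem_solving", "learning_agility",
                "initiative", "communication"],
    "text_q2": ["career_alignment", "learning_agility", "initiative",
                "communication"],
    "text_q3": ["leadership", "teamwork", "project_execution",
                "internships_experience", "communication"],
}

def aspects_for(question_key: str) -> list:
    """Return the aspects for a question, or an empty list if the key is unknown."""
    return list(QUESTION_ASPECTS.get(question_key, []))

all_aspects = set().union(*QUESTION_ASPECTS.values())
shared = set.intersection(*(set(v) for v in QUESTION_ASPECTS.values()))
```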

## Files

| File | Purpose |
|------|---------|
| `services/text_module_v2.py` | Main module implementation |
| `aspect_seeds.json` | Aspect seed definitions (editable) |
| `aspect_centroids.npz` | Cached centroids (auto-generated) |

## Performance

- **Model Load**: ~3s (first time)
- **Centroid Build**: ~1s for 10 aspects with 20 seeds each
- **Text Scoring**: ~200-500ms per 3-question set (CPU)

## Logging

The module logs through Python's standard `logging` module:
```python
import logging
logging.basicConfig(level=logging.INFO)
```

Key events logged:
- Model loading
- Seed updates (with masked token)
- Centroid recomputation
- File I/O operations
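
The exact masking format used for the token in seed-update logs is not documented here. A minimal sketch of one common approach, keeping only the last few characters visible, is shown below; `mask_token` and its `visible` parameter are hypothetical.

```python
import logging

logger = logging.getLogger("text_module_v2.example")  # hypothetical logger name

def mask_token(token: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

def log_seed_update(token: str, num_aspects: int) -> None:
    """Log a seed update without writing the full admin token."""
    logger.info("Seeds updated (%d aspects) by token %s", num_aspects, mask_token(token))
```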