# Text Module V2 - Aspect-Based Scoring

## Overview
Text Module V2 scores free-text responses with prototype-based aspect extraction, using `all-mpnet-base-v2` sentence embeddings.

## Changes from V1
- **Model**: Upgraded from `all-MiniLM-L6-v2` (384d) to `all-mpnet-base-v2` (768d)
- **Approach**: Moved from simple reference embeddings to aspect-based prototype scoring
- **Aspects**: 10 employability aspects (leadership, technical_skills, problem_solving, etc.)
- **Admin**: Runtime seed updates via REST API
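
To make "aspect-based prototype scoring" concrete, here is a minimal sketch of one common way to do it: average each aspect's seed embeddings into a centroid, then compare a text embedding to every centroid by cosine similarity. Whether `TextModuleV2` works exactly this way is an assumption; the helper names `build_centroids` and `aspect_scores` are hypothetical, and toy 3-d vectors stand in for real 768-d embeddings.

```python
import numpy as np

def build_centroids(seed_embeddings):
    """Average each aspect's seed vectors and L2-normalize the mean."""
    centroids = {}
    for aspect, vectors in seed_embeddings.items():
        mean = np.mean(np.asarray(vectors, dtype=float), axis=0)
        centroids[aspect] = mean / np.linalg.norm(mean)
    return centroids

def aspect_scores(text_embedding, centroids):
    """Cosine similarity between one text vector and every aspect centroid."""
    vec = np.asarray(text_embedding, dtype=float)
    vec = vec / np.linalg.norm(vec)
    return {aspect: float(vec @ c) for aspect, c in centroids.items()}

# Toy vectors (assumption): real embeddings would come from the sentence model.
toy_seeds = {
    "leadership": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "technical_skills": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}
centroids = build_centroids(toy_seeds)
scores = aspect_scores([0.8, 0.2, 0.0], centroids)
best_aspect = max(scores, key=scores.get)
print(best_aspect, scores)
```

Normalizing both the centroids and the text vector keeps every score in [-1, 1], which makes scores comparable across aspects with different numbers of seeds.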

## Configuration

### Model Selection
Set the model via the `ASPECT_MODEL_NAME` environment variable or the `model_name` constructor argument:
```bash
export ASPECT_MODEL_NAME=all-mpnet-base-v2  # default
# or
export ASPECT_MODEL_NAME=all-MiniLM-L6-v2   # fallback
```

```python
from services.text_module_v2 import TextModuleV2

# Default (all-mpnet-base-v2)
text_module = TextModuleV2()

# Override model
text_module = TextModuleV2(model_name='all-MiniLM-L6-v2')
```
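
The two configuration paths above imply some precedence rule. The sketch below assumes the constructor argument wins over the environment variable, which in turn wins over the default. That ordering is an assumption, not confirmed behavior, and `resolve_model_name` is a hypothetical helper.

```python
import os

DEFAULT_MODEL = "all-mpnet-base-v2"

def resolve_model_name(explicit=None, env=None):
    """Pick a model name: explicit argument, then ASPECT_MODEL_NAME, then default."""
    env = os.environ if env is None else env
    if explicit:
        return explicit
    return env.get("ASPECT_MODEL_NAME", DEFAULT_MODEL)

# An injected mapping replaces os.environ so the example is deterministic.
print(resolve_model_name(env={}))
print(resolve_model_name(env={"ASPECT_MODEL_NAME": "all-MiniLM-L6-v2"}))
print(resolve_model_name("all-MiniLM-L6-v2", env={"ASPECT_MODEL_NAME": "all-mpnet-base-v2"}))
```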

### Aspect Seeds
Seeds are loaded from `./aspect_seeds.json` (created by default). Edit this file to customize the aspect definitions.

**Location**: `analytics/backend/aspect_seeds.json`

### Centroids Cache
Precomputed centroids are saved to `./aspect_centroids.npz` for fast cold starts.
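
As an illustration of how a `.npz` cache of this kind can work, the sketch below stores one array per aspect with `numpy.savez` and reads them back with `numpy.load`. The layout (one array keyed by aspect name) and the helpers `save_centroids` and `load_centroids` are assumptions; the module's actual file format is not documented here.

```python
import os
import tempfile

import numpy as np

def save_centroids(path, centroids):
    """Write one named array per aspect into a single .npz archive."""
    np.savez(path, **centroids)

def load_centroids(path):
    """Read every array back into a plain dict keyed by aspect name."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}

# Toy centroids (assumption): real ones would be 768-d vectors.
centroids = {
    "leadership": np.array([0.9, 0.1, 0.0]),
    "technical_skills": np.array([0.1, 0.9, 0.0]),
}

with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "aspect_centroids.npz")
    save_centroids(cache_path, centroids)
    restored = load_centroids(cache_path)

print(sorted(restored))
```

A cache like this only helps if it is rebuilt whenever the seeds or the model change, so a real implementation would need some way to detect a stale file.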

## Usage

### Basic Scoring
```python
from services.text_module_v2 import TextModuleV2

text_module = TextModuleV2()

text_responses = {
    'text_q1': "I developed ML pipelines using Python and scikit-learn...",
    'text_q2': "My career goal is to become a data scientist...",
    'text_q3': "I led a team of 5 students in a hackathon project..."
}

score, confidence, features = text_module.score(text_responses)

print(f"Score: {score:.2f}, Confidence: {confidence:.2f}")
print(f"Features: {features}")
```

### Get Current Seeds
```python
seeds = text_module.get_aspect_seeds()
print(f"Loaded {len(seeds)} aspects")
```

## Admin API

### Setup
```python
from flask import Flask
from services.text_module_v2 import TextModuleV2, register_admin_seed_endpoint

app = Flask(__name__)
text_module = TextModuleV2()

# Register admin endpoints
register_admin_seed_endpoint(app, text_module)

app.run(port=5001)
```

Set the admin token that requests must send in the `X-Admin-Token` header:
```bash
export ADMIN_SEED_TOKEN=your-secret-token
```
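
How the module validates the token is not shown here. As a hedged sketch, a header check of this kind is usually done with a constant-time comparison so that response timing does not leak the secret. `is_authorized` is a hypothetical helper, not part of `text_module_v2`.

```python
import hmac

def is_authorized(header_value, expected_token):
    """Return True only when the supplied header matches the configured token."""
    # Reject when no token is configured or no header was sent.
    if not expected_token or header_value is None:
        return False
    # compare_digest avoids early exit on the first differing byte.
    return hmac.compare_digest(
        header_value.encode("utf-8"), expected_token.encode("utf-8")
    )

print(is_authorized("your-secret-token", "your-secret-token"))
print(is_authorized("wrong-token", "your-secret-token"))
print(is_authorized(None, "your-secret-token"))
```

Refusing all requests when `ADMIN_SEED_TOKEN` is unset is a design choice in this sketch: it fails closed rather than leaving the endpoint open.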

### Endpoints

#### GET /admin/aspect-seeds
Returns the currently loaded seeds.

**Request**:
```bash
curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:5001/admin/aspect-seeds
```

**Response**:
```json
{
  "success": true,
  "seeds": {
    "leadership": ["led a team", "managed project", ...],
    "technical_skills": [...]
  },
  "num_aspects": 10
}
```

#### POST /admin/aspect-seeds
Updates the aspect seeds and recomputes the centroids.

**Request**:
```bash
curl -X POST \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": {
      "leadership": [
        "led a team",
        "managed stakeholders",
        "organized events"
      ],
      "technical_skills": [
        "developed web API",
        "built ML models"
      ]
    },
    "persist": true
  }' \
  http://localhost:5001/admin/aspect-seeds
```

**Response**:
```json
{
  "success": true,
  "message": "Aspect seeds updated successfully",
  "stats": {
    "num_aspects": 2,
    "avg_seed_count": 2.5,
    "timestamp": "2025-12-09T10:30:00Z"
  }
}
```
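
The `stats` values in the response follow from the request payload: two aspects with three and two seeds average 2.5 seeds per aspect. A minimal sketch of that arithmetic, with `seed_stats` as a hypothetical helper (the `timestamp` field is omitted):

```python
def seed_stats(seeds):
    """Count aspects and the mean number of seed phrases per aspect."""
    counts = [len(phrases) for phrases in seeds.values()]
    avg = sum(counts) / len(counts) if counts else 0.0
    return {"num_aspects": len(counts), "avg_seed_count": avg}

# Same seeds as the POST request above.
payload_seeds = {
    "leadership": ["led a team", "managed stakeholders", "organized events"],
    "technical_skills": ["developed web API", "built ML models"],
}
stats = seed_stats(payload_seeds)
print(stats)  # {'num_aspects': 2, 'avg_seed_count': 2.5}
```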

## Advanced: Seed Expansion

Suggest new seed phrases from a corpus:

```python
corpus = [
    "I led the product development team and managed stakeholders",
    "Implemented CI/CD pipelines for automated testing",
    # ... more texts
]

suggestions = text_module.suggest_seed_expansions(
    corpus_texts=corpus,
    aspect_key='leadership',
    top_n=20
)

print("Suggested seeds:", suggestions)
```

## Aspect → Question Mapping

```python
from services.text_module_v2 import get_relevant_aspects_for_question

# Q1: Strengths & skills
aspects_q1 = get_relevant_aspects_for_question('text_q1')
# ['technical_skills', 'problem_solving', 'learning_agility', 'initiative', 'communication']

# Q2: Career interests
aspects_q2 = get_relevant_aspects_for_question('text_q2')
# ['career_alignment', 'learning_agility', 'initiative', 'communication']

# Q3: Extracurriculars & leadership
aspects_q3 = get_relevant_aspects_for_question('text_q3')
# ['leadership', 'teamwork', 'project_execution', 'internships_experience', 'communication']
```

## Files

| File | Purpose |
|------|---------|
| `services/text_module_v2.py` | Main module implementation |
| `aspect_seeds.json` | Aspect seed definitions (editable) |
| `aspect_centroids.npz` | Cached centroids (auto-generated) |

## Performance

- **Model Load**: ~3s (first time)
- **Centroid Build**: ~1s for 10 aspects with 20 seeds each
- **Text Scoring**: ~200-500ms per 3-question set (CPU)

## Logging

The module logs through Python's standard `logging` package. To see INFO-level messages:
```python
import logging
logging.basicConfig(level=logging.INFO)
```

Key events logged:
- Model loading
- Seed updates (the admin token is masked in the log)
- Centroid recomputation
- File I/O operations
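
For reference, a minimal sketch of what masking a token before logging can look like. The exact mask format the module uses is not documented here, so both `mask_token` and the logger name are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
# Hypothetical logger name, used only for this example.
logger = logging.getLogger("text_module_v2_example")

def mask_token(token, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    if not token:
        return "<missing>"
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

masked = mask_token("your-secret-token")
logger.info("Seed update requested with token %s", masked)
```

Keeping a short suffix visible lets an operator tell tokens apart in the logs without exposing the full secret.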