Text Module V2 - Aspect-Based Scoring

Overview

Enhanced text analysis using prototype-based aspect extraction with all-mpnet-base-v2 embeddings.

Changes from V1

  • Model: Upgraded from all-MiniLM-L6-v2 (384d) to all-mpnet-base-v2 (768d)
  • Approach: Moved from simple reference embeddings to aspect-based prototype scoring
  • Aspects: 10 employability aspects (leadership, technical_skills, problem_solving, etc.)
  • Admin: Runtime seed updates via REST API
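As a rough illustration of what prototype-based aspect scoring means, the sketch below averages seed embeddings into one centroid per aspect and scores a text by cosine similarity to each centroid. It uses toy 3-dimensional vectors in place of real all-mpnet-base-v2 embeddings, and the helper names (`build_centroids`, `score_aspects`) are hypothetical; the actual logic in text_module_v2.py may differ.

```python
import numpy as np

def _normalize(v):
    """Scale vectors to unit length along the last axis."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.clip(norm, 1e-12, None)

def build_centroids(seed_embeddings):
    """Average each aspect's normalized seed embeddings into one unit-length prototype."""
    return {
        aspect: _normalize(_normalize(vectors).mean(axis=0))
        for aspect, vectors in seed_embeddings.items()
    }

def score_aspects(text_embedding, centroids):
    """Cosine similarity between one text embedding and every aspect centroid."""
    text_vec = _normalize(text_embedding)
    return {aspect: float(text_vec @ c) for aspect, c in centroids.items()}

# Toy stand-ins for seed-phrase embeddings (assumption: real ones would be 768-d).
seed_embeddings = {
    "leadership": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "technical_skills": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
centroids = build_centroids(seed_embeddings)
scores = score_aspects([0.8, 0.2, 0.1], centroids)
best_aspect = max(scores, key=scores.get)
print(best_aspect, scores)
```

Normalizing before averaging keeps a single long seed vector from dominating its aspect's centroid.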

Configuration

Model Selection

Set the model via the ASPECT_MODEL_NAME environment variable or the model_name constructor argument:

export ASPECT_MODEL_NAME=all-mpnet-base-v2  # default
# or
export ASPECT_MODEL_NAME=all-MiniLM-L6-v2   # fallback

from services.text_module_v2 import TextModuleV2

# Default (all-mpnet-base-v2)
text_module = TextModuleV2()

# Override model
text_module = TextModuleV2(model_name='all-MiniLM-L6-v2')

Aspect Seeds

Seeds are loaded from ./aspect_seeds.json (the file is created by default). Edit this file to customize the aspect definitions.

Location: analytics/backend/aspect_seeds.json
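Because the seed file is meant to be edited by hand, it can help to check it before loading. The sketch below assumes the file maps each aspect name to a list of seed phrases, which is the shape shown in the Admin API responses; `validate_seeds` is a hypothetical helper and not part of the module.

```python
import json

def validate_seeds(seeds):
    """Check that seeds is a non-empty {aspect: [non-empty strings]} mapping."""
    if not isinstance(seeds, dict) or not seeds:
        raise ValueError("seeds must be a non-empty object")
    for aspect, phrases in seeds.items():
        if not isinstance(phrases, list) or not phrases:
            raise ValueError(f"aspect {aspect!r} needs at least one seed phrase")
        if not all(isinstance(p, str) and p.strip() for p in phrases):
            raise ValueError(f"aspect {aspect!r} contains an empty or non-string seed")
    return seeds

# Inline JSON stands in for the contents of aspect_seeds.json.
raw = '{"leadership": ["led a team", "managed project"], "technical_skills": ["built ML models"]}'
seeds = validate_seeds(json.loads(raw))
print(f"Validated {len(seeds)} aspects")
```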

Centroids Cache

Pre-computed centroids are saved to ./aspect_centroids.npz so that cold starts are fast.
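A minimal sketch of how an .npz centroid cache can be written and read with NumPy is shown below. The one-array-per-aspect layout and the helper names (`save_centroids`, `load_centroids`) are assumptions for illustration; the module's actual archive layout is not documented here.

```python
import os
import tempfile
import numpy as np

def save_centroids(path, centroids):
    """Write one array per aspect into a single .npz archive."""
    np.savez(path, **centroids)

def load_centroids(path):
    """Return {aspect: centroid} from the archive, or None if no cache exists."""
    if not os.path.exists(path):
        return None
    with np.load(path) as archive:
        return {aspect: archive[aspect] for aspect in archive.files}

# Use a temporary directory so the demo does not touch a real cache file.
cache_path = os.path.join(tempfile.mkdtemp(), "aspect_centroids.npz")
centroids = {"leadership": np.array([0.6, 0.8]), "teamwork": np.array([1.0, 0.0])}
save_centroids(cache_path, centroids)
restored = load_centroids(cache_path)
print(sorted(restored))
```

A cache like this is only valid for the seeds and model it was built from, so a real implementation would likely need to rebuild it when either changes.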

Usage

Basic Scoring

text_module = TextModuleV2()

text_responses = {
    'text_q1': "I developed ML pipelines using Python and scikit-learn...",
    'text_q2': "My career goal is to become a data scientist...",
    'text_q3': "I led a team of 5 students in a hackathon project..."
}

score, confidence, features = text_module.score(text_responses)

print(f"Score: {score:.2f}, Confidence: {confidence:.2f}")
print(f"Features: {features}")

Get Current Seeds

seeds = text_module.get_aspect_seeds()
print(f"Loaded {len(seeds)} aspects")

Admin API

Setup

from flask import Flask
from services.text_module_v2 import TextModuleV2, register_admin_seed_endpoint

app = Flask(__name__)
text_module = TextModuleV2()

# Register admin endpoints
register_admin_seed_endpoint(app, text_module)

app.run(port=5001)

Set the admin token:

export ADMIN_SEED_TOKEN=your-secret-token
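The endpoints below expect the same value in the X-Admin-Token header. How the module compares the two is not shown in this README; the sketch below is one common way to do such a check, using a constant-time comparison and rejecting requests when no token is configured. `is_authorized` is a hypothetical helper.

```python
import hmac
import os

def is_authorized(provided_token, expected_token=None):
    """Return True only when a non-empty expected token matches the provided one."""
    if expected_token is None:
        expected_token = os.environ.get("ADMIN_SEED_TOKEN", "")
    if not expected_token or not provided_token:
        # Reject when either side is empty instead of allowing open access.
        return False
    # compare_digest avoids leaking match position through timing differences.
    return hmac.compare_digest(provided_token.encode(), expected_token.encode())

print(is_authorized("your-secret-token", "your-secret-token"))
print(is_authorized("wrong-token", "your-secret-token"))
```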

Endpoints

GET /admin/aspect-seeds

Returns the currently loaded seeds.

Request:

curl -H "X-Admin-Token: your-secret-token" \
  http://localhost:5001/admin/aspect-seeds

Response:

{
  "success": true,
  "seeds": {
    "leadership": ["led a team", "managed project", ...],
    "technical_skills": [...]
  },
  "num_aspects": 10
}

POST /admin/aspect-seeds

Updates the aspect seeds and recomputes the centroids.

Request:

curl -X POST \
  -H "X-Admin-Token: your-secret-token" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": {
      "leadership": [
        "led a team",
        "managed stakeholders",
        "organized events"
      ],
      "technical_skills": [
        "developed web API",
        "built ML models"
      ]
    },
    "persist": true
  }' \
  http://localhost:5001/admin/aspect-seeds

Response:

{
  "success": true,
  "message": "Aspect seeds updated successfully",
  "stats": {
    "num_aspects": 2,
    "avg_seed_count": 2.5,
    "timestamp": "2025-12-09T10:30:00Z"
  }
}
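The stats in this response are consistent with the request above: two aspects with 3 and 2 seeds give an average of (3 + 2) / 2 = 2.5. A minimal sketch of how such a summary could be computed follows; `summarize_seeds` is a hypothetical helper, not necessarily how the module builds its response.

```python
from datetime import datetime, timezone

def summarize_seeds(seeds):
    """Compute summary stats for a {aspect: [seed phrases]} mapping."""
    counts = [len(phrases) for phrases in seeds.values()]
    return {
        "num_aspects": len(counts),
        "avg_seed_count": sum(counts) / len(counts) if counts else 0.0,
        # UTC timestamp in the same ISO 8601 style as the example response.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

seeds = {
    "leadership": ["led a team", "managed stakeholders", "organized events"],
    "technical_skills": ["developed web API", "built ML models"],
}
stats = summarize_seeds(seeds)
print(stats)
```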

Advanced: Seed Expansion

Suggest new seed phrases from a corpus:

corpus = [
    "I led the product development team and managed stakeholders",
    "Implemented CI/CD pipelines for automated testing",
    # ... more texts
]

suggestions = text_module.suggest_seed_expansions(
    corpus_texts=corpus,
    aspect_key='leadership',
    top_n=20
)

print("Suggested seeds:", suggestions)

Question → Aspect Mapping

from services.text_module_v2 import get_relevant_aspects_for_question

# Q1: Strengths & skills
aspects_q1 = get_relevant_aspects_for_question('text_q1')
# ['technical_skills', 'problem_solving', 'learning_agility', 'initiative', 'communication']

# Q2: Career interests
aspects_q2 = get_relevant_aspects_for_question('text_q2')
# ['career_alignment', 'learning_agility', 'initiative', 'communication']

# Q3: Extracurriculars & leadership
aspects_q3 = get_relevant_aspects_for_question('text_q3')
# ['leadership', 'teamwork', 'project_execution', 'internships_experience', 'communication']

Files

| File | Purpose |
|------|---------|
| services/text_module_v2.py | Main module implementation |
| aspect_seeds.json | Aspect seed definitions (editable) |
| aspect_centroids.npz | Cached centroids (auto-generated) |

Performance

  • Model Load: ~3s (first time)
  • Centroid Build: ~1s for 10 aspects with 20 seeds each
  • Text Scoring: ~200-500ms per 3-question set (CPU)

Logging

The module logs through Python's standard logging module:

import logging
logging.basicConfig(level=logging.INFO)

Key events logged:

  • Model loading
  • Seed updates (with masked token)
  • Centroid recomputation
  • File I/O operations
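For the masked token mentioned above, the exact masking format used by the module is not specified. The sketch below shows one plausible approach that keeps only the last few characters visible; `mask_token` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("text_module_v2_demo")  # hypothetical logger name

def mask_token(token, visible=4):
    """Hide all but the last few characters of a secret for log output."""
    if not token:
        return "<none>"
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]

logging.basicConfig(level=logging.INFO)
logger.info("Seed update requested (token=%s)", mask_token("your-secret-token"))
```

Masking at the call site keeps the raw token out of every handler, including file and remote log sinks.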