metadata
title: TalentPulse AI Candidate Matching
emoji: 
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860

TalentPulse: AI-Powered Candidate Matching System

Overview

TalentPulse is a production-grade, full-stack AI system for matching job descriptions against large candidate pools. It replaces manual resume screening with semantic retrieval, neural reranking, structured gap analysis, and LLM-generated explanations.

The platform is built for recruiters and hiring teams who need fast, explainable, and configurable candidate matching. It supports session-based candidate batches, dynamic scoring weights, trajectory analysis, and reusable matching workflows for A/B testing and precision hiring.

Key Features

Session-based Candidate Management

Group candidates into named sessions for isolated workflows and repeatable matching experiments.

Two-stage AI Matching Pipeline

  • Stage 1: Retrieval — Fast vector search in Qdrant with structured scoring for skills, experience, and other signals.
  • Stage 2: Reranking: Cross-encoder reranking of the shortlist, with the resulting rankings combined using Reciprocal Rank Fusion (RRF).
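
As a rough illustration of how rankings from the two stages might be combined, the sketch below applies Reciprocal Rank Fusion to two ranked lists of candidate IDs. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample IDs are assumptions for illustration and are not taken from the TalentPulse codebase.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, candidate_id in enumerate(ranking, start=1):
            scores[candidate_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical orderings from vector retrieval and the cross-encoder.
retrieval_order = ["c1", "c2", "c3"]
reranker_order = ["c2", "c3", "c1"]
fused = reciprocal_rank_fusion([retrieval_order, reranker_order])
```

Because RRF uses only rank positions, it does not require the retrieval and reranker scores to be on the same scale.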

Live Weight Sliders

Adjust matching priorities in real time and rerank results in memory without running new model inference.
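
One way to picture reranking without new inference: if each candidate keeps its per-signal scores from the first run, changing the sliders only requires recomputing a weighted average. The sketch below assumes hypothetical signal names (`skills`, `experience`, `trajectory`) and a hypothetical `rerank_with_weights` helper; the actual scoring formula may differ.

```python
def rerank_with_weights(candidates, weights):
    """Re-order candidates using cached component scores and new weights.

    No model is called: each candidate already carries its per-signal scores.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def combined(candidate):
        # Weighted average over the signals named in `weights`.
        return sum(
            weights[name] * candidate["scores"].get(name, 0.0) for name in weights
        ) / total

    return sorted(candidates, key=combined, reverse=True)


# Hypothetical cached scores in [0, 1] for two candidates.
cached = [
    {"id": "c1", "scores": {"skills": 0.9, "experience": 0.4, "trajectory": 0.5}},
    {"id": "c2", "scores": {"skills": 0.6, "experience": 0.9, "trajectory": 0.7}},
]
skills_first = rerank_with_weights(
    cached, {"skills": 0.8, "experience": 0.1, "trajectory": 0.1}
)
experience_first = rerank_with_weights(
    cached, {"skills": 0.2, "experience": 0.6, "trajectory": 0.2}
)
```

With these sample numbers, emphasizing skills puts `c1` on top, while emphasizing experience puts `c2` on top.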

Structured Gap Analysis

Detect missing skills, experience gaps, and mismatches to generate grounded candidate explanations.
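
A minimal sketch of what a structured gap report could look like, assuming skills are compared as case-insensitive strings and experience as a year count. The `analyze_gaps` function and the field names are hypothetical.

```python
def analyze_gaps(required_skills, min_years, candidate):
    """Return a structured gap report for one candidate."""
    have = {skill.strip().lower() for skill in candidate["skills"]}
    need = {skill.strip().lower() for skill in required_skills}
    return {
        "matched_skills": sorted(need & have),
        "missing_skills": sorted(need - have),
        # Zero when the candidate meets or exceeds the requirement.
        "experience_gap_years": max(0, min_years - candidate["years_experience"]),
    }


# Hypothetical JD requirements and candidate profile.
report = analyze_gaps(
    required_skills=["Python", "FastAPI", "Kubernetes"],
    min_years=5,
    candidate={"skills": ["python", "fastapi", "Docker"], "years_experience": 3},
)
```

Keeping the report as plain data makes it easy to store with a match result and to reuse when generating explanations.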

LLM-generated Explanations

Generate natural-language explanations with a Groq-hosted LLM, grounded in the precomputed gap analysis.
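
To show what grounding in the gap analysis might mean in practice, the sketch below formats precomputed facts into a prompt and asks the model to stay within them. The `build_explanation_prompt` helper, the report fields, and the prompt wording are assumptions; the call to Groq itself is omitted.

```python
def build_explanation_prompt(jd_title, gap_report):
    """Format precomputed gap facts into a prompt; no LLM call happens here."""
    matched = ", ".join(gap_report["matched_skills"]) or "none"
    missing = ", ".join(gap_report["missing_skills"]) or "none"
    lines = [
        f"Role: {jd_title}",
        f"Matched skills: {matched}",
        f"Missing skills: {missing}",
        f"Experience gap (years): {gap_report['experience_gap_years']}",
        "Explain this candidate's fit in three sentences.",
        "Use only the facts listed above.",
    ]
    return "\n".join(lines)


# Hypothetical gap report for one candidate.
prompt = build_explanation_prompt(
    "Backend Engineer",
    {"matched_skills": ["python"], "missing_skills": [], "experience_gap_years": 1},
)
```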

Trajectory Scoring

Estimate career growth velocity from work history and reward strong advancement patterns.
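
The README does not define growth velocity, so the sketch below uses one plausible reading: seniority levels gained per year between the first and most recent role. The level mapping, the `growth_velocity` function, and the input shape are all hypothetical.

```python
# Hypothetical mapping from role level to a numeric rank.
SENIORITY_LEVELS = {"intern": 0, "junior": 1, "mid": 2, "senior": 3, "lead": 4}


def growth_velocity(work_history):
    """Seniority levels gained per year between the first and last role.

    `work_history` is a list of (start_year, level_name) tuples.
    """
    if len(work_history) < 2:
        return 0.0
    ordered = sorted(work_history)  # oldest role first
    (first_year, first_level), (last_year, last_level) = ordered[0], ordered[-1]
    years = last_year - first_year
    if years <= 0:
        return 0.0
    gained = SENIORITY_LEVELS[last_level] - SENIORITY_LEVELS[first_level]
    return gained / years


# Junior in 2018 to senior in 2023: two levels over five years.
velocity = growth_velocity([(2018, "junior"), (2020, "mid"), (2023, "senior")])
```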

JD Quality Feedback

Evaluate job descriptions for clarity, breadth, and missing signals.

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Next.js 16, React, Tailwind CSS v4 |
| Backend | FastAPI, Uvicorn |
| Database | Neon Postgres, Asyncpg, SQLAlchemy, Alembic |
| Vector Search | Qdrant Cloud |
| Async Jobs | Celery |
| Cache | Redis Cloud |
| Embeddings | BAAI/bge-small-en-v1.5 via SentenceTransformers |
| Reranking | BAAI/bge-reranker-v2-m3 via FlagEmbedding |
| LLM Provider | Groq (llama-3.3-70b-versatile) |
| Deployment | Docker, Nginx, Supervisord, HuggingFace Spaces |

Architecture Overview

graph TD
    UI[Next.js Frontend] -->|REST API| Proxy[Nginx Reverse Proxy]
    Proxy --> API[FastAPI Backend]

    API -->|Async Tasks| Queue[Redis / Celery Queue]
    Queue --> Worker[Celery Workers]

    API -->|Read / Write| DB[(Neon Postgres)]
    Worker -->|Persist Metadata| DB

    API -->|Vector Search| VectorDB[(Qdrant Cloud)]
    Worker -->|Store Embeddings| VectorDB

    API -->|In-Memory Rerank| LocalAI[Local Reranker Model]
    API -->|LLM Explanations| LLM[Groq API]
    Worker -->|LLM Jobs| LLM

Project Structure

/
├── backend/
│   ├── alembic/
│   ├── src/
│   │   ├── matching/
│   │   ├── ml/
│   │   ├── models/
│   │   ├── routers/
│   │   ├── schemas/
│   │   └── workers/
│   ├── main.py
│   └── requirements.txt
├── frontend/
│   ├── public/
│   ├── src/
│   │   ├── app/
│   │   └── lib/
│   ├── next.config.ts
│   └── globals.css
├── docker-compose.yml
├── Dockerfile
├── supervisord.conf
└── nginx.conf

Core Modules & Responsibilities

Backend

  • backend/src/ml: Handles model loading, text embedding, and feature extraction.

  • backend/src/matching: Implements retrieval, reranking, weighted scoring, and explanation logic.

  • backend/src/workers: Runs background jobs such as candidate ingestion and explanation generation.

  • backend/src/routers: Exposes API endpoints for sessions, JDs, candidates, matching, and health checks.

Frontend

  • frontend/src/app: Contains user-facing routes such as sessions, JD details, and pipeline orchestration.

  • frontend/src/lib: Contains centralized API client wrappers.

Application Flows

Candidate Upload & Ingestion Flow

sequenceDiagram
    actor User
    participant UI as Next.js UI
    participant API as FastAPI Router
    participant Queue as Redis / Celery Queue
    participant Worker as Celery Worker
    participant Store as Postgres + Qdrant

    User->>UI: Upload candidate CSV/JSON
    UI->>API: POST /api/candidates/upload
    API->>Queue: Dispatch ingest_candidates_batch
    API-->>UI: Return task ID
    UI->>API: Poll /api/candidates/status/{task_id}
    Worker->>Queue: Fetch task
    Worker->>Worker: Parse candidate data
    Worker->>Worker: Compute embeddings and growth velocity
    Worker->>Store: Save metadata and vector points
    Worker-->>Queue: Mark task complete
    API-->>UI: Return success status
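
The polling step in this flow can be sketched as a small loop. The example below is written in Python for consistency with the backend, although the real poller lives in the Next.js UI. The `wait_for_task` helper and the `{"state": ...}` response shape are assumptions; the state names follow Celery's built-in task states. The HTTP call is injected as `fetch_status` so the sketch runs without a server.

```python
import time


def wait_for_task(fetch_status, task_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll until the task reaches a terminal state or the timeout elapses.

    `fetch_status(task_id)` should return a dict with a "state" key; a real
    client would call GET /api/candidates/status/{task_id} here.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status["state"] in ("SUCCESS", "FAILURE"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status['state']} after {timeout}s")
        sleep(interval)


# Stand-in responses so the sketch runs offline.
_responses = iter([{"state": "PENDING"}, {"state": "STARTED"}, {"state": "SUCCESS"}])
result = wait_for_task(lambda task_id: next(_responses), "demo-task", sleep=lambda s: None)
```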

Matching & Reranking Flow

sequenceDiagram
    actor User
    participant UI as Next.js UI
    participant API as FastAPI Router
    participant Qdrant as Vector DB
    participant Reranker as Local Reranker
    participant Cache as Redis Cache

    User->>UI: Open JD and click Match
    UI->>API: POST /api/match/{jd_id}
    API->>Qdrant: Retrieve top candidates
    Qdrant-->>API: Return top-K vectors
    API->>Reranker: Cross-encoder reranking
    Reranker-->>API: Return adjusted scores
    API->>API: Apply rank fusion and weights
    API->>Cache: Store result
    API-->>UI: Return ranked candidates

    User->>UI: Adjust weight sliders
    UI->>API: POST /api/match/{jd_id}/rerank
    API->>API: Recompute ranking in memory
    API-->>UI: Return updated ordering

Explain & Refine Flow

sequenceDiagram
    actor User
    participant UI as Next.js UI
    participant API as FastAPI Router
    participant DB as Postgres
    participant LLM as Groq API

    User->>UI: Open candidate match details
    UI->>API: POST /api/match/{jd_id}/candidates/{candidate_id}/explain
    API->>DB: Load match data and gap analysis
    API->>LLM: Generate grounded explanation
    LLM-->>API: Return explanation text
    API-->>UI: Show explanation to user

API Documentation

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/sessions | Create a candidate session |
| GET | /api/sessions | List sessions |
| POST | /api/jds | Create a job description |
| GET | /api/jds | List job descriptions |
| POST | /api/candidates/upload?session_id= | Upload candidate files |
| GET | /api/candidates/status/{task_id} | Check task progress |
| POST | /api/match/{jd_id}?session_id= | Run full matching pipeline |
| POST | /api/match/{jd_id}/rerank | Rerank in memory |
| POST | /api/match/{jd_id}/candidates/{candidate_id}/explain | Generate explanation |
| GET | /health | Health check |

Database Models

  • Session — Candidate batch container
  • JobDescription — Stores JD text and parsed requirements
  • Candidate — Stores profile, skills, work history, embeddings
  • MatchResult — Stores scores, gaps, explanations, weights

Authentication & Security

  • No formal authentication yet
  • CORS allows all origins
  • Minimal admin utility route exists

State Management

  • React Hooks (useState, useEffect, useCallback)
  • Browser localStorage for client-side persistence
  • Redis for backend caching

Caching & Performance

  • Cached match results by jd_id + session_id
  • Models pre-downloaded into Docker image
  • SQLAlchemy cache tuned for Neon pooling
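
A minimal sketch of caching match results by `jd_id` and `session_id`, assuming a `match:{jd_id}:{session_id}` key format and JSON-serialized values. The key format, the helper names, and the `InMemoryCache` stand-in (used here in place of a Redis client) are hypothetical.

```python
import json


def match_cache_key(jd_id, session_id):
    """Build one cache key per (JD, session) pair. The format is assumed."""
    return f"match:{jd_id}:{session_id}"


class InMemoryCache:
    """Stand-in for a Redis client, exposing only get and set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_or_compute_match(cache, jd_id, session_id, compute):
    """Return the cached result if present; otherwise compute and store it."""
    key = match_cache_key(jd_id, session_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = compute()
    cache.set(key, json.dumps(result))
    return result


calls = []


def run_pipeline():
    # Placeholder for the expensive retrieval and reranking work.
    calls.append(1)
    return [{"candidate_id": "c1", "score": 0.8}]


cache = InMemoryCache()
first = get_or_compute_match(cache, "jd-1", "s-1", run_pipeline)
second = get_or_compute_match(cache, "jd-1", "s-1", run_pipeline)
```

In this sketch the second call is served from the cache, so `run_pipeline` runs only once.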

Setup & Installation

Run Locally

docker-compose up --build

Database Migration

cd backend
alembic upgrade head

Environment Variables

DATABASE_URL=
QDRANT_URL=
QDRANT_API_KEY=
REDIS_URL=
GROQ_API_KEY=
GROQ_MODEL=
EMBEDDING_MODEL=
RERANKER_MODEL=
NEXT_PUBLIC_API_URL=
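
The backend variables above could be validated at startup along these lines. Which variables are required and which have defaults is an assumption; the default model names are the ones listed in the Tech Stack section, and `load_settings` is a hypothetical helper. `NEXT_PUBLIC_API_URL` is read by the frontend and is left out.

```python
import os

# Assumed to be mandatory: connection strings and secrets.
REQUIRED = ("DATABASE_URL", "QDRANT_URL", "QDRANT_API_KEY", "REDIS_URL", "GROQ_API_KEY")

# Assumed defaults, taken from the models named in the Tech Stack section.
DEFAULTS = {
    "GROQ_MODEL": "llama-3.3-70b-versatile",
    "EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5",
    "RERANKER_MODEL": "BAAI/bge-reranker-v2-m3",
}


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default) and fail fast."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # Empty strings fall back to the default as well.
        settings[name] = env.get(name) or default
    return settings


# Example with placeholder values instead of real credentials.
example_env = {name: "placeholder" for name in REQUIRED}
example_env["GROQ_MODEL"] = ""
settings = load_settings(example_env)
```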

Deployment

  • Multi-stage Docker build
  • Runs FastAPI, Next.js, Celery, and Nginx together, managed by Supervisord
  • Optimized for HuggingFace Spaces
  • Exposes port 7860

Improvement Recommendations

  • Add JWT auth + RBAC
  • Replace polling with WebSockets / SSE
  • Add object storage
  • Add automated tests
  • Add observability & metrics

Quick Summary

TalentPulse combines semantic search, reranking, and LLM reasoning to help recruiters identify the best candidates faster, with explainable AI-powered hiring workflows.