---
title: TalentPulse AI Candidate Matching
emoji:
colorFrom: purple
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# TalentPulse: AI-Powered Candidate Matching System
## Overview
TalentPulse is a production-grade, full-stack AI system for matching job descriptions against large candidate pools. It replaces manual resume screening with semantic retrieval, neural reranking, structured gap analysis, and LLM-generated explanations.
The platform is built for recruiters and hiring teams who need fast, explainable, and configurable candidate matching. It supports session-based candidate batches, dynamic scoring weights, trajectory analysis, and reusable matching workflows for A/B testing and precision hiring.
## Key Features
### Session-based Candidate Management
Group candidates into named sessions for isolated workflows and repeatable matching experiments.
### Two-stage AI Matching Pipeline
- **Stage 1: Retrieval** — Fast vector search in Qdrant with structured scoring for skills, experience, and other signals.
- **Stage 2: Reranking** — Cross-encoder reranking of the shortlist, fused with Reciprocal Rank Fusion.
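Reciprocal Rank Fusion can be sketched in a few lines. The example below is a minimal illustration, not the project's implementation: the function name, the candidate IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of candidate IDs into one ordering.

    Each candidate receives sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, candidate_id in enumerate(ranking, start=1):
            scores[candidate_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


retrieval_order = ["c1", "c2", "c3", "c4"]  # hypothetical Stage 1 order
reranker_order = ["c3", "c1", "c4", "c2"]   # hypothetical Stage 2 order
fused = reciprocal_rank_fusion([retrieval_order, reranker_order])
print(fused)
```

Because the fusion uses only rank positions, the vector-search and cross-encoder scores do not need to share a scale.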
### Live Weight Sliders
Adjust matching priorities in real time and rerank results in memory without running new model inference.
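The idea behind the sliders can be illustrated with a small sketch: per-signal scores are computed once by the full pipeline, and a slider change only recomputes a weighted sum. The signal names (`semantic`, `skills`, `trajectory`) and the data shape are assumptions made for illustration.

```python
def rerank_with_weights(candidates, weights):
    """Re-order cached candidates using new slider weights.

    `candidates` holds per-signal scores in [0, 1] that were computed earlier;
    only the weighted sum is recomputed, so no model is called.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so the weights always sum to 1.
    normalized = {name: w / total for name, w in weights.items()}

    def combined(candidate):
        return sum(
            normalized[name] * candidate["signals"].get(name, 0.0)
            for name in normalized
        )

    return sorted(candidates, key=combined, reverse=True)


# Hypothetical cached signals for two candidates.
cached = [
    {"id": "c1", "signals": {"semantic": 0.9, "skills": 0.4, "trajectory": 0.5}},
    {"id": "c2", "signals": {"semantic": 0.6, "skills": 0.9, "trajectory": 0.7}},
]
by_semantic = rerank_with_weights(
    cached, {"semantic": 0.8, "skills": 0.1, "trajectory": 0.1}
)
by_skills = rerank_with_weights(
    cached, {"semantic": 0.2, "skills": 0.6, "trajectory": 0.2}
)
print([c["id"] for c in by_semantic], [c["id"] for c in by_skills])
```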
### Structured Gap Analysis
Detect missing skills, experience gaps, and mismatches to generate grounded candidate explanations.
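A gap analysis of this kind might look like the following sketch. The field names (`skills`, `min_years`, `years_experience`) and the exact-match comparison on lowercased skill names are assumptions. A real system would likely also handle synonyms and related skills.

```python
def analyze_gaps(jd_requirements, candidate):
    """Compare required skills and experience against a candidate profile."""
    required = {skill.lower() for skill in jd_requirements["skills"]}
    held = {skill.lower() for skill in candidate["skills"]}
    return {
        "matched_skills": sorted(required & held),
        "missing_skills": sorted(required - held),
        # Never negative: surplus experience is not a gap.
        "experience_gap_years": max(
            0.0, jd_requirements["min_years"] - candidate["years_experience"]
        ),
    }


# Hypothetical inputs for illustration.
jd = {"skills": ["Python", "FastAPI", "Kubernetes"], "min_years": 5}
profile = {"skills": ["python", "fastapi", "React"], "years_experience": 3.5}
gaps = analyze_gaps(jd, profile)
print(gaps)
```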
### LLM-generated Explanations
Generate natural-language explanations with a Groq-hosted LLM, grounded in the precomputed gap analysis.
### Trajectory Scoring
Estimate career growth velocity from work history and reward strong advancement patterns.
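One simple way to express growth velocity is seniority levels gained per year. The sketch below assumes each role carries a numeric `level` and a `start_year`. Both fields and the formula are illustrative and may not match the scoring the project uses.

```python
def growth_velocity(work_history):
    """Seniority levels gained per year between the first and latest role."""
    if len(work_history) < 2:
        return 0.0
    roles = sorted(work_history, key=lambda role: role["start_year"])
    years = roles[-1]["start_year"] - roles[0]["start_year"]
    if years <= 0:
        return 0.0
    return (roles[-1]["level"] - roles[0]["level"]) / years


# Hypothetical history: junior (1) -> mid (2) -> staff (4) over four years.
history = [
    {"title": "Junior Engineer", "level": 1, "start_year": 2018},
    {"title": "Engineer", "level": 2, "start_year": 2020},
    {"title": "Staff Engineer", "level": 4, "start_year": 2022},
]
velocity = growth_velocity(history)
print(velocity)
```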
### JD Quality Feedback
Evaluate job descriptions for clarity, breadth, and missing signals.
## Tech Stack
| Layer | Technology |
|------|------------|
| Frontend | Next.js 16, React, Tailwind CSS v4 |
| Backend | FastAPI, Uvicorn |
| Database | Neon Postgres, Asyncpg, SQLAlchemy, Alembic |
| Vector Search | Qdrant Cloud |
| Async Jobs | Celery |
| Cache | Redis Cloud |
| Embeddings | BAAI/bge-small-en-v1.5 via SentenceTransformers |
| Reranking | BAAI/bge-reranker-v2-m3 via FlagEmbedding |
| LLM Provider | Groq (llama-3.3-70b-versatile) |
| Deployment | Docker, Nginx, Supervisord, HuggingFace Spaces |
## Architecture Overview
```mermaid
graph TD
UI[Next.js Frontend] -->|REST API| Proxy[Nginx Reverse Proxy]
Proxy --> API[FastAPI Backend]
API -->|Async Tasks| Queue[Redis / Celery Queue]
Queue --> Worker[Celery Workers]
API -->|Read / Write| DB[(Neon Postgres)]
Worker -->|Persist Metadata| DB
API -->|Vector Search| VectorDB[(Qdrant Cloud)]
Worker -->|Store Embeddings| VectorDB
API -->|In-Memory Rerank| LocalAI[Local Reranker Model]
API -->|LLM Explanations| LLM[Groq API]
Worker -->|LLM Jobs| LLM
```
## Project Structure
```text
/
├── backend/
│ ├── alembic/
│ ├── src/
│ │ ├── matching/
│ │ ├── ml/
│ │ ├── models/
│ │ ├── routers/
│ │ ├── schemas/
│ │ └── workers/
│ ├── main.py
│ └── requirements.txt
├── frontend/
│ ├── public/
│ ├── src/
│ │ ├── app/
│ │ └── lib/
│ ├── next.config.ts
│ └── globals.css
├── docker-compose.yml
├── Dockerfile
├── supervisord.conf
└── nginx.conf
```
## Core Modules & Responsibilities
### Backend
* **backend/src/ml**
Handles model loading, text embedding, and feature extraction.
* **backend/src/matching**
Implements retrieval, reranking, weighted scoring, and explanation logic.
* **backend/src/workers**
Runs background jobs such as candidate ingestion and explanation generation.
* **backend/src/routers**
Exposes API endpoints for sessions, JDs, candidates, matching, and health checks.
### Frontend
* **frontend/src/app**
Contains user-facing routes such as sessions, JD details, and pipeline orchestration.
* **frontend/src/lib**
Provides centralized API client wrappers.
## Application Flows
### Candidate Upload & Ingestion Flow
```mermaid
sequenceDiagram
actor User
participant UI as Next.js UI
participant API as FastAPI Router
participant Queue as Redis / Celery Queue
participant Worker as Celery Worker
participant Store as Postgres + Qdrant
User->>UI: Upload candidate CSV/JSON
UI->>API: POST /api/candidates/upload
API->>Queue: Dispatch ingest_candidates_batch
API-->>UI: Return task ID
UI->>API: Poll /api/candidates/status/{task_id}
Worker->>Queue: Fetch task
Worker->>Worker: Parse candidate data
Worker->>Worker: Compute embeddings and growth velocity
Worker->>Store: Save metadata and vector points
Worker-->>Queue: Mark task complete
API-->>UI: Return success status
```
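The polling step in this flow can be sketched as a small client-side loop. The helper below is hypothetical: `fetch_status` stands in for the HTTP call to the status endpoint, and the `state` field with Celery-style values is an assumption about the response payload.

```python
import time


def wait_for_task(fetch_status, task_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll until the task reports a terminal state or the timeout elapses.

    `fetch_status` is any callable returning the decoded JSON body of
    GET /api/candidates/status/{task_id}. The "state" field is assumed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status.get("state") in {"SUCCESS", "FAILURE"}:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)


# Stand-in for the HTTP call so the sketch runs without a server.
_responses = iter([{"state": "PENDING"}, {"state": "STARTED"}, {"state": "SUCCESS"}])
result = wait_for_task(
    lambda task_id: next(_responses), "demo-task", sleep=lambda seconds: None
)
print(result)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access or real delays.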
### Matching & Reranking Flow
```mermaid
sequenceDiagram
actor User
participant UI as Next.js UI
participant API as FastAPI Router
participant Qdrant as Vector DB
participant Reranker as Local Reranker
participant Cache as Redis Cache
User->>UI: Open JD and click Match
UI->>API: POST /api/match/{jd_id}
API->>Qdrant: Retrieve top candidates
Qdrant-->>API: Return top-K vectors
API->>Reranker: Cross-encoder reranking
Reranker-->>API: Return adjusted scores
API->>API: Apply rank fusion and weights
API->>Cache: Store result
API-->>UI: Return ranked candidates
User->>UI: Adjust weight sliders
UI->>API: POST /api/match/{jd_id}/rerank
API->>API: Recompute ranking in memory
API-->>UI: Return updated ordering
```
### Explain & Refine Flow
```mermaid
sequenceDiagram
actor User
participant UI as Next.js UI
participant API as FastAPI Router
participant DB as Postgres
participant LLM as Groq API
User->>UI: Open candidate match details
UI->>API: POST /api/match/{jd_id}/candidates/{candidate_id}/explain
API->>DB: Load match data and gap analysis
API->>LLM: Generate grounded explanation
LLM-->>API: Return explanation text
API-->>UI: Show explanation to user
```
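Grounding the explanation means the LLM only sees facts that were already computed. A minimal sketch of building such a prompt follows. The function name, the gap-analysis keys, and the prompt wording are assumptions, and the call to the Groq API is left out.

```python
def build_explanation_prompt(candidate_name, gap_analysis):
    """Turn precomputed gap data into a prompt that restricts the LLM to known facts."""
    matched = ", ".join(gap_analysis["matched_skills"]) or "none"
    missing = ", ".join(gap_analysis["missing_skills"]) or "none"
    lines = [
        "Explain this candidate's fit for the role using only the facts below.",
        f"Candidate: {candidate_name}",
        f"Matched skills: {matched}",
        f"Missing skills: {missing}",
        f"Experience gap (years): {gap_analysis['experience_gap_years']}",
        "Do not mention skills or experience that are not listed.",
    ]
    return "\n".join(lines)


# Hypothetical gap analysis loaded from the match record.
prompt = build_explanation_prompt(
    "Candidate A",
    {
        "matched_skills": ["fastapi", "python"],
        "missing_skills": ["kubernetes"],
        "experience_gap_years": 1.5,
    },
)
print(prompt)
```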
## API Documentation
| Method | Path | Purpose |
| ------ | ---------------------------------------------------- | -------------------------- |
| POST | /api/sessions | Create a candidate session |
| GET | /api/sessions | List sessions |
| POST | /api/jds | Create a job description |
| GET | /api/jds | List job descriptions |
| POST | /api/candidates/upload?session_id= | Upload candidate files |
| GET | /api/candidates/status/{task_id} | Check task progress |
| POST | /api/match/{jd_id}?session_id= | Run full matching pipeline |
| POST | /api/match/{jd_id}/rerank | Rerank in memory |
| POST | /api/match/{jd_id}/candidates/{candidate_id}/explain | Generate explanation |
| GET | /health | Health check |
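As an illustration of how a client might address these endpoints, the sketch below builds the request for a full matching run using only the standard library. The base URL, the IDs, and the helper name are assumptions. The request is constructed but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_match_request(base_url, jd_id, session_id):
    """Build POST /api/match/{jd_id}?session_id=... without sending it."""
    query = urlencode({"session_id": session_id})
    url = f"{base_url}/api/match/{quote(str(jd_id), safe='')}?{query}"
    return Request(url, method="POST")


# Hypothetical local deployment on the documented port.
request = build_match_request("http://localhost:7860", "jd-42", "session-7")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. The response schema is not documented here, so it is not shown.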
## Database Models
* **Session** — Candidate batch container
* **JobDescription** — Stores JD text and parsed requirements
* **Candidate** — Stores profile, skills, work history, embeddings
* **MatchResult** — Stores scores, gaps, explanations, weights
## Authentication & Security
* No formal authentication yet
* CORS allows all origins
* Minimal admin utility route exists
## State Management
* React Hooks (`useState`, `useEffect`, `useCallback`)
* Local storage for persistence
* Redis for backend caching
## Caching & Performance
* Cached match results by `jd_id + session_id`
* Models pre-downloaded into Docker image
* SQLAlchemy cache tuned for Neon pooling
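Caching by `jd_id + session_id` can be sketched as a get-or-compute helper. In this example a plain dict stands in for Redis, and the `match:` key prefix and the JSON serialization are assumptions.

```python
import json


def match_cache_key(jd_id, session_id):
    """Build one key per (JD, session) pair. The "match:" prefix is illustrative."""
    return f"match:{jd_id}:{session_id}"


def get_or_compute(cache, jd_id, session_id, compute):
    """Return cached results if present, otherwise run `compute` and store them.

    `cache` only needs dict-style access here; a Redis client would replace it.
    """
    key = match_cache_key(jd_id, session_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    results = compute()
    cache[key] = json.dumps(results)
    return results


calls = []


def run_pipeline():
    # Stand-in for the expensive retrieval and reranking stages.
    calls.append(1)
    return [{"candidate_id": "c1", "score": 0.91}]


cache = {}
first = get_or_compute(cache, "jd-42", "session-7", run_pipeline)
second = get_or_compute(cache, "jd-42", "session-7", run_pipeline)
print(len(calls), first == second)
```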
## Setup & Installation
### Run Locally
```bash
docker-compose up --build
```
### Database Migration
```bash
cd backend
alembic upgrade head
```
## Environment Variables
```env
DATABASE_URL=
QDRANT_URL=
QDRANT_API_KEY=
REDIS_URL=
GROQ_API_KEY=
GROQ_MODEL=
EMBEDDING_MODEL=
RERANKER_MODEL=
NEXT_PUBLIC_API_URL=
```
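A backend could load these variables along the lines of the sketch below. The `Settings` class, the choice of which variables are mandatory, and the fallback to the model names from the Tech Stack table are assumptions. `NEXT_PUBLIC_API_URL` is omitted because it is read by the frontend.

```python
import os
from dataclasses import dataclass

# Which variables are mandatory is an assumption made for this sketch.
REQUIRED = ("DATABASE_URL", "QDRANT_URL", "QDRANT_API_KEY", "REDIS_URL", "GROQ_API_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    qdrant_url: str
    qdrant_api_key: str
    redis_url: str
    groq_api_key: str
    groq_model: str
    embedding_model: str
    reranker_model: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default) and fail fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        qdrant_url=env["QDRANT_URL"],
        qdrant_api_key=env["QDRANT_API_KEY"],
        redis_url=env["REDIS_URL"],
        groq_api_key=env["GROQ_API_KEY"],
        # Assumed defaults, taken from the models named in the Tech Stack table.
        groq_model=env.get("GROQ_MODEL", "llama-3.3-70b-versatile"),
        embedding_model=env.get("EMBEDDING_MODEL", "BAAI/bge-small-en-v1.5"),
        reranker_model=env.get("RERANKER_MODEL", "BAAI/bge-reranker-v2-m3"),
    )


# Placeholder values for illustration only.
example_env = {name: f"placeholder-{name.lower()}" for name in REQUIRED}
settings = load_settings(example_env)
print(settings.groq_model)
```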
## Deployment
* Multi-stage Docker build
* Runs FastAPI + Next.js + Celery + Nginx
* Optimized for HuggingFace Spaces
* Exposes port `7860`
## Improvement Recommendations
* Add JWT auth + RBAC
* Replace polling with WebSockets / SSE
* Add object storage
* Add automated tests
* Add observability & metrics
## Quick Summary
TalentPulse combines semantic search, reranking, and LLM reasoning to help recruiters identify the best candidates faster, with explainable AI-powered hiring workflows.