	---
	title: TalentPulse AI Candidate Matching
	emoji: ⚡
	colorFrom: purple
	colorTo: indigo
	sdk: docker
	pinned: false
	app_port: 7860
	---


	# TalentPulse: AI-Powered Candidate Matching System


	## Overview

	TalentPulse is a production-grade, full-stack AI system for matching job descriptions against large candidate pools. It replaces manual resume screening with semantic retrieval, neural reranking, structured gap analysis, and LLM-generated explanations.

	The platform is built for recruiters and hiring teams who need fast, explainable, and configurable candidate matching. It supports session-based candidate batches, dynamic scoring weights, trajectory analysis, and reusable matching workflows for A/B testing and precision hiring.

	## Key Features

	### Session-based Candidate Management
	Group candidates into named sessions for isolated workflows and repeatable matching experiments.

	### Two-stage AI Matching Pipeline
	- Stage 1: Retrieval — Fast vector search in Qdrant with structured scoring for skills, experience, and other signals.
	- Stage 2: Reranking — Cross-encoder reranking of the shortlist, fused with Reciprocal Rank Fusion.
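
The fusion step in Stage 2 can be illustrated with a minimal Reciprocal Rank Fusion sketch. It assumes the two lists being fused are the Stage 1 retrieval order and the cross-encoder order, and it uses the conventional smoothing constant `k = 60`. Neither detail is stated in this README, and `reciprocal_rank_fusion` is a hypothetical helper name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of candidate IDs into one ordering.

    Each candidate receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, candidate_id in enumerate(ranking, start=1):
            scores[candidate_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical orderings for four candidates (assumed inputs).
retrieval_order = ["c1", "c2", "c3", "c4"]
reranker_order = ["c3", "c1", "c4", "c2"]
fused = reciprocal_rank_fusion([retrieval_order, reranker_order])
print(fused)
```

A candidate that ranks well in both lists (here `c1`) ends up ahead of one that only one stage favors, which is the usual motivation for rank fusion over trusting a single score scale.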

	### Live Weight Sliders
	Adjust matching priorities in real time and rerank results in memory without running new model inference.
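
One way to rerank without new model inference is to keep the per-signal scores from the last full run and recombine them whenever the weights change. The sketch below assumes four signals (`semantic`, `skills`, `experience`, `trajectory`) normalized to the range 0 to 1; the signal names and the `rerank_with_weights` helper are illustrative, not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalScores:
    """Per-signal scores saved from the last full pipeline run (assumed shape)."""
    candidate_id: str
    semantic: float
    skills: float
    experience: float
    trajectory: float


def rerank_with_weights(candidates, weights):
    """Reorder candidates by a weighted average of their stored signal scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def combined(candidate):
        return sum(w * getattr(candidate, name) for name, w in weights.items()) / total

    return sorted(candidates, key=combined, reverse=True)


pool = [
    SignalScores("a", semantic=0.9, skills=0.4, experience=0.5, trajectory=0.2),
    SignalScores("b", semantic=0.6, skills=0.9, experience=0.7, trajectory=0.5),
]
semantic_first = rerank_with_weights(
    pool, {"semantic": 1.0, "skills": 0.0, "experience": 0.0, "trajectory": 0.0}
)
skills_first = rerank_with_weights(
    pool, {"semantic": 0.2, "skills": 1.0, "experience": 0.2, "trajectory": 0.1}
)
```

Because only arithmetic on stored scores is involved, each slider change costs a sort over the shortlist rather than an embedding or cross-encoder call.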

	### Structured Gap Analysis
	Detect missing skills, experience gaps, and mismatches to generate grounded candidate explanations.
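
As a rough illustration, a structured gap report can be computed with set operations before any LLM is involved. The field names, the case-insensitive skill comparison, and the `analyze_gaps` helper below are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class GapReport:
    matched_skills: list
    missing_skills: list
    experience_gap_years: float


def normalize(skills):
    """Lowercase and trim skill names so 'Python' and 'python ' compare equal."""
    return {skill.strip().lower() for skill in skills}


def analyze_gaps(required_skills, required_years, candidate_skills, candidate_years):
    """Compare JD requirements with a candidate profile and list the gaps."""
    required = normalize(required_skills)
    have = normalize(candidate_skills)
    return GapReport(
        matched_skills=sorted(required & have),
        missing_skills=sorted(required - have),
        # A candidate who exceeds the requirement has no gap, not a negative one.
        experience_gap_years=max(0.0, required_years - candidate_years),
    )


report = analyze_gaps(
    required_skills=["Python", "FastAPI", "Kubernetes"],
    required_years=5,
    candidate_skills=["python", "fastapi ", "Docker"],
    candidate_years=3.5,
)
```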

	### LLM-generated Explanations
	Use Groq-powered LLM responses based on the precomputed gap analysis.
	Because the prompt is built from the precomputed analysis, the explanation stays tied to detected gaps rather than to free-form model judgment.

	### Trajectory Scoring
	Estimate career growth velocity from work history and reward strong advancement patterns.
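
The README does not define how growth velocity is calculated. A simple reading is "seniority levels gained per year", sketched below. The `SENIORITY_LEVELS` ladder, the `(level, start_year)` input shape, and the `growth_velocity` helper are all hypothetical.

```python
# Hypothetical seniority ladder; the real system may derive levels differently.
SENIORITY_LEVELS = {
    "intern": 0,
    "junior": 1,
    "mid": 2,
    "senior": 3,
    "staff": 4,
    "principal": 5,
}


def growth_velocity(work_history):
    """Return seniority levels gained per year across a work history.

    work_history is a list of (level_name, start_year) tuples in any order.
    Histories with fewer than two roles, or no elapsed time, score 0.0.
    """
    if len(work_history) < 2:
        return 0.0
    ordered = sorted(work_history, key=lambda role: role[1])
    first_level, first_year = ordered[0]
    last_level, last_year = ordered[-1]
    years = last_year - first_year
    if years <= 0:
        return 0.0
    return (SENIORITY_LEVELS[last_level] - SENIORITY_LEVELS[first_level]) / years


velocity = growth_velocity([("mid", 2020), ("junior", 2018), ("senior", 2022)])
print(velocity)
```

In this example the candidate moves from junior to senior (two levels) over four years, so the velocity is 0.5 levels per year.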

	### JD Quality Feedback
	Evaluate job descriptions for clarity, breadth, and missing signals.

	## Tech Stack

	| Layer | Technology |
	|-------|------------|
	| Frontend | Next.js 16, React, Tailwind CSS v4 |
	| Backend | FastAPI, Uvicorn |
	| Database | Neon Postgres, Asyncpg, SQLAlchemy, Alembic |
	| Vector Search | Qdrant Cloud |
	| Async Jobs | Celery |
	| Cache | Redis Cloud |
	| Embeddings | BAAI/bge-small-en-v1.5 via SentenceTransformers |
	| Reranking | BAAI/bge-reranker-v2-m3 via FlagEmbedding |
	| LLM Provider | Groq (llama-3.3-70b-versatile) |
	| Deployment | Docker, Nginx, Supervisord, HuggingFace Spaces |

	## Architecture Overview

	```mermaid
	graph TD
	UI[Next.js Frontend] -->|REST API| Proxy[Nginx Reverse Proxy]
	Proxy --> API[FastAPI Backend]

	API -->|Async Tasks| Queue[Redis / Celery Queue]
	Queue --> Worker[Celery Workers]

	API -->|Read / Write| DB[(Neon Postgres)]
	Worker -->|Persist Metadata| DB

	API -->|Vector Search| VectorDB[(Qdrant Cloud)]
	Worker -->|Store Embeddings| VectorDB

	API -->|In-Memory Rerank| LocalAI[Local Reranker Model]
	API -->|LLM Explanations| LLM[Groq API]
	Worker -->|LLM Jobs| LLM
	```

	## Project Structure

	```text
	/
	├── backend/
	│ ├── alembic/
	│ ├── src/
	│ │ ├── matching/
	│ │ ├── ml/
	│ │ ├── models/
	│ │ ├── routers/
	│ │ ├── schemas/
	│ │ └── workers/
	│ ├── main.py
	│ └── requirements.txt
	├── frontend/
	│ ├── public/
	│ ├── src/
	│ │ ├── app/
	│ │ └── lib/
	│ ├── next.config.ts
	│ └── globals.css
	├── docker-compose.yml
	├── Dockerfile
	├── supervisord.conf
	└── nginx.conf
	```

	## Core Modules & Responsibilities

	### Backend

	* backend/src/ml
	Handles model loading, text embedding, and feature extraction.

	* backend/src/matching
	Implements retrieval, reranking, weighted scoring, and explanation logic.

	* backend/src/workers
	Runs background jobs such as candidate ingestion and explanation generation.

	* backend/src/routers
	Exposes API endpoints for sessions, JDs, candidates, matching, and health checks.

	### Frontend

	* frontend/src/app
	Contains user-facing routes such as sessions, JD details, and pipeline orchestration.

	* frontend/src/lib
	Provides centralized API client wrappers used by the routes above.

	## Application Flows

	### Candidate Upload & Ingestion Flow

	```mermaid
	sequenceDiagram
	actor User
	participant UI as Next.js UI
	participant API as FastAPI Router
	participant Queue as Redis / Celery Queue
	participant Worker as Celery Worker
	participant Store as Postgres + Qdrant

	User->>UI: Upload candidate CSV/JSON
	UI->>API: POST /api/candidates/upload
	API->>Queue: Dispatch ingest_candidates_batch
	API-->>UI: Return task ID
	UI->>API: Poll /api/candidates/status/{task_id}
	Worker->>Queue: Fetch task
	Worker->>Worker: Parse candidate data
	Worker->>Worker: Compute embeddings and growth velocity
	Worker->>Store: Save metadata and vector points
	Worker-->>Queue: Mark task complete
	API-->>UI: Return success status
	```
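
The polling step in the diagram can be sketched as a small loop. The actual UI is a Next.js app, so this Python version only illustrates the control flow. It assumes the status endpoint returns a JSON object with a `state` field holding Celery state names; that response shape and the `poll_task` helper are assumptions, and the HTTP call is injected so the loop runs without a server.

```python
import time

# Two of Celery's built-in finished states; the real endpoint may expose others.
TERMINAL_STATES = {"SUCCESS", "FAILURE"}


def poll_task(fetch_status, task_id, interval_s=1.0, timeout_s=60.0, sleep=time.sleep):
    """Call fetch_status(task_id) until the task reaches a terminal state.

    fetch_status would wrap GET /api/candidates/status/{task_id} in practice.
    Raises TimeoutError if no terminal state is seen before the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(task_id)
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout_s}s")
        sleep(interval_s)


# Stand-in for the real endpoint: three canned responses.
_responses = iter([{"state": "PENDING"}, {"state": "STARTED"}, {"state": "SUCCESS"}])
final_status = poll_task(
    lambda _task_id: next(_responses),
    "demo-task",
    interval_s=0.0,
    sleep=lambda _seconds: None,
)
```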

	### Matching & Reranking Flow

	```mermaid
	sequenceDiagram
	actor User
	participant UI as Next.js UI
	participant API as FastAPI Router
	participant Qdrant as Vector DB
	participant Reranker as Local Reranker
	participant Cache as Redis Cache

	User->>UI: Open JD and click Match
	UI->>API: POST /api/match/{jd_id}
	API->>Qdrant: Retrieve top candidates
	Qdrant-->>API: Return top-K vectors
	API->>Reranker: Cross-encoder reranking
	Reranker-->>API: Return adjusted scores
	API->>API: Apply rank fusion and weights
	API->>Cache: Store result
	API-->>UI: Return ranked candidates

	User->>UI: Adjust weight sliders
	UI->>API: POST /api/match/{jd_id}/rerank
	API->>API: Recompute ranking in memory
	API-->>UI: Return updated ordering
	```

	### Explain & Refine Flow

	```mermaid
	sequenceDiagram
	actor User
	participant UI as Next.js UI
	participant API as FastAPI Router
	participant DB as Postgres
	participant LLM as Groq API

	User->>UI: Open candidate match details
	UI->>API: POST /api/match/{jd_id}/candidates/{candidate_id}/explain
	API->>DB: Load match data and gap analysis
	API->>LLM: Generate grounded explanation
	LLM-->>API: Return explanation text
	API-->>UI: Show explanation to user
	```
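
To show what "grounded" can mean in practice, the sketch below assembles a prompt from precomputed gap data and instructs the model to use only those facts. The prompt wording, the field names, and the `build_explanation_prompt` helper are assumptions; the call to the Groq API itself is deliberately left out.

```python
def build_explanation_prompt(
    jd_title, candidate_name, matched_skills, missing_skills, experience_gap_years
):
    """Build a prompt that restricts the LLM to precomputed gap-analysis facts."""
    facts = [
        f"Role: {jd_title}",
        f"Candidate: {candidate_name}",
        f"Matched skills: {', '.join(matched_skills) or 'none'}",
        f"Missing skills: {', '.join(missing_skills) or 'none'}",
        f"Experience gap (years): {experience_gap_years:g}",
    ]
    instructions = (
        "Explain in three sentences or fewer how well this candidate fits the role. "
        "Use only the facts listed below; do not infer skills or experience "
        "that are not listed."
    )
    return instructions + "\n\n" + "\n".join(f"- {fact}" for fact in facts)


prompt = build_explanation_prompt(
    jd_title="Backend Engineer",
    candidate_name="Candidate A",
    matched_skills=["fastapi", "python"],
    missing_skills=["kubernetes"],
    experience_gap_years=1.5,
)
print(prompt)
```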

	## API Documentation

	| Method | Path | Purpose |
	| ------ | ---------------------------------------------------- | -------------------------- |
	| POST | /api/sessions | Create a candidate session |
	| GET | /api/sessions | List sessions |
	| POST | /api/jds | Create a job description |
	| GET | /api/jds | List job descriptions |
	| POST | /api/candidates/upload?session_id= | Upload candidate files |
	| GET | /api/candidates/status/{task_id} | Check task progress |
	| POST | /api/match/{jd_id}?session_id= | Run full matching pipeline |
	| POST | /api/match/{jd_id}/rerank | Rerank in memory |
	| POST | /api/match/{jd_id}/candidates/{candidate_id}/explain | Generate explanation |
	| GET | /health | Health check |
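
For scripted access, the paths in the table can be filled in with a small URL builder. This sketch uses only the Python standard library; the base URL `http://localhost:7860` is an assumption based on the exposed port, the IDs are placeholders, and request bodies are omitted because this README does not document them.

```python
from urllib.parse import quote, urlencode


def api_url(base_url, path_template, query=None, **path_params):
    """Fill a path template from the API table and append query parameters."""
    escaped = {name: quote(str(value), safe="") for name, value in path_params.items()}
    url = base_url.rstrip("/") + path_template.format(**escaped)
    if query:
        url += "?" + urlencode(query)
    return url


BASE = "http://localhost:7860"  # assumed local address

upload_url = api_url(BASE, "/api/candidates/upload", query={"session_id": "s-1"})
match_url = api_url(BASE, "/api/match/{jd_id}", query={"session_id": "s-1"}, jd_id="jd 42")
explain_url = api_url(
    BASE,
    "/api/match/{jd_id}/candidates/{candidate_id}/explain",
    jd_id="jd-1",
    candidate_id="c-7",
)
```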

	## Database Models

	* Session — Candidate batch container
	* JobDescription — Stores JD text and parsed requirements
	* Candidate — Stores profile, skills, work history, embeddings
	* MatchResult — Stores scores, gaps, explanations, weights

	## Authentication & Security

	* No formal authentication yet
	* CORS allows all origins
	* Minimal admin utility route exists

	## State Management

	* React Hooks (`useState`, `useEffect`, `useCallback`)
	* Local storage for persistence
	* Redis for backend caching

	## Caching & Performance

	* Cached match results by `jd_id + session_id`
	* Models pre-downloaded into Docker image
	* SQLAlchemy cache tuned for Neon pooling
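
Caching by `jd_id + session_id` can be sketched as a get-or-compute wrapper around the matching pipeline. The key format, the `InMemoryCache` stand-in for Redis, and the helper names are assumptions made to keep the example runnable without external services.

```python
class InMemoryCache:
    """Dict-backed stand-in for Redis, used only to keep the sketch runnable."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def match_cache_key(jd_id, session_id):
    """Build one cache key per (job description, session) pair."""
    return f"match:{jd_id}:{session_id}"


def get_or_compute_match(cache, jd_id, session_id, compute):
    """Return the cached match result, running the pipeline only on a miss."""
    key = match_cache_key(jd_id, session_id)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = compute(jd_id, session_id)
    cache.set(key, result)
    return result


calls = []


def fake_pipeline(jd_id, session_id):
    """Placeholder for the full retrieval and reranking pipeline."""
    calls.append((jd_id, session_id))
    return ["c1", "c3", "c2"]


cache = InMemoryCache()
first = get_or_compute_match(cache, "jd-1", "s-1", fake_pipeline)
second = get_or_compute_match(cache, "jd-1", "s-1", fake_pipeline)  # cache hit
other = get_or_compute_match(cache, "jd-1", "s-2", fake_pipeline)   # new session, miss
```

Including the session in the key keeps results from different candidate batches separate even when they are matched against the same job description.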

	## Setup & Installation

	### Run Locally

	```bash
	docker-compose up --build
	```

	### Database Migration

	```bash
	cd backend
	alembic upgrade head
	```

	## Environment Variables

	```env
	DATABASE_URL=
	QDRANT_URL=
	QDRANT_API_KEY=
	REDIS_URL=
	GROQ_API_KEY=
	GROQ_MODEL=
	EMBEDDING_MODEL=
	RERANKER_MODEL=
	NEXT_PUBLIC_API_URL=
	```
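
A startup check along these lines can surface missing configuration early. Which variables are mandatory is an assumption of this sketch, as is the `load_settings` helper; the model defaults are taken from the Tech Stack table above.

```python
import os

# Assumed to be mandatory; the application may treat some as optional.
REQUIRED_VARS = (
    "DATABASE_URL",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "REDIS_URL",
    "GROQ_API_KEY",
)

# Defaults mirror the models named in the Tech Stack table.
DEFAULTS = {
    "GROQ_MODEL": "llama-3.3-70b-versatile",
    "EMBEDDING_MODEL": "BAAI/bge-small-en-v1.5",
    "RERANKER_MODEL": "BAAI/bge-reranker-v2-m3",
}


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default) and validate them."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings


# Placeholder values for demonstration only.
example_env = {
    "DATABASE_URL": "postgresql+asyncpg://user:pass@host/db",
    "QDRANT_URL": "https://qdrant.example",
    "QDRANT_API_KEY": "qdrant-key",
    "REDIS_URL": "redis://localhost:6379/0",
    "GROQ_API_KEY": "groq-key",
    "RERANKER_MODEL": "custom/reranker",
}
settings = load_settings(example_env)
```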

	## Deployment

	* Multi-stage Docker build
	* Runs FastAPI + Next.js + Celery + Nginx
	* Optimized for HuggingFace Spaces
	* Exposes port `7860`

	## Improvement Recommendations

	* Add JWT auth + RBAC
	* Replace polling with WebSockets / SSE
	* Add object storage
	* Add automated tests
	* Add observability & metrics

	## Quick Summary

	TalentPulse combines semantic search, reranking, and LLM reasoning to help recruiters identify the best candidates faster, with explainable AI-powered hiring workflows.