ketannnn committed on
Commit 72d1c14 · 1 Parent(s): c70669c

feat: implement multi-stage candidate ingestion and matching pipeline with UI tracking and backend schema support
README.md CHANGED
@@ -8,155 +8,284 @@ pinned: false
  app_port: 7860
  ---
 
- # TalentPulse — AI Candidate Matching System
 
- ## Project Overview
- **TalentPulse** is a production-grade, two-stage AI pipeline for matching job descriptions (JDs) against large candidate pools. The system provides immense business value to technical recruiters and hiring managers by replacing manual resume screening with semantic vector search and neural reranking. It enables session-based candidate batching, career trajectory scoring, and LLM-generated explanations grounded in structured gap analysis.
 
  ## Key Features
- * **Session-based Architecture**: Candidates are uploaded to named sessions, allowing JDs to be matched against specific candidate batches independently for A/B testing and organized workflows.
- * **Two-Stage AI Matching Pipeline**:
-   * *Stage 1 (Retrieval)*: Fast bi-encoder vector search in Qdrant (~50-100ms) combined with weighted structured scoring (skill overlap, years of experience, etc.).
-   * *Stage 2 (Reranking)*: Cross-encoder reranking jointly re-scores the top-50 shortlist, fused via Reciprocal Rank Fusion.
- * **Live Weight Sliders**: Users can dynamically adjust the weights of scoring components (e.g., semantic vs. skills). This triggers a pure in-memory rerank returning in <100ms without new model inference.
- * **Structured Gap Analysis & LLM Explanations**: The system pre-computes missing skills, experience gaps, and location mismatches. A Groq LLM generates explanations directly grounded in this data.
- * **Trajectory Scoring**: Computes career growth velocity from work history timelines, rewarding fast promotions at funded product companies.
- * **JD Quality Feedback**: Evaluates Job Descriptions for vagueness, breadth, and missing signals.
 
  ## Tech Stack
- | Component | Technology |
- | :--- | :--- |
- | **Backend Framework** | FastAPI (Python 3.11/3.12), Uvicorn |
- | **Frontend Framework** | Next.js 16 (Node.js 20), React, Tailwind CSS v4 |
- | **Database & ORM** | Neon Postgres (Asyncpg), SQLAlchemy, Alembic |
- | **Vector Database** | Qdrant Cloud |
- | **Task Queue & Cache** | Celery, Redis Cloud |
- | **Embedding Model** | `BAAI/bge-small-en-v1.5` (Local CPU via SentenceTransformers) |
- | **Reranker Model** | `BAAI/bge-reranker-v2-m3` (Local CPU via FlagEmbedding) |
- | **LLM Provider** | Groq (`llama-3.3-70b-versatile`) |
- | **Infrastructure** | Docker, Nginx, Supervisord (HuggingFace Spaces deployment) |
 
  ## Architecture Overview
- * **Frontend to Backend Flow**: The Next.js frontend communicates with the FastAPI backend via REST API calls routed through an Nginx reverse proxy.
- * **Data & Async Flow**: Uploaded candidates are sent to an async Celery worker queue backed by Redis. Workers extract text, embed it using SentenceTransformers, and store vectors in Qdrant and relational data in Postgres.
- * **API Flow**: Complex matching requests retrieve candidates from Qdrant, re-score them using the FlagReranker model locally, and cache the finalized matches in Redis.
- * **File Handling**: Handled via multipart file uploads into the FastAPI server and processed in memory/chunks.
 
  ## Project Structure
 
  ```text
  /
  ├── backend/
- │ ├── alembic/ # Database migrations
  │ ├── src/
- │ │ ├── matching/ # Stage 1 retrieval, Stage 2 reranking, LLM logic
- │ │ ├── ml/ # Embedding models, feature building, cross-encoders
- │ │ ├── models/ # SQLAlchemy ORM definitions
- │ │ ├── routers/ # FastAPI endpoints
- │ │ ├── schemas/ # Pydantic validation schemas
- │ │ └── workers/ # Celery tasks (ingest, explanations)
- │ └── main.py # FastAPI application entry point
  ├── frontend/
- │ ├── public/ # Static assets (SVGs)
- │ ├── src/app/ # Next.js App Router pages (jds, sessions, pipeline)
- │ └── src/lib/api.ts # API client wrappers
- ├── docker-compose.yml # Local development compose configuration
- ├── Dockerfile # Multi-stage build for production HuggingFace deployment
- ├── supervisord.conf # Process manager for containerized backend, frontend, and workers
- └── nginx.conf # Reverse proxy configuration
  ```
 
- ## Backend Documentation
- * **Modules**: The backend is divided into specialized modules for `ml` (models and feature extractors), `matching` (retrieval and scoring logic), and `workers` (Celery background tasks).
- * **Routers**: Divided into `sessions.py`, `jds.py`, `candidates.py`, and `matching.py`.
- * **Controllers & Services**: Heavy logic is offloaded to ML utility scripts like `stage1_retrieve` and `stage2_rerank`. Explanation generation utilizes a revolving list of Groq API keys to manage limits.
- * **Middleware**: CORS middleware is configured to allow all origins (`*`).
- * **Jobs**: Managed via Celery workers for tasks like `ingest_candidates_batch` and `generate_top_explanations`.
-
- ## Frontend Documentation
- * **Pages & Layouts**: Uses Next.js App Router (`src/app/`). Key pages include `sessions/page.tsx` (Candidate pools), `jds/[id]/page.tsx` (JD detail and matching), and `pipeline/page.tsx` (Automated run orchestration).
- * **Styling**: Powered by Tailwind CSS v4 utilizing custom CSS variables defined in `globals.css`.
- * **API Client**: Centralized in `frontend/src/lib/api.ts` utilizing native `fetch` wrappers.
- * **State Management**: Native React Hooks (`useState`, `useEffect`, `useCallback`) alongside local storage for pipeline state.
-
- ## Database Design
- Powered by **PostgreSQL** with schema managed via **Alembic**.
- * **`sessions`**: Holds candidate grouping metadata (`id`, `name`, `candidate_count`).
- * **`job_descriptions`**: Stores JD raw text, parsed skill requirements, quality assessments, and custom scoring weights.
- * **`candidates`**: Extensive model tracking candidate demographics, parsed work experience (JSON), skills, generated trajectory scores (`growth_velocity`), and embeddings (`qdrant_id`).
- * **`match_results`**: Links JDs and Candidates. Stores stage 1 and 2 scores, gap analysis (JSON), and generated LLM explanations.
 
  ## Caching & Performance
- * **Cache Usage**: Redis is utilized to cache complete `/api/match` results based on `jd_id` and `session_id`.
- * **Optimization**: AI embedding models are pre-downloaded and baked into the Docker image universally using `HF_HOME="/app/models"`, eliminating runtime downloads.
- * **Database**: SQLAlchemy's internal prepared statement cache is explicitly disabled to work properly with Asyncpg and Neon Db pools.
-
- ## Storage & File Handling
- * **Uploads**: Users upload CSV/JSON candidate files directly to the API endpoint `/api/candidates/upload` via `FormData`. File processing is dispatched to Celery.
-
- ## Real-Time Features
- * **Polling**: The frontend polls the backend's `/api/candidates/status/{task_id}` endpoint every 3 seconds to update the UI on Celery ingest and vector embedding progress.
-
- ## Authentication & Authorization
- * **Current State**: The API accepts all origins without authenticated session barriers in the current logic. No formal login or RBAC is implemented.
-
- ## Main User Flows
- 1. **Candidate Pool Creation**: User creates a Session -> Uploads a CSV -> Celery `ingest_candidates_batch` parses resumes, extracts growth velocity, embeddings via SentenceTransformers, and saves points to Qdrant/Postgres.
- 2. **Matching Pipeline**: User submits JD -> System runs `parse_jd_requirements` -> User views JD Detail -> Triggers Match. The backend queries Qdrant (Stage 1) -> Reranks top candidates via Cross-Encoder (Stage 2) -> Applies Rank Fusion.
- 3. **Explain & Refine**: User clicks a matched candidate -> Triggers async Groq LLM assessment comparing candidate gaps to JD requirements. User drags weight sliders -> triggers purely in-memory rerank.
-
- ## API Reference
- | Method | Path | Purpose |
- | :--- | :--- | :--- |
- | **POST** | `/api/sessions` | Create a candidate session. |
- | **GET** | `/api/sessions` | List all sessions. |
- | **POST** | `/api/jds` | Create a Job Description. |
- | **POST** | `/api/candidates/upload?session_id=` | Upload Candidate files to a session. |
- | **POST** | `/api/match/{jd_id}?session_id=` | Trigger full Qdrant Retrieval + Reranking pipeline. |
- | **POST** | `/api/match/{jd_id}/rerank` | Rescore candidates purely in memory using new weights. |
- | **POST** | `/api/match/{jd_id}/candidates/{candidate_id}/explain` | Generate LLM explanation. |
- | **GET** | `/api/candidates/status/{task_id}` | Poll Celery task status. |
-
- ## Setup Instructions
- 1. **Install & Clone**: Ensure Docker and Node.js are installed.
- 2. **Env Config**: Create a `.env` in `backend/` and populate it with `DATABASE_URL`, `REDIS_URL`, `QDRANT_URL`, `QDRANT_API_KEY`, and `GROQ_API_KEY`.
- 3. **Run Locally (Docker Compose)**:
- For local development with an external Postgres/Qdrant instance, run:
- ```bash
- docker-compose up --build
- ```
- This spins up the FastAPI backend on port `8000`, Next.js on `3000`, and a Celery worker.
- 4. **Database Migrations**:
- Apply Alembic schemas:
- ```bash
- cd backend
- alembic upgrade head
- ```
- *(Alternatively, you can wipe and recreate the db using `python clean_db.py`)*
 
  ## Environment Variables
- * `DATABASE_URL`: Postgres connection string (should use `postgresql+asyncpg` internally).
- * `QDRANT_URL` / `QDRANT_API_KEY`: Credentials for Qdrant Vector Cloud.
- * `REDIS_URL`: Redis connection used for Celery task queuing and matching cache.
- * `GROQ_API_KEY`: API Key(s) for LLM generation. Comma-separated list for cycling limits.
- * `GROQ_MODEL`: Recommended `llama-3.3-70b-versatile`.
- * `NEXT_PUBLIC_API_URL`: Frontend configuration targeting backend (e.g., `http://localhost:8000`).
-
- ## Scripts
- * **`clean_db.py`**: Drops the public schema natively and recreates it. A fast teardown utility.
- * **Docker Build**: Uses `npm run build` to generate Next.js `.next/standalone` outputs.
-
- ## Deployment Notes
- The application is designed to be deployed as a unified Docker container, specifically optimized for **HuggingFace Spaces**.
- * **Build Process**: A multi-stage `Dockerfile` compiles the Next.js frontend into standalone static files, then installs Python 3.11, Nginx, Node, and Supervisord into the final image.
- * **AI Pre-baking**: BAAI Embedding and Reranker models are downloaded during the image build step to `/app/models` to ensure instantaneous startup without runtime downloading.
- * **Routing**: Exposes port `7860`. Supervisord manages Uvicorn (FastAPI), Node (Next.js server), Nginx (Reverse Proxy), and Celery worker simultaneously inside the single container.
-
- ## Troubleshooting
- * **Alembic asyncpg URLs**: If Alembic fails, ensure `DATABASE_URL` is cleaned. The `env.py` automatically converts `postgresql://` to `postgresql+asyncpg://` and strips SSL queries if needed.
- * **Nginx Permissions**: The Dockerfile aggressively configures `/tmp` directories for Nginx and runs the container as `appuser` (UID 1000) to comply with non-root hosting requirements.
- * **Database Prepared Statements Warning**: Neon Serverless Postgres pools require disabling SQLAlchemy's statement cache. This is configured natively in `database.py`.
-
- ## Future Improvements
- * **Authentication**: Add JWT token generation and a robust User/RBAC data model since the endpoints currently lack auth middleware.
- * **Real-time Streaming**: Replace aggressive client-side polling loops with Websockets or Server-Sent Events (SSE) to broadcast pipeline events.
- * **Object Storage Integration**: Offload raw parsed resumes and CSVs to an S3-compatible service to prevent local container bloating.
  app_port: 7860
  ---
 
 
+ # TalentPulse: AI-Powered Candidate Matching System
+
+
+ ## Overview
+
+ TalentPulse is a production-grade, full-stack AI system for matching job descriptions against large candidate pools. It replaces manual resume screening with semantic retrieval, neural reranking, structured gap analysis, and LLM-generated explanations.
+
+ The platform is built for recruiters and hiring teams who need fast, explainable, and configurable candidate matching. It supports session-based candidate batches, dynamic scoring weights, trajectory analysis, and reusable matching workflows for A/B testing and precision hiring.
 
  ## Key Features
+
+ ### Session-based Candidate Management
+ Group candidates into named sessions for isolated workflows and repeatable matching experiments.
+
+ ### Two-stage AI Matching Pipeline
+ - **Stage 1 (Retrieval):** Fast bi-encoder vector search in Qdrant, combined with weighted structured scoring for skill overlap, years of experience, and other signals.
+ - **Stage 2 (Reranking):** Cross-encoder reranking of the top-50 shortlist, with results fused via Reciprocal Rank Fusion.
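Reciprocal Rank Fusion scores each item by summing `1 / (k + rank)` over every ranked list it appears in, so agreement between stages matters more than raw score scales. The sketch below is a minimal illustration, assuming the two inputs are the Stage 1 ordering and the cross-encoder ordering and using the commonly cited constant `k = 60`; the function name `rrf_fuse` and the value of `k` are assumptions, not taken from the TalentPulse code.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of candidate IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first. An item at 1-based rank r contributes
    1 / (k + r) to its fused score; lists that omit an item contribute nothing.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, candidate_id in enumerate(ranking, start=1):
            scores[candidate_id] = scores.get(candidate_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


stage1 = ["c1", "c2", "c3", "c4"]  # hypothetical Stage 1 retrieval order
stage2 = ["c3", "c1", "c4", "c2"]  # hypothetical cross-encoder order
for candidate_id, score in rrf_fuse([stage1, stage2]):
    print(candidate_id, round(score, 5))
```

Because only ranks enter the formula, the bi-encoder and cross-encoder scores never need to be calibrated against each other.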
29
+
30
+ ### Live Weight Sliders
31
+ Adjust matching priorities in real time and rerank results in memory without running new model inference.
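Because the full pipeline has already produced every component score, a slider change only needs a weighted sum and a sort. A minimal sketch, assuming each cached match stores per-component scores in `[0, 1]`; the component names (`semantic`, `skills`, `experience`), the `CachedMatch` type, and the weight normalization are illustrative assumptions rather than the project's actual data model.

```python
from dataclasses import dataclass


@dataclass
class CachedMatch:
    candidate_id: str
    components: dict[str, float]  # precomputed per-component scores, assumed in [0, 1]


def rerank_in_memory(matches: list[CachedMatch], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Recompute a weighted score for each cached match and sort; no model inference."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Normalize so the final score stays in [0, 1] whatever scale the sliders use.
    norm = {name: w / total for name, w in weights.items()}
    scored = [
        (m.candidate_id, sum(norm[name] * m.components.get(name, 0.0) for name in norm))
        for m in matches
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


matches = [
    CachedMatch("alice", {"semantic": 0.9, "skills": 0.4, "experience": 0.7}),
    CachedMatch("bob", {"semantic": 0.6, "skills": 0.9, "experience": 0.5}),
]
print(rerank_in_memory(matches, {"semantic": 1.0, "skills": 0.0, "experience": 0.0}))
print(rerank_in_memory(matches, {"semantic": 0.2, "skills": 0.8, "experience": 0.0}))
```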
32
+
33
+ ### Structured Gap Analysis
34
+ Detect missing skills, experience gaps, and mismatches to generate grounded candidate explanations.
35
+
36
+ ### LLM-generated Explanations
37
+ Use Groq-powered LLM responses based on the precomputed gap analysis.
38
+
39
+ ### Trajectory Scoring
40
+ Estimate career growth velocity from work history and reward strong advancement patterns.
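The README does not give the growth-velocity formula. One plausible reading is "promotions per year of career span"; the sketch below assumes that definition, and the `Role` fields, the integer seniority `level`, and the 365.25-day year are hypothetical. It does not model company funding stage, which the description also mentions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Role:
    title: str
    level: int   # hypothetical seniority level, e.g. 1 = junior ... 5 = principal
    start: date
    end: date    # use today's date for a current role


def growth_velocity(history: list[Role]) -> float:
    """Promotions (level increases between consecutive roles) per year of career span."""
    if len(history) < 2:
        return 0.0
    roles = sorted(history, key=lambda r: r.start)
    promotions = sum(1 for prev, cur in zip(roles, roles[1:]) if cur.level > prev.level)
    years = (roles[-1].end - roles[0].start).days / 365.25
    return promotions / years if years > 0 else 0.0


history = [
    Role("Engineer", 2, date(2018, 1, 1), date(2020, 1, 1)),
    Role("Senior Engineer", 3, date(2020, 1, 1), date(2022, 1, 1)),
    Role("Staff Engineer", 4, date(2022, 1, 1), date(2024, 1, 1)),
]
print(round(growth_velocity(history), 3))
```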
41
+
42
+ ### JD Quality Feedback
43
+ Evaluate job descriptions for clarity, breadth, and missing signals.
44
 
45
  ## Tech Stack
46
+
47
+ | Layer | Technology |
48
+ |------|------------|
49
+ | Frontend | Next.js 16, React, Tailwind CSS v4 |
50
+ | Backend | FastAPI, Uvicorn |
51
+ | Database | Neon Postgres, Asyncpg, SQLAlchemy, Alembic |
52
+ | Vector Search | Qdrant Cloud |
53
+ | Async Jobs | Celery |
54
+ | Cache | Redis Cloud |
55
+ | Embeddings | BAAI/bge-small-en-v1.5 via SentenceTransformers |
56
+ | Reranking | BAAI/bge-reranker-v2-m3 via FlagEmbedding |
57
+ | LLM Provider | Groq (llama-3.3-70b-versatile) |
58
+ | Deployment | Docker, Nginx, Supervisord, HuggingFace Spaces |
59
 
60
  ## Architecture Overview
61
+
62
+ ```mermaid
+ graph TD
+ UI[Next.js Frontend] -->|REST API| Proxy[Nginx Reverse Proxy]
+ Proxy --> API[FastAPI Backend]
+
+ API -->|Async Tasks| Queue[Redis / Celery Queue]
+ Queue --> Worker[Celery Workers]
+
+ API -->|Read / Write| DB[(Neon Postgres)]
+ Worker -->|Persist Metadata| DB
+
+ API -->|Vector Search| VectorDB[(Qdrant Cloud)]
+ Worker -->|Store Embeddings| VectorDB
+
+ API -->|Cross-Encoder Rerank| LocalAI[Local Reranker Model]
+ API -->|LLM Explanations| LLM[Groq API]
+ Worker -->|LLM Jobs| LLM
+ ```
 
  ## Project Structure
+
  ```text
  /
  ├── backend/
+ │ ├── alembic/
  │ ├── src/
+ │ │ ├── matching/
+ │ │ ├── ml/
+ │ │ ├── models/
+ │ │ ├── routers/
+ │ │ ├── schemas/
+ │ │ └── workers/
+ │ ├── main.py
+ │ └── requirements.txt
  ├── frontend/
+ │ ├── public/
+ │ ├── src/
+ │ │ ├── app/
+ │ │ └── lib/
+ │ ├── next.config.ts
+ │ └── globals.css
+ ├── docker-compose.yml
+ ├── Dockerfile
+ ├── supervisord.conf
+ └── nginx.conf
+ ```
+
+ ## Core Modules & Responsibilities
+
+ ### Backend
+
+ * **`backend/src/ml`**: Handles model loading, text embedding, and feature extraction.
+
+ * **`backend/src/matching`**: Implements retrieval, reranking, weighted scoring, and explanation logic.
+
+ * **`backend/src/workers`**: Runs background jobs such as candidate ingestion and explanation generation.
+
+ * **`backend/src/routers`**: Exposes API endpoints for sessions, JDs, candidates, matching, and health checks.
+
+ ### Frontend
+
+ * **`frontend/src/app`**: Contains user-facing routes such as sessions, JD details, and pipeline orchestration.
+
+ * **`frontend/src/lib`**: Centralized API client wrappers.
+
+ ## Application Flows
+
+ ### Candidate Upload & Ingestion Flow
+
+ ```mermaid
+ sequenceDiagram
+ actor User
+ participant UI as Next.js UI
+ participant API as FastAPI Router
+ participant Queue as Redis / Celery Queue
+ participant Worker as Celery Worker
+ participant Store as Postgres + Qdrant
+
+ User->>UI: Upload candidate CSV/JSON
+ UI->>API: POST /api/candidates/upload
+ API->>Queue: Dispatch ingest_candidates_batch
+ API-->>UI: Return task ID
+ UI->>API: Poll /api/candidates/status/{task_id}
+ Worker->>Queue: Fetch task
+ Worker->>Worker: Parse candidate data
+ Worker->>Worker: Compute embeddings and growth velocity
+ Worker->>Store: Save metadata and vector points
+ Worker-->>Queue: Mark task complete
+ API-->>UI: Return success status
  ```
 
+ ### Matching & Reranking Flow
+
+ ```mermaid
+ sequenceDiagram
+ actor User
+ participant UI as Next.js UI
+ participant API as FastAPI Router
+ participant Qdrant as Vector DB
+ participant Reranker as Local Reranker
+ participant Cache as Redis Cache
+
+ User->>UI: Open JD and click Match
+ UI->>API: POST /api/match/{jd_id}
+ API->>Qdrant: Retrieve top candidates
+ Qdrant-->>API: Return top-K vectors
+ API->>Reranker: Cross-encoder reranking
+ Reranker-->>API: Return adjusted scores
+ API->>API: Apply rank fusion and weights
+ API->>Cache: Store result
+ API-->>UI: Return ranked candidates
+
+ User->>UI: Adjust weight sliders
+ UI->>API: POST /api/match/{jd_id}/rerank
+ API->>API: Recompute ranking in memory
+ API-->>UI: Return updated ordering
+ ```
+
+ ### Explain & Refine Flow
+
+ ```mermaid
+ sequenceDiagram
+ actor User
+ participant UI as Next.js UI
+ participant API as FastAPI Router
+ participant DB as Postgres
+ participant LLM as Groq API
+
+ User->>UI: Open candidate match details
+ UI->>API: POST /api/match/{jd_id}/candidates/{candidate_id}/explain
+ API->>DB: Load match data and gap analysis
+ API->>LLM: Generate grounded explanation
+ LLM-->>API: Return explanation text
+ API-->>UI: Show explanation to user
+ ```
+
+ ## API Documentation
+
+ | Method | Path | Purpose |
+ | ------ | ---------------------------------------------------- | -------------------------- |
+ | POST | /api/sessions | Create a candidate session |
+ | GET | /api/sessions | List sessions |
+ | POST | /api/jds | Create a job description |
+ | GET | /api/jds | List job descriptions |
+ | POST | /api/candidates/upload?session_id= | Upload candidate files |
+ | GET | /api/candidates/status/{task_id} | Check task progress |
+ | POST | /api/match/{jd_id}?session_id= | Run full matching pipeline |
+ | POST | /api/match/{jd_id}/rerank | Rerank in memory |
+ | POST | /api/match/{jd_id}/candidates/{candidate_id}/explain | Generate explanation |
+ | GET | /health | Health check |
+
+ ## Database Models
+
+ * **Session**: Candidate batch container
+ * **JobDescription**: Stores JD text and parsed requirements
+ * **Candidate**: Stores profile, skills, work history, and embedding references (`qdrant_id`)
+ * **MatchResult**: Stores scores, gaps, explanations, and weights
+
+ ## Authentication & Security
+
+ * No formal authentication or RBAC yet
+ * CORS allows all origins (`*`)
+ * Minimal admin utility route exists
+
+ ## State Management
+
+ * React Hooks (`useState`, `useEffect`, `useCallback`)
+ * Local storage for pipeline state persistence
+ * Redis for backend caching
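 
  ## Caching & Performance
+
+ * Full match results cached in Redis, keyed by `jd_id` and `session_id`
+ * Embedding and reranker models pre-downloaded into the Docker image (`HF_HOME=/app/models`)
+ * SQLAlchemy's prepared-statement cache disabled for compatibility with Asyncpg and Neon connection pooling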
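+
+ ## Setup & Installation
+
+ ### Run Locally
+
+ Create `backend/.env` with the variables listed under Environment Variables, then run:
+
+ ```bash
+ docker-compose up --build
+ ```
+
+ This starts the FastAPI backend on port `8000`, Next.js on `3000`, and a Celery worker.
+
+ ### Database Migration
+
+ ```bash
+ cd backend
+ alembic upgrade head
+ ```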
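The previous README noted that Alembic's `env.py` converts `postgresql://` URLs to `postgresql+asyncpg://` and strips SSL query parameters when needed. A minimal sketch of that kind of normalization, assuming the parameters to drop are the libpq-style `sslmode` and `channel_binding` that Neon connection strings commonly include; the function name is hypothetical and the real `env.py` may differ. After stripping, SSL has to be requested through the driver's own `ssl` connect argument instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of libpq-style query parameters to drop before handing the URL to asyncpg.
_STRIP_PARAMS = {"sslmode", "channel_binding"}


def to_asyncpg_url(url: str) -> str:
    """Rewrite a Postgres URL for SQLAlchemy's asyncpg driver and drop libpq-only params."""
    parts = urlsplit(url)
    scheme = parts.scheme
    if scheme in ("postgres", "postgresql"):
        scheme = "postgresql+asyncpg"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in _STRIP_PARAMS
    ]
    return urlunsplit((scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


print(to_asyncpg_url("postgresql://user:pw@ep-example.neon.tech/db?sslmode=require&channel_binding=require"))
```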
 
  ## Environment Variables
+
+ ```env
+ DATABASE_URL=
+ QDRANT_URL=
+ QDRANT_API_KEY=
+ REDIS_URL=
+ GROQ_API_KEY=
+ GROQ_MODEL=
+ EMBEDDING_MODEL=
+ RERANKER_MODEL=
+ NEXT_PUBLIC_API_URL=
+ ```
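The previous README described `GROQ_API_KEY` as a comma-separated list that the backend cycles through to spread rate limits. A minimal round-robin sketch under that assumption; `GroqKeyPool` is a hypothetical helper and the code does not call the Groq SDK. The lock matters only if several threads in one process share the pool.

```python
import threading
from itertools import cycle


class GroqKeyPool:
    """Round-robin over comma-separated API keys; safe to share between threads."""

    def __init__(self, raw: str) -> None:
        keys = [k.strip() for k in raw.split(",") if k.strip()]
        if not keys:
            raise ValueError("GROQ_API_KEY must contain at least one key")
        self._keys = cycle(keys)
        self._lock = threading.Lock()
        self.size = len(keys)

    def next_key(self) -> str:
        with self._lock:
            return next(self._keys)


# In the app the raw string would come from os.environ["GROQ_API_KEY"].
pool = GroqKeyPool("key-a, key-b,key-c")
print([pool.next_key() for _ in range(4)])
```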
+
+ ## Deployment
+
+ * Multi-stage Docker build
+ * Runs FastAPI, Next.js, Celery, and Nginx in a single container managed by Supervisord
+ * Optimized for HuggingFace Spaces
+ * Exposes port `7860`
+
+ ## Improvement Recommendations
+
+ * Add JWT auth and RBAC
+ * Replace polling with WebSockets or SSE
+ * Add S3-compatible object storage for uploaded files
+ * Add automated tests
+ * Add observability and metrics
+
+ ## Quick Summary
+
+ TalentPulse combines semantic search, reranking, and LLM reasoning to help recruiters identify the best candidates faster, with explainable, AI-powered hiring workflows.
+
backend/src/routers/candidates.py CHANGED
@@ -15,7 +15,7 @@ from ..workers.ingest import ingest_candidates_batch
 
  router = APIRouter()
 
- BATCH_SIZE = 100
 
 
  @router.post("/upload", response_model=UploadResponse)
@@ -65,6 +65,7 @@ async def upload_candidates(
 
  return UploadResponse(
  task_id=task_ids[0] if task_ids else "",
  queued=len(rows),
  message=f"Queued {len(rows)} candidates across {len(task_ids)} batches",
  )
 
  router = APIRouter()
 
+ BATCH_SIZE = 500 # Large enough to keep typical uploads in one batch
 
 
  @router.post("/upload", response_model=UploadResponse)
 
 
  return UploadResponse(
  task_id=task_ids[0] if task_ids else "",
+ task_ids=task_ids,
  queued=len(rows),
  message=f"Queued {len(rows)} candidates across {len(task_ids)} batches",
  )
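The upload handler splits the parsed rows into batches of `BATCH_SIZE`, dispatches one Celery task per batch, and now returns every resulting task ID. A minimal sketch of the chunking step alone; `chunk_rows` is a hypothetical helper, and the Celery dispatch of `ingest_candidates_batch` is left out because its signature is not shown in this diff.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 500


def chunk_rows(rows: Sequence[T], size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows (the last slice may be shorter)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [{"name": f"candidate-{i}"} for i in range(1203)]
batches = list(chunk_rows(rows))
print(len(batches), [len(b) for b in batches])
```

With the old `BATCH_SIZE = 100`, the same 1,203 rows would become 13 tasks. A client that polls only the first `task_id` could then report success before later batches finish, which is the gap the new `task_ids` field is meant to close.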
backend/src/schemas/candidate.py CHANGED
@@ -26,7 +26,8 @@ class CandidateResponse(BaseModel):
 
 
  class UploadResponse(BaseModel):
- task_id: str
  queued: int
  message: str
 
 
 
 
  class UploadResponse(BaseModel):
+ task_id: str # First task ID (backward compat)
+ task_ids: list[str] = [] # ALL task IDs — poll all to confirm full ingestion
  queued: int
  message: str
 
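With `task_ids` in the response, a client can treat ingestion as finished only when every batch has succeeded and as failed as soon as any batch fails. A minimal sketch of that aggregation, assuming `/api/candidates/status/{task_id}` reports Celery's standard state names (`PENDING`, `STARTED`, `RETRY`, `SUCCESS`, `FAILURE`, `REVOKED`); `aggregate_ingest_status` and `IngestState` are hypothetical names.

```python
from enum import Enum


class IngestState(str, Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


# Celery states treated as terminal failures here; any other non-SUCCESS state is still in flight.
_FAILED = {"FAILURE", "REVOKED"}


def aggregate_ingest_status(statuses: dict[str, str]) -> IngestState:
    """Combine per-batch Celery states (task_id -> state) into one pipeline state."""
    if not statuses:
        raise ValueError("no task IDs to aggregate")
    states = statuses.values()
    if any(s in _FAILED for s in states):
        return IngestState.FAILED
    if all(s == "SUCCESS" for s in states):
        return IngestState.SUCCESS
    return IngestState.RUNNING


print(aggregate_ingest_status({"t1": "SUCCESS", "t2": "PENDING"}).value)
```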
frontend/src/app/pipeline/page.tsx CHANGED
@@ -24,7 +24,7 @@ const DEFAULT_STATE: PipelineState = { status: "idle", sessionName: "", jdsInfo:
 
  export default function PipelinePage() {
  const router = useRouter();
-
  // Pipeline definition
  const steps = [
  { id: "idle", label: "Configure Run", icon: "📝" },
@@ -47,7 +47,7 @@ export default function PipelinePage() {
  // Architecture state
  const [state, setState] = useState<PipelineState>(DEFAULT_STATE);
  const [error, setError] = useState<string | null>(null);
-
  const timerRef = useRef<ReturnType<typeof setInterval> | null>(null);
 
  useEffect(() => {
@@ -62,7 +62,7 @@ export default function PipelinePage() {
  if (p.status === "embedding" && p.taskId) pollEmbedding(p.taskId, p);
  if (p.status === "matching" && p.jdIds.length > 0 && p.sessionId) runMatches(p.jdIds, p.sessionId, p);
  }
- } catch (e) {}
  }
  }, []);
 
@@ -72,8 +72,8 @@ export default function PipelinePage() {
  if (!timerRef.current && state.startTime) {
  timerRef.current = setInterval(() => {
  setState(s => {
- if (s.status === "idle" || s.status === "complete") return s;
- return { ...s, elapsedTime: Math.floor((Date.now() - (s.startTime || Date.now())) / 1000) };
  });
  }, 1000);
  }
@@ -83,7 +83,7 @@ export default function PipelinePage() {
  timerRef.current = null;
  }
  if (state.status === "complete") {
- localStorage.removeItem("talentpulse_pipeline");
  }
  }
  }, [state.status, state.startTime]);
@@ -156,21 +156,24 @@ export default function PipelinePage() {
  // 1. Create Session first
  const session = await api.createSession(sessionName, "Automated Candidate Batch Ingestion");
  const sessionIdStr = (session as any).id;
-
  // 2. Create JDs scoped to that session
  const jdPromises = jds.map(jd => api.createJD(jd.title, jd.desc, sessionIdStr));
  const createdJDs = await Promise.all(jdPromises);
  const jdIds = createdJDs.map(j => (j as any).id);
-
  updateState({ sessionId: sessionIdStr, jdIds });
 
- // 3. Upload file
  const uploadRes = await api.uploadCandidates(file, sessionIdStr);
-
  updateState({ status: "embedding", taskId: uploadRes.task_id });
-
- // 3. Poll embedding
- pollEmbedding(uploadRes.task_id, { ...state, status: "embedding", sessionId: (session as any).id, jdIds, startTime: start });
 
  } catch (e: any) {
  setError("Pipeline failed: " + e.message);
@@ -178,17 +181,21 @@ export default function PipelinePage() {
  }
  };
 
- const pollEmbedding = async (taskId: string, currentState: PipelineState) => {
  const poll = setInterval(async () => {
  try {
- const s = await api.taskStatus(taskId);
- if (s.status === "SUCCESS") {
  clearInterval(poll);
  updateState({ status: "matching" });
  runMatches(currentState.jdIds, currentState.sessionId!, currentState);
- } else if (s.status === "FAILURE") {
  clearInterval(poll);
- setError("Vector embedding failed.");
  updateState({ status: "idle" });
  }
  } catch (e) {
@@ -216,7 +223,7 @@ export default function PipelinePage() {
  }
  }
  }
-
  if (stillPending.length > 0) {
  pendingJds = stillPending;
  setTimeout(pollMatches, 3000);
@@ -227,14 +234,14 @@ export default function PipelinePage() {
  const existing = JSON.parse(localStorage.getItem("tp_session_jds") || "{}");
  existing[currentState.sessionId!] = currentState.jdIds;
  localStorage.setItem("tp_session_jds", JSON.stringify(existing));
- } catch (e) {}
  }
  } catch (e: any) {
  setError("Matching failed: " + e.message);
  updateState({ status: "idle" });
  }
  };
-
  pollMatches();
  };
 
@@ -256,9 +263,9 @@ export default function PipelinePage() {
  {/* STEPPER UI */}
  <div className="mb-12 relative">
  <div className="absolute top-6 left-[10%] right-[10%] h-0.5 bg-[var(--color-border-strong)] -z-10" />
- <div className="absolute top-6 left-[10%] h-0.5 bg-[var(--color-brand)] -z-10 transition-all duration-700"
- style={{ width: `${Math.max(0, (currentStepIdx / (steps.length - 1)) * 80)}%` }} />
-
  <div className="flex justify-between relative z-10">
  {steps.map((step, idx) => {
  const isActive = state.status === step.id;
@@ -266,9 +273,9 @@ export default function PipelinePage() {
  return (
  <div key={step.id} className="flex flex-col items-center w-24">
  <div className={`w-12 h-12 rounded-full flex items-center justify-center text-xl mb-3 border-2 transition-all duration-500
- ${isActive ? 'bg-[var(--color-brand-dim)] border-[var(--color-brand-light)] text-white shadow-[0_0_20px_var(--color-brand-dim)]'
- : isPast ? 'bg-[var(--color-brand)] border-[var(--color-brand)] text-white'
- : 'bg-[var(--color-surface-2)] border-[var(--color-border-strong)] text-[var(--color-muted)] opacity-50' }`}
  >
  {step.icon}
  </div>
@@ -302,39 +309,39 @@ export default function PipelinePage() {
  <div className="bg-[var(--color-card)] border border-[var(--color-border)] rounded-2xl p-8 shadow-xl shadow-black/5">
  <div className="mb-6">
  <label className="block text-xs font-bold text-[var(--color-muted)] mb-2 uppercase tracking-wider">Candidate Batch Name</label>
- <input type="text" placeholder="e.g. Q3 Engineering Batch (100k)"
  className="w-full bg-[var(--color-surface-2)] border border-[var(--color-border-strong)] rounded-xl px-4 py-3 text-sm outline-none focus:border-[var(--color-brand)] transition-all"
  value={sessionName} onChange={e => setSessionName(e.target.value)} />
  </div>
 
  <div className="mb-8">
- <label className="block text-xs font-medium text-[var(--color-muted)] mb-2">Candidates CSV (.csv, .json)</label>
- <input type="file" accept=".csv,.json,.jsonl"
- className="w-full text-sm text-[var(--color-muted)] file:mr-4 file:py-2 file:px-4 file:rounded-xl file:border-0 file:text-sm file:font-semibold file:bg-[var(--color-brand-dim)] file:text-[var(--color-brand-light)] hover:file:bg-[var(--color-brand)] hover:file:text-white transition-all cursor-pointer border border-[var(--color-border-strong)] rounded-xl p-2"
- onChange={handleFileChange} />
- {csvRowCount > 0 && (
- <p className="mt-2 text-xs text-[var(--color-muted)]">
- 📄 Detected <strong className="text-[var(--color-brand-light)]">{csvRowCount}</strong> candidate rows (excluding header)
- </p>
- )}
  </div>
 
  <div className="mb-6 border-t border-[var(--color-border-strong)] pt-6">
  <div className="flex items-center justify-between mb-4">
  <label className="block text-sm font-bold text-[var(--color-text)]">Job Descriptions to Match</label>
- <button
  onClick={addJd}
  className="text-xs px-3 py-1.5 rounded-lg bg-[var(--color-surface-2)] border border-[var(--color-border)] text-[var(--color-muted)] hover:text-[var(--color-text)] transition-colors"
  >
  + Add Another JD
  </button>
  </div>
-
  <div className="space-y-6">
  {jds.map((jd, idx) => (
  <div key={idx} className="bg-[var(--color-surface-2)] p-4 rounded-xl border border-[var(--color-border)] relative group">
  {jds.length > 1 && (
- <button
  onClick={() => removeJd(idx)}
  className="absolute -top-2 -right-2 w-6 h-6 rounded-full bg-[var(--color-card)] border border-[var(--color-border-strong)] text-[var(--color-muted)] hover:text-red-400 hover:border-red-400 flex items-center justify-center text-xs opacity-0 group-hover:opacity-100 transition-all z-10"
  >
@@ -343,7 +350,7 @@ export default function PipelinePage() {
  )}
  <div className="mb-3">
  <label className="block text-xs font-medium text-[var(--color-muted)] mb-2">JD {idx + 1} Title</label>
- <input type="text" placeholder="e.g. Senior Backend Engineer"
  className="w-full bg-[var(--color-card)] border border-[var(--color-border-strong)] rounded-lg px-3 py-2 text-sm outline-none focus:border-[var(--color-brand)] transition-all"
  value={jd.title} onChange={e => updateJd(idx, "title", e.target.value)} />
  </div>
@@ -376,11 +383,9 @@ export default function PipelinePage() {
  onChange={e => setRankingCap(Number(e.target.value))}
  className="w-full h-2 rounded-lg appearance-none cursor-pointer"
  style={{
- background: `linear-gradient(to right, var(--color-brand) ${
- ((rankingCap / (csvRowCount > 0 ? csvRowCount : 200)) * 100).toFixed(1)
- }%, var(--color-border-strong) ${
- ((rankingCap / (csvRowCount > 0 ? csvRowCount : 200)) * 100).toFixed(1)
- }%)`
  }}
  />
  <div className="flex justify-between text-[10px] text-[var(--color-muted)] mt-1">
@@ -388,12 +393,12 @@ export default function PipelinePage() {
388
  <span>{csvRowCount > 0 ? csvRowCount : 200}</span>
389
  </div>
390
  {/* RAM Warning for BGE model */}
391
- <div className="mt-3 flex items-start gap-2 bg-amber-500/10 border border-amber-500/25 rounded-xl px-4 py-3">
392
  <span className="text-amber-400 text-sm mt-0.5">⚠️</span>
393
  <p className="text-xs text-amber-300/90 leading-relaxed">
394
  <strong>Hugging Face Free Tier Notice:</strong> We use <code className="font-mono bg-black/20 px-1 rounded">BAAI/bge-reranker-v2-m3</code> for neural reranking. On the free tier, this model exceeds available RAM above ~72 candidates and the backend will crash. <strong>Keep the cap at or below 72</strong> for stable results.
395
  </p>
396
- </div>
397
  </div>
398
 
399
  <button onClick={startPipeline}
@@ -403,46 +408,46 @@ export default function PipelinePage() {
403
  </div>
404
  ) : state.status === "complete" ? (
405
  <div className="text-center bg-[var(--color-card)] border border-[var(--color-border)] rounded-2xl p-10 shadow-xl shadow-black/5 animate-fade-in">
406
- <div className="text-6xl mb-4">🎉</div>
407
- <h2 className="text-2xl font-bold mb-2">Automated Inference Complete!</h2>
408
- <p className="text-[var(--color-muted)] mb-8 max-w-sm mx-auto">
409
- 100% of candidate logic calculated safely for <strong>{state.jdIds.length}</strong> Job Descriptions. The background worker is aggressively pulling LLM explanations for the top 60 right now.
410
- </p>
411
-
412
- <div className="max-w-md mx-auto bg-[var(--color-surface-2)] rounded-xl border border-[var(--color-border)] p-4 mb-6">
413
- <div className="text-xs font-bold text-[var(--color-muted)] uppercase tracking-wider mb-3 text-left">View Matches By JD:</div>
414
- <div className="space-y-2">
415
- {state.jdsInfo.map((info, idx) => (
416
- <Link
417
- key={idx}
418
- href={`/sessions/${state.sessionId}?jd_id=${state.jdIds[idx]}`}
419
- className="flex justify-between items-center bg-[var(--color-card)] hover:bg-[var(--color-card-hover)] p-3 rounded-lg border border-[var(--color-border-strong)] hover:border-[var(--color-brand)] transition-all group"
420
- >
421
- <span className="font-semibold text-sm truncate pr-4">{info.title || `Job Description ${idx + 1}`}</span>
422
- <span className="text-[10px] px-2 py-1 bg-[var(--color-brand-dim)] text-[var(--color-brand-light)] border border-[var(--color-brand-glow)] rounded-full flex-shrink-0">
423
- View Ranking →
424
- </span>
425
- </Link>
426
- ))}
427
- </div>
428
- </div>
429
-
430
- <button onClick={() => updateState({ status: "idle", jdsInfo: [{ title: "", desc: "" }], jdIds: [], sessionName: "", startTime: undefined })}
431
- className="text-xs text-[var(--color-muted)] hover:text-[var(--color-text)] underline underline-offset-2">
432
- Start a new pipeline run
433
- </button>
434
  </div>
435
  ) : (
436
  <div className="text-center bg-[var(--color-card)] border border-dashed border-[var(--color-border-strong)] rounded-2xl p-16 animate-fade-in">
437
- <div className="w-16 h-16 border-4 border-[var(--color-brand-dim)] border-t-[var(--color-brand-light)] rounded-full animate-spin mx-auto mb-6" />
438
- <h2 className="text-xl font-semibold mb-2">
439
- {state.status === "uploading" ? "Broadcasting to Postgres DB..."
440
- : state.status === "embedding" ? "Running Core CPU Vector Space Projection..."
441
- : `Executing Dual-Stage Neural Match for ${state.jdIds.length} JDs...`}
442
- </h2>
443
- <p className="text-[var(--color-dimmer)] text-sm">
444
- Do not close this tab. The timer will automatically pause and redirect upon completion.
445
- </p>
446
  </div>
447
  )}
448
  </div>
 
24
 
25
  export default function PipelinePage() {
26
  const router = useRouter();
27
+
28
  // Pipeline definition
29
  const steps = [
30
  { id: "idle", label: "Configure Run", icon: "📝" },
 
47
  // Architecture state
48
  const [state, setState] = useState<PipelineState>(DEFAULT_STATE);
49
  const [error, setError] = useState<string | null>(null);
50
+
51
  const timerRef = useRef<ReturnType<typeof setInterval> | null>(null);
52
 
53
  useEffect(() => {
 
62
  if (p.status === "embedding" && p.taskId) pollEmbedding(p.taskId, p);
63
  if (p.status === "matching" && p.jdIds.length > 0 && p.sessionId) runMatches(p.jdIds, p.sessionId, p);
64
  }
65
+ } catch (e) { /* ignore missing or corrupt saved pipeline state */ }
66
  }
67
  }, []);
68
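The component saves run state under the `talentpulse_pipeline` key, resumes polling on mount, and removes the key when the run completes. In the diff as shown, the saved `taskId` is `uploadRes.task_id` (the first batch only), while a resumed run calls `pollEmbedding(p.taskId, p)`, so a reload during a multi-batch upload appears to wait on a single batch. The following is a minimal sketch, assuming a hypothetical `taskIds` array field (not present in the diff), that saves every batch ID and still reads an older single `taskId` entry. Storage is injected through a small interface so the helpers can run outside a browser.

```typescript
// Minimal subset of the Web Storage API, so a Map-backed fake can stand in.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical persisted shape; `taskIds` is an assumption, not a field in the diff.
interface PersistedRun {
  status: string;
  taskIds: string[];
  sessionId?: string;
  jdIds: string[];
  startTime?: number;
}

const STORAGE_KEY = "talentpulse_pipeline"; // key used by the component

function saveRun(store: KeyValueStore, run: PersistedRun): void {
  store.setItem(STORAGE_KEY, JSON.stringify(run));
}

// Returns null for missing or corrupt data instead of throwing.
function loadRun(store: KeyValueStore): PersistedRun | null {
  const raw = store.getItem(STORAGE_KEY);
  if (!raw) return null;
  try {
    const parsed = JSON.parse(raw);
    // Accept older entries that stored only a single `taskId`.
    const taskIds: string[] = Array.isArray(parsed.taskIds)
      ? parsed.taskIds
      : parsed.taskId ? [parsed.taskId] : [];
    return { ...parsed, taskIds, jdIds: parsed.jdIds ?? [] };
  } catch {
    return null;
  }
}

// Called once the run reaches "complete", mirroring localStorage.removeItem.
function clearRun(store: KeyValueStore): void {
  store.removeItem(STORAGE_KEY);
}
```

In the browser, `window.localStorage` already satisfies `KeyValueStore`, so `loadRun(localStorage)` works unchanged.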
 
 
72
  if (!timerRef.current && state.startTime) {
73
  timerRef.current = setInterval(() => {
74
  setState(s => {
75
+ if (s.status === "idle" || s.status === "complete") return s;
76
+ return { ...s, elapsedTime: Math.floor((Date.now() - (s.startTime || Date.now())) / 1000) };
77
  });
78
  }, 1000);
79
  }
 
83
  timerRef.current = null;
84
  }
85
  if (state.status === "complete") {
86
+ localStorage.removeItem("talentpulse_pipeline");
87
  }
88
  }
89
  }, [state.status, state.startTime]);
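The elapsed-time interval recomputes whole seconds since `startTime` once per second and leaves state unchanged while the run is `idle` or `complete`. A minimal sketch of that arithmetic as a pure function, with the clock passed in so it can be tested; the `PipelineStatus` union is assumed from the status strings that appear in the diff.

```typescript
// Assumed from the status values used in the diff.
type PipelineStatus = "idle" | "uploading" | "embedding" | "matching" | "complete";

// Same arithmetic as the interval callback. Returns null when the component
// would return the previous state unchanged (timer frozen).
function nextElapsed(
  status: PipelineStatus,
  startTime: number | undefined,
  now: number,
): number | null {
  if (status === "idle" || status === "complete") return null;
  // `||` mirrors the component: a missing startTime yields 0 seconds.
  return Math.floor((now - (startTime || now)) / 1000);
}
```

Inside the component this would read `const e = nextElapsed(s.status, s.startTime, Date.now()); return e === null ? s : { ...s, elapsedTime: e };`.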
 
156
  // 1. Create Session first
157
  const session = await api.createSession(sessionName, "Automated Candidate Batch Ingestion");
158
  const sessionIdStr = (session as any).id;
159
+
160
  // 2. Create JDs scoped to that session
161
  const jdPromises = jds.map(jd => api.createJD(jd.title, jd.desc, sessionIdStr));
162
  const createdJDs = await Promise.all(jdPromises);
163
  const jdIds = createdJDs.map(j => (j as any).id);
164
+
165
  updateState({ sessionId: sessionIdStr, jdIds });
166
 
167
+ // 3. Upload file — may return multiple batch task IDs for large CSVs
168
  const uploadRes = await api.uploadCandidates(file, sessionIdStr);
169
+ const allTaskIds: string[] = (uploadRes as any).task_ids?.length
170
+ ? (uploadRes as any).task_ids
171
+ : [uploadRes.task_id];
172
+
173
  updateState({ status: "embedding", taskId: uploadRes.task_id });
174
+
175
+ // Poll ALL batch tasks — only proceed to matching when every batch is done
176
+ pollEmbedding(allTaskIds, { ...state, status: "embedding", sessionId: sessionIdStr, jdIds, startTime: start });
177
 
178
  } catch (e: any) {
179
  setError("Pipeline failed: " + e.message);
 
181
  }
182
  };
183
 
184
+ const pollEmbedding = async (taskIds: string | string[], currentState: PipelineState) => {
185
+ const ids = Array.isArray(taskIds) ? taskIds : [taskIds];
186
  const poll = setInterval(async () => {
187
  try {
188
+ // Check ALL batch tasks — only proceed when EVERY one is SUCCESS
189
+ const statuses = await Promise.all(ids.map(id => api.taskStatus(id)));
190
+ const allDone = statuses.every(s => s.status === "SUCCESS");
191
+ const anyFailed = statuses.some(s => s.status === "FAILURE");
192
+ if (allDone) {
193
  clearInterval(poll);
194
  updateState({ status: "matching" });
195
  runMatches(currentState.jdIds, currentState.sessionId!, currentState);
196
+ } else if (anyFailed) {
197
  clearInterval(poll);
198
+ setError("Vector embedding failed for one or more batches.");
199
  updateState({ status: "idle" });
200
  }
201
  } catch (e) {
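The upload call can now return several batch task IDs (`task_ids`), falling back to the single `task_id`, and `pollEmbedding` moves on to matching only when every batch reports `SUCCESS`, stopping as soon as any batch reports `FAILURE`. A minimal sketch of that aggregation as testable functions follows. The `UploadResponse` shape and the `PENDING`/`STARTED` states are assumptions (the diff only shows `task_id`, `task_ids`, `SUCCESS`, and `FAILURE`), and `fetchStatus` is a hypothetical stand-in for `api.taskStatus`, whose real signature is not shown.

```typescript
// Assumed task states; only "SUCCESS" and "FAILURE" appear in the diff.
type TaskState = "PENDING" | "STARTED" | "SUCCESS" | "FAILURE";

// Assumed upload response: one task_id, plus task_ids for batched CSVs.
interface UploadResponse {
  task_id: string;
  task_ids?: string[];
}

type BatchOutcome = "done" | "failed" | "pending";

// Same fallback as the component: task_ids when non-empty, else [task_id].
function collectTaskIds(res: UploadResponse): string[] {
  return res.task_ids && res.task_ids.length > 0 ? res.task_ids : [res.task_id];
}

// "done" only when every batch succeeded; "failed" on any failure.
// The two cases are mutually exclusive, so their order does not matter.
function summarizeBatches(states: TaskState[]): BatchOutcome {
  if (states.every(s => s === "SUCCESS")) return "done";
  if (states.some(s => s === "FAILURE")) return "failed";
  return "pending";
}

// Poll all batches in parallel every `intervalMs` until they settle.
async function waitForBatches(
  ids: string[],
  fetchStatus: (id: string) => Promise<TaskState>,
  intervalMs: number,
): Promise<"done" | "failed"> {
  for (;;) {
    const states = await Promise.all(ids.map(fetchStatus));
    const outcome = summarizeBatches(states);
    if (outcome !== "pending") return outcome;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Keeping the decision in `summarizeBatches` separates the "every batch done" rule from the timer mechanics, which is the part most likely to change if the backend adds new task states.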
 
223
  }
224
  }
225
  }
226
+
227
  if (stillPending.length > 0) {
228
  pendingJds = stillPending;
229
  setTimeout(pollMatches, 3000);
 
234
  const existing = JSON.parse(localStorage.getItem("tp_session_jds") || "{}");
235
  existing[currentState.sessionId!] = currentState.jdIds;
236
  localStorage.setItem("tp_session_jds", JSON.stringify(existing));
237
+ } catch (e) { /* best-effort cache of session JDs; ignore storage errors */ }
238
  }
239
  } catch (e: any) {
240
  setError("Matching failed: " + e.message);
241
  updateState({ status: "idle" });
242
  }
243
  };
244
+
245
  pollMatches();
246
  };
247
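`pollMatches` re-checks only the JDs that were still pending in the previous round and reschedules itself with `setTimeout(pollMatches, 3000)` until none remain. A minimal async sketch of that shrinking-set loop, where `isReady` is a hypothetical stand-in for the per-JD status call (elided from the diff). The visible part of the diff shows no upper bound on rounds, so the optional `maxRounds` cap here is an addition.

```typescript
// Re-check only still-pending JDs, waiting `delayMs` between rounds.
// `isReady` is hypothetical; `maxRounds` is an optional safety cap.
async function pollUntilAllReady(
  jdIds: string[],
  isReady: (jdId: string) => Promise<boolean>,
  delayMs: number,
  maxRounds = Infinity,
): Promise<string[]> {
  let pending = [...jdIds];
  for (let round = 0; pending.length > 0 && round < maxRounds; round++) {
    if (round > 0) await new Promise(resolve => setTimeout(resolve, delayMs));
    const ready = await Promise.all(pending.map(isReady));
    pending = pending.filter((_, i) => !ready[i]);
  }
  return pending; // empty when every JD finished
}
```

A recursive `setTimeout`, as in the component, never overlaps two rounds; the `await` loop above has the same property.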
 
 
263
  {/* STEPPER UI */}
264
  <div className="mb-12 relative">
265
  <div className="absolute top-6 left-[10%] right-[10%] h-0.5 bg-[var(--color-border-strong)] -z-10" />
266
+ <div className="absolute top-6 left-[10%] h-0.5 bg-[var(--color-brand)] -z-10 transition-all duration-700"
267
+ style={{ width: `${Math.max(0, (currentStepIdx / (steps.length - 1)) * 80)}%` }} />
268
+
269
  <div className="flex justify-between relative z-10">
270
  {steps.map((step, idx) => {
271
  const isActive = state.status === step.id;
 
273
  return (
274
  <div key={step.id} className="flex flex-col items-center w-24">
275
  <div className={`w-12 h-12 rounded-full flex items-center justify-center text-xl mb-3 border-2 transition-all duration-500
276
+ ${isActive ? 'bg-[var(--color-brand-dim)] border-[var(--color-brand-light)] text-white shadow-[0_0_20px_var(--color-brand-dim)]'
277
+ : isPast ? 'bg-[var(--color-brand)] border-[var(--color-brand)] text-white'
278
+ : 'bg-[var(--color-surface-2)] border-[var(--color-border-strong)] text-[var(--color-muted)] opacity-50'}`}
279
  >
280
  {step.icon}
281
  </div>
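The new stepper overlay sizes its filled bar as `(currentStepIdx / (steps.length - 1)) * 80` percent, because the grey track runs from `left-[10%]` to `right-[10%]` and so covers 80% of the container. For a hypothetical five-step array at index 2, that is 2/4 × 80 = 40%. A small sketch of the same formula; the `stepCount < 2` guard is an addition, since the diff divides by `steps.length - 1` directly.

```typescript
// Width of the filled stepper bar as a percentage of the container.
// The track spans 10%..90%, i.e. 80% of the width.
function progressWidthPercent(currentStepIdx: number, stepCount: number): number {
  if (stepCount < 2) return 0; // avoid dividing by zero (added guard)
  // Math.max clamps a not-found index (-1) to an empty bar, as in the diff.
  return Math.max(0, (currentStepIdx / (stepCount - 1)) * 80);
}
```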
 
309
  <div className="bg-[var(--color-card)] border border-[var(--color-border)] rounded-2xl p-8 shadow-xl shadow-black/5">
310
  <div className="mb-6">
311
  <label className="block text-xs font-bold text-[var(--color-muted)] mb-2 uppercase tracking-wider">Candidate Batch Name</label>
312
+ <input type="text" placeholder="e.g. Q3 Engineering Batch (100k)"
313
  className="w-full bg-[var(--color-surface-2)] border border-[var(--color-border-strong)] rounded-xl px-4 py-3 text-sm outline-none focus:border-[var(--color-brand)] transition-all"
314
  value={sessionName} onChange={e => setSessionName(e.target.value)} />
315
  </div>
316
 
317
  <div className="mb-8">
318
+ <label className="block text-xs font-medium text-[var(--color-muted)] mb-2">Candidates file (.csv, .json, .jsonl)</label>
319
+ <input type="file" accept=".csv,.json,.jsonl"
320
+ className="w-full text-sm text-[var(--color-muted)] file:mr-4 file:py-2 file:px-4 file:rounded-xl file:border-0 file:text-sm file:font-semibold file:bg-[var(--color-brand-dim)] file:text-[var(--color-brand-light)] hover:file:bg-[var(--color-brand)] hover:file:text-white transition-all cursor-pointer border border-[var(--color-border-strong)] rounded-xl p-2"
321
+ onChange={handleFileChange} />
322
+ {csvRowCount > 0 && (
323
+ <p className="mt-2 text-xs text-[var(--color-muted)]">
324
+ 📄 Detected <strong className="text-[var(--color-brand-light)]">{csvRowCount}</strong> candidate rows (excluding header)
325
+ </p>
326
+ )}
327
  </div>
328
 
329
  <div className="mb-6 border-t border-[var(--color-border-strong)] pt-6">
330
  <div className="flex items-center justify-between mb-4">
331
  <label className="block text-sm font-bold text-[var(--color-text)]">Job Descriptions to Match</label>
332
+ <button
333
  onClick={addJd}
334
  className="text-xs px-3 py-1.5 rounded-lg bg-[var(--color-surface-2)] border border-[var(--color-border)] text-[var(--color-muted)] hover:text-[var(--color-text)] transition-colors"
335
  >
336
  + Add Another JD
337
  </button>
338
  </div>
339
+
340
  <div className="space-y-6">
341
  {jds.map((jd, idx) => (
342
  <div key={idx} className="bg-[var(--color-surface-2)] p-4 rounded-xl border border-[var(--color-border)] relative group">
343
  {jds.length > 1 && (
344
+ <button
345
  onClick={() => removeJd(idx)}
346
  className="absolute -top-2 -right-2 w-6 h-6 rounded-full bg-[var(--color-card)] border border-[var(--color-border-strong)] text-[var(--color-muted)] hover:text-red-400 hover:border-red-400 flex items-center justify-center text-xs opacity-0 group-hover:opacity-100 transition-all z-10"
347
  >
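After a file is chosen, the form shows a detected row count "(excluding header)", but `handleFileChange` itself is not part of the diff. One possible counting rule per accepted extension is sketched below, under stated assumptions: CSV has one record per line (quoted fields containing newlines would be overcounted), `.json` holds a top-level array, and `.jsonl` has one object per non-empty line. The function name is hypothetical.

```typescript
// Hypothetical row counter for the accepted extensions (.csv, .json, .jsonl).
function countCandidateRows(fileName: string, text: string): number {
  const lower = fileName.toLowerCase();
  const lines = text.split(/\r?\n/).filter(line => line.trim() !== "");
  if (lower.endsWith(".jsonl")) return lines.length; // one record per line
  if (lower.endsWith(".json")) {
    try {
      const data = JSON.parse(text);
      return Array.isArray(data) ? data.length : 0;
    } catch {
      return 0; // unparseable JSON: report no rows
    }
  }
  // CSV: drop the header row. Multiline quoted fields make this an estimate.
  return Math.max(0, lines.length - 1);
}
```

In a change handler this could be called as `countCandidateRows(file.name, await file.text())`.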
 
350
  )}
351
  <div className="mb-3">
352
  <label className="block text-xs font-medium text-[var(--color-muted)] mb-2">JD {idx + 1} Title</label>
353
+ <input type="text" placeholder="e.g. Senior Backend Engineer"
354
  className="w-full bg-[var(--color-card)] border border-[var(--color-border-strong)] rounded-lg px-3 py-2 text-sm outline-none focus:border-[var(--color-brand)] transition-all"
355
  value={jd.title} onChange={e => updateJd(idx, "title", e.target.value)} />
356
  </div>
 
383
  onChange={e => setRankingCap(Number(e.target.value))}
384
  className="w-full h-2 rounded-lg appearance-none cursor-pointer"
385
  style={{
386
+ background: `linear-gradient(to right, var(--color-brand) ${((rankingCap / (csvRowCount > 0 ? csvRowCount : 200)) * 100).toFixed(1)
387
+ }%, var(--color-border-strong) ${((rankingCap / (csvRowCount > 0 ? csvRowCount : 200)) * 100).toFixed(1)
388
+ }%)`
389
  }}
390
  />
391
  <div className="flex justify-between text-[10px] text-[var(--color-muted)] mt-1">
 
393
  <span>{csvRowCount > 0 ? csvRowCount : 200}</span>
394
  </div>
395
  {/* RAM Warning for BGE model */}
396
+ {/* <div className="mt-3 flex items-start gap-2 bg-amber-500/10 border border-amber-500/25 rounded-xl px-4 py-3">
397
  <span className="text-amber-400 text-sm mt-0.5">⚠️</span>
398
  <p className="text-xs text-amber-300/90 leading-relaxed">
399
  <strong>Hugging Face Free Tier Notice:</strong> We use <code className="font-mono bg-black/20 px-1 rounded">BAAI/bge-reranker-v2-m3</code> for neural reranking. On the free tier, this model exceeds available RAM above ~72 candidates and the backend will crash. <strong>Keep the cap at or below 72</strong> for stable results.
400
  </p>
401
+ </div> */}
402
  </div>
403
 
404
  <button onClick={startPipeline}
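The ranking-cap slider paints its track with a hard-edged two-stop gradient at `rankingCap / max`, where `max` is the detected row count or 200 when no rows were detected. The reformatted expression computes that percentage twice inline. A small sketch that computes it once and produces the same string; the helper names are hypothetical.

```typescript
// Percentage of the slider track to fill, formatted like the inline
// expression: max falls back to 200 when no CSV rows were detected.
function sliderFillPercent(rankingCap: number, csvRowCount: number): string {
  const max = csvRowCount > 0 ? csvRowCount : 200;
  return ((rankingCap / max) * 100).toFixed(1);
}

// Both stops sit at the same position, giving a sharp fill edge.
function sliderBackground(rankingCap: number, csvRowCount: number): string {
  const pct = sliderFillPercent(rankingCap, csvRowCount);
  return `linear-gradient(to right, var(--color-brand) ${pct}%, var(--color-border-strong) ${pct}%)`;
}
```

Usage in the JSX would be `style={{ background: sliderBackground(rankingCap, csvRowCount) }}`.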
 
408
  </div>
409
  ) : state.status === "complete" ? (
410
  <div className="text-center bg-[var(--color-card)] border border-[var(--color-border)] rounded-2xl p-10 shadow-xl shadow-black/5 animate-fade-in">
411
+ <div className="text-6xl mb-4">🎉</div>
412
+ <h2 className="text-2xl font-bold mb-2">Automated Inference Complete!</h2>
413
+ <p className="text-[var(--color-muted)] mb-8 max-w-sm mx-auto">
414
+ 100% of candidate logic calculated safely for <strong>{state.jdIds.length}</strong> Job Descriptions. The background worker is aggressively pulling LLM explanations for the top 60 right now.
415
+ </p>
416
+
417
+ <div className="max-w-md mx-auto bg-[var(--color-surface-2)] rounded-xl border border-[var(--color-border)] p-4 mb-6">
418
+ <div className="text-xs font-bold text-[var(--color-muted)] uppercase tracking-wider mb-3 text-left">View Matches By JD:</div>
419
+ <div className="space-y-2">
420
+ {state.jdsInfo.map((info, idx) => (
421
+ <Link
422
+ key={idx}
423
+ href={`/sessions/${state.sessionId}?jd_id=${state.jdIds[idx]}`}
424
+ className="flex justify-between items-center bg-[var(--color-card)] hover:bg-[var(--color-card-hover)] p-3 rounded-lg border border-[var(--color-border-strong)] hover:border-[var(--color-brand)] transition-all group"
425
+ >
426
+ <span className="font-semibold text-sm truncate pr-4">{info.title || `Job Description ${idx + 1}`}</span>
427
+ <span className="text-[10px] px-2 py-1 bg-[var(--color-brand-dim)] text-[var(--color-brand-light)] border border-[var(--color-brand-glow)] rounded-full flex-shrink-0">
428
+ View Ranking →
429
+ </span>
430
+ </Link>
431
+ ))}
432
+ </div>
433
+ </div>
434
+
435
+ <button onClick={() => updateState({ status: "idle", jdsInfo: [{ title: "", desc: "" }], jdIds: [], sessionName: "", startTime: undefined })}
436
+ className="text-xs text-[var(--color-muted)] hover:text-[var(--color-text)] underline underline-offset-2">
437
+ Start a new pipeline run
438
+ </button>
439
  </div>
440
  ) : (
441
  <div className="text-center bg-[var(--color-card)] border border-dashed border-[var(--color-border-strong)] rounded-2xl p-16 animate-fade-in">
442
+ <div className="w-16 h-16 border-4 border-[var(--color-brand-dim)] border-t-[var(--color-brand-light)] rounded-full animate-spin mx-auto mb-6" />
443
+ <h2 className="text-xl font-semibold mb-2">
444
+ {state.status === "uploading" ? "Broadcasting to Postgres DB..."
445
+ : state.status === "embedding" ? "Running Core CPU Vector Space Projection..."
446
+ : `Executing Dual-Stage Neural Match for ${state.jdIds.length} JDs...`}
447
+ </h2>
448
+ <p className="text-[var(--color-dimmer)] text-sm">
449
+ Do not close this tab. The timer will automatically pause and redirect upon completion.
450
+ </p>
451
  </div>
452
  )}
453
  </div>
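The in-progress card picks its heading with a nested ternary over `state.status`. The same mapping can be expressed as a lookup table, sketched below with the headings copied from the diff; the `RunningStatus` union is an assumption based on the three in-progress statuses used there. With `Record<RunningStatus, string>`, the compiler reports a missing heading if a status is added to the union, whereas the ternary silently falls through to the matching text.

```typescript
// Assumed in-progress statuses, taken from the ternary in the diff.
type RunningStatus = "uploading" | "embedding" | "matching";

// Same headings as the nested ternary, keyed by status.
function progressHeading(status: RunningStatus, jdCount: number): string {
  const headings: Record<RunningStatus, string> = {
    uploading: "Broadcasting to Postgres DB...",
    embedding: "Running Core CPU Vector Space Projection...",
    matching: `Executing Dual-Stage Neural Match for ${jdCount} JDs...`,
  };
  return headings[status];
}
```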