# AI Recruiting Assistant — Guide Book (Updated)

## 0) Overview

### What this tool does

This AI Recruiting Assistant is a **decision-support** system that helps recruiters and hiring managers:

* Extract **job requirements** from a job description (JD)
* Evaluate resumes against **verified requirements** using **evidence-based** matching
* Assess job-relevant **culture/working-style signals** using retrieved company documents
* Run **factuality checks** to detect ungrounded claims
* Run a **bias & fairness audit** across the JD, analyses, and the model’s final recommendation

### The problem it addresses

Recruiting teams often face three recurring issues when using AI:

1. **Hallucinated requirements**: LLMs may “invent” skills that are not explicitly required.
2. **Opaque scoring**: Many tools produce fit scores without clearly showing evidence.
3. **Bias risks**: Hiring language and reasoning can leak pedigree/class proxies or subjective criteria.

This tool addresses those issues by enforcing:

* **Deterministic verification gates** (requirements are verified before scoring)
* **Evidence-backed scoring** (only verified requirements are scored; each match includes a quote)
* **Self-verification and self-correction** (factuality checks can trigger automatic revision)
* **Bias auditing** (flags risky language and inconsistent standards)

### How it differs from typical recruiting tools

Compared with “black-box” resume screeners or generic LLM chatbots, this system emphasizes:

* **Transparency**: Outputs include *what was required*, *what was verified*, *what was dropped*, and *why*.
* **Auditability**: The scoring math is deterministic and traceable to inputs.
* **Self-verifying behavior**: Claims are checked against source text; unverified claims can be removed.
* **Bias checks by design**: Bias-sensitive content is audited explicitly instead of implicitly influencing scores.
* **Culture check that’s job-performance aligned**: Culture attributes are framed as job-relevant behaviors, not background proxies.

---

## 1) Inputs and Document Handling

### 1.1 What the user uploads

The tool operates on three inputs:

1. **Company culture / values documents** (PDF/DOCX)
2. **Resumes** (PDF/DOCX)
3. **Job description** (pasted text)

### 1.2 Resume anonymization

Before resumes are stored or analyzed, the tool applies heuristic redaction:

* Emails, phone numbers, URLs
* Addresses / location identifiers
* Explicit demographic fields
* Likely name header (first line)

This reduces exposure of personal identifiers and keeps analysis focused on job evidence.
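
A minimal sketch of this kind of heuristic redaction follows. The regular expressions, placeholder tokens, and the `redact_resume` name are illustrative assumptions rather than the tool's actual rules, and address and demographic-field redaction are left out for brevity.

```python
import re

# Hypothetical patterns; a real deployment would need broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d ().-]{7,}\d(?!\w)")


def _mask_phone(match: re.Match) -> str:
    # Require at least 10 digits so date ranges such as "2019 - 2021" survive.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if digits >= 10 else match.group()


def redact_resume(text: str) -> str:
    """Mask the likely name header and common contact identifiers."""
    lines = text.splitlines()
    if lines:
        # Heuristic from the list above: treat the first line as the name header.
        lines[0] = "[NAME]"
    redacted = "\n".join(lines)
    redacted = URL_RE.sub("[URL]", redacted)
    redacted = EMAIL_RE.sub("[EMAIL]", redacted)
    redacted = PHONE_RE.sub(_mask_phone, redacted)
    return redacted
```

Because the rules are heuristic, some identifiers can slip through and some harmless text can be masked, so it is worth spot-checking the sanitized output.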

### 1.3 Vector stores (retrieval)

The tool maintains two separate Chroma collections:

* **Resumes** (anonymized + chunked)
* **Culture docs** (chunked)

Chunking uses a recursive splitter with overlap to preserve context.
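
To illustrate what overlap buys, here is a simplified stand-in: a fixed-size sliding window rather than the recursive splitter the tool uses. The function name and the default sizes are assumptions chosen for the example.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_size must be positive and larger than chunk_overlap")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.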

---

## 2) End-to-End Logic Flow (Step-by-Step)

Below is the stepwise flow executed when a recruiter clicks **Analyze Candidates**.

### Step 0 — Prerequisite: Documents exist in storage

* Culture docs and resumes must be stored first.
* If not stored, retrieval will be empty or low-signal.

### Step 1 — Extract required skills from the Job Description (LLM-driven)

**Goal:** Identify only skills that are explicitly required.

* The tool prompts the LLM to return **JSON only**:

  * `required_skills: [{skill, evidence_quote}]`
* The LLM is instructed to:

  * include only **MUST HAVE** / explicitly required skills
  * exclude “nice-to-haves” and implied skills
  * copy a short **verbatim quote** as evidence

**LLM role:** structured extraction.

**Failure behavior:** If JSON parsing fails, the tool stops and prints the raw output.
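
The parsing and failure behavior could look roughly like the sketch below. The `parse_required_skills` name and the choice to raise `ValueError` are assumptions; only the `required_skills`, `skill`, and `evidence_quote` keys come from the contract above.

```python
import json


def parse_required_skills(raw_output: str) -> list[dict]:
    """Parse the LLM's JSON-only reply into a list of {skill, evidence_quote} items."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        # Mirror the documented failure behavior: stop and surface the raw output.
        raise ValueError(f"Skill extraction did not return valid JSON:\n{raw_output}") from exc

    items = payload.get("required_skills", []) if isinstance(payload, dict) else []
    if not isinstance(items, list):
        items = []

    skills = []
    for item in items:
        if not isinstance(item, dict):
            continue
        skill = str(item.get("skill", "")).strip()
        quote = str(item.get("evidence_quote", "")).strip()
        # Assumption: items without both a skill name and a quote are discarded.
        if skill and quote:
            skills.append({"skill": skill, "evidence_quote": quote})
    return skills
```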

### Step 2 — Verify extracted skills against the JD (deterministic, Python)

**Goal:** Block hallucinated requirements from entering scoring.

Each extracted item is classified:

* **Quote-verified (strong):** the evidence quote appears verbatim in the JD
* **Name-only (weak):** the skill name appears in the JD, but the quote doesn’t match
* **Unverified (dropped):** neither quote nor name appears

**Deterministic gate:**

* Only **quote-verified** skills are used as the final required list for scoring.
* Name-only and dropped skills are reported for transparency.

**Output:** “Requirements Verification” section shows:

* extracted count
* quote-verified vs name-only vs dropped
* list of skills used for scoring
* list of retracted/dropped items (with reason)
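
The three-way classification can be sketched in a few lines. The function names and the whitespace and case normalization are assumptions made for illustration; the tool's own matching rules may be stricter.

```python
import re


def _normalize(text: str) -> str:
    # Assumption: collapse whitespace and ignore case before comparing.
    return re.sub(r"\s+", " ", text).strip().lower()


def classify_skill(item: dict, jd_text: str) -> str:
    """Return 'quote_verified', 'name_only', or 'unverified' for one extracted skill."""
    jd = _normalize(jd_text)
    quote = _normalize(item.get("evidence_quote", ""))
    skill = _normalize(item.get("skill", ""))
    if quote and quote in jd:
        return "quote_verified"
    # Plain substring check; very short skill names (e.g. "R") would need word boundaries.
    if skill and skill in jd:
        return "name_only"
    return "unverified"


def verify_requirements(items: list[dict], jd_text: str) -> dict:
    """Bucket extracted skills; only the 'quote_verified' bucket feeds scoring."""
    report = {"quote_verified": [], "name_only": [], "unverified": []}
    for item in items:
        report[classify_skill(item, jd_text)].append(item["skill"])
    return report
```

Since the check is a string comparison against the JD, running it twice on the same inputs gives the same buckets, which is what makes the gate deterministic.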

### Step 3 — Retrieve the most relevant culture chunks (deterministic retrieval)

**Goal:** Ground culture evaluation in actual company documents.

* The tool runs similarity search over culture docs using the JD as query.
* It selects the top **k** chunks (e.g., k=3).

**Deterministic component:** vector retrieval parameters.

**Output artifact:** `culture_context` is the concatenated text of retrieved culture chunks.

### Step 4 — Generate job-performance culture attributes (LLM-driven)

**Goal:** Create a small set of job-relevant behavioral attributes to evaluate consistently.

* The tool prompts the LLM to return JSON:

  * `cultural_attributes: ["...", "..."]` (4–6 items)

**Attribute rules:**

* Must be job-performance aligned behaviors (e.g., “evidence-based decision making”).
* Must avoid pedigree / class / prestige language.
* Must avoid non-performance preferences (e.g., remote-first, time zone).

**LLM role:** label generation from retrieved culture context.

### Step 5 — Retrieve top resume chunks for the JD (deterministic retrieval)

**Goal:** Identify the most relevant candidates and their relevant resume text.

* The tool runs similarity search over resumes using the JD.
* It retrieves top **k** chunks (e.g., k=10) and groups them by `resume_id`.

**Note:** Only retrieved chunks are analyzed. If relevant evidence isn’t retrieved, it may be missed.
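
The grouping step can be sketched as follows, assuming each retrieved chunk is represented as a plain dictionary with `resume_id` and `text` keys. That shape is hypothetical; the actual retriever returns its own document objects.

```python
from collections import defaultdict


def group_chunks_by_resume(chunks: list[dict]) -> dict[str, str]:
    """Concatenate retrieved chunk texts per candidate, preserving retrieval order."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["resume_id"]].append(chunk["text"])
    return {resume_id: "\n\n".join(texts) for resume_id, texts in grouped.items()}
```

A candidate whose resume contributes no chunk to the top k never appears in the result, which is the retrieval limitation the note above describes.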

### Step 6 — Culture evidence matching per candidate (LLM + deterministic cleanup + deterministic scoring)

**Goal:** Determine which culture attributes are supported by resume evidence.

**LLM-driven matching:**

* For each attribute, the LLM may return a match with:

  * `evidence_type`: `direct` or `inferred`
  * `evidence_quotes`: 1–2 verbatim resume quotes
  * `inference`: required for inferred
  * `confidence`: 1–5

**Deterministic cleanup rules (Python):**
A match is kept only if:

* attribute is present
* evidence_type is `direct` or `inferred`
* at least one non-trivial quote exists
* confidence is an integer 1–5
* inferred matches include an inference sentence
* inferred matches meet a minimum confidence, when a minimum is configured

**Deterministic culture scoring (Python):**

* Direct evidence weight: **1.0**
* Inferred evidence weight: **0.5**

Culture score is computed as:

* `(sum(weights for matched attributes) / number_of_required_attributes) * 100`
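
A sketch of the cleanup gate and the weighted score is shown below. The weights come from the rules above; the field name `attribute`, the `MIN_QUOTE_CHARS` definition of "non-trivial", and the `MIN_INFERRED_CONFIDENCE` floor are assumptions for illustration.

```python
WEIGHTS = {"direct": 1.0, "inferred": 0.5}
MIN_QUOTE_CHARS = 10         # assumption: what counts as a "non-trivial" quote
MIN_INFERRED_CONFIDENCE = 3  # assumption: optional floor for inferred matches


def is_valid_match(match: dict) -> bool:
    """Apply the cleanup rules to one LLM-proposed culture match."""
    evidence_type = match.get("evidence_type")
    quotes = match.get("evidence_quotes") or []
    confidence = match.get("confidence")
    if not match.get("attribute") or evidence_type not in WEIGHTS:
        return False
    if not any(isinstance(q, str) and len(q.strip()) >= MIN_QUOTE_CHARS for q in quotes):
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, int) or not 1 <= confidence <= 5:
        return False
    if evidence_type == "inferred":
        if not str(match.get("inference") or "").strip():
            return False
        if confidence < MIN_INFERRED_CONFIDENCE:
            return False
    return True


def culture_score(matches: list[dict], attributes: list[str]) -> float:
    """Weighted share of attributes supported by valid evidence, on a 0-100 scale."""
    if not attributes:
        return 0.0
    best = {}
    for match in matches:
        if is_valid_match(match) and match["attribute"] in attributes:
            weight = WEIGHTS[match["evidence_type"]]
            # Count each attribute once, using its strongest evidence.
            best[match["attribute"]] = max(best.get(match["attribute"], 0.0), weight)
    return sum(best.values()) / len(attributes) * 100
```

For example, with four attributes, one direct match and one valid inferred match give (1.0 + 0.5) / 4 × 100 = 37.5.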

### Step 7 — Skills matching per candidate (LLM + deterministic scoring)

**Goal:** Match only the verified required skills to resume evidence.

**Inputs:**

* Candidate resume text (retrieved chunks)
* Verified required skills list (quote-verified only)

**LLM output (JSON):**

* `matched: [{skill, evidence_snippet}]`
* `missing: [skill]` (treated as advisory; missing is recomputed deterministically)

**Deterministic missing calculation (Python):**

* Missing = required_set − matched_set

**Deterministic skills scoring (Python):**

* `(number_of_matched_required_skills / number_of_required_skills) * 100`
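
The missing-set recomputation and the score can be sketched together. The `score_skills` name and the lowercase normalization are assumptions; the set difference and the ratio follow the formulas above.

```python
def score_skills(required: list[str], matched: list[str]) -> dict:
    """Recompute missing skills and the skills score from the verified requirement list."""
    required_set = {skill.strip().lower() for skill in required}
    # Ignore any "matched" skill the LLM returned that is not a verified requirement.
    matched_set = {skill.strip().lower() for skill in matched} & required_set
    missing = sorted(required_set - matched_set)
    score = len(matched_set) / len(required_set) * 100 if required_set else 0.0
    return {"matched": sorted(matched_set), "missing": missing, "score": score}
```

Intersecting with the required set means an extra skill proposed by the LLM cannot inflate the score, and the LLM's own `missing` list is never trusted.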

### Step 8 — Implied competencies (NOT SCORED) for phone-screen guidance (LLM-driven, advisory)

**Goal:** When a required skill is missing explicitly, suggest whether it may be **implied** by adjacent evidence.

* This step is **not scored** and does not affect proceed/do-not-proceed.
* The LLM may suggest implied competencies only if it:

  * uses conservative language (“may be implied”)
  * includes **verbatim resume quotes**
  * provides a **phone-screen validation question**

**Hard guardrail:** Tool-specific skills (e.g., R/SAS/MATLAB) must be explicitly present in the resume to be suggested.
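
One way to enforce these conditions in Python is sketched below. The `TOOL_SKILLS` set, the record keys (`skill`, `resume_quotes`, `phone_screen_question`), and the word-boundary check are assumptions for illustration, not the tool's actual schema.

```python
import re

# Hypothetical list of tool-specific skills that must be named explicitly.
TOOL_SKILLS = {"r", "sas", "matlab"}


def mentions_explicitly(skill: str, resume_text: str) -> bool:
    """True if the skill appears as a standalone token in the resume (heuristic)."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(skill) + r"(?![A-Za-z0-9])"
    return re.search(pattern, resume_text, flags=re.IGNORECASE) is not None


def filter_implied(suggestions: list[dict], resume_text: str) -> list[dict]:
    """Keep only implied-competency suggestions that satisfy the guardrails."""
    kept = []
    for suggestion in suggestions:
        skill = str(suggestion.get("skill", "")).strip()
        # Every suggestion needs resume quotes and a phone-screen question.
        if not skill or not suggestion.get("resume_quotes") or not suggestion.get("phone_screen_question"):
            continue
        # Tool-specific skills are never "implied": they must appear in the resume.
        if skill.lower() in TOOL_SKILLS and not mentions_explicitly(skill, resume_text):
            continue
        kept.append(suggestion)
    return kept
```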

### Step 9 — Factuality verification (LLM-driven verifier)

**Goal:** Detect ungrounded evidence claims.

* The verifier checks evidence-backed match lines (e.g., `- Skill: snippet`).
* It ignores:

  * numeric score lines
  * missing lists
  * policy text

**Outputs:**

* verified claims (✓)
* unverified claims (✗)
* factuality score

### Step 10 — Final recommendation (LLM, policy-constrained)

**Goal:** Produce a structured recommendation without changing scores.

* The model is given:

  * skills analysis
  * culture analysis
  * fixed computed scores
  * deterministic decision policy

**Decision policy:**

* If skills_score ≥ 70 → PROCEED
* If skills_score < 60 → DO NOT PROCEED
* If 60 ≤ skills_score < 70 → PROCEED only if culture_score ≥ 70 else DO NOT PROCEED
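
Expressed as code, the policy is a three-branch function. The `decide` name is an assumption; the thresholds are the ones listed above.

```python
def decide(skills_score: float, culture_score: float) -> str:
    """Apply the fixed decision policy to already-computed scores."""
    if skills_score >= 70:
        return "PROCEED"
    if skills_score < 60:
        return "DO NOT PROCEED"
    # Borderline band (60 <= skills_score < 70): culture score breaks the tie.
    return "PROCEED" if culture_score >= 70 else "DO NOT PROCEED"
```

Computing the decision in Python and passing it to the LLM as a fixed input keeps the recommendation text from drifting away from the policy.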

**Non-negotiables:**

* LLM must not re-score.
* LLM must not introduce new claims.

### Step 11 — Self-correction (triggered by verification issues)

**Goal:** Remove/correct any unverified claims while preserving scores/policy.

* If any unverified claims exist:

  * The tool asks the LLM to revise the recommendation
  * Only the flagged claims may be removed/corrected
  * Scores and policy must remain unchanged
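
The trigger itself is deterministic and can be sketched as below. This assumes the verifier emits one claim per line prefixed with ✓ or ✗, and that the factuality score is the share of verified claims; neither the exact line format nor the score formula is specified above, so treat both as assumptions.

```python
def parse_verifier_output(raw: str) -> dict:
    """Split verifier lines into verified/unverified claims and derive the trigger."""
    verified, unverified = [], []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("✓"):
            verified.append(line[1:].strip())
        elif line.startswith("✗"):
            unverified.append(line[1:].strip())
        # Lines without a marker (scores, policy text) are ignored.
    total = len(verified) + len(unverified)
    # Assumption: score is the percentage of verified claims; 100 when nothing was checked.
    score = len(verified) / total * 100 if total else 100.0
    return {
        "verified": verified,
        "unverified": unverified,
        "factuality_score": score,
        "needs_self_correction": bool(unverified),
    }
```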

### Step 12 — Bias audit (LLM-driven audit across docs + reasoning)

**Goal:** Flag biased reasoning, biased JD language, or inconsistent standards.

**Audit scope includes:**

* Job description
* Skills analysis
* Culture analysis
* Final recommendation text
* Culture context

**What it flags (examples):**

* Prestige/pedigree signals (elite employers/education as proxy)
* Vague “polish/executive presence” language not tied to job requirements
* Non-job-related culture screening
* Inconsistent standards (penalizing requirements not in JD)
* Overclaiming certainty

**Outputs:**

* structured list of bias indicators (category, severity, trigger text, why it matters, recommended fix)
* recruiter guidance

---

## 3) Scoring and Decision Rules (Deterministic)

### 3.1 Skills score

* Only quote-verified required skills count.
* Score = (matched required skills / total required skills) × 100.

### 3.2 Culture score

* Score = (sum of match weights / number of attributes) × 100.
* Direct = 1.0; inferred = 0.5.

### 3.3 Labels

* ≥70: Strong fit
* 50–69: Moderate fit
* <50: Not a fit
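
As a small sketch, the label bands map onto a score like this. The `fit_label` name is an assumption; treating fractional scores between 69 and 70 as "Moderate fit" is also an assumption, since the bands above are stated in whole numbers.

```python
def fit_label(score: float) -> str:
    """Map a 0-100 score to its display label."""
    if score >= 70:
        return "Strong fit"
    if score >= 50:
        return "Moderate fit"
    return "Not a fit"
```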

### 3.4 Recommendation

Recommendation follows the fixed policy described in Step 10.

---

## 4) System Flow Diagram (Textual)

Below is a simplified, end-to-end flow of how data moves through the system.

```
[User Uploads]
   |
   v
+-------------------+
| Culture Documents |
+-------------------+        +-----------+
           |                 | Job Desc  |
           v                 +-----------+
+-------------------+               |
| Culture Vector DB |<--------------+
+-------------------+               |
           |                        v
           |               +---------------------+
           |               | Skill Extraction    |
           |               | (LLM, JSON Output)  |
           |               +---------------------+
           |                        |
           |                        v
           |               +---------------------+
           |               | Requirement         |
           |               | Verification        |
           |               | (Deterministic)     |
           |               +---------------------+
           |                        |
           |                        v
           |               Verified Required Skills
           |                        |
           |                        v
+-------------------+        +---------------------+
| Resume Documents  |------->| Resume Vector DB    |
+-------------------+        +---------------------+
                                   |
                                   v
                           Similarity Search (k=10)
                                   |
                                   v
                           Resume Chunks (Grouped)
                                   |
                                   v
                     +-----------------------------+
                     | Culture Attribute Generator |
                     | (LLM, JSON Output)          |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Culture Evidence Matching   |
                     | (LLM + Rules + Weights)     |
                     +-----------------------------+
                                   |
                                   v
                     Culture Score (Deterministic)
                                   |
                                   v
                     +-----------------------------+
                     | Technical Skill Matching    |
                     | (LLM + Deterministic Score) |
                     +-----------------------------+
                                   |
                                   v
                     Skills Score (Deterministic)
                                   |
                                   v
                     +-----------------------------+
                     | Implied Competencies (LLM)  |
                     | (Not Scored, Advisory)      |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Factuality Verification     |
                     | (LLM Verifier)              |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Recommendation Generator    |
                     | (Policy-Constrained LLM)    |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Bias & Fairness Audit       |
                     | (LLM Audit)                 |
                     +-----------------------------+
                                   |
                                   v
                           Final Recruiter Report
```

---

## 5) Audit Artifacts and Traceability

For every analysis run, the system produces and retains multiple audit artifacts that enable post-hoc review, regulatory defensibility, and debugging.

### 5.1 Input Artifacts

1. **Original Job Description**

   * Full pasted JD text

2. **Sanitized Resume Text**

   * Redacted resume content
   * Redaction summary (internal)

3. **Retrieved Culture Chunks**

   * Top-k (default: 3) culture document segments
   * Vector similarity scores (internal)

4. **Retrieved Resume Chunks**

   * Top-k (default: 10) resume segments
   * Resume ID metadata

---

### 5.2 Requirement Verification Artifacts

1. **Raw LLM Skill Extraction Output**
2. **Parsed Required Skills JSON**
3. **Verification Classification Table**

   * Quote-verified
   * Name-only
   * Dropped
4. **Dropped-Skill Justifications**

---

### 5.3 Culture Analysis Artifacts

1. **Generated Culture Attribute List**
2. **LLM Raw Matching Output**
3. **Cleaned Match Records**

   * Evidence type
   * Quotes
   * Inference
   * Confidence
4. **Weighted Match Table**
5. **Computed Culture Score**

---

### 5.4 Skills Analysis Artifacts

1. **Verified Required Skill List**
2. **LLM Raw Matching Output**
3. **Accepted Matched Skills**
4. **Deterministic Missing-Skill Set**
5. **Computed Skills Score**

---

### 5.5 Implied Competency Artifacts (Advisory)

1. **Missing Skill List**
2. **LLM Implied Output (JSON)**
3. **Accepted Implied Records**

   * Resume quotes
   * Explanation
   * Phone-screen questions
4. **Rejected Inferences (internal)**

---

### 5.6 Verification and Correction Artifacts

1. **Verifier Prompt and Output**
2. **Verified / Unverified Claim Lists**
3. **Factuality Scores**
4. **Self-Correction Prompts and Revisions (if triggered)**

---

### 5.7 Recommendation and Policy Artifacts

1. **Final Recommendation Prompt**
2. **Policy Threshold Snapshot**
3. **Immutable Score Values**
4. **Generated Recommendation Text**

---

### 5.8 Bias Audit Artifacts

1. **Bias Audit Prompt**
2. **Audit Input Bundle (JD + Analyses + Recommendation)**
3. **Structured Bias Indicator List**
4. **Severity and Mitigation Suggestions**
5. **Recruiter Guidance Text**

---

### 5.9 System Metadata

1. Timestamp of run
2. Model version
3. Prompt versions
4. Chunking parameters
5. Retrieval k-values
6. Scoring parameters

---

## 6) Known Limitations

1. **Retrieval scope**: evaluation depends on retrieved chunks; some evidence may be missed.
2. **Attribute generation variance**: culture attributes can vary per run unless cached or cataloged.
3. **LLM evidence overreach**: mitigated by verification and cleanup, but not eliminated.
4. **Bias audit is advisory**: it flags issues; it does not enforce policy changes unless you add an auto-rewrite step.

---

## 7) Governance and Change Control

* Prompt changes must preserve JSON contracts.
* Any change that affects scoring or policy should be versioned.
* Audit outputs should be retained for traceability.

---

## 8) Intended Use

This tool is built for:

* faster, evidence-based screening
* transparent reasoning
* safer use of LLMs via verification and audits

It is not a substitute for:

* human judgment
* legal review
* formal HR policy compliance

---

### High-level pipeline (inputs → outputs)

**Inputs uploaded by recruiter**

1. Company culture/values docs (PDF/DOCX)
2. Resumes (PDF/DOCX)
3. Job description (text)

⬇️

**Indexing (deterministic, Python)**

* Culture docs → chunk + embed → `culture_store`
* Resumes → anonymize → chunk + embed → `resume_store`

⬇️

**Candidate assessment (per JD run)**

1. **Extract required skills (LLM)** → JSON `required_skills[{skill,evidence_quote}]`

2. **Verify extracted skills (Python)** → quote-verified / name-only / dropped → *quote-verified list used for scoring*

3. **Retrieve relevant culture context (deterministic retrieval)**

* Query: JD
* Retrieve: top-k culture chunks (**current: k=3**)
* Output: `culture_context`

4. **Generate job-relevant culture attributes (LLM)** → JSON `cultural_attributes[4–6]`

5. **Retrieve relevant resume chunks (deterministic retrieval)**

* Query: JD
* Retrieve: top-k resume chunks (**current: k=10**)
* Group by `resume_id`

6. **Per candidate: culture matching (LLM → cleanup → deterministic score)**

* LLM proposes matches (direct/inferred) + quotes
* Python enforces validity gates
* Deterministic weighted culture score (direct=1.0, inferred=0.5)

7. **Per candidate: skills matching (LLM → deterministic score)**

* LLM proposes matched skills + evidence snippets
* Python recomputes missing list deterministically
* Deterministic skills score using quote-verified requirements only

8. **Per candidate: implied competencies (LLM, NOT SCORED)**

* Inputs: missing skills + matched skills + resume + JD
* Output: implied items with quotes + phone-screen questions
* Guardrail: tool-like skills (R/SAS/MATLAB) require explicit mention

9. **Factuality verification (LLM verifier)** → ✓/✗ for evidence-backed match lines + factuality score

10. **Recommendation (LLM, policy constrained)** → uses fixed scores + fixed decision policy

11. **Self-correction (conditional)** → triggered if any unverified claims exist

12. **Bias audit (LLM)** → audits JD + analyses + recommendation → structured bias indicators + guidance

⬇️

**Outputs per candidate**

* Requirements verification summary (global)
* Culture analysis + score
* Skills analysis + score
* Implied (not scored) follow-ups
* Fact-check results
* Final recommendation (+ revision note if corrected)
* Bias audit

---

### Component map (LLM vs deterministic)

**LLM-driven components**

* Required skill extraction (JSON)
* Culture attribute generation (JSON)
* Culture match proposals (JSON)
* Skills match proposals (JSON)
* Implied (not scored) follow-ups (JSON)
* Factuality verification (✓/✗)
* Final recommendation (policy constrained)
* Bias audit (structured)

**Deterministic / Python-enforced components**

* Resume anonymization
* Chunking + embedding + storage
* Retrieval parameters (top-k)
* Required-skill verification (quote/name-only/dropped)
* Deduplication of requirements
* Culture match cleanup rules (validity gates)
* Skills missing list recomputation
* Skills score computation
* Culture score computation with weights
* Decision thresholds (proceed / do not proceed)
* Self-correction trigger (presence of unverified claims)

---

## Audit Artifacts

This section lists the primary artifacts produced (or recommended to persist) to make runs reviewable and defensible.

### Inputs (source-of-truth)

* Job description text (as provided)
* Culture documents (original files)
* Resumes (original files)

### Pre-processing

* Sanitized resume text (post-anonymization)
* Redaction notes (what was removed/masked)
* Chunking configuration (chunk_size, chunk_overlap)
* Embedding configuration (embedding model + settings)

### Retrieval

* Culture retrieval query: JD text
* Culture retrieved chunks: top-k (**current: k=3**)
* Resume retrieval query: JD text
* Resume retrieved chunks: top-k (**current: k=10**)
* Candidate grouping: chunks grouped by `resume_id`

### Requirements verification

* LLM `required_skills` JSON (raw)
* Normalized required skill list (deduped)
* Verification output:

  * quote-verified list
  * name-only list
  * dropped/unverified list
  * counts and factuality score
* Final scoring-required list: quote-verified only

### Per-candidate analyses

**Culture analysis**

* Raw LLM culture-match JSON
* Post-cleanup matched culture list
* Missing culture attributes list
* Culture score + label
* Culture evidence lines shown to recruiters

**Skills analysis**

* Raw LLM skills-match JSON
* Matched skills list (with evidence snippets)
* Deterministically computed missing skills list
* Skills score + label

**Implied (NOT SCORED)**

* Raw LLM implied JSON
* Filtered implied list (must include resume quotes + phone-screen questions)

### Verification & correction

* Verifier raw output (✓/✗ lines)
* Verified claims list
* Unverified claims list
* Factuality score
* Self-correction trigger status (yes/no)
* Corrected recommendation (if triggered) + revision note

### Bias audit

* Bias audit raw output (structured)
* Bias indicators list (category, severity, trigger_text, why_it_matters, recommended_fix)
* Overall assessment
* Recruiter guidance

### Run-level trace (recommended)

For reproducibility/governance, also persist:

* Timestamp, model name, temperature, seed
* Prompt versions (hash or version ID)
* Retrieval parameters (k values)
* Score thresholds and policy version
* Any configuration overrides used during the run
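
A run-level trace record could be captured with a small dataclass, as in the sketch below. The `RunTrace` class, its field names, and the truncated SHA-256 prompt hash are illustrative assumptions, not the tool's actual persistence format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def prompt_version(prompt_text: str) -> str:
    """Short, stable identifier for a prompt: first 12 hex chars of its SHA-256."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


@dataclass
class RunTrace:
    model_name: str
    temperature: float
    seed: Optional[int]
    prompt_versions: dict   # e.g. {"skill_extraction": "<hash>"}
    retrieval_k: dict       # e.g. {"culture": 3, "resume": 10}
    policy_version: str
    overrides: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Sorted keys keep serialized traces easy to diff between runs.
        return json.dumps(asdict(self), sort_keys=True)
```

Hashing the prompt text means any prompt edit produces a new version ID without anyone having to remember to bump a number.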


## End-to-End Pipeline (Swim-Lane View)

| Step | Recruiter / Input | Python / Deterministic Logic | LLM (Groq) | Storage / Output |
|------|------------------|------------------------------|-----------|------------------|
| 1 | Upload culture documents | Chunk + embed | — | `culture_store` (indexed) |
| 2 | Upload resumes | Anonymize → chunk → embed | — | `resume_store` (indexed) |
| 3 | Paste JD + Run | Send JD to LLM | Extract required skills + evidence quotes | `required_skills` JSON |
| 4 | — | Verify requirements (quote / name-only / dropped) | — | Verified list + debug report |
| 5 | — | Retrieve culture context (k=3) | — | `culture_context` |
| 6 | — | — | Generate culture attributes (job-performance aligned) | `cultural_attributes` JSON |
| 7 | — | Retrieve resume chunks (k=10), group by `resume_id` | — | Candidate chunks |
| 8 | — | — | Propose culture matches (direct/inferred + quotes) | Raw culture-match JSON |
| 9 | — | Cleanup + weighted scoring (direct=1.0, inferred=0.5) | — | Culture score + evidence |
| 10 | — | — | Propose skill matches + evidence snippets | Raw skills-match JSON |
| 11 | — | Compute missing list + skills score (verified reqs only) | — | Skills score + missing list |
| 12 | — | — | Infer implied skills (NOT SCORED) + phone questions | Implied follow-ups |
| 13 | — | — | Verify evidence (✓/✗) | Factuality report |
| 14 | — | — | Generate recommendation (policy constrained) | Final recommendation |
| 15 | — | Trigger self-correction (if needed) | Revise flagged claims only | Corrected recommendation |
| 16 | — | — | Run bias audit (JD + analyses + decision) | Bias indicators + guidance |
| 17 | Review output | Assemble final report | — | Full candidate report |

### Current Retrieval Parameters

- Culture store: `k = 3` chunks (JD query)
- Resume store: `k = 10` chunks (JD query)