# AI Recruiting Assistant — Guide Book (Updated)

## 0) Overview

### What this tool does

This AI Recruiting Assistant is a **decision-support** system that helps recruiters and hiring managers:

* Extract **job requirements** from a job description (JD)
* Evaluate resumes against **verified requirements** using **evidence-based** matching
* Assess job-relevant **culture/working-style signals** using retrieved company documents
* Run **factuality checks** to detect ungrounded claims
* Run a **bias & fairness audit** across the JD, analyses, and the model’s final recommendation

### The problem it addresses

Recruiting teams often face three recurring issues when using AI:

1. **Hallucinated requirements**: LLMs may “invent” skills that are not explicitly required.
2. **Opaque scoring**: Many tools produce fit scores without clearly showing evidence.
3. **Bias risks**: Hiring language and reasoning can leak pedigree/class proxies or subjective criteria.

This tool addresses those issues by enforcing:

* **Deterministic verification gates** (requirements are verified before scoring)
* **Evidence-backed scoring** (only verified requirements are scored; each match includes a quote)
* **Self-verification and self-correction** (factuality checks can trigger automatic revision)
* **Bias auditing** (flags risky language and inconsistent standards)

### How it differs from typical recruiting tools

Compared with “black-box” resume screeners or generic LLM chatbots, this system emphasizes:

* **Transparency**: Outputs include *what was required*, *what was verified*, *what was dropped*, and *why*.
* **Auditability**: The scoring math is deterministic and traceable to inputs.
* **Self-verifying behavior**: Claims are checked against source text; unverified claims can be removed.
* **Bias checks by design**: Bias-sensitive content is audited explicitly instead of implicitly influencing scores.
* **Culture check that’s job-performance aligned**: Culture attributes are framed as job-relevant behaviors, not background proxies.

---

## 1) Inputs and Document Handling

### 1.1 What the user uploads

The tool operates on three inputs:

1. **Company culture / values documents** (PDF/DOCX)
2. **Resumes** (PDF/DOCX)
3. **Job description** (pasted text)

### 1.2 Resume anonymization

Before resumes are stored or analyzed, the tool applies heuristic redaction:

* Emails, phone numbers, URLs
* Addresses / location identifiers
* Explicit demographic fields
* Likely name header (first line)

This reduces exposure of personal identifiers and keeps analysis focused on job evidence.
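The redaction pass described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the regular expressions, the placeholder tokens (`[NAME]`, `[EMAIL]`, and so on), and the function name `redact_resume` are all assumptions, and address or demographic-field redaction is left out for brevity.

```python
import re

# Hypothetical patterns; the tool's real redaction rules may differ.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d ().-]{7,}\d")


def _mask_phone(match: re.Match) -> str:
    # Require at least 9 digits so date ranges like "2015 - 2020" survive.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if digits >= 9 else match.group()


def redact_resume(text: str) -> str:
    """Mask the likely name header, then emails, URLs, and phone numbers."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip():  # first non-empty line is treated as the name header
            lines[i] = "[NAME]"
            break
    redacted = "\n".join(lines)
    redacted = EMAIL_RE.sub("[EMAIL]", redacted)
    redacted = URL_RE.sub("[URL]", redacted)
    redacted = PHONE_RE.sub(_mask_phone, redacted)
    return redacted
```

Because the approach is heuristic, some identifiers can slip through (for example, a name repeated in the body of the resume), which is why the redaction summary is worth retaining for review.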

### 1.3 Vector stores (retrieval)

The tool maintains two separate Chroma collections:

* **Resumes** (anonymized + chunked)
* **Culture docs** (chunked)

Chunking uses a recursive splitter with overlap to preserve context.
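To illustrate why overlap preserves context, here is a simplified fixed-window chunker. It is a stand-in for the idea only: the tool uses a recursive splitter, which also prefers to break on natural separators, and the function name and default sizes below are assumptions.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.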

---

## 2) End-to-End Logic Flow (Step-by-Step)

Below is the stepwise flow executed when a recruiter clicks **Analyze Candidates**.

### Step 0 — Prerequisite: Documents exist in storage

* Culture docs and resumes must be stored first.
* If not stored, retrieval will be empty or low-signal.

### Step 1 — Extract required skills from the Job Description (LLM-driven)

**Goal:** Identify only skills that are explicitly required.

* The tool prompts the LLM to return **JSON only**:

  * `required_skills: [{skill, evidence_quote}]`
* The LLM is instructed to:

  * include only **MUST HAVE** / explicitly required skills
  * exclude “nice-to-haves” and implied skills
  * copy a short **verbatim quote** as evidence

**LLM role:** structured extraction.

**Failure behavior:** If JSON parsing fails, the tool stops and prints the raw output.
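A minimal sketch of the parsing and failure behavior for this step is shown below. The function name `parse_required_skills` and the choice to raise a `ValueError` carrying the raw output are assumptions; only the `required_skills`, `skill`, and `evidence_quote` field names come from the contract above.

```python
import json


def parse_required_skills(raw_output: str) -> list[dict]:
    """Parse the LLM's JSON reply; stop with the raw output if it is not valid JSON."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Skill extraction did not return valid JSON:\n{raw_output}") from exc

    items = payload.get("required_skills") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        items = []

    cleaned = []
    for item in items:
        if not isinstance(item, dict):
            continue
        skill = str(item.get("skill") or "").strip()
        quote = str(item.get("evidence_quote") or "").strip()
        if skill:  # keep the item; the quote is checked later in Step 2
            cleaned.append({"skill": skill, "evidence_quote": quote})
    return cleaned
```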

### Step 2 — Verify extracted skills against the JD (deterministic, Python)

**Goal:** Block hallucinated requirements from entering scoring.

Each extracted item is classified:

* **Quote-verified (strong):** the evidence quote appears verbatim in the JD
* **Name-only (weak):** the skill name appears in the JD, but the quote doesn’t match
* **Unverified (dropped):** neither quote nor name appears

**Deterministic gate:**

* Only **quote-verified** skills are used as the final required list for scoring.
* Name-only and dropped skills are reported for transparency.

**Output:** “Requirements Verification” section shows:

* extracted count
* quote-verified vs name-only vs dropped
* list of skills used for scoring
* list of retracted/dropped items (with reason)
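The three-way classification can be sketched as a few lines of plain Python. This is an illustration under assumptions: the function names are hypothetical, and matching here is done after lowercasing and collapsing whitespace, which may be looser or stricter than the tool's own notion of "verbatim".

```python
def _normalize(text: str) -> str:
    # Assumption: compare case-insensitively with whitespace collapsed.
    return " ".join(text.lower().split())


def verify_requirements(jd_text: str, extracted: list[dict]) -> dict[str, list[str]]:
    """Classify each extracted skill as quote-verified, name-only, or dropped."""
    jd = _normalize(jd_text)
    report = {"quote_verified": [], "name_only": [], "dropped": []}
    for item in extracted:
        skill = item.get("skill", "")
        quote = _normalize(item.get("evidence_quote", ""))
        if quote and quote in jd:
            report["quote_verified"].append(skill)   # strong: quote found in JD
        elif skill and _normalize(skill) in jd:
            report["name_only"].append(skill)        # weak: only the name appears
        else:
            report["dropped"].append(skill)          # neither quote nor name found
    return report
```

Only `report["quote_verified"]` would feed the scoring steps; the other two lists exist for the transparency report.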

### Step 3 — Retrieve the most relevant culture chunks (deterministic retrieval)

**Goal:** Ground culture evaluation in actual company documents.

* The tool runs similarity search over culture docs using the JD as query.
* It selects the top **k** chunks (e.g., k=3).

**Deterministic component:** vector retrieval parameters.

**Output artifact:** `culture_context` is the concatenated text of retrieved culture chunks.

### Step 4 — Generate job-performance culture attributes (LLM-driven)

**Goal:** Create a small set of job-relevant behavioral attributes to evaluate consistently.

* The tool prompts the LLM to return JSON:

  * `cultural_attributes: ["...", "..."]` (4–6 items)

**Attribute rules:**

* Must be job-performance aligned behaviors (e.g., “evidence-based decision making”).
* Must avoid pedigree / class / prestige language.
* Must avoid non-performance preferences (e.g., remote-first, time zone).

**LLM role:** label generation from retrieved culture context.

### Step 5 — Retrieve top resume chunks for the JD (deterministic retrieval)

**Goal:** Identify the most relevant candidates and their relevant resume text.

* The tool runs similarity search over resumes using the JD.
* It retrieves top **k** chunks (e.g., k=10) and groups them by `resume_id`.

**Note:** Only retrieved chunks are analyzed. If relevant evidence isn’t retrieved, it may be missed.
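Grouping the retrieved chunks by candidate might look like the sketch below. The chunk shape (a dict with `text` and a `metadata` dict holding `resume_id`) and the function name are assumptions made for illustration; the actual retrieval objects returned by the vector store may differ.

```python
from collections import defaultdict


def group_chunks_by_resume(chunks: list[dict]) -> dict[str, str]:
    """Collect retrieved chunk texts per resume_id, preserving retrieval order."""
    grouped = defaultdict(list)
    for chunk in chunks:
        resume_id = chunk.get("metadata", {}).get("resume_id")
        if resume_id is None:
            continue  # skip chunks that cannot be attributed to a candidate
        grouped[resume_id].append(chunk["text"])
    # One concatenated text per candidate is what later steps analyze.
    return {rid: "\n\n".join(texts) for rid, texts in grouped.items()}
```

A candidate whose resume contributes no chunk to the top-k results never appears in this mapping, which is the retrieval-scope limitation noted above.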

### Step 6 — Culture evidence matching per candidate (LLM + deterministic cleanup + deterministic scoring)

**Goal:** Determine which culture attributes are supported by resume evidence.

**LLM-driven matching:**

* For each attribute, the LLM may return a match with:

  * `evidence_type`: `direct` or `inferred`
  * `evidence_quotes`: 1–2 verbatim resume quotes
  * `inference`: required for inferred
  * `confidence`: 1–5

**Deterministic cleanup rules (Python):**
A match is kept only if:

* attribute is present
* evidence_type is `direct` or `inferred`
* at least one non-trivial quote exists
* confidence is an integer 1–5
* inferred matches include an inference sentence
* inferred matches meet a minimum confidence threshold, when one is configured

**Deterministic culture scoring (Python):**

* Direct evidence weight: **1.0**
* Inferred evidence weight: **0.5**

Culture score is computed as:

* `(sum(weights for matched attributes) / number_of_required_attributes) * 100`
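The cleanup gates and the weighted score can be sketched together as follows. The field names follow the match schema above plus an assumed `attribute` key; the function names, the `min_quote_chars` cutoff for "non-trivial", and the rule of counting each attribute once at its best weight are assumptions.

```python
WEIGHTS = {"direct": 1.0, "inferred": 0.5}


def clean_matches(raw_matches, min_inferred_confidence=1, min_quote_chars=10):
    """Keep only matches that pass every validity gate."""
    kept = []
    for m in raw_matches:
        attribute = str(m.get("attribute") or "").strip()
        etype = m.get("evidence_type")
        quotes = [q.strip() for q in (m.get("evidence_quotes") or [])
                  if isinstance(q, str) and len(q.strip()) >= min_quote_chars]
        conf = m.get("confidence")
        if not attribute or etype not in WEIGHTS or not quotes:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(conf, bool) or not isinstance(conf, int) or not 1 <= conf <= 5:
            continue
        if etype == "inferred":
            has_inference = bool(str(m.get("inference") or "").strip())
            if not has_inference or conf < min_inferred_confidence:
                continue
        kept.append({**m, "attribute": attribute, "evidence_quotes": quotes})
    return kept


def culture_score(kept, attributes):
    """(sum of weights for matched attributes / number of attributes) * 100."""
    if not attributes:
        return 0.0
    best = {}
    for m in kept:
        if m["attribute"] in attributes:
            weight = WEIGHTS[m["evidence_type"]]
            best[m["attribute"]] = max(best.get(m["attribute"], 0.0), weight)
    return sum(best.values()) / len(attributes) * 100
```

For example, one direct and one inferred match against four attributes gives (1.0 + 0.5) / 4 × 100 = 37.5.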

### Step 7 — Skills matching per candidate (LLM + deterministic scoring)

**Goal:** Match only the verified required skills to resume evidence.

**Inputs:**

* Candidate resume text (retrieved chunks)
* Verified required skills list (quote-verified only)

**LLM output (JSON):**

* `matched: [{skill, evidence_snippet}]`
* `missing: [skill]` (treated as advisory; missing is recomputed deterministically)

**Deterministic missing calculation (Python):**

* Missing = required_set − matched_set

**Deterministic skills scoring (Python):**

* `(number_of_matched_required_skills / number_of_required_skills) * 100`
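A sketch of the deterministic missing-list and score computation is given below. The function name, the case-insensitive comparison, and the rule that a match without an evidence snippet is ignored are assumptions; the set difference and the ratio come from the rules above.

```python
def score_skills(required: list[str], matched: list[dict]) -> dict:
    """Recompute matched/missing from the verified list and derive the score."""
    # Map a normalized key back to the original spelling of each requirement.
    required_keys = {s.strip().lower(): s for s in required}
    matched_keys = {
        str(m.get("skill") or "").strip().lower()
        for m in matched
        if m.get("evidence_snippet")  # a match without evidence does not count
    }
    hits = [orig for key, orig in required_keys.items() if key in matched_keys]
    missing = [orig for key, orig in required_keys.items() if key not in matched_keys]
    score = len(hits) / len(required_keys) * 100 if required_keys else 0.0
    return {"matched": hits, "missing": missing, "score": score}
```

Skills the LLM reports that are not on the verified list (for instance, an extra "Kubernetes" match) are ignored, so they can neither raise nor lower the score.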

### Step 8 — Implied competencies (NOT SCORED) for phone-screen guidance (LLM-driven, advisory)

**Goal:** When a required skill is missing explicitly, suggest whether it may be **implied** by adjacent evidence.

* This step is **not scored** and does not affect proceed/do-not-proceed.
* The LLM may suggest implied competencies only if it:

  * uses conservative language (“may be implied”)
  * includes **verbatim resume quotes**
  * provides a **phone-screen validation question**

**Hard guardrail:** Tool-specific skills (e.g., R/SAS/MATLAB) must be explicitly present in the resume to be suggested.

### Step 9 — Factuality verification (LLM-driven verifier)

**Goal:** Detect ungrounded evidence claims.

* The verifier checks evidence-backed match lines (e.g., `- Skill: snippet`).
* It ignores:

  * numeric score lines
  * missing lists
  * policy text

**Outputs:**

* verified claims (✓)
* unverified claims (✗)
* factuality score
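The verifier's output can be reduced to structured lists with a small parser like the one below. Several details are assumptions: that each claim sits on its own line starting with ✓ or ✗ (optionally after a bullet), the function name, and the factuality score formula (verified claims as a percentage of all checked claims), which this guide does not define.

```python
def parse_verifier_output(raw: str) -> dict:
    """Split verifier lines into verified/unverified claims and summarize them."""
    verified, unverified = [], []
    for line in raw.splitlines():
        line = line.strip().lstrip("-* ").strip()  # tolerate leading bullets
        if line.startswith("✓"):
            verified.append(line[1:].strip())
        elif line.startswith("✗"):
            unverified.append(line[1:].strip())
        # Any other line (scores, missing lists, policy text) is ignored.
    total = len(verified) + len(unverified)
    return {
        "verified": verified,
        "unverified": unverified,
        # Assumed formula: share of checked claims that were verified.
        "factuality_score": (len(verified) / total * 100) if total else None,
        # Step 11 is triggered when any claim is unverified.
        "needs_correction": bool(unverified),
    }
```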

### Step 10 — Final recommendation (LLM, policy-constrained)

**Goal:** Produce a structured recommendation without changing scores.

* The model is given:

  * skills analysis
  * culture analysis
  * fixed computed scores
  * deterministic decision policy

**Decision policy:**

* If skills_score ≥ 70 → PROCEED
* If skills_score < 60 → DO NOT PROCEED
* If 60 ≤ skills_score < 70 → PROCEED only if culture_score ≥ 70 else DO NOT PROCEED
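Expressed as code, the policy above is a three-branch function. The function name `decide` and the string return values are illustrative assumptions; the thresholds are the ones listed.

```python
def decide(skills_score: float, culture_score: float) -> str:
    """Apply the fixed decision policy to already-computed scores."""
    if skills_score >= 70:
        return "PROCEED"
    if skills_score < 60:
        return "DO NOT PROCEED"
    # Borderline band (60 <= skills_score < 70): culture score breaks the tie.
    return "PROCEED" if culture_score >= 70 else "DO NOT PROCEED"
```

Keeping this in Python rather than in the prompt means the LLM only explains the decision and cannot alter it.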

**Non-negotiables:**

* LLM must not re-score.
* LLM must not introduce new claims.

### Step 11 — Self-correction (triggered by verification issues)

**Goal:** Remove/correct any unverified claims while preserving scores/policy.

* If any unverified claims exist:

  * The tool asks the LLM to revise the recommendation
  * Only the flagged claims may be removed/corrected
  * Scores and policy must remain unchanged

### Step 12 — Bias audit (LLM-driven audit across docs + reasoning)

**Goal:** Flag biased reasoning, biased JD language, or inconsistent standards.

**Audit scope includes:**

* Job description
* Skills analysis
* Culture analysis
* Final recommendation text
* Culture context

**What it flags (examples):**

* Prestige/pedigree signals (elite employers/education as proxy)
* Vague “polish/executive presence” language not tied to job requirements
* Non-job-related culture screening
* Inconsistent standards (penalizing requirements not in JD)
* Overclaiming certainty

**Outputs:**

* structured list of bias indicators (category, severity, trigger text, why it matters, recommended fix)
* recruiter guidance

---

## 3) Scoring and Decision Rules (Deterministic)

### 3.1 Skills score

* Only quote-verified required skills count.
* Score = (matched required skills / required skills) × 100.

### 3.2 Culture score

* Score = (sum of match weights / number of attributes) × 100.
* Direct = 1.0; inferred = 0.5.

### 3.3 Labels

* ≥70: Strong fit
* 50–69: Moderate fit
* <50: Not a fit
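The label bands can be written as a simple lookup. The function name is an assumption, and so is the treatment of fractional scores: a value such as 69.5 falls into the "Moderate fit" band here because it is below 70.

```python
def fit_label(score: float) -> str:
    """Map a 0-100 score to its label band."""
    if score >= 70:
        return "Strong fit"
    if score >= 50:
        return "Moderate fit"
    return "Not a fit"
```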

### 3.4 Recommendation

Recommendation follows the fixed policy described in Step 10.

---

## 4) System Flow Diagram (Textual)

Below is a simplified, end-to-end flow of how data moves through the system.

```
[User Uploads]
   |
   v
+-------------------+
| Culture Documents |
+-------------------+        +-----------+
           |                 | Job Desc  |
           v                 +-----------+
+-------------------+               |
| Culture Vector DB |<--------------+
+-------------------+               |
           |                        v
           |               +---------------------+
           |               | Skill Extraction    |
           |               | (LLM, JSON Output)  |
           |               +---------------------+
           |                        |
           |                        v
           |               +---------------------+
           |               | Requirement         |
           |               | Verification        |
           |               | (Deterministic)     |
           |               +---------------------+
           |                        |
           |                        v
           |               Verified Required Skills
           |                        |
           |                        v
+-------------------+        +---------------------+
| Resume Documents  |------->| Resume Vector DB    |
+-------------------+        +---------------------+
                                   |
                                   v
                           Similarity Search (k=10)
                                   |
                                   v
                           Resume Chunks (Grouped)
                                   |
                                   v
                     +-----------------------------+
                     | Culture Attribute Generator |
                     | (LLM, JSON Output)          |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Culture Evidence Matching   |
                     | (LLM + Rules + Weights)     |
                     +-----------------------------+
                                   |
                                   v
                     Culture Score (Deterministic)
                                   |
                                   v
                     +-----------------------------+
                     | Technical Skill Matching    |
                     |(LLM + Deterministic Scoring)|
                     +-----------------------------+
                                   |
                                   v
                     Skills Score (Deterministic)
                                   |
                                   v
                     +-----------------------------+
                     | Implied Competencies (LLM)  |
                     | (Not Scored, Advisory)      |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Factuality Verification     |
                     | (LLM Verifier)              |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Recommendation Generator    |
                     | (Policy-Constrained LLM)    |
                     +-----------------------------+
                                   |
                                   v
                     +-----------------------------+
                     | Bias & Fairness Audit       |
                     | (LLM Audit)                 |
                     +-----------------------------+
                                   |
                                   v
                           Final Recruiter Report
```

---

## 5) Audit Artifacts and Traceability

For every analysis run, the system produces and retains multiple audit artifacts that enable post-hoc review, regulatory defensibility, and debugging.

### 5.1 Input Artifacts

1. **Original Job Description**

   * Full pasted JD text

2. **Sanitized Resume Text**

   * Redacted resume content
   * Redaction summary (internal)

3. **Retrieved Culture Chunks**

   * Top-k (default: 3) culture document segments
   * Vector similarity scores (internal)

4. **Retrieved Resume Chunks**

   * Top-k (default: 10) resume segments
   * Resume ID metadata

---

### 5.2 Requirement Verification Artifacts

1. **Raw LLM Skill Extraction Output**
2. **Parsed Required Skills JSON**
3. **Verification Classification Table**

   * Quote-verified
   * Name-only
   * Dropped
4. **Dropped-Skill Justifications**

---

### 5.3 Culture Analysis Artifacts

1. **Generated Culture Attribute List**
2. **LLM Raw Matching Output**
3. **Cleaned Match Records**

   * Evidence type
   * Quotes
   * Inference
   * Confidence
4. **Weighted Match Table**
5. **Computed Culture Score**

---

### 5.4 Skills Analysis Artifacts

1. **Verified Required Skill List**
2. **LLM Raw Matching Output**
3. **Accepted Matched Skills**
4. **Deterministic Missing-Skill Set**
5. **Computed Skills Score**

---

### 5.5 Implied Competency Artifacts (Advisory)

1. **Missing Skill List**
2. **LLM Implied Output (JSON)**
3. **Accepted Implied Records**

   * Resume quotes
   * Explanation
   * Phone-screen questions
4. **Rejected Inferences (internal)**

---

### 5.6 Verification and Correction Artifacts

1. **Verifier Prompt and Output**
2. **Verified / Unverified Claim Lists**
3. **Factuality Scores**
4. **Self-Correction Prompts and Revisions (if triggered)**

---

### 5.7 Recommendation and Policy Artifacts

1. **Final Recommendation Prompt**
2. **Policy Threshold Snapshot**
3. **Immutable Score Values**
4. **Generated Recommendation Text**

---

### 5.8 Bias Audit Artifacts

1. **Bias Audit Prompt**
2. **Audit Input Bundle (JD + Analyses + Recommendation)**
3. **Structured Bias Indicator List**
4. **Severity and Mitigation Suggestions**
5. **Recruiter Guidance Text**

---

### 5.9 System Metadata

1. Timestamp of run
2. Model version
3. Prompt versions
4. Chunking parameters
5. Retrieval k-values
6. Scoring parameters

---

## 6) Known Limitations

1. **Retrieval scope**: evaluation depends on retrieved chunks; some evidence may be missed.
2. **Attribute generation variance**: culture attributes can vary per run unless cached or cataloged.
3. **LLM evidence overreach**: mitigated by verification and cleanup, but not eliminated.
4. **Bias audit is advisory**: it flags issues; it does not enforce policy changes unless you add an auto-rewrite step.

---

## 7) Governance and Change Control

* Prompt changes must preserve JSON contracts.
* Any change that affects scoring or policy should be versioned.
* Audit outputs should be retained for traceability.

---

## 8) Intended Use

This tool is built for:

* faster, evidence-based screening
* transparent reasoning
* safer use of LLMs via verification and audits

It is not a substitute for:

* human judgment
* legal review
* formal HR policy compliance

---

### High-level pipeline (inputs → outputs)

**Inputs uploaded by recruiter**

1. Company culture/values docs (PDF/DOCX)
2. Resumes (PDF/DOCX)
3. Job description (text)

⬇️

**Indexing (deterministic, Python)**

* Culture docs → chunk + embed → `culture_store`
* Resumes → anonymize → chunk + embed → `resume_store`

⬇️

**Candidate assessment (per JD run)**

1. **Extract required skills (LLM)** → JSON `required_skills[{skill,evidence_quote}]`

2. **Verify extracted skills (Python)** → quote-verified / name-only / dropped → *only the quote-verified list is used for scoring*

3. **Retrieve relevant culture context (deterministic retrieval)**

* Query: JD
* Retrieve: top-k culture chunks (**current: k=3**)
* Output: `culture_context`

4. **Generate job-relevant culture attributes (LLM)** → JSON `cultural_attributes[4–6]`

5. **Retrieve relevant resume chunks (deterministic retrieval)**

* Query: JD
* Retrieve: top-k resume chunks (**current: k=10**)
* Group by `resume_id`

6. **Per candidate: culture matching (LLM → cleanup → deterministic score)**

* LLM proposes matches (direct/inferred) + quotes
* Python enforces validity gates
* Deterministic weighted culture score (direct=1.0, inferred=0.5)

7. **Per candidate: skills matching (LLM → deterministic score)**

* LLM proposes matched skills + evidence snippets
* Python recomputes missing list deterministically
* Deterministic skills score using quote-verified requirements only

8. **Per candidate: implied competencies (LLM, NOT SCORED)**

* Inputs: missing skills + matched skills + resume + JD
* Output: implied items with quotes + phone-screen questions
* Guardrail: tool-specific skills (e.g., R/SAS/MATLAB) must be explicitly present in the resume

9. **Factuality verification (LLM verifier)** → ✓/✗ for evidence-backed match lines + factuality score

10. **Recommendation (LLM, policy constrained)** → uses fixed scores + fixed decision policy

11. **Self-correction (conditional)** → triggered if any unverified claims exist

12. **Bias audit (LLM)** → audits JD + analyses + recommendation → structured bias indicators + guidance

⬇️

**Outputs per candidate**

* Requirements verification summary (global)
* Culture analysis + score
* Skills analysis + score
* Implied (not scored) follow-ups
* Fact-check results
* Final recommendation (+ revision note if corrected)
* Bias audit

---

### Component map (LLM vs deterministic)

**LLM-driven components**

* Required skill extraction (JSON)
* Culture attribute generation (JSON)
* Culture match proposals (JSON)
* Skills match proposals (JSON)
* Implied (not scored) follow-ups (JSON)
* Factuality verification (✓/✗)
* Final recommendation (policy constrained)
* Bias audit (structured)

**Deterministic / Python-enforced components**

* Resume anonymization
* Chunking + embedding + storage
* Retrieval parameters (top-k)
* Required-skill verification (quote/name-only/dropped)
* Deduplication of requirements
* Culture match cleanup rules (validity gates)
* Skills missing list recomputation
* Skills score computation
* Culture score computation with weights
* Decision thresholds (proceed / do not proceed)
* Self-correction trigger (presence of unverified claims)

---

## Audit Artifacts

This section lists the primary artifacts produced (or recommended to persist) to make runs reviewable and defensible.

### Inputs (source-of-truth)

* Job description text (as provided)
* Culture documents (original files)
* Resumes (original files)

### Pre-processing

* Sanitized resume text (post-anonymization)
* Redaction notes (what was removed/masked)
* Chunking configuration (chunk_size, chunk_overlap)
* Embedding configuration (embedding model + settings)

### Retrieval

* Culture retrieval query: JD text
* Culture retrieved chunks: top-k (**current: k=3**)
* Resume retrieval query: JD text
* Resume retrieved chunks: top-k (**current: k=10**)
* Candidate grouping: chunks grouped by `resume_id`

### Requirements verification

* LLM `required_skills` JSON (raw)
* Normalized required skill list (deduped)
* Verification output:

  * quote-verified list
  * name-only list
  * dropped/unverified list
  * counts and factuality score
* Final scoring-required list: quote-verified only

### Per-candidate analyses

**Culture analysis**

* Raw LLM culture-match JSON
* Post-cleanup matched culture list
* Missing culture attributes list
* Culture score + label
* Culture evidence lines shown to recruiters

**Skills analysis**

* Raw LLM skills-match JSON
* Matched skills list (with evidence snippets)
* Deterministically computed missing skills list
* Skills score + label

**Implied (NOT SCORED)**

* Raw LLM implied JSON
* Filtered implied list (must include resume quotes + phone-screen questions)

### Verification & correction

* Verifier raw output (✓/✗ lines)
* Verified claims list
* Unverified claims list
* Factuality score
* Self-correction trigger status (yes/no)
* Corrected recommendation (if triggered) + revision note

### Bias audit

* Bias audit raw output (structured)
* Bias indicators list (category, severity, trigger_text, why_it_matters, recommended_fix)
* Overall assessment
* Recruiter guidance

### Run-level trace (recommended)

For reproducibility/governance, also persist:

* Timestamp, model name, temperature, seed
* Prompt versions (hash or version ID)
* Retrieval parameters (k values)
* Score thresholds and policy version
* Any configuration overrides used during the run
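One way to persist the run-level trace is a small immutable record serialized to JSON, as sketched below. The class name `RunTrace`, its field names, the `policy_version` default, and the truncated SHA-256 prompt hash are all illustrative assumptions rather than part of the tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def prompt_hash(prompt_text: str) -> str:
    """Short, stable identifier for a prompt's exact text."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class RunTrace:
    model_name: str
    temperature: float
    seed: Optional[int]
    prompt_versions: dict[str, str]          # prompt name -> hash or version ID
    culture_k: int = 3                       # retrieval k for culture chunks
    resume_k: int = 10                       # retrieval k for resume chunks
    policy_version: str = "v1"               # hypothetical version label
    overrides: dict[str, str] = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Hashing the prompt text means any edit to a prompt, however small, shows up as a changed identifier in the trace.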


## End-to-End Pipeline (Swim-Lane View)

| Step | Recruiter / Input | Python / Deterministic Logic | LLM (Groq) | Storage / Output |
|------|------------------|------------------------------|-----------|------------------|
| 1 | Upload culture documents | Chunk + embed | — | `culture_store` (indexed) |
| 2 | Upload resumes | Anonymize → chunk → embed | — | `resume_store` (indexed) |
| 3 | Paste JD + Run | Send JD to LLM | Extract required skills + evidence quotes | `required_skills` JSON |
| 4 | — | Verify requirements (quote / name-only / dropped) | — | Verified list + debug report |
| 5 | — | Retrieve culture context (k=3) | — | `culture_context` |
| 6 | — | — | Generate culture attributes (job-performance aligned) | `cultural_attributes` JSON |
| 7 | — | Retrieve resume chunks (k=10), group by `resume_id` | — | Candidate chunks |
| 8 | — | — | Propose culture matches (direct/inferred + quotes) | Raw culture-match JSON |
| 9 | — | Cleanup + weighted scoring (direct=1.0, inferred=0.5) | — | Culture score + evidence |
| 10 | — | — | Propose skill matches + evidence snippets | Raw skills-match JSON |
| 11 | — | Compute missing list + skills score (verified reqs only) | — | Skills score + missing list |
| 12 | — | — | Infer implied skills (NOT SCORED) + phone questions | Implied follow-ups |
| 13 | — | — | Verify evidence (✓/✗) | Factuality report |
| 14 | — | — | Generate recommendation (policy constrained) | Final recommendation |
| 15 | — | Trigger self-correction (if needed) | Revise flagged claims only | Corrected recommendation |
| 16 | — | — | Run bias audit (JD + analyses + decision) | Bias indicators + guidance |
| 17 | Review output | Assemble final report | — | Full candidate report |

### Current Retrieval Parameters

- Culture store: `k = 3` chunks (JD query)
- Resume store: `k = 10` chunks (JD query)