Spaces: 19arjun89/AI_Recruiting_Agent

19arjun89 committed on Feb 3

Commit 5b3bd15 (verified) · 1 Parent(s): 0dc4344

Update Guidebook.md

Files changed (1)

Guidebook.md +1 -767

@@ -1,769 +1,3 @@
-# AI Recruiting Assistant — Guide Book (Updated)
-## 0) Overview
-### What this tool does
-This AI Recruiting Assistant is a **decision-support** system that helps recruiters and hiring managers:
-* Extract **job requirements** from a job description (JD)
-* Evaluate resumes against **verified requirements** using **evidence-based** matching
-* Assess job-relevant **culture/working-style signals** using retrieved company documents
-* Run **factuality checks** to detect ungrounded claims
-* Run a **bias & fairness audit** across the JD, analyses, and the model’s final recommendation
-### The problem it addresses
-Recruiting teams often face three recurring issues when using AI:
-1. **Hallucinated requirements**: LLMs may “invent” skills that are not explicitly required.
-2. **Opaque scoring**: Many tools produce fit scores without clearly showing evidence.
-3. **Bias risks**: Hiring language and reasoning can leak pedigree/class proxies or subjective criteria.
-This tool addresses those issues by enforcing:
-* **Deterministic verification gates** (requirements are verified before scoring)
-* **Evidence-backed scoring** (only verified requirements are scored; each match includes a quote)
-* **Self-verification and self-correction** (factuality checks can trigger automatic revision)
-* **Bias auditing** (flags risky language and inconsistent standards)
-### How it differentiates from typical recruiting tools
-Compared with “black-box” resume screeners or generic LLM chatbots, this system emphasizes:
-* **Transparency**: Outputs include *what was required*, *what was verified*, *what was dropped*, and *why*.
-* **Auditability**: The scoring math is deterministic and traceable to inputs.
-* **Self-verifying behavior**: Claims are checked against source text; unverified claims can be removed.
-* **Bias checks by design**: Bias-sensitive content is audited explicitly instead of implicitly influencing scores.
-* **Culture check that’s job-performance aligned**: Culture attributes are framed as job-relevant behaviors, not background proxies.
----
-## 1) Inputs and Document Handling
-### 1.1 What the user uploads
-The tool operates on three inputs:
-1. **Company culture / values documents** (PDF/DOCX)
-2. **Resumes** (PDF/DOCX)
-3. **Job description** (pasted text)
-### 1.2 Resume anonymization
-Before resumes are stored or analyzed, the tool applies heuristic redaction:
-* Emails, phone numbers, URLs
-* Addresses / location identifiers
-* Explicit demographic fields
-* Likely name header (first line)
-This reduces exposure of personal identifiers and keeps analysis focused on job evidence.
-### 1.3 Vector stores (retrieval)
-The tool maintains two separate Chroma collections:
-* **Resumes** (anonymized + chunked)
-* **Culture docs** (chunked)
-Chunking uses a recursive splitter with overlap to preserve context.
----
-## 2) End-to-End Logic Flow (Step-by-Step)
-Below is the stepwise flow executed when a recruiter clicks **Analyze Candidates**.
-### Step 0 — Prerequisite: Documents exist in storage
-* Culture docs and resumes must be stored first.
-* If not stored, retrieval will be empty or low-signal.
-### Step 1 — Extract required skills from the Job Description (LLM-driven)
-**Goal:** Identify only skills that are explicitly required.
-* The tool prompts the LLM to return **JSON only**:
-  * `required_skills: [{skill, evidence_quote}]`
-* The LLM is instructed to:
-  * include only **MUST HAVE** / explicitly required skills
-  * exclude “nice-to-haves” and implied skills
-  * copy a short **verbatim quote** as evidence
-**LLM role:** structured extraction.
-**Failure behavior:** If JSON parsing fails, the tool stops and prints the raw output.
-### Step 2 — Verify extracted skills against the JD (deterministic, Python)
-**Goal:** Block hallucinated requirements from entering scoring.
-Each extracted item is classified:
-* **Quote-verified (strong):** the evidence quote appears verbatim in the JD
-* **Name-only (weak):** the skill name appears in the JD, but the quote doesn’t match
-* **Unverified (dropped):** neither quote nor name appears
-**Deterministic gate:**
-* Only **quote-verified** skills are used as the final required list for scoring.
-* Name-only and dropped skills are reported for transparency.
-**Output:** “Requirements Verification” section shows:
-* extracted count
-* quote-verified vs name-only vs dropped
-* list of skills used for scoring
-* list of retracted/dropped items (with reason)
-### Step 3 — Retrieve the most relevant culture chunks (deterministic retrieval)
-**Goal:** Ground culture evaluation in actual company documents.
-* The tool runs similarity search over culture docs using the JD as query.
-* It selects the top **k** chunks (e.g., k=3).
-**Deterministic component:** vector retrieval parameters.
-**Output artifact:** `culture_context` is the concatenated text of retrieved culture chunks.
-### Step 4 — Generate job-performance culture attributes (LLM-driven)
-**Goal:** Create a small set of job-relevant behavioral attributes to evaluate consistently.
-* The tool prompts the LLM to return JSON:
-  * `cultural_attributes: ["...", "..."]` (4–6 items)
-**Attribute rules:**
-* Must be job-performance aligned behaviors (e.g., “evidence-based decision making”).
-* Must avoid pedigree / class / prestige language.
-* Must avoid non-performance preferences (e.g., remote-first, time zone).
-**LLM role:** label generation from retrieved culture context.
-### Step 5 — Retrieve top resume chunks for the JD (deterministic retrieval)
-**Goal:** Identify the most relevant candidates and their relevant resume text.
-* The tool runs similarity search over resumes using the JD.
-* It retrieves top **k** chunks (e.g., k=10) and groups them by `resume_id`.
-**Note:** Only retrieved chunks are analyzed. If relevant evidence isn’t retrieved, it may be missed.
-### Step 6 — Culture evidence matching per candidate (LLM + deterministic cleanup + deterministic scoring)
-**Goal:** Determine which culture attributes are supported by resume evidence.
-**LLM-driven matching:**
-* For each attribute, the LLM may return a match with:
-  * `evidence_type`: `direct` or `inferred`
-  * `evidence_quotes`: 1–2 verbatim resume quotes
-  * `inference`: required for inferred
-  * `confidence`: 1–5
-**Deterministic cleanup rules (Python):**
-A match is kept only if:
-* attribute is present
-* evidence_type is `direct` or `inferred`
-* at least one non-trivial quote exists
-* confidence is an integer 1–5
-* inferred matches include an inference sentence
-* inferred matches can be required to meet a minimum confidence
-**Deterministic culture scoring (Python):**
-* Direct evidence weight: **1.0**
-* Inferred evidence weight: **0.5**
-Culture score is computed as:
-* `(sum(weights for matched attributes) / number_of_required_attributes) * 100`
-### Step 7 — Skills matching per candidate (LLM + deterministic scoring)
-**Goal:** Match only the verified required skills to resume evidence.
-**Inputs:**
-* Candidate resume text (retrieved chunks)
-* Verified required skills list (quote-only)
-**LLM output (JSON):**
-* `matched: [{skill, evidence_snippet}]`
-* `missing: [skill]` (treated as advisory; missing is recomputed deterministically)
-**Deterministic missing calculation (Python):**
-* Missing = required_set − matched_set
-**Deterministic skills scoring (Python):**
-* `(number_of_matched_required_skills / number_of_required_skills) * 100`
-### Step 8 — Implied competencies (NOT SCORED) for phone-screen guidance (LLM-driven, advisory)
-**Goal:** When a required skill is missing explicitly, suggest whether it may be **implied** by adjacent evidence.
-* This step is **not scored** and does not affect proceed/do-not-proceed.
-* The LLM may suggest implied competencies only if it:
-  * uses conservative language (“may be implied”)
-  * includes **verbatim resume quotes**
-  * provides a **phone-screen validation question**
-**Hard guardrail:** Tool-specific skills (e.g., R/SAS/MATLAB) must be explicitly present in the resume to be suggested.
-### Step 9 — Factuality verification (LLM-driven verifier)
-**Goal:** Detect ungrounded evidence claims.
-* The verifier checks evidence-backed match lines (e.g., `- Skill: snippet`).
-* It ignores:
-  * numeric score lines
-  * missing lists
-  * policy text
-**Outputs:**
-* verified claims (✓)
-* unverified claims (✗)
-* factuality score
-### Step 10 — Final recommendation (LLM, policy-constrained)
-**Goal:** Produce a structured recommendation without changing scores.
-* The model is given:
-  * skills analysis
-  * culture analysis
-  * fixed computed scores
-  * deterministic decision policy
-**Decision policy:**
-* If skills_score ≥ 70 → PROCEED
-* If skills_score < 60 → DO NOT PROCEED
-* If 60 ≤ skills_score < 70 → PROCEED only if culture_score ≥ 70 else DO NOT PROCEED
-**Non-negotiables:**
-* LLM must not re-score.
-* LLM must not introduce new claims.
-### Step 11 — Self-correction (triggered by verification issues)
-**Goal:** Remove/correct any unverified claims while preserving scores/policy.
-* If any unverified claims exist:
-  * The tool asks the LLM to revise the recommendation
-  * Only the flagged claims may be removed/corrected
-  * Scores and policy must remain unchanged
-### Step 12 — Bias audit (LLM-driven audit across docs + reasoning)
-**Goal:** Flag biased reasoning, biased JD language, or inconsistent standards.
-**Audit scope includes:**
-* Job description
-* Skills analysis
-* Culture analysis
-* Final recommendation text
-* Culture context
-**What it flags (examples):**
-* Prestige/pedigree signals (elite employers/education as proxy)
-* Vague “polish/executive presence” language not tied to job requirements
-* Non-job-related culture screening
-* Inconsistent standards (penalizing requirements not in JD)
-* Overclaiming certainty
-**Outputs:**
-* structured list of bias indicators (category, severity, trigger text, why it matters, recommended fix)
-* recruiter guidance
----
-## 3) Scoring and Decision Rules (Deterministic)
-### 3.1 Skills score
-* Only quote-verified required skills count.
-* Score = matches / required.
-### 3.2 Culture score
-* Score = weighted matches / attributes.
-* Direct = 1.0; inferred = 0.5.
-### 3.3 Labels
-* ≥70: Strong fit
-* 50–69: Moderate fit
-* <50: Not a fit
-### 3.4 Recommendation
-Recommendation follows the fixed policy described in Step 10.
----
-## 4) System Flow Diagram (Textual)
-Below is a simplified, end-to-end flow of how data moves through the system.
-```
-[User Uploads]
-   |
-   v
-+-------------------+
-| Culture Documents |
-+-------------------+        +-----------+
-           |                 | Job Desc  |
-           v                 +-----------+
-+-------------------+               |
-| Culture Vector DB |<--------------+
-+-------------------+               |
-           |                        v
-           |               +---------------------+
-           |               | Skill Extraction    |
-           |               | (LLM, JSON Output)  |
-           |               +---------------------+
-           |                        |
-           |                        v
-           |               +---------------------+
-           |               | Requirement         |
-           |               | Verification        |
-           |               | (Deterministic)     |
-           |               +---------------------+
-           |                        |
-           |                        v
-           |               Verified Required Skills
-           |                        |
-           |                        v
-+-------------------+        +---------------------+
-| Resume Documents  |------->| Resume Vector DB    |
-+-------------------+        +---------------------+
-                                   |
-                                   v
-                           Similarity Search (k=10)
-                                   |
-                                   v
-                           Resume Chunks (Grouped)
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Culture Attribute Generator |
-                     | (LLM, JSON Output)          |
-                     +-----------------------------+
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Culture Evidence Matching   |
-                     | (LLM + Rules + Weights)     |
-                     +-----------------------------+
-                                   |
-                                   v
-                     Culture Score (Deterministic)
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Technical Skill Matching    |
-                     | (LLM + Deterministic Scoring)|
-                     +-----------------------------+
-                                   |
-                                   v
-                     Skills Score (Deterministic)
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Implied Competencies (LLM)  |
-                     | (Not Scored, Advisory)      |
-                     +-----------------------------+
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Factuality Verification     |
-                     | (LLM Verifier)              |
-                     +-----------------------------+
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Recommendation Generator    |
-                     | (Policy-Constrained LLM)    |
-                     +-----------------------------+
-                                   |
-                                   v
-                     +-----------------------------+
-                     | Bias & Fairness Audit        |
-                     | (LLM Audit)                 |
-                     +-----------------------------+
-                                   |
-                                   v
-                           Final Recruiter Report
-```
----
-## 5) Audit Artifacts and Traceability
-For every analysis run, the system produces and retains multiple audit artifacts that enable post-hoc review, regulatory defensibility, and debugging.
-### 5.1 Input Artifacts
-1. **Original Job Description**
-   * Full pasted JD text
-2. **Sanitized Resume Text**
-   * Redacted resume content
-   * Redaction summary (internal)
-3. **Retrieved Culture Chunks**
-   * Top-k (default: 3) culture document segments
-   * Vector similarity scores (internal)
-4. **Retrieved Resume Chunks**
-   * Top-k (default: 10) resume segments
-   * Resume ID metadata
----
-### 5.2 Requirement Verification Artifacts
-1. **Raw LLM Skill Extraction Output**
-2. **Parsed Required Skills JSON**
-3. **Verification Classification Table**
-   * Quote-verified
-   * Name-only
-   * Dropped
-4. **Dropped-Skill Justifications**
----
-### 5.3 Culture Analysis Artifacts
-1. **Generated Culture Attribute List**
-2. **LLM Raw Matching Output**
-3. **Cleaned Match Records**
-   * Evidence type
-   * Quotes
-   * Inference
-   * Confidence
-4. **Weighted Match Table**
-5. **Computed Culture Score**
----
-### 5.4 Skills Analysis Artifacts
-1. **Verified Required Skill List**
-2. **LLM Raw Matching Output**
-3. **Accepted Matched Skills**
-4. **Deterministic Missing-Skill Set**
-5. **Computed Skills Score**
----
-### 5.5 Implied Competency Artifacts (Advisory)
-1. **Missing Skill List**
-2. **LLM Implied Output (JSON)**
-3. **Accepted Implied Records**
-   * Resume quotes
-   * Explanation
-   * Phone-screen questions
-4. **Rejected Inferences (internal)**
----
-### 5.6 Verification and Correction Artifacts
-1. **Verifier Prompt and Output**
-2. **Verified / Unverified Claim Lists**
-3. **Factuality Scores**
-4. **Self-Correction Prompts and Revisions (if triggered)**
----
-### 5.7 Recommendation and Policy Artifacts
-1. **Final Recommendation Prompt**
-2. **Policy Threshold Snapshot**
-3. **Immutable Score Values**
-4. **Generated Recommendation Text**
----
-### 5.8 Bias Audit Artifacts
-1. **Bias Audit Prompt**
-2. **Audit Input Bundle (JD + Analyses + Recommendation)**
-3. **Structured Bias Indicator List**
-4. **Severity and Mitigation Suggestions**
-5. **Recruiter Guidance Text**
----
-### 5.9 System Metadata
-1. Timestamp of run
-2. Model version
-3. Prompt versions
-4. Chunking parameters
-5. Retrieval k-values
-6. Scoring parameters
----
-## 6) Known Limitations
-1. **Retrieval scope**: evaluation depends on retrieved chunks; some evidence may be missed.
-2. **Attribute generation variance**: culture attributes can vary per run unless cached or cataloged.
-3. **LLM evidence overreach**: mitigated by verification and cleanup, but not eliminated.
-4. **Bias audit is advisory**: it flags issues; it does not enforce policy changes unless you add an auto-rewrite step.
----
-## 6) Governance and Change Control
-* Prompt changes must preserve JSON contracts.
-* Any change that affects scoring or policy should be versioned.
-* Audit outputs should be retained for traceability.
----
-## 7) Intended Use
-This tool is built for:
-* faster, evidence-based screening
-* transparent reasoning
-* safer use of LLMs via verification and audits
-It is not a substitute for:
-* human judgment
-* legal review
-* formal HR policy compliance
----
-## Diagram Flow
-### High-level pipeline (inputs → outputs)
-**Inputs uploaded by recruiter**
-1. Company culture/values docs (PDF/DOCX)
-2. Resumes (PDF/DOCX)
-3. Job description (text)
-⬇️
-**Indexing (deterministic, Python)**
-* Culture docs → chunk + embed → `culture_store`
-* Resumes → anonymize → chunk + embed → `resume_store`
-⬇️
-**Candidate assessment (per JD run)**
-1. **Extract required skills (LLM)** → JSON `required_skills[{skill,evidence_quote}]`
-2. **Verify extracted skills (Python)** → quote-verified / name-only / dropped → *quote-only list used for scoring*
-3. **Retrieve relevant culture context (deterministic retrieval)**
-* Query: JD
-* Retrieve: top-k culture chunks (**current: k=3**)
-* Output: `culture_context`
-4. **Generate job-relevant culture attributes (LLM)** → JSON `cultural_attributes[4–6]`
-5. **Retrieve relevant resume chunks (deterministic retrieval)**
-* Query: JD
-* Retrieve: top-k resume chunks (**current: k=10**)
-* Group by `resume_id`
-6. **Per candidate: culture matching (LLM → cleanup → deterministic score)**
-* LLM proposes matches (direct/inferred) + quotes
-* Python enforces validity gates
-* Deterministic weighted culture score (direct=1.0, inferred=0.5)
-7. **Per candidate: skills matching (LLM → deterministic score)**
-* LLM proposes matched skills + evidence snippets
-* Python recomputes missing list deterministically
-* Deterministic skills score using quote-verified requirements only
-8. **Per candidate: implied competencies (LLM, NOT SCORED)**
-* Inputs: missing skills + matched skills + resume + JD
-* Output: implied items with quotes + phone-screen questions
-* Guardrail: tool-like skills (R/SAS/MATLAB) require explicit mention
-9. **Factuality verification (LLM verifier)** → ✓/✗ for evidence-backed match lines + factuality score
-10. **Recommendation (LLM, policy constrained)** → uses fixed scores + fixed decision policy
-11. **Self-correction (conditional)** → triggered if any unverified claims exist
-12. **Bias audit (LLM)** → audits JD + analyses + recommendation → structured bias indicators + guidance
-⬇️
-**Outputs per candidate**
-* Requirements verification summary (global)
-* Culture analysis + score
-* Skills analysis + score
-* Implied (not scored) follow-ups
-* Fact-check results
-* Final recommendation (+ revision note if corrected)
-* Bias audit
----
-### Component map (LLM vs deterministic)
-**LLM-driven components**
-* Required skill extraction (JSON)
-* Culture attribute generation (JSON)
-* Culture match proposals (JSON)
-* Skills match proposals (JSON)
-* Implied (not scored) follow-ups (JSON)
-* Factuality verification (✓/✗)
-* Final recommendation (policy constrained)
-* Bias audit (structured)
-**Deterministic / Python-enforced components**
-* Resume anonymization
-* Chunking + embedding + storage
-* Retrieval parameters (top-k)
-* Required-skill verification (quote/name-only/dropped)
-* Deduplication of requirements
-* Culture match cleanup rules (validity gates)
-* Skills missing list recomputation
-* Skills score computation
-* Culture score computation with weights
-* Decision thresholds (proceed / do not proceed)
-* Self-correction trigger (presence of unverified claims)
----
-## Audit Artifacts
-This section lists the primary artifacts produced (or recommended to persist) to make runs reviewable and defensible.
-### Inputs (source-of-truth)
-* Job description text (as provided)
-* Culture documents (original files)
-* Resumes (original files)
-### Pre-processing
-* Sanitized resume text (post-anonymization)
-* Redaction notes (what was removed/masked)
-* Chunking configuration (chunk_size, chunk_overlap)
-* Embedding configuration (embedding model + settings)
-### Retrieval
-* Culture retrieval query: JD text
-* Culture retrieved chunks: top-k (**current: k=3**)
-* Resume retrieval query: JD text
-* Resume retrieved chunks: top-k (**current: k=10**)
-* Candidate grouping: chunks grouped by `resume_id`
-### Requirements verification
-* LLM `required_skills` JSON (raw)
-* Normalized required skill list (deduped)
-* Verification output:
-  * quote-verified list
-  * name-only list
-  * dropped/unverified list
-  * counts and factuality score
-* Final scoring-required list: quote-verified only
-### Per-candidate analyses
-**Culture analysis**
-* Raw LLM culture-match JSON
-* Post-cleanup matched culture list
-* Missing culture attributes list
-* Culture score + label
-* Culture evidence lines shown to recruiters
-**Skills analysis**
-* Raw LLM skills-match JSON
-* Matched skills list (with evidence snippets)
-* Deterministically computed missing skills list
-* Skills score + label
-**Implied (NOT SCORED)**
-* Raw LLM implied JSON
-* Filtered implied list (must include resume quotes + phone-screen questions)
-### Verification & correction
-* Verifier raw output (✓/✗ lines)
-* Verified claims list
-* Unverified claims list
-* Factuality score
-* Self-correction trigger status (yes/no)
-* Corrected recommendation (if triggered) + revision note
-### Bias audit
-* Bias audit raw output (structured)
-* Bias indicators list (category, severity, trigger_text, why_it_matters, recommended_fix)
-* Overall assessment
-* Recruiter guidance
-### Run-level trace (recommended)
-For reproducibility/governance, also persist:
-* Timestamp, model name, temperature, seed
-* Prompt versions (hash or version ID)
-* Retrieval parameters (k values)
-* Score thresholds and policy version
-* Any configuration overrides used during the run
 AI RECRUITING ASSISTANT — TABULAR PIPELINE (SWIM-LANE VIEW)
 +------+-------------------+----------------------------+------------------------------+------------------------------+
@@ -823,4 +57,4 @@ AI RECRUITING ASSISTANT — TABULAR PIPELINE (SWIM-LANE VIEW)
 Current Retrieval Parameters:
 - Culture store: k = 3 chunks (JD query)
-- Resume store:  k = 10 chunks (JD query)
+- Resume store:  k = 10 chunks (JD query)
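
The deterministic rules described in the removed Guidebook.md text (the Step 2 verification gate, the Step 6 and Step 7 scores, and the Step 10 decision policy) can be sketched in a few lines of Python. This is a minimal illustration written from the prose above, not the Space's actual code: the function names, the case-insensitive substring matching, and the zero score for an empty requirement list are all assumptions.

```python
from typing import Dict, Iterable, List

# Weights stated in the guide: direct evidence 1.0, inferred evidence 0.5.
EVIDENCE_WEIGHTS: Dict[str, float] = {"direct": 1.0, "inferred": 0.5}


def classify_requirement(skill: str, evidence_quote: str, jd_text: str) -> str:
    """Step 2 gate: classify one extracted skill against the JD text.

    Assumption: matching is a case-insensitive substring test; the guide
    only says the quote must appear verbatim in the JD.
    """
    jd = jd_text.lower()
    quote = evidence_quote.strip().lower()
    name = skill.strip().lower()
    if quote and quote in jd:
        return "quote-verified"
    if name and name in jd:
        return "name-only"
    return "dropped"


def skills_score(required: Iterable[str], matched: Iterable[str]) -> float:
    """Step 7: (matched required skills / required skills) * 100."""
    required_set = set(required)
    if not required_set:
        return 0.0  # assumption: no verified requirements yields a zero score
    matched_set = set(matched) & required_set
    return len(matched_set) / len(required_set) * 100


def culture_score(attributes: List[str], matches: Dict[str, str]) -> float:
    """Step 6: (sum of evidence weights / number of attributes) * 100.

    `matches` maps an attribute name to its evidence type.
    """
    if not attributes:
        return 0.0  # assumption: no attributes yields a zero score
    total = sum(EVIDENCE_WEIGHTS.get(matches.get(attr, ""), 0.0) for attr in attributes)
    return total / len(attributes) * 100


def decide(skills: float, culture: float) -> str:
    """Step 10 policy: skills score decides first, culture breaks the 60-69 band."""
    if skills >= 70:
        return "PROCEED"
    if skills < 60:
        return "DO NOT PROCEED"
    return "PROCEED" if culture >= 70 else "DO NOT PROCEED"
```

For example, a candidate matching three of four verified skills scores 75 and proceeds regardless of culture score, while a candidate at 65 proceeds only if the culture score is at least 70.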