title: Resume Verification System
emoji: 😻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Resume Verification System 🔍

An AI-powered CV analysis tool that extracts claims, verifies the supporting evidence, detects red flags, and generates targeted interview questions. Built with Gradio and Google Gemini 2.5 Flash for deployment on Hugging Face Spaces.

🎯 Key Features

Core Capabilities

Smart Claim Extraction: Automatically identifies and categorizes all factual claims from resumes
Multi-Tier Evidence Validation: Verifies claims through link checking, repository forensics, and cross-section triangulation
Advanced Red Flag Detection: Identifies role-achievement mismatches, timeline inconsistencies, and implausible metrics
SOTA Verification: Validates research claims against known state-of-the-art benchmarks
Interview Question Generation: Creates targeted questions based on unverified claims and red flags
Comprehensive Reporting: Exports detailed analysis in PDF, HTML, CSV, JSON, and interview checklist formats

Advanced Features

Dual-Score Model: Credibility Score + Consistency Score with weighted final assessment
Seniority-Aware Analysis: Adaptive thresholds based on candidate level (Intern/Junior/Mid/Senior/Lead)
Repository Forensics: Deep analysis of GitHub/GitLab repositories including commit history and authorship
Artifact Credibility Tiers: Weighted evidence scoring (DOI/arXiv > Corporate Blog > Personal Blog)
Buzzword Detection: Identifies and penalizes vague claims and excessive buzzwords
Timeline Validation: Detects employment gaps, overlapping positions, and technology anachronisms
Bias Mitigation: Strips protected attributes and ensures fair assessment

🚀 Quick Start

Prerequisites

Python 3.8+
Google Gemini API Key (get from https://makersuite.google.com/app/apikey)

Installation

Clone the repository:

git clone https://github.com/yourusername/resume_verifier.git
cd resume_verifier

Install dependencies:

pip install -r requirements.txt

Run the application:

python app.py

Open your browser at http://localhost:7860 (Gradio's default port)

📖 Usage Guide

Step 1: Initialize Session

Enter your Gemini API key in the Setup tab
Click "Initialize Session"
Wait for confirmation message

Step 2: Upload Resume

Go to the Analysis tab
Upload a resume (PDF, DOCX, or TXT)
Select seniority level
Choose analysis strictness (Low/Medium/High)
Enable deep analysis for thorough verification

Step 3: Run Analysis

Click "Analyze Resume"
Wait for progress completion (typically 30-60 seconds)
Review the summary in the Analysis tab

Step 4: Review Results

Results Dashboard: View credibility scores and evidence heatmap
Interview Prep: Review red flags and generated interview questions
Export: Download comprehensive reports in various formats

🏗️ System Architecture

resume_verifier/
├── app.py                          # Main Gradio application
├── requirements.txt                # Python dependencies
├── README.md                       # Documentation
├── config/
│   ├── __init__.py
│   └── prompts.py                 # Gemini prompt templates
├── modules/
│   ├── __init__.py
│   ├── cv_parser.py              # Document parsing and text extraction
│   ├── claim_extractor.py        # Claim identification via Gemini
│   ├── evidence_validator.py     # Evidence scoring and validation
│   ├── red_flag_detector.py      # Red flag and inconsistency detection
│   └── sota_checker.py           # Research claim verification
├── utils/
│   ├── __init__.py
│   └── gemini_client.py          # Gemini API wrapper with caching
├── visualization/
│   ├── __init__.py
│   ├── evidence_heatmap.py       # Interactive visualizations
│   └── report_generator.py       # Multi-format report generation

🔧 Configuration

Seniority Levels

Intern: Lenient thresholds, expects limited evidence
Junior: Basic verification, 1-3 years experience
Mid: Standard verification, 3-5 years experience
Senior: Strict verification, 5+ years experience
Lead: Highest scrutiny for leadership claims

Strictness Levels

Low: 0.7x severity multiplier, fewer red flags
Medium: 1.0x standard detection (recommended)
High: 1.3x severity multiplier, aggressive flagging

Evidence Tiers

Tier	Weight	Examples
DOI/arXiv	1.0	doi.org, arxiv.org, ACM, IEEE
GitHub Active	0.9	GitHub/GitLab repositories with activity
Corporate Blog	0.8	Company engineering blogs
Portfolio	0.7	Personal portfolio sites
Personal Blog	0.6	Medium, dev.to, personal blogs
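
As a rough illustration of how the tier weights above might be applied, the sketch below maps an evidence URL to a weight by hostname. The weights come from the table; the domain-to-tier mapping, the `evidence_weight` name, and the fallback weight are assumptions made for this example, not the logic in `modules/evidence_validator.py`. It also does not check repository activity, which the GitHub tier requires.

```python
from urllib.parse import urlparse

# Weights copied from the tier table above.
TIER_WEIGHTS = {
    "doi_arxiv": 1.0,
    "github_active": 0.9,
    "corporate_blog": 0.8,
    "portfolio": 0.7,
    "personal_blog": 0.6,
}

# Assumed mapping from well-known hosts to tiers (illustrative, not exhaustive).
DOMAIN_TIERS = {
    "doi.org": "doi_arxiv",
    "arxiv.org": "doi_arxiv",
    "dl.acm.org": "doi_arxiv",
    "ieeexplore.ieee.org": "doi_arxiv",
    "github.com": "github_active",
    "gitlab.com": "github_active",
    "medium.com": "personal_blog",
    "dev.to": "personal_blog",
}


def evidence_weight(url: str, fallback: float = 0.6) -> float:
    """Return the tier weight for a URL, or `fallback` for unknown hosts."""
    host = (urlparse(url).hostname or "").lower()
    for domain, tier in DOMAIN_TIERS.items():
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return TIER_WEIGHTS[tier]
    return fallback
```

Corporate blogs and portfolio sites cannot be recognized from a fixed host list, so a real classifier would need extra signals for those tiers.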

📊 Scoring System

Final Score Calculation

Final Score = (Credibility × 0.6) + (Consistency × 0.4)
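
The formula can be written as a small function. This is a minimal sketch: the function name is hypothetical, and the 0-100 input range is an assumption inferred from the risk bands in this README.

```python
def final_score(credibility: float, consistency: float) -> float:
    """Combine the two sub-scores using the 60/40 weighting from the formula."""
    for name, value in (("credibility", credibility), ("consistency", consistency)):
        # Assumed scale: both sub-scores lie in [0, 100].
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return round(credibility * 0.6 + consistency * 0.4, 2)


print(final_score(80, 70))  # 80 * 0.6 + 70 * 0.4
```

Because credibility carries the larger weight, a resume with strong evidence but some internal inconsistencies scores higher than the reverse case.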

Risk Assessment

Low Risk (75-100): Strong evidence, minimal concerns
Medium Risk (50-74): Some unverified claims, standard verification needed
High Risk (25-49): Multiple red flags, detailed verification required
Critical Risk (0-24): Major concerns, consider rejection
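
The bands above translate into a simple threshold lookup. The sketch below is illustrative only: `risk_level` is a hypothetical name, and treating each band's lower bound as the cut-off for fractional scores (for example 74.5 counts as Medium) is an assumption, since the README lists integer ranges.

```python
def risk_level(score: float) -> str:
    """Map a final score in [0, 100] to the risk band it falls into."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score >= 75:
        return "Low Risk"
    if score >= 50:
        return "Medium Risk"
    if score >= 25:
        return "High Risk"
    return "Critical Risk"
```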

Red Flag Severity

Critical (-30 points): Major inconsistencies or fabrications
High (-20 points): Significant credibility issues
Medium (-10 points): Moderate concerns requiring clarification
Low (-5 points): Minor issues or vagueness
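
The README gives the point deductions and the strictness multipliers (0.7x, 1.0x, 1.3x) but not how they combine. One plausible reading, shown here as an assumption rather than the project's actual logic, is to scale the summed deductions by the strictness multiplier and clamp the result at zero. The function name is hypothetical.

```python
# Point deductions per severity, as listed above.
SEVERITY_POINTS = {"critical": 30, "high": 20, "medium": 10, "low": 5}
# Multipliers from the "Strictness Levels" section.
STRICTNESS_MULTIPLIER = {"low": 0.7, "medium": 1.0, "high": 1.3}


def apply_red_flags(base_score: float, severities: list, strictness: str = "medium") -> float:
    """Subtract strictness-scaled penalties from a score, never going below 0."""
    multiplier = STRICTNESS_MULTIPLIER[strictness]
    penalty = sum(SEVERITY_POINTS[s] for s in severities) * multiplier
    return max(0.0, round(base_score - penalty, 2))


flags = ["high", "low"]  # 20 + 5 = 25 points before scaling
print(apply_red_flags(90, flags, "low"))
print(apply_red_flags(90, flags, "medium"))
print(apply_red_flags(90, flags, "high"))
```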

🔍 Red Flag Categories

1. Role-Achievement Mismatch

Leadership claims in junior roles
Senior achievements without corresponding titles
Complex projects with impossibly short tenures

2. Timeline Issues

Overlapping full-time positions
Employment gaps > 3 months
Technologies used before public release
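
The first two checks can be sketched with plain date arithmetic. This example is a simplification under stated assumptions: positions are `(title, start, end)` tuples, every position is treated as full-time, and "3 months" is approximated as 90 days. The function name and data shape are hypothetical, not taken from `modules/red_flag_detector.py`.

```python
from datetime import date


def find_timeline_issues(positions, max_gap_days=90):
    """Return ("overlap" | "gap", earlier_title, later_title) tuples."""
    issues = []
    ordered = sorted(positions, key=lambda p: p[1])  # sort by start date
    for i in range(1, len(ordered)):
        title, start, _ = ordered[i]
        # Compare against the latest end date seen so far, not just the previous row.
        prev_title, _, prev_end = max(ordered[:i], key=lambda p: p[2])
        if start < prev_end:
            issues.append(("overlap", prev_title, title))
        elif (start - prev_end).days > max_gap_days:
            issues.append(("gap", prev_title, title))
    return issues


history = [
    ("Engineer A", date(2020, 1, 1), date(2021, 1, 1)),
    ("Engineer B", date(2020, 6, 1), date(2021, 6, 1)),
    ("Engineer C", date(2022, 1, 1), date(2023, 1, 1)),
]
print(find_timeline_issues(history))
```

Detecting technology anachronisms would additionally need a table of public release dates, which is outside this sketch.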

3. Metric Implausibility

Improvements > 200% in short timeframes
User numbers exceeding company scale
SOTA claims beyond published benchmarks

4. Vagueness Indicators

High buzzword density (>20%)
No quantifiable metrics
Generic descriptions without specifics
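
A buzzword density check could look like the sketch below. The README does not define how density is measured, so this example assumes it is the share of words that appear in a buzzword list; the word list, tokenizer, and function names are illustrative.

```python
import re

# Illustrative list only; a real deployment would maintain a longer one.
BUZZWORDS = {"synergy", "innovative", "disruptive", "passionate", "visionary", "dynamic"}


def buzzword_density(text: str, buzzwords=BUZZWORDS) -> float:
    """Fraction of alphabetic words in `text` that are buzzwords."""
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in buzzwords)
    return hits / len(words)


def is_vague(text: str, threshold: float = 0.20) -> bool:
    """Flag text whose buzzword density exceeds the 20% threshold."""
    return buzzword_density(text) > threshold
```

For example, `is_vague("Innovative synergy leader built scalable pipelines")` is flagged (2 of 6 words), while a sentence with concrete metrics such as "Reduced API latency from 400 ms to 120 ms" is not.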

5. Over-claiming Patterns

More than 15 "expert"-level skills
All projects claimed as "successful"
Sole credit for team achievements

📈 SOTA Benchmarks (2025)

Computer Vision

ImageNet Accuracy: 92.8%
COCO mAP: 65.5%
CIFAR-10: 99.5%

NLP

SQuAD F1: 97.8%
GLUE Average: 94.2%
WMT BLEU: 43.1

Speech

LibriSpeech WER (clean): 0.7%
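
A claim can be compared against these reference values with a simple lookup. The sketch below copies the numbers listed above; the dictionary keys, the `exceeds_sota` name, and the higher-is-better flags are assumptions for illustration and may differ from `modules/sota_checker.py`. Note that word error rate (WER) is a lower-is-better metric, so the comparison is reversed for it.

```python
# (reference value, higher_is_better) per benchmark, using the figures above.
SOTA_BENCHMARKS = {
    "imagenet_accuracy": (92.8, True),
    "coco_map": (65.5, True),
    "cifar10_accuracy": (99.5, True),
    "squad_f1": (97.8, True),
    "glue_average": (94.2, True),
    "wmt_bleu": (43.1, True),
    "librispeech_wer_clean": (0.7, False),
}


def exceeds_sota(benchmark: str, claimed: float) -> bool:
    """True if a claimed result beats the stored reference value."""
    best, higher_is_better = SOTA_BENCHMARKS[benchmark]
    return claimed > best if higher_is_better else claimed < best
```

A result that beats the reference is not necessarily false, but it is a reasonable trigger for a follow-up interview question.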

🛡️ Security & Privacy

API Key Security

Session-scoped storage only
No persistent storage
In-memory TTL management
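
One way to get session-scoped, in-memory storage with a time-to-live is a small dictionary-backed store like the one below. This is a sketch under assumptions: the class name, the one-hour default TTL, and the injectable clock are all hypothetical and not taken from the project code.

```python
import time


class SessionKeyStore:
    """Keeps API keys in memory only and forgets them after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # session_id -> (api_key, expires_at)

    def put(self, session_id: str, api_key: str) -> None:
        self._entries[session_id] = (api_key, self._clock() + self._ttl)

    def get(self, session_id: str):
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        api_key, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the key so it no longer lives in memory.
            del self._entries[session_id]
            return None
        return api_key
```

Nothing is written to disk, so keys disappear when the process restarts or the TTL elapses, whichever comes first.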

PII Protection

Automatic redaction of phone/email/address
Protected attribute removal
Role-based access control (RBAC) for multi-user deployments
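
Email and phone redaction can be approximated with regular expressions. The sketch below is illustrative: the patterns, placeholder tokens, and the 10-digit minimum (used to avoid masking year ranges such as "2019-2021") are assumptions, and postal addresses are not handled because they need more than a regex.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Candidate phone numbers: a digit run with optional spaces, dots, dashes, parentheses.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def _mask_phone(match: re.Match) -> str:
    # Only mask candidates with at least 10 digits, so date ranges survive.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if digits >= 10 else match.group()


def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub(_mask_phone, text)


print(redact_pii("Contact: jane.doe@example.com, +1 (555) 123-4567"))
```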

Rate Limiting

60 requests/minute to Gemini API
Exponential backoff on failures
Response caching to minimize API calls
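
Exponential backoff and response caching can be combined in a thin wrapper around whatever function sends a prompt. This is a minimal sketch, not the code in `utils/gemini_client.py`: `CachedClient`, `call_with_backoff`, and the `send` callable are hypothetical, and the 60 requests/minute limit itself is not enforced here.

```python
import hashlib
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `fn`, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


class CachedClient:
    """Caches responses by prompt hash so repeated prompts cost no API call."""

    def __init__(self, send, sleep=time.sleep):
        self._send = send    # assumed: a callable taking a prompt string
        self._sleep = sleep
        self._cache = {}

    def generate(self, prompt: str):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = call_with_backoff(
                lambda: self._send(prompt), sleep=self._sleep
            )
        return self._cache[key]
```

In production it is common to add random jitter to each delay and to retry only on rate-limit or transient network errors rather than on every exception.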

🚀 Deployment

Hugging Face Spaces

Create new Space
Select Gradio SDK
Upload repository files
Set environment variables:
```
GEMINI_API_KEY=your_key_here
```
Deploy and share URL
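
On Hugging Face Spaces, secrets configured for a Space are exposed to the app as environment variables. The sketch below shows one way an app could prefer a key typed into the Setup tab and fall back to `GEMINI_API_KEY`; the `resolve_api_key` function is hypothetical and not part of this repository.

```python
import os


def resolve_api_key(user_supplied=None, env_var="GEMINI_API_KEY") -> str:
    """Return the first non-empty key: UI input first, then the environment."""
    for candidate in (user_supplied, os.environ.get(env_var)):
        if candidate and candidate.strip():
            return candidate.strip()
    raise RuntimeError(f"No API key provided and {env_var} is not set")
```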

Docker Deployment

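FROM python:3.9-slim
WORKDIR /app
# Install dependencies first so this layer is cached across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Bind Gradio to all interfaces so port 7860 is reachable from outside the container
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]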

📝 API Usage

Programmatic Analysis

import os

from modules.cv_parser import CVParser
from modules.claim_extractor import ClaimExtractor
from utils.gemini_client import GeminiClient

# Initialize (reads the key from the environment instead of hard-coding it)
client = GeminiClient(api_key=os.environ["GEMINI_API_KEY"])
parser = CVParser()
extractor = ClaimExtractor(client)

# Analyze: parse the document, then extract claims for the chosen seniority level
parsed = parser.parse("resume.pdf")
claims = extractor.extract_claims(parsed, seniority_level="mid")

🧪 Testing

Run Tests

pytest tests/

Built with ❤️ for better hiring decisions