title: Resume Verification System
emoji: 😻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Resume Verification System 🔍

An AI-powered CV analysis tool that extracts claims, verifies the evidence behind them, detects red flags, and generates targeted interview questions. Built with Gradio and Google Gemini 2.5 Flash for deployment on Hugging Face Spaces.

🎯 Key Features

Core Capabilities

  • Smart Claim Extraction: Automatically identifies and categorizes all factual claims from resumes
  • Multi-Tier Evidence Validation: Verifies claims through link checking, repository forensics, and cross-section triangulation
  • Advanced Red Flag Detection: Identifies role-achievement mismatches, timeline inconsistencies, and implausible metrics
  • SOTA Verification: Validates research claims against known state-of-the-art benchmarks
  • Interview Question Generation: Creates targeted questions based on unverified claims and red flags
  • Comprehensive Reporting: Exports detailed analysis in PDF, HTML, CSV, JSON, and interview checklist formats

Advanced Features

  • Dual-Score Model: Credibility Score + Consistency Score with weighted final assessment
  • Seniority-Aware Analysis: Adaptive thresholds based on candidate level (Intern/Junior/Mid/Senior/Lead)
  • Repository Forensics: Deep analysis of GitHub/GitLab repositories including commit history and authorship
  • Artifact Credibility Tiers: Weighted evidence scoring (DOI/arXiv > Corporate Blog > Personal Blog)
  • Buzzword Detection: Identifies and penalizes vague claims and excessive buzzwords
  • Timeline Validation: Detects employment gaps, overlapping positions, and technology anachronisms
  • Bias Mitigation: Strips protected attributes and ensures fair assessment

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Google Gemini API key (get one from https://makersuite.google.com/app/apikey)

Installation

  1. Clone the repository:
    git clone https://github.com/yourusername/resume_verifier.git
    cd resume_verifier
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the application:
    python app.py
  4. Open your browser to http://localhost:7860

📖 Usage Guide

Step 1: Initialize Session

  1. Enter your Gemini API key in the Setup tab
  2. Click "Initialize Session"
  3. Wait for confirmation message

Step 2: Upload Resume

  1. Go to the Analysis tab
  2. Upload a resume (PDF, DOCX, or TXT)
  3. Select seniority level
  4. Choose analysis strictness (Low/Medium/High)
  5. Enable deep analysis for thorough verification

Step 3: Run Analysis

  1. Click "Analyze Resume"
  2. Wait for progress completion (typically 30-60 seconds)
  3. Review the summary in the Analysis tab

Step 4: Review Results

  1. Results Dashboard: View credibility scores and evidence heatmap
  2. Interview Prep: Review red flags and generated interview questions
  3. Export: Download comprehensive reports in various formats

🏗️ System Architecture

resume_verifier/
├── app.py                          # Main Gradio application
├── requirements.txt                # Python dependencies
├── README.md                       # Documentation
├── config/
│   ├── __init__.py
│   └── prompts.py                 # Gemini prompt templates
├── modules/
│   ├── __init__.py
│   ├── cv_parser.py              # Document parsing and text extraction
│   ├── claim_extractor.py        # Claim identification via Gemini
│   ├── evidence_validator.py     # Evidence scoring and validation
│   ├── red_flag_detector.py      # Red flag and inconsistency detection
│   └── sota_checker.py           # Research claim verification
├── utils/
│   ├── __init__.py
│   └── gemini_client.py          # Gemini API wrapper with caching
├── visualization/
│   ├── __init__.py
│   ├── evidence_heatmap.py       # Interactive visualizations
│   └── report_generator.py       # Multi-format report generation

🔧 Configuration

Seniority Levels

  • Intern: Lenient thresholds, expects limited evidence
  • Junior: Basic verification, 1-3 years experience
  • Mid: Standard verification, 3-5 years experience
  • Senior: Strict verification, 5+ years experience
  • Lead: Highest scrutiny for leadership claims

Strictness Levels

  • Low: 0.7x severity multiplier, fewer red flags
  • Medium: 1.0x standard detection (recommended)
  • High: 1.3x severity multiplier, aggressive flagging

Evidence Tiers

| Tier | Weight | Examples |
|------|--------|----------|
| DOI/arXiv | 1.0 | doi.org, arxiv.org, ACM, IEEE |
| GitHub Active | 0.9 | GitHub/GitLab repositories with activity |
| Corporate Blog | 0.8 | Company engineering blogs |
| Portfolio | 0.7 | Personal portfolio sites |
| Personal Blog | 0.6 | Medium, dev.to, personal blogs |
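
As a rough illustration of how the tier weights could feed into a claim's evidence score, the sketch below maps a link to a tier by hostname and scores a claim by its strongest link. The weights come from the table above; the hostname rules, the names `tier_weight` and `best_evidence_weight`, and the choice of max aggregation are assumptions made for illustration, not the project's actual logic.

```python
from typing import Iterable
from urllib.parse import urlparse

# Weights copied from the Evidence Tiers table.
TIER_WEIGHTS = {
    "doi_arxiv": 1.0,
    "github_active": 0.9,
    "corporate_blog": 0.8,
    "portfolio": 0.7,
    "personal_blog": 0.6,
}

# Hypothetical hostname -> tier rules. A real classifier would need more
# signals (repository activity, whether a blog belongs to a company, ...).
HOST_TIERS = {
    "doi.org": "doi_arxiv",
    "arxiv.org": "doi_arxiv",
    "github.com": "github_active",
    "gitlab.com": "github_active",
    "medium.com": "personal_blog",
    "dev.to": "personal_blog",
}


def tier_weight(url: str) -> float:
    """Return the tier weight for a URL, or 0.0 if the host is not recognised."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    tier = HOST_TIERS.get(host)
    return TIER_WEIGHTS[tier] if tier else 0.0


def best_evidence_weight(urls: Iterable[str]) -> float:
    """Score a claim by its strongest piece of evidence (assumed aggregation)."""
    return max((tier_weight(u) for u in urls), default=0.0)
```

Under these assumptions, a claim backed by both a Medium post and a DOI link would score 1.0, while a claim with no recognised links would score 0.0.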

📊 Scoring System

Final Score Calculation

Final Score = (Credibility × 0.6) + (Consistency × 0.4)

Risk Assessment

  • Low Risk (75-100): Strong evidence, minimal concerns
  • Medium Risk (50-74): Some unverified claims, standard verification needed
  • High Risk (25-49): Multiple red flags, detailed verification required
  • Critical Risk (0-24): Major concerns, consider rejection
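
The final score and the risk bands can be sketched in a few lines. The 0.6/0.4 weights and the band boundaries are taken from this section; the function names are hypothetical, and both input scores are assumed to be on a 0-100 scale.

```python
def final_score(credibility: float, consistency: float) -> float:
    """Weighted final score: 60% credibility, 40% consistency (both 0-100)."""
    return credibility * 0.6 + consistency * 0.4


def risk_level(score: float) -> str:
    """Map a 0-100 final score to the risk bands listed above."""
    if score >= 75:
        return "Low Risk"
    if score >= 50:
        return "Medium Risk"
    if score >= 25:
        return "High Risk"
    return "Critical Risk"
```

For example, a credibility score of 80 and a consistency score of 60 give 80 × 0.6 + 60 × 0.4 = 72, which falls in the Medium Risk band.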

Red Flag Severity

  • Critical (-30 points): Major inconsistencies or fabrications
  • High (-20 points): Significant credibility issues
  • Medium (-10 points): Moderate concerns requiring clarification
  • Low (-5 points): Minor issues or vagueness
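
One plausible way to combine these penalties with the strictness multipliers from the Configuration section is sketched below. The point values and multipliers come from this document; how they interact (multiplying the summed penalty, then clamping to 0-100) is an assumption, as is the `apply_red_flags` name.

```python
from typing import Iterable

# Point deductions per red flag, from the Red Flag Severity list.
SEVERITY_PENALTY = {"critical": 30, "high": 20, "medium": 10, "low": 5}

# Severity multipliers, from the Strictness Levels list.
STRICTNESS_MULTIPLIER = {"low": 0.7, "medium": 1.0, "high": 1.3}


def apply_red_flags(base_score: float, severities: Iterable[str],
                    strictness: str = "medium") -> float:
    """Subtract scaled red flag penalties and clamp the result to 0-100."""
    multiplier = STRICTNESS_MULTIPLIER[strictness]
    penalty = sum(SEVERITY_PENALTY[s] for s in severities) * multiplier
    return max(0.0, min(100.0, base_score - penalty))
```

With a base score of 90, one High and one Low flag cost 25 points at Medium strictness (score 65) and 32.5 points at High strictness (score 57.5).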

🔍 Red Flag Categories

1. Role-Achievement Mismatch

  • Leadership claims in junior roles
  • Senior achievements without corresponding titles
  • Complex projects with impossibly short tenures

2. Timeline Issues

  • Overlapping full-time positions
  • Employment gaps > 3 months
  • Technologies used before public release
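
The first two timeline checks can be sketched with plain date arithmetic. This is a minimal illustration, not the detector's actual code: the `Position` tuple layout and function names are hypothetical, every position is assumed to be full-time with known start and end dates, and "3 months" is approximated as 90 days.

```python
from datetime import date
from typing import List, Tuple

Position = Tuple[str, date, date]  # (title, start, end); hypothetical layout


def find_overlaps(positions: List[Position]) -> List[Tuple[str, str]]:
    """Return pairs of titles whose date ranges overlap."""
    ordered = sorted(positions, key=lambda p: p[1])
    overlaps = []
    for i, (title_a, _, end_a) in enumerate(ordered):
        for title_b, start_b, _ in ordered[i + 1:]:
            if start_b < end_a:  # the later job starts before the earlier one ends
                overlaps.append((title_a, title_b))
    return overlaps


def find_gaps(positions: List[Position],
              max_gap_days: int = 90) -> List[Tuple[date, date]]:
    """Return (gap_start, gap_end) for gaps longer than max_gap_days."""
    ordered = sorted(positions, key=lambda p: p[1])
    gaps = []
    latest_end = ordered[0][2] if ordered else None
    for _, start, end in ordered[1:]:
        if (start - latest_end).days > max_gap_days:
            gaps.append((latest_end, start))
        # Track the latest end date seen so overlapping jobs do not hide coverage.
        latest_end = max(latest_end, end)
    return gaps
```

The third check (technologies used before public release) would additionally need a lookup table of release dates, which is not shown here.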

3. Metric Implausibility

  • Improvements > 200% in short timeframes
  • User numbers exceeding company scale
  • SOTA claims beyond published benchmarks
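
A very rough sketch of the first check: pull percentage figures out of a claim and flag any above 200%. The regular expression and the `implausible_improvements` name are assumptions; this version ignores the timeframe and does not check that the percentage actually describes an improvement, so it would only serve as a first-pass filter.

```python
import re
from typing import List

# Matches figures such as "450%" or "12.5 %".
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def implausible_improvements(text: str, max_percent: float = 200.0) -> List[float]:
    """Return every percentage in the text that exceeds max_percent."""
    values = (float(m) for m in PERCENT_RE.findall(text))
    return [v for v in values if v > max_percent]
```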

4. Vagueness Indicators

  • High buzzword density (>20%)
  • No quantifiable metrics
  • Generic descriptions without specifics
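
Buzzword density can be approximated as the share of words that appear in a buzzword list. The sketch below assumes a tiny, hypothetical `BUZZWORDS` set and treats "no quantifiable metrics" as "contains no digits", which is cruder than what an LLM-based detector would do.

```python
import re
from typing import List

# Hypothetical buzzword list; the project's real list is not shown in this README.
BUZZWORDS = {"innovative", "synergy", "passionate", "dynamic", "visionary", "disruptive"}


def buzzword_density(text: str) -> float:
    """Fraction of alphabetic words in the text that are buzzwords."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in BUZZWORDS) / len(words)


def vagueness_flags(text: str, threshold: float = 0.20) -> List[str]:
    """Return the vagueness indicators that apply to a claim."""
    flags = []
    if buzzword_density(text) > threshold:
        flags.append("high_buzzword_density")
    if not re.search(r"\d", text):
        flags.append("no_quantifiable_metrics")
    return flags
```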

5. Over-claiming Patterns

  • More than 15 "expert"-level skills
  • All projects claimed as "successful"
  • Sole credit for team achievements

📈 SOTA Benchmarks (2025)

Computer Vision

  • ImageNet Accuracy: 92.8%
  • COCO mAP: 65.5%
  • CIFAR-10: 99.5%

NLP

  • SQuAD F1: 97.8%
  • GLUE Average: 94.2%
  • WMT BLEU: 43.1

Speech

  • LibriSpeech WER (clean): 0.7%
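
Checking a claimed result against these reference values is mostly a matter of knowing whether higher or lower is better for each metric. A minimal sketch, using the numbers listed above; the dictionary keys and the `exceeds_sota` name are hypothetical.

```python
# (reference value, higher_is_better) per benchmark, values from the lists above.
SOTA_BENCHMARKS = {
    "imagenet_accuracy": (92.8, True),
    "coco_map": (65.5, True),
    "cifar10_accuracy": (99.5, True),
    "squad_f1": (97.8, True),
    "glue_average": (94.2, True),
    "wmt_bleu": (43.1, True),
    "librispeech_wer_clean": (0.7, False),  # word error rate: lower is better
}


def exceeds_sota(benchmark: str, claimed: float) -> bool:
    """True if the claimed value beats the listed reference for that benchmark."""
    reference, higher_is_better = SOTA_BENCHMARKS[benchmark]
    return claimed > reference if higher_is_better else claimed < reference
```

A claim that beats the reference is not necessarily false, but under this scheme it would be a candidate for the "SOTA claims beyond published benchmarks" red flag.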

🛡️ Security & Privacy

API Key Security

  • Session-scoped storage only
  • No persistent storage
  • In-memory TTL management
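
The three properties above (session-scoped, non-persistent, TTL-managed) can be illustrated with a small in-memory store. This is a sketch under assumptions: the `SessionKeyStore` class, the one-hour default TTL, and the injectable clock are all hypothetical and not taken from the project's code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionKeyStore:
    """Keeps API keys in process memory only, keyed by session, with expiry."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._keys: Dict[str, Tuple[str, float]] = {}

    def put(self, session_id: str, api_key: str) -> None:
        """Store a key for this session; it expires ttl_seconds from now."""
        self._keys[session_id] = (api_key, self._clock() + self._ttl)

    def get(self, session_id: str) -> Optional[str]:
        """Return the key if present and not expired, else None."""
        entry = self._keys.get(session_id)
        if entry is None:
            return None
        api_key, expires_at = entry
        if self._clock() >= expires_at:
            del self._keys[session_id]  # drop expired keys eagerly
            return None
        return api_key
```

Because nothing is written to disk, restarting the process discards every stored key.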

PII Protection

  • Automatic redaction of phone/email/address
  • Protected attribute removal
  • RBAC for multi-user deployments
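
Email and phone redaction can be approximated with regular expressions, as in the sketch below. The patterns, the ten-digit threshold, and the `redact_pii` name are assumptions. Street addresses are not handled here because they rarely follow a fixed pattern, and long numeric strings such as full date ranges can still be mistaken for phone numbers.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Candidate phone numbers: digits mixed with spaces, dots, dashes, parentheses.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def _redact_phone(match: "re.Match[str]") -> str:
    # Require at least 10 digits so short ranges like "2019-2021" are left alone.
    digits = sum(ch.isdigit() for ch in match.group())
    return "[PHONE]" if digits >= 10 else match.group()


def redact_pii(text: str) -> str:
    """Replace email addresses and likely phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub(_redact_phone, text)
```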

Rate Limiting

  • 60 requests/minute to Gemini API
  • Exponential backoff on failures
  • Response caching to minimize API calls
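
Backoff and caching can be combined in a thin wrapper around whatever function actually calls the API. In the sketch below, `send` is a hypothetical callable standing in for the Gemini request, and the `CachedClient` class is illustrative rather than the contents of `utils/gemini_client.py`. It does not enforce the 60 requests/minute limit, which would need a separate token bucket or sliding window.

```python
import hashlib
import time
from typing import Callable, Dict


class CachedClient:
    """Caches responses by prompt and retries failures with exponential backoff."""

    def __init__(self, send: Callable[[str], str], max_retries: int = 4,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._send = send
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so tests do not actually wait
        self._cache: Dict[str, str] = {}

    def generate(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]  # identical prompt: no API call
        for attempt in range(self._max_retries):
            try:
                response = self._send(prompt)
                break
            except Exception:
                if attempt == self._max_retries - 1:
                    raise  # out of retries: surface the last error
                # Wait 1x, 2x, 4x, ... the base delay between attempts.
                self._sleep(self._base_delay * (2 ** attempt))
        self._cache[key] = response
        return response
```

Production code would usually catch only retryable errors (timeouts, HTTP 429/5xx) and add random jitter to the delay.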

🚀 Deployment

Hugging Face Spaces

  1. Create new Space
  2. Select Gradio SDK
  3. Upload repository files
  4. Set environment variables:
    GEMINI_API_KEY=your_key_here
    
  5. Deploy and share URL
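
Variables and secrets set on a Space are exposed to the app as environment variables, so the key from step 4 can be read with `os.environ`. The helper below is a sketch; the `resolve_api_key` name and the rule that a key typed into the UI takes precedence are assumptions about how `app.py` might handle it.

```python
import os
from typing import Optional


def resolve_api_key(user_supplied: Optional[str] = None) -> str:
    """Prefer a key entered in the UI, else fall back to GEMINI_API_KEY."""
    key = (user_supplied or os.environ.get("GEMINI_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "No Gemini API key found: enter one in the Setup tab "
            "or set the GEMINI_API_KEY environment variable."
        )
    return key
```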

Docker Deployment

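FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
# Gradio binds to 127.0.0.1 by default; listen on all interfaces so the mapped port is reachable
ENV GRADIO_SERVER_NAME=0.0.0.0
EXPOSE 7860
CMD ["python", "app.py"]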

📝 API Usage

Programmatic Analysis

import os

from modules.cv_parser import CVParser
from modules.claim_extractor import ClaimExtractor
from utils.gemini_client import GeminiClient

# Initialize: read the key from the environment rather than hard-coding it
client = GeminiClient(api_key=os.environ["GEMINI_API_KEY"])
parser = CVParser()
extractor = ClaimExtractor(client)

# Analyze: parse the document, then extract claims for the chosen seniority level
parsed = parser.parse("resume.pdf")
claims = extractor.extract_claims(parsed, seniority_level="mid")

🧪 Testing

Run Tests

pytest tests/

Built with ❤️ for better hiring decisions