#!/usr/bin/env python3
"""
diagnose_scoring.py (ats-resume-optimizer, commit 3038a81 by Salim Shaikh)

Diagnostic Script - Understanding Why Empty/Minimal Resumes Get High Scores
"""
from app import ATSCompatibilityAnalyzer
analyzer = ATSCompatibilityAnalyzer()
# Test cases to diagnose
test_cases = [
    ("Empty Resume", "", "Software Engineer with Python experience"),
    ("Just Name", "John Doe", "Looking for a Data Analyst with SQL and Python"),
    ("Just 'Hi'", "Hi", "Senior Software Engineer"),
    ("Random Gibberish", "asdfghjkl qwertyuiop zxcvbnm", "Machine Learning Engineer"),
    ("Chef for ML Role", "Chef John - 10 years cooking - French cuisine, pastry", "Machine Learning Engineer with PhD and PyTorch"),
]
print("=" * 80)
print("DIAGNOSIS: WHY ARE EMPTY/MINIMAL RESUMES GETTING HIGH SCORES?")
print("=" * 80)
for name, resume, jd in test_cases:
    print(f"\n{'=' * 60}")
    print(f"TEST: {name}")
    # Truncate long inputs for display only; scoring sees the full text.
    resume_preview = f"{resume[:50]}..." if len(resume) > 50 else resume
    jd_preview = f"{jd[:50]}..." if len(jd) > 50 else jd
    print(f"Resume: '{resume_preview}' ({len(resume)} chars)")
    print(f"JD: '{jd_preview}'")
    print("-" * 60)
    result = analyzer.analyze(resume, jd)
    print(f"\n📊 TOTAL SCORE: {result['total_score']}%  <-- THIS IS THE PROBLEM!")
    print("\n🔍 BREAKDOWN (with weights):")
    weights = analyzer.weights
    for metric, score in result['breakdown'].items():
        weight = weights.get(metric, 0)
        weighted = score * weight
        print(f"  {metric:20} = {score:5.1f}%  ×  {weight:.2f}  =  {weighted:5.1f}")
    print(f"\n  {'=' * 40}")
    print(f"  WEIGHTED TOTAL: {result['total_score']}%")
print("\n\n" + "=" * 80)
print("πŸ› ROOT CAUSE ANALYSIS")
print("=" * 80)
print("""
The scoring functions have ARTIFICIALLY HIGH BASELINES:

  1. _format_score:      baseline = 80        (even an empty resume gets 80)
  2. _section_score:     baseline = 80        (no sections at all still gets 80)
  3. _action_verb_score: baseline = 75        (0 verbs   = 75%)
  4. _quantification:    baseline = 68        (0 numbers = 68%)
  5. _tfidf_score:       60 + (raw * 0.45)    (a 0% match still scores 60%)
  6. _skills_match:      baseline = 75        (0 matches = 75%)
  7. _semantic_match:    returns a 75-85 default when nothing matches

This design was meant to avoid "harsh" scoring, but it is BROKEN:
  - Empty resumes should score 0-10%, not 70%+
  - Completely irrelevant resumes should score <30%, not 80%+

RECOMMENDED FIX:
  - Remove the artificial baselines
  - Score from 0, not from a 60-80 floor
  - Apply minimum input-length thresholds before scoring at all
""")
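The recommended fix can be sketched as a tiny rescoring helper. This is a hypothetical illustration, not the analyzer's actual API: the function name, the 50-character gate, and the 0-100 clamp are all assumptions chosen to show the shape of the fix.

```python
# Hypothetical sketch of the fix -- NOT a function from app.py.
def rescored_metric(raw: float, resume_text: str, min_chars: int = 50) -> float:
    """Score from 0 instead of a 60-80 baseline floor.

    Returns 0 when the resume is too short to be scored at all,
    otherwise clamps the raw score into the 0-100 range.
    """
    if len(resume_text.strip()) < min_chars:
        return 0.0  # minimum-input threshold: empty or "Hi" resumes score 0
    return max(0.0, min(100.0, float(raw)))

print("\nSanity check of the proposed rescoring (hypothetical):")
print(f"  empty resume, raw=0  -> {rescored_metric(0, '')}")
print(f"  real resume, raw=85  -> {rescored_metric(85, 'x' * 200)}")
```

With this shape, the pathological cases above (empty resume, "Hi", a chef resume with no keyword overlap) would fall toward 0 instead of the 70%+ the baselines currently produce.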
print("\n" + "=" * 80)
print("WEIGHTS BEING USED:")
print("=" * 80)
for metric, weight in analyzer.weights.items():
    print(f"  {metric:20} = {weight:.2f}")
print(f"\nTotal weights sum: {sum(analyzer.weights.values()):.2f}")
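If the printed sum is not 1.00, the weighted total drifts off the 0-100 scale on top of the baseline inflation. A small helper can renormalize the weight dict; this is a hypothetical sketch, not part of `ATSCompatibilityAnalyzer`.

```python
# Hypothetical helper -- not a method of ATSCompatibilityAnalyzer.
def normalize_weights(weights: dict) -> dict:
    """Rescale a weight dict so its values sum to 1.0 (no-op if all zero)."""
    total = sum(weights.values())
    if total == 0:
        return dict(weights)
    return {metric: w / total for metric, w in weights.items()}

# Example with made-up weights that sum to 1.20 instead of 1.00:
print("Normalized:", normalize_weights({"tfidf": 0.30, "skills": 0.30, "format": 0.60}))
```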