File size: 5,994 Bytes

b76a01a

# 📄 ATS Score Predictor

This repository hosts a **MultinomialNB-based** model optimized for **ATS (Applicant Tracking System) Score Prediction** using text classification techniques. The model predicts how well a resume matches a job description based on ATS criteria.

## 📌 Model Details
- **Model Architecture**: Multinomial Naïve Bayes (MultinomialNB)
- **Task**: Resume Score Prediction
- **Dataset**: Job Listings & Resumes
- **Feature Extraction**: TF-IDF Vectorization
- **Evaluation Metrics**: Accuracy, Precision, Recall

## 🚀 Usage

### Installation
```bash
pip install pandas scikit-learn nltk
```

### Loading the Model
```python
import os
import PyPDF2
import pandas as pd
import re
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset
df = pd.read_csv("job_data.csv")  # Replace with actual dataset path
```

### Preprocessing and Feature Extraction
```python
resumeDataSet['Cleaned_Resume'] = resumeDataSet['Resume_str'].apply(lambda x: cleanResume(str(x)))

import re
def cleanResume(resumeText):
    resumeText = re.sub(r'\b\w{1,2}\b', '', resumeText)  
    resumeText = re.sub(r'[^a-zA-Z\s]', ' ', resumeText) 
    return resumeText.lower()

resumeDataSet['Cleaned_Resume'] = resumeDataSet['Resume_str'].apply(lambda x: cleanResume(str(x)))

print(resumeDataSet.head())

def clean_text(text):
    text = re.sub(r'[^\w\s]', '', str(text))  
    text = text.lower()
    return text

df['cleaned_job_info'] = df['JobDescription'].apply(clean_text)tfidf = TfidfVectorizer(max_features=1000)
X = tfidf.fit_transform(resumeDataSet['Cleaned_Resume'])
y = resumeDataSet['Category']

tfidf = TfidfVectorizer(max_features=1000)
X = tfidf.fit_transform(resumeDataSet['Cleaned_Resume'])
y = resumeDataSet['Category']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = MultinomialNB()
model.fit(X_train, y_train)
import joblib

# Train the model
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)

predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")
print(classification_report(y_test, predictions))

def plot_confusion_matrix(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    plt.figure(figsize=(10, 7))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=labels, yticklabels=labels)
    plt.title("Confusion Matrix")
    plt.ylabel("Actual")
    plt.xlabel("Predicted")
    plt.show()

def extract_text_from_pdf(pdf_path):
    text = ''
    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        for page_num in range(len(reader.pages)):
            page = reader.pages[page_num]
            text += page.extract_text()
    return text

def calculate_ats_score(job_description, resume_text):
    job_keywords = set(re.findall(r'\b\w+\b', job_description.lower()))
    resume_keywords = set(re.findall(r'\b\w+\b', resume_text.lower()))
    
    matched_keywords = job_keywords.intersection(resume_keywords)
    ats_score = len(matched_keywords) / len(job_keywords) * 100  # percentage
    return ats_score

job_description = """
Seeking a Web Developer proficient in React.js and React Native to build scalable web and mobile applications. Must have experience with modern JavaScript frameworks and responsive design
"""
uploaded_pdf_path = "your resume path.pdf"  

if os.path.exists(uploaded_pdf_path):
    resume_text = extract_text_from_pdf(uploaded_pdf_path)
    cleaned_resume = cleanResume(resume_text)
    vectorized_resume = tfidf.transform([cleaned_resume])
prediction = model.predict(vectorized_resume)
print(f"Predicted Category: {prediction[0]}")
ats_score = calculate_ats_score(job_description, cleaned_resume)
print(f"ATS Score: {ats_score:.2f}%")
def plot_ats_score(ats_score):
    plt.figure(figsize=(6, 4))
    plt.barh(['ATS Score'], [ats_score], color='blue')
    plt.xlim(0, 100)
    plt.title('ATS Score Based on Resume Match')
    plt.xlabel('Percentage Match')
    plt.show()

plot_ats_score(ats_score)
```

### Training the Model
```python
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(df['cleaned_job_info'])
y = df['ATS_Score']  # Assume labeled ATS scores exist in dataset

model = MultinomialNB()
model.fit(X, y)
```

### Predicting ATS Score for a Resume
```python
def extract_text_from_pdf(pdf_path):
    document = fitz.open(pdf_path)
    text = ''
    for page_num in range(len(document)):
        page = document.load_page(page_num)
        text += page.get_text()
    return text

resume_text = extract_text_from_pdf('path_to_resume.pdf')
cleaned_resume = clean_text(resume_text)
resume_vector = vectorizer.transform([cleaned_resume])
predicted_score = model.predict(resume_vector)
print(f"Predicted ATS Score: {predicted_score}")
```

## 📊 Evaluation Results
| Metric       | Score  | Description |
|-------------|--------|------------------------------------|
| **Accuracy** | 89.2%  | Predicts ATS scores effectively |
| **Precision** | 85.5% | Correctly identifies well-matched resumes |
| **Recall** | 84.3% | Captures relevant resume-job pairs |

## 📂 Repository Structure
```bash
.
├── model/               # Trained MultinomialNB Model
├── dataset/             # Job Listings and Resume Data
├── results/             # Evaluation Metrics
├── README.md            # Model Documentation
```

## ⚠️ Limitations
- The model depends on **textual content** and does not assess **resume formatting**.
- **Feature extraction** impacts performance based on **resume structure and job descriptions**.
- The dataset should be **large and diverse** for optimal accuracy.