---
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- text-classification
- resume-classification
- fine-tuning
- python
- pytensors
- kaggle
---
# Model Card: Resume Classification Using BERT
## Model Overview
This model is a fine-tuned version of **`bert-base-uncased`** designed for multiclass classification. It categorizes resumes into one of **24 predefined job categories**, making it suitable for automated resume screening and classification tasks.
---
## Dataset
The fine-tuning dataset consists of **2400+ resumes**, provided as both plain-text strings and PDF files, each labeled with one of 24 job categories.
The dataset is available on Kaggle: https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data
- **Classes**:
`['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']`
The dataset underwent significant preprocessing to remove noise and improve text quality for tokenization.
**Preprocessing steps include**:
- Removal of HTML tags, URLs, punctuation, Unicode characters, escape sequences, stop words, and extra whitespace.
- All preprocessing functions are available in `preprocessing.py`.
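
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration and not the contents of `preprocessing.py`: the function name `clean_resume_text`, the tiny `STOP_WORDS` set, and the order of the steps are all assumptions made for the example.

```python
import html
import re
import string

# Hypothetical stop-word set for illustration only; the real preprocessing.py
# may use a different (and likely much larger) list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "at"}

# Translation table that turns every ASCII punctuation character into a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_resume_text(text: str) -> str:
    """Illustrative cleaner (assumed name, not the author's implementation)."""
    text = html.unescape(text)                          # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)                # strip HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)   # strip URLs
    text = text.encode("ascii", "ignore").decode()      # drop non-ASCII characters
    text = re.sub(r"\\[nrt]", " ", text)                # literal "\n", "\r", "\t" sequences
    text = text.translate(_PUNCT_TABLE)                 # punctuation -> spaces
    # split() also collapses real newlines, tabs, and repeated spaces.
    words = [w for w in text.split() if w.lower() not in STOP_WORDS]
    return " ".join(words)
```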
---
## Model Configuration
- **Base Model**: `bert-base-uncased`
- **Fine-tuning Task**: Multiclass classification (24 classes)
- **Preprocessing Summary**: The preprocessing steps applied to the training data have been encapsulated in the `preprocess_function` to simplify and standardize usage.
- **Model Output**: The raw output consists of one logit per class. To obtain probabilities, apply the sigmoid activation function, for example with `torch.nn.Sigmoid()`.
- **Postprocessing**: A postprocessing utility, `postprocess_function`, converts the raw logits into the corresponding class names as text for easier interpretation.
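
As a rough picture of what a logits-to-label step involves, here is a minimal pure-Python sketch. It is not the bundled `postprocess_function`: the name `logits_to_labels` is hypothetical, and the assumption that class indices follow the order of the class list in this card should be checked against the model's own label mapping. For a PyTorch tensor, `logits.tolist()` gives the nested-list input used here.

```python
from typing import List, Sequence

# Class names as listed in this card. ASSUMPTION: index i of the logits
# corresponds to CLASS_NAMES[i]; verify against the model's label mapping.
CLASS_NAMES = [
    "ACCOUNTANT", "ADVOCATE", "AGRICULTURE", "APPAREL", "ARTS", "AUTOMOBILE",
    "AVIATION", "BANKING", "BPO", "BUSINESS-DEVELOPMENT", "CHEF", "CONSTRUCTION",
    "CONSULTANT", "DESIGNER", "DIGITAL-MEDIA", "ENGINEERING", "FINANCE", "FITNESS",
    "HEALTHCARE", "HR", "INFORMATION-TECHNOLOGY", "PUBLIC-RELATIONS", "SALES",
    "TEACHER",
]


def logits_to_labels(batch_logits: Sequence[Sequence[float]]) -> List[str]:
    """Map each row of 24 logits to the name of its highest-scoring class."""
    labels = []
    for row in batch_logits:
        if len(row) != len(CLASS_NAMES):
            raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(row)}")
        # Index of the largest logit in this row.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(CLASS_NAMES[best])
    return labels
```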
---
## Training Details
The fine-tuning process involved:
- Tokenizing the input with the `bert-base-uncased` tokenizer.
- Feeding the preprocessed text into the BERT model for contextual understanding.
- Normalizing the output logits with the **sigmoid activation function** to produce a probability for each class.
- The complete training code is available on Kaggle: https://www.kaggle.com/code/naandhu/bert-base-uncased-fine-tuned-for-classification
---
## Model Output
The model provides raw output logits for each job category. These logits can be converted into probabilities using:
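```python
import torch
import torch.nn as nn
# Placeholder logits; in practice use the model's raw output, shape (batch_size, 24).
logits = torch.randn(1, 24)
sigmoid = nn.Sigmoid()
probs = sigmoid(logits)  # element-wise; each value lies in (0, 1)
```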
The highest probability corresponds to the predicted job category.
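
Because sigmoid is applied to each logit independently and is monotonic, the class with the highest sigmoid score is also the class with the highest raw logit, but the 24 scores do not generally sum to 1. The pure-Python sketch below checks this on made-up logits and, for comparison only, shows a softmax over the same values. Softmax is not described in this card; whether it is an appropriate alternative depends on how the model was trained.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax; shown for comparison, not used by this card."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: List[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for three classes (illustration only).
logits = [2.0, -1.0, 0.5]
sigmoid_scores = [sigmoid(x) for x in logits]
softmax_probs = softmax(logits)

# All three rankings agree on the top class.
predicted = argmax(logits)
same_prediction = predicted == argmax(sigmoid_scores) == argmax(softmax_probs)
```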
---
## Use Cases
- Automated resume classification for HR platforms.
- Sorting resumes into industry-specific categories for targeted hiring processes.
- Candidate profiling and analysis for recruitment agencies.
---
## Limitations
- Model performance depends on the quality and diversity of the dataset. Biases in the dataset may affect predictions.
- Preprocessing removes non-textual elements, which might strip out context-critical features.
- PDFs with poor formatting or heavy graphical content may not preprocess effectively.
---
## Citation
If you use this model in your work, please cite:
**"Resume Classification Model using BERT for Multiclass Job Categorization."**