Naandhu/bert-resume-classifier

	---
	base_model:
	- google-bert/bert-base-uncased
	pipeline_tag: text-classification
	tags:
	- text-classification
	- resume-classification
	- fine-tuning
	- python
	- pytensors
	- kaggle
	---
	# Model Card: Resume Classification Using BERT

	## Model Overview

	This model is a fine-tuned version of `bert-base-uncased` designed for multiclass classification. It categorizes resumes into one of 24 predefined job categories, making it suitable for automated resume screening and classification tasks.

	---

	## Dataset

	The dataset used for fine-tuning consists of more than 2,400 resumes, provided both as plain-text strings and as PDF files. Each resume is labeled with one of 24 job categories.
	The dataset is available at: https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data

	- Classes:
	`['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']`

	The dataset underwent significant preprocessing to remove noise and improve text quality for tokenization.
	Preprocessing steps include:
	- Removal of HTML tags, URLs, punctuation, Unicode characters, escape sequences, stop words, and extra whitespace.
	- All preprocessing functions are available in `preprocessing.py`.
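The cleaning steps listed above can be sketched as follows. This is a minimal illustration using only the standard library, not the contents of `preprocessing.py`: the function name `clean_resume_text`, the order of the steps, and the small `STOP_WORDS` set are all assumptions made for the example.

```python
import re
import string

# Illustrative stop-word subset (assumption); the list used for training is not given here.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "at", "is", "are"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_resume_text(text: str) -> str:
    """Hypothetical helper: apply the listed cleaning steps to one resume string."""
    text = re.sub(r"<[^>]+>", " ", text)                   # HTML tags
    text = re.sub(r"(https?://|www\.)\S+", " ", text)      # URLs
    text = re.sub(r"\\[nrt]|[\r\n\t]", " ", text)          # literal and real escape sequences
    text = text.encode("ascii", "ignore").decode("ascii")  # non-ASCII Unicode characters
    text = text.translate(_PUNCT_TABLE)                    # punctuation
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]  # stop words
    return " ".join(tokens)                                # collapses extra whitespace


sample = "<p>Senior Accountant at ACME.</p>\n\tVisit https://example.com/cv \u2022 Skills: Excel, SAP"
cleaned = clean_resume_text(sample)
```

Lowercasing is included because the base model is uncased; whether the original pipeline lowercases before tokenization is not stated in this card.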

	---

	## Model Configuration

	- Base Model: `bert-base-uncased`
	- Fine-tuning Task: Multiclass classification (24 classes)
	- Preprocessing Summary: The preprocessing steps applied to the training data have been encapsulated in the `preprocess_function` to simplify and standardize usage.

	- Model Output: The raw output consists of logits for each class. To obtain probabilities, you can apply the sigmoid activation function using `torch.nn.Sigmoid()`.

	- Postprocessing: A postprocessing utility, included as the `postprocess_function`, converts the raw logits into the corresponding class names in text format for easier interpretation.
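The postprocessing step can be pictured as an argmax over the 24 logits followed by a lookup in the class list. The sketch below is not the repository's `postprocess_function`; `logits_to_label` is a hypothetical name, and it assumes that the model's output indices follow the order of the class list given in the Dataset section, which this card does not confirm.

```python
# Class names in the order listed in the Dataset section.
# Assumption: index i of the logits corresponds to CLASSES[i].
CLASSES = [
    "ACCOUNTANT", "ADVOCATE", "AGRICULTURE", "APPAREL", "ARTS", "AUTOMOBILE",
    "AVIATION", "BANKING", "BPO", "BUSINESS-DEVELOPMENT", "CHEF", "CONSTRUCTION",
    "CONSULTANT", "DESIGNER", "DIGITAL-MEDIA", "ENGINEERING", "FINANCE", "FITNESS",
    "HEALTHCARE", "HR", "INFORMATION-TECHNOLOGY", "PUBLIC-RELATIONS", "SALES", "TEACHER",
]


def logits_to_label(logits):
    """Hypothetical helper: return the class name with the largest logit."""
    if len(logits) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} logits, got {len(logits)}")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return CLASSES[best_index]


# Example: a made-up logit vector whose largest entry sits at index 19.
example_logits = [0.0] * 24
example_logits[19] = 3.2
label = logits_to_label(example_logits)
```

If the model's configuration defines its own `id2label` mapping, that mapping should take precedence over the assumed order used here.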

	---

	## Training Details

	The fine-tuning process involved:
	- Input tokenization using the `bert-base-uncased` tokenizer.
	- Feeding preprocessed text into the BERT model for contextual understanding.
	- Output logits normalized using the sigmoid activation function to produce probabilities for each class.
	- The entire training code is available on Kaggle: https://www.kaggle.com/code/naandhu/bert-base-uncased-fine-tuned-for-classification
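For orientation, the tokenization and model setup described above might look roughly like this with the Hugging Face `transformers` library. This is a sketch under stated assumptions, not the author's training code (see the Kaggle notebook for that): the helper names `build_tokenizer_and_model` and `encode` are hypothetical, and `max_length=512` is simply BERT's maximum sequence length rather than a documented setting of this model.

```python
def build_tokenizer_and_model(base="bert-base-uncased", num_labels=24):
    """Hypothetical helper: load the base tokenizer and a 24-way classification head."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=num_labels)
    return tokenizer, model


def encode(tokenizer, texts, max_length=512):
    """Hypothetical helper: tokenize a batch of preprocessed resumes.

    Long resumes are truncated to `max_length` tokens (assumed value) and
    shorter ones are padded so the batch forms a rectangular tensor.
    """
    return tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
```

A typical call would be `tokenizer, model = build_tokenizer_and_model()` followed by `logits = model(**encode(tokenizer, texts)).logits`, which downloads the base weights on first use.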

	---

	## Model Output

	The model provides raw output logits for each job category. These logits can be converted into probabilities using:

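	```python
	import torch
	import torch.nn as nn

	# Placeholder for the model's raw output: one row of 24 logits per resume.
	logits = torch.randn(1, 24)

	# Squash each logit independently into the range (0, 1).
	sigmoid = nn.Sigmoid()
	probs = sigmoid(logits)
	```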

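	The class with the highest probability is the predicted job category. Because the sigmoid is monotonic, this is also the class with the highest logit. Note that sigmoid scores are computed independently per class and therefore do not sum to 1.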
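The relationship between logits, sigmoid scores, and the predicted class can be checked with a few lines of plain Python. The three-element logit vector is made up for illustration, and the softmax function is shown only as a comparison (it is not something this card says the model uses): it yields scores that sum to 1, while the predicted class stays the same.

```python
import math


def sigmoid(x):
    """Per-class score in (0, 1); each class is scored independently."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Normalized scores that sum to 1 (shown for comparison only)."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


logits = [2.0, -1.0, 0.5]  # made-up logits for three classes
sigmoid_scores = [sigmoid(x) for x in logits]
softmax_scores = softmax(logits)

# All three orderings pick the same class, because both functions are monotonic.
same_prediction = argmax(logits) == argmax(sigmoid_scores) == argmax(softmax_scores)
```

In practice this means the predicted category can be read directly from the logits; the choice of activation only matters when the scores themselves are reported or thresholded.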

	---

	## Use Cases

	- Automated resume classification for HR platforms.
	- Sorting resumes into industry-specific categories for targeted hiring processes.
	- Candidate profiling and analysis for recruitment agencies.

	---

	## Limitations

	- Model performance depends on the quality and diversity of the dataset. Biases in the dataset may affect predictions.
	- Preprocessing removes non-textual elements, which might strip out context-critical features.
	- PDFs with poor formatting or heavy graphical content may not preprocess effectively.

	---

	## Citation

	If you use this model in your work, please cite:
	"Resume Classification Model using BERT for Multiclass Job Categorization."