---
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
tags:
- text-classification
- resume-classification
- fine-tuning
- python
- pytorch
- kaggle
---

# Model Card: Resume Classification Using BERT

## Model Overview

This model is a fine-tuned version of **`bert-base-uncased`** designed for multiclass text classification. It categorizes resumes into one of **24 predefined job categories**, making it suitable for automated resume screening and classification tasks.

---

## Dataset

The dataset used for fine-tuning consists of **2,400+ resumes** in plain-text and PDF formats, categorized into 24 job categories.
The dataset is available at https://www.kaggle.com/competitions/jarvis-calling-hiring-contest/data

- **Classes**:
  `['ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE', 'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION', 'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS', 'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES', 'TEACHER']`

The dataset underwent significant preprocessing to remove noise and improve text quality before tokenization.
**Preprocessing steps include**:
- Removal of HTML tags, URLs, punctuation, Unicode characters, escape sequences, stop words, and irrelevant white space.
- The full set of cleaning functions is available in `preprocessing.py`.
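
A minimal sketch of the kind of cleaning described above might look like the following; the function name and exact regexes are illustrative, not the actual implementation in `preprocessing.py`:

```python
import re

def clean_resume_text(text: str) -> str:
    """Illustrative cleaning in the spirit of preprocessing.py (hypothetical).
    Stop-word removal is omitted here for brevity."""
    text = re.sub(r"<[^>]+>", " ", text)             # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)        # drop URLs
    text = text.encode("ascii", "ignore").decode()   # drop non-ASCII characters
    text = re.sub(r"[^\w\s]", " ", text)             # drop punctuation
    text = re.sub(r"\s+", " ", text).strip()         # collapse whitespace/escapes
    return text.lower()

print(clean_resume_text("<b>Senior Accountant</b> CV, see https://example.com!"))
# → senior accountant cv see
```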

---

## Model Configuration

- **Base Model**: `bert-base-uncased`
- **Fine-tuning Task**: Multiclass classification (24 classes)
- **Preprocessing Summary**: The preprocessing steps applied to the training data have been encapsulated in the `preprocess_function` to simplify and standardize usage.

- **Model Output**: The raw output consists of logits for each class. To obtain per-class probabilities, apply the sigmoid activation function via `torch.nn.Sigmoid()`.

- **Postprocessing**: A postprocessing utility, included as the `postprocess_function`, converts the raw logits into the corresponding class names for easier interpretation.
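
By way of illustration, a postprocessing step in this spirit (hypothetical code, not the shipped `postprocess_function`) might map logits to the class names listed in the Dataset section:

```python
# Hypothetical sketch; the shipped postprocess_function may differ.
# CLASSES is the 24-label list from the Dataset section, in order.
CLASSES = [
    'ACCOUNTANT', 'ADVOCATE', 'AGRICULTURE', 'APPAREL', 'ARTS', 'AUTOMOBILE',
    'AVIATION', 'BANKING', 'BPO', 'BUSINESS-DEVELOPMENT', 'CHEF', 'CONSTRUCTION',
    'CONSULTANT', 'DESIGNER', 'DIGITAL-MEDIA', 'ENGINEERING', 'FINANCE', 'FITNESS',
    'HEALTHCARE', 'HR', 'INFORMATION-TECHNOLOGY', 'PUBLIC-RELATIONS', 'SALES',
    'TEACHER',
]

def postprocess(logits):
    """Return the class name with the highest logit. Sigmoid is monotonic,
    so argmax over raw logits picks the same class as argmax over probabilities."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CLASSES[best]

example = [0.0] * 24
example[20] = 3.1
print(postprocess(example))  # → INFORMATION-TECHNOLOGY
```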

---

## Training Details

The fine-tuning process involved:
- Input tokenization using the `bert-base-uncased` tokenizer.
- Feeding preprocessed text into the BERT model for contextual understanding.
- Passing the output logits through the **sigmoid activation function** to produce per-class probabilities.
- The full training code is available on Kaggle: https://www.kaggle.com/code/naandhu/bert-base-uncased-fine-tuned-for-classification
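
For reference, a single optimization step consistent with the description above could be sketched as follows. The loss choice (`BCEWithLogitsLoss`, which pairs with sigmoid activation) and the dummy batch are assumptions here; the actual training code lives in the Kaggle notebook linked above.

```python
import torch
import torch.nn as nn

# Hypothetical single training step on dummy data, not the notebook's code.
batch_size, num_classes = 4, 24
logits = torch.randn(batch_size, num_classes, requires_grad=True)
labels = torch.tensor([0, 5, 20, 23])                     # true category indices
targets = nn.functional.one_hot(labels, num_classes).float()

loss = nn.BCEWithLogitsLoss()(logits, targets)            # assumed loss function
loss.backward()                                           # gradients w.r.t. logits
```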

---

## Model Output

The model provides raw output logits for each job category. These logits can be converted into probabilities using:

```python
import torch
import torch.nn as nn

# logits: tensor of shape (batch_size, 24) from the model's forward pass
sigmoid = nn.Sigmoid()
probs = sigmoid(logits)
predicted = torch.argmax(probs, dim=-1)  # index of the highest-probability class
```

The highest probability corresponds to the predicted job category.

---

## Use Cases

- Automated resume classification for HR platforms.
- Sorting resumes into industry-specific categories for targeted hiring processes.
- Candidate profiling and analysis for recruitment agencies.

---

## Limitations

- Model performance depends on the quality and diversity of the dataset; biases in the data may affect predictions.
- Preprocessing removes non-textual elements, which might strip out context-critical features.
- PDFs with poor formatting or heavy graphical content may not preprocess effectively.

---

## Citation

If you use this model in your work, please cite:
**"Resume Classification Model using BERT for Multiclass Job Categorization."**