---
language: en
license: mit
tags:
- text-classification
- bert
- resume-analysis
- job-classification
- nlp
datasets:
- resume-dataset
metrics:
- accuracy
- matthews_correlation
model-index:
- name: resume-analyser-bert
  results:
  - task:
      type: text-classification
      name: Resume Classification
    dataset:
      name: Resume Dataset
      type: resume-dataset
    metrics:
    - type: accuracy
      value: 1.0
      name: Validation Accuracy
    - type: matthews_correlation
      value: 1.0
      name: Matthews Correlation Coefficient
---

# Resume Analyser - BERT for Job Category Classification

## Model Description

This is a fine-tuned bert-base-uncased model for classifying resumes into 25 job categories. It reached **100% accuracy** on its held-out validation set (see Limitations and Bias for the overfitting caveat).

## Model Details

- **Model Type:** BERT for Sequence Classification
- **Base Model:** bert-base-uncased
- **Parameters:** 109,501,465 (all trainable)
- **Language:** English
- **License:** MIT
- **Training Data:** 962 resumes from the Kaggle Resume Dataset
- **Categories:** 25 job categories

## Intended Use

This model classifies resumes into job categories based on their text content. It can be used for:

- Automated resume screening systems
- Job recommendation systems
- HR automation tools
- Resume parsing applications
- Career guidance systems

## Training Data

The model was trained on the [Resume Dataset](https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset) containing:

- **Total Samples:** 962 resumes
- **Train/Validation Split:** 80/20 (769 training, 193 validation)
- **Categories:** 25 job categories

### Job Categories

Data Science, Java Developer, Testing, DevOps Engineer, Python Developer, Web Developer, HR, Hadoop, Blockchain, ETL Developer, Operations Manager, Sales, Mechanical Engineer, Arts, Database, Electrical Engineering, Health and Fitness, PMO, Business Analyst, DotNet Developer, Automation Testing, Network Security Engineer, SAP Developer, Civil Engineer, Advocate
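
For downstream use, the 25 category names above must map to the integer class ids the model emits. A minimal sketch (the order below simply follows the list above; the actual id assignment depends on the label encoder used during training — scikit-learn's `LabelEncoder`, for example, sorts classes alphabetically):

```python
categories = [
    "Data Science", "Java Developer", "Testing", "DevOps Engineer",
    "Python Developer", "Web Developer", "HR", "Hadoop", "Blockchain",
    "ETL Developer", "Operations Manager", "Sales", "Mechanical Engineer",
    "Arts", "Database", "Electrical Engineering", "Health and Fitness",
    "PMO", "Business Analyst", "DotNet Developer", "Automation Testing",
    "Network Security Engineer", "SAP Developer", "Civil Engineer", "Advocate"
]

# Forward and inverse mappings between names and class ids
label2id = {name: i for i, name in enumerate(categories)}
id2label = {i: name for i, name in enumerate(categories)}

print(len(categories))           # 25 classes, matching the classifier head
print(label2id["Data Science"])  # 0
```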

## Training Procedure

### Training Hyperparameters

- **Base Model:** bert-base-uncased
- **Batch Size:** 4
- **Epochs:** 5
- **Learning Rate:** 2e-5
- **Optimizer:** AdamW (foreach=False)
- **Max Sequence Length:** 200 tokens
- **LR Schedule:** Linear decay with warmup
- **GPU:** NVIDIA GeForce RTX 3060 Laptop GPU (6GB VRAM)
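
The linear warmup/decay schedule can be sketched in plain Python (an illustrative re-implementation assuming it mirrors `get_linear_schedule_with_warmup` from transformers; the warmup step count of 96 is an assumption, since the card does not state it — with 769 training samples and batch size 4, one epoch is 193 steps, so 5 epochs give 965 total steps):

```python
def linear_lr(step: int, warmup_steps: int, total_steps: int,
              base_lr: float = 2e-5) -> float:
    """Learning rate at a given optimizer step: linear ramp up during
    warmup, then linear decay to zero over the remaining steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# 5 epochs x 193 steps/epoch (ceil(769 / 4)) = 965 total steps
total_steps = 5 * 193
print(linear_lr(0, 96, total_steps))    # 0.0 at the very first step
print(linear_lr(96, 96, total_steps))   # peak of 2e-5 right after warmup
print(linear_lr(965, 96, total_steps))  # 0.0 at the end of training
```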

### Training Results

| Epoch | Training Loss | Validation Loss | Validation Accuracy | MCC Score |
|-------|---------------|-----------------|---------------------|-----------|
| 1     | 2.6037        | 1.1563          | 53.37%              | 0.4993    |
| 2     | 0.9651        | 0.2858          | 98.96%              | 0.9891    |
| 3     | 0.5804        | 0.2782          | 100.00%             | 1.0000    |
| 4     | 0.4473        | 0.2774          | 100.00%             | 1.0000    |
| 5     | 0.3604        | 0.2767          | 100.00%             | 1.0000    |

**Training Time:** ~72 minutes for 5 epochs

## Performance Metrics

- **Validation Accuracy:** 100%
- **Matthews Correlation Coefficient:** 1.0000 (perfect correlation)
- **Final Training Loss:** 0.3604
- **Final Validation Loss:** 0.2767

The model classified all 193 validation resumes correctly.

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
from transformers import BertForSequenceClassification, BertTokenizer
import torch

# Load model and tokenizer
model = BertForSequenceClassification.from_pretrained('SwaKyxd/resume-analyser-bert')
tokenizer = BertTokenizer.from_pretrained('SwaKyxd/resume-analyser-bert')

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

# Example resume text
resume_text = """
Skills: Python, Machine Learning, Deep Learning, PyTorch, TensorFlow, NLP
Experience: 3 years in Data Science and AI model development
Projects: Built recommendation systems, sentiment analysis models
Education: Masters in Computer Science
"""

# Tokenize
inputs = tokenizer(
    resume_text,
    return_tensors='pt',
    max_length=200,
    padding='max_length',
    truncation=True
)

# Move to device
inputs = {k: v.to(device) for k, v in inputs.items()}

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=1)
    probabilities = torch.softmax(outputs.logits, dim=1)
    confidence = probabilities[0][predictions[0]].item()

# Category mapping (example - adjust based on your label encoder)
categories = [
    "Data Science", "Java Developer", "Testing", "DevOps Engineer",
    "Python Developer", "Web Developer", "HR", "Hadoop", "Blockchain",
    "ETL Developer", "Operations Manager", "Sales", "Mechanical Engineer",
    "Arts", "Database", "Electrical Engineering", "Health and Fitness",
    "PMO", "Business Analyst", "DotNet Developer", "Automation Testing",
    "Network Security Engineer", "SAP Developer", "Civil Engineer", "Advocate"
]

print(f"Predicted Category: {categories[predictions.item()]}")
print(f"Confidence: {confidence:.2%}")
```

### Batch Processing

```python
# Reuses model, tokenizer, device, and categories from the Quick Start above
resumes = [
    "Python developer with 5 years experience in Django and Flask...",
    "Experienced data scientist with expertise in machine learning...",
    "Java backend developer skilled in Spring Boot and microservices..."
]

# Tokenize the whole batch at once
inputs = tokenizer(
    resumes,
    return_tensors='pt',
    max_length=200,
    padding='max_length',
    truncation=True
)

inputs = {k: v.to(device) for k, v in inputs.items()}

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=1)

for i, pred in enumerate(predictions):
    print(f"Resume {i+1}: {categories[pred.item()]}")
```

## Limitations and Bias

- **Language:** The model is trained only on English resumes
- **Dataset Size:** Trained on 962 resumes; it may not generalize to all resume formats
- **Domain Specific:** Performance may vary on resumes outside the 25 predefined categories
- **Text Format:** Best performance on plain-text resumes; PDF/DOC files may need preprocessing
- **Perfect Accuracy:** The 100% validation accuracy suggests possible overfitting; test on new data before relying on it
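
As a starting point for the preprocessing mentioned above, here is one minimal plain-text cleaning pass. This is an illustrative sketch, not the exact pipeline used in training (that lives in the GitHub repository's notebook):

```python
import re

def clean_resume_text(text: str) -> str:
    """Minimal normalization before tokenization: strip URLs,
    non-ASCII extraction artifacts, and runs of whitespace."""
    text = re.sub(r"http\S+", " ", text)       # drop URLs
    text = re.sub(r"[^\x00-\x7f]", " ", text)  # drop non-ASCII artifacts
    text = re.sub(r"\s+", " ", text)           # collapse whitespace/newlines
    return text.strip()

print(clean_resume_text("Skills:\n Python  \u2022 SQL http://example.com"))
# → "Skills: Python SQL"
```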

## Ethical Considerations

- This model should be used as an assistive tool, not as the sole decision-maker in hiring processes
- Human oversight is recommended for all automated resume screening
- Be aware of potential biases in the training data that may affect predictions
- Ensure compliance with employment laws and anti-discrimination regulations
- Protect candidate privacy and handle resume data securely

## Model Architecture

```
BertForSequenceClassification(
  (bert): BertModel(
    12 transformer layers
    768 hidden dimensions
    12 attention heads
    110M parameters
  )
  (dropout): Dropout(p=0.1)
  (classifier): Linear(768 -> 25)
)
```

## Technical Specifications

- **Framework:** PyTorch 2.6.0
- **Transformers:** 4.47.1
- **Tokenizer:** BertTokenizer (bert-base-uncased)
- **Max Sequence Length:** 200 tokens
- **Model Size:** ~436 MB
- **Precision:** FP32
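
The model size follows directly from the parameter count and precision; a quick sanity check (raw weights come to about 438 MB, in line with the reported ~436 MB up to rounding and serialization overhead):

```python
params = 109_501_465   # all trainable, from Model Details
bytes_per_param = 4    # FP32 = 4 bytes per weight

size_bytes = params * bytes_per_param
size_mb = size_bytes / 1_000_000       # decimal megabytes
size_mib = size_bytes / (1024 ** 2)    # mebibytes

print(f"{size_mb:.0f} MB ({size_mib:.0f} MiB)")  # → "438 MB (418 MiB)"
```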

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{resume-analyser-bert,
  author = {Sayan Mahalik},
  title = {Resume Analyser - BERT for Job Category Classification},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/SwaKyxd/resume-analyser-bert}
}
```

## Related Resources

- **GitHub Repository:** [Resume-Analyser](https://github.com/Swakyxd/Resume-Analyser)
- **Training Notebook:** Available in the GitHub repository
- **Base Model:** [bert-base-uncased](https://huggingface.co/bert-base-uncased)
- **Dataset:** [Resume Dataset on Kaggle](https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset)

## Contact

For questions, issues, or feedback:

- GitHub: [Swakyxd/Resume-Analyser](https://github.com/Swakyxd/Resume-Analyser)
- Open an issue on GitHub for bug reports or feature requests

## License

This model is released under the MIT License. See the LICENSE file for details.

## Acknowledgments

- Hugging Face Transformers library
- BERT paper: [Devlin et al., 2018](https://arxiv.org/abs/1810.04805)
- Kaggle Resume Dataset contributors
- PyTorch team

---

**Model Card Version:** 1.0
**Last Updated:** November 2025