# 🧠 Resume-Parsing-NER-AI-Model
A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.
---
## ✨ Model Highlights
- 📌 Base Model: bert-base-cased, fine-tuned for resume NER
- 📚 Datasets: Custom annotated resume dataset (BIO format; see the example after this list)
- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
- 🧠 Framework: Hugging Face Transformers + PyTorch
- 💾 Format: transformers model directory (with tokenizer and config)
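As a quick illustration of the BIO format referenced above, a single annotated example might look like this (a hypothetical snippet, not drawn from the actual dataset):

```python
# Hypothetical BIO-tagged example (illustrative only, not from the real dataset):
# each pre-split word is paired with one label from the model's label scheme.
tokens = ["John",   "Doe",    "worked", "at", "Infosys",   "as", "a", "Data",  "Analyst"]
labels = ["B-NAME", "I-NAME", "O",      "O",  "B-COMPANY", "O",  "O", "B-JOB", "I-JOB"]
```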
---
## 🧠 Intended Uses
- ✅ Resume parsing and candidate data extraction
- ✅ Applicant Tracking Systems (ATS)
- ✅ Automated HR screening tools
- ✅ Resume data analytics and visualization
- ✅ Chatbots and document understanding applications
---
## 🚫 Limitations
- ❌ Performance may degrade on resumes with non-standard formatting
- ❌ Cannot extract entities from handwritten or image-based resumes without a prior OCR step
- ❌ May not generalize to other document types without re-training
---
## 🏋️‍♂️ Training Details
| Attribute     | Value                                        |
|---------------|----------------------------------------------|
| Base Model    | bert-base-cased                              |
| Dataset       | Custom annotated resume dataset (BIO format) |
| Task Type     | Token Classification (NER)                   |
| Epochs        | 3                                            |
| Batch Size    | 16                                           |
| Optimizer     | AdamW                                        |
| Loss Function | CrossEntropyLoss                             |
| Framework     | PyTorch + Transformers                       |
| Hardware      | CUDA-enabled GPU                             |
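The training script itself is not part of this repository; the following is a minimal fine-tuning sketch consistent with the table above. The dataset path, file format, and column names (`tokens`, `ner_tags`) are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)

label_list = [
    "O", "B-NAME", "I-NAME", "B-EMAIL", "I-EMAIL", "B-PHONE", "I-PHONE",
    "B-EDUCATION", "I-EDUCATION", "B-SKILL", "I-SKILL",
    "B-COMPANY", "I-COMPANY", "B-JOB", "I-JOB",
]

# Hypothetical BIO-annotated dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("json", data_files={"train": "resumes_bio.json"})

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(label_list),
    id2label=dict(enumerate(label_list)),
    label2id={label: i for i, label in enumerate(label_list)},
)

def tokenize_and_align(batch):
    # Tokenize pre-split words and align BIO labels to sub-word pieces;
    # special tokens and word continuations get -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        labels, prev = [], None
        for word_id in enc.word_ids(batch_index=i):
            labels.append(tags[word_id] if word_id is not None and word_id != prev else -100)
            prev = word_id
        all_labels.append(labels)
    enc["labels"] = all_labels
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

args = TrainingArguments(
    output_dir="resume-ner-model",
    num_train_epochs=3,              # per the table above
    per_device_train_batch_size=16,  # per the table above
)

# AdamW and CrossEntropyLoss are the Trainer defaults for token classification
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```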
---
## 📊 Evaluation Metrics
| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.98  |
| F1-Score  | 0.98  |
| Precision | 0.97  |
| Recall    | 0.98  |
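The evaluation procedure behind these scores is not documented here; a common approach for token classification computes entity-level scores with seqeval on the aligned predictions. A sketch, assuming `-100` marks ignored positions as in the training setup above:

```python
import numpy as np
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(logits, labels, label_list):
    # Map logits to label strings, dropping positions marked -100
    preds = np.argmax(logits, axis=-1)
    true_seqs, pred_seqs = [], []
    for pred_row, label_row in zip(preds, labels):
        true_seqs.append([label_list[l] for l, p in zip(label_row, pred_row) if l != -100])
        pred_seqs.append([label_list[p] for l, p in zip(label_row, pred_row) if l != -100])
    return {
        "accuracy": accuracy_score(true_seqs, pred_seqs),
        "precision": precision_score(true_seqs, pred_seqs),
        "recall": recall_score(true_seqs, pred_seqs),
        "f1": f1_score(true_seqs, pred_seqs),
    }
```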
---
## 🚀 Usage
```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    pipeline,
)

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline that merges sub-word tokens into whole entities
ner_pipe = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "John worked at Infosys as an Analyst. Email: john@email.com"
ner_results = ner_pipe(text)

for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# Label scheme used during fine-tuning (BIO format)
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```
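For the example sentence above, output along these lines is expected (the scores shown here are made up for illustration and will vary):

```
John → NAME (0.99)
Infosys → COMPANY (0.98)
Analyst → JOB (0.97)
john@email.com → EMAIL (0.99)
```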
---
## 🧩 Quantization
Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
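The exact quantization recipe is not included in this repository. As a rough sketch, PyTorch's post-training *dynamic* quantization (a closely related, simpler technique often used for BERT-style models) can be applied to the Linear layers in a few lines:

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/Resume-Parsing-NER-AI-Model"
)

# Dynamic quantization: Linear weights are stored as int8 and activations
# are quantized on the fly at inference time (CPU inference only).
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```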
---
## 🗂 Repository Structure
```
Resume-Parsing-NER-AI-Model/
├── config.json               → Model configuration
├── pytorch_model.bin         → Fine-tuned model weights
├── tokenizer_config.json     → Tokenizer configuration
├── vocab.txt                 → BERT vocabulary
├── training_args.bin         → Training parameters
├── preprocessor_config.json  → Optional tokenizer pre-processing info
└── README.md                 → Model card
```
---
## 🤝 Contributing
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.