# 🧠 Resume-Parsing-NER-AI-Model

A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. The model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.

---

## ✨ Model Highlights

- 📌 Base Model: bert-base-cased
- 📚 Dataset: Custom annotated resume dataset (BIO format)
- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
- 🔧 Framework: Hugging Face Transformers + PyTorch
- 💾 Format: Transformers model directory (with tokenizer and config)

---

## 🧠 Intended Uses

- ✅ Resume parsing and candidate data extraction
- ✅ Applicant Tracking Systems (ATS)
- ✅ Automated HR screening tools
- ✅ Resume data analytics and visualization
- ✅ Chatbots and document-understanding applications

---

## 🚫 Limitations

- ❌ Performance may degrade on resumes with non-standard formatting
- ❌ Cannot capture entities in handwritten or image-based resumes
- ❌ May not generalize to other document types without re-training

---

## 🏋️‍♂️ Training Details

| Attribute     | Value                                        |
|---------------|----------------------------------------------|
| Base Model    | bert-base-cased                              |
| Dataset       | Custom annotated resume dataset (BIO format) |
| Task Type     | Token Classification (NER)                   |
| Epochs        | 3                                            |
| Batch Size    | 16                                           |
| Optimizer     | AdamW                                        |
| Loss Function | CrossEntropyLoss                             |
| Framework     | PyTorch + Transformers                       |
| Hardware      | CUDA-enabled GPU                             |

---

## 📊 Evaluation Metrics

| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.98  |
| F1-Score  | 0.98  |
| Precision | 0.97  |
| Recall    | 0.98  |

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline; "simple" aggregation merges sub-word tokens into whole entities
ner_pipe = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "John worked at Infosys as an Analyst. Email: john@email.com"
ner_results = ner_pipe(text)

for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# BIO label scheme used during fine-tuning
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```

---

## 🧩 Quantization

Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices (a minimal quantized-inference sketch is included at the end of this card).

---

## 🗂 Repository Structure

```
resume-ner-model/
├── config.json               ✅ Model configuration
├── pytorch_model.bin         ✅ Fine-tuned model weights
├── tokenizer_config.json     ✅ Tokenizer configuration
├── vocab.txt                 ✅ BERT vocabulary
├── training_args.bin         ✅ Training parameters
├── preprocessor_config.json  ✅ Optional tokenizer pre-processing info
└── README.md                 ✅ Model card
```

---

## 🤝 Contributing

Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.
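---

## ⚡ Quantized Inference (sketch)

The snippet below is a minimal sketch of loading the model and running quantized inference on CPU with PyTorch. It uses post-training dynamic quantization of the Linear layers for illustration only; the exact static-quantization recipe referenced above is not included in this card, and the hub ID is assumed from the Usage section.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hub ID assumed from the Usage section above
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Post-training dynamic quantization: Linear weights are stored as int8
# and dequantized on the fly during CPU inference
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Sanity check on a sample sentence
text = "John worked at Infosys as an Analyst. Email: john@email.com"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = quantized_model(**inputs).logits
pred_ids = logits.argmax(dim=-1)[0].tolist()
print(pred_ids)  # integer label IDs; map through label_list to recover entity tags
```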