# 🧠 Resume-Parsing-NER-AI-Model
A custom Named Entity Recognition (NER) model fine-tuned on annotated resume data using a pre-trained BERT architecture. This model extracts structured information such as names, emails, phone numbers, skills, job titles, education, and companies from raw resume text.
---
## ✨ Model Highlights
- 📌 Base Model: bert-base-cased, fine-tuned for resume NER
- 📚 Datasets: Custom annotated resume dataset (BIO format; see the example after this list)
- 🏷️ Entity Labels: Name, Email, Phone, Education, Skills, Company, Job Title
- 🧠 Framework: Hugging Face Transformers + PyTorch
- 💾 Format: transformers model directory (with tokenizer and config)
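As a quick illustration of the BIO format referenced above, a single annotated example might look like this (a hypothetical snippet, not drawn from the actual dataset):

```python
# Hypothetical BIO-tagged example (illustrative only, not from the real dataset):
# each pre-split word is paired with one label from the model's label scheme.
tokens = ["John",   "Doe",    "worked", "at", "Infosys",   "as", "a", "Data",  "Analyst"]
labels = ["B-NAME", "I-NAME", "O",      "O",  "B-COMPANY", "O",  "O", "B-JOB", "I-JOB"]
```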
---
## 🧠 Intended Uses
- ✅ Resume parsing and candidate data extraction
- ✅ Applicant Tracking Systems (ATS)
- ✅ Automated HR screening tools
- ✅ Resume data analytics and visualization
- ✅ Chatbots and document understanding applications
---
## 🚫 Limitations
- ❌ Performance may degrade on resumes with non-standard formatting
- ❌ Cannot extract entities from handwritten or image-based resumes without a prior OCR step
- ❌ May not generalize to other document types without re-training
---
## 🏋️‍♂️ Training Details
| Attribute     | Value                                        |
|---------------|----------------------------------------------|
| Base Model    | bert-base-cased                              |
| Dataset       | Custom annotated resume dataset (BIO format) |
| Task Type     | Token Classification (NER)                   |
| Epochs        | 3                                            |
| Batch Size    | 16                                           |
| Optimizer     | AdamW                                        |
| Loss Function | CrossEntropyLoss                             |
| Framework     | PyTorch + Transformers                       |
| Hardware      | CUDA-enabled GPU                             |
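The training script itself is not part of this repository; the following is a minimal fine-tuning sketch consistent with the table above. The dataset path, file format, and column names (`tokens`, `ner_tags`) are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    DataCollatorForTokenClassification,
    TrainingArguments,
    Trainer,
)

label_list = [
    "O", "B-NAME", "I-NAME", "B-EMAIL", "I-EMAIL", "B-PHONE", "I-PHONE",
    "B-EDUCATION", "I-EDUCATION", "B-SKILL", "I-SKILL",
    "B-COMPANY", "I-COMPANY", "B-JOB", "I-JOB",
]

# Hypothetical BIO-annotated dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("json", data_files={"train": "resumes_bio.json"})

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased",
    num_labels=len(label_list),
    id2label=dict(enumerate(label_list)),
    label2id={label: i for i, label in enumerate(label_list)},
)

def tokenize_and_align(batch):
    # Tokenize pre-split words and align BIO labels to sub-word pieces;
    # special tokens and word continuations get -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        labels, prev = [], None
        for word_id in enc.word_ids(batch_index=i):
            labels.append(tags[word_id] if word_id is not None and word_id != prev else -100)
            prev = word_id
        all_labels.append(labels)
    enc["labels"] = all_labels
    return enc

tokenized = dataset.map(tokenize_and_align, batched=True)

args = TrainingArguments(
    output_dir="resume-ner-model",
    num_train_epochs=3,              # per the table above
    per_device_train_batch_size=16,  # per the table above
)

# AdamW and CrossEntropyLoss are the Trainer defaults for token classification
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```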
---
## 📊 Evaluation Metrics
| Metric    | Score |
|-----------|-------|
| Accuracy  | 0.98  |
| F1-Score  | 0.98  |
| Precision | 0.97  |
| Recall    | 0.98  |
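The evaluation procedure behind these scores is not documented here; a common approach for token classification computes entity-level scores with seqeval on the aligned predictions. A sketch, assuming `-100` marks ignored positions as in the training setup above:

```python
import numpy as np
from seqeval.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(logits, labels, label_list):
    # Map logits to label strings, dropping positions marked -100
    preds = np.argmax(logits, axis=-1)
    true_seqs, pred_seqs = [], []
    for pred_row, label_row in zip(preds, labels):
        true_seqs.append([label_list[l] for l, p in zip(label_row, pred_row) if l != -100])
        pred_seqs.append([label_list[p] for l, p in zip(label_row, pred_row) if l != -100])
    return {
        "accuracy": accuracy_score(true_seqs, pred_seqs),
        "precision": precision_score(true_seqs, pred_seqs),
        "recall": recall_score(true_seqs, pred_seqs),
        "f1": f1_score(true_seqs, pred_seqs),
    }
```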
---
## 🚀 Usage
```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    pipeline,
)

# Load the fine-tuned model and tokenizer
model_name = "AventIQ-AI/Resume-Parsing-NER-AI-Model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline that merges sub-word tokens into whole entities
ner_pipe = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "John worked at Infosys as an Analyst. Email: john@email.com"
ner_results = ner_pipe(text)

for entity in ner_results:
    print(f"{entity['word']} → {entity['entity_group']} ({entity['score']:.2f})")

# Label scheme used during fine-tuning (BIO format)
label_list = [
    "O",            # 0
    "B-NAME",       # 1
    "I-NAME",       # 2
    "B-EMAIL",      # 3
    "I-EMAIL",      # 4
    "B-PHONE",      # 5
    "I-PHONE",      # 6
    "B-EDUCATION",  # 7
    "I-EDUCATION",  # 8
    "B-SKILL",      # 9
    "I-SKILL",      # 10
    "B-COMPANY",    # 11
    "I-COMPANY",    # 12
    "B-JOB",        # 13
    "I-JOB",        # 14
]
```
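For the example sentence above, output along these lines is expected (the scores shown here are made up for illustration and will vary):

```
John → NAME (0.99)
Infosys → COMPANY (0.98)
Analyst → JOB (0.97)
john@email.com → EMAIL (0.99)
```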
---
## 🧩 Quantization
Post-training static quantization was applied using PyTorch to reduce model size and accelerate inference on edge devices.
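The exact quantization recipe is not included in this repository. As a rough sketch, PyTorch's post-training *dynamic* quantization (a closely related, simpler technique often used for BERT-style models) can be applied to the Linear layers in a few lines:

```python
import torch
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "AventIQ-AI/Resume-Parsing-NER-AI-Model"
)

# Dynamic quantization: Linear weights are stored as int8 and activations
# are quantized on the fly at inference time (CPU inference only).
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```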
---
## 🗂 Repository Structure
```
Resume-Parsing-NER-AI-Model/
├── config.json               → Model configuration
├── pytorch_model.bin         → Fine-tuned model weights
├── tokenizer_config.json     → Tokenizer configuration
├── vocab.txt                 → BERT vocabulary
├── training_args.bin         → Training parameters
├── preprocessor_config.json  → Optional tokenizer pre-processing info
└── README.md                 → Model card
```
---
## 🤝 Contributing
Open to improvements and feedback! Feel free to submit a pull request or open an issue if you find any bugs or want to enhance the model.