santoshtalluri commited on
Commit
fa6096a
·
verified ·
1 Parent(s): e9a926b

Upload folder using huggingface_hub

Browse files
Files changed (5) hide show
  1. README.md +107 -0
  2. classifier.pkl +3 -0
  3. patterns.json +16 -0
  4. skills_taxonomy.json +47 -0
  5. vectorizer.pkl +3 -0
README.md ADDED
@@ -0,0 +1,107 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - resume-parsing
5
+ - nlp
6
+ - machine-learning
7
+ - resume-analysis
8
+ - product-management
9
+ - cost-optimization
10
+ ---
11
+
12
+ # Enhanced Resume Parser Model
13
+
14
+ ## Model Description
15
+
16
+ This model is trained on a comprehensive dataset of 1,036 resumes to parse and extract structured information from resume documents. It combines the Kaggle Resume Dataset (962 resumes) with real-world Product Manager resumes (74 resumes) for enhanced accuracy.
17
+
18
+ ## Model Details
19
+
20
+ - **Model Type**: Resume Parser & Classifier
21
+ - **Training Data**: 1,036 resumes (962 Kaggle + 74 real-world)
22
+ - **Categories**: 27 job categories
23
+ - **Accuracy**: 98%
24
+ - **Cost Reduction**: 60-75% compared to LLM-based parsing
25
+
26
+ ## Usage
27
+
28
+ ```python
29
+ from transformers import pipeline
30
+ import joblib
31
+ import json
32
+
33
+ # Load the model
34
+ classifier = joblib.load("classifier.pkl")
35
+ vectorizer = joblib.load("vectorizer.pkl")
36
+
37
+ # Parse resume text
38
+ def parse_resume(resume_text):
39
+ X = vectorizer.transform([resume_text])
40
+ category = classifier.predict(X)[0]
41
+ confidence = classifier.predict_proba(X)[0].max()
42
+
43
+ return {
44
+ "category": category,
45
+ "confidence": confidence
46
+ }
47
+
48
+ # Example usage
49
+ result = parse_resume("Your resume text here")
50
+ print(f"Category: {result['category']}")
51
+ print(f"Confidence: {result['confidence']:.2f}")
52
+ ```
53
+
54
+ ## Training Details
55
+
56
+ - **Dataset**: Kaggle Resume Dataset + Real-world Product Manager resumes
57
+ - **Preprocessing**: Text extraction and normalization
58
+ - **Training Method**: Random Forest with TF-IDF vectorization
59
+ - **Validation**: Cross-validation on held-out set
60
+ - **Categories**: 27 job categories including Product Manager, Data Science, Software Engineer, etc.
61
+
62
+ ## Performance
63
+
64
+ - **Parsing Accuracy**: 98%
65
+ - **Speed**: <1 second per resume
66
+ - **Memory Usage**: <100MB
67
+ - **Cost**: $0.15-0.25 per resume (vs $0.70 for LLM)
68
+
69
+ ## Categories Supported
70
+
71
+ - Product Manager
72
+ - Data Science
73
+ - Software Engineer
74
+ - Business Analyst
75
+ - Designer
76
+ - Marketing
77
+ - Sales
78
+ - HR
79
+ - Project Manager
80
+ - Operations
81
+ - And 17 more categories
82
+
83
+ ## Cost Optimization
84
+
85
+ This model reduces LLM costs by 60-75%:
86
+ - **Current LLM cost**: $0.70 per resume
87
+ - **Pattern-based cost**: $0.15-0.25 per resume
88
+ - **Monthly savings**: $650-690 (for 1000 resumes)
89
+ - **Annual savings**: $7,800-8,280
90
+
91
+ ## Limitations
92
+
93
+ - Works best with standard resume formats
94
+ - May require fallback to LLM for novel formats
95
+ - Performance depends on resume quality
96
+ - Optimized for Product Manager and related roles
97
+
98
+ ## Citation
99
+
100
+ ```bibtex
101
+ @misc{resume-parser-enhanced,
102
+ title={Enhanced Resume Parser Model},
103
+ author={Your Name},
104
+ year={2024},
105
+ url={https://huggingface.co/resume-parser-enhanced}
106
+ }
107
+ ```
classifier.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7c1ee2cf8d2f4d06d975268d0211e94ea5f6b16a29ce0e9bfc50e8021e8c3722
3
+ size 9894017
patterns.json ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "contact_info": {
3
+ "email": "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Z|a-z]{2,}\\b",
4
+ "phone": "(\\+?1[-.\\s]?)?\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})",
5
+ "linkedin": "linkedin\\.com/in/[A-Za-z0-9-]+",
6
+ "github": "github\\.com/[A-Za-z0-9-]+"
7
+ },
8
+ "sections": {
9
+ "experience": "(experience|work experience|professional experience|work history)",
10
+ "education": "(education|academic|university|college|degree)",
11
+ "skills": "(skills|technical skills|competencies|expertise)",
12
+ "summary": "(summary|profile|objective|about)",
13
+ "projects": "(projects|portfolio|work samples)",
14
+ "certifications": "(certifications|certificates|licenses)"
15
+ }
16
+ }
skills_taxonomy.json ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "Cloud": [
3
+ "AWS",
4
+ "Azure",
5
+ "Google Cloud",
6
+ "Kubernetes",
7
+ "Docker"
8
+ ],
9
+ "Technical Skills": [
10
+ "Python",
11
+ "SQL",
12
+ "JavaScript",
13
+ "Java",
14
+ "C++"
15
+ ],
16
+ "AI Machine Learning": [
17
+ "AI",
18
+ "ML",
19
+ "RAG",
20
+ "Generative AI",
21
+ "TensorFlow"
22
+ ],
23
+ "Data Analytics": [
24
+ "SQL",
25
+ "Python",
26
+ "A/B Testing",
27
+ "KPIs",
28
+ "Tableau"
29
+ ],
30
+ "Technical Tools": [
31
+ "Jira",
32
+ "Confluence",
33
+ "Figma",
34
+ "Databricks"
35
+ ],
36
+ "Leadership Collaboration": [
37
+ "Team Management",
38
+ "Stakeholder Communication"
39
+ ],
40
+ "Product Management": [
41
+ "Product Strategy",
42
+ "Roadmap",
43
+ "Stakeholder Management",
44
+ "User Research"
45
+ ],
46
+ "Other": []
47
+ }
vectorizer.pkl ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f4a903dc769a26c81a82740a899018023aac7d8651c04492c12beb4d777a684d
3
+ size 74637