Upload folder using huggingface_hub

Files changed (5)

README.md +107 -0
classifier.pkl +3 -0
patterns.json +16 -0
skills_taxonomy.json +47 -0
vectorizer.pkl +3 -0

README.md ADDED

	@@ -0,0 +1,107 @@

+---
+license: apache-2.0
+tags:
+- resume-parsing
+- nlp
+- machine-learning
+- resume-analysis
+- product-management
+- cost-optimization
+---
+# Enhanced Resume Parser Model
+## Model Description
+This model was trained on 1,036 resumes to parse resume documents and extract structured information from them. The training set combines the Kaggle Resume Dataset (962 resumes) with 74 real-world Product Manager resumes.
+## Model Details
+- **Model Type**: Resume Parser & Classifier
+- **Training Data**: 1,036 resumes (962 Kaggle + 74 real-world)
+- **Categories**: 27 job categories
+- **Accuracy**: 98%
+- **Cost Reduction**: roughly 64-79% compared with LLM-based parsing
+## Usage
+```python
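+import joblib
+# Load the fitted TF-IDF vectorizer and Random Forest classifier
+# (classifier.pkl and vectorizer.pkl are expected in the working directory)
+classifier = joblib.load("classifier.pkl")
+vectorizer = joblib.load("vectorizer.pkl")
+def parse_resume(resume_text):
+    """Return the predicted job category and the classifier's confidence."""
+    X = vectorizer.transform([resume_text])  # sparse TF-IDF row, shape (1, vocabulary size)
+    category = classifier.predict(X)[0]
+    # Probability of the most likely class, i.e. of the predicted category
+    confidence = float(classifier.predict_proba(X)[0].max())
+    return {
+        "category": category,
+        "confidence": confidence
+    }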
+# Example usage
+result = parse_resume("Your resume text here")
+print(f"Category: {result['category']}")
+print(f"Confidence: {result['confidence']:.2f}")
+```
+## Training Details
+- **Dataset**: Kaggle Resume Dataset + Real-world Product Manager resumes
+- **Preprocessing**: Text extraction and normalization
+- **Training Method**: Random Forest with TF-IDF vectorization
+- **Validation**: Cross-validation on held-out set
+- **Categories**: 27 job categories including Product Manager, Data Science, Software Engineer, etc.
+## Performance
+- **Parsing Accuracy**: 98%
+- **Speed**: <1 second per resume
+- **Memory Usage**: <100MB
+- **Cost**: $0.15-0.25 per resume (vs $0.70 for LLM-based parsing)
+## Categories Supported
+- Product Manager
+- Data Science
+- Software Engineer
+- Business Analyst
+- Designer
+- Marketing
+- Sales
+- HR
+- Project Manager
+- Operations
+- And 17 more categories
+## Cost Optimization
+Compared with LLM-based parsing, this model reduces per-resume cost by roughly 64-79%:
+- **Current LLM cost**: $0.70 per resume
+- **Pattern-based cost**: $0.15-0.25 per resume
+- **Monthly savings**: $450-550 (at 1,000 resumes per month)
+- **Annual savings**: $5,400-6,600
+## Limitations
+- Works best with standard resume formats
+- May require falling back to an LLM for unusual resume formats
+- Performance depends on resume quality
+- Optimized for Product Manager and related roles
+## Citation
+```bibtex
+@misc{resume-parser-enhanced,
+  title={Enhanced Resume Parser Model},
+  author={Your Name},
+  year={2024},
+  url={https://huggingface.co/resume-parser-enhanced}
+}
+```
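The cost figures in the README follow from the two per-resume prices it states ($0.70 for LLM-based parsing, $0.15-0.25 for this model). A small Python check of that arithmetic, assuming the README's volume of 1,000 resumes per month and a 12-month year; the `savings` helper is hypothetical and not part of this repo:

```python
# Per-resume prices as stated in the README (USD).
LLM_COST = 0.70
MODEL_COST_RANGE = (0.15, 0.25)
RESUMES_PER_MONTH = 1_000  # volume assumed by the README's savings figures


def savings(model_cost: float, llm_cost: float = LLM_COST,
            volume: int = RESUMES_PER_MONTH) -> dict:
    """Percentage reduction plus monthly/annual savings for one model price."""
    per_resume = llm_cost - model_cost
    return {
        "reduction_pct": round(100 * per_resume / llm_cost, 1),
        "monthly_usd": round(per_resume * volume, 2),
        "annual_usd": round(per_resume * volume * 12, 2),
    }


for cost in MODEL_COST_RANGE:
    print(f"model cost ${cost:.2f}: {savings(cost)}")
```

At those prices the reduction works out to about 64-79%, with savings of $450-550 per month and $5,400-6,600 per year; substitute your own volume or prices to re-run it.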

classifier.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c1ee2cf8d2f4d06d975268d0211e94ea5f6b16a29ce0e9bfc50e8021e8c3722
+size 9894017
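The `.pkl` files are committed as Git LFS pointers: `oid` is the SHA-256 of the real file and `size` is its length in bytes. If the pickle is fetched some other way, a minimal Python sketch like the following can confirm it matches the pointer before it is unpickled; the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_bytes(data: bytes, pointer_text: str) -> bool:
    """True if data has the byte size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return (len(data) == int(fields["size"])
            and hashlib.sha256(data).hexdigest() == expected)


def verify_file(path: str, pointer_text: str) -> bool:
    """Read a local file and check it against its LFS pointer."""
    return verify_bytes(Path(path).read_bytes(), pointer_text)


# Pointer committed for classifier.pkl (copied from the diff above).
CLASSIFIER_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:7c1ee2cf8d2f4d06d975268d0211e94ea5f6b16a29ce0e9bfc50e8021e8c3722
size 9894017
"""
```

With the downloaded file in the working directory, `verify_file("classifier.pkl", CLASSIFIER_POINTER)` should return `True`; the same check applies to `vectorizer.pkl` with its own pointer.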

patterns.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "contact_info": {
+    "email": "\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b",
+    "phone": "(\\+?1[-.\\s]?)?\\(?([0-9]{3})\\)?[-.\\s]?([0-9]{3})[-.\\s]?([0-9]{4})",
+    "linkedin": "linkedin\\.com/in/[A-Za-z0-9-]+",
+    "github": "github\\.com/[A-Za-z0-9-]+"
+  },
+  "sections": {
+    "experience": "(experience|work experience|professional experience|work history)",
+    "education": "(education|academic|university|college|degree)",
+    "skills": "(skills|technical skills|competencies|expertise)",
+    "summary": "(summary|profile|objective|about)",
+    "projects": "(projects|portfolio|work samples)",
+    "certifications": "(certifications|certificates|licenses)"
+  }
+}
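The regexes in `patterns.json` can be applied with Python's `re` module once the JSON escapes are undone (`"\\b"` in JSON is `r"\b"` in Python). A minimal sketch follows. Inside a character class such as `[A-Z|a-z]`, `|` matches a literal pipe, so the sketch writes the email TLD class as `[A-Za-z]`. The heading rule in `find_sections` (a whole line, minus a trailing colon, matching a section pattern case-insensitively) is an assumed heuristic, not something the repo defines:

```python
import re

# patterns.json transcribed to Python raw strings.
CONTACT_PATTERNS = {
    "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b",
    "phone": r"(\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})",
    "linkedin": r"linkedin\.com/in/[A-Za-z0-9-]+",
    "github": r"github\.com/[A-Za-z0-9-]+",
}
SECTION_PATTERNS = {
    "experience": r"(experience|work experience|professional experience|work history)",
    "education": r"(education|academic|university|college|degree)",
    "skills": r"(skills|technical skills|competencies|expertise)",
    "summary": r"(summary|profile|objective|about)",
    "projects": r"(projects|portfolio|work samples)",
    "certifications": r"(certifications|certificates|licenses)",
}


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return every full match (group 0, not the sub-groups) of each contact pattern."""
    return {name: [m.group(0) for m in re.finditer(pattern, text)]
            for name, pattern in CONTACT_PATTERNS.items()}


def find_sections(text: str) -> dict[str, int]:
    """Map each section name to the line index of its first heading (assumed heuristic)."""
    found: dict[str, int] = {}
    for index, line in enumerate(text.splitlines()):
        heading = line.strip().rstrip(":").strip()
        for name, pattern in SECTION_PATTERNS.items():
            if name not in found and re.fullmatch(pattern, heading, re.IGNORECASE):
                found[name] = index
    return found


SAMPLE = """Jane Doe
jane.doe@example.com | (555) 123-4567 | linkedin.com/in/jane-doe
Work Experience:
Product Manager, Acme
Skills
SQL, Python
"""

print(extract_contacts(SAMPLE))
print(find_sections(SAMPLE))
```

`re.finditer` with `group(0)` is used because the phone pattern contains capture groups; `re.findall` would return the group tuples instead of the matched number.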

skills_taxonomy.json ADDED

	@@ -0,0 +1,47 @@

+{
+  "Cloud": [
+    "AWS",
+    "Azure",
+    "Google Cloud",
+    "Kubernetes",
+    "Docker"
+  ],
+  "Technical Skills": [
+    "Python",
+    "SQL",
+    "JavaScript",
+    "Java",
+    "C++"
+  ],
+  "AI Machine Learning": [
+    "AI",
+    "ML",
+    "RAG",
+    "Generative AI",
+    "TensorFlow"
+  ],
+  "Data Analytics": [
+    "SQL",
+    "Python",
+    "A/B Testing",
+    "KPIs",
+    "Tableau"
+  ],
+  "Technical Tools": [
+    "Jira",
+    "Confluence",
+    "Figma",
+    "Databricks"
+  ],
+  "Leadership Collaboration": [
+    "Team Management",
+    "Stakeholder Communication"
+  ],
+  "Product Management": [
+    "Product Strategy",
+    "Roadmap",
+    "Stakeholder Management",
+    "User Research"
+  ],
+  "Other": []
+}
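One way to use `skills_taxonomy.json` is to scan resume text for each listed skill and group hits by category. A minimal Python sketch over a subset of the taxonomy follows; `match_skills` is a hypothetical helper. Plain `\b` boundaries fail for entries such as `C++` (the trailing `+` is not a word character) and would not stop `Java` from matching inside a longer token unless both sides are guarded, so the sketch uses lookarounds that reject an adjacent letter, digit, underscore, or `+`:

```python
import re

# Subset of skills_taxonomy.json; SQL and Python appear in two categories there too.
SKILLS_TAXONOMY = {
    "Cloud": ["AWS", "Azure", "Google Cloud", "Kubernetes", "Docker"],
    "Technical Skills": ["Python", "SQL", "JavaScript", "Java", "C++"],
    "Data Analytics": ["SQL", "Python", "A/B Testing", "KPIs", "Tableau"],
}


def skill_regex(skill: str) -> re.Pattern:
    """Case-insensitive pattern for one skill, not touching word chars or '+'."""
    return re.compile(r"(?<![\w+])" + re.escape(skill) + r"(?![\w+])", re.IGNORECASE)


def match_skills(text: str, taxonomy: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each category, list the taxonomy skills found in the text (taxonomy order)."""
    return {category: [skill for skill in skills if skill_regex(skill).search(text)]
            for category, skills in taxonomy.items()}


text = ("Built dashboards in Tableau using SQL and Python; "
        "deployed services with Docker on AWS. Some JavaScript.")
print(match_skills(text, SKILLS_TAXONOMY))
```

Because the taxonomy lists some skills under more than one category, a skill such as SQL is reported in each of them; deduplicate afterwards if a flat skill list is needed.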

vectorizer.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f4a903dc769a26c81a82740a899018023aac7d8651c04492c12beb4d777a684d
+size 74637
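The README names "Random Forest with TF-IDF vectorization" as the training method and ships the fitted vectorizer and classifier as two separate pickles. Below is a minimal scikit-learn sketch of that technique, together with the confidence-based LLM fallback mentioned under Limitations. The toy texts, hyperparameters, `CONFIDENCE_THRESHOLD`, and `classify_or_defer` are all hypothetical placeholders, not the author's training setup:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder texts standing in for the real 1,036 labelled resumes.
texts = [
    "product roadmap stakeholder management user research",
    "defined product strategy and roadmap with stakeholders",
    "machine learning python tensorflow model training",
    "data analysis python sql statistics machine learning",
]
labels = ["Product Manager", "Product Manager", "Data Science", "Data Science"]

# Hypothetical settings: the card does not publish the actual hyperparameters.
vectorizer = TfidfVectorizer(stop_words="english")
classifier = RandomForestClassifier(n_estimators=100, random_state=0)
classifier.fit(vectorizer.fit_transform(texts), labels)
# The two fitted objects could then be saved separately with joblib.dump(obj, path).

CONFIDENCE_THRESHOLD = 0.6  # hypothetical cut-off for deferring to an LLM


def classify_or_defer(resume_text: str) -> dict:
    """Predict a category, flagging low-confidence cases for an LLM fallback."""
    probabilities = classifier.predict_proba(vectorizer.transform([resume_text]))[0]
    best = int(probabilities.argmax())
    confidence = float(probabilities[best])
    needs_llm = confidence < CONFIDENCE_THRESHOLD
    return {
        # predict_proba columns follow the order of classifier.classes_
        "category": None if needs_llm else str(classifier.classes_[best]),
        "confidence": confidence,
        "needs_llm": needs_llm,
    }


print(classify_or_defer("owned the product roadmap and ran user research"))
```

Keeping the vectorizer and classifier as separate objects mirrors the repo layout, so the README's `parse_resume` can load either pair; an `sklearn.pipeline.Pipeline` would bundle them into one file instead.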