med2425/bge-CV-fit

@@ -1,199 +1,138 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+language: en
+tags:
+- text-classification
+- resume
+- job-description
+- recruitment
+- bge-m3
+license: mit
 ---
+# Resume Job Fit Classifier
+A cross-encoder model that classifies how well a resume fits a job description.
+## Model Description
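+This model is [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) fine-tuned as a cross-encoder classifier on pairs of resumes and job descriptions. It takes a resume and a job description as input and predicts one of three classes: **Good Fit**, **No Fit**, or **Potential Fit**.
+The input is structured as follows (shown schematically with BERT-style markers; the XLM-RoBERTa tokenizer that bge-m3 uses writes these special tokens as `<s>` and `</s>`):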
+```
+[CLS] resume_text [SEP] job_description_text [SEP]
+```
+Because both texts are encoded in a single forward pass, self-attention lets every resume token attend to every job description (JD) token. The model therefore compares the two texts directly instead of encoding each one into an independent embedding.
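As a rough illustration of that layout, the sketch below assembles one pair sequence from two pre-tokenized texts. The function name `build_pair_input`, the whitespace tokenization, and the `segment_ids` convention are assumptions made for this example; the real tokenizer works on subword units and inserts its own special tokens. The per-segment budgets default to the values listed under Training Details.

```python
from typing import List, Tuple


def build_pair_input(
    resume_tokens: List[str],
    jd_tokens: List[str],
    resume_budget: int = 4096,
    jd_budget: int = 4000,
    cls: str = "[CLS]",
    sep: str = "[SEP]",
) -> Tuple[List[str], List[int]]:
    """Return (tokens, segment_ids) for one resume/JD pair.

    segment_ids marks special tokens as -1, resume tokens as 0, JD tokens as 1.
    """
    # Truncate each text to its own budget so neither can crowd out the other.
    resume_tokens = resume_tokens[:resume_budget]
    jd_tokens = jd_tokens[:jd_budget]

    tokens = [cls] + resume_tokens + [sep] + jd_tokens + [sep]
    segment_ids = (
        [-1] + [0] * len(resume_tokens) + [-1] + [1] * len(jd_tokens) + [-1]
    )
    return tokens, segment_ids


# Toy example with whitespace "tokens" (illustrative only).
tokens, segments = build_pair_input(
    "senior ml engineer with pytorch".split(),
    "requires pytorch and nlp experience".split(),
)
print(" ".join(tokens))
```

With the default budgets the sequence is at most 4096 + 4000 + 3 = 8099 tokens, which fits within the 8192-token limit.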
+## Datasets
+Two datasets were used for training:
+1. [cnamuangtoun/resume-job-description-fit](https://huggingface.co/datasets/cnamuangtoun/resume-job-description-fit)
+   - Train: 5,616 pairs
+   - Test: 1,759 pairs (used as the held-out benchmark, reported in the Test column under Results)
+   - Labels: Good Fit, No Fit, Potential Fit
+2. [kens1ang/resume-job-fit-augmented](https://huggingface.co/datasets/kens1ang/resume-job-fit-augmented)
+   - Train: 31,205 pairs
+   - Labels: Good Fit, No Fit, Potential Fit
+Combined training set: ~36,800 pairs
+Label distribution (combined):
+- No Fit: 50.4%
+- Good Fit: 24.7%
+- Potential Fit: 24.9%
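To make the imbalance concrete, the sketch below turns this distribution into per-class loss weights. The inverse-frequency formula and the function name `inverse_frequency_weights` are assumptions for illustration; the card does not state how the actual weights were derived. Only the 2x Good Fit boost is taken from the Training Details list.

```python
from typing import Dict

# Combined label distribution from the list above (fractions of the training set).
label_freq: Dict[str, float] = {
    "No Fit": 0.504,
    "Good Fit": 0.247,
    "Potential Fit": 0.249,
}


def inverse_frequency_weights(
    freq: Dict[str, float], boost: Dict[str, float]
) -> Dict[str, float]:
    """Weight each class by 1 / (num_classes * frequency), then apply boosts."""
    num_classes = len(freq)
    weights = {label: 1.0 / (num_classes * f) for label, f in freq.items()}
    for label, factor in boost.items():
        weights[label] *= factor
    return weights


# The 2x Good Fit boost is the one listed under Training Details.
class_weights = inverse_frequency_weights(label_freq, boost={"Good Fit": 2.0})
for label, w in class_weights.items():
    print(f"{label}: {w:.3f}")
```

Weights like these could be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, in the class-index order the model uses.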
 ## Training Details
+- **Base model:** BAAI/bge-m3 (570M parameters, supports up to 8192 tokens)
+- **Max sequence length:** 8192 tokens (resume: 4096, JD: 4000)
+- **Optimizer:** AdamW with layer-wise learning rates
+  - Bottom layers: LR / 10
+  - Top layers: full LR
+  - Classifier head: full LR
+- **Learning rate:** 8e-6 with cosine scheduler
+- **Warmup ratio:** 15%
+- **Batch size:** 1 per device, gradient accumulation steps: 32 (effective batch: 32)
+- **Epochs:** 40 max with early stopping patience 6
+- **Loss:** Weighted CrossEntropyLoss to handle class imbalance (No Fit = 50%)
+- **Sampling:** WeightedRandomSampler to oversample minority classes
+- **Good Fit weight boost:** 2x to prioritize finding the best candidates
+- **Label smoothing:** 0.1
+- **Dropout:** 0.3 classifier, 0.15 hidden layers
+- **Precision:** fp16 mixed precision
+- **Gradient checkpointing:** enabled
+- **Hardware:** NVIDIA RTX 4090 (24GB VRAM)
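The layer-wise learning rates and the warmup plus cosine schedule from the list above can be sketched in plain Python. Several details here are assumptions, not taken from the training script: the parameter names follow the XLM-RoBERTa naming used by `transformers`, the encoder is assumed to have 24 layers, and "bottom" is assumed to mean the embeddings plus the first 12 layers. `total_steps` is a hypothetical value.

```python
import math
import re

BASE_LR = 8e-6        # learning rate from the list above
WARMUP_RATIO = 0.15   # warmup ratio from the list above


def lr_scale(param_name: str, bottom_layers: int = 12) -> float:
    """Return the LR multiplier for a parameter, based on its name.

    Assumption: embeddings and the first `bottom_layers` encoder layers count
    as "bottom" (LR / 10); everything else, including the classifier head,
    trains at the full LR.
    """
    match = re.search(r"encoder\.layer\.(\d+)\.", param_name)
    if match:
        return 0.1 if int(match.group(1)) < bottom_layers else 1.0
    if "embeddings" in param_name:
        return 0.1
    return 1.0  # classifier head and any other top-level parameters


def cosine_with_warmup(
    step: int, total_steps: int, warmup_ratio: float = WARMUP_RATIO
) -> float:
    """Multiplier on the base LR: linear warmup, then cosine decay to zero."""
    warmup_steps = int(round(total_steps * warmup_ratio))
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def learning_rate(param_name: str, step: int, total_steps: int) -> float:
    """Effective LR for one parameter at one optimizer step."""
    return BASE_LR * lr_scale(param_name) * cosine_with_warmup(step, total_steps)


# Hypothetical parameter names and step counts, for illustration only.
for name in (
    "roberta.encoder.layer.0.attention.self.query.weight",
    "roberta.encoder.layer.23.output.dense.weight",
    "classifier.out_proj.weight",
):
    print(name, learning_rate(name, step=150, total_steps=1000))
```

In PyTorch, multipliers like these would typically become per-group `lr` values passed to `torch.optim.AdamW`, with the schedule applied through `torch.optim.lr_scheduler.LambdaLR`.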
+## Results
+| Metric | Eval | Test |
+|---|---|---|
+| Accuracy | 97.06% | 54.80% |
+| Macro F1 | 96.96% | 52.13% |
+| F1 Good Fit | 97.21% | 42.46% |
+| F1 No Fit | 97.38% | 67.43% |
+| F1 Potential Fit | 96.30% | 46.50% |
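Macro F1 is the unweighted mean of the per-class F1 scores, so the weak Good Fit and Potential Fit scores on the test set count as much as the stronger No Fit score. The short check below recomputes the Macro F1 row from the per-class rows; the helper name `macro_f1` is just for this example.

```python
# Per-class F1 scores (percent) copied from the table above.
per_class_f1 = {
    "eval": {"Good Fit": 97.21, "No Fit": 97.38, "Potential Fit": 96.30},
    "test": {"Good Fit": 42.46, "No Fit": 67.43, "Potential Fit": 46.50},
}


def macro_f1(scores):
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(scores.values()) / len(scores)


for split, scores in per_class_f1.items():
    print(f"{split}: macro F1 = {macro_f1(scores):.2f}%")
```

Both splits reproduce the Macro F1 row of the table (96.96% and 52.13%).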
+## Known Limitations & Open Problem
+There is a large gap between eval (about 97% macro F1) and test (about 52% macro F1) performance. After extensive experimentation, this appears to be caused by **label inconsistency between the two training datasets**: the augmented dataset uses different labeling criteria than the original dataset, and the test set follows the original dataset's labeling logic. The model learns contradictory rules and fails to generalize to the test set.
+**Things that were tried:**
+- Full fine-tuning vs frozen layers
+- 2-class (Fit/No Fit) vs 3-class classification: the 2-class setup gave 69% test F1
+- Layer-wise learning rates
+- Weighted loss + weighted sampling
+- Various dropout, weight decay, label smoothing values
+- Training on the original dataset only: best test F1 of 69% (2 classes)
+- Training on the combined datasets: test macro F1 dropped to 52% (3 classes, as reported in Results)
+**If you have ideas on how to overcome this gap, contributions and suggestions are welcome.** Possible directions:
+- A cleaner dataset labeled consistently by human recruiters
+- A base model pretrained specifically on recruitment text (e.g. JobBERT)
+- A better data mixing strategy to handle label inconsistency between datasets
+- Confidence thresholding at inference time
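The last direction, confidence thresholding, could look roughly like the sketch below: the model only commits to a label when its top probability clears a threshold and otherwise abstains. The function name `predict_with_threshold` and the threshold of 0.6 are hypothetical; a real threshold would need tuning on held-out data. The label order matches the one used in the Usage section.

```python
from typing import List, Optional

# Class index order as used in the Usage section.
LABELS = ["Good Fit", "No Fit", "Potential Fit"]


def predict_with_threshold(
    probs: List[float], threshold: float = 0.6
) -> Optional[str]:
    """Return the top label, or None when the model is not confident enough."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best] if probs[best] >= threshold else None


print(predict_with_threshold([0.82, 0.05, 0.13]))  # confident prediction
print(predict_with_threshold([0.40, 0.25, 0.35]))  # below threshold, abstains
```

Pairs on which the model abstains could be routed to a human reviewer instead of being auto-labeled.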
+## Usage
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+import numpy as np
+# Load the fine-tuned classifier and its tokenizer from the Hub
+model = AutoModelForSequenceClassification.from_pretrained("med2425/bge-resume-fit")
+tokenizer = AutoTokenizer.from_pretrained("med2425/bge-resume-fit")
+model.eval()  # disable dropout for inference
+# Run on GPU when available, otherwise fall back to CPU
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+resume = """
+John Smith | Senior ML Engineer
+6 years experience building production ML systems.
+Skills: Python, PyTorch, TensorFlow, NLP, AWS, Docker.
+Built NLP pipelines processing 10M documents/day at TechCorp (2020-Present).
+Fine-tuned BERT models achieving 94% accuracy on document classification.
+B.Sc. Computer Science, State University 2018.
+"""
+jd = """
+Senior Machine Learning Engineer
+Requirements: 5+ years ML experience, strong Python,
+PyTorch or TensorFlow, NLP experience, production deployment on AWS/GCP/Azure,
+Bachelor in Computer Science or related field.
+"""
+# Encode resume and JD as one pair so the model can attend across both texts
+inputs = tokenizer(resume, jd, return_tensors="pt", truncation=True, max_length=8192).to(device)
+with torch.no_grad():
+    logits = model(**inputs).logits  # shape: (1, 3)
+    probs = torch.softmax(logits, dim=-1).squeeze(0).tolist()
+# Map class indices to label names
+id2label = {0: "Good Fit", 1: "No Fit", 2: "Potential Fit"}
+for i, p in enumerate(probs):
+    print(f"{id2label[i]}: {p:.2%}")
+print(f"Prediction: {id2label[int(np.argmax(probs))]}")
+```
+> **Note:** Use full-length realistic resumes and job descriptions for best results.
+> The model was trained on resumes averaging 700 words and JDs averaging 400 words.
+> Very short inputs may produce unreliable predictions.