BentoUniAcc/Stack_Overflow_Salary_Predicting_Model

salary-prediction

developer-survey

feature-engineering

gradient-boosting


BentoUniAcc committed 13 days ago

Commit a73cdfa · verified · 1 Parent(s): f328788

Update README.md

Files changed (1)

README.md +8 -1


@@ -212,7 +212,7 @@ KMeans splits the data into four roughly equal blobs with significant overlap in
 ![12_48_Separate_scatter_plots](https://cdn-uploads.huggingface.co/production/uploads/69d8c774af594a45bf54cc48/GmVq3G3FJO2ucpXWGywAl.png)
-DBSCAN classifies the vast majority of points as noise, forming only one meaningful cluster. High dimensionality makes distance-based density estimation ineffective on this dataset.
 ![13_48_Separate_scatter_plots](https://cdn-uploads.huggingface.co/production/uploads/69d8c774af594a45bf54cc48/j6b9SMso4DovtAbKgqnSj.png)
@@ -324,6 +324,13 @@ Each bin shows a clean salary range with minimal overlap at the boundaries, conf
 Same 253-feature matrix as regression, with a stratified 80/20 train/test split.
 ### Results
 | Model | Accuracy | F1 (weighted) |

 ![12_48_Separate_scatter_plots](https://cdn-uploads.huggingface.co/production/uploads/69d8c774af594a45bf54cc48/GmVq3G3FJO2ucpXWGywAl.png)
+DBSCAN classifies the vast majority of points as noise, forming 7 clusters. High dimensionality makes distance-based density estimation ineffective on this dataset.
 ![13_48_Separate_scatter_plots](https://cdn-uploads.huggingface.co/production/uploads/69d8c774af594a45bf54cc48/j6b9SMso4DovtAbKgqnSj.png)
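The DBSCAN remark about high dimensionality can be illustrated with a small, self-contained sketch. It is not the README's clustering code: it draws uniform random points (an assumption, the real feature matrix is not uniform) and measures how the gap between the nearest and farthest neighbour shrinks as dimensions grow. The helper `relative_contrast` is hypothetical, and the value 253 is borrowed from the feature count quoted for the classifier; the clustering input may have a different width.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for distances from one point to all others."""
    rng = random.Random(seed)
    # Assumption: uniform synthetic data, used only to show the geometric effect.
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)


low = relative_contrast(n_points=200, n_dims=2)
high = relative_contrast(n_points=200, n_dims=253)
print(f"relative contrast, 2 dims:   {low:.2f}")
print(f"relative contrast, 253 dims: {high:.2f}")
```

When the contrast is small, nearly every point sits at a similar distance from every other point, so a single DBSCAN radius tends either to include almost everything or almost nothing. That is one plausible reading of why most points end up labelled as noise here.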
 Same 253-feature matrix as regression, with a stratified 80/20 train/test split.
+### Precision vs. Recall & False Positives vs. False Negatives
+**Recall is prioritised over precision** in this task. Misclassifying a developer into a lower salary tier (a false negative) carries real-world cost — under-negotiation, poor benchmarking, missed career leverage — whereas a false positive (over-predicting a tier) is relatively benign.
+**False Negatives are more critical than False Positives.** Predicting "Mid" when a developer is truly "High" or "Very High" obscures their earning potential. For this reason, evaluation reports **weighted F1-score**, which combines precision and recall across all four classes. Because weighted F1 averages per-class scores in proportion to class support, recall in the minority tiers (Low and Very High) should be read from the per-class figures rather than inferred from the aggregate.
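A minimal sketch of how per-class recall, weighted F1, and an under-prediction rate could be computed side by side. It is written in plain Python so it runs on its own; scikit-learn's `classification_report` and `f1_score(average="weighted")` report the same per-class and weighted quantities. The tier order, the toy labels, and the helper names (`per_class_metrics`, `weighted_f1`, `under_prediction_rate`) are assumptions for illustration and do not come from the repository.

```python
# Assumed ordering of the four salary tiers, lowest to highest.
TIERS = ["Low", "Mid", "High", "Very High"]


def per_class_metrics(y_true, y_pred, labels):
    """Compute precision, recall, F1 and support for each label (one-vs-rest)."""
    out = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[label] = {"precision": precision, "recall": recall,
                      "f1": f1, "support": tp + fn}
    return out


def weighted_f1(metrics):
    """Average per-class F1, weighting each class by its support."""
    total = sum(m["support"] for m in metrics.values())
    return sum(m["f1"] * m["support"] for m in metrics.values()) / total


def under_prediction_rate(y_true, y_pred, labels):
    """Share of samples predicted into a lower tier than the true one."""
    rank = {label: i for i, label in enumerate(labels)}
    return sum(rank[p] < rank[t] for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for this sketch: minority tiers have 2 samples each.
y_true = ["Low"] * 2 + ["Mid"] * 8 + ["High"] * 8 + ["Very High"] * 2
y_pred = ["Low", "Mid"] + ["Mid"] * 8 + ["High"] * 7 + ["Mid"] + ["High", "High"]

metrics = per_class_metrics(y_true, y_pred, TIERS)
for tier in TIERS:
    print(f"{tier:>9}: recall={metrics[tier]['recall']:.2f} "
          f"support={metrics[tier]['support']}")
print(f"weighted F1:           {weighted_f1(metrics):.3f}")
print(f"under-prediction rate: {under_prediction_rate(y_true, y_pred, TIERS):.2f}")
```

On this toy data the weighted F1 comes out near 0.75 even though no "Very High" sample is recovered, which is why the per-class recall column and an explicit under-prediction rate are worth reporting next to the aggregate score.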
 ### Results
 | Model | Accuracy | F1 (weighted) |