BentoUniAcc committed on
Commit a73cdfa · verified · 1 Parent(s): f328788

Update README.md

Files changed (1)
  1. README.md +8 -1
README.md CHANGED
@@ -212,7 +212,7 @@ KMeans splits the data into four roughly equal blobs with significant overlap in
 
  ![12_48_Separate_scatter_plots](https://cdn-uploads.huggingface.co/production/uploads/69d8c774af594a45bf54cc48/GmVq3G3FJO2ucpXWGywAl.png)
 
- DBSCAN classifies the vast majority of points as noise, forming only one meaningful cluster. High dimensionality makes distance-based density estimation ineffective on this dataset.
+ DBSCAN classifies the vast majority of points as noise, forming 7 clusters. High dimensionality makes distance-based density estimation ineffective on this dataset.
 
 
  ![13_48_Separate_scatter_plots](https://cdn-uploads.huggingface.co/production/uploads/69d8c774af594a45bf54cc48/j6b9SMso4DovtAbKgqnSj.png)
@@ -324,6 +324,13 @@ Each bin shows a clean salary range with minimal overlap at the boundaries, conf
 
  Same 253-feature matrix as regression, with a stratified 80/20 train/test split.
 
+ ### Precision vs. Recall & False Positives vs. False Negatives
+
+ **Recall is prioritised over precision** in this task. Misclassifying a developer into a lower salary tier (a false negative for their true tier) carries real-world cost: under-negotiation, poor benchmarking, and missed career leverage. A false positive (over-predicting a tier) is relatively benign.
+
+ **False Negatives are more critical than False Positives.** Predicting "Mid" when a developer is truly "High" or "Very High" obscures their earning potential. For this reason, evaluation uses **weighted F1-score**, which averages the per-class F1 (the harmonic mean of precision and recall) across all four classes in proportion to class support. Because support weighting favours the larger classes, recall in the minority tiers (Low and Very High) is best checked per class alongside the weighted score.
+
+
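The relationship between weighted F1 and per-class recall described in the added section can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy labels are invented and are not the project's data, and the helper names `per_class_scores` and `weighted_f1` are hypothetical. The project itself may well compute these with a library instead.

```python
# Toy illustration only: labels below are invented, not taken from the dataset.
TIERS = ["Low", "Mid", "High", "Very High"]


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for a multiclass task."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)  # over-predicted into this tier
        fn = sum(t == label and p != label for t, p in pairs)  # true members of this tier that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, tp + fn)
    return scores


def weighted_f1(scores):
    """Average per-class F1, weighting each class by its support."""
    total = sum(support for _, _, _, support in scores.values())
    return sum(f1 * support for _, _, f1, support in scores.values()) / total


# Hypothetical imbalanced test set: the outer tiers are the minority classes.
y_true = ["Low"] * 2 + ["Mid"] * 6 + ["High"] * 6 + ["Very High"] * 2
y_pred = (
    ["Low", "Mid"]            # one Low developer pushed up to Mid
    + ["Mid"] * 6             # all Mid correct
    + ["High"] * 5 + ["Mid"]  # one High developer under-predicted as Mid
    + ["High", "High"]        # both Very High developers under-predicted
)

scores = per_class_scores(y_true, y_pred, TIERS)
overall = weighted_f1(scores)

for tier in TIERS:
    precision, recall, f1, support = scores[tier]
    print(f"{tier:>9}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} n={support}")
print(f"weighted F1 = {overall:.3f}")
```

In this toy case every "Very High" developer is under-predicted, yet the weighted F1 stays fairly high because that tier contributes little support. That is why a per-class recall breakdown is a useful companion to the single weighted figure.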
  ### Results
 
  | Model | Accuracy | F1 (weighted) |