Update README.md
README.md
CHANGED
@@ -212,7 +212,7 @@ KMeans splits the data into four roughly equal blobs with significant overlap in
-DBSCAN classifies the vast majority of points as noise, forming
+DBSCAN classifies the vast majority of points as noise, forming 7 clusters. High dimensionality makes distance-based density estimation ineffective on this dataset.
@@ -324,6 +324,13 @@ Each bin shows a clean salary range with minimal overlap at the boundaries, conf
Same 253-feature matrix as regression, with a stratified 80/20 train/test split.
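
A stratified split keeps each salary tier's share roughly the same in the train and test sets. The sketch below illustrates the idea in plain Python; the `stratified_split` helper and the label counts are hypothetical and do not come from the README. In scikit-learn the usual route is `train_test_split(X, y, test_size=0.2, stratify=y)`, though whether this project uses it is an assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at ~test_fraction."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical, imbalanced tier labels (not the README's data).
labels = ["Low"] * 10 + ["Mid"] * 40 + ["High"] * 40 + ["Very High"] * 10
train_idx, test_idx = stratified_split(labels)
```

Each tier contributes 20% of its own samples to the test set, so minority tiers are not dropped from it by chance.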
+### Precision vs. Recall & False Positives vs. False Negatives
+
+**Recall is prioritised over precision** in this task. Misclassifying a developer into a lower salary tier (a false negative for their true tier) carries real-world costs such as under-negotiation, poor benchmarking, and missed career leverage, whereas over-predicting a tier (a false positive) is relatively benign.
+
+**False Negatives are more critical than False Positives.** Predicting "Mid" when a developer is truly "High" or "Very High" obscures their earning potential. For this reason, evaluation uses **weighted F1-score**, which averages the per-class F1 (the harmonic mean of precision and recall) across all four classes in proportion to class support. Because support weighting favours the larger classes, recall for the minority tiers (Low and Very High) is best checked per class alongside it.
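
To make the relationship between weighted F1 and minority-tier recall concrete, here is a minimal sketch in plain Python. The helper names (`per_class_scores`, `weighted_f1`) and the toy labels are hypothetical and are not taken from the project; scikit-learn's `classification_report` and `f1_score(..., average="weighted")` report the same quantities.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)}, one-vs-rest per class."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true count per class
    predicted = Counter(y_pred)    # predicted count per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in labels:
        tp = correct[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support[label])
    return scores


def weighted_f1(scores):
    """Average per-class F1, weighting each class by its support."""
    total = sum(n for _, _, _, n in scores.values())
    return sum(f1 * n for _, _, f1, n in scores.values()) / total


# Hypothetical predictions: the single "Very High" sample is under-predicted.
y_true = ["Low", "Low", "Mid", "Mid", "Mid", "Mid",
          "High", "High", "High", "Very High"]
y_pred = ["Mid", "Low", "Mid", "Mid", "Mid", "Mid",
          "High", "High", "Mid", "High"]

scores = per_class_scores(y_true, y_pred)
overall = weighted_f1(scores)
```

In this toy case recall for "Very High" is 0.0 while the weighted F1 is about 0.65, which is why per-class recall is worth reporting next to the aggregate score.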
+
+
### Results

| Model | Accuracy | F1 (weighted) |