# Dimensionality Reduction Models This directory contains Python scripts defining **dimensionality reduction** techniques (e.g., PCA, t-SNE, UMAP). Each model file sets up a scikit-learn–compatible estimator or follows a similar interface, making it easy to swap in `train_dimred_model.py`. **Key Points**: - **Estimator**: Typically supports `.fit_transform(X)` for dimension reduction. - **Default Settings**: e.g., PCA might default to `n_components=2`; t-SNE might set `n_components=2` and `perplexity=30`; UMAP might define `n_neighbors=15` or `n_components=2`. - **No Supervised Tuning**: Usually we pick hyperparameters based on interpretability or domain. A manual approach or specialized metric can be used if needed. **Note**: The `train_dimred_model.py` script handles dropping columns, label encoding, performing `.fit_transform(X)`, and optionally saving a 2D/3D scatter plot if `--visualize` is used. ## Available Dimensionality Reduction Models - [PCA](pca.py) - [t-SNE](tsne.py) - [UMAP](umap.py) ### Usage To reduce data dimensions: ```bash python scripts/train_dimred_model.py \ --model_module pca \ --data_path data/breast_cancer/data.csv \ --select_columns "radius_mean, texture_mean, area_mean, smoothness_mean" \ --visualize ``` This: 1. Loads `pca.py`, which defines a `PCA(n_components=2)` estimator by default. 2. Applies `.fit_transform(...)` to produce a 2D embedding. 3. Saves the model (`dimred_model.pkl`) and the transformed data (`X_transformed.csv`). 4. If `--visualize` is set and `n_components=2`, it scatter-plots the result.