Dimensionality Reduction Models
This directory contains Python scripts that define dimensionality reduction techniques (e.g., PCA, t-SNE, UMAP). Each model file sets up a scikit-learn–compatible estimator, or one that follows a similar interface, so that models can be swapped easily when running train_dimred_model.py.
Key Points:
- Estimator: Typically supports .fit_transform(X) for dimension reduction.
- Default Settings: e.g., PCA might default to n_components=2; t-SNE might set n_components=2 and perplexity=30; UMAP might define n_neighbors=15 or n_components=2.
- No Supervised Tuning: Usually we pick hyperparameters based on interpretability or domain. A manual approach or specialized metric can be used if needed.
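As a rough illustration of the shared interface, the sketch below builds estimators with the default settings mentioned above and calls .fit_transform(X) on each. The MODEL_FACTORIES registry, the reduce_dimensions helper, and the synthetic data are assumptions made for this example; the actual model files in this directory may be organized differently.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Hypothetical registry: the names and layout are assumptions for
# illustration, not the repository's actual module structure.
MODEL_FACTORIES = {
    "pca": lambda: PCA(n_components=2),
    "tsne": lambda: TSNE(n_components=2, perplexity=30, random_state=0),
    # UMAP lives in the separate umap-learn package and would follow the
    # same pattern, e.g. umap.UMAP(n_neighbors=15, n_components=2).
}


def reduce_dimensions(X, model_name):
    """Build the named estimator and return its low-dimensional embedding."""
    estimator = MODEL_FACTORIES[model_name]()
    return estimator.fit_transform(X)


# Synthetic data: 100 samples with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

embeddings = {name: reduce_dimensions(X, name) for name in MODEL_FACTORIES}
for name, emb in embeddings.items():
    print(name, emb.shape)
```

Because every estimator exposes the same .fit_transform(X) call, switching techniques only requires changing the name passed to the helper.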
Note: The train_dimred_model.py script handles dropping columns, label encoding, performing .fit_transform(X), and optionally saving a 2D/3D scatter plot if --visualize is used.
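The steps in the note can be sketched roughly as follows. This is not the code of train_dimred_model.py: the function names, the id and diagnosis columns, and the sample values are assumptions chosen for illustration, and the plotting helper only covers the 2D case.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder


def prepare_and_reduce(df, drop_columns=(), n_components=2):
    """Drop columns, label-encode non-numeric columns, then fit_transform."""
    X = df.drop(columns=list(drop_columns))
    for col in X.columns:
        if not pd.api.types.is_numeric_dtype(X[col]):
            # Map each distinct string value to an integer code.
            X[col] = LabelEncoder().fit_transform(X[col].astype(str))
    return PCA(n_components=n_components).fit_transform(X)


def save_scatter(embedding, path):
    """Save a 2D scatter plot of the embedding (hypothetical helper)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(embedding[:, 0], embedding[:, 1])
    ax.set_xlabel("component 1")
    ax.set_ylabel("component 2")
    fig.savefig(path)
    plt.close(fig)


# Made-up sample rows; the values are not taken from any real dataset.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "diagnosis": ["M", "B", "B", "M", "B", "M"],
    "radius_mean": [17.9, 12.4, 11.8, 20.1, 13.0, 18.6],
    "texture_mean": [10.4, 15.7, 18.2, 21.3, 14.9, 19.8],
})

embedding = prepare_and_reduce(df, drop_columns=["id"])
print(embedding.shape)
```

Calling save_scatter(embedding, "scatter.png") afterwards would write the plot to disk, mirroring what --visualize is described as doing.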
Available Dimensionality Reduction Models
Usage
To reduce data dimensions:
python scripts/train_dimred_model.py \
--model_module pca \
--data_path data/breast_cancer/data.csv \
--select_columns "radius_mean, texture_mean, area_mean, smoothness_mean" \
--visualize
This:
- Loads pca.py, which defines a PCA(n_components=2) estimator by default.
- Applies .fit_transform(...) to produce a 2D embedding.
- Saves the model (dimred_model.pkl) and the transformed data (X_transformed.csv).
- If --visualize is set and n_components=2, it scatter-plots the result.
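For readers curious how the flags in the command above might be handled, here is a minimal argparse sketch. The flag names come from the usage example; everything else (the build_parser and parse_columns helpers, the defaults, and the comma-splitting behavior) is an assumption, since the script's real argument handling is not shown here.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the usage example (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Dimensionality reduction (illustrative sketch)"
    )
    parser.add_argument("--model_module", required=True)
    parser.add_argument("--data_path", required=True)
    parser.add_argument("--select_columns", default=None)
    parser.add_argument("--visualize", action="store_true")
    return parser


def parse_columns(raw):
    """Split a comma-separated column string and trim surrounding whitespace."""
    if not raw:
        return []
    return [name.strip() for name in raw.split(",") if name.strip()]


# Parse the same arguments as the usage example instead of reading sys.argv.
args = build_parser().parse_args([
    "--model_module", "pca",
    "--data_path", "data/breast_cancer/data.csv",
    "--select_columns", "radius_mean, texture_mean, area_mean, smoothness_mean",
    "--visualize",
])
columns = parse_columns(args.select_columns)
print(args.model_module, columns, args.visualize)
```

Stripping whitespace after splitting means the spaces after each comma in the example do not end up in the column names.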