Dimensionality Reduction Models

This directory contains Python scripts defining dimensionality reduction techniques (e.g., PCA, t-SNE, UMAP). Each model file sets up a scikit-learn–compatible estimator, or one that follows a similar interface, so models can be swapped easily in train_dimred_model.py.
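As a rough illustration of how a model file could be swapped in by name, the sketch below writes a stand-in module to a temporary directory and imports it with importlib. The module name demo_pca, the attribute name estimator, and the helper load_estimator are all assumptions made for this example; the actual loading logic in train_dimred_model.py may differ.

```python
import importlib
import sys
import tempfile
from pathlib import Path

import numpy as np

# Stand-in for a model file. The `estimator` attribute name is an assumption.
MODEL_SOURCE = """
from sklearn.decomposition import PCA

estimator = PCA(n_components=2)
"""


def load_estimator(module_name, search_dir):
    """Hypothetical helper: import `module_name` from `search_dir`."""
    sys.path.insert(0, str(search_dir))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
    finally:
        sys.path.remove(str(search_dir))
    return module.estimator


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_pca.py").write_text(MODEL_SOURCE)
    estimator = load_estimator("demo_pca", tmp)

# Any module exposing the same attribute could be loaded the same way.
X = np.random.default_rng(0).normal(size=(20, 5))
X_2d = estimator.fit_transform(X)
print(type(estimator).__name__, X_2d.shape)
```

Because the caller only relies on the module exposing an object with .fit_transform, pointing the loader at a different module name is enough to change the technique.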

Key Points:

Estimator: Typically supports .fit_transform(X) for dimensionality reduction.
Default Settings: For example, PCA might default to n_components=2; t-SNE might set n_components=2 and perplexity=30; UMAP might define n_neighbors=15 or n_components=2.
No Supervised Tuning: Hyperparameters are usually chosen based on interpretability or domain knowledge. If needed, they can be tuned manually or with a specialized metric.
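The shared .fit_transform(X) interface can be sketched with the two scikit-learn estimators mentioned above, using the example defaults from the key points. The random input data and the reducers dictionary are illustrative assumptions, not part of the repository.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic data for illustration only: 100 samples, 10 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))

# Example defaults taken from the key points above.
reducers = {
    "pca": PCA(n_components=2),
    "tsne": TSNE(n_components=2, perplexity=30, random_state=42),
}
# UMAP lives in the separate umap-learn package, e.g.
# umap.UMAP(n_neighbors=15, n_components=2); it is left out here
# to avoid the extra dependency.

# Every reducer is driven through the same call.
embeddings = {name: reducer.fit_transform(X) for name, reducer in reducers.items()}
for name, embedding in embeddings.items():
    print(name, embedding.shape)
```

Note that t-SNE requires perplexity to be smaller than the number of samples, so perplexity=30 would need adjusting for very small datasets.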

Note: The train_dimred_model.py script handles dropping columns, label encoding, performing .fit_transform(X), and optionally saving a 2D/3D scatter plot if --visualize is used.
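The steps in the note can be approximated as follows. This is a minimal sketch under stated assumptions, not the script's actual code: the helper name reduce_dataframe, its parameters, and the toy DataFrame values are invented for illustration, and only the 2D plotting case is covered.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder


def reduce_dataframe(df, drop_columns=(), n_components=2, plot_path=None):
    """Hypothetical helper mirroring the steps described in the note."""
    # 1. Drop columns that should not feed the reducer (e.g. IDs).
    X = df.drop(columns=list(drop_columns))

    # 2. Label-encode non-numeric columns so the estimator sees numbers only.
    for col in X.select_dtypes(exclude="number").columns:
        X[col] = LabelEncoder().fit_transform(X[col].astype(str))

    # 3. Fit the reducer and project the data in one call.
    X_transformed = PCA(n_components=n_components).fit_transform(X)

    # 4. Optionally save a 2D scatter plot of the embedding.
    if plot_path is not None and n_components == 2:
        import matplotlib
        matplotlib.use("Agg")  # headless backend, no display needed
        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        ax.scatter(X_transformed[:, 0], X_transformed[:, 1], s=10)
        ax.set_xlabel("component 1")
        ax.set_ylabel("component 2")
        fig.savefig(plot_path)
        plt.close(fig)

    return X_transformed


# Toy values for illustration; not taken from the breast cancer dataset.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "diagnosis": ["M", "B", "B", "M", "B", "M"],
    "radius_mean": [17.9, 12.4, 11.8, 20.1, 13.0, 18.6],
    "texture_mean": [10.4, 15.7, 18.2, 21.3, 14.9, 19.8],
})
embedding = reduce_dataframe(df, drop_columns=["id"])
print(embedding.shape)
```

Passing a file path as plot_path would additionally write the scatter plot to disk.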

Available Dimensionality Reduction Models

Usage

To reduce data dimensions:

python scripts/train_dimred_model.py \
  --model_module pca \
  --data_path data/breast_cancer/data.csv \
  --select_columns "radius_mean, texture_mean, area_mean, smoothness_mean" \
  --visualize

This:

Loads pca.py, which defines a PCA(n_components=2) estimator by default.
Applies .fit_transform(...) to produce a 2D embedding.
Saves the model (dimred_model.pkl) and the transformed data (X_transformed.csv).
Saves a scatter plot of the result if --visualize is set and n_components=2.
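Once the artifacts exist, they could be reloaded to project new rows. The sketch below creates its own dimred_model.pkl and X_transformed.csv in a temporary directory so it runs standalone. It assumes the model is serialized with joblib and that the CSV has one column per component; the column names comp_1 and comp_2 are hypothetical, and the real script may use a different serializer or layout.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic data for illustration only: 50 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)

    # Fit and persist, mimicking the two output files named above.
    model = PCA(n_components=2)
    X_transformed = model.fit_transform(X)
    joblib.dump(model, out_dir / "dimred_model.pkl")
    pd.DataFrame(X_transformed, columns=["comp_1", "comp_2"]).to_csv(
        out_dir / "X_transformed.csv", index=False
    )

    # Reload both artifacts, as a later analysis step might.
    restored = joblib.load(out_dir / "dimred_model.pkl")
    saved = pd.read_csv(out_dir / "X_transformed.csv")

# Project previously unseen rows with the restored model.
X_new = rng.normal(size=(5, 4))
X_new_2d = restored.transform(X_new)
print(saved.shape, X_new_2d.shape)
```

This reuse pattern only applies to estimators that implement .transform, such as PCA; scikit-learn's TSNE offers only .fit_transform, so a saved t-SNE model cannot project new rows this way.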