Spaces:

mboukabous
/

train_unsupervised

Sleeping

App Files Files Community

train_unsupervised / models /unsupervised /dimred /README.md

mboukabous

first commit

4c91838 about 1 year ago

preview code

raw

history blame contribute delete

1.59 kB

	# Dimensionality Reduction Models

	This directory contains Python scripts defining dimensionality reduction techniques (e.g., PCA, t-SNE, UMAP). Each model file sets up a scikit-learn–compatible estimator or follows a similar interface, making it easy to swap in `train_dimred_model.py`.

	Key Points:
	- Estimator: Typically supports `.fit_transform(X)` for dimension reduction.
	- Default Settings: e.g., PCA might default to `n_components=2`; t-SNE might set `n_components=2` and `perplexity=30`; UMAP might define `n_neighbors=15` or `n_components=2`.
	- No Supervised Tuning: Usually we pick hyperparameters based on interpretability or domain. A manual approach or specialized metric can be used if needed.

	Note: The `train_dimred_model.py` script handles dropping columns, label encoding, performing `.fit_transform(X)`, and optionally saving a 2D/3D scatter plot if `--visualize` is used.

	## Available Dimensionality Reduction Models

	- [PCA](pca.py)
	- [t-SNE](tsne.py)
	- [UMAP](umap.py)

	### Usage

	To reduce data dimensions:

	```bash
	python scripts/train_dimred_model.py \
	--model_module pca \
	--data_path data/breast_cancer/data.csv \
	--select_columns "radius_mean, texture_mean, area_mean, smoothness_mean" \
	--visualize
	```

	This:
	1. Loads `pca.py`, which defines a `PCA(n_components=2)` estimator by default.
	2. Applies `.fit_transform(...)` to produce a 2D embedding.
	3. Saves the model (`dimred_model.pkl`) and the transformed data (`X_transformed.csv`).
	4. If `--visualize` is set and `n_components=2`, it scatter-plots the result.