{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Library Practice: Scikit-Learn (Utilities)\n", "\n", "Beyond its algorithms, Scikit-Learn also provides utilities for data splitting, pipelines, cross-validation, and hyperparameter tuning.\n", "\n", "### Resources:\n", "Refer to the **[Machine Learning Guide](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for conceptual workflows of cross-validation and preprocessing.\n", "\n", "### Objectives:\n", "1. **Train-Test Split**: Holding out part of the data for evaluation.\n", "2. **Pipelines**: Chaining preprocessing and modeling.\n", "3. **Cross-Validation**: Robust model evaluation.\n", "4. **Grid Search**: Automated hyperparameter tuning.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data Splitting\n", "\n", "### Task 1: Reproducible Split\n", "Using the provided data, split it into 70% training and 30% test data, fixing `random_state` so the split is reproducible." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "from sklearn.datasets import make_classification\n", "\n", "X, y = make_classification(n_samples=1000, n_features=10, random_state=42)\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n", "print(f\"Train size: {len(X_train)}, Test size: {len(X_test)}\")\n", "```\n", "
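A possible variation on the split above, not part of the original task: passing `stratify=y` to `train_test_split` keeps the class proportions of `y` roughly the same in both subsets, which can matter when labels are imbalanced. The sketch below assumes the same synthetic data as the task cell.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same synthetic data as in the task cell
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# stratify=y preserves the class ratio of y in both subsets;
# random_state keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f'Train size: {len(X_train)}, Test size: {len(X_test)}')
print('Class share (train):', np.bincount(y_train) / len(y_train))
print('Class share (test): ', np.bincount(y_test) / len(y_test))
```

Comparing the two printed class shares is a quick sanity check that the split did not skew the label distribution.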
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Model Pipelines\n", "\n", "### Task 2: Create a Pipeline\n", "Build a pipeline that combines `StandardScaler` and `LogisticRegression`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.pipeline import Pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.linear_model import LogisticRegression\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "pipeline = Pipeline([\n", " ('scaler', StandardScaler()),\n", " ('model', LogisticRegression())\n", "])\n", "pipeline.fit(X_train, y_train)\n", "print(\"Model Score:\", pipeline.score(X_test, y_test))\n", "```\n", "
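To see what the pipeline did during `fit`, the fitted steps can be inspected through `named_steps`. This is a minimal sketch, assuming the step names `scaler` and `model` from the solution above; it checks that the scaler statistics were learned from the training split only, so the test split does not influence preprocessing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Recreate the data and the split from Task 1
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Each fitted step is reachable by the name given in the Pipeline
scaler = pipeline.named_steps['scaler']
model = pipeline.named_steps['model']

# The scaler means should equal the per-feature means of X_train
means_match = np.allclose(scaler.mean_, X_train.mean(axis=0))
print('Scaler fitted on training data only:', means_match)
print('Coefficient array shape:', model.coef_.shape)
```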
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Cross-Validation\n", "\n", "### Task 3: 5-Fold Evaluation\n", "Evaluate a `RandomForestClassifier` using 5-fold cross-validation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import cross_val_score\n", "from sklearn.ensemble import RandomForestClassifier\n", "\n", "rf = RandomForestClassifier(n_estimators=100)\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "scores = cross_val_score(rf, X, y, cv=5)\n", "print(\"Cross-validation scores:\", scores)\n", "print(\"Mean accuracy:\", scores.mean())\n", "```\n", "
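For a classifier, `cv=5` uses stratified folds without shuffling, and an unseeded random forest gives slightly different scores on every run. The sketch below is one way to make the evaluation repeatable, assuming an explicit `StratifiedKFold` splitter and a fixed `random_state` on both the splitter and the model; the seed value 42 is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Seed the forest so repeated runs build the same trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Explicit splitter: shuffled, stratified, and seeded
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = cross_val_score(rf, X, y, cv=cv)
print('Fold scores:', scores)
print(f'Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})')
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the score depends on the particular fold.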
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Hyperparameter Tuning\n", "\n", "### Task 4: Grid Search\n", "Use `GridSearchCV` to find the best `max_depth` (3, 5, 10, None) for a Decision Tree." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV\n", "from sklearn.tree import DecisionTreeClassifier\n", "\n", "dt = DecisionTreeClassifier()\n", "params = {'max_depth': [3, 5, 10, None]}\n", "\n", "# YOUR CODE HERE\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Click to see Solution\n", "\n", "```python\n", "grid = GridSearchCV(dt, params, cv=5)\n", "grid.fit(X, y)\n", "print(\"Best parameters:\", grid.best_params_)\n", "print(\"Best score:\", grid.best_score_)\n", "```\n", "
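The utilities in this notebook compose: a `Pipeline` can be passed to `GridSearchCV`, with parameters of a step addressed as `<step>__<parameter>`. The following sketch combines Task 2 and Task 4 under that convention. The grid of `C` values is an arbitrary illustration, not part of the exercise, and the search is fit on the training split so the test split stays untouched until the final score.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression()),
])

# 'model__C' targets the C parameter of the step named 'model'
# (illustrative values, chosen for this sketch)
param_grid = {'model__C': [0.01, 0.1, 1.0, 10.0]}

# The scaler is refit inside every CV fold, avoiding leakage
grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X_train, y_train)

print('Best parameters:', grid.best_params_)
print('Best CV score:', grid.best_score_)
# With the default refit=True, score() uses the best refitted pipeline
print('Held-out test score:', grid.score(X_test, y_test))
```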
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "### Excellent Utility Practice! \n", "Using these tools ensures your ML experiments are robust and organized. \n", "You have now covered all the core libraries!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.7" } }, "nbformat": 4, "nbformat_minor": 4 }