{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Library Practice: Scikit-Learn (Utilities)\n",
"\n",
"While we've covered many algorithms, Scikit-Learn also provides vital utilities for data splitting, pipelines, and hyperparameter tuning.\n",
"\n",
"### Resources:\n",
"Refer to the **[Machine Learning Guide](https://aashishgarg13.github.io/DataScience/ml_complete-all-topics/)** on your hub for conceptual workflows of cross-validation and preprocessing.\n",
"\n",
"### Objectives:\n",
"1. **Train-Test Split**: Dividing data for validation.\n",
"2. **Pipelines**: Chaining preprocessing and modeling.\n",
"3. **Cross-Validation**: Robust model evaluation.\n",
"4. **Grid Search**: Automated hyperparameter tuning.\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Data Splitting\n",
"\n",
"### Task 1: Scaled Split\n",
"Using the provided data, split it into 70% train and 30% test, ensuring the split is reproducible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"from sklearn.datasets import make_classification\n",
"\n",
"X, y = make_classification(n_samples=1000, n_features=10, random_state=42)\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
"print(f\"Train size: {len(X_train)}, Test size: {len(X_test)}\")\n",
"```\n",
" "
]
},
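{
"cell_type": "markdown",
"metadata": {},
"source": [
"A related option, shown as a minimal sketch rather than part of the task: `train_test_split` accepts a `stratify` argument that keeps the class proportions roughly equal in both subsets. The `_s` suffixed names are illustrative only, and the data is regenerated so the snippet runs on its own.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = make_classification(n_samples=1000, n_features=10, random_state=42)\n",
"\n",
"# stratify=y preserves the class balance of y in the train and test subsets\n",
"X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(\n",
"    X, y, test_size=0.3, random_state=42, stratify=y\n",
")\n",
"print(\"Train class balance:\", y_train_s.mean())\n",
"print(\"Test class balance:\", y_test_s.mean())\n",
"```"
]
},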
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Pipelines\n",
"\n",
"### Task 2: Create a Pipeline\n",
"Build a pipeline that combines `StandardScaler` and `LogisticRegression`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"pipeline = Pipeline([\n",
" ('scaler', StandardScaler()),\n",
" ('model', LogisticRegression())\n",
"])\n",
"pipeline.fit(X_train, y_train)\n",
"print(\"Model Score:\", pipeline.score(X_test, y_test))\n",
"```\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Cross-Validation\n",
"\n",
"### Task 3: 5-Fold Evaluation\n",
"Evaluate a `RandomForestClassifier` using 5-fold cross-validation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"rf = RandomForestClassifier(n_estimators=100)\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"scores = cross_val_score(rf, X, y, cv=5)\n",
"print(\"Cross-validation scores:\", scores)\n",
"print(\"Mean accuracy:\", scores.mean())\n",
"```\n",
" "
]
},
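{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sections 2 and 3 can be combined. The following is a minimal sketch, assuming the same synthetic data as above: passing a whole pipeline to `cross_val_score` refits the scaler on each training fold, so the held-out fold never influences the scaling. The names `cv_pipeline` and `cv_scores` are illustrative only.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import cross_val_score\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"X, y = make_classification(n_samples=1000, n_features=10, random_state=42)\n",
"\n",
"# The scaler is a pipeline step, so it is fitted inside each training fold\n",
"cv_pipeline = Pipeline([\n",
"    ('scaler', StandardScaler()),\n",
"    ('model', LogisticRegression())\n",
"])\n",
"cv_scores = cross_val_score(cv_pipeline, X, y, cv=5)\n",
"print(\"Mean accuracy:\", cv_scores.mean())\n",
"```"
]
},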
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Hyperparameter Tuning\n",
"\n",
"### Task 4: Grid Search\n",
"Use `GridSearchCV` to find the best `max_depth` (3, 5, 10, None) for a Decision Tree."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"\n",
"dt = DecisionTreeClassifier()\n",
"params = {'max_depth': [3, 5, 10, None]}\n",
"\n",
"# YOUR CODE HERE\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Click to see Solution
\n",
"\n",
"```python\n",
"grid = GridSearchCV(dt, params, cv=5)\n",
"grid.fit(X, y)\n",
"print(\"Best parameters:\", grid.best_params_)\n",
"print(\"Best score:\", grid.best_score_)\n",
"```\n",
" "
]
},
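{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grid search also works on a pipeline. The sketch below assumes the scaler and logistic regression steps from Section 2. Parameters of a step are addressed as `<step name>__<parameter>`, so `model__C` refers to `C` on the step named `model`. The candidate values for `C` and the names `grid_pipeline` and `search` are illustrative assumptions, not recommended settings.\n",
"\n",
"```python\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"X, y = make_classification(n_samples=1000, n_features=10, random_state=42)\n",
"\n",
"grid_pipeline = Pipeline([\n",
"    ('scaler', StandardScaler()),\n",
"    ('model', LogisticRegression())\n",
"])\n",
"\n",
"# 'model__C' targets the C parameter of the 'model' step\n",
"param_grid = {'model__C': [0.1, 1.0, 10.0]}\n",
"search = GridSearchCV(grid_pipeline, param_grid, cv=5)\n",
"search.fit(X, y)\n",
"print(\"Best parameters:\", search.best_params_)\n",
"print(\"Best score:\", search.best_score_)\n",
"```"
]
},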
{
"cell_type": "markdown",
"metadata": {},
"source": [
"--- \n",
"### Excellent Utility Practice! \n",
"Using these tools ensures your ML experiments are robust and organized. \n",
"You have now covered all the core libraries!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}