third iterate models

Files changed (5)

README.md +26 -13
fit_model.ipynb +147 -0
fit_models.ipynb +1045 -0
models/xgboost_third_model.json +0 -0
models/xgboost_third_model_not_2025.json +0 -0

README.md CHANGED

@@ -26,10 +26,13 @@ The 24 features are of the characteristics:
 1. Sound frequency percentiles
     - https://www.tandfonline.com/doi/full/10.1080/09524622.2020.1730241
     - quantiles = [0.8,0.9,0.925,0.95,0.975,0.99,]
-2. 13 Mel spectrogram cepstral coefficients
-    - Summary statistics of zero crossing rates in 1-second segments
     - Mean, standard deviation, min, max
-Zero crossing rate of the entire clip
 These features summarize an entire clip, irrespective of position in waveform or spectrogram, and technically, the clip does not have to be 5 seconds long.
@@ -74,13 +77,13 @@ First model iterate
 1. Fit decision tree-based classifiers to Freefield and Warblrb10k
     - The Warblrb10k data is about 3/4 does have bird
     - The Freefield data is about 1/4 does not have bird
-    - I prioritized simpler models, as it is very easy to overfit to predict  dataset
 2. Grid search with 25% test and 75% training splits (averaging over 5 randomizations)
-    - RandomForestClassifier, GradientBoostingClassifier, XGBoostClassifier
     - n_estimators: [10, 20, 50,]
     - max_depth: [5, 10, 20,]
     - I saved the results in the following file
-3. I chose to use the XGBoostClassifier with n_estimators=20 and max_depth=5
     - This simpler model does not have too large a gap between training and test metrics
     - The test accuracy is
     - The test precision is
@@ -90,24 +93,34 @@ First model iterate
 Second model iterate
 ---
-1. Fit XGBoost classifier to all heretofore mentioned data
 2. Use first model iterate to predict "hasbird" in Birdclef data
     - Apply zero padding to the Birdclef data if the final clip is longer than 2 seconds
     - Subset Birdclef data to those with
     - Predicted presence > 0.75, or
     - Audio file duration <= 15 seconds, or
     - Amphibia, Insecta, Mammalia as 0 in 2025 data
-3. Grid search with 25% test and 75% training splits (averaging over 10 randomizations)
-n_estimators: [10, 20, 50,]
-max_depth: [5, 10, 20,]
 Third model iterate
 ---
-1. Fit XGBoostClassifier to all heretofore mentioned data
 2. Use the second model iterate to predict "hasbird" in Birdclef data
-    - Subset Birdclef data to those wth
-    - Predicted presence >
     - Amphibia, Insecta, Mammalia as 0 in 2025 data
 Non-2025 model

 1. Sound frequency percentiles
     - https://www.tandfonline.com/doi/full/10.1080/09524622.2020.1730241
     - quantiles = [0.8,0.9,0.925,0.95,0.975,0.99,]
+2. Thirteen Mel-frequency cepstral coefficients (MFCCs)
+    - Averaged over axis 1 (columns, i.e. across time frames)
+    - n_fft=2048, hop_length=512
+3. Summary statistics of zero crossing rates in 1-second segments
     - Mean, standard deviation, min, max
+    - Zero crossing rate of the entire clip
+    - Threshold of 0.02
 These features summarize an entire clip, irrespective of position in waveform or spectrogram, and technically, the clip does not have to be 5 seconds long.
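The zero-crossing features in item 3 could be computed roughly as follows. This is a minimal pure-Python sketch, not the repository's implementation: the function names are hypothetical, and it assumes the 0.02 threshold means that samples with magnitude at or below 0.02 are treated as zero before sign changes are counted. Under that reading, item 3 yields five values (four summary statistics plus the whole-clip rate), which would be consistent with the 24-feature total (6 percentiles + 13 coefficients + 5).

```python
import statistics


def zero_crossing_rate(samples, threshold=0.02):
    """Fraction of adjacent sample pairs whose sign differs.

    Assumption: samples with |x| <= threshold count as zero (non-negative).
    """
    if len(samples) < 2:
        return 0.0
    negative = [x < 0 and abs(x) > threshold for x in samples]
    crossings = sum(a != b for a, b in zip(negative, negative[1:]))
    return crossings / (len(samples) - 1)


def zcr_features(samples, sample_rate, threshold=0.02):
    """Return [mean, std, min, max] of per-second rates plus the whole-clip rate."""
    # Cut the clip into 1-second segments (the last one may be shorter).
    segments = [samples[i:i + sample_rate]
                for i in range(0, len(samples), sample_rate)]
    rates = [zero_crossing_rate(seg, threshold)
             for seg in segments if len(seg) >= 2]
    return [
        statistics.fmean(rates),
        statistics.pstdev(rates),
        min(rates),
        max(rates),
        zero_crossing_rate(samples, threshold),
    ]
```

Because the segments are cut by sample count, the same function works on clips of any length of at least two samples, which matches the note that a clip does not have to be 5 seconds long.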
 1. Fit decision tree-based classifiers to Freefield and Warblrb10k
     - The Warblrb10k data is about 3/4 does have bird
     - The Freefield data is about 1/4 does not have bird
+    - No data augmentation
 2. Grid search with 25% test and 75% training splits (averaging over 5 randomizations)
+    - `RandomForestClassifier`, `GradientBoostingClassifier`, `XGBClassifier`
     - n_estimators: [10, 20, 50,]
     - max_depth: [5, 10, 20,]
     - I saved the results in the following file
+3. I chose to use the `XGBClassifier` with n_estimators=20 and max_depth=5
     - This simpler model does not have too large a gap between training and test metrics
     - The test accuracy is
     - The test precision is
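The grid search over repeated random splits described in item 2 can be sketched as below. This is an illustrative outline only: `score_fn` is a hypothetical callback that would fit a classifier on the training indices and return one test metric, and the helper name and seed handling are assumptions rather than the notebook's code.

```python
import itertools
import random
from collections import defaultdict


def repeated_split_grid_search(n_samples, grid, score_fn,
                               num_fits=5, test_size=0.25, seed=0):
    """Average score_fn over num_fits random train/test splits per grid point."""
    rng = random.Random(seed)
    # Expand the grid into one dict per parameter combination.
    combos = [dict(zip(grid, values))
              for values in itertools.product(*grid.values())]
    n_test = round(n_samples * test_size)
    scores = defaultdict(list)
    for _ in range(num_fits):
        # One fresh random split, shared by every combination in this round.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        for params in combos:
            key = tuple(sorted(params.items()))
            scores[key].append(score_fn(params, train_idx, test_idx))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}
```

Sharing each split across all combinations keeps the comparison between hyperparameter settings paired, so differences in the averaged metric are less affected by which rows happened to land in the test set.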
 Second model iterate
 ---
+1. Fit `XGBClassifier` to all heretofore mentioned data
 2. Use first model iterate to predict "hasbird" in Birdclef data
     - Apply zero padding to the Birdclef data if the final clip is longer than 2 seconds
     - Subset Birdclef data to those with
     - Predicted presence > 0.75, or
     - Audio file duration <= 15 seconds, or
     - Amphibia, Insecta, Mammalia as 0 in 2025 data
+3. Five augmented instances for each file
+    - Use `OneOf` in audiomentations to apply one of the following transforms
+    - AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=1.)
+    - AddGaussianSNR(min_snr_db=5.0, max_snr_db=40.0, p=1.)
+    - AddColorNoise(min_snr_db=5.0, max_snr_db=40.0, n_fft=128, p=1.)
+4. Grid search with 25% test and 75% training splits (averaging over 10 randomizations)
+    - n_estimators: [10, 20, 50,]
+    - max_depth: [5, 10, 20,]
+5. I chose the `XGBClassifier` with n_estimators=50 and max_depth=10
+    - The test accuracy is
+    - The test precision is
+    - The test recall is
+    - The test AUROC is
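To make the augmentation in item 3 concrete, here is a rough pure-Python stand-in for "pick one noise transform per copy, five copies per file". It does not call audiomentations and is not its implementation: the function names are hypothetical, only the two Gaussian variants are sketched (colored noise is omitted), and the noise scaling is an assumption based on the parameter names.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a clip."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def add_gaussian_noise(samples, rng, min_amplitude=0.001, max_amplitude=0.015):
    # Assumption: the noise standard deviation is drawn uniformly from the range.
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    return [x + rng.gauss(0.0, amplitude) for x in samples]


def add_gaussian_snr(samples, rng, min_snr_db=5.0, max_snr_db=40.0):
    # Assumption: noise is scaled so signal RMS / noise RMS matches a random SNR in dB.
    snr_db = rng.uniform(min_snr_db, max_snr_db)
    noise_std = rms(samples) / (10 ** (snr_db / 20))
    return [x + rng.gauss(0.0, noise_std) for x in samples]


def augment(samples, n_copies=5, seed=0):
    """Return n_copies noisy versions, each made by one randomly chosen transform."""
    rng = random.Random(seed)
    transforms = [add_gaussian_noise, add_gaussian_snr]
    return [rng.choice(transforms)(samples, rng) for _ in range(n_copies)]
```

Each augmented copy would then go through the same feature extraction as the original clip before being added to the training table.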
 Third model iterate
 ---
+1. Fit `XGBClassifier` to all heretofore mentioned data
 2. Use the second model iterate to predict "hasbird" in Birdclef data
+3. Subset Birdclef data to those with
+    - Predicted presence > 0.90
     - Amphibia, Insecta, Mammalia as 0 in 2025 data
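One possible reading of the subsetting rule in item 3 is sketched below. The record keys `class_name` and `p_hasbird` are hypothetical, and the interpretation is an assumption: clips from the three non-bird taxa are kept as negatives, and any other clip is kept as a positive only when the previous model's predicted presence exceeds 0.90. The sketch does not model the restriction of the taxon rule to the 2025 data.

```python
NON_BIRD_CLASSES = {"Amphibia", "Insecta", "Mammalia"}


def pseudo_label(records, threshold=0.90):
    """Select clips for the next training round and attach a 'hasbird' label.

    records: iterable of dicts with hypothetical keys
    'class_name' (taxonomic class) and 'p_hasbird' (predicted presence).
    """
    selected = []
    for rec in records:
        if rec["class_name"] in NON_BIRD_CLASSES:
            # Non-bird taxa are treated as known negatives.
            selected.append({**rec, "hasbird": 0})
        elif rec["p_hasbird"] > threshold:
            # Confident predictions are treated as positives.
            selected.append({**rec, "hasbird": 1})
        # Everything else is left out of the next training set.
    return selected
```

Raising the threshold from 0.75 to 0.90 trades fewer pseudo-labeled positives for a lower chance of feeding mislabeled clips back into training.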
 Non-2025 model

fit_model.ipynb ADDED

	@@ -0,0 +1,147 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2b5565b0-0eda-4961-9065-b3e56d683baa",
+   "metadata": {
+    "editable": true,
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:54.063042Z",
+     "iopub.status.busy": "2025-08-21T22:03:54.062823Z",
+     "iopub.status.idle": "2025-08-21T22:03:54.067443Z",
+     "shell.execute_reply": "2025-08-21T22:03:54.066939Z"
+    },
+    "papermill": {
+     "duration": 0.013166,
+     "end_time": "2025-08-21T22:03:54.068798",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:54.055632",
+     "status": "completed"
+    },
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": [
+     "parameters"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "input_file = None\n",
+    "output_file = None\n",
+    "n_estimators = None\n",
+    "max_depth = None\n",
+    "random_state = None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3c36001f-b354-4457-95b4-01953533dbaa",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:54.091554Z",
+     "iopub.status.busy": "2025-08-21T22:03:54.091171Z",
+     "iopub.status.idle": "2025-08-21T22:03:57.814895Z",
+     "shell.execute_reply": "2025-08-21T22:03:57.814474Z"
+    },
+    "papermill": {
+     "duration": 3.728978,
+     "end_time": "2025-08-21T22:03:57.816040",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:54.087062",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import xgboost as xgb\n",
+    "import sklearn"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "82f8195d-236a-4288-89fc-4952b377f0cc",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:57.825272Z",
+     "iopub.status.busy": "2025-08-21T22:03:57.824333Z",
+     "iopub.status.idle": "2025-08-21T22:04:08.809191Z",
+     "shell.execute_reply": "2025-08-21T22:04:08.808791Z"
+    },
+    "papermill": {
+     "duration": 10.990556,
+     "end_time": "2025-08-21T22:04:08.810302",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:57.819746",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# load data\n",
+    "combined = pd.read_csv(input_file,low_memory=False)\n",
+    "X = combined[[f'feature_{i}' for i in range(24)]].to_numpy()\n",
+    "y = combined['hasbird'].to_numpy()\n",
+    "del combined"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "466457c2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# define, fit, and save model\n",
+    "model = xgb.XGBClassifier(n_estimators=int(n_estimators),\n",
+    "                          max_depth=int(max_depth),\n",
+    "                          random_state=int(random_state))\n",
+    "model.fit(X, y)\n",
+    "model.save_model(output_file)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Birdclef",
+   "language": "python",
+   "name": "birdclef"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.11"
+  },
+  "papermill": {
+   "default_parameters": {},
+   "duration": 3943.115939,
+   "end_time": "2025-08-21T23:09:35.210970",
+   "environment_variables": {},
+   "exception": null,
+   "input_path": "fit_model.ipynb",
+   "output_path": "ran/fit_model.ipynb",
+   "parameters": {
+    "input_file": "xgb_rnd3_next.csv",
+    "output_file": "third_model_results.csv"
+   },
+   "start_time": "2025-08-21T22:03:52.095031",
+   "version": "2.6.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

fit_models.ipynb ADDED

	@@ -0,0 +1,1045 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "2b5565b0-0eda-4961-9065-b3e56d683baa",
+   "metadata": {
+    "editable": true,
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:54.063042Z",
+     "iopub.status.busy": "2025-08-21T22:03:54.062823Z",
+     "iopub.status.idle": "2025-08-21T22:03:54.067443Z",
+     "shell.execute_reply": "2025-08-21T22:03:54.066939Z"
+    },
+    "papermill": {
+     "duration": 0.013166,
+     "end_time": "2025-08-21T22:03:54.068798",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:54.055632",
+     "status": "completed"
+    },
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": [
+     "parameters"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "input_file = None\n",
+    "output_file = None"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "2af28fa1",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:54.076410Z",
+     "iopub.status.busy": "2025-08-21T22:03:54.076201Z",
+     "iopub.status.idle": "2025-08-21T22:03:54.082455Z",
+     "shell.execute_reply": "2025-08-21T22:03:54.082034Z"
+    },
+    "papermill": {
+     "duration": 0.011177,
+     "end_time": "2025-08-21T22:03:54.083645",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:54.072468",
+     "status": "completed"
+    },
+    "tags": [
+     "injected-parameters"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "# Parameters\n",
+    "input_file = \"xgb_rnd3_next.csv\"\n",
+    "output_file = \"third_model_results.csv\"\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "3c36001f-b354-4457-95b4-01953533dbaa",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:54.091554Z",
+     "iopub.status.busy": "2025-08-21T22:03:54.091171Z",
+     "iopub.status.idle": "2025-08-21T22:03:57.814895Z",
+     "shell.execute_reply": "2025-08-21T22:03:57.814474Z"
+    },
+    "papermill": {
+     "duration": 3.728978,
+     "end_time": "2025-08-21T22:03:57.816040",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:54.087062",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import os\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import itertools\n",
+    "import copy\n",
+    "\n",
+    "import xgboost as xgb\n",
+    "import sklearn\n",
+    "from sklearn.model_selection import train_test_split, GridSearchCV\n",
+    "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc, precision_recall_curve"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "82f8195d-236a-4288-89fc-4952b377f0cc",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:03:57.825272Z",
+     "iopub.status.busy": "2025-08-21T22:03:57.824333Z",
+     "iopub.status.idle": "2025-08-21T22:04:08.809191Z",
+     "shell.execute_reply": "2025-08-21T22:04:08.808791Z"
+    },
+    "papermill": {
+     "duration": 10.990556,
+     "end_time": "2025-08-21T22:04:08.810302",
+     "exception": false,
+     "start_time": "2025-08-21T22:03:57.819746",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# combined = pd.read_csv('xgb_rnd3_next.csv',low_memory=False)\n",
+    "combined = pd.read_csv(input_file,low_memory=False)\n",
+    "X = combined[[f'feature_{i}' for i in range(24)]].to_numpy()\n",
+    "y = combined['hasbird'].to_numpy()\n",
+    "del combined"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "235a0d9e-5476-43e7-ada3-96f9fe1254ff",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:04:08.819492Z",
+     "iopub.status.busy": "2025-08-21T22:04:08.818725Z",
+     "iopub.status.idle": "2025-08-21T22:04:08.822610Z",
+     "shell.execute_reply": "2025-08-21T22:04:08.822231Z"
+    },
+    "papermill": {
+     "duration": 0.009105,
+     "end_time": "2025-08-21T22:04:08.823514",
+     "exception": false,
+     "start_time": "2025-08-21T22:04:08.814409",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "grid_params = {\n",
+    "    'model': [\n",
+    "        xgb.XGBClassifier,\n",
+    "    ],\n",
+    "    'n_estimators': [10, 20, 50,],\n",
+    "    'max_depth': [5, 10, 20,],\n",
+    "    # 'n_estimators': [5,],\n",
+    "    # 'max_depth': [2, 5,],\n",
+    "}\n",
+    "param_combos = list(itertools.product(*grid_params.values()))\n",
+    "param_combos = [{k: v for k, v in zip(grid_params.keys(), combination)} for combination in param_combos]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "7aeeb04b-9884-4780-b17b-882ac06316dc",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T22:04:08.831773Z",
+     "iopub.status.busy": "2025-08-21T22:04:08.831007Z",
+     "iopub.status.idle": "2025-08-21T23:09:34.542148Z",
+     "shell.execute_reply": "2025-08-21T23:09:34.541580Z"
+    },
+    "papermill": {
+     "duration": 3925.716838,
+     "end_time": "2025-08-21T23:09:34.543697",
+     "exception": false,
+     "start_time": "2025-08-21T22:04:08.826859",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 0\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 1\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 2\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 3\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 4\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 5\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 6\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 7\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 8\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Fit 9\n"
+     ]
+    },
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>model</th>\n",
+       "      <th>n_estimators</th>\n",
+       "      <th>max_depth</th>\n",
+       "      <th>random_state</th>\n",
+       "      <th>test_size</th>\n",
+       "      <th>random_state_split</th>\n",
+       "      <th>test_accuracy</th>\n",
+       "      <th>test_precision</th>\n",
+       "      <th>test_recall</th>\n",
+       "      <th>test_f1</th>\n",
+       "      <th>...</th>\n",
+       "      <th>var14</th>\n",
+       "      <th>var15</th>\n",
+       "      <th>var16</th>\n",
+       "      <th>var17</th>\n",
+       "      <th>var18</th>\n",
+       "      <th>var19</th>\n",
+       "      <th>var20</th>\n",
+       "      <th>var21</th>\n",
+       "      <th>var22</th>\n",
+       "      <th>var23</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>62618</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>8564</td>\n",
+       "      <td>0.919485</td>\n",
+       "      <td>0.979252</td>\n",
+       "      <td>0.930664</td>\n",
+       "      <td>0.954340</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.016507</td>\n",
+       "      <td>0.049293</td>\n",
+       "      <td>0.008913</td>\n",
+       "      <td>0.008224</td>\n",
+       "      <td>0.007370</td>\n",
+       "      <td>0.017610</td>\n",
+       "      <td>0.001872</td>\n",
+       "      <td>0.015215</td>\n",
+       "      <td>0.010002</td>\n",
+       "      <td>0.011387</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>38092</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>29471</td>\n",
+       "      <td>0.916542</td>\n",
+       "      <td>0.979042</td>\n",
+       "      <td>0.927811</td>\n",
+       "      <td>0.952738</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.014524</td>\n",
+       "      <td>0.041489</td>\n",
+       "      <td>0.009664</td>\n",
+       "      <td>0.007179</td>\n",
+       "      <td>0.010653</td>\n",
+       "      <td>0.015027</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.012734</td>\n",
+       "      <td>0.009035</td>\n",
+       "      <td>0.009792</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>53379</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>53105</td>\n",
+       "      <td>0.920031</td>\n",
+       "      <td>0.977984</td>\n",
+       "      <td>0.932223</td>\n",
+       "      <td>0.954555</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.029407</td>\n",
+       "      <td>0.042983</td>\n",
+       "      <td>0.011433</td>\n",
+       "      <td>0.007371</td>\n",
+       "      <td>0.007829</td>\n",
+       "      <td>0.019937</td>\n",
+       "      <td>0.006534</td>\n",
+       "      <td>0.014162</td>\n",
+       "      <td>0.008016</td>\n",
+       "      <td>0.010182</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>53990</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>1020</td>\n",
+       "      <td>0.920008</td>\n",
+       "      <td>0.977727</td>\n",
+       "      <td>0.932454</td>\n",
+       "      <td>0.954554</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.017415</td>\n",
+       "      <td>0.040748</td>\n",
+       "      <td>0.009362</td>\n",
+       "      <td>0.009092</td>\n",
+       "      <td>0.004017</td>\n",
+       "      <td>0.019394</td>\n",
+       "      <td>0.000000</td>\n",
+       "      <td>0.013799</td>\n",
+       "      <td>0.011235</td>\n",
+       "      <td>0.011108</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>20157</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>8650</td>\n",
+       "      <td>0.917305</td>\n",
+       "      <td>0.978153</td>\n",
+       "      <td>0.929241</td>\n",
+       "      <td>0.953070</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.019698</td>\n",
+       "      <td>0.052578</td>\n",
+       "      <td>0.011287</td>\n",
+       "      <td>0.006563</td>\n",
+       "      <td>0.005683</td>\n",
+       "      <td>0.013883</td>\n",
+       "      <td>0.008399</td>\n",
+       "      <td>0.010119</td>\n",
+       "      <td>0.008999</td>\n",
+       "      <td>0.007195</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>29087</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>15247</td>\n",
+       "      <td>0.920376</td>\n",
+       "      <td>0.978004</td>\n",
+       "      <td>0.932595</td>\n",
+       "      <td>0.954760</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.014503</td>\n",
+       "      <td>0.042829</td>\n",
+       "      <td>0.009559</td>\n",
+       "      <td>0.006757</td>\n",
+       "      <td>0.010323</td>\n",
+       "      <td>0.013950</td>\n",
+       "      <td>0.007816</td>\n",
+       "      <td>0.010859</td>\n",
+       "      <td>0.013720</td>\n",
+       "      <td>0.011215</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>XGBClassifier</td>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>63289</td>\n",
+       "      <td>0.25</td>\n",
+       "      <td>37405</td>\n",
+       "      <td>0.918176</td>\n",
+       "      <td>0.977817</td>\n",
+       "      <td>0.930484</td>\n",
+       "      <td>0.953563</td>\n",
+       "      <td>...</td>\n",
+       "      <td>0.016545</td>\n",
+       "      <td>0.049997</td>\n",
+       "      <td>0.009340</td>\n",
+       "      <td>0.006492</td>\n",
+       "      <td>0.005942</td>\n",
+       "      <td>0.013955</td>\n",
+       "      <td>0.005050</td>\n",
+       "      <td>0.008405</td>\n",
+       "      <td>0.009122</td>\n",
+       "      <td>0.010000</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "<p>7 rows × 42 columns</p>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "           model  n_estimators  max_depth  random_state  test_size  \\\n",
+       "0  XGBClassifier            10          5         62618       0.25   \n",
+       "1  XGBClassifier            10          5         38092       0.25   \n",
+       "2  XGBClassifier            10          5         53379       0.25   \n",
+       "3  XGBClassifier            10          5         53990       0.25   \n",
+       "4  XGBClassifier            10          5         20157       0.25   \n",
+       "5  XGBClassifier            10          5         29087       0.25   \n",
+       "6  XGBClassifier            10          5         63289       0.25   \n",
+       "\n",
+       "   random_state_split  test_accuracy  test_precision  test_recall   test_f1  \\\n",
+       "0                8564       0.919485        0.979252     0.930664  0.954340   \n",
+       "1               29471       0.916542        0.979042     0.927811  0.952738   \n",
+       "2               53105       0.920031        0.977984     0.932223  0.954555   \n",
+       "3                1020       0.920008        0.977727     0.932454  0.954554   \n",
+       "4                8650       0.917305        0.978153     0.929241  0.953070   \n",
+       "5               15247       0.920376        0.978004     0.932595  0.954760   \n",
+       "6               37405       0.918176        0.977817     0.930484  0.953563   \n",
+       "\n",
+       "   ...     var14     var15     var16     var17     var18     var19     var20  \\\n",
+       "0  ...  0.016507  0.049293  0.008913  0.008224  0.007370  0.017610  0.001872   \n",
+       "1  ...  0.014524  0.041489  0.009664  0.007179  0.010653  0.015027  0.000000   \n",
+       "2  ...  0.029407  0.042983  0.011433  0.007371  0.007829  0.019937  0.006534   \n",
+       "3  ...  0.017415  0.040748  0.009362  0.009092  0.004017  0.019394  0.000000   \n",
+       "4  ...  0.019698  0.052578  0.011287  0.006563  0.005683  0.013883  0.008399   \n",
+       "5  ...  0.014503  0.042829  0.009559  0.006757  0.010323  0.013950  0.007816   \n",
+       "6  ...  0.016545  0.049997  0.009340  0.006492  0.005942  0.013955  0.005050   \n",
+       "\n",
+       "      var21     var22     var23  \n",
+       "0  0.015215  0.010002  0.011387  \n",
+       "1  0.012734  0.009035  0.009792  \n",
+       "2  0.014162  0.008016  0.010182  \n",
+       "3  0.013799  0.011235  0.011108  \n",
+       "4  0.010119  0.008999  0.007195  \n",
+       "5  0.010859  0.013720  0.011215  \n",
+       "6  0.008405  0.009122  0.010000  \n",
+       "\n",
+       "[7 rows x 42 columns]"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# how many duplicated runs of the models\n",
+    "# for averaging results\n",
+    "num_fits = 10\n",
+    "test_size = 0.25\n",
+    "\n",
+    "param_dicts = []\n",
+    "report_dicts = []\n",
+    "fi_dicts = []\n",
+    "itr = 0\n",
+    "\n",
+    "for i in range(num_fits):\n",
+    "\n",
+    "    # randomize the train test split\n",
+    "    rsp = np.random.randint(0,2**16)\n",
+    "    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=rsp)\n",
+    "\n",
+    "    print(f\"Fit {i}\")\n",
+    "    \n",
+    "    for combo in param_combos:\n",
+    "    \n",
+    "        itr += 1\n",
+    "    \n",
+    "        # separate the model class from its keyword arguments\n",
+    "        rs = np.random.randint(0,2**16)\n",
+    "        d = {k:v for k,v in combo.items() if k!='model'}\n",
+    "        d['random_state'] = rs\n",
+    "    \n",
+    "        # fit the model\n",
+    "        model = combo['model'](**d)\n",
+    "        model.fit(X_train, y_train)\n",
+    "        y_test_pred = model.predict(X_test)\n",
+    "        y_test_pred_proba = model.predict_proba(X_test)\n",
+    "        y_train_pred = model.predict(X_train)\n",
+    "        y_train_pred_proba = model.predict_proba(X_train)\n",
+    "    \n",
+    "        # parameters dictionary\n",
+    "        param_dict = copy.deepcopy(d)\n",
+    "        param_dict['model'] = type(model).__name__\n",
+    "        param_dict['unique_id'] = itr\n",
+    "        param_dict['test_size'] = test_size\n",
+    "        param_dict['random_state_split'] = rsp\n",
+    "        param_dicts.append(param_dict)\n",
+    "    \n",
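+    "        # report dictionary to compute and save\n",
+    "        # NOTE: sklearn metrics expect (y_true, y_pred); with the arguments passed as\n",
+    "        # (y_pred, y_true) below, the precision and recall values are swapped\n",
+    "        # (accuracy and F1 are symmetric and unaffected)\n",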
+    "        report_dict = {\n",
+    "            'unique_id' : itr,\n",
+    "            'test_accuracy' : sklearn.metrics.accuracy_score(y_test_pred, y_test),\n",
+    "            'test_precision' : sklearn.metrics.precision_score(y_test_pred, y_test),\n",
+    "            'test_recall' : sklearn.metrics.recall_score(y_test_pred, y_test),\n",
+    "            'test_f1' : sklearn.metrics.f1_score(y_test_pred, y_test),\n",
+    "            'test_auroc' : sklearn.metrics.roc_auc_score(y_test, y_test_pred_proba[:,1]),\n",
+    "            'test_log_loss' : sklearn.metrics.log_loss(y_test, y_test_pred_proba[:,1]),\n",
+    "            'train_accuracy' : sklearn.metrics.accuracy_score(y_train_pred, y_train),\n",
+    "            'train_precision' : sklearn.metrics.precision_score(y_train_pred, y_train),\n",
+    "            'train_recall' : sklearn.metrics.recall_score(y_train_pred, y_train),\n",
+    "            'train_f1' : sklearn.metrics.f1_score(y_train_pred, y_train),\n",
+    "            'train_auroc' : sklearn.metrics.roc_auc_score(y_train, y_train_pred_proba[:,1]),\n",
+    "            'train_log_loss' : sklearn.metrics.log_loss(y_train, y_train_pred_proba[:,1]),\n",
+    "        }\n",
+    "        report_dicts.append(report_dict)\n",
+    "\n",
+    "        # record feature importances (one column per feature, var0..var23)\n",
+    "        fi = model.feature_importances_\n",
+    "        fi_dict = {f'var{idx}': float(val) for idx, val in enumerate(fi)}\n",
+    "        fi_dict['unique_id'] = itr\n",
+    "        fi_dicts.append(fi_dict)\n",
+    "\n",
+    "fi_table = pd.DataFrame(fi_dicts)\n",
+    "report_table = pd.DataFrame(report_dicts)\n",
+    "param_table = pd.DataFrame(param_dicts)\n",
+    "merged_table = pd.merge(param_table, report_table, on='unique_id')[['model','unique_id'] + \\\n",
+    "    [k for k in param_dict.keys() if k not in ['model','unique_id']] + \\\n",
+    "    [k for k in report_dict.keys() if k != 'unique_id']\n",
+    "]\n",
+    "merged_table = pd.merge(merged_table, fi_table, on='unique_id')\n",
+    "merged_table.drop('unique_id',axis=1,inplace=True)\n",
+    "merged_table.sort_values(by=['model'] + [k for k in d.keys() if k != 'random_state'],inplace=True)\n",
+    "merged_table.reset_index(inplace=True,drop=True)\n",
+    "merged_table.head(7)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "64953ef5-5206-4d47-9519-311c858eb8a1",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T23:09:34.553963Z",
+     "iopub.status.busy": "2025-08-21T23:09:34.553337Z",
+     "iopub.status.idle": "2025-08-21T23:09:34.566190Z",
+     "shell.execute_reply": "2025-08-21T23:09:34.565837Z"
+    },
+    "papermill": {
+     "duration": 0.0184,
+     "end_time": "2025-08-21T23:09:34.566962",
+     "exception": false,
+     "start_time": "2025-08-21T23:09:34.548562",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>n_estimators</th>\n",
+       "      <th>max_depth</th>\n",
+       "      <th>test_accuracy</th>\n",
+       "      <th>train_accuracy</th>\n",
+       "      <th>test_f1</th>\n",
+       "      <th>train_f1</th>\n",
+       "      <th>test_precision</th>\n",
+       "      <th>train_precision</th>\n",
+       "      <th>test_recall</th>\n",
+       "      <th>train_recall</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.919485</td>\n",
+       "      <td>0.919226</td>\n",
+       "      <td>0.954340</td>\n",
+       "      <td>0.954169</td>\n",
+       "      <td>0.979252</td>\n",
+       "      <td>0.979097</td>\n",
+       "      <td>0.930664</td>\n",
+       "      <td>0.930478</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.916542</td>\n",
+       "      <td>0.916715</td>\n",
+       "      <td>0.952738</td>\n",
+       "      <td>0.952818</td>\n",
+       "      <td>0.979042</td>\n",
+       "      <td>0.979237</td>\n",
+       "      <td>0.927811</td>\n",
+       "      <td>0.927787</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.920031</td>\n",
+       "      <td>0.919793</td>\n",
+       "      <td>0.954555</td>\n",
+       "      <td>0.954422</td>\n",
+       "      <td>0.977984</td>\n",
+       "      <td>0.977704</td>\n",
+       "      <td>0.932223</td>\n",
+       "      <td>0.932222</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.920008</td>\n",
+       "      <td>0.919787</td>\n",
+       "      <td>0.954554</td>\n",
+       "      <td>0.954414</td>\n",
+       "      <td>0.977727</td>\n",
+       "      <td>0.977783</td>\n",
+       "      <td>0.932454</td>\n",
+       "      <td>0.932137</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.917305</td>\n",
+       "      <td>0.917457</td>\n",
+       "      <td>0.953070</td>\n",
+       "      <td>0.953178</td>\n",
+       "      <td>0.978153</td>\n",
+       "      <td>0.978051</td>\n",
+       "      <td>0.929241</td>\n",
+       "      <td>0.929539</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.920376</td>\n",
+       "      <td>0.920645</td>\n",
+       "      <td>0.954760</td>\n",
+       "      <td>0.954898</td>\n",
+       "      <td>0.978004</td>\n",
+       "      <td>0.978137</td>\n",
+       "      <td>0.932595</td>\n",
+       "      <td>0.932737</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.918176</td>\n",
+       "      <td>0.918210</td>\n",
+       "      <td>0.953563</td>\n",
+       "      <td>0.953576</td>\n",
+       "      <td>0.977817</td>\n",
+       "      <td>0.978125</td>\n",
+       "      <td>0.930484</td>\n",
+       "      <td>0.930228</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.918414</td>\n",
+       "      <td>0.918595</td>\n",
+       "      <td>0.953670</td>\n",
+       "      <td>0.953794</td>\n",
+       "      <td>0.977988</td>\n",
+       "      <td>0.978111</td>\n",
+       "      <td>0.930531</td>\n",
+       "      <td>0.930657</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.918874</td>\n",
+       "      <td>0.918810</td>\n",
+       "      <td>0.953964</td>\n",
+       "      <td>0.953909</td>\n",
+       "      <td>0.978158</td>\n",
+       "      <td>0.978360</td>\n",
+       "      <td>0.930937</td>\n",
+       "      <td>0.930651</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>9</th>\n",
+       "      <td>10</td>\n",
+       "      <td>5</td>\n",
+       "      <td>0.919651</td>\n",
+       "      <td>0.919729</td>\n",
+       "      <td>0.954408</td>\n",
+       "      <td>0.954418</td>\n",
+       "      <td>0.978668</td>\n",
+       "      <td>0.978603</td>\n",
+       "      <td>0.931322</td>\n",
+       "      <td>0.931399</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>10</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947473</td>\n",
+       "      <td>0.949585</td>\n",
+       "      <td>0.969781</td>\n",
+       "      <td>0.970971</td>\n",
+       "      <td>0.980907</td>\n",
+       "      <td>0.981816</td>\n",
+       "      <td>0.958904</td>\n",
+       "      <td>0.960363</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>11</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947651</td>\n",
+       "      <td>0.949634</td>\n",
+       "      <td>0.969879</td>\n",
+       "      <td>0.970994</td>\n",
+       "      <td>0.980909</td>\n",
+       "      <td>0.981657</td>\n",
+       "      <td>0.959094</td>\n",
+       "      <td>0.960561</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>12</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947336</td>\n",
+       "      <td>0.949079</td>\n",
+       "      <td>0.969703</td>\n",
+       "      <td>0.970703</td>\n",
+       "      <td>0.981415</td>\n",
+       "      <td>0.982128</td>\n",
+       "      <td>0.958268</td>\n",
+       "      <td>0.959541</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>13</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.948380</td>\n",
+       "      <td>0.950519</td>\n",
+       "      <td>0.970292</td>\n",
+       "      <td>0.971503</td>\n",
+       "      <td>0.981094</td>\n",
+       "      <td>0.982121</td>\n",
+       "      <td>0.959725</td>\n",
+       "      <td>0.961112</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>14</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.946944</td>\n",
+       "      <td>0.949269</td>\n",
+       "      <td>0.969474</td>\n",
+       "      <td>0.970812</td>\n",
+       "      <td>0.981403</td>\n",
+       "      <td>0.982112</td>\n",
+       "      <td>0.957831</td>\n",
+       "      <td>0.959769</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>15</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947763</td>\n",
+       "      <td>0.949670</td>\n",
+       "      <td>0.969946</td>\n",
+       "      <td>0.971027</td>\n",
+       "      <td>0.981195</td>\n",
+       "      <td>0.982040</td>\n",
+       "      <td>0.958953</td>\n",
+       "      <td>0.960259</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>16</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947479</td>\n",
+       "      <td>0.949631</td>\n",
+       "      <td>0.969787</td>\n",
+       "      <td>0.971009</td>\n",
+       "      <td>0.981054</td>\n",
+       "      <td>0.982229</td>\n",
+       "      <td>0.958775</td>\n",
+       "      <td>0.960043</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>17</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.946731</td>\n",
+       "      <td>0.948403</td>\n",
+       "      <td>0.969353</td>\n",
+       "      <td>0.970316</td>\n",
+       "      <td>0.981196</td>\n",
+       "      <td>0.981756</td>\n",
+       "      <td>0.957793</td>\n",
+       "      <td>0.959140</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>18</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947526</td>\n",
+       "      <td>0.949605</td>\n",
+       "      <td>0.969822</td>\n",
+       "      <td>0.971000</td>\n",
+       "      <td>0.981211</td>\n",
+       "      <td>0.982460</td>\n",
+       "      <td>0.958694</td>\n",
+       "      <td>0.959805</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>19</th>\n",
+       "      <td>10</td>\n",
+       "      <td>10</td>\n",
+       "      <td>0.947421</td>\n",
+       "      <td>0.949485</td>\n",
+       "      <td>0.969779</td>\n",
+       "      <td>0.970932</td>\n",
+       "      <td>0.981709</td>\n",
+       "      <td>0.982450</td>\n",
+       "      <td>0.958135</td>\n",
+       "      <td>0.959682</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "    n_estimators  max_depth  test_accuracy  train_accuracy   test_f1  \\\n",
+       "0             10          5       0.919485        0.919226  0.954340   \n",
+       "1             10          5       0.916542        0.916715  0.952738   \n",
+       "2             10          5       0.920031        0.919793  0.954555   \n",
+       "3             10          5       0.920008        0.919787  0.954554   \n",
+       "4             10          5       0.917305        0.917457  0.953070   \n",
+       "5             10          5       0.920376        0.920645  0.954760   \n",
+       "6             10          5       0.918176        0.918210  0.953563   \n",
+       "7             10          5       0.918414        0.918595  0.953670   \n",
+       "8             10          5       0.918874        0.918810  0.953964   \n",
+       "9             10          5       0.919651        0.919729  0.954408   \n",
+       "10            10         10       0.947473        0.949585  0.969781   \n",
+       "11            10         10       0.947651        0.949634  0.969879   \n",
+       "12            10         10       0.947336        0.949079  0.969703   \n",
+       "13            10         10       0.948380        0.950519  0.970292   \n",
+       "14            10         10       0.946944        0.949269  0.969474   \n",
+       "15            10         10       0.947763        0.949670  0.969946   \n",
+       "16            10         10       0.947479        0.949631  0.969787   \n",
+       "17            10         10       0.946731        0.948403  0.969353   \n",
+       "18            10         10       0.947526        0.949605  0.969822   \n",
+       "19            10         10       0.947421        0.949485  0.969779   \n",
+       "\n",
+       "    train_f1  test_precision  train_precision  test_recall  train_recall  \n",
+       "0   0.954169        0.979252         0.979097     0.930664      0.930478  \n",
+       "1   0.952818        0.979042         0.979237     0.927811      0.927787  \n",
+       "2   0.954422        0.977984         0.977704     0.932223      0.932222  \n",
+       "3   0.954414        0.977727         0.977783     0.932454      0.932137  \n",
+       "4   0.953178        0.978153         0.978051     0.929241      0.929539  \n",
+       "5   0.954898        0.978004         0.978137     0.932595      0.932737  \n",
+       "6   0.953576        0.977817         0.978125     0.930484      0.930228  \n",
+       "7   0.953794        0.977988         0.978111     0.930531      0.930657  \n",
+       "8   0.953909        0.978158         0.978360     0.930937      0.930651  \n",
+       "9   0.954418        0.978668         0.978603     0.931322      0.931399  \n",
+       "10  0.970971        0.980907         0.981816     0.958904      0.960363  \n",
+       "11  0.970994        0.980909         0.981657     0.959094      0.960561  \n",
+       "12  0.970703        0.981415         0.982128     0.958268      0.959541  \n",
+       "13  0.971503        0.981094         0.982121     0.959725      0.961112  \n",
+       "14  0.970812        0.981403         0.982112     0.957831      0.959769  \n",
+       "15  0.971027        0.981195         0.982040     0.958953      0.960259  \n",
+       "16  0.971009        0.981054         0.982229     0.958775      0.960043  \n",
+       "17  0.970316        0.981196         0.981756     0.957793      0.959140  \n",
+       "18  0.971000        0.981211         0.982460     0.958694      0.959805  \n",
+       "19  0.970932        0.981709         0.982450     0.958135      0.959682  "
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "merged_table.head(20)[['n_estimators', 'max_depth',\n",
+    "                       'test_accuracy','train_accuracy',\n",
+    "                       'test_f1','train_f1',\n",
+    "                       'test_precision','train_precision',\n",
+    "                       'test_recall','train_recall',]]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "db7f4f13-a696-4314-b0f4-028852863573",
+   "metadata": {
+    "execution": {
+     "iopub.execute_input": "2025-08-21T23:09:34.576689Z",
+     "iopub.status.busy": "2025-08-21T23:09:34.576115Z",
+     "iopub.status.idle": "2025-08-21T23:09:34.590011Z",
+     "shell.execute_reply": "2025-08-21T23:09:34.589688Z"
+    },
+    "papermill": {
+     "duration": 0.019834,
+     "end_time": "2025-08-21T23:09:34.590874",
+     "exception": false,
+     "start_time": "2025-08-21T23:09:34.571040",
+     "status": "completed"
+    },
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Write the merged grid-search results to the papermill-supplied output_file (third_model_results.csv in this run)\n",
+    "merged_table.to_csv(output_file,index=False,header=True)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Birdclef",
+   "language": "python",
+   "name": "birdclef"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.11"
+  },
+  "papermill": {
+   "default_parameters": {},
+   "duration": 3943.115939,
+   "end_time": "2025-08-21T23:09:35.210970",
+   "environment_variables": {},
+   "exception": null,
+   "input_path": "fit_model.ipynb",
+   "output_path": "ran/fit_model.ipynb",
+   "parameters": {
+    "input_file": "xgb_rnd3_next.csv",
+    "output_file": "third_model_results.csv"
+   },
+   "start_time": "2025-08-21T22:03:52.095031",
+   "version": "2.6.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
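
The table in the notebook output above lists one row per random split, so comparing hyperparameter settings means averaging those rows first. The following is a minimal sketch of that aggregation, not the notebook's own code: the helper name `summarize` and the `rows` list are hypothetical, and the numbers are copied from rows 0-2 and 10-12 of the table (accuracy columns only). It uses only the standard library so it runs without the notebook's environment.

```python
from collections import defaultdict
from statistics import mean

# (n_estimators, max_depth, test_accuracy, train_accuracy)
# Values copied from rows 0-2 and 10-12 of the table above; illustrative subset only.
rows = [
    (10, 5, 0.919485, 0.919226),
    (10, 5, 0.916542, 0.916715),
    (10, 5, 0.920031, 0.919793),
    (10, 10, 0.947473, 0.949585),
    (10, 10, 0.947651, 0.949634),
    (10, 10, 0.947336, 0.949079),
]


def summarize(rows):
    """Average accuracy over random splits for each hyperparameter setting.

    Returns a dict keyed by (n_estimators, max_depth) holding the mean test
    accuracy, the mean train accuracy, and their difference (train - test).
    """
    grouped = defaultdict(list)
    for n_estimators, max_depth, test_acc, train_acc in rows:
        grouped[(n_estimators, max_depth)].append((test_acc, train_acc))

    summary = {}
    for key, pairs in grouped.items():
        test_mean = mean(test for test, _ in pairs)
        train_mean = mean(train for _, train in pairs)
        summary[key] = {
            "test_accuracy": test_mean,
            "train_accuracy": train_mean,
            "gap": train_mean - test_mean,  # larger gap hints at more overfitting
        }
    return summary


summary = summarize(rows)
for (n_estimators, max_depth), stats in sorted(summary.items()):
    print(
        f"n_estimators={n_estimators} max_depth={max_depth} "
        f"test={stats['test_accuracy']:.4f} gap={stats['gap']:+.4f}"
    )
```

On this subset the deeper trees score higher on the test split but also show a wider train-test gap, which is the trade-off the README weighs when it prefers the simpler model. With pandas available, the same summary could presumably be obtained from `merged_table` with a `groupby` on the two hyperparameter columns.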

models/xgboost_third_model.json ADDED

The diff for this file is too large to render.

models/xgboost_third_model_not_2025.json ADDED

The diff for this file is too large to render.