sdtemple committed on
Commit ·
c5fa54c
1
Parent(s): 40e0ac2
third iterate models
- README.md +26 -13
- fit_model.ipynb +147 -0
- fit_models.ipynb +1045 -0
- models/xgboost_third_model.json +0 -0
- models/xgboost_third_model_not_2025.json +0 -0
README.md
CHANGED
|
@@ -26,10 +26,13 @@ The 24 features are of the characteristics:
|
|
| 26 |
1. Sound frequency percentiles
|
| 27 |
- https://www.tandfonline.com/doi/full/10.1080/09524622.2020.1730241
|
| 28 |
- quantiles = [0.8,0.9,0.925,0.95,0.975,0.99,]
|
| 29 |
-
2.
|
| 30 |
-
-
|
|
|
|
|
|
|
| 31 |
- Mean, standard deviations, min, max
|
| 32 |
-
Zero crossing rate of the entire clip
|
|
|
|
| 33 |
|
| 34 |
|
| 35 |
These features summarize an entire clip, irrespective of position in waveform or spectrogram, and technically, the clip does not have to be 5 seconds long.
|
|
@@ -74,13 +77,13 @@ First model iterate
|
|
| 74 |
1. Fit decision tree-based classifiers to Freefield and Warblrb10k
|
| 75 |
- About 3/4 of the Warblrb10k data has a bird
|
| 76 |
- About 1/4 of the Freefield data does not have a bird
|
| 77 |
-
-
|
| 78 |
2. Grid search with 25% test and 75% training splits (averaging over 5 randomizations)
|
| 79 |
-
- RandomForestClassifier, GradientBoostingClassifier,
|
| 80 |
- n_estimators: [10, 20, 50,]
|
| 81 |
- max_depth: [5, 10, 20,]
|
| 82 |
- I saved the results in the following file
|
| 83 |
-
3. I chose to use the
|
| 84 |
- This simpler model does not have too large a gap between training and test metrics
|
| 85 |
- The test accuracy is
|
| 86 |
- The test precision is
|
|
@@ -90,24 +93,34 @@ First model iterate
|
|
| 90 |
Second model iterate
|
| 91 |
---
|
| 92 |
|
| 93 |
-
1. Fit
|
| 94 |
2. Use first model iterate to predict "hasbird" in Birdclef data
|
| 95 |
- Apply zero padding to the Birdclef data if the final clip is longer than 2 seconds
|
| 96 |
- Subset Birdclef data to those with
|
| 97 |
- Predicted presence > 0.75, or
|
| 98 |
- Audio file duration <= 15 seconds, or
|
| 99 |
- Amphibia, Insecta, Mammalia as 0 in 2025 data
|
| 100 |
-
3.
|
| 101 |
-
|
| 103 |
|
| 104 |
Third model iterate
|
| 105 |
---
|
| 106 |
|
| 107 |
-
1. Fit
|
| 108 |
2. Use the second model iterate to predict "hasbird" in Birdclef data
|
| 109 |
-
|
| 110 |
-
- Predicted presence >
|
| 111 |
- Amphibia, Insecta, Mammalia as 0 in 2025 data
|
| 112 |
|
| 113 |
Non-2025 model
|
|
|
|
| 26 |
1. Sound frequency percentiles
|
| 27 |
- https://www.tandfonline.com/doi/full/10.1080/09524622.2020.1730241
|
| 28 |
- quantiles = [0.8,0.9,0.925,0.95,0.975,0.99,]
|
| 29 |
+
2. Thirteen Mel-frequency cepstral coefficients (MFCCs)
|
| 30 |
+
- Averaged over axis 1 (columns)
|
| 31 |
+
- n_fft=2048, hop_length=512
|
| 32 |
+
3. Summary statistics of zero crossing rates in 1-second segments
|
| 33 |
- Mean, standard deviations, min, max
|
| 34 |
+
- Zero crossing rate of the entire clip
|
| 35 |
+
- Threshold of 0.02
|
| 36 |
|
| 37 |
|
| 38 |
These features summarize an entire clip, irrespective of position in waveform or spectrogram, and technically, the clip does not have to be 5 seconds long.
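The feature list above appears to account for all 24 features: 6 frequency quantiles, 13 cepstral coefficients, and 5 zero crossing values (4 summary statistics plus the whole-clip rate). As an illustration of the third group only, here is a minimal dependency-free sketch. It assumes that "Threshold of 0.02" means samples with absolute value at or below 0.02 are treated as zero before counting sign changes; the function names are hypothetical and this is not necessarily how the repository computes the features.

```python
import statistics


def zero_crossing_rate(samples, threshold=0.02):
    """Fraction of adjacent sample pairs whose sign differs.

    Assumption: samples with |x| <= threshold are clipped to zero first,
    and zero counts as non-negative.
    """
    clipped = [0.0 if abs(x) <= threshold else x for x in samples]
    negative = [x < 0 for x in clipped]
    if len(negative) < 2:
        return 0.0
    crossings = sum(a != b for a, b in zip(negative, negative[1:]))
    return crossings / (len(negative) - 1)


def zcr_features(samples, sample_rate, threshold=0.02):
    """Mean, std, min, max of per-second rates, then the whole-clip rate."""
    segments = [samples[i:i + sample_rate]
                for i in range(0, len(samples), sample_rate)]
    rates = [zero_crossing_rate(s, threshold) for s in segments if len(s) >= 2]
    return [
        statistics.fmean(rates),
        statistics.pstdev(rates),
        min(rates),
        max(rates),
        zero_crossing_rate(samples, threshold),
    ]


# Toy clip: 2 "seconds" at a sample rate of 4, alternating sign every sample.
toy_clip = [1.0, -1.0] * 4
features = zcr_features(toy_clip, sample_rate=4)
```

Because the segments are cut by sample count, the same function works for clips of any length, which matches the note that a clip does not have to be 5 seconds long.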
|
|
|
|
| 77 |
1. Fit decision tree-based classifiers to Freefield and Warblrb10k
|
| 78 |
- About 3/4 of the Warblrb10k data has a bird
|
| 79 |
- About 1/4 of the Freefield data does not have a bird
|
| 80 |
+
- No data augmentation
|
| 81 |
2. Grid search with 25% test and 75% training splits (averaging over 5 randomizations)
|
| 82 |
+
- `RandomForestClassifier`, `GradientBoostingClassifier`, `XGBClassifier`
|
| 83 |
- n_estimators: [10, 20, 50,]
|
| 84 |
- max_depth: [5, 10, 20,]
|
| 85 |
- I saved the results in the following file
|
| 86 |
+
3. I chose to use the `XGBClassifier` with n_estimators=20 and max_depth=5
|
| 87 |
- This simpler model does not have too large a gap between training and test metrics
|
| 88 |
- The test accuracy is
|
| 89 |
- The test precision is
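The grid search in step 2 can be sketched as a plan of runs: every model and hyperparameter combination is repeated over several random 75/25 splits, and the metrics are averaged afterwards. The sketch below only builds that plan and trains nothing. The model names are plain strings rather than classes, and the seed range is a hypothetical choice; both are assumptions made to keep the example dependency-free.

```python
import itertools
import random

# Grid from the README; strings stand in for the estimator classes.
grid_params = {
    "model": [
        "RandomForestClassifier",
        "GradientBoostingClassifier",
        "XGBClassifier",
    ],
    "n_estimators": [10, 20, 50],
    "max_depth": [5, 10, 20],
}


def expand_grid(grid):
    """Return one dict per combination of the grid values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in itertools.product(*grid.values())]


def plan_runs(grid, n_randomizations=5, test_size=0.25, seed=0):
    """Repeat every combination with fresh model and split seeds."""
    rng = random.Random(seed)
    runs = []
    for combo in expand_grid(grid):
        for _ in range(n_randomizations):
            runs.append({
                **combo,
                "test_size": test_size,
                "random_state": rng.randrange(2 ** 16),
                "random_state_split": rng.randrange(2 ** 16),
            })
    return runs


runs = plan_runs(grid_params)
```

Each planned run would then be fit on the training split and scored on the test split, with the per-combination scores averaged over the randomizations.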
|
|
|
|
| 93 |
Second model iterate
|
| 94 |
---
|
| 95 |
|
| 96 |
+
1. Fit `XGBClassifier` to all of the data mentioned above
|
| 97 |
2. Use first model iterate to predict "hasbird" in Birdclef data
|
| 98 |
- Apply zero padding to the Birdclef data if the final clip is longer than 2 seconds
|
| 99 |
- Subset Birdclef data to those with
|
| 100 |
- Predicted presence > 0.75, or
|
| 101 |
- Audio file duration <= 15 seconds, or
|
| 102 |
- Amphibia, Insecta, Mammalia as 0 in 2025 data
|
| 103 |
+
3. Five data augmented instances for each file
|
| 104 |
+
- Use OneOf in audiomentations
|
| 105 |
+
- AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=1.)
|
| 106 |
+
- AddGaussianSNR(min_snr_db=5.0, max_snr_db=40.0, p=1.)
|
| 107 |
+
- AddColorNoise(min_snr_db=5.0, max_snr_db=40.0, n_fft=128, p=1.)
|
| 108 |
+
4. Grid search with 25% test and 75% training splits (averaging over 10 randomizations)
|
| 109 |
+
- n_estimators: [10, 20, 50,]
|
| 110 |
+
- max_depth: [5, 10, 20,]
|
| 111 |
+
5. I chose the XGBClassifier with 50 estimators and max depth 10
|
| 112 |
+
- The test accuracy is
|
| 113 |
+
- The test precision is
|
| 114 |
+
- The test recall is
|
| 115 |
+
- The test AUROC is
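For reference, the accuracy, precision, and recall listed above follow the usual binary definitions, with "hasbird" = 1 as the positive class. The following is a minimal sketch with a hypothetical helper name; the notebook itself imports its metrics from `sklearn.metrics`, and AUROC is omitted here because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = has bird)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: 2 true positives, 1 false positive, 1 false negative, 1 true negative.
metrics = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```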
|
| 116 |
|
| 117 |
Third model iterate
|
| 118 |
---
|
| 119 |
|
| 120 |
+
1. Fit `XGBClassifier` to all of the data mentioned above
|
| 121 |
2. Use the second model iterate to predict "hasbird" in Birdclef data
|
| 122 |
+
3. Subset Birdclef data to those with
|
| 123 |
+
- Predicted presence > 0.90
|
| 124 |
- Amphibia, Insecta, Mammalia as 0 in 2025 data
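The subsetting rule used by the second and third iterates can be sketched as a filter over clip records. Everything about the record layout is hypothetical (the field names `year`, `class_name`, `predicted_presence`, and `duration` are assumptions), as is the reading that clips kept by the presence or duration rule get label 1 while 2025 Amphibia, Insecta, and Mammalia clips are kept with label 0.

```python
NON_BIRD_CLASSES = {"Amphibia", "Insecta", "Mammalia"}


def select_clips(clips, threshold=0.90, max_duration=None):
    """Keep confident bird clips (label 1) and 2025 non-bird clips (label 0).

    max_duration is optional: the second iterate also keeps files of at
    most 15 seconds, while the third iterate lists no duration rule.
    """
    selected = []
    for clip in clips:
        is_non_bird = (clip["year"] == 2025
                       and clip["class_name"] in NON_BIRD_CLASSES)
        is_short = (max_duration is not None
                    and clip["duration"] <= max_duration)
        if is_non_bird:
            selected.append({**clip, "hasbird": 0})
        elif clip["predicted_presence"] > threshold or is_short:
            selected.append({**clip, "hasbird": 1})
    return selected


clips = [
    {"year": 2024, "class_name": "Aves", "predicted_presence": 0.95, "duration": 30.0},
    {"year": 2024, "class_name": "Aves", "predicted_presence": 0.80, "duration": 10.0},
    {"year": 2025, "class_name": "Insecta", "predicted_presence": 0.10, "duration": 40.0},
    {"year": 2024, "class_name": "Aves", "predicted_presence": 0.50, "duration": 60.0},
]
third_iterate = select_clips(clips)  # threshold 0.90, no duration rule
second_iterate = select_clips(clips, threshold=0.75, max_duration=15.0)
```

Raising the threshold from 0.75 to 0.90 trades a smaller training subset for fewer clips that were labeled as containing a bird by mistake.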
|
| 125 |
|
| 126 |
Non-2025 model
|
fit_model.ipynb
ADDED
|
@@ -0,0 +1,147 @@
|
| 1 |
+
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "code",
|
| 5 |
+
"execution_count": null,
|
| 6 |
+
"id": "2b5565b0-0eda-4961-9065-b3e56d683baa",
|
| 7 |
+
"metadata": {
|
| 8 |
+
"editable": true,
|
| 9 |
+
"execution": {
|
| 10 |
+
"iopub.execute_input": "2025-08-21T22:03:54.063042Z",
|
| 11 |
+
"iopub.status.busy": "2025-08-21T22:03:54.062823Z",
|
| 12 |
+
"iopub.status.idle": "2025-08-21T22:03:54.067443Z",
|
| 13 |
+
"shell.execute_reply": "2025-08-21T22:03:54.066939Z"
|
| 14 |
+
},
|
| 15 |
+
"papermill": {
|
| 16 |
+
"duration": 0.013166,
|
| 17 |
+
"end_time": "2025-08-21T22:03:54.068798",
|
| 18 |
+
"exception": false,
|
| 19 |
+
"start_time": "2025-08-21T22:03:54.055632",
|
| 20 |
+
"status": "completed"
|
| 21 |
+
},
|
| 22 |
+
"slideshow": {
|
| 23 |
+
"slide_type": ""
|
| 24 |
+
},
|
| 25 |
+
"tags": [
|
| 26 |
+
"parameters"
|
| 27 |
+
]
|
| 28 |
+
},
|
| 29 |
+
"outputs": [],
|
| 30 |
+
"source": [
|
| 31 |
+
"input_file = None\n",
|
| 32 |
+
"output_file = None\n",
|
| 33 |
+
"n_estimators = None\n",
|
| 34 |
+
"max_depth = None\n",
|
| 35 |
+
"random_state = None"
|
| 36 |
+
]
|
| 37 |
+
},
|
| 38 |
+
{
|
| 39 |
+
"cell_type": "code",
|
| 40 |
+
"execution_count": null,
|
| 41 |
+
"id": "3c36001f-b354-4457-95b4-01953533dbaa",
|
| 42 |
+
"metadata": {
|
| 43 |
+
"execution": {
|
| 44 |
+
"iopub.execute_input": "2025-08-21T22:03:54.091554Z",
|
| 45 |
+
"iopub.status.busy": "2025-08-21T22:03:54.091171Z",
|
| 46 |
+
"iopub.status.idle": "2025-08-21T22:03:57.814895Z",
|
| 47 |
+
"shell.execute_reply": "2025-08-21T22:03:57.814474Z"
|
| 48 |
+
},
|
| 49 |
+
"papermill": {
|
| 50 |
+
"duration": 3.728978,
|
| 51 |
+
"end_time": "2025-08-21T22:03:57.816040",
|
| 52 |
+
"exception": false,
|
| 53 |
+
"start_time": "2025-08-21T22:03:54.087062",
|
| 54 |
+
"status": "completed"
|
| 55 |
+
},
|
| 56 |
+
"tags": []
|
| 57 |
+
},
|
| 58 |
+
"outputs": [],
|
| 59 |
+
"source": [
|
| 60 |
+
"import pandas as pd\n",
|
| 61 |
+
"import numpy as np\n",
|
| 62 |
+
"import xgboost as xgb\n",
|
| 63 |
+
"import sklearn"
|
| 64 |
+
]
|
| 65 |
+
},
|
| 66 |
+
{
|
| 67 |
+
"cell_type": "code",
|
| 68 |
+
"execution_count": null,
|
| 69 |
+
"id": "82f8195d-236a-4288-89fc-4952b377f0cc",
|
| 70 |
+
"metadata": {
|
| 71 |
+
"execution": {
|
| 72 |
+
"iopub.execute_input": "2025-08-21T22:03:57.825272Z",
|
| 73 |
+
"iopub.status.busy": "2025-08-21T22:03:57.824333Z",
|
| 74 |
+
"iopub.status.idle": "2025-08-21T22:04:08.809191Z",
|
| 75 |
+
"shell.execute_reply": "2025-08-21T22:04:08.808791Z"
|
| 76 |
+
},
|
| 77 |
+
"papermill": {
|
| 78 |
+
"duration": 10.990556,
|
| 79 |
+
"end_time": "2025-08-21T22:04:08.810302",
|
| 80 |
+
"exception": false,
|
| 81 |
+
"start_time": "2025-08-21T22:03:57.819746",
|
| 82 |
+
"status": "completed"
|
| 83 |
+
},
|
| 84 |
+
"tags": []
|
| 85 |
+
},
|
| 86 |
+
"outputs": [],
|
| 87 |
+
"source": [
|
| 88 |
+
"# load data\n",
|
| 89 |
+
"combined = pd.read_csv(input_file,low_memory=False)\n",
|
| 90 |
+
"X = combined[[f'feature_{i}' for i in range(24)]].to_numpy()\n",
|
| 91 |
+
"y = combined['hasbird'].to_numpy()\n",
|
| 92 |
+
"del combined"
|
| 93 |
+
]
|
| 94 |
+
},
|
| 95 |
+
{
|
| 96 |
+
"cell_type": "code",
|
| 97 |
+
"execution_count": null,
|
| 98 |
+
"id": "466457c2",
|
| 99 |
+
"metadata": {},
|
| 100 |
+
"outputs": [],
|
| 101 |
+
"source": [
|
| 102 |
+
"# define, fit, and save model\n",
|
| 103 |
+
"model = xgb.XGBClassifier(n_estimators=int(n_estimators),\n",
|
| 104 |
+
" max_depth=int(max_depth),\n",
|
| 105 |
+
" random_state=int(random_state))\n",
|
| 106 |
+
"model.fit(X, y)\n",
|
| 107 |
+
"model.save_model(output_file)"
|
| 108 |
+
]
|
| 109 |
+
}
|
| 110 |
+
],
|
| 111 |
+
"metadata": {
|
| 112 |
+
"kernelspec": {
|
| 113 |
+
"display_name": "Birdclef",
|
| 114 |
+
"language": "python",
|
| 115 |
+
"name": "birdclef"
|
| 116 |
+
},
|
| 117 |
+
"language_info": {
|
| 118 |
+
"codemirror_mode": {
|
| 119 |
+
"name": "ipython",
|
| 120 |
+
"version": 3
|
| 121 |
+
},
|
| 122 |
+
"file_extension": ".py",
|
| 123 |
+
"mimetype": "text/x-python",
|
| 124 |
+
"name": "python",
|
| 125 |
+
"nbconvert_exporter": "python",
|
| 126 |
+
"pygments_lexer": "ipython3",
|
| 127 |
+
"version": "3.12.11"
|
| 128 |
+
},
|
| 129 |
+
"papermill": {
|
| 130 |
+
"default_parameters": {},
|
| 131 |
+
"duration": 3943.115939,
|
| 132 |
+
"end_time": "2025-08-21T23:09:35.210970",
|
| 133 |
+
"environment_variables": {},
|
| 134 |
+
"exception": null,
|
| 135 |
+
"input_path": "fit_model.ipynb",
|
| 136 |
+
"output_path": "ran/fit_model.ipynb",
|
| 137 |
+
"parameters": {
|
| 138 |
+
"input_file": "xgb_rnd3_next.csv",
|
| 139 |
+
"output_file": "third_model_results.csv"
|
| 140 |
+
},
|
| 141 |
+
"start_time": "2025-08-21T22:03:52.095031",
|
| 142 |
+
"version": "2.6.0"
|
| 143 |
+
}
|
| 144 |
+
},
|
| 145 |
+
"nbformat": 4,
|
| 146 |
+
"nbformat_minor": 5
|
| 147 |
+
}
|
fit_models.ipynb
ADDED
|
@@ -0,0 +1,1045 @@
|
| 1 |
+
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "code",
|
| 5 |
+
"execution_count": 1,
|
| 6 |
+
"id": "2b5565b0-0eda-4961-9065-b3e56d683baa",
|
| 7 |
+
"metadata": {
|
| 8 |
+
"editable": true,
|
| 9 |
+
"execution": {
|
| 10 |
+
"iopub.execute_input": "2025-08-21T22:03:54.063042Z",
|
| 11 |
+
"iopub.status.busy": "2025-08-21T22:03:54.062823Z",
|
| 12 |
+
"iopub.status.idle": "2025-08-21T22:03:54.067443Z",
|
| 13 |
+
"shell.execute_reply": "2025-08-21T22:03:54.066939Z"
|
| 14 |
+
},
|
| 15 |
+
"papermill": {
|
| 16 |
+
"duration": 0.013166,
|
| 17 |
+
"end_time": "2025-08-21T22:03:54.068798",
|
| 18 |
+
"exception": false,
|
| 19 |
+
"start_time": "2025-08-21T22:03:54.055632",
|
| 20 |
+
"status": "completed"
|
| 21 |
+
},
|
| 22 |
+
"slideshow": {
|
| 23 |
+
"slide_type": ""
|
| 24 |
+
},
|
| 25 |
+
"tags": [
|
| 26 |
+
"parameters"
|
| 27 |
+
]
|
| 28 |
+
},
|
| 29 |
+
"outputs": [],
|
| 30 |
+
"source": [
|
| 31 |
+
"input_file = None\n",
|
| 32 |
+
"output_file = None"
|
| 33 |
+
]
|
| 34 |
+
},
|
| 35 |
+
{
|
| 36 |
+
"cell_type": "code",
|
| 37 |
+
"execution_count": 2,
|
| 38 |
+
"id": "2af28fa1",
|
| 39 |
+
"metadata": {
|
| 40 |
+
"execution": {
|
| 41 |
+
"iopub.execute_input": "2025-08-21T22:03:54.076410Z",
|
| 42 |
+
"iopub.status.busy": "2025-08-21T22:03:54.076201Z",
|
| 43 |
+
"iopub.status.idle": "2025-08-21T22:03:54.082455Z",
|
| 44 |
+
"shell.execute_reply": "2025-08-21T22:03:54.082034Z"
|
| 45 |
+
},
|
| 46 |
+
"papermill": {
|
| 47 |
+
"duration": 0.011177,
|
| 48 |
+
"end_time": "2025-08-21T22:03:54.083645",
|
| 49 |
+
"exception": false,
|
| 50 |
+
"start_time": "2025-08-21T22:03:54.072468",
|
| 51 |
+
"status": "completed"
|
| 52 |
+
},
|
| 53 |
+
"tags": [
|
| 54 |
+
"injected-parameters"
|
| 55 |
+
]
|
| 56 |
+
},
|
| 57 |
+
"outputs": [],
|
| 58 |
+
"source": [
|
| 59 |
+
"# Parameters\n",
|
| 60 |
+
"input_file = \"xgb_rnd3_next.csv\"\n",
|
| 61 |
+
"output_file = \"third_model_results.csv\"\n"
|
| 62 |
+
]
|
| 63 |
+
},
|
| 64 |
+
{
|
| 65 |
+
"cell_type": "code",
|
| 66 |
+
"execution_count": 3,
|
| 67 |
+
"id": "3c36001f-b354-4457-95b4-01953533dbaa",
|
| 68 |
+
"metadata": {
|
| 69 |
+
"execution": {
|
| 70 |
+
"iopub.execute_input": "2025-08-21T22:03:54.091554Z",
|
| 71 |
+
"iopub.status.busy": "2025-08-21T22:03:54.091171Z",
|
| 72 |
+
"iopub.status.idle": "2025-08-21T22:03:57.814895Z",
|
| 73 |
+
"shell.execute_reply": "2025-08-21T22:03:57.814474Z"
|
| 74 |
+
},
|
| 75 |
+
"papermill": {
|
| 76 |
+
"duration": 3.728978,
|
| 77 |
+
"end_time": "2025-08-21T22:03:57.816040",
|
| 78 |
+
"exception": false,
|
| 79 |
+
"start_time": "2025-08-21T22:03:54.087062",
|
| 80 |
+
"status": "completed"
|
| 81 |
+
},
|
| 82 |
+
"tags": []
|
| 83 |
+
},
|
| 84 |
+
"outputs": [],
|
| 85 |
+
"source": [
|
| 86 |
+
"import pandas as pd\n",
|
| 87 |
+
"import os\n",
|
| 88 |
+
"import numpy as np\n",
|
| 89 |
+
"import matplotlib.pyplot as plt\n",
|
| 90 |
+
"import itertools\n",
|
| 91 |
+
"import copy\n",
|
| 92 |
+
"\n",
|
| 93 |
+
"import xgboost as xgb\n",
|
| 94 |
+
"import sklearn\n",
|
| 95 |
+
"from sklearn.model_selection import train_test_split, GridSearchCV\n",
|
| 96 |
+
"from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc, precision_recall_curve"
|
| 97 |
+
]
|
| 98 |
+
},
|
| 99 |
+
{
|
| 100 |
+
"cell_type": "code",
|
| 101 |
+
"execution_count": 4,
|
| 102 |
+
"id": "82f8195d-236a-4288-89fc-4952b377f0cc",
|
| 103 |
+
"metadata": {
|
| 104 |
+
"execution": {
|
| 105 |
+
"iopub.execute_input": "2025-08-21T22:03:57.825272Z",
|
| 106 |
+
"iopub.status.busy": "2025-08-21T22:03:57.824333Z",
|
| 107 |
+
"iopub.status.idle": "2025-08-21T22:04:08.809191Z",
|
| 108 |
+
"shell.execute_reply": "2025-08-21T22:04:08.808791Z"
|
| 109 |
+
},
|
| 110 |
+
"papermill": {
|
| 111 |
+
"duration": 10.990556,
|
| 112 |
+
"end_time": "2025-08-21T22:04:08.810302",
|
| 113 |
+
"exception": false,
|
| 114 |
+
"start_time": "2025-08-21T22:03:57.819746",
|
| 115 |
+
"status": "completed"
|
| 116 |
+
},
|
| 117 |
+
"tags": []
|
| 118 |
+
},
|
| 119 |
+
"outputs": [],
|
| 120 |
+
"source": [
|
| 121 |
+
"# combined = pd.read_csv('xgb_rnd3_next.csv',low_memory=False)\n",
|
| 122 |
+
"combined = pd.read_csv(input_file,low_memory=False)\n",
|
| 123 |
+
"X = combined[[f'feature_{i}' for i in range(24)]].to_numpy()\n",
|
| 124 |
+
"y = combined['hasbird'].to_numpy()\n",
|
| 125 |
+
"del combined"
|
| 126 |
+
]
|
| 127 |
+
},
|
| 128 |
+
{
|
| 129 |
+
"cell_type": "code",
|
| 130 |
+
"execution_count": 5,
|
| 131 |
+
"id": "235a0d9e-5476-43e7-ada3-96f9fe1254ff",
|
| 132 |
+
"metadata": {
|
| 133 |
+
"execution": {
|
| 134 |
+
"iopub.execute_input": "2025-08-21T22:04:08.819492Z",
|
| 135 |
+
"iopub.status.busy": "2025-08-21T22:04:08.818725Z",
|
| 136 |
+
"iopub.status.idle": "2025-08-21T22:04:08.822610Z",
|
| 137 |
+
"shell.execute_reply": "2025-08-21T22:04:08.822231Z"
|
| 138 |
+
},
|
| 139 |
+
"papermill": {
|
| 140 |
+
"duration": 0.009105,
|
| 141 |
+
"end_time": "2025-08-21T22:04:08.823514",
|
| 142 |
+
"exception": false,
|
| 143 |
+
"start_time": "2025-08-21T22:04:08.814409",
|
| 144 |
+
"status": "completed"
|
| 145 |
+
},
|
| 146 |
+
"tags": []
|
| 147 |
+
},
|
| 148 |
+
"outputs": [],
|
| 149 |
+
"source": [
|
| 150 |
+
"grid_params = {\n",
|
| 151 |
+
" 'model': [\n",
|
| 152 |
+
" xgb.XGBClassifier,\n",
|
| 153 |
+
" ],\n",
|
| 154 |
+
" 'n_estimators': [10, 20, 50,],\n",
|
| 155 |
+
" 'max_depth': [5, 10, 20,],\n",
|
| 156 |
+
" # 'n_estimators': [5,],\n",
|
| 157 |
+
" # 'max_depth': [2, 5,],\n",
|
| 158 |
+
"}\n",
|
| 159 |
+
"param_combos = list(itertools.product(*grid_params.values()))\n",
|
| 160 |
+
"param_combos = [{k: v for k, v in zip(grid_params.keys(), combination)} for combination in param_combos]"
|
| 161 |
+
]
|
| 162 |
+
},
|
| 163 |
+
{
|
| 164 |
+
"cell_type": "code",
|
| 165 |
+
"execution_count": 6,
|
| 166 |
+
"id": "7aeeb04b-9884-4780-b17b-882ac06316dc",
|
| 167 |
+
"metadata": {
|
| 168 |
+
"execution": {
|
| 169 |
+
"iopub.execute_input": "2025-08-21T22:04:08.831773Z",
|
| 170 |
+
"iopub.status.busy": "2025-08-21T22:04:08.831007Z",
|
| 171 |
+
"iopub.status.idle": "2025-08-21T23:09:34.542148Z",
|
| 172 |
+
"shell.execute_reply": "2025-08-21T23:09:34.541580Z"
|
| 173 |
+
},
|
| 174 |
+
"papermill": {
|
| 175 |
+
"duration": 3925.716838,
|
| 176 |
+
"end_time": "2025-08-21T23:09:34.543697",
|
| 177 |
+
"exception": false,
|
| 178 |
+
"start_time": "2025-08-21T22:04:08.826859",
|
| 179 |
+
"status": "completed"
|
| 180 |
+
},
|
| 181 |
+
"tags": []
|
| 182 |
+
},
|
| 183 |
+
"outputs": [
|
| 184 |
+
{
|
| 185 |
+
"name": "stdout",
|
| 186 |
+
"output_type": "stream",
|
| 187 |
+
"text": [
|
| 188 |
+
"Fit 0\n"
|
| 189 |
+
]
|
| 190 |
+
},
|
| 191 |
+
{
|
| 192 |
+
"name": "stdout",
|
| 193 |
+
"output_type": "stream",
|
| 194 |
+
"text": [
|
| 195 |
+
"Fit 1\n"
|
| 196 |
+
]
|
| 197 |
+
},
|
| 198 |
+
{
|
| 199 |
+
"name": "stdout",
|
| 200 |
+
"output_type": "stream",
|
| 201 |
+
"text": [
|
| 202 |
+
"Fit 2\n"
|
| 203 |
+
]
|
| 204 |
+
},
|
| 205 |
+
{
|
| 206 |
+
"name": "stdout",
|
| 207 |
+
"output_type": "stream",
|
| 208 |
+
"text": [
|
| 209 |
+
"Fit 3\n"
|
| 210 |
+
]
|
| 211 |
+
},
|
| 212 |
+
{
|
| 213 |
+
"name": "stdout",
|
| 214 |
+
"output_type": "stream",
|
| 215 |
+
"text": [
|
| 216 |
+
"Fit 4\n"
|
| 217 |
+
]
|
| 218 |
+
},
|
| 219 |
+
{
|
| 220 |
+
"name": "stdout",
|
| 221 |
+
"output_type": "stream",
|
| 222 |
+
"text": [
|
| 223 |
+
"Fit 5\n"
|
| 224 |
+
]
|
| 225 |
+
},
|
| 226 |
+
{
|
| 227 |
+
"name": "stdout",
|
| 228 |
+
"output_type": "stream",
|
| 229 |
+
"text": [
|
| 230 |
+
"Fit 6\n"
|
| 231 |
+
]
|
| 232 |
+
},
|
| 233 |
+
{
|
| 234 |
+
"name": "stdout",
|
| 235 |
+
"output_type": "stream",
|
| 236 |
+
"text": [
|
| 237 |
+
"Fit 7\n"
|
| 238 |
+
]
|
| 239 |
+
},
|
| 240 |
+
{
|
| 241 |
+
"name": "stdout",
|
| 242 |
+
"output_type": "stream",
|
| 243 |
+
"text": [
|
| 244 |
+
"Fit 8\n"
|
| 245 |
+
]
|
| 246 |
+
},
|
| 247 |
+
{
|
| 248 |
+
"name": "stdout",
|
| 249 |
+
"output_type": "stream",
|
| 250 |
+
"text": [
|
| 251 |
+
"Fit 9\n"
|
| 252 |
+
]
|
| 253 |
+
},
|
| 254 |
+
{
|
| 255 |
+
"data": {
|
| 256 |
+
"text/html": [
|
| 257 |
+
"<div>\n",
|
| 258 |
+
"<style scoped>\n",
|
| 259 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
| 260 |
+
" vertical-align: middle;\n",
|
| 261 |
+
" }\n",
|
| 262 |
+
"\n",
|
| 263 |
+
" .dataframe tbody tr th {\n",
|
| 264 |
+
" vertical-align: top;\n",
|
| 265 |
+
" }\n",
|
| 266 |
+
"\n",
|
| 267 |
+
" .dataframe thead th {\n",
|
| 268 |
+
" text-align: right;\n",
|
| 269 |
+
" }\n",
|
| 270 |
+
"</style>\n",
|
| 271 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
| 272 |
+
" <thead>\n",
|
| 273 |
+
" <tr style=\"text-align: right;\">\n",
|
| 274 |
+
" <th></th>\n",
|
| 275 |
+
" <th>model</th>\n",
|
| 276 |
+
" <th>n_estimators</th>\n",
|
| 277 |
+
" <th>max_depth</th>\n",
|
| 278 |
+
" <th>random_state</th>\n",
|
| 279 |
+
" <th>test_size</th>\n",
|
| 280 |
+
" <th>random_state_split</th>\n",
|
| 281 |
+
" <th>test_accuracy</th>\n",
|
| 282 |
+
" <th>test_precision</th>\n",
|
| 283 |
+
" <th>test_recall</th>\n",
|
| 284 |
+
" <th>test_f1</th>\n",
|
| 285 |
+
" <th>...</th>\n",
|
| 286 |
+
" <th>var14</th>\n",
|
| 287 |
+
" <th>var15</th>\n",
|
| 288 |
+
" <th>var16</th>\n",
|
| 289 |
+
" <th>var17</th>\n",
|
| 290 |
+
" <th>var18</th>\n",
|
| 291 |
+
" <th>var19</th>\n",
|
| 292 |
+
" <th>var20</th>\n",
|
| 293 |
+
" <th>var21</th>\n",
|
| 294 |
+
" <th>var22</th>\n",
|
| 295 |
+
" <th>var23</th>\n",
|
| 296 |
+
" </tr>\n",
|
| 297 |
+
" </thead>\n",
|
| 298 |
+
" <tbody>\n",
|
| 299 |
+
" <tr>\n",
|
| 300 |
+
" <th>0</th>\n",
|
| 301 |
+
" <td>XGBClassifier</td>\n",
|
| 302 |
+
" <td>10</td>\n",
|
| 303 |
+
" <td>5</td>\n",
|
| 304 |
+
" <td>62618</td>\n",
|
| 305 |
+
" <td>0.25</td>\n",
|
| 306 |
+
" <td>8564</td>\n",
|
| 307 |
+
" <td>0.919485</td>\n",
|
| 308 |
+
" <td>0.979252</td>\n",
|
| 309 |
+
" <td>0.930664</td>\n",
|
| 310 |
+
" <td>0.954340</td>\n",
|
| 311 |
+
" <td>...</td>\n",
|
| 312 |
+
" <td>0.016507</td>\n",
|
| 313 |
+
" <td>0.049293</td>\n",
|
| 314 |
+
" <td>0.008913</td>\n",
|
| 315 |
+
" <td>0.008224</td>\n",
|
| 316 |
+
" <td>0.007370</td>\n",
|
| 317 |
+
" <td>0.017610</td>\n",
|
| 318 |
+
" <td>0.001872</td>\n",
|
| 319 |
+
" <td>0.015215</td>\n",
|
| 320 |
+
" <td>0.010002</td>\n",
|
| 321 |
+
" <td>0.011387</td>\n",
|
| 322 |
+
" </tr>\n",
|
| 323 |
+
" <tr>\n",
|
| 324 |
+
" <th>1</th>\n",
|
| 325 |
+
" <td>XGBClassifier</td>\n",
|
| 326 |
+
" <td>10</td>\n",
|
| 327 |
+
" <td>5</td>\n",
|
| 328 |
+
" <td>38092</td>\n",
|
| 329 |
+
" <td>0.25</td>\n",
|
| 330 |
+
" <td>29471</td>\n",
|
| 331 |
+
" <td>0.916542</td>\n",
|
| 332 |
+
" <td>0.979042</td>\n",
|
| 333 |
+
" <td>0.927811</td>\n",
|
| 334 |
+
" <td>0.952738</td>\n",
|
| 335 |
+
" <td>...</td>\n",
|
| 336 |
+
" <td>0.014524</td>\n",
|
| 337 |
+
" <td>0.041489</td>\n",
|
| 338 |
+
" <td>0.009664</td>\n",
|
| 339 |
+
" <td>0.007179</td>\n",
|
| 340 |
+
" <td>0.010653</td>\n",
|
| 341 |
+
" <td>0.015027</td>\n",
|
| 342 |
+
" <td>0.000000</td>\n",
|
| 343 |
+
" <td>0.012734</td>\n",
|
| 344 |
+
" <td>0.009035</td>\n",
|
| 345 |
+
" <td>0.009792</td>\n",
|
| 346 |
+
" </tr>\n",
|
| 347 |
+
" <tr>\n",
|
| 348 |
+
" <th>2</th>\n",
|
| 349 |
+
" <td>XGBClassifier</td>\n",
|
| 350 |
+
" <td>10</td>\n",
|
| 351 |
+
" <td>5</td>\n",
|
| 352 |
+
" <td>53379</td>\n",
|
| 353 |
+
" <td>0.25</td>\n",
|
| 354 |
+
" <td>53105</td>\n",
|
| 355 |
+
" <td>0.920031</td>\n",
|
| 356 |
+
" <td>0.977984</td>\n",
|
| 357 |
+
" <td>0.932223</td>\n",
|
| 358 |
+
" <td>0.954555</td>\n",
|
| 359 |
+
" <td>...</td>\n",
|
| 360 |
+
" <td>0.029407</td>\n",
|
| 361 |
+
" <td>0.042983</td>\n",
|
| 362 |
+
" <td>0.011433</td>\n",
|
| 363 |
+
" <td>0.007371</td>\n",
|
| 364 |
+
" <td>0.007829</td>\n",
|
| 365 |
+
" <td>0.019937</td>\n",
|
| 366 |
+
" <td>0.006534</td>\n",
|
| 367 |
+
" <td>0.014162</td>\n",
|
| 368 |
+
" <td>0.008016</td>\n",
|
| 369 |
+
" <td>0.010182</td>\n",
|
| 370 |
+
" </tr>\n",
|
| 371 |
+
" <tr>\n",
|
| 372 |
+
" <th>3</th>\n",
|
| 373 |
+
" <td>XGBClassifier</td>\n",
|
| 374 |
+
" <td>10</td>\n",
|
| 375 |
+
" <td>5</td>\n",
|
| 376 |
+
" <td>53990</td>\n",
|
| 377 |
+
" <td>0.25</td>\n",
|
| 378 |
+
" <td>1020</td>\n",
|
| 379 |
+
" <td>0.920008</td>\n",
|
| 380 |
+
" <td>0.977727</td>\n",
|
| 381 |
+
" <td>0.932454</td>\n",
|
| 382 |
+
" <td>0.954554</td>\n",
|
| 383 |
+
" <td>...</td>\n",
|
| 384 |
+
" <td>0.017415</td>\n",
|
| 385 |
+
" <td>0.040748</td>\n",
|
| 386 |
+
" <td>0.009362</td>\n",
|
| 387 |
+
" <td>0.009092</td>\n",
|
| 388 |
+
" <td>0.004017</td>\n",
|
| 389 |
+
" <td>0.019394</td>\n",
|
| 390 |
+
" <td>0.000000</td>\n",
|
| 391 |
+
" <td>0.013799</td>\n",
|
| 392 |
+
" <td>0.011235</td>\n",
|
| 393 |
+
" <td>0.011108</td>\n",
|
| 394 |
+
" </tr>\n",
|
| 395 |
+
" <tr>\n",
|
| 396 |
+
" <th>4</th>\n",
|
| 397 |
+
" <td>XGBClassifier</td>\n",
|
| 398 |
+
" <td>10</td>\n",
|
| 399 |
+
" <td>5</td>\n",
|
| 400 |
+
" <td>20157</td>\n",
|
| 401 |
+
" <td>0.25</td>\n",
|
| 402 |
+
" <td>8650</td>\n",
|
| 403 |
+
" <td>0.917305</td>\n",
|
| 404 |
+
" <td>0.978153</td>\n",
|
| 405 |
+
" <td>0.929241</td>\n",
|
| 406 |
+
" <td>0.953070</td>\n",
|
| 407 |
+
" <td>...</td>\n",
|
| 408 |
+
" <td>0.019698</td>\n",
|
| 409 |
+
" <td>0.052578</td>\n",
|
| 410 |
+
" <td>0.011287</td>\n",
|
| 411 |
+
" <td>0.006563</td>\n",
|
| 412 |
+
" <td>0.005683</td>\n",
|
| 413 |
+
" <td>0.013883</td>\n",
|
| 414 |
+
" <td>0.008399</td>\n",
|
| 415 |
+
" <td>0.010119</td>\n",
|
| 416 |
+
" <td>0.008999</td>\n",
|
| 417 |
+
" <td>0.007195</td>\n",
|
| 418 |
+
" </tr>\n",
|
| 419 |
+
" <tr>\n",
|
| 420 |
+
" <th>5</th>\n",
|
| 421 |
+
" <td>XGBClassifier</td>\n",
|
| 422 |
+
" <td>10</td>\n",
|
| 423 |
+
" <td>5</td>\n",
|
| 424 |
+
" <td>29087</td>\n",
|
| 425 |
+
" <td>0.25</td>\n",
|
| 426 |
+
" <td>15247</td>\n",
|
| 427 |
+
" <td>0.920376</td>\n",
|
| 428 |
+
" <td>0.978004</td>\n",
|
| 429 |
+
" <td>0.932595</td>\n",
|
| 430 |
+
" <td>0.954760</td>\n",
|
| 431 |
+
" <td>...</td>\n",
|
| 432 |
+
" <td>0.014503</td>\n",
|
| 433 |
+
" <td>0.042829</td>\n",
|
| 434 |
+
" <td>0.009559</td>\n",
|
| 435 |
+
" <td>0.006757</td>\n",
|
| 436 |
+
" <td>0.010323</td>\n",
|
| 437 |
+
" <td>0.013950</td>\n",
|
| 438 |
+
" <td>0.007816</td>\n",
|
| 439 |
+
" <td>0.010859</td>\n",
|
| 440 |
+
" <td>0.013720</td>\n",
|
| 441 |
+
" <td>0.011215</td>\n",
|
| 442 |
+
" </tr>\n",
|
| 443 |
+
" <tr>\n",
|
| 444 |
+
" <th>6</th>\n",
|
| 445 |
+
" <td>XGBClassifier</td>\n",
|
| 446 |
+
" <td>10</td>\n",
|
| 447 |
+
" <td>5</td>\n",
|
| 448 |
+
" <td>63289</td>\n",
|
| 449 |
+
" <td>0.25</td>\n",
|
| 450 |
+
" <td>37405</td>\n",
|
| 451 |
+
" <td>0.918176</td>\n",
|
| 452 |
+
" <td>0.977817</td>\n",
|
| 453 |
+
" <td>0.930484</td>\n",
|
| 454 |
+
" <td>0.953563</td>\n",
|
| 455 |
+
" <td>...</td>\n",
|
| 456 |
+
" <td>0.016545</td>\n",
|
| 457 |
+
" <td>0.049997</td>\n",
|
| 458 |
+
" <td>0.009340</td>\n",
|
| 459 |
+
" <td>0.006492</td>\n",
|
| 460 |
+
" <td>0.005942</td>\n",
|
| 461 |
+
" <td>0.013955</td>\n",
|
| 462 |
+
" <td>0.005050</td>\n",
|
| 463 |
+
" <td>0.008405</td>\n",
|
| 464 |
+
" <td>0.009122</td>\n",
|
| 465 |
+
" <td>0.010000</td>\n",
|
| 466 |
+
" </tr>\n",
|
| 467 |
+
" </tbody>\n",
|
| 468 |
+
"</table>\n",
|
| 469 |
+
"<p>7 rows × 42 columns</p>\n",
|
| 470 |
+
"</div>"
|
| 471 |
+
],
|
| 472 |
+
"text/plain": [
|
| 473 |
+
" model n_estimators max_depth random_state test_size \\\n",
|
| 474 |
+
"0 XGBClassifier 10 5 62618 0.25 \n",
|
| 475 |
+
"1 XGBClassifier 10 5 38092 0.25 \n",
|
| 476 |
+
"2 XGBClassifier 10 5 53379 0.25 \n",
|
| 477 |
+
"3 XGBClassifier 10 5 53990 0.25 \n",
|
| 478 |
+
"4 XGBClassifier 10 5 20157 0.25 \n",
|
| 479 |
+
"5 XGBClassifier 10 5 29087 0.25 \n",
|
| 480 |
+
"6 XGBClassifier 10 5 63289 0.25 \n",
|
| 481 |
+
"\n",
|
| 482 |
+
" random_state_split test_accuracy test_precision test_recall test_f1 \\\n",
|
| 483 |
+
"0 8564 0.919485 0.979252 0.930664 0.954340 \n",
|
| 484 |
+
"1 29471 0.916542 0.979042 0.927811 0.952738 \n",
|
| 485 |
+
"2 53105 0.920031 0.977984 0.932223 0.954555 \n",
|
| 486 |
+
"3 1020 0.920008 0.977727 0.932454 0.954554 \n",
|
| 487 |
+
"4 8650 0.917305 0.978153 0.929241 0.953070 \n",
|
| 488 |
+
"5 15247 0.920376 0.978004 0.932595 0.954760 \n",
|
| 489 |
+
"6 37405 0.918176 0.977817 0.930484 0.953563 \n",
|
| 490 |
+
"\n",
|
| 491 |
+
" ... var14 var15 var16 var17 var18 var19 var20 \\\n",
|
| 492 |
+
"0 ... 0.016507 0.049293 0.008913 0.008224 0.007370 0.017610 0.001872 \n",
|
| 493 |
+
"1 ... 0.014524 0.041489 0.009664 0.007179 0.010653 0.015027 0.000000 \n",
|
| 494 |
+
"2 ... 0.029407 0.042983 0.011433 0.007371 0.007829 0.019937 0.006534 \n",
|
| 495 |
+
"3 ... 0.017415 0.040748 0.009362 0.009092 0.004017 0.019394 0.000000 \n",
|
| 496 |
+
"4 ... 0.019698 0.052578 0.011287 0.006563 0.005683 0.013883 0.008399 \n",
|
| 497 |
+
"5 ... 0.014503 0.042829 0.009559 0.006757 0.010323 0.013950 0.007816 \n",
|
| 498 |
+
"6 ... 0.016545 0.049997 0.009340 0.006492 0.005942 0.013955 0.005050 \n",
|
| 499 |
+
"\n",
|
| 500 |
+
" var21 var22 var23 \n",
|
| 501 |
+
"0 0.015215 0.010002 0.011387 \n",
|
| 502 |
+
"1 0.012734 0.009035 0.009792 \n",
|
| 503 |
+
"2 0.014162 0.008016 0.010182 \n",
|
| 504 |
+
"3 0.013799 0.011235 0.011108 \n",
|
| 505 |
+
"4 0.010119 0.008999 0.007195 \n",
|
| 506 |
+
"5 0.010859 0.013720 0.011215 \n",
|
| 507 |
+
"6 0.008405 0.009122 0.010000 \n",
|
| 508 |
+
"\n",
|
| 509 |
+
"[7 rows x 42 columns]"
|
| 510 |
+
]
|
| 511 |
+
},
|
| 512 |
+
"execution_count": 6,
|
| 513 |
+
"metadata": {},
|
| 514 |
+
"output_type": "execute_result"
|
| 515 |
+
}
|
| 516 |
+
],
|
| 517 |
+
"source": [
|
| 518 |
+
"# number of random train/test splits; each parameter combination is fit once per split\n",
|
| 519 |
+
"# for averaging results\n",
|
| 520 |
+
"num_fits = 10\n",
|
| 521 |
+
"test_size = 0.25\n",
|
| 522 |
+
"\n",
|
| 523 |
+
"param_dicts = []\n",
|
| 524 |
+
"report_dicts = []\n",
|
| 525 |
+
"fi_dicts = []\n",
|
| 526 |
+
"itr = 0\n",
|
| 527 |
+
"\n",
|
| 528 |
+
"for i in range(num_fits):\n",
|
| 529 |
+
"\n",
|
| 530 |
+
" # randomize the train test split\n",
|
| 531 |
+
" rsp = np.random.randint(0,2**16)\n",
|
| 532 |
+
" X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=rsp)\n",
|
| 533 |
+
"\n",
|
| 534 |
+
" print(f\"Fit {i}\")\n",
|
| 535 |
+
" \n",
|
| 536 |
+
" for _ in param_combos:\n",
|
| 537 |
+
" \n",
|
| 538 |
+
" itr += 1\n",
|
| 539 |
+
" \n",
|
| 540 |
+
" # manipulate the parameter combinations\n",
|
| 541 |
+
" rs = np.random.randint(0,2**16)\n",
|
| 542 |
+
" d = {k:v for k,v in _.items() if k!='model'}\n",
|
| 543 |
+
" d['random_state'] = rs\n",
|
| 544 |
+
" \n",
|
| 545 |
+
" # fit the model\n",
|
| 546 |
+
" model = _['model'](**d)\n",
|
| 547 |
+
" model.fit(X_train, y_train)\n",
|
| 548 |
+
" y_test_pred = model.predict(X_test)\n",
|
| 549 |
+
" y_test_pred_proba = model.predict_proba(X_test)\n",
|
| 550 |
+
" y_train_pred = model.predict(X_train)\n",
|
| 551 |
+
" y_train_pred_proba = model.predict_proba(X_train)\n",
|
| 552 |
+
" \n",
|
| 553 |
+
" # parameters dictionary\n",
|
| 554 |
+
" param_dict = copy.deepcopy(d)\n",
|
| 555 |
+
"        if isinstance(model, xgb.XGBClassifier):\n",
|
| 556 |
+
" param_dict['model'] = 'XGBClassifier'\n",
|
| 557 |
+
"        elif isinstance(model, sklearn.ensemble.RandomForestClassifier):\n",
|
| 558 |
+
" param_dict['model'] = 'RandomForestClassifier'\n",
|
| 559 |
+
"        elif isinstance(model, sklearn.ensemble.GradientBoostingClassifier):\n",
|
| 560 |
+
" param_dict['model'] = 'GradientBoostingClassifier'\n",
|
| 561 |
+
" param_dict['unique_id'] = itr\n",
|
| 562 |
+
" param_dict['test_size'] = test_size\n",
|
| 563 |
+
" param_dict['random_state_split'] = rsp\n",
|
| 564 |
+
" param_dicts.append(param_dict)\n",
|
| 565 |
+
" \n",
|
| 566 |
+
" # report dictionary to compute and save\n",
|
| 567 |
+
" report_dict = {\n",
|
| 568 |
+
" 'unique_id' : itr,\n",
|
| 569 |
+
"            'test_accuracy' : sklearn.metrics.accuracy_score(y_test, y_test_pred),\n",
|
| 570 |
+
"            'test_precision' : sklearn.metrics.precision_score(y_test, y_test_pred),\n",
|
| 571 |
+
"            'test_recall' : sklearn.metrics.recall_score(y_test, y_test_pred),\n",
|
| 572 |
+
"            'test_f1' : sklearn.metrics.f1_score(y_test, y_test_pred),\n",
|
| 573 |
+
" 'test_auroc' : sklearn.metrics.roc_auc_score(y_test, y_test_pred_proba[:,1]),\n",
|
| 574 |
+
" 'test_log_loss' : sklearn.metrics.log_loss(y_test, y_test_pred_proba[:,1]),\n",
|
| 575 |
+
"            'train_accuracy' : sklearn.metrics.accuracy_score(y_train, y_train_pred),\n",
|
| 576 |
+
"            'train_precision' : sklearn.metrics.precision_score(y_train, y_train_pred),\n",
|
| 577 |
+
"            'train_recall' : sklearn.metrics.recall_score(y_train, y_train_pred),\n",
|
| 578 |
+
"            'train_f1' : sklearn.metrics.f1_score(y_train, y_train_pred),\n",
|
| 579 |
+
" 'train_auroc' : sklearn.metrics.roc_auc_score(y_train, y_train_pred_proba[:,1]),\n",
|
| 580 |
+
" 'train_log_loss' : sklearn.metrics.log_loss(y_train, y_train_pred_proba[:,1]),\n",
|
| 581 |
+
" }\n",
|
| 582 |
+
" report_dicts.append(report_dict)\n",
|
| 583 |
+
"\n",
|
| 584 |
+
" # record feature importances\n",
|
| 585 |
+
" fi = model.feature_importances_\n",
|
| 586 |
+
" fi_dict = {'var'+str(i) : float(j) for i,j in enumerate(model.feature_importances_)}\n",
|
| 587 |
+
" fi_dict['unique_id'] = itr\n",
|
| 588 |
+
" fi_dicts.append(fi_dict)\n",
|
| 589 |
+
"\n",
|
| 590 |
+
"fi_table = pd.DataFrame(fi_dicts)\n",
|
| 591 |
+
"report_table = pd.DataFrame(report_dicts)\n",
|
| 592 |
+
"param_table = pd.DataFrame(param_dicts)\n",
|
| 593 |
+
"merged_table = pd.merge(param_table, report_table, on='unique_id')[['model','unique_id'] + \\\n",
|
| 594 |
+
" [k for k in param_dict.keys() if k not in ['model','unique_id']] + \\\n",
|
| 595 |
+
" [k for k in report_dict.keys() if k != 'unique_id']\n",
|
| 596 |
+
"]\n",
|
| 597 |
+
"merged_table = pd.merge(merged_table, fi_table, on='unique_id')\n",
|
| 598 |
+
"merged_table.drop('unique_id',axis=1,inplace=True)\n",
|
| 599 |
+
"merged_table.sort_values(by=['model'] + [k for k in d.keys() if k != 'random_state'],inplace=True)\n",
|
| 600 |
+
"merged_table.reset_index(inplace=True,drop=True)\n",
|
| 601 |
+
"merged_table.head(7)"
|
| 602 |
+
]
|
| 603 |
+
},
|
| 604 |
+
{
|
| 605 |
+
"cell_type": "code",
|
| 606 |
+
"execution_count": 7,
|
| 607 |
+
"id": "64953ef5-5206-4d47-9519-311c858eb8a1",
|
| 608 |
+
"metadata": {
|
| 609 |
+
"execution": {
|
| 610 |
+
"iopub.execute_input": "2025-08-21T23:09:34.553963Z",
|
| 611 |
+
"iopub.status.busy": "2025-08-21T23:09:34.553337Z",
|
| 612 |
+
"iopub.status.idle": "2025-08-21T23:09:34.566190Z",
|
| 613 |
+
"shell.execute_reply": "2025-08-21T23:09:34.565837Z"
|
| 614 |
+
},
|
| 615 |
+
"papermill": {
|
| 616 |
+
"duration": 0.0184,
|
| 617 |
+
"end_time": "2025-08-21T23:09:34.566962",
|
| 618 |
+
"exception": false,
|
| 619 |
+
"start_time": "2025-08-21T23:09:34.548562",
|
| 620 |
+
"status": "completed"
|
| 621 |
+
},
|
| 622 |
+
"tags": []
|
| 623 |
+
},
|
| 624 |
+
"outputs": [
|
| 625 |
+
{
|
| 626 |
+
"data": {
|
| 627 |
+
"text/html": [
|
| 628 |
+
"<div>\n",
|
| 629 |
+
"<style scoped>\n",
|
| 630 |
+
" .dataframe tbody tr th:only-of-type {\n",
|
| 631 |
+
" vertical-align: middle;\n",
|
| 632 |
+
" }\n",
|
| 633 |
+
"\n",
|
| 634 |
+
" .dataframe tbody tr th {\n",
|
| 635 |
+
" vertical-align: top;\n",
|
| 636 |
+
" }\n",
|
| 637 |
+
"\n",
|
| 638 |
+
" .dataframe thead th {\n",
|
| 639 |
+
" text-align: right;\n",
|
| 640 |
+
" }\n",
|
| 641 |
+
"</style>\n",
|
| 642 |
+
"<table border=\"1\" class=\"dataframe\">\n",
|
| 643 |
+
" <thead>\n",
|
| 644 |
+
" <tr style=\"text-align: right;\">\n",
|
| 645 |
+
" <th></th>\n",
|
| 646 |
+
" <th>n_estimators</th>\n",
|
| 647 |
+
" <th>max_depth</th>\n",
|
| 648 |
+
" <th>test_accuracy</th>\n",
|
| 649 |
+
" <th>train_accuracy</th>\n",
|
| 650 |
+
" <th>test_f1</th>\n",
|
| 651 |
+
" <th>train_f1</th>\n",
|
| 652 |
+
" <th>test_precision</th>\n",
|
| 653 |
+
" <th>train_precision</th>\n",
|
| 654 |
+
" <th>test_recall</th>\n",
|
| 655 |
+
" <th>train_recall</th>\n",
|
| 656 |
+
" </tr>\n",
|
| 657 |
+
" </thead>\n",
|
| 658 |
+
" <tbody>\n",
|
| 659 |
+
" <tr>\n",
|
| 660 |
+
" <th>0</th>\n",
|
| 661 |
+
" <td>10</td>\n",
|
| 662 |
+
" <td>5</td>\n",
|
| 663 |
+
" <td>0.919485</td>\n",
|
| 664 |
+
" <td>0.919226</td>\n",
|
| 665 |
+
" <td>0.954340</td>\n",
|
| 666 |
+
" <td>0.954169</td>\n",
|
| 667 |
+
" <td>0.979252</td>\n",
|
| 668 |
+
" <td>0.979097</td>\n",
|
| 669 |
+
" <td>0.930664</td>\n",
|
| 670 |
+
" <td>0.930478</td>\n",
|
| 671 |
+
" </tr>\n",
|
| 672 |
+
" <tr>\n",
|
| 673 |
+
" <th>1</th>\n",
|
| 674 |
+
" <td>10</td>\n",
|
| 675 |
+
" <td>5</td>\n",
|
| 676 |
+
" <td>0.916542</td>\n",
|
| 677 |
+
" <td>0.916715</td>\n",
|
| 678 |
+
" <td>0.952738</td>\n",
|
| 679 |
+
" <td>0.952818</td>\n",
|
| 680 |
+
" <td>0.979042</td>\n",
|
| 681 |
+
" <td>0.979237</td>\n",
|
| 682 |
+
" <td>0.927811</td>\n",
|
| 683 |
+
" <td>0.927787</td>\n",
|
| 684 |
+
" </tr>\n",
|
| 685 |
+
" <tr>\n",
|
| 686 |
+
" <th>2</th>\n",
|
| 687 |
+
" <td>10</td>\n",
|
| 688 |
+
" <td>5</td>\n",
|
| 689 |
+
" <td>0.920031</td>\n",
|
| 690 |
+
" <td>0.919793</td>\n",
|
| 691 |
+
" <td>0.954555</td>\n",
|
| 692 |
+
" <td>0.954422</td>\n",
|
| 693 |
+
" <td>0.977984</td>\n",
|
| 694 |
+
" <td>0.977704</td>\n",
|
| 695 |
+
" <td>0.932223</td>\n",
|
| 696 |
+
" <td>0.932222</td>\n",
|
| 697 |
+
" </tr>\n",
|
| 698 |
+
" <tr>\n",
|
| 699 |
+
" <th>3</th>\n",
|
| 700 |
+
" <td>10</td>\n",
|
| 701 |
+
" <td>5</td>\n",
|
| 702 |
+
" <td>0.920008</td>\n",
|
| 703 |
+
" <td>0.919787</td>\n",
|
| 704 |
+
" <td>0.954554</td>\n",
|
| 705 |
+
" <td>0.954414</td>\n",
|
| 706 |
+
" <td>0.977727</td>\n",
|
| 707 |
+
" <td>0.977783</td>\n",
|
| 708 |
+
" <td>0.932454</td>\n",
|
| 709 |
+
" <td>0.932137</td>\n",
|
| 710 |
+
" </tr>\n",
|
| 711 |
+
" <tr>\n",
|
| 712 |
+
" <th>4</th>\n",
|
| 713 |
+
" <td>10</td>\n",
|
| 714 |
+
" <td>5</td>\n",
|
| 715 |
+
" <td>0.917305</td>\n",
|
| 716 |
+
" <td>0.917457</td>\n",
|
| 717 |
+
" <td>0.953070</td>\n",
|
| 718 |
+
" <td>0.953178</td>\n",
|
| 719 |
+
" <td>0.978153</td>\n",
|
| 720 |
+
" <td>0.978051</td>\n",
|
| 721 |
+
" <td>0.929241</td>\n",
|
| 722 |
+
" <td>0.929539</td>\n",
|
| 723 |
+
" </tr>\n",
|
| 724 |
+
" <tr>\n",
|
| 725 |
+
" <th>5</th>\n",
|
| 726 |
+
" <td>10</td>\n",
|
| 727 |
+
" <td>5</td>\n",
|
| 728 |
+
" <td>0.920376</td>\n",
|
| 729 |
+
" <td>0.920645</td>\n",
|
| 730 |
+
" <td>0.954760</td>\n",
|
| 731 |
+
" <td>0.954898</td>\n",
|
| 732 |
+
" <td>0.978004</td>\n",
|
| 733 |
+
" <td>0.978137</td>\n",
|
| 734 |
+
" <td>0.932595</td>\n",
|
| 735 |
+
" <td>0.932737</td>\n",
|
| 736 |
+
" </tr>\n",
|
| 737 |
+
" <tr>\n",
|
| 738 |
+
" <th>6</th>\n",
|
| 739 |
+
" <td>10</td>\n",
|
| 740 |
+
" <td>5</td>\n",
|
| 741 |
+
" <td>0.918176</td>\n",
|
| 742 |
+
" <td>0.918210</td>\n",
|
| 743 |
+
" <td>0.953563</td>\n",
|
| 744 |
+
" <td>0.953576</td>\n",
|
| 745 |
+
" <td>0.977817</td>\n",
|
| 746 |
+
" <td>0.978125</td>\n",
|
| 747 |
+
" <td>0.930484</td>\n",
|
| 748 |
+
" <td>0.930228</td>\n",
|
| 749 |
+
" </tr>\n",
|
| 750 |
+
" <tr>\n",
|
| 751 |
+
" <th>7</th>\n",
|
| 752 |
+
" <td>10</td>\n",
|
| 753 |
+
" <td>5</td>\n",
|
| 754 |
+
" <td>0.918414</td>\n",
|
| 755 |
+
" <td>0.918595</td>\n",
|
| 756 |
+
" <td>0.953670</td>\n",
|
| 757 |
+
" <td>0.953794</td>\n",
|
| 758 |
+
" <td>0.977988</td>\n",
|
| 759 |
+
" <td>0.978111</td>\n",
|
| 760 |
+
" <td>0.930531</td>\n",
|
| 761 |
+
" <td>0.930657</td>\n",
|
| 762 |
+
" </tr>\n",
|
| 763 |
+
" <tr>\n",
|
| 764 |
+
" <th>8</th>\n",
|
| 765 |
+
" <td>10</td>\n",
|
| 766 |
+
" <td>5</td>\n",
|
| 767 |
+
" <td>0.918874</td>\n",
|
| 768 |
+
" <td>0.918810</td>\n",
|
| 769 |
+
" <td>0.953964</td>\n",
|
| 770 |
+
" <td>0.953909</td>\n",
|
| 771 |
+
" <td>0.978158</td>\n",
|
| 772 |
+
" <td>0.978360</td>\n",
|
| 773 |
+
" <td>0.930937</td>\n",
|
| 774 |
+
" <td>0.930651</td>\n",
|
| 775 |
+
" </tr>\n",
|
| 776 |
+
" <tr>\n",
|
| 777 |
+
" <th>9</th>\n",
|
| 778 |
+
" <td>10</td>\n",
|
| 779 |
+
" <td>5</td>\n",
|
| 780 |
+
" <td>0.919651</td>\n",
|
| 781 |
+
" <td>0.919729</td>\n",
|
| 782 |
+
" <td>0.954408</td>\n",
|
| 783 |
+
" <td>0.954418</td>\n",
|
| 784 |
+
" <td>0.978668</td>\n",
|
| 785 |
+
" <td>0.978603</td>\n",
|
| 786 |
+
" <td>0.931322</td>\n",
|
| 787 |
+
" <td>0.931399</td>\n",
|
| 788 |
+
" </tr>\n",
|
| 789 |
+
" <tr>\n",
|
| 790 |
+
" <th>10</th>\n",
|
| 791 |
+
" <td>10</td>\n",
|
| 792 |
+
" <td>10</td>\n",
|
| 793 |
+
" <td>0.947473</td>\n",
|
| 794 |
+
" <td>0.949585</td>\n",
|
| 795 |
+
" <td>0.969781</td>\n",
|
| 796 |
+
" <td>0.970971</td>\n",
|
| 797 |
+
" <td>0.980907</td>\n",
|
| 798 |
+
" <td>0.981816</td>\n",
|
| 799 |
+
" <td>0.958904</td>\n",
|
| 800 |
+
" <td>0.960363</td>\n",
|
| 801 |
+
" </tr>\n",
|
| 802 |
+
" <tr>\n",
|
| 803 |
+
" <th>11</th>\n",
|
| 804 |
+
" <td>10</td>\n",
|
| 805 |
+
" <td>10</td>\n",
|
| 806 |
+
" <td>0.947651</td>\n",
|
| 807 |
+
" <td>0.949634</td>\n",
|
| 808 |
+
" <td>0.969879</td>\n",
|
| 809 |
+
" <td>0.970994</td>\n",
|
| 810 |
+
" <td>0.980909</td>\n",
|
| 811 |
+
" <td>0.981657</td>\n",
|
| 812 |
+
" <td>0.959094</td>\n",
|
| 813 |
+
" <td>0.960561</td>\n",
|
| 814 |
+
" </tr>\n",
|
| 815 |
+
" <tr>\n",
|
| 816 |
+
" <th>12</th>\n",
|
| 817 |
+
" <td>10</td>\n",
|
| 818 |
+
" <td>10</td>\n",
|
| 819 |
+
" <td>0.947336</td>\n",
|
| 820 |
+
" <td>0.949079</td>\n",
|
| 821 |
+
" <td>0.969703</td>\n",
|
| 822 |
+
" <td>0.970703</td>\n",
|
| 823 |
+
" <td>0.981415</td>\n",
|
| 824 |
+
" <td>0.982128</td>\n",
|
| 825 |
+
" <td>0.958268</td>\n",
|
| 826 |
+
" <td>0.959541</td>\n",
|
| 827 |
+
" </tr>\n",
|
| 828 |
+
" <tr>\n",
|
| 829 |
+
" <th>13</th>\n",
|
| 830 |
+
" <td>10</td>\n",
|
| 831 |
+
" <td>10</td>\n",
|
| 832 |
+
" <td>0.948380</td>\n",
|
| 833 |
+
" <td>0.950519</td>\n",
|
| 834 |
+
" <td>0.970292</td>\n",
|
| 835 |
+
" <td>0.971503</td>\n",
|
| 836 |
+
" <td>0.981094</td>\n",
|
| 837 |
+
" <td>0.982121</td>\n",
|
| 838 |
+
" <td>0.959725</td>\n",
|
| 839 |
+
" <td>0.961112</td>\n",
|
| 840 |
+
" </tr>\n",
|
| 841 |
+
" <tr>\n",
|
| 842 |
+
" <th>14</th>\n",
|
| 843 |
+
" <td>10</td>\n",
|
| 844 |
+
" <td>10</td>\n",
|
| 845 |
+
" <td>0.946944</td>\n",
|
| 846 |
+
" <td>0.949269</td>\n",
|
| 847 |
+
" <td>0.969474</td>\n",
|
| 848 |
+
" <td>0.970812</td>\n",
|
| 849 |
+
" <td>0.981403</td>\n",
|
| 850 |
+
" <td>0.982112</td>\n",
|
| 851 |
+
" <td>0.957831</td>\n",
|
| 852 |
+
" <td>0.959769</td>\n",
|
| 853 |
+
" </tr>\n",
|
| 854 |
+
" <tr>\n",
|
| 855 |
+
" <th>15</th>\n",
|
| 856 |
+
" <td>10</td>\n",
|
| 857 |
+
" <td>10</td>\n",
|
| 858 |
+
" <td>0.947763</td>\n",
|
| 859 |
+
" <td>0.949670</td>\n",
|
| 860 |
+
" <td>0.969946</td>\n",
|
| 861 |
+
" <td>0.971027</td>\n",
|
| 862 |
+
" <td>0.981195</td>\n",
|
| 863 |
+
" <td>0.982040</td>\n",
|
| 864 |
+
" <td>0.958953</td>\n",
|
| 865 |
+
" <td>0.960259</td>\n",
|
| 866 |
+
" </tr>\n",
|
| 867 |
+
" <tr>\n",
|
| 868 |
+
" <th>16</th>\n",
|
| 869 |
+
" <td>10</td>\n",
|
| 870 |
+
" <td>10</td>\n",
|
| 871 |
+
" <td>0.947479</td>\n",
|
| 872 |
+
" <td>0.949631</td>\n",
|
| 873 |
+
" <td>0.969787</td>\n",
|
| 874 |
+
" <td>0.971009</td>\n",
|
| 875 |
+
" <td>0.981054</td>\n",
|
| 876 |
+
" <td>0.982229</td>\n",
|
| 877 |
+
" <td>0.958775</td>\n",
|
| 878 |
+
" <td>0.960043</td>\n",
|
| 879 |
+
" </tr>\n",
|
| 880 |
+
" <tr>\n",
|
| 881 |
+
" <th>17</th>\n",
|
| 882 |
+
" <td>10</td>\n",
|
| 883 |
+
" <td>10</td>\n",
|
| 884 |
+
" <td>0.946731</td>\n",
|
| 885 |
+
" <td>0.948403</td>\n",
|
| 886 |
+
" <td>0.969353</td>\n",
|
| 887 |
+
" <td>0.970316</td>\n",
|
| 888 |
+
" <td>0.981196</td>\n",
|
| 889 |
+
" <td>0.981756</td>\n",
|
| 890 |
+
" <td>0.957793</td>\n",
|
| 891 |
+
" <td>0.959140</td>\n",
|
| 892 |
+
" </tr>\n",
|
| 893 |
+
" <tr>\n",
|
| 894 |
+
" <th>18</th>\n",
|
| 895 |
+
" <td>10</td>\n",
|
| 896 |
+
" <td>10</td>\n",
|
| 897 |
+
" <td>0.947526</td>\n",
|
| 898 |
+
" <td>0.949605</td>\n",
|
| 899 |
+
" <td>0.969822</td>\n",
|
| 900 |
+
" <td>0.971000</td>\n",
|
| 901 |
+
" <td>0.981211</td>\n",
|
| 902 |
+
" <td>0.982460</td>\n",
|
| 903 |
+
" <td>0.958694</td>\n",
|
| 904 |
+
" <td>0.959805</td>\n",
|
| 905 |
+
" </tr>\n",
|
| 906 |
+
" <tr>\n",
|
| 907 |
+
" <th>19</th>\n",
|
| 908 |
+
" <td>10</td>\n",
|
| 909 |
+
" <td>10</td>\n",
|
| 910 |
+
" <td>0.947421</td>\n",
|
| 911 |
+
" <td>0.949485</td>\n",
|
| 912 |
+
" <td>0.969779</td>\n",
|
| 913 |
+
" <td>0.970932</td>\n",
|
| 914 |
+
" <td>0.981709</td>\n",
|
| 915 |
+
" <td>0.982450</td>\n",
|
| 916 |
+
" <td>0.958135</td>\n",
|
| 917 |
+
" <td>0.959682</td>\n",
|
| 918 |
+
" </tr>\n",
|
| 919 |
+
" </tbody>\n",
|
| 920 |
+
"</table>\n",
|
| 921 |
+
"</div>"
|
| 922 |
+
],
|
| 923 |
+
"text/plain": [
|
| 924 |
+
" n_estimators max_depth test_accuracy train_accuracy test_f1 \\\n",
|
| 925 |
+
"0 10 5 0.919485 0.919226 0.954340 \n",
|
| 926 |
+
"1 10 5 0.916542 0.916715 0.952738 \n",
|
| 927 |
+
"2 10 5 0.920031 0.919793 0.954555 \n",
|
| 928 |
+
"3 10 5 0.920008 0.919787 0.954554 \n",
|
| 929 |
+
"4 10 5 0.917305 0.917457 0.953070 \n",
|
| 930 |
+
"5 10 5 0.920376 0.920645 0.954760 \n",
|
| 931 |
+
"6 10 5 0.918176 0.918210 0.953563 \n",
|
| 932 |
+
"7 10 5 0.918414 0.918595 0.953670 \n",
|
| 933 |
+
"8 10 5 0.918874 0.918810 0.953964 \n",
|
| 934 |
+
"9 10 5 0.919651 0.919729 0.954408 \n",
|
| 935 |
+
"10 10 10 0.947473 0.949585 0.969781 \n",
|
| 936 |
+
"11 10 10 0.947651 0.949634 0.969879 \n",
|
| 937 |
+
"12 10 10 0.947336 0.949079 0.969703 \n",
|
| 938 |
+
"13 10 10 0.948380 0.950519 0.970292 \n",
|
| 939 |
+
"14 10 10 0.946944 0.949269 0.969474 \n",
|
| 940 |
+
"15 10 10 0.947763 0.949670 0.969946 \n",
|
| 941 |
+
"16 10 10 0.947479 0.949631 0.969787 \n",
|
| 942 |
+
"17 10 10 0.946731 0.948403 0.969353 \n",
|
| 943 |
+
"18 10 10 0.947526 0.949605 0.969822 \n",
|
| 944 |
+
"19 10 10 0.947421 0.949485 0.969779 \n",
|
| 945 |
+
"\n",
|
| 946 |
+
" train_f1 test_precision train_precision test_recall train_recall \n",
|
| 947 |
+
"0 0.954169 0.979252 0.979097 0.930664 0.930478 \n",
|
| 948 |
+
"1 0.952818 0.979042 0.979237 0.927811 0.927787 \n",
|
| 949 |
+
"2 0.954422 0.977984 0.977704 0.932223 0.932222 \n",
|
| 950 |
+
"3 0.954414 0.977727 0.977783 0.932454 0.932137 \n",
|
| 951 |
+
"4 0.953178 0.978153 0.978051 0.929241 0.929539 \n",
|
| 952 |
+
"5 0.954898 0.978004 0.978137 0.932595 0.932737 \n",
|
| 953 |
+
"6 0.953576 0.977817 0.978125 0.930484 0.930228 \n",
|
| 954 |
+
"7 0.953794 0.977988 0.978111 0.930531 0.930657 \n",
|
| 955 |
+
"8 0.953909 0.978158 0.978360 0.930937 0.930651 \n",
|
| 956 |
+
"9 0.954418 0.978668 0.978603 0.931322 0.931399 \n",
|
| 957 |
+
"10 0.970971 0.980907 0.981816 0.958904 0.960363 \n",
|
| 958 |
+
"11 0.970994 0.980909 0.981657 0.959094 0.960561 \n",
|
| 959 |
+
"12 0.970703 0.981415 0.982128 0.958268 0.959541 \n",
|
| 960 |
+
"13 0.971503 0.981094 0.982121 0.959725 0.961112 \n",
|
| 961 |
+
"14 0.970812 0.981403 0.982112 0.957831 0.959769 \n",
|
| 962 |
+
"15 0.971027 0.981195 0.982040 0.958953 0.960259 \n",
|
| 963 |
+
"16 0.971009 0.981054 0.982229 0.958775 0.960043 \n",
|
| 964 |
+
"17 0.970316 0.981196 0.981756 0.957793 0.959140 \n",
|
| 965 |
+
"18 0.971000 0.981211 0.982460 0.958694 0.959805 \n",
|
| 966 |
+
"19 0.970932 0.981709 0.982450 0.958135 0.959682 "
|
| 967 |
+
]
|
| 968 |
+
},
|
| 969 |
+
"execution_count": 7,
|
| 970 |
+
"metadata": {},
|
| 971 |
+
"output_type": "execute_result"
|
| 972 |
+
}
|
| 973 |
+
],
|
| 974 |
+
"source": [
|
| 975 |
+
"merged_table.head(20)[['n_estimators', 'max_depth',\n",
|
| 976 |
+
" 'test_accuracy','train_accuracy',\n",
|
| 977 |
+
" 'test_f1','train_f1',\n",
|
| 978 |
+
" 'test_precision','train_precision',\n",
|
| 979 |
+
" 'test_recall','train_recall',]]"
|
| 980 |
+
]
|
| 981 |
+
},
|
| 982 |
+
{
|
| 983 |
+
"cell_type": "code",
|
| 984 |
+
"execution_count": 8,
|
| 985 |
+
"id": "db7f4f13-a696-4314-b0f4-028852863573",
|
| 986 |
+
"metadata": {
|
| 987 |
+
"execution": {
|
| 988 |
+
"iopub.execute_input": "2025-08-21T23:09:34.576689Z",
|
| 989 |
+
"iopub.status.busy": "2025-08-21T23:09:34.576115Z",
|
| 990 |
+
"iopub.status.idle": "2025-08-21T23:09:34.590011Z",
|
| 991 |
+
"shell.execute_reply": "2025-08-21T23:09:34.589688Z"
|
| 992 |
+
},
|
| 993 |
+
"papermill": {
|
| 994 |
+
"duration": 0.019834,
|
| 995 |
+
"end_time": "2025-08-21T23:09:34.590874",
|
| 996 |
+
"exception": false,
|
| 997 |
+
"start_time": "2025-08-21T23:09:34.571040",
|
| 998 |
+
"status": "completed"
|
| 999 |
+
},
|
| 1000 |
+
"tags": []
|
| 1001 |
+
},
|
| 1002 |
+
"outputs": [],
|
| 1003 |
+
"source": [
|
| 1004 |
+
"# merged_table.to_csv('third_model_results.csv',index=False,header=True)\n",
|
| 1005 |
+
"merged_table.to_csv(output_file,index=False,header=True)"
|
| 1006 |
+
]
|
| 1007 |
+
}
|
| 1008 |
+
],
|
| 1009 |
+
"metadata": {
|
| 1010 |
+
"kernelspec": {
|
| 1011 |
+
"display_name": "Birdclef",
|
| 1012 |
+
"language": "python",
|
| 1013 |
+
"name": "birdclef"
|
| 1014 |
+
},
|
| 1015 |
+
"language_info": {
|
| 1016 |
+
"codemirror_mode": {
|
| 1017 |
+
"name": "ipython",
|
| 1018 |
+
"version": 3
|
| 1019 |
+
},
|
| 1020 |
+
"file_extension": ".py",
|
| 1021 |
+
"mimetype": "text/x-python",
|
| 1022 |
+
"name": "python",
|
| 1023 |
+
"nbconvert_exporter": "python",
|
| 1024 |
+
"pygments_lexer": "ipython3",
|
| 1025 |
+
"version": "3.12.11"
|
| 1026 |
+
},
|
| 1027 |
+
"papermill": {
|
| 1028 |
+
"default_parameters": {},
|
| 1029 |
+
"duration": 3943.115939,
|
| 1030 |
+
"end_time": "2025-08-21T23:09:35.210970",
|
| 1031 |
+
"environment_variables": {},
|
| 1032 |
+
"exception": null,
|
| 1033 |
+
"input_path": "fit_model.ipynb",
|
| 1034 |
+
"output_path": "ran/fit_model.ipynb",
|
| 1035 |
+
"parameters": {
|
| 1036 |
+
"input_file": "xgb_rnd3_next.csv",
|
| 1037 |
+
"output_file": "third_model_results.csv"
|
| 1038 |
+
},
|
| 1039 |
+
"start_time": "2025-08-21T22:03:52.095031",
|
| 1040 |
+
"version": "2.6.0"
|
| 1041 |
+
}
|
| 1042 |
+
},
|
| 1043 |
+
"nbformat": 4,
|
| 1044 |
+
"nbformat_minor": 5
|
| 1045 |
+
}
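
The notebook above writes one row per fit, so comparing parameter settings means averaging the repeated fits for each setting and looking at the gap between training and test metrics. The following is a minimal sketch of that summary step in plain Python, not the author's code. The `rows` list and the `summarize` helper are illustrative assumptions; the numbers are the first three `max_depth=5` and `max_depth=10` rows (all with `n_estimators=10`) copied from the table shown in the notebook output.

```python
from collections import defaultdict
from statistics import mean

# (max_depth, test_accuracy, train_accuracy), copied from rows 0-2 and 10-12
# of the notebook's output table. This list is a hand-made illustration,
# not a file produced by the notebook.
rows = [
    (5, 0.919485, 0.919226),
    (5, 0.916542, 0.916715),
    (5, 0.920031, 0.919793),
    (10, 0.947473, 0.949585),
    (10, 0.947651, 0.949634),
    (10, 0.947336, 0.949079),
]


def summarize(rows):
    """Average test/train accuracy per max_depth and compute the train-test gap."""
    grouped = defaultdict(list)
    for depth, test_acc, train_acc in rows:
        grouped[depth].append((test_acc, train_acc))

    summary = {}
    for depth, pairs in grouped.items():
        test_mean = mean(t for t, _ in pairs)
        train_mean = mean(tr for _, tr in pairs)
        summary[depth] = {
            "test": test_mean,
            "train": train_mean,
            "gap": train_mean - test_mean,  # positive gap hints at overfitting
        }
    return summary


summary = summarize(rows)
for depth in sorted(summary):
    s = summary[depth]
    print(f"max_depth={depth}: test={s['test']:.4f} "
          f"train={s['train']:.4f} gap={s['gap']:+.4f}")
```

With the full results table loaded in pandas, the same summary would likely be a `groupby` on the parameter columns followed by `mean()`; the plain-Python version is used here only to keep the sketch dependency-free.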
|
models/xgboost_third_model.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
models/xgboost_third_model_not_2025.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
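
A note on reading the precision and recall columns reported for these models: unlike accuracy and binary F1, precision and recall are not symmetric in their arguments. scikit-learn's metric functions take `(y_true, y_pred)` in that order, and passing the two arrays the other way round returns the other metric. The sketch below demonstrates this with hand-written `precision` and `recall` helpers (hypothetical names, written in plain Python for illustration rather than taken from the notebook) on a made-up toy label set.

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp)


def recall(y_true, y_pred):
    """Fraction of true positives that are recovered: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


# Toy labels, invented for this illustration only.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]

p_correct = precision(y_true, y_pred)   # arguments in (y_true, y_pred) order
r_correct = recall(y_true, y_pred)
p_swapped = precision(y_pred, y_true)   # arguments swapped

print("precision:", p_correct)
print("recall:", r_correct)
print("precision with swapped arguments:", p_swapped)
```

Swapping the arguments exchanges false positives with false negatives, which is why the swapped precision equals the recall. When comparing the precision and recall columns across model iterates, it is worth confirming which argument order produced them.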