sdtemple committed · 1b8990e
Parent(s): dc3c501
metrics

Browse files — files changed:
- .gitignore +7 -0
- README.md +33 -9
.gitignore CHANGED
@@ -1,3 +1,10 @@
+# audio files
+*.ogg
+*.wav
+*.flac
+*.mp3
+*.m4a
+
 # Store scrappy development code
 archive/
 
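The added patterns cover the common lossy and lossless audio extensions. A minimal stdlib sketch of what those extension patterns match (the filenames below are made-up examples; `fnmatch` approximates gitignore glob semantics for simple `*.ext` patterns only):

```python
from fnmatch import fnmatch

# Extension patterns added to .gitignore in this commit
patterns = ["*.ogg", "*.wav", "*.flac", "*.mp3", "*.m4a"]

def is_ignored(name: str) -> bool:
    """Return True if any audio ignore pattern matches the filename."""
    return any(fnmatch(name, p) for p in patterns)

print(is_ignored("call.ogg"))   # audio file: matched
print(is_ignored("train.csv"))  # data file: not matched
```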
README.md CHANGED
@@ -85,10 +85,10 @@ First model iterate
 - I saved the results in the following file
 3. I chose to use the `XGBClassifier` with n_estimators=20 and max_depth=5
 - This simpler model does not have too large a gap between training and test metrics
-- The test accuracy is
-- The test precision is
-- The test recall is
-- The test AUROC is
+- The test accuracy is 80.40%.
+- The test precision is 79.05%.
+- The test recall is 81.68%.
+- The test AUROC is 88.08%.
 
 Second model iterate
 ---
@@ -109,10 +109,14 @@ Second model iterate
 - n_estimators: [10, 20, 50,]
 - max_depth: [5, 10, 20,]
 5. I chose the XGBClassifier with 50 estimators and max depth 10
-- The
-- The
-- The
-- The
+- The training accuracy is.
+- The training precision is.
+- The training recall is.
+- The training AUROC is.
+- The test accuracy is 94.51%.
+- The test precision is 96.73%.
+- The test recall is 96.09%.
+- The test AUROC is 98.18%.
 
 Third model iterate
 ---
@@ -122,9 +126,29 @@ Third model iterate
 3. Subset Birdclef data to those with
 - Predicted presence > 0.90
 - Amphibia, Insecta, Mammalia as 0 in 2025 data
+4. I chose the XGBClassifier with 50 estimators and max depth 5
+- The training accuracy is.
+- The training precision is.
+- The training recall is.
+- The training AUROC is.
+- The test accuracy is 94.45%.
+- The test precision is 98.06%.
+- The test recall is 95.60%.
+- The test AUROC is 95.91%.
 
 Non-2025 model
 ---
+
 I fit a model like the third iterate but without the Birdclef 2025 data. The point is to evaluate if the model predicts presence for birds not observed in the training data. In the 2025 dataset, there are some birds that are not observed in 2022, 2023, and 2024 datasets.
 
-Because initial model iterates used the 2025 data, there is some data leakage in how pseudo-present bird sounds were determined in the second model iterate.
+Because initial model iterates used the 2025 data, there is some data leakage in how pseudo-present bird sounds were determined in the second model iterate.
+
+I chose the XGBClassifier with 50 estimators and max depth 5.
+- The training accuracy is.
+- The training precision is.
+- The training recall is.
+- The training AUROC is.
+- The test accuracy is 94.85%.
+- The test precision is 97.56%.
+- The test recall is 96.32%.
+- The test AUROC is 97.54%.
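Each model iterate above reports the same four test metrics. A minimal sketch of how they can be computed with scikit-learn; `y_true` and `y_score` here are toy stand-ins for the real held-out labels and the XGBClassifier's predicted probabilities, not the commit's actual data:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                    # toy held-out labels
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]    # toy predicted P(present)
y_pred  = [int(s > 0.5) for s in y_score]             # 0.5 decision threshold

# AUROC is threshold-free, so it takes the scores; the rest take hard labels
print(f"accuracy:  {accuracy_score(y_true, y_pred):.2%}")
print(f"precision: {precision_score(y_true, y_pred):.2%}")
print(f"recall:    {recall_score(y_true, y_pred):.2%}")
print(f"AUROC:     {roc_auc_score(y_true, y_score):.2%}")
```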
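The second iterate's hyperparameter search over n_estimators [10, 20, 50] and max_depth [5, 10, 20] can be sketched with `GridSearchCV`. This is an illustrative stand-in, not the commit's actual pipeline: XGBClassifier is swapped for scikit-learn's GradientBoostingClassifier so it runs without xgboost, and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the Birdclef feature matrix and presence labels
X, y = make_classification(n_samples=200, random_state=0)

# The grid values from the README; scoring by AUROC to match the
# metric reported for each iterate
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20, 50], "max_depth": [5, 10, 20]},
    cv=3,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_)  # best combination depends on the data
```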