sdtemple committed on
Commit 1b8990e · 1 Parent(s): dc3c501
Files changed (2)
  1. .gitignore +7 -0
  2. README.md +33 -9
.gitignore CHANGED
@@ -1,3 +1,10 @@
+# audio files
+*.ogg
+*.wav
+*.flac
+*.mp3
+*.m4a
+
 # Store scrappy development code
 archive/
 
README.md CHANGED
@@ -85,10 +85,10 @@ First model iterate
   - I saved the results in the following file
 3. I chose to use the `XGBClassifier` with n_estimators=20 and max_depth=5
   - This simpler model does not have too large a gap between training and test metrics
-  - The test accuracy is
-  - The test precision is
-  - The test recall is
-  - The test AUROC is
+  - The test accuracy is 80.40%.
+  - The test precision is 79.05%.
+  - The test recall is 81.68%.
+  - The test AUROC is 88.08%.
 
 Second model iterate
 ---
@@ -109,10 +109,14 @@ Second model iterate
   - n_estimators: [10, 20, 50,]
   - max_depth: [5, 10, 20,]
 5. I chose the XGBClassifier with 50 estimators and max depth 10
-  - The test accuracy is
-  - The test precision is
-  - The test recall is
-  - The test AUROC is
+  - The training accuracy is.
+  - The training precision is.
+  - The training recall is.
+  - The training AUROC is.
+  - The test accuracy is 94.51%.
+  - The test precision is 96.73%.
+  - The test recall is 96.09%.
+  - The test AUROC is 98.18%.
 
 Third model iterate
 ---
@@ -122,9 +126,29 @@ Third model iterate
 3. Subset Birdclef data to those with
   - Predicted presence > 0.90
   - Amphibia, Insecta, Mammalia as 0 in 2025 data
+4. I chose the XGBClassifier with 50 estimators and max depth 5
+  - The training accuracy is.
+  - The training precision is.
+  - The training recall is.
+  - The training AUROC is.
+  - The test accuracy is 94.45%.
+  - The test precision is 98.06%.
+  - The test recall is 95.60%.
+  - The test AUROC is 95.91%.
 
 Non-2025 model
 ---
+
 I fit a model like the third iterate but without the Birdclef 2025 data. The point is to evaluate whether the model predicts presence for birds not observed in the training data. In the 2025 dataset, there are some birds that are not observed in the 2022, 2023, and 2024 datasets.
 
-Because initial model iterates used the 2025 data, there is some data leakage in how pseudo-present bird sounds were determined in the second model iterate.
+Because initial model iterates used the 2025 data, there is some data leakage in how pseudo-present bird sounds were determined in the second model iterate.
+
+I chose the XGBClassifier with 50 estimators and max depth 5.
+  - The training accuracy is.
+  - The training precision is.
+  - The training recall is.
+  - The training AUROC is.
+  - The test accuracy is 94.85%.
+  - The test precision is 97.56%.
+  - The test recall is 96.32%.
+  - The test AUROC is 97.54%.