ddecosmo committed on
Commit
c003126
·
verified ·
1 Parent(s): a6ec099

Update README.md

Files changed (1)
  1. README.md +42 -23
README.md CHANGED
@@ -21,7 +21,7 @@ in this case it is used to track the relative density of lanternfly sightings in
21
  This model is a KDE. This is an unsupervised model that
22
  estimates the density of continuous values from discrete points.
23
 
24
- This model is from the SciPy library and stored to allow for rapid access.
25
 
26
  - **Developed by:** Devin DeCosmo
27
  - **Model type:** Image Classifier
@@ -44,66 +44,85 @@ The direct use is classifying our lanternfly sighting samples from our geolocal
44
  As the Gaussian KDE is a generalized unsupervised learning model, this could be used
45
  for other datsets with latitude/longitude coordinates.
46
 
 
47
  ### Out-of-Scope Use
48
 
49
  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
50
 
 
 
51
 
52
 
53
  ## Bias, Risks, and Limitations
54
 
55
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
56
 
 
 
 
 
 
 
 
57
 
58
 
59
  ### Recommendations
60
 
61
  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
62
 
 
 
63
 
64
 
 
65
 
66
- ## Training Details
67
-
68
- ### Training Data
69
 
70
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
71
 
72
- rlogh/lanternfly_swatter_training
73
 
74
- ### Training Procedure
75
 
76
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
 
 
77
 
 
78
 
79
- #### Training Hyperparameters
80
-
81
 
 
 
 
 
82
 
83
  ## Evaluation
84
 
85
  <!-- This section describes the evaluation protocols and provides the results. -->
86
 
87
- ### Testing Data, Factors & Metrics
88
-
89
- #### Testing Data
90
-
91
- <!-- This should link to a Dataset Card if possible. -->
92
-
93
 
94
- #### Factors
95
-
96
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
97
-
98
-
99
- #### Metrics
100
-
101
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
102
 
 
103
 
 
 
104
 
105
- ### Results
 
 
106
 
107
 
108
  #### Summary
109
 
 
 
 
 
 
21
  This model is a KDE. This is an unsupervised model that
22
  estimates the density of continuous values from discrete points.
23
 
24
+ This is an off-the-shelf model from the SciPy library, stored to allow for rapid access.
25
 
26
  - **Developed by:** Devin DeCosmo
27
  - **Model type:** Image Classifier
 
44
  As the Gaussian KDE is a generalized unsupervised learning model, this could be used
45
  for other datsets with latitude/longitude coordinates.
46
 
47
+
48
  ### Out-of-Scope Use
49
 
50
  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
51
 
52
+ KDEs cannot perform regression or classification on out-of-set data.
53
+ They can only estimate concentration within the space covered by the provided data.
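
As a rough illustration of this limit, the sketch below fits SciPy's `gaussian_kde` to synthetic points and evaluates it inside and well outside the sampled area. The coordinates, the random seed, and the variable names are assumptions made for the example, not values from the lanternfly dataset.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical sightings (latitude, longitude) clustered around one spot.
# These are synthetic values for illustration only.
sightings = rng.normal(loc=[40.443, -79.943], scale=0.002, size=(200, 2)).T

# gaussian_kde expects an array of shape (n_dimensions, n_points).
kde = gaussian_kde(sightings)

# Density at the centre of the sampled area.
inside = kde([[40.443], [-79.943]])[0]

# Density about 0.1 degrees of latitude away, where no data was collected.
outside = kde([[40.543], [-79.943]])[0]

print(f"inside: {inside:.3e}, outside: {outside:.3e}")
```

The density away from the data collapses toward zero, which reflects the absence of samples there rather than any measured absence of lanternflies.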
54
 
55
 
56
  ## Bias, Risks, and Limitations
57
 
58
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
59
 
60
+ This KDE can only use the data in our current dataset. At this time
61
+ that is data collected at CMU during Fall 2025. This puts geographic and temporal
62
+ constraints on the current model fit.
63
+
64
+ This model only shows the highest concentration of lanternflies. It does
65
+ not and cannot make any estimate of the reasons for these density measurements.
66
+ Additional tools are needed to use the KDE outputs in useful research tasks.
67
 
68
 
69
  ### Recommendations
70
 
71
  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
72
 
73
+ We recommend using this model with data gathered with a specific area and time period in mind.
74
+ This will allow the KDE to accurately model the data and regions provided.
75
 
76
 
77
+ ## Training and Testing Details
78
 
79
+ ### Training and Testing Data
 
 
80
 
81
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
82
 
83
+ This model was trained on our geolocal dataset, rlogh/lanternfly_swatter_training.
84
 
85
+ ### Training and Testing Procedure
86
 
87
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
88
+ KDE models do not train like standard ML models. Instead, they read the
89
+ entire dataset, or subset of data, and calculate the relative densities based on
90
+ the proximity of points.
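
A minimal sketch of that procedure, assuming synthetic latitude/longitude values in place of the real dataset: the points are handed to `gaussian_kde`, the density is evaluated on a regular grid, and the grid cell with the highest density is read off as a hotspot. The grid size and all coordinates are illustrative choices, not the settings used for this model.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)

# Synthetic stand-ins for sighting coordinates (not real data).
lat = rng.normal(40.443, 0.002, size=300)
lon = rng.normal(-79.943, 0.002, size=300)

# There is no iterative training step: the KDE stores the points
# and derives a bandwidth from them.
data = np.vstack([lat, lon])  # shape (2, n_points)
kde = gaussian_kde(data)

# Evaluate the density on a 50 x 50 grid covering the sampled area.
lat_grid, lon_grid = np.meshgrid(
    np.linspace(lat.min(), lat.max(), 50),
    np.linspace(lon.min(), lon.max(), 50),
    indexing="ij",
)
grid = np.vstack([lat_grid.ravel(), lon_grid.ravel()])
density = kde(grid).reshape(lat_grid.shape)

# The grid cell with the highest estimated density is the hotspot.
i, j = np.unravel_index(np.argmax(density), density.shape)
hotspot = (lat_grid[i, j], lon_grid[i, j])
print("hotspot (lat, lon):", hotspot)
```

The `density` array can be passed to a plotting routine such as a contour or heat map for the kind of visual inspection described in the Evaluation section.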
91
 
92
+ #### Training and Testing Hyperparameters
93
 
94
+ The smoothing and calculations of the KDE can be altered depending on the
95
+ bandwidth estimation method used.
96
 
97
+ In this case, the default bandwidth method, "scott", was used. This allowed for
98
+ a middle ground between distinct small clusters and larger overall trends.
99
+ Additional experimentation with the bandwidth method could be necessary
100
+ for future datasets with different characteristics.
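
The sketch below shows one way the bandwidth options could be compared, using two synthetic clusters as an assumed stand-in for real sightings. It reads back `kde.factor`, the multiplier SciPy applies to the data covariance, for each `bw_method` value; the scalar values 0.1 and 1.0 are arbitrary examples.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Two hypothetical clusters of sightings (synthetic, for illustration only).
cluster_a = rng.normal(loc=[40.443, -79.943], scale=0.001, size=(150, 2))
cluster_b = rng.normal(loc=[40.447, -79.948], scale=0.001, size=(150, 2))
points = np.vstack([cluster_a, cluster_b]).T  # shape (2, 300)

# Fit one KDE per bandwidth setting and record the resulting factor.
factors = {}
for bw in ("scott", "silverman", 0.1, 1.0):
    kde = gaussian_kde(points, bw_method=bw)
    factors[bw] = kde.factor

for bw, factor in factors.items():
    print(f"bw_method={bw!r}: factor={factor:.4f}")
```

Smaller factors keep small clusters distinct, while larger ones smooth toward the overall trend. For two-dimensional data such as latitude/longitude pairs, SciPy's "scott" and "silverman" rules reduce to the same factor, so a visible difference between fits requires a scalar or callable `bw_method`.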
101
 
102
  ## Evaluation
103
 
104
  <!-- This section describes the evaluation protocols and provides the results. -->
105
 
106
+ There are no metrics like accuracy for unsupervised models. To ensure the
107
+ model fits the dataset correctly, the plot was inspected by hand. This included
108
+ testing different bandwidth parameters like "scott", "silverman", and integer
109
+ values to determine the best fit. From this, "scott" was determined to
110
+ show the most easily readable values for hotspots.
 
111
 
 
 
 
 
 
 
 
 
112
 
113
+ ### Results
114
 
115
+ From this, we have a useful, lightweight model from SciPy that can
116
+ rapidly model the relative densities of collected lanternfly data.
117
 
118
+ The limits of these results stem from the bandwidth parameters and limitations of the KDE
119
+ function. In the future, if the bandwidth could be adjusted automatically based on the
120
+ input region, the models could be made more generalizable.
121
 
122
 
123
  #### Summary
124
 
125
+ This model is a pre-built KDE from the SciPy library. In this case,
126
+ it is being used to map different lanternfly data points for research
127
+ and user purposes.
128
+