|
|
--- |
|
|
|
|
license: mit |
|
|
datasets: |
|
|
- ddecosmo/lanternfly_training_dataset |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# Model Card for {{ model_id | default("Model ID", true) }} |
|
|
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
|
|
This is an off-the-shelf kernel density estimator (KDE) from SciPy. Here it is used to track the relative density of lanternfly sightings in Pittsburgh.
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model is a KDE: an unsupervised model that estimates a continuous density from discrete points.
|
|
|
|
|
This is an off-the-shelf model from the SciPy library, stored to allow for rapid access.
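
The card does not say how the fitted model is stored. As one possible sketch, assuming standard `pickle` serialization and using synthetic coordinates rather than the real dataset, a fitted `gaussian_kde` can be round-tripped like this:

```python
import pickle

import numpy as np
from scipy.stats import gaussian_kde

# Synthetic (longitude, latitude) points, shape (2, n_points); not real data.
rng = np.random.default_rng(5)
points = rng.normal(loc=[[-79.94], [40.44]], scale=0.01, size=(2, 100))

kde = gaussian_kde(points)

# Assumption: pickle is the storage format. Serialize to bytes and restore;
# writing the bytes to a file works the same way.
blob = pickle.dumps(kde)
restored = pickle.loads(blob)

# The restored estimator should give the same density at any query point.
query = np.array([[-79.94], [40.44]])
same = np.allclose(restored(query), kde(query))
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.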
|
|
|
|
|
- **Developed by:** Devin DeCosmo
- **Model type:** Kernel density estimator (unsupervised)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** SciPy Gaussian KDE (used as provided, not fine-tuned)
|
|
|
|
|
## Uses |
|
|
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
|
|
This model estimates the density of points relative to each other, on a scale from 0 to 1. In this case, it uses longitude and latitude as X, Y coordinates to perform this analysis.
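
A minimal sketch of the 0 to 1 scaling follows. The card does not state how the scaling is done, so dividing by the maximum density is an assumption here, and the coordinates are synthetic stand-ins for real sightings:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical sightings near Pittsburgh (synthetic, not from the dataset).
rng = np.random.default_rng(0)
lon = -79.94 + 0.01 * rng.standard_normal(200)
lat = 40.44 + 0.01 * rng.standard_normal(200)

# gaussian_kde expects an array of shape (n_dims, n_points).
coords = np.vstack([lon, lat])
kde = gaussian_kde(coords)

# Evaluate the estimated density at each sighting location.
density = kde(coords)

# Assumed normalization: divide by the maximum so values fall in (0, 1].
relative_density = density / density.max()
```

With this choice, the densest sighting location gets a value of 1 and every other location is expressed as a fraction of it.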
|
|
|
|
|
### Direct Use |
|
|
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
|
|
|
|
The direct use is estimating the density of the lanternfly sighting samples in our geolocal dataset. Because the Gaussian KDE is a general-purpose unsupervised model, it could also be used for other datasets with latitude/longitude coordinates.
|
|
|
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
|
|
KDEs cannot perform regression or classification on out-of-set data. They can only estimate concentration within the space of the provided data.
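
To illustrate this limitation, the sketch below fits a KDE on synthetic points clustered around one location and queries it both inside and far outside that cluster. The coordinates are made up for illustration; the far query point is roughly where Philadelphia is.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic points clustered around one location (not real sightings).
rng = np.random.default_rng(1)
points = np.vstack([
    -79.94 + 0.01 * rng.standard_normal(300),  # longitude
    40.44 + 0.01 * rng.standard_normal(300),   # latitude
])
kde = gaussian_kde(points)

# Density at the center of the fitted cloud.
inside = kde([[-79.94], [40.44]])[0]

# Density several degrees away from every input point.
outside = kde([[-75.16], [39.95]])[0]
```

The estimate far from the input points is negligible compared with the estimate inside the cloud, so the model says nothing useful about regions it has no data for.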
|
|
|
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
|
|
This KDE can only use the data in our current dataset. At this time, that is data collected at CMU during Fall 2025. This puts geographic and temporal constraints on the current model fit.
|
|
|
|
|
This model only shows where lanternflies are most concentrated. It does not and cannot estimate the reasons for these density measurements. Additional tools are needed to use the KDE outputs in useful research tasks.
|
|
|
|
|
|
|
|
### Recommendations |
|
|
|
|
|
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
|
|
|
We recommend using this model with data gathered with a specific area and time period in mind. This allows the KDE to accurately model the data and regions provided.
|
|
|
|
|
|
|
|
## Training and Testing Details |
|
|
|
|
|
### Training and Testing Data |
|
|
|
|
|
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> |
|
|
|
|
|
This model was trained on our geolocal dataset rlogh/lanternfly_swatter_training |
|
|
|
|
|
### Training and Testing Procedure |
|
|
|
|
|
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> |
|
|
KDE models do not train like standard ML models. Instead, they read the entire dataset, or a subset of it, and calculate the relative densities based on the proximity of points.
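
As a rough illustration of what "fitting" means here, the sketch below builds a `gaussian_kde` from synthetic coordinates and inspects what it keeps: the points themselves and a kernel covariance derived from them. The data and variable names are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic rows of (longitude, latitude); not the real dataset.
rng = np.random.default_rng(2)
coords = rng.normal(loc=[-79.94, 40.44], scale=0.01, size=(150, 2))

# "Fitting" stores the points and derives a kernel covariance from them.
# gaussian_kde expects shape (n_dims, n_points), hence the transpose.
kde = gaussian_kde(coords.T)

n_dims = kde.d                  # number of dimensions
n_points = kde.n                # number of stored points
stored_shape = kde.dataset.shape

# The kernel covariance is the data covariance scaled by factor**2,
# where factor is the bandwidth factor.
expected_cov = np.cov(coords.T) * kde.factor ** 2
```

There is no iterative optimization step: evaluating the model later sums a Gaussian kernel centered on each stored point.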
|
|
|
|
|
#### Training and Testing Hyperparameters |
|
|
|
|
|
The amount of smoothing the KDE applies depends on the bandwidth estimation method used.
|
|
|
|
|
In this case, the default value of "scott" was used. This gave a middle ground between distinct small clusters and larger overall trends. Additional experimentation with the bandwidth method could be necessary for future datasets with different characteristics.
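
For reference, the sketch below shows how SciPy's `bw_method` options translate into a bandwidth factor, using synthetic 2-D points. The formulas are SciPy's documented Scott and Silverman factors for unweighted data; the scalar value 0.1 is an arbitrary example.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic 2-D points, shape (n_dims, n_points).
rng = np.random.default_rng(3)
points = rng.normal(size=(2, 400))

scott = gaussian_kde(points, bw_method="scott")
silverman = gaussian_kde(points, bw_method="silverman")
narrow = gaussian_kde(points, bw_method=0.1)  # a scalar is used directly as the factor

d, n = points.shape

# Scott's factor: n ** (-1 / (d + 4))
scott_expected = n ** (-1.0 / (d + 4))

# Silverman's factor: (n * (d + 2) / 4) ** (-1 / (d + 4))
silverman_expected = (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))
```

Note that for two-dimensional data such as longitude and latitude, the Silverman expression reduces to the Scott expression, so the two named methods give the same factor. Only a scalar or a custom callable changes the smoothing in that case.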
|
|
|
|
|
## Evaluation |
|
|
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
|
|
There are no metrics like accuracy for unsupervised models. To ensure the model fits the dataset correctly, the plot is inspected by hand. This included testing different bandwidth parameters, such as "scott", "silverman", and fixed numeric values, to determine the best fit. From this, "scott" was determined to show the most easily readable hotspots.
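
A comparison like the one described above could be set up as follows. This is a sketch under assumptions: the two clusters are synthetic, the grid resolution and the numeric bandwidth values 0.1 and 0.5 are arbitrary choices, and the plotting step is left out.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Two synthetic clusters standing in for sighting hotspots (not real data).
rng = np.random.default_rng(4)
cluster_a = rng.normal(loc=[-79.95, 40.44], scale=0.004, size=(120, 2))
cluster_b = rng.normal(loc=[-79.92, 40.45], scale=0.008, size=(80, 2))
points = np.vstack([cluster_a, cluster_b]).T  # shape (2, n_points)

# Regular grid covering the extent of the data.
lon_axis = np.linspace(points[0].min(), points[0].max(), 60)
lat_axis = np.linspace(points[1].min(), points[1].max(), 60)
lon_grid, lat_grid = np.meshgrid(lon_axis, lat_axis)
grid = np.vstack([lon_grid.ravel(), lat_grid.ravel()])

# Evaluate one density surface per candidate bandwidth setting.
surfaces = {}
for bw in ("scott", "silverman", 0.1, 0.5):
    kde = gaussian_kde(points, bw_method=bw)
    surfaces[bw] = kde(grid).reshape(lon_grid.shape)
```

Each entry in `surfaces` can then be passed to a plotting call, for example matplotlib's `contourf(lon_grid, lat_grid, surface)`, and compared by eye.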
|
|
|
|
|
|
|
|
### Results |
|
|
|
|
|
From this, we have a useful, lightweight model from SciPy that can |
|
|
rapidly model the relative densities of collected lanternfly data. |
|
|
|
|
|
The limits of these results come from the bandwidth parameter and the limitations of the KDE function. In the future, if the bandwidth could be adjusted automatically based on the input region, the model could be made more generalizable.
|
|
|
|
|
|
|
|
#### Summary |
|
|
|
|
|
This model is a pre-built KDE from the SciPy library. In this case, |
|
|
it is being used to map different lanternfly datapoints for research |
|
|
and user purposes. |
|
|
|
|
|
|