---
license: mit
datasets:
- ddecosmo/lanternfly_training_dataset
language:
- en
---
# Model Card for {{ model_id | default("Model ID", true) }}
<!-- Provide a quick summary of what the model is/does. -->
This is an off-the-shelf kernel density estimator (KDE) from SciPy.
Here it is used to track the relative density of lanternfly sightings in Pittsburgh.
## Model Details
### Model Description
This model is a Gaussian KDE, an unsupervised model that
estimates a continuous density from discrete points.
It is the off-the-shelf implementation from the SciPy library, stored here to allow for rapid access.
- **Developed by:** Devin DeCosmo
- **Model type:** Kernel density estimator (unsupervised)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** SciPy Gaussian KDE
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model estimates the density of sightings relative to one another, on a scale from 0 to 1.
In this case, it uses longitude and latitude as X, Y coordinates to perform this analysis.
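
As a rough illustration of this use, the sketch below fits SciPy's `gaussian_kde` to longitude/latitude pairs and rescales the densities to the 0 to 1 range. The coordinates are synthetic stand-ins generated for the example, not the real dataset, and min-max scaling is an assumption about how the 0 to 1 range is produced.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical sighting coordinates near Pittsburgh (synthetic, not the real dataset).
lon = rng.normal(-79.943, 0.005, size=200)
lat = rng.normal(40.443, 0.005, size=200)

# gaussian_kde expects an array of shape (n_dimensions, n_points).
coords = np.vstack([lon, lat])
kde = gaussian_kde(coords)

# Evaluate the density at every sighting.
density = kde(coords)

# Assumed normalization: min-max scaling so values lie in [0, 1].
relative = (density - density.min()) / (density.max() - density.min())
```

Values near 1 mark the densest sightings in the sample, and values near 0 mark the most isolated ones.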
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
The direct use is estimating the density of lanternfly sightings in our geolocal dataset.
Because the Gaussian KDE is a general-purpose unsupervised model, it could also be used
for other datasets with latitude/longitude coordinates.
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
KDEs cannot perform regression or classification on out-of-set data.
They can only estimate concentration within the space of the provided data.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This KDE can only use the data in our current dataset. At this time,
that is data collected at CMU during Fall 2025. This puts geographic and temporal
constraints on the current model fit.
This model only shows where lanternflies are most concentrated. It does
not and cannot estimate the reasons for these density measurements.
Additional tools are needed to use the KDE outputs in useful research tasks.
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
This model is recommended for use with data gathered with a specific area and time period in mind,
so that the KDE can accurately model the data and regions provided.
## Training and Testing Details
### Training and Testing Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
This model was trained on our geolocal dataset rlogh/lanternfly_swatter_training
### Training and Testing Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
KDE models do not train like standard ML models. Instead, they read the
entire dataset, or a subset of it, and calculate the relative densities based on
the proximity of points.
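
To make the "proximity of points" idea concrete, the sketch below recomputes a single density value by hand as an average of Gaussian kernels centered on each data point, then compares it with the value SciPy returns. The data and the query point are synthetic and hypothetical. The `covariance` attribute is the kernel covariance that `gaussian_kde` stores after fitting.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)

# Hypothetical sighting coordinates (synthetic, not the real dataset).
data = np.vstack([
    rng.normal(-79.943, 0.005, size=200),  # longitude
    rng.normal(40.443, 0.005, size=200),   # latitude
])
kde = gaussian_kde(data)

# Hypothetical query location.
point = np.array([-79.940, 40.445])

# Offset from the query point to every data point, shape (2, n).
diff = point[:, None] - data

# Squared Mahalanobis distance under the kernel covariance.
inv_cov = np.linalg.inv(kde.covariance)
sq_dist = np.einsum("in,ij,jn->n", diff, inv_cov, diff)

# Average of Gaussian kernels: nearby points contribute the most.
norm = np.sqrt(np.linalg.det(2 * np.pi * kde.covariance))
manual = np.exp(-0.5 * sq_dist).sum() / (data.shape[1] * norm)

# Value computed by SciPy at the same location.
from_scipy = kde(point.reshape(2, 1))[0]
```

There is no iterative optimization here: "fitting" only stores the points and derives the kernel covariance from them.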
#### Training and Testing Hyperparameters
The smoothing of the KDE depends on the bandwidth estimation method used.
In this case, the SciPy default, "scott", was used. This gave
a middle ground between distinct small clusters and larger overall trends.
Additional experimentation with the bandwidth method could be necessary
for future datasets with different characteristics.
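
The sketch below shows how the bandwidth choice is passed to `gaussian_kde` through `bw_method` and how the resulting scaling factor can be read back from the `factor` attribute. The data are synthetic and the scalar value `0.1` is an arbitrary example, not a setting taken from this model. Note that in SciPy's formulas the Scott and Silverman factors coincide for two-dimensional data, so differences between them would only show up in other dimensions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Hypothetical sighting coordinates (synthetic, not the real dataset).
data = np.vstack([
    rng.normal(-79.943, 0.005, size=200),  # longitude
    rng.normal(40.443, 0.005, size=200),   # latitude
])

# Fit one KDE per bandwidth setting and record the scaling factor.
kde_scott = gaussian_kde(data, bw_method="scott")
kde_silverman = gaussian_kde(data, bw_method="silverman")
kde_fixed = gaussian_kde(data, bw_method=0.1)  # arbitrary example scalar

factors = {
    "scott": kde_scott.factor,
    "silverman": kde_silverman.factor,
    "fixed": kde_fixed.factor,
}

for name, factor in factors.items():
    print(f"{name}: factor = {factor:.4f}")
```

A smaller factor gives a narrower kernel and sharper, more fragmented hotspots. A larger factor smooths small clusters into broader trends.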
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
There are no metrics like accuracy for unsupervised models. To check that the
model fits the dataset well, the plot is inspected by hand. This included
testing different bandwidth settings, such as Scott, Silverman, and integer
values, to determine the best fit. From this, Scott was determined to
show the most easily readable hotspots.
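
A visual check of this kind usually starts from a density grid. The sketch below evaluates a KDE on a regular longitude/latitude grid and locates the strongest hotspot. The two clusters are synthetic and hypothetical, and the 100 by 100 grid resolution is an arbitrary choice for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)

# Hypothetical data: a large cluster and a smaller one (synthetic, not the real dataset).
lon = np.concatenate([
    rng.normal(-79.943, 0.002, size=150),
    rng.normal(-79.930, 0.002, size=50),
])
lat = np.concatenate([
    rng.normal(40.443, 0.002, size=150),
    rng.normal(40.450, 0.002, size=50),
])

kde = gaussian_kde(np.vstack([lon, lat]), bw_method="scott")

# Regular grid covering the bounding box of the data.
xx, yy = np.mgrid[lon.min():lon.max():100j, lat.min():lat.max():100j]
zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

# Grid cell with the highest estimated density.
idx = np.unravel_index(np.argmax(zz), zz.shape)
hot_lon, hot_lat = xx[idx], yy[idx]
print(f"Hotspot near lon={hot_lon:.4f}, lat={hot_lat:.4f}")
```

The arrays `xx`, `yy`, and `zz` can be passed to a contour or heatmap plotting function for inspection by eye.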
### Results
The result is a useful, lightweight model from SciPy that can
rapidly model the relative densities of collected lanternfly data.
Its limits come from the bandwidth parameter and from the limits of the KDE
function itself. In the future, if the bandwidth could be adjusted automatically based on the
input region, the model could be made more generalizable.
#### Summary
This model is a pre-built KDE from the SciPy library. In this case,
it is being used to map different lanternfly datapoints for research
and user purposes.