Model Card for {{ model_id | default("Model ID", true) }}
This is an off-the-shelf Kernel Density Estimator (KDE) from SciPy. Here it is used to track the relative density of lanternfly sightings in Pittsburgh.
Model Details
Model Description
This model is a KDE: an unsupervised model that estimates a continuous density from discrete points.
It is an off-the-shelf model from the SciPy library, stored to allow rapid access.
- Developed by: Devin DeCosmo
- Model type: Kernel Density Estimator (Gaussian KDE)
- Language(s) (NLP): Not applicable (the model operates on coordinates, not text)
- License: MIT
- Finetuned from model: Not fine-tuned; uses the SciPy Gaussian KDE as-is
Uses
This model estimates the density of sightings relative to one another, on a scale from 0 to 1. Here it uses longitude and latitude as X, Y coordinates for the analysis.
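As a rough illustration of this use, the sketch below fits SciPy's gaussian_kde on longitude/latitude pairs and rescales the result to a 0 to 1 range. The coordinates are synthetic stand-ins rather than the real dataset, and dividing by the maximum density is an assumed way of obtaining the relative scale; the card does not state the exact normalization used.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic (longitude, latitude) points near Pittsburgh; NOT the real dataset.
rng = np.random.default_rng(0)
lon = rng.normal(-79.943, 0.005, size=200)
lat = rng.normal(40.443, 0.004, size=200)

# gaussian_kde expects an array of shape (n_dimensions, n_points).
points = np.vstack([lon, lat])
kde = gaussian_kde(points, bw_method="scott")

# Raw density estimate at each sighting location.
density = kde(points)

# Assumed normalization: divide by the peak so values fall in (0, 1].
relative = density / density.max()
```

Here `relative` holds one value per sighting, with the densest location scored as 1.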
Direct Use
The direct use is estimating the density of the lanternfly sighting samples in our geolocal dataset. Because the Gaussian KDE is a general-purpose unsupervised model, it could also be used for other datasets with latitude/longitude coordinates.
Out-of-Scope Use
KDEs cannot perform regression or classification on out-of-set data. They can only estimate concentration within the space covered by the provided data.
Bias, Risks, and Limitations
This KDE can only use the data in our current dataset. At this time, that is data collected at CMU during Fall 2025. This places geographic and temporal constraints on the current model fit.
This model only shows where lanternfly sightings are most concentrated. It does not and cannot estimate the reasons behind these density measurements. Additional tools are needed to use the KDE outputs in research tasks.
Recommendations
We recommend using this model with data gathered for a specific area and time period. This allows the KDE to accurately model the data and regions provided.
Training and Testing Details
Training and Testing Data
This model was fit on our geolocal dataset, rlogh/lanternfly_swatter_training.
Training and Testing Procedure
KDE models do not train like standard ML models. Instead, they read the entire dataset, or a subset of it, and calculate relative densities based on the proximity of points.
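To make the procedure concrete, here is a minimal one-dimensional sketch showing that a Gaussian KDE is simply the average of Gaussian kernels centered on each data point, with no iterative training. The samples are synthetic, and the kernel width is reconstructed from SciPy's bandwidth factor and the sample standard deviation.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Synthetic 1-D samples used only for illustration.
rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=50)

# "Fitting" just stores the data and computes a bandwidth (Scott's rule by default).
kde = gaussian_kde(samples)

# Kernel standard deviation: bandwidth factor times the sample standard deviation.
h = kde.factor * samples.std(ddof=1)

# Density at each query point is the mean of Gaussians centered on the samples.
x = np.linspace(-3, 3, 7)
manual = norm.pdf(x[:, None], loc=samples[None, :], scale=h).mean(axis=1)

# Compare the hand-computed density with SciPy's result.
matches = np.allclose(manual, kde(x))
```

Points that sit close to many samples receive contributions from many kernels, which is why proximity drives the estimated density.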
Training and Testing Hyperparameters
The smoothing applied by the KDE depends on the bandwidth estimation method used.
In this case, the standard value of "scott" was used. This gave a middle ground between distinct small clusters and larger overall trends. Additional experimentation with the bandwidth method may be necessary for future datasets that differ from this one.
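The sketch below shows how the bandwidth choice is passed to SciPy through the bw_method argument and how the resulting smoothing factor can be read back. The points are synthetic, and the scalar values 0.1 and 0.5 are arbitrary examples, not settings taken from this project.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic 2-D (longitude, latitude) points; NOT the real dataset.
rng = np.random.default_rng(2)
points = np.vstack([
    rng.normal(-79.943, 0.005, size=200),
    rng.normal(40.443, 0.004, size=200),
])

# Collect the smoothing factor SciPy derives for each bandwidth setting.
factors = {}
for bw in ("scott", "silverman", 0.1, 0.5):
    factors[bw] = gaussian_kde(points, bw_method=bw).factor
```

A smaller factor produces tighter, more distinct clusters, while a larger one emphasizes broad trends. Note that in SciPy's implementation the Scott and Silverman rules yield the same factor for two-dimensional data, so visible differences on longitude/latitude inputs come mainly from scalar settings.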
Evaluation
There are no metrics like accuracy for unsupervised models. To check that the model fits the dataset correctly, the plot is inspected by hand. This included testing different bandwidth parameters, such as Scott, Silverman, and fixed numeric values, to determine the best fit. From this, Scott was judged to give the most easily readable hotspot values.
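One way to prepare a density surface for this kind of visual inspection is to evaluate the KDE on a regular grid, as sketched below. The data are synthetic and the 50 by 50 grid resolution is an arbitrary assumption; the resulting array could be handed to any plotting tool.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic sightings; NOT the real dataset.
rng = np.random.default_rng(3)
lon = rng.normal(-79.943, 0.005, size=300)
lat = rng.normal(40.443, 0.004, size=300)
kde = gaussian_kde(np.vstack([lon, lat]), bw_method="scott")

# Regular grid spanning the observed coordinates (resolution is arbitrary).
grid_lon, grid_lat = np.meshgrid(
    np.linspace(lon.min(), lon.max(), 50),
    np.linspace(lat.min(), lat.max(), 50),
)

# Evaluate the KDE at every grid cell and restore the 2-D grid shape.
grid_points = np.vstack([grid_lon.ravel(), grid_lat.ravel()])
grid_density = kde(grid_points).reshape(grid_lon.shape)

# Grid cell with the highest estimated density, i.e. the main hotspot.
row, col = np.unravel_index(grid_density.argmax(), grid_density.shape)
hotspot = (grid_lon[row, col], grid_lat[row, col])
```

Repeating this with different bw_method values and comparing the surfaces side by side mirrors the manual inspection described above.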
Results
The result is a useful, lightweight model from SciPy that can rapidly model the relative densities of collected lanternfly data.
These results are limited by the bandwidth parameters and by the KDE function itself. In the future, if the bandwidth could be adjusted automatically based on the input region, the model could be made more generalizable.
Summary
This model is a pre-built KDE from the SciPy library. Here it is used to map lanternfly data points for research and user purposes.