ddecosmo
/

lanternfly-kde-model

Joblib

English

ddecosmo committed on Oct 10, 2025

Commit

a6ec099

1 Parent(s): a47a3a6

Update README.md

Files changed (1)

README.md +61 -73

README.md CHANGED

@@ -1,121 +1,109 @@
 ---
 license: mit
 language:
 - en
-pretty_name: Lanternfly Image Classifier Training Dataset
-datasets:
-- rlogh/lanternfly-data
-- rlogh/lanternfly_swatter_training
-- rlogh/negativesirl
-- uoft-cs/cifar100
-- AI-Lab-Makerere/beans
-- Francesco/insects-mytwu
 ---
-# Dataset Card for {{ pretty_name | default("Dataset Name", true) }}
-This dataset is the training dataset for 24-679 Project 1: Lanternfly Tracker
-It is composed of 360 original lanternfly photos, 150 original photos with no lanternflies, and 800 original photos
-from nature, urban, and other insect datasets listed below.
-These were augmented 50X to 65.1k augmented images.
-## Dataset Details
-### Dataset Description
-- **Curated by:** Carnegie Mellon University: 24-679
-- **Shared by [optional]:** Devin DeCosmo
 - **Language(s) (NLP):** English
 - **License:** MIT
-### Dataset Sources [optional]
-Original Lanternfly Datasets
-  rlogh/lanternfly-data: Original Lanternfly Dataset, 229 unmarked photos
-  rlogh/lanternfly_swatter_training: Dataset with geolocal data: 165 photos
-Original Negative Datasets:
-  rlogh/negativesirl: Negatives dataset, images of outdoor environments and people with no lanternflies. 107 photos
-Total: 501 original images
-Imported Datasets
-uoft-cs/cifar100: General image classifier, no insect class
-AI-Lab-Makerere/beans: Foliage with no insects
-Francesco/insects-mytwu: Insect Images
-Total: 800 additional images imported
-## Uses
-These images were used to train the EfficientNetB1 model, ddecosmo/lanternfly_classifier, on how to classify images
-as containing or not containing lanternflies.
-### Direct Use
-The direct use is identifying photographs containing lanternflies so they can be used for tracking purposes.
-### Out-of-Scope Use
-In future, this model could be adapted to identify other types of insect within this dataset.
-## Dataset Structure
-This dataset consists of two splits
-An original split with 1.3k photos
-An artificial split with 65.1k photos
-The tasks fall into 3 categories based on the building pictured
-1. Lanternflies, all original photos
-2. Other Insect, all 3rd party datasets
-3. No insect, original photos and 3rd party datasets
-## Dataset Creation
-### Source Data
-This data is sourced by the creators, Devin and Rumi for all original photos
-Additional datasets can be found here,
-uoft-cs/cifar100
-AI-Lab-Makerere/beans
-Francesco/insects-mytwu
-#### Data Collection and Processing
-Original datasets were collected using the mobile phones of the authors.
-Additional datasets were recommended by Gemini AI and then validated as fitting the purpose, type, and scope of this process.
-uoft-cs/cifar100: This is a general image identifier with no insect class. Used for no insect for generalizability
-AI-Lab-Makerere/beans: This dataset focuses on vegetation with and without disease. It is used to train the model to recognize
-vegetation without insects/lanternflies.
-Francesco/insects-mytwu: This is an object detection dataset used for identifying insects as subjects, not including lanternflies.
-We are using it to train a separate non-lanternfly insect class.
-#### Who are the source data producers?
-Original data was produced by the authors.
-Additional datasets were produced by,
-uoft-cs/cifar100: Created by University of Toronto Computer Science
-AI-Lab-Makerere/beans: Created by AI Lab Makerere
-Francesco/insects-mytwu: Created by Francesco Sovrano
-## Bias, Risks, and Limitations
-The main risk of this dataset is the lanternfly split. It contains only images of single lanternflies on the ground,
-normally on concrete or asphalt. This severely limits the range of environments in which these creatures appear.
-Incorporating blob detection or YOLO into future models could mitigate this by focusing on the subject.
-### Recommendations
-This is a large dataset, and has been shown to accurately classify lanternflies, but there are many edge cases when it does not work correctly.
-In order to take this into account, using new types of models with subject detection can make use of the many images while improving model accuracy.

 ---
 license: mit
+datasets:
+- ddecosmo/lanternfly_training_dataset
 language:
 - en
 ---
+# Model Card for {{ model_id | default("Model ID", true) }}
+<!-- Provide a quick summary of what the model is/does. -->
+This is an off-the-shelf kernel density estimator (KDE) from SciPy.
+Here it is used to track the relative density of lanternfly sightings in Pittsburgh.
+## Model Details
+### Model Description
+This model is a KDE, an unsupervised model that
+estimates a continuous density from discrete sample points.
+It comes from the SciPy library and is stored so it can be loaded quickly.
+- **Developed by:** Devin DeCosmo
+- **Model type:** Kernel density estimator (unsupervised)
 - **Language(s) (NLP):** English
 - **License:** MIT
+- **Finetuned from model:** SciPy Gaussian KDE
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+This model estimates the density of points relative to each other, on a scale from 0 to 1.
+In this case, it uses longitude and latitude as X, Y coordinates for the analysis.
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+The direct use is estimating the density of lanternfly sightings in our geolocated dataset.
+Because the Gaussian KDE is a general-purpose unsupervised method, it could also be applied
+to other datasets with latitude/longitude coordinates.
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+rlogh/lanternfly_swatter_training
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Training Hyperparameters
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+### Results
+#### Summary
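
Reviewer note (not part of the commit): the added card describes a SciPy Gaussian KDE fitted on longitude/latitude, with densities expressed relative to each other on a 0 to 1 scale and the fitted model stored for quick loading. A minimal sketch of that workflow might look like the following. The coordinates are synthetic, the file name `lanternfly_kde.joblib` is hypothetical, and dividing by the maximum density is only one assumed way to obtain a 0 to 1 scale; the card does not say how the author normalizes or which file name the repository uses.

```python
import os
import tempfile

import joblib
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic (assumed) sighting coordinates scattered around Pittsburgh.
rng = np.random.default_rng(0)
lon = rng.normal(-79.99, 0.02, size=200)
lat = rng.normal(40.44, 0.02, size=200)

# gaussian_kde expects an array of shape (n_dims, n_points).
points = np.vstack([lon, lat])
kde = gaussian_kde(points)

# Density at each sighting, rescaled so the densest sighting equals 1.
# (Assumption: max-normalization stands in for the card's "0 - 1" scale.)
density = kde(points)
relative = density / density.max()

# Store the fitted estimator and load it back (file name is hypothetical).
path = os.path.join(tempfile.mkdtemp(), "lanternfly_kde.joblib")
joblib.dump(kde, path)
loaded = joblib.load(path)
```

The loaded object can be called like the original, for example `loaded(np.vstack([[-79.99], [40.44]]))` to get the estimated density at a single longitude/latitude pair.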