ddecosmo committed on
Commit
a6ec099
·
verified ·
1 Parent(s): a47a3a6

Update README.md

  1. README.md +61 -73
README.md CHANGED
@@ -1,121 +1,109 @@
1
  ---
 
2
  license: mit
 
 
3
  language:
4
  - en
5
- pretty_name: Lanternfly Image Classifier Training Dataset
6
- datasets:
7
- - rlogh/lanternfly-data
8
- - rlogh/lanternfly_swatter_training
9
- - rlogh/negativesirl
10
- - uoft-cs/cifar100
11
- - AI-Lab-Makerere/beans
12
- - Francesco/insects-mytwu
13
  ---
14
 
 
15
 
16
- # Dataset Card for {{ pretty_name | default("Dataset Name", true) }}
17
 
18
- This dataset is the training dataset for 24-679 Project 1: Lanternfly Tracker
19
- It is composed of 360 original lanternfly photos, 150 original photos with no lanternflies, and 800 original photos
20
- from nature, urban, and other insect datasets listed below.
21
 
22
- These were augmented 50x, yielding 65.1k augmented images.
23
 
 
24
 
25
- ## Dataset Details
 
26
 
27
- ### Dataset Description
28
 
29
- - **Curated by:** Carnegie Mellon University: 24-679
30
- - **Shared by [optional]:** Devin DeCosmo
31
  - **Language(s) (NLP):** English
32
  - **License:** MIT
 
33
 
34
- ### Dataset Sources [optional]
35
 
36
- Original Lanternfly Datasets
37
- rlogh/lanternfly-data: Original Lanternfly Dataset, 229 unmarked photos
38
- rlogh/lanternfly_swatter_training: Dataset with geolocal data: 165 photos
39
-
40
- Original Negative Datasets:
41
- rlogh/negativesirl: Negatives dataset, images of outdoor environments and people with no lanternflies. 107 photos
42
 
 
 
43
 
44
- Total: 501 original images
45
 
46
- Imported Datasets
47
- uoft-cs/cifar100: General image classifier, no insect class
48
- AI-Lab-Makerere/beans: Foliage with no insects
49
- Francesco/insects-mytwu: Insect Images
50
 
51
- Total: 800 additional images imported
 
 
52
 
 
53
 
 
54
 
55
- ## Uses
56
 
57
- These images were used to train the EfficientNetB1 model, ddecosmo/lanternfly_classifier, to classify images
58
- as containing or not containing lanternflies.
59
 
 
60
 
61
- ### Direct Use
62
 
63
- The direct use is identifying photographs that contain lanternflies, so that the results can be used for tracking purposes.
64
 
65
- ### Out-of-Scope Use
66
 
67
- In future, this model could be adapted to identify other types of insect within this dataset.
 
 
68
 
69
 
70
- ## Dataset Structure
71
 
72
- This dataset consists of two splits
73
- An original split with 1.3k photos
74
- An artificial split with 65.1k photos
75
 
76
- The images fall into 3 categories based on the subject pictured:
77
- 1. Lanternflies, all original photos
78
- 2. Other Insect, all 3rd party datasets
79
- 3. No insect, original photos and 3rd party datasets
80
 
81
- ## Dataset Creation
82
 
83
- ### Source Data
84
 
85
- All original photos were sourced by the creators, Devin and Rumi.
86
 
87
- Additional datasets can be found here,
88
- uoft-cs/cifar100
89
- AI-Lab-Makerere/beans
90
- Francesco/insects-mytwu
91
 
92
- #### Data Collection and Processing
93
 
94
- Original datasets were collected using the mobile phones of the authors.
95
 
96
- Additional datasets were recommended by Gemini AI and then validated as fitting the purpose, type, and scope of this process.
97
- uoft-cs/cifar100: This is a general image classification dataset with no insect class. It is used for the no-insect class to improve generalizability.
98
- AI-Lab-Makerere/beans: This dataset is focused on vegetation with and without disease. It is used to train the model to recognize
99
- vegetation without insects/lanternflies.
100
- Francesco/insects-mytwu: This is an object detection dataset used for identifying insects as subjects, not including lanternflies.
101
- We are using it to train a separate non-lanternfly insect class.
102
 
103
- #### Who are the source data producers?
104
 
105
- Original data was produced by the authors.
106
 
107
- Additional datasets were produced by,
108
- uoft-cs/cifar100: Created by University of Toronto Computer Science
109
- AI-Lab-Makerere/beans: Created by AI Lab Makerere
110
- Francesco/insects-mytwu: Created by Francesco Sovrano
111
 
112
- ## Bias, Risks, and Limitations
113
 
114
- The main risk of this dataset is the lanternfly split. It contains only images of single lanternflies on the ground,
115
- normally on concrete or asphalt. This severely limits the range of environments in which these insects appear.
116
- Incorporating blob detection or YOLO into future models could mitigate this by focusing on the subject.
117
 
118
- ### Recommendations
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
119
 
120
- This is a large dataset, and models trained on it have been shown to classify lanternflies accurately, but there are many edge cases where they fail.
121
- To account for this, new types of models with subject detection could make use of the many images while improving model accuracy.
 
1
  ---
2
+ '[object Object]': null
3
  license: mit
4
+ datasets:
5
+ - ddecosmo/lanternfly_training_dataset
6
  language:
7
  - en
 
 
 
 
 
 
 
 
8
  ---
9
 
10
+ # Model Card for {{ model_id | default("Model ID", true) }}
11
 
12
+ <!-- Provide a quick summary of what the model is/does. -->
13
 
14
+ This is an off-the-shelf KDE (Kernel Density Estimator) model from SciPy.
15
+ In this case, it is used to track the relative density of lanternfly sightings in Pittsburgh.
 
16
 
17
+ ## Model Details
18
 
19
+ ### Model Description
20
 
21
+ This model is a KDE. This is an unsupervised model that
22
+ estimates the density of continuous values from discrete points.
23
 
24
+ This model is from the SciPy library and stored to allow for rapid access.
25
 
26
+ - **Developed by:** Devin DeCosmo
27
+ - **Model type:** Kernel Density Estimator (unsupervised)
28
  - **Language(s) (NLP):** English
29
  - **License:** MIT
30
+ - **Finetuned from model:** SciPy Gaussian KDE
31
 
32
+ ## Uses
33
 
34
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
 
 
 
 
35
 
36
+ This model is used to estimate the density of values relative to each other,
37
+ on a scale from 0 to 1. In this case, it uses longitude and latitude as X, Y coordinates to perform this analysis.
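
The relative-density idea described above can be sketched in plain Python. This is a minimal illustration, not the stored model: the card names SciPy's Gaussian KDE, whereas the sketch below is a hand-written stand-in with a fixed isotropic bandwidth. The function names, the bandwidth value, the sample coordinates, and the divide-by-maximum scaling to the 0 to 1 range are all assumptions made for illustration. Treating degrees of longitude and latitude as planar X, Y coordinates is itself an approximation.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates density at (x, y) from sample points.

    Hypothetical helper: isotropic Gaussian kernel with a fixed bandwidth,
    standing in for a library KDE that would choose the bandwidth itself.
    """
    n = len(points)
    # Normalising constant of a 2D isotropic Gaussian, averaged over n samples.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def relative_density(points, bandwidth):
    """Density at each sample, divided by the peak so values lie in (0, 1]."""
    density = gaussian_kde_2d(points, bandwidth)
    raw = [density(x, y) for x, y in points]
    peak = max(raw)
    return [value / peak for value in raw]


# Made-up (longitude, latitude) pairs near Pittsburgh; not real sightings.
sightings = [
    (-79.9436, 40.4433),
    (-79.9440, 40.4430),
    (-79.9431, 40.4436),
    (-79.9959, 40.4406),
]
# Bandwidth is in degrees and chosen by hand for this example.
scores = relative_density(sightings, bandwidth=0.001)
```

The three clustered points receive scores close to 1, while the isolated fourth point receives the lowest score, which is the behaviour a sighting-density map relies on.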
38
 
39
+ ### Direct Use
40
 
41
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
 
 
42
 
43
+ The direct use is estimating the density of lanternfly sightings in our geolocated dataset.
44
+ As the Gaussian KDE is a general-purpose unsupervised learning model, it could also be used
45
+ for other datasets with latitude/longitude coordinates.
46
 
47
+ ### Out-of-Scope Use
48
 
49
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
50
 
 
51
 
 
 
52
 
53
+ ## Bias, Risks, and Limitations
54
 
55
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
56
 
 
57
 
 
58
 
59
+ ### Recommendations
60
+
61
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
62
 
63
 
 
64
 
 
 
 
65
 
66
+ ## Training Details
 
 
 
67
 
68
+ ### Training Data
69
 
70
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
71
 
72
+ rlogh/lanternfly_swatter_training
73
 
74
+ ### Training Procedure
 
 
 
75
 
76
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
77
 
 
78
 
79
+ #### Training Hyperparameters
 
 
 
 
 
80
 
 
81
 
 
82
 
83
+ ## Evaluation
 
 
 
84
 
85
+ <!-- This section describes the evaluation protocols and provides the results. -->
86
 
87
+ ### Testing Data, Factors & Metrics
 
 
88
 
89
+ #### Testing Data
90
+
91
+ <!-- This should link to a Dataset Card if possible. -->
92
+
93
+
94
+ #### Factors
95
+
96
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
97
+
98
+
99
+ #### Metrics
100
+
101
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
102
+
103
+
104
+
105
+ ### Results
106
+
107
+
108
+ #### Summary
109