rdinnager committed on
Commit 7cc002a · verified · 1 Parent(s): 65b0d2d

Update model card with complete documentation

Files changed (1):
  README.md +176 -153
README.md CHANGED
@@ -1,199 +1,222 @@
 ---
 library_name: transformers
- tags: []
 ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
- A model to detect the presence of phenological stages (flowers and/or fruit) in citizen science photos of plants.

 ## Model Details

- Specifically the model outputs logits proportional to the probability that a plant photo shows at least one flower or fruit.
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** Phenobase (https://phenobase.org/)
- - **Funded by:** National Science Foundation (NSF)
- - **Model type:** Masked Autoencoder with Image Classifier
- - **License:** ??
- - **Finetuned from model:** https://github.com/xml94/PlantCLEF2022
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** https://github.com/rdinnager/phenovision
- - **Paper:** TBA
- - **Demo:** TBA
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
- ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
-
- ## Training Details
-
- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-
- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in Lacoste et al. (2019).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]
-
- ## Model Card Contact
-
- [More Information Needed]
 
 ---
 library_name: transformers
+ pipeline_tag: image-classification
+ license: mit
+ tags:
+ - vision
+ - image-classification
+ - biology
+ - ecology
+ - phenology
+ - plants
+ - vit
+ - plant-phenology
+ - iNaturalist
+ datasets:
+ - iNaturalist
+ metrics:
+ - accuracy
+ - f1
+ language:
+ - en
+ model-index:
+ - name: PhenoVision
+   results:
+   - task:
+       type: image-classification
+       name: Plant Reproductive Structure Detection
+     metrics:
+     - type: accuracy
+       value: 98.02
+       name: Flower Accuracy (buffer-filtered)
+     - type: accuracy
+       value: 97.01
+       name: Fruit Accuracy (buffer-filtered)
 ---

+ # PhenoVision: Automated Plant Reproductive Phenology from Field Images
+
+ PhenoVision is a Vision Transformer (ViT-Large) model fine-tuned to detect **flowers** and **fruits** in plant photographs. It was trained on 1.5 million human-annotated iNaturalist images and has been used to generate over 30 million new phenology records across 119,000+ plant species, vastly expanding global coverage of plant reproductive phenology data.
+
+ | | Flower | Fruit |
+ |---|---|---|
+ | **Accuracy** | 98.0% | 97.0% |
+ | **Sensitivity** | 98.5% | 84.2% |
+ | **Specificity** | 97.2% | 99.4% |
+ | **Expert validation** | 98.6% | 90.4% |

 ## Model Details

+ - **Model type:** Multi-label image classification (sigmoid outputs)
+ - **Architecture:** Vision Transformer Large (ViT-L/16), ~304M parameters
+ - **Input:** 224 x 224 RGB images
+ - **Output:** 2 logits (flower, fruit) — apply sigmoid for probabilities
+ - **Pretraining:** PlantCLEF 2022 checkpoint ("virtual taxonomist" — trained on 2.9M plant species images)
+ - **Current version:** v1.1.0
+ - **Model DOI:** [10.57967/hf/7952](https://doi.org/10.57967/hf/7952)
+ - **Developer:** [Phenobase](https://phenobase.org/)
+ - **Repository:** [github.com/Phenobase/phenovision](https://github.com/Phenobase/phenovision)
+ - **License:** MIT
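The two outputs are trained as independent sigmoids rather than a softmax over classes, so both probabilities can be high at once (a plant bearing flowers and fruit simultaneously). A stdlib-only sketch of the logit-to-probability step, using made-up logit values:

```python
import math

def sigmoid(x: float) -> float:
    # Independent per-class probability; the two outputs do not sum to 1.
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw logits for one image: [flower, fruit]
logits = [2.0, -1.5]
flower_prob, fruit_prob = (sigmoid(z) for z in logits)
print(f"flower={flower_prob:.3f} fruit={fruit_prob:.3f}")  # flower=0.881 fruit=0.182
```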
+
+ ### Key Innovation: Virtual Taxonomist Pretraining
+
+ Instead of standard ImageNet pretraining, PhenoVision uses a ViT-Large checkpoint pretrained on the PlantCLEF 2022 dataset (2.9 million plant images for species classification). Since species classification relies heavily on recognizing reproductive structures (flowers, fruits), this domain-specific pretraining provides a strong initialization for phenology detection. Compared to ImageNet pretraining, PlantCLEF pretraining achieved:
+
+ - Higher accuracy: TSS = 0.864 vs. 0.835
+ - Faster convergence: best epoch at 4 vs. 11
+
+ ## Intended Uses
+
+ **Primary use:** Detecting the presence of flowers and/or fruits in field photographs of plants.
+
+ **Suitable for:**
+ - Automated phenology annotation of iNaturalist and other community science images
+ - Large-scale phenology monitoring and climate change research
+ - Generating presence-only reproductive phenology datasets
+ - Integration with phenology databases (e.g., [Phenobase](https://phenobase.org/), USA-NPN)
+
+ **Out of scope:**
+ - Counting individual flowers or fruits
+ - Distinguishing flower developmental stages (buds vs. open vs. senescent)
+ - Detecting leaf phenology (use [PhenoVisionL](https://huggingface.co/phenobase/phenovisionL) instead)
+ - Identifying plant species (this is a phenology model, not a taxonomic classifier)
+
+ ## How to Use
+
+ ```python
+ from transformers import ViTForImageClassification, ViTImageProcessor
+ from PIL import Image
+ import torch
+
+ # Load model and processor
+ processor = ViTImageProcessor.from_pretrained("phenobase/phenovision")
+ model = ViTForImageClassification.from_pretrained("phenobase/phenovision")
+ model.eval()
+
+ # Run inference
+ image = Image.open("plant_photo.jpg").convert("RGB")
+ inputs = processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.sigmoid(outputs.logits)[0]
+
+ flower_prob = probs[0].item()
+ fruit_prob = probs[1].item()
+
+ print(f"Flower: {flower_prob:.3f}")
+ print(f"Fruit: {fruit_prob:.3f}")
+ ```
+
+ ### Applying Thresholds
+
+ Raw probabilities should be converted to detection calls using the optimized thresholds and uncertainty buffers provided as companion files. Predictions falling within the buffer zone are classified as "Equivocal" and should be excluded for research-quality outputs.
+
+ | Class | Threshold | Buffer Lower | Buffer Upper | Equivocal Range |
+ |-------|-----------|--------------|--------------|-----------------|
+ | Flower | 0.48 | 0.325 | 0.385 | 0.155 - 0.865 |
+ | Fruit | 0.60 | 0.405 | 0.305 | 0.195 - 0.905 |
+
+ - Probability **above** (threshold + buffer_upper) → **Detected** (high certainty)
+ - Probability **below** (threshold - buffer_lower) → **Not Detected** (high certainty)
+ - Probability **within** buffer zone → **Equivocal** (exclude from analysis)
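The decision rule above can be sketched as follows, with the thresholds and buffers hard-coded from the table (the authoritative values ship in `final_buffer_params.csv`):

```python
def classify(prob: float, threshold: float, buf_lower: float, buf_upper: float) -> str:
    # Above the upper buffer bound: confident detection.
    if prob > threshold + buf_upper:
        return "Detected"
    # Below the lower buffer bound: confident non-detection.
    if prob < threshold - buf_lower:
        return "Not Detected"
    # Inside the buffer zone: exclude from research-grade outputs.
    return "Equivocal"

# (threshold, buffer_lower, buffer_upper) per class, from the table above
PARAMS = {"flower": (0.48, 0.325, 0.385), "fruit": (0.60, 0.405, 0.305)}

print(classify(0.92, *PARAMS["flower"]))  # Detected      (0.92 > 0.865)
print(classify(0.50, *PARAMS["flower"]))  # Equivocal     (0.155 <= 0.50 <= 0.865)
print(classify(0.10, *PARAMS["fruit"]))   # Not Detected  (0.10 < 0.195)
```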
+
+ ## Training Data
+
+ - **Source:** [iNaturalist](https://www.inaturalist.org/) open data (research-grade observations)
+ - **Size:** 1,535,930 images from 119,340 species across 10,406 genera and 408 plant families
+ - **Splits:** 60% train (921,720) / 20% validation (307,291) / 20% test (306,919), stratified by genus
+ - **Annotations:** Human phenology annotations from iNaturalist platform (reproductiveCondition field)
+ - **Licensing:** Images under CC-0, CC-BY, or CC-BY-NC licenses
+ - **Note:** Approximately 1-5% of training annotations are marked "unknown" due to annotation difficulty
+
+ ## Training Procedure
+
+ - **Optimizer:** AdamW
+ - **Learning rate:** 5e-4 (base), with layer-wise decay factor 0.65
+ - **Batch size:** 384
+ - **Weight decay:** 0.05
+ - **Data augmentation:** RandAugment
+ - **Epochs:** 10 (best model selected at epoch 7 by average Data Quality Index)
+ - **Hardware:** NVIDIA A100 GPU
+ - **Loss:** Binary cross-entropy (multi-label)
+ - **v1.1.0 training:** Fine-tuned from v1.0.0 checkpoint on updated data snapshot (2025-10-27)
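Layer-wise learning-rate decay scales the base rate down by a constant factor per transformer block, so pretrained low-level features shift less than the new classification head. A minimal sketch of that schedule, assuming the 24 blocks of ViT-Large and an indexing convention where the head sits at the top (the exact grouping in the training code may differ):

```python
BASE_LR = 5e-4   # base learning rate from the card
DECAY = 0.65     # layer-wise decay factor from the card
NUM_LAYERS = 24  # ViT-Large transformer blocks (assumption for this sketch)

def layer_lr(layer_idx: int) -> float:
    # layer_idx = NUM_LAYERS is the top (head) and trains at the full base rate;
    # each step toward the input multiplies the rate by DECAY once more.
    return BASE_LR * DECAY ** (NUM_LAYERS - layer_idx)

head_lr = layer_lr(NUM_LAYERS)  # 5e-4, undecayed
first_block_lr = layer_lr(1)    # 5e-4 * 0.65**23, strongly damped
```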
+
+ ## Evaluation Results
+
+ ### Test Set Performance (v1.1.0)
+
+ | Class | Filter | N | Accuracy | Sensitivity | Specificity | PPV | NPV | J-Index | F1 | DQI |
+ |-------|--------|---|----------|-------------|-------------|-----|-----|---------|-----|-----|
+ | Flower | All data | 713,698 | 95.77% | 96.93% | 93.72% | 96.45% | 94.54% | 0.907 | 96.69% | 0.934 |
+ | Flower | Buffer filtered | 663,738 | 98.02% | 98.47% | 97.19% | 98.48% | 97.19% | 0.957 | 98.48% | 0.970 |
+ | Fruit | All data | 713,698 | 94.33% | 77.33% | 98.04% | 89.64% | 95.18% | 0.754 | 83.03% | 0.670 |
+ | Fruit | Buffer filtered | 651,791 | 97.01% | 84.16% | 99.37% | 96.11% | 97.16% | 0.835 | 89.74% | 0.803 |
+
+ ### Expert Validation
+
+ Independent expert review of model predictions:
+ - **Flower presence:** 98.6% agreement
+ - **Fruit presence:** 90.4% agreement
+
+ ### Taxonomic Coverage
+
+ - **Species:** 119,340 from 10,406 genera and 408 families
+ - **Genera with 10+ records:** 7,409 (flowers), 5,240 (fruits)
+ - **Median records per genus:** 184 (flowers), 85 (fruits)
+ - **New geographic grid cells:** 3,798 (flowers), 4,147 (fruits) with no prior phenology data
+
+ ## Companion Files
+
+ The following files are uploaded alongside the model weights:
+
+ | File | Description |
+ |------|-------------|
+ | `final_buffer_params.csv` | Decision thresholds and uncertainty buffer parameters per class. Used to convert probabilities to Detected/Not Detected/Equivocal calls. |
+ | `family_stats.csv` | Per-family (706 families) accuracy statistics. Useful for assessing model reliability for specific taxonomic groups. |
+
+ ## Limitations and Biases
+
+ ### Design Limitations
+ - **Presence-only:** The model reports detections but NOT absences. A non-detection does not mean the plant lacks flowers/fruits — they may simply not be visible in the image.
+ - **Partial plant coverage:** Images typically show only part of a plant. Reproductive structures may exist on non-photographed parts.
+ - **Buffer zone data loss:** Applying uncertainty thresholds removes ~7-9% of predictions as equivocal, trading completeness for accuracy.
+
+ ### Known Failure Modes
+ - Inconspicuous reproductive structures (grasses, sedges) are harder to detect
+ - Flower buds may be confused with open flowers
+ - Background plants with flowers/fruits can cause false positives for the focal plant
+ - Some families show lower accuracy (e.g., Haloragaceae ~79%)
+
+ ### Data Biases
+ - Reflects iNaturalist's geographic biases: overrepresentation of urban areas, developed countries, and coastal regions
+ - Taxonomic bias toward common, conspicuous species
+ - Limited coverage in biodiversity-rich tropical regions
+
+ ### Annotation Quality
+ - Training labels come from community science annotations with inherent variability
+ - Some iNaturalist annotations are incomplete (e.g., flower present but only fruit annotated)
+ - Family-level accuracy statistics (in `family_stats.csv`) should be consulted when interpreting results for specific taxonomic groups
+
+ ## Citation
+
+ If you use PhenoVision in your research, please cite:
+
+ ```bibtex
+ @article{dinnage2025phenovision,
+   title={PhenoVision: A framework for automating and delivering research-ready plant phenology data from field images},
+   author={Dinnage, Russell and Grady, Erin and Neal, Nevyn and Deck, Jonn and Denny, Ellen and Walls, Ramona and Seltzer, Carrie and Guralnick, Robert and Li, Daijiang},
+   journal={Methods in Ecology and Evolution},
+   volume={16},
+   pages={1763--1780},
+   year={2025},
+   doi={10.1111/2041-210X.14346}
+ }
+ ```
+
+ ## Acknowledgments
+
+ - **Funding:** National Science Foundation (NSF)
+ - **Data:** [iNaturalist](https://www.inaturalist.org/) community and platform
+ - **Infrastructure:** [Phenobase](https://phenobase.org/) — a global plant phenology database
+ - **Integration:** Plant Phenology Ontology (PPO), USA National Phenology Network (USA-NPN)