arbabarshad Claude Opus 4.6 committed on
Commit
f8fa434
·
1 Parent(s): c43f82f

Add multi-page PDF support, expand USA tier to 219 species, add all 4 vignettes

- Fix app_database_prep to ingest all pages of multi-page PDFs (not just first page)
- Expand USA IPM tier from 110 to 219 species
- Add all 4 example vignettes: SLF, Striga, FAW, Sahara Mustard
- Improve RAG prompt to be more concise and avoid document number references
- Update README with chunk counts, new species pipeline docs, and updated eval metrics
- Rebuild vector database and regenerate evaluation results

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (23)
  1. README.md +59 -17
  2. app.py +12 -5
  3. app_database_prep.py +28 -24
  4. examples-work-out/example_results_20260302_202103.json +20 -0
  5. examples-work-out/example_results_20260302_202159.json +20 -0
  6. examples-work-out/example_results_20260302_202402.json +20 -0
  7. examples-work-out/example_results_20260303_104758.json +56 -0
  8. examples-work-out/species-verification.md +44 -25
  9. retrieval_evaluation_results.json +56 -56
  10. species-organized/PestID Species - Organized.xlsx +2 -2
  11. species-organized/species_analysis.png +2 -2
  12. species-organized/species_statistics.txt +11 -11
  13. species-organized/species_table.tex +2 -0
  14. usa_ipm_checkpoint.json +232 -2
  15. usa_ipm_requests.jsonl +0 -0
  16. usa_ipm_results.jsonl +0 -0
  17. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/27ac9297-abc2-406e-8919-7670a60055f1/link_lists.bin +0 -0
  18. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{27ac9297-abc2-406e-8919-7670a60055f1 → ba4b6a9d-5012-4456-8635-7ea7d6cb7a71}/data_level0.bin +1 -1
  19. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{27ac9297-abc2-406e-8919-7670a60055f1 → ba4b6a9d-5012-4456-8635-7ea7d6cb7a71}/header.bin +1 -1
  20. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/ba4b6a9d-5012-4456-8635-7ea7d6cb7a71/index_metadata.pickle +3 -0
  21. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{27ac9297-abc2-406e-8919-7670a60055f1 → ba4b6a9d-5012-4456-8635-7ea7d6cb7a71}/length.bin +1 -1
  22. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/ba4b6a9d-5012-4456-8635-7ea7d6cb7a71/link_lists.bin +3 -0
  23. vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3 +2 -2
README.md CHANGED
@@ -65,9 +65,8 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
65
  ├── examples-work-out/
66
  │   ├── generate_latex.py # JSON → LaTeX converter for paper examples
67
  │   ├── example_results_*.json # Generated example outputs
68
- │   ├── example_results_20260301_191932.json # Striga asiatica (Africa + USA) — used in paper supplementary
69
- │   ├── example_results_20260302_102648.json # Spodoptera frugiperda (India + USA) — used in paper supplementary
70
- │   └── species-verification.md # Example vignettes motivation (flow is approximate, not exact)
71
  │
72
  └── writing/
73
      └── 65d4fadc59fceb1a54d1aae6/main.tex # Paper draft
@@ -75,20 +74,21 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
75
 
76
  ### Database Build Flow (4 Geographic Tiers)
77
 
78
- | Tier | Species | Source |
79
- |------|---------|--------|
80
- | Midwest USA | 80 | ISU Handbook PDFs |
81
- | USA | 110 | GPT-4o generated IPM |
82
- | Africa | 35 | Expert-curated Excel |
83
- | India | 11 | Expert-curated Excel |
 
84
 
85
- **Midwest USA Data (80 species):**
86
  1. PDFs loaded from `agllm-data/agllm-data-isu-field-{insects,weeds}-all-species/` (content source)
87
  2. `matched_species_results_v2.csv` in each folder maps PDF filename → species name (metadata); region hardcoded as `"Midwest USA"` in `app_database_prep.py`
88
  3. The `Midwest USA` sheet in the Excel is a reference copy only — **not** read by `app_database_prep.py`
89
 
90
- **USA Data (110 species - LLM generated):**
91
- 4. Run `generate_usa_ipm_info.py` to query GPT-4o for all species
92
  5. Creates `USA` sheet in `species-organized/PestID Species - Organized.xlsx` — both content (IPM Info column) and metadata are read from this sheet
93
 
94
  **Africa/India Data (35 + 11 species):**
@@ -111,15 +111,57 @@ python generate_usa_ipm_info.py --step process # Call GPT-4o API
111
  python generate_usa_ipm_info.py --step parse # Create Excel sheet
112
  ```
113
 
114
- **Output:** Updates `species-organized/PestID Species - Organized.xlsx` with "USA" sheet containing 110 species present in the United States (pests + beneficials).
115
 
116
  ### Evaluation Filters (retrieval_evaluation.py)
117
  | Filter | P@1 | P@2 | P@4 | P@8 | nDCG@1 | nDCG@2 | nDCG@4 | nDCG@8 |
118
  |--------|-----|-----|-----|-----|--------|--------|--------|--------|
119
- | No Filter | 0.64 | 0.76 | 0.81 | 0.85 | 0.64 | 0.72 | 0.74 | 0.75 |
120
- | Species Only | 0.68 | 0.84 | 0.93 | **1.00** | 0.68 | 0.78 | 0.83 | 0.85 |
121
- | Region Only | 0.69 | 0.78 | 0.84 | 0.87 | 0.69 | 0.75 | 0.78 | 0.79 |
122
- | Species + Region | **0.79** | **0.91** | **0.99** | **1.00** | **0.79** | **0.87** | **0.90** | **0.91** |
123
 
124
  ---
125
 
 
65
  ├── examples-work-out/
66
  │   ├── generate_latex.py # JSON → LaTeX converter for paper examples
67
  │   ├── example_results_*.json # Generated example outputs
68
+ │   ├── example_results_20260303_110427.json # All 4 vignettes (SLF, Striga, FAW, Sahara Mustard) — latest, updated prompt
69
+ │   └── species-verification.md # Vignette tracking: species, regions, DB status, example outputs
70
  │
71
  └── writing/
72
      └── 65d4fadc59fceb1a54d1aae6/main.tex # Paper draft
 
74
 
75
  ### Database Build Flow (4 Geographic Tiers)
76
 
77
+ | Tier | Species | Chunks | Source |
78
+ |------|---------|--------|--------|
79
+ | Midwest USA | 82 | 388 | ISU Handbook PDFs |
80
+ | USA | 219 | 913 | GPT-4o generated IPM |
81
+ | Africa | 35 | 70 | Expert-curated Excel |
82
+ | India | 11 | 39 | Expert-curated Excel |
83
+ | **Total** | **128 unique** | **1410** | |
84
 
85
+ **Midwest USA Data (82 species):**
86
  1. PDFs loaded from `agllm-data/agllm-data-isu-field-{insects,weeds}-all-species/` (content source)
87
  2. `matched_species_results_v2.csv` in each folder maps PDF filename → species name (metadata); region hardcoded as `"Midwest USA"` in `app_database_prep.py`
88
  3. The `Midwest USA` sheet in the Excel is a reference copy only — **not** read by `app_database_prep.py`
89
 
90
+ **USA Data (219 species - LLM generated):**
91
+ 4. Run `generate_usa_ipm_info.py` to query GPT-4o for all species (species from all 3 primary tiers, filtered to those confirmed in USA)
92
  5. Creates `USA` sheet in `species-organized/PestID Species - Organized.xlsx` — both content (IPM Info column) and metadata are read from this sheet
93
 
94
  **Africa/India Data (35 + 11 species):**
 
111
  python generate_usa_ipm_info.py --step parse # Create Excel sheet
112
  ```
113
 
114
+ **Output:** Updates `species-organized/PestID Species - Organized.xlsx` with "USA" sheet containing 219 species present in the United States (pests + beneficials).
115
 
116
  ### Evaluation Filters (retrieval_evaluation.py)
117
  | Filter | P@1 | P@2 | P@4 | P@8 | nDCG@1 | nDCG@2 | nDCG@4 | nDCG@8 |
118
  |--------|-----|-----|-----|-----|--------|--------|--------|--------|
119
+ | No Filter | 0.50 | 0.72 | 0.79 | 0.87 | 0.50 | 0.64 | 0.67 | 0.70 |
120
+ | Species Only | 0.69 | 0.85 | 0.97 | **1.00** | 0.69 | 0.79 | 0.85 | 0.86 |
121
+ | Region Only | 0.53 | 0.77 | 0.84 | 0.89 | 0.53 | 0.68 | 0.72 | 0.73 |
122
+ | Species + Region | **0.74** | **0.89** | **0.99** | **1.00** | **0.74** | **0.83** | **0.88** | **0.89** |
123
+
124
+ ### Adding a New Species (Full Pipeline)
125
+
126
+ ```bash
127
+ # Activate environment (always deactivate first to avoid path conflicts)
128
+ source ~/miniconda3/etc/profile.d/conda.sh && conda deactivate && conda activate agllm-june-15
129
+
130
+ # Set API keys (or ensure .env file is correct — note: shell env vars override .env)
131
+ export OPENAI_API_KEY="sk-proj-..."
132
+ export OPENROUTER_API_KEY="sk-or-v1-..."
133
+
134
+ # 1. Copy PDF to the appropriate data directory
135
+ cp "Species Name.pdf" agllm-data/agllm-data-isu-field-{insects,weeds}-all-species/raw-pdfs/
136
+
137
+ # 2. Add entry to matched_species_results_v2.csv (in the same data directory)
138
+ echo "Genus species,Common Name,Genus species.pdf" >> agllm-data/agllm-data-isu-field-{insects,weeds}-all-species/matched_species_results_v2.csv
139
+
140
+ # 3. (Optional) Add to Midwest USA sheet in Excel for reference
141
+ # Edit species-organized/PestID Species - Organized.xlsx
142
+
143
+ # 4. Regenerate USA tier IPM info (includes new species)
144
+ python generate_usa_ipm_info.py --force
145
+
146
+ # 5. Rebuild ChromaDB
147
+ python app_database_prep.py
148
+
149
+ # 6. Run examples to verify
150
+ python app.py --examples
151
+
152
+ # 7. Regenerate visualization and copy to paper figures
153
+ python species-organized/species_analysis.py
154
+ cp species-organized/species_analysis.pdf writing/65d4fadc59fceb1a54d1aae6/figures-in-making/species_analysis.pdf
155
+
156
+ # 8. Run retrieval evaluation
157
+ python retrieval_evaluation.py
158
+ ```
159
+
160
+ **Important notes:**
161
+ - Always `conda deactivate && conda activate agllm-june-15` before running (avoids env path conflicts where `agllm-env1-updates-1` python gets used instead)
162
+ - Shell env vars override `.env` file — if you see `Incorrect API key` errors, check `echo $OPENAI_API_KEY`
163
+ - The `matched_species_results_v2.csv` maps PDF filenames → species names; this is the authoritative source for Midwest USA tier
164
+ - Multi-page PDFs are fully supported (all pages ingested and chunked)
165
 
166
  ---
167
 
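Reviewer note on the evaluation table in the README diff above: the commit reports P@k and nDCG@k but `retrieval_evaluation.py` is not part of this diff, so the exact definitions are not visible here. The sketch below is one plausible reading, not the project's implementation. It assumes a single relevant chunk per query and treats P@k as a hit rate (an assumption suggested by the reported values rising with k and by P@1 equalling nDCG@1 in every row). The names `hit_at_k` and `ndcg_at_k` are hypothetical.

```python
import math


def hit_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the relevant chunk appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def ndcg_at_k(ranked_ids, relevant_id, k):
    """nDCG@k with exactly one relevant item.

    The ideal DCG is 1 (relevant item at rank 1), so the score reduces to
    1 / log2(rank + 1) when the item is found in the top k, and 0 otherwise.
    """
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


# Hypothetical ranking for one query; "c1" is the relevant chunk.
ranked = ["c3", "c1", "c2"]
print(hit_at_k(ranked, "c1", 1), hit_at_k(ranked, "c1", 2))
print(round(ndcg_at_k(ranked, "c1", 2), 4))
```

Averaging these per-query scores over the evaluation set would give one table row per filter setting. Under this reading, a rank-1 hit contributes 1.0 to both metrics, which matches the identical P@1 and nDCG@1 columns.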
app.py CHANGED
@@ -418,10 +418,11 @@ Context retrieved from the database (organized by region):
418
  {{context}}
419
 
420
  Instructions:
421
- - Answer based ONLY on the provided context. If no context exists for a region, do not invent information for it.
 
422
  - Start with information from the selected region ('{region}') if available. If not available, clearly state that.
423
- - Then include relevant information from other regions, clearly labeling which region each piece of information comes from.
424
- - If the context is insufficient to fully answer the question, you may add brief general knowledge, clearly marked as such.
425
  - {language_constraint}
426
 
427
  Question: {{question}}
@@ -672,10 +673,16 @@ def run_examples():
672
  from datetime import datetime
673
 
674
  examples = [
675
- # {"species": "Striga asiatica", "region": "Africa", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
676
- # {"species": "Striga asiatica", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
 
 
 
 
677
  {"species": "Spodoptera frugiperda", "region": "India", "model": "Llama-3 70B", "mode": "Researcher", "question": "What are the recommended IPM strategies for managing fall armyworm in my corn field?"},
678
  {"species": "Spodoptera frugiperda", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What are the recommended IPM strategies for managing fall armyworm in my corn field?"},
 
 
679
  ]
680
 
681
  results = []
 
418
  {{context}}
419
 
420
  Instructions:
421
+ - Answer based ONLY on the provided context. Do NOT add information beyond what is in the context.
422
+ - Be concise: use bullet points and keep the response brief. Do not repeat similar information across regions.
423
  - Start with information from the selected region ('{region}') if available. If not available, clearly state that.
424
+ - Then include relevant information from other regions, labeling by region name (e.g., "According to Midwest USA guidance..."). Do NOT use "Document 1/2/3" references.
425
+ - If the context is insufficient, state what is available and what is missing.
426
  - {language_constraint}
427
 
428
  Question: {{question}}
 
673
  from datetime import datetime
674
 
675
  examples = [
676
+ # Vignette 1: Spotted Lanternfly — Midwest USA (ISU PDF)
677
+ {"species": "Lycorma delicatula", "region": "Midwest USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "How can I identify and report spotted lanternfly in Iowa?"},
678
+ # Vignette 2: Witchweed — Africa vs USA cross-regional
679
+ {"species": "Striga asiatica", "region": "Africa", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
680
+ {"species": "Striga asiatica", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
681
+ # Vignette 3: Fall Armyworm — India vs USA cross-regional
682
  {"species": "Spodoptera frugiperda", "region": "India", "model": "Llama-3 70B", "mode": "Researcher", "question": "What are the recommended IPM strategies for managing fall armyworm in my corn field?"},
683
  {"species": "Spodoptera frugiperda", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What are the recommended IPM strategies for managing fall armyworm in my corn field?"},
684
+ # Vignette 4: Sahara Mustard — Midwest USA (extension PDF)
685
+ {"species": "Brassica tournefortii", "region": "Midwest USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What mechanical, chemical, and cultural control methods are recommended for managing Sahara mustard infestations?"},
686
  ]
687
 
688
  results = []
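Reviewer note on the prompt hunk above: the template mixes `{region}` and `{language_constraint}` with doubled braces around `context` and `question`. That pattern is consistent with a two-stage fill, where an f-string resolves the single-brace fields when the template is built and the doubled braces survive as placeholders for a later `.format` call. The sketch below illustrates that mechanism only; `build_prompt_template` is a hypothetical name, the instruction list is abbreviated, and how `app.py` actually fills the second stage is not shown in this diff.

```python
def build_prompt_template(region, language_constraint):
    """Fill build-time fields now; leave {context} and {question} for later."""
    # Inside an f-string, {{name}} renders as the literal text {name}.
    return f"""Context retrieved from the database (organized by region):
{{context}}

Instructions:
- Answer based ONLY on the provided context.
- Start with information from the selected region ('{region}') if available.
- {language_constraint}

Question: {{question}}"""


# Stage 1: region and language are known when the chain is configured.
template = build_prompt_template("Africa", "Respond in English.")

# Stage 2: retrieved context and the user question arrive per request.
prompt = template.format(
    context="[Africa] Rotate maize with legumes.",
    question="How do I manage Striga?",
)
print(prompt)
```

One caveat with this approach: if a stage-1 value itself contains braces, the stage-2 `.format` call will misread them, so such values would need escaping first.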
app_database_prep.py CHANGED
@@ -128,18 +128,19 @@ excel_file_path = "species-organized/PestID Species - Organized.xlsx"
128
  ## Process PDF documents using CSV → PDF approach
129
  print("--- Processing PDF Documents (CSV → PDF approach) ---")
130
 
131
- # Function to find PDF file for a given filename
132
- def find_pdf_file(filename, documents):
133
-     """Find a PDF document by filename in the loaded documents"""
134
      for doc in documents:
135
          doc_filename = doc.metadata["source"].split('/')[-1]
136
          # Try exact match first
137
          if doc_filename.lower() == filename.lower():
138
-             return doc
139
          # Try without extension
140
-         if doc_filename.lower().replace('.pdf', '') == filename.lower().replace('.pdf', ''):
141
-             return doc
142
-     return None
143
 
144
  pdf_documents_for_splitting = []
145
  processed_files = set()
@@ -151,27 +152,30 @@ for index, row in metadata_raw.iterrows():
151
      filename = row['File Name']
152
      species = row['Species']
153

154
-     # Find the corresponding PDF document
155
-     pdf_doc = find_pdf_file(filename, documents)
156
-
157
-     if pdf_doc is not None:
158
-         # Only process if we haven't already processed this file
159
-         doc_source = pdf_doc.metadata["source"]
160
-         if doc_source not in processed_files:
161
-             # Add region for PDF docs (Midwest USA - from ISU Handbook)
162
-             pdf_doc.metadata["region"] = "Midwest USA"
163
-
164
-             # Add species metadata - guaranteed to exist since we're starting from CSV
165
-             pdf_doc.metadata["matched_specie_0"] = species
166

167
              # Check if there are multiple species for the same file
168
              same_file_species = metadata_raw[metadata_raw["File Name"].str.lower() == filename.lower()]["Species"]
169
-             for specie_index, specie_name in enumerate(same_file_species):
170
-                 pdf_doc.metadata[f"matched_specie_{specie_index}"] = specie_name
171

172
-             pdf_documents_for_splitting.append(pdf_doc)
173
-             processed_files.add(doc_source)
174
-             print(f"✓ Processed: {filename} → {species}")
175
          else:
176
              print(f"⚠ Already processed: {filename}")
177
      else:
 
128
  ## Process PDF documents using CSV → PDF approach
129
  print("--- Processing PDF Documents (CSV → PDF approach) ---")
130
 
131
+ # Function to find ALL pages of a PDF file for a given filename
132
+ def find_pdf_pages(filename, documents):
133
+     """Find all document pages matching a PDF filename in the loaded documents"""
134
+     matching_docs = []
135
      for doc in documents:
136
          doc_filename = doc.metadata["source"].split('/')[-1]
137
          # Try exact match first
138
          if doc_filename.lower() == filename.lower():
139
+             matching_docs.append(doc)
140
          # Try without extension
141
+         elif doc_filename.lower().replace('.pdf', '') == filename.lower().replace('.pdf', ''):
142
+             matching_docs.append(doc)
143
+     return matching_docs
144
 
145
  pdf_documents_for_splitting = []
146
  processed_files = set()
 
152
      filename = row['File Name']
153
      species = row['Species']
154

155
+     # Find ALL pages of the corresponding PDF document
156
+     pdf_pages = find_pdf_pages(filename, documents)
157

158
+     if pdf_pages:
159
+         # Use the base filename (without page info) to track processed files
160
+         base_source = pdf_pages[0].metadata["source"].split('/')[-1].lower()
161
+         if base_source not in processed_files:
162
              # Check if there are multiple species for the same file
163
              same_file_species = metadata_raw[metadata_raw["File Name"].str.lower() == filename.lower()]["Species"]
164

165
+             for pdf_doc in pdf_pages:
166
+                 # Add region for PDF docs (Midwest USA - from ISU Handbook)
167
+                 pdf_doc.metadata["region"] = "Midwest USA"
168
+
169
+                 # Add species metadata - guaranteed to exist since we're starting from CSV
170
+                 pdf_doc.metadata["matched_specie_0"] = species
171
+
172
+                 for specie_index, specie_name in enumerate(same_file_species):
173
+                     pdf_doc.metadata[f"matched_specie_{specie_index}"] = specie_name
174
+
175
+                 pdf_documents_for_splitting.append(pdf_doc)
176
+
177
+             processed_files.add(base_source)
178
+             print(f"✓ Processed: {filename} → {species} ({len(pdf_pages)} pages)")
179
          else:
180
              print(f"⚠ Already processed: {filename}")
181
      else:
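Reviewer note on the `find_pdf_pages` change above: the fix relies on the loader yielding one document object per PDF page, all sharing the same `metadata["source"]` path, so collecting every match (instead of returning the first) picks up all pages. The sketch below exercises that idea in isolation. `Page` is a hypothetical stand-in for the loader's per-page objects, and `_stem` is a helper introduced here. Unlike the committed code, which uses `replace('.pdf', '')`, this sketch strips only a trailing `.pdf`, which is a deliberate variation rather than the author's method.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for one loaded PDF page."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def _stem(name):
    """Lower-cased base filename with a trailing '.pdf' removed."""
    base = os.path.basename(name).lower()
    return base[:-4] if base.endswith(".pdf") else base


def find_pdf_pages(filename, documents):
    """Return every page whose source file matches `filename`."""
    target = _stem(filename)
    return [doc for doc in documents if _stem(doc.metadata["source"]) == target]


# Two pages from one PDF plus an unrelated file (paths are made up).
docs = [
    Page("page one", {"source": "raw-pdfs/Spotted Lanternfly.pdf", "page": 0}),
    Page("page two", {"source": "raw-pdfs/Spotted Lanternfly.pdf", "page": 1}),
    Page("other", {"source": "raw-pdfs/Other.pdf", "page": 0}),
]

pages = find_pdf_pages("spotted lanternfly.PDF", docs)
for page in pages:
    # Same tagging the build script applies to each matched page.
    page.metadata["region"] = "Midwest USA"
print(len(pages))
```

An empty list is falsy, so the caller's `if pdf_pages:` check in the diff covers the no-match case without a separate `None` test.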
examples-work-out/example_results_20260302_202103.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "species": "Lycorma delicatula",
4
+ "region": "Midwest USA",
5
+ "model": "Llama-3 70B",
6
+ "mode": "Researcher",
7
+ "question": "How can I identify and report spotted lanternfly in Iowa?",
8
+ "availability": "**Common Name:** Unknown\n\nInformation for **Lycorma delicatula** is available in region(s): **Midwest USA**.",
9
+ "answer": "Based on the provided context from the Midwest USA region, here's how you can identify and report spotted lanternfly in Iowa:\n\n**Identification:**\n\n* Egg stage: Look for egg masses, which would be present in Iowa from late July to the following spring.\n* Nymph stage: Identify by their striking colors. Young nymphs are black with white spots, while older nymphs are red and black with white spots. They are often found on host trees, usually on branches, and will move quickly to avoid being seen.\n* Adult stage: Recognize adults by their pinkish-gray color with black spots, long grayish-black legs, and a second pair of red, white, and black wings beneath the gray spotted wings.\n\n**Reporting:**\n\nWhile there is no specific information on reporting in the provided context, as a general knowledge, it is essential to report any spotted lanternfly sightings to the Iowa Department of Agriculture and Land Stewardship or local authorities to help track the spread of this invasive species and inform management efforts.\n\n**Additional information:**\n\nFrom the same region (Midwest USA), it is known that spotted lanternfly feed on a wide variety of plants, but prefer tree of heaven (Ailanthus altissima). This information can be useful when monitoring for the species.\n\nNo additional information on identification and reporting is available from other regions (USA, Africa, or India) in the provided context."
10
+ },
11
+ {
12
+ "species": "Brassica tournefortii",
13
+ "region": "Midwest USA",
14
+ "model": "Llama-3 70B",
15
+ "mode": "Researcher",
16
+ "question": "What are the most effective methods to control Sahara mustard before it spreads?",
17
+ "availability": "**Common Name:** Unknown\n\nInformation for **Brassica tournefortii** is available in region(s): **Midwest USA**.",
18
+ "answer": "Based on the provided context from the Midwest USA region, there is no specific information on the most effective methods to control Sahara mustard. However, it is mentioned that \"it is imperative that further establishment of this invasive weed be prevented and that existing plants be eliminated\" (Document 2 from Midwest USA), indicating the importance of control measures.\n\nUnfortunately, there is no information on control methods in the provided context from the Midwest USA region. \n\nIf we look at other regions, there is no relevant information on control methods in the provided context from the USA, Africa, or India regions.\n\nAs a general knowledge addition, it is known that effective control methods for invasive weeds like Sahara mustard often involve a combination of prevention, early detection, and rapid response. This can include manual removal, herbicide application, biological control, and cultural practices that promote native vegetation. However, the specific methods and their effectiveness may vary depending on the region, habitat, and other factors.\n\nFurther research is needed to determine the most effective methods to control Sahara mustard in the Midwest USA region. It is recommended to consult with local experts, conduct field studies, and review relevant literature to develop a comprehensive control strategy for this invasive species."
19
+ }
20
+ ]
examples-work-out/example_results_20260302_202159.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "species": "Lycorma delicatula",
4
+ "region": "Midwest USA",
5
+ "model": "Llama-3 70B",
6
+ "mode": "Researcher",
7
+ "question": "How can I identify and report spotted lanternfly in Iowa?",
8
+ "availability": "**Common Name:** Unknown\n\nInformation for **Lycorma delicatula** is available in region(s): **Midwest USA**.",
9
+ "answer": "Based on the provided context from the Midwest USA region, here's the information to help you identify and report spotted lanternfly in Iowa:\n\n**Identification:**\n\n* Egg stage: Look for egg masses present in Iowa from late July to the following spring.\n* Nymphs: Identify by their striking colors - young nymphs are black with white spots, while older nymphs are red and black with white spots. They are often found on host trees, usually on branches, and will move quickly to avoid being seen.\n* Adults: Recognize by their pinkish-gray color with black spots, long grayish-black legs, and a second pair of red, white, and black wings beneath the gray spotted wings.\n\n**Reporting:**\n\nWhile there is no specific information in the provided context on how to report spotted lanternfly in Iowa, it is essential to report any sightings to the Iowa Department of Agriculture and Land Stewardship or local authorities to help track the spread of this invasive species.\n\n**Additional information:**\n\nFrom the context, we know that spotted lanternfly feed on a wide variety of plants, but prefer tree of heaven (Ailanthus altissima). This information can be useful when monitoring for the species.\n\n**General knowledge:**\n\nIt is crucial to note that early detection and reporting are critical in managing the spread of invasive species like the spotted lanternfly. If you suspect you have found a spotted lanternfly, take clear, close-up photos of the insect and the egg masses, and report the location and any other relevant details to the authorities.\n\nNo additional information from other regions (USA, Africa, or India) is relevant to this specific question."
10
+ },
11
+ {
12
+ "species": "Brassica tournefortii",
13
+ "region": "Midwest USA",
14
+ "model": "Llama-3 70B",
15
+ "mode": "Researcher",
16
+ "question": "What mechanical, chemical, and cultural control methods are recommended for managing Sahara mustard infestations?",
17
+ "availability": "**Common Name:** Unknown\n\nInformation for **Brassica tournefortii** is available in region(s): **Midwest USA**.",
18
+ "answer": "Based on the provided context, I must inform you that there is no specific information available in the Midwest USA region regarding mechanical, chemical, and cultural control methods for managing Sahara mustard infestations.\n\nHowever, I can provide some general knowledge on control methods for invasive weeds like Sahara mustard. Please note that these are not specific to the Midwest USA region and may require adaptation and testing for local conditions.\n\n**Mechanical Control:**\n\n* Hand-pulling or digging can be effective for small infestations, especially when the soil is moist. It is essential to remove the entire root system to prevent regrowth.\n* Mowing or cutting can be used to reduce seed production, but it may not completely eliminate the infestation.\n\n**Chemical Control:**\n\n* Herbicides can be used to control Sahara mustard, but the specific products and application rates may vary depending on the region and local regulations. It is crucial to choose a herbicide that is selective for the target species and minimizes harm to non-target plants and the environment.\n\n**Cultural Control:**\n\n* Preventing seed dispersal is critical in managing Sahara mustard infestations. This can be achieved by avoiding the movement of contaminated soil, equipment, and vehicles.\n* Restoring native vegetation can help outcompete Sahara mustard and reduce its spread.\n* Altering land use practices, such as changing grazing patterns or implementing conservation tillage, may also help reduce the spread of Sahara mustard.\n\nIt is essential to note that a comprehensive management plan should consider a combination of these control methods, as well as monitoring and early detection, to effectively manage Sahara mustard infestations.\n\nIf you are looking for region-specific information, I recommend consulting with local experts or conducting further research to develop a management plan tailored to the Midwest USA region."
19
+ }
20
+ ]
examples-work-out/example_results_20260302_202402.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "species": "Lycorma delicatula",
4
+ "region": "Midwest USA",
5
+ "model": "Llama-3 70B",
6
+ "mode": "Researcher",
7
+ "question": "How can I identify and report spotted lanternfly in Iowa?",
8
+ "availability": "**Common Name:** Unknown\n\nInformation for **Lycorma delicatula** is available in region(s): **Midwest USA**.",
9
+ "answer": "Based on the provided context from the Midwest USA region, here's the information to help you identify and report spotted lanternfly in Iowa:\n\n**Identification:**\n\n* Egg stage: Spotted lanternfly spend the winter in the egg stage, and egg masses would be present in Iowa from late July to the following spring.\n* Nymphs: Young nymphs are black with white spots, and older nymphs are red and black with white spots. They are fond of host trees, usually on branches, and will move quickly around the branch to avoid being seen.\n* Adults: Spotted lanternfly adults look like pinkish-gray cicadas or very large treehoppers. Their wings are pinkish-gray colored and covered in black spots. They have long legs that are grayish black. Beneath the gray spotted wings, there is a second pair of wings that are red, white, and black.\n\n**Reporting:**\n\n* According to the Iowa State University Yard and Garden Encyclopedia, you can report spotted lanternfly sightings in Iowa. The exact reporting process is not specified in the provided context, but it is recommended to contact the Iowa State University Extension or the Iowa Department of Agriculture and Land Stewardship for guidance on reporting procedures.\n\n**Additional Information:**\n\n* General knowledge: It is essential to report spotted lanternfly sightings promptly, as early detection and rapid response are critical in preventing the spread of this invasive species.\n\nNo additional information is available from other regions (USA, Africa, or India) that is specific to reporting spotted lanternfly in Iowa."
10
+ },
11
+ {
12
+ "species": "Brassica tournefortii",
13
+ "region": "Midwest USA",
14
+ "model": "Llama-3 70B",
15
+ "mode": "Researcher",
16
+ "question": "What mechanical, chemical, and cultural control methods are recommended for managing Sahara mustard infestations?",
17
+ "availability": "**Common Name:** Unknown\n\nInformation for **Brassica tournefortii** is available in region(s): **Midwest USA**.",
18
+ "answer": "Based on the provided context from the Midwest USA region, here are the recommended control methods for managing Sahara mustard infestations:\n\n**Mechanical Control (Midwest USA):** Monitoring the area for rodent caches and treating them accordingly is important. This suggests that mechanical control methods, such as hand-pulling or digging up, may be effective in managing Sahara mustard infestations, especially in areas where rodent caching is a significant means of spread.\n\n**Chemical Control (Midwest USA):** Early applications of chemicals may control Sahara mustard due to its extremely early development. Applications should be particularly effective if started before native species have begun to develop. However, it is noted that finding an agent that will control Sahara mustard without damaging canola crops may be challenging.\n\n**Cultural Control (No information available in Midwest USA context):** Unfortunately, there is no information available in the provided context from the Midwest USA region regarding cultural control methods for managing Sahara mustard infestations.\n\n**General Knowledge:** In general, cultural control methods for managing invasive species like Sahara mustard may include practices such as altering land use, modifying soil conditions, or using cover crops to outcompete the invasive species. However, more research is needed to determine the effectiveness of these methods for Sahara mustard specifically.\n\nNo relevant information was found in the other regions (USA, Africa, and India) that could supplement the information from the Midwest USA region."
19
+ }
20
+ ]
examples-work-out/example_results_20260303_104758.json ADDED
@@ -0,0 +1,56 @@
1
+ [
2
+ {
3
+ "species": "Lycorma delicatula",
4
+ "region": "Midwest USA",
5
+ "model": "Llama-3 70B",
6
+ "mode": "Researcher",
7
+ "question": "How can I identify and report spotted lanternfly in Iowa?",
8
+ "availability": "**Common Name:** Spotted Lanternfly\n\nInformation for **Lycorma delicatula** is available in region(s): **Midwest USA, USA**.",
9
+ "answer": "Based on the provided context, here's the information on identifying and reporting spotted lanternfly in Iowa:\n\n**From Midwest USA:**\n\nTo identify spotted lanternfly in Iowa, you can look for the following characteristics:\n\n* Egg masses: Present in Iowa from late July to the following spring.\n* Nymphs: Black with white spots (young) or red and black with white spots (older). They are fond of host trees, usually on branches, and will move quickly to avoid being seen. Nymphs start hatching from eggs in April and develop throughout the early summer.\n* Adults: Pinkish-gray cicada-like or large treehopper-like, with pinkish-gray wings covered in black spots, grayish-black legs, and a second pair of red, white, and black wings beneath. Adults are present in Iowa from mid-July through a killing freeze.\n\n**Reporting:**\n\nAccording to the Iowa State University Yard and Garden Encyclopedia, you can report spotted lanternfly sightings to the authorities. The exact reporting process is not specified in the provided context, but it is likely that you can contact the Iowa Department of Agriculture and Land Stewardship or the Iowa State University Extension and Outreach office to report any sightings.\n\n**General Knowledge:**\nIt is essential to report spotted lanternfly sightings promptly, as early detection and rapid response are critical in managing the spread of this invasive species.\n\nPlease note that the provided context does not specify the exact reporting process or contact information. You may need to search for additional resources or contact local authorities for more information on reporting spotted lanternfly in Iowa."
10
+ },
11
+ {
12
+ "species": "Striga asiatica",
13
+ "region": "Africa",
14
+ "model": "Llama-3 70B",
15
+ "mode": "Researcher",
16
+ "question": "What is the most effective way to manage Striga in my maize field?",
17
+ "availability": "**Common Name:** Witch weed\n\nInformation for **Striga asiatica** is available in region(s): **Africa, USA**.",
18
+ "answer": "Based on the context from Africa, the most effective way to manage Striga in your maize field is to use a combination of integrated Striga management (ISM) technologies. According to Document 1 from Africa, this can include:\n\n1. Using Striga-resistant maize varieties as the foundation.\n2. Implementing maize-legume rotation to deplete the Striga seedbank.\n3. Adding legume intercropping (push-pull) to trigger suicidal germination.\n4. Applying nitrogen fertilizer to suppress strigolactone exudation.\n5. Deploying Fusarium oxysporum biocontrol where feasible.\n\nIt is emphasized that no single method suffices, and combining these complementary components is necessary for maximum effect.\n\nAdditionally, from the USA context, it is recommended to implement integrated management strategies combining cultural, chemical, and biological controls for effective long-term management of Striga asiatica (Document 1 from USA). This may include cultural control practices such as crop rotation, trap crops, deep tillage, and sanitation (Document 3 from USA).\n\nIt is also worth noting that monitoring and early detection are crucial for managing Striga infestations, and regular field inspections can help identify early signs of infestation (Document 1 from USA).\n\nIn general, it is essential to adopt a holistic approach that considers the specific conditions of your maize field and the local Striga population to develop an effective management strategy.\n\nPlease note that the provided context does not mention the specific efficacy of each method or the optimal combination of methods for your specific region or field conditions. Further research and experimentation may be necessary to determine the most effective management strategy for your specific situation."
19
+ },
20
+ {
21
+ "species": "Striga asiatica",
22
+ "region": "USA",
23
+ "model": "Llama-3 70B",
24
+ "mode": "Researcher",
25
+ "question": "What is the most effective way to manage Striga in my maize field?",
26
+ "availability": "**Common Name:** Witch weed\n\nInformation for **Striga asiatica** is available in region(s): **Africa, USA**.",
27
+ "answer": "Based on the provided context, here's a comprehensive answer to your question:\n\n**From the USA region:**\n\nFor effective management of Striga asiatica in your maize field, an integrated approach combining cultural, chemical, and biological controls is recommended (Document 1 from USA). This multi-faceted strategy can provide long-term management of Striga infestations.\n\nCultural control practices can play a crucial role in managing Striga. Implementing crop rotation with non-host crops can help reduce the Striga seed bank in the soil (Document 3 from USA). Additionally, planting trap crops that stimulate Striga germination but do not support its growth can help deplete the seed bank. Deep tillage can also be effective in burying Striga seeds deeper in the soil, reducing their viability. Furthermore, ensuring that farm equipment is clean can prevent the spread of Striga seeds between fields.\n\nChemical control options, such as pre-emergent and post-emergent herbicides, can also be used to control Striga infestations. Herbicides like imazapyr and glyphosate have been effective when used appropriately (Document 2 from USA).\n\n**From other regions:**\n\nWhile there is no specific information from the USA region on biological control options, research from Africa suggests that Fusarium oxysporum biocontrol can be an effective component of an integrated management strategy (Document 1 from Africa).\n\n**General knowledge:**\n\nIt's essential to note that monitoring and early detection are crucial for managing Striga infestations. Regular field inspections can help identify early signs of infestation, allowing for prompt action to be taken.\n\nIn summary, a comprehensive approach combining cultural, chemical, and biological controls, along with regular monitoring and early detection, is likely to be the most effective way to manage Striga in your maize field."
28
+ },
29
+ {
30
+ "species": "Spodoptera frugiperda",
31
+ "region": "India",
32
+ "model": "Llama-3 70B",
33
+ "mode": "Researcher",
34
+ "question": "What are the recommended IPM strategies for managing fall armyworm in my corn field?",
35
+ "availability": "**Common Name:** Fall Armyworm\n\nInformation for **Spodoptera frugiperda** is available in region(s): **India, Midwest USA, USA**.",
36
+ "answer": "Based on the provided context, here are the recommended IPM strategies for managing fall armyworm in your corn field in India:\n\n**From India:**\n\n1. **Cultural Control:** Implement crop rotation, intercropping, and planting shortly after major rainstorms to reduce pest incidence.\n2. **Monitoring:** Use pheromone traps (8 traps/40 lures per acre) for early detection and adult population monitoring, applied based on Economic Threshold Level (ETL).\n3. **Biological Control:** Apply Metarhizium anisopliae (entomopathogenic fungus) at 1 kg/acre.\n4. **Botanical/Biorational:** Use azadirachtin 1500 ppm (neem-based) at 1 L/acre.\n5. **Chemical Control (need-based):** Apply emamectin benzoate 5% SG at 100 g/acre, based on ETL, noting that resistance has developed to older insecticides like cypermethrin and lambda-cyhalothrin.\n\n**From other regions:**\n\n**From USA:**\n\n1. **Cultural Control/Management Practices:** Rotate crops to disrupt the lifecycle of the pest, adjust planting dates to avoid peak periods of armyworm activity, and practice field sanitation by removing crop residues and weeds that can serve as hosts.\n2. **Biological Control Options/Natural Enemies:** Conserve natural enemies like parasitoids (e.g., Trichogramma spp.) and predators (e.g., lady beetles and lacewings) through habitat management to help control fall armyworm populations.\n\n**General knowledge:** It is essential to adopt a holistic IPM approach, combining multiple strategies to manage fall armyworm effectively. Regular monitoring and threshold-based applications are crucial to minimize the development of resistance to control measures.\n\nPlease note that the recommended strategies may vary depending on the specific region, climate, and crop conditions. It is essential to consult with local experts and"
37
+ },
38
+ {
39
+ "species": "Spodoptera frugiperda",
40
+ "region": "USA",
41
+ "model": "Llama-3 70B",
42
+ "mode": "Researcher",
43
+ "question": "What are the recommended IPM strategies for managing fall armyworm in my corn field?",
44
+ "availability": "**Common Name:** Fall Armyworm\n\nInformation for **Spodoptera frugiperda** is available in region(s): **India, Midwest USA, USA**.",
45
+ "answer": "Based on the provided context, here are the recommended IPM strategies for managing fall armyworm in your corn field in the USA:\n\n**From USA:**\n\n1. **Chemical Control Options**: Insecticides may be used, but resistance management is critical due to the fall armyworm's ability to develop resistance. Use of insecticides should be based on monitoring and threshold levels to minimize unnecessary applications. Bacillus thuringiensis (Bt) products can be effective against larvae.\n2. **Biological Control Options/Natural Enemies**: Natural enemies include parasitoids such as Trichogramma spp. and predators like lady beetles and lacewings. Conservation of these natural enemies through habitat management can help control fall armyworm populations.\n3. **Cultural Control/Management Practices**:\n\t* **Crop Rotation**: Rotate crops to disrupt the lifecycle of the pest.\n\t* **Planting Date Adjustment**: Alter planting dates to avoid peak periods of armyworm activity.\n\t* **Field Sanitation**: Remove crop residues and weeds that can serve as hosts.\n\t* **Trap Crops**: Use trap crops to attract and manage populations away from main crops.\n\n**From Midwest USA:**\n\n1. **Scouting and Thresholds**: Begin scouting in mid-June and continue until silks dry in corn. Check whorls for larvae prior to tasseling in corn. Use a sweep net in soybean and alfalfa. Areas appearing drought stressed in alfalfa may indicate fall armyworm presence. Treat corn with foliar insecticides when 25% of plants are infested, caterpillars are small (<1 inch), and still exposed.\n2. **Management**: Consider managing grassy weeds, both in and near soybean and corn fields, to reduce armyworm problems. Natural enemies contribute to population reduction, especially in alfalfa. Transgenic Bt corn exhibits action against fall armyworm; however, some Bt proteins such as Cry1F and Vip3A"
46
+ },
47
+ {
48
+ "species": "Brassica tournefortii",
49
+ "region": "Midwest USA",
50
+ "model": "Llama-3 70B",
51
+ "mode": "Researcher",
52
+ "question": "What mechanical, chemical, and cultural control methods are recommended for managing Sahara mustard infestations?",
53
+ "availability": "**Common Name:** Sahara Mustard\n\nInformation for **Brassica tournefortii** is available in region(s): **Midwest USA, USA**.",
54
+ "answer": "Based on the provided context, here are the recommended mechanical, chemical, and cultural control methods for managing Sahara mustard infestations:\n\n**Mechanical Control (Midwest USA):**\nMonitoring the area for rodent caches and treating them accordingly is important. No other mechanical control methods are mentioned in the Midwest USA context.\n\n**Chemical Control (Midwest USA):**\nEarly applications of chemicals may control Sahara mustard due to its extremely early development. Applications should be particularly effective if started before native species have begun to develop.\n\n**Cultural Control (Midwest USA):**\nNo cultural control methods are mentioned in the Midwest USA context.\n\n**Additional Information from Other Regions:**\n\n**Cultural Control (USA):**\nManual removal is effective for small infestations, ensuring the entire root is removed to prevent regrowth. Mowing before seed set can reduce seed production but may need to be repeated due to staggered germination. Restoration of native vegetation can help outcompete Sahara Mustard.\n\n**Chemical Control (USA):**\nHerbicides such as glyphosate and 2,4-D can be effective when applied to young plants before they set seed. Pre-emergent herbicides can be used to prevent seed germination in heavily infested areas.\n\n**General Knowledge:**\nIt is essential to note that the effectiveness of these control methods may vary depending on the specific infestation site, soil type, and climate. A combination of methods may be necessary for optimal results. Additionally, prevention and early detection are crucial in managing Sahara mustard infestations, as they can prevent the establishment of a seed bank in the soil.\n\nIn summary, a comprehensive approach to managing Sahara mustard infestations may involve a combination of mechanical, chemical, and cultural control methods, including monitoring for rodent caches, early chemical applications, manual removal, mowing, and restoration of native vegetation."
55
+ }
56
+ ]
examples-work-out/species-verification.md CHANGED
@@ -4,47 +4,61 @@ The 4 app regions are: **Midwest USA**, **USA**, **India**, **Africa**.
4
 
5
  Each vignette maps to one row in the motivation table and demonstrates a specific PestIDBot capability. The user question should naturally trigger that capability.
6
 
 
 
 
 
7
  ---
8
 
9
  ## 1. Spotted Lanternfly (*Lycorma delicatula*)
10
 
11
  **Table highlight**: Life-stage variation (eggs, instars, adults); risk of misidentification → **Life-stage robust ID; abstains when uncertain (Stage 1)**
12
 
13
- **Region**: Midwest USA
14
 
15
- **User question**: "I found this insect on my grapevine — can you identify it?" *(uploads photo showing a specific life stage, e.g. late instar or egg mass)*
 
 
 
 
 
16
 
17
- **What PestIDBot does**: Correctly identifies the life stage despite visual variation across stages. If uncertain, abstains and asks for re-imaging.
18
-
19
- **Database needs**: Life-stage descriptions/images for SLF (egg mass, early nymph, late nymph, adult), reporting and quarantine info for the selected US state/region.
20
 
21
  ---
22
 
23
- ## 2. Witchweed (*Striga* spp.)
24
 
25
  **Table highlight**: Africa/India (smallholder constraints) → resistant varieties, intercrops; US → eradication, quarantine → **Region-filtered recommendations (Stage 2)**
26
 
27
- **Region**: Africa (primary), with cross-regional supplement from USA/India
28
-
29
- **User question**: "What is the most effective way to manage Striga in my maize field?" *(region: Africa)*
 
 
30
 
31
- **What PestIDBot does**: Returns African smallholder-appropriate IPM (resistant varieties, *Desmodium* push-pull, sanitation) first, then cross-regional supplements with transferability notes. Does NOT lead with US eradication protocols since those don't apply to the user's context.
32
-
33
- **Database needs**: African Striga IPM strategies (varieties, intercropping, sanitation, seed treatment). US eradication/quarantine info and India management info for cross-regional context.
34
 
35
  ---
36
 
37
- ## 3. Old World Bollworm (*Helicoverpa armigera*) [replaced with fall army worom with same regular aware guidance which had more regions. ]
38
-
39
- **Table highlight**: Management approved in US may not be approved in African/Indian systems → **Regulatory-aware guidance (Stage 2)**
40
 
41
- **Region**: Africa (to show the contrast — US chemicals are NOT recommended because they're unavailable/unapproved)
42
 
43
- **User question**: "How should I control bollworm in my cotton?" *(region: Africa)*
 
 
 
 
44
 
45
- **What PestIDBot does**: Provides region-appropriate controls (cultural practices, pheromone traps, biocontrol) instead of US-approved chemicals. Explicitly notes regulatory differences.
46
-
47
- **Database needs**: African bollworm management practices (cultural, biological, pheromone-based). US-approved chemicals for *H. armigera* (to know what NOT to recommend). Regulatory context on why they differ.
 
48
 
49
  ---
50
 
@@ -52,10 +66,15 @@ Each vignette maps to one row in the motivation table and demonstrates a specifi
52
 
53
  **Table highlight**: Port-of-entry screening; partial/low-light images; misidentification historically enabled spread → **Abstention → triage → escalate (Stage 1+2)**
54
 
55
- **Region**: USA (California)
56
-
57
- **User question**: "Can you identify this plant from my field survey?" *(uploads blurry/partial photo)*
58
 
59
- **What PestIDBot does**: Abstains (low confidence on poor image), provides triage steps: re-imaging instructions, specimen collection protocol, diagnostic lab contact, reporting portal. Does NOT guess.
 
 
 
 
 
60
 
61
- **Database needs**: Specimen collection and chain-of-custody protocol, California diagnostic lab contacts, CDFA reporting portals, re-imaging best practices.
 
 
 
4
 
5
  Each vignette maps to one row in the motivation table and demonstrates a specific PestIDBot capability. The user question should naturally trigger that capability.
6
 
7
+ **Database summary** (rebuilt 2026-03-03): **1410 total chunks** — Midwest USA: 388, USA: 913, India: 39, Africa: 70.
8
+
9
+ **Latest comprehensive example run**: `example_results_20260303_110427.json` (6 examples covering all 4 vignettes with cross-regional demos, using updated prompt — no "Document N" citations, no general knowledge filler)
10
+
11
  ---
12
 
13
  ## 1. Spotted Lanternfly (*Lycorma delicatula*)
14
 
15
  **Table highlight**: Life-stage variation (eggs, instars, adults); risk of misidentification → **Life-stage robust ID; abstains when uncertain (Stage 1)**
16
 
17
+ **Stage 1 reference**: InsectNet Figure 2 in Chiranjeevi et al. (2025) shows life-stage classification results.
18
 
19
+ **Stage 2 example (AgLLM)**:
20
+ - **Region**: Midwest USA (available in DB)
21
+ - **Source PDF**: ISU Yard and Garden Extension — Spotted Lanternfly (3 pages: life cycle, damage, reporting)
22
+ - **User question**: "How can I identify and report spotted lanternfly in Iowa?"
23
+ - **What PestIDBot does**: Returns life-stage descriptions (eggs, nymphs with color variations, adults) from ISU extension data, plus Iowa-specific reporting guidance.
24
+ - **Example output**: `example_results_20260303_110427.json` (Example 1)
25
 
26
+ **Database status**:
27
+ - Midwest USA: ✅ 6+ chunks from ISU PDF (3 pages ingested)
28
+ - USA: ✅ GPT-4o generated IPM info (5 chunks)
29
 
30
  ---
31
 
32
+ ## 2. Witchweed (*Striga asiatica*)
33
 
34
  **Table highlight**: Africa/India (smallholder constraints) → resistant varieties, intercrops; US → eradication, quarantine → **Region-filtered recommendations (Stage 2)**
35
 
36
+ **Stage 2 examples (AgLLM)**:
37
+ - **Regions**: Africa (primary), USA (cross-regional)
38
+ - **User question**: "What is the most effective way to manage Striga in my maize field?"
39
+ - **What PestIDBot does**: For Africa, returns smallholder-appropriate IPM (resistant varieties, *Desmodium* push-pull, *Fusarium oxysporum* biocontrol, nitrogen fertilizer). For USA, leads with integrated cultural/chemical/biological approach (crop rotation, trap crops, imazapyr/glyphosate herbicides), supplements with African biocontrol research.
40
+ - **Example output**: `example_results_20260303_110427.json` (Examples 2-3: Africa vs USA)
41
 
42
+ **Database status**:
43
+ - Africa: ✅
44
+ - USA: ✅
45
 
46
  ---
47
 
48
+ ## 3. Fall Armyworm (*Spodoptera frugiperda*) [replaced Old World Bollworm — more regions available]
 
 
49
 
50
+ **Table highlight**: India: biorationals, ETL monitoring, insecticide resistance awareness; USA: Bt crops, synthetic insecticides → **Resistance-aware, resource-appropriate guidance (Stage 2)**
51
 
52
+ **Stage 2 examples (AgLLM)**:
53
+ - **Regions**: India (primary), USA (cross-regional), Midwest USA
54
+ - **User question**: "What are the recommended IPM strategies for managing fall armyworm in my corn field?"
55
+ - **What PestIDBot does**: For India, foregrounds ETL-based pheromone monitoring, biorationals (neem, *Metarhizium*), and emamectin benzoate with resistance warnings. For USA, leads with Bt crops, resistance management, mechanized scouting with ISU thresholds (25% infestation).
56
+ - **Example output**: `example_results_20260303_110427.json` (Examples 4-5: India vs USA)
57
 
58
+ **Database status**:
59
+ - India: ✅
60
+ - Midwest USA: ✅
61
+ - USA: ✅
62
 
63
  ---
64
 
 
66
 
67
  **Table highlight**: Port-of-entry screening; partial/low-light images; misidentification historically enabled spread → **Abstention → triage → escalate (Stage 1+2)**
68
 
69
+ **Stage 1 reference**: WeedNet OOD detection and conformal prediction for abstention on poor images (Shen et al. 2025).
 
 
70
 
71
+ **Stage 2 example (AgLLM)**:
72
+ - **Region**: Midwest USA (available in DB), USA (cross-regional)
73
+ - **Source PDF**: University of Nevada Extension — Identifying and Managing Sahara Mustard (3 pages: ID, habitat/impact, management methods)
74
+ - **User question**: "What mechanical, chemical, and cultural control methods are recommended for managing Sahara mustard infestations?"
75
+ - **What PestIDBot does**: Returns Midwest USA data (early chemical applications, rodent cache monitoring) plus USA cross-regional supplements (manual removal, mowing before seed set, glyphosate/2,4-D herbicides, native vegetation restoration).
76
+ - **Example output**: `example_results_20260303_110427.json` (Example 6)
77
 
78
+ **Database status**:
79
+ - Midwest USA: ✅ 11+ chunks from extension PDF (3 pages ingested)
80
+ - USA: ✅ GPT-4o generated IPM info
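
The region-first behaviour described in the vignettes above (answer from the selected region, then supplement from other regions) can be sketched as a metadata filter over retrieved chunks. This is an illustrative sketch only: the `Chunk` fields, the `select_chunks` helper, its `k_primary`/`k_other` parameters, and the sample data are all hypothetical and are not taken from `app.py`.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    species: str
    region: str
    text: str
    score: float  # assumed: higher means closer to the query


def select_chunks(chunks, species, region, k_primary=4, k_other=2):
    """Split chunks for one species into selected-region hits and cross-regional supplements."""
    # Keep only the requested species, best score first.
    matching = sorted(
        (c for c in chunks if c.species == species),
        key=lambda c: c.score,
        reverse=True,
    )
    primary = [c for c in matching if c.region == region][:k_primary]
    supplements = [c for c in matching if c.region != region][:k_other]
    return primary, supplements


# Hypothetical sample data, loosely modelled on the Striga vignette.
chunks = [
    Chunk("Striga asiatica", "Africa", "Striga-resistant maize varieties", 0.91),
    Chunk("Striga asiatica", "USA", "Crop rotation with non-host crops", 0.88),
    Chunk("Striga asiatica", "Africa", "Maize-legume rotation", 0.84),
    Chunk("Spodoptera frugiperda", "India", "Pheromone traps for monitoring", 0.95),
]
primary, supplements = select_chunks(chunks, "Striga asiatica", "Africa")
```

Keeping the two lists separate lets a prompt present the selected region first and label the rest as cross-regional context, which is the ordering the vignettes describe.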
retrieval_evaluation_results.json CHANGED
@@ -1,60 +1,60 @@
1
  {
2
  "no_filter": {
3
  "precision@1": {
4
- "mean": 0.64,
5
- "std": 0.48,
6
  "count": 100
7
  },
8
  "precision@2": {
9
- "mean": 0.76,
10
- "std": 0.4270831300812524,
11
  "count": 100
12
  },
13
  "precision@4": {
14
- "mean": 0.81,
15
- "std": 0.39230090491866054,
16
  "count": 100
17
  },
18
  "precision@8": {
19
- "mean": 0.85,
20
- "std": 0.3570714214271425,
21
  "count": 100
22
  },
23
  "ndcg@1": {
24
- "mean": 0.64,
25
- "std": 0.48,
26
  "count": 100
27
  },
28
  "ndcg@2": {
29
- "mean": 0.7157115704285749,
30
- "std": 0.41895779074707734,
31
  "count": 100
32
  },
33
  "ndcg@4": {
34
- "mean": 0.7400183360093088,
35
- "std": 0.38986711559166154,
36
  "count": 100
37
  },
38
  "ndcg@8": {
39
- "mean": 0.754344341157148,
40
- "std": 0.3684202134419992,
41
  "count": 100
42
  }
43
  },
44
  "species_only": {
45
  "precision@1": {
46
- "mean": 0.68,
47
- "std": 0.466476151587624,
48
  "count": 100
49
  },
50
  "precision@2": {
51
- "mean": 0.84,
52
- "std": 0.3666060555964672,
53
  "count": 100
54
  },
55
  "precision@4": {
56
- "mean": 0.93,
57
- "std": 0.25514701644346144,
58
  "count": 100
59
  },
60
  "precision@8": {
@@ -63,35 +63,35 @@
63
  "count": 100
64
  },
65
  "ndcg@1": {
66
- "mean": 0.68,
67
- "std": 0.466476151587624,
68
  "count": 100
69
  },
70
  "ndcg@2": {
71
- "mean": 0.7809487605714333,
72
- "std": 0.36580132584863706,
73
  "count": 100
74
  },
75
  "ndcg@4": {
76
- "mean": 0.824562291732901,
77
- "std": 0.29154459287790513,
78
  "count": 100
79
  },
80
  "ndcg@8": {
81
- "mean": 0.8508003372990416,
82
- "std": 0.22561628946474954,
83
  "count": 100
84
  }
85
  },
86
  "region_only": {
87
  "precision@1": {
88
- "mean": 0.69,
89
- "std": 0.462493243193887,
90
  "count": 100
91
  },
92
  "precision@2": {
93
- "mean": 0.78,
94
- "std": 0.4142463035441596,
95
  "count": 100
96
  },
97
  "precision@4": {
@@ -100,40 +100,40 @@
100
  "count": 100
101
  },
102
  "precision@8": {
103
- "mean": 0.87,
104
- "std": 0.33630343441600474,
105
  "count": 100
106
  },
107
  "ndcg@1": {
108
- "mean": 0.69,
109
- "std": 0.462493243193887,
110
  "count": 100
111
  },
112
  "ndcg@2": {
113
- "mean": 0.7467836778214312,
114
- "std": 0.4100495706548825,
115
  "count": 100
116
  },
117
  "ndcg@4": {
118
- "mean": 0.7747039745636328,
119
- "std": 0.37245769233603554,
120
  "count": 100
121
  },
122
  "ndcg@8": {
123
- "mean": 0.7857743640416571,
124
- "std": 0.35433577663047267,
125
  "count": 100
126
  }
127
  },
128
  "species_and_region": {
129
  "precision@1": {
130
- "mean": 0.79,
131
- "std": 0.40730823708832603,
132
  "count": 100
133
  },
134
  "precision@2": {
135
- "mean": 0.91,
136
- "std": 0.2861817604250837,
137
  "count": 100
138
  },
139
  "precision@4": {
@@ -147,23 +147,23 @@
147
  "count": 100
148
  },
149
  "ndcg@1": {
150
- "mean": 0.79,
151
- "std": 0.40730823708832603,
152
  "count": 100
153
  },
154
  "ndcg@2": {
155
- "mean": 0.8657115704285749,
156
- "std": 0.2971736181074952,
157
  "count": 100
158
  },
159
  "ndcg@4": {
160
- "mean": 0.9036318671707767,
161
- "std": 0.19819838555518796,
162
  "count": 100
163
  },
164
  "ndcg@8": {
165
- "mean": 0.907500395243122,
166
- "std": 0.18377359707383636,
167
  "count": 100
168
  }
169
  }
 
1
  {
2
  "no_filter": {
3
  "precision@1": {
4
+ "mean": 0.5,
5
+ "std": 0.5,
6
  "count": 100
7
  },
8
  "precision@2": {
9
+ "mean": 0.72,
10
+ "std": 0.4489988864128729,
11
  "count": 100
12
  },
13
  "precision@4": {
14
+ "mean": 0.79,
15
+ "std": 0.40730823708832603,
16
  "count": 100
17
  },
18
  "precision@8": {
19
+ "mean": 0.87,
20
+ "std": 0.33630343441600474,
21
  "count": 100
22
  },
23
  "ndcg@1": {
24
+ "mean": 0.5,
25
+ "std": 0.5,
26
  "count": 100
27
  },
28
  "ndcg@2": {
29
+ "mean": 0.6388045457857207,
30
+ "std": 0.4236799147359738,
31
  "count": 100
32
  },
33
  "ndcg@4": {
34
+ "mean": 0.6724180769471885,
35
+ "std": 0.38940915964484785,
36
  "count": 100
37
  },
38
  "ndcg@8": {
39
+ "mean": 0.7001274694006319,
40
+ "std": 0.35105364601238853,
41
  "count": 100
42
  }
43
  },
44
  "species_only": {
45
  "precision@1": {
46
+ "mean": 0.69,
47
+ "std": 0.46249324319388707,
48
  "count": 100
49
  },
50
  "precision@2": {
51
+ "mean": 0.85,
52
+ "std": 0.3570714214271425,
53
  "count": 100
54
  },
55
  "precision@4": {
56
+ "mean": 0.97,
57
+ "std": 0.17058722109231983,
58
  "count": 100
59
  },
60
  "precision@8": {
 
63
  "count": 100
64
  },
65
  "ndcg@1": {
66
+ "mean": 0.69,
67
+ "std": 0.46249324319388707,
68
  "count": 100
69
  },
70
  "ndcg@2": {
71
+ "mean": 0.7909487605714333,
72
+ "std": 0.3578989169880122,
73
  "count": 100
74
  },
75
  "ndcg@4": {
76
+ "mean": 0.8467893540558369,
77
+ "std": 0.2505360340326592,
78
  "count": 100
79
  },
80
  "ndcg@8": {
81
+ "mean": 0.8575532873325957,
82
+ "std": 0.21977117994271048,
83
  "count": 100
84
  }
85
  },
86
  "region_only": {
87
  "precision@1": {
88
+ "mean": 0.53,
89
+ "std": 0.4990991885387111,
90
  "count": 100
91
  },
92
  "precision@2": {
93
+ "mean": 0.77,
94
+ "std": 0.4208325082500163,
95
  "count": 100
96
  },
97
  "precision@4": {
 
100
  "count": 100
101
  },
102
  "precision@8": {
103
+ "mean": 0.89,
104
+ "std": 0.31288975694324034,
105
  "count": 100
106
  },
107
  "ndcg@1": {
108
+ "mean": 0.53,
109
+ "std": 0.4990991885387111,
110
  "count": 100
111
  },
112
  "ndcg@2": {
113
+ "mean": 0.6814231408571497,
114
+ "std": 0.4014970336757105,
115
  "count": 100
116
  },
117
  "ndcg@4": {
118
+ "mean": 0.7150366720186176,
119
+ "std": 0.36120571502682,
120
  "count": 100
121
  },
122
  "ndcg@8": {
123
+ "mean": 0.7327460644720609,
124
+ "std": 0.33335934745276274,
125
  "count": 100
126
  }
127
  },
128
  "species_and_region": {
129
  "precision@1": {
130
+ "mean": 0.74,
131
+ "std": 0.4386342439892262,
132
  "count": 100
133
  },
134
  "precision@2": {
135
+ "mean": 0.89,
136
+ "std": 0.31288975694324034,
137
  "count": 100
138
  },
139
  "precision@4": {
 
147
  "count": 100
148
  },
149
  "ndcg@1": {
150
+ "mean": 0.74,
151
+ "std": 0.4386342439892262,
152
  "count": 100
153
  },
154
  "ndcg@2": {
155
+ "mean": 0.8346394630357188,
156
+ "std": 0.32107291980904945,
157
  "count": 100
158
  },
159
  "ndcg@4": {
160
+ "mean": 0.8832529941971865,
161
+ "std": 0.2080496269798261,
162
  "count": 100
163
  },
164
  "ndcg@8": {
165
+ "mean": 0.8871215222695319,
166
+ "std": 0.19476255984038815,
167
  "count": 100
168
  }
169
  }
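
The evaluation script is not part of this diff, but the numbers above are consistent with a simple setup: one relevant chunk per query, `precision@k` recorded as a 0/1 hit within the top k, `ndcg@k` as `1 / log2(rank + 1)`, and `std` as the population standard deviation. A minimal sketch under those assumptions follows; the function names and the rank distribution are hypothetical, chosen so that the output matches the new `no_filter` values for `precision@1`, `precision@2`, and `ndcg@2`.

```python
import math


def hit_at_k(rank, k):
    """1.0 if the relevant chunk is at 1-based position <= k, else 0.0 (None = not retrieved)."""
    return 1.0 if rank is not None and rank <= k else 0.0


def ndcg_at_k(rank, k):
    """nDCG@k with a single relevant chunk: the ideal DCG is 1, so nDCG = 1 / log2(rank + 1)."""
    if rank is None or rank > k:
        return 0.0
    return 1.0 / math.log2(rank + 1)


def summarize(values):
    """Mean and population standard deviation (divide by n), in the shape used by the JSON."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "std": math.sqrt(variance), "count": n}


# Hypothetical ranks: 50 queries hit at rank 1, 22 at rank 2, 28 miss the top 2.
ranks = [1] * 50 + [2] * 22 + [None] * 28
report = {
    "precision@1": summarize([hit_at_k(r, 1) for r in ranks]),
    "precision@2": summarize([hit_at_k(r, 2) for r in ranks]),
    "ndcg@2": summarize([ndcg_at_k(r, 2) for r in ranks]),
}
```

Under this reading, `ndcg@1` always equals `precision@1`, and the standard deviation of a 0/1 metric with mean p is `sqrt(p * (1 - p))`, which is why a mean of 0.5 is reported with a std of exactly 0.5.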
species-organized/PestID Species - Organized.xlsx CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c2c04042374df7b1c0122979a109e36553d9d7f3c560aa8e5a6cd258eca23066
3
- size 63579
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:965f330dfbc74381e712a8ea4e65633ed5a07fcae68c69dae797c1b533fdcc6b
3
+ size 98454
species-organized/species_analysis.png CHANGED

Git LFS Details

  • SHA256: fdc3ab7a7b0b81ec829fd73ab26c17207fa0f63001fa6bd71943cfe2e7bf6c7b
  • Pointer size: 131 Bytes
  • Size of remote file: 438 kB

Git LFS Details

  • SHA256: fb1dbc5e51cbb84cbc03f89078df6deef82ea00702c6335f77300eaca865f8bb
  • Pointer size: 131 Bytes
  • Size of remote file: 439 kB
species-organized/species_statistics.txt CHANGED
@@ -4,23 +4,23 @@ PEST SPECIES ANALYSIS - STATISTICS SUMMARY
4
 
5
  1. OVERALL SPECIES COUNTS
6
  ----------------------------------------
7
- Total species: 126
8
 
9
  By Region:
10
- Midwest USA: 80 species ( 63.5%)
11
- Africa : 35 species ( 27.8%)
12
- India : 11 species ( 8.7%)
13
 
14
  2. INSECT VS WEED DISTRIBUTION
15
  ----------------------------------------
16
  Overall:
17
- Insects: 65 (51.6%)
18
- Weeds: 61 (48.4%)
19
 
20
  By Region:
21
  Midwest USA:
22
- Insects: 44
23
- Weeds: 36
24
  Africa:
25
  Insects: 10
26
  Weeds: 25
@@ -36,7 +36,7 @@ Midwest USA:
36
  Std Dev: 11.74%
37
  Min: 40.00%
38
  Max: 100.00%
39
- Missing: 19 (23.8%)
40
 
41
  Africa:
42
  Mean: 89.81%
@@ -62,10 +62,10 @@ Overall (all regions):
62
 
63
  4. IPM INFORMATION COVERAGE
64
  ----------------------------------------
65
- Midwest USA: 0/80 species ( 0.0%)
66
  Africa : 35/35 species (100.0%)
67
  India : 11/11 species (100.0%)
68
- Overall : 46/126 species ( 36.5%)
69
 
70
  5. TOP 10 SPECIES BY ACCURACY
71
  ----------------------------------------
 
4
 
5
  1. OVERALL SPECIES COUNTS
6
  ----------------------------------------
7
+ Total species: 128
8
 
9
  By Region:
10
+ Midwest USA: 82 species ( 64.1%)
11
+ Africa : 35 species ( 27.3%)
12
+ India : 11 species ( 8.6%)
13
 
14
  2. INSECT VS WEED DISTRIBUTION
15
  ----------------------------------------
16
  Overall:
17
+ Insects: 66 (51.6%)
18
+ Weeds: 62 (48.4%)
19
 
20
  By Region:
21
  Midwest USA:
22
+ Insects: 45
23
+ Weeds: 37
24
  Africa:
25
  Insects: 10
26
  Weeds: 25
 
36
  Std Dev: 11.74%
37
  Min: 40.00%
38
  Max: 100.00%
39
+ Missing: 21 (25.6%)
40
 
41
  Africa:
42
  Mean: 89.81%
 
62
 
63
  4. IPM INFORMATION COVERAGE
64
  ----------------------------------------
65
+ Midwest USA: 0/82 species ( 0.0%)
66
  Africa : 35/35 species (100.0%)
67
  India : 11/11 species (100.0%)
68
+ Overall : 46/128 species ( 35.9%)
69
 
70
  5. TOP 10 SPECIES BY ACCURACY
71
  ----------------------------------------
species-organized/species_table.tex CHANGED
@@ -82,6 +82,7 @@ Midwest USA & \textit{Pirate bugs} & Pirate bugs & insect & — & No \\
82
  Midwest USA & \textit{Soldier beetles} & Soldier beetles & insect & — & No \\
83
  Midwest USA & \textit{Podisus maculiventris} & Spined soldier bug & insect & — & No \\
84
  Midwest USA & \textit{Tachinid flies} & Tachinid flies & insect & — & No \\
 
85
  Midwest USA & \textit{Empoasca fabae} & potato leafhopper & insect & 97.0 & No \\
86
  Midwest USA & \textit{Striacosta albicosta} & western bean cutworm & insect & 97.0 & No \\
87
  Midwest USA & \textit{Hypena scabra} & green cloverworm & insect & 96.0 & No \\
@@ -115,6 +116,7 @@ Midwest USA & \textit{Solanum ptycanthum} & Eastern black nightshade & weed &
115
  Midwest USA & \textit{Conyza canadensis} & Horseweed & weed & — & No \\
116
  Midwest USA & \textit{Kochia scoparia} & Kochia & weed & — & No \\
117
  Midwest USA & \textit{Sinapis arvensis} & Wild mustard & weed & — & No \\
 
118
  Midwest USA & \textit{Ambrosia artemisiifolia} & common Ragweed & weed & 95.0 & No \\
119
  Midwest USA & \textit{Stellaria media} & commonChickweed & weed & 95.0 & No \\
120
  Midwest USA & \textit{Equisetum arvense} & Field Horsetail & weed & 95.0 & No \\
 
82
  Midwest USA & \textit{Soldier beetles} & Soldier beetles & insect & — & No \\
83
  Midwest USA & \textit{Podisus maculiventris} & Spined soldier bug & insect & — & No \\
84
  Midwest USA & \textit{Tachinid flies} & Tachinid flies & insect & — & No \\
85
+ Midwest USA & \textit{Lycorma delicatula} & Spotted Lanternfly & insect & — & No \\
86
  Midwest USA & \textit{Empoasca fabae} & potato leafhopper & insect & 97.0 & No \\
87
  Midwest USA & \textit{Striacosta albicosta} & western bean cutworm & insect & 97.0 & No \\
88
  Midwest USA & \textit{Hypena scabra} & green cloverworm & insect & 96.0 & No \\
 
116
  Midwest USA & \textit{Conyza canadensis} & Horseweed & weed & — & No \\
117
  Midwest USA & \textit{Kochia scoparia} & Kochia & weed & — & No \\
118
  Midwest USA & \textit{Sinapis arvensis} & Wild mustard & weed & — & No \\
119
+ Midwest USA & \textit{Brassica tournefortii} & Sahara Mustard & weed & — & No \\
120
  Midwest USA & \textit{Ambrosia artemisiifolia} & common Ragweed & weed & 95.0 & No \\
121
  Midwest USA & \textit{Stellaria media} & commonChickweed & weed & 95.0 & No \\
122
  Midwest USA & \textit{Equisetum arvense} & Field Horsetail & weed & 95.0 & No \\
usa_ipm_checkpoint.json CHANGED
@@ -115,7 +115,237 @@
115
  "Cleome spinosa",
116
  "Medicago falcata",
117
  "Spodoptera frugiperda",
118
- "Striga asiatica"
119
  ],
120
- "last_run": "2026-02-04T21:47:41.950969"
121
  }
 
115
  "Cleome spinosa",
116
  "Medicago falcata",
117
  "Spodoptera frugiperda",
118
+ "Striga asiatica",
119
+ "Papaipema nebris",
120
+ "Helicoverpa zea",
121
+ "Lolium multiflorum",
122
+ "Lady beetles",
123
+ "Lygus lineolaris",
124
+ "Agrotis ipsilon",
125
+ "Vanessa cardui",
126
+ "Hypena scabra",
127
+ "Sitona hispidulus",
128
+ "Echinochloa crus-galli",
129
+ "Popillia japonica",
130
+ "Kochia scoparia",
131
+ "Tetranychus urticae",
132
+ "Lamium amplexicaule",
133
+ "Chaetocnema pulicaria",
134
+ "Chrysodeixis includens",
135
+ "Lacewings",
136
+ "Mythimna unipuncta",
137
+ "Cyperus esculentus",
138
+ "Setaria viridis",
139
+ "Setaria faberi",
140
+ "Hypera postica",
141
+ "Hydraecia immanis",
142
+ "Stellaria media",
143
+ "Colias eurytheme",
144
+ "Parasitoid wasps",
145
+ "Ground beetles",
146
+ "Microtechnites bractatus",
147
+ "Cerotoma trifurcata",
148
+ "Podisus maculiventris",
149
+ "Digitaria sanguinalis",
150
+ "Conyza canadensis",
151
+ "Loxostege cereralis",
152
+ "Equisetum arvense",
153
+ "Dectes texanus",
154
+ "Spodoptera ornithogalli",
155
+ "Stenolophus lecontei",
156
+ "Hypera zoilus",
157
+ "Trianthema triquetrum",
158
+ "Flower fly larvae",
159
+ "Aphis glycines",
160
+ "Ambrosia artemisiifolia",
161
+ "Agromyza frontella",
162
+ "Choristoneura parallela",
163
+ "Damsel bugs",
164
+ "Persicaria pensylvanica",
165
+ "Philaenus spumarius",
166
+ "Eleusine indica",
167
+ "Drosophila suzukii",
168
+ "Lactuca serriola",
169
+ "Abutilon theophrasti",
170
+ "Spodoptera frugiperda",
171
+ "Xanthium strumarium",
172
+ "Amaranthus palmeri",
173
+ "Brassica tournefortii",
174
+ "Delia platura",
175
+ "Setaria pumila",
176
+ "Calomycterus setarius",
177
+ "Ostrinia nubilalis",
178
+ "Amaranthus retroflexus",
179
+ "Helicoverpa armigera",
180
+ "Medicago minima",
181
+ "Sida spinosa",
182
+ "Chenopodium album",
183
+ "Halyomorpha halys",
184
+ "Soldier beetles",
185
+ "Solanum ptycanthum",
186
+ "Mirabilis nyctaginea",
187
+ "Equisetum hyemale",
188
+ "Amaranthus tuberculatus",
189
+ "Cleome rutidosperma",
190
+ "Aphis craccivora",
191
+ "Helianthus annuus",
192
+ "Sorghum bicolor",
193
+ "Cyperus difformis",
194
+ "Lycorma delicatula",
195
+ "Striacosta albicosta",
196
+ "Empoasca fabae",
197
+ "Cleome spinosa",
198
+ "Datura stramonium",
199
+ "Cleome viscosa",
200
+ "Striga asiatica",
201
+ "Pirate bugs",
202
+ "Amaranthus viridis",
203
+ "Ambrosia trifida",
204
+ "Sinapis arvensis",
205
+ "Bromus tectorum",
206
+ "Cyperus mindorensis",
207
+ "Euborellia annulipes",
208
+ "Odontota horni",
209
+ "Cyperus prolifer",
210
+ "Cleome aculeata",
211
+ "Medicago polymorpha",
212
+ "Cleome monophylla",
213
+ "Cyperus brevifolius",
214
+ "Daucus carota",
215
+ "Capsella bursa-pastoris",
216
+ "Euphorbia dentata",
217
+ "Icerya purchasi",
218
+ "Cyperus rotundus",
219
+ "Aphis spiraecola",
220
+ "Cyperus entrerianus",
221
+ "Salsola tragus",
222
+ "Trianthema portulacastrum",
223
+ "Spodoptera litura",
224
+ "Cyperus iria l.",
225
+ "Cleome houtteana",
226
+ "Cleome gynandra",
227
+ "Medicago lupulina",
228
+ "Cyperus haspan",
229
+ "Atherigona reversura",
230
+ "Tachinid flies",
231
+ "Nezara viridula",
232
+ "Medicago falcata",
233
+ "Resseliella maxima",
234
+ "Cyperus mindorensis",
235
+ "Spodoptera litura",
236
+ "Trianthema triquetrum",
237
+ "Amaranthus viridis",
238
+ "Vanessa cardui",
239
+ "Colias eurytheme",
240
+ "Daucus carota",
241
+ "Brassica tournefortii",
242
+ "Euphorbia dentata",
243
+ "Loxostege cereralis",
244
+ "Podisus maculiventris",
245
+ "Ground beetles",
246
+ "Abutilon theophrasti",
247
+ "Cyperus haspan",
248
+ "Odontota horni",
249
+ "Stellaria media",
250
+ "Sida spinosa",
251
+ "Cyperus brevifolius",
252
+ "Setaria pumila",
253
+ "Xanthium strumarium",
254
+ "Capsella bursa-pastoris",
255
+ "Persicaria pensylvanica",
256
+ "Digitaria sanguinalis",
257
+ "Chenopodium album",
258
+ "Lamium amplexicaule",
259
+ "Icerya purchasi",
260
+ "Calomycterus setarius",
261
+ "Aphis spiraecola",
262
+ "Ambrosia artemisiifolia",
263
+ "Halyomorpha halys",
264
+ "Solanum ptycanthum",
265
+ "Cyperus difformis",
266
+ "Flower fly larvae",
267
+ "Datura stramonium",
268
+ "Echinochloa crus-galli",
269
+ "Kochia scoparia",
270
+ "Lady beetles",
271
+ "Delia platura",
272
+ "Choristoneura parallela",
273
+ "Aphis craccivora",
274
+ "Hypera postica",
275
+ "Lygus lineolaris",
276
+ "Dectes texanus",
277
+ "Sinapis arvensis",
278
+ "Spodoptera frugiperda",
279
+ "Setaria viridis",
280
+ "Pirate bugs",
281
+ "Lacewings",
282
+ "Ostrinia nubilalis",
283
+ "Salsola tragus",
284
+ "Equisetum hyemale",
285
+ "Agromyza frontella",
286
+ "Damsel bugs",
287
+ "Agrotis ipsilon",
288
+ "Resseliella maxima",
289
+ "Cyperus prolifer",
290
+ "Helianthus annuus",
291
+ "Cleome houtteana",
292
+ "Lolium multiflorum",
293
+ "Microtechnites bractatus",
294
+ "Mythimna unipuncta",
295
+ "Sitona hispidulus",
296
+ "Parasitoid wasps",
297
+ "Eleusine indica",
298
+ "Soldier beetles",
299
+ "Striacosta albicosta",
300
+ "Tachinid flies",
301
+ "Ambrosia trifida",
302
+ "Atherigona reversura",
303
+ "Cleome monophylla",
304
+ "Cyperus entrerianus",
305
+ "Bromus tectorum",
306
+ "Cyperus iria l.",
307
+ "Conyza canadensis",
308
+ "Nezara viridula",
309
+ "Cyperus esculentus",
310
+ "Equisetum arvense",
311
+ "Mirabilis nyctaginea",
312
+ "Lactuca serriola",
313
+ "Hypera zoilus",
314
+ "Philaenus spumarius",
315
+ "Helicoverpa armigera",
316
+ "Aphis glycines",
317
+ "Amaranthus palmeri",
318
+ "Lycorma delicatula",
319
+ "Setaria faberi",
320
+ "Amaranthus tuberculatus",
321
+ "Cerotoma trifurcata",
322
+ "Spodoptera ornithogalli",
323
+ "Empoasca fabae",
324
+ "Euborellia annulipes",
325
+ "Amaranthus retroflexus",
326
+ "Chaetocnema pulicaria",
327
+ "Drosophila suzukii",
328
+ "Helicoverpa zea",
329
+ "Popillia japonica",
330
+ "Hypena scabra",
331
+ "Stenolophus lecontei",
332
+ "Cyperus rotundus",
333
+ "Tetranychus urticae",
334
+ "Hydraecia immanis",
335
+ "Papaipema nebris",
336
+ "Cleome spinosa",
337
+ "Sorghum bicolor",
338
+ "Chrysodeixis includens",
339
+ "Cleome rutidosperma",
340
+ "Medicago lupulina",
341
+ "Trianthema portulacastrum",
342
+ "Cleome gynandra",
343
+ "Cleome aculeata",
344
+ "Medicago minima",
345
+ "Medicago falcata",
346
+ "Striga asiatica",
347
+ "Cleome viscosa",
348
+ "Medicago polymorpha"
349
  ],
350
+ "last_run": "2026-03-03T10:44:42.836240"
351
  }
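Review note: the entries added in this hunk appear to include repeats. For example, "Papaipema nebris" is added at new lines 119 and 335, and "Striga asiatica" appears at new lines 118, 200, and 346. The following is a minimal sketch for spotting such repeats, assuming the checkpoint is a JSON object whose list-valued keys hold species names. The key that holds this list is not visible in the hunk, so `processed` below is a hypothetical name, and `find_duplicates` is an illustrative helper rather than part of the repository.

```python
import json
from collections import Counter


def find_duplicates(checkpoint: dict) -> dict:
    """Return {key: {name: count}} for every list-valued key with repeated strings."""
    report = {}
    for key, value in checkpoint.items():
        if not isinstance(value, list):
            continue  # skip scalar fields such as "last_run"
        counts = Counter(item for item in value if isinstance(item, str))
        repeated = {name: n for name, n in counts.items() if n > 1}
        if repeated:
            report[key] = repeated
    return report


# Inline sample standing in for usa_ipm_checkpoint.json.
# "processed" is an assumed key name, not taken from the diff.
sample = json.loads(
    '{"processed": ["Striga asiatica", "Papaipema nebris", "Striga asiatica"],'
    ' "last_run": "2026-03-03T10:44:42.836240"}'
)
print(find_duplicates(sample))  # {'processed': {'Striga asiatica': 2}}
```

Running the same function on the real file (loaded with `json.load`) would list every repeated name, which could help decide whether the checkpoint writer should deduplicate before appending.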
usa_ipm_requests.jsonl CHANGED
The diff for this file is too large to render. See raw diff
 
usa_ipm_results.jsonl CHANGED
The diff for this file is too large to render. See raw diff
 
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/27ac9297-abc2-406e-8919-7670a60055f1/link_lists.bin DELETED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{27ac9297-abc2-406e-8919-7670a60055f1 → ba4b6a9d-5012-4456-8635-7ea7d6cb7a71}/data_level0.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:bb76ae0f6ca830a9a048b0cf53962a78c88c5c7fcda63fc846077d3456eb3890
3
  size 62840000
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0df39a7224e80c717058f5c4e95a21faf300fe9102de146ce32a9f3e2e7d5a76
3
  size 62840000
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{27ac9297-abc2-406e-8919-7670a60055f1 → ba4b6a9d-5012-4456-8635-7ea7d6cb7a71}/header.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ec666c9828420c69fc6b597461d8c18487becec1527c7d1cff9b898cbb393c2d
3
  size 100
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f23645792d8664b1d497ca854c4459f7c29d757472fdb6591d327f0303495f93
3
  size 100
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/ba4b6a9d-5012-4456-8635-7ea7d6cb7a71/index_metadata.pickle ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dd94800fffc4e1b3aca003062b6cf6e8eb3c4683b77ed4eae5f2d378cd41d183
3
+ size 129852
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{27ac9297-abc2-406e-8919-7670a60055f1 → ba4b6a9d-5012-4456-8635-7ea7d6cb7a71}/length.bin RENAMED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:0f2fe12bccc27d05f521a0a00eb43912172a7113283d0a99b45af0b2da569582
3
  size 40000
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7e2dcff542de95352682dc186432e98f0188084896773f1973276b0577d5305
3
  size 40000
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/ba4b6a9d-5012-4456-8635-7ea7d6cb7a71/link_lists.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:38d6d79326a1813390ce0d391cfa648cb550eba165548e079e7dd1f6ea240ef6
3
+ size 12508
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3 CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:08dce2645f4aa422368f3fc4e975e58a518875b335458e83f60a0205c46f27b8
3
- size 10715136
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a8ba30aefc5e3dbf2fee25c2fe8f49eda6fe5396d31f8d8dbcc34660ed926392
3
+ size 17203200
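Review note: the binary files in these hunks are stored as Git LFS pointers, so each diff shows only the `version`, `oid`, and `size` lines. The sketch below parses two such pointers and compares their sizes, using the old and new `chroma.sqlite3` values from the hunk above. `parse_lfs_pointer` is a hypothetical helper written for illustration; it handles only the three fields shown here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


old = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:08dce2645f4aa422368f3fc4e975e58a518875b335458e83f60a0205c46f27b8\n"
    "size 10715136\n"
)
new = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:a8ba30aefc5e3dbf2fee25c2fe8f49eda6fe5396d31f8d8dbcc34660ed926392\n"
    "size 17203200\n"
)

growth = new["size"] - old["size"]
print(growth)  # 6488064 bytes larger after the rebuild
```

A changed `oid` with an unchanged `size` (as in `data_level0.bin`, `header.bin`, and `length.bin`) means the content was rewritten at the same length, whereas `chroma.sqlite3` grew.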