Commit · 0198d9a
1 Parent(s): 9e21e50
Update docs, gitignore, examples, and vector DB
- Add LaTeX artifacts and editor temp files to .gitignore
- Improve README with environment setup, examples/LaTeX commands, clearer DB build flow
- Simplify example runner to Africa + USA regions only
- Update species Excel and rebuild vector database
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- .gitignore +14 -0
- README.md +18 -7
- app.py +2 -3
- examples-work-out/example_results_20260301_191932.json +20 -0
- species-organized/PestID Species - Organized.xlsx +2 -2
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/data_level0.bin +0 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/header.bin +0 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/length.bin +1 -1
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/link_lists.bin +0 -0
- vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3 +2 -2
.gitignore
CHANGED
@@ -38,4 +38,18 @@ vector-databases-deployed-backup/
 __pycache__/
 .DS_Store
 
+# LaTeX build artifacts
+*.aux
+*.log
+*.out
+*.synctex.gz
+*.fls
+*.fdb_latexmk
+*.toc
+*.lof
+*.lot
+*.bbl
+*.blg
 
+# Editor temp files
+~$*
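The patterns added to `.gitignore` can be sanity-checked against sample file names. The sketch below is only an approximation: it assumes Python's `fnmatch` as a stand-in for Git's own matcher, which agrees for simple basename globs like these but does not model Git's rules for slashes or negation. The helper name `is_ignored` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitignore hunk in this commit.
NEW_PATTERNS = [
    # LaTeX build artifacts
    "*.aux", "*.log", "*.out", "*.synctex.gz", "*.fls", "*.fdb_latexmk",
    "*.toc", "*.lof", "*.lot", "*.bbl", "*.blg",
    # Editor temp files (for example Office lock files)
    "~$*",
]


def is_ignored(filename: str) -> bool:
    """Return True if the bare file name matches any of the new patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in NEW_PATTERNS)


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the patterns.
    for name in ["main.aux", "main.synctex.gz", "~$draft.xlsx", "main.tex"]:
        print(f"{name}: {'ignored' if is_ignored(name) else 'tracked'}")
```

For an authoritative answer on a real checkout, `git check-ignore -v <path>` reports which pattern, if any, matches a path.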
README.md
CHANGED
|
@@ -15,7 +15,8 @@ license: apache-2.0
|
|
| 15 |
### Environment
|
| 16 |
```bash
|
| 17 |
# Conda environment: agllm-june-15
|
| 18 |
-
|
|
|
|
| 19 |
|
| 20 |
# Required env vars (in .env file)
|
| 21 |
OPENAI_API_KEY=sk-proj-...
|
|
@@ -29,6 +30,8 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
|
|
| 29 |
| Build DB | `python app_database_prep.py` |
|
| 30 |
| Run Eval | `python retrieval_evaluation.py` |
|
| 31 |
| Run App | `python app.py` |
|
|
|
|
|
|
|
| 32 |
| Deploy Dev | `git push space3 fresh-start:main` |
|
| 33 |
| Deploy Prod | `git push space2 fresh-start:main` |
|
| 34 |
|
|
@@ -59,7 +62,14 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
|
|
| 59 |
│   ├── species_analysis.py # Generates paper Figure 3
|
| 60 |
│   └── species_table.tex # LaTeX species table
|
| 61 |
│
|
| 62 |
-
|
| 63 |
```
|
| 64 |
|
| 65 |
### Database Build Flow (4 Geographic Tiers)
|
|
@@ -72,15 +82,16 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
|
|
| 72 |
| India | 11 | Expert-curated Excel |
|
| 73 |
|
| 74 |
**Midwest USA Data (80 species):**
|
| 75 |
-
1. PDFs loaded from `agllm-data/
|
| 76 |
-
2. `matched_species_results_v2.csv` maps PDF filename → species name (metadata)
|
|
|
|
| 77 |
|
| 78 |
**USA Data (110 species - LLM generated):**
|
| 79 |
-
|
| 80 |
-
|
| 81 |
|
| 82 |
**Africa/India Data (35 + 11 species):**
|
| 83 |
-
|
| 84 |
|
| 85 |
**All Data:**
|
| 86 |
6. Documents chunked (512 tokens, 10 overlap)
|
|
|
|
| 15 |
### Environment
|
| 16 |
```bash
|
| 17 |
# Conda environment: agllm-june-15
|
| 18 |
+
# Python path: /Users/muhammadarbabarshad/miniconda3/envs/agllm-june-15/bin/python
|
| 19 |
+
source ~/miniconda3/etc/profile.d/conda.sh && conda deactivate && conda activate agllm-june-15
|
| 20 |
|
| 21 |
# Required env vars (in .env file)
|
| 22 |
OPENAI_API_KEY=sk-proj-...
|
|
|
|
| 30 |
| Build DB | `python app_database_prep.py` |
|
| 31 |
| Run Eval | `python retrieval_evaluation.py` |
|
| 32 |
| Run App | `python app.py` |
|
| 33 |
+
| Run Examples | `python app.py --examples` (saves JSON to `examples-work-out/`) |
|
| 34 |
+
| Generate LaTeX | `cd examples-work-out && python generate_latex.py <results.json>` |
|
| 35 |
| Deploy Dev | `git push space3 fresh-start:main` |
|
| 36 |
| Deploy Prod | `git push space2 fresh-start:main` |
|
| 37 |
|
|
|
|
| 62 |
│   ├── species_analysis.py # Generates paper Figure 3
|
| 63 |
│   └── species_table.tex # LaTeX species table
|
| 64 |
│
|
| 65 |
+
├── examples-work-out/
|
| 66 |
+
│   ├── generate_latex.py # JSON → LaTeX converter for paper examples
|
| 67 |
+
│   ├── example_results_*.json # Generated example outputs
|
| 68 |
+
│   ├── example_results_20260301_191932.json # Striga asiatica (Africa + USA) - used in paper supplementary
|
| 69 |
+
│   └── species-verification.md # Example vignettes motivation (flow is approximate, not exact)
|
| 70 |
+
│
|
| 71 |
+
└── writing/
|
| 72 |
+
    └── 65d4fadc59fceb1a54d1aae6/main.tex # Paper draft
|
| 73 |
```
|
| 74 |
|
| 75 |
### Database Build Flow (4 Geographic Tiers)
|
|
|
|
| 82 |
| India | 11 | Expert-curated Excel |
|
| 83 |
|
| 84 |
**Midwest USA Data (80 species):**
|
| 85 |
+
1. PDFs loaded from `agllm-data/agllm-data-isu-field-{insects,weeds}-all-species/` (content source)
|
| 86 |
+
2. `matched_species_results_v2.csv` in each folder maps PDF filename → species name (metadata); region hardcoded as `"Midwest USA"` in `app_database_prep.py`
|
| 87 |
+
3. The `Midwest USA` sheet in the Excel is a reference copy only - **not** read by `app_database_prep.py`
|
| 88 |
|
| 89 |
**USA Data (110 species - LLM generated):**
|
| 90 |
+
4. Run `generate_usa_ipm_info.py` to query GPT-4o for all species
|
| 91 |
+
5. Creates `USA` sheet in `species-organized/PestID Species - Organized.xlsx` - both content (IPM Info column) and metadata are read from this sheet
|
| 92 |
|
| 93 |
**Africa/India Data (35 + 11 species):**
|
| 94 |
+
6. `Africa` and `India` sheets in the same Excel file provide both content (IPM Info) and metadata
|
| 95 |
|
| 96 |
**All Data:**
|
| 97 |
6. Documents chunked (512 tokens, 10 overlap)
|
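The "512 tokens, 10 overlap" step in the README can be pictured as a sliding window over a token sequence. This is a minimal sketch only: the diff does not show which tokenizer or text splitter `app_database_prep.py` uses, so whitespace-separated words stand in for tokens here, and `chunk_tokens` is a hypothetical helper.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 10) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens with the previous window."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    # Assumption: whitespace words approximate tokens for illustration only.
    words = ("striga management " * 600).split()
    pieces = chunk_tokens(words)
    print(len(words), "tokens ->", [len(p) for p in pieces])
```

A small overlap such as 10 keeps a sentence that straddles a boundary partially visible in both neighbouring chunks without duplicating much text in the index.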
app.py
CHANGED
|
@@ -672,9 +672,8 @@ def run_examples():
|
|
| 672 |
from datetime import datetime
|
| 673 |
|
| 674 |
examples = [
|
| 675 |
-
{"species": "Striga asiatica", "region": "
|
| 676 |
-
{"species": "Striga asiatica", "region": "
|
| 677 |
-
{"species": "Striga asiatica", "region": "Midwest USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
|
| 678 |
]
|
| 679 |
|
| 680 |
results = []
|
|
|
|
| 672 |
from datetime import datetime
|
| 673 |
|
| 674 |
examples = [
|
| 675 |
+
{"species": "Striga asiatica", "region": "Africa", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
|
| 676 |
+
{"species": "Striga asiatica", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
|
|
|
|
| 677 |
]
|
| 678 |
|
| 679 |
results = []
|
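The README states that `python app.py --examples` saves JSON to `examples-work-out/`, and the file added in this commit is named `example_results_20260301_191932.json`. The saving code itself is not part of the hunk, so the following is a guess at its shape: `save_results` is a hypothetical helper, and the `%Y%m%d_%H%M%S` timestamp format is inferred from that file name.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_results(results: list, out_dir: str = "examples-work-out",
                 now: Optional[datetime] = None) -> Path:
    """Write results to <out_dir>/example_results_<YYYYMMDD_HHMMSS>.json and return the path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    out_path = Path(out_dir) / f"example_results_{stamp}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps any non-ASCII characters in answers readable.
    out_path.write_text(json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        demo = [{"species": "Striga asiatica", "region": "Africa"}]
        print(save_results(demo, out_dir=tmp).name)
```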
examples-work-out/example_results_20260301_191932.json
ADDED
|
@@ -0,0 +1,20 @@
|
|
| 1 |
+
[
|
| 2 |
+
{
|
| 3 |
+
"species": "Striga asiatica",
|
| 4 |
+
"region": "Africa",
|
| 5 |
+
"model": "Llama-3 70B",
|
| 6 |
+
"mode": "Researcher",
|
| 7 |
+
"question": "What is the most effective way to manage Striga in my maize field?",
|
| 8 |
+
"availability": "**Common Name:** Witch weed\n\nInformation for **Striga asiatica** is available in region(s): **Africa, USA**.",
|
| 9 |
+
"answer": "Based on the context from the selected region, Africa, the most effective way to manage Striga in your maize field is to use a combination of integrated Striga management (ISM) technologies. According to Document 1 from Africa, this can include:\n\n1. Using Striga-resistant maize varieties as the foundation.\n2. Implementing maize-legume rotation to deplete the Striga seedbank.\n3. Adding legume intercropping (push-pull) to trigger suicidal germination.\n4. Applying nitrogen fertilizer to suppress strigolactone exudation.\n5. Deploying Fusarium oxysporum biocontrol where feasible.\n\nIt is emphasized that no single method suffices, and combining these complementary components is necessary for maximum effect.\n\nAdditionally, from the same region, Document 3 from Africa provides information on the identification of Striga asiatica, which can aid in early detection and monitoring.\n\nFrom other regions, Document 1 from USA highlights the importance of monitoring and early detection, as well as the use of integrated management strategies combining cultural, chemical, and biological controls for effective long-term management of Striga asiatica.\n\nIt is worth noting that while there are ongoing studies into biological control options, such as the use of certain fungi and bacteria that can attack Striga seeds or seedlings (Document 2 from USA), these are not yet widely implemented.\n\nIn terms of cultural control/management practices, Document 3 from USA suggests crop rotation with non-host crops, deep plowing, use of resistant crop varieties, and maintaining high soil fertility as additional strategies to consider.\n\n**General knowledge:** It is essential to remember that Striga management requires a holistic approach, taking into account the specific agro-ecological conditions and farming practices of your region. Therefore, it is crucial to adapt and refine these strategies based on local conditions and research findings.\n\nIn summary, the most effective way to manage Striga in your maize field is to adopt an integrated approach, combining Striga-resistant maize"
|
| 10 |
+
},
|
| 11 |
+
{
|
| 12 |
+
"species": "Striga asiatica",
|
| 13 |
+
"region": "USA",
|
| 14 |
+
"model": "Llama-3 70B",
|
| 15 |
+
"mode": "Researcher",
|
| 16 |
+
"question": "What is the most effective way to manage Striga in my maize field?",
|
| 17 |
+
"availability": "**Common Name:** Witch weed\n\nInformation for **Striga asiatica** is available in region(s): **Africa, USA**.",
|
| 18 |
+
"answer": "Based on the provided context, here's a comprehensive answer to your question:\n\n**From the USA region:**\n\nFor effective management of Striga asiatica in your maize field, an integrated approach combining cultural, chemical, and biological controls is recommended (Document 1 from USA). This multi-faceted approach can provide long-term management of Striga infestations.\n\nCultural control practices that can be employed include crop rotation with non-host crops to reduce Striga seed banks in the soil, deep plowing to bury Striga seeds to depths where they cannot germinate, and using resistant crop varieties to limit the impact of Striga infestations (Document 3 from USA). Maintaining high soil fertility can also help crops outgrow the parasitic effects of Striga.\n\nChemical control options include the use of pre-emergent and post-emergent herbicides, such as imazapyr and glyphosate, which have been effective when used appropriately (Document 2 from USA).\n\n**From other regions:**\n\nIn Africa, research suggests that using Striga-resistant maize varieties as the foundation, combined with maize-legume rotation to deplete the seedbank, can be an effective approach (Document 1 from Africa). Additionally, legume intercropping (push-pull) can trigger suicidal germination, and applying nitrogen fertilizer can suppress strigolactone exudation. In some cases, deploying Fusarium oxysporum biocontrol may also be feasible.\n\n**General knowledge:**\n\nIt is essential to monitor your field regularly for early signs of Striga infestation, as early detection is crucial for effective management. Regular field inspections can help identify early signs of infestation, allowing for prompt action to be taken.\n\nIn summary, a comprehensive approach combining cultural, chemical, and biological controls, along with regular monitoring and early detection, is likely to be the most effective way to manage Striga in your maize field."
|
| 19 |
+
}
|
| 20 |
+
]
|
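The Africa answer in the added results file stops mid-sentence ("combining Striga-resistant maize"), while the USA answer ends with a full stop. A quick check like the one below could flag such records before they go into the paper. It is a heuristic sketch: the cause of the cut-off is not stated in the commit, the "ends with closing punctuation" rule is an assumption, and `looks_truncated` and `summarize` are hypothetical helpers.

```python
from typing import Dict, List

# Assumption: a complete answer ends with one of these characters.
CLOSING_CHARS = (".", "!", "?", ")", '"')


def looks_truncated(answer: str) -> bool:
    """Heuristic: flag answers that do not end with closing punctuation."""
    return not answer.rstrip().endswith(CLOSING_CHARS)


def summarize(results: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return region, word count, and a truncation flag for each result record."""
    return [
        {
            "region": record["region"],
            "words": len(record["answer"].split()),
            "truncated": looks_truncated(record["answer"]),
        }
        for record in results
    ]


if __name__ == "__main__":
    # Shortened stand-ins for the two records; only the endings are taken from the file.
    sample = [
        {"region": "Africa", "answer": "... adopt an integrated approach, combining Striga-resistant maize"},
        {"region": "USA", "answer": "... the most effective way to manage Striga in your maize field."},
    ]
    for row in summarize(sample):
        print(row)
```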
species-organized/PestID Species - Organized.xlsx
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c2c04042374df7b1c0122979a109e36553d9d7f3c560aa8e5a6cd258eca23066
|
| 3 |
+
size 63579
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/data_level0.bin
RENAMED
|
File without changes
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/header.bin
RENAMED
|
File without changes
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/length.bin
RENAMED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 40000
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:dbb9f2f038d6e0f4c5b95d4f1dcbfc4ad48bd314cef9cc46e4485055fcc1b756
|
| 3 |
size 40000
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/link_lists.bin
RENAMED
|
File without changes
|
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:92e1f0c264be839e55759bb55ffcfacec045c997fe19c52e9cca26c2f3b0be1c
|
| 3 |
+
size 10715136
|
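The Excel file, `length.bin`, and `chroma.sqlite3` appear in this diff as Git LFS pointer files (a `version` line, an `oid sha256:` line, and a `size` line) rather than as binary content. A local copy of the real file can be checked against its pointer roughly as follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the sketch handles only the three keys shown in the diff.

```python
import hashlib
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: Dict[str, Union[str, int]]) -> bool:
    """Return True if data has the byte size and SHA-256 digest recorded in the pointer."""
    return (pointer["algo"] == "sha256"
            and len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


if __name__ == "__main__":
    # Pointer text for the updated Excel file, copied from the diff.
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:c2c04042374df7b1c0122979a109e36553d9d7f3c560aa8e5a6cd258eca23066\n"
        "size 63579\n"
    )
    print(pointer["algo"], pointer["size"])
```

To verify a checkout, read the real file in binary mode and pass its bytes to `matches_pointer` together with the parsed pointer.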