arbabarshad Claude Opus 4.6 committed on
Commit 0198d9a · 1 Parent(s): 9e21e50

Update docs, gitignore, examples, and vector DB

- Add LaTeX artifacts and editor temp files to .gitignore
- Improve README with environment setup, examples/LaTeX commands, clearer DB build flow
- Simplify example runner to Africa + USA regions only
- Update species Excel and rebuild vector database

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

.gitignore CHANGED
@@ -38,4 +38,18 @@ vector-databases-deployed-backup/
  __pycache__/
  .DS_Store

+ # LaTeX build artifacts
+ *.aux
+ *.log
+ *.out
+ *.synctex.gz
+ *.fls
+ *.fdb_latexmk
+ *.toc
+ *.lof
+ *.lot
+ *.bbl
+ *.blg

+ # Editor temp files
+ ~$*
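To see which files the new ignore rules would catch, the patterns can be approximated in Python. This is only a rough sketch: it assumes that slash-free patterns match a file's basename at any depth (which holds for the rules added here) and it ignores negation, directory rules, and the patterns already present in the file. The helper name `is_newly_ignored` is hypothetical; `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns added to .gitignore in this commit (copied from the hunk above).
NEW_PATTERNS = [
    "*.aux", "*.log", "*.out", "*.synctex.gz", "*.fls", "*.fdb_latexmk",
    "*.toc", "*.lof", "*.lot", "*.bbl", "*.blg",
    "~$*",
]


def is_newly_ignored(path: str) -> bool:
    """Approximate check: does the basename match one of the new patterns?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in NEW_PATTERNS)


if __name__ == "__main__":
    for candidate in ["writing/main.aux", "writing/main.tex", "logs/run.log"]:
        print(candidate, is_newly_ignored(candidate))
```

Note that `*.log` and `*.out` are not LaTeX-specific: they would also hide files such as `logs/run.log` anywhere in the repository.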
README.md CHANGED
@@ -15,7 +15,8 @@ license: apache-2.0
  ### Environment
  ```bash
  # Conda environment: agllm-june-15
- source ~/miniconda3/etc/profile.d/conda.sh && conda activate agllm-june-15
+ # Python path: /Users/muhammadarbabarshad/miniconda3/envs/agllm-june-15/bin/python
+ source ~/miniconda3/etc/profile.d/conda.sh && conda deactivate && conda activate agllm-june-15

  # Required env vars (in .env file)
  OPENAI_API_KEY=sk-proj-...
@@ -29,6 +30,8 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
  | Build DB | `python app_database_prep.py` |
  | Run Eval | `python retrieval_evaluation.py` |
  | Run App | `python app.py` |
+ | Run Examples | `python app.py --examples` (saves JSON to `examples-work-out/`) |
+ | Generate LaTeX | `cd examples-work-out && python generate_latex.py <results.json>` |
  | Deploy Dev | `git push space3 fresh-start:main` |
  | Deploy Prod | `git push space2 fresh-start:main` |

@@ -59,7 +62,14 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
  │ ├── species_analysis.py # Generates paper Figure 3
  │ └── species_table.tex # LaTeX species table
  │
- └── writing/ # Paper drafts
+ ├── examples-work-out/
+ │ ├── generate_latex.py # JSON → LaTeX converter for paper examples
+ │ ├── example_results_*.json # Generated example outputs
+ │ ├── example_results_20260301_191932.json # Striga asiatica (Africa + USA) — used in paper supplementary
+ │ └── species-verification.md # Example vignettes motivation (flow is approximate, not exact)
+ │
+ └── writing/
+ └── 65d4fadc59fceb1a54d1aae6/main.tex # Paper draft
  ```

  ### Database Build Flow (4 Geographic Tiers)
@@ -72,15 +82,16 @@ OPENROUTER_API_KEY=... # optional, for Llama/Gemini
  | India | 11 | Expert-curated Excel |

  **Midwest USA Data (80 species):**
- 1. PDFs loaded from `agllm-data/.../raw-pdfs/` (content source)
- 2. `matched_species_results_v2.csv` maps PDF filename → species name (metadata)
+ 1. PDFs loaded from `agllm-data/agllm-data-isu-field-{insects,weeds}-all-species/` (content source)
+ 2. `matched_species_results_v2.csv` in each folder maps PDF filename → species name (metadata); region hardcoded as `"Midwest USA"` in `app_database_prep.py`
+ 3. The `Midwest USA` sheet in the Excel is a reference copy only — **not** read by `app_database_prep.py`

  **USA Data (110 species - LLM generated):**
- 3. Run `generate_usa_ipm_info.py` to query GPT-4o for all species
- 4. Creates "USA" sheet in Excel with IPM info for all US-present species
+ 4. Run `generate_usa_ipm_info.py` to query GPT-4o for all species
+ 5. Creates `USA` sheet in `species-organized/PestID Species - Organized.xlsx` — both content (IPM Info column) and metadata are read from this sheet

  **Africa/India Data (35 + 11 species):**
- 5. Excel `species-organized/PestID Species - Organized.xlsx` provides both content (IPM Info) and metadata
+ 6. `Africa` and `India` sheets in the same Excel file provide both content (IPM Info) and metadata

  **All Data:**
  6. Documents chunked (512 tokens, 10 overlap)
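The chunking step in the README ("512 tokens, 10 overlap") can be illustrated with a small sliding-window sketch. The diff does not show which tokenizer or text splitter `app_database_prep.py` uses, so the function below is an assumption: it works on an already-tokenized list, and the name `chunk_tokens` is hypothetical.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, size: int = 512, overlap: int = 10) -> List[list]:
    """Split a token sequence into windows of `size` tokens.

    Consecutive windows share `overlap` tokens, so each new window starts
    `size - overlap` tokens after the previous one.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    doc = list(range(1000))  # stand-in for 1000 token ids
    parts = chunk_tokens(doc)
    print([len(p) for p in parts])
```

With these defaults each chunk repeats the last 10 tokens of its predecessor, which gives the retriever a little shared context across chunk boundaries.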
app.py CHANGED
@@ -672,9 +672,8 @@ def run_examples():
      from datetime import datetime

      examples = [
-         {"species": "Striga asiatica", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
-         {"species": "Striga asiatica", "region": "Africa", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
-         {"species": "Striga asiatica", "region": "Midwest USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
+         {"species": "Striga asiatica", "region": "Africa", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
+         {"species": "Striga asiatica", "region": "USA", "model": "Llama-3 70B", "mode": "Researcher", "question": "What is the most effective way to manage Striga in my maize field?"},
      ]

      results = []
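The rest of `run_examples()` is outside this hunk. As a rough sketch of how such a runner might turn the `examples` list into a timestamped file under `examples-work-out/`, consider the following. The helper `save_example_results` and the `answer_fn` callback are hypothetical stand-ins for whatever the app actually calls, and the filename format is inferred from `example_results_20260301_191932.json`.

```python
import json
from datetime import datetime
from pathlib import Path


def save_example_results(examples, answer_fn, out_dir="examples-work-out", now=None):
    """Run answer_fn on every example and write the results to a JSON file.

    answer_fn is a placeholder for the app's real question-answering call.
    """
    now = now or datetime.now()
    results = [{**example, "answer": answer_fn(example)} for example in examples]

    out_path = Path(out_dir) / f"example_results_{now:%Y%m%d_%H%M%S}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

Passing `now` explicitly keeps the filename reproducible, which is convenient when the README or paper needs to point at one specific results file.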
examples-work-out/example_results_20260301_191932.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "species": "Striga asiatica",
+ "region": "Africa",
+ "model": "Llama-3 70B",
+ "mode": "Researcher",
+ "question": "What is the most effective way to manage Striga in my maize field?",
+ "availability": "**Common Name:** Witch weed\n\nInformation for **Striga asiatica** is available in region(s): **Africa, USA**.",
+ "answer": "Based on the context from the selected region, Africa, the most effective way to manage Striga in your maize field is to use a combination of integrated Striga management (ISM) technologies. According to Document 1 from Africa, this can include:\n\n1. Using Striga-resistant maize varieties as the foundation.\n2. Implementing maize-legume rotation to deplete the Striga seedbank.\n3. Adding legume intercropping (push-pull) to trigger suicidal germination.\n4. Applying nitrogen fertilizer to suppress strigolactone exudation.\n5. Deploying Fusarium oxysporum biocontrol where feasible.\n\nIt is emphasized that no single method suffices, and combining these complementary components is necessary for maximum effect.\n\nAdditionally, from the same region, Document 3 from Africa provides information on the identification of Striga asiatica, which can aid in early detection and monitoring.\n\nFrom other regions, Document 1 from USA highlights the importance of monitoring and early detection, as well as the use of integrated management strategies combining cultural, chemical, and biological controls for effective long-term management of Striga asiatica.\n\nIt is worth noting that while there are ongoing studies into biological control options, such as the use of certain fungi and bacteria that can attack Striga seeds or seedlings (Document 2 from USA), these are not yet widely implemented.\n\nIn terms of cultural control/management practices, Document 3 from USA suggests crop rotation with non-host crops, deep plowing, use of resistant crop varieties, and maintaining high soil fertility as additional strategies to consider.\n\n**General knowledge:** It is essential to remember that Striga management requires a holistic approach, taking into account the specific agro-ecological conditions and farming practices of your region. 
Therefore, it is crucial to adapt and refine these strategies based on local conditions and research findings.\n\nIn summary, the most effective way to manage Striga in your maize field is to adopt an integrated approach, combining Striga-resistant maize"
+ },
+ {
+ "species": "Striga asiatica",
+ "region": "USA",
+ "model": "Llama-3 70B",
+ "mode": "Researcher",
+ "question": "What is the most effective way to manage Striga in my maize field?",
+ "availability": "**Common Name:** Witch weed\n\nInformation for **Striga asiatica** is available in region(s): **Africa, USA**.",
+ "answer": "Based on the provided context, here's a comprehensive answer to your question:\n\n**From the USA region:**\n\nFor effective management of Striga asiatica in your maize field, an integrated approach combining cultural, chemical, and biological controls is recommended (Document 1 from USA). This multi-faceted approach can provide long-term management of Striga infestations.\n\nCultural control practices that can be employed include crop rotation with non-host crops to reduce Striga seed banks in the soil, deep plowing to bury Striga seeds to depths where they cannot germinate, and using resistant crop varieties to limit the impact of Striga infestations (Document 3 from USA). Maintaining high soil fertility can also help crops outgrow the parasitic effects of Striga.\n\nChemical control options include the use of pre-emergent and post-emergent herbicides, such as imazapyr and glyphosate, which have been effective when used appropriately (Document 2 from USA).\n\n**From other regions:**\n\nIn Africa, research suggests that using Striga-resistant maize varieties as the foundation, combined with maize-legume rotation to deplete the seedbank, can be an effective approach (Document 1 from Africa). Additionally, legume intercropping (push-pull) can trigger suicidal germination, and applying nitrogen fertilizer can suppress strigolactone exudation. In some cases, deploying Fusarium oxysporum biocontrol may also be feasible.\n\n**General knowledge:**\n\nIt is essential to monitor your field regularly for early signs of Striga infestation, as early detection is crucial for effective management. Regular field inspections can help identify early signs of infestation, allowing for prompt action to be taken.\n\nIn summary, a comprehensive approach combining cultural, chemical, and biological controls, along with regular monitoring and early detection, is likely to be the most effective way to manage Striga in your maize field."
+ }
+ ]
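The README describes `generate_latex.py` as a JSON → LaTeX converter, but its source is not part of this diff. The sketch below is therefore not that script; it only illustrates the two things any such converter has to handle for records like the ones above: escaping LaTeX special characters and mapping the Markdown `**bold**` markers in the answers to `\textbf{}`. The names `latex_escape` and `record_to_latex`, and the `\paragraph` layout, are assumptions.

```python
import re

# Characters with special meaning in LaTeX and their escaped forms.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}
_SPECIAL_RE = re.compile("|".join(re.escape(ch) for ch in _LATEX_SPECIALS))


def latex_escape(text: str) -> str:
    """Escape special characters in one pass so replacements are not re-escaped."""
    return _SPECIAL_RE.sub(lambda match: _LATEX_SPECIALS[match.group()], text)


def record_to_latex(record: dict) -> str:
    """Render one example record as a titled LaTeX paragraph."""
    title = latex_escape(f"{record['species']} ({record['region']})")
    question = latex_escape(record["question"])
    answer = latex_escape(record["answer"])
    # Markdown bold (**text**) becomes \textbf{text}; '*' is untouched by escaping.
    answer = re.sub(r"\*\*(.+?)\*\*", r"\\textbf{\1}", answer)
    return "\n".join([
        rf"\paragraph{{{title}}}",
        rf"\textit{{{question}}}",
        "",
        answer,
    ])
```

Escaping before converting the bold markers matters: doing it the other way round would escape the braces that `\textbf{}` introduces.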
species-organized/PestID Species - Organized.xlsx CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d9b0e9af388f34b0452a7df83a969df19da056c5e6ff6f03eef2ef8fe6556607
- size 75571
+ oid sha256:c2c04042374df7b1c0122979a109e36553d9d7f3c560aa8e5a6cd258eca23066
+ size 63579
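The Excel file and the vector database files are stored with Git LFS, so the diff only shows pointer files: a `version` line, an `oid` line with the hash algorithm and digest, and a `size` in bytes. A minimal sketch for reading such a pointer and comparing sizes follows; the function name `parse_lfs_pointer` is hypothetical, and the parser assumes exactly the three keys shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    old = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:d9b0e9af388f34b0452a7df83a969df19da056c5e6ff6f03eef2ef8fe6556607\n"
        "size 75571\n"
    )
    new = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:c2c04042374df7b1c0122979a109e36553d9d7f3c560aa8e5a6cd258eca23066\n"
        "size 63579\n"
    )
    print("size change in bytes:", new["size"] - old["size"])
```

A changed `oid` with an unchanged `size`, as in the `length.bin` pointer further down, still means the file contents differ.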
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/data_level0.bin RENAMED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/header.bin RENAMED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/length.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9ddf37297936a4b972cfb2c49f1215f38c32ec782e45fbfce6590540e91dbd02
+ oid sha256:dbb9f2f038d6e0f4c5b95d4f1dcbfc4ad48bd314cef9cc46e4485055fcc1b756
  size 40000
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/{6934d416-b9d8-4bd9-b666-9621dc3ed67f → 27ac9297-abc2-406e-8919-7670a60055f1}/link_lists.bin RENAMED
File without changes
vector-databases-deployed/db5-agllm-data-isu-field-insects-all-species/chroma.sqlite3 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0012d7100fc6b0082f7f8744c90432a2b9d55fa9250274c7e57bd3c3160d5ddc
- size 10678272
+ oid sha256:92e1f0c264be839e55759bb55ffcfacec045c997fe19c52e9cca26c2f3b0be1c
+ size 10715136