Update agent.py
agent.py
@@ -189,15 +189,15 @@ Golden thread: CSV → Sentences → Vectors → Clusters → Topics
189   Tool 1: load_scopus_csv(filepath)
190   Load CSV, show columns, estimate sentence count.
191
192 - Tool 2: run_bertopic_discovery(run_key,
193 -
194
195   Tool 3: label_topics_with_llm(run_key)
196   5 nearest centroid sentences → Mistral only → initial topic labels.
197
198   Tool 4: verify_topic_labels_with_groq(run_key)
199   Run only when researcher types VERIFY at STOP GATE 1.
200 - Return Mistral vs Groq comparison in chat for manual verification.
201
202   Tool 5: consolidate_into_themes(run_key, theme_map)
203   Merge researcher-approved topic groups → recompute centroids → new evidence.

@@ -234,22 +234,20 @@ Golden thread: CSV → Sentences → Vectors → Clusters → Topics
234   - Researcher is active interpreter, not passive receiver of themes
235
236   Grootendorst (2022), arXiv:2203.05794 - BERTopic:
237 -
238 -
239 -
240 -
241 -
242 -
243 -
244 -
245 -
246 -
247 -
248 -
249 -
250 -
251 - - Cosine similarity = semantic relatedness
252 - - Same meaning clusters together regardless of exact wording
253
254   PACIS/ICIS Research Categories:
255   IS Design Science, HCI, E-Commerce, Knowledge Management,

@@ -281,8 +279,8 @@ When researcher uploads CSV or says "analyze":
281   Loaded [N] papers (~[M] sentences estimated)
282   Columns: Title ✓ | Abstract ✓ | Author Keywords (optional) ✓
283
284 -
285 -
286   contribute to MULTIPLE topics.
287
288   I can run 3 configurations:

@@ -290,15 +288,15 @@ When researcher uploads CSV or says "analyze":
290   2️⃣ **Title only** → what papers CLAIM to be about (author's framing)
291   3️⃣ **Keywords only** → author-declared focus areas (author keywords)
292
293 -
294
295   **Ready to proceed to Phase 2?**
296   • `run` → execute BERTopic discovery
297   • `run abstract` → single config
298   • `run title` → single config
299   • `run keywords` → single config
300 -
301 -
302
303   3. WAIT for researcher confirmation before proceeding.
304

@@ -310,11 +308,11 @@ When researcher uploads CSV or says "analyze":
310
311   After researcher confirms:
312
313 - 1. Call run_bertopic_discovery(run_key,
314   → Splits papers into sentences (regex, min 30 chars)
315   → Filters publisher boilerplate (copyright, license text)
316 -
317 -
318   → Finds 5 nearest centroid sentences per topic
319   → Saves Plotly HTML visualizations
320   → Saves embeddings + summaries checkpoints

@@ -325,7 +323,7 @@ After researcher confirms:
325   → Writes review table with Mistral labels by default
326   OPTIONAL: if researcher types `VERIFY` at STOP GATE 1,
327   call verify_topic_labels_with_groq(run_key) and present side-by-side
328 - Mistral vs Groq label comparison directly in chat.
329   NOTE: NO PACIS categories in Phase 2. PACIS comparison comes in Phase 5.5.
330
331   3. Present CODED data with EVIDENCE under each topic:

@@ -347,10 +345,10 @@ After researcher confirms:
347   4 Plotly visualizations saved (download below)
348
349   **Review these codes. Ready for Phase 3 (theme search)?**
350 - • `VERIFY` → run Groq labels and compare with Mistral in chat output
351   • `approve` → codes look good, move to theme grouping
352 -
353 -
354   • `show topic 4 papers` → see all paper titles in topic 4
355   • `code 2 looks wrong` → I will show why it was labeled that way
356

@@ -388,9 +386,10 @@ After researcher confirms:
388
389   7. If researcher questions a code:
390   → Show the 5 sentences that generated the label
391 -
392 -
393 -
394   → Offer re-run with adjusted parameters
395
396   ───────────────────────────────────────────────────────────────

@@ -614,9 +613,9 @@ After all requested run configs have finalized themes:
614   - ONLY call verify_taxonomy_mapping_with_groq when user explicitly says VERIFY
615   and the workflow is at STOP GATE 4 (post-Phase 5.5 mapping).
616   - ALWAYS call compare_with_taxonomy before claiming PAJAIS mappings.
617 - - Use
618 - - If too many topics (>200), suggest increasing
619 - - If too few topics (<20), suggest decreasing
620   - NEVER skip Phase 4 saturation check or Phase 5.5 taxonomy comparison.
621   - NEVER proceed to Phase 6 unless every run that was executed has completed Phase 5.5.
622   - NEVER invent topic labels; only present labels returned by Tool 3.

@@ -1032,14 +1031,15 @@ def _build_verify_chat_report(rows: list[dict]) -> str:
1032
1033       shown = rows[:VERIFY_CHAT_MAX_ROWS]
1034       header = [
1035 -         "| # | Mistral Label | Groq Label |",
1036 -         "|---|---|---|",
1037       ]
1038       lines = list(map(
1039           lambda r: (
1040               f"| {int(r.get('cluster_id', 0))} "
1041               f"| {_sanitize_markdown_cell(r.get('mistral_label') or r.get('label', ''))} "
1042 -             f"| {_sanitize_markdown_cell(r.get('groq_label', ''))}
1043           ),
1044           shown,
1045       ))

@@ -1120,9 +1120,9 @@ def _handle_verify_command(state: dict) -> tuple[str, dict]:
1120       report = _build_verify_chat_report(labels_rows)
1121
1122       reply = (
1123 -         "VERIFY complete. Groq topic labeling has been added for Phase 2 topics.\n\n"
1124           f"Verified topics: {verified_count}/{labelled_count}\n"
1125 -         "Mistral vs Groq comparison is shown below in chat.\n\n"
1126           f"{report}\n\n"
1127           "Compare labels, edit Rename To/Approve, then click Submit Review to continue.\n\n"
1128           "[STOP GATE 1 - AWAITING REVIEW TABLE SUBMISSION]"

189   Tool 1: load_scopus_csv(filepath)
190   Load CSV, show columns, estimate sentence count.
191
192 + Tool 2: run_bertopic_discovery(run_key, min_cluster_size, max_cluster_size)
193 + Split → embed → UMAP + HDBSCAN → centroid nearest 5 → Plotly charts.
194
195   Tool 3: label_topics_with_llm(run_key)
196   5 nearest centroid sentences → Mistral only → initial topic labels.
197
198   Tool 4: verify_topic_labels_with_groq(run_key)
199   Run only when researcher types VERIFY at STOP GATE 1.
200 + Return Mistral vs Groq-Ollama vs Groq-GPT comparison in chat for manual verification.
201
202   Tool 5: consolidate_into_themes(run_key, theme_map)
203   Merge researcher-approved topic groups → recompute centroids → new evidence.

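Tool 3 labels each topic from the 5 sentences nearest its centroid. The selection step can be sketched in standard-library Python as below. This is an illustration under stated assumptions, not the agent's code: `nearest_to_centroid` is a hypothetical name, embeddings are plain lists of floats, and cluster labels follow the HDBSCAN convention of -1 for noise.

```python
import math
from collections import defaultdict


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def nearest_to_centroid(
    embeddings: list[list[float]],
    labels: list[int],
    sentences: list[str],
    k: int = 5,
) -> dict[int, list[str]]:
    """Return the k sentences most cosine-similar to each cluster's centroid.

    Hypothetical helper; assumes HDBSCAN-style labels where -1 marks noise.
    """
    members: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        if label != -1:  # noise sentences belong to no topic
            members[label].append(idx)

    unit = [l2_normalize(v) for v in embeddings]
    evidence: dict[int, list[str]] = {}
    for label, idxs in members.items():
        dim = len(unit[idxs[0]])
        # Centroid = mean of the member vectors, rescaled to unit length.
        centroid = l2_normalize([sum(unit[i][d] for i in idxs) / len(idxs) for d in range(dim)])
        # Rank members by cosine similarity to the centroid, highest first.
        ranked = sorted(idxs, key=lambda i: sum(a * b for a, b in zip(unit[i], centroid)), reverse=True)
        evidence[label] = [sentences[i] for i in ranked[:k]]
    return evidence
```

Rescaling the centroid to unit length does not change the ranking; it only makes the scores true cosine values. With already L2-normalized SPECTER2 vectors, the per-member normalization is a no-op.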
234   - Researcher is active interpreter, not passive receiver of themes
235
236   Grootendorst (2022), arXiv:2203.05794 - BERTopic:
237 + - Modular: any embedding, any clustering, any dim reduction
238 + - UMAP + HDBSCAN is a common discovery stack for density-based topics
239 + - c-TF-IDF extracts distinguishing words per cluster
240 +
241 + McInnes et al. (2017) - HDBSCAN:
242 + - Density-based clustering with variable-density support
243 + - Allows noise points (unassigned sentences)
244 + - min_cluster_size controls granularity (lower = more topics)
245 + - max_cluster_size caps oversized clusters
246 +
247 + Cohan et al. (2020) - SPECTER; Singh et al. (2023) - SPECTER2:
248 + - SPECTER2 produces semantically aligned embeddings for scientific text
249 + - Cosine similarity = semantic relatedness
250 + - Same meaning clusters together regardless of exact wording
251
252   PACIS/ICIS Research Categories:
253   IS Design Science, HCI, E-Commerce, Knowledge Management,

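The c-TF-IDF bullet refers to BERTopic's class-based TF-IDF, which treats all sentences in a topic as one document. Grootendorst (2022) weights term t in class c as W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the term's frequency in the class, f(t) its frequency across all classes, and A the average number of words per class. The sketch below follows that formula with in-class frequencies L1-normalized; whitespace tokenization and the name `c_tf_idf_top_words` are simplifications for illustration, not BERTopic's API.

```python
import math
from collections import Counter


def c_tf_idf_top_words(docs_per_class: dict[int, list[str]], top_n: int = 3) -> dict[int, list[str]]:
    """Top-n distinguishing words per class using class-based TF-IDF.

    Simplified: lowercase whitespace tokens, no stop-word removal or n-grams.
    """
    # Treat all sentences of one class (topic) as a single document and count terms.
    counts = {c: Counter(" ".join(docs).lower().split()) for c, docs in docs_per_class.items()}

    total_freq: Counter = Counter()  # f(t): frequency of each term across all classes
    for cnt in counts.values():
        total_freq.update(cnt)
    avg_words = sum(sum(cnt.values()) for cnt in counts.values()) / len(counts)  # A

    top: dict[int, list[str]] = {}
    for c, cnt in counts.items():
        n_words = sum(cnt.values())
        scores = {
            # L1-normalized in-class frequency times log(1 + A / f(t))
            term: (freq / n_words) * math.log(1 + avg_words / total_freq[term])
            for term, freq in cnt.items()
        }
        # Sort by score, breaking ties alphabetically for deterministic output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        top[c] = ranked[:top_n]
    return top
```

A word that appears in every topic, such as a function word, gets a smaller log factor than a word confined to one topic, so it ranks lower even at equal in-class frequency.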
279   Loaded [N] papers (~[M] sentences estimated)
280   Columns: Title ✓ | Abstract ✓ | Author Keywords (optional) ✓
281
282 + Sentence-level approach: each abstract splits into ~10
283 + sentences, each becomes a SPECTER2 vector. One paper can
284   contribute to MULTIPLE topics.
285
286   I can run 3 configurations:

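Because clustering runs on sentences rather than whole abstracts, a paper's topics are the union of the topics its sentences fall into. A minimal sketch, assuming each sentence keeps the ID of its source paper and that unclustered sentences carry HDBSCAN's noise label -1; `topics_per_paper` is a hypothetical helper.

```python
from collections import defaultdict


def topics_per_paper(paper_ids: list[str], topics: list[int]) -> dict[str, set[int]]:
    """Map each paper to the set of topics its sentences were assigned to.

    paper_ids[i] is the source paper of sentence i and topics[i] its cluster label;
    -1 (HDBSCAN noise) is skipped, so a paper whose sentences are all noise is absent.
    """
    if len(paper_ids) != len(topics):
        raise ValueError("paper_ids and topics must be the same length")
    result: dict[str, set[int]] = defaultdict(set)
    for paper_id, topic in zip(paper_ids, topics):
        if topic != -1:
            result[paper_id].add(topic)
    return dict(result)
```

A paper with sentences in topics 3 and 7 is then counted as evidence for both, which is the "MULTIPLE topics" behavior the prompt describes.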
288   2️⃣ **Title only** → what papers CLAIM to be about (author's framing)
289   3️⃣ **Keywords only** → author-declared focus areas (author keywords)
290
291 + ⚙️ Defaults: UMAP + HDBSCAN (min_cluster_size=20, max_cluster_size=120), 5 nearest
292
293   **Ready to proceed to Phase 2?**
294   • `run` → execute BERTopic discovery
295   • `run abstract` → single config
296   • `run title` → single config
297   • `run keywords` → single config
298 + • `change min_cluster_size to 4` → more topics (smaller groups)
299 + • `change max_cluster_size to 100` → cap oversized clusters"
300
301   3. WAIT for researcher confirmation before proceeding.
302

308
309   After researcher confirms:
310
311 + 1. Call run_bertopic_discovery(run_key, min_cluster_size, max_cluster_size)
312   → Splits papers into sentences (regex, min 30 chars)
313   → Filters publisher boilerplate (copyright, license text)
314 + → Embeds with SPECTER2 (L2-normalized)
315 + → UMAP reduces dimensions for HDBSCAN clustering
316   → Finds 5 nearest centroid sentences per topic
317   → Saves Plotly HTML visualizations
318   → Saves embeddings + summaries checkpoints

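The first two sub-steps (sentence splitting with a 30-character minimum, then boilerplate filtering) can be sketched with the standard `re` module. The diff states only "regex, min 30 chars" and "copyright, license text", so the split rule, the boilerplate patterns, and the name `split_sentences` below are assumptions.

```python
import re

# Assumed splitting rule: break after ".", "!" or "?" followed by whitespace.
# Abbreviations such as "et al." will also trigger a split in this simplification.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Assumed boilerplate markers; the agent's actual filter list is not in the diff.
_BOILERPLATE = re.compile(
    r"©|\bcopyright\b|all rights reserved|creative commons|\blicen[cs]e[ds]?\b",
    re.IGNORECASE,
)


def split_sentences(text: str, min_chars: int = 30) -> list[str]:
    """Split an abstract into sentences, dropping short fragments and publisher boilerplate."""
    kept = []
    for raw in _SENTENCE_END.split(text):
        sentence = raw.strip()
        if len(sentence) < min_chars:
            continue  # too short to carry a codable idea
        if _BOILERPLATE.search(sentence):
            continue  # copyright or license text, not research content
        kept.append(sentence)
    return kept
```

A naive split on terminal punctuation over-splits around abbreviations like "et al."; the 30-character floor drops some of the short fragments this produces, but a production splitter may need an abbreviation list.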
323   → Writes review table with Mistral labels by default
324   OPTIONAL: if researcher types `VERIFY` at STOP GATE 1,
325   call verify_topic_labels_with_groq(run_key) and present side-by-side
326 + Mistral vs Groq-Ollama vs Groq-GPT label comparison directly in chat.
327   NOTE: NO PACIS categories in Phase 2. PACIS comparison comes in Phase 5.5.
328
329   3. Present CODED data with EVIDENCE under each topic:

345   4 Plotly visualizations saved (download below)
346
347   **Review these codes. Ready for Phase 3 (theme search)?**
348 + • `VERIFY` → run Groq-Ollama + Groq-GPT labels and compare with Mistral in chat output
349   • `approve` → codes look good, move to theme grouping
350 + • `re-run min_cluster_size=4` → more topics (smaller groups)
351 + • `re-run max_cluster_size=100` → cap oversized clusters
352   • `show topic 4 papers` → see all paper titles in topic 4
353   • `code 2 looks wrong` → I will show why it was labeled that way
354

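Phase 1 offers `change min_cluster_size to 4` and STOP GATE 1 offers `re-run min_cluster_size=4`; both set a single HDBSCAN parameter. One way to recognize either form, sketched under the assumption that the agent matches free-text replies with a regular expression (`parse_cluster_command` and its pattern are hypothetical):

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar for the two command styles listed in the prompt:
#   "change min_cluster_size to 4"   (Phase 1)
#   "re-run max_cluster_size=100"    (Phase 2, STOP GATE 1)
_CLUSTER_CMD = re.compile(
    r"^\s*(?:change|re-run)\s+(min_cluster_size|max_cluster_size)(?:\s*=\s*|\s+to\s+)(\d+)\s*$",
    re.IGNORECASE,
)


def parse_cluster_command(text: str) -> Optional[Tuple[str, int]]:
    """Return (parameter, value) if text is a cluster-size command, else None."""
    match = _CLUSTER_CMD.match(text)
    if match is None:
        return None
    param, value = match.group(1).lower(), int(match.group(2))
    if param == "min_cluster_size" and value < 2:
        # HDBSCAN rejects min_cluster_size values below 2.
        raise ValueError("min_cluster_size must be at least 2")
    return param, value
```

Rejecting values below 2 early mirrors HDBSCAN's own constraint on min_cluster_size, so the researcher gets a clear message instead of a clustering error.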
386
387   7. If researcher questions a code:
388   → Show the 5 sentences that generated the label
389 + → Explain reasoning: "UMAP preserves semantic neighborhoods,
390 + and HDBSCAN finds dense groups without forcing every point
391 + into a cluster. These sentences share semantic proximity even
392 + if keywords differ."
393   → Offer re-run with adjusted parameters
394
395   ───────────────────────────────────────────────────────────────

613   - ONLY call verify_taxonomy_mapping_with_groq when user explicitly says VERIFY
614   and the workflow is at STOP GATE 4 (post-Phase 5.5 mapping).
615   - ALWAYS call compare_with_taxonomy before claiming PAJAIS mappings.
616 + - Use min_cluster_size=20, max_cluster_size=120 as default.
617 + - If too many topics (>200), suggest increasing min_cluster_size.
618 + - If too few topics (<20), suggest decreasing min_cluster_size.
619   - NEVER skip Phase 4 saturation check or Phase 5.5 taxonomy comparison.
620   - NEVER proceed to Phase 6 unless every run that was executed has completed Phase 5.5.
621   - NEVER invent topic labels; only present labels returned by Tool 3.

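The three new rules reduce to a small decision function. A sketch, assuming the topic count excludes the noise cluster; the 20/200 thresholds and the default of 20 come from the prompt, while the doubling/halving step and the name `suggest_min_cluster_size` are illustrative assumptions.

```python
from typing import Optional

# Values copied from the prompt rules; the step sizes below are assumptions.
DEFAULT_MIN_CLUSTER_SIZE = 20
TOO_MANY_TOPICS = 200
TOO_FEW_TOPICS = 20


def suggest_min_cluster_size(n_topics: int, current: int = DEFAULT_MIN_CLUSTER_SIZE) -> Optional[str]:
    """Return a re-run suggestion when the topic count falls outside the target band."""
    if n_topics > TOO_MANY_TOPICS:
        # Larger minimum cluster size -> fewer, broader topics.
        return f"re-run min_cluster_size={current * 2}"
    if n_topics < TOO_FEW_TOPICS:
        # Smaller minimum cluster size -> more, finer topics (HDBSCAN needs >= 2).
        return f"re-run min_cluster_size={max(2, current // 2)}"
    return None
```

The returned string reuses the `re-run min_cluster_size=N` command format from STOP GATE 1, so the suggestion can be pasted back as-is.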
1031
1032       shown = rows[:VERIFY_CHAT_MAX_ROWS]
1033       header = [
1034 +         "| # | Mistral Label | Groq-Ollama Label | Groq-GPT Label |",
1035 +         "|---|---|---|---|",
1036       ]
1037       lines = list(map(
1038           lambda r: (
1039               f"| {int(r.get('cluster_id', 0))} "
1040               f"| {_sanitize_markdown_cell(r.get('mistral_label') or r.get('label', ''))} "
1041 +             f"| {_sanitize_markdown_cell(r.get('groq_ollama_label') or r.get('groq_label', ''))} "
1042 +             f"| {_sanitize_markdown_cell(r.get('groq_gpt_label', ''))} |"
1043           ),
1044           shown,
1045       ))

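The new hunk widens the VERIFY table from three to four columns and falls back to the previous `groq_label` key when `groq_ollama_label` is missing. Below is a self-contained version for experimentation. It assumes a value for `VERIFY_CHAT_MAX_ROWS` and a simple `_sanitize_markdown_cell` (neither is shown in the diff), and it assumes the rows are joined with newlines, since the return statement is also outside the hunk. It uses a list comprehension in place of `list(map(lambda ...))`; for the same rows the output is identical.

```python
# Placeholder: the real VERIFY_CHAT_MAX_ROWS value is not shown in the diff.
VERIFY_CHAT_MAX_ROWS = 50


def _sanitize_markdown_cell(value: object) -> str:
    """Hypothetical stand-in: keep a value from breaking a Markdown table row."""
    text = str(value or "").replace("\n", " ").strip()
    return text.replace("|", "\\|")  # escaped pipes render literally in GFM tables


def build_verify_chat_report(rows: list[dict]) -> str:
    """Render Mistral vs Groq-Ollama vs Groq-GPT labels as a 4-column Markdown table."""
    shown = rows[:VERIFY_CHAT_MAX_ROWS]
    header = [
        "| # | Mistral Label | Groq-Ollama Label | Groq-GPT Label |",
        "|---|---|---|---|",
    ]
    # Groq-Ollama falls back to the pre-change 'groq_label' key, as in the hunk.
    lines = [
        f"| {int(r.get('cluster_id', 0))} "
        f"| {_sanitize_markdown_cell(r.get('mistral_label') or r.get('label', ''))} "
        f"| {_sanitize_markdown_cell(r.get('groq_ollama_label') or r.get('groq_label', ''))} "
        f"| {_sanitize_markdown_cell(r.get('groq_gpt_label', ''))} |"
        for r in shown
    ]
    return "\n".join(header + lines)
```

Rows written before this change (with only `label` and `groq_label`) still render, with an empty Groq-GPT cell.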
1120       report = _build_verify_chat_report(labels_rows)
1121
1122       reply = (
1123 +         "VERIFY complete. Groq-Ollama and Groq-GPT topic labeling has been added for Phase 2 topics.\n\n"
1124           f"Verified topics: {verified_count}/{labelled_count}\n"
1125 +         "Mistral vs Groq-Ollama vs Groq-GPT comparison is shown below in chat.\n\n"
1126           f"{report}\n\n"
1127           "Compare labels, edit Rename To/Approve, then click Submit Review to continue.\n\n"
1128           "[STOP GATE 1 - AWAITING REVIEW TABLE SUBMISSION]"
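The reply reports `verified_count/labelled_count`, but the diff does not show how either number is computed. One plausible reading, sketched here as an assumption: a topic counts as labelled when it has a Mistral label and as verified once both Groq-Ollama and Groq-GPT returned a non-empty label. `count_verified` is a hypothetical helper.

```python
def count_verified(rows: list[dict]) -> tuple[int, int]:
    """Return (verified_count, labelled_count) under an assumed definition.

    labelled: the row has a Mistral label (new 'mistral_label' key or legacy 'label').
    verified: labelled, and both Groq-Ollama and Groq-GPT labels are non-empty.
    """
    labelled = [r for r in rows if r.get("mistral_label") or r.get("label")]
    verified = [r for r in labelled if r.get("groq_ollama_label") and r.get("groq_gpt_label")]
    return len(verified), len(labelled)
```

Under this reading, a row carrying only the legacy `groq_label` counts as labelled but not verified, which would surface a partially completed VERIFY run in the "Verified topics" line.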