# agent.py: Intelligent Thematic Analysis Orchestrator
# Implements a ReAct (Reasoning and Acting) agent powered by Mistral AI.
# Adheres to the Braun & Clarke (2006) protocol for qualitative data analysis.

from dotenv import load_dotenv
load_dotenv()

from langchain_mistralai import ChatMistralAI
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
from tools import (
    load_scopus_csv,
    run_bertopic_discovery,
    label_topics_with_llm,
    consolidate_into_themes,
    compare_with_taxonomy,
    generate_comparison_csv,
    export_narrative,
)
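
# Illustrative sketch (hypothetical helper, not part of the original module):
# DIRECTIVE 5 in AGENT_CORE_PROTOCOL restricts BERTopic clustering to the
# 'Abstract' or 'Title' column, and Phase 1 asks the researcher for a
# 'run_key' (abstract/title). Assuming rows shaped like csv.DictReader output
# over a Scopus export, this shows one way to honor that rule. The real column
# handling lives in tools.load_scopus_csv and may differ.
from typing import Dict, Iterable, List

# Assumed mapping from run_key to the Scopus column that gets clustered.
RUN_KEY_TO_COLUMN = {"abstract": "Abstract", "title": "Title"}


def select_clustering_texts(rows: Iterable[Dict[str, str]], run_key: str) -> List[str]:
    """Return non-empty texts from the single column chosen by run_key.

    Keyword and citation columns are never read, mirroring DIRECTIVE 5.
    """
    key = run_key.strip().lower()
    if key not in RUN_KEY_TO_COLUMN:
        raise ValueError(f"run_key must be 'abstract' or 'title', got {run_key!r}")
    column = RUN_KEY_TO_COLUMN[key]
    texts = []
    for row in rows:
        text = (row.get(column) or "").strip()
        if text:  # skip rows whose chosen field is missing or blank
            texts.append(text)
    return texts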

# --- Agent Behavior Definition ---

AGENT_CORE_PROTOCOL = """
================================================================================
IDENTITY: Qualitative Research Assistant (Agentic)
================================================================================
You are an expert in computational thematic analysis, specifically trained to
execute the Braun & Clarke (2006) six-phase framework. You analyze academic 
corpora from Scopus to identify trends, codes, and themes.

Your environment is a Gradio interface with:
1. A persistent chat window for step-by-step guidance.
2. A Review Table for manual researcher validation of codes and themes.
3. Visualization tabs for inter-topic distance and hierarchy.
4. Download capabilities for official reports.

================================================================================
OPERATIONAL DIRECTIVES
================================================================================

DIRECTIVE 1: SEQUENTIAL EXECUTION
   Analyze one phase at a time. Do not skip steps or combine tools from 
   different phases into a single response.

DIRECTIVE 2: MANDATORY VALIDATION GATES (4 TOTAL)
   You MUST stop and wait for researcher confirmation at these points:
   - GATE 1: After Phase 2 (Generation of initial codes)
   - GATE 2: After Phase 3 (Synthesis of broader themes)
   - GATE 3: After Phase 4 (Saturation and coverage check)
   - GATE 4: After Phase 5.5 (Taxonomy alignment)
   
   Explicitly announce "⛔ VALIDATION GATE [N]" when reaching these stops.

DIRECTIVE 3: HUMAN-IN-THE-LOOP (REVIEW TABLE)
   All decisions regarding renaming, approving, or discarding findings occur 
   in the 'Review Table'. Never ask for approvals directly in chat text.

DIRECTIVE 4: DATA INTEGRITY
   Use only tool-generated outputs. Do not speculate on paper counts or 
   topic names that are not backed by the underlying data structures.

DIRECTIVE 5: COLUMN EXCLUSION
   Only perform clustering on the 'Abstract' or 'Title' columns. 
   Keywords and citation data are to be ignored for BERTopic clustering.

================================================================================
TOOL ARSENAL
================================================================================

1. load_scopus_csv: Initial data ingestion and cleanup. (Phase 1)
2. run_bertopic_discovery: Semantic clustering and chart generation. (Phase 2)
3. label_topics_with_llm: Automated induction of concept labels. (Phase 2)
4. consolidate_into_themes: High-level synthesis of related topics. (Phase 3)
5. compare_with_taxonomy: Alignment with the PAJAIS framework (25 categories). (Phase 5.5)
6. generate_comparison_csv: Cross-run validation (Abstract vs Title). (Phase 6)
7. export_narrative: Composition of the final Section 7 Discussion draft. (Phase 6)

================================================================================
EXECUTION PHASES (BRAUN & CLARKE 2006)
================================================================================

- Phase 1: Familiarizing yourself with the data. Run 'load_scopus_csv'. Ask for the 'run_key' (abstract/title).
- Phase 2: Generating initial codes. Run 'run_bertopic_discovery' then 'label_topics_with_llm'.
  * STOP GATE 1: Wait for Review Table submission.
- Phase 3: Searching for themes. Run 'consolidate_into_themes'.
  * STOP GATE 2: Validate theme groupings.
- Phase 4: Reviewing themes. Perform a saturation check.
  * STOP GATE 3: Confirm coverage.
- Phase 5: Defining and naming themes. Write definitions for each theme.
- Phase 5.5: PAJAIS Mapping. Run 'compare_with_taxonomy'. Identify NOVEL gaps.
  * STOP GATE 4: Final verification of mapping.
- Phase 6: Producing the report. Run 'generate_comparison_csv' and 'export_narrative'.

================================================================================
VERBAL STYLE
================================================================================
- Be scholarly, structured, and helpful.
- Use emojis (🔬, 📊, 🎯, ⛔) to demarcate status updates.
- Always include a progress line in the format:
  PHASE_STATUS: 1=✅,2=⬜,3=⬜,4=⬜,5=⬜,5.5=⬜,6=⬜

================================================================================
END OF PROTOCOL
================================================================================
"""

# --- Component Initialization ---

# Chat model that drives the ReAct loop; a low temperature (0.2) keeps tool choice and wording more consistent across runs
mistral_model_instance = ChatMistralAI(
    model="mistral-large-latest",
    temperature=0.2,
)

# Collection of specialized tools accessible to the agent
analysis_tool_suite = [
    load_scopus_csv,
    run_bertopic_discovery,
    label_topics_with_llm,
    consolidate_into_themes,
    compare_with_taxonomy,
    generate_comparison_csv,
    export_narrative,
]
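
# Illustrative sketch (hypothetical guard, not wired into create_react_agent):
# DIRECTIVE 1 forbids combining tools from different phases in one response.
# Using the phase tags from the TOOL ARSENAL section of the prompt, a caller
# could check the tool calls of a single model turn before executing them.
# The mapping below is copied by hand from the prompt and is an assumption
# about how a guard might be keyed (by tool name).
from typing import Dict, Iterable, Optional

# Phase of each tool, as listed in the TOOL ARSENAL section.
TOOL_PHASE: Dict[str, str] = {
    "load_scopus_csv": "1",
    "run_bertopic_discovery": "2",
    "label_topics_with_llm": "2",
    "consolidate_into_themes": "3",
    "compare_with_taxonomy": "5.5",
    "generate_comparison_csv": "6",
    "export_narrative": "6",
}


def single_phase_of(tool_names: Iterable[str]) -> Optional[str]:
    """Return the shared phase of a batch of tool calls, or None if they span phases.

    An empty batch also returns None; an unknown tool name raises KeyError.
    """
    phases = {TOOL_PHASE[name] for name in tool_names}
    return phases.pop() if len(phases) == 1 else None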

# In-memory checkpointer: keeps each conversation's state (keyed by thread_id) across turns; state is lost when the process restarts
session_memory_handler = MemorySaver()

# Assemble the ReAct agent; the protocol string is passed as its system prompt
agent = create_react_agent(
    model=mistral_model_instance,
    tools=analysis_tool_suite,
    checkpointer=session_memory_handler,
    prompt=AGENT_CORE_PROTOCOL,
)
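
# Illustrative sketch: because the agent is compiled with a MemorySaver
# checkpointer, LangGraph keys conversation state by the "thread_id" found in
# config["configurable"], so each call must supply one. Using one thread per
# researcher session keeps each session's gate history separate. The helper
# name and the session-id scheme are assumptions for illustration.
from typing import Any, Dict


def make_thread_config(session_id: str) -> Dict[str, Any]:
    """Build the per-session config dict for agent.invoke / agent.stream."""
    if not session_id:
        raise ValueError("session_id must be a non-empty string")
    return {"configurable": {"thread_id": session_id}}


# Usage (not executed at import time, since it would call the Mistral API):
#   result = agent.invoke(
#       {"messages": [("user", "Load my Scopus export and start Phase 1.")]},
#       config=make_thread_config("researcher-01"),
#   )
#   print(result["messages"][-1].content)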

# Protocol check: AGENT_CORE_PROTOCOL defines 4 mandatory validation gates (after Phases 2, 3, 4, and 5.5).
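
# Illustrative sketch (hypothetical helper for the Gradio layer): DIRECTIVE 2
# makes the agent announce "⛔ VALIDATION GATE [N]" at each stop. Assuming N is
# written as a single digit from 1 to 4, with or without the brackets, the UI
# could detect the announcement and enable the Review Table for that gate.
import re
from typing import Optional

# Matches e.g. "⛔ VALIDATION GATE 2" or "⛔ VALIDATION GATE [2]"; the optional
# U+FE0F variation selector covers emoji rendered as "⛔️".
_GATE_PATTERN = re.compile(r"⛔\ufe0f?\s*VALIDATION GATE\s*\[?([1-4])\]?")


def detect_validation_gate(message: str) -> Optional[int]:
    """Return the gate number announced in an agent message, or None."""
    match = _GATE_PATTERN.search(message)
    return int(match.group(1)) if match else None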