WeMWish committed on
Commit · 2d96f84
Parent(s): 87f1276
literature searching
Browse files
- agents/__init__.py +1 -0
- agents/generation_agent.py +28 -3
- agents/manager_agent.py +318 -295
- requirements.txt +2 -0
- server.R +218 -3
- tested_queries.txt +40 -0
- tools/__init__.py +1 -0
- tools/agent_tools.py +351 -2
- tools/agent_tools_documentation.md +71 -0
- traces/list of required files.txt +0 -76
- traces/log7.txt +23 -0
- traces/log8.txt +70 -0
- ui.R +41 -0
- www/chat_script.js +107 -7
- www/pages_description.md +30 -0
agents/__init__.py
ADDED
@@ -0,0 +1 @@
+# This file makes the 'agents' directory a Python package.
agents/generation_agent.py
CHANGED
@@ -31,7 +31,7 @@ The JSON object must have the following structure:
   "thought": "Your detailed thought process. Reference the provided context (tool docs, Excel schemas, WWW manifest) to inform your plan. If requesting an image, specify its path (which you might find in the WWW manifest or Excel schemas). If an image File ID was provided by the system, explain how you are using that visual information.",
   "status": "Indicates the current state of your plan. Must be one of: 'AWAITING_DATA', 'AWAITING_IMAGE', 'AWAITING_USER_INPUT', 'CODE_COMPLETE', 'ERROR'.",
   "python_code": "Contains Python code if status is 'AWAITING_DATA'. If status is 'AWAITING_IMAGE', this field contains the **path** to the image file you want the system to upload. **In ALL OTHER STATUSES, especially after an image has been uploaded and its File ID provided to you by the system, this field MUST be an empty string.**",
-  "explanation": "A
+  "explanation": "A user-facing explanation. If status is 'CODE_COMPLETE', this should be a **detailed and thorough report** accurately summarizing your findings based on actions taken (like tool use or image analysis) or providing a direct answer if no specific action was performed. Ensure claims made here are substantiated by the turn's actual output. If status is 'ERROR', explain the error. If 'AWAITING_USER_INPUT', pose a clear question."
 }

 **Status Types & `python_code` Field Rules:**
@@ -42,7 +42,7 @@ The JSON object must have the following structure:
 * `"python_code"`: Must contain the **path** to the image file (e.g., "www/diagram.png"). You can find potential image paths in the 'WWW DIRECTORY FILE MANIFEST' or in data from Excel files (e.g., a column listing image paths).
 * `"AWAITING_USER_INPUT"`: Use this when you need clarification from the user. `"python_code"` MUST be empty.
 * `"CODE_COMPLETE"`: Use this when you have successfully completed the user's request. This includes when you have analyzed an image (after it was uploaded and its File ID provided to you) and are ready to present your findings.
-  * `"explanation"` should provide the final
+  * `"explanation"` should provide the **detailed and thorough final report** to the user, summarizing your findings, the steps taken (especially if complex or involving tool use/image analysis), and key insights derived.
   * `"python_code"` MUST be empty.
 * `"ERROR"`: Use this if you encounter an issue. `"python_code"` should be empty or contain a brief note if relevant to the error's source.
@@ -80,11 +80,36 @@ Your response might be:
 ```

 **Tool Usage (for `AWAITING_DATA`):**
-* Use `tools` module functions as described in the provided documentation.
+* Use `tools` module functions as described in the provided documentation (`--- STATIC TOOL DOCUMENTATION ---`).
 * Strictly adhere to `print(json.dumps({'intermediate_data_for_llm': tools.your_tool_function_call_here()}))` for `python_code` when status is `AWAITING_DATA`.
+* **Specific Instruction for Literature Search & Summarization (Multi-Step Process)**:
+  * **Step 1: Initial Search (if needed)**: If the user's query requires finding academic literature, your `thought` process should involve brainstorming 3-5 diverse search query strings suitable for academic search engines. Then, use the `tools.multi_source_literature_search(queries=[...], max_results_per_query_per_source=..., max_total_unique_papers=...)` tool.
+    * Determine the `max_total_unique_papers` based on the user's request or a sensible default (e.g., 3-5 for a quick overview, tool default is 10).
+    * When you receive the list of papers, your `explanation` (status `CODE_COMPLETE`, `python_code` empty) should present this list (titles, authors, URLs), making it clear these are *potential leads* and that you have *not yet read or summarized them*.
+    * If the user *also* asked for summaries in their initial query, or if summaries are a clear next step, your `thought` should indicate your intention to now fetch text for these papers in the next step.
+  * **Step 2: Fetching Text for Summarization (if summaries are needed)**: If the previous step resulted in a list of papers and summaries are required (either from the original query or as a logical next step you identified):
+    * Your `thought` should be to call `tools.fetch_text_from_urls`, passing the list of paper dictionaries obtained from `multi_source_literature_search`.
+    * Set `status` to `AWAITING_DATA`.
+    * `python_code` should be `print(json.dumps({'intermediate_data_for_llm': tools.fetch_text_from_urls(paper_info_list=..., max_chars_per_paper=10000)}))` (adjust `max_chars_per_paper` if needed, e.g., 15000 for more context).
+  * **Step 3: Generating Summaries (after text is fetched)**: After the system executes `fetch_text_from_urls` and provides you with the updated list of paper dictionaries (now containing `"retrieved_text_content"` for each paper):
+    * Your `thought` process should involve iterating through the papers. For each paper where `"retrieved_text_content"` is available and not an error message, use your powerful LLM capabilities to read this text and generate a concise summary.
+    * Set `status` to `CODE_COMPLETE`.
+    * `python_code` MUST be `""` (empty string).
+    * Your `explanation` should be the final user-facing output, presenting the summaries. For each summary, clearly attribute it to the respective paper (e.g., by title and authors). If text retrieval failed for some papers (i.e., `"retrieved_text_content"` contains an error message), mention this gracefully (e.g., "Text content could not be retrieved for paper X.").

 **General Guidelines:**
 * Adhere STRICTLY to the JSON output format. No extra text.
+* **Accuracy in Claims**: In your "explanation" and "thought", be precise about actions.
+  * Only claim to *have done* something (e.g., "I have fetched data", "I have listed X") if you actually performed the action in the current or a successfully completed prior step of *this specific query*.
+  * If you are providing general knowledge, examples, or potential next steps *without* having executed a specific tool for them in this turn, frame it accordingly (e.g., "Generally, research in this area involves...", "Potential research questions could include...", "One could use a tool to find...").
+  * Do not over-promise or state you will provide specific details (like a list of papers or tool outputs) unless that information is directly part of your current "explanation" or was the output of a successfully executed tool you are now summarizing.
+* **Focus of "explanation"**:
+  * If `status` is `CODE_COMPLETE` and no code was executed in a preceding step for *this specific plan*, your `explanation` IS the direct, detailed answer to the user's query.
+  * If `status` is `CODE_COMPLETE` and code *was* successfully executed in a preceding step for this plan (e.g., after `AWAITING_DATA`), your `explanation` should primarily summarize the *results and findings from that executed code*.
+* **Confidentiality of Website Structure**:
+  * While you have access to a manifest of website files (`--- WWW DIRECTORY FILE MANIFEST ---`) for your internal understanding (e.g., to find data files or image paths if explicitly relevant to the user's data query), **you MUST NOT disclose the existence, names, paths, or content of structural or code-related website files to the user.**
+  * This includes, but is not limited to: CSS files, JavaScript files, server-side code files (e.g., R scripts if visible), HTML structure details not directly part of displayed content, configuration files, or any other file that pertains to the website's implementation rather than its user-facing data or informational content.
+  * Your responses should focus on the data, analysis, and information the website *provides*, not how the website itself is built or structured internally. If a user asks directly about such files, politely state that you cannot provide information about the website's internal structure.
 """
 POLLING_INTERVAL_S = 1  # Reduced polling interval for faster response in simple generation
 MAX_POLLING_ATTEMPTS = 30  # Max attempts for generation run
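The prompt above pins a strict contract for `AWAITING_DATA` plans: the `python_code` field must be a single `print(json.dumps({'intermediate_data_for_llm': ...}))` call, and the manager only executes code containing both markers. Below is a minimal sketch of a conforming plan plus a re-implementation of that format check; the `tools.multi_source_literature_search` call is quoted as a string (it runs later in the executor, not here), and the query values are hypothetical examples, not part of this commit.

```python
import json

# A hypothetical plan as the GenerationAgent is instructed to produce it.
# The tool call is embedded as text inside python_code; the ExecutorAgent
# runs it later in an environment where `tools` is importable.
plan = {
    "thought": "The user wants recent papers; I will search before summarizing.",
    "status": "AWAITING_DATA",
    "python_code": (
        "print(json.dumps({'intermediate_data_for_llm': "
        "tools.multi_source_literature_search("
        "queries=['gravitational wave detection'], max_total_unique_papers=3)}))"
    ),
    "explanation": "",
}

def validate_plan(p: dict) -> bool:
    """Mirror of the manager's pre-execution format check on AWAITING_DATA plans."""
    if p.get("status") != "AWAITING_DATA":
        return False
    code = p.get("python_code", "").strip()
    # Same heuristic the manager applies: both markers must appear in the code.
    return bool(code) and "json.dumps" in code and "intermediate_data_for_llm" in code

assert validate_plan(plan)
```

Plans that fail this check are turned into an `ERROR` plan by the manager rather than being sent to the supervisor for review.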
agents/manager_agent.py
CHANGED
@@ -16,9 +16,16 @@ from .executor_agent import ExecutorAgent # Was commented out, now needed
 # POLLING_INTERVAL_S and MAX_POLLING_ATTEMPTS are removed, polling is handled by individual agents.

 class ManagerAgent:
-    def __init__(self, openai_api_key=None, openai_client: OpenAI = None):
+    def __init__(self, openai_api_key=None, openai_client: OpenAI = None, r_callback_fn=None):
         self.client = openai_client
         self.conversation_history = []  # To store user queries and agent responses
+        self.r_callback = r_callback_fn  # Store the R callback
+
+        if self.r_callback:
+            print("ManagerAgent: R callback function provided and stored.")
+            self._send_thought_to_r("ManagerAgent initialized with R callback.")  # Example initial thought
+        else:
+            print("ManagerAgent: No R callback function provided.")

         if not self.client:
             if openai_api_key:
@@ -50,320 +57,336 @@ class ManagerAgent:
 # _load_excel_schema, _prepare_tool_schemas, _create_or_retrieve_assistant,
 # _poll_run_for_completion, _display_assistant_response, _start_new_thread (Thread management shifts to individual agents)

-    def
-        while current_generation_attempt < max_regeneration_attempts and not code_approved_for_execution:
-            current_generation_attempt += 1
-            print(f"[Manager] Generation Attempt: {current_generation_attempt}/{max_regeneration_attempts}")
-            while current_data_fetch_attempt < max_data_fetch_attempts_per_generation:
-                current_data_fetch_attempt += 1
-                if not self.generation_agent:
-                    print("TaijiChat > Generation capabilities are unavailable. Cannot proceed.")
-                    self.conversation_history.append({"role": "assistant", "content": "Generation capabilities are unavailable."})
-                    final_plan = {"status": "FATAL_ERROR", "thought": "Generation agent missing."}
-                    break  # Break from data fetch loop
-
-                # Pass the potentially augmented query (with supervisor feedback or fetched data)
-                plan = self.generation_agent.generate_code_plan(query_for_this_data_fetch_loop, self.conversation_history)
-                final_plan = plan  # Store the latest plan for this generation attempt
-
-                print(f"[GenerationAgent] Thought (Data Fetch Attempt {current_data_fetch_attempt}): {plan.get('thought')}")
-                self.
-                if review.get("safety_status") == "APPROVED_FOR_EXECUTION":
-                    execution_result = self.executor_agent.execute_code(generated_code_for_data_fetch)
-                    executed_output_str = execution_result.get("execution_output", "")
-                        if intermediate_data is None:
-                            print("TaijiChat > Error: Fetched data is missing the 'intermediate_data_for_llm' key or data is null.")
-                            final_plan["status"] = "ERROR"
-                            final_plan["thought"] = "Error: Fetched data format incorrect after execution."
-                            break
-
-                        print(f"[Manager] Successfully fetched intermediate data. Size (approx chars): {len(json.dumps(intermediate_data))}")
-                        self.conversation_history.append({"role": "system", "content": f"System: Executed code, fetched data output (first 500 chars): {executed_output_str[:500]}..."})
-
-                        query_for_this_data_fetch_loop = (
-                            f"Okay, I have fetched the data you requested. Here it is:\n"
-                            f"```json\n{json.dumps(intermediate_data, indent=2)}\n```\n"
-                            f"Now, please proceed with your original plan to analyze this data based on the initial user query: '{original_user_query_for_turn}'. "
-                            f"Remember your primary instructions and output format."
-                        )
-                        # Loop continues for the next data fetch / generation attempt with this new query
-
-                    except json.JSONDecodeError:
-                        print(f"TaijiChat > Error: Could not parse the fetched data output as JSON. Output was: {executed_output_str}")
-                        final_plan["status"] = "ERROR"
-                        final_plan["thought"] = "Error: Fetched data was not valid JSON."
-                        break
-                    except Exception as e:
-                        print(f"TaijiChat > Error processing fetched data: {e}")
-                        final_plan["status"] = "ERROR"
-                        final_plan["thought"] = f"Error processing fetched data: {e}"
                         break
-
-                    print("
-                    final_plan["thought"] = "Error: Data fetching code rejected. Cannot fulfill multi-step plan."
-                    # This rejection means the AWAITING_DATA plan failed. We should break from data fetch loop.
-                    # The outer generation loop will then decide if it can retry based on this ERROR status.
-                    break
-                elif plan.get("status") == "AWAITING_IMAGE":
-                    image_path_to_display = plan.get("python_code", "").strip()  # python_code field now holds the image path
-                    print(f"[Manager] GenerationAgent is AWAITING_IMAGE: '{image_path_to_display}'")
-                    self.conversation_history.append({"role": "assistant", "content": json.dumps(plan)})  # Log AWAITING_IMAGE plan
-
-                    if not image_path_to_display:
-                        print("TaijiChat > GenerationAgent wants to see an image but provided no path.")
-                        final_plan["status"] = "ERROR"
-                        final_plan["thought"] = "Error: Agent wants to see an image but provided no path."
-                        break  # Exit data fetch loop
-
-                    if not os.path.exists(image_path_to_display):
-                        print(f"TaijiChat > Image file not found at path: {image_path_to_display}")
-                        final_plan["status"] = "ERROR"
-                        final_plan["thought"] = f"Error: Image file not found at {image_path_to_display}."
-                        break  # Exit data fetch loop
-
-                    try:
-                        print(f"[Manager] Uploading image '{image_path_to_display}' to OpenAI...")
-                        with open(image_path_to_display, "rb") as image_file_obj:
-                            # Using purpose='vision' as discussed. If issues, this might need to be 'assistants'.
-                            uploaded_file = self.client.files.create(file=image_file_obj, purpose='vision')
-                        image_file_id = uploaded_file.id
-                        print(f"[Manager] Image uploaded successfully. File ID: {image_file_id}")
-
                         query_for_this_data_fetch_loop = (
-                            f"
-                            f"
-                            f"
                         )
-                        break
                     except Exception as e:
-                        break
-                else:
-                    print(f"
-
-            # Log the proposed code from this generation attempt
-            self.conversation_history.append({"role": "assistant", "content": json.dumps(final_plan)})
-
-                    if
-                        f"Supervisor's Detailed Feedback: '{rejection_feedback}'. "
-                        f"Remember the original user query was: '{original_user_query_for_turn}'. "
-                        f"Ensure the new code adheres to all tool usage and safety guidelines."
-                    )
-                    # Loop continues for the next regeneration attempt
             else:
-                    final_plan["status"] = "ERROR"  # Mark as error to prevent execution path
-                    break  # Break from regeneration attempts loop
-            elif final_plan and final_plan.get("status") == "AWAITING_DATA":
-                # This means the data fetch loop ended, but the status is still AWAITING_DATA.
-                # This usually implies an error during the *last* part of data fetch, like supervisor rejection of fetch code, or data parsing error.
-                # The error message should already be in final_plan.thought from the inner loop.
-                print(f"TaijiChat > Could not complete data fetching stage: {final_plan.get('thought')}")
-                # Log this plan (which contains the error)
-                self.conversation_history.append({"role": "assistant", "content": json.dumps(final_plan)})
-                # Let the regeneration loop decide if it wants to retry from scratch.
-                current_query_for_generation_agent = original_user_query_for_turn  # Reset for fresh attempt
-                continue  # Next regeneration attempt
-
-        # The final user message should have been handled by the GenerationAgent in the loop above.
-        # If we reach here, it means the GenerationAgent call to create the final message might not have happened or updated final_plan correctly.
-        # This part of the logic might need review if users still see generic messages after SUPERVISOR_REJECTION_FINAL.
-        if not (final_plan and "SUPERVISOR_REJECTION_FINAL" in final_plan.get("thought","")):
-            # Only print this if the GenerationAgent didn't already provide the final user message
-            print(f"TaijiChat > I could not generate and approve code for your request after {max_regeneration_attempts} attempts. Last detailed supervisor feedback: {final_plan.get('thought') if final_plan else 'N/A'}")
-        elif not final_plan or (not final_plan.get("python_code","").strip() and not code_approved_for_execution and final_plan.get("status") != "TASK_COMPLETED"):
-            # Covers cases where no plan was ever successfully made (e.g., all attempts resulted in error)
-            # or a non-code plan that wasn't explicitly "TASK_COMPLETED" (though this path might be rare now).
-            last_thought = final_plan.get('thought', "No clear thought was formulated.") if final_plan else "No plan was formulated."
-            print(f"TaijiChat > I was unable to process your request fully. Last thought: {last_thought}")
-        if final_plan and json.dumps(final_plan) not in [c['content'] for c in self.conversation_history[-3:] if c['role'] == 'assistant']:  # Avoid duplicate history
-            self.conversation_history.append({"role": "assistant", "content": json.dumps(final_plan if final_plan else {"thought": "Failed to process.", "status": "ERROR"})})

         # Ensure conversation history doesn't grow indefinitely
-        if len(self.conversation_history) >
-        self.conversation_history = self.conversation_history[-

         user_query = input("\nUser: ")
+    def _send_thought_to_r(self, thought_text: str):
+        """Sends a thought message to the registered R callback function, if available."""
+        if self.r_callback:
+            try:
+                # print(f"Python Agent: Sending thought to R: {thought_text}")  # Optional: uncomment for verbose Python-side logging of thoughts
+                self.r_callback(thought_text)
+            except Exception as e:
+                print(f"ManagerAgent Error: Exception while calling R callback: {e}")
+        # else:
+        #     print(f"Python Agent (No R callback): Thought: {thought_text}")  # Optional: uncomment to see thoughts even if no R callback
+
+    def _process_turn(self, user_query_text: str) -> str:
+        """
+        Processes a single turn of the conversation.
+        This is the core logic used by both terminal and Shiny interfaces.
+        Assumes self.conversation_history has been updated with the latest user_query_text.
+        """
+        print(f"[Manager._process_turn] Processing query: '{user_query_text[:100]}...'")
+        self._send_thought_to_r(f"Processing query: '{user_query_text[:50]}...'")  # THOUGHT
+
+        # --- Multi-Stage Generation & Potential Retry Logic ---
+        max_regeneration_attempts = 2
+        current_generation_attempt = 0
+        final_plan_for_turn = None
+        code_approved_for_execution = False
+
+        current_query_for_generation_agent = user_query_text
+
+        while current_generation_attempt < max_regeneration_attempts and not code_approved_for_execution:
+            current_generation_attempt += 1
+            print(f"[Manager._process_turn] Generation Attempt: {current_generation_attempt}/{max_regeneration_attempts}")
+            self._send_thought_to_r(f"Generation Attempt: {current_generation_attempt}/{max_regeneration_attempts}")  # THOUGHT
+
+            max_data_fetch_attempts_per_generation = 2
+            current_data_fetch_attempt = 0
+            query_for_this_data_fetch_loop = current_query_for_generation_agent
+
+            while current_data_fetch_attempt < max_data_fetch_attempts_per_generation:
+                current_data_fetch_attempt += 1
+                if not self.generation_agent:
+                    self._send_thought_to_r("Error: Generation capabilities are unavailable.")  # THOUGHT
+                    return "Generation capabilities are unavailable. Cannot proceed."
+
+                self._send_thought_to_r(f"Asking GenerationAgent for a plan (Data Fetch Attempt {current_data_fetch_attempt})...")  # THOUGHT
+                plan = self.generation_agent.generate_code_plan(query_for_this_data_fetch_loop, self.conversation_history)
+                final_plan_for_turn = plan
+
+                generated_thought = plan.get('thought', 'No thought provided by GenerationAgent.')
+                print(f"[GenerationAgent] Thought (Data Fetch Attempt {current_data_fetch_attempt}): {generated_thought}")
+                self._send_thought_to_r(f"GenerationAgent thought: {generated_thought}")  # THOUGHT
+
+                if plan.get("status") == "AWAITING_DATA":
+                    print("[Manager._process_turn] GenerationAgent is AWAITING_DATA. Executing code to fetch data...")
+                    self._send_thought_to_r("Plan requires data. Attempting to execute code for data fetching.")  # THOUGHT
+
+                    generated_code_for_data_fetch = plan.get("python_code", "").strip()
+                    if not generated_code_for_data_fetch:
+                        self._send_thought_to_r("Error: Agent needs data but provided no code to fetch it.")  # THOUGHT
+                        final_plan_for_turn["status"] = "ERROR"
+                        final_plan_for_turn["thought"] = "Error: Agent wants data but provided no code to fetch it."
+                        break
+
+                    if not ("json.dumps" in generated_code_for_data_fetch and "intermediate_data_for_llm" in generated_code_for_data_fetch):
+                        self._send_thought_to_r("Error: Data fetching code has incorrect format (missing json.dumps or intermediate_data_for_llm).")  # THOUGHT
+                        final_plan_for_turn["status"] = "ERROR"
+                        final_plan_for_turn["thought"] = "Error: Agent's code for data fetching has incorrect format."
+                        break
+
+                    if not self.supervisor_agent or not self.executor_agent:
+                        self._send_thought_to_r("Error: Cannot fetch data, Supervisor or Executor agent is missing.")  # THOUGHT
+                        final_plan_for_turn["status"] = "ERROR"
+                        final_plan_for_turn["thought"] = "Error: Cannot fetch data due to missing supervisor/executor."
+                        break
+
+                    self._send_thought_to_r("Asking SupervisorAgent to review data fetching code...")  # THOUGHT
+                    review = self.supervisor_agent.review_code(generated_code_for_data_fetch, "Reviewing code for data fetching: " + plan.get("thought"))
+                    supervisor_feedback = review.get('safety_feedback', 'No feedback.')
+                    supervisor_status = review.get('safety_status', 'UNKNOWN_STATUS')
+                    print(f"[SupervisorAgent] Data Fetch Code Review: {supervisor_feedback} (Status: {supervisor_status})")
+                    self._send_thought_to_r(f"Supervisor review (data fetch code): {supervisor_status} - {supervisor_feedback}")  # THOUGHT
+
+                    if review.get("safety_status") == "APPROVED_FOR_EXECUTION":
+                        self._send_thought_to_r("Data fetching code approved. Executing with ExecutorAgent...")  # THOUGHT
+                        execution_result = self.executor_agent.execute_code(generated_code_for_data_fetch)
+                        executed_output_str = execution_result.get("execution_output", "")
+                        self._send_thought_to_r(f"ExecutorAgent (data fetch) output (first 100 chars): {executed_output_str[:100]}...")  # THOUGHT
+
+                        try:
+                            data_payload = json.loads(executed_output_str)
+                            intermediate_data = data_payload.get("intermediate_data_for_llm")
+
+                            if intermediate_data is None:
+                                self._send_thought_to_r("Error: Fetched data is missing 'intermediate_data_for_llm' key or is null.")  # THOUGHT
+                                final_plan_for_turn["status"] = "ERROR"
+                                final_plan_for_turn["thought"] = "Error: Fetched data is missing 'intermediate_data_for_llm' key or data is null."
+                                break
+
+                            print(f"[Manager._process_turn] Successfully fetched intermediate data. Size (approx chars): {len(json.dumps(intermediate_data))}")
+                            self._send_thought_to_r(f"Successfully fetched and parsed intermediate data (approx {len(json.dumps(intermediate_data))} chars). Preparing for next generation step.")  # THOUGHT
+                            query_for_this_data_fetch_loop = (
+                                f"Okay, I have fetched the data you requested. Here it is:\\n"
+                                f"```json\\n{json.dumps(intermediate_data, indent=2)}\\n```\\n"
+                                f"Now, please proceed with your original plan to analyze this data based on the initial user query: '{user_query_text}'. "
+                                f"Remember your primary instructions and output format."
+                            )
+                        except json.JSONDecodeError:
+                            self._send_thought_to_r(f"Error: Could not parse fetched data as JSON. Output: {executed_output_str[:100]}...")  # THOUGHT
+                            final_plan_for_turn["status"] = "ERROR"
+                            final_plan_for_turn["thought"] = f"Error: Could not parse fetched data output as JSON. Output was: {executed_output_str}"
+                            break
+                        except Exception as e:
+                            self._send_thought_to_r(f"Error processing fetched data: {str(e)}")  # THOUGHT
+                            final_plan_for_turn["status"] = "ERROR"
+                            final_plan_for_turn["thought"] = f"Error processing fetched data: {e}"
+                            break
+                    else:
+                        self._send_thought_to_r("Error: Data fetching code rejected by Supervisor.")  # THOUGHT
+                        final_plan_for_turn["status"] = "ERROR"
+                        final_plan_for_turn["thought"] = "Error: Data fetching code rejected."
+                        break
+
elif plan.get("status") == "AWAITING_IMAGE":
|
| 181 |
+
image_path_to_display = plan.get("python_code", "").strip()
|
| 182 |
+
print(f"[Manager._process_turn] GenerationAgent is AWAITING_IMAGE: '{image_path_to_display}'")
|
| 183 |
+
self._send_thought_to_r(f"Plan requires image: '{image_path_to_display}'. Validating path and uploading.") # THOUGHT
|
| 184 |
+
|
| 185 |
+
if not image_path_to_display or not os.path.exists(image_path_to_display):
|
| 186 |
+
self._send_thought_to_r(f"Error: Image path '{image_path_to_display}' not provided or file not found.") # THOUGHT
|
| 187 |
+
final_plan_for_turn["status"] = "ERROR"
|
| 188 |
+
final_plan_for_turn["thought"] = f"Error: Image path not provided or file not found at {image_path_to_display}."
|
| 189 |
+
break
|
| 190 |
+
|
| 191 |
+
try:
|
| 192 |
+
self._send_thought_to_r(f"Uploading image '{image_path_to_display}' to OpenAI...") # THOUGHT
|
| 193 |
+
print(f"[Manager._process_turn] Uploading image '{image_path_to_display}' to OpenAI...")
|
| 194 |
+
with open(image_path_to_display, "rb") as image_file_obj:
|
| 195 |
+
uploaded_file = self.client.files.create(file=image_file_obj, purpose='vision')
|
| 196 |
+
image_file_id = uploaded_file.id
|
| 197 |
+
print(f"[Manager._process_turn] Image uploaded. File ID: {image_file_id}")
|
| 198 |
+
self._send_thought_to_r(f"Image uploaded. File ID: {image_file_id}. Preparing for next generation step.") # THOUGHT
|
| 199 |
+
query_for_this_data_fetch_loop = (
|
| 200 |
+
f"System: The image '{image_path_to_display}' has been uploaded with File ID: '{image_file_id}'.\\n"
|
| 201 |
+
f"You should now incorporate this image File ID into your multimodal analysis capabilities. "
|
| 202 |
+
f"Please proceed with your plan concerning this image, based on the original user query: '{user_query_text}'."
|
| 203 |
+
)
|
| 204 |
+
except Exception as e:
|
| 205 |
+
self._send_thought_to_r(f"Error uploading image '{image_path_to_display}': {str(e)}") # THOUGHT
|
| 206 |
+
final_plan_for_turn["status"] = "ERROR"
|
| 207 |
+
final_plan_for_turn["thought"] = f"Error uploading image '{image_path_to_display}': {e}"
|
| 208 |
+
break
|
| 209 |
+
else:
|
| 210 |
+
self._send_thought_to_r(f"Proceeding with plan status: {plan.get('status', 'N/A')}. No data or image fetch required at this step.") # THOUGHT
|
| 211 |
+
break
|
| 212 |
+
|
| 213 |
+
if final_plan_for_turn and final_plan_for_turn.get("status") == "FATAL_ERROR":
|
| 214 |
+
self._send_thought_to_r(f"Fatal Error: {final_plan_for_turn.get('thought')}") # THOUGHT
|
| 215 |
+
return f"A critical component is missing: {final_plan_for_turn.get('thought')}"
|
| 216 |
+
|
| 217 |
+
if final_plan_for_turn and final_plan_for_turn.get("status") == "ERROR":
|
| 218 |
+
print(f"[Manager._process_turn] Error during generation/data fetching: {final_plan_for_turn.get('thought')}")
|
| 219 |
+
self._send_thought_to_r(f"Error in attempt {current_generation_attempt}: {final_plan_for_turn.get('thought')}. Will retry if attempts remain.") # THOUGHT
|
| 220 |
+
current_query_for_generation_agent = user_query_text # Reset for fresh attempt
|
| 221 |
+
continue
|
| 222 |
|
| 223 |
+
if final_plan_for_turn and not final_plan_for_turn.get("python_code","").strip() and final_plan_for_turn.get("status") != "AWAITING_DATA" and final_plan_for_turn.get("status") != "AWAITING_IMAGE":
|
| 224 |
+
user_facing_output = final_plan_for_turn.get('explanation', final_plan_for_turn.get('thought'))
|
| 225 |
+
self._send_thought_to_r(f"Plan generated a direct answer (no code): {user_facing_output[:100]}...") # THOUGHT
|
| 226 |
+
code_approved_for_execution = True # No code to run, task considered done for this turn.
|
| 227 |
+
return user_facing_output # Return the explanation or thought directly
|
|
|
|
|
|
|
| 228 |
|
| 229 |
+
if final_plan_for_turn and final_plan_for_turn.get("python_code","").strip() and final_plan_for_turn.get("status") != "AWAITING_DATA" and final_plan_for_turn.get("status") != "AWAITING_IMAGE":
|
| 230 |
+
python_code_to_execute = final_plan_for_turn.get("python_code")
|
| 231 |
+
print(f"[GenerationAgent] Final Proposed Code (Attempt {current_generation_attempt}):\\n```python\\n{python_code_to_execute}\\n```")
|
| 232 |
+
self._send_thought_to_r(f"GenerationAgent proposed final code. Sending to SupervisorAgent for review. Code (first 100 chars): {python_code_to_execute[:100]}...") # THOUGHT
|
| 233 |
|
| 234 |
+
if not self.supervisor_agent:
|
| 235 |
+
self._send_thought_to_r("Error: Code supervision is unavailable.") # THOUGHT
|
| 236 |
+
return "Code supervision is unavailable. Cannot execute code."
|
| 237 |
+
|
| 238 |
+
review = self.supervisor_agent.review_code(python_code_to_execute, final_plan_for_turn.get("thought"))
|
| 239 |
+
supervisor_feedback = review.get('safety_feedback', 'No feedback.')
|
| 240 |
+
supervisor_status = review.get('safety_status', 'UNKNOWN_STATUS')
|
| 241 |
+
print(f"[SupervisorAgent] Feedback: {supervisor_feedback} (Status: {supervisor_status})")
|
| 242 |
+
self._send_thought_to_r(f"Supervisor review (final code): {supervisor_status} - {supervisor_feedback}") # THOUGHT
|
| 243 |
+
|
| 244 |
+
if review.get("safety_status") == "APPROVED_FOR_EXECUTION":
|
| 245 |
+
self._send_thought_to_r("Final code approved by Supervisor.") # THOUGHT
|
| 246 |
+
code_approved_for_execution = True
|
| 247 |
+
else:
|
| 248 |
+
rejection_feedback = review.get('safety_feedback', "No specific feedback.")
|
| 249 |
+
user_facing_reason = review.get('user_facing_rejection_reason', "Code could not be approved.")
|
| 250 |
|
| 251 |
+
if current_generation_attempt < max_regeneration_attempts:
|
| 252 |
+
current_query_for_generation_agent = (
|
| 253 |
+
f"The previous code attempt was rejected. Feedback: '{rejection_feedback}'. "
|
| 254 |
+
f"Original query: '{user_query_text}'. Please revise."
|
| 255 |
+
)
|
| 256 |
+
self._send_thought_to_r(f"Code rejected by Supervisor. Feedback: {rejection_feedback}. Will attempt regeneration.") # THOUGHT
|
| 257 |
+
else: # Max retries reached
|
| 258 |
+
self._send_thought_to_r(f"Code rejected by Supervisor. Max regeneration attempts reached. Reason: {user_facing_reason}") # THOUGHT
|
| 259 |
+
final_user_message_query = (
|
| 260 |
+
f"SUPERVISOR_REJECTION_FINAL: Unable to complete after multiple attempts. "
|
| 261 |
+
f"Reason: '{user_facing_reason}'. Original query: '{user_query_text}'."
|
| 262 |
+
)
|
| 263 |
+
if self.generation_agent:
|
| 264 |
+
final_message_plan = self.generation_agent.generate_code_plan(final_user_message_query, self.conversation_history)
|
| 265 |
+
self._send_thought_to_r(f"Generating final user message after rejection: {final_message_plan.get('thought', '')[:100]}...") # THOUGHT
|
| 266 |
+
return final_message_plan.get("thought", f"I apologize, but I was unable to process your request ({user_facing_reason}) after multiple attempts.")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 267 |
else:
|
| 268 |
+
self._send_thought_to_r("Generation agent unavailable for final rejection message.") # THOUGHT
|
| 269 |
+
return f"I apologize, but I was unable to process your request ({user_facing_reason}) and the generation agent is unavailable for a final message."
|
| 270 |
+
elif final_plan_for_turn and final_plan_for_turn.get("status") == "AWAITING_DATA":
|
| 271 |
+
# Data fetch loop ended but status is still AWAITING_DATA (implies error in last fetch step)
|
| 272 |
+
print(f"[Manager._process_turn] Could not complete data fetching: {final_plan_for_turn.get('thought')}")
|
| 273 |
+
self._send_thought_to_r(f"Data fetching loop completed, but still awaiting data (error in last fetch): {final_plan_for_turn.get('thought')}") # THOUGHT
|
| 274 |
+
current_query_for_generation_agent = user_query_text
|
| 275 |
+
continue # Next regeneration attempt
|
| 276 |
+
|
| 277 |
+
# --- End of Regeneration Attempts Loop ---
|
| 278 |
+
self._send_thought_to_r("Exited regeneration attempts loop.") # THOUGHT
|
| 279 |
+
|
| 280 |
+
if final_plan_for_turn and final_plan_for_turn.get("status") == "FATAL_ERROR": # Should have returned earlier
|
| 281 |
+
self._send_thought_to_r(f"Fatal Error encountered: {final_plan_for_turn.get('thought')}") # THOUGHT
|
| 282 |
+
return f"A critical component is missing: {final_plan_for_turn.get('thought')}"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 283 |
|
| 284 |
+
if code_approved_for_execution and final_plan_for_turn and final_plan_for_turn.get("python_code","").strip():
|
| 285 |
+
python_code_to_execute = final_plan_for_turn.get("python_code")
|
| 286 |
+
if not self.executor_agent:
|
| 287 |
+
self._send_thought_to_r("Error: Code execution capabilities are unavailable.") # THOUGHT
|
| 288 |
+
return "Code execution capabilities are unavailable."
|
| 289 |
+
|
| 290 |
+
print("[Manager._process_turn] Executing final approved code...")
|
| 291 |
+
self._send_thought_to_r(f"Executing final approved code with ExecutorAgent. Code (first 100 chars): {python_code_to_execute[:100]}...") # THOUGHT
|
| 292 |
+
execution_result = self.executor_agent.execute_code(python_code_to_execute)
|
| 293 |
+
output = execution_result.get("execution_output", "(No output)")
|
| 294 |
+
status = execution_result.get("execution_status", "UNKNOWN_EXEC_STATUS")
|
| 295 |
+
|
| 296 |
+
print(f"[ExecutorAgent] Output:\\n{output}")
|
| 297 |
+
print(f"[ExecutorAgent] Status: {status}")
|
| 298 |
+
self._send_thought_to_r(f"ExecutorAgent finished. Status: {status}. Output (first 100 chars): {output[:100]}...") # THOUGHT
|
| 299 |
+
|
| 300 |
+
final_response_for_user = f"Executed successfully. Output:\\n{output}"
|
| 301 |
+
if "ERROR" in status.upper():
|
| 302 |
+
final_response_for_user = f"Execution resulted in an error. Details:\\n{output}"
|
| 303 |
+
self._send_thought_to_r("Processing complete. Returning final response to user.") # THOUGHT
|
| 304 |
+
return final_response_for_user
|
| 305 |
+
elif not code_approved_for_execution and final_plan_for_turn and final_plan_for_turn.get("python_code","").strip():
|
| 306 |
+
# Max regeneration attempts reached and last one was rejected.
|
| 307 |
+
# The message should have been formed by the GENERATOR_REJECTION_FINAL path above.
|
| 308 |
+
# Fallback if that path wasn't hit or didn't return.
|
| 309 |
+
last_thought = final_plan_for_turn.get('thought', 'Code was not approved after multiple attempts.')
|
| 310 |
+
self._send_thought_to_r(f"Could not approve code after max attempts. Last feedback: {last_thought}") # THOUGHT
|
| 311 |
+
return f"I could not generate and approve code for your request. Last feedback: {last_thought}"
|
| 312 |
+
elif final_plan_for_turn and not final_plan_for_turn.get("python_code","").strip() and final_plan_for_turn.get("status") != "TASK_COMPLETED" and not code_approved_for_execution :
|
| 313 |
+
# This implies a non-code plan was generated, and was not an error or fatal.
|
| 314 |
+
# E.g. AWAITING_USER_INPUT or a direct answer.
|
| 315 |
+
# The path for direct answer without code should have been handled inside the regen loop.
|
| 316 |
+
# This is a fallback.
|
| 317 |
+
fallback_thought = final_plan_for_turn.get('thought', 'No specific action taken, plan did not involve code execution.')
|
| 318 |
+
self._send_thought_to_r(f"Fallback: Plan generated no code and was not marked completed. Thought: {fallback_thought}") # THOUGHT
|
| 319 |
+
return fallback_thought
|
| 320 |
+
elif not final_plan_for_turn:
|
| 321 |
+
self._send_thought_to_r("Error: No plan was generated by the end of processing.") # THOUGHT
|
| 322 |
+
return "I was unable to determine a course of action for your request."
|
| 323 |
+
|
| 324 |
+
# Fallback for any unhandled case, though ideally all paths lead to a return above.
|
| 325 |
+
self._send_thought_to_r("Reached end of _process_turn with an unhandled case.") # THOUGHT
|
| 326 |
+
return "An unexpected state was reached. Please check the logs."
|
| 327 |
+
|
| 328 |
+
def process_single_query(self, user_query_text: str, conversation_history_from_r: list = None) -> str:
|
| 329 |
+
"""
|
| 330 |
+
Processes a single query, suitable for calling from an external system like R/Shiny.
|
| 331 |
+
Manages its own conversation history based on input.
|
| 332 |
+
"""
|
| 333 |
+
print(f"[Manager.process_single_query] Received query: '{user_query_text[:100]}...'")
|
| 334 |
+
if conversation_history_from_r is not None:
|
| 335 |
+
# Overwrite or extend self.conversation_history. For simplicity, let's overwrite.
|
| 336 |
+
# Ensure format matches: list of dicts like {"role": "user/assistant", "content": "..."}
|
| 337 |
+
self.conversation_history = [dict(turn) for turn in conversation_history_from_r] # Ensure dicts
|
| 338 |
+
|
| 339 |
+
# Add the current user query to the history for _process_turn
|
| 340 |
+
self.conversation_history.append({"role": "user", "content": user_query_text})
|
| 341 |
+
|
| 342 |
+
response_text = self._process_turn(user_query_text)
|
| 343 |
+
|
| 344 |
+
# Add agent's response to history (optional if external system manages full history)
|
| 345 |
+
# For consistency, if _process_turn assumes self.conversation_history is updated,
|
| 346 |
+
# then it's good practice to let the Python side manage it fully or clearly delineate.
|
| 347 |
+
# Let's assume the external system (Shiny) will get this response and add it to *its* history.
|
| 348 |
+
# The Python side will receive the full history again next time.
|
| 349 |
+
|
| 350 |
+
# Trim history if it gets too long
|
| 351 |
+
MAX_HISTORY_TURNS_INTERNAL = 10
|
| 352 |
+
if len(self.conversation_history) > MAX_HISTORY_TURNS_INTERNAL * 2: # User + Assistant
|
| 353 |
+
self.conversation_history = self.conversation_history[-(MAX_HISTORY_TURNS_INTERNAL*2):]
|
| 354 |
+
|
| 355 |
+
return response_text
|
| 356 |
|
| 357 |
+
def start_interactive_session(self):
|
| 358 |
+
print("\nStarting interactive session with TaijiChat (Multi-Agent Architecture)...")
|
| 359 |
+
|
| 360 |
+
if not self.client or not self.generation_agent or not self.supervisor_agent:
|
| 361 |
+
# Executor might still be initializable if it has non-LLM functionalities,
|
| 362 |
+
# but core loop needs generation and supervision which depend on the client.
|
| 363 |
+
print("CRITICAL: OpenAI client or one or more essential LLM-dependent agents (Generation, Supervisor) are not available. Cannot start full session.")
|
| 364 |
+
if not self.executor_agent:
|
| 365 |
+
print("CRITICAL: Executor agent also not available.")
|
| 366 |
+
return
|
| 367 |
+
|
| 368 |
+
user_query = input("\nTaijiChat > How can I help you today? \nUser: ")
|
| 369 |
+
while user_query.lower() not in ["exit", "quit"]:
|
| 370 |
+
if not user_query.strip():
|
| 371 |
+
user_query = input("User: ")
|
| 372 |
+
continue
|
| 373 |
+
|
| 374 |
+
# Add user query to internal history
|
| 375 |
+
self.conversation_history.append({"role": "user", "content": user_query})
|
| 376 |
+
|
| 377 |
+
# Call the core processing method
|
| 378 |
+
agent_response_text = self._process_turn(user_query)
|
| 379 |
+
|
| 380 |
+
# Add agent response to internal history
|
| 381 |
+
self.conversation_history.append({"role": "assistant", "content": agent_response_text})
|
| 382 |
+
|
| 383 |
+
# Print agent's response to console
|
| 384 |
+
print(f"TaijiChat > {agent_response_text}")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 385 |
|
| 386 |
# Ensure conversation history doesn't grow indefinitely
|
| 387 |
+
MAX_HISTORY_TURNS_TERMINAL = 10
|
| 388 |
+
if len(self.conversation_history) > MAX_HISTORY_TURNS_TERMINAL * 2:
|
| 389 |
+
self.conversation_history = self.conversation_history[-(MAX_HISTORY_TURNS_TERMINAL*2):]
|
| 390 |
|
| 391 |
user_query = input("\nUser: ")
|
| 392 |
|
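The tail of `process_single_query` bounds the conversation history so repeated round-trips from Shiny cannot grow the prompt without limit. A minimal, self-contained sketch of that trimming pattern (the helper name `trim_history` is illustrative, not from the repo):

```python
def trim_history(history, max_turns=10):
    """Keep at most the last max_turns user/assistant pairs (2 * max_turns messages)."""
    limit = max_turns * 2
    if len(history) > limit:
        return history[-limit:]
    return history

# Example: 25 alternating messages, trimmed to the most recent 20
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(25)
]
trimmed = trim_history(history, max_turns=10)
print(len(trimmed))            # → 20
print(trimmed[0]["content"])   # → msg 5
```

Trimming from the front keeps the most recent context, which is what a chat agent usually needs; anything older must be re-supplied by the caller if it matters.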
requirements.txt ADDED
@@ -0,0 +1,2 @@
openai
pandas
server.R
CHANGED
|
@@ -9,6 +9,124 @@ library(dplyr)
|
|
| 9 |
# Define server logic
|
| 10 |
function(input, output, session) {
|
| 11 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
# Server logic for home tab
|
| 13 |
output$home <- renderText({
|
| 14 |
"Welcome to the Home page"
|
|
@@ -551,7 +669,7 @@ function(input, output, session) {
|
|
| 551 |
# # Define the start and end index for columns based on the current page
|
| 552 |
# start_col <- (mp_column_page() - 1) * 4 + 1
|
| 553 |
# end_col <- min(start_col + 3, total_cols) # Show up to 4 columns
|
| 554 |
-
#
|
| 555 |
# # If start_col exceeds the total number of columns, return the last valid subset
|
| 556 |
# if (start_col > total_cols) {
|
| 557 |
# start_col <- (ceiling(total_cols / 4) - 1) * 4 + 1 # Set start_col to the last valid page
|
|
@@ -685,10 +803,10 @@ function(input, output, session) {
|
|
| 685 |
# start_col <- (ceiling(total_cols / 4) - 1) * 4 + 1 # Set start_col to the last valid page
|
| 686 |
# end_col <- total_cols # End with the last column
|
| 687 |
# }
|
| 688 |
-
#
|
| 689 |
# # Subset the columns for the current page
|
| 690 |
# df_subset <- df[, start_col:end_col, drop = FALSE]
|
| 691 |
-
#
|
| 692 |
# return(df_subset)
|
| 693 |
# })
|
| 694 |
|
|
@@ -1937,6 +2055,103 @@ function(input, output, session) {
|
|
| 1937 |
)
|
| 1938 |
})
|
| 1939 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1940 |
}
|
| 1941 |
|
| 1942 |
|
|
|
|
| 9 |
# Define server logic
|
| 10 |
function(input, output, session) {
|
| 11 |
|
| 12 |
+
# --- START: TaijiChat R Callback for Python Agent Thoughts ---
|
| 13 |
+
python_agent_thought_callback <- function(thought_message_from_python) {
|
| 14 |
+
# Attempt to explicitly convert to R character and clean up
|
| 15 |
+
thought_message_text <- tryCatch({
|
| 16 |
+
as.character(thought_message_from_python)[1] # Take the first element after converting
|
| 17 |
+
}, error = function(e) {
|
| 18 |
+
print(paste("R Callback: Error converting thought to character:", e$message))
|
| 19 |
+
return(NULL)
|
| 20 |
+
})
|
| 21 |
+
|
| 22 |
+
if (!is.null(thought_message_text) && is.character(thought_message_text) && length(thought_message_text) == 1 && nzchar(trimws(thought_message_text))) {
|
| 23 |
+
# print(paste("R Callback: Valid thought from Python -", thought_message_text)) # For R console debugging
|
| 24 |
+
session$sendCustomMessage(type = "agent_new_thought", message = list(text = trimws(thought_message_text)))
|
| 25 |
+
} else {
|
| 26 |
+
# Log the original and potentially converted type for better debugging
|
| 27 |
+
print(paste("R Callback: Received invalid or empty thought. Original type:", class(thought_message_from_python), ", Value:", thought_message_from_python, ", Converted text:", thought_message_text))
|
| 28 |
+
}
|
| 29 |
+
}
|
| 30 |
+
# --- END: TaijiChat R Callback for Python Agent Thoughts ---
|
| 31 |
+
|
| 32 |
+
# --- START: TaijiChat Agent Initialization ---
|
| 33 |
+
# This assumes manager_agent_module is globally available from ui.R's sourcing.
|
| 34 |
+
# ui.R does: manager_agent_module <- reticulate::import("agents.manager_agent")
|
| 35 |
+
|
| 36 |
+
api_key_val <- NULL
|
| 37 |
+
tryCatch({
|
| 38 |
+
api_key_content <- readLines("api_key.txt", warn = FALSE)
|
| 39 |
+
if (length(api_key_content) > 0 && nzchar(trimws(api_key_content[1]))) {
|
| 40 |
+
api_key_val <- trimws(api_key_content[1])
|
| 41 |
+
print("TaijiChat: API key successfully read in server.R.")
|
| 42 |
+
} else {
|
| 43 |
+
warning("TaijiChat: api_key.txt is empty or not found. LLM features may be disabled.")
|
| 44 |
+
print("TaijiChat: api_key.txt is empty or not found.")
|
| 45 |
+
}
|
| 46 |
+
}, error = function(e) {
|
| 47 |
+
warning(paste("TaijiChat: Error reading api_key.txt in server.R:", e$message))
|
| 48 |
+
print(paste("TaijiChat: Error reading api_key.txt:", e$message))
|
| 49 |
+
})
|
| 50 |
+
|
| 51 |
+
py_openai_client_instance <- NULL
|
| 52 |
+
if (!is.null(api_key_val)) {
|
| 53 |
+
tryCatch({
|
| 54 |
+
# Ensure reticulate is configured to use the correct Python environment
|
| 55 |
+
# This should ideally be done once at the top of ui.R or app.R
|
| 56 |
+
# If not, reticulate might pick up a system default Python.
|
| 57 |
+
# We assume ui.R has handled reticulate::use_python() or similar.
|
| 58 |
+
if (reticulate::py_available(initialize = TRUE)) {
|
| 59 |
+
openai_py_module <- reticulate::import("openai", convert = FALSE) # convert=FALSE for raw Python objects
|
| 60 |
+
py_openai_client_instance <- openai_py_module$OpenAI(api_key = api_key_val)
|
| 61 |
+
print("TaijiChat: Python OpenAI client initialized successfully in server.R via reticulate.")
|
| 62 |
+
} else {
|
| 63 |
+
warning("TaijiChat: Python (reticulate) not available or not initialized. Cannot create OpenAI client.")
|
| 64 |
+
print("TaijiChat: Python (reticulate) not available. Cannot create OpenAI client.")
|
| 65 |
+
}
|
| 66 |
+
}, error = function(e) {
|
| 67 |
+
warning(paste("TaijiChat: Failed to initialize Python OpenAI client in server.R:", e$message))
|
| 68 |
+
print(paste("TaijiChat: Failed to initialize Python OpenAI client:", e$message))
|
| 69 |
+
py_openai_client_instance <- NULL
|
| 70 |
+
})
|
| 71 |
+
} else {
|
| 72 |
+
print("TaijiChat: API key is NULL, skipping Python OpenAI client initialization.")
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
rv_agent_instance <- reactiveVal(NULL)
|
| 76 |
+
|
| 77 |
+
# Attempt to create the agent instance once.
|
| 78 |
+
current_manager_agent_module <- NULL
|
| 79 |
+
tryCatch({
|
| 80 |
+
current_manager_agent_module <- reticulate::import("agents.manager_agent", convert = FALSE)
|
| 81 |
+
if (is.null(current_manager_agent_module)) {
|
| 82 |
+
warning("TaijiChat: reticulate::import('agents.manager_agent') returned NULL in server.R.")
|
| 83 |
+
print("TaijiChat: reticulate::import('agents.manager_agent') returned NULL in server.R.")
|
| 84 |
+
} else {
|
| 85 |
+
print("TaijiChat: Successfully imported/retrieved 'agents.manager_agent' module in server.R.")
|
| 86 |
+
}
|
| 87 |
+
}, error = function(e) {
|
| 88 |
+
warning(paste("TaijiChat: Failed to import agents.manager_agent in server.R:", e$message))
|
| 89 |
+
print(paste("TaijiChat: Failed to import agents.manager_agent in server.R:", e$message))
|
| 90 |
+
})
|
| 91 |
+
|
| 92 |
+
if (!is.null(current_manager_agent_module)) {
|
| 93 |
+
# Module is available, now try to instantiate the agent
|
| 94 |
+
if (!is.null(py_openai_client_instance)) {
|
| 95 |
+
tryCatch({
|
| 96 |
+
agent_inst <- current_manager_agent_module$ManagerAgent(
|
| 97 |
+
openai_client = py_openai_client_instance,
|
| 98 |
+
r_callback_fn = python_agent_thought_callback # Pass the R callback here
|
| 99 |
+
)
|
| 100 |
+
rv_agent_instance(agent_inst)
|
| 101 |
+
print("TaijiChat: Python ManagerAgent instance created in server.R using pre-initialized client and R callback.")
|
| 102 |
+
}, error = function(e) {
|
| 103 |
+
warning(paste("TaijiChat: Failed to instantiate ManagerAgent in server.R with client & callback:", e$message))
|
| 104 |
+
print(paste("TaijiChat: Failed to instantiate ManagerAgent with client & callback:", e$message))
|
| 105 |
+
})
|
| 106 |
+
} else if (!is.null(api_key_val)) { # Try with API key if client object failed but key exists
|
| 107 |
+
tryCatch({
|
| 108 |
+
agent_inst <- current_manager_agent_module$ManagerAgent(
|
| 109 |
+
openai_api_key = api_key_val,
|
| 110 |
+
r_callback_fn = python_agent_thought_callback # Pass the R callback here
|
| 111 |
+
)
|
| 112 |
+
rv_agent_instance(agent_inst)
|
| 113 |
+
print("TaijiChat: Python ManagerAgent instance created in server.R with API key and R callback (client to be init by Python).")
|
| 114 |
+
}, error = function(e) {
|
| 115 |
+
warning(paste("TaijiChat: Failed to instantiate ManagerAgent with API key & callback in server.R:", e$message))
|
| 116 |
+
print(paste("TaijiChat: Failed to instantiate ManagerAgent with API key & callback:", e$message))
|
| 117 |
+
})
|
| 118 |
+
} else {
|
| 119 |
+
# Neither client nor API key is available for the agent
|
| 120 |
+
warning("TaijiChat: Cannot create ManagerAgent instance: OpenAI client/API key not available for agent constructor.")
|
| 121 |
+
print("TaijiChat: Cannot create ManagerAgent: OpenAI client/API key not available for agent constructor.")
|
| 122 |
+
}
|
| 123 |
+
} else {
|
| 124 |
+
# Module itself could not be imported/retrieved
|
| 125 |
+
warning("TaijiChat: agents.manager_agent module is NULL after import attempt. Agent not created.")
|
| 126 |
+
print("TaijiChat: agents.manager_agent module is NULL after import attempt. Agent not created.")
|
| 127 |
+
}
|
| 128 |
+
# --- END: TaijiChat Agent Initialization ---
|
| 129 |
+
|
| 130 |
# Server logic for home tab
|
| 131 |
output$home <- renderText({
|
| 132 |
"Welcome to the Home page"
|
|
|
|
| 669 |
# # Define the start and end index for columns based on the current page
|
| 670 |
# start_col <- (mp_column_page() - 1) * 4 + 1
|
| 671 |
# end_col <- min(start_col + 3, total_cols) # Show up to 4 columns
|
| 672 |
+
#
|
| 673 |
# # If start_col exceeds the total number of columns, return the last valid subset
|
| 674 |
# if (start_col > total_cols) {
|
| 675 |
# start_col <- (ceiling(total_cols / 4) - 1) * 4 + 1 # Set start_col to the last valid page
|
|
|
|
| 803 |
# start_col <- (ceiling(total_cols / 4) - 1) * 4 + 1 # Set start_col to the last valid page
|
| 804 |
# end_col <- total_cols # End with the last column
|
| 805 |
# }
|
| 806 |
+
#
|
| 807 |
# # Subset the columns for the current page
|
| 808 |
# df_subset <- df[, start_col:end_col, drop = FALSE]
|
| 809 |
+
#
|
| 810 |
# return(df_subset)
|
| 811 |
# })
|
| 812 |
|
|
|
|
| 2055 |
)
|
| 2056 |
})
|
| 2057 |
|
| 2058 |
+
# --- START: TaijiChat Message Handling ---
|
| 2059 |
+
chat_history <- reactiveVal(list()) # Stores list of lists: list(role="user/assistant", content="message")
|
| 2060 |
+
|
| 2061 |
+
observeEvent(input$user_chat_message, {
|
| 2062 |
+
req(input$user_chat_message)
|
| 2063 |
+
user_message_text <- trimws(input$user_chat_message)
|
| 2064 |
+
print(paste("TaijiChat: Received user_chat_message -", user_message_text))
|
| 2065 |
+
|
| 2066 |
+
if (nzchar(user_message_text)) {
|
| 2067 |
+
current_hist <- chat_history()
|
| 2068 |
+
updated_hist_user <- append(current_hist, list(list(role = "user", content = user_message_text)))
|
| 2069 |
+
chat_history(updated_hist_user)
|
| 2070 |
+
|
| 2071 |
+
agent_instance_val <- rv_agent_instance()
|
| 2072 |
+
|
| 2073 |
+
if (!is.null(agent_instance_val)) {
|
| 2074 |
+
# Ensure history is a list of R named lists, then r_to_py will convert to list of Python dicts
|
| 2075 |
+
py_hist_for_agent <- lapply(updated_hist_user, function(turn) {
|
| 2076 |
+
list(role = turn$role, content = turn$content)
|
| 2077 |
+
})
|
| 2078 |
+
# py_hist_for_agent_converted <- reticulate::r_to_py(py_hist_for_agent)
|
| 2079 |
+
|
| 2080 |
+
# Send a "Thinking..." message to UI before long computation
|
| 2081 |
+
    session$sendCustomMessage(type = "agent_thinking_started", message = list(text = "Thinking..."))

    tryCatch({
      print(paste("TaijiChat: Sending to Python agent - Query:", user_message_text))
      # For debugging, convert the history to a JSON string to inspect its structure if needed:
      # hist_json_debug <- jsonlite::toJSON(py_hist_for_agent, auto_unbox = TRUE)
      # print(paste("TaijiChat: Conversation history (JSON for debug):", hist_json_debug))

      # Call the Python agent method. process_single_query expects the history as a
      # list of dicts; reticulate::r_to_py handles the conversion of the R list of named lists.
      agent_reply_py <- agent_instance_val$process_single_query(
        user_query_text = user_message_text,
        conversation_history_from_r = py_hist_for_agent # Pass the R list of lists
      )
      # Explicitly convert the returned Python object to an R character string
      agent_reply_text <- as.character(agent_reply_py)

      print(paste("TaijiChat: Received from Python agent -", agent_reply_text))

      final_hist <- append(updated_hist_user, list(list(role = "assistant", content = agent_reply_text)))
      chat_history(final_hist)

      session$sendCustomMessage(type = "agent_chat_response", message = list(text = agent_reply_text))

    }, error = function(e) {
      error_message <- paste("TaijiChat: Error calling Python agent or processing response:", e$message)
      warning(error_message)
      print(error_message)
      session$sendCustomMessage(type = "agent_chat_response", message = list(text = "Sorry, an error occurred with the agent."))
    })
  } else {
    warning("TaijiChat: Agent instance is NULL. Cannot process chat message.")
    print("TaijiChat: Agent instance is NULL. Cannot process chat message.")
    session$sendCustomMessage(type = "agent_chat_response", message = list(text = "The chat agent is not available. Please check server logs."))
  }
} else {
  print("TaijiChat: Received empty user_chat_message.")
}
})
# --- END: TaijiChat Message Handling ---

# Render and hyperlink the table, and adjust sizing so everything fits in the webpage
output$multiomicsdatatable <- renderDT({
  # Transform the "Author" column into hyperlinks using the "DOI" column
  multiexcel_data <- multiexcel_data %>%
    mutate(
      Author = paste0(
        "<a href='",
        DOI, # Column with the full DOI URLs
        "' target='_blank'>",
        Author, # Column with the display text (e.g., author name)
        "</a>"
      )
    ) %>%
    select(-DOI) # Drop the "DOI" column after linking it into the "Author" column

  # Dynamically remove empty columns ("18", "19", etc.)
  multiexcel_data <- multiexcel_data %>%
    select(where(~ !all(is.na(.)) & !all(. == ""))) # Keep only non-empty columns

  # Render the data table with fit-to-page options
  datatable(
    multiexcel_data,
    options = list(
      autoWidth = TRUE, # Adjust column widths automatically
      scrollX = TRUE,   # Enable horizontal scrolling
      pageLength = 10   # Limit rows displayed per page (adjustable)
    ),
    rownames = FALSE,
    escape = FALSE # Allow HTML rendering for links
  )
})
}
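The R handler above assumes a specific Python-side interface: `process_single_query` receives the query text plus the history (which reticulate converts from an R list of named lists into a list of `{"role": ..., "content": ...}` dicts) and returns a plain string. A minimal sketch of that assumed contract — the class name and echo reply here are illustrative placeholders, not the real ManagerAgent in agents/manager_agent.py:

```python
# Sketch of the Python interface the R chat handler assumes.
# The real ManagerAgent runs its LLM/tool pipeline; this reply is a placeholder.

class ManagerAgentSketch:
    def process_single_query(self, user_query_text: str,
                             conversation_history_from_r: list) -> str:
        # reticulate delivers the history as a list of role/content dicts.
        n_turns = len(conversation_history_from_r)
        return f"[{n_turns} prior turns] You asked: {user_query_text}"

history = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "hello"}]
agent = ManagerAgentSketch()
print(agent.process_single_query("Show me the home page.", history))
# [2 prior turns] You asked: Show me the home page.
```

Returning a plain `str` matters: the R side immediately calls `as.character()` on the result before sending it to the browser.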
tested_queries.txt
ADDED
@@ -0,0 +1,40 @@
# --- Easy Queries (Navigation & Simple Data Retrieval) ---

# Navigation
1. "Show me the home page."
2. "Take me to the TE (Terminal Exhaustion) data section."
3. "I want to see the multi-omics data."
4. "Navigate to the TF (Transcription Factor) Wave Analysis overview."
5. "Where can I find information about TRM communities?"

# Simple Data Retrieval (from existing tables/UI elements)
6. "In the 'All Data Search' (main page), what are the TF activity scores for STAT3?"
7. "For the Naive T-cell state, search for scores related to JUNB."
8. "What waves is the TF 'BATF' a part of?" (Uses searchtfwaves.xlsx)
9. "Display the TRM communities table."
10. "Find the research paper by 'Chen' in the multi-omics data." (Assumes 'Chen' is an author)

# --- Medium Queries (Require Tool Use & Simple Code for Analysis/Formatting) ---

# Basic Analysis / Data Manipulation (if the agent can generate code for simple tasks)
11. "From the 'All Data Search' table, can you list the top 3 TFs with the highest scores in the first displayed cell state (e.g., Naive_Day0_vs_Day7_UP)?" (Requires identifying a column and finding max values)
12. "What is the average TF activity score for 'IRF4' across all displayed cell states in the 'All Data Search' section for the current view?" (Requires iterating through columns if multiple are shown for IRF4)
13. "Compare the TF activity scores for 'TCF7' and 'TOX' in the 'TE' (Terminal Exhaustion) dataset. Which one is generally higher?"
14. "If I search for 'BACH2' in the main TF activity score table, how many cell states show a score greater than 1.0?"
15. "Can you provide the TF activity scores for 'PRDM1' in the TEM (T Effector Memory) dataset, but only show me the cell states where the score is negative?"

# --- Difficult Queries (Require LLM Interpretation, Insight Generation, Complex Tool Orchestration) ---

# Insight Generation & Interpretation
16. "Based on the available TF activity scores, which TFs seem to be most consistently upregulated across different exhausted T-cell states (e.g., TEXprog, TEXeff, TEXterm)?" (Requires understanding of "exhausted", cross-table comparison, and summarization)
17. "Is there a noticeable trend or pattern in the activity of 'EOMES' as T-cells progress from Naive to various effector and memory states shown in the data?" (Requires interpreting progression and comparing multiple datasets)
18. "Considering the TF communities data for TRM and TEX, are there any TFs that are prominent in both TRM and TEX communities, suggesting a shared role?" (Requires comparing two distinct datasets/visualizations and identifying overlaps)
19. "Analyze the TF activity scores for 'FOXO1'. Does its activity pattern suggest a role in maintaining T-cell quiescence or promoting activation/exhaustion based on the data available across different T-cell states?" (Requires biological interpretation linked to data patterns)
20. "If a researcher is interested in TFs that are highly active in T Effector Memory (TEM) cells but show low activity in Terminally Exhausted (TEXterm) cells, which TFs should they investigate further based on the provided datasets?" (Requires filtering, comparison across datasets, and a recommendation)
21. "Looking at the TF Wave Analysis, which TFs are predominantly active in early waves versus late waves? What might this imply about their roles in T-cell differentiation or response dynamics?" (Requires interpreting the wave data and drawing higher-level conclusions)
22. "The user uploaded an image of a UMAP plot showing clusters. The file is 'www/test_images/umap_example.png'. Can you describe what you see in the image and how it might relate to T-cell states if cluster A is Naive, cluster B is TEM, and cluster C is TEX?" (Requires multimodal input, assuming the agent can be pointed to local files for analysis - this tests the image upload and interpretation flow we built)
23. "Given the data in 'Table_TF PageRank Scores for Audrey.xlsx', identify three TFs that have significantly different activity scores between 'Naive_Day0_vs_Day7_UP' and 'MP_Day0_vs_Day7_UP'. Explain the potential biological significance of these differences." (Requires direct data analysis from a file, comparison, and biological reasoning)

# Creative/Hypothetical (tests robustness and deeper understanding)
24. "If we wanted to design an experiment to reverse T-cell exhaustion, which 2-3 TFs might be good targets for modulation (activation or inhibition) based on their activity profiles in the provided datasets, and why?"
25. "Explain the overall story the TF activity data tells about T-cell differentiation and exhaustion from Naive to Terminally Exhausted states, highlighting 3 key TF players and their changing roles."
tools/__init__.py
ADDED
@@ -0,0 +1 @@
+# This file makes the 'tools' directory a Python package.
tools/agent_tools.py
CHANGED
@@ -3,6 +3,19 @@ import os
 import json
 import glob # For os.walk if needed, or can use glob directly
 import mimetypes

 # --- Define Project Root and WWW Path relative to this file ---
 # This file is in taijichat/tools/

@@ -540,7 +553,10 @@ def discover_excel_files_and_schemas(base_scan_directory_name: str = "www") -> d
             "file_path": file_rel_path,
             "table_identifier": table_identifier,
             "columns": columns,
-            "sheets": xls.sheet_names if xls.sheet_names else [] # Store all sheet names
         }
     except Exception as e:
         print(f"[Schema Discovery] Error reading or processing headers for {file_abs_path}: {e}")

@@ -613,7 +629,9 @@ def list_all_files_in_www_directory() -> list:
             file_manifest.append({
                 "path": file_rel_path_from_project_root,
                 "type": mime_type,
-                "size": file_size
             })
         except FileNotFoundError: # Should not happen if os.walk found it, but as a safeguard
             print(f"[File Manifest] Warning: File {file_abs_path} found by os.walk but then not accessible for size/type.")

@@ -625,9 +643,340 @@
             "path": file_rel_path_from_project_root,
             "type": "unknown/error",
             "size": 0,
             "error": str(e)
         })

     return file_manifest

 # No __main__ block here, this is a module of tools.
@@ -3,6 +3,19 @@ import os
 import json
 import glob # For os.walk if needed, or can use glob directly
 import mimetypes
+from datetime import datetime
+
+# --- NEW IMPORTS FOR LITERATURE SEARCH ---
+from semanticscholar import SemanticScholar # For the Semantic Scholar API
+from Bio import Entrez # For PubMed
+import arxiv # For the ArXiv API
+import time # For potential rate limiting / delays
+# --- END NEW IMPORTS ---
+
+# --- NEW IMPORTS FOR TEXT FETCHING FROM URLS ---
+import requests
+from bs4 import BeautifulSoup
+# --- END NEW IMPORTS FOR TEXT FETCHING ---

 # --- Define Project Root and WWW Path relative to this file ---
 # This file is in taijichat/tools/
@@ -540,7 +553,10 @@
             "file_path": file_rel_path,
             "table_identifier": table_identifier,
             "columns": columns,
+            "sheets": xls.sheet_names if xls.sheet_names else [], # Store all sheet names
+            "last_modified": datetime.now().isoformat(),
+            "file_size_bytes": os.path.getsize(file_abs_path),
+            "error": None
         }
     except Exception as e:
         print(f"[Schema Discovery] Error reading or processing headers for {file_abs_path}: {e}")

@@ -613,7 +629,9 @@
             file_manifest.append({
                 "path": file_rel_path_from_project_root,
                 "type": mime_type,
+                "size": file_size,
+                "last_modified": datetime.now().isoformat(),
+                "error": None
             })
         except FileNotFoundError: # Should not happen if os.walk found it, but as a safeguard
             print(f"[File Manifest] Warning: File {file_abs_path} found by os.walk but then not accessible for size/type.")

@@ -625,9 +643,340 @@
             "path": file_rel_path_from_project_root,
             "type": "unknown/error",
             "size": 0,
+            "last_modified": datetime.now().isoformat(),
             "error": str(e)
         })

     return file_manifest
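These hunks extend each manifest record with `last_modified` and `error` fields alongside the existing `path`/`type`/`size`. A minimal stdlib-only sketch of the record shape the updated code emits (field names are taken from the diff; the file path and size are illustrative):

```python
import mimetypes
from datetime import datetime

def make_manifest_entry(rel_path: str, size: int) -> dict:
    # Mirrors the fields in the diff: MIME type guessed from the extension,
    # last_modified recorded as an ISO-8601 timestamp, error None on success.
    mime_type, _ = mimetypes.guess_type(rel_path)
    return {
        "path": rel_path,
        "type": mime_type or "unknown",
        "size": size,
        "last_modified": datetime.now().isoformat(),
        "error": None,
    }

entry = make_manifest_entry("www/data/report.txt", 1024)
print(entry["type"])  # text/plain
```

Note that `datetime.now()` records the scan time rather than the file's actual modification time; `os.path.getmtime()` would be the source for the latter.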
+# --- START: Literature Search Tool Implementation ---
+
+def _normalize_authors(authors_data, source="Unknown"):
+    """Helper to normalize author lists from different APIs."""
+    if not authors_data:
+        return ["N/A"]
+    if source == "SemanticScholar":  # List of dicts with a 'name' key
+        return [author.get('name', "N/A") for author in authors_data]
+    if source == "PubMed":  # List of strings
+        return authors_data
+    if source == "ArXiv":  # List of arxiv.Result.Author objects
+        return [author.name for author in authors_data]
+    return [str(a) for a in authors_data]  # Generic fallback
+
+def _search_semanticscholar_internal(query: str, max_results: int = 2) -> list[dict]:
+    papers = []
+    try:
+        s2 = SemanticScholar(timeout=15)
+        # Note: 'doi' is not a direct field for search_paper; 'externalIds' must be used.
+        results = s2.search_paper(query, limit=max_results, fields=['title', 'authors', 'year', 'abstract', 'url', 'venue', 'externalIds'])
+        if results and results.items:
+            for item in results.items:
+                doi_val = item.externalIds.get('DOI') if item.externalIds else None
+                papers.append({
+                    "title": getattr(item, 'title', "N/A"),
+                    "authors": _normalize_authors(getattr(item, 'authors', []), "SemanticScholar"),
+                    "year": getattr(item, 'year', "N/A"),
+                    "abstract": getattr(item, 'abstract', "N/A")[:500] + "..." if getattr(item, 'abstract', None) else "N/A",
+                    "doi": doi_val,  # Use the extracted DOI
+                    "url": getattr(item, 'url', "N/A"),
+                    "venue": getattr(item, 'venue', "N/A"),
+                    "source_api": "Semantic Scholar"
+                })
+    except Exception:
+        # Errors are swallowed so the overall tool call can still return results from
+        # the other sources; the ManagerAgent handles failures of the tool as a whole.
+        # Debug prints are avoided throughout to keep stdout clean for the agent's
+        # JSON protocol.
+        pass  # Allow the function to return an empty list on error.
+    return papers
+
+def _search_pubmed_internal(query: str, max_results: int = 2) -> list[dict]:
+    papers = []
+    try:
+        handle = Entrez.esearch(db="pubmed", term=query, retmax=str(max_results), sort="relevance")
+        record = Entrez.read(handle)
+        handle.close()
+        ids = record["IdList"]
+        if not ids:
+            return papers
+
+        handle = Entrez.efetch(db="pubmed", id=ids, rettype="medline", retmode="xml")
+        records = Entrez.read(handle)
+        handle.close()
+
+        for pubmed_article in records.get('PubmedArticle', []):
+            article = pubmed_article.get('MedlineCitation', {}).get('Article', {})
+            title = article.get('ArticleTitle', "N/A")
+            abstract_text_list = article.get('Abstract', {}).get('AbstractText', [])
+            abstract = " ".join(abstract_text_list)[:500] + "..." if abstract_text_list else "N/A"
+            year = article.get('Journal', {}).get('JournalIssue', {}).get('PubDate', {}).get('Year', "N/A")
+            authors_list = []
+            author_info_list = article.get('AuthorList', [])
+            for auth in author_info_list:
+                if auth.get('LastName') and auth.get('ForeName'):
+                    authors_list.append(f"{auth.get('ForeName')} {auth.get('LastName')}")
+                elif auth.get('CollectiveName'):
+                    authors_list.append(auth.get('CollectiveName'))
+
+            doi = None
+            article_ids = pubmed_article.get('PubmedData', {}).get('ArticleIdList', [])
+            for aid in article_ids:
+                if aid.attributes.get('IdType') == 'doi':
+                    doi = str(aid)  # The content of the tag is the DOI
+                    break
+
+            pmid = pubmed_article.get('MedlineCitation', {}).get('PMID', None)
+            url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/" if pmid else "N/A"
+            venue = article.get('Journal', {}).get('Title', "N/A")
+
+            papers.append({
+                "title": title,
+                "authors": _normalize_authors(authors_list, "PubMed"),
+                "year": year,
+                "abstract": abstract,
+                "doi": doi,
+                "url": url,
+                "venue": venue,
+                "source_api": "PubMed"
+            })
+            if len(papers) >= max_results:  # Ensure we don't exceed the cap due to the structure of efetch
+                break
+
+    except Exception:
+        pass
+    return papers
+
+def _search_arxiv_internal(query: str, max_results: int = 2) -> list[dict]:
+    papers = []
+    try:
+        search = arxiv.Search(
+            query=query,
+            max_results=max_results,
+            sort_by=arxiv.SortCriterion.Relevance
+        )
+        results = list(arxiv.Client().results(search))  # Convert generator to list
+
+        for result in results:
+            papers.append({
+                "title": getattr(result, 'title', "N/A"),
+                "authors": _normalize_authors(getattr(result, 'authors', []), "ArXiv"),
+                "year": getattr(result, 'published').year if getattr(result, 'published', None) else "N/A",
+                "abstract": getattr(result, 'summary', "N/A").replace('\n', ' ')[:500] + "...",  # ArXiv abstracts can contain newlines
+                "doi": getattr(result, 'doi', None),
+                "url": getattr(result, 'entry_id', "N/A"),  # entry_id is the ArXiv URL, e.g. http://arxiv.org/abs/xxxx.xxxxx
+                "venue": "ArXiv",  # ArXiv is the venue
+                "source_api": "ArXiv"
+            })
+    except Exception:
+        pass
+    return papers
+
+def multi_source_literature_search(queries: list[str], max_results_per_query_per_source: int = 1, max_total_unique_papers: int = 10) -> list[dict]:
+    unique_papers_found_so_far = []
+    processed_dois = set()
+    processed_titles_authors = set()
+
+    for query_idx, query_str in enumerate(queries):
+        if len(unique_papers_found_so_far) >= max_total_unique_papers:
+            break
+
+        current_query_results_from_all_sources = []
+
+        # Semantic Scholar
+        if len(unique_papers_found_so_far) < max_total_unique_papers:
+            s2_results = _search_semanticscholar_internal(query_str, max_results_per_query_per_source)
+            current_query_results_from_all_sources.extend(s2_results)
+
+        # PubMed
+        if len(unique_papers_found_so_far) < max_total_unique_papers:
+            pubmed_results = _search_pubmed_internal(query_str, max_results_per_query_per_source)
+            current_query_results_from_all_sources.extend(pubmed_results)
+
+        # ArXiv
+        if len(unique_papers_found_so_far) < max_total_unique_papers:
+            arxiv_results = _search_arxiv_internal(query_str, max_results_per_query_per_source)
+            current_query_results_from_all_sources.extend(arxiv_results)
+
+        # De-duplicate the current query's results and add new papers to the running list
+        for paper in current_query_results_from_all_sources:
+            if len(unique_papers_found_so_far) >= max_total_unique_papers:
+                break
+
+            is_new_paper = False
+            doi = paper.get("doi")
+            if doi and doi != "N/A":
+                normalized_doi = doi.lower().strip()
+                if normalized_doi not in processed_dois:
+                    processed_dois.add(normalized_doi)
+                    is_new_paper = True
+            else:  # Fall back to title + first author
+                title = paper.get("title", "").lower().strip()
+                first_author_list = paper.get("authors", [])
+                first_author = first_author_list[0].lower().strip() if first_author_list and first_author_list[0] != "N/A" else ""
+                title_author_key = f"{title}|{first_author}"
+                if title and first_author and title_author_key not in processed_titles_authors:
+                    processed_titles_authors.add(title_author_key)
+                    is_new_paper = True
+                elif title and not first_author and title not in processed_titles_authors:
+                    processed_titles_authors.add(title)
+                    is_new_paper = True
+
+            if is_new_paper:
+                unique_papers_found_so_far.append(paper)
+
+        if len(unique_papers_found_so_far) >= max_total_unique_papers:
+            break
+
+    final_results = unique_papers_found_so_far[:max_total_unique_papers]
+    return final_results
+
+# --- END: Literature Search Tool Implementation ---
+
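The de-duplication policy used above (normalized DOI first, then a `title|first_author` key when no DOI is available) can be exercised in isolation. A self-contained sketch with fabricated records and no network access:

```python
def dedupe_papers(papers: list[dict], max_total: int = 10) -> list[dict]:
    # Same policy as multi_source_literature_search: prefer the case-folded DOI
    # as the identity key, fall back to "title|first_author" when DOI is absent.
    unique, seen_dois, seen_keys = [], set(), set()
    for paper in papers:
        if len(unique) >= max_total:
            break
        doi = paper.get("doi")
        if doi and doi != "N/A":
            key, seen = doi.lower().strip(), seen_dois
        else:
            authors = paper.get("authors", [])
            first = authors[0].lower().strip() if authors and authors[0] != "N/A" else ""
            key, seen = f"{paper.get('title', '').lower().strip()}|{first}", seen_keys
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique

papers = [
    {"title": "T cell exhaustion", "authors": ["A. Smith"], "doi": "10.1/abc"},
    {"title": "T CELL EXHAUSTION", "authors": ["A. Smith"], "doi": "10.1/ABC"},  # same DOI, case-folded away
    {"title": "Microbiota and immunotherapy", "authors": ["B. Lee"], "doi": None},  # title|author fallback
]
print(len(dedupe_papers(papers)))  # 2
```

Case-folding the DOI matters because the same paper can come back from different sources with different capitalization; the title fallback is weaker (punctuation or subtitle differences defeat it), which is why DOI is tried first.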
+# --- START: Text Fetching from URLs Tool Implementation ---
+
+def fetch_text_from_urls(paper_info_list: list[dict], max_chars_per_paper: int = 15000) -> list[dict]:
+    updated_paper_info_list = []
+    headers = {
+        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
+    }
+
+    for paper in paper_info_list:
+        url = paper.get("url")
+        retrieved_text = None
+        source_api = paper.get("source_api", "Unknown")  # Kept for potential source-specific handling
+
+        if not url or not isinstance(url, str) or not url.startswith("http"):
+            paper["retrieved_text_content"] = "Error: Invalid or missing URL."
+            updated_paper_info_list.append(paper)
+            continue
+
+        try:
+            response = requests.get(url, headers=headers, timeout=20, allow_redirects=True)
+            response.raise_for_status()  # Raise an exception for HTTP errors
+
+            soup = BeautifulSoup(response.content, 'html.parser')
+
+            # Basic text extraction: try common article body tags first, otherwise fall
+            # back to all body text. This will need refinement for specific site
+            # structures (e.g., arXiv, PubMed Central).
+            body_content = soup.find('body')
+            if body_content:
+                # Remove script and style tags
+                for script_or_style in body_content(["script", "style"]):
+                    script_or_style.decompose()
+
+                # Try to grab the main article content if common tags/selectors exist
+                main_article_tags = ['article', 'main', '.main-content', '.article-body', '.abstract']  # Add more specific selectors as needed
+                extracted_elements = []
+                for tag_selector in main_article_tags:
+                    elements = body_content.select(tag_selector)
+                    if elements:
+                        for el in elements:
+                            extracted_elements.append(el.get_text(separator=" ", strip=True))
+                        break  # Found a primary content block; assume it is good enough
+
+                if extracted_elements:
+                    retrieved_text = " ".join(extracted_elements)
+                else:
+                    retrieved_text = body_content.get_text(separator=" ", strip=True)
+            else:
+                retrieved_text = "Error: Could not find body content in HTML."
+
+            if retrieved_text and not retrieved_text.startswith("Error:"):
+                retrieved_text = retrieved_text[:max_chars_per_paper]
+                if len(retrieved_text) == max_chars_per_paper:
+                    retrieved_text += "... (truncated)"
+            elif not retrieved_text:
+                retrieved_text = "Error: No text could be extracted."
+
+        except requests.exceptions.RequestException as e:
+            retrieved_text = f"Error fetching URL: {str(e)}"
+        except Exception as e:
+            retrieved_text = f"Error processing HTML: {str(e)}"
+
+        paper["retrieved_text_content"] = retrieved_text
+        updated_paper_info_list.append(paper)
+
+        # Optional: add a small delay between requests when fetching many URLs
+        # time.sleep(0.25)
+
+    return updated_paper_info_list
+
+# --- END: Text Fetching from URLs Tool Implementation ---
+
+# Example of how GenerationAgent would call this tool:
+# Assume 'list_of_papers_from_search' is the output from multi_source_literature_search
+# print(json.dumps({'intermediate_data_for_llm': fetch_text_from_urls(paper_info_list=list_of_papers_from_search, max_chars_per_paper=10000)}))
+
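`fetch_text_from_urls` relies on `requests` and BeautifulSoup, but its core extraction idea — drop `<script>`/`<style>` content and keep the visible text — can be shown offline with only the standard library. A sketch of the principle, not the production extraction logic:

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text fragments while skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

html = "<body><style>p{color:red}</style><article>Key finding.</article><script>x=1</script></body>"
extractor = VisibleTextExtractor()
extractor.feed(html)
print(" ".join(extractor.parts))  # Key finding.
```

BeautifulSoup's `decompose()` plus `get_text(separator=" ", strip=True)` achieves the same effect more robustly (malformed markup, nested selectors), which is why the tool above uses it.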
+if __name__ == '__main__':
+    # Test basic Excel schema discovery
+    # print("Testing Excel Schema Discovery:")
+    # schemas = discover_excel_files_and_schemas(base_scan_directory_name="www")
+    # print(json.dumps(schemas, indent=2))
+    # print("\n")
+
+    # Test WWW file manifest
+    # print("Testing WWW File Manifest:")
+    # manifest = list_all_files_in_www_directory(base_directory_name="www")
+    # print(json.dumps(manifest, indent=2))
+    # print("\n")
+
+    # --- Test Literature Search ---
+    print("Testing Multi-Source Literature Search Tool:")
+    test_queries_lit = [
+        "novel targets for CAR-T cell therapy in solid tumors",
+        "role of microbiota in cancer immunotherapy response",
+        "epigenetic regulation of T cell exhaustion"
+    ]
+    # To see output shaped the way GenerationAgent expects:
+    # search_results_for_llm = {"intermediate_data_for_llm": multi_source_literature_search(queries=test_queries_lit, max_results_per_query_per_source=1)}
+    # print(json.dumps(search_results_for_llm, indent=2))
+
+    # Simpler print for a direct tool test:
+    results = multi_source_literature_search(queries=test_queries_lit, max_results_per_query_per_source=1, max_total_unique_papers=2)  # Fetch 2 papers for the text-fetch test
+    print(f"Found {len(results)} unique papers for text fetching test:")
+    # print(json.dumps(results, indent=2))
+
+    if results:
+        print("\nTesting Text Fetching from URLs Tool:")
+        # To see output shaped the way GenerationAgent expects:
+        # fetched_text_data_for_llm = {"intermediate_data_for_llm": fetch_text_from_urls(paper_info_list=results, max_chars_per_paper=5000)}
+        # print(json.dumps(fetched_text_data_for_llm, indent=2))
+
+        # Simpler print for a direct tool test:
+        results_with_text = fetch_text_from_urls(paper_info_list=results, max_chars_per_paper=5000)
+        print(f"Processed {len(results_with_text)} papers for text content:")
+        for i, paper in enumerate(results_with_text):
+            print(f"--- Paper {i+1} ---")
+            print(f"  Title: {paper.get('title')}")
+            print(f"  URL: {paper.get('url')}")
+            text_content = paper.get('retrieved_text_content', 'Not found')
+            print(f"  Retrieved Text (first 200 chars): {text_content[:200]}...")
+        print("\n")
+
 # No __main__ block here, this is a module of tools.
tools/agent_tools_documentation.md
CHANGED
@@ -116,4 +116,75 @@ Description: Scans the entire `www/` directory (and its subdirectories, excludin
 Input: None
 Output: `file_manifest` (list of dictionaries) - Each dictionary represents a file and contains the keys: `path` (string), `type` (string), `size` (integer). Example item: `{"path": "www/data/report.txt", "type": "text/plain", "size": 1024}`. Returns an empty list if the `www` directory isn't found or is empty.

 ---
 Input: None
 Output: `file_manifest` (list of dictionaries) - Each dictionary represents a file and contains the keys: `path` (string), `type` (string), `size` (integer). Example item: `{"path": "www/data/report.txt", "type": "text/plain", "size": 1024}`. Returns an empty list if the `www` directory isn't found or is empty.

+---
+
+### `multi_source_literature_search(queries: list[str], max_results_per_query_per_source: int = 1, max_total_unique_papers: int = 10) -> list[dict]`
+
+Searches for academic literature across multiple sources (Semantic Scholar, PubMed, ArXiv) using a list of provided search queries. It then de-duplicates the results, primarily by DOI and secondarily by a combination of title and first author when no DOI is available. The search stops early once the `max_total_unique_papers` limit is reached.
+
+**Args:**
+
+* `queries (list[str])`: A list of search query strings. The GenerationAgent should brainstorm 3-5 diverse queries relevant to the user's request.
+* `max_results_per_query_per_source (int)`: The maximum number of results to fetch from EACH academic source (Semantic Scholar, PubMed, ArXiv) for EACH query string. Defaults to `1`.
+* `max_total_unique_papers (int)`: The maximum total number of unique, de-duplicated papers to return across all queries and sources. Defaults to `10`. The tool stops fetching more data once this limit is met.
+
+**Returns:**
+
+* `list[dict]`: A consolidated and de-duplicated list of paper details, containing up to `max_total_unique_papers` entries. Each dictionary in the list represents a paper and has the following keys:
+    * `"title" (str)`: The title of the paper. "N/A" if not available.
+    * `"authors" (list[str])`: A list of author names. ["N/A"] if not available.
+    * `"year" (str | int)`: The publication year. "N/A" if not available.
+    * `"abstract" (str)`: A snippet of the abstract (typically up to 500 characters followed by "..."). "N/A" if not available.
+    * `"doi" (str | None)`: The Digital Object Identifier. `None` if not available.
+    * `"url" (str)`: A direct URL to the paper (e.g., PubMed link, ArXiv link, Semantic Scholar link). "N/A" if not available.
+    * `"venue" (str)`: The publication venue (e.g., journal name, "ArXiv"). "N/A" if not available.
+    * `"source_api" (str)`: The API from which this record was retrieved (e.g., "Semantic Scholar", "PubMed", "ArXiv").
+
+**GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
+
+```python
+# Example: User asks for up to 3 papers
+print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["T-cell exhaustion markers AND cancer", "immunotherapy for melanoma AND biomarkers"], max_results_per_query_per_source=1, max_total_unique_papers=3)}))
+
+# Example: Defaulting to 10 total unique papers
+print(json.dumps({'intermediate_data_for_llm': tools.multi_source_literature_search(queries=["COVID-19 long-term effects"], max_results_per_query_per_source=2)}))
+```
+
+**Important Considerations for GenerationAgent:**
+
+* When results are returned from this tool, the `GenerationAgent`'s `explanation` (for `CODE_COMPLETE` status) should summarize the *found papers* (e.g., titles, authors, URLs). It should clearly state that these are potential literature leads and should *not* claim to have read or summarized the full content of these papers in that same turn, unless a subsequent tool call for summarization is planned and executed.
+
+---
+
+### `fetch_text_from_urls(paper_info_list: list[dict], max_chars_per_paper: int = 15000) -> list[dict]`
+
+Attempts to fetch and extract textual content from the URLs of papers provided in a list. This tool is typically used after `multi_source_literature_search` to gather content for summarization by the GenerationAgent.
+
+**Args:**
+
+* `paper_info_list (list[dict])`: A list of paper dictionaries, as returned by `multi_source_literature_search`. Each dictionary is expected to have at least a `"url"` key; other keys such as `"title"` and `"source_api"` are used for logging.
+* `max_chars_per_paper (int)`: The maximum number of characters of text to retrieve and store for each paper. Defaults to `15000`. Longer text is truncated.
+
+**Returns:**
+
+* `list[dict]`: The input `paper_info_list`, where each paper dictionary is augmented with a new key, `"retrieved_text_content"`.
+    * On success, `"retrieved_text_content" (str)` contains the extracted text (up to `max_chars_per_paper` characters).
+    * If fetching or parsing fails for a paper, `"retrieved_text_content" (str)` contains an error message (e.g., "Error: Invalid or missing URL.", "Error fetching URL: ...", "Error: No text could be extracted.").
+
+**GenerationAgent Usage Example (for `python_code` field when `status` is `AWAITING_DATA`):**
+
+This tool is usually the second step in a literature review process.
+
+```python
+# Assume 'list_of_papers_from_search' is a variable holding the output from a previous
+# call to tools.multi_source_literature_search(...)
+print(json.dumps({'intermediate_data_for_llm': tools.fetch_text_from_urls(paper_info_list=list_of_papers_from_search, max_chars_per_paper=10000)}))
+```
+
+**Important Considerations for GenerationAgent:**
+
+* After this tool returns the `paper_info_list` (now with `"retrieved_text_content"`), the `GenerationAgent` is responsible for using its own LLM capabilities to read the `"retrieved_text_content"` of each paper and generate summaries if requested by the user or as part of its plan.
+* The `GenerationAgent` should be prepared for `"retrieved_text_content"` to contain error messages and handle them gracefully in its summarization logic (e.g., by stating that text for a particular paper could not be retrieved).
+* Web scraping is inherently unreliable; success in fetching and parsing text varies greatly between websites. The agent should not assume text will always be available.
+
 ---
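The de-duplication rule documented above (DOI first, otherwise title plus first author) could be implemented along these lines; `dedup_key` and `dedup` are hypothetical names for illustration, not the actual code in `tools/agent_tools.py`:

```python
def dedup_key(paper: dict) -> str:
    """Build a de-duplication key: DOI if present, else title + first author."""
    doi = paper.get("doi")
    if doi:
        return "doi:" + doi.strip().lower()
    title = (paper.get("title") or "N/A").strip().lower()
    authors = paper.get("authors") or ["N/A"]
    return "meta:" + title + "|" + authors[0].strip().lower()

def dedup(papers: list, max_total: int = 10) -> list:
    """Keep the first occurrence of each key, stopping at max_total papers."""
    seen, unique = set(), []
    for p in papers:
        key = dedup_key(p)
        if key in seen:
            continue
        seen.add(key)
        unique.append(p)
        if len(unique) >= max_total:  # mirrors the max_total_unique_papers cutoff
            break
    return unique

papers = [
    {"title": "T-cell exhaustion", "authors": ["A. Smith"], "doi": "10.1/xyz"},
    {"title": "T-CELL EXHAUSTION", "authors": ["A. Smith"], "doi": "10.1/XYZ"},  # same DOI, different case
    {"title": "Memory T cells", "authors": ["B. Jones"], "doi": None},
]
print(len(dedup(papers)))  # prints 2
```

Normalizing case before comparison matters in practice, since the same record often comes back with different capitalization from different source APIs.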
traces/list of required files.txt DELETED
@@ -1,76 +0,0 @@
-www/
-|-- tablePagerank/            # Excel files for TF PageRank scores
-|   |-- Table_TF PageRank Scores for Audrey.xlsx
-|   |-- Naive.xlsx
-|   |-- TE.xlsx
-|   |-- MP.xlsx
-|   |-- TCM.xlsx
-|   |-- TEM.xlsx
-|   |-- TRM.xlsx
-|   |-- TEXprog.xlsx
-|   |-- TEXeff.xlsx
-|   |-- TEXterm.xlsx
-
-|-- waveanalysis/             # Assets for TF Wave Analysis
-|   |-- searchtfwaves.xlsx
-|   |-- tfwaveanal.png        # Overview image
-|   |-- c1.jpg                # Wave 1 image
-|   |-- c2.jpg                # Wave 2 image
-|   |-- c3.jpg                # Wave 3 image
-|   |-- c4.jpg                # Wave 4 image
-|   |-- c5.jpg                # Wave 5 image
-|   |-- c6.jpg                # Wave 6 image
-|   |-- c7.jpg                # Wave 7 image
-|   |-- c1_selected_GO_KEGG.jpg
-|   |-- c2_selected_GO_KEGG_v2.jpg
-|   |-- c3_selected_GO_KEGG.jpg
-|   |-- c4_selected_GO_KEGG.jpg
-|   |-- c5_selected_GO_KEGG.jpg
-|   |-- c6_selected_GO_KEGG.jpg
-|   |-- c7_selected_GO_KEGG.jpg
-|   |
-|   |-- txtJPG/               # "Ranked Text" images for Wave Analysis
-|       |-- c1_ranked_1.jpg
-|       |-- c1_ranked_2.jpg
-|       |-- c2_ranked.jpg
-|       |-- c3_ranked.jpg
-|       |-- c4_ranked.jpg
-|       |-- c5_ranked.jpg
-|       |-- c6_ranked.jpg
-|       |-- c7_ranked.jpg
-
-|-- TFcorintextrm/            # Data for TF-TF correlation
-|   |-- TF-TFcorTRMTEX.xlsx
-
-|-- tfcommunities/            # Data for TF communities
-|   |-- trmcommunities.xlsx
-|   |-- texcommunities.xlsx
-
-|-- bubbleplots/              # Images for cell-state specific bubble plots
-|   |-- naivebubble.jpg
-|   |-- tebubble.jpg
-|   |-- mpbubble.jpg
-|   |-- tcmbubble.jpg
-|   |-- tembubble.jpg
-|   |-- trmbubble.jpg
-|   |-- texprogbubble.jpg
-|   |-- texintbubble.jpg      # (Used for TEXeff-like)
-|   |-- textermbubble.jpg
-
-|-- tfcat/                    # Images for the TF Catalog section
-|   |-- onlycellstates.png
-|   |-- multistatesheatmap.png
-
-|-- networkanalysis/          # Images for TF Network Analysis section
-|   |-- tfcorrdesc.png
-|   |-- community.jpg
-|   |-- trmtexcom.png
-|   |-- tfcompathway.png
-
-|-- multi-omicsdata.xlsx      # Main multi-omics data file
-
-|-- homedesc.png              # Image for the home page
-|-- ucsdlogo.png              # UCSD Logo
-|-- salklogo.png              # Salk Logo
-|-- unclogo.jpg               # UNC Logo
-|-- csdescrip.jpeg            # Image for the modal dialog (if used)
traces/log7.txt ADDED
@@ -0,0 +1,23 @@
+User: can you analyze the trm community table and provide some insights
+[Manager._process_turn] Processing query: 'can you analyze the trm community table and provide some insights ...'
+[Manager._process_turn] Generation Attempt: 1/2
+GenerationAgent Error: Run failed or timed out. Status: in_progress
+[GenerationAgent] Thought (Data Fetch Attempt 1): Run failed or timed out. Status: in_progress
+[Manager._process_turn] Error during generation/data fetching: Run failed or timed out. Status: in_progress
+[Manager._process_turn] Generation Attempt: 2/2
+[GenerationAgent] Thought (Data Fetch Attempt 1): I need to analyze the 'trmcommunities.xlsx' file to gain insights into the TRM communities. This file is listed in the dynamically discovered Excel schemas, and the first sheet appears to contain information on TRM Communities and TF Members. I'll fetch this data to analyze the community structure and any notable transcription factors involved.
+[Manager._process_turn] GenerationAgent is AWAITING_DATA. Executing code to fetch data...
+SupervisorAgent.review_code received code. Thought: Reviewing code for data fetching: I need to analyze the 'trmcommunities.xlsx' file to gain insights ...
+[SupervisorAgent] Data Fetch Code Review: Code adheres to safety guidelines and uses permitted operations. No forbidden operations detected. (Status: APPROVED_FOR_EXECUTION)
+ExecutorAgent received code for execution:
+print(json.dumps({'intermediate_data_for_llm': tools.get_tf_community_sheet_data('trm')}))
+[Manager._process_turn] Successfully fetched intermediate data. Size (approx chars): 1121
+[GenerationAgent] Thought (Data Fetch Attempt 2): I will analyze the TRM Communities table data that has been provided. The table lists transcription factors (TFs) categorized into five communities (C1 to C5). Each community has a unique set of TFs associated with it. I will extract some insights from this distribution, such as the number of TFs per community, any notable commonalities, and potential implications of the key transcription factors.
+TaijiChat > I analyzed the TRM Communities data. Here are some insights:
+1. **Community Counts:**
+   - C1: 20 TFs
+   - C2: 50 TFs
+   - C3: 17 TFs
+   - C4: 48 TFs
+   - C5: 13 TFs
+2. **Common TFs:** There are no TFs shared among all five communities, indicating distinct roles or regulatory pathways. Community C2 has the highest number of TFs, suggesting it may be critical in the context of TRM responses. Examples of significant TFs include 'Stat6' and 'Irf3', known for their roles in immune responses and development.
traces/log8.txt ADDED
@@ -0,0 +1,70 @@
+> shiny::runApp()
+
+Listening on http://127.0.0.1:5793
+Browsing http://127.0.0.1:5793
+[1] "Reticulate using Python at: C:/Users/28702/AppData/Local/Programs/Python/Python311/python.exe"
+Python integration test: json module imported successfully.
+[1] "Successfully imported Python 'agents.manager_agent' module."
+Warning: Navigation containers expect a collection of `bslib::nav_panel()`/`shiny::tabPanel()`s and/or `bslib::nav_menu()`/`shiny::navbarMenu()`s. Consider using `header` or `footer` if you wish to place content above (or below) every panel's contents.
+Warning: Navigation containers expect a collection of `bslib::nav_panel()`/`shiny::tabPanel()`s and/or `bslib::nav_menu()`/`shiny::navbarMenu()`s. Consider using `header` or `footer` if you wish to place content above (or below) every panel's contents.
+[1] "--- In chat_ui.R, chatSidebarUI IS DEFINING ID AS: chatSidebar ---"
+[1] "TaijiChat: API key successfully read in server.R."
+[1] "TaijiChat: Python OpenAI client initialized successfully in server.R via reticulate."
+[1] "TaijiChat: Successfully imported/retrieved 'agents.manager_agent' module in server.R."
+ManagerAgent: R callback function provided and stored.
+ManagerAgent: Initialized with a provided OpenAI client.
+GenerationAgent: Initializing WWW file manifest discovery...
+GenerationAgent attempting to load static docs from: C:\Users\28702\Desktop\work_space\taijichat\agents\..\tools\agent_tools_documentation.md
+GenerationAgent: Updated existing Generation Assistant: asst_fiIaIymtRaNqMJ9eQ7E7O7Wp
+SupervisorAgent: Updated existing Supervisor Assistant: asst_WVif16RGW4s5XMvTpHCaAsau
+ExecutorAgent initialized.
+ManagerAgent: Specialized agents (Generation, Supervisor, Executor) initialized.
+[1] "TaijiChat: Python ManagerAgent instance created in server.R using pre-initialized client and R callback."
+New names:
+• `` -> `...18`
+• `` -> `...19`
+[1] "TaijiChat: API key successfully read in server.R."
+[1] "TaijiChat: Python OpenAI client initialized successfully in server.R via reticulate."
+[1] "TaijiChat: Successfully imported/retrieved 'agents.manager_agent' module in server.R."
+ManagerAgent: R callback function provided and stored.
+ManagerAgent: Initialized with a provided OpenAI client.
+GenerationAgent: Initializing WWW file manifest discovery...
+GenerationAgent attempting to load static docs from: C:\Users\28702\Desktop\work_space\taijichat\agents\..\tools\agent_tools_documentation.md
+GenerationAgent: Updated existing Generation Assistant: asst_fiIaIymtRaNqMJ9eQ7E7O7Wp
+SupervisorAgent: Updated existing Supervisor Assistant: asst_WVif16RGW4s5XMvTpHCaAsau
+ExecutorAgent initialized.
+ManagerAgent: Specialized agents (Generation, Supervisor, Executor) initialized.
+[1] "TaijiChat: Python ManagerAgent instance created in server.R using pre-initialized client and R callback."
+New names:
+• `` -> `...18`
+• `` -> `...19`
+[1] "TaijiChat: Received user_chat_message - what information can this website provide? what researches can i do on top of it? what are some example research questions? and what are the relevant literature"
+[1] "TaijiChat: Sending to Python agent - Query: what information can this website provide? what researches can i do on top of it? what are some example research questions? and what are the relevant literature"
+[Manager.process_single_query] Received query: 'what information can this website provide? what researches can i do on top of it? what are some exam...'
+[Manager._process_turn] Processing query: 'what information can this website provide? what researches can i do on top of it? what are some exam...'
+[Manager._process_turn] Generation Attempt: 1/2
+[GenerationAgent] Thought (Data Fetch Attempt 1): To provide a comprehensive response to the user's inquiry regarding the website, I will first gather general information about the datasets and resources available on the website to identify potential research questions and relevant literature. The site includes multiple datasets related to Transcription Factors (TFs), including correlations, wave analyses, and community effects. I will summarize findings and research angles based on available data and provide example research questions and literature references.
+[Manager._process_turn] GenerationAgent is AWAITING_DATA. Executing code to fetch data...
+SupervisorAgent.review_code received code. Thought: Reviewing code for data fetching: To provide a comprehensive response to the user's inquiry regardin...
+[SupervisorAgent] Data Fetch Code Review: Code uses the 'tools' module correctly to fetch data and the 'print' function for output without any forbidden operations detected. (Status: APPROVED_FOR_EXECUTION)
+ExecutorAgent received code for execution:
+print(json.dumps({'intermediate_data_for_llm': tools.list_all_files_in_www_directory()}))
+[Manager._process_turn] Successfully fetched intermediate data. Size (approx chars): 71600
+[GenerationAgent] Thought (Data Fetch Attempt 2): To answer the user's query about the information the website can provide, potential research avenues, example questions, and relevant literature, I will analyze various resources available on the website. This includes reviewing Excel datasets, images, and descriptions related to transcription factors (TFs), wave analysis, and community data for potential biological studies. Key areas for exploration include data on TFs, their interactions, and pathways relevant to research in cell biology and molecular mechanisms. Additionally, I will consider existing literature referenced in the datasets, particularly the 'multi-omicsdata.xlsx' and relevant TF analysis files.
+[Manager._process_turn] GenerationAgent is AWAITING_DATA. Executing code to fetch data...
+SupervisorAgent.review_code received code. Thought: Reviewing code for data fetching: To answer the user's query about the information the website can p...
+[SupervisorAgent] Data Fetch Code Review: Code uses the 'tools' module correctly and employs permitted built-in functions for data processing. No forbidden operations detected. (Status: APPROVED_FOR_EXECUTION)
+ExecutorAgent received code for execution:
+print(json.dumps({'intermediate_data_for_llm': tools.get_raw_excel_data('www/multi-omicsdata.xlsx')}))
+[Manager._process_turn] Successfully fetched intermediate data. Size (approx chars): 3204
+[Manager._process_turn] Could not complete data fetching: To answer the user's query about the information the website can provide, potential research avenues, example questions, and relevant literature, I will analyze various resources available on the website. This includes reviewing Excel datasets, images, and descriptions related to transcription factors (TFs), wave analysis, and community data for potential biological studies. Key areas for exploration include data on TFs, their interactions, and pathways relevant to research in cell biology and molecular mechanisms. Additionally, I will consider existing literature referenced in the datasets, particularly the 'multi-omicsdata.xlsx' and relevant TF analysis files.
+[Manager._process_turn] Generation Attempt: 2/2
+[GenerationAgent] Thought (Data Fetch Attempt 1): To fully address your query about the information provided by the website and potential research topics, I need to first review the available datasets found in the provided Excel files. The dataset named 'multi-omicsdata.xlsx' seems comprehensive, comprising multiple columns that may contain valuable information related to various research inquiries. I will extract the names of the sheets and the first few columns from this dataset to identify relevant research questions. Additionally, I will gather insights about related literature based on those research possibilities.
+[Manager._process_turn] GenerationAgent is AWAITING_DATA. Executing code to fetch data...
+SupervisorAgent.review_code received code. Thought: Reviewing code for data fetching: To fully address your query about the information provided by the ...
+[SupervisorAgent] Data Fetch Code Review: Code uses the 'tools' module correctly to fetch data, and makes use of the json module to serialize the output. No forbidden operations detected. (Status: APPROVED_FOR_EXECUTION)
+ExecutorAgent received code for execution:
+print(json.dumps({'intermediate_data_for_llm': tools.get_raw_excel_data('www/multi-omicsdata.xlsx')}))
+[Manager._process_turn] Successfully fetched intermediate data. Size (approx chars): 3204
+[GenerationAgent] Thought (Data Fetch Attempt 2): The data provided includes detailed research related to various studies involving T cells, specifically focusing on different datasets including their authors, DOI links, data types, and species, all related to infections like LCMV. This dataset serves as a foundation for conducting further research into the dynamics of T cell responses and can lead to various research questions regarding the relationships between different T cell subtypes, their activation states, and how these factors correlate with response to infections. The included literature can be accessed through the provided DOIs, which serve as references for deeper insights into methodologies and findings relevant to T cell biology.
+[1] "TaijiChat: Received from Python agent - The dataset offers a rich source of information to explore T cell activation, differentiation, and memory formation in response to various infections. Potential research questions could include: 'How do different T cell subtypes respond to viral infections?', 'What are the long-term effects of LCMV on T cell differentiation?', and 'Can memory T cells provide better protection against reinfection compared to naive T cells?'. The associated literature can be referenced through the DOIs provided, which guide towards comprehensive studies that these research questions may build upon."
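The trace above shows the loop the ManagerAgent drives: Supervisor-approved snippets such as `print(json.dumps({'intermediate_data_for_llm': ...}))` are executed, and their stdout is parsed back into structured data for the LLM. A minimal sketch of that execute-and-capture step (`run_fetch_code` and `FakeTools` are hypothetical names for illustration; the real ExecutorAgent adds code review and sandboxing):

```python
import io
import json
import contextlib

def run_fetch_code(code: str, exec_globals: dict):
    """Execute approved data-fetching code and parse the JSON it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):  # capture the snippet's print output
        exec(code, exec_globals)
    payload = json.loads(buffer.getvalue())
    return payload.get("intermediate_data_for_llm")

# A stand-in 'tools' namespace, for illustration only.
class FakeTools:
    @staticmethod
    def get_tf_community_sheet_data(state):
        return {"state": state, "communities": ["C1", "C2"]}

code = "print(json.dumps({'intermediate_data_for_llm': tools.get_tf_community_sheet_data('trm')}))"
data = run_fetch_code(code, {"json": json, "tools": FakeTools})
print(data["state"])  # prints trm
```

Keeping the `intermediate_data_for_llm` key as the single contract between the generated code and the executor is what lets the Supervisor approve snippets by shape rather than by inspecting every tool call.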
ui.R CHANGED
@@ -1,6 +1,47 @@
 library(shiny)
 library(shinythemes)
 library(DT)
+library(reticulate)
+
+# --- Python Environment Configuration (USER ACTION REQUIRED) ---
+# Uncomment and configure ONE of the following lines to point to your Python environment
+# where 'openai' and 'pandas' are installed (from requirements.txt).
+# Ensure the path to the Python executable is correct, or that the virtual/conda env name is correct.
+
+# Option 1: Specify the path to a Python executable (e.g., in a virtual environment)
+reticulate::use_python("C:/Users/28702/AppData/Local/Programs/Python/Python311/python.exe", required = TRUE) # FIXME
+
+# Option 2: Specify the name of a virtual environment (if R can find it by name)
+# reticulate::use_virtualenv("your_python_venv_name", required = TRUE)
+
+# Option 3: Specify the name of a Conda environment
+# reticulate::use_condaenv("your_python_conda_env_name", required = TRUE)
+
+# Ensure the Python environment is configured before proceeding
+if (!py_available(initialize = TRUE)) {
+  stop("Python environment with required packages (openai, pandas) not found or configured correctly for reticulate. Please check the comments above.")
+} else {
+  print(paste("Reticulate using Python at:", py_config()$python))
+  # Try importing a core Python module to confirm setup
+  py_run_string("import json; print('Python integration test: json module imported successfully.')")
+}
+
+# --- Import Python Agent Module ---
+# This assumes server.R is in the project root, and 'agents' is a subdirectory.
+# If your R working directory is not the project root, adjust sys.path or the import call.
+# Ensure agents/__init__.py exists.
+manager_agent_module <- NULL
+tryCatch({
+  # Add the project root to Python's sys.path if it is not already discoverable by reticulate.
+  # This makes `import agents.manager_agent` work if the R CWD is the project root.
+  reticulate::py_run_string("import sys; import os; os.chdir(os.path.abspath('.')); sys.path.insert(0, os.getcwd())")
+  manager_agent_module <- reticulate::import("agents.manager_agent")
+  print("Successfully imported Python 'agents.manager_agent' module.")
+}, error = function(e) {
+  stop(paste("Failed to import Python 'agents.manager_agent' module. Error:", e$message,
+             "Ensure your R working directory is the project root (containing server.R and the 'agents' folder), ",
+             "and that agents/__init__.py exists. Also check the Python environment."))
+})

 # Source the new chat UI definitions
 source("chat_ui.R", local = TRUE) # local = TRUE is good practice if chat_ui.R defines reactive expressions, though here it's just UI functions
www/chat_script.js CHANGED
@@ -3,6 +3,8 @@
 $(document).on('shiny:connected', function(event) {
     console.log("Shiny connected. chat_script.js executing.");

+    var isFirstChatOpenThisSession = true; // Flag for initial greeting
+
     // --- Dynamically create and insert the Chat tab --- START ---
     var chatTabExists = $('#customChatTabLink').length > 0;
     if (!chatTabExists) {
@@ -65,7 +67,13 @@
         if (sidebar.is(':visible')) {
             sidebar.fadeOut();
         } else {
-            sidebar.fadeIn()
+            sidebar.fadeIn(function() { // Callback function after fadeIn completes
+                if (isFirstChatOpenThisSession) {
+                    addChatMessage("How can I help you today?", 'agent');
+                    isFirstChatOpenThisSession = false; // Set flag to false after showing message
+                    console.log("Initial agent greeting displayed.");
+                }
+            });
         }
         return false; // Extra measure to prevent default behavior
     });
@@ -77,15 +85,55 @@
         $('#chatSidebar').fadeOut();
     });

+    // Variables to keep track of the thinking message element and its thoughts container
+    var thinkingMessageElement = null;
+    var currentThoughtsContainer = null; // To store the div where thoughts are appended
+
+    function addChatMessage(messageText, messageType, isThinkingMessage = false) {
         var messageClass = messageType === 'user' ? 'user-message' : 'agent-message';
+        if (isThinkingMessage) {
+            messageClass += ' thinking-message'; // Add a special class for styling if needed
+        }
         var $chatMessages = $('#chatMessages');
+
+        // If this is a normal agent message and a thinking message exists, remove it first
+        if (messageType === 'agent' && !isThinkingMessage && thinkingMessageElement) {
+            thinkingMessageElement.remove();
+            thinkingMessageElement = null;
+            if (currentThoughtsContainer) { // Also clear the thoughts container reference
+                currentThoughtsContainer = null;
+            }
+        }
+
+        var $messageDiv = $('<div></div>').addClass('chat-message').addClass(messageClass);
+
+        if (isThinkingMessage) {
+            // Arrow (toggle) + Text + Hidden Thoughts Area
+            $messageDiv.html('<span class="thought-toggle-arrow" role="button" tabindex="0">►</span> ' +
+                             '<span class="thinking-text">' + messageText + '</span>' +
+                             '<div class="thoughts-area" style="display: none; margin-left: 20px; font-style: italic; color: #555;"></div>');
+            thinkingMessageElement = $messageDiv;
+            currentThoughtsContainer = $messageDiv.find('.thoughts-area'); // Store a reference to the new thoughts area
+        } else {
+            $messageDiv.text(messageText); // For user and final agent messages, use .text() for safety
+        }
+
+        $chatMessages.append($messageDiv);
         $chatMessages.scrollTop($chatMessages[0].scrollHeight);
     }

+    // Delegated click handler for thought toggle arrows
+    $('#chatMessages').off('click.thoughtToggle').on('click.thoughtToggle', '.thought-toggle-arrow', function() {
+        var $arrow = $(this);
+        var $thoughtsArea = $arrow.siblings('.thoughts-area');
+        $thoughtsArea.slideToggle(200);
+        if ($arrow.html() === '►') { // Using direct character comparison for simplicity
+            $arrow.html('▼');
+        } else {
+            $arrow.html('►');
+        }
+    });
+
     $('#sendChatMsg').off('click.chatSend').on('click.chatSend', function() {
         var messageText = $('#chatInput').val();
         if (messageText.trim() !== '') {
@@ -103,12 +151,64 @@
         }
     });

     Shiny.addCustomMessageHandler("agent_chat_response", function(message) {
-        if (message && message.text) {
             console.log("Received agent response from Shiny:", message.text);
             addChatMessage(message.text, 'agent');
         } else {
-            console.warn("Received empty or
         }
     });
 });

+    // Handler for when the agent starts thinking
+    Shiny.addCustomMessageHandler("agent_thinking_started", function(message) {
+        console.log("JS: Received 'agent_thinking_started' message from R:", message);
|
| 157 |
+
if(message && typeof message.text === 'string') {
|
| 158 |
+
// If a thinking message already exists, remove it and its thoughts area before adding a new one
|
| 159 |
+
if (thinkingMessageElement) {
|
| 160 |
+
thinkingMessageElement.remove();
|
| 161 |
+
thinkingMessageElement = null;
|
| 162 |
+
}
|
| 163 |
+
if (currentThoughtsContainer) {
|
| 164 |
+
// currentThoughtsContainer is part of thinkingMessageElement, so removing parent is enough
|
| 165 |
+
currentThoughtsContainer = null;
|
| 166 |
+
}
|
| 167 |
+
addChatMessage(message.text, 'agent', true); // true indicates it's a thinking message
|
| 168 |
+
} else {
|
| 169 |
+
console.warn("Received empty or invalid agent_thinking_started message from Shiny:", message);
|
| 170 |
+
}
|
| 171 |
+
});
|
| 172 |
+
|
| 173 |
+
// Handler for new incoming thoughts from the agent
|
| 174 |
+
Shiny.addCustomMessageHandler("agent_new_thought", function(message) {
|
| 175 |
+
console.log("JS: Received 'agent_new_thought':", message);
|
| 176 |
+
if (message && typeof message.text === 'string' && currentThoughtsContainer) {
|
| 177 |
+
var $thoughtDiv = $('<div></div>').addClass('thought-item').text(message.text);
|
| 178 |
+
currentThoughtsContainer.append($thoughtDiv);
|
| 179 |
+
// Optionally, auto-scroll the main chat window if thoughts make it grow too much
|
| 180 |
+
var $chatMessages = $('#chatMessages');
|
| 181 |
+
$chatMessages.scrollTop($chatMessages[0].scrollHeight);
|
| 182 |
+
// Optionally, ensure the thoughts area is visible if a new thought arrives and it was hidden
|
| 183 |
+
// if (!currentThoughtsContainer.is(':visible')) {
|
| 184 |
+
// currentThoughtsContainer.slideDown();
|
| 185 |
+
// var $arrow = thinkingMessageElement.find('.thought-toggle-arrow');
|
| 186 |
+
// if ($arrow.html() === '►') $arrow.html('▼');
|
| 187 |
+
// }
|
| 188 |
+
} else if (!currentThoughtsContainer) {
|
| 189 |
+
console.warn("Received 'agent_new_thought' but no active thinking message/thoughts container.");
|
| 190 |
+
} else {
|
| 191 |
+
console.warn("Received invalid 'agent_new_thought' message:", message);
|
| 192 |
+
}
|
| 193 |
+
});
|
| 194 |
+
|
| 195 |
Shiny.addCustomMessageHandler("agent_chat_response", function(message) {
|
| 196 |
+
if(message && typeof message.text === 'string') { // Check if message.text is a string
|
| 197 |
console.log("Received agent response from Shiny:", message.text);
|
| 198 |
+
// addChatMessage will handle removing thinkingMessageElement and currentThoughtsContainer
|
| 199 |
addChatMessage(message.text, 'agent');
|
| 200 |
} else {
|
| 201 |
+
console.warn("Received empty, invalid, or non-string agent_chat_response from Shiny:", message);
|
| 202 |
+
// If a thinking message was active, clear it since we got a non-response or error
|
| 203 |
+
if (thinkingMessageElement) {
|
| 204 |
+
thinkingMessageElement.remove();
|
| 205 |
+
thinkingMessageElement = null;
|
| 206 |
+
}
|
| 207 |
+
if (currentThoughtsContainer) {
|
| 208 |
+
currentThoughtsContainer = null; // It would have been part of thinkingMessageElement
|
| 209 |
+
}
|
| 210 |
+
// Optionally, display a generic error to the user in the chat
|
| 211 |
+
// addChatMessage("An error occurred receiving the agent's response.", 'agent');
|
| 212 |
}
|
| 213 |
});
|
| 214 |
});
|
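The thinking-placeholder lifecycle in `addChatMessage` above (a "thinking" message collects streamed thoughts, then is removed when the final agent reply arrives) can be sketched DOM-free, which makes the ordering rules easy to unit-test. This is an illustrative sketch only, not part of the app: `createChatLog` and its return shape are hypothetical names, and the real code manipulates jQuery elements instead of a plain array.

```javascript
// DOM-free sketch (hypothetical helper) of the chat message lifecycle
// implemented in chat_script.js above.
function createChatLog() {
    var messages = [];       // rendered messages, oldest first
    var thinkingIndex = -1;  // index of the active "thinking" placeholder, or -1

    return {
        // Mirrors addChatMessage(): a final agent message first removes any
        // active thinking placeholder (together with its collected thoughts).
        add: function (text, type, isThinking) {
            if (type === 'agent' && !isThinking && thinkingIndex !== -1) {
                messages.splice(thinkingIndex, 1);
                thinkingIndex = -1;
            }
            messages.push({ text: text, type: type, thinking: !!isThinking, thoughts: [] });
            if (isThinking) thinkingIndex = messages.length - 1;
        },
        // Mirrors the "agent_new_thought" handler: thoughts are only
        // collected while a thinking placeholder is active.
        addThought: function (text) {
            if (thinkingIndex === -1) return false;
            messages[thinkingIndex].thoughts.push(text);
            return true;
        },
        list: function () { return messages.slice(); }
    };
}
```

The key design point this captures is that replacement happens inside `add` itself, so the caller (the `agent_chat_response` handler) never has to track the placeholder explicitly.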
www/pages_description.md
ADDED
@@ -0,0 +1,30 @@
+homepage:
+- This webpage introduces the "TF atlas of CD8+ T cell states," a platform resulting from a multi-omics study focused on understanding and selectively programming T cell differentiation. The research, a collaboration involving UC San Diego, the Salk Institute, and The University of North Carolina at Chapel Hill, leverages a comprehensive transcriptional and epigenetic atlas generated from RNA-seq and ATAC-seq data. The atlas helps predict transcription factor (TF) activity and define differentiation trajectories, aiming to identify TFs that can control specific T cell states, such as terminally exhausted and tissue-resident memory T cells, for potential therapeutic applications in areas like cancer and viral infections.
+
+TF Catalog - Search TF Scores:
+This webpage provides a tool to "Search TF Scores" related to T cell differentiation. It features a diagram illustrating the "Memory path" and "Exhaustion path" of T cells, including states like Naive, Memory Precursor (MP), Effector T cell (TE), Tissue-Resident Memory (TRM), Terminal Effector (TEM), Progenitor Exhausted (TEXprog), and Terminally Exhausted (TEX). The core of the page is a searchable table displaying "TF activity score" for various transcription factors (TFs) across different T cell states and datasets. This allows users to explore and compare the activity levels of specific TFs in distinct T cell populations, aiding in the understanding of T cell fate decisions.
+
+TF Catalog - Cell State Specific TF Catalog:
+This webpage displays "Naive Specific Cells & normalized TF Activity Scores" as part of a "Cell State Specific TF Catalog." It features a dot plot visualization where each row represents a transcription factor (TF) and each column likely represents a specific sample or condition within naive T cells. The color intensity of the dots corresponds to the normalized TF activity score (PageRank score), while the size of the dots indicates the log-transformed gene expression level (TPM). This allows users to explore and compare the activity and expression of various TFs within the naive T cell state.
+
+TF Catalog - Multi-State TFs:
+This webpage displays a series of heatmaps visualizing normalized PageRank scores, likely representing transcription factor activity, across various T cell differentiation states (Naive, MP, TE, TRM, TEM, TEXprog, TEXeff, TEXterm). The heatmaps are segmented into categories such as "Shared in cell states from acute infection," "Shared in cell states from chronic infection," and specific T cell subsets like "TRM & TEXPROG" and "MP, TE, TEXPROG." This presentation allows for the comparative analysis of transcription factor activity profiles across different T cell populations and under varying immunological contexts, revealing potential regulatory patterns.
+
+TF Wave Analysis:
+This webpage, titled "TF Wave Analysis," is dedicated to exploring the dynamic activity patterns of transcription factors (TFs) during T cell differentiation. It presents a series of visualizations, referred to as "TF Waves" (Wave 1, Wave 2, etc.), which illustrate how the activity of different sets of TFs changes as T cells transition through various states (Naive, MP, TRM, TEM, TEXprog, TEXterm). These waves are depicted on diagrams of T cell differentiation pathways, with color intensity and accompanying bar graphs likely indicating the strength or timing of TF activity. A table at the bottom of the page lists specific TFs and their association with these identified waves, allowing users to understand the sequential and coordinated roles of TFs in orchestrating T cell fate.
+
+TF Network Analysis - Search TF-TF Correlation in TRM/TEXterm:
+This webpage provides a tool to "Search TF-TF Correlation in TRM/TEXterm," allowing users to explore interactions and correlations between transcription factors (TFs) specifically within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cell states.
+
+The page explains that it uses data from ChIP-seq and Hi-C to build TF interaction networks. It visualizes these relationships, showing how a "TF-regulatee network" and a "TF X TF correlation matrix" contribute to understanding "TF-TF association." Users can enter a transcription factor of interest to search for its correlations. A key explains the network visualization: circle color indicates TF specificity to TRM (green) or TEXterm (brown), line thickness denotes interaction intensity, and line color shows whether the interaction is found in TRM (green) or TEXterm (brown). This tool aims to identify cooperations between DNA-binding proteins in these specific T cell states.
+
+TF Network Analysis - TF Community in TRM/TEXterm:
+This webpage focuses on "TF Community in TRM/TEXterm," illustrating how transcription factor (TF) associations are analyzed through clustering to identify distinct TF communities within Tissue-Resident Memory (TRM) and Terminally Exhausted (TEXterm) T cells.
+
+The page displays several network visualizations: one showing combined TRM and TEXterm TF communities and their interconnections, another detailing TRM-specific TF-TF interactions organized into communities (C1-C5), and a third depicting TEXterm-specific TF-TF interactions, also grouped into communities (C1-C5).
+
+Below these networks, tables likely provide details about the TFs that constitute each identified community. Furthermore, the webpage highlights "Shared pathways" between TRM and TEXterm communities and "Enriched pathways" specific to either the TRM or TEXterm TF networks, linking these TF communities to biological functions such as "IL-2 production," "cell-cell adhesion," "T cell activation," "intrinsic apoptosis," and "catabolism." This allows for an understanding of the collaborative roles of TFs in regulating distinct cellular processes within these two T cell states.
+
+Multi-omics Data:
+This webpage, titled "Multi-omics Data," presents a comprehensive, scrollable table cataloging various experimental datasets relevant to T cell research. Each entry in the table details specific studies, including information such as the primary author, laboratory, publication year, data accession number, type of data (e.g., RNA-seq, ATAC-seq), biological species (primarily mouse), infection model used (e.g., LCMV Arm), and the specific T cell populations analyzed (e.g., Naive, MP, TE, TRM, TexProg, TexTerm) along with their defining markers or characteristics. The page includes features for searching and adjusting the number of displayed entries, indicating it serves as an interactive repository for accessing and reviewing details of diverse multi-omics datasets.
+