Spaces: holistic-ai/AgentGraph

wu981526092 committed on Sep 6, 2025

Commit c469fc9 (1 parent: 1d4dcf3)

add

Files changed (1)

agentgraph/methods/production/openai_structured_extractor.py +78 -28

@@ -57,15 +57,27 @@ class OpenAIStructuredExtractor:
         # System prompt - focus on your role and methodology
         system_prompt = """You are an expert knowledge graph analyst specializing in agent system traces.
-Your task is to extract structured knowledge graphs from agent execution traces. You identify entities (Agents, Tasks, Tools, Inputs, Outputs, Humans) and their relationships, providing precise content references when line markers are available.
-CRITICAL PRINCIPLES:
-1. COMPREHENSIVENESS: Include ALL entities that play any role in the system, no matter how minor
-2. CONSISTENCY: Follow the example's level of detail and thoroughness
-3. COMPLETENESS: Every named agent, tool, task, input, and output should be captured
-4. ACCURACY: Match entity types and relationships to the actual trace content
-You always return a complete knowledge graph with meaningful entities, logical relationships, and accurate metadata."""
         # User prompt - specific instructions with few-shot example and data
         user_prompt = f"""Analyze this agent system trace and extract a knowledge graph with the following specifications:
@@ -116,6 +128,14 @@ Here's the expected knowledge graph structure for multi-agent collaboration trac
       "raw_prompt": "Verify the accuracy of the provided costs for a daily ticket and a season pass for California's Great America in San Jose for the summer of 2024.",
       "raw_prompt_ref": [{{"line_start": 8, "line_end": 8}}, {{"line_start": 10, "line_end": 10}}, {{"line_start": 11, "line_end": 12}}]
     }},
     {{
       "id": "input_001",
       "type": "Input",
@@ -189,25 +209,43 @@ Here's the expected knowledge graph structure for multi-agent collaboration trac
     }},
     {{
       "id": "relation_006",
-      "source": "agent_002",
-      "target": "task_001",
-      "type": "INTERVENES",
       "importance": "HIGH",
-      "interaction_prompt": "Agent intervenes in the task process to provide verification and validation",
       "interaction_prompt_ref": [{{"line_start": 164, "line_end": 164}}]
     }},
     {{
       "id": "relation_007",
-      "source": "agent_001",
       "target": "agent_004",
-      "type": "USES",
       "importance": "MEDIUM",
-      "interaction_prompt": "Agent uses Computer Terminal for computational tasks",
       "interaction_prompt_ref": [{{"line_start": 50, "line_end": 55}}]
     }}
   ],
-  "failures": [],
-  "optimizations": []
 }}
 Now analyze the following trace data:
@@ -253,20 +291,32 @@ IMPORTANT: Only create content references when you see explicit <L#> line marker
 Also provide system_name and system_summary for the overall system.
-EXTRACTION GUIDELINES:
-1. AGENT IDENTIFICATION: Include every named agent, expert, assistant, or role mentioned in the trace
-2. TOOL DISCOVERY: Capture all computational tools, terminals, systems, analyzers, or utilities
-3. TASK MAPPING: Identify all tasks, objectives, goals, verification steps, and subtasks
-4. INTERACTION TRACKING: Include all inputs, outputs, intermediate results, and data flows
-5. STAKEHOLDER INCLUSION: Identify all human users, requesters, and beneficiaries
-QUALITY STANDARDS:
-- Match the example's thoroughness and attention to detail
-- Include entities even if they appear briefly or seem minor
-- Ensure every significant component of the system is represented
-- Create meaningful relationships that reflect actual interactions
-Apply these principles to extract a comprehensive knowledge graph from the following trace data.
 TRACE DATA:
 {input_data}"""
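
Review note: both versions of `user_prompt` are f-strings, which is why the few-shot JSON uses doubled braces (`{{`, `}}`) while `{input_data}` uses single braces. A minimal sketch of that behaviour follows; the `input_data` value and the shortened prompt are made up for illustration and are not taken from the extractor.

```python
import json

# Hypothetical stand-in for the trace text passed to the extractor.
input_data = "<L1> user: verify ticket prices"

# In an f-string, doubled braces render as literal braces,
# while single braces interpolate the named variable.
user_prompt = f"""Example reference:
{{"line_start": 8, "line_end": 8}}
TRACE DATA:
{input_data}"""

# The second rendered line is plain JSON once the braces are collapsed.
example_line = user_prompt.splitlines()[1]
parsed = json.loads(example_line)
print(parsed)
```

If a single brace were used inside the JSON example, Python would try to interpolate it as an expression, so the doubling has to be kept when editing the few-shot block.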

         # System prompt - focus on your role and methodology
         system_prompt = """You are an expert knowledge graph analyst specializing in agent system traces.
+Extract comprehensive knowledge graphs capturing all entities and their precise relationships. Focus on workflow accuracy and relationship completeness.
+CORE PRINCIPLES:
+1. Capture ALL participants: agents, tools, tasks, inputs, outputs, humans
+2. Use professional naming (spaces, not underscores)
+3. Map complete workflows: Input → Agents perform Tasks → Output → Human
+4. Connect sequential tasks with NEXT relationships
+5. Show tool dependencies with REQUIRED_BY relationships
+6. Identify failures: errors, mistakes, broken processes, incorrect outputs
+7. Suggest optimizations: improvements, efficiency gains, better approaches
+RELATIONSHIP TYPES (use exactly these):
+- CONSUMED_BY: Input consumed by agent
+- PERFORMS: Agent performs task
+- NEXT: Task A leads to Task B (critical for workflow)
+- PRODUCES: Task produces output
+- DELIVERS_TO: Output delivered to human
+- REQUIRED_BY: Task needs tool to execute (not USES)
+- USES: Agent uses tool for general support
+Return complete, accurate knowledge graphs with proper workflow sequences."""
         # User prompt - specific instructions with few-shot example and data
         user_prompt = f"""Analyze this agent system trace and extract a knowledge graph with the following specifications:
       "raw_prompt": "Verify the accuracy of the provided costs for a daily ticket and a season pass for California's Great America in San Jose for the summer of 2024.",
       "raw_prompt_ref": [{{"line_start": 8, "line_end": 8}}, {{"line_start": 10, "line_end": 10}}, {{"line_start": 11, "line_end": 12}}]
     }},
+    {{
+      "id": "task_002",
+      "type": "Task",
+      "name": "Calculate Savings Amount",
+      "importance": "HIGH",
+      "raw_prompt": "Calculate the amount saved by purchasing a season pass instead of daily tickets for 4 visits.",
+      "raw_prompt_ref": [{{"line_start": 119, "line_end": 126}}]
+    }},
     {{
       "id": "input_001",
       "type": "Input",
     }},
     {{
       "id": "relation_006",
+      "source": "task_001",
+      "target": "task_002",
+      "type": "NEXT",
       "importance": "HIGH",
+      "interaction_prompt": "Verification task leads to arithmetic calculation task",
       "interaction_prompt_ref": [{{"line_start": 164, "line_end": 164}}]
     }},
     {{
       "id": "relation_007",
+      "source": "task_002",
       "target": "agent_004",
+      "type": "REQUIRED_BY",
       "importance": "MEDIUM",
+      "interaction_prompt": "Calculation task requires computer terminal for execution",
       "interaction_prompt_ref": [{{"line_start": 50, "line_end": 55}}]
     }}
   ],
+  "failures": [
+    {{
+      "id": "failure_001",
+      "description": "Verification Expert failed to access real-time pricing data, relying on potentially outdated cost estimates",
+      "raw_text": "However, since I am currently unable to access external websites, I will use the provided cost",
+      "raw_text_ref": [],
+      "affected_id": "agent_002",
+      "risk_type": "RETRIEVAL_ERROR"
+    }}
+  ],
+  "optimizations": [
+    {{
+      "id": "opt_001",
+      "description": "Implement automated price verification system to reduce manual verification overhead and improve accuracy",
+      "raw_text": "Enhanced price verification with real-time data access",
+      "raw_text_ref": [],
+      "affected_ids": ["agent_002", "task_001"],
+      "recommendation_type": "TOOL_ENHANCEMENT"
+    }}
+  ]
 }}
 Now analyze the following trace data:
 Also provide system_name and system_summary for the overall system.
+EXTRACTION FOCUS:
+1. Identify ALL named participants (agents, tools, tasks, inputs, outputs, human)
+2. Create sequential task chains: Task1 NEXT Task2 NEXT Task3
+3. Show tool dependencies: Task REQUIRED_BY Tool (when task needs tool to execute)
+4. Use clean professional naming (no underscores)
+5. Complete workflow: Input CONSUMED_BY Agent PERFORMS Task PRODUCES Output DELIVERS_TO Human
+6. DETECT FAILURES: Look for errors, exceptions, incorrect results, failed executions, incomplete tasks, missing validations
+7. SUGGEST OPTIMIZATIONS: Identify inefficiencies, redundancies, improvement opportunities, missing tools, workflow enhancements
+CRITICAL: Use NEXT for task sequences, REQUIRED_BY for tool dependencies.
+FAILURE DETECTION (look for):
+- Execution errors, exceptions, failed operations
+- Incorrect outputs, wrong calculations, invalid results
+- Missing validations, incomplete processes
+- Tool limitations, access restrictions
+- Agent coordination problems
+OPTIMIZATION OPPORTUNITIES (suggest):
+- Tool enhancements, automation possibilities
+- Workflow simplifications, redundancy removal
+- Agent merging or specialization
+- Performance improvements, efficiency gains
+- Missing capabilities or better approaches
+IMPORTANT: Always provide at least 1-2 failures and optimizations based on trace analysis.
 TRACE DATA:
 {input_data}"""
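
Review note: the new prompt asks the model to use exactly seven relationship types and to always return failures and optimizations. One way to check a returned graph against those rules is sketched below. This is not part of the commit: `check_graph` is a hypothetical helper, and the top-level keys `entities` and `relations` are assumptions, since the diff only shows `failures` and `optimizations` at that level.

```python
from typing import Any

# Relationship types listed in the new system prompt.
ALLOWED_RELATION_TYPES = frozenset({
    "CONSUMED_BY", "PERFORMS", "NEXT", "PRODUCES",
    "DELIVERS_TO", "REQUIRED_BY", "USES",
})


def check_graph(graph: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in an extracted graph.

    Assumes (not confirmed by the diff) that entities live under
    "entities" and relationships under "relations".
    """
    problems: list[str] = []
    entity_ids = {entity["id"] for entity in graph.get("entities", [])}

    for relation in graph.get("relations", []):
        rel_id = relation.get("id", "<no id>")
        # Reject types outside the list given in the prompt.
        if relation.get("type") not in ALLOWED_RELATION_TYPES:
            problems.append(f"{rel_id}: unknown type {relation.get('type')!r}")
        # Both endpoints must refer to an entity that exists.
        for end in ("source", "target"):
            if relation.get(end) not in entity_ids:
                problems.append(
                    f"{rel_id}: {end} {relation.get(end)!r} is not a known entity"
                )

    # The prompt asks for at least one failure and one optimization.
    for key in ("failures", "optimizations"):
        if not graph.get(key):
            problems.append(f"{key}: expected at least one entry")
    return problems


# Sample built from ids in the few-shot example, plus one relation
# that still uses the removed INTERVENES type.
sample = {
    "entities": [{"id": "task_001"}, {"id": "task_002"}, {"id": "agent_004"}],
    "relations": [
        {"id": "relation_006", "source": "task_001", "target": "task_002", "type": "NEXT"},
        {"id": "relation_007", "source": "task_002", "target": "agent_004", "type": "REQUIRED_BY"},
        {"id": "relation_008", "source": "agent_002", "target": "task_001", "type": "INTERVENES"},
    ],
    "failures": [{"id": "failure_001"}],
    "optimizations": [],
}
problems = check_graph(sample)
for problem in problems:
    print(problem)
```

A check like this only catches structural drift. It cannot tell whether a reported failure is grounded in the trace, which matters because the prompt requires failures and optimizations even when the trace may contain none.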