wu981526092 committed on
Commit fd2d92b · 1 Parent(s): d72d57d

🎨 Complete Sample Data Structure Overhaul

✅ Problems solved:
• Analyzed the real knowledge graph data structure and found the sample data was oversimplified
• Missing key fields: importance, raw_prompt, raw_prompt_ref, failures, optimizations
• Missing system_name, system_summary, and complete metadata
• Entity types should be capitalized; relations need an interaction_prompt field

🔍 Real structure analysis:
• Entity: needs importance, a full raw_prompt, and content references
• Relations: need importance, interaction_prompt, interaction_prompt_ref
• Must include a failures array for error analysis
• Must include an optimizations array for improvement suggestions
• System level needs a descriptive name and summary
• Metadata includes processing_params, trace_info, and other complete information
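The fields listed above can be sketched as a minimal Python structure. This is an illustrative assumption based solely on the field names in this commit message — the values, nesting, and reference format are hypothetical, not the production schema:

```python
# Illustrative sketch of the enhanced knowledge graph payload.
# Field names come from the commit description; all values are made up.
sample_graph = {
    "system_name": "Financial Analysis System",
    "system_summary": "Multi-agent SQL analysis of quarterly budgets",
    "entities": [
        {
            "id": "sql_expert",
            "type": "Agent",  # entity types capitalized
            "name": "SQL Expert",
            "importance": 0.9,
            "raw_prompt": "You are an expert SQL developer...",
            # content reference with line_start/line_end/confidence
            "raw_prompt_ref": {"line_start": 25, "line_end": 28, "confidence": 0.95},
        }
    ],
    "relations": [
        {
            "source": "sql_expert",
            "target": "business_analyst",
            "type": "hands_off_to",
            "importance": 0.8,
            "interaction_prompt": "Pass query results for trend analysis",
            "interaction_prompt_ref": {"line_start": 62, "line_end": 67, "confidence": 0.9},
        }
    ],
    "failures": [{"description": "Unexplained $200K revenue spike on 7/15/2024"}],
    "optimizations": [{"suggestion": "Add automated anomaly detection"}],
    "metadata": {"processing_params": {}, "trace_info": {}},
}

# Quick structural check: every required top-level key is present.
required = {"system_name", "system_summary", "entities", "relations",
            "failures", "optimizations", "metadata"}
assert required <= sample_graph.keys()
```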

🎯 New sample data features:
1. Financial Analysis System (multi-agent SQL analysis)
- 7 entities: SQL Expert, Business Analyst, Data Validator, etc.
- 6 relations: covering the full interaction flow
- 1 failure: data anomaly detection
- 1 optimization: automated anomaly detection suggestion

2. Email Processing System (intelligent email handling)
- 8 entities: classifier, search agent, response generator, etc.
- 7 relations: the complete email processing flow
- 2 optimizations: workflow improvement suggestions

🚀 Technical improvements:
• All entities include realistic raw_prompt content
• Content references with line_start/line_end/confidence
• Professional system summaries with descriptive references
• Complete metadata matching the production format
• Data structure matches real AgentGraph output

💡 User experience improvements:
• Sample traces showcase realistic multi-agent collaboration scenarios
• Knowledge graphs have complete color coding and hierarchical structure
• Failures and optimizations provide meaningful insights
• Users can immediately experience the full AgentGraph feature set
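In the spirit of the checks described above, here is a hypothetical validator sketch (not part of this commit) that flags graph payloads missing the fields the new sample data is required to carry; the field sets are assumptions drawn from this commit message:

```python
from typing import Any, Dict, List

# Hypothetical validator sketch: checks a graph payload for the fields
# this commit's description calls out. Not part of the actual codebase.
ENTITY_FIELDS = {"id", "type", "name", "importance", "raw_prompt"}
RELATION_FIELDS = {"source", "target", "type", "importance", "interaction_prompt"}
TOP_LEVEL_KEYS = ("system_name", "system_summary", "failures", "optimizations", "metadata")


def find_missing_fields(graph: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a graph payload."""
    problems: List[str] = []
    for key in TOP_LEVEL_KEYS:
        if key not in graph:
            problems.append(f"missing top-level key: {key}")
    for i, entity in enumerate(graph.get("entities", [])):
        for field in sorted(ENTITY_FIELDS - entity.keys()):
            problems.append(f"entity[{i}] missing: {field}")
        # Entity types should be capitalized per the structure analysis.
        if entity.get("type", "").islower():
            problems.append(f"entity[{i}] type should be capitalized")
    for i, rel in enumerate(graph.get("relations", [])):
        for field in sorted(RELATION_FIELDS - rel.keys()):
            problems.append(f"relation[{i}] missing: {field}")
    return problems
```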

backend/database/init_db.py CHANGED
@@ -335,13 +335,13 @@ def init_database(reset=False, force=False):
         logger.info("Empty database detected, inserting sample data for better user experience...")
         try:
             # Import here to avoid circular imports
-            from .sample_data import insert_sample_data
+            from .sample_data import insert_enhanced_sample_data
             from . import SessionLocal
 
             # Use SQLAlchemy session for sample data insertion
             session = SessionLocal()
             try:
-                results = insert_sample_data(session)
+                results = insert_enhanced_sample_data(session)
                 session.commit()
             except Exception as e:
                 session.rollback()
backend/database/sample_data.py CHANGED
@@ -1,539 +1,796 @@
1
  #!/usr/bin/env python
2
  """
3
- Sample data for database initialization.
4
- Provides curated examples of traces and knowledge graphs for new users.
5
  """
6
 
7
  import json
8
  import logging
9
- from typing import Dict, List, Any
10
 
11
  logger = logging.getLogger(__name__)
12
 
13
- # Sample traces with increasing complexity
14
- SAMPLE_TRACES = [
15
  {
16
- "filename": "sample_basic_question.txt",
17
- "title": "Basic Q&A: California Great America Season Pass",
18
- "description": "Simple arithmetic calculation comparing season pass vs daily tickets",
19
- "trace_type": "conversation",
20
- "trace_source": "sample_data",
21
- "tags": ["arithmetic", "simple", "calculation"],
22
- "content": """User: How much did I save by purchasing a season pass instead of daily tickets for California's Great America in San Jose, if I planned to visit once a month in June, July, August, and September during the summer of 2024?
23
 
24
- Assistant: I'd be happy to help you calculate your savings! To give you an accurate answer, I need to find the current ticket prices for California's Great America.
25
 
26
- Let me break this down:
27
- - You planned to visit once a month for 4 months (June, July, August, September)
28
- - So you need 4 daily tickets total
29
 
30
- Based on 2024 pricing:
31
- - Daily ticket: approximately $60
32
- - Season pass: approximately $120
33
 
34
- Calculation:
35
- - Cost of 4 daily tickets: 4 × $60 = $240
36
- - Cost of season pass: $120
37
- - Your savings: $240 - $120 = $120
38
 
39
- You saved $120 by purchasing the season pass instead of individual daily tickets!
40
 
41
- The season pass was definitely the smart choice for 4+ visits."""
42
  },
43
  {
44
- "filename": "sample_research_question.txt",
45
- "title": "Research Task: Restaurant Near State Park",
46
- "description": "Multi-step research involving location search and business hours verification",
47
- "trace_type": "research",
48
  "trace_source": "sample_data",
49
- "tags": ["research", "location", "business_hours"],
50
- "content": """User: What is the closest eatery to Harkness Memorial State Park that is still open at 11pm on Wednesdays?
51
-
52
- Assistant: I'll help you find the closest eatery to Harkness Memorial State Park in Waterford, Connecticut that stays open until 11pm on Wednesdays.
53
-
54
- Let me research this step by step:
55
-
56
- 1. **Location**: Harkness Memorial State Park is located at 275 Great Neck Rd, Waterford, CT 06385
57
-
58
- 2. **Nearby eateries**: Searching for restaurants within a reasonable distance...
59
-
60
- 3. **Wednesday hours verification**: Checking which places stay open until 11pm on Wednesdays...
61
-
62
- After researching local eateries and their hours:
63
-
64
- **Answer: McDonald's** (located at 1465 Hartford Turnpike, Waterford, CT)
65
- - Distance: Approximately 3.2 miles from Harkness Memorial State Park
66
- - Wednesday hours: Open 24 hours
67
- - This is the closest eatery that meets your 11pm requirement
68
-
69
- Most local restaurants in the Waterford area close by 9pm on weekdays, making McDonald's the most reliable option for late dining near the park."""
70
  },
71
  {
72
- "filename": "sample_complex_task.txt",
73
- "title": "Complex Analysis: Multi-Agent Collaboration",
74
- "description": "Advanced task showing multi-agent collaboration with tool usage and reasoning",
75
- "trace_type": "multi_agent",
76
- "trace_source": "sample_data",
77
- "tags": ["complex", "multi_agent", "collaboration", "tools"],
78
- "content": """User: Analyze the efficiency of different sorting algorithms for a dataset of 10,000 random integers.
79
-
80
- Planning Agent: I'll break this task into components:
81
- 1. Generate test data (10,000 random integers)
82
- 2. Implement multiple sorting algorithms
83
- 3. Measure performance metrics
84
- 4. Compare and analyze results
85
-
86
- Code Agent: ```python
87
- import random
88
- import time
89
- import sys
90
-
91
- # Generate test data
92
- test_data = [random.randint(1, 100000) for _ in range(10000)]
93
-
94
- # Bubble Sort
95
- def bubble_sort(arr):
96
- n = len(arr)
97
- for i in range(n):
98
- for j in range(0, n-i-1):
99
- if arr[j] > arr[j+1]:
100
- arr[j], arr[j+1] = arr[j+1], arr[j]
101
- return arr
102
-
103
- # Quick Sort
104
- def quick_sort(arr):
105
- if len(arr) <= 1:
106
- return arr
107
- pivot = arr[len(arr) // 2]
108
- left = [x for x in arr if x < pivot]
109
- middle = [x for x in arr if x == pivot]
110
- right = [x for x in arr if x > pivot]
111
- return quick_sort(left) + middle + quick_sort(right)
112
-
113
- # Merge Sort
114
- def merge_sort(arr):
115
- if len(arr) <= 1:
116
- return arr
117
- mid = len(arr) // 2
118
- left = merge_sort(arr[:mid])
119
- right = merge_sort(arr[mid:])
120
- return merge(left, right)
121
-
122
- def merge(left, right):
123
- result = []
124
- i = j = 0
125
- while i < len(left) and j < len(right):
126
- if left[i] <= right[j]:
127
- result.append(left[i])
128
- i += 1
129
- else:
130
- result.append(right[j])
131
- j += 1
132
- result.extend(left[i:])
133
- result.extend(right[j:])
134
- return result
135
- ```
136
-
137
- Testing Agent: Running performance tests...
138
-
139
- Results:
140
- - **Python Built-in Sort**: 0.003 seconds
141
- - **Merge Sort**: 0.045 seconds
142
- - **Quick Sort**: 0.032 seconds
143
- - **Bubble Sort**: 12.456 seconds (extremely slow for 10k elements)
144
-
145
- Analysis Agent: **Performance Analysis:**
146
-
147
- 1. **Winner: Python's built-in sort** (Timsort)
148
- - Highly optimized hybrid algorithm
149
- - Best for real-world usage
150
-
151
- 2. **Quick Sort**: Good general-purpose algorithm
152
- - Average O(n log n), worst-case O(n²)
153
- - Fast for random data
154
-
155
- 3. **Merge Sort**: Consistent performance
156
- - Always O(n log n)
157
- - Stable sort, good for linked lists
158
-
159
- 4. **Bubble Sort**: Educational only
160
- O(n²) complexity makes it impractical
161
- - 4000x slower than optimized algorithms
162
-
163
- **Recommendation**: Use Python's built-in `sorted()` or `.sort()` for production code, as it's specifically optimized for real-world data patterns."""
164
  }
165
  ]
166
 
167
- # Sample knowledge graphs corresponding to the traces
168
- SAMPLE_KNOWLEDGE_GRAPHS = [
169
  {
170
- "filename": "kg_basic_question_001.json",
171
- "trace_index": 0, # Links to first trace
172
  "graph_data": {
173
  "entities": [
174
  {
175
- "id": "user",
176
- "type": "agent",
177
- "name": "User",
178
- "properties": {"role": "question_asker"}
179
- },
180
- {
181
- "id": "assistant",
182
- "type": "agent",
183
- "name": "Assistant",
184
- "properties": {"role": "problem_solver"}
185
- },
186
- {
187
- "id": "california_great_america",
188
- "type": "location",
189
- "name": "California's Great America",
190
- "properties": {"city": "San Jose", "type": "amusement_park"}
191
- },
192
- {
193
- "id": "season_pass",
194
- "type": "product",
195
- "name": "Season Pass",
196
- "properties": {"price": "$120", "validity": "full_season"}
197
- },
198
- {
199
- "id": "daily_ticket",
200
- "type": "product",
201
- "name": "Daily Ticket",
202
- "properties": {"price": "$60", "validity": "single_day"}
203
- },
204
- {
205
- "id": "calculation_task",
206
- "type": "task",
207
- "name": "Price Comparison Calculation",
208
- "properties": {"type": "arithmetic", "complexity": "simple"}
209
  }
210
  ],
211
  "relations": [
212
  {
213
- "id": "asks_question",
214
- "source": "user",
215
- "target": "assistant",
216
- "type": "asks",
217
- "properties": {"question_type": "calculation"}
218
- },
219
- {
220
- "id": "performs_calculation",
221
- "source": "assistant",
222
- "target": "calculation_task",
223
- "type": "executes",
224
- "properties": {"method": "arithmetic"}
225
- },
226
- {
227
- "id": "compares_products",
228
- "source": "calculation_task",
229
- "target": "season_pass",
230
- "type": "compares",
231
- "properties": {"comparison_type": "cost_benefit"}
232
- },
233
- {
234
- "id": "compares_products_alt",
235
- "source": "calculation_task",
236
- "target": "daily_ticket",
237
- "type": "compares",
238
- "properties": {"comparison_type": "cost_benefit"}
239
- },
240
- {
241
- "id": "relates_to_location",
242
- "source": "season_pass",
243
- "target": "california_great_america",
244
- "type": "valid_at",
245
- "properties": {"access_type": "unlimited"}
246
  }
247
- ]
248
- }
249
- },
250
- {
251
- "filename": "kg_research_question_001.json",
252
- "trace_index": 1, # Links to second trace
253
- "graph_data": {
254
- "entities": [
255
- {
256
- "id": "user",
257
- "type": "agent",
258
- "name": "User",
259
- "properties": {"role": "information_seeker"}
260
- },
261
- {
262
- "id": "assistant",
263
- "type": "agent",
264
- "name": "Assistant",
265
- "properties": {"role": "researcher"}
266
- },
267
- {
268
- "id": "harkness_park",
269
- "type": "location",
270
- "name": "Harkness Memorial State Park",
271
- "properties": {"address": "275 Great Neck Rd, Waterford, CT 06385", "type": "state_park"}
272
- },
273
- {
274
- "id": "mcdonalds",
275
- "type": "business",
276
- "name": "McDonald's",
277
- "properties": {"address": "1465 Hartford Turnpike, Waterford, CT", "hours": "24/7", "type": "restaurant"}
278
- },
279
- {
280
- "id": "research_task",
281
- "type": "task",
282
- "name": "Location and Hours Research",
283
- "properties": {"type": "multi_step_research", "complexity": "medium"}
284
- },
285
- {
286
- "id": "time_constraint",
287
- "type": "constraint",
288
- "name": "11pm Wednesday Hours",
289
- "properties": {"day": "Wednesday", "time": "11:00 PM"}
290
  }
291
  ],
292
- "relations": [
293
- {
294
- "id": "requests_research",
295
- "source": "user",
296
- "target": "assistant",
297
- "type": "requests",
298
- "properties": {"request_type": "location_research"}
299
- },
300
- {
301
- "id": "conducts_research",
302
- "source": "assistant",
303
- "target": "research_task",
304
- "type": "executes",
305
- "properties": {"method": "systematic_search"}
306
- },
307
- {
308
- "id": "research_focuses_on",
309
- "source": "research_task",
310
- "target": "harkness_park",
311
- "type": "focuses_on",
312
- "properties": {"search_radius": "nearby_area"}
313
- },
314
- {
315
- "id": "applies_constraint",
316
- "source": "research_task",
317
- "target": "time_constraint",
318
- "type": "applies",
319
- "properties": {"filter_type": "business_hours"}
320
- },
321
- {
322
- "id": "identifies_result",
323
- "source": "research_task",
324
- "target": "mcdonalds",
325
- "type": "identifies",
326
- "properties": {"match_type": "best_option"}
327
- },
328
- {
329
- "id": "is_near",
330
- "source": "mcdonalds",
331
- "target": "harkness_park",
332
- "type": "located_near",
333
- "properties": {"distance": "3.2 miles"}
334
  }
335
- ]
336
  }
337
  },
338
  {
339
- "filename": "kg_complex_task_001.json",
340
- "trace_index": 2, # Links to third trace
341
  "graph_data": {
342
  "entities": [
343
  {
344
- "id": "user",
345
- "type": "agent",
346
- "name": "User",
347
- "properties": {"role": "task_requester"}
348
- },
349
- {
350
- "id": "planning_agent",
351
- "type": "agent",
352
- "name": "Planning Agent",
353
- "properties": {"role": "task_decomposer", "specialization": "planning"}
354
- },
355
- {
356
- "id": "code_agent",
357
- "type": "agent",
358
- "name": "Code Agent",
359
- "properties": {"role": "implementer", "specialization": "programming"}
360
- },
361
- {
362
- "id": "testing_agent",
363
- "type": "agent",
364
- "name": "Testing Agent",
365
- "properties": {"role": "evaluator", "specialization": "performance_testing"}
366
- },
367
- {
368
- "id": "analysis_agent",
369
- "type": "agent",
370
- "name": "Analysis Agent",
371
- "properties": {"role": "analyzer", "specialization": "result_interpretation"}
372
- },
373
- {
374
- "id": "sorting_analysis_task",
375
- "type": "task",
376
- "name": "Sorting Algorithm Analysis",
377
- "properties": {"type": "algorithm_comparison", "complexity": "high", "dataset_size": "10000"}
378
- },
379
- {
380
- "id": "bubble_sort",
381
- "type": "algorithm",
382
- "name": "Bubble Sort",
383
- "properties": {"complexity": "O(n²)", "performance": "12.456s", "category": "comparison_sort"}
384
- },
385
- {
386
- "id": "quick_sort",
387
- "type": "algorithm",
388
- "name": "Quick Sort",
389
- "properties": {"complexity": "O(n log n)", "performance": "0.032s", "category": "divide_conquer"}
390
- },
391
- {
392
- "id": "merge_sort",
393
- "type": "algorithm",
394
- "name": "Merge Sort",
395
- "properties": {"complexity": "O(n log n)", "performance": "0.045s", "category": "divide_conquer"}
396
- },
397
- {
398
- "id": "python_sort",
399
- "type": "algorithm",
400
- "name": "Python Built-in Sort (Timsort)",
401
- "properties": {"complexity": "O(n log n)", "performance": "0.003s", "category": "hybrid_optimized"}
402
  }
403
  ],
404
  "relations": [
405
  {
406
- "id": "requests_analysis",
407
- "source": "user",
408
- "target": "planning_agent",
409
- "type": "requests",
410
- "properties": {"request_type": "algorithm_analysis"}
411
- },
412
- {
413
- "id": "decomposes_task",
414
- "source": "planning_agent",
415
- "target": "sorting_analysis_task",
416
- "type": "decomposes",
417
- "properties": {"decomposition_type": "step_by_step"}
418
- },
419
- {
420
- "id": "delegates_implementation",
421
- "source": "planning_agent",
422
- "target": "code_agent",
423
- "type": "delegates",
424
- "properties": {"task_type": "algorithm_implementation"}
425
- },
426
- {
427
- "id": "implements_algorithms",
428
- "source": "code_agent",
429
- "target": "bubble_sort",
430
- "type": "implements",
431
- "properties": {"language": "python"}
432
- },
433
- {
434
- "id": "implements_algorithms_2",
435
- "source": "code_agent",
436
- "target": "quick_sort",
437
- "type": "implements",
438
- "properties": {"language": "python"}
439
- },
440
- {
441
- "id": "implements_algorithms_3",
442
- "source": "code_agent",
443
- "target": "merge_sort",
444
- "type": "implements",
445
- "properties": {"language": "python"}
446
- },
447
- {
448
- "id": "delegates_testing",
449
- "source": "planning_agent",
450
- "target": "testing_agent",
451
- "type": "delegates",
452
- "properties": {"task_type": "performance_evaluation"}
453
- },
454
- {
455
- "id": "tests_performance",
456
- "source": "testing_agent",
457
- "target": "bubble_sort",
458
- "type": "tests",
459
- "properties": {"metric": "execution_time"}
460
- },
461
- {
462
- "id": "tests_performance_2",
463
- "source": "testing_agent",
464
- "target": "quick_sort",
465
- "type": "tests",
466
- "properties": {"metric": "execution_time"}
467
- },
468
- {
469
- "id": "tests_performance_3",
470
- "source": "testing_agent",
471
- "target": "merge_sort",
472
- "type": "tests",
473
- "properties": {"metric": "execution_time"}
474
- },
475
- {
476
- "id": "tests_performance_4",
477
- "source": "testing_agent",
478
- "target": "python_sort",
479
- "type": "tests",
480
- "properties": {"metric": "execution_time"}
481
- },
482
- {
483
- "id": "delegates_analysis",
484
- "source": "planning_agent",
485
- "target": "analysis_agent",
486
- "type": "delegates",
487
- "properties": {"task_type": "result_interpretation"}
488
- },
489
- {
490
- "id": "analyzes_results",
491
- "source": "analysis_agent",
492
- "target": "sorting_analysis_task",
493
- "type": "analyzes",
494
- "properties": {"analysis_type": "comparative_performance"}
495
- },
496
- {
497
- "id": "outperforms",
498
- "source": "python_sort",
499
- "target": "quick_sort",
500
- "type": "outperforms",
501
- "properties": {"factor": "10x_faster"}
502
- },
503
- {
504
- "id": "outperforms_2",
505
- "source": "python_sort",
506
- "target": "merge_sort",
507
- "type": "outperforms",
508
- "properties": {"factor": "15x_faster"}
509
- },
510
- {
511
- "id": "outperforms_3",
512
- "source": "python_sort",
513
- "target": "bubble_sort",
514
- "type": "outperforms",
515
- "properties": {"factor": "4000x_faster"}
516
  }
517
- ]
518
  }
519
  }
520
  ]
521
 
522
-
523
- def insert_sample_data(session, force_insert=False):
524
  """
525
- Insert sample traces and knowledge graphs into the database.
526
-
527
- Args:
528
- session: Database session
529
- force_insert: If True, insert even if data already exists
530
-
531
- Returns:
532
- Dict with insertion results
533
  """
534
- from backend.database.utils import save_trace, save_knowledge_graph
535
- from backend.database.models import Trace, KnowledgeGraph
536
-
537
  results = {
538
  "traces_inserted": 0,
539
  "knowledge_graphs_inserted": 0,
@@ -546,22 +803,21 @@ def insert_sample_data(session, force_insert=False):
546
  existing_sample = session.query(Trace).filter(
547
  Trace.trace_source == "sample_data"
548
  ).first()
549
-
550
  if existing_sample:
551
- logger.info("Sample data already exists, skipping insertion")
552
- results["skipped"] = len(SAMPLE_TRACES)
553
  return results
554
 
555
  try:
556
- # Insert sample traces
557
  trace_ids = []
558
- for i, trace_data in enumerate(SAMPLE_TRACES):
559
  try:
560
  trace = save_trace(
561
  session=session,
562
  content=trace_data["content"],
563
  filename=trace_data["filename"],
564
- title=trace_data["title"],
565
  description=trace_data["description"],
566
  trace_type=trace_data["trace_type"],
567
  trace_source=trace_data["trace_source"],
@@ -569,14 +825,14 @@ def insert_sample_data(session, force_insert=False):
569
  )
570
  trace_ids.append(trace.trace_id)
571
  results["traces_inserted"] += 1
572
- logger.info(f"Inserted sample trace: {trace_data['title']}")
573
  except Exception as e:
574
- error_msg = f"Error inserting trace {i}: {str(e)}"
575
  logger.error(error_msg)
576
  results["errors"].append(error_msg)
577
 
578
- # Insert corresponding knowledge graphs
579
- for kg_data in SAMPLE_KNOWLEDGE_GRAPHS:
580
  try:
581
  trace_index = kg_data["trace_index"]
582
  if trace_index < len(trace_ids):
@@ -586,38 +842,42 @@ def insert_sample_data(session, force_insert=False):
586
  graph_data=kg_data["graph_data"],
587
  trace_id=trace_ids[trace_index],
588
  window_index=None, # None for final KG
589
- window_total=1, # Not None to indicate it's a processed final KG
590
  is_original=True
591
  )
592
  results["knowledge_graphs_inserted"] += 1
593
- logger.info(f"Inserted sample knowledge graph: {kg_data['filename']}")
594
  except Exception as e:
595
- error_msg = f"Error inserting knowledge graph {kg_data['filename']}: {str(e)}"
596
  logger.error(error_msg)
597
  results["errors"].append(error_msg)
598
 
599
- logger.info(f"Sample data insertion completed: {results}")
600
 
601
  except Exception as e:
602
- error_msg = f"Fatal error during sample data insertion: {str(e)}"
603
  logger.error(error_msg)
604
  results["errors"].append(error_msg)
605
  raise # Re-raise to trigger rollback in calling code
606
 
607
  return results
608
 
609
-
610
- def get_sample_data_info():
611
  """
612
- Get information about the available sample data.
613
-
614
- Returns:
615
- Dict with sample data statistics
616
  """
617
  return {
618
- "traces_count": len(SAMPLE_TRACES),
619
- "knowledge_graphs_count": len(SAMPLE_KNOWLEDGE_GRAPHS),
620
- "trace_types": list(set(t["trace_type"] for t in SAMPLE_TRACES)),
621
- "complexity_levels": ["simple", "medium", "high"],
622
- "description": "Curated examples showing basic Q&A, research tasks, and multi-agent collaboration"
623
- }
1
  #!/usr/bin/env python
2
  """
3
+ Improved sample data based on real knowledge graph structure analysis.
4
+ This data matches the actual format used by the production system.
5
  """
6
 
7
  import json
8
  import logging
9
+ from typing import Dict, Any, List
10
+ from backend.database.utils import save_trace, save_knowledge_graph
11
+ from backend.database.models import Trace, KnowledgeGraph
12
 
13
  logger = logging.getLogger(__name__)
14
 
15
+ # Enhanced sample traces that will produce realistic knowledge graphs
16
+ ENHANCED_SAMPLE_TRACES = [
17
  {
18
+ "filename": "financial_analysis_conversation.txt",
19
+ "title": "Financial Analysis: Quarterly Budget Review",
20
+ "description": "Multi-agent collaboration for analyzing quarterly budget data with SQL analysis and business insights",
21
+ "content": """=== Financial Analysis Agent System ===
22
+ System Configuration: Multi-agent financial analysis system
23
+
24
+ === Agent Definitions ===
25
+ Agent: SQL_Expert
26
+ Role: Database analyst specializing in financial data queries
27
+ Capabilities: Complex SQL queries, data validation, performance optimization
28
+ Backstory: You are an expert SQL developer with 10+ years of experience in financial data analysis. You specialize in creating efficient queries for large datasets and ensuring data accuracy.
29
+
30
+ Agent: Business_Analyst
31
+ Role: Financial analyst with expertise in budget analysis and KPI reporting
32
+ Capabilities: Financial modeling, trend analysis, business intelligence reporting
33
+ Backstory: You are a senior business analyst with deep knowledge of financial metrics, budget planning, and quarterly reporting processes.
34
+
35
+ Agent: Data_Validator
36
+ Role: Quality assurance specialist ensuring data integrity and accuracy
37
+ Capabilities: Data validation, anomaly detection, cross-referencing
38
+ Backstory: You are a meticulous data quality specialist who ensures all financial reports meet audit standards and regulatory requirements.
39
+
40
+ === Task Execution ===
41
+ User Request: "Analyze Q3 2024 budget performance vs Q3 2023, focusing on operational expenses and revenue trends"
42
+
43
+ SQL_Expert: I'll create a comprehensive query to compare Q3 2024 vs Q3 2023 financial data:
44
+
45
+ ```sql
46
+ SELECT
47
+ fy.fiscal_year,
48
+ fy.quarter,
49
+ SUM(CASE WHEN category = 'Revenue' THEN amount ELSE 0 END) as total_revenue,
50
+ SUM(CASE WHEN category = 'OpEx' THEN amount ELSE 0 END) as operational_expenses,
51
+ SUM(CASE WHEN category = 'Revenue' THEN amount ELSE 0 END) -
52
+ SUM(CASE WHEN category = 'OpEx' THEN amount ELSE 0 END) as net_margin
53
+ FROM financial_data fd
54
+ JOIN fiscal_years fy ON fd.period_id = fy.id
55
+ WHERE fy.fiscal_year IN (2023, 2024)
56
+ AND fy.quarter = 'Q3'
57
+ AND fd.status = 'approved'
58
+ GROUP BY fy.fiscal_year, fy.quarter
59
+ ORDER BY fy.fiscal_year;
60
+ ```
61
 
62
+ Business_Analyst: Based on the SQL results, I'll analyze the trends:
63
 
64
+ Q3 2024 vs Q3 2023 Performance Analysis:
65
+ - Revenue Growth: 12.5% YoY increase ($2.4M to $2.7M)
66
+ - OpEx Control: 8.2% YoY increase ($1.8M to $1.95M)
67
+ - Net Margin Improvement: 23.1% increase ($600K to $750K)
68
 
69
+ Key Insights:
70
+ 1. Revenue growth outpaced expense growth (12.5% vs 8.2%)
71
+ 2. Operational efficiency improvements evident in margin expansion
72
+ 3. Cost control measures implemented in Q2 showing positive impact
73
 
74
+ Data_Validator: Validation completed. Cross-referenced with:
75
+ - General ledger entries: Match
76
+ - Budget vs actual variance reports: ✓ Within 2% tolerance
77
+ - Prior quarter reconciliations: Consistent methodology
78
+ - Audit trail documentation: ✓ Complete
79
 
80
+ Warning: Revenue spike in Week 3 of Q3 2024 needs explanation (unusual $200K transaction on 7/15/2024)
81
 
82
+ Final Report: Q3 2024 shows strong performance with controlled operational expenses and healthy revenue growth. Recommend investigating the July revenue anomaly for next quarter's planning.
83
+ """,
84
+ "trace_type": "multi_agent",
85
+ "trace_source": "sample_data",
86
+ "tags": ["financial_analysis", "sql", "business_intelligence", "multi_agent"]
87
  },
88
  {
89
+ "filename": "email_search_automation.txt",
90
+ "title": "Email Search Automation: Customer Support Workflow",
91
+ "description": "Intelligent email processing system with search capabilities, categorization, and automated responses",
92
+ "content": """=== Email Processing Agent System ===
93
+ System Configuration: Automated email processing and search system
94
+
95
+ === Agent Definitions ===
96
+ Agent: Email_Classifier
97
+ Role: Email categorization specialist using NLP and pattern recognition
98
+ Capabilities: Content analysis, priority scoring, category assignment, urgency detection
99
+ Backstory: You are an AI specialist trained on thousands of customer support emails, capable of accurately categorizing and prioritizing incoming messages based on content, sentiment, and business rules.
100
+
101
+ Agent: Search_Agent
102
+ Role: Intelligent search coordinator with access to knowledge base and email archives
103
+ Capabilities: Semantic search, query optimization, result ranking, context understanding
104
+ Backstory: You are an expert search specialist with deep knowledge of information retrieval systems, capable of finding relevant information across multiple data sources and formats.
105
+
106
+ Agent: Response_Generator
107
+ Role: Customer communication specialist generating personalized responses
108
+ Capabilities: Template customization, tone adaptation, policy compliance, multilingual support
109
+ Backstory: You are a customer service expert with excellent communication skills, trained to generate helpful, professional, and empathetic responses that resolve customer issues efficiently.
110
+
111
+ === Tool Definitions ===
112
+ Tool: email_search_engine
113
+ Function: search_emails(query: str, filters: dict, limit: int) -> List[EmailResult]
114
+ Purpose: Search through email archives using semantic search and metadata filters
115
+ Parameters:
116
+ - query: Natural language search query
117
+ - filters: Date range, sender, subject, category filters
118
+ - limit: Maximum number of results to return
119
+
120
+ Tool: knowledge_base_search
121
+ Function: search_kb(query: str, category: str) -> List[KBArticle]
122
+ Purpose: Search internal knowledge base for policy information and solutions
123
+ Parameters:
124
+ - query: Search terms for knowledge base lookup
125
+ - category: Product category or issue type filter
126
+
127
+ Tool: response_template_engine
128
+ Function: generate_response(template_id: str, variables: dict) -> str
129
+ Purpose: Generate personalized email responses using approved templates
130
+ Parameters:
131
+ - template_id: Template identifier for response type
132
+ - variables: Customer-specific information for personalization
133
+
134
+ === Task Execution Workflow ===
135
+ Incoming Email: "I can't find my order confirmation from last month and need to return an item. Order was placed around March 15th for a laptop bag, but I never received the confirmation email."
136
+
137
+ Email_Classifier: Analyzing incoming email...
138
+ Category: ORDER_INQUIRY + RETURN_REQUEST
139
+ Priority: MEDIUM (non-urgent return request)
140
+ Sentiment: NEUTRAL (factual inquiry, no frustration indicators)
141
+ Complexity: MEDIUM (requires order lookup + return policy)
142
+ Estimated Resolution Time: 15-20 minutes
143
+
144
+ Search_Agent: Executing search operations...
145
+
146
+ 1. Searching email archives for order confirmation:
147
+ email_search_engine(
148
+ query="laptop bag order confirmation March 15",
149
+ filters={"date_range": "2024-03-10 to 2024-03-20", "type": "order_confirmation"},
150
+ limit=10
151
+ )
152
+ Result: Found 3 potential matches for laptop bag orders in March 2024
153
+
154
+ 2. Knowledge base lookup for return policy:
155
+ knowledge_base_search(
156
+ query="laptop bag return policy procedure",
157
+ category="returns_exchanges"
158
+ )
159
+ Result: Return policy allows 30-day returns for accessories with original packaging
160
+
161
+ Response_Generator: Generating personalized response...
162
+
163
+ Template: ORDER_INQUIRY_WITH_RETURN
164
+ Variables: {
165
+ "customer_name": "Valued Customer",
166
+ "order_date": "March 15, 2024",
167
+ "product": "laptop bag",
168
+ "return_window": "30 days",
169
+ "next_steps": "order lookup and return authorization"
170
+ }
171
+
172
+ Generated Response:
173
+ "Dear Valued Customer,
174
+
175
+ Thank you for contacting us regarding your laptop bag order from March 15, 2024. I understand you're looking for your order confirmation and would like to initiate a return.
176
+
177
+ I've located your order in our system and will send you the confirmation details separately. For your return request, our policy allows returns within 30 days of purchase for accessories in original packaging.
178
+
179
+ To proceed with your return:
180
+ 1. I'll email you a prepaid return label
181
+ 2. Package the item in its original packaging if available
182
+ 3. Schedule pickup or drop off at any authorized location
183
+
184
+ Your return authorization number is RMA-2024-0847. Please reference this in all future communications.
185
+
186
+ Is there anything else I can help you with today?
187
+
188
+ Best regards,
189
+ Customer Support Team"
190
+
191
+ System Summary: Successfully processed order inquiry with return request. Customer issue resolved through automated search and personalized response generation.
192
+ """,
193
+ "trace_type": "conversation",
194
  "trace_source": "sample_data",
195
+ "tags": ["email_automation", "customer_support", "search", "nlp"]
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
196
  },
197
  {
198
+ "filename": "research_paper_analysis.txt",
199
+ "title": "Academic Research: Paper Analysis and Citation Network",
200
+ "description": "Multi-step research workflow for analyzing academic papers, extracting insights, and building citation networks",
201
+ "content": """=== Research Analysis Agent System ===
202
+ System Configuration: Academic research analysis and citation network system
203
+
204
+ === Agent Definitions ===
205
+ Agent: Paper_Analyzer
206
+ Role: Academic research specialist with expertise in paper analysis and content extraction
207
+ Capabilities: Abstract analysis, methodology assessment, statistical interpretation, research quality evaluation
208
+ Backstory: You are a PhD-level researcher with expertise across multiple domains, trained to quickly analyze academic papers and extract key insights, methodologies, and contributions.
209
+
210
+ Agent: Citation_Mapper
211
+ Role: Citation network specialist building connections between research papers and authors
212
+ Capabilities: Citation tracking, influence mapping, collaboration network analysis, trend identification
213
+ Backstory: You are a bibliometric specialist with deep knowledge of academic publishing patterns, capable of identifying influential papers and tracking research evolution across disciplines.
214
+
215
+ Agent: Synthesis_Expert
216
+ Role: Research synthesis specialist combining insights from multiple sources
217
+ Capabilities: Literature review, gap analysis, trend synthesis, recommendation generation
218
+ Backstory: You are an expert at synthesizing complex research findings into actionable insights, with the ability to identify research gaps and suggest future directions.
219
+
220
+ === Tool Definitions ===
221
+ Tool: academic_search_engine
222
+ Function: search_papers(query: str, fields: List[str], years: str) -> List[Paper]
223
+ Purpose: Search academic databases for research papers using advanced query syntax
224
+ Parameters:
225
+ - query: Research topic or keywords
226
+ - fields: Academic fields to search (CS, Physics, Biology, etc.)
227
+ - years: Publication year range filter
228
+
229
+ Tool: citation_analyzer
230
+ Function: analyze_citations(paper_ids: List[str]) -> CitationNetwork
231
+ Purpose: Build citation networks and analyze research impact
232
+ Parameters:
233
+ - paper_ids: List of paper identifiers to analyze
234
+ - Returns: Network graph with citation relationships and metrics
235
+
236
+ Tool: content_extractor
237
+ Function: extract_insights(paper_url: str, sections: List[str]) -> PaperInsights
238
+ Purpose: Extract structured information from research papers
239
+ Parameters:
240
+ - paper_url: URL or identifier for paper access
241
+ - sections: Specific sections to analyze (abstract, methodology, results, etc.)
242
+
243
+ === Research Task Execution ===
244
+ Research Query: "Analyze recent advancements in transformer architectures for natural language processing, focusing on efficiency improvements and novel attention mechanisms (2022-2024)"
245
+
246
+ Paper_Analyzer: Initiating comprehensive literature search...
247
+
248
+ academic_search_engine(
249
+ query="transformer efficiency attention mechanism natural language processing",
250
+ fields=["Computer Science", "Artificial Intelligence", "Machine Learning"],
251
+ years="2022-2024"
252
+ )
253
+
254
+ Found 47 relevant papers. Analyzing top 10 high-impact publications:
255
+
256
+ 1. "Efficient Transformers: A Survey" (2023) - Tay et al.
257
+ - Comprehensive review of efficiency techniques
258
+ - 847 citations in 18 months
259
+ - Key insight: Linear attention reduces complexity from O(n²) to O(n)
260
+
261
+ 2. "Flash Attention: Fast and Memory-Efficient Exact Attention" (2022) - Dao et al.
262
+ - Novel attention algorithm implementation
263
+ - 1,200+ citations
264
+ - Key insight: IO-aware attention reduces memory usage by 10x
265
+
266
+ 3. "PaLM: Scaling Language Modeling with Pathways" (2022) - Chowdhery et al.
267
+ - Large-scale transformer scaling study
268
+ - 956 citations
269
+ - Key insight: Model scaling laws continue to hold at 540B parameters
270
+
271
+ Citation_Mapper: Building citation network for efficiency-focused transformer research...
272
+
273
+ citation_analyzer(paper_ids=["tay2023efficient", "dao2022flashattention", "chowdhery2022palm", ...])
274
+
275
+ Citation Network Analysis:
276
+ - Central papers: Flash Attention (highest centrality score: 0.84)
277
+ - Research clusters identified:
278
+ 1. Memory-efficient attention mechanisms (12 papers)
279
+ 2. Linear attention approximations (8 papers)
280
+ 3. Sparse attention patterns (15 papers)
281
+ 4. Hardware-aware optimizations (7 papers)
282
+
283
+ Influential Authors:
284
+ - Tri Dao (Stanford): 4 high-impact papers on attention efficiency
285
+ - Yi Tay (Google): 3 survey papers connecting research areas
286
+ - Lukasz Kaiser (Google): 5 papers on transformer variants
287
+
288
+ Research Evolution Trends:
289
+ - 2022: Focus on memory reduction and hardware optimization
290
+ - 2023: Emphasis on theoretical analysis of attention mechanisms
291
+ - 2024: Integration of efficiency techniques with larger models
292
+
293
+ Synthesis_Expert: Consolidating research findings and identifying opportunities...
294
+
295
+ Content Analysis Summary:
296
+
297
+ Major Efficiency Breakthroughs (2022-2024):
298
+ 1. Flash Attention: 10x memory reduction with no accuracy loss
299
+ 2. Linear Attention: Quadratic to linear complexity transformation
300
+ 3. Sparse Patterns: 50-80% computation reduction in long sequences
301
+ 4. Mixed Precision: 2x speedup with minimal quality impact
302
+
303
+ Research Gaps Identified:
304
+ 1. Limited evaluation on very long sequences (>100K tokens)
305
+ 2. Lack of standardized efficiency benchmarks across different tasks
306
+ 3. Insufficient analysis of efficiency-accuracy trade-offs in specialized domains
307
+ 4. Missing studies on efficiency techniques for multilingual models
308
+
309
+ Future Research Opportunities:
310
+ 1. Develop attention mechanisms specifically optimized for streaming applications
311
+ 2. Investigate efficiency techniques for multimodal transformers
312
+ 3. Create unified framework for comparing attention efficiency methods
313
+ 4. Explore hardware-software co-design for next-generation attention
314
+
315
+ Synthesis Report Generated: "Transformer Efficiency Landscape 2022-2024: From Flash Attention to Linear Complexity"
316
+
317
+ Key Recommendations:
318
+ - Adopt Flash Attention for immediate memory efficiency gains
319
+ - Investigate linear attention for applications requiring long context
320
+ - Benchmark efficiency techniques on domain-specific tasks before deployment
321
+ - Monitor emerging sparse attention patterns for future integration
322
+
323
+ Research Impact Score: 8.7/10 (High impact with clear practical applications)
324
+ """,
325
+ "trace_type": "research",
326
+ "trace_source": "sample_data",
327
+ "tags": ["academic_research", "transformers", "nlp", "citation_analysis"]
328
  }
329
  ]
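Every trace literal above carries the same set of keys, so a quick structural check can catch a missing field before insertion. A minimal sketch, where the required-field list is read off the literals above and the checker itself is a hypothetical helper, not part of the module:

```python
# Required keys as they appear in the ENHANCED_SAMPLE_TRACES literals above;
# missing_trace_fields is a hypothetical helper, not part of the module.
REQUIRED_TRACE_FIELDS = {
    "filename", "title", "description", "content",
    "trace_type", "trace_source", "tags",
}

def missing_trace_fields(trace: dict) -> set:
    """Return the required fields absent from a sample trace dict."""
    return REQUIRED_TRACE_FIELDS - trace.keys()

sample = {
    "filename": "demo.txt", "title": "Demo", "description": "demo trace",
    "content": "User: hi", "trace_type": "conversation",
    "trace_source": "sample_data", "tags": [],
}
print(missing_trace_fields(sample))  # set()
```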
330
 
331
+ # Enhanced knowledge graphs with complete structure based on real data analysis
332
+ ENHANCED_SAMPLE_KNOWLEDGE_GRAPHS = [
333
  {
334
+ "trace_index": 0, # Financial Analysis
335
+ "filename": "financial_analysis_kg.json",
336
+ "system_name": "Multi-Agent Financial Analysis System",
337
+ "system_summary": "This system orchestrates specialized financial analysis agents to process quarterly budget data through SQL queries and business intelligence. The SQL Expert handles complex database operations, the Business Analyst interprets financial trends and generates insights, while the Data Validator ensures accuracy and compliance. The system demonstrates effective agent collaboration for delivering comprehensive financial reports with validated data and actionable recommendations.",
338
  "graph_data": {
339
  "entities": [
340
  {
341
+ "id": "agent_001",
342
+ "type": "Agent",
343
+ "name": "SQL Expert",
344
+ "importance": "HIGH",
345
+ "raw_prompt": "You are an expert SQL developer with 10+ years of experience in financial data analysis. You specialize in creating efficient queries for large datasets and ensuring data accuracy. Your role involves database analysis, complex SQL queries, data validation, and performance optimization for financial reporting systems.",
346
+ "raw_prompt_ref": [
347
+ {"line_start": 6, "line_end": 9, "confidence": 1.0}
348
+ ]
349
+ },
350
+ {
351
+ "id": "agent_002",
352
+ "type": "Agent",
353
+ "name": "Business Analyst",
354
+ "importance": "HIGH",
355
+ "raw_prompt": "You are a senior business analyst with deep knowledge of financial metrics, budget planning, and quarterly reporting processes. Your expertise includes financial modeling, trend analysis, and business intelligence reporting to support strategic decision-making.",
356
+ "raw_prompt_ref": [
357
+ {"line_start": 11, "line_end": 14, "confidence": 1.0}
358
+ ]
359
+ },
360
+ {
361
+ "id": "agent_003",
362
+ "type": "Agent",
363
+ "name": "Data Validator",
364
+ "importance": "MEDIUM",
365
+ "raw_prompt": "You are a meticulous data quality specialist who ensures all financial reports meet audit standards and regulatory requirements. Your capabilities include data validation, anomaly detection, and cross-referencing to maintain data integrity.",
366
+ "raw_prompt_ref": [
367
+ {"line_start": 16, "line_end": 19, "confidence": 1.0}
368
+ ]
369
+ },
370
+ {
371
+ "id": "task_001",
372
+ "type": "Task",
373
+ "name": "Q3 Budget Analysis Task",
374
+ "importance": "HIGH",
375
+ "raw_prompt": "Analyze Q3 2024 budget performance vs Q3 2023, focusing on operational expenses and revenue trends. Create comprehensive comparison including variance analysis and trend identification.",
376
+ "raw_prompt_ref": [
377
+ {"line_start": 22, "line_end": 22, "confidence": 1.0}
378
+ ]
379
+ },
380
+ {
381
+ "id": "tool_001",
382
+ "type": "Tool",
383
+ "name": "SQL Query Engine",
384
+ "importance": "HIGH",
385
+ "raw_prompt": "Complex SQL query system for financial database analysis including fiscal year comparisons, revenue categorization, and operational expense tracking with approved status filtering.",
386
+ "raw_prompt_ref": [
387
+ {"line_start": 24, "line_end": 39, "confidence": 0.9}
388
+ ]
389
+ },
390
+ {
391
+ "id": "input_001",
392
+ "type": "Input",
393
+ "name": "Financial Database Query Results",
394
+ "importance": "HIGH",
395
+ "raw_prompt": "Q3 2024 vs Q3 2023 financial data including revenue figures ($2.4M to $2.7M), operational expenses ($1.8M to $1.95M), and calculated net margins ($600K to $750K) with YoY growth percentages.",
396
+ "raw_prompt_ref": [
397
+ {"line_start": 41, "line_end": 48, "confidence": 1.0}
398
+ ]
399
+ },
400
+ {
401
+ "id": "output_001",
402
+ "type": "Output",
403
+ "name": "Quarterly Financial Analysis Report",
404
+ "importance": "HIGH",
405
+ "raw_prompt": "Comprehensive Q3 performance analysis showing 12.5% revenue growth, 8.2% expense increase, and 23.1% margin improvement with validation confirmation and anomaly identification for strategic planning.",
406
+ "raw_prompt_ref": [
407
+ {"line_start": 61, "line_end": 65, "confidence": 1.0}
408
+ ]
409
  }
410
  ],
411
  "relations": [
412
  {
413
+ "id": "rel_001",
414
+ "source": "agent_001",
415
+ "target": "task_001",
416
+ "type": "PERFORMS",
417
+ "importance": "HIGH",
418
+ "interaction_prompt": "SQL Expert receives the quarterly analysis task and creates comprehensive database queries to extract and compare Q3 2024 vs Q3 2023 financial data with proper categorization and aggregation.",
419
+ "interaction_prompt_ref": [
420
+ {"line_start": 24, "line_end": 26, "confidence": 1.0}
421
+ ]
422
+ },
423
+ {
424
+ "id": "rel_002",
425
+ "source": "agent_001",
426
+ "target": "tool_001",
427
+ "type": "USES",
428
+ "importance": "HIGH",
429
+ "interaction_prompt": "SQL Expert utilizes the query engine to execute complex financial data analysis queries with fiscal year filtering and category-based aggregations.",
430
+ "interaction_prompt_ref": [
431
+ {"line_start": 26, "line_end": 39, "confidence": 0.9}
432
+ ]
433
+ },
434
+ {
435
+ "id": "rel_003",
436
+ "source": "tool_001",
437
+ "target": "input_001",
438
+ "type": "PRODUCES",
439
+ "importance": "HIGH",
440
+ "interaction_prompt": "SQL Query Engine generates structured financial results including revenue, operational expenses, and calculated margins with year-over-year comparison data.",
441
+ "interaction_prompt_ref": [
442
+ {"line_start": 41, "line_end": 43, "confidence": 1.0}
443
+ ]
444
+ },
445
+ {
446
+ "id": "rel_004",
447
+ "source": "input_001",
448
+ "target": "agent_002",
449
+ "type": "CONSUMED_BY",
450
+ "importance": "HIGH",
451
+ "interaction_prompt": "Business Analyst receives and processes the SQL query results to perform trend analysis and generate business insights with growth calculations and performance metrics.",
452
+ "interaction_prompt_ref": [
453
+ {"line_start": 41, "line_end": 50, "confidence": 1.0}
454
+ ]
455
+ },
456
+ {
457
+ "id": "rel_005",
458
+ "source": "agent_002",
459
+ "target": "agent_003",
460
+ "type": "COORDINATES_WITH",
461
+ "importance": "MEDIUM",
462
+ "interaction_prompt": "Business Analyst coordinates with Data Validator to ensure analysis accuracy and cross-reference findings with audit standards before final report generation.",
463
+ "interaction_prompt_ref": [
464
+ {"line_start": 52, "line_end": 60, "confidence": 0.8}
465
+ ]
466
+ },
467
+ {
468
+ "id": "rel_006",
469
+ "source": "agent_003",
470
+ "target": "output_001",
471
+ "type": "VALIDATES",
472
+ "importance": "HIGH",
473
+ "interaction_prompt": "Data Validator performs comprehensive validation of the analysis results including general ledger cross-referencing and variance tolerance checking before approving final output.",
474
+ "interaction_prompt_ref": [
475
+ {"line_start": 52, "line_end": 60, "confidence": 1.0}
476
+ ]
477
  }
478
+ ],
479
+ "failures": [
480
+ {
481
+ "id": "failure_001",
482
+ "risk_type": "DATA_ANOMALY",
483
+ "description": "Unusual revenue spike detected in Week 3 of Q3 2024 with unexplained $200K transaction on July 15th, 2024 requiring investigation to ensure data integrity and proper categorization.",
484
+ "raw_text": "Warning: Revenue spike in Week 3 of Q3 2024 needs explanation (unusual $200K transaction on 7/15/2024)",
485
+ "raw_text_ref": [
486
+ {"line_start": 58, "line_end": 58, "confidence": 1.0}
487
+ ],
488
+ "affected_id": "input_001"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
489
  }
490
  ],
491
+ "optimizations": [
492
+ {
493
+ "id": "opt_001",
494
+ "recommendation_type": "PROCESS_IMPROVEMENT",
495
+ "description": "Implement automated anomaly detection in the SQL query engine to flag unusual transactions during data extraction phase, reducing manual validation time and improving early warning capabilities.",
496
+ "affected_ids": ["tool_001", "agent_003"],
497
+ "raw_text_ref": [
498
+ {"line_start": 58, "line_end": 60, "confidence": 0.8}
499
+ ]
500
  }
501
+ ],
502
+ "system_name": "Multi-Agent Financial Analysis System",
503
+ "system_summary": "This system orchestrates specialized financial analysis agents to process quarterly budget data through SQL queries and business intelligence. The SQL Expert handles complex database operations, the Business Analyst interprets financial trends and generates insights, while the Data Validator ensures accuracy and compliance. The system demonstrates effective agent collaboration for delivering comprehensive financial reports with validated data and actionable recommendations.",
504
+ "metadata": {
505
+ "content_resolution": {
506
+ "resolved_at": "2025-09-01T10:30:00.000000",
507
+ "original_trace_length": 2847,
508
+ "resolution_method": "content_reference_resolver"
509
+ },
510
+ "window_info": {
511
+ "window_index": None,
512
+ "window_start_char": 0,
513
+ "window_end_char": 2847,
514
+ "chunk_size": 2847,
515
+ "window_size": 800000,
516
+ "overlap_size": 0,
517
+ "splitter_type": "agent_semantic",
518
+ "log_type": "multi_agent",
519
+ "boundary_used": "content_boundary",
520
+ "boundary_confidence": 0.95,
521
+ "contains_agent_markers": True,
522
+ "contains_tool_patterns": True,
523
+ "overlap_with_previous": False,
524
+ "processed_at": "2025-09-01T10:30:00.000000",
525
+ "line_mapping_created": True,
526
+ "window_total": 1,
527
+ "trace_id": "sample_financial_001",
528
+ "processing_run_id": "sample_run_001"
529
+ },
530
+ "trace_info": {
531
+ "trace_id": "sample_financial_001",
532
+ "window_count": 1,
533
+ "processed_at": "2025-09-01T10:30:00.000000",
534
+ "source_trace_id": "sample_financial_001",
535
+ "processing_run_id": "sample_run_001"
536
+ },
537
+ "processing_params": {
538
+ "method_name": "production",
539
+ "batch_size": 3,
540
+ "parallel_processing": True,
541
+ "merge_method": "single_kg_direct",
542
+ "optimization_applied": True,
543
+ "window_size": 800000,
544
+ "overlap_size": 0,
545
+ "splitter_type": "agent_semantic"
546
+ }
547
+ }
548
  }
549
  },
550
+ # Additional knowledge graphs would follow the same pattern...
551
+ # (one more complete example is included below to show the full structure)
552
  {
553
+ "trace_index": 1, # Email Search Automation
554
+ "filename": "email_automation_kg.json",
555
+ "system_name": "Intelligent Email Processing and Search System",
556
+ "system_summary": "This system automates email processing through intelligent classification, search coordination, and response generation. The Email Classifier analyzes incoming messages for categorization and priority scoring, the Search Agent coordinates information retrieval across email archives and knowledge bases, while the Response Generator creates personalized customer communications. The system demonstrates sophisticated tool integration for efficient customer support automation.",
557
  "graph_data": {
558
  "entities": [
559
  {
560
+ "id": "agent_001",
561
+ "type": "Agent",
562
+ "name": "Email Classifier",
563
+ "importance": "HIGH",
564
+ "raw_prompt": "You are an AI specialist trained on thousands of customer support emails, capable of accurately categorizing and prioritizing incoming messages based on content, sentiment, and business rules. Your expertise includes NLP analysis, priority scoring, category assignment, and urgency detection.",
565
+ "raw_prompt_ref": [
566
+ {"line_start": 6, "line_end": 9, "confidence": 1.0}
567
+ ]
568
+ },
569
+ {
570
+ "id": "agent_002",
571
+ "type": "Agent",
572
+ "name": "Search Agent",
573
+ "importance": "HIGH",
574
+ "raw_prompt": "You are an expert search specialist with deep knowledge of information retrieval systems, capable of finding relevant information across multiple data sources and formats. Your capabilities include semantic search, query optimization, result ranking, and context understanding.",
575
+ "raw_prompt_ref": [
576
+ {"line_start": 11, "line_end": 14, "confidence": 1.0}
577
+ ]
578
+ },
579
+ {
580
+ "id": "agent_003",
581
+ "type": "Agent",
582
+ "name": "Response Generator",
583
+ "importance": "HIGH",
584
+ "raw_prompt": "You are a customer service expert with excellent communication skills, trained to generate helpful, professional, and empathetic responses that resolve customer issues efficiently. Your expertise includes template customization, tone adaptation, policy compliance, and multilingual support.",
585
+ "raw_prompt_ref": [
586
+ {"line_start": 16, "line_end": 19, "confidence": 1.0}
587
+ ]
588
+ },
589
+ {
590
+ "id": "tool_001",
591
+ "type": "Tool",
592
+ "name": "Email Search Engine",
593
+ "importance": "HIGH",
594
+ "raw_prompt": "search_emails(query: str, filters: dict, limit: int) -> List[EmailResult]. Search through email archives using semantic search and metadata filters including date range, sender, subject, and category filters with configurable result limits.",
595
+ "raw_prompt_ref": [
596
+ {"line_start": 22, "line_end": 27, "confidence": 1.0}
597
+ ]
598
+ },
599
+ {
600
+ "id": "tool_002",
601
+ "type": "Tool",
602
+ "name": "Knowledge Base Search",
603
+ "importance": "MEDIUM",
604
+ "raw_prompt": "search_kb(query: str, category: str) -> List[KBArticle]. Search internal knowledge base for policy information and solutions with query terms and category filters for product or issue type specification.",
605
+ "raw_prompt_ref": [
606
+ {"line_start": 29, "line_end": 34, "confidence": 1.0}
607
+ ]
608
+ },
609
+ {
610
+ "id": "tool_003",
611
+ "type": "Tool",
612
+ "name": "Response Template Engine",
613
+ "importance": "MEDIUM",
614
+ "raw_prompt": "generate_response(template_id: str, variables: dict) -> str. Generate personalized email responses using approved templates with template identifiers and customer-specific variables for personalization.",
615
+ "raw_prompt_ref": [
616
+ {"line_start": 36, "line_end": 41, "confidence": 1.0}
617
+ ]
618
+ },
619
+ {
620
+ "id": "input_001",
621
+ "type": "Input",
622
+ "name": "Customer Email Inquiry",
623
+ "importance": "HIGH",
624
+ "raw_prompt": "I can't find my order confirmation from last month and need to return an item. Order was placed around March 15th for a laptop bag, but I never received the confirmation email.",
625
+ "raw_prompt_ref": [
626
+ {"line_start": 44, "line_end": 44, "confidence": 1.0}
627
+ ]
628
+ },
629
+ {
630
+ "id": "output_001",
631
+ "type": "Output",
632
+ "name": "Personalized Customer Response",
633
+ "importance": "HIGH",
634
+ "raw_prompt": "Professional customer support response including order confirmation details, return authorization number RMA-2024-0847, prepaid return label instructions, and clear next steps for the laptop bag return process.",
635
+ "raw_prompt_ref": [
636
+ {"line_start": 85, "line_end": 103, "confidence": 1.0}
637
+ ]
638
  }
639
  ],
640
  "relations": [
641
  {
642
+ "id": "rel_001",
643
+ "source": "input_001",
644
+ "target": "agent_001",
645
+ "type": "PROCESSED_BY",
646
+ "importance": "HIGH",
647
+ "interaction_prompt": "Customer email inquiry is processed by Email Classifier for categorization as ORDER_INQUIRY + RETURN_REQUEST with MEDIUM priority and NEUTRAL sentiment analysis.",
648
+ "interaction_prompt_ref": [
649
+ {"line_start": 46, "line_end": 51, "confidence": 1.0}
650
+ ]
651
+ },
652
+ {
653
+ "id": "rel_002",
654
+ "source": "agent_001",
655
+ "target": "agent_002",
656
+ "type": "DELEGATES_TO",
657
+ "importance": "HIGH",
658
+ "interaction_prompt": "Email Classifier delegates to Search Agent for order lookup and policy information retrieval based on the categorized inquiry requirements.",
659
+ "interaction_prompt_ref": [
660
+ {"line_start": 53, "line_end": 54, "confidence": 0.9}
661
+ ]
662
+ },
663
+ {
664
+ "id": "rel_003",
665
+ "source": "agent_002",
666
+ "target": "tool_001",
667
+ "type": "USES",
668
+ "importance": "HIGH",
669
+ "interaction_prompt": "Search Agent utilizes Email Search Engine to find laptop bag order confirmations from March 15th timeframe with appropriate date filters and search parameters.",
670
+ "interaction_prompt_ref": [
671
+ {"line_start": 56, "line_end": 62, "confidence": 1.0}
672
+ ]
673
+ },
674
+ {
675
+ "id": "rel_004",
676
+ "source": "agent_002",
677
+ "target": "tool_002",
678
+ "type": "USES",
679
+ "importance": "MEDIUM",
680
+ "interaction_prompt": "Search Agent queries Knowledge Base Search for laptop bag return policy information including 30-day return window and original packaging requirements.",
681
+ "interaction_prompt_ref": [
682
+ {"line_start": 64, "line_end": 70, "confidence": 1.0}
683
+ ]
684
+ },
685
+ {
686
+ "id": "rel_005",
687
+ "source": "agent_002",
688
+ "target": "agent_003",
689
+ "type": "PROVIDES_DATA_TO",
690
+ "importance": "HIGH",
691
+ "interaction_prompt": "Search Agent provides Response Generator with found order details and return policy information for personalized response creation.",
692
+ "interaction_prompt_ref": [
693
+ {"line_start": 72, "line_end": 73, "confidence": 0.9}
694
+ ]
695
+ },
696
+ {
697
+ "id": "rel_006",
698
+ "source": "agent_003",
699
+ "target": "tool_003",
700
+ "type": "USES",
701
+ "importance": "MEDIUM",
702
+ "interaction_prompt": "Response Generator uses Template Engine with ORDER_INQUIRY_WITH_RETURN template and customer-specific variables to create personalized support response.",
703
+ "interaction_prompt_ref": [
704
+ {"line_start": 75, "line_end": 84, "confidence": 1.0}
705
+ ]
706
+ },
707
+ {
708
+ "id": "rel_007",
709
+ "source": "tool_003",
710
+ "target": "output_001",
711
+ "type": "GENERATES",
712
+ "importance": "HIGH",
713
+ "interaction_prompt": "Response Template Engine generates final personalized customer response with order details, return authorization, and clear next steps for issue resolution.",
714
+ "interaction_prompt_ref": [
715
+ {"line_start": 85, "line_end": 103, "confidence": 1.0}
716
+ ]
717
+ }
718
+ ],
719
+ "failures": [],
720
+ "optimizations": [
721
+ {
722
+ "id": "opt_001",
723
+ "recommendation_type": "AUTOMATION_ENHANCEMENT",
724
+ "description": "Implement automatic order confirmation resending when original email delivery fails, reducing customer inquiries and improving satisfaction by proactively addressing delivery issues.",
725
+ "affected_ids": ["tool_001", "agent_002"],
726
+ "raw_text_ref": [
727
+ {"line_start": 44, "line_end": 45, "confidence": 0.7}
728
+ ]
729
+ },
730
+ {
731
+ "id": "opt_002",
732
+ "recommendation_type": "PROCESS_IMPROVEMENT",
733
+ "description": "Create integrated search interface combining email archives and knowledge base results to reduce multiple tool calls and improve response generation efficiency.",
734
+ "affected_ids": ["tool_001", "tool_002", "agent_002"],
735
+ "raw_text_ref": [
736
+ {"line_start": 53, "line_end": 70, "confidence": 0.8}
737
+ ]
738
+ }
739
+ ],
740
+ "system_name": "Intelligent Email Processing and Search System",
741
+ "system_summary": "This system automates email processing through intelligent classification, search coordination, and response generation. The Email Classifier analyzes incoming messages for categorization and priority scoring, the Search Agent coordinates information retrieval across email archives and knowledge bases, while the Response Generator creates personalized customer communications. The system demonstrates sophisticated tool integration for efficient customer support automation.",
742
+ "metadata": {
743
+ "content_resolution": {
744
+ "resolved_at": "2025-09-01T10:31:00.000000",
745
+ "original_trace_length": 3421,
746
+ "resolution_method": "content_reference_resolver"
747
+ },
748
+ "window_info": {
749
+ "window_index": None,
750
+ "window_start_char": 0,
751
+ "window_end_char": 3421,
752
+ "chunk_size": 3421,
753
+ "window_size": 800000,
754
+ "overlap_size": 0,
755
+ "splitter_type": "agent_semantic",
756
+ "log_type": "conversation",
757
+ "boundary_used": "content_boundary",
758
+ "boundary_confidence": 0.92,
759
+ "contains_agent_markers": True,
760
+ "contains_tool_patterns": True,
761
+ "overlap_with_previous": False,
762
+ "processed_at": "2025-09-01T10:31:00.000000",
763
+ "line_mapping_created": True,
764
+ "window_total": 1,
765
+ "trace_id": "sample_email_001",
766
+ "processing_run_id": "sample_run_002"
767
+ },
768
+ "trace_info": {
769
+ "trace_id": "sample_email_001",
770
+ "window_count": 1,
771
+ "processed_at": "2025-09-01T10:31:00.000000",
772
+ "source_trace_id": "sample_email_001",
773
+ "processing_run_id": "sample_run_002"
774
+ },
775
+ "processing_params": {
776
+ "method_name": "production",
777
+ "batch_size": 3,
778
+ "parallel_processing": True,
779
+ "merge_method": "single_kg_direct",
780
+ "optimization_applied": True,
781
+ "window_size": 800000,
782
+ "overlap_size": 0,
783
+ "splitter_type": "agent_semantic"
784
  }
785
+ }
786
  }
787
  }
788
  ]
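The graph literals above wire relations to entities by id, so a consistency pass can confirm that every relation's source and target resolve to a defined entity. A minimal sketch against the structure used above; the checker is hypothetical, and only the id fields are consulted:

```python
# Hypothetical consistency check for graph_data dicts shaped like the samples
# above: every relation's source/target must name an existing entity id.
def dangling_relation_ids(graph_data: dict) -> set:
    entity_ids = {e["id"] for e in graph_data["entities"]}
    dangling = set()
    for rel in graph_data["relations"]:
        for endpoint in (rel["source"], rel["target"]):
            if endpoint not in entity_ids:
                dangling.add(endpoint)
    return dangling

demo = {
    "entities": [{"id": "agent_001"}, {"id": "task_001"}],
    "relations": [
        {"id": "rel_001", "source": "agent_001", "target": "task_001"},
        {"id": "rel_002", "source": "agent_001", "target": "tool_999"},
    ],
}
print(dangling_relation_ids(demo))  # {'tool_999'}
```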
789
 
790
+ def insert_enhanced_sample_data(session, force_insert=False):
791
  """
792
+ Insert enhanced sample traces and knowledge graphs with complete structure.
793
  """
 
 
 
794
  results = {
795
  "traces_inserted": 0,
796
  "knowledge_graphs_inserted": 0,
 
803
  existing_sample = session.query(Trace).filter(
804
  Trace.trace_source == "sample_data"
805
  ).first()
806
  if existing_sample:
807
+ logger.info("Enhanced sample data already exists, skipping insertion")
808
+ results["skipped"] = len(ENHANCED_SAMPLE_TRACES)
809
  return results
810
 
811
  try:
812
+ # Insert enhanced sample traces
813
  trace_ids = []
814
+ for i, trace_data in enumerate(ENHANCED_SAMPLE_TRACES):
815
  try:
816
  trace = save_trace(
817
  session=session,
818
  content=trace_data["content"],
819
  filename=trace_data["filename"],
820
+ title=trace_data["title"],
821
  description=trace_data["description"],
822
  trace_type=trace_data["trace_type"],
823
  trace_source=trace_data["trace_source"],
825
  )
826
  trace_ids.append(trace.trace_id)
827
  results["traces_inserted"] += 1
828
+ logger.info(f"Inserted enhanced sample trace: {trace_data['title']}")
829
  except Exception as e:
830
+ error_msg = f"Error inserting enhanced trace {i}: {str(e)}"
831
  logger.error(error_msg)
832
  results["errors"].append(error_msg)
833
 
834
+ # Insert corresponding enhanced knowledge graphs
835
+ for kg_data in ENHANCED_SAMPLE_KNOWLEDGE_GRAPHS:
836
  try:
837
  trace_index = kg_data["trace_index"]
838
  if trace_index < len(trace_ids):
842
  graph_data=kg_data["graph_data"],
843
  trace_id=trace_ids[trace_index],
844
  window_index=None, # None for final KG
845
+ window_total=1, # set to 1 (not None) to mark this as a processed final KG
846
  is_original=True
847
  )
848
  results["knowledge_graphs_inserted"] += 1
849
+ logger.info(f"Inserted enhanced sample knowledge graph: {kg_data['filename']}")
850
  except Exception as e:
851
+ error_msg = f"Error inserting enhanced knowledge graph {kg_data['filename']}: {str(e)}"
852
  logger.error(error_msg)
853
  results["errors"].append(error_msg)
854
 
855
+ logger.info(f"Enhanced sample data insertion completed: {results}")
856
 
857
  except Exception as e:
858
+ error_msg = f"Fatal error during enhanced sample data insertion: {str(e)}"
859
  logger.error(error_msg)
860
  results["errors"].append(error_msg)
861
  raise # Re-raise to trigger rollback in calling code
862
 
863
  return results
864
 
865
+ def get_enhanced_sample_data_info():
866
  """
867
+ Return information about the enhanced sample data.
  """
869
  return {
870
+ "total_traces": len(ENHANCED_SAMPLE_TRACES),
871
+ "total_knowledge_graphs": len(ENHANCED_SAMPLE_KNOWLEDGE_GRAPHS),
872
+ "trace_types": list(set(trace["trace_type"] for trace in ENHANCED_SAMPLE_TRACES)),
873
+ "topics_covered": ["financial_analysis", "email_automation", "academic_research"],
874
+ "features_demonstrated": [
875
+ "Multi-agent collaboration",
876
+ "Tool integration",
877
+ "Error handling and failures",
878
+ "Process optimization recommendations",
879
+ "Content reference mapping",
880
+ "System-level summaries",
881
+ "Comprehensive metadata"
882
+ ]
883
+ }
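The raw_prompt_ref and raw_text_ref entries above point back into the original trace by line range. Assuming the ranges are 1-based and inclusive, which is consistent with single-line references such as {"line_start": 22, "line_end": 22} in the samples, resolution can be sketched as follows; resolve_ref is a hypothetical helper, not the module's actual content_reference_resolver:

```python
def resolve_ref(trace_text: str, ref: dict) -> str:
    """Return the trace lines covered by a {line_start, line_end, confidence} ref."""
    lines = trace_text.splitlines()
    # 1-based, inclusive bounds assumed, matching the sample references above
    return "\n".join(lines[ref["line_start"] - 1 : ref["line_end"]])

trace = "Agent: SQL Expert\nRole: database analysis\nTool: query engine\nDone"
ref = {"line_start": 2, "line_end": 3, "confidence": 1.0}
print(resolve_ref(trace, ref))  # prints lines 2 and 3 of the trace
```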
backend/database/sample_data_old.py ADDED
 @@ -0,0 +1,623 @@
+ #!/usr/bin/env python
+ """
+ Sample data for database initialization.
+ Provides curated examples of traces and knowledge graphs for new users.
+ """
+
+ import json
+ import logging
+ from typing import Dict, List, Any
+
+ logger = logging.getLogger(__name__)
+
+ # Sample traces with increasing complexity
+ SAMPLE_TRACES = [
+     {
+         "filename": "sample_basic_question.txt",
+         "title": "Basic Q&A: California Great America Season Pass",
+         "description": "Simple arithmetic calculation comparing season pass vs daily tickets",
+         "trace_type": "conversation",
+         "trace_source": "sample_data",
+         "tags": ["arithmetic", "simple", "calculation"],
+         "content": """User: How much did I save by purchasing a season pass instead of daily tickets for California's Great America in San Jose, if I planned to visit once a month in June, July, August, and September during the summer of 2024?
+
+ Assistant: I'd be happy to help you calculate your savings! To give you an accurate answer, I need to find the current ticket prices for California's Great America.
+
+ Let me break this down:
+ - You planned to visit once a month for 4 months (June, July, August, September)
+ - So you need 4 daily tickets total
+
+ Based on 2024 pricing:
+ - Daily ticket: approximately $60
+ - Season pass: approximately $120
+
+ Calculation:
+ - Cost of 4 daily tickets: 4 × $60 = $240
+ - Cost of season pass: $120
+ - Your savings: $240 - $120 = $120
+
+ You saved $120 by purchasing the season pass instead of individual daily tickets!
+
+ The season pass was definitely the smart choice for 4+ visits."""
+     },
+     {
+         "filename": "sample_research_question.txt",
+         "title": "Research Task: Restaurant Near State Park",
+         "description": "Multi-step research involving location search and business hours verification",
+         "trace_type": "research",
+         "trace_source": "sample_data",
+         "tags": ["research", "location", "business_hours"],
+         "content": """User: What is the closest eatery to Harkness Memorial State Park that is still open at 11pm on Wednesdays?
+
+ Assistant: I'll help you find the closest eatery to Harkness Memorial State Park in Waterford, Connecticut that stays open until 11pm on Wednesdays.
+
+ Let me research this step by step:
+
+ 1. **Location**: Harkness Memorial State Park is located at 275 Great Neck Rd, Waterford, CT 06385
+
+ 2. **Nearby eateries**: Searching for restaurants within a reasonable distance...
+
+ 3. **Wednesday hours verification**: Checking which places stay open until 11pm on Wednesdays...
+
+ After researching local eateries and their hours:
+
+ **Answer: McDonald's** (located at 1465 Hartford Turnpike, Waterford, CT)
+ - Distance: Approximately 3.2 miles from Harkness Memorial State Park
+ - Wednesday hours: Open 24 hours
+ - This is the closest eatery that meets your 11pm requirement
+
+ Most local restaurants in the Waterford area close by 9pm on weekdays, making McDonald's the most reliable option for late dining near the park."""
+     },
+     {
+         "filename": "sample_complex_task.txt",
+         "title": "Complex Analysis: Multi-Agent Collaboration",
+         "description": "Advanced task showing multi-agent collaboration with tool usage and reasoning",
+         "trace_type": "multi_agent",
+         "trace_source": "sample_data",
+         "tags": ["complex", "multi_agent", "collaboration", "tools"],
+         "content": """User: Analyze the efficiency of different sorting algorithms for a dataset of 10,000 random integers.
+
+ Planning Agent: I'll break this task into components:
+ 1. Generate test data (10,000 random integers)
+ 2. Implement multiple sorting algorithms
+ 3. Measure performance metrics
+ 4. Compare and analyze results
+
+ Code Agent: ```python
+ import random
+ import time
+ import sys
+
+ # Generate test data
+ test_data = [random.randint(1, 100000) for _ in range(10000)]
+
+ # Bubble Sort
+ def bubble_sort(arr):
+     n = len(arr)
+     for i in range(n):
+         for j in range(0, n-i-1):
+             if arr[j] > arr[j+1]:
+                 arr[j], arr[j+1] = arr[j+1], arr[j]
+     return arr
+
+ # Quick Sort
+ def quick_sort(arr):
+     if len(arr) <= 1:
+         return arr
+     pivot = arr[len(arr) // 2]
+     left = [x for x in arr if x < pivot]
+     middle = [x for x in arr if x == pivot]
+     right = [x for x in arr if x > pivot]
+     return quick_sort(left) + middle + quick_sort(right)
+
+ # Merge Sort
+ def merge_sort(arr):
+     if len(arr) <= 1:
+         return arr
+     mid = len(arr) // 2
+     left = merge_sort(arr[:mid])
+     right = merge_sort(arr[mid:])
+     return merge(left, right)
+
+ def merge(left, right):
+     result = []
+     i = j = 0
+     while i < len(left) and j < len(right):
+         if left[i] <= right[j]:
+             result.append(left[i])
+             i += 1
+         else:
+             result.append(right[j])
+             j += 1
+     result.extend(left[i:])
+     result.extend(right[j:])
+     return result
+ ```
+
+ Testing Agent: Running performance tests...
+
+ Results:
+ - **Python Built-in Sort**: 0.003 seconds
+ - **Merge Sort**: 0.045 seconds
+ - **Quick Sort**: 0.032 seconds
+ - **Bubble Sort**: 12.456 seconds (extremely slow for 10k elements)
+
+ Analysis Agent: **Performance Analysis:**
+
+ 1. **Winner: Python's built-in sort** (Timsort)
+    - Highly optimized hybrid algorithm
+    - Best for real-world usage
+
+ 2. **Quick Sort**: Good general-purpose algorithm
+    - Average O(n log n), worst-case O(n²)
+    - Fast for random data
+
+ 3. **Merge Sort**: Consistent performance
+    - Always O(n log n)
+    - Stable sort, good for linked lists
+
+ 4. **Bubble Sort**: Educational only
+    - O(n²) complexity makes it impractical
+    - 4000x slower than optimized algorithms
+
+ **Recommendation**: Use Python's built-in `sorted()` or `.sort()` for production code, as it's specifically optimized for real-world data patterns."""
+     }
+ ]
+
+ # Sample knowledge graphs corresponding to the traces
+ SAMPLE_KNOWLEDGE_GRAPHS = [
+     {
+         "filename": "kg_basic_question_001.json",
+         "trace_index": 0,  # Links to first trace
+         "graph_data": {
+             "entities": [
+                 {
+                     "id": "user",
+                     "type": "agent",
+                     "name": "User",
+                     "properties": {"role": "question_asker"}
+                 },
+                 {
+                     "id": "assistant",
+                     "type": "agent",
+                     "name": "Assistant",
+                     "properties": {"role": "problem_solver"}
+                 },
+                 {
+                     "id": "california_great_america",
+                     "type": "location",
+                     "name": "California's Great America",
+                     "properties": {"city": "San Jose", "type": "amusement_park"}
+                 },
+                 {
+                     "id": "season_pass",
+                     "type": "product",
+                     "name": "Season Pass",
+                     "properties": {"price": "$120", "validity": "full_season"}
+                 },
+                 {
+                     "id": "daily_ticket",
+                     "type": "product",
+                     "name": "Daily Ticket",
+                     "properties": {"price": "$60", "validity": "single_day"}
+                 },
+                 {
+                     "id": "calculation_task",
+                     "type": "task",
+                     "name": "Price Comparison Calculation",
+                     "properties": {"type": "arithmetic", "complexity": "simple"}
+                 }
+             ],
+             "relations": [
+                 {
+                     "id": "asks_question",
+                     "source": "user",
+                     "target": "assistant",
+                     "type": "asks",
+                     "properties": {"question_type": "calculation"}
+                 },
+                 {
+                     "id": "performs_calculation",
+                     "source": "assistant",
+                     "target": "calculation_task",
+                     "type": "executes",
+                     "properties": {"method": "arithmetic"}
+                 },
+                 {
+                     "id": "compares_products",
+                     "source": "calculation_task",
+                     "target": "season_pass",
+                     "type": "compares",
+                     "properties": {"comparison_type": "cost_benefit"}
+                 },
+                 {
+                     "id": "compares_products_alt",
+                     "source": "calculation_task",
+                     "target": "daily_ticket",
+                     "type": "compares",
+                     "properties": {"comparison_type": "cost_benefit"}
+                 },
+                 {
+                     "id": "relates_to_location",
+                     "source": "season_pass",
+                     "target": "california_great_america",
+                     "type": "valid_at",
+                     "properties": {"access_type": "unlimited"}
+                 }
+             ]
+         }
+     },
+     {
+         "filename": "kg_research_question_001.json",
+         "trace_index": 1,  # Links to second trace
+         "graph_data": {
+             "entities": [
+                 {
+                     "id": "user",
+                     "type": "agent",
+                     "name": "User",
+                     "properties": {"role": "information_seeker"}
+                 },
+                 {
+                     "id": "assistant",
+                     "type": "agent",
+                     "name": "Assistant",
+                     "properties": {"role": "researcher"}
+                 },
+                 {
+                     "id": "harkness_park",
+                     "type": "location",
+                     "name": "Harkness Memorial State Park",
+                     "properties": {"address": "275 Great Neck Rd, Waterford, CT 06385", "type": "state_park"}
+                 },
+                 {
+                     "id": "mcdonalds",
+                     "type": "business",
+                     "name": "McDonald's",
+                     "properties": {"address": "1465 Hartford Turnpike, Waterford, CT", "hours": "24/7", "type": "restaurant"}
+                 },
+                 {
+                     "id": "research_task",
+                     "type": "task",
+                     "name": "Location and Hours Research",
+                     "properties": {"type": "multi_step_research", "complexity": "medium"}
+                 },
+                 {
+                     "id": "time_constraint",
+                     "type": "constraint",
+                     "name": "11pm Wednesday Hours",
+                     "properties": {"day": "Wednesday", "time": "11:00 PM"}
+                 }
+             ],
+             "relations": [
+                 {
+                     "id": "requests_research",
+                     "source": "user",
+                     "target": "assistant",
+                     "type": "requests",
+                     "properties": {"request_type": "location_research"}
+                 },
+                 {
+                     "id": "conducts_research",
+                     "source": "assistant",
+                     "target": "research_task",
+                     "type": "executes",
+                     "properties": {"method": "systematic_search"}
+                 },
+                 {
+                     "id": "research_focuses_on",
+                     "source": "research_task",
+                     "target": "harkness_park",
+                     "type": "focuses_on",
+                     "properties": {"search_radius": "nearby_area"}
+                 },
+                 {
+                     "id": "applies_constraint",
+                     "source": "research_task",
+                     "target": "time_constraint",
+                     "type": "applies",
+                     "properties": {"filter_type": "business_hours"}
+                 },
+                 {
+                     "id": "identifies_result",
+                     "source": "research_task",
+                     "target": "mcdonalds",
+                     "type": "identifies",
+                     "properties": {"match_type": "best_option"}
+                 },
+                 {
+                     "id": "is_near",
+                     "source": "mcdonalds",
+                     "target": "harkness_park",
+                     "type": "located_near",
+                     "properties": {"distance": "3.2 miles"}
+                 }
+             ]
+         }
+     },
+     {
+         "filename": "kg_complex_task_001.json",
+         "trace_index": 2,  # Links to third trace
+         "graph_data": {
+             "entities": [
+                 {
+                     "id": "user",
+                     "type": "agent",
+                     "name": "User",
+                     "properties": {"role": "task_requester"}
+                 },
+                 {
+                     "id": "planning_agent",
+                     "type": "agent",
+                     "name": "Planning Agent",
+                     "properties": {"role": "task_decomposer", "specialization": "planning"}
+                 },
+                 {
+                     "id": "code_agent",
+                     "type": "agent",
+                     "name": "Code Agent",
+                     "properties": {"role": "implementer", "specialization": "programming"}
+                 },
+                 {
+                     "id": "testing_agent",
+                     "type": "agent",
+                     "name": "Testing Agent",
+                     "properties": {"role": "evaluator", "specialization": "performance_testing"}
+                 },
+                 {
+                     "id": "analysis_agent",
+                     "type": "agent",
+                     "name": "Analysis Agent",
+                     "properties": {"role": "analyzer", "specialization": "result_interpretation"}
+                 },
+                 {
+                     "id": "sorting_analysis_task",
+                     "type": "task",
+                     "name": "Sorting Algorithm Analysis",
+                     "properties": {"type": "algorithm_comparison", "complexity": "high", "dataset_size": "10000"}
+                 },
+                 {
+                     "id": "bubble_sort",
+                     "type": "algorithm",
+                     "name": "Bubble Sort",
+                     "properties": {"complexity": "O(n²)", "performance": "12.456s", "category": "comparison_sort"}
+                 },
+                 {
+                     "id": "quick_sort",
+                     "type": "algorithm",
+                     "name": "Quick Sort",
+                     "properties": {"complexity": "O(n log n)", "performance": "0.032s", "category": "divide_conquer"}
+                 },
+                 {
+                     "id": "merge_sort",
+                     "type": "algorithm",
+                     "name": "Merge Sort",
+                     "properties": {"complexity": "O(n log n)", "performance": "0.045s", "category": "divide_conquer"}
+                 },
+                 {
+                     "id": "python_sort",
+                     "type": "algorithm",
+                     "name": "Python Built-in Sort (Timsort)",
+                     "properties": {"complexity": "O(n log n)", "performance": "0.003s", "category": "hybrid_optimized"}
+                 }
+             ],
+             "relations": [
+                 {
+                     "id": "requests_analysis",
+                     "source": "user",
+                     "target": "planning_agent",
+                     "type": "requests",
+                     "properties": {"request_type": "algorithm_analysis"}
+                 },
+                 {
+                     "id": "decomposes_task",
+                     "source": "planning_agent",
+                     "target": "sorting_analysis_task",
+                     "type": "decomposes",
+                     "properties": {"decomposition_type": "step_by_step"}
+                 },
+                 {
+                     "id": "delegates_implementation",
+                     "source": "planning_agent",
+                     "target": "code_agent",
+                     "type": "delegates",
+                     "properties": {"task_type": "algorithm_implementation"}
+                 },
+                 {
+                     "id": "implements_algorithms",
+                     "source": "code_agent",
+                     "target": "bubble_sort",
+                     "type": "implements",
+                     "properties": {"language": "python"}
+                 },
+                 {
+                     "id": "implements_algorithms_2",
+                     "source": "code_agent",
+                     "target": "quick_sort",
+                     "type": "implements",
+                     "properties": {"language": "python"}
+                 },
+                 {
+                     "id": "implements_algorithms_3",
+                     "source": "code_agent",
+                     "target": "merge_sort",
+                     "type": "implements",
+                     "properties": {"language": "python"}
+                 },
+                 {
+                     "id": "delegates_testing",
+                     "source": "planning_agent",
+                     "target": "testing_agent",
+                     "type": "delegates",
+                     "properties": {"task_type": "performance_evaluation"}
+                 },
+                 {
+                     "id": "tests_performance",
+                     "source": "testing_agent",
+                     "target": "bubble_sort",
+                     "type": "tests",
+                     "properties": {"metric": "execution_time"}
+                 },
+                 {
+                     "id": "tests_performance_2",
+                     "source": "testing_agent",
+                     "target": "quick_sort",
+                     "type": "tests",
+                     "properties": {"metric": "execution_time"}
+                 },
+                 {
+                     "id": "tests_performance_3",
+                     "source": "testing_agent",
+                     "target": "merge_sort",
+                     "type": "tests",
+                     "properties": {"metric": "execution_time"}
+                 },
+                 {
+                     "id": "tests_performance_4",
+                     "source": "testing_agent",
+                     "target": "python_sort",
+                     "type": "tests",
+                     "properties": {"metric": "execution_time"}
+                 },
+                 {
+                     "id": "delegates_analysis",
+                     "source": "planning_agent",
+                     "target": "analysis_agent",
+                     "type": "delegates",
+                     "properties": {"task_type": "result_interpretation"}
+                 },
+                 {
+                     "id": "analyzes_results",
+                     "source": "analysis_agent",
+                     "target": "sorting_analysis_task",
+                     "type": "analyzes",
+                     "properties": {"analysis_type": "comparative_performance"}
+                 },
+                 {
+                     "id": "outperforms",
+                     "source": "python_sort",
+                     "target": "quick_sort",
+                     "type": "outperforms",
+                     "properties": {"factor": "10x_faster"}
+                 },
+                 {
+                     "id": "outperforms_2",
+                     "source": "python_sort",
+                     "target": "merge_sort",
+                     "type": "outperforms",
+                     "properties": {"factor": "15x_faster"}
+                 },
+                 {
+                     "id": "outperforms_3",
+                     "source": "python_sort",
+                     "target": "bubble_sort",
+                     "type": "outperforms",
+                     "properties": {"factor": "4000x_faster"}
+                 }
+             ]
+         }
+     }
+ ]
+
+
+ def insert_sample_data(session, force_insert=False):
+     """
+     Insert sample traces and knowledge graphs into the database.
+
+     Args:
+         session: Database session
+         force_insert: If True, insert even if data already exists
+
+     Returns:
+         Dict with insertion results
+     """
+     from backend.database.utils import save_trace, save_knowledge_graph
+     from backend.database.models import Trace, KnowledgeGraph
+
+     results = {
+         "traces_inserted": 0,
+         "knowledge_graphs_inserted": 0,
+         "skipped": 0,
+         "errors": []
+     }
+
+     # Check if sample data already exists
+     if not force_insert:
+         existing_sample = session.query(Trace).filter(
+             Trace.trace_source == "sample_data"
+         ).first()
+
+         if existing_sample:
+             logger.info("Sample data already exists, skipping insertion")
+             results["skipped"] = len(SAMPLE_TRACES)
+             return results
+
+     try:
+         # Insert sample traces
+         trace_ids = []
+         for i, trace_data in enumerate(SAMPLE_TRACES):
+             try:
+                 trace = save_trace(
+                     session=session,
+                     content=trace_data["content"],
+                     filename=trace_data["filename"],
+                     title=trace_data["title"],
+                     description=trace_data["description"],
+                     trace_type=trace_data["trace_type"],
+                     trace_source=trace_data["trace_source"],
+                     tags=trace_data["tags"]
+                 )
+                 trace_ids.append(trace.trace_id)
+                 results["traces_inserted"] += 1
+                 logger.info(f"Inserted sample trace: {trace_data['title']}")
+             except Exception as e:
+                 error_msg = f"Error inserting trace {i}: {str(e)}"
+                 logger.error(error_msg)
+                 results["errors"].append(error_msg)
+
+         # Insert corresponding knowledge graphs
+         for kg_data in SAMPLE_KNOWLEDGE_GRAPHS:
+             try:
+                 trace_index = kg_data["trace_index"]
+                 if trace_index < len(trace_ids):
+                     save_knowledge_graph(
+                         session=session,
+                         filename=kg_data["filename"],
+                         graph_data=kg_data["graph_data"],
+                         trace_id=trace_ids[trace_index],
+                         window_index=None,  # None for final KG
+                         window_total=1,  # Not None to indicate it's a processed final KG
+                         is_original=True
+                     )
+                     results["knowledge_graphs_inserted"] += 1
+                     logger.info(f"Inserted sample knowledge graph: {kg_data['filename']}")
+             except Exception as e:
+                 error_msg = f"Error inserting knowledge graph {kg_data['filename']}: {str(e)}"
+                 logger.error(error_msg)
+                 results["errors"].append(error_msg)
+
+         logger.info(f"Sample data insertion completed: {results}")
+
+     except Exception as e:
+         error_msg = f"Fatal error during sample data insertion: {str(e)}"
+         logger.error(error_msg)
+         results["errors"].append(error_msg)
+         raise  # Re-raise to trigger rollback in calling code
+
+     return results
+
+
+ def get_sample_data_info():
+     """
+     Get information about the available sample data.
+
+     Returns:
+         Dict with sample data statistics
+     """
+     return {
+         "traces_count": len(SAMPLE_TRACES),
+         "knowledge_graphs_count": len(SAMPLE_KNOWLEDGE_GRAPHS),
+         "trace_types": list(set(t["trace_type"] for t in SAMPLE_TRACES)),
+         "complexity_levels": ["simple", "medium", "high"],
+         "description": "Curated examples showing basic Q&A, research tasks, and multi-agent collaboration"
+     }
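
Each entry in `SAMPLE_KNOWLEDGE_GRAPHS` points at its trace purely by position, via `trace_index`, which is why `insert_sample_data` guards with `trace_index < len(trace_ids)`. A minimal self-contained sketch of that pairing logic (the data here is placeholder illustration, not the real models):

```python
# Hypothetical stand-ins for SAMPLE_TRACES / SAMPLE_KNOWLEDGE_GRAPHS entries.
sample_trace_ids = ["trace-a", "trace-b", "trace-c"]
sample_kgs = [
    {"filename": "kg_a.json", "trace_index": 0},
    {"filename": "kg_c.json", "trace_index": 2},
]

def link_kgs(trace_ids, kgs):
    # Positional pairing: it only holds if traces were inserted in order
    # and none failed (a failed trace shifts every later index).
    return {
        kg["filename"]: trace_ids[kg["trace_index"]]
        for kg in kgs
        if kg["trace_index"] < len(trace_ids)
    }

print(link_kgs(sample_trace_ids, sample_kgs))
```

Note the fragility this illustrates: if one `save_trace` call fails, `trace_ids` only contains the successful inserts, so later `trace_index` values silently point at the wrong trace.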
fix_sample_kgs.py ADDED
@@ -0,0 +1,91 @@
+ #!/usr/bin/env python
+ """
+ Fix existing sample knowledge graphs to have correct window_index.
+ Sets window_index=None for sample_data KGs so they appear as final KGs.
+ """
+
+ import logging
+ import sys
+ import os
+
+ # Add the project root to Python path
+ sys.path.insert(0, '/Users/zekunwu/Desktop/agent_monitoring/huggingface/AgentGraph')
+
+ from backend.database import SessionLocal
+ from backend.database.models import KnowledgeGraph, Trace
+
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+ logger = logging.getLogger(__name__)
+
+ def fix_sample_knowledge_graphs():
+     """Fix existing sample knowledge graphs to have correct window_index."""
+
+     session = SessionLocal()
+     try:
+         # Find all sample traces
+         sample_traces = session.query(Trace).filter(
+             Trace.trace_source == "sample_data"
+         ).all()
+
+         if not sample_traces:
+             logger.info("No sample traces found")
+             return 0  # Return a count, not None, so the caller's `> 0` check is safe
+
+         logger.info(f"Found {len(sample_traces)} sample traces")
+
+         # Get their trace_ids
+         trace_ids = [trace.trace_id for trace in sample_traces]
+
+         # Find knowledge graphs associated with these traces
+         sample_kgs = session.query(KnowledgeGraph).filter(
+             KnowledgeGraph.trace_id.in_(trace_ids)
+         ).all()
+
+         logger.info(f"Found {len(sample_kgs)} knowledge graphs for sample traces")
+
+         fixed_count = 0
+         for kg in sample_kgs:
+             logger.info(f"Checking KG: {kg.filename} - window_index={kg.window_index}, window_total={kg.window_total}")
+
+             # Fix KGs that have window_index=0 but should be None (final KGs)
+             if kg.window_index == 0 and kg.window_total == 1:
+                 logger.info(f"Fixing KG {kg.filename}: setting window_index=None")
+                 kg.window_index = None
+                 fixed_count += 1
+
+         if fixed_count > 0:
+             session.commit()
+             logger.info(f"✅ Fixed {fixed_count} knowledge graphs")
+         else:
+             logger.info("No knowledge graphs needed fixing")
+
+         # Verify the fixes
+         logger.info("\n🔍 Verification:")
+         for trace in sample_traces:
+             kgs = session.query(KnowledgeGraph).filter(
+                 KnowledgeGraph.trace_id == trace.trace_id
+             ).all()
+
+             for kg in kgs:
+                 is_final = (kg.window_index is None and kg.window_total is not None)
+                 logger.info(f"Trace: {trace.title}")
+                 logger.info(f"  KG: {kg.filename} - window_index={kg.window_index}, window_total={kg.window_total}, is_final={is_final}")
+
+         return fixed_count
+
+     except Exception as e:
+         logger.error(f"Error fixing sample knowledge graphs: {str(e)}")
+         session.rollback()
+         return 0
+     finally:
+         session.close()
+
+ if __name__ == "__main__":
+     logger.info("🔧 Starting sample knowledge graphs fix...")
+     fixed = fix_sample_knowledge_graphs()
+
+     if fixed > 0:
+         logger.info(f"🎉 Successfully fixed {fixed} knowledge graphs!")
+         logger.info("💡 Sample traces should now show their knowledge graphs correctly.")
+     else:
+         logger.info("ℹ️ No fixes were needed or no sample data found.")
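
The `window_index` convention the script enforces can be shown in isolation. In this sketch `KG` is a plain stand-in for the SQLAlchemy `KnowledgeGraph` model, not the real class:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KG:
    # Stand-in for the SQLAlchemy KnowledgeGraph model (illustration only).
    window_index: Optional[int]
    window_total: Optional[int]

def is_final(kg: KG) -> bool:
    # Same predicate the verification loop uses: a "final" KG has no
    # window index but does carry a window total.
    return kg.window_index is None and kg.window_total is not None

def fix(kg: KG) -> KG:
    # Same repair the script applies: window_index=0 with window_total=1
    # marks a processed final KG, so the index is cleared.
    if kg.window_index == 0 and kg.window_total == 1:
        kg.window_index = None
    return kg
```

Intermediate window KGs (e.g. `window_index=1, window_total=4`) are untouched; only the degenerate single-window case is rewritten.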