
# Enhanced Sample Data System

## Overview

The enhanced sample data system automatically inserts curated examples showcasing AgentGraph's complete feature set into new databases. Instead of starting with an empty system, users immediately see examples of traces and knowledge graphs with failure detection, optimization recommendations, and advanced content referencing capabilities.

## Features

### 📊 Automatic Insertion

- Triggered when initializing an empty database
- Non-destructive: skips insertion if existing data is found
- Logs all operations for transparency

### 🎯 Enhanced Examples

The system includes two carefully selected examples showcasing AgentGraph's advanced capabilities:

1. **Python Documentation Assistant (Enhanced)**
   - Type: `documentation_search`
   - Example: RAG-powered assistant processing a programming inquiry with knowledge search and failure detection
   - 6 entities, 5 relations, 1 failure, 2 optimizations
   - Features: content references, quality scoring, system summary
2. **Simple Q&A Demonstration (Basic)**
   - Type: `conversation`
   - Example: basic Python programming concept inquiry
   - 4 entities, 4 relations, 0 failures, 1 optimization
   - Features: streamlined structure, clear interaction flow

πŸ•ΈοΈ Enhanced Knowledge Graph Examples

Each trace comes with a pre-generated knowledge graph showcasing AgentGraph's complete feature set:

  • Agent interactions and roles with detailed prompts and content references
  • Task decomposition with clear importance levels
  • Information flow with specific interaction prompts
  • RAG-powered knowledge search retrieving relevant documents and context
  • Failure detection identifying real issues (spelling errors, system gaps)
  • Optimization recommendations providing actionable improvements
  • Quality assessment with confidence scores and metadata
  • System summaries with natural language descriptions using pronouns

## Technical Implementation

### Files

- `backend/database/sample_data.py` - contains sample data and insertion logic
- `backend/database/init_db.py` - modified to call sample data insertion
- `backend/database/README_sample_data.md` - this documentation

### Database Integration

- Insertion happens after table creation in `init_database()`
- Only triggers when `trace_count == 0` (empty database)
- Uses the existing `save_trace()` and `save_knowledge_graph()` functions
- Full transaction support with rollback on errors

### Data Structure

```python
SAMPLE_TRACES = [
    {
        "filename": "sample_basic_question.txt",
        "title": "Basic Q&A: California Great America Season Pass",
        "description": "Simple arithmetic calculation...",
        "trace_type": "conversation",
        "trace_source": "sample_data",
        "tags": ["arithmetic", "simple", "calculation"],
        "content": "User: ... Assistant: ..."
    }
]

SAMPLE_KNOWLEDGE_GRAPHS = [
    {
        "filename": "kg_basic_question_001.json",
        "trace_index": 0,  # Links to first trace
        "graph_data": {
            "entities": [...],
            "relations": [...]
        }
    }
]
```
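Because each knowledge graph references its trace by position, a small check like the following (a sketch, not part of the shipped code) can catch broken `trace_index` links before initialization:

```python
def find_broken_links(sample_traces, sample_kgs):
    """Return filenames of knowledge graphs whose trace_index is invalid."""
    broken = []
    for kg in sample_kgs:
        idx = kg.get("trace_index")
        if not isinstance(idx, int) or not 0 <= idx < len(sample_traces):
            broken.append(kg.get("filename"))
    return broken
```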

## Usage

### Automatic (Default)

Sample data is inserted automatically when:

- Creating a new database
- Resetting an existing database with `--reset --force`
- The database has zero traces

### Manual Control

```python
from backend.database.sample_data import insert_sample_data, get_sample_data_info

# Get information about available samples
info = get_sample_data_info()
print(f"Available: {info['traces_count']} traces, {info['knowledge_graphs_count']} KGs")

# Manual insertion (with force to override the existing-data check)
with get_session() as session:
    results = insert_sample_data(session, force_insert=True)
    print(f"Inserted: {results['traces_inserted']} traces, {results['knowledge_graphs_inserted']} KGs")
```

### Disabling Sample Data

To disable automatic sample data insertion, modify `init_db.py`:

```python
# Comment out this section in init_database():
# if trace_count == 0:
#     # ... sample data insertion code ...
```

## Benefits for Users

1. **Immediate value**: new users see AgentGraph's complete capabilities right away
2. **Learning**: examples demonstrate RAG search, failure detection, optimization suggestions, and advanced features
3. **Testing**: users can exercise all AgentGraph features, including quality assessment and content referencing
4. **Reference**: examples serve as high-quality templates showcasing best practices
5. **Feature discovery**: users understand the full potential of knowledge graph enhancement
6. **Quality standards**: examples demonstrate what production-ready knowledge graphs should contain

## Quality Assurance

- All sample traces are realistic and demonstrate real-world scenarios
- Knowledge graphs are hand-crafted to showcase AgentGraph's complete feature set
- Examples include actual failure detection (spelling errors, system gaps)
- RAG search examples demonstrate knowledge retrieval workflows
- Optimization recommendations are practical and actionable
- Content references are accurate and support proper traceability
- Quality scores reflect realistic assessment metrics
- Content is appropriate and safe for all audiences
- Regular validation ensures data integrity and feature completeness

## Maintenance

To update sample data:

1. Modify `SAMPLE_TRACES` and `SAMPLE_KNOWLEDGE_GRAPHS` in `sample_data.py`
2. Ensure the `trace_index` links between traces and KGs are correct
3. Test with a fresh database initialization
4. Update this documentation if needed

## Troubleshooting

### Sample Data Not Appearing

- Check the logs for "Sample data already exists, skipping insertion"
- Verify the database is actually empty: `SELECT COUNT(*) FROM traces;`
- Force insertion manually with `force_insert=True`

### Insertion Errors

- Check the logs for specific error messages
- Verify the database schema is up to date
- Ensure all required tables exist
- Check for foreign key constraint issues

### Performance Impact

- Sample data insertion adds ~2-3 seconds to database initialization
- Total size: ~4 kB of text content plus ~15 kB of JSON data
- Negligible impact on production systems