Enhanced Sample Data System
Overview
The enhanced sample data system automatically inserts curated examples showcasing AgentGraph's complete feature set into new databases. Instead of starting with an empty system, users immediately see examples of traces and knowledge graphs with failure detection, optimization recommendations, and advanced content referencing capabilities.
Features
Automatic Insertion
- Triggered when initializing an empty database
- Non-destructive: skips insertion if existing data is found
- Logs all operations for transparency
Enhanced Examples
The system includes 2 carefully selected examples showcasing AgentGraph's advanced capabilities:
Python Documentation Assistant (Enhanced)
- Type: documentation_search
- Example: RAG-powered assistant processing a programming inquiry with knowledge search and failure detection
- 6 entities, 5 relations, 1 failure, 2 optimizations
- Features: Content references, quality scoring, system summary
Simple Q&A Demonstration (Basic)
- Type: conversation
- Example: Basic Python programming concept inquiry
- 4 entities, 4 relations, 0 failures, 1 optimization
- Features: Streamlined structure, clear interaction flow
Enhanced Knowledge Graph Examples
Each trace comes with a pre-generated knowledge graph showcasing AgentGraph's complete feature set:
- Agent interactions and roles with detailed prompts and content references
- Task decomposition with clear importance levels
- Information flow with specific interaction prompts
- RAG-powered knowledge search retrieving relevant documents and context
- Failure detection identifying real issues (spelling errors, system gaps)
- Optimization recommendations providing actionable improvements
- Quality assessment with confidence scores and metadata
- System summaries with natural language descriptions using pronouns
Technical Implementation
Files
- backend/database/sample_data.py - Contains sample data and insertion logic
- backend/database/init_db.py - Modified to call sample data insertion
- backend/database/README_sample_data.md - This documentation
Database Integration
- Insertion happens after table creation in init_database()
- Only triggers when trace_count == 0 (empty database)
- Uses existing save_trace() and save_knowledge_graph() functions
- Full transaction support with rollback on errors
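The guard described above can be sketched as follows. This is a minimal, self-contained illustration built on the standard-library sqlite3 module; the real init_database(), save_trace(), and save_knowledge_graph() are not shown in this document, so the function name insert_samples_if_empty, the DEMO_TRACES list, and the two-column traces schema are all assumptions made for the example.

```python
import sqlite3

# Hypothetical stand-in for the project's sample data; the real entries live in sample_data.py.
DEMO_TRACES = [
    {"filename": "sample_a.txt", "content": "User: ... Assistant: ..."},
    {"filename": "sample_b.txt", "content": "User: ... Assistant: ..."},
]


def insert_samples_if_empty(conn, traces, force_insert=False):
    """Insert traces only when the table is empty, rolling back on any error."""
    trace_count = conn.execute("SELECT COUNT(*) FROM traces").fetchone()[0]
    if trace_count > 0 and not force_insert:
        # Non-destructive: existing data is left untouched.
        return {"traces_inserted": 0, "skipped": True}
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            for trace in traces:
                conn.execute(
                    "INSERT INTO traces (filename, content) VALUES (?, ?)",
                    (trace["filename"], trace["content"]),
                )
    except sqlite3.Error:
        # The context manager has already rolled back the partial insert.
        return {"traces_inserted": 0, "skipped": False}
    return {"traces_inserted": len(traces), "skipped": False}


# Assumed minimal schema, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traces (filename TEXT UNIQUE, content TEXT)")
first = insert_samples_if_empty(conn, DEMO_TRACES)
second = insert_samples_if_empty(conn, DEMO_TRACES)
print(first, second)
```

The second call is skipped because the table is no longer empty, which mirrors the non-destructive behavior listed under Features.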
Data Structure
SAMPLE_TRACES = [
    {
        "filename": "sample_basic_question.txt",
        "title": "Basic Q&A: California Great America Season Pass",
        "description": "Simple arithmetic calculation...",
        "trace_type": "conversation",
        "trace_source": "sample_data",
        "tags": ["arithmetic", "simple", "calculation"],
        "content": "User: ... Assistant: ..."
    }
]
SAMPLE_KNOWLEDGE_GRAPHS = [
    {
        "filename": "kg_basic_question_001.json",
        "trace_index": 0,  # Links to first trace
        "graph_data": {
            "entities": [...],
            "relations": [...]
        }
    }
]
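To illustrate how trace_index ties the two lists together, here is a short sketch that pairs each knowledge graph with the trace it belongs to. The helper name pair_graphs_with_traces and the trimmed-down entries are illustrative assumptions, not part of sample_data.py.

```python
# Trimmed-down stand-ins that follow the structure shown above.
SAMPLE_TRACES = [
    {"filename": "sample_basic_question.txt", "trace_type": "conversation"},
]
SAMPLE_KNOWLEDGE_GRAPHS = [
    {
        "filename": "kg_basic_question_001.json",
        "trace_index": 0,  # position of the owning trace in SAMPLE_TRACES
        "graph_data": {"entities": [], "relations": []},
    },
]


def pair_graphs_with_traces(traces, graphs):
    """Return (trace_filename, graph_filename) pairs resolved via trace_index."""
    pairs = []
    for graph in graphs:
        trace = traces[graph["trace_index"]]
        pairs.append((trace["filename"], graph["filename"]))
    return pairs


pairs = pair_graphs_with_traces(SAMPLE_TRACES, SAMPLE_KNOWLEDGE_GRAPHS)
print(pairs)
```

Because the link is positional, reordering SAMPLE_TRACES without updating trace_index would silently attach a graph to the wrong trace.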
Usage
Automatic (Default)
Sample data is inserted automatically when:
- Creating a new database
- Resetting an existing database with --reset --force
- Database has zero traces
Manual Control
from backend.database.sample_data import insert_sample_data, get_sample_data_info
# NOTE: get_session() is used below, but its import is not shown here;
# import it from wherever the project defines its session context manager.
# Get information about available samples
info = get_sample_data_info()
print(f"Available: {info['traces_count']} traces, {info['knowledge_graphs_count']} KGs")
# Manual insertion (force_insert=True overrides the existing-data check)
with get_session() as session:
    results = insert_sample_data(session, force_insert=True)
    print(f"Inserted: {results['traces_inserted']} traces, {results['knowledge_graphs_inserted']} KGs")
Disabling Sample Data
To disable automatic sample data insertion, modify init_db.py:
# Comment out this section in init_database():
# if trace_count == 0:
#     # ... sample data insertion code ...
Benefits for Users
- Immediate Value: New users see AgentGraph's complete capabilities immediately
- Learning: Examples demonstrate RAG search, failure detection, optimization suggestions, and advanced features
- Testing: Users can test all AgentGraph features including quality assessment and content referencing
- Reference: Examples serve as high-quality templates showcasing best practices
- Feature Discovery: Users understand the full potential of knowledge graph enhancement
- Quality Standards: Examples demonstrate what production-ready knowledge graphs should contain
Quality Assurance
- All sample traces are realistic and demonstrate real-world scenarios
- Knowledge graphs are hand-crafted to showcase AgentGraph's complete feature set
- Examples include actual failure detection (spelling errors, system gaps)
- RAG search capabilities demonstrate knowledge retrieval workflows
- Optimization recommendations are practical and actionable
- Content references are accurate and support proper traceability
- Quality scores reflect realistic assessment metrics
- Content is appropriate and safe for all audiences
- Regular validation ensures data integrity and feature completeness
Maintenance
To update sample data:
- Modify SAMPLE_TRACES and SAMPLE_KNOWLEDGE_GRAPHS in sample_data.py
- Ensure trace_index links are correct between traces and KGs
- Test with a fresh database initialization
- Update this documentation if needed
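The trace_index check in the steps above is easy to automate before testing against a fresh database. The following is a minimal sketch, assuming the list-of-dicts layout shown under Data Structure; validate_sample_links is a hypothetical helper, not an existing function in sample_data.py.

```python
def validate_sample_links(traces, graphs):
    """Return a list of human-readable problems; an empty list means the links are consistent."""
    problems = []
    for position, graph in enumerate(graphs):
        index = graph.get("trace_index")
        name = graph.get("filename", f"<graph #{position}>")
        if not isinstance(index, int) or isinstance(index, bool):
            problems.append(f"{name}: trace_index must be an integer, got {index!r}")
        elif not 0 <= index < len(traces):
            problems.append(
                f"{name}: trace_index {index} is out of range for {len(traces)} trace(s)"
            )
    return problems


traces = [{"filename": "sample_basic_question.txt"}]
good = [{"filename": "kg_basic_question_001.json", "trace_index": 0}]
bad = [{"filename": "kg_orphan.json", "trace_index": 3}]

print(validate_sample_links(traces, good))
print(validate_sample_links(traces, bad))
```

Negative indices are rejected on purpose: Python would accept them when indexing a list, but they would hide an incorrect link.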
Troubleshooting
Sample Data Not Appearing
- Check logs for "Sample data already exists, skipping insertion"
- Verify database is actually empty: SELECT COUNT(*) FROM traces;
- Force insertion manually with force_insert=True
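The emptiness check can also be scripted. The sketch below assumes a SQLite database and uses the standard-library sqlite3 module; with a different database engine, run the same COUNT query through that engine's client instead. The function name count_traces and the one-column demonstration schema are hypothetical.

```python
import sqlite3


def count_traces(conn):
    """Return the number of rows in traces, or None if the table does not exist."""
    table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'traces'"
    ).fetchone()
    if table is None:
        return None  # schema not created yet, so nothing could have been inserted
    return conn.execute("SELECT COUNT(*) FROM traces").fetchone()[0]


# Demonstration against an in-memory database with an assumed minimal schema.
conn = sqlite3.connect(":memory:")
before_schema = count_traces(conn)
conn.execute("CREATE TABLE traces (filename TEXT)")
empty_count = count_traces(conn)
conn.execute("INSERT INTO traces (filename) VALUES ('existing.txt')")
populated_count = count_traces(conn)
print(before_schema, empty_count, populated_count)
```

A nonzero count explains a skipped insertion, while None points at a schema problem rather than at the sample data itself.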
Insertion Errors
- Check logs for specific error messages
- Verify database schema is up to date
- Ensure all required tables exist
- Check for foreign key constraint issues
Performance Impact
- Sample data insertion adds ~2-3 seconds to database initialization
- Total size: ~4KB of text content + ~15KB of JSON data
- Negligible impact on production systems
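To re-check the size figures after editing the samples, the payload can be measured directly. This is a rough sketch that assumes each trace keeps its text under "content" and each knowledge graph keeps its data under "graph_data", as in the Data Structure section; payload_size_bytes is a hypothetical helper, and the serialized size in the database may differ from what json.dumps produces.

```python
import json


def payload_size_bytes(traces, graphs):
    """Return (text_bytes, json_bytes) for the given sample lists, measured as UTF-8."""
    text_bytes = sum(len(trace["content"].encode("utf-8")) for trace in traces)
    json_bytes = sum(
        len(json.dumps(graph["graph_data"]).encode("utf-8")) for graph in graphs
    )
    return text_bytes, json_bytes


# Tiny stand-in data; pass the real SAMPLE_TRACES and SAMPLE_KNOWLEDGE_GRAPHS instead.
traces = [{"content": "User: hi Assistant: hello"}]
graphs = [{"graph_data": {"entities": [], "relations": []}}]
text_bytes, json_bytes = payload_size_bytes(traces, graphs)
print(f"text: {text_bytes} bytes, JSON: {json_bytes} bytes")
```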