# Enhanced Sample Data System
## Overview
The enhanced sample data system automatically inserts curated examples showcasing AgentGraph's complete feature set into new databases. Instead of starting with an empty system, users immediately see examples of traces and knowledge graphs with failure detection, optimization recommendations, and advanced content referencing capabilities.
## Features
### 📊 Automatic Insertion
- Triggered when initializing an empty database
- Non-destructive: skips insertion if existing data is found
- Logs all operations for transparency
### 🎯 Enhanced Examples
The system includes 2 carefully selected examples showcasing AgentGraph's advanced capabilities:
1. **Python Documentation Assistant** (Enhanced)
   - Type: `documentation_search`
   - Example: RAG-powered assistant processing programming inquiry with knowledge search and failure detection
   - 6 entities, 5 relations, 1 failure, 2 optimizations
   - Features: Content references, quality scoring, system summary
2. **Simple Q&A Demonstration** (Basic)
   - Type: `conversation`
   - Example: Basic Python programming concept inquiry
   - 4 entities, 4 relations, 0 failures, 1 optimization
   - Features: Streamlined structure, clear interaction flow
### ๐Ÿ•ธ๏ธ Enhanced Knowledge Graph Examples
Each trace comes with a pre-generated knowledge graph showcasing AgentGraph's complete feature set:
- **Agent interactions and roles** with detailed prompts and content references
- **Task decomposition** with clear importance levels
- **Information flow** with specific interaction prompts
- **RAG-powered knowledge search** retrieving relevant documents and context
- **Failure detection** identifying real issues (spelling errors, system gaps)
- **Optimization recommendations** providing actionable improvements
- **Quality assessment** with confidence scores and metadata
- **System summaries** with natural language descriptions using pronouns
## Technical Implementation
### Files
- `backend/database/sample_data.py` - Contains sample data and insertion logic
- `backend/database/init_db.py` - Modified to call sample data insertion
- `backend/database/README_sample_data.md` - This documentation
### Database Integration
- Insertion happens after table creation in `init_database()`
- Only triggers when `trace_count == 0` (empty database)
- Uses existing `save_trace()` and `save_knowledge_graph()` functions
- Full transaction support with rollback on errors
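The check-then-insert flow described above can be sketched as follows. This is a minimal, self-contained illustration that uses the standard-library `sqlite3` module and a simplified, hypothetical `traces` table. The names `DEMO_TRACES` and `insert_samples_if_empty` are assumptions made for this sketch; the actual code goes through `save_trace()` and `save_knowledge_graph()`, and its schema is not reproduced here.

```python
import sqlite3

# Hypothetical, simplified sample rows; the real data lives in sample_data.py.
DEMO_TRACES = (
    ("sample_a.txt", "User: ... Assistant: ..."),
    ("sample_b.txt", "User: ... Assistant: ..."),
)


def insert_samples_if_empty(conn, traces=DEMO_TRACES, force_insert=False):
    """Insert sample rows only when the traces table is empty (or when forced).

    Returns the number of rows inserted. All rows go in as one transaction,
    so a failure leaves the table unchanged.
    """
    trace_count = conn.execute("SELECT COUNT(*) FROM traces").fetchone()[0]
    if trace_count > 0 and not force_insert:
        return 0  # Non-destructive: existing data is left alone.

    try:
        conn.executemany(
            "INSERT INTO traces (filename, content) VALUES (?, ?)", traces
        )
        conn.commit()
    except sqlite3.Error:
        conn.rollback()  # Undo any rows inserted before the failure.
        raise
    return len(traces)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE traces (filename TEXT NOT NULL UNIQUE, content TEXT NOT NULL)"
    )
    first = insert_samples_if_empty(conn)   # Empty database: rows are inserted.
    second = insert_samples_if_empty(conn)  # Data exists: insertion is skipped.
    return first, second
```

Calling `demo()` shows the non-destructive behaviour: the first call inserts the rows and the second call skips insertion because the table is no longer empty.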
### Data Structure
```python
SAMPLE_TRACES = [
    {
        "filename": "sample_basic_question.txt",
        "title": "Basic Q&A: California Great America Season Pass",
        "description": "Simple arithmetic calculation...",
        "trace_type": "conversation",
        "trace_source": "sample_data",
        "tags": ["arithmetic", "simple", "calculation"],
        "content": "User: ... Assistant: ..."
    }
]

SAMPLE_KNOWLEDGE_GRAPHS = [
    {
        "filename": "kg_basic_question_001.json",
        "trace_index": 0,  # Links to first trace
        "graph_data": {
            "entities": [...],
            "relations": [...]
        }
    }
]
```
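Each knowledge graph points at its trace by position in `SAMPLE_TRACES`, as the `trace_index` comment indicates. A minimal sketch of resolving that link, using trimmed stand-in data and a hypothetical helper name (`pair_graphs_with_traces` is an assumption, not a function in `sample_data.py`):

```python
# Trimmed stand-ins for the structures shown above.
SAMPLE_TRACES = [
    {"filename": "sample_basic_question.txt", "trace_type": "conversation"},
]
SAMPLE_KNOWLEDGE_GRAPHS = [
    {
        "filename": "kg_basic_question_001.json",
        "trace_index": 0,  # Position of the linked trace in SAMPLE_TRACES
        "graph_data": {"entities": [], "relations": []},
    },
]


def pair_graphs_with_traces(traces, graphs):
    """Return (trace filename, graph filename) pairs resolved via trace_index."""
    return [
        (traces[graph["trace_index"]]["filename"], graph["filename"])
        for graph in graphs
    ]


pairs = pair_graphs_with_traces(SAMPLE_TRACES, SAMPLE_KNOWLEDGE_GRAPHS)
```

Because the link is positional, reordering `SAMPLE_TRACES` silently changes which trace a graph belongs to, so the indexes need to be updated whenever the list order changes.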
## Usage
### Automatic (Default)
Sample data is inserted automatically when:
- Creating a new database
- Resetting an existing database with `--reset --force`
- Initializing a database that contains zero traces
### Manual Control
```python
from backend.database.sample_data import insert_sample_data, get_sample_data_info

# NOTE: get_session is assumed to be the project's session context manager;
# import it from the module that defines it in your checkout.

# Get information about available samples
info = get_sample_data_info()
print(f"Available: {info['traces_count']} traces, {info['knowledge_graphs_count']} KGs")

# Manual insertion (with force to override existing data check)
with get_session() as session:
    results = insert_sample_data(session, force_insert=True)
    print(f"Inserted: {results['traces_inserted']} traces, {results['knowledge_graphs_inserted']} KGs")
```
### Disabling Sample Data
To disable automatic sample data insertion, modify `init_db.py`:
```python
# Comment out this section in init_database():
# if trace_count == 0:
#     # ... sample data insertion code ...
```
## Benefits for Users
1. **Immediate Value**: New users see AgentGraph's complete capabilities immediately
2. **Learning**: Examples demonstrate RAG search, failure detection, optimization suggestions, and advanced features
3. **Testing**: Users can test all AgentGraph features including quality assessment and content referencing
4. **Reference**: Examples serve as high-quality templates showcasing best practices
5. **Feature Discovery**: Users understand the full potential of knowledge graph enhancement
6. **Quality Standards**: Examples demonstrate what production-ready knowledge graphs should contain
## Quality Assurance
- All sample traces are realistic and demonstrate real-world scenarios
- Knowledge graphs are hand-crafted to showcase AgentGraph's complete feature set
- Examples include actual failure detection (spelling errors, system gaps)
- RAG search capabilities demonstrate knowledge retrieval workflows
- Optimization recommendations are practical and actionable
- Content references are accurate and support proper traceability
- Quality scores reflect realistic assessment metrics
- Content is appropriate and safe for all audiences
- Regular validation ensures data integrity and feature completeness
## Maintenance
To update sample data:
1. Modify `SAMPLE_TRACES` and `SAMPLE_KNOWLEDGE_GRAPHS` in `sample_data.py`
2. Ensure trace_index links are correct between traces and KGs
3. Test with a fresh database initialization
4. Update this documentation if needed
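Step 2 can be checked mechanically before testing against a fresh database. The helper below is a hypothetical sketch (the name `find_broken_links` is an assumption and is not part of `sample_data.py`); it only assumes that each knowledge graph entry carries a `trace_index` key, as in the Data Structure section.

```python
def find_broken_links(traces, graphs):
    """Return a list of human-readable problems with trace_index links."""
    problems = []
    for position, graph in enumerate(graphs):
        name = graph.get("filename", f"<graph {position}>")
        index = graph.get("trace_index")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(index, int) or isinstance(index, bool):
            problems.append(f"{name}: trace_index must be an integer, got {index!r}")
        elif not 0 <= index < len(traces):
            problems.append(
                f"{name}: trace_index {index} is out of range for {len(traces)} trace(s)"
            )
    return problems
```

An empty return value means every graph points at an existing trace. It does not prove that the graph points at the intended trace, so a manual review of the pairing is still worthwhile.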
## Troubleshooting
### Sample Data Not Appearing
- Check logs for "Sample data already exists, skipping insertion"
- Verify database is actually empty: `SELECT COUNT(*) FROM traces;`
- Force insertion manually with `force_insert=True`
### Insertion Errors
- Check logs for specific error messages
- Verify database schema is up to date
- Ensure all required tables exist
- Check for foreign key constraint issues
### Performance Impact
- Sample data insertion adds ~2-3 seconds to database initialization
- Total size: ~4KB of text content + ~15KB of JSON data
- Negligible impact on production systems
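The size figures above can be re-checked after the sample data changes. A minimal sketch, assuming the `content` and `graph_data` keys from the Data Structure section; the helper name `sample_data_size` is hypothetical:

```python
import json


def sample_data_size(traces, graphs):
    """Return (text_bytes, json_bytes) for the given sample structures."""
    # Size of the raw trace text, measured as UTF-8 bytes.
    text_bytes = sum(len(trace["content"].encode("utf-8")) for trace in traces)
    # Size of each graph when serialized with json.dumps defaults.
    json_bytes = sum(
        len(json.dumps(graph["graph_data"]).encode("utf-8")) for graph in graphs
    )
    return text_bytes, json_bytes
```

Passing `SAMPLE_TRACES` and `SAMPLE_KNOWLEDGE_GRAPHS` from `sample_data.py` gives byte counts to compare against the estimates listed here. The stored size may differ slightly depending on how the database serializes JSON.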