wu981526092 committed on
Commit 286c429 · 1 Parent(s): cac81df
backend/database/README_sample_data.md DELETED
@@ -1,165 +0,0 @@
- # Enhanced Sample Data System
-
- ## Overview
-
- The enhanced sample data system automatically inserts curated examples showcasing AgentGraph's complete feature set into new databases. Instead of starting with an empty system, users immediately see examples of traces and knowledge graphs with failure detection, optimization recommendations, and advanced content referencing capabilities.
-
- ## Features
-
- ### 📊 Automatic Insertion
-
- - Triggered when initializing an empty database
- - Non-destructive: skips insertion if existing data is found
- - Logs all operations for transparency
-
- ### 🎯 Enhanced Examples
-
- The system includes a carefully selected example showcasing AgentGraph's advanced capabilities:
-
- **Python Documentation Assistant** (Comprehensive)
-
- - Type: `documentation_search`
- - Example: RAG-powered assistant processing multi-turn programming inquiry with knowledge search and failure detection
- - 6 entities, 5 relations, 1 failure, 2 optimizations
- - Features: Multi-step workflow, educational interactions, content references, quality scoring
-
- ### 🕸️ Enhanced Knowledge Graph Examples
-
- Each trace comes with a pre-generated knowledge graph showcasing AgentGraph's complete feature set:
-
- - **Agent interactions and roles** with detailed prompts and content references
- - **Task decomposition** with clear importance levels
- - **Information flow** with specific interaction prompts
- - **RAG-powered knowledge search** retrieving relevant documents and context
- - **Failure detection** identifying real issues (spelling errors, system gaps)
- - **Optimization recommendations** providing actionable improvements
- - **Quality assessment** with confidence scores and metadata
- - **System summaries** with natural language descriptions using pronouns
-
- ## Technical Implementation
-
- ### Files
-
- - `backend/database/sample_data.py` - Contains sample data and insertion logic
- - `backend/database/init_db.py` - Modified to call sample data insertion
- - `backend/database/README_sample_data.md` - This documentation
-
- ### Database Integration
-
- - Insertion happens after table creation in `init_database()`
- - Only triggers when `trace_count == 0` (empty database)
- - Uses existing `save_trace()` and `save_knowledge_graph()` functions
- - Full transaction support with rollback on errors
-
- ### Data Structure
-
- ```python
- SAMPLE_TRACES = [
- {
- "filename": "sample_basic_question.txt",
- "title": "Basic Q&A: California Great America Season Pass",
- "description": "Simple arithmetic calculation...",
- "trace_type": "conversation",
- "trace_source": "sample_data",
- "tags": ["arithmetic", "simple", "calculation"],
- "content": "User: ... Assistant: ..."
- }
- ]
-
- SAMPLE_KNOWLEDGE_GRAPHS = [
- {
- "filename": "kg_basic_question_001.json",
- "trace_index": 0, # Links to first trace
- "graph_data": {
- "entities": [...],
- "relations": [...]
- }
- }
- ]
- ```
-
- ## Usage
-
- ### Automatic (Default)
-
- Sample data is inserted automatically when:
-
- - Creating a new database
- - Resetting an existing database with `--reset --force`
- - Database has zero traces
-
- ### Manual Control
-
- ```python
- from backend.database.sample_data import insert_sample_data, get_sample_data_info
-
- # Get information about available samples
- info = get_sample_data_info()
- print(f"Available: {info['traces_count']} traces, {info['knowledge_graphs_count']} KGs")
-
- # Manual insertion (with force to override existing data check)
- with get_session() as session:
- results = insert_sample_data(session, force_insert=True)
- print(f"Inserted: {results['traces_inserted']} traces, {results['knowledge_graphs_inserted']} KGs")
- ```
-
- ### Disabling Sample Data
-
- To disable automatic sample data insertion, modify `init_db.py`:
-
- ```python
- # Comment out this section in init_database():
- # if trace_count == 0:
- # # ... sample data insertion code ...
- ```
-
- ## Benefits for Users
-
- 1. **Immediate Value**: New users see AgentGraph's complete capabilities immediately
- 2. **Learning**: Example demonstrates RAG search, failure detection, optimization suggestions, and advanced features
- 3. **Testing**: Users can test all AgentGraph features including quality assessment and content referencing
- 4. **Reference**: Examples serve as high-quality templates showcasing best practices
- 5. **Feature Discovery**: Users understand the full potential of knowledge graph enhancement
- 6. **Quality Standards**: Examples demonstrate what production-ready knowledge graphs should contain
-
- ## Quality Assurance
-
- - All sample traces are realistic and demonstrate real-world scenarios
- - Knowledge graphs are hand-crafted to showcase AgentGraph's complete feature set
- - Examples include actual failure detection (spelling errors, system gaps)
- - RAG search capabilities demonstrate knowledge retrieval workflows
- - Optimization recommendations are practical and actionable
- - Content references are accurate and support proper traceability
- - Quality scores reflect realistic assessment metrics
- - Content is appropriate and safe for all audiences
- - Regular validation ensures data integrity and feature completeness
-
- ## Maintenance
-
- To update sample data:
-
- 1. Modify `SAMPLE_TRACES` and `SAMPLE_KNOWLEDGE_GRAPHS` in `sample_data.py`
- 2. Ensure trace_index links are correct between trace and KG
- 3. Test with a fresh database initialization
- 4. Update this documentation if needed
-
- ## Troubleshooting
-
- ### Sample Data Not Appearing
-
- - Check logs for "Sample data already exists, skipping insertion"
- - Verify database is actually empty: `SELECT COUNT(*) FROM traces;`
- - Force insertion manually with `force_insert=True`
-
- ### Insertion Errors
-
- - Check logs for specific error messages
- - Verify database schema is up to date
- - Ensure all required tables exist
- - Check for foreign key constraint issues
-
- ### Performance Impact
-
- - Sample data insertion adds ~2-3 seconds to database initialization
- - Total size: ~4KB of text content + ~15KB of JSON data
- - Negligible impact on production systems
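
The deleted README's Maintenance steps ask that `trace_index` links between `SAMPLE_TRACES` and `SAMPLE_KNOWLEDGE_GRAPHS` stay correct. A minimal sketch of such a check, assuming the list-of-dicts shapes shown in its Data Structure section; `validate_trace_links` is a hypothetical helper, not part of `sample_data.py`, and the rule that every trace should have at least one graph is an assumption drawn from "Each trace comes with a pre-generated knowledge graph".

```python
from typing import Any, Dict, List


def validate_trace_links(
    traces: List[Dict[str, Any]],
    knowledge_graphs: List[Dict[str, Any]],
) -> List[str]:
    """Return human-readable problems with trace_index links (empty list if none)."""
    problems: List[str] = []
    linked = set()
    for kg in knowledge_graphs:
        name = kg.get("filename", "<unnamed>")
        index = kg.get("trace_index")
        # trace_index must be an int that points at an existing entry in traces
        if not isinstance(index, int) or not 0 <= index < len(traces):
            problems.append(f"{name}: trace_index {index!r} does not match any trace")
        else:
            linked.add(index)
    # Assumption: every trace is expected to have at least one knowledge graph
    for i, trace in enumerate(traces):
        if i not in linked:
            problems.append(f"{trace.get('filename', i)}: no knowledge graph links to this trace")
    return problems


traces = [{"filename": "sample_basic_question.txt"}, {"filename": "second.txt"}]
kgs = [
    {"filename": "kg_basic_question_001.json", "trace_index": 0},
    {"filename": "kg_orphan.json", "trace_index": 5},
]
for problem in validate_trace_links(traces, kgs):
    print(problem)
# kg_orphan.json: trace_index 5 does not match any trace
# second.txt: no knowledge graph links to this trace
```

Running a check like this before a fresh database initialization (Maintenance step 3) would surface a broken link before any insertion is attempted.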
 
backend/database/sample_data.py CHANGED
@@ -1,342 +1,170 @@
  #!/usr/bin/env python
  """
- Sample data for database initialization.
- Provides curated examples of traces and knowledge graphs for new users.
  """

  import json
  import logging
  from typing import Dict, List, Any

  logger = logging.getLogger(__name__)

- # Enhanced sample traces showcasing AgentGraph's full capabilities
- SAMPLE_TRACES = [
- {
- "filename": "python_documentation_inquiry.json",
- "title": "Python Documentation Assistant Demo",
- "description": "Comprehensive example showing RAG-powered AI assistant handling multi-turn programming inquiry with knowledge search, detailed explanations, code examples, performance analysis, and interactive learning",
- "trace_type": "documentation_search",
- "trace_source": "sample_data",
- "tags": ["programming", "rag_assistant", "documentation", "failure_detection", "optimization"],
- "content": """{
- "id": "doc_trace_demo_001",
- "timestamp": "2025-01-27T00:00:00",
- "metadata": {
- "source": "AgentGraph_Demo",
- "row_index": 0,
- "converted_at": "2025-01-27T12:00:00.000000"
- },
- "data": {
- "total_observations": 4,
- "summary": "Python documentation inquiry with RAG-powered assistant response including knowledge search, explanation, and follow-up code examples"
- },
- "observations": [
- {
- "id": "demo_obs_001",
- "type": "user_query",
- "timestamp": "2025-01-27T00:00:00",
- "input": "Hello! I'm learning Python and I keep seeing this syntax with square brackets that looks different from regular loops. Can you help me understand what Python list comprehensions are used for and when I should use them?",
- "output": "I'll help you understand Python list comprehensions! Let me search our documentation to give you a comprehensive explanation.",
- "metadata": {
- "request_date": "2025-01-27T00:00:00",
- "interaction_type": "initial_query",
- "user_level": "beginner",
- "topic_category": "python_syntax"
- },
- "latency": 0.124
- },
- {
- "id": "demo_obs_002",
- "type": "knowledge_search",
- "timestamp": "2025-01-27T00:00:05",
- "input": "Searching Python documentation for: list comprehensions, syntax, use cases, performance benefits",
- "output": "Found relevant documentation: ['Python list comprehensions - efficient way to create lists', 'List comprehension vs loops performance comparison', 'Python built-in functions - len(), range(), enumerate()', 'Nested list comprehensions and filtering', 'Python data structures - lists, dictionaries, sets, tuples']",
- "metadata": {
- "search_query": "python list comprehensions syntax use cases",
- "documents_found": 5,
- "search_time_ms": 23,
- "relevance_score": 0.94
- },
- "latency": 0.089
- },
- {
- "id": "demo_obs_003",
- "type": "inference",
- "timestamp": "2025-01-27T00:00:08",
- "input": "You are a helpful Python programming assistant designed to answer questions accurately based on the provided documentation. Use the information retrieved from the knowledge base below to respond concisely and correctly.\\n\\n### Retrieved Documents:\\n['Python list comprehensions - efficient way to create lists', 'List comprehension vs loops performance comparison', 'Python built-in functions - len(), range(), enumerate()', 'Nested list comprehensions and filtering']\\n\\n### Question:\\nwhat are python list comprehensions used for and when should I use them?\\n\\n### Answer:",
68
- "output": "Python list comprehensions are a powerful and concise way to create lists! Here's what they're used for:\\n\\n**Main Uses:**\\n1. **Creating lists from existing iterables** - Transform elements from sequences\\n2. **Filtering data** - Include only elements that meet specific conditions\\n3. **Mathematical operations** - Apply calculations efficiently\\n4. **Combining operations** - Filter and transform in one line\\n\\n**When to use them:**\\n- When you need to create a new list from an existing iterable\\n- For simple transformations (keep it readable!)\\n- When performance matters (they're faster than equivalent for loops)\\n- To make code more Pythonic and concise\\n\\n**Basic syntax:** `[expression for item in iterable if condition]`",
69
- "metadata": {
70
- "request_date": "2025-01-27T00:00:00",
71
- "databricks_request_id": "demo-request-003",
72
- "request_time": "2025-01-27T12:00:08.000",
73
- "status_code": 200,
74
- "sampling_fraction": 1,
75
- "request_raw": "{\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful Python programming assistant\"},{\"role\":\"user\",\"content\":\"You are a helpful Python programming assistant designed to answer questions accurately based on the provided documentation. Use the information retrieved from the knowledge base below to respond concisely and correctly.\\\\n\\\\n### Retrieved Documents:\\\\n['Python list comprehensions - efficient way to create lists', 'List comprehension vs loops performance comparison', 'Python built-in functions - len(), range(), enumerate()', 'Nested list comprehensions and filtering']\\\\n\\\\n### Question:\\\\nwhat are python list comprehensions used for and when should I use them?\\\\n\\\\n### Answer:\"}]}",
- "response_raw": "{\"choices\":[{\"content_filter_results\":{\"hate\":{\"filtered\":false,\"severity\":\"safe\"},\"self_harm\":{\"filtered\":false,\"severity\":\"safe\"},\"sexual\":{\"filtered\":false,\"severity\":\"safe\"},\"violence\":{\"filtered\":false,\"severity\":\"safe\"}},\"finish_reason\":\"stop\",\"index\":0,\"logprobs\":null,\"message\":{\"content\":\"Python list comprehensions are a powerful and concise way to create lists! Here's what they're used for:\\\\n\\\\n**Main Uses:**\\\\n1. **Creating lists from existing iterables** - Transform elements from sequences\\\\n2. **Filtering data** - Include only elements that meet specific conditions\\\\n3. **Mathematical operations** - Apply calculations efficiently\\\\n4. **Combining operations** - Filter and transform in one line\\\\n\\\\n**When to use them:**\\\\n- When you need to create a new list from an existing iterable\\\\n- For simple transformations (keep it readable!)\\\\n- When performance matters (they're faster than equivalent for loops)\\\\n- To make code more Pythonic and concise\\\\n\\\\n**Basic syntax:** `[expression for item in iterable if condition]`\",\"refusal\":null,\"role\":\"assistant\"}}],\"created\":1737979208,\"id\":\"chatcmpl-DEMO003\",\"model\":\"gpt-4o-2024-11-20\",\"object\":\"chat.completion\",\"system_fingerprint\":\"fp_demo_003\",\"usage\":{\"completion_tokens\":156,\"completion_tokens_details\":{\"accepted_prediction_tokens\":0,\"audio_tokens\":0,\"reasoning_tokens\":0,\"rejected_prediction_tokens\":0},\"prompt_tokens\":184,\"prompt_tokens_details\":{\"audio_tokens\":0,\"cached_tokens\":0},\"total_tokens\":340}}",
- "served_entity_id": "demo-entity-003",
- "logging_error_codes": "[]",
- "requester": "demo-user-001"
- },
- "latency": 1.247,
- "usage": {
- "completion_tokens": 156,
- "completion_tokens_details": {
- "accepted_prediction_tokens": 0,
- "audio_tokens": 0,
- "reasoning_tokens": 0,
- "rejected_prediction_tokens": 0
- },
- "prompt_tokens": 184,
- "prompt_tokens_details": {
- "audio_tokens": 0,
- "cached_tokens": 0
- },
- "total_tokens": 340
- },
- "model": "gpt-4o-2024-11-20"
- },
- {
- "id": "demo_obs_004",
- "type": "follow_up",
- "timestamp": "2025-01-27T00:00:15",
- "input": "That's really helpful! Could you show me some practical examples? I'd love to see the difference between a regular for loop and a list comprehension.",
- "output": "Absolutely! Here are some practical examples:\\n\\n**Example 1: Squaring numbers**\\n```python\\n# Traditional for loop\\nsquares = []\\nfor x in range(5):\\n squares.append(x**2)\\n# Result: [0, 1, 4, 9, 16]\\n\\n# List comprehension\\nsquares = [x**2 for x in range(5)]\\n# Same result: [0, 1, 4, 9, 16]\\n```\\n\\n**Example 2: Filtering even numbers**\\n```python\\n# Traditional approach\\nevens = []\\nfor x in range(10):\\n if x % 2 == 0:\\n evens.append(x)\\n\\n# List comprehension\\nevens = [x for x in range(10) if x % 2 == 0]\\n# Result: [0, 2, 4, 6, 8]\\n```\\n\\n**Example 3: String processing**\\n```python\\nwords = ['hello', 'world', 'python']\\n# Capitalize and get lengths\\nlengths = [len(word.upper()) for word in words]\\n# Result: [5, 5, 6]\\n```\\n\\nList comprehensions are not only more concise but also typically 20-30% faster than equivalent for loops!",
- "metadata": {
- "interaction_type": "follow_up_examples",
- "code_examples_provided": 3,
- "response_time": "2025-01-27T12:00:15.000"
- },
- "latency": 0.892,
- "usage": {
- "completion_tokens": 287,
- "total_tokens": 445
- }
- }
- ]
- }"""
- }
- ]

- # Enhanced knowledge graphs demonstrating AgentGraph's complete feature set
- SAMPLE_KNOWLEDGE_GRAPHS = [
- {
- "filename": "kg_python_documentation_enhanced.json",
- "trace_index": 0, # Links to first trace
- "graph_data": {
- "system_name": "Python Documentation Assistant",
- "system_summary": "This intelligent assistant processes user inquiries about Python programming through a comprehensive multi-step workflow. When users submit questions, the agent performs knowledge search, delivers detailed explanations with code examples, and engages in follow-up interactions to ensure thorough understanding of Python concepts, syntax, and performance considerations.",
- "entities": [
- {
- "id": "agent_001",
- "type": "Agent",
- "name": "Python Documentation Agent",
- "importance": "HIGH",
- "raw_prompt": "You are a helpful Python programming assistant designed to answer questions accurately based on retrieved documentation context. Use the search results to provide precise responses.",
- "raw_prompt_ref": [
- {
- "line_start": 31,
- "line_end": 32
- }
- ]
- },
- {
- "id": "task_001",
- "type": "Task",
- "name": "Programming Question Processing",
- "importance": "HIGH",
- "raw_prompt": "Process user inquiry about Python programming and generate an accurate, contextual response based on available documentation and programming best practices.",
- "raw_prompt_ref": [
- {
- "line_start": 26,
- "line_end": 28
- }
- ]
- },
- {
- "id": "input_001",
- "type": "Input",
- "name": "User Programming Query",
- "importance": "HIGH",
- "raw_prompt": "Hello! I'm learning Python and I keep seeing this syntax with square brackets that looks different from regular loops. Can you help me understand what Python list comprehensions are used for and when I should use them?",
- "raw_prompt_ref": [
- {
- "line_start": 19,
- "line_end": 19
- }
- ]
- },
- {
- "id": "output_001",
- "type": "Output",
- "name": "Programming Concept Explanation",
- "importance": "HIGH",
- "raw_prompt": "Python list comprehensions are a powerful and concise way to create lists! Here's what they're used for: Main Uses: 1. Creating lists from existing iterables, 2. Filtering data, 3. Mathematical operations, 4. Combining operations. When to use them: For simple transformations, when performance matters, to make code more Pythonic and concise.",
- "raw_prompt_ref": [
- {
- "line_start": 20,
- "line_end": 20
- }
- ]
- },
- {
- "id": "human_001",
- "type": "Human",
- "name": "Python Developer",
- "importance": "MEDIUM",
- "raw_prompt": "Developer seeking Python programming guidance and documentation",
- "raw_prompt_ref": [
- {
- "line_start": 31,
- "line_end": 31
- }
- ]
- },
- {
- "id": "tool_001",
- "type": "Tool",
- "name": "Python Documentation Search",
- "importance": "HIGH",
- "raw_prompt": "Retrieval-Augmented Generation (RAG) system that searches Python documentation knowledge base for relevant concepts, syntax examples, and best practices to provide contextual information.",
- "raw_prompt_ref": [
- {
- "line_start": 49,
- "line_end": 49
- }
- ]
- }
- ],
- "relations": [
- {
- "id": "rel_001",
- "source": "input_001",
- "target": "agent_001",
- "type": "CONSUMED_BY",
- "importance": "HIGH",
- "interaction_prompt": "Extended user inquiry about Python list comprehensions received and processed through multi-step RAG workflow",
- "interaction_prompt_ref": [
- {
- "line_start": 19,
- "line_end": 19
- }
- ]
- },
- {
- "id": "rel_002",
- "source": "agent_001",
- "target": "task_001",
- "type": "PERFORMS",
- "importance": "HIGH",
- "interaction_prompt": "Agent executes comprehensive programming question processing including knowledge search, explanation, and code examples",
- "interaction_prompt_ref": [
- {
- "line_start": 26,
- "line_end": 28
- }
- ]
- },
- {
- "id": "rel_003",
- "source": "task_001",
- "target": "output_001",
- "type": "PRODUCES",
- "importance": "HIGH",
- "interaction_prompt": "Processing task generates detailed multi-part explanation with examples, performance analysis, and interactive follow-ups",
- "interaction_prompt_ref": [
- {
- "line_start": 20,
- "line_end": 20
- }
- ]
- },
- {
- "id": "rel_004",
- "source": "output_001",
- "target": "human_001",
- "type": "DELIVERS_TO",
- "importance": "HIGH",
- "interaction_prompt": "Comprehensive programming tutorial with examples and performance insights delivered to developer",
- "interaction_prompt_ref": [
- {
- "line_start": 20,
- "line_end": 20
- }
- ]
- },
- {
- "id": "rel_005",
- "source": "agent_001",
- "target": "tool_001",
- "type": "USES",
- "importance": "HIGH",
- "interaction_prompt": "Agent performs multi-step knowledge search retrieving documentation, examples, and performance comparisons for comprehensive response",
- "interaction_prompt_ref": [
- {
- "line_start": 49,
- "line_end": 49
- }
- ]
- }
- ],
- "failures": [
- {
- "id": "failure_001",
- "risk_type": "HALLUCINATION",
- "description": "Initial query could benefit from more specific learning objectives, though the multi-turn interaction successfully addressed this through follow-up questions.",
- "raw_text": "Hello! I'm learning Python and I keep seeing this syntax with square brackets that looks different from regular loops. Can you help me understand what Python list comprehensions are used for and when I should use them?",
- "raw_text_ref": [
- {
- "line_start": 19,
- "line_end": 19
- }
- ],
- "affected_id": "input_001"
  }
- ],
- "optimizations": [
- {
- "id": "opt_001",
- "recommendation_type": "PROMPT_REFINEMENT",
- "description": "Enhance initial query processing to identify learning level and tailor explanations accordingly. The current multi-turn approach works well but could be optimized with upfront user profiling.",
- "affected_ids": ["agent_001"],
- "raw_text_ref": [
- {
- "line_start": 31,
- "line_end": 32
- }
- ]
- },
- {
- "id": "opt_002",
- "recommendation_type": "TOOL_ENHANCEMENT",
- "description": "Integrate real-time code execution environment for testing examples, and expand knowledge base to include performance benchmarks and best practice recommendations.",
- "affected_ids": ["tool_001"],
- "raw_text_ref": [
- {
- "line_start": 49,
- "line_end": 49
- }
- ]
  }
- ],
- "metadata": {
- "creation_timestamp": "2025-01-27T12:00:00Z",
- "schema_version": "2.1.0",
- "quality_score": 0.89,
- "entity_count": 6,
- "relation_count": 5,
- "failure_count": 1,
- "optimization_count": 2,
- "interaction_depth": "multi_turn",
- "educational_value": "high",
- "processing_method": "production_enhanced",
- "content_source": "documentation_trace",
- "language": "en",
- "domain": "programming_documentation"
- }
  }
- }
- ]


  def insert_sample_data(session, force_insert=False):
@@ -360,6 +188,10 @@ def insert_sample_data(session, force_insert=False):
          "errors": []
      }

      # Check if sample data already exists
      if not force_insert:
          existing_sample = session.query(Trace).filter(
@@ -368,13 +200,13 @@ def insert_sample_data(session, force_insert=False):

          if existing_sample:
              logger.info("Sample data already exists, skipping insertion")
-             results["skipped"] = len(SAMPLE_TRACES)
              return results

      try:
          # Insert sample traces
          trace_ids = []
-         for i, trace_data in enumerate(SAMPLE_TRACES):
              try:
                  trace = save_trace(
                      session=session,
@@ -395,7 +227,7 @@ def insert_sample_data(session, force_insert=False):
                  results["errors"].append(error_msg)

          # Insert corresponding knowledge graphs
-         for kg_data in SAMPLE_KNOWLEDGE_GRAPHS:
              try:
                  trace_index = kg_data["trace_index"]
                  if trace_index < len(trace_ids):
@@ -433,11 +265,51 @@ def get_sample_data_info():
      Returns:
          Dict with sample data statistics
      """
-     return {
-         "traces_count": len(SAMPLE_TRACES),
-         "knowledge_graphs_count": len(SAMPLE_KNOWLEDGE_GRAPHS),
-         "trace_types": list(set(t["trace_type"] for t in SAMPLE_TRACES)),
-         "complexity_levels": ["enhanced", "simple"],
-         "features": ["rag_search", "failure_detection", "optimization_recommendations", "content_references", "quality_scoring"],
-         "description": "Comprehensive AgentGraph example showcasing Python Documentation RAG-powered assistant with multi-turn interactions, detailed knowledge search, educational content delivery, failure detection, optimization suggestions, and advanced knowledge graph features"
-     }
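
The `insert_sample_data` hunks above follow one flow: return early when sample data already exists (unless `force_insert=True`), save each trace, then attach each knowledge graph to the trace its `trace_index` points at. The sketch below reproduces that flow against an in-memory store so it runs without a database. `InMemoryStore` and `insert_samples` are hypothetical stand-ins for the session, `save_trace()` and `save_knowledge_graph()`; the result keys come from the diff and the deleted README, and the existence check on `trace_source == "sample_data"` is an assumption, since the query's filter arguments are not shown.

```python
from typing import Any, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for the database session used by the real code."""

    def __init__(self) -> None:
        self.traces: List[Dict[str, Any]] = []
        self.knowledge_graphs: List[Dict[str, Any]] = []

    def save_trace(self, trace: Dict[str, Any]) -> int:
        self.traces.append(trace)
        return len(self.traces) - 1  # use the list position as the trace ID

    def save_knowledge_graph(self, trace_id: int, graph: Dict[str, Any]) -> None:
        self.knowledge_graphs.append({"trace_id": trace_id, **graph})


def insert_samples(store: InMemoryStore, sample_traces: List[Dict[str, Any]],
                   sample_kgs: List[Dict[str, Any]], force_insert: bool = False) -> Dict[str, Any]:
    results = {"traces_inserted": 0, "knowledge_graphs_inserted": 0, "skipped": 0, "errors": []}

    # Assumed existence check: any stored trace whose source is "sample_data"
    if not force_insert and any(t.get("trace_source") == "sample_data" for t in store.traces):
        results["skipped"] = len(sample_traces)
        return results

    trace_ids: Dict[int, int] = {}  # sample position -> stored trace ID
    for i, trace in enumerate(sample_traces):
        try:
            trace_ids[i] = store.save_trace(trace)
            results["traces_inserted"] += 1
        except Exception as exc:  # record the failure and keep going
            results["errors"].append(f"trace {i}: {exc}")

    for kg in sample_kgs:
        index = kg["trace_index"]
        if index not in trace_ids:
            results["errors"].append(f"{kg['filename']}: no trace at index {index}")
            continue
        store.save_knowledge_graph(trace_ids[index], kg["graph_data"])
        results["knowledge_graphs_inserted"] += 1

    return results


store = InMemoryStore()
traces = [{"filename": "t.json", "trace_source": "sample_data"}]
kgs = [{"filename": "kg.json", "trace_index": 0, "graph_data": {"entities": []}}]
print(insert_samples(store, traces, kgs))
# {'traces_inserted': 1, 'knowledge_graphs_inserted': 1, 'skipped': 0, 'errors': []}
print(insert_samples(store, traces, kgs)["skipped"])  # 1
```

Keying `trace_ids` by sample position, rather than appending to a list, keeps each graph attached to the right trace even if an earlier trace fails to save; whether the real function needs this depends on code outside the displayed hunks.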
 
  #!/usr/bin/env python
  """
+ Sample data loader for database initialization.
+ Loads curated examples of traces and knowledge graphs from JSON files for new users.
  """

  import json
  import logging
+ import os
+ from pathlib import Path
  from typing import Dict, List, Any

  logger = logging.getLogger(__name__)

+ # Get the directory where this file is located
+ CURRENT_DIR = Path(__file__).parent
+ SAMPLES_DIR = CURRENT_DIR / "samples"
+ CONFIG_FILE = SAMPLES_DIR / "samples_config.json"

+
+ class SampleDataLoader:
+     """Loads sample data from JSON files."""
+
+     def __init__(self):
+         self._config = None
+         self._traces = None
+         self._knowledge_graphs = None
+
+     def _load_config(self) -> Dict[str, Any]:
+         """Load the samples configuration."""
+         if self._config is None:
+             try:
+                 with open(CONFIG_FILE, 'r', encoding='utf-8') as f:
+                     self._config = json.load(f)
+                 logger.info(f"Loaded sample data configuration from {CONFIG_FILE}")
+             except FileNotFoundError:
+                 logger.error(f"Configuration file not found: {CONFIG_FILE}")
+                 raise
+             except json.JSONDecodeError as e:
+                 logger.error(f"Invalid JSON in configuration file: {e}")
+                 raise
+         return self._config
+
+     def _load_trace(self, trace_file: str) -> Dict[str, Any]:
+         """Load a single trace from JSON file."""
+         trace_path = SAMPLES_DIR / trace_file
+         try:
+             with open(trace_path, 'r', encoding='utf-8') as f:
+                 return json.load(f)
+         except FileNotFoundError:
+             logger.error(f"Trace file not found: {trace_path}")
+             raise
+         except json.JSONDecodeError as e:
+             logger.error(f"Invalid JSON in trace file {trace_path}: {e}")
+             raise
+
+     def _load_knowledge_graph(self, kg_file: str) -> Dict[str, Any]:
+         """Load a single knowledge graph from JSON file."""
+         kg_path = SAMPLES_DIR / kg_file
+         try:
+             with open(kg_path, 'r', encoding='utf-8') as f:
+                 return json.load(f)
+         except FileNotFoundError:
+             logger.error(f"Knowledge graph file not found: {kg_path}")
+             raise
+         except json.JSONDecodeError as e:
+             logger.error(f"Invalid JSON in knowledge graph file {kg_path}: {e}")
+             raise
+
+     def get_traces(self) -> List[Dict[str, Any]]:
+         """Get all sample traces in the expected format."""
+         if self._traces is None:
+             config = self._load_config()
+             self._traces = []
+
+             for sample in config["samples"]:
+                 # Load the trace data
+                 trace_data = self._load_trace(sample["trace_file"])
+
+                 # Convert to the expected format
+                 trace_entry = {
+                     "filename": sample["name"].replace(" ", "_").lower() + ".json",
+                     "title": sample["name"],
+                     "description": sample["description"],
+                     "trace_type": sample["trace_type"],
+                     "trace_source": sample["trace_source"],
+                     "tags": sample["tags"],
+                     "content": json.dumps(trace_data["content"])  # Convert content back to JSON string
                  }
+                 self._traces.append(trace_entry)
+
+             logger.info(f"Loaded {len(self._traces)} sample traces")
+
+         return self._traces
+
+     def get_knowledge_graphs(self) -> List[Dict[str, Any]]:
+         """Get all sample knowledge graphs in the expected format."""
+         if self._knowledge_graphs is None:
+             config = self._load_config()
+             self._knowledge_graphs = []
+
+             for i, sample in enumerate(config["samples"]):
+                 # Load the knowledge graph data
+                 kg_data = self._load_knowledge_graph(sample["knowledge_graph_file"])
+
+                 # Convert to the expected format
+                 kg_entry = {
+                     "filename": sample["knowledge_graph_file"].split("/")[-1],  # Get just the filename
+                     "trace_index": i,  # Links to trace by index
+                     "graph_data": kg_data["graph_data"]
                  }
+                 self._knowledge_graphs.append(kg_entry)
+
+             logger.info(f"Loaded {len(self._knowledge_graphs)} sample knowledge graphs")
+
+         return self._knowledge_graphs
+
+     def get_sample_info(self) -> Dict[str, Any]:
+         """Get information about the available sample data."""
+         config = self._load_config()
+         traces = self.get_traces()
+         knowledge_graphs = self.get_knowledge_graphs()
+
+         # Extract unique features from all samples
+         all_features = set()
+         for sample in config["samples"]:
+             all_features.update(sample.get("features", []))
+
+         return {
+             "traces_count": len(traces),
+             "knowledge_graphs_count": len(knowledge_graphs),
+             "trace_types": list(set(t["trace_type"] for t in traces)),
+             "complexity_levels": list(set(sample.get("complexity", "standard") for sample in config["samples"])),
+             "features": list(all_features),
+             "description": config["metadata"]["description"],
+             "version": config["metadata"]["version"]
          }
+
+
+ # Create a global loader instance
+ _loader = SampleDataLoader()
+
+ # Maintain backward compatibility by exposing the same interface
+ def get_sample_traces() -> List[Dict[str, Any]]:
+     """Get sample traces (backward compatibility)."""
+     return _loader.get_traces()
+
+ def get_sample_knowledge_graphs() -> List[Dict[str, Any]]:
+     """Get sample knowledge graphs (backward compatibility)."""
+     return _loader.get_knowledge_graphs()
+
+ # Legacy module-level variables for backward compatibility.
+ # They are assigned eagerly, so importing this module reads the sample JSON files.
+ SAMPLE_TRACES = _loader.get_traces()
+ SAMPLE_KNOWLEDGE_GRAPHS = _loader.get_knowledge_graphs()
 

  def insert_sample_data(session, force_insert=False):
⋯
          "errors": []
      }

+     # Get sample data from loader
+     sample_traces = _loader.get_traces()
+     sample_knowledge_graphs = _loader.get_knowledge_graphs()
+
      # Check if sample data already exists
      if not force_insert:
          existing_sample = session.query(Trace).filter(
⋯

          if existing_sample:
              logger.info("Sample data already exists, skipping insertion")
+             results["skipped"] = len(sample_traces)
              return results

      try:
          # Insert sample traces
          trace_ids = []
+         for i, trace_data in enumerate(sample_traces):
              try:
                  trace = save_trace(
                      session=session,
⋯
                  results["errors"].append(error_msg)

          # Insert corresponding knowledge graphs
+         for kg_data in sample_knowledge_graphs:
              try:
                  trace_index = kg_data["trace_index"]
                  if trace_index < len(trace_ids):
⋯
      Returns:
          Dict with sample data statistics
      """
+     return _loader.get_sample_info()
+
+
+ # Additional utility functions for managing samples
+ def add_sample(sample_id: str, name: str, description: str, trace_file: str,
+                knowledge_graph_file: str, tags: List[str], trace_type: str = "custom",
+                trace_source: str = "sample_data", complexity: str = "standard",
+                features: List[str] = None):
+     """
+     Add a new sample to the configuration (placeholder for future use).
+
+     Not implemented yet: it only logs the call and does not modify
+     samples_config.json.
+
+     Args:
+         sample_id: Unique identifier for the sample
+         name: Human-readable name
+         description: Description of the sample
+         trace_file: Path to trace JSON file relative to samples directory
+         knowledge_graph_file: Path to KG JSON file relative to samples directory
+         tags: List of tags
+         trace_type: Type of trace
+         trace_source: Source of trace
+         complexity: Complexity level
+         features: List of features demonstrated
+     """
+     # Modifying the config file is left unimplemented; the details depend on requirements
+     logger.info(f"Add sample feature called for: {sample_id}")
+
+
+ def list_available_samples() -> List[Dict[str, Any]]:
+     """List all available samples with their metadata."""
+     config = _loader._load_config()
+     return config["samples"]
+
+
+ if __name__ == "__main__":
+     # Quick test of the loader
+     try:
+         info = get_sample_data_info()
+         print("Sample Data Info:", json.dumps(info, indent=2))
+
+         traces = get_sample_traces()
+         print(f"Loaded {len(traces)} traces")
+
+         kgs = get_sample_knowledge_graphs()
+         print(f"Loaded {len(kgs)} knowledge graphs")
+
+     except Exception as e:
+         print(f"Error testing sample data loader: {e}")
backend/database/samples/README.md ADDED
@@ -0,0 +1,157 @@
+ # AgentGraph Sample Data System
+
+ This is the refactored sample data system. Samples are stored as JSON files instead of hard-coded Python data, which makes them easier to maintain and extend.
+
+ ## 📁 File Structure
+
+ ```
+ samples/
+ ├── README.md                # This document
+ ├── samples_config.json      # Sample configuration file
+ ├── traces/                  # Trace data directory
+ │   └── python_documentation_inquiry.json
+ └── knowledge_graphs/        # Knowledge graph data directory
+     └── kg_python_documentation_enhanced.json
+ ```
+
+ ## 🔧 Configuration System
+
+ ### `samples_config.json`
+
+ The main configuration file, which defines every available sample:
+
+ ```json
+ {
+   "samples": [
+     {
+       "id": "python_documentation_demo",
+       "name": "Python Documentation Assistant Demo",
+       "description": "...",
+       "trace_file": "traces/python_documentation_inquiry.json",
+       "knowledge_graph_file": "knowledge_graphs/kg_python_documentation_enhanced.json",
+       "tags": ["programming", "rag_assistant", "documentation"],
+       "complexity": "enhanced",
+       "trace_type": "documentation_search",
+       "trace_source": "sample_data",
+       "features": [
+         "rag_search",
+         "failure_detection",
+         "optimization_recommendations"
+       ]
+     }
+   ],
+   "metadata": {
+     "version": "1.0.0",
+     "created": "2025-01-27",
+     "description": "..."
+   }
+ }
+ ```
+
+ ## 📄 Data File Formats
+
+ ### Trace Files
+
+ - Location: the `traces/` directory
+ - Format: standard JSON with fields such as `filename`, `title`, `description`, and `content`
+ - The `content` field holds the complete trace data (`observations`, `metadata`, etc.)
+
+ ### Knowledge Graph Files
+
+ - Location: the `knowledge_graphs/` directory
+ - Format: standard JSON with fields such as `filename`, `trace_index`, and `graph_data`
+ - `graph_data` contains `entities`, `relations`, `failures`, `optimizations`, and `metadata`
+
+ ## 🔄 Backward Compatibility
+
+ The new `sample_data.py` remains fully compatible with the old API:
+
+ ```python
+ # These imports still work
+ from backend.database.sample_data import SAMPLE_TRACES, SAMPLE_KNOWLEDGE_GRAPHS
+ from backend.database.sample_data import insert_sample_data, get_sample_data_info
+ ```
+
+ ## ✨ New Features
+
+ ### Dynamic Loading
+
+ - New samples can be added at runtime (just edit the JSON files)
+ - JSON format is validated automatically
+ - Better error handling and logging
+
+ ### Configuration Management
+
+ ```python
+ from backend.database.sample_data import list_available_samples, get_sample_data_info
+
+ # List all available samples
+ samples = list_available_samples()
+
+ # Get detailed information
+ info = get_sample_data_info()
+ ```
+
+ ## 🚀 Adding a New Sample
+
+ ### 1. Prepare the Data Files
+
+ Create the trace and knowledge graph JSON files and place them in the corresponding directories.
+
+ ### 2. Update the Configuration
+
+ Add a new entry to `samples_config.json`:
+
+ ```json
+ {
+   "id": "new_sample_id",
+   "name": "New Sample Name",
+   "description": "Description of the sample",
+   "trace_file": "traces/new_trace.json",
+   "knowledge_graph_file": "knowledge_graphs/new_kg.json",
+   "tags": ["tag1", "tag2"],
+   "complexity": "standard",
+   "trace_type": "custom",
+   "trace_source": "sample_data",
+   "features": ["feature1", "feature2"]
+ }
+ ```
+
+ ### 3. Automatic Loading
+
+ The system detects and loads new samples automatically; no restart is required.
+
+ ## 🎯 Advantages
+
+ 1. **Easy to maintain**: data is separated from code, so changing a sample requires no Python changes
+ 2. **Version-control friendly**: JSON diffs are clearer and easier to review
+ 3. **Extensible**: adding a new sample only requires adding JSON files
+ 4. **Type safety**: JSON schema validation (extensible)
+ 5. **Backward compatible**: existing code needs no changes
+
+ ## 🛠️ Developer Tools
+
+ ### Testing the New System
+
+ ```bash
+ cd backend/database
+ python sample_data_new.py
+ ```
+
+ ### Validating JSON Format
+
+ ```bash
+ python -m json.tool samples/traces/new_trace.json
+ python -m json.tool samples/knowledge_graphs/new_kg.json
+ ```
+
+ ## 📊 Migrating from algorithm-generated.jsonl
+
+ When we are ready to select samples from algorithm-generated.jsonl:
+
+ 1. Run `multi_agent_knowledge_extractor.py` to generate the KG
+ 2. Save the trace and the KG as separate JSON files
+ 3. Add a configuration entry to `samples_config.json`
+ 4. The sample is integrated into the system automatically
+
+ This structure makes adding real samples in bulk straightforward!
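The README describes loading only in terms of files and config entries, and the loader's file-reading code is not part of this diff. A minimal sketch of a config-driven loader, assuming the layout above (paths in `samples_config.json` are relative to `samples/`). `load_samples` and `REQUIRED_KEYS` are hypothetical names, and re-assigning `trace_index` from the sample's position in the config is an assumption, not confirmed behaviour of `SampleDataLoader`.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple

# Keys this sketch needs in order to locate a sample's files.
REQUIRED_KEYS = ("id", "trace_file", "knowledge_graph_file")


def load_samples(samples_dir: Path) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Read samples_config.json and return (traces, knowledge_graphs) in config order."""
    config_path = samples_dir / "samples_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)  # raises json.JSONDecodeError on malformed JSON

    traces: List[Dict[str, Any]] = []
    knowledge_graphs: List[Dict[str, Any]] = []
    for position, sample in enumerate(config["samples"]):
        missing = [key for key in REQUIRED_KEYS if key not in sample]
        if missing:
            raise ValueError(f"sample #{position} is missing keys: {missing}")

        # Paths in the config are relative to the samples/ directory.
        with (samples_dir / sample["trace_file"]).open(encoding="utf-8") as f:
            traces.append(json.load(f))
        with (samples_dir / sample["knowledge_graph_file"]).open(encoding="utf-8") as f:
            kg = json.load(f)

        # Assumption: point each KG at the trace loaded alongside it, so that
        # insert_sample_data's kg_data["trace_index"] lookup stays aligned.
        kg["trace_index"] = len(traces) - 1
        knowledge_graphs.append(kg)

    return traces, knowledge_graphs
```

Overwriting `trace_index` keeps each knowledge graph paired with its own trace even when several samples are listed, whereas the shipped KG file hard-codes `"trace_index": 0`.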
backend/database/samples/add_algorithm_sample_example.py ADDED
@@ -0,0 +1,162 @@
+ #!/usr/bin/env python
+ """
+ Example script: how to add a new sample from algorithm-generated.jsonl to the system
+ """
+
+ import json
+ import sys
+ from pathlib import Path
+
+ def extract_algorithm_sample(jsonl_path: str, sample_id: int = 0):
+     """
+     Extract the given sample from algorithm-generated.jsonl and convert it to our format.
+
+     Args:
+         jsonl_path: Path to the algorithm-generated.jsonl file
+         sample_id: Index of the sample to extract
+
+     Returns:
+         The converted trace dict, or None if sample_id is out of range.
+     """
+
+     # Read the JSONL file
+     samples = []
+     with open(jsonl_path, 'r', encoding='utf-8') as f:
+         for line in f:
+             if line.strip():
+                 samples.append(json.loads(line))
+
+     # Reject negative indices too, which would otherwise count from the end
+     if not 0 <= sample_id < len(samples):
+         print(f"Error: sample ID {sample_id} is out of range; valid IDs are 0 to {len(samples)-1}")
+         return None
+
+     sample = samples[sample_id]
+
+     # The raw trace may be stored as a JSON string or as an already-parsed list
+     observations = json.loads(sample['trace']) if isinstance(sample['trace'], str) else sample['trace']
+
+     # Extract the trace data
+     trace_data = {
+         "filename": f"algorithm_sample_{sample_id}.json",
+         "title": f"Algorithm Sample {sample_id}: {sample['question'][:50]}...",
+         "description": f"Multi-agent collaboration sample from algorithm-generated dataset. Agents: {', '.join(sample['agents'])}. Question: {sample['question'][:100]}...",
+         "trace_type": "multi_agent_collaboration",
+         "trace_source": "algorithm_generated",
+         "tags": ["multi_agent", "algorithm_generated", "real_failure"] + sample.get('agents', []),
+         "content": {
+             "id": f"algorithm_trace_{sample_id}",
+             "timestamp": "2025-01-27T00:00:00",
+             "metadata": {
+                 "source": "algorithm-generated.jsonl",
+                 "original_id": sample['id'],
+                 "mistake_step": sample.get('mistake_step', 0),
+                 "mistake_agent": sample.get('mistake_agent', 'unknown'),
+                 "mistake_reason": sample.get('mistake_reason', 'unknown'),
+                 "ground_truth": sample.get('ground_truth', 'unknown'),
+                 "is_correct": sample.get('is_correct', False)
+             },
+             "data": {
+                 "question": sample['question'],
+                 "agents": sample['agents'],
+                 "total_observations": len(observations)
+             },
+             "observations": observations
+         }
+     }
+
+     print(f"✅ Successfully extracted sample {sample_id}")
+     print(f"   Question: {sample['question'][:100]}...")
+     print(f"   Agents: {', '.join(sample['agents'])}")
+     print(f"   Observations: {len(observations)}")
+     print(f"   Mistake step: {sample.get('mistake_step', 'N/A')}")
+     print(f"   Mistake agent: {sample.get('mistake_agent', 'N/A')}")
+
+     return trace_data
+
+ def create_sample_config_entry(sample_id: int, trace_data: dict):
+     """Create a sample configuration entry."""
+
+     sample_config = {
+         "id": f"algorithm_sample_{sample_id}",
+         "name": f"Algorithm Generated Sample {sample_id}",
+         "description": trace_data["description"],
+         "trace_file": f"traces/algorithm_sample_{sample_id}.json",
+         "knowledge_graph_file": f"knowledge_graphs/kg_algorithm_sample_{sample_id}.json",
+         "tags": trace_data["tags"],
+         "complexity": "advanced",
+         "trace_type": trace_data["trace_type"],
+         "trace_source": trace_data["trace_source"],
+         "features": [
+             "multi_agent_collaboration",
+             "real_failure_analysis",
+             "complex_reasoning",
+             "tool_usage",
+             "error_patterns"
+         ]
+     }
+
+     return sample_config
+
+ def demo_algorithm_sample_extraction():
+     """Walk through the process of extracting an algorithm sample."""
+
+     print("🔍 AgentGraph Sample Data System - Algorithm Sample Integration Demo")
+     print("=" * 60)
+
+     # Simulate selecting samples from algorithm-generated.jsonl
+     print("\n1️⃣ Select the most valuable samples from algorithm-generated.jsonl:")
+     sample_recommendations = [
+         {"id": 0, "reason": "Math computation + multi-agent collaboration; relatively simple but realistic"},
+         {"id": 1, "reason": "Geographic query + complex search-and-verification flow; shows web service integration"},
+         {"id": 2, "reason": "API call failure; a typical authentication and web service issue"}
+     ]
+
+     for rec in sample_recommendations:
+         print(f"   📝 Sample #{rec['id']}: {rec['reason']}")
+
+     print("\n2️⃣ Data extraction and conversion pipeline:")
+     print("   ✅ Extract raw trace data from the JSONL file")
+     print("   ✅ Convert it to the AgentGraph standard format")
+     print("   ✅ Add metadata and classification tags")
+     print("   ✅ Generate JSON files")
+
+     print("\n3️⃣ Knowledge graph generation:")
+     print("   🤖 Run multi_agent_knowledge_extractor.py")
+     print("   📊 Analyze agent roles and interactions")
+     print("   ⚠️ Identify failure modes and their causes")
+     print("   🚀 Generate optimization recommendations")
+
+     print("\n4️⃣ System integration:")
+     print("   📁 Save the trace and KG as JSON files")
+     print("   ⚙️ Update samples_config.json")
+     print("   🔄 Load automatically into AgentGraph")
+
+     print("\n5️⃣ Expected results:")
+     print("   🎯 Real multi-agent failure cases")
+     print("   📈 More complex and realistic than the existing Python documentation example")
+     print("   🛠️ Demonstrate AgentGraph's ability to analyze complex systems")
+     print("   🌟 Give users production-ready examples")
+
+     print("\n6️⃣ Next steps:")
+     print("   1. Select the 3-5 most representative algorithm samples")
+     print("   2. Run knowledge graph extraction")
+     print("   3. Integrate them into the new JSON system")
+     print("   4. Test and refine sample quality")
+
+     print("\n" + "=" * 60)
+     print("✨ The new system is ready to accept algorithm-generated samples!")
+
+ if __name__ == "__main__":
+     demo_algorithm_sample_extraction()
+
+     # If a JSONL file path is given, run an actual extraction
+     if len(sys.argv) > 1:
+         jsonl_path = sys.argv[1]
+         sample_id = int(sys.argv[2]) if len(sys.argv) > 2 else 0
+
+         print(f"\n🔄 Extracting sample {sample_id} from {jsonl_path}")
+         trace_data = extract_algorithm_sample(jsonl_path, sample_id)
+
+         if trace_data:
+             config_entry = create_sample_config_entry(sample_id, trace_data)
+             print("\n📋 Generated configuration entry:")
+             print(json.dumps(config_entry, indent=2, ensure_ascii=False))
+
+             print("\n💾 To save this sample:")
+             print(f"   1. Save the trace data to: samples/traces/algorithm_sample_{sample_id}.json")
+             print(f"   2. Run KG extraction to generate: samples/knowledge_graphs/kg_algorithm_sample_{sample_id}.json")
+             print("   3. Add the configuration entry to: samples/samples_config.json")
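The script stops at printing the manual save steps. A minimal sketch of automating the first and third of those steps (writing the trace file and registering its config entry); `save_sample` is a hypothetical helper, it assumes `samples_config.json` already exists with a `samples` list, and KG generation is still left to `multi_agent_knowledge_extractor.py`.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_sample(samples_dir: Path, trace_data: Dict[str, Any], config_entry: Dict[str, Any]) -> Path:
    """Write the trace JSON and register its entry in samples_config.json."""
    # The config entry's trace_file is relative to the samples/ directory.
    trace_path = samples_dir / config_entry["trace_file"]
    trace_path.parent.mkdir(parents=True, exist_ok=True)
    trace_path.write_text(json.dumps(trace_data, indent=2, ensure_ascii=False), encoding="utf-8")

    config_path = samples_dir / "samples_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Replace an existing entry with the same id instead of appending a duplicate.
    config["samples"] = [s for s in config["samples"] if s["id"] != config_entry["id"]]
    config["samples"].append(config_entry)
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return trace_path
```

In the `__main__` block this could be called as `save_sample(Path("samples"), trace_data, config_entry)` right after `create_sample_config_entry(...)`. Because entries are matched on `id`, re-running the extraction for the same sample updates the config rather than growing it.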
backend/database/samples/knowledge_graphs/kg_python_documentation_enhanced.json ADDED
@@ -0,0 +1,216 @@
+ {
+ "filename": "kg_python_documentation_enhanced.json",
+ "trace_index": 0,
+ "graph_data": {
+ "system_name": "Python Documentation Assistant",
+ "system_summary": "This intelligent assistant processes user inquiries about Python programming through a comprehensive multi-step workflow. When users submit questions, the agent performs knowledge search, delivers detailed explanations with code examples, and engages in follow-up interactions to ensure thorough understanding of Python concepts, syntax, and performance considerations.",
+ "entities": [
+ {
+ "id": "agent_001",
+ "type": "Agent",
+ "name": "Python Documentation Agent",
+ "importance": "HIGH",
+ "raw_prompt": "You are a helpful Python programming assistant designed to answer questions accurately based on retrieved documentation context. Use the search results to provide precise responses.",
+ "raw_prompt_ref": [
+ {
+ "line_start": 31,
+ "line_end": 32
+ }
+ ]
+ },
+ {
+ "id": "task_001",
+ "type": "Task",
+ "name": "Programming Question Processing",
+ "importance": "HIGH",
+ "raw_prompt": "Process user inquiry about Python programming and generate an accurate, contextual response based on available documentation and programming best practices.",
+ "raw_prompt_ref": [
+ {
+ "line_start": 26,
+ "line_end": 28
+ }
+ ]
+ },
+ {
+ "id": "input_001",
+ "type": "Input",
+ "name": "User Programming Query",
+ "importance": "HIGH",
+ "raw_prompt": "Hello! I'm learning Python and I keep seeing this syntax with square brackets that looks different from regular loops. Can you help me understand what Python list comprehensions are used for and when I should use them?",
+ "raw_prompt_ref": [
+ {
+ "line_start": 19,
+ "line_end": 19
+ }
+ ]
+ },
+ {
+ "id": "output_001",
+ "type": "Output",
+ "name": "Programming Concept Explanation",
+ "importance": "HIGH",
+ "raw_prompt": "Python list comprehensions are a powerful and concise way to create lists! Here's what they're used for: Main Uses: 1. Creating lists from existing iterables, 2. Filtering data, 3. Mathematical operations, 4. Combining operations. When to use them: For simple transformations, when performance matters, to make code more Pythonic and concise.",
+ "raw_prompt_ref": [
+ {
+ "line_start": 20,
+ "line_end": 20
+ }
+ ]
+ },
+ {
+ "id": "human_001",
+ "type": "Human",
+ "name": "Python Developer",
+ "importance": "MEDIUM",
+ "raw_prompt": "Developer seeking Python programming guidance and documentation",
+ "raw_prompt_ref": [
+ {
+ "line_start": 31,
+ "line_end": 31
+ }
+ ]
+ },
+ {
+ "id": "tool_001",
+ "type": "Tool",
+ "name": "Python Documentation Search",
+ "importance": "HIGH",
+ "raw_prompt": "Retrieval-Augmented Generation (RAG) system that searches Python documentation knowledge base for relevant concepts, syntax examples, and best practices to provide contextual information.",
+ "raw_prompt_ref": [
+ {
+ "line_start": 49,
+ "line_end": 49
+ }
+ ]
+ }
+ ],
+ "relations": [
+ {
+ "id": "rel_001",
+ "source": "input_001",
+ "target": "agent_001",
+ "type": "CONSUMED_BY",
+ "importance": "HIGH",
+ "interaction_prompt": "Extended user inquiry about Python list comprehensions received and processed through multi-step RAG workflow",
+ "interaction_prompt_ref": [
+ {
+ "line_start": 19,
+ "line_end": 19
+ }
+ ]
+ },
+ {
+ "id": "rel_002",
+ "source": "agent_001",
+ "target": "task_001",
+ "type": "PERFORMS",
+ "importance": "HIGH",
+ "interaction_prompt": "Agent executes comprehensive programming question processing including knowledge search, explanation, and code examples",
+ "interaction_prompt_ref": [
+ {
+ "line_start": 26,
+ "line_end": 28
+ }
+ ]
+ },
+ {
+ "id": "rel_003",
+ "source": "task_001",
+ "target": "output_001",
+ "type": "PRODUCES",
+ "importance": "HIGH",
+ "interaction_prompt": "Processing task generates detailed multi-part explanation with examples, performance analysis, and interactive follow-ups",
+ "interaction_prompt_ref": [
+ {
+ "line_start": 20,
+ "line_end": 20
+ }
+ ]
+ },
+ {
+ "id": "rel_004",
+ "source": "output_001",
+ "target": "human_001",
+ "type": "DELIVERS_TO",
+ "importance": "HIGH",
+ "interaction_prompt": "Comprehensive programming tutorial with examples and performance insights delivered to developer",
+ "interaction_prompt_ref": [
+ {
+ "line_start": 20,
+ "line_end": 20
+ }
+ ]
+ },
+ {
+ "id": "rel_005",
+ "source": "agent_001",
+ "target": "tool_001",
+ "type": "USES",
+ "importance": "HIGH",
+ "interaction_prompt": "Agent performs multi-step knowledge search retrieving documentation, examples, and performance comparisons for comprehensive response",
+ "interaction_prompt_ref": [
+ {
+ "line_start": 49,
+ "line_end": 49
+ }
+ ]
+ }
+ ],
+ "failures": [
+ {
+ "id": "failure_001",
+ "risk_type": "HALLUCINATION",
+ "description": "Initial query could benefit from more specific learning objectives, though the multi-turn interaction successfully addressed this through follow-up questions.",
+ "raw_text": "Hello! I'm learning Python and I keep seeing this syntax with square brackets that looks different from regular loops. Can you help me understand what Python list comprehensions are used for and when I should use them?",
+ "raw_text_ref": [
+ {
+ "line_start": 19,
+ "line_end": 19
+ }
+ ],
+ "affected_id": "input_001"
+ }
+ ],
+ "optimizations": [
+ {
+ "id": "opt_001",
+ "recommendation_type": "PROMPT_REFINEMENT",
+ "description": "Enhance initial query processing to identify learning level and tailor explanations accordingly. The current multi-turn approach works well but could be optimized with upfront user profiling.",
+ "affected_ids": ["agent_001"],
+ "raw_text_ref": [
+ {
+ "line_start": 31,
+ "line_end": 32
+ }
+ ]
+ },
+ {
+ "id": "opt_002",
+ "recommendation_type": "TOOL_ENHANCEMENT",
+ "description": "Integrate real-time code execution environment for testing examples, and expand knowledge base to include performance benchmarks and best practice recommendations.",
+ "affected_ids": ["tool_001"],
+ "raw_text_ref": [
+ {
+ "line_start": 49,
+ "line_end": 49
+ }
+ ]
+ }
+ ],
+ "metadata": {
+ "creation_timestamp": "2025-01-27T12:00:00Z",
+ "schema_version": "2.1.0",
+ "quality_score": 0.89,
+ "entity_count": 6,
+ "relation_count": 5,
+ "failure_count": 1,
+ "optimization_count": 2,
+ "interaction_depth": "multi_turn",
+ "educational_value": "high",
+ "processing_method": "production_enhanced",
+ "content_source": "documentation_trace",
+ "language": "en",
+ "domain": "programming_documentation"
+ }
+ }
+ }
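The `metadata` block of this knowledge graph repeats counts (6 entities, 5 relations, 1 failure, 2 optimizations) that can drift from the actual lists when a sample is edited by hand, and relations, failures, and optimizations all refer to entities by `id`. A small consistency check, offered as a sketch; `check_graph_data` is a hypothetical helper, not part of this commit.

```python
from typing import Any, Dict, List


def check_graph_data(graph_data: Dict[str, Any]) -> List[str]:
    """Return a list of consistency problems found in a knowledge graph's graph_data."""
    problems: List[str] = []
    entity_ids = {entity["id"] for entity in graph_data.get("entities", [])}

    # Every relation endpoint should name an existing entity.
    for rel in graph_data.get("relations", []):
        for end in ("source", "target"):
            if rel[end] not in entity_ids:
                problems.append(f"{rel['id']}: unknown {end} {rel[end]!r}")

    # Failures point at one entity; optimizations may point at several.
    for failure in graph_data.get("failures", []):
        if failure.get("affected_id") not in entity_ids:
            problems.append(f"{failure['id']}: unknown affected_id {failure.get('affected_id')!r}")
    for opt in graph_data.get("optimizations", []):
        for affected in opt.get("affected_ids", []):
            if affected not in entity_ids:
                problems.append(f"{opt['id']}: unknown affected_id {affected!r}")

    # The metadata counters should match the actual list lengths.
    metadata = graph_data.get("metadata", {})
    for key, counter in (("entities", "entity_count"), ("relations", "relation_count"),
                         ("failures", "failure_count"), ("optimizations", "optimization_count")):
        actual = len(graph_data.get(key, []))
        if counter in metadata and metadata[counter] != actual:
            problems.append(f"metadata.{counter}={metadata[counter]} but {key} has {actual} items")
    return problems
```

For the Python documentation sample above, every referenced id exists and the counters match, so the check should return an empty list.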
backend/database/samples/samples_config.json ADDED
@@ -0,0 +1,33 @@
+ {
+ "samples": [
+ {
+ "id": "python_documentation_demo",
+ "name": "Python Documentation Assistant Demo",
+ "description": "Comprehensive example showing RAG-powered AI assistant handling multi-turn programming inquiry with knowledge search, detailed explanations, code examples, performance analysis, and interactive learning",
+ "trace_file": "traces/python_documentation_inquiry.json",
+ "knowledge_graph_file": "knowledge_graphs/kg_python_documentation_enhanced.json",
+ "tags": [
+ "programming",
+ "rag_assistant",
+ "documentation",
+ "failure_detection",
+ "optimization"
+ ],
+ "complexity": "enhanced",
+ "trace_type": "documentation_search",
+ "trace_source": "sample_data",
+ "features": [
+ "rag_search",
+ "failure_detection",
+ "optimization_recommendations",
+ "content_references",
+ "quality_scoring"
+ ]
+ }
+ ],
+ "metadata": {
+ "version": "1.0.0",
+ "created": "2025-01-27",
+ "description": "Comprehensive AgentGraph sample data showcasing real multi-agent interactions, failures, and optimizations"
+ }
+ }
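`get_sample_info()` reads `config["metadata"]["description"]` and `config["metadata"]["version"]`, and each entry points at two files relative to `samples/`, so a malformed config only surfaces when the loader runs. A hedged sketch of an up-front check; `validate_config` and `EXPECTED_KEYS` are hypothetical, and treating every key of the entry above as required is an assumption.

```python
from pathlib import Path
from typing import Any, Dict, List

# Assumption: every entry should carry all keys used by the entry above.
EXPECTED_KEYS = ("id", "name", "description", "trace_file", "knowledge_graph_file",
                 "tags", "complexity", "trace_type", "trace_source", "features")


def validate_config(config: Dict[str, Any], samples_dir: Path) -> List[str]:
    """Return human-readable problems found in a samples_config.json dict."""
    problems: List[str] = []
    seen_ids = set()
    for position, sample in enumerate(config.get("samples", [])):
        label = sample.get("id", f"#{position}")
        for key in EXPECTED_KEYS:
            if key not in sample:
                problems.append(f"{label}: missing {key!r}")
        # Sample ids must be unique so entries can be looked up and replaced.
        if "id" in sample:
            if sample["id"] in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(sample["id"])
        # Referenced data files must exist relative to the samples/ directory.
        for key in ("trace_file", "knowledge_graph_file"):
            if key in sample and not (samples_dir / sample[key]).is_file():
                problems.append(f"{label}: {key} {sample[key]!r} not found")
    # get_sample_info() reads these two metadata fields directly.
    for key in ("version", "description"):
        if key not in config.get("metadata", {}):
            problems.append(f"metadata: missing {key!r}")
    return problems
```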
backend/database/samples/traces/python_documentation_inquiry.json ADDED
@@ -0,0 +1,110 @@
+ {
+ "filename": "python_documentation_inquiry.json",
+ "title": "Python Documentation Assistant Demo",
+ "description": "Comprehensive example showing RAG-powered AI assistant handling multi-turn programming inquiry with knowledge search, detailed explanations, code examples, performance analysis, and interactive learning",
+ "trace_type": "documentation_search",
+ "trace_source": "sample_data",
+ "tags": [
+ "programming",
+ "rag_assistant",
+ "documentation",
+ "failure_detection",
+ "optimization"
+ ],
+ "content": {
+ "id": "doc_trace_demo_001",
+ "timestamp": "2025-01-27T00:00:00",
+ "metadata": {
+ "source": "AgentGraph_Demo",
+ "row_index": 0,
+ "converted_at": "2025-01-27T12:00:00.000000"
+ },
+ "data": {
+ "total_observations": 4,
+ "summary": "Python documentation inquiry with RAG-powered assistant response including knowledge search, explanation, and follow-up code examples"
+ },
+ "observations": [
+ {
+ "id": "demo_obs_001",
+ "type": "user_query",
+ "timestamp": "2025-01-27T00:00:00",
+ "input": "Hello! I'm learning Python and I keep seeing this syntax with square brackets that looks different from regular loops. Can you help me understand what Python list comprehensions are used for and when I should use them?",
+ "output": "I'll help you understand Python list comprehensions! Let me search our documentation to give you a comprehensive explanation.",
+ "metadata": {
+ "request_date": "2025-01-27T00:00:00",
+ "interaction_type": "initial_query",
+ "user_level": "beginner",
+ "topic_category": "python_syntax"
+ },
+ "latency": 0.124
+ },
+ {
+ "id": "demo_obs_002",
+ "type": "knowledge_search",
+ "timestamp": "2025-01-27T00:00:05",
+ "input": "Searching Python documentation for: list comprehensions, syntax, use cases, performance benefits",
+ "output": "Found relevant documentation: ['Python list comprehensions - efficient way to create lists', 'List comprehension vs loops performance comparison', 'Python built-in functions - len(), range(), enumerate()', 'Nested list comprehensions and filtering', 'Python data structures - lists, dictionaries, sets, tuples']",
+ "metadata": {
+ "search_query": "python list comprehensions syntax use cases",
+ "documents_found": 5,
+ "search_time_ms": 23,
+ "relevance_score": 0.94
+ },
+ "latency": 0.089
+ },
+ {
+ "id": "demo_obs_003",
+ "type": "inference",
+ "timestamp": "2025-01-27T00:00:08",
+ "input": "You are a helpful Python programming assistant designed to answer questions accurately based on the provided documentation. Use the information retrieved from the knowledge base below to respond concisely and correctly.\\n\\n### Retrieved Documents:\\n['Python list comprehensions - efficient way to create lists', 'List comprehension vs loops performance comparison', 'Python built-in functions - len(), range(), enumerate()', 'Nested list comprehensions and filtering']\\n\\n### Question:\\nwhat are python list comprehensions used for and when should I use them?\\n\\n### Answer:",
+ "output": "Python list comprehensions are a powerful and concise way to create lists! Here's what they're used for:\\n\\n**Main Uses:**\\n1. **Creating lists from existing iterables** - Transform elements from sequences\\n2. **Filtering data** - Include only elements that meet specific conditions\\n3. **Mathematical operations** - Apply calculations efficiently\\n4. **Combining operations** - Filter and transform in one line\\n\\n**When to use them:**\\n- When you need to create a new list from an existing iterable\\n- For simple transformations (keep it readable!)\\n- When performance matters (they're faster than equivalent for loops)\\n- To make code more Pythonic and concise\\n\\n**Basic syntax:** `[expression for item in iterable if condition]`",
+ "metadata": {
+ "request_date": "2025-01-27T00:00:00",
+ "databricks_request_id": "demo-request-003",
+ "request_time": "2025-01-27T12:00:08.000",
+ "status_code": 200,
+ "sampling_fraction": 1,
+ "request_raw": "{\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful Python programming assistant\"},{\"role\":\"user\",\"content\":\"You are a helpful Python programming assistant designed to answer questions accurately based on the provided documentation. Use the information retrieved from the knowledge base below to respond concisely and correctly.\\\\n\\\\n### Retrieved Documents:\\\\n['Python list comprehensions - efficient way to create lists', 'List comprehension vs loops performance comparison', 'Python built-in functions - len(), range(), enumerate()', 'Nested list comprehensions and filtering']\\\\n\\\\n### Question:\\\\nwhat are python list comprehensions used for and when should I use them?\\\\n\\\\n### Answer:\"}]}",
+ "response_raw": "{\"choices\":[{\"content_filter_results\":{\"hate\":{\"filtered\":false,\"severity\":\"safe\"},\"self_harm\":{\"filtered\":false,\"severity\":\"safe\"},\"sexual\":{\"filtered\":false,\"severity\":\"safe\"},\"violence\":{\"filtered\":false,\"severity\":\"safe\"}},\"finish_reason\":\"stop\",\"index\":0,\"logprobs\":null,\"message\":{\"content\":\"Python list comprehensions are a powerful and concise way to create lists! Here's what they're used for:\\\\n\\\\n**Main Uses:**\\\\n1. **Creating lists from existing iterables** - Transform elements from sequences\\\\n2. **Filtering data** - Include only elements that meet specific conditions\\\\n3. **Mathematical operations** - Apply calculations efficiently\\\\n4. **Combining operations** - Filter and transform in one line\\\\n\\\\n**When to use them:**\\\\n- When you need to create a new list from an existing iterable\\\\n- For simple transformations (keep it readable!)\\\\n- When performance matters (they're faster than equivalent for loops)\\\\n- To make code more Pythonic and concise\\\\n\\\\n**Basic syntax:** `[expression for item in iterable if condition]`\",\"refusal\":null,\"role\":\"assistant\"}}],\"created\":1737979208,\"id\":\"chatcmpl-DEMO003\",\"model\":\"gpt-4o-2024-11-20\",\"object\":\"chat.completion\",\"system_fingerprint\":\"fp_demo_003\",\"usage\":{\"completion_tokens\":156,\"completion_tokens_details\":{\"accepted_prediction_tokens\":0,\"audio_tokens\":0,\"reasoning_tokens\":0,\"rejected_prediction_tokens\":0},\"prompt_tokens\":184,\"prompt_tokens_details\":{\"audio_tokens\":0,\"cached_tokens\":0},\"total_tokens\":340}}",
+ "served_entity_id": "demo-entity-003",
+ "logging_error_codes": "[]",
+ "requester": "demo-user-001"
+ },
+ "latency": 1.247,
+ "usage": {
+ "completion_tokens": 156,
+ "completion_tokens_details": {
+ "accepted_prediction_tokens": 0,
+ "audio_tokens": 0,
+ "reasoning_tokens": 0,
+ "rejected_prediction_tokens": 0
+ },
+ "prompt_tokens": 184,
+ "prompt_tokens_details": {
+ "audio_tokens": 0,
+ "cached_tokens": 0
+ },
+ "total_tokens": 340
+ },
+ "model": "gpt-4o-2024-11-20"
+ },
+ {
+ "id": "demo_obs_004",
+ "type": "follow_up",
+ "timestamp": "2025-01-27T00:00:15",
+ "input": "That's really helpful! Could you show me some practical examples? I'd love to see the difference between a regular for loop and a list comprehension.",
+ "output": "Absolutely! Here are some practical examples:\\n\\n**Example 1: Squaring numbers**\\n```python\\n# Traditional for loop\\nsquares = []\\nfor x in range(5):\\n squares.append(x**2)\\n# Result: [0, 1, 4, 9, 16]\\n\\n# List comprehension\\nsquares = [x**2 for x in range(5)]\\n# Same result: [0, 1, 4, 9, 16]\\n```\\n\\n**Example 2: Filtering even numbers**\\n```python\\n# Traditional approach\\nevens = []\\nfor x in range(10):\\n if x % 2 == 0:\\n evens.append(x)\\n\\n# List comprehension\\nevens = [x for x in range(10) if x % 2 == 0]\\n# Result: [0, 2, 4, 6, 8]\\n```\\n\\n**Example 3: String processing**\\n```python\\nwords = ['hello', 'world', 'python']\\n# Capitalize and get lengths\\nlengths = [len(word.upper()) for word in words]\\n# Result: [5, 5, 6]\\n```\\n\\nList comprehensions are not only more concise but also typically 20-30% faster than equivalent for loops!",
+ "metadata": {
+ "interaction_type": "follow_up_examples",
+ "code_examples_provided": 3,
+ "response_time": "2025-01-27T12:00:15.000"
+ },
+ "latency": 0.892,
+ "usage": {
+ "completion_tokens": 287,
+ "total_tokens": 445
+ }
+ }
+ ]
+ }
+ }
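In this trace, `data.total_observations` (4) duplicates the length of `observations`, and the inference step's `usage` block satisfies `prompt_tokens + completion_tokens = total_tokens` (184 + 156 = 340), while the follow-up step records only `completion_tokens` and `total_tokens`. A sketch of a check for such derived fields, assuming these field names hold for every trace file; `check_trace_content` is a hypothetical helper.

```python
from typing import Any, Dict, List


def check_trace_content(content: Dict[str, Any]) -> List[str]:
    """Return problems found in a trace file's `content` object."""
    problems: List[str] = []
    observations = content.get("observations", [])

    # The declared count should match the actual number of observations.
    declared = content.get("data", {}).get("total_observations")
    if declared is not None and declared != len(observations):
        problems.append(f"total_observations={declared} but found {len(observations)} observations")

    ids = [obs.get("id") for obs in observations]
    if len(ids) != len(set(ids)):
        problems.append("duplicate observation ids")

    for obs in observations:
        usage = obs.get("usage", {})
        # Only check the sum when all three counters are recorded.
        if all(key in usage for key in ("prompt_tokens", "completion_tokens", "total_tokens")):
            if usage["prompt_tokens"] + usage["completion_tokens"] != usage["total_tokens"]:
                problems.append(f"{obs.get('id')}: token counts do not add up")
    return problems
```

Skipping observations with partial `usage` data keeps the check usable on the follow-up step, which omits `prompt_tokens`.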