SQL Agents Guide
Overview
SQL agents allow you to execute database queries as part of your agent pipeline. They integrate seamlessly with the pub/sub architecture, enabling data retrieval and manipulation through SQL queries.
How SQL Agents Work
Agent Type: SQL
Model Selection: Choose "SQL" from the model dropdown
Prompt Processing: SQL agents work like other agents:
- Render the prompt with placeholders
- The rendered prompt IS the SQL query
- Execute the query via
/queryendpoint - Return results as JSON
Example
Agent Configuration:
Title: Product Query
Model: SQL
Prompt: SELECT * FROM products WHERE category = '{Category}' LIMIT 10
Subscribe: START
Publish: QUERY_RESULTS
β Show result
What happens:
- Prompt renders:
SELECT * FROM products WHERE category = 'Electronics' LIMIT 10 - SQL agent executes this query
- Results returned as JSON
- Published to QUERY_RESULTS topic
- Displayed in Final Result box
Prompt Template Features
Using Placeholders
SQL agents support all standard placeholders:
Data Sources:
Prompt: SELECT * FROM {TableName} WHERE status = 'active'
User Question:
Prompt: -- User asked: {question}
SELECT * FROM orders WHERE customer_name LIKE '%{question}%'
Previous Agent Output:
Prompt: -- Previous analysis: {input}
SELECT * FROM products WHERE id IN ({input})
Multiple Placeholders:
Prompt: SELECT {Columns}
FROM {TableName}
WHERE date > '{StartDate}'
AND status = '{Status}'
ORDER BY {OrderBy}
Dynamic Query Construction
Example 1: User-Driven Filters
Data Source: Filters
Content: category = 'Books' AND price < 50
Agent:
Prompt: SELECT * FROM products WHERE {Filters}
Example 2: Column Selection
Data Source: SelectedColumns
Content: id, name, price
Agent:
Prompt: SELECT {SelectedColumns} FROM products
Example 3: Parameterized Query
User Question: Electronics under $100
Agent 1 (LLM): Extract parameters
- Output: category='Electronics', max_price=100
Agent 2 (SQL):
Prompt: SELECT * FROM products WHERE category='{input}' AND price < 100
Output Format
JSON Result Structure
{
"columns": ["id", "name", "price"],
"rows": [
[1, "Widget", 29.99],
[2, "Gadget", 49.99]
],
"row_count": 2
}
Fields
- columns: List of column names
- rows: Array of rows (each row is an array of values)
- row_count: Total number of rows returned
Error Format
{
"error": "table products does not exist"
}
Use Cases
Use Case 1: Data Retrieval for Analysis
Workflow: Query β Analyze β Report
Agent 1 (SQL): Fetch Data
- Prompt: SELECT * FROM sales WHERE date > '2024-01-01'
- Publish: RAW_DATA
Agent 2 (LLM): Analyze
- Prompt: Analyze this sales data: {input}
- Subscribe: RAW_DATA
- Publish: ANALYSIS
Agent 3 (LLM): Report
- Prompt: Create report from: {input}
- Subscribe: ANALYSIS
Use Case 2: Iterative Queries
Workflow: Query β Process β Query Again
Agent 1 (SQL): Get Categories
- Prompt: SELECT DISTINCT category FROM products
- Publish: CATEGORIES
Agent 2 (LLM): Pick Category
- Prompt: From these categories: {input}
Pick the most relevant for: {question}
- Subscribe: CATEGORIES
- Publish: SELECTED_CATEGORY
Agent 3 (SQL): Get Products
- Prompt: SELECT * FROM products WHERE category = '{input}'
- Subscribe: SELECTED_CATEGORY
- Publish: PRODUCTS
Use Case 3: Database Updates
Note: Use with caution! Ensure proper validation.
Agent 1 (LLM): Generate Update Statement
- Prompt: Create SQL UPDATE for: {question}
Table schema: {Schema}
- Publish: UPDATE_SQL
Agent 2 (SQL): Execute Update
- Prompt: {input}
- Subscribe: UPDATE_SQL
Use Case 4: Data Validation
Workflow: Query β Validate β Report
Data Source: ExpectedCount
Content: 1000
Agent 1 (SQL): Count Records
- Prompt: SELECT COUNT(*) as count FROM orders WHERE date = '2024-01-15'
- Publish: ACTUAL_COUNT
Agent 2 (LLM): Validate
- Prompt: Expected: {ExpectedCount}
Actual: {input}
Are they equal? Explain any discrepancy.
- Subscribe: ACTUAL_COUNT
Use Case 5: Multi-Table Join
Data Source: JoinClause
Content: INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
Agent (SQL):
Prompt: SELECT o.id, o.date, p.name, oi.quantity
FROM orders o
{JoinClause}
WHERE o.date > '2024-01-01'
Best Practices
1. SQL Injection Protection
Don't:
Prompt: SELECT * FROM users WHERE name = '{input}'
If input is: ' OR '1'='1 β SQL injection!
Do: Use agent validation first:
Agent 1 (LLM): Validate Input
- Prompt: Is this a safe user name? {input}
Return SAFE or UNSAFE
Agent 2 (SQL): Only if SAFE
- Conditional execution based on validation
2. Query Complexity
Keep queries readable:
β
Good:
Prompt: SELECT id, name, price
FROM products
WHERE category = '{Category}'
AND price < {MaxPrice}
ORDER BY price DESC
LIMIT 10
β Too Complex:
Prompt: SELECT ... (100 lines of complex query)
Break complex queries into multiple agents.
3. Error Handling
Check for errors in downstream agents:
Agent 1 (SQL): Query
- Might return error
Agent 2 (LLM): Handle Result
- Prompt: Check if this is an error: {input}
If error, explain. If success, analyze data.
4. Result Size
Be mindful of result size:
β
Good:
Prompt: SELECT * FROM orders LIMIT 100
β Risky:
Prompt: SELECT * FROM orders -- Could return millions of rows
Use LIMIT clauses appropriately.
5. Comments in SQL
Add context for debugging:
Prompt: -- Query generated for: {question}
-- Analyzing sales data for category: {Category}
SELECT
product_id,
SUM(quantity) as total_sold,
SUM(revenue) as total_revenue
FROM sales
WHERE category = '{Category}'
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 10
Integration with Other Agents
SQL β LLM Analysis
SQL Agent: Retrieve data
β (JSON result)
LLM Agent: Analyze and interpret
β (Analysis text)
LLM Agent: Generate report
LLM β SQL β LLM
LLM Agent: Generate SQL based on question
β (SQL query)
SQL Agent: Execute query
β (Results)
LLM Agent: Format results for user
NER β SQL
NER Agent: Extract entities from text
β (Entities JSON)
LLM Agent: Build WHERE clause from entities
β (SQL condition)
SQL Agent: Query with conditions
SQL β Data Source β SQL
SQL Agent: Get list of IDs
β (Published to topic)
Data Source: Store IDs (subscribes to topic)
β (IDs available as {IDList})
SQL Agent: Query details using {IDList}
Examples
Example 1: Customer Orders
Setup:
User Question: Show me orders for customer ID 12345
Agent 1 (SQL): Get Orders
- Model: SQL
- Prompt: SELECT
order_id,
order_date,
total_amount,
status
FROM orders
WHERE customer_id = '{question}'
ORDER BY order_date DESC
- Subscribe: START
- Publish: ORDERS
- β Show result
Expected Output (Final Result box):
{
"columns": ["order_id", "order_date", "total_amount", "status"],
"rows": [
[1001, "2024-01-15", 149.99, "delivered"],
[1002, "2024-01-20", 89.50, "shipped"]
],
"row_count": 2
}
Example 2: Sales Analytics
Setup:
Data Source: DateRange
Content: 2024-01-01
Agent 1 (SQL): Sales Data
- Prompt: SELECT
DATE(order_date) as date,
COUNT(*) as order_count,
SUM(total_amount) as revenue
FROM orders
WHERE order_date >= '{DateRange}'
GROUP BY DATE(order_date)
ORDER BY date
- Subscribe: START
- Publish: SALES_DATA
Agent 2 (LLM): Analyze Trends
- Model: phi3
- Prompt: Analyze these sales trends: {input}
Identify patterns and anomalies.
- Subscribe: SALES_DATA
- β Show result
Example 3: Dynamic Table Query
Setup:
Data Source: Config
Content: {"table": "products", "condition": "price < 100"}
Agent 1 (LLM): Parse Config
- Prompt: From this config: {Config}
Extract table name and output it
- Publish: TABLE_NAME
Agent 2 (SQL): Query Table
- Prompt: SELECT * FROM {input} WHERE price < 100 LIMIT 20
- Subscribe: TABLE_NAME
- β Show result
Troubleshooting
Issue: SQL syntax error
Cause: Invalid SQL in rendered prompt
Solution:
- Check prompt in execution log
- Verify placeholder values are valid
- Test query manually in database
Issue: Database not ready
Cause: /status returns not ready
Solution: Wait for database initialization to complete
Issue: Empty result set
Cause: Query returns no rows
Output:
{
"columns": [...],
"rows": [],
"row_count": 0
}
This is not an error - handle in downstream agents
Issue: Large result set
Cause: Query returns too many rows
Solution: Add LIMIT clause or filter conditions
Issue: Timeout
Cause: Query takes too long
Solution:
- Optimize query
- Add indexes
- Reduce result set size
Limitations
- Read/Write: All queries supported (SELECT, INSERT, UPDATE, DELETE)
- Transactions: No multi-statement transaction support
- Result Size: Large results may cause memory issues
- Timeout: 30 second timeout per query
- No Prepared Statements: Use validation for user input
Security Considerations
- Validate User Input: Always validate before using in SQL
- Limit Permissions: Database user should have minimal required permissions
- Audit Queries: Log all SQL queries for review
- No Sensitive Data: Be careful with password fields, tokens, etc.
- Result Filtering: Consider if all columns should be exposed
Future Enhancements
Potential improvements:
- Query result caching
- Parameterized queries (prepared statements)
- Transaction support
- Batch query execution
- Query templates library
- Result pagination
- Query performance metrics
- Visual query builder