sml-agents-publish-subscribe-dbvec / NER_AGENTS_GUIDE.md
santanche's picture
refactor (dbvec): adding a vector database
16d3318

SQL Agents Guide

Overview

SQL agents allow you to execute database queries as part of your agent pipeline. They integrate seamlessly with the pub/sub architecture, enabling data retrieval and manipulation through SQL queries.

How SQL Agents Work

Agent Type: SQL

Model Selection: Choose "SQL" from the model dropdown

Prompt Processing: SQL agents work like other agents:

  1. Render the prompt with placeholders
  2. The rendered prompt IS the SQL query
  3. Execute the query via /query endpoint
  4. Return results as JSON

Example

Agent Configuration:

Title: Product Query
Model: SQL
Prompt: SELECT * FROM products WHERE category = '{Category}' LIMIT 10
Subscribe: START
Publish: QUERY_RESULTS
β˜‘ Show result

What happens:

  1. Prompt renders: SELECT * FROM products WHERE category = 'Electronics' LIMIT 10
  2. SQL agent executes this query
  3. Results returned as JSON
  4. Published to QUERY_RESULTS topic
  5. Displayed in Final Result box

Prompt Template Features

Using Placeholders

SQL agents support all standard placeholders:

Data Sources:

Prompt: SELECT * FROM {TableName} WHERE status = 'active'

User Question:

Prompt: -- User asked: {question}
SELECT * FROM orders WHERE customer_name LIKE '%{question}%'

Previous Agent Output:

Prompt: -- Previous analysis: {input}
SELECT * FROM products WHERE id IN ({input})

Multiple Placeholders:

Prompt: SELECT {Columns} 
FROM {TableName} 
WHERE date > '{StartDate}' 
  AND status = '{Status}'
ORDER BY {OrderBy}

Dynamic Query Construction

Example 1: User-Driven Filters

Data Source: Filters
Content: category = 'Books' AND price < 50

Agent:
Prompt: SELECT * FROM products WHERE {Filters}

Example 2: Column Selection

Data Source: SelectedColumns
Content: id, name, price

Agent:
Prompt: SELECT {SelectedColumns} FROM products

Example 3: Parameterized Query

User Question: Electronics under $100

Agent 1 (LLM): Extract parameters
- Output: category='Electronics', max_price=100

Agent 2 (SQL):
Prompt: SELECT * FROM products WHERE category='{input}' AND price < 100

Output Format

JSON Result Structure

{
  "columns": ["id", "name", "price"],
  "rows": [
    [1, "Widget", 29.99],
    [2, "Gadget", 49.99]
  ],
  "row_count": 2
}

Fields

  • columns: List of column names
  • rows: Array of rows (each row is an array of values)
  • row_count: Total number of rows returned

Error Format

{
  "error": "table products does not exist"
}

Use Cases

Use Case 1: Data Retrieval for Analysis

Workflow: Query β†’ Analyze β†’ Report

Agent 1 (SQL): Fetch Data
- Prompt: SELECT * FROM sales WHERE date > '2024-01-01'
- Publish: RAW_DATA

Agent 2 (LLM): Analyze
- Prompt: Analyze this sales data: {input}
- Subscribe: RAW_DATA
- Publish: ANALYSIS

Agent 3 (LLM): Report
- Prompt: Create report from: {input}
- Subscribe: ANALYSIS

Use Case 2: Iterative Queries

Workflow: Query β†’ Process β†’ Query Again

Agent 1 (SQL): Get Categories
- Prompt: SELECT DISTINCT category FROM products
- Publish: CATEGORIES

Agent 2 (LLM): Pick Category
- Prompt: From these categories: {input}
         Pick the most relevant for: {question}
- Subscribe: CATEGORIES
- Publish: SELECTED_CATEGORY

Agent 3 (SQL): Get Products
- Prompt: SELECT * FROM products WHERE category = '{input}'
- Subscribe: SELECTED_CATEGORY
- Publish: PRODUCTS

Use Case 3: Database Updates

Note: Use with caution! Ensure proper validation.

Agent 1 (LLM): Generate Update Statement
- Prompt: Create SQL UPDATE for: {question}
         Table schema: {Schema}
- Publish: UPDATE_SQL

Agent 2 (SQL): Execute Update
- Prompt: {input}
- Subscribe: UPDATE_SQL

Use Case 4: Data Validation

Workflow: Query β†’ Validate β†’ Report

Data Source: ExpectedCount
Content: 1000

Agent 1 (SQL): Count Records
- Prompt: SELECT COUNT(*) as count FROM orders WHERE date = '2024-01-15'
- Publish: ACTUAL_COUNT

Agent 2 (LLM): Validate
- Prompt: Expected: {ExpectedCount}
         Actual: {input}
         Are they equal? Explain any discrepancy.
- Subscribe: ACTUAL_COUNT

Use Case 5: Multi-Table Join

Data Source: JoinClause
Content: INNER JOIN order_items oi ON o.id = oi.order_id
         INNER JOIN products p ON oi.product_id = p.id

Agent (SQL):
Prompt: SELECT o.id, o.date, p.name, oi.quantity
FROM orders o
{JoinClause}
WHERE o.date > '2024-01-01'

Best Practices

1. SQL Injection Protection

Don't:

Prompt: SELECT * FROM users WHERE name = '{input}'

If input is: ' OR '1'='1 β†’ SQL injection!

Do: Use agent validation first:

Agent 1 (LLM): Validate Input
- Prompt: Is this a safe user name? {input}
         Return SAFE or UNSAFE

Agent 2 (SQL): Only if SAFE
- Conditional execution based on validation

2. Query Complexity

Keep queries readable:

βœ… Good:
Prompt: SELECT id, name, price 
FROM products 
WHERE category = '{Category}' 
  AND price < {MaxPrice}
ORDER BY price DESC
LIMIT 10

❌ Too Complex:
Prompt: SELECT ... (100 lines of complex query)

Break complex queries into multiple agents.

3. Error Handling

Check for errors in downstream agents:

Agent 1 (SQL): Query
- Might return error

Agent 2 (LLM): Handle Result
- Prompt: Check if this is an error: {input}
         If error, explain. If success, analyze data.

4. Result Size

Be mindful of result size:

βœ… Good:
Prompt: SELECT * FROM orders LIMIT 100

❌ Risky:
Prompt: SELECT * FROM orders  -- Could return millions of rows

Use LIMIT clauses appropriately.

5. Comments in SQL

Add context for debugging:

Prompt: -- Query generated for: {question}
-- Analyzing sales data for category: {Category}
SELECT 
    product_id,
    SUM(quantity) as total_sold,
    SUM(revenue) as total_revenue
FROM sales
WHERE category = '{Category}'
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 10

Integration with Other Agents

SQL β†’ LLM Analysis

SQL Agent: Retrieve data
↓ (JSON result)
LLM Agent: Analyze and interpret
↓ (Analysis text)
LLM Agent: Generate report

LLM β†’ SQL β†’ LLM

LLM Agent: Generate SQL based on question
↓ (SQL query)
SQL Agent: Execute query
↓ (Results)
LLM Agent: Format results for user

NER β†’ SQL

NER Agent: Extract entities from text
↓ (Entities JSON)
LLM Agent: Build WHERE clause from entities
↓ (SQL condition)
SQL Agent: Query with conditions

SQL β†’ Data Source β†’ SQL

SQL Agent: Get list of IDs
↓ (Published to topic)
Data Source: Store IDs (subscribes to topic)
↓ (IDs available as {IDList})
SQL Agent: Query details using {IDList}

Examples

Example 1: Customer Orders

Setup:

User Question: Show me orders for customer ID 12345

Agent 1 (SQL): Get Orders
- Model: SQL
- Prompt: SELECT 
    order_id,
    order_date,
    total_amount,
    status
FROM orders
WHERE customer_id = '{question}'
ORDER BY order_date DESC
- Subscribe: START
- Publish: ORDERS
- β˜‘ Show result

Expected Output (Final Result box):

{
  "columns": ["order_id", "order_date", "total_amount", "status"],
  "rows": [
    [1001, "2024-01-15", 149.99, "delivered"],
    [1002, "2024-01-20", 89.50, "shipped"]
  ],
  "row_count": 2
}

Example 2: Sales Analytics

Setup:

Data Source: DateRange
Content: 2024-01-01

Agent 1 (SQL): Sales Data
- Prompt: SELECT 
    DATE(order_date) as date,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
WHERE order_date >= '{DateRange}'
GROUP BY DATE(order_date)
ORDER BY date
- Subscribe: START
- Publish: SALES_DATA

Agent 2 (LLM): Analyze Trends
- Model: phi3
- Prompt: Analyze these sales trends: {input}
         Identify patterns and anomalies.
- Subscribe: SALES_DATA
- β˜‘ Show result

Example 3: Dynamic Table Query

Setup:

Data Source: Config
Content: {"table": "products", "condition": "price < 100"}

Agent 1 (LLM): Parse Config
- Prompt: From this config: {Config}
         Extract table name and output it
- Publish: TABLE_NAME

Agent 2 (SQL): Query Table
- Prompt: SELECT * FROM {input} WHERE price < 100 LIMIT 20
- Subscribe: TABLE_NAME
- β˜‘ Show result

Troubleshooting

Issue: SQL syntax error

Cause: Invalid SQL in rendered prompt

Solution:

  1. Check prompt in execution log
  2. Verify placeholder values are valid
  3. Test query manually in database

Issue: Database not ready

Cause: /status returns not ready

Solution: Wait for database initialization to complete

Issue: Empty result set

Cause: Query returns no rows

Output:

{
  "columns": [...],
  "rows": [],
  "row_count": 0
}

This is not an error - handle in downstream agents

Issue: Large result set

Cause: Query returns too many rows

Solution: Add LIMIT clause or filter conditions

Issue: Timeout

Cause: Query takes too long

Solution:

  • Optimize query
  • Add indexes
  • Reduce result set size

Limitations

  1. Read/Write: All queries supported (SELECT, INSERT, UPDATE, DELETE)
  2. Transactions: No multi-statement transaction support
  3. Result Size: Large results may cause memory issues
  4. Timeout: 30 second timeout per query
  5. No Prepared Statements: Use validation for user input

Security Considerations

  1. Validate User Input: Always validate before using in SQL
  2. Limit Permissions: Database user should have minimal required permissions
  3. Audit Queries: Log all SQL queries for review
  4. No Sensitive Data: Be careful with password fields, tokens, etc.
  5. Result Filtering: Consider if all columns should be exposed

Future Enhancements

Potential improvements:

  • Query result caching
  • Parameterized queries (prepared statements)
  • Transaction support
  • Batch query execution
  • Query templates library
  • Result pagination
  • Query performance metrics
  • Visual query builder