sml-agents-publish-subscribe-dbvec

Sleeping

App Files Files Community

sml-agents-publish-subscribe-dbvec / NER_AGENTS_GUIDE.md

santanche

refactor (dbvec): adding a vector database

16d3318 2 months ago

preview code

raw

history blame contribute delete

10.1 kB

SQL Agents Guide

Overview

SQL agents allow you to execute database queries as part of your agent pipeline. They integrate seamlessly with the pub/sub architecture, enabling data retrieval and manipulation through SQL queries.

How SQL Agents Work

Agent Type: SQL

Model Selection: Choose "SQL" from the model dropdown

Prompt Processing: SQL agents work like other agents:

Render the prompt with placeholders
The rendered prompt IS the SQL query
Execute the query via /query endpoint
Return results as JSON

Example

Agent Configuration:

Title: Product Query
Model: SQL
Prompt: SELECT * FROM products WHERE category = '{Category}' LIMIT 10
Subscribe: START
Publish: QUERY_RESULTS
☑ Show result

What happens:

Prompt renders: SELECT * FROM products WHERE category = 'Electronics' LIMIT 10
SQL agent executes this query
Results returned as JSON
Published to QUERY_RESULTS topic
Displayed in Final Result box

Prompt Template Features

Using Placeholders

SQL agents support all standard placeholders:

Data Sources:

Prompt: SELECT * FROM {TableName} WHERE status = 'active'

User Question:

Prompt: -- User asked: {question}
SELECT * FROM orders WHERE customer_name LIKE '%{question}%'

Previous Agent Output:

Prompt: -- Previous analysis: {input}
SELECT * FROM products WHERE id IN ({input})

Multiple Placeholders:

Prompt: SELECT {Columns} 
FROM {TableName} 
WHERE date > '{StartDate}' 
  AND status = '{Status}'
ORDER BY {OrderBy}

Dynamic Query Construction

Example 1: User-Driven Filters

Data Source: Filters
Content: category = 'Books' AND price < 50

Agent:
Prompt: SELECT * FROM products WHERE {Filters}

Example 2: Column Selection

Data Source: SelectedColumns
Content: id, name, price

Agent:
Prompt: SELECT {SelectedColumns} FROM products

Example 3: Parameterized Query

User Question: Electronics under $100

Agent 1 (LLM): Extract parameters
- Output: category='Electronics', max_price=100

Agent 2 (SQL):
Prompt: SELECT * FROM products WHERE category='{input}' AND price < 100

Output Format

JSON Result Structure

{
  "columns": ["id", "name", "price"],
  "rows": [
    [1, "Widget", 29.99],
    [2, "Gadget", 49.99]
  ],
  "row_count": 2
}

Fields

columns: List of column names
rows: Array of rows (each row is an array of values)
row_count: Total number of rows returned

Error Format

{
  "error": "table products does not exist"
}

Use Cases

Use Case 1: Data Retrieval for Analysis

Workflow: Query → Analyze → Report

Agent 1 (SQL): Fetch Data
- Prompt: SELECT * FROM sales WHERE date > '2024-01-01'
- Publish: RAW_DATA

Agent 2 (LLM): Analyze
- Prompt: Analyze this sales data: {input}
- Subscribe: RAW_DATA
- Publish: ANALYSIS

Agent 3 (LLM): Report
- Prompt: Create report from: {input}
- Subscribe: ANALYSIS

Use Case 2: Iterative Queries

Workflow: Query → Process → Query Again

Agent 1 (SQL): Get Categories
- Prompt: SELECT DISTINCT category FROM products
- Publish: CATEGORIES

Agent 2 (LLM): Pick Category
- Prompt: From these categories: {input}
         Pick the most relevant for: {question}
- Subscribe: CATEGORIES
- Publish: SELECTED_CATEGORY

Agent 3 (SQL): Get Products
- Prompt: SELECT * FROM products WHERE category = '{input}'
- Subscribe: SELECTED_CATEGORY
- Publish: PRODUCTS

Use Case 3: Database Updates

Note: Use with caution! Ensure proper validation.

Agent 1 (LLM): Generate Update Statement
- Prompt: Create SQL UPDATE for: {question}
         Table schema: {Schema}
- Publish: UPDATE_SQL

Agent 2 (SQL): Execute Update
- Prompt: {input}
- Subscribe: UPDATE_SQL

Use Case 4: Data Validation

Workflow: Query → Validate → Report

Data Source: ExpectedCount
Content: 1000

Agent 1 (SQL): Count Records
- Prompt: SELECT COUNT(*) as count FROM orders WHERE date = '2024-01-15'
- Publish: ACTUAL_COUNT

Agent 2 (LLM): Validate
- Prompt: Expected: {ExpectedCount}
         Actual: {input}
         Are they equal? Explain any discrepancy.
- Subscribe: ACTUAL_COUNT

Use Case 5: Multi-Table Join

Data Source: JoinClause
Content: INNER JOIN order_items oi ON o.id = oi.order_id
         INNER JOIN products p ON oi.product_id = p.id

Agent (SQL):
Prompt: SELECT o.id, o.date, p.name, oi.quantity
FROM orders o
{JoinClause}
WHERE o.date > '2024-01-01'

Best Practices

1. SQL Injection Protection

Don't:

Prompt: SELECT * FROM users WHERE name = '{input}'

If input is: ' OR '1'='1 → SQL injection!

Do: Use agent validation first:

Agent 1 (LLM): Validate Input
- Prompt: Is this a safe user name? {input}
         Return SAFE or UNSAFE

Agent 2 (SQL): Only if SAFE
- Conditional execution based on validation

2. Query Complexity

Keep queries readable:

✅ Good:
Prompt: SELECT id, name, price 
FROM products 
WHERE category = '{Category}' 
  AND price < {MaxPrice}
ORDER BY price DESC
LIMIT 10

❌ Too Complex:
Prompt: SELECT ... (100 lines of complex query)

Break complex queries into multiple agents.

3. Error Handling

Check for errors in downstream agents:

Agent 1 (SQL): Query
- Might return error

Agent 2 (LLM): Handle Result
- Prompt: Check if this is an error: {input}
         If error, explain. If success, analyze data.

4. Result Size

Be mindful of result size:

✅ Good:
Prompt: SELECT * FROM orders LIMIT 100

❌ Risky:
Prompt: SELECT * FROM orders  -- Could return millions of rows

Use LIMIT clauses appropriately.

5. Comments in SQL

Add context for debugging:

Prompt: -- Query generated for: {question}
-- Analyzing sales data for category: {Category}
SELECT 
    product_id,
    SUM(quantity) as total_sold,
    SUM(revenue) as total_revenue
FROM sales
WHERE category = '{Category}'
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT 10

Integration with Other Agents

SQL → LLM Analysis

SQL Agent: Retrieve data
↓ (JSON result)
LLM Agent: Analyze and interpret
↓ (Analysis text)
LLM Agent: Generate report

LLM → SQL → LLM

LLM Agent: Generate SQL based on question
↓ (SQL query)
SQL Agent: Execute query
↓ (Results)
LLM Agent: Format results for user

NER → SQL

NER Agent: Extract entities from text
↓ (Entities JSON)
LLM Agent: Build WHERE clause from entities
↓ (SQL condition)
SQL Agent: Query with conditions

SQL → Data Source → SQL

SQL Agent: Get list of IDs
↓ (Published to topic)
Data Source: Store IDs (subscribes to topic)
↓ (IDs available as {IDList})
SQL Agent: Query details using {IDList}

Examples

Example 1: Customer Orders

Setup:

User Question: Show me orders for customer ID 12345

Agent 1 (SQL): Get Orders
- Model: SQL
- Prompt: SELECT 
    order_id,
    order_date,
    total_amount,
    status
FROM orders
WHERE customer_id = '{question}'
ORDER BY order_date DESC
- Subscribe: START
- Publish: ORDERS
- ☑ Show result

Expected Output (Final Result box):

{
  "columns": ["order_id", "order_date", "total_amount", "status"],
  "rows": [
    [1001, "2024-01-15", 149.99, "delivered"],
    [1002, "2024-01-20", 89.50, "shipped"]
  ],
  "row_count": 2
}

Example 2: Sales Analytics

Setup:

Data Source: DateRange
Content: 2024-01-01

Agent 1 (SQL): Sales Data
- Prompt: SELECT 
    DATE(order_date) as date,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
WHERE order_date >= '{DateRange}'
GROUP BY DATE(order_date)
ORDER BY date
- Subscribe: START
- Publish: SALES_DATA

Agent 2 (LLM): Analyze Trends
- Model: phi3
- Prompt: Analyze these sales trends: {input}
         Identify patterns and anomalies.
- Subscribe: SALES_DATA
- ☑ Show result

Example 3: Dynamic Table Query

Setup:

Data Source: Config
Content: {"table": "products", "condition": "price < 100"}

Agent 1 (LLM): Parse Config
- Prompt: From this config: {Config}
         Extract table name and output it
- Publish: TABLE_NAME

Agent 2 (SQL): Query Table
- Prompt: SELECT * FROM {input} WHERE price < 100 LIMIT 20
- Subscribe: TABLE_NAME
- ☑ Show result

Troubleshooting

Issue: SQL syntax error

Cause: Invalid SQL in rendered prompt

Solution:

Check prompt in execution log
Verify placeholder values are valid
Test query manually in database

Issue: Database not ready

Cause: /status returns not ready

Solution: Wait for database initialization to complete

Issue: Empty result set

Cause: Query returns no rows

Output:

{
  "columns": [...],
  "rows": [],
  "row_count": 0
}

This is not an error - handle in downstream agents

Issue: Large result set

Cause: Query returns too many rows

Solution: Add LIMIT clause or filter conditions

Issue: Timeout

Cause: Query takes too long

Solution:

Optimize query
Add indexes
Reduce result set size

Limitations

Read/Write: All queries supported (SELECT, INSERT, UPDATE, DELETE)
Transactions: No multi-statement transaction support
Result Size: Large results may cause memory issues
Timeout: 30 second timeout per query
No Prepared Statements: Use validation for user input

Security Considerations

Validate User Input: Always validate before using in SQL
Limit Permissions: Database user should have minimal required permissions
Audit Queries: Log all SQL queries for review
No Sensitive Data: Be careful with password fields, tokens, etc.
Result Filtering: Consider if all columns should be exposed

Future Enhancements

Potential improvements:

Query result caching
Parameterized queries (prepared statements)
Transaction support
Batch query execution
Query templates library
Result pagination
Query performance metrics
Visual query builder