Spaces:
Sleeping
Sleeping
SQL Agent Mind Guide
This document is a practical SQL reference used by the agent to reason deeply about data quality tasks.
Core SQL command pattern
- Allowed:
SELECT,WITH(CTEs) - Blocked: destructive statements (
DROP,DELETE,UPDATE, etc.)
Most important SQL functions in this environment
Aggregation
COUNT(*)SUM(...)AVG(...)MIN(...),MAX(...)
Data quality checks
CASE WHEN ... THEN ... ELSE ... ENDIS NULLTRY_CAST(...)REPLACE(...)
Deduplication logic
GROUP BY ... HAVING COUNT(*) > 1SUM(c - 1)wherecis duplicate group count
Drift analysis
- Baseline vs current mean comparison with subqueries
LEFT JOIN ... WHERE right_col IS NULLfor novelty/referential drift- Distribution checks with
GROUP BY
Task-specific deep probe examples
Task 1: Nulls + duplicates
SELECT SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_email,
SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS null_customer_id
FROM customers;
SELECT COALESCE(SUM(c - 1), 0) AS duplicate_rows
FROM (
SELECT customer_id, email, name, signup_date, country, COUNT(*) AS c
FROM customers
GROUP BY 1,2,3,4,5
HAVING COUNT(*) > 1
) t;
Task 2: Schema and range violations
SELECT SUM(CASE WHEN quantity < 0 THEN 1 ELSE 0 END) AS negative_quantity_rows
FROM orders;
SELECT SUM(CASE WHEN try_cast(replace(amount, '$', '') AS DOUBLE) IS NULL THEN 1 ELSE 0 END) AS unparseable_amount_rows
FROM orders;
Task 3: Silent drift
SELECT
(SELECT AVG(amount) FROM transactions_baseline) AS baseline_mean,
(SELECT AVG(amount) FROM transactions_current) AS current_mean;
SELECT DISTINCT c.category
FROM transactions_current c
LEFT JOIN (SELECT DISTINCT category FROM transactions_baseline) b
ON c.category = b.category
WHERE b.category IS NULL
ORDER BY c.category;
SELECT AVG(CASE WHEN user_id >= 1000 THEN 1.0 ELSE 0.0 END) AS new_user_row_pct
FROM transactions_current;
Deeper testing strategy
- Run sample + aggregate checks first.
- Validate each scoring dimension with one explicit probe.
- Add distribution probes to avoid blind spots.
- Submit report only after all dimensions are covered.