NL→SQL Assistant: data flow diagram

01 · Offline indexing: scripts/build_index.py · build_fewshot_index.py · run once at setup or when the schema changes

02 · Online query pipeline: agent/graph.py · LangGraph StateGraph · PipelineState

03 · LLM provider layer: a single Protocol · cache as the basis of the $0 budget

04 · Eval harness & audit: ablation → voting → merge → re-score → audit · residue loop ×31 (47% → 94.0%)

Data sources

scripts/download_data.py → data/

BIRD Mini-Dev: 11 SQLite databases · 500 Q→SQL (dev)
BIRD train: 9,428 Q→SQL pairs (HF parquet)
Chinook.sqlite: smoke test
PostgreSQL 16: docker-compose, opt-in

SQLite · PostgreSQL 16

Introspector

schema_index/introspector.py

SQLAlchemy reflection (read-only): tables, columns, PK/FK, top-K sample values, NULL count, distinct count.

SQLAlchemy

Chunker

schema_index/chunker.py

1 table = 1 card: name + columns + types + samples + FKs from/to + business hints; fk_targets → metadata.
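A minimal sketch of how one table could be rendered into a card. The `TableCard` dataclass, the input dict layout, and the text format are assumptions made for illustration; only the idea of one card per table with `fk_targets` kept in metadata comes from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class TableCard:
    """Hypothetical card shape: text to embed plus flat metadata."""
    db_id: str
    table: str
    text: str
    metadata: dict = field(default_factory=dict)


def build_card(db_id: str, table: dict) -> TableCard:
    """Render one introspected table (assumed dict layout) into one card."""
    lines = [f"Table: {table['name']}"]
    for col in table["columns"]:
        samples = ", ".join(str(s) for s in col.get("samples", []))
        lines.append(f"- {col['name']} ({col['type']}) samples: {samples}")
    fks = table.get("fks", [])
    for fk in fks:
        lines.append(
            f"FK: {table['name']}.{fk['column']} -> "
            f"{fk['ref_table']}.{fk['ref_column']}"
        )
    # FK targets go to metadata, not only into the embedded text.
    fk_targets = sorted({fk["ref_table"] for fk in fks})
    metadata = {"db_id": db_id, "table": table["name"], "fk_targets": fk_targets}
    return TableCard(db_id, table["name"], "\n".join(lines), metadata)
```

Keeping `fk_targets` outside the embedded text lets a later step rebuild the FK graph without parsing card text.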

Indexer

schema_index/indexer.py

Upsert with a stable chunk_id, so re-indexing creates no duplicates; the read API is SchemaIndex.
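The effect of a stable chunk_id can be sketched as follows. The hashing scheme and the `InMemoryStore` class are assumptions for illustration, not the project's actual id format or store.

```python
import hashlib


def stable_chunk_id(db_id: str, table: str) -> str:
    """Derive the id only from (db, table), so it never changes between runs."""
    return hashlib.sha256(f"{db_id}::{table}".encode("utf-8")).hexdigest()[:16]


class InMemoryStore:
    """Hypothetical stand-in for a vector store with upsert semantics."""

    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def upsert(self, chunk_id: str, text: str) -> None:
        # Same id overwrites the previous record instead of appending.
        self.rows[chunk_id] = text


store = InMemoryStore()
for version in ("card v1", "card v2"):  # simulate indexing the same table twice
    store.upsert(stable_chunk_id("chinook", "Track"), version)
```

After both passes the store holds a single record with the latest text.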

Vector store

ChromaDB · chroma_data/ (persistent)

schema_chunks: 1 record = (db, table)
fewshot_qsql: only the question is embedded; SQL + db_id + intent go in metadata

chromadb

Few-shot builder

scripts/build_fewshot_index.py

Q→SQL from train only · hard guard against dev leakage.

Embeddings

mistral-embed · 1024-dim

CachingEmbeddingProvider → diskcache: re-indexing costs 0 API calls.

FK graph

in-memory dict

SchemaIndex.fk_graph is built from fk_targets. It is not stored in Chroma: FK edges carry no semantic content.
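A sketch of building such an in-memory graph from card metadata. The metadata layout (a `table` name plus a list of `fk_targets`) and the choice of an undirected graph are assumptions; the diagram only states that the graph is a dict derived from fk_targets.

```python
from collections import defaultdict


def build_fk_graph(cards_metadata: list[dict]) -> dict[str, set[str]]:
    """Build an undirected adjacency dict: table -> neighbouring tables."""
    graph: defaultdict[str, set[str]] = defaultdict(set)
    for meta in cards_metadata:
        src = meta["table"]
        graph[src]  # touch the key so tables without FKs still appear as nodes
        for dst in meta.get("fk_targets", []):
            graph[src].add(dst)
            graph[dst].add(src)  # reverse edge: FKs are followed both ways
    return dict(graph)
```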

User

question in RU / EN

Streamlit UI

app/streamlit_app.py · 8 modules

Chat + sample questions · schema explorer · show-working trace · EN↔RU · modes: Accurate / Fast / Debug.

Streamlit · Plotly · @st.cache_resource

FastAPI

src/nl_sql/api/main.py

POST /ask · GET /databases · /healthz · /readyz · /eval/latest. X-API-Key + token bucket 60 req/min · Singletons DI.

FastAPI · Pydantic v2 · uvicorn
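The 60 req/min token bucket can be illustrated with a small standalone class. This is a sketch of the general technique, not the project's middleware; the class name, the `clock` parameter, and the default capacity are assumptions.

```python
import time


class TokenBucket:
    """Refills continuously at rate_per_min; each request spends one token."""

    def __init__(self, rate_per_min: float = 60, capacity: float = 60,
                 clock=time.monotonic) -> None:
        self.rate = rate_per_min / 60.0  # tokens per second
        self.capacity = float(capacity)
        self.tokens = float(capacity)    # start full
        self.clock = clock               # injectable for tests
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would answer with HTTP 429
```

In an API one bucket would typically be kept per API key, so one client cannot drain another client's quota.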

Hugging Face Spaces

liovina-nl-sql.hf.space

Docker free tier · UI + API in one container · deploy ≈ 90 s.

context_builder

schema_index/retriever.py

Dense top-k=5 schema chunks (filtered by db_id) → FK BFS ≤1 hop, budget of 12 tables → few-shot k=3 (cross-db on BIRD) → extended samples 3→5 → dialect hints. Output: ContextBundle.
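The FK expansion step can be sketched as a bounded breadth-first search. The function name, the adjacency-dict input, and the alphabetical tie-breaking are assumptions; the hop limit and table budget mirror the numbers above.

```python
def expand_with_fk(seeds: list[str], fk_graph: dict[str, set[str]],
                   max_hops: int = 1, budget: int = 12) -> list[str]:
    """Add FK neighbours of the retrieved tables, stopping at the budget."""
    selected = list(dict.fromkeys(seeds))[:budget]  # dedupe, keep rank order
    frontier = list(selected)
    for _ in range(max_hops):
        next_frontier = []
        for table in frontier:
            for neighbour in sorted(fk_graph.get(table, ())):
                if len(selected) >= budget:
                    return selected
                if neighbour not in selected:
                    selected.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return selected
```

Seeds come first in the result, so tables found by dense retrieval are never displaced by FK neighbours when the budget is tight.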

plan_query · opt-in

JSON skeleton: tables · joins · filters · group_by · sort · limit

generate_sql

codestral-latest · T=0

Structured JSON: {sql, rationale, tables_used, confidence}. Templates: cards | M-Schema (XiYan) | DAC (CHASE-SQL) + P3.F hints: 11 rules, gated by db_id + phrase.

validate · AST guard

execution/guards.py · sqlglot

SELECT-only · no DML/DDL anywhere in the tree · 1 statement · function denylist (pg_sleep, load_extension…) · denied tables · ATTACH / PRAGMA block.

sqlglot

execute · read-only

execution/runner.py → db/connection.py

SQLite: URI mode=ro + PRAGMA query_only + progress-deadline; Postgres: read-only transactions + statement_timeout 30 s; row cap 10,000.

3-layer safety · SQLAlchemy
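The SQLite side of these layers can be sketched with the standard `sqlite3` module alone. The function name and defaults are assumptions, and the project itself goes through SQLAlchemy, so this only illustrates the three mechanisms named above.

```python
import sqlite3
import time


def run_readonly(db_path: str, sql: str, timeout_s: float = 30.0,
                 row_cap: int = 10_000) -> list[tuple]:
    """Run one query against a SQLite file with read-only safeguards."""
    # Layer 1: open the file read-only via a URI filename.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        # Layer 2: refuse writes at the connection level as well.
        conn.execute("PRAGMA query_only = ON")
        # Layer 3: deadline. A non-zero return from the progress handler
        # aborts the running statement with sqlite3.OperationalError.
        deadline = time.monotonic() + timeout_s
        conn.set_progress_handler(
            lambda: 1 if time.monotonic() > deadline else 0, 10_000
        )
        return conn.execute(sql).fetchmany(row_cap)  # row cap
    finally:
        conn.close()
```

A write attempt such as `run_readonly(path, "DELETE FROM t")` raises `sqlite3.OperationalError` rather than modifying the file.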

×1

repair_once

agent/nodes/repair_once.py

Exactly 1 retry with error context (guarded by repair_attempted), triggered by: validate-fail / runtime-fail / empty (G) / critique-fail.
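The single-retry guard can be sketched without LangGraph. Only the `repair_attempted` flag comes from the diagram; the `check` and `regenerate` callables, the state keys, and the function name are hypothetical.

```python
from typing import Callable, Optional


def run_with_single_repair(
    state: dict,
    check: Callable[[str], Optional[str]],   # returns an error message or None
    regenerate: Callable[[str, str], str],   # (failed_sql, error) -> new SQL
) -> dict:
    """Validate state['sql']; on failure regenerate at most once."""
    state["error"] = check(state["sql"])
    if state["error"] is None or state.get("repair_attempted"):
        return state  # success, or the one allowed repair is already spent
    state["repair_attempted"] = True
    state["sql"] = regenerate(state["sql"], state["error"])
    state["error"] = check(state["sql"])
    return state
```

Because the flag lives in the state, re-entering the node after a second failure cannot trigger another LLM call.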

grounded_critique · opt-in

Row-shape check of the result → at most 1 retry.

deterministic_format · no LLM

render/picker.py · formats.py

Pure Python, heuristics on the result shape: Scalar · Sentence · Table · Line · Bar · Pie · Scatter.

Plotly
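A shape-based picker might look like the sketch below. The format names come from the diagram, but every rule and threshold here (including `max_pie_slices`) is an assumption chosen for illustration, not the project's actual heuristics.

```python
from datetime import date, datetime
from numbers import Number


def _is_num(value) -> bool:
    return isinstance(value, Number) and not isinstance(value, bool)


def pick_format(columns: list[str], rows: list[tuple],
                max_pie_slices: int = 6) -> str:
    """Choose a presentation from the result shape alone, with no LLM call."""
    if not rows or (len(rows) == 1 and len(columns) > 1):
        return "Sentence"
    if len(rows) == 1:
        return "Scalar"
    if len(columns) == 2:
        x, y = rows[0]  # types are inferred from the first row only
        if _is_num(y):
            if isinstance(x, (date, datetime)):
                return "Line"
            if _is_num(x):
                return "Scatter"
            if isinstance(x, str):
                return "Pie" if len(rows) <= max_pie_slices else "Bar"
    return "Table"
```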

explain_trace

mistral-large-latest

NL caption of ≤ 2 sentences; trace finalization (model, tokens, latency, confidence for each node).

Response · AskResponse

PipelineRunResult

answer + SQL (highlighted) + rationale + confidence + caption + full trace. Error taxonomy: invalid_sql · execution_timeout · execution_failed · empty_result · low_confidence · repair_failed

LLMProvider Protocol

llm/providers/base.py · factory.py

PEP 544 runtime_checkable · build_provider(name): switching models = changing an env var · ProviderError taxonomy. The embed protocol is separate.
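A minimal sketch of a runtime-checkable provider protocol with a name-based factory. The `complete` method signature, the `LLM_PROVIDER` environment variable, and `FakeProvider` are hypothetical; only `LLMProvider`, `runtime_checkable`, and `build_provider(name)` are named in the diagram.

```python
import os
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class LLMProvider(Protocol):
    def complete(self, system: str, prompt: str,
                 temperature: float = 0.0, max_tokens: int = 1024) -> str: ...


class FakeProvider:
    """Structural match: no inheritance from LLMProvider is needed."""

    def complete(self, system: str, prompt: str,
                 temperature: float = 0.0, max_tokens: int = 1024) -> str:
        return '{"sql": "SELECT 1"}'


_REGISTRY = {"fake": FakeProvider}  # hypothetical registry


def build_provider(name: Optional[str] = None) -> LLMProvider:
    # Assumed env var name; lets the model be swapped without code changes.
    name = name or os.environ.get("LLM_PROVIDER", "fake")
    if name not in _REGISTRY:
        raise ValueError(f"unknown provider: {name}")
    provider = _REGISTRY[name]()
    if not isinstance(provider, LLMProvider):  # checks method presence only
        raise TypeError(f"{name} does not implement LLMProvider")
    return provider
```

Note that `isinstance` against a runtime-checkable protocol verifies only that the method exists, not its signature.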

Caching layer

llm/cache.py · diskcache

Key: sha256(provider · model · system · prompt · T · max_tok). A hit costs 0 quota and 0 latency, which is the basis of the $0 budget. Tests use fake providers; CI runs without a live API.

diskcache
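The cache key and the wrapper can be sketched as follows. The JSON serialization of the key fields and the `CachingProvider` class are assumptions; a plain dict stands in for the disk-backed store so the example runs without extra dependencies.

```python
import hashlib
import json


def cache_key(provider: str, model: str, system: str, prompt: str,
              temperature: float, max_tokens: int) -> str:
    """Hash every input that can change the completion."""
    payload = json.dumps(
        [provider, model, system, prompt, temperature, max_tokens],
        ensure_ascii=False, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachingProvider:
    """Hypothetical wrapper: serve from the store, else call the inner provider."""

    def __init__(self, inner, store, provider: str, model: str) -> None:
        self.inner, self.store = inner, store
        self.provider, self.model = provider, model

    def complete(self, system: str, prompt: str,
                 temperature: float = 0.0, max_tokens: int = 1024) -> str:
        key = cache_key(self.provider, self.model, system, prompt,
                        temperature, max_tokens)
        if key in self.store:
            return self.store[key]  # hit: no API call
        result = self.inner.complete(system, prompt, temperature, max_tokens)
        self.store[key] = result
        return result
```

Including temperature and max_tokens in the key keeps a T-sweep from returning answers cached under a different temperature.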

7 provider modules + GraceKelly browser bridge · voting / residue layers

Mistral · primary

codestral-latest (SQL) · mistral-large (NL) · mistral-embed. La Plateforme free.

Groq

llama-3.3-70b · qwen3-32b · gpt-oss. TPM/TPD-bounded.

GitHub Models

gpt-4o-mini · auth via PAT · OpenAI-compatible SDK.

OpenRouter

deepseek-v4-flash:free + 24 free reasoning/code models.

Ollama

local · qwen2.5-coder:7b · offline slot in the bakeoff.

Perplexity / helallao

GPT-5.2 · Grok-4.1 · Claude-4.5 · Kimi-K2 (reasoning / Pro).

GraceKelly

browser-orchestrator → Sonnet 4.6 (eval-only bridge).

$0 hard constraint: free tiers + cache + user subscriptions; account rotation is prohibited.

Dataset

eval/dataset.py

BIRD Mini-Dev loader · dev_split(seed=0, n) with a stable prefix: n=50 ⊂ n=200 → the prompt cache is reused.
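The stable-prefix property can be obtained by shuffling once with a fixed seed and then slicing. This is a sketch of that idea under assumed inputs; the real `dev_split` may select examples differently.

```python
import random


def dev_split(items: list, seed: int = 0, n=None) -> list:
    """Return the first n items of one fixed, seeded permutation."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)  # same seed -> same permutation
    picked = order if n is None else order[:n]
    return [items[i] for i in picked]
```

Since the permutation does not depend on `n`, the n=50 sample is exactly the first 50 entries of the n=200 sample, so prompts cached during the small run are reused by the large one.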

Ablation runner

eval/runner.py

Configs A · C · D · E · F (T-sweep 0.2–0.8) · G (B = BM25, N/I): A–D run without repair, G adds verify_retry_on_empty → eval/reports/*.json.

Voting / rescue scripts

scripts/run_*.py

groq_voting · sonnet (GraceKelly) · helallao · openrouter · critique_retry · selfcon (T-sweep) · wide_schema · ensemble_vote. Each runs on the residue (misses) on top of v_N.

Merge

scripts/merge_voting_rescues.py

--reverify: re-exec via safe_compare_pred · archive_sweep / rescore → merged baseline v_N+1.

Metrics

eval/metrics/

execution_accuracy — BIRD-official set-equality · safe_compare_pred (pred-fail → False) · schema_recall@k.
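Set-equality scoring with a fail-closed wrapper can be sketched as below. The function names follow the diagram, but the signatures and the SQLite connection argument are assumptions.

```python
import sqlite3


def execution_accuracy(pred_rows: list[tuple], gold_rows: list[tuple]) -> bool:
    """Set equality of result rows: row order and duplicates are ignored."""
    return set(pred_rows) == set(gold_rows)


def safe_compare_pred(conn: sqlite3.Connection, pred_sql: str,
                      gold_sql: str) -> bool:
    """Score a prediction; any failure of the predicted SQL counts as a miss."""
    gold = conn.execute(gold_sql).fetchall()
    try:
        pred = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # pred-fail -> False, never an exception or a pass
    return execution_accuracy(pred, gold)
```

Failing closed matters during merges: a rescue candidate whose SQL no longer executes cannot be counted as a hit by accident.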

Audit gates

scripts/

audit_rescore: row-by-row re-execution · p3f_acceptance: 11 gates (required/forbidden columns checked on the AST) · error_taxonomy buckets · refresh_baseline_summary.

v31 baseline · 94.0% EA (188/200) · 0 mismatches

→ README headline · GET /eval/latest · HF redeploy. Lift trace: 47% (config A) → 94.0% (v31) across 31 versions, each with negative/saturation evidence.

05 · Quality gates & CI/CD

pytest · 370 green
coverage 91%
mypy --strict · 0 issues / 59 files
ruff check + format
GitHub Actions · Ubuntu · py3.13 · uv
uv.lock pinned + requirements-guard
fake providers: CI without live API
.deploy_hf.py + Playwright E2E
grep-gate
Makefile
docker-compose · postgres / langfuse profiles