

SchemaSage-SQL Model Card

This is a pre-release model card for the SchemaSage-SQL project. A 10-step QLoRA smoke adapter has been trained and uploaded to validate the cloud training and Hugging Face upload path. A release-quality adapter has not been trained yet.

Model Details

  • Project name: SchemaSage-SQL
  • Model status: cloud smoke adapter trained; release model not trained yet
  • Planned base model: configurable through configs/model.yaml
  • Current default base model: Qwen/Qwen3-4B-Instruct-2507
  • Planned method: supervised fine-tuning with LoRA or QLoRA
  • Release format: adapter-first, optional merged model later
  • Smoke adapter: rishhh/schemasage-sql-qwen3-4b-smoke

Intended Use

SchemaSage-SQL is intended for:

  • Research and prototyping around Text-to-SQL.
  • Portfolio demonstration of LLM data, training, evaluation, and deployment engineering.
  • Generating draft read-only analytical SQL from a schema and natural-language question.

Generated SQL should be reviewed before execution.

Out-of-Scope Use

SchemaSage-SQL is not intended for:

  • Autonomous execution against production databases.
  • Destructive database operations.
  • Legal, financial, medical, or compliance-critical decision making.
  • Credential extraction, filesystem access, or network exfiltration.
  • SQL generation without a provided schema.

Datasets

The current data pipeline uses two public Hugging Face datasets:

  • gretelai/synthetic_text_to_sql
  • b-mc2/sql-create-context

Current processed split summary:

| Split | Examples |
| --- | --- |
| Train | 151,723 |
| Validation | 16,858 |
| Test | 5,248 |

Processing removed:

  • Empty or incomplete examples.
  • Exact duplicates.
  • Destructive SQL examples by default.
  • Sample INSERT rows from schema context where source datasets included data values.
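
The removals listed above could be sketched roughly as follows. This is a minimal illustration, not the project's pipeline: the field names `question`, `schema`, and `sql`, the helper names, and the destructive-keyword list are all assumptions.

```python
import re

# Assumed keyword list for "destructive" target SQL; the project's list may differ.
DESTRUCTIVE = re.compile(
    r"^\s*(DROP|DELETE|TRUNCATE|ALTER|UPDATE|INSERT)\b", re.IGNORECASE
)
# Naive match for INSERT statements in a schema context string.
# Assumes no semicolons appear inside the inserted values.
INSERT_STMT = re.compile(r"INSERT\s+INTO\b.*?;", re.IGNORECASE | re.DOTALL)


def strip_insert_rows(schema: str) -> str:
    """Drop sample INSERT statements so only DDL remains in the schema context."""
    return INSERT_STMT.sub("", schema).strip()


def clean_examples(rows):
    """Yield cleaned examples, skipping incomplete, destructive, and duplicate ones."""
    seen = set()
    for row in rows:
        question = (row.get("question") or "").strip()
        schema = strip_insert_rows(row.get("schema") or "")
        sql = (row.get("sql") or "").strip()
        if not (question and schema and sql):
            continue  # empty or incomplete example
        if DESTRUCTIVE.match(sql):
            continue  # destructive target SQL
        key = (question, schema, sql)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        yield {"question": question, "schema": schema, "sql": sql}
```

Deduplicating after stripping INSERT rows means two examples that differ only in their sample data collapse into one, which may or may not match the project's behavior.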

The validation split is a deterministic train holdout because the selected sources do not provide a native validation split.
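
One common way to build a deterministic holdout is to hash a stable key for each example. The sketch below assumes that approach; the card does not state the method or the ratio. The reported sizes (16,858 validation out of 168,581 train plus validation examples) are consistent with a holdout of about 10%, which is used as the default here. The function name and the choice of key are hypothetical.

```python
import hashlib


def assign_split(example_key: str, holdout_fraction: float = 0.10) -> str:
    """Map a stable example key to 'train' or 'validation' without any RNG state."""
    digest = hashlib.sha256(example_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "validation" if bucket < holdout_fraction else "train"
```

Because the assignment depends only on the key, re-running data preparation yields the same split regardless of row order or random seeds.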

Training Procedure

Training is implemented in src/training/train_sft.py, but no final adapter has been trained with it yet. A tiny local trainer smoke test was completed with sshleifer/tiny-gpt2; that artifact is not a release model and must not be used to support model-quality claims.

A cloud QLoRA smoke run completed on Hugging Face Jobs:

| Field | Value |
| --- | --- |
| Job ID | 6a0c94bd2dc5b1243da4ffee |
| Hardware | a10g-large |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Train rows | 112 |
| Eval rows | 16 |
| Steps | 10 |
| Final train loss | 1.908 |
| Final eval loss | 1.188 |
| Adapter repo | rishhh/schemasage-sql-qwen3-4b-smoke |

Smoke adapter commit:

https://huggingface.co/rishhh/schemasage-sql-qwen3-4b-smoke/commit/6a8241d2f6cc3f0a917ba527f2f32b0cc4bf9933

This smoke adapter validates the training and upload path. It is not a final model-quality release.

Planned command:

python -m src.training.train_sft --config configs/train_qlora.yaml

Smoke/dry-run validation command:

python -m src.training.train_sft --config configs/train_qlora.yaml --smoke-test --dry-run

For local development on Apple Silicon, use:

python -m src.training.train_sft --config configs/train_local_smoke.yaml --smoke-test --dry-run

Trainer-path validation command that has been run locally:

python -m src.training.train_sft --config configs/train_local_smoke.yaml --smoke-test

Evaluation

The full local reference-answer evaluation scores the reference SQL rather than model output. It is a sanity check for the evaluator and the processed data, not a measure of trained-model performance.

| Metric | Value |
| --- | --- |
| Exact match | 1.0000 |
| Normalized exact match | 1.0000 |
| SQL parse validity | 0.9998 |
| Schema adherence rate | 0.8994 |
| Hallucinated table rate | 0.0494 |
| Hallucinated column rate | 0.0878 |
| Unsafe query rate | 0.0071 |
| Execution accuracy | 1.0000 |
| Execution comparable examples | 4,007 |

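These metrics validate the evaluator and the processed reference data, and should not be presented as model quality. Because references are being scored, the non-zero hallucinated-table, hallucinated-column, and unsafe-query rates likely reflect evaluator false positives or noise in the source data rather than model behavior.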
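
Execution accuracy and the "execution comparable examples" count can be pictured with the sketch below. It is an assumption-laden illustration using only the standard library, not the project's evaluator: the function names are hypothetical, results are compared ignoring row order, and an example counts as comparable only when the reference query runs on SQLite. How the project populates tables before execution is not stated in this card.

```python
import sqlite3


def run_query(schema_sql: str, query: str):
    """Run a query on a fresh in-memory SQLite database; return sorted rows or None on error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        rows = conn.execute(query).fetchall()
        return sorted(rows, key=repr)  # order-insensitive comparison (an assumption)
    except (sqlite3.Error, sqlite3.Warning):
        return None  # syntax or dialect the SQLite engine cannot execute
    finally:
        conn.close()


def execution_accuracy(examples):
    """examples: iterable of (schema_sql, reference_sql, predicted_sql) tuples."""
    comparable = matches = 0
    for schema_sql, reference_sql, predicted_sql in examples:
        expected = run_query(schema_sql, reference_sql)
        if expected is None:
            continue  # reference is dialect-incompatible: skip, not comparable
        comparable += 1
        if run_query(schema_sql, predicted_sql) == expected:
            matches += 1
    accuracy = matches / comparable if comparable else 0.0
    return accuracy, comparable
```

Under this definition the denominator is the comparable count, not the full test set, which is why the table reports both numbers.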

The uploaded smoke adapter was also evaluated on 64 held-out examples from the gretelai/synthetic_text_to_sql test split. These metrics describe the 10-step smoke adapter only:

| Metric | Value |
| --- | --- |
| Exact match | 0.2188 |
| Normalized exact match | 0.2344 |
| SQL parse validity | 1.0000 |
| Schema adherence rate | 0.9688 |
| Hallucinated table rate | 0.0156 |
| Hallucinated column rate | 0.0312 |
| Unsafe query rate | 0.0000 |
| Execution accuracy | 0.8409 |
| Execution comparable examples | 44 |
| Mean generated SQL length | 11.80 |
| Mean latency (seconds) | 23.57 |
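
With 64 examples, each rate above is a multiple of 1/64, and execution accuracy is a multiple of 1/44. The short check below reproduces the reported rates from whole-number counts. The counts are inferred from the rounded rates for illustration; they are not read from evaluation/smoke_64/eval_results.json.

```python
# Counts inferred from the reported rates; not taken from eval_results.json.
TOTAL = 64        # evaluated examples
COMPARABLE = 44   # examples whose reference SQL could be executed

inferred = {
    "exact_match": (14, TOTAL),
    "normalized_exact_match": (15, TOTAL),
    "schema_adherence_rate": (62, TOTAL),
    "hallucinated_table_rate": (1, TOTAL),
    "hallucinated_column_rate": (2, TOTAL),
    "execution_accuracy": (37, COMPARABLE),
}

rates = {name: num / den for name, (num, den) in inferred.items()}
for name, value in rates.items():
    print(f"{name}: {value:.4f}")
```

At this sample size a single example moves exact match by about 1.6 percentage points, so small differences between runs should not be over-read.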

Evaluation artifacts:

  • evaluation/smoke_64/eval_results.json
  • evaluation/smoke_64/eval_report.md
  • evaluation/smoke_64/predictions.jsonl
  • evaluation/smoke_64/metrics_overview.svg
  • evaluation/smoke_64/risk_rates.svg

Safety Policy

The project defaults to read-only analytical SQL.

The safety layer blocks or refuses:

  • DROP
  • DELETE
  • TRUNCATE
  • ALTER
  • UPDATE
  • INSERT
  • MERGE
  • REPLACE
  • CREATE DATABASE
  • CREATE USER
  • GRANT
  • REVOKE
  • EXEC and EXECUTE
  • CALL
  • COPY, LOAD, and UNLOAD
  • Multiple SQL statements
  • Prompt-injection-like requests to ignore safety or schema rules

Safety checks are implemented in src/inference/safety.py using lexical checks plus sqlglot parsing.
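
The lexical half of such a check could look like the sketch below. It is not the contents of src/inference/safety.py: the function names, the handling of string literals and comments, and the return format are assumptions. The sqlglot parsing stage and prompt-injection detection are deliberately left out so the example needs only the standard library.

```python
import re

# Single keywords from the blocked list above.
BLOCKED_KEYWORDS = {
    "DROP", "DELETE", "TRUNCATE", "ALTER", "UPDATE", "INSERT", "MERGE",
    "REPLACE", "GRANT", "REVOKE", "EXEC", "EXECUTE", "CALL",
    "COPY", "LOAD", "UNLOAD",
}
# Two-word forms that are blocked only in combination.
BLOCKED_PHRASES = {
    "CREATE DATABASE": re.compile(r"\bCREATE\s+DATABASE\b"),
    "CREATE USER": re.compile(r"\bCREATE\s+USER\b"),
}


def strip_literals_and_comments(sql: str) -> str:
    """Blank out string literals and comments so their contents are not scanned."""
    sql = re.sub(r"'(?:[^']|'')*'", "''", sql)              # single-quoted strings
    sql = re.sub(r"--[^\n]*", " ", sql)                     # line comments
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)   # block comments
    return sql


def lexical_safety_check(sql: str) -> list[str]:
    """Return a list of reasons the SQL looks unsafe; an empty list means no finding."""
    cleaned = strip_literals_and_comments(sql).upper()
    reasons = []
    # Any non-whitespace text after a semicolon implies a second statement.
    if re.search(r";\s*\S", cleaned):
        reasons.append("multiple statements")
    words = set(re.findall(r"[A-Z_][A-Z0-9_]*", cleaned))
    reasons.extend(f"blocked keyword: {kw}" for kw in sorted(BLOCKED_KEYWORDS & words))
    reasons.extend(
        f"blocked phrase: {name}"
        for name, pattern in BLOCKED_PHRASES.items()
        if pattern.search(cleaned)
    )
    return reasons
```

A purely lexical check is conservative: it would also flag the `REPLACE()` string function, for example, which is one reason to pair it with a parser.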

Limitations

  • No release-quality trained adapter is available yet.
  • The uploaded Qwen3 4B smoke adapter exists only to validate the cloud trainer and Hub upload path.
  • A tiny local smoke adapter exists only to validate the local trainer path.
  • Smoke-adapter predictions have been evaluated only on a 64-example held-out subset; no release-model predictions exist to evaluate yet.
  • The smoke adapter is undertrained; exact match is low (0.2188) despite high parse validity (1.0000) and schema adherence (0.9688).
  • The model can continue generating prompt-like text after the canonical response, so inference uses canonical-response parsing and should add stricter stop criteria.
  • QLoRA training is expected to require CUDA-capable GPU hardware.
  • Apple Silicon is suitable for development, tests, data prep, dry-runs, and the Gradio app, but not the default 4-bit QLoRA path.
  • Deciding SQL equivalence is difficult; exact match alone is not enough to judge model quality.
  • SQLite execution evaluation skips dialect-incompatible examples.
  • Schema adherence checks can still miss complex aliasing or dialect-specific constructs.
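
The trailing-continuation limitation above is commonly handled by cutting the generated text at the first stop string. The sketch below shows that post-processing step only. The stop markers are hypothetical, since the project's prompt template is not shown in this card, and this is not the project's canonical-response parser.

```python
def truncate_at_stop(text: str, stop_strings) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


# Hypothetical markers for prompt-like text that follows the answer.
STOP_STRINGS = ("\n### ", "\nQuestion:", "\nSchema:")
```

Truncating after generation fixes the output but not the latency; stopping generation as soon as a marker appears would be needed to save the wasted tokens.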

Example Usage

python -m src.inference.generate_sql \
  --schema-file data/samples/schema.sql \
  --question "Which product categories had the highest revenue in Q4 2025?" \
  --dry-run

Model-backed inference requires a configured model or adapter path.
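
Outside the project CLI, an adapter like this one is typically attached to its base model with the `peft` and `transformers` libraries. The sketch below assumes both are installed, along with `accelerate` for `device_map="auto"`, and that the caller has Hub access to the adapter repository. The function name is hypothetical, and the project's own loading code may differ.

```python
def load_smoke_adapter(
    base_id: str = "Qwen/Qwen3-4B-Instruct-2507",
    adapter_id: str = "rishhh/schemasage-sql-qwen3-4b-smoke",
):
    """Load the base model and attach the LoRA adapter weights from the Hub."""
    # Imported lazily so the heavy dependencies are needed only when this is called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference only; disables dropout
    return tokenizer, model
```

The smoke adapter is useful for checking that loading and generation work end to end, not for judging SQL quality.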

Hardware Notes

The repository was developed and validated locally on a MacBook Pro (M5 Pro) using Python 3.11. Full QLoRA training should be run on CUDA GPU hardware or a managed GPU environment.

How to Improve

  • Run a longer QLoRA training job on the full normalized train split or a carefully filtered high-quality subset.
  • Add Spider-style held-out execution evaluation.
  • Add refusal and unanswerable-question training examples.
  • Add batched inference and stop strings to reduce latency and trailing prompt continuation.
  • Publish final metrics only after evaluating generated predictions, not references.

License

Project code is MIT licensed. Dataset and base-model licenses must be reviewed before publishing trained artifacts.
