SchemaSage-SQL Model Card
This is the release model card for the SchemaSage-SQL project. A 10-step QLoRA smoke adapter, a 200-step safety-balanced adapter, and a larger 8k/600 follow-up adapter have been trained and uploaded to validate the cloud training and Hugging Face release path. The 200-step safety-balanced adapter is the public release baseline.
Model Details
- Project name: SchemaSage-SQL
- Model status: release baseline published; larger follow-up remains experimental
- Planned base model: configurable through configs/model.yaml
- Current default base model: Qwen/Qwen3-4B-Instruct-2507
- Planned method: supervised fine-tuning with LoRA or QLoRA
- Release format: adapter-first, optional merged model later
- Smoke adapter: rishhh/schemasage-sql-qwen3-4b-smoke
Intended Use
SchemaSage-SQL is intended for:
- Research and prototyping around Text-to-SQL.
- Portfolio demonstration of LLM data, training, evaluation, and deployment engineering.
- Generating draft read-only analytical SQL from a schema and natural-language question.
Generated SQL should be reviewed before execution.
Out-of-Scope Use
SchemaSage-SQL is not intended for:
- Autonomous execution against production databases.
- Destructive database operations.
- Legal, financial, medical, or compliance-critical decision making.
- Credential extraction, filesystem access, or network exfiltration.
- SQL generation without a provided schema.
Datasets
Current data pipeline uses public Hugging Face datasets:
- gretelai/synthetic_text_to_sql
- b-mc2/sql-create-context
Cleaned and republished training data:
| Split | Examples |
|---|---|
| Train | 95,508 |
| Validation | 10,612 |
| Test | 5,324 |
| Refusal examples created | 10,862 |
Current processed split summary:
| Split | Examples |
|---|---|
| Train | 151,723 |
| Validation | 16,858 |
| Test | 5,248 |
Processing removed:
- Empty or incomplete examples.
- Exact duplicates.
- Destructive SQL examples by default.
- Sample INSERT rows from schema context where source datasets included data values.
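The filters listed above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the field names (`question`, `schema`, `sql`), the destructive-keyword pattern, and the regex used to strip INSERT statements are all assumptions.

```python
import re

# Hypothetical pattern for "destructive" SQL; the project's real filter may differ.
DESTRUCTIVE = re.compile(r"^\s*(DROP|DELETE|TRUNCATE|ALTER|UPDATE)\b", re.IGNORECASE)
# Naive INSERT matcher: it does not handle semicolons inside string literals.
INSERT_STMT = re.compile(r"INSERT\s+INTO\b[^;]*;?", re.IGNORECASE)


def strip_insert_rows(schema: str) -> str:
    """Drop INSERT statements from schema context and collapse whitespace."""
    without_inserts = INSERT_STMT.sub("", schema)
    return re.sub(r"\s+", " ", without_inserts).strip()


def clean_examples(examples):
    """Apply the four filters: incomplete, destructive, INSERT rows, duplicates."""
    seen = set()
    cleaned = []
    for ex in examples:
        question = (ex.get("question") or "").strip()
        schema = strip_insert_rows(ex.get("schema") or "")
        sql = (ex.get("sql") or "").strip()
        if not (question and schema and sql):
            continue  # empty or incomplete example
        if DESTRUCTIVE.match(sql):
            continue  # destructive SQL is removed by default
        key = (question, schema, sql)
        if key in seen:
            continue  # exact duplicate of an earlier example
        seen.add(key)
        cleaned.append({"question": question, "schema": schema, "sql": sql})
    return cleaned
```

In this sketch, duplicates are detected after INSERT rows are stripped, so two examples that differ only in sample data collapse into one. Whether the real pipeline deduplicates before or after that step is not stated in this card.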
The validation split is a deterministic train holdout because the selected sources do not provide a native validation split.
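One way to build such a deterministic holdout is a seeded shuffle followed by a fixed cut. The sketch below is an assumption about the approach, not the project's code: the seed and the function name are hypothetical. The 10% default is chosen because the published train and validation counts are consistent with a 10% holdout (10,612 of 106,120 rows).

```python
import random


def deterministic_holdout(rows, holdout_percent=10, seed=42):
    """Split rows into (train, validation) reproducibly.

    The same rows, percent, and seed always yield the same split.
    The seed value is a hypothetical choice for illustration.
    """
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_val = len(rows) * holdout_percent // 100  # integer math avoids float drift
    val_idx = set(indices[:n_val])
    train = [row for i, row in enumerate(rows) if i not in val_idx]
    val = [row for i, row in enumerate(rows) if i in val_idx]
    return train, val
```

Because the split depends on row order, the input should be sorted or otherwise fixed before calling the function.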
Training Procedure
Training is implemented in src/training/train_sft.py and has been run for smoke, release-baseline, and experimental adapters.
A tiny local trainer smoke test was completed with sshleifer/tiny-gpt2; that artifact is not a release model and should not be used for model-quality claims.
A cloud QLoRA smoke run completed on Hugging Face Jobs:
| Field | Value |
|---|---|
| Job ID | 6a0c94bd2dc5b1243da4ffee |
| Hardware | a10g-large |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Train rows | 112 |
| Eval rows | 16 |
| Steps | 10 |
| Final train loss | 1.908 |
| Final eval loss | 1.188 |
| Adapter repo | rishhh/schemasage-sql-qwen3-4b-smoke |
Smoke adapter commit:
https://huggingface.co/rishhh/schemasage-sql-qwen3-4b-smoke/commit/6a8241d2f6cc3f0a917ba527f2f32b0cc4bf9933
This smoke adapter validates the training and upload path. It is not the release baseline.
Safety-Clean Balanced Adapter
The best current safety baseline is:
rishhh/schemasage-sql-qwen3-4b-clean-balanced-200
It was trained for 200 optimizer steps on rishhh/schemasage-sql-clean-text2sql
with a deterministic 20% refusal mix in the short training sample. Evaluation on
64 cleaned held-out examples produced:
| Metric | Value |
|---|---|
| Normalized exact match | 0.2241 |
| SQL parse validity | 1.0000 |
| Schema adherence rate | 0.9828 |
| Unsafe query rate | 0.0000 |
| Blocked refusal accuracy | 1.0000 |
| Mean latency seconds | 5.5201 |
This adapter is the public release baseline because it kept perfect blocked-refusal accuracy on the re-evaluated held-out set.
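The deterministic 20% refusal mix used for this adapter could be assembled by placing one refusal example at every fifth position of the training sample. The sketch below illustrates that idea only; the function name, the row format, and the interleaving scheme are assumptions rather than the project's implementation.

```python
def mix_with_refusals(sql_rows, refusal_rows, total, refusal_every=5):
    """Build a sample of `total` rows in which every `refusal_every`-th row
    is a refusal example (1 in 5 gives a 20% mix). No randomness is used,
    so the output depends only on the order of the inputs."""
    n_refusal = total // refusal_every
    n_sql = total - n_refusal
    if len(sql_rows) < n_sql or len(refusal_rows) < n_refusal:
        raise ValueError("not enough rows for the requested mix")

    sql_iter, refusal_iter = iter(sql_rows), iter(refusal_rows)
    mixed = []
    for position in range(1, total + 1):
        source = refusal_iter if position % refusal_every == 0 else sql_iter
        mixed.append(next(source))
    return mixed
```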
The larger follow-up run completed as:
rishhh/schemasage-sql-qwen3-4b-clean-balanced-8k-600-v2
Completed settings: 8,192 cleaned train rows, 600 optimizer steps, 20% refusal mix, and 256 cleaned held-out evaluation examples.
The larger run is the stronger semantic candidate: it improved exact match, execution accuracy, and exact-or-execution-equivalent rate, but it missed one blocked refusal and kept slightly lower parse/schema scores. It remains the best comparison candidate, not the shipped baseline.
Planned command:
python -m src.training.train_sft --config configs/train_qlora.yaml
Smoke/dry-run validation command:
python -m src.training.train_sft --config configs/train_qlora.yaml --smoke-test --dry-run
Apple Silicon local development should use:
python -m src.training.train_sft --config configs/train_local_smoke.yaml --smoke-test --dry-run
Trainer-path validation command that has been run locally:
python -m src.training.train_sft --config configs/train_local_smoke.yaml --smoke-test
Evaluation
The full local reference-answer evaluation is a sanity check for the evaluator and processed data, not trained-model performance.
| Metric | Value |
|---|---|
| Exact match | 1.0000 |
| Normalized exact match | 1.0000 |
| SQL parse validity | 0.9998 |
| Schema adherence rate | 0.8994 |
| Hallucinated table rate | 0.0494 |
| Hallucinated column rate | 0.0878 |
| Unsafe query rate | 0.0071 |
| Execution accuracy | 1.0000 |
| Execution comparable examples | 4,007 |
These metrics validate the evaluator and processed reference data. They should not be presented as model quality.
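The gap between exact match and normalized exact match in these tables comes from light canonicalization before comparison. The sketch below shows one plausible set of normalization rules (trim, drop a trailing semicolon, collapse whitespace, lowercase). These rules are assumptions; the project's evaluator may normalize differently.

```python
import re


def normalize_sql(sql: str) -> str:
    """Canonicalize SQL text for a lenient string comparison.

    Lowercasing also changes string literals, so this is a rough
    textual check, not a semantic equivalence test.
    """
    sql = sql.strip().rstrip(";").strip()
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()


def exact_match_rates(predictions, references):
    """Return strict and normalized exact-match rates over paired lists."""
    if len(predictions) != len(references) or not references:
        raise ValueError("predictions and references must be non-empty and aligned")
    pairs = list(zip(predictions, references))
    exact = sum(p == r for p, r in pairs)
    normalized = sum(normalize_sql(p) == normalize_sql(r) for p, r in pairs)
    return {
        "exact_match": exact / len(pairs),
        "normalized_exact_match": normalized / len(pairs),
    }
```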
The uploaded smoke adapter was also evaluated on 64 held-out examples from gretelai/synthetic_text_to_sql test. These metrics describe the 10-step smoke adapter only:
| Metric | Value |
|---|---|
| Exact match | 0.2188 |
| Normalized exact match | 0.2344 |
| SQL parse validity | 1.0000 |
| Schema adherence rate | 0.9688 |
| Hallucinated table rate | 0.0156 |
| Hallucinated column rate | 0.0312 |
| Unsafe query rate | 0.0000 |
| Execution accuracy | 0.8409 |
| Execution comparable examples | 44 |
| Mean generated SQL length | 11.80 |
| Mean latency seconds | 23.57 |
Evaluation artifacts:
- evaluation/smoke_64/eval_results.json
- evaluation/smoke_64/eval_report.md
- evaluation/smoke_64/predictions.jsonl
- evaluation/smoke_64/metrics_overview.svg
- evaluation/smoke_64/risk_rates.svg
Run comparison report:
- reports/run_comparison.md
- reports/run_comparison/comparison_quality.svg
- reports/run_comparison/comparison_risk_latency.svg
Release Baseline
The public release baseline is:
rishhh/schemasage-sql-qwen3-4b-clean-balanced-200
Its re-evaluated held-out metrics are summarized in:
- reports/reparsed_clean_balanced_200.md
- reports/reparsed_run_comparison.md
- reports/run_comparison/comparison_quality.svg
- reports/run_comparison/comparison_risk_latency.svg
The larger rishhh/schemasage-sql-qwen3-4b-clean-balanced-8k-600-v2 run improved semantic metrics but missed one blocked refusal and kept slightly lower parse/schema scores, so it remains an experimental comparison candidate rather than the release baseline.
Safety Policy
The project defaults to read-only analytical SQL.
The safety layer blocks or refuses:
- DROP
- DELETE
- TRUNCATE
- ALTER
- UPDATE
- INSERT
- MERGE
- REPLACE
- CREATE DATABASE
- CREATE USER
- GRANT
- REVOKE
- EXEC and EXECUTE
- CALL
- COPY, LOAD, and UNLOAD
- Multiple SQL statements
- Prompt-injection-like requests to ignore safety or schema rules
Safety checks are implemented in src/inference/safety.py using lexical checks plus sqlglot parsing.
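The lexical half of such a check might look like the sketch below. It is not the contents of src/inference/safety.py: the function name, return shape, and the naive semicolon test are assumptions, and the sqlglot parsing stage is deliberately left out so the example depends only on the standard library.

```python
import re

# Keywords taken from the block list above.
BLOCKED_KEYWORDS = [
    "DROP", "DELETE", "TRUNCATE", "ALTER", "UPDATE", "INSERT", "MERGE",
    "REPLACE", "CREATE DATABASE", "CREATE USER", "GRANT", "REVOKE",
    "EXEC", "EXECUTE", "CALL", "COPY", "LOAD", "UNLOAD",
]

# Whole-word, case-insensitive match; multi-word keywords allow any whitespace.
_BLOCKED_RE = re.compile(
    r"\b(" + "|".join(r"\s+".join(k.split()) for k in BLOCKED_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def check_sql(sql: str):
    """Return (allowed, reason) using lexical rules only."""
    body = sql.strip().rstrip(";").strip()
    if not body:
        return False, "empty query"
    if ";" in body:
        # Naive: a semicolon inside a string literal also triggers this.
        return False, "multiple statements"
    match = _BLOCKED_RE.search(body)
    if match:
        keyword = " ".join(match.group(1).upper().split())
        return False, f"blocked keyword: {keyword}"
    return True, "ok"
```

A purely lexical check over-blocks: for example, it rejects a read-only query that calls a `REPLACE(...)` string function or mentions a blocked word inside a literal. A parser-based pass can tell statement types apart from identifiers and literals, which is presumably why the two are combined.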
Limitations
- The public release baseline is the 200-step safety-balanced adapter; the larger 8k/600 follow-up remains experimental.
- The uploaded Qwen3 4B smoke adapter exists only to validate the cloud trainer and Hub upload path.
- A tiny local smoke adapter exists only to validate the local trainer path.
- Re-evaluated held-out predictions are published for the release baseline, but the larger follow-up still serves as the main comparison point for future improvement work.
- The smoke adapter is undertrained; exact match is low despite good parse validity and schema adherence.
- The model can continue generating prompt-like text after the canonical response, so inference uses canonical-response parsing and should add stricter stop criteria.
- QLoRA training is expected to require CUDA-capable GPU hardware.
- Apple Silicon is suitable for development, tests, data prep, dry-runs, and the Gradio app, but not the default 4-bit QLoRA path.
- SQL equivalence is difficult; exact match is not enough to judge model quality.
- SQLite execution evaluation skips dialect-incompatible examples.
- Schema adherence checks can still miss complex aliasing or dialect-specific constructs.
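The SQLite execution check noted in the limitations above can be approximated as follows: run the reference and predicted queries against an in-memory database built from the schema, skip examples whose reference query does not run in SQLite, and compare result sets. This is a sketch under assumptions; the field names and the order-insensitive comparison are illustrative choices, not the project's evaluator.

```python
import sqlite3


def run_query(schema_sql: str, query: str):
    """Execute `query` on a fresh in-memory database; return sorted rows or None on error."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)
        rows = conn.execute(query).fetchall()
        return sorted(rows, key=repr)  # ignore row order when comparing
    except (sqlite3.Error, sqlite3.Warning):
        return None
    finally:
        conn.close()


def execution_accuracy(examples):
    """Compare predicted and reference results on examples SQLite can run."""
    comparable = correct = 0
    for ex in examples:
        expected = run_query(ex["schema"], ex["reference_sql"])
        if expected is None:
            continue  # dialect-incompatible reference: not comparable
        comparable += 1
        if run_query(ex["schema"], ex["predicted_sql"]) == expected:
            correct += 1
    accuracy = correct / comparable if comparable else None
    return {"comparable": comparable, "accuracy": accuracy}
```

Sorting rows before comparison treats queries that differ only in ORDER BY as equivalent, which is lenient for questions where ordering matters. A prediction that fails to execute counts as incorrect rather than being skipped.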
Example Usage
python -m src.inference.generate_sql \
--schema-file data/samples/schema.sql \
--question "Which product categories had the highest revenue in Q4 2025?" \
--dry-run
Model-backed inference requires a configured model or adapter path.
Hardware Notes
The repository was developed and validated locally on a MacBook Pro M5 Pro environment using Python 3.11. Full QLoRA training should be run on CUDA GPU hardware or a managed GPU environment.
How to Improve
- Improve execution-equivalence on hard joins and aggregations.
- Add Spider-style held-out execution evaluation.
- Expand refusal and unanswerable-question training examples.
- Add batched inference and stop strings to reduce latency and trailing prompt continuation.
- Keep the larger 8k/600 run as an experimental benchmark rather than the shipped baseline until it clearly wins on both quality and safety.
- Publish final metrics only after evaluating generated predictions, not references.
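As a sketch of the stop-string idea in the list above, generated text can be cut at the earliest occurrence of any marker that signals the model has started a new prompt. The marker strings in the usage example are hypothetical; the real prompt template is not shown in this card.

```python
def truncate_at_stop(text: str, stop_strings) -> str:
    """Return `text` cut at the earliest stop string, with surrounding whitespace removed."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # an empty stop string would match at index 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].strip()


# Hypothetical markers for illustration only.
STOPS = ["\n### Question", "\nSchema:"]
print(truncate_at_stop("SELECT 1;\n### Question: next one", STOPS))
```

Post-hoc truncation only cleans the output. Reducing latency additionally requires passing stop criteria to the generation call so decoding halts early.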
License
Project code is MIT licensed. Dataset and base-model licenses must be reviewed before publishing trained artifacts.