---
base_model: Qwen/Qwen3-4B-Base
library_name: transformers
pipeline_tag: text-generation
tags:
- text-to-sql
- sql
- qwen3
- llamafactory
- spider
- spider-test-suite
---

# qwen_4b_sql

`qwen_4b_sql` is a `Qwen3-4B-Base` model finetuned for text-to-SQL generation with full SFT on a cleaned split of `PipableAI/pip-txt-to-sql-spider-bird-dataset`.

This repository tracks the stronger 4B checkpoint from our H20 single-GPU training runs. In our internal comparison, this checkpoint outperformed the corresponding `Qwen3-1.7B-Base` baseline on Spider execution accuracy.

## Base Model

- Base model: [`Qwen/Qwen3-4B-Base`](https://huggingface.co/Qwen/Qwen3-4B-Base)
- Finetuning framework: `LLaMA-Factory`
- Training mode: `Full SFT`
- Task: `schema + question -> SQL only`

## Training Data

- Primary dataset: [`PipableAI/pip-txt-to-sql-spider-bird-dataset`](https://huggingface.co/datasets/PipableAI/pip-txt-to-sql-spider-bird-dataset)
- We used a cleaned local split derived from that dataset for training and validation.

## Training Setup

- Hardware: single `NVIDIA H20 96GB`
- Precision: `bf16`
- Context length: `2048`
- Per-device train batch size: `1`
- Gradient accumulation steps: `8`
- Effective batch size: `8`
- Learning rate: `5e-6`
- Scheduler: `cosine`
- Warmup steps: `300`
- Epochs: `4.0`
- Template: `qwen3_nothink`
- Best-checkpoint selection: `load_best_model_at_end = true`
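
The effective batch size above is per-device batch size × gradient accumulation steps × number of GPUs (1 × 8 × 1 = 8). The sketch below works through that arithmetic and the shape of a cosine schedule with 300 linear warmup steps. It assumes the schedule follows the same formula as Transformers' `get_cosine_schedule_with_warmup` with its default half-cycle decay, and `NUM_EXAMPLES` is a hypothetical placeholder, since this card does not state the size of the training split.

```python
import math

# Values taken from the Training Setup list.
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 8
NUM_GPUS = 1  # single H20
EPOCHS = 4
PEAK_LR = 5e-6
WARMUP_STEPS = 300

# Effective batch = per-device batch x accumulation steps x GPUs.
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS  # 8

# Hypothetical split size: the card does not report it.
NUM_EXAMPLES = 10_000


def total_optimizer_steps(num_examples: int) -> int:
    """Approximate optimizer steps: one update per EFFECTIVE_BATCH examples, per epoch."""
    return max(num_examples // EFFECTIVE_BATCH, 1) * EPOCHS


def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay toward 0 (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


total = total_optimizer_steps(NUM_EXAMPLES)
for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, total // 2, total):
    print(f"step {step:>5}: lr = {lr_at(step, total):.2e}")
```

Under these assumptions the learning rate ramps linearly to `5e-6` at step 300 and then decays smoothly to zero by the final optimizer step.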

## Spider Benchmark

The following numbers are from Spider dev, evaluated with the official tooling:

- Official `match` evaluation from `test-suite-sql-eval`
- Official Spider `Test Suite` execution evaluation

### Main Results

| Metric | Score |
| --- | ---: |
| Spider official exact match | 35.0% |
| Spider Test Suite execution accuracy | 67.6% |

### Difficulty Breakdown

| Difficulty | Exact Match | Test Suite Exec |
| --- | ---: | ---: |
| Easy | 64.9% | 87.5% |
| Medium | 37.4% | 72.9% |
| Hard | 16.1% | 50.0% |
| Extra | 3.6% | 42.2% |
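
As a sanity check, the headline scores can be recovered from the difficulty breakdown as a count-weighted average. This sketch assumes the standard Spider dev difficulty counts (248 easy, 446 medium, 174 hard, 166 extra, 1034 questions in total); because the per-bucket percentages are rounded, the result agrees with the headline figures only after rounding to one decimal place.

```python
# Assumed: standard Spider dev question counts per difficulty bucket.
counts = {"easy": 248, "medium": 446, "hard": 174, "extra": 166}

# Per-bucket scores (%) from the Difficulty Breakdown table.
exact = {"easy": 64.9, "medium": 37.4, "hard": 16.1, "extra": 3.6}
exec_acc = {"easy": 87.5, "medium": 72.9, "hard": 50.0, "extra": 42.2}


def weighted(scores: dict) -> float:
    """Average the per-bucket scores, weighting each bucket by its question count."""
    total = sum(counts.values())
    return sum(scores[k] * counts[k] for k in counts) / total


print(f"exact match: {weighted(exact):.1f}%")         # exact match: 35.0%
print(f"test suite exec: {weighted(exec_acc):.1f}%")  # test suite exec: 67.6%
```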

## Notes

- This model is stronger under execution-based Spider evaluation than our best `Qwen3-1.7B-Base` run.
- In our experiments, exact-match metrics were often stricter than execution-based metrics, because semantically valid SQL rewrites do not always match the exact form of the Spider gold query.
- A later 4B rerun with altered training settings underperformed this checkpoint on Spider; that rerun is not the checkpoint published here.
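
The gap between exact match and execution accuracy can be illustrated with a toy SQLite table. This is a simplified sketch, not the official evaluator: a case- and whitespace-normalized string comparison stands in for exact match (the official `match` evaluation compares parsed SQL components rather than raw strings), and execution match here compares unordered result rows on a single database. The `singer` table and its rows are hypothetical. A single database can also make a wrong query agree by coincidence, which is why the Test Suite evaluation checks queries against multiple database instances.

```python
import sqlite3
from collections import Counter


def normalize(sql: str) -> str:
    """Naive stand-in for exact match: lowercase, drop ';', collapse whitespace."""
    return " ".join(sql.lower().replace(";", " ").split())


def exec_match(conn: sqlite3.Connection, gold: str, pred: str) -> bool:
    """Compare result rows as multisets, ignoring row order."""
    gold_rows = Counter(conn.execute(gold).fetchall())
    pred_rows = Counter(conn.execute(pred).fetchall())
    return gold_rows == pred_rows


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("A", 25), ("B", 30), ("C", 41), ("D", 35)],
)

gold = "SELECT COUNT(*) FROM singer WHERE age > 30"
rewrite = "SELECT COUNT(*) FROM singer WHERE 30 < age"  # same meaning, different form
wrong = "SELECT COUNT(*) FROM singer WHERE age >= 30"   # different meaning

for name, pred in [("rewrite", rewrite), ("wrong", wrong)]:
    print(name, "string match:", normalize(pred) == normalize(gold),
          "exec match:", exec_match(conn, gold, pred))
```

Here the rewrite fails the string comparison but returns the same rows as the gold query, while the `>=` query fails both checks because it also counts the singer aged exactly 30.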

## Intended Use

This model is intended for:

- text-to-SQL research baselines
- schema-conditioned SQL generation experiments
- single-turn SQL generation from natural language plus schema text

It is not validated for:

- production-grade database access control
- unrestricted execution over arbitrary enterprise schemas
- multi-turn agent workflows without extra prompting / tooling

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "bsq1989/qwen_4b_sql"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype stored in the checkpoint config
    device_map="auto",   # requires the `accelerate` package
    trust_remote_code=True,
)

prompt = """Generate SQL from the given schema and question. Output SQL only.
Schema:
CREATE TABLE twitter (TweetID INTEGER, UserID INTEGER, LocationID INTEGER, Lang TEXT, ...);
CREATE TABLE location (LocationID INTEGER, Country TEXT, City TEXT, ...);
Question:
How many tweets are in English?
"""

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Greedy decoding keeps the generated SQL reproducible across runs.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True).strip())
```
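
For batch experiments it can help to build prompts and clean up outputs programmatically. The helpers below are hypothetical, not part of this repository or of Transformers: `build_prompt` mirrors the prompt layout of the example above (the card does not publish the exact training template), and `extract_sql` defensively strips a `<think>` block or a Markdown code fence in case the model emits one despite the "Output SQL only" instruction.

```python
import re

# Hypothetical cleanup patterns; `{3} matches three backticks.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
_FENCE_RE = re.compile(r"`{3}(?:sql)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def build_prompt(schema_ddl: str, question: str) -> str:
    """Format schema DDL and a question in the same layout as the usage example."""
    return (
        "Generate SQL from the given schema and question. Output SQL only.\n"
        "Schema:\n"
        f"{schema_ddl.strip()}\n"
        "Question:\n"
        f"{question.strip()}\n"
    )


def extract_sql(generated: str) -> str:
    """Drop any <think> block, unwrap a fenced code block if present, trim whitespace."""
    text = _THINK_RE.sub("", generated)
    match = _FENCE_RE.search(text)
    if match:
        text = match.group(1)
    return text.strip()


prompt = build_prompt(
    "CREATE TABLE twitter (TweetID INTEGER, Lang TEXT);",
    "How many tweets are in English?",
)
print(prompt)
print(extract_sql("<think>\n\n</think>\n\nSELECT COUNT(*) FROM twitter;"))
```

The output of `build_prompt` can be passed as the user message content in the example above, and `extract_sql` applied to the decoded text.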

## Limitations

- Performance is lower on SQL benchmarks that are more open-ended and heterogeneous than Spider.
- The model can still produce invalid column references on out-of-distribution schemas.
- Benchmark numbers here reflect our current internal setup; reproduce them with the same evaluation pipeline before making strict comparisons.
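
One cheap guard against invalid column references is to compile each generated query against an empty SQLite copy of the schema before executing it anywhere. This is a minimal sketch under stated assumptions: `check_sql` is a hypothetical helper, it only understands SQLite DDL and SQL (the dialect Spider databases use), and the SELECT/WITH prefix check is a coarse filter, not access control.

```python
import sqlite3
from typing import Optional


def check_sql(schema_ddl: str, sql: str) -> Optional[str]:
    """Return None if `sql` compiles against the schema, else an error message."""
    head = sql.lstrip().lower()
    if not (head.startswith("select") or head.startswith("with")):
        return "only SELECT / WITH queries are allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)  # build empty tables from the DDL
        conn.execute("EXPLAIN " + sql)  # compiles the query without running it
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        # Unknown tables/columns and multi-statement strings end up here.
        # (Older Python versions raise sqlite3.Warning for multiple statements.)
        return str(exc)
    finally:
        conn.close()


schema = "CREATE TABLE twitter (TweetID INTEGER, UserID INTEGER, Lang TEXT);"
print(check_sql(schema, "SELECT COUNT(*) FROM twitter WHERE Lang = 'en'"))  # None
print(check_sql(schema, "SELECT Language FROM twitter"))  # no such column: Language
```

A non-`None` result can be used to reject the query or to re-prompt the model with the error message.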