---
language:
- en
license: gemma
base_model: unsloth/gemma-3-1b-it
tags:
- text-to-sql
- finetuning
datasets:
- gretelai/synthetic_text_to_sql
pipeline_tag: text-generation
---

# SQL-Gemma3

`SQL-Gemma3` is a fine-tuned version of `Gemma 3 1B Instruct` for text-to-SQL generation. It was trained on a balanced, sampled subset of the [Gretel synthetic_text_to_sql dataset](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) to improve SQL generation from a table schema and a natural-language question.

## Model Details

- Base model: `unsloth/gemma-3-1b-it`
- Task: Natural language to SQL
- Training data: balanced, sampled subset of `gretelai/synthetic_text_to_sql`
- Reported training loss: `0.201`
- Reported test loss: `0.21`

## Intended Use

This model is intended for:

- Generating SQL queries from schema-aware prompts
- Learning and experimentation with text-to-SQL workflows
- Prototyping NL-to-SQL assistants

It is not guaranteed to produce correct, executable, or secure SQL for every prompt. Review generated queries before using them in production systems.
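
One way to automate part of that review is to compile each generated query against an empty copy of the schema before it reaches a real database. The sketch below is only an illustration: it assumes the schema and query are SQLite-compatible, `check_select_query` is a hypothetical helper, and the "single `SELECT` only" rule is an example policy (it also rejects CTEs and any query containing a semicolon inside a string literal).

```python
import sqlite3


def check_select_query(schema_sql: str, query: str) -> tuple[bool, str]:
    """Return (ok, message) for a generated query checked against a throwaway SQLite database."""
    stripped = query.strip().rstrip(";").strip()
    # Example policy for this sketch: only one read-only SELECT statement passes.
    if not stripped.lower().startswith("select") or ";" in stripped:
        return False, "only a single SELECT statement is allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_sql)  # create empty tables from the schema
        conn.execute(f"EXPLAIN {stripped}")  # compile the query; unknown tables or columns raise
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, f"rejected by SQLite: {exc}"
    finally:
        conn.close()
    return True, "ok"


schema = "CREATE TABLE employees(id INT, name TEXT, salary INT);"
print(check_select_query(schema, "SELECT AVG(salary) FROM employees;"))
print(check_select_query(schema, "SELECT AVG(wage) FROM employees;"))
```

A check like this catches syntax errors and references to missing tables or columns. It does not verify that the query answers the question, so human review is still needed.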

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vishnurchityala/sql-gemma3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The prompt is the table schema (DDL), a blank line, then the question.
messages = [
    {
        "role": "user",
        "content": (
            "CREATE TABLE employees(id INT, name TEXT, salary INT);\n\n"
            "Find the average salary of all employees."
        ),
    }
]

# Apply the chat template and tokenize in one step, so that special tokens
# such as BOS are not added a second time by a separate tokenizer call.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)

# Greedy decoding; the decoded text contains the prompt followed by the completion.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
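
The example message above places the schema first, then a blank line, then the question. A small helper can keep prompts in that layout, and a second one can tidy the decoded text. Both functions are hypothetical illustrations. Whether this layout matches the one used during fine-tuning, and whether the model ever wraps its answer in a Markdown fence, are assumptions that this card does not confirm.

```python
import re

# A Markdown code fence, built indirectly so this snippet stays easy to embed.
FENCE = "`" * 3
FENCED_SQL = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def build_prompt(schema_sql: str, question: str) -> str:
    """Join schema DDL and question with a blank line, as in the example message."""
    return f"{schema_sql.strip()}\n\n{question.strip()}"


def extract_sql(generated: str) -> str:
    """Return the contents of the first fenced block if present, else the trimmed text."""
    match = FENCED_SQL.search(generated)
    return (match.group(1) if match else generated).strip()


prompt = build_prompt(
    "CREATE TABLE employees(id INT, name TEXT, salary INT);",
    "Find the average salary of all employees.",
)
messages = [{"role": "user", "content": prompt}]
```

`extract_sql` would be applied to the completion only, so the prompt portion of the decoded text should be removed first.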

## Limitations

- Performance is reported here as loss only, not execution accuracy
- Output quality depends heavily on schema clarity and prompt format
- The model may generate dialect-specific or invalid SQL
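
For readers who want a metric beyond loss, execution accuracy can be approximated by running each predicted query and its reference query on the same database and comparing the results. The following is a minimal sketch, not the evaluation used for this model. It assumes SQLite-compatible SQL, compares rows without regard to order, and uses hypothetical names (`run_query`, `execution_accuracy`) and made-up sample data.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except (sqlite3.Error, sqlite3.Warning):
        return None


def execution_accuracy(setup_sql, pairs):
    """Fraction of (predicted, reference) pairs whose results match on a fresh in-memory database."""
    correct = 0
    for predicted, reference in pairs:
        conn = sqlite3.connect(":memory:")  # new database per pair, so queries cannot interfere
        try:
            conn.executescript(setup_sql)
            expected = run_query(conn, reference)
            actual = run_query(conn, predicted)
            if expected is not None and actual == expected:
                correct += 1
        finally:
            conn.close()
    return correct / len(pairs) if pairs else 0.0


# Made-up schema, rows, and query pairs for illustration only.
setup = (
    "CREATE TABLE employees(id INT, name TEXT, salary INT);"
    "INSERT INTO employees VALUES (1, 'a', 100), (2, 'b', 300);"
)
pairs = [
    ("SELECT AVG(salary) FROM employees", "SELECT SUM(salary) * 1.0 / COUNT(*) FROM employees"),
    ("SELECT MAX(salary) FROM employees", "SELECT AVG(salary) FROM employees"),
]
print(execution_accuracy(setup, pairs))
```

Because the comparison ignores row order, queries whose correctness depends on `ORDER BY` would need a stricter check.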

## Acknowledgements

- Base model: [Gemma 3](https://huggingface.co/google)
- Dataset: [Gretel AI synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql)