
LoRA Adapters for the sqlchat Model

This repository contains the LoRA (Low-Rank Adaptation) adapters for the nnul/sqlchat model. These adapters represent the fine-tuned "knowledge layer" that specializes the base model for Text-to-SQL tasks.

Using these adapters provides maximum flexibility. You can load them on top of the original base model to replicate the sqlchat model, or use them as a starting point for further fine-tuning. This approach is highly efficient for experimentation and allows for easy conversion to various quantized formats (like GGUF) with minimal quality loss.
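If you do not need Unsloth's optimizations, the adapters should also load with plain transformers + peft, assuming the repository follows the standard PEFT adapter layout (which Unsloth-saved adapters normally do). A minimal sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (see Model Details below for its name).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Apply the LoRA adapters on top of the frozen base weights.
model = PeftModel.from_pretrained(base, "nnul/sqlchat-lora")

That said, the Unsloth route described below is the recommended path.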

Model Details

  • Base Model: Qwen/Qwen3-1.7B
  • Fine-Tuning Library: Unsloth
  • Technique: LoRA (Low-Rank Adaptation)
    • Rank (r): 32
    • Alpha (lora_alpha): 32 (see the configuration sketch after this list)
  • Training Dataset: nnul/sql-chat-dataset (a combination of b-mc2/sql-create-context and gretelai/synthetic_text_to_sql).
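For reference, a configuration along these lines would recreate the adapter setup at training time. This is a sketch, not the exact training script: the target modules and dropout value are assumptions based on Unsloth's common defaults for Qwen-style models.

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-1.7B",
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,           # LoRA rank, as listed above
    lora_alpha=32,  # LoRA alpha, as listed above
    # Assumed target modules (Unsloth's usual choice for Qwen-style attention and MLP layers):
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,  # assumption
)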

How to Use These Adapters

The recommended way to use these LoRA adapters is to load them on top of the original base model with the Unsloth library, which ensures all of Unsloth's performance optimizations are correctly applied.

Prerequisites

First, install the necessary libraries.

pip install unsloth
pip install "torch>=2.3.1"

Running Inference with LoRA Adapters

Here is a Python script demonstrating how to load the base model and apply these LoRA adapters for inference.

import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Point from_pretrained at the LoRA adapter repository. Unsloth reads the base
# model name from the adapter config, loads it in 4-bit, and applies these adapters on top.
print("Loading base model and applying sqlchat-lora adapters...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nnul/sqlchat-lora", # YOUR LoRA adapter repository
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)
print("Model and adapters loaded successfully.")

# Optimize the model for the fastest possible inference.
FastLanguageModel.for_inference(model)

def generate_sql(instruction: str, context: str = ""):
    """
    A helper function to generate SQL from a natural language prompt.
    """
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful assistant that generates SQL queries based on natural language questions and database schemas."},
            {"role": "user", "content": f"### Instruction:\n{instruction}\n\n### Context:\n{context}"},
        ],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False, # Disable Qwen3's "thinking" mode so the model emits SQL directly
    )

    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, clean_up_tokenization_spaces=True)

    print(f"User Instruction: {instruction}")
    print("\nModel Output:")
    print("---------------------------------")
    _ = model.generate(
        **inputs,
        streamer=text_streamer,
        max_new_tokens=256,
        do_sample=False, # Use greedy decoding for deterministic output
        use_cache=True,
    )
    print("---------------------------------\n")

# --- Example Usage ---
generate_sql(
    instruction="Which department has the most number of employees?",
    context="CREATE TABLE department (name VARCHAR, num_employees INTEGER)"
)

Merging the Adapters

If you wish to create a standalone, merged model from these adapters (as was done for nnul/sqlchat), you can do so easily.

# Load the model and adapters as shown above
model, tokenizer = FastLanguageModel.from_pretrained(model_name="nnul/sqlchat-lora", ...)

# Merge and save locally
model.save_pretrained_merged("sqlchat_merged_4bit", tokenizer, save_method="merged_4bit_forced")

# Or, push the merged model directly to a new Hub repository
# model.push_to_hub_merged("your-username/your-new-merged-repo", tokenizer, save_method="merged_4bit_forced")
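The GGUF conversion mentioned at the top of this card can also be done directly from Unsloth. A minimal sketch; the output directory and quantization method ("q4_k_m") are placeholder choices, so pick whatever suits your deployment:

# Export to GGUF for llama.cpp-compatible runtimes.
model.save_pretrained_gguf("sqlchat_gguf", tokenizer, quantization_method="q4_k_m")

# Or push the GGUF files straight to a Hub repository:
# model.push_to_hub_gguf("your-username/your-gguf-repo", tokenizer, quantization_method="q4_k_m")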