# LoRA Adapters for `sqlchat` Model

This repository contains the **LoRA (Low-Rank Adaptation) adapters** for the `nnul/sqlchat` model. These adapters represent the fine-tuned "knowledge layer" that specializes the base model for Text-to-SQL tasks.

Using these adapters provides maximum flexibility. You can load them on top of the original base model to replicate the `sqlchat` model, or use them as a starting point for further fine-tuning. This approach is highly efficient for experimentation and allows for easy conversion to various quantized formats (like GGUF) with minimal quality loss.
## Model Details

* **Base Model:** `Qwen/Qwen3-1.7B`
* **Fine-Tuning Library:** [Unsloth](https://github.com/unslothai/unsloth)
* **Technique:** LoRA (Low-Rank Adaptation)
* **Rank (`r`):** 32
* **Alpha (`lora_alpha`):** 32
* **Training Dataset:** `nnul/sql-chat-dataset` (a combination of `b-mc2/sql-create-context` and `gretelai/synthetic_text_to_sql`)
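For reference, attaching adapters with these hyperparameters to the base model looks roughly like this in Unsloth. This is a sketch of how the adapters were presumably configured, not the exact training script; in particular, the `target_modules` list is an assumption (the attention/MLP projections typically targeted on Qwen-family models):

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model first.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-1.7B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach fresh LoRA adapters using the hyperparameters listed above.
# target_modules is an assumption, not read from the adapter config.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```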
## How to Use These Adapters

To use these LoRA adapters, you must load them on top of the original base model using the Unsloth library. This ensures all performance optimizations are correctly applied.

### Prerequisites

First, install the necessary libraries:
```bash
pip install unsloth
pip install "torch>=2.3.1"
```
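Loading in 4-bit, as the examples below do, assumes a CUDA-capable GPU. A quick sanity check before proceeding:

```python
import torch

# Should print True; the load_in_4bit path below requires a CUDA GPU.
print(torch.cuda.is_available())
```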
### Running Inference with LoRA Adapters

Here is a Python script demonstrating how to load the base model and apply these LoRA adapters for inference.
```python
import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer

# Pass the LoRA adapter repository. Unsloth reads the base model name from the
# adapter config, loads the 4-bit base model, then applies these adapters on top.
print("Loading base model and applying sqlchat-lora adapters...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nnul/sqlchat-lora",  # this LoRA adapter repository
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)
print("Model and adapters loaded successfully.")

# Optimize the model for the fastest possible inference.
FastLanguageModel.for_inference(model)

def generate_sql(instruction: str, context: str = ""):
    """Generate SQL from a natural language prompt and an optional schema context."""
    prompt = tokenizer.apply_chat_template(
        [
            {"role": "system", "content": "You are a helpful assistant that generates SQL queries based on natural language questions and database schemas."},
            {"role": "user", "content": f"### Instruction:\n{instruction}\n\n### Context:\n{context}"},
        ],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # ensures direct SQL output instead of a reasoning trace
    )
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    text_streamer = TextStreamer(tokenizer, skip_prompt=True, clean_up_tokenization_spaces=True)

    print(f"User Instruction: {instruction}")
    print("\nModel Output:")
    print("---------------------------------")
    _ = model.generate(
        **inputs,
        streamer=text_streamer,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding for deterministic output
        use_cache=True,
    )
    print("---------------------------------\n")

# --- Example Usage ---
generate_sql(
    instruction="Which department has the most number of employees?",
    context="CREATE TABLE department (name VARCHAR, num_employees INTEGER)",
)
```
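Because the adapters stay trainable after loading, you can also use them as a starting point for further fine-tuning, as mentioned in the introduction. Below is a minimal sketch with TRL's `SFTTrainer`; it assumes the dataset exposes a `train` split with a ready-to-train `text` column (an assumption about `nnul/sql-chat-dataset`; adjust to its actual columns), and it uses the older TRL argument signature commonly paired with Unsloth (newer TRL versions move these arguments into `SFTConfig`):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Switch the model back into training mode (undoes for_inference above).
FastLanguageModel.for_training(model)

# Assumption: a "train" split with a pre-formatted "text" column.
dataset = load_dataset("nnul/sql-chat-dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=TrainingArguments(
        output_dir="sqlchat-lora-continued",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
)
trainer.train()
```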
## Merging the Adapters

If you wish to create a standalone, merged model from these adapters (as was done for `nnul/sqlchat`), you can do so easily.
```python
# Load the model and adapters with the same arguments as in the inference example
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="nnul/sqlchat-lora",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True,
)

# Merge the adapters into the base weights and save locally
model.save_pretrained_merged("sqlchat_merged_4bit", tokenizer, save_method="merged_4bit_forced")

# Or, push the merged model directly to a new Hub repository
# model.push_to_hub_merged("your-username/your-new-merged-repo", tokenizer, save_method="merged_4bit_forced")
```
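As noted in the introduction, the adapters also make GGUF export straightforward. A sketch using Unsloth's GGUF helpers; `q4_k_m` is one common llama.cpp quantization preset and is an arbitrary choice here, so pick whichever suits your deployment:

```python
# Merge the adapters and export to GGUF for llama.cpp-compatible runtimes.
model.save_pretrained_gguf("sqlchat_gguf", tokenizer, quantization_method="q4_k_m")

# Or push the GGUF file straight to a Hub repository:
# model.push_to_hub_gguf("your-username/your-gguf-repo", tokenizer, quantization_method="q4_k_m")
```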