---
language:
- en
license: apache-2.0
library_name: peft
tags:
- text-to-sql
- sql-generation
- code-generation
- llama
- fine-tuned
- lora
- text2sql
- natural-language-to-sql
datasets:
- chrisjcc/text-to-sql-spider-dataset
base_model: meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---

# Llama-3.1-8B-Instruct-text-to-sql-adapter

This is a **LoRA adapter** fine-tuned from [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) for **text-to-SQL** generation.

## Model Description

- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Model Type**: LoRA adapter
- **Fine-tuning Method**: QLoRA (4-bit quantization with LoRA adapters)
- **Training Dataset**: chrisjcc/text-to-sql-spider-dataset
- **Task**: Convert natural language questions into SQL queries
- **Language**: English
- **License**: apache-2.0

## Intended Use

This model is designed to translate natural language questions into SQL queries for database interaction. It works best when provided with:

1. A database schema (CREATE TABLE statements)
2. A natural language question about the data

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
tokenizer = AutoTokenizer.from_pretrained("chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")

# For inference, optionally merge the adapter into the base weights for better performance
model = model.merge_and_unload()

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=False,
)

# Example usage
schema = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100),
    created_at TIMESTAMP
);
"""

question = "Show me all users who registered in the last 7 days"

messages = [
    {
        "role": "system",
        "content": f"You are a text to SQL translator.\n\nSCHEMA:\n{schema}"
    },
    {"role": "user", "content": question},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt)
sql_query = outputs[0]["generated_text"][len(prompt):].strip()

print("Generated SQL:", sql_query)
```
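
Since the adapter was trained with QLoRA, you can also run it against a 4-bit quantized base model to cut memory use. Below is a minimal sketch assuming the common NF4 settings (the exact quantization config used in training is not recorded in this card); it requires the `bitsandbytes` package.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Assumed NF4 settings; the training-time quantization config is not recorded here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# Keep the adapter attached: merge_and_unload() is not supported on 4-bit weights.
model = PeftModel.from_pretrained(base_model, "chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter")
```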

## Training Configuration

### Model Architecture

- **LoRA Rank (r)**: 16
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.1
- **Target Modules**: all linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj); see the `LoraConfig` sketch below
- **Max Sequence Length**: 2048
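
For reference, these settings correspond to a PEFT `LoraConfig` along the following lines. This is a sketch: `bias` and `task_type` are assumptions, since the card does not record them.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",           # assumption: not stated in the card
    task_type="CAUSAL_LM",
)
```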

### Training Hyperparameters

- **Number of Epochs**: 5
- **Per-Device Batch Size**: 1
- **Gradient Accumulation Steps**: 8
- **Effective Batch Size**: 8
- **Learning Rate**: 5e-05
- **Learning Rate Scheduler**: constant
- **Optimizer**: AdamW (torch fused)
- **Weight Decay**: 0
- **Warmup Ratio**: 0.03
- **Max Gradient Norm**: 1.0
- **Precision**: bfloat16 (these settings are mapped onto `TrainingArguments` in the sketch below)
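
Mapped onto Transformers `TrainingArguments`, the hyperparameters above look roughly like this sketch; `output_dir` is a placeholder, and logging/saving settings are omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama31-text2sql-lora",  # placeholder path
    num_train_epochs=5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,       # effective batch size: 1 x 8 = 8
    learning_rate=5e-5,
    lr_scheduler_type="constant",
    optim="adamw_torch_fused",
    weight_decay=0.0,
    warmup_ratio=0.03,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
)
```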

### Training Infrastructure

- **Hardware**: NVIDIA GPU with bfloat16 support
- **Framework**: Transformers + PEFT + TRL
- **Gradient Checkpointing**: Enabled
- **Flash Attention**: Enabled

## Training Details

The model was fine-tuned using supervised fine-tuning (SFT) with the following approach:

1. **Dataset Format**: Chat template with system/user/assistant roles
2. **System Prompt**: Includes the database schema for context
3. **User Prompt**: Natural language question
4. **Assistant Response**: SQL query

### Example Training Sample

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a text to SQL translator...\n\nSCHEMA:\nCREATE TABLE..."
    },
    {
      "role": "user",
      "content": "Show me all customers from New York"
    },
    {
      "role": "assistant",
      "content": "SELECT * FROM customers WHERE city = 'New York';"
    }
  ]
}
```
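
To see exactly what the trainer consumes, such a sample can be rendered through the Llama 3.1 chat template. A minimal sketch follows; the schema string is shortened for illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

sample = {
    "messages": [
        {"role": "system", "content": "You are a text to SQL translator.\n\nSCHEMA:\nCREATE TABLE customers (id INTEGER, city VARCHAR(50));"},
        {"role": "user", "content": "Show me all customers from New York"},
        {"role": "assistant", "content": "SELECT * FROM customers WHERE city = 'New York';"},
    ]
}

# Render the full conversation, including the assistant's SQL answer,
# into the single training string seen during SFT.
text = tokenizer.apply_chat_template(sample["messages"], tokenize=False)
print(text)
```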

## Model Performance

The model has been trained to generate syntactically correct SQL queries for various database schemas. Performance may vary based on:

- Complexity of the database schema
- Ambiguity in the natural language question
- Similarity to the training data

## Limitations

- **Schema Knowledge**: The model must be provided with the database schema at inference time
- **SQL Dialect**: Primarily trained on standard SQL; may require adjustments for specific database systems (PostgreSQL, MySQL, etc.)
- **Complex Queries**: Performance may degrade on very complex multi-join queries or advanced SQL features
- **Ambiguity**: May struggle with ambiguous natural language questions
- **Context Length**: Limited to 2048 tokens, including schema and question (a quick length check is sketched below)
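
Because the schema counts against the 2048-token budget, a quick pre-flight check can help. A minimal sketch; the 256-token reserve matches the `max_new_tokens` used in the usage example above.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

prompt = "SCHEMA:\nCREATE TABLE users (id INTEGER PRIMARY KEY);\n\nHow many users are there?"

# Reserve headroom for the generated SQL (matches max_new_tokens above).
budget = 2048 - 256
n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > budget:
    print(f"Prompt is {n_tokens} tokens; trim the schema to fit the context window.")
```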

## Version History

- **v1.0**: Initial release with 5 epochs of training

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{chrisjcc_Llama_3.1_8B_Instruct_text_to_sql_adapter,
  author = {Christian Contreras Campana},
  title = {Llama-3.1-8B-Instruct-text-to-sql-adapter: Fine-tuned Text-to-SQL Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/chrisjcc/Llama-3.1-8B-Instruct-text-to-sql-adapter}}
}
```

## License

This model is released under the **Apache-2.0** license. The base model, [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), has its own license terms.

## Acknowledgments

- Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- Training frameworks: Hugging Face Transformers, PEFT, TRL
- Dataset: chrisjcc/text-to-sql-spider-dataset

## Contact

For questions or feedback, please open an issue on the model repository.

---

**Model Type**: LoRA adapter weights
**Training Date**: 2025
**Model Size**: ~8B parameters (base model; the adapter weights themselves are much smaller)
|