---
license: apache-2.0
base_model:
- meta-llama/Llama-3.2-3B
library_name: peft
---

# Model Card for llama3-sql2plan

This model is a fine-tuned version of [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) using LoRA (Low-Rank Adaptation) for the task of generating PostgreSQL execution plans from SQL queries. The model takes SQL queries as input and outputs the corresponding PostgreSQL execution plan in JSON format.

## Model Details

### Model Description

This model is specifically designed to convert SQL queries into PostgreSQL execution plans. It was fine-tuned using Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters, allowing efficient training while maintaining the base model's capabilities.

- **Developed by:** Anirudh Bharadwaj
- **Model type:** Causal Language Model (Decoder-only)
- **Language(s) (NLP):** English (SQL and JSON)
- **License:** Apache 2.0
- **Finetuned from model:** [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)

### Model Sources

- **Repository:** [abharadwaj123/llama3-sql2plan](https://huggingface.co/abharadwaj123/llama3-sql2plan)
- **Base Model:** [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)

## Uses

### Direct Use

This model can be used directly to generate PostgreSQL execution plans from SQL queries. It is intended for:

- Database query optimization analysis
- Understanding query execution strategies
- Educational purposes for learning PostgreSQL query planning
- Database performance analysis tools

### Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "abharadwaj123/llama3-sql2plan"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

sql_query = "SELECT * FROM users WHERE age > 25;"
prompt = (
    "Generate the PostgreSQL execution plan in JSON format for the SQL query.\n\n"
    "[QUERY]\n" + sql_query + "\n\n[PLAN]\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
    )

generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
plan = generated_text.split("[PLAN]\n")[-1].strip()
print(plan)
```

### Out-of-Scope Use

This model should not be used for:

- Generating actual executable SQL queries (it generates execution plans, not queries)
- Real-time database query execution
- Production database systems without proper validation
- Any use case requiring guaranteed accuracy of execution plans

## Bias, Risks, and Limitations

### Limitations

- **Accuracy:** The model generates execution plans based on training data patterns and may not always produce accurate or optimal plans for all SQL queries.
- **PostgreSQL-specific:** The model is trained specifically for PostgreSQL execution plans and may not be suitable for other database systems.
- **Training Data Scope:** The model was trained on a subset of Stack Overflow data (10,000 samples from ~16,332 available), which may not cover all SQL query patterns.
- **No Database Context:** The model does not have access to the actual database schema, indexes, or statistics, which are crucial for accurate execution plan generation.
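Because the model sees only the query text, a generated plan is best treated as a hypothesis and compared against what the target database's planner actually produces. The sketch below is illustrative only: it assumes a reachable PostgreSQL instance and the `psycopg2` driver, and the connection string and helper name are placeholders, not part of this repository.

```python
import json
import psycopg2  # assumed to be installed; any PostgreSQL driver would do

def reference_plan(dsn: str, sql_query: str):
    """Ask PostgreSQL itself for the JSON execution plan of a query."""
    conn = psycopg2.connect(dsn)  # placeholder DSN, e.g. "dbname=test user=postgres"
    try:
        with conn.cursor() as cur:
            cur.execute("EXPLAIN (FORMAT JSON) " + sql_query)
            row = cur.fetchone()[0]
    finally:
        conn.close()
    # Depending on the driver version, the JSON column may arrive already parsed.
    return row if isinstance(row, (list, dict)) else json.loads(row)

# Compare the model's generated `plan` string (from the usage example above)
# against the planner's output for the same query on a real schema.
actual = reference_plan("dbname=test user=postgres", "SELECT * FROM users WHERE age > 25;")
print(json.dumps(actual, indent=2))
```

Differences between the two outputs are expected, since the real planner uses schema, index, and statistics information that the model never sees.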
### Recommendations

Users should:

- Validate generated execution plans against actual PostgreSQL EXPLAIN output
- Not rely solely on this model for critical database optimization decisions
- Use this model as a tool for understanding and learning, not as a replacement for actual database query planning
- Be aware that execution plans may vary based on database configuration, schema, and data distribution

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "abharadwaj123/llama3-sql2plan"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prepare input
sql_query = "SELECT * FROM users WHERE age > 25;"
prompt = (
    "Generate the PostgreSQL execution plan in JSON format for the SQL query.\n\n"
    "[QUERY]\n" + sql_query + "\n\n[PLAN]\n"
)

# Generate plan
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

# Extract plan
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
plan = generated_text.split("[PLAN]\n")[-1].strip()
```

## Training Details

### Training Data

The model was trained on a dataset derived from Stack Overflow data (`stackoverflow_n18147.csv`), containing SQL queries and their corresponding PostgreSQL execution plans in JSON format.

- **Total samples in dataset:** 18,147
- **Training samples used:** 10,000 (sampled from the first 90% of the dataset, ~16,332 samples)
- **Sampling method:** Random sampling with `random_state=42`
- **Data format:** SQL query text paired with PostgreSQL execution plan JSON

The training data was filtered to remove rows with missing `sql_text` or `plan_json` values.

### Training Procedure

#### Preprocessing

1. **Data Loading:** Loaded the CSV file and filtered out rows with missing SQL text or plan JSON
2. **Data Splitting:** Used the first 90% of the dataset as the training pool, then randomly sampled 10,000 examples
3. **Formatting:** Each example was formatted with a prompt template:

   ```
   Generate the PostgreSQL execution plan in JSON format for the SQL query.

   [QUERY]
   {sql_text}

   [PLAN]
   {plan_json}
   ```

4. **Tokenization:**
   - Maximum sequence length: 2048 tokens
   - Input prompt tokens were masked in the labels (set to -100) to only train on plan generation
   - Padding to `max_length`

#### Training Hyperparameters

- **Training regime:** FP16 mixed precision training
- **LoRA Configuration:**
  - `r`: 64
  - `lora_alpha`: 32
  - `lora_dropout`: 0.05
  - `target_modules`: ["q_proj", "k_proj", "v_proj", "o_proj"]
- **Training Arguments:**
  - `per_device_train_batch_size`: 2
  - `gradient_accumulation_steps`: 8
  - `effective_batch_size`: 16
  - `learning_rate`: 1e-5
  - `warmup_ratio`: 0.03
  - `num_train_epochs`: 2
  - `gradient_checkpointing`: True
  - `logging_steps`: 20
  - `save_strategy`: "epoch"

#### Testing Data

The remaining 10% of the original dataset (~1,815 samples) was held out and could be used for evaluation.

## Model Examination

The model uses LoRA (Low-Rank Adaptation) fine-tuning, which allows efficient training by only updating a small number of parameters (low-rank matrices) while keeping the base model weights frozen.
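As a concrete illustration of that adapter relationship, the sketch below loads the base model, attaches this repository's LoRA adapter with PEFT, counts the adapter parameters, and optionally merges the adapter into the base weights. It assumes the repository is published as a standard PEFT adapter; the `peft` and `transformers` calls shown are the generic library APIs, not code taken from this project.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the frozen base model, then attach the LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "abharadwaj123/llama3-sql2plan")

# Only the low-rank adapter matrices differ from the base checkpoint.
adapter_params = sum(p.numel() for n, p in model.named_parameters() if "lora_" in n)
print(f"LoRA adapter parameters: {adapter_params:,}")

# Optionally fold the adapter into the base weights for adapter-free deployment.
merged = model.merge_and_unload()
```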
The LoRA approach:

- Reduces memory requirements during training
- Enables faster training compared to full fine-tuning
- Maintains the base model's general capabilities
- Allows easy merging of the adapters with the base model

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Transformer-based decoder-only language model (Llama-3.2-3B)
- **Objective:** Causal language modeling, with prompt tokens masked out of the loss
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation) via PEFT
- **Base Model Parameters:** ~3 billion
- **Trainable Parameters:** Significantly reduced via LoRA (the exact count depends on the LoRA rank and target modules)

## Model Card Authors

Anirudh Bharadwaj

## Model Card Contact

For questions or issues, please reach out through the Hugging Face model repository: [abharadwaj123/llama3-sql2plan](https://huggingface.co/abharadwaj123/llama3-sql2plan)