---
license: apache-2.0
---

# TinyLlama for Text-to-SQL

## Problem Statement

I need a small generative model that can generate SQL code in response to user queries while avoiding any additional commentary. This will help reduce operational costs, increase throughput, and lower latency.

## Solution

### Part 1: Initial Experimentation (Refer to `Run_Tinyllama_Chat.ipynb`)

#### Step 1: Using an Off-the-Shelf Model

I started with the TinyLlama model. Below is an example of the initial request and response:

```
<|system|>
CREATE TABLE head(age INTEGER)</s>
<|user|>
How many heads of the departments are older than 56?</s>
<|assistant|>
I don't have access to the latest data or the current headcount of the departments...
```

The model did not return the expected SQL query, which is understandable given the lack of context.
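
For reference, this `<|system|>`/`<|user|>`/`<|assistant|>` format is TinyLlama's chat template, so the exchange above can be reproduced from the tokenizer rather than hand-built strings. Below is a minimal sketch using the `transformers` pipeline with the public `TinyLlama/TinyLlama-1.1B-Chat-v1.0` checkpoint; the generation settings are illustrative, not the notebook's exact values:

```python
# Minimal sketch: reproduce the exchange above with the transformers pipeline.
# Generation settings are illustrative, not the notebook's exact values.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "CREATE TABLE head(age INTEGER)"},
    {"role": "user", "content": "How many heads of the departments are older than 56?"},
]

# apply_chat_template renders the <|system|>/<|user|>/<|assistant|> turns,
# including the </s> end-of-turn tokens shown above.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
output = pipe(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"][len(prompt):])
```
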
#### Step 2: Prompt Engineering

I attempted prompt engineering by adding more details to the context:

```
<|system|>
You can only reply in SQL query language. Provide only SQL for the user's query given this context --> CREATE TABLE head(age INTEGER)</s>
<|user|>
How many heads of the departments are older than 56?</s>
<|assistant|>
SELECT COUNT(*) FROM head WHERE age > 56
```

The model generated the SQL query but included additional commentary, which I wanted to avoid.

#### Step 3: Further Refinement

Despite additional prompt engineering efforts, the model still produced unwanted explanations:

```
<|assistant|>
To calculate the number of heads of the departments older than 56, you can use the following SQL query:

SELECT COUNT(*) FROM departments WHERE age > 56;

In the above query, "departments" is the name of the table and "age" is the column name...
```

This led me to consider fine-tuning the model.

---

### Part 2: Fine-Tuning the Model

I decided to fine-tune TinyLlama for better SQL-specific responses. Below are the steps to replicate the fine-tuning process.

#### Setup Environment and Run Fine-Tuning Job on RunPod.io

```bash
#!/bin/bash
pip install -q accelerate transformers peft deepspeed bitsandbytes --no-build-isolation
pip install trl==0.9.6
pip install packaging ninja
MAX_JOBS=16 pip install flash-attn==2.6.0.post1 --no-build-isolation
git clone https://github.com/Rajesh-Nair/llm-text2sql-finetuning
cd llm-text2sql-finetuning
accelerate launch --config_file "ds_z3_qlora_config.yaml" train.py run_config.yaml | tee accelerate_output.log
```

#### Key Components of Fine-Tuning

1. **Dataset**: Utilized `b-mc2/sql-create-context` from Hugging Face for fine-tuning. High-quality data is essential for improving model performance.
2. **Accelerate**: Leveraged `accelerate` to enhance training speed and minimize boilerplate code.
3. **Distributed Training**:
   - Deployed across two GPUs on a single node via RunPod.io.
   - Hardware specifications: L4 GPU, PyTorch 2.1, Python 3.10, CUDA 11.8 (Ubuntu image).
4. **QLoRA**:
   - Applied QLoRA for memory-efficient fine-tuning (a configuration sketch follows this list).
   - Configured LoRA with rank-8 matrices for all linear layers.
5. **DeepSpeed ZeRO-3**: Implemented for optimized sharding of optimizer states, gradients, and parameters.
6. **Mixed Precision**: Utilized to accelerate training and improve GPU efficiency.
7. **Batch Size & Gradient Accumulation**:
   - Set batch size per device to 4.
   - Applied gradient accumulation every 2 steps for optimal performance.
   - Increasing batch size beyond this sometimes led to GPU communication bottlenecks.
8. **Gradient Clipping**: Enabled to prevent exploding gradients.
9. **Training Duration & Cost**:
   - Each epoch took approximately 1 hour.
   - Training was stopped after 3 epochs because improvements in training loss had become negligible.
   - Total fine-tuning cost on RunPod: under \$3.
10. **Training Logs**: Captured logs in `accelerate_output.log` for future analysis and reference.
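
The exact hyperparameters live in the repository's `run_config.yaml` and `train.py`. As a hedged illustration of the setup itemized above (4-bit base weights, rank-8 LoRA on all linear layers, per-device batch size 4, gradient accumulation of 2, mixed precision, gradient clipping), the core pieces look roughly like this with `peft` and `trl` 0.9.x; the DeepSpeed/`accelerate` wiring is omitted, and `lora_alpha` and the output path are placeholders:

```python
# Illustrative QLoRA sketch, not the repository's exact train.py.
# Hyperparameters mirror the list above; lora_alpha and output_dir are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

base_model = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# 4-bit quantized base weights (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base_model, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Rank-8 LoRA adapters on all linear layers
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,  # placeholder; the repo's value may differ
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Render each (context, question, answer) triple in TinyLlama's chat format
def to_chat(example):
    messages = [
        {"role": "system", "content": example["context"]},
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = load_dataset("b-mc2/sql-create-context", split="train").map(to_chat)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="tinyllama-text2sql",  # placeholder
        per_device_train_batch_size=4,
        gradient_accumulation_steps=2,
        bf16=True,          # mixed precision
        max_grad_norm=1.0,  # gradient clipping
        num_train_epochs=3,
    ),
)
trainer.train()
```
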
#### Serving the Fine-Tuned Model

Refer to `Run_ft_Tinyllama_Chat.ipynb` for deploying the fine-tuned model.

Example query and response:

```
<|system|>
CREATE TABLE head(age INTEGER)</s>
<|user|>
How many heads of the departments are older than 56?</s>
<|assistant|>
SELECT COUNT(*) FROM head WHERE age > 56
```

The fine-tuned model now returns only the SQL query, as intended.
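
If only the LoRA adapter was saved rather than a merged checkpoint, `peft` can attach it to the base model in one call. A minimal serving sketch, where the adapter directory is a placeholder for whatever the training run produced:

```python
# Minimal serving sketch; the adapter directory is a placeholder for the
# output of the training run, and assumes the tokenizer was saved alongside it.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_path = "tinyllama-text2sql"  # placeholder

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_path, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_path)

messages = [
    {"role": "system", "content": "CREATE TABLE head(age INTEGER)"},
    {"role": "user", "content": "How many heads of the departments are older than 56?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
# Expected output: SELECT COUNT(*) FROM head WHERE age > 56
```
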
---

### Final Model & Deployment

After fine-tuning, I merged the trained adapter with the base model and uploaded the result to Hugging Face. The full code is available here: 🔗 [**TinyLlama-1.1B-Chat-Text2SQL-v1.0**](https://github.com/Rajesh-Nair/llm-text2sql-finetuning)
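
Merging folds the LoRA deltas back into the base weights so the result can be served without `peft` at inference time. A hedged sketch of that step, with a placeholder adapter path and the repo name taken from the link above:

```python
# Sketch of the merge-and-upload step; the adapter path is a placeholder,
# and push_to_hub requires an authenticated Hugging Face session.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_path = "tinyllama-text2sql"  # placeholder: adapter dir from training

model = AutoPeftModelForCausalLM.from_pretrained(adapter_path, torch_dtype=torch.bfloat16)
merged = model.merge_and_unload()  # fold LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained(adapter_path)
merged.push_to_hub("TinyLlama-1.1B-Chat-Text2SQL-v1.0")
tokenizer.push_to_hub("TinyLlama-1.1B-Chat-Text2SQL-v1.0")
```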