---
language:
- en
license: llama3.2
base_model: unsloth/Llama-3.2-3B
tags:
- text-generation
- sql
- distributed-databases
- qlora
- peft
- fine-tuned
- e-commerce
pipeline_tag: text-generation
---
# Llama 3.2 3B — E-commerce Distributed SQL
A fine-tuned version of Llama 3.2 3B that converts natural language questions
into SQL queries for distributed e-commerce databases.
## Example
**Input:**
```
### Instruction:
Convert to distributed SQL
### Input:
Find all customers who spent more than 1000 euros in Germany
### Response:
```
**Output:**
```sql
SELECT * FROM customers
WHERE country = 'Germany' AND amount > 1000;
```
## Model Details
| Property | Value |
|----------|-------|
| Base model | Llama 3.2 3B |
| Fine-tuning method | QLoRA (4-bit quantization + LoRA) |
| LoRA rank | 16 |
| Trainable parameters | 0.14% |
| Training GPU | Google Colab T4 (free tier) |
| Training time | ~20 minutes |
| Dataset size | 25 examples |
| Training epochs | 3 |
## Training Details
Fine-tuned using QLoRA: 4-bit NF4 quantization with LoRA adapters on the
attention projections (`q_proj`, `v_proj`). This cut memory requirements enough
to train on a free Colab T4 GPU (15 GB VRAM) in about 20 minutes, while
updating only 0.14% of the parameters.
**Libraries used:** HuggingFace Transformers, PEFT, TRL (SFTTrainer),
bitsandbytes, datasets
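For reference, a minimal sketch of the setup described above. The rank and target modules match this card; all other hyperparameters (alpha, dropout, compute dtype) are assumptions, not values from this training run:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: dtype not stated in the card
)
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-3B",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA rank 16 on the attention projections (q_proj, v_proj)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,      # assumption: alpha not stated in the card
    lora_dropout=0.05,  # assumption: dropout not stated in the card
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly 0.14% trainable
```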
## Dataset
25 natural language → SQL pairs covering distributed e-commerce scenarios:
- Orders across regions and shards
- Inventory across warehouses
- Customer analytics and segmentation
- Revenue aggregations
- JOIN queries across fragmented tables
**Prompt format used during training:**
```
### Instruction:
Convert to distributed SQL
### Input:
{natural language question}
### Response:
{SQL query}
```
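For illustration, one dataset pair would be rendered into this template as follows. The `format_example` helper below is hypothetical, shown only to make the format concrete:
```python
def format_example(question: str, sql: str) -> str:
    """Render one natural language -> SQL pair into the training prompt format."""
    return (
        "### Instruction:\n"
        "Convert to distributed SQL\n"
        "### Input:\n"
        f"{question}\n"
        "### Response:\n"
        f"{sql}"
    )

print(format_example(
    "Find all customers who spent more than 1000 euros in Germany",
    "SELECT * FROM customers\nWHERE country = 'Germany' AND amount > 1000;",
))
```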
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter
base = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-3B")
model = PeftModel.from_pretrained(base, "haricharanhl22/ecommerce-distributed-sql")
tokenizer = AutoTokenizer.from_pretrained("haricharanhl22/ecommerce-distributed-sql")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Use the same prompt format as during training
query = """### Instruction:
Convert to distributed SQL
### Input:
Find top 5 customers by total order value
### Response:"""

result = pipe(query, max_new_tokens=100, do_sample=False)
# generated_text includes the prompt; print only the completion
print(result[0]["generated_text"][len(query):])
```
## Limitations
- Trained on a small dataset (25 examples) — works best for common query patterns
- Optimized for e-commerce schemas (orders, customers, products, inventory)
- May not generalize well to very complex multi-level nested subqueries
- SQL dialect closest to standard SQL / SQLite
## Author
**Hari Charan Hosakote Lokesh**
M.Sc. Digital Engineering — Otto-von-Guericke-Universität Magdeburg
- GitHub: [haricharanhl22](https://github.com/haricharanhl22)
- LinkedIn: [haricharanhl22](https://linkedin.com/in/haricharanhl22)
- Live project: [ai-bewerbung-assistant.vercel.app](https://ai-bewerbung-assistant.vercel.app)