---
language:
- en
tags:
- sql
- text-to-sql
- daraz
- llama3
- unsloth
- ecommerce
license: apache-2.0
datasets:
- custom
base_model: unsloth/llama-3-8b-bnb-4bit
---

# drz-sql-llama3

This model is a fine-tuned version of Llama 3 (8B) for generating SQL queries specific to the Daraz e-commerce platform.

## Model Description

- **Base Model:** Llama 3 8B (4-bit quantized)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **Training Data:** 20 Daraz-specific SQL query examples
- **Use Case:** Converting natural language questions to SQL queries for Daraz analytics

## Training Details

- **Framework:** Unsloth
- **LoRA Rank:** 16
- **Training Steps:** 100
- **Batch Size:** 2
- **Gradient Accumulation:** 4
- **Learning Rate:** 0.0002
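
The batch size, gradient accumulation, and step count above jointly determine how many times the 20 training examples were seen. The following is a rough arithmetic sketch, assuming a single GPU and that all 20 examples were used for training (neither is stated in this card):

```python
# Hyperparameters as listed in the Training Details section.
per_device_batch_size = 2
gradient_accumulation_steps = 4
training_steps = 100
dataset_size = 20  # number of training examples stated in this card

# Assumption: a single GPU, so there is no data-parallel multiplier.
num_devices = 1

# Examples contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Total examples processed over the whole run, and the implied passes over the data.
examples_seen = effective_batch_size * training_steps
approx_epochs = examples_seen / dataset_size

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples seen: {examples_seen}")
print(f"Approximate epochs: {approx_epochs:.0f}")
```

Under these assumptions the run corresponds to roughly 40 passes over the dataset, with an effective batch size of 8.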

## Key Features

The model was fine-tuned on examples that use Daraz-specific:
- Table schemas (e.g., `daraz_cdm.dwd_drz_trd_core_df`, `daraz_cdm.dwd_drz_prd_sku_extension`)
- Business logic (Choice classification, KAM assignments, industry mapping)
- Query patterns (`MAX_PT` to select the latest partition, `DATEADD` for date filtering)
- Metrics (GMV, L7/L30 calculations, order types)

## Usage

```python
from unsloth import FastLanguageModel

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bilal326/drz-sql-llama3",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

FastLanguageModel.for_inference(model)

# Generate SQL
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

prompt = alpaca_prompt.format(
    "Generate SQL for the following request:",
    "Get total GMV for last 30 days in Pakistan",
    ""
)

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
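# do_sample=True is required for temperature to take effect; without it, decoding is greedy
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.5)
# Drop special tokens such as <|begin_of_text|> from the decoded text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))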
```
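
The decoded output contains the full prompt followed by the model's answer, so the SQL usually has to be separated out before it can be run. A minimal sketch of one way to do that, assuming the Alpaca-style `### Response:` marker from the prompt above; `extract_sql` is a hypothetical helper and not part of Unsloth or this model:

```python
RESPONSE_MARKER = "### Response:"
# Llama 3 end-of-sequence tokens that may remain in the decoded text.
EOS_TOKENS = ("<|end_of_text|>", "<|eot_id|>")


def extract_sql(decoded: str) -> str:
    """Return the text after the last '### Response:' marker, without EOS tokens.

    If the marker is missing, the whole string is returned (stripped).
    """
    # rpartition returns the full string as the tail when the marker is absent.
    text = decoded.rpartition(RESPONSE_MARKER)[2]
    for token in EOS_TOKENS:
        text = text.replace(token, "")
    return text.strip()


# Example with a hand-written string standing in for real model output.
sample = "### Instruction:\nGenerate SQL\n\n### Response:\nSELECT 1;<|end_of_text|>"
print(extract_sql(sample))
```

In the usage snippet, this would be applied to the string returned by `tokenizer.decode`.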

## Example Queries

The model is intended to handle requests such as:
- Simple aggregations: "Get total GMV and orders for last 30 days"
- Complex joins: "Get seller performance with KAM assignments"
- Time-based analysis: "Show monthly GMV trend by industry"
- Advanced logic: "Compare Choice vs Non-Choice GMV in Crossborder"

## Limitations

- Trained specifically for Daraz schema and business logic
- May not generalize to other SQL dialects or schemas
- Requires Daraz-specific tables to be available
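
Because generated queries are only useful when the referenced tables exist, it can help to check table names before executing anything. The following is a rough sketch using a simple regular expression; `unknown_tables` and `KNOWN_TABLES` are hypothetical names, the pattern only catches schema-qualified names after `FROM` or `JOIN`, and it is not a substitute for a real SQL parser:

```python
import re

# Tables named in this card; extend with whatever exists in your warehouse.
KNOWN_TABLES = {
    "daraz_cdm.dwd_drz_trd_core_df",
    "daraz_cdm.dwd_drz_prd_sku_extension",
}

# Matches schema-qualified names after FROM or JOIN (a deliberately simple heuristic).
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE
)


def unknown_tables(sql: str, known: set[str] = KNOWN_TABLES) -> set[str]:
    """Return referenced schema.table names that are not in the known set."""
    referenced = {name.lower() for name in TABLE_PATTERN.findall(sql)}
    return referenced - {name.lower() for name in known}


# 'daraz_cdm.made_up_table' is invented here purely to show a failing check.
generated = (
    "SELECT COUNT(*) FROM daraz_cdm.dwd_drz_trd_core_df t "
    "JOIN daraz_cdm.made_up_table x ON 1 = 1"
)
print(unknown_tables(generated))
```

An empty result means every table the heuristic found is in the known set; anything else is worth reviewing before the query is run.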

## Training Dataset

Custom dataset of 20 SQL query examples covering:
- Revenue and GMV analysis
- Product performance metrics
- Seller segmentation
- Category and brand analysis
- Time-based trends

## Citation

If you use this model, please cite:

```
@misc{drz-sql-llama3,
  author = {Bilal326},
  title = {drz-sql-llama3: Daraz SQL Generation Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Bilal326/drz-sql-llama3}
}
```

## Acknowledgments

- Built with [Unsloth](https://github.com/unslothai/unsloth)
- Based on Meta's Llama 3
- Fine-tuned for Daraz e-commerce analytics