---
language:
- en
license: apache-2.0
base_model: LiquidAI/LFM2.5-350M
pipeline_tag: text-generation
tags:
- text-to-sql
- liquid-ai
- lfm
- unsloth
- qlora
- synthetic-data
- database
datasets:
- gretelai/synthetic_text_to_sql
metrics:
- loss
model-index:
- name: SS-350M-SQL-Strict
  results: []
---
# **Model Card: SS-350M-SQL-Strict**
## **Model Summary**
**SS-350M-SQL-Strict** is a lightweight LLM fine-tuned for a single task: **Text-to-SQL translation**. Built on **LiquidAI LFM2.5-350M**, it follows a "Strict" output protocol: it generates **only** raw SQL, without the conversational filler, Markdown code fences, or explanations that general-purpose models typically add.
The model was fine-tuned with **4-bit QLoRA** and **Unsloth** optimizations. At 350M parameters, it targets high-speed, low-latency SQL generation for edge deployment and resource-constrained environments.
---
## **Model Details**
- **Developed by:** Saad Salman
- **Architecture:** Liquid Foundation Model (LFM) 2.5
- **Parameters:** 350 Million
- **Quantization:** 4-bit (bitsandbytes)
- **Fine-tuning Method:** QLoRA
- **Primary Task:** Natural Language to SQL (Strict)
---
## **Training Logic & Parameters**
The model was trained with a custom pipeline that enforces strict code generation. The key differentiator is **completion-only loss masking**: the prompt tokens (system message, schema, and question) are excluded from the loss, so gradient updates come only from the SQL completion tokens rather than from reproducing the prompt.
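The masking idea can be illustrated with a minimal sketch. It is not the author's training code: the helper name `mask_prompt_labels` and the token ids are hypothetical, and the only external fact it relies on is that `-100` is the label value PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, ignoring the first `prompt_len` positions."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


# Hypothetical token ids: 4 prompt tokens followed by 3 SQL completion tokens.
input_ids = [101, 7, 8, 9, 501, 502, 503]
labels = mask_prompt_labels(input_ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 501, 502, 503]
```

Only the last three positions contribute to the loss, which is what "completion-only" means here.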
### **Hyperparameters**
| Parameter | Value | Description |
| :--- | :--- | :--- |
| **Max Steps** | 800 | Stopping point chosen for the 350M-parameter model |
| **Learning Rate** | 2e-4 | High enough for rapid logic acquisition |
| **Batch Size** | 16 | Effective batch size: 4 per device × 4 gradient-accumulation steps |
| **Rank (r)** | 32 | High rank to capture complex SQL logic |
| **Alpha** | 32 | Scaling factor for LoRA weights |
| **Optimizer** | AdamW 8-bit | Memory-efficient optimization |
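As a quick sanity check, the arithmetic implied by the table can be worked through as below. The single-device assumption (`num_devices = 1`) is not stated in the card, and the `alpha / r` ratio is the standard LoRA scaling convention rather than something the card confirms.

```python
# Values taken from the hyperparameter table.
per_device_batch = 4
grad_accum_steps = 4
max_steps = 800
lora_r, lora_alpha = 32, 32

# Assumption: training ran on a single device; the card does not say.
num_devices = 1

effective_batch = per_device_batch * grad_accum_steps * num_devices
examples_seen = effective_batch * max_steps
lora_scale = lora_alpha / lora_r  # standard LoRA scaling of the adapter update

print(effective_batch)  # 16
print(examples_seen)    # 12800
print(lora_scale)       # 1.0
```

Under these assumptions the run processes 12,800 training examples, and the adapter update is applied with a scaling factor of 1.0.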
### **Training Curve Analysis**
The training loss followed a classic "L-shaped" curve: it started at ~38.1, dropped quickly, and plateaued between **8.0 and 11.0**. The plateau suggests that the model has learned the ChatML structure and the SQL schema-mapping logic.
---
## **Prompting Specification (ChatML)**
To ensure the "Strict" behavior, you **must** use the following ChatML format. Failure to use this format may result in hallucinated text.
### **Template**
```text
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|>
<|im_start|>user
{YOUR_QUESTION}<|im_end|>
<|im_start|>assistant
```
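A small helper can fill this template programmatically. The sketch below is one possible approach: the function name `build_prompt` is hypothetical, and the trailing newline after the `assistant` tag is an assumption about where generation should begin.

```python
SYSTEM_TEMPLATE = (
    "You are a SQL translation engine. Return ONLY raw SQL. Schema: {schema}"
)


def build_prompt(schema: str, question: str) -> str:
    """Fill the ChatML template with a schema and a natural-language question."""
    system = SYSTEM_TEMPLATE.format(schema=schema)
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        # Assumption: generation starts on the line after the assistant tag.
        "<|im_start|>assistant\n"
    )


prompt = build_prompt(
    schema="Table 'orders' (id, price, status, created_at)",
    question="Find the average price of all 'completed' orders.",
)
print(prompt)
```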
### **Example Input**
```text
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>
<|im_start|>user
Find the average price of all 'completed' orders.<|im_end|>
<|im_start|>assistant
```
### **Example Output**
```sql
SELECT AVG(price) FROM orders WHERE status = 'completed';
```
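One way to sanity-check a generated query is to run it against a throwaway in-memory SQLite database built from the same schema. The sketch below uses only Python's standard `sqlite3` module. The column types and the three sample rows are illustrative assumptions, not data from the training set.

```python
import sqlite3

# The example output from this card.
generated_sql = "SELECT AVG(price) FROM orders WHERE status = 'completed';"

conn = sqlite3.connect(":memory:")
# Column types are assumed; the card only lists column names.
conn.execute(
    "CREATE TABLE orders (id INTEGER, price REAL, status TEXT, created_at TEXT)"
)
# Hypothetical sample rows for illustration.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, "completed", "2024-01-01"),
        (2, 30.0, "completed", "2024-01-02"),
        (3, 99.0, "pending", "2024-01-03"),
    ],
)
avg_price = conn.execute(generated_sql).fetchone()[0]
conn.close()

print(avg_price)  # 20.0
```

A query that fails to parse or references a missing column raises `sqlite3.OperationalError`, which makes this a cheap first filter before running generated SQL against a real database.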
---
## **Training Dataset**
The model was trained on the **Gretel Synthetic SQL** dataset (`gretelai/synthetic_text_to_sql`). This dataset is designed to cover:
* Complex joins and subqueries.
* Diverse industry domains (Finance, Retail, Tech).
* Correct handling of `GROUP BY`, `ORDER BY`, and `HAVING` clauses.
---
## **Technical Limitations**
* **Schema Size:** Best suited for schemas with fewer than 20 tables.
* **Dialect:** Defaults to standard SQL.
* **Reasoning:** The model does not "explain" its code; it is a direct translation engine.
---
## **How to Use with Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "saadxsalman/SS-350M-SQL-Strict"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
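# Build the ChatML prompt from the Prompting Specification, then decode only the newly generated tokens.
prompt = "<|im_start|>system\nYou are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>\n<|im_start|>user\nFind the average price of all 'completed' orders.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)  # 128 is an illustrative cap
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))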
```
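If the decoded text still contains the prompt or ChatML markers (for example, when special tokens are not skipped during decoding), a small post-processing step can isolate the SQL. The helper name `extract_sql` and the sample string below are hypothetical; this is a sketch, not part of the model's API.

```python
def extract_sql(decoded: str) -> str:
    """Return the text of the last assistant turn, without ChatML markers."""
    marker = "<|im_start|>assistant"
    if marker in decoded:
        # Keep only what follows the final assistant tag.
        decoded = decoded.rsplit(marker, 1)[1]
    # Drop the end-of-turn marker and anything after it.
    return decoded.split("<|im_end|>", 1)[0].strip()


# Hypothetical decoded text that still includes the prompt and markers.
raw = (
    "<|im_start|>user\nFind the average price of all 'completed' orders.<|im_end|>\n"
    "<|im_start|>assistant\n"
    "SELECT AVG(price) FROM orders WHERE status = 'completed';<|im_end|>"
)
sql = extract_sql(raw)
print(sql)  # SELECT AVG(price) FROM orders WHERE status = 'completed';
```

Text that contains no markers passes through unchanged apart from whitespace trimming, so the helper is safe to apply to already-clean output.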