---
language:
- en
license: apache-2.0
base_model: LiquidAI/LFM2.5-350M
tags:
- text-to-sql
- liquid-ai
- lfm
- unsloth
- qlora
- synthetic-data
- database
datasets:
- gretelai/synthetic_text_to_sql
metrics:
- loss
model-index:
- name: SS-350M-SQL-Strict
  results: []
---

# **Model Card: SS-350M-SQL-Strict**

## **Model Summary**

**SS-350M-SQL-Strict** is a specialized, lightweight LLM fine-tuned for a single task: **Text-to-SQL translation**. Built on the **LiquidAI LFM2.5-350M** architecture, the model follows a "Strict" output protocol: it generates **only** raw SQL, with none of the conversational filler, Markdown fences, or explanations typical of general-purpose models.

By leveraging **4-bit QLoRA** and **Unsloth** optimizations, it offers fast, low-latency SQL generation suited to edge deployment and resource-constrained environments.

---

## **Model Details**
- **Developed by:** Saad Salman
- **Architecture:** Liquid Foundation Model (LFM) 2.5
- **Parameters:** 350 million
- **Quantization:** 4-bit (bitsandbytes)
- **Fine-tuning Method:** QLoRA
- **Primary Task:** Natural Language to SQL (Strict)

---
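As a toy illustration of what 4-bit storage means in practice (a uniform 16-level quantizer for intuition only; bitsandbytes actually uses the non-uniform NF4 scheme, and `quantize_4bit` is a hypothetical helper, not part of this repository):

```python
# Toy 4-bit quantizer: every weight is snapped to one of 2**4 = 16 evenly
# spaced levels in [-1, 1]. The real NF4 format uses non-uniform levels.
def quantize_4bit(w: float, lo: float = -1.0, hi: float = 1.0) -> float:
    levels = 16                      # 4 bits -> 16 representable values
    step = (hi - lo) / (levels - 1)
    w = min(max(w, lo), hi)          # clamp into the representable range
    idx = round((w - lo) / step)     # index of the nearest level (0..15)
    return lo + idx * step

# A 350M-parameter model stored at 4 bits needs ~175 MB for weights
# (0.5 bytes per parameter) instead of ~700 MB at fp16.
print(quantize_4bit(0.1337))
```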

## **Training Logic & Parameters**
The model was trained with a custom pipeline designed to enforce strict code generation. The key differentiator is **completion-only loss masking**: prompt tokens are excluded from the loss, so the model spends none of its capacity memorizing the prompt structure and all of it learning the SQL completion.

### **Hyperparameters**
| Parameter | Value | Description |
| :--- | :--- | :--- |
| **Max Steps** | 800 | Convergence point observed for 350M params |
| **Learning Rate** | 2e-4 | High enough for rapid logic acquisition |
| **Effective Batch Size** | 16 | 4 per device × 4 gradient-accumulation steps |
| **Rank (r)** | 32 | High rank to capture complex SQL logic |
| **Alpha** | 32 | Scaling factor for LoRA weights |
| **Optimizer** | AdamW 8-bit | Memory-efficient optimization |
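The masking idea can be sketched in isolation (a simplified illustration; `mask_prompt_labels` is a hypothetical helper, not the actual training code):

```python
# Completion-only loss masking: label every prompt token as -100, the index
# that cross-entropy losses in PyTorch/Transformers ignore, so gradients come
# exclusively from the assistant's SQL completion.
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids: list[int], response_start: int) -> list[int]:
    """Return labels equal to input_ids, with tokens before response_start masked."""
    return [
        IGNORE_INDEX if i < response_start else tok
        for i, tok in enumerate(input_ids)
    ]

# Tokens 0-4 are the ChatML prompt; loss is computed only on tokens 5-7 (SQL).
print(mask_prompt_labels([101, 7, 8, 9, 102, 55, 56, 57], response_start=5))
# → [-100, -100, -100, -100, -100, 55, 56, 57]
```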

### **Training Curve Analysis**
The model showed a classic "L-shaped" convergence curve: loss started at ~38.1 and plateaued between **8.0 and 11.0**. This plateau indicates the model has internalized both the ChatML structure and the schema-to-SQL mapping.

---

## **Prompting Specification (ChatML)**
To get the "Strict" behavior, you **must** use the following ChatML format; deviating from it may produce hallucinated text.

### **Template**
```text
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|>
<|im_start|>user
{YOUR_QUESTION}<|im_end|>
<|im_start|>assistant
```

### **Example Input**
```text
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>
<|im_start|>user
Find the average price of all 'completed' orders.<|im_end|>
<|im_start|>assistant
```

### **Example Output**
```sql
SELECT AVG(price) FROM orders WHERE status = 'completed';
```
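The template above can also be assembled programmatically (a minimal sketch; `build_prompt` is a hypothetical helper, not part of this repository):

```python
# Assemble the strict ChatML prompt from a schema description and a question.
def build_prompt(schema: str, question: str) -> str:
    system = f"You are a SQL translation engine. Return ONLY raw SQL. Schema: {schema}"
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "Table 'orders' (id, price, status, created_at)",
    "Find the average price of all 'completed' orders.",
)
print(prompt)
```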

---

## **Training Dataset**
The model was trained on the **Gretel Synthetic SQL** dataset (`gretelai/synthetic_text_to_sql`), which is designed to cover:
* Complex joins and subqueries.
* Diverse industry domains (finance, retail, tech).
* Correct handling of `GROUP BY`, `ORDER BY`, and `HAVING` clauses.

---

## **Technical Limitations**
* **Schema size:** Best suited to schemas with fewer than 20 tables.
* **Dialect:** Defaults to standard SQL; vendor-specific dialects are not guaranteed.
* **Reasoning:** The model does not "explain" its code; it is a direct translation engine.

---

## **How to Use with Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "saadxsalman/SS-350M-SQL-Strict"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

# Build the strict ChatML prompt and generate raw SQL.
prompt = (
    "<|im_start|>system\nYou are a SQL translation engine. Return ONLY raw SQL. "
    "Schema: Table 'orders' (id, price, status, created_at)<|im_end|>\n"
    "<|im_start|>user\nFind the average price of all 'completed' orders.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```