---
language:
- en
license: apache-2.0
base_model: LiquidAI/LFM2.5-350M
tags:
- text-to-sql
- liquid-ai
- lfm
- unsloth
- qlora
- synthetic-data
- database
datasets:
- gretelai/synthetic_text_to_sql
metrics:
- loss
model-index:
- name: SS-350M-SQL-Strict
  results: []
---

# **Model Card: SS-350M-SQL-Strict**

## **Model Summary**
**SS-350M-SQL-Strict** is a specialized, lightweight LLM fine-tuned for a single task: **text-to-SQL translation**. Built on the **LiquidAI LFM2.5-350M** architecture, the model follows a "Strict" output protocol: it generates **only** raw SQL code, eliminating the conversational filler, Markdown blocks, and explanations typically found in general-purpose models.

By leveraging **4-bit QLoRA** and **Unsloth** optimizations, the model provides high-speed, low-latency SQL generation suitable for edge deployment and resource-constrained environments.

---

## **Model Details**
- **Developed by:** Saad Salman
- **Architecture:** Liquid Foundation Model (LFM) 2.5
- **Parameters:** 350 million
- **Quantization:** 4-bit (bitsandbytes)
- **Fine-tuning Method:** QLoRA
- **Primary Task:** Natural language to SQL (strict)

---

## **Training Logic & Parameters**
The model was trained with a custom pipeline that enforces strict code generation. The key differentiator is **completion-only loss masking**: prompt tokens are excluded from the loss, so the model's learning capacity is spent entirely on generating SQL rather than on reproducing the prompt structure.

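The masking idea can be sketched in plain Python. This helper is illustrative, not the actual training code; the `-100` ignore index follows the PyTorch/Hugging Face convention for labels that the loss function skips.

```python
IGNORE_INDEX = -100  # label value that CrossEntropyLoss-style losses ignore

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the prompt span so the loss
    is computed only on the completion (the SQL) tokens."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Toy sequence: 4 prompt tokens followed by 3 completion tokens.
print(mask_prompt_labels([101, 102, 103, 104, 7, 8, 9], 4))
# [-100, -100, -100, -100, 7, 8, 9]
```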
### **Hyperparameters**
| Parameter | Value | Description |
| :--- | :--- | :--- |
| **Max Steps** | 800 | Convergence point observed for 350M parameters |
| **Learning Rate** | 2e-4 | High enough for rapid task acquisition |
| **Effective Batch Size** | 16 | 4 per device × 4 gradient-accumulation steps |
| **LoRA Rank (r)** | 32 | Higher rank to capture complex SQL logic |
| **LoRA Alpha** | 32 | Scaling factor for the LoRA weights |
| **Optimizer** | AdamW 8-bit | Memory-efficient optimization |

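Two derived quantities in the table are worth making explicit; the arithmetic below simply restates the table values.

```python
# LoRA applies W + (alpha / r) * (B @ A); with alpha == r, the update scale is 1.0.
lora_rank, lora_alpha = 32, 32
print(lora_alpha / lora_rank)  # 1.0

# Effective batch size = per-device batch * gradient-accumulation steps.
per_device_batch, grad_accum_steps = 4, 4
print(per_device_batch * grad_accum_steps)  # 16
```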
### **Training Curve Analysis**
The loss followed a classic "L-shaped" convergence curve: it started at roughly 38.1 and plateaued between **8.0 and 11.0**. The plateau indicates that the model has internalized the ChatML structure and the schema-to-SQL mapping.

---

## **Prompting Specification (ChatML)**
To ensure the "Strict" behavior, you **must** use the following ChatML format. Failure to use this format may result in hallucinated text.

### **Template**
```text
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|>
<|im_start|>user
{YOUR_QUESTION}<|im_end|>
<|im_start|>assistant
```
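If you assemble the prompt by hand rather than through the tokenizer's chat template, a small helper keeps the format exact (the `build_prompt` function below is illustrative, not part of this repository):

```python
def build_prompt(schema: str, question: str) -> str:
    """Fill the strict ChatML template with a schema description and a question."""
    return (
        "<|im_start|>system\n"
        f"You are a SQL translation engine. Return ONLY raw SQL. Schema: {schema}<|im_end|>\n"
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_prompt("Table 'orders' (id, price, status, created_at)",
                      "Find the average price of all 'completed' orders.")
print(prompt)
```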

### **Example Input**
```text
<|im_start|>system
You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|>
<|im_start|>user
Find the average price of all 'completed' orders.<|im_end|>
<|im_start|>assistant
```

### **Example Output**
```sql
SELECT AVG(price) FROM orders WHERE status = 'completed';
```
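Because the model emits bare SQL, its output can be fed straight to a database driver. A quick sanity check of the example query against an in-memory SQLite table (the rows below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, price REAL, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 10.0, "completed", "2024-01-01"),
     (2, 30.0, "completed", "2024-01-02"),
     (3, 99.0, "pending", "2024-01-03")],
)
avg_price = conn.execute("SELECT AVG(price) FROM orders WHERE status = 'completed'").fetchone()[0]
print(avg_price)  # 20.0
```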

---

## **Training Dataset**
The model was trained on the **Gretel synthetic text-to-SQL** dataset (`gretelai/synthetic_text_to_sql`), which is designed to cover:
* Complex joins and subqueries.
* Diverse industry domains (finance, retail, tech).
* Correct handling of `GROUP BY`, `ORDER BY`, and `HAVING` clauses.

---

## **Technical Limitations**
* **Schema size:** Best suited for schemas with fewer than 20 tables.
* **Dialect:** Defaults to standard SQL.
* **Reasoning:** The model does not explain its code; it is a direct translation engine.

---

## **How to Use with Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "saadxsalman/SS-350M-SQL-Strict"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

messages = [
    {"role": "system", "content": "You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)"},
    {"role": "user", "content": "Find the average price of all 'completed' orders."},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```