| --- |
| language: |
| - en |
| license: apache-2.0 |
| base_model: LiquidAI/LFM2.5-350M |
| pipeline_tag: text-generation |
| tags: |
| - text-to-sql |
| - liquid-ai |
| - lfm |
| - unsloth |
| - qlora |
| - synthetic-data |
| - database |
| datasets: |
| - gretelai/synthetic_text_to_sql |
| metrics: |
| - loss |
| model-index: |
| - name: SS-350M-SQL-Strict |
| results: [] |
| --- |
| |
| # **Model Card: SS-350M-SQL-Strict** |
|
|
| ## **Model Summary** |
| **SS-350M-SQL-Strict** is a specialized, lightweight LLM fine-tuned for the singular task of **Text-to-SQL translation**. Built upon the **LiquidAI LFM2.5-350M** architecture, this model has been engineered to follow a "Strict" output protocol: it generates **only** raw SQL code, eliminating the conversational filler, Markdown blocks, and explanations typically found in general-purpose models. |
|
|
| Fine-tuned with **4-bit QLoRA** and **Unsloth**, the model is small enough for low-latency SQL generation on edge devices and in other resource-constrained environments. |
|
|
| --- |
|
|
| ## **Model Details** |
| - **Developed by:** Saad Salman |
| - **Architecture:** Liquid Foundation Model (LFM) 2.5 |
| - **Parameters:** 350 Million |
| - **Quantization:** 4-bit (bitsandbytes) |
| - **Fine-tuning Method:** QLoRA |
| - **Primary Task:** Natural Language to SQL (Strict) |
|
|
| --- |
|
|
| ## **Training Logic & Parameters** |
| The model was trained with a custom pipeline designed to enforce strict code generation. The key differentiator is **completion-only loss masking**: the loss is computed only on the assistant's SQL tokens, so the prompt tokens (system message, schema, and question) contribute no gradient and training focuses on producing the SQL itself. |
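
Completion-only masking is commonly implemented by setting the label of every prompt token to an ignore index so that the cross-entropy loss skips it. The sketch below illustrates the idea with plain Python lists. The token IDs are made up, the helper name `mask_prompt_labels` is hypothetical, and `-100` is the default `ignore_index` of PyTorch's cross-entropy loss rather than a detail confirmed by this card's training pipeline.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, prompt_len):
    """Return labels where the first `prompt_len` tokens are excluded from the loss."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must be within the sequence length")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy sequence (made-up IDs): 5 prompt tokens followed by 3 SQL completion tokens.
input_ids = [11, 12, 13, 14, 15, 901, 902, 903]
labels = mask_prompt_labels(input_ids, prompt_len=5)
print(labels)  # [-100, -100, -100, -100, -100, 901, 902, 903]
```

Only the last three positions carry real labels, so only the SQL tokens would contribute to the loss.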
|
|
| ### **Hyperparameters** |
| | Parameter | Value | Description | |
| | :--- | :--- | :--- | |
| | **Max Steps** | 800 | Training length; the loss had plateaued by this point | |
| | **Learning Rate** | 2e-4 | High enough for rapid logic acquisition | |
| | **Batch Size** | 16 | Effective size: 4 per device × 4 gradient-accumulation steps | |
| | **Rank (r)** | 32 | High rank to capture complex SQL logic | |
| | **Alpha** | 32 | Scaling factor for LoRA weights | |
| | **Optimizer** | AdamW 8-bit | Memory-efficient optimization | |
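
The derived quantities implied by this table can be checked with a few lines of arithmetic. The sketch below assumes a single training device and the standard LoRA scaling of `alpha / r`. Neither is stated explicitly in this card, and the variable names are illustrative.

```python
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-GPU training
max_steps = 800
lora_r = 32
lora_alpha = 32

# Sequences contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
# Total training sequences processed over the whole run.
sequences_processed = effective_batch_size * max_steps
# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size)  # 16
print(sequences_processed)   # 12800
print(lora_scaling)          # 1.0
```

With `alpha` equal to `r`, the LoRA update is applied at a scale of 1.0 under this convention.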
|
|
| ### **Training Curve Analysis** |
| The training loss followed a classic "L-shaped" curve: it started at ~38.1, dropped quickly, and plateaued between **8.0 and 11.0**. The plateau suggests the model has learned the ChatML structure and the SQL schema-mapping patterns in the training data, although training loss alone does not measure the correctness of generated queries. |
|
|
| --- |
|
|
| ## **Prompting Specification (ChatML)** |
| To get the "Strict" behavior, you **must** use the following ChatML format. Other prompt formats may cause the model to emit non-SQL or hallucinated text. |
|
|
| ### **Template** |
| ```text |
| <|im_start|>system |
| You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<|im_end|> |
| <|im_start|>user |
| {YOUR_QUESTION}<|im_end|> |
| <|im_start|>assistant |
| ``` |
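
The template above can be filled programmatically. The following is a minimal sketch, not an official helper: the function name `build_prompt` and the constant `SYSTEM_PREFIX` are hypothetical, and the trailing newline after the `assistant` tag is an assumption based on common ChatML usage.

```python
SYSTEM_PREFIX = "You are a SQL translation engine. Return ONLY raw SQL. Schema: "


def build_prompt(schema: str, question: str) -> str:
    """Fill the ChatML template with a schema description and a user question."""
    return (
        f"<|im_start|>system\n{SYSTEM_PREFIX}{schema}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        "<|im_start|>assistant\n"  # assumption: newline after the role tag
    )


prompt = build_prompt(
    schema="Table 'orders' (id, price, status, created_at)",
    question="Find the average price of all 'completed' orders.",
)
print(prompt)
```

Keeping the system text in one constant makes it harder to drift from the wording the model was trained on.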
|
|
| ### **Example Input** |
| ```text |
| <|im_start|>system |
| You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<|im_end|> |
| <|im_start|>user |
| Find the average price of all 'completed' orders.<|im_end|> |
| <|im_start|>assistant |
| ``` |
|
|
| ### **Example Output** |
| ```sql |
| SELECT AVG(price) FROM orders WHERE status = 'completed'; |
| ``` |
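
Because the output is raw SQL, it can be checked before it reaches a real database. The sketch below uses Python's built-in `sqlite3` module to compile a query against an empty in-memory copy of the schema. It is only an illustration: the helper name `is_valid_sql` is hypothetical, the column types are assumed (the example schema lists only column names), and SQLite's dialect may accept or reject syntax differently from your target database.

```python
import sqlite3


def is_valid_sql(query: str, schema_ddl: str) -> bool:
    """Return True if `query` compiles against an empty in-memory copy of the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute(f"EXPLAIN {query}")  # compiles the statement without running it
        return True
    except (sqlite3.Error, sqlite3.Warning):
        return False
    finally:
        conn.close()


# Assumed column types; the example schema only names the columns.
ORDERS_DDL = "CREATE TABLE orders (id INTEGER, price REAL, status TEXT, created_at TEXT);"

print(is_valid_sql("SELECT AVG(price) FROM orders WHERE status = 'completed';", ORDERS_DDL))  # True
print(is_valid_sql("Sure! Here is your query: SELECT 1;", ORDERS_DDL))  # False
```

A check like this catches conversational filler and references to missing tables or columns, but it does not confirm that the query answers the question.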
|
|
| --- |
|
|
| ## **Training Dataset** |
| The model was trained on the **Gretel Synthetic SQL** dataset (`gretelai/synthetic_text_to_sql`). This dataset is designed to cover: |
| * Complex joins and subqueries. |
| * Diverse industry domains (Finance, Retail, Tech). |
| * Correct handling of `GROUP BY`, `ORDER BY`, and `HAVING` clauses. |
|
|
| --- |
|
|
| ## **Technical Limitations** |
| * **Schema Size:** Best suited for schemas with fewer than 20 tables. |
| * **Dialect:** Generates standard SQL by default. |
| * **Reasoning:** The model does not "explain" its code; it is a direct translation engine. |
|
|
| --- |
|
|
| ## **How to Use with Transformers** |
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model_path = "saadxsalman/SS-350M-SQL-Strict" |
| tokenizer = AutoTokenizer.from_pretrained(model_path) |
| model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto") |
| |
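| # Build the ChatML prompt from the template above; decode only the newly generated tokens. |
| system = "You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)" |
| question = "Find the average price of all 'completed' orders." |
| prompt = f"<|im_start|>system\n{system}<|im_end|>\n<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n" |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) |
| output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False) |
| print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)) |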
| ``` |