Update README.md

904ccb7 verified 9 days ago

3.89 kB

	---
	language:
	- en
	license: apache-2.0
	base_model: LiquidAI/LFM2.5-350M
	pipeline_tag: text-generation
	tags:
	- text-to-sql
	- liquid-ai
	- lfm
	- unsloth
	- qlora
	- synthetic-data
	- database
	datasets:
	- gretelai/synthetic_text_to_sql
	metrics:
	- loss
	model-index:
	- name: SS-350M-SQL-Strict
	results: []
	---

	# Model Card: SS-350M-SQL-Strict

	## Model Summary
	SS-350M-SQL-Strict is a specialized, lightweight LLM fine-tuned for the singular task of Text-to-SQL translation. Built upon the LiquidAI LFM2.5-350M architecture, this model has been engineered to follow a "Strict" output protocol: it generates only raw SQL code, eliminating the conversational filler, Markdown blocks, and explanations typically found in general-purpose models.

	By leveraging 4-bit QLoRA and Unsloth optimizations, this model provides high-speed, low-latency SQL generation suitable for edge deployment and resource-constrained environments.

	---

	## Model Details
	- Developed by: Saad Salman
	- Architecture: Liquid Foundation Model (LFM) 2.5
	- Parameters: 350 Million
	- Quantization: 4-bit (bitsandbytes)
	- Fine-tuning Method: QLoRA
	- Primary Task: Natural Language to SQL (Strict)

	---

	## Training Logic & Parameters
	The model was trained using a custom pipeline to enforce strict code generation. The key differentiator is the use of Completion-Only Loss masking, which prevents the model from wasting weights on learning the prompt structure, focusing 100% of its learning capacity on the SQL syntax.

	### Hyperparameters
	\| Parameter \| Value \| Description \|
	\| :--- \| :--- \| :--- \|
	\| Max Steps \| 800 \| Optimal convergence point for 350M params \|
	\| Learning Rate \| 2e-4 \| High enough for rapid logic acquisition \|
	\| Batch Size \| 16 \| (4 per device with 4 grad accumulation) \|
	\| Rank (r) \| 32 \| High rank to capture complex SQL logic \|
	\| Alpha \| 32 \| Scaling factor for LoRA weights \|
	\| Optimizer \| AdamW 8-bit \| Memory-efficient optimization \|

	### Training Curve Analysis
	The model demonstrated a classic "L-shaped" convergence curve. Initial loss started at ~38.1 and successfully plateaued between 8.0 and 11.0. This plateau indicates the model has fully internalized the ChatML structure and the SQL schema-mapping logic.

	---

	## Prompting Specification (ChatML)
	To ensure the "Strict" behavior, you must use the following ChatML format. Failure to use this format may result in hallucinated text.

	### Template
	```text
	<\|im_start\|>system
	You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<\|im_end\|>
	<\|im_start\|>user
	{YOUR_QUESTION}<\|im_end\|>
	<\|im_start\|>assistant
	```

	### Example Input
	```text
	<\|im_start\|>system
	You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<\|im_end\|>
	<\|im_start\|>user
	Find the average price of all 'completed' orders.<\|im_end\|>
	<\|im_start\|>assistant
	```

	### Example Output
	```sql
	SELECT AVG(price) FROM orders WHERE status = 'completed';
	```

	---

	## Training Dataset
	The model was trained on the Gretel Synthetic SQL dataset. This dataset is designed to cover:
	* Complex joins and subqueries.
	* Diverse industry domains (Finance, Retail, Tech).
	* Correct handling of `GROUP BY`, `ORDER BY`, and `HAVING` clauses.

	---

	## Technical Limitations
	* Schema Size: Best suited for schemas with < 20 tables.
	* Dialect: Defaulted to standard SQL.
	* Reasoning: The model does not "explain" its code; it is a direct translation engine.

	---

	## How to Use with Transformers
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_path = "saadxsalman/SS-350M-SQL-Strict"
	tokenizer = AutoTokenizer.from_pretrained(model_path)
	model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

	# Ready for inference!
	```

	---
	language:
	- en
	license: apache-2.0
	base_model: LiquidAI/LFM2.5-350M
	pipeline_tag: text-generation
	tags:
	- text-to-sql
	- liquid-ai
	- lfm
	- unsloth
	- qlora
	- synthetic-data
	- database
	datasets:
	- gretelai/synthetic_text_to_sql
	metrics:
	- loss
	model-index:
	- name: SS-350M-SQL-Strict
	results: []
	---

	# Model Card: SS-350M-SQL-Strict

	## Model Summary
	SS-350M-SQL-Strict is a specialized, lightweight LLM fine-tuned for the singular task of Text-to-SQL translation. Built upon the LiquidAI LFM2.5-350M architecture, this model has been engineered to follow a "Strict" output protocol: it generates only raw SQL code, eliminating the conversational filler, Markdown blocks, and explanations typically found in general-purpose models.

	By leveraging 4-bit QLoRA and Unsloth optimizations, this model provides high-speed, low-latency SQL generation suitable for edge deployment and resource-constrained environments.

	---

	## Model Details
	- Developed by: Saad Salman
	- Architecture: Liquid Foundation Model (LFM) 2.5
	- Parameters: 350 Million
	- Quantization: 4-bit (bitsandbytes)
	- Fine-tuning Method: QLoRA
	- Primary Task: Natural Language to SQL (Strict)

	---

	## Training Logic & Parameters
	The model was trained using a custom pipeline to enforce strict code generation. The key differentiator is the use of Completion-Only Loss masking, which prevents the model from wasting weights on learning the prompt structure, focusing 100% of its learning capacity on the SQL syntax.

	### Hyperparameters
	\| Parameter \| Value \| Description \|
	\| :--- \| :--- \| :--- \|
	\| Max Steps \| 800 \| Optimal convergence point for 350M params \|
	\| Learning Rate \| 2e-4 \| High enough for rapid logic acquisition \|
	\| Batch Size \| 16 \| (4 per device with 4 grad accumulation) \|
	\| Rank (r) \| 32 \| High rank to capture complex SQL logic \|
	\| Alpha \| 32 \| Scaling factor for LoRA weights \|
	\| Optimizer \| AdamW 8-bit \| Memory-efficient optimization \|

	### Training Curve Analysis
	The model demonstrated a classic "L-shaped" convergence curve. Initial loss started at ~38.1 and successfully plateaued between 8.0 and 11.0. This plateau indicates the model has fully internalized the ChatML structure and the SQL schema-mapping logic.

	---

	## Prompting Specification (ChatML)
	To ensure the "Strict" behavior, you must use the following ChatML format. Failure to use this format may result in hallucinated text.

	### Template
	```text
	<\|im_start\|>system
	You are a SQL translation engine. Return ONLY raw SQL. Schema: {YOUR_SCHEMA}<\|im_end\|>
	<\|im_start\|>user
	{YOUR_QUESTION}<\|im_end\|>
	<\|im_start\|>assistant
	```

	### Example Input
	```text
	<\|im_start\|>system
	You are a SQL translation engine. Return ONLY raw SQL. Schema: Table 'orders' (id, price, status, created_at)<\|im_end\|>
	<\|im_start\|>user
	Find the average price of all 'completed' orders.<\|im_end\|>
	<\|im_start\|>assistant
	```

	### Example Output
	```sql
	SELECT AVG(price) FROM orders WHERE status = 'completed';
	```

	---

	## Training Dataset
	The model was trained on the Gretel Synthetic SQL dataset. This dataset is designed to cover:
	* Complex joins and subqueries.
	* Diverse industry domains (Finance, Retail, Tech).
	* Correct handling of `GROUP BY`, `ORDER BY`, and `HAVING` clauses.

	---

	## Technical Limitations
	* Schema Size: Best suited for schemas with < 20 tables.
	* Dialect: Defaulted to standard SQL.
	* Reasoning: The model does not "explain" its code; it is a direct translation engine.

	---

	## How to Use with Transformers
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_path = "saadxsalman/SS-350M-SQL-Strict"
	tokenizer = AutoTokenizer.from_pretrained(model_path)
	model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

	# Ready for inference!
	```