Update README.md

---
language:
- en
base_model: mistralai/Mistral-Nemo-Base-2407
tags:
- text-to-sql
- mistral-nemo
- spider
- peft
- qlora
metrics:
- execution_accuracy
- exact_match
model_creator: NBAmine
pipeline_tag: text-generation
datasets:
- gretelai/synthetic_text_to_sql
- xlangai/spider
- NBAmine/xlangai-spider-with-context
library_name: transformers
---

# Mistral-Nemo-12B-Text-to-SQL

[GitHub repository](https://github.com/NBAmine/Nemo-text-to-sql)

## Model Overview

This is the full-precision (BF16), merged version of a **Mistral-Nemo-12B** model that was parameter-efficiently fine-tuned for high-performance **Text-to-SQL** generation. It is the result of merging LoRA adapters, trained via a two-phase curriculum learning strategy, back into the base weights (a merge sketch follows the list below).

It is designed to serve as the "Source of Truth" for further optimizations (like AWQ or GGUF) and represents the peak predictive performance of the training pipeline before any quantization-related drift.

- **Base Model:** `mistralai/Mistral-Nemo-Base-2407`
- **Primary Task:** Natural language to SQL generation with DDL context.
- **Output Format:** Standalone SQL queries compatible with standard SQL engines.
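
As a rough illustration of how such merged weights are produced, here is a minimal sketch using `peft`; the adapter repository id is hypothetical, and the actual merge settings were not published:

```python
# Minimal merge sketch (assumptions: adapter repo id, BF16 throughout).
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Base-2407", torch_dtype=torch.bfloat16
)
# Attach the trained LoRA adapters (hypothetical repo id) and fold them in.
model = PeftModel.from_pretrained(base, "NBAmine/nemo-sql-lora-adapters")
merged = model.merge_and_unload()          # W <- W + (alpha/r) * B @ A
merged.save_pretrained("Mistral-Nemo-12B-Text-to-SQL")
```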

## Training Methodology

The model was developed with an MLOps pipeline on dual NVIDIA T4 GPUs on Kaggle.

### 1. Curriculum Learning Strategy

The model underwent a two-stage training process (a schedule sketch follows this list):

- **Phase 1 (Syntactic Alignment):** Focused on SQL syntax, basic keywords, and simple schema mapping.
- **Phase 2 (Logical Alignment):** Introduced complex reasoning tasks, including multiple `JOIN` operations, nested subqueries, and set operations (`UNION`, `INTERSECT`).
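
A minimal sketch of what such a two-phase schedule can look like with `trl`; the split names and trainer settings here are assumptions, since the card does not publish the exact pipeline:

```python
# Hypothetical two-phase curriculum: train on the "easy" split first,
# then continue on the harder reasoning split with the same model object.
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

for phase in ("phase1_syntactic", "phase2_logical"):   # hypothetical split names
    data = load_dataset("NBAmine/xlangai-spider-with-context", split=phase)
    trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        args=SFTConfig(output_dir=f"checkpoints-{phase}"),
    )
    trainer.train()   # phase 2 resumes from the weights updated in phase 1
```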

### 2. Fine-Tuning Details

- **Technique:** QLoRA (rank 16, alpha 32); see the configuration sketch after this list.
- **Quantization (during training):** 4-bit NF4
- **Optimizer:** Paged AdamW (8-bit)
- **Hardware:** 2x NVIDIA T4 (Kaggle)
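
A minimal configuration sketch matching these hyperparameters; the target modules, compute dtype, and remaining training arguments are assumptions:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization used during training (compute dtype is an
# assumption; FP16 chosen because T4 GPUs lack native BF16 support).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA with rank 16 and alpha 32, as stated above; target modules assumed.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Paged 8-bit AdamW keeps optimizer state within the T4s' memory budget.
training_args = TrainingArguments(
    output_dir="qlora-out",
    optim="paged_adamw_8bit",
    fp16=True,
)
```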

## Evaluation Results

Evaluated on the **Spider** validation set:

- **Execution Accuracy (EX):** **69.5%** (a simplified sketch of this metric follows the list)
- **Exact Match (EM):** 61.2%
- **Max Context Length:** 2048 tokens
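
For readers unfamiliar with the EX metric: a prediction counts as correct when it returns the same result set as the gold query on the target database. A simplified sketch (not the official Spider evaluation harness):

```python
import sqlite3

def execution_match(db_path: str, pred_sql: str, gold_sql: str) -> bool:
    """Return True if the predicted query yields the same rows as the gold query."""
    con = sqlite3.connect(db_path)
    try:
        pred = con.execute(pred_sql).fetchall()
        gold = con.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False          # a query that fails to execute scores 0
    finally:
        con.close()
    # Order-insensitive comparison: row order is unspecified without ORDER BY.
    return sorted(map(repr, pred)) == sorted(map(repr, gold))
```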

## Architecture Specs

The merged weights use the standard Mistral-Nemo 12B architecture:

- **Parameters:** 12.2B
- **Layers:** 40
- **Attention:** Grouped Query Attention (GQA) with 8 KV heads
- **Vocabulary Size:** 128k (Tekken tokenizer)
- **VRAM Requirements:** ~24 GB for inference in BF16/FP16
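
The VRAM figure follows directly from the parameter count: at 2 bytes per parameter in BF16/FP16, 12.2B parameters take roughly 12.2 × 2 ≈ 24.4 GB for the weights alone, before activations and the KV cache.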

## Template used during training

The model follows a structured prompt format to ensure logical alignment (the `<br>` tags in the original card are read here as line breaks):

```
Context: {DDL}
Question: {NL_QUERY}
Answer:
```
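
A minimal inference example using this template; the repo id and generation settings are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NBAmine/Mistral-Nemo-12B-Text-to-SQL"   # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

ddl = "CREATE TABLE singer (singer_id INTEGER, name TEXT, age INTEGER);"
question = "How many singers are older than 30?"
prompt = f"Context: {ddl}\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
sql = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True)
print(sql)   # e.g. SELECT COUNT(*) FROM singer WHERE age > 30;
```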