NBAmine committed · verified
Commit 4dc019d · 1 Parent(s): 52a9240

Update README.md
Files changed (1): README.md (+40 -38)
README.md CHANGED
@@ -1,66 +1,68 @@
  ---
- datasets:
- - gretelai/synthetic_text_to_sql
- - xlangai/spider
- - NBAmine/xlangai-spider-with-context
  language:
  - en
- base_model:
- - mistralai/Mistral-Nemo-Instruct-2407
- pipeline_tag: text-generation
- library_name: transformers
  tags:
  - text-to-sql
- - mistral
  - mistral-nemo
  - peft
  - qlora
  ---

- # Mistral-Nemo-Text-to-SQL-AWQ

  [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github&logoColor=white)](https://github.com/NBAmine/Nemo-text-to-sql)

  ## Model Overview
- This is a production-optimized, **4-bit AWQ quantized** version of a Parameter-Efficient Fine-Tuned **Mistral-Nemo-12B**. The model is specialized for **Text-to-SQL** tasks, specifically designed to handle complex relational database queries with DDL context.

- The original model was trained using a **two-phase curriculum learning strategy**:
- 1. **Phase 1 (Syntactic Alignment):** Learning basic SQL structure and schema-linking.
- 2. **Phase 2 (Logical Reasoning With Context):** Advanced reasoning for complex `JOIN`, `UNION`, and nested sub-queries.

- ## Engineering & Optimization
- This model demonstrates advanced optimization techniques to balance performance and hardware accessibility.

- ### 1. Stratified AWQ Calibration
- Unlike standard quantization, this model utilized **Stratified Sampling** based on SQL query length during the calibration phase. This ensured that the **"Salient Weights"** responsible for long-context reasoning and complex SQL clauses were protected via activation-aware scaling factors.

- ### 2. Quantization Specs
- - **Method:** AWQ (Activation-aware Weight Quantization)
- - **Bits:** 4-bit
- - **Group Size:** 128
- - **Zero Point:** True
- - **Format:** Safetensors (Zero-copy loading)

- ### 3. Hardware Requirements
- | Version | VRAM Usage | Precision | Suggested GPU |
- | :--- | :--- | :--- | :--- |
- | **BF16 (Original)** | ~24 GB | 16-bit | A100 / RTX 6000 |
- | **AWQ (This Model)** | **~7.5 GB** | **4-bit** | **RTX 3060 / T4** |

  ## Evaluation Results
  Evaluated on the **Spider** validation set:
- - **Execution Accuracy (EX):** ~69.0%
  - **Exact Match (EM):** 61.2%
- - **Max Context:** 2048 tokens

- ## Deployment with vLLM
- This model is optimized for **vLLM**

- ## Prompt Template
- The model follows a structured prompt format to ensure logical alignment:

- Context: {DDL_STATEMENTS}<br>
- Question: {USER_QUESTION}<br>
- Answer:
  ---
  language:
  - en
+ base_model: mistralai/Mistral-Nemo-Base-2407
  tags:
  - text-to-sql
  - mistral-nemo
+ - spider
  - peft
  - qlora
+ metrics:
+ - execution_accuracy
+ - exact_match
+ model_creator: NBAmine
+ pipeline_tag: text-generation
+ datasets:
+ - gretelai/synthetic_text_to_sql
+ - xlangai/spider
+ - NBAmine/xlangai-spider-with-context
+ library_name: transformers
  ---

+ # Mistral-Nemo-12B-Text-to-SQL

  [![GitHub](https://img.shields.io/badge/GitHub-Repo-181717?logo=github&logoColor=white)](https://github.com/NBAmine/Nemo-text-to-sql)

  ## Model Overview
+ This is the full-precision (BF16), merged version of a **Mistral-Nemo-12B** model fine-tuned with parameter-efficient methods for high-performance **Text-to-SQL** generation. It is the result of merging LoRA adapters, trained via a two-phase curriculum learning strategy, back into the base weights.

+ It is designed to serve as the "source of truth" for further optimizations (such as AWQ or GGUF) and represents the peak predictive performance of the training pipeline, before any quantization-related drift.

+ - **Base Model:** `mistralai/Mistral-Nemo-Base-2407`
+ - **Primary Task:** Natural language to SQL generation with DDL context.
+ - **Output Format:** Standalone SQL queries compatible with standard SQL engines.
 
+ ## Training Methodology
+ The model was developed using an MLOps pipeline on dual NVIDIA T4 GPUs on Kaggle.

+ ### 1. Curriculum Learning Strategy
+ The model underwent a two-stage training process:
+ - **Phase 1 (Syntactic Alignment):** Focused on SQL syntax, basic keywords, and simple schema mapping.
+ - **Phase 2 (Logical Alignment):** Introduced complex reasoning tasks, including multiple `JOIN` operations, nested subqueries, and set operations (`UNION`, `INTERSECT`).
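The two-phase split above implies sorting training examples by structural complexity. A minimal sketch of such a heuristic classifier (the function name and regex rules are illustrative, not taken from the actual training pipeline):

```python
import re

def curriculum_phase(sql: str) -> int:
    """Assign a SQL query to a curriculum phase by structural complexity.

    Phase 2 covers multi-table reasoning (JOINs, set operations,
    nested subqueries); everything simpler falls into Phase 1.
    """
    s = sql.upper()
    has_join = " JOIN " in s
    has_set_op = any(op in s for op in (" UNION ", " INTERSECT ", " EXCEPT "))
    # A nested subquery shows up as a second SELECT inside the statement.
    has_subquery = len(re.findall(r"\bSELECT\b", s)) > 1
    return 2 if (has_join or has_set_op or has_subquery) else 1

print(curriculum_phase("SELECT name FROM singer WHERE age > 30"))  # 1
print(curriculum_phase(
    "SELECT T1.name FROM stadium AS T1 JOIN concert AS T2 ON T1.id = T2.sid"
))  # 2
```

In practice, a complexity score could also count the number of tables referenced or the nesting depth; the binary split here just mirrors the two phases described above.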
 
 
+ ### 2. Fine-Tuning Details
+ - **Technique:** QLoRA (Rank 16, Alpha 32)
+ - **Quantization (during training):** 4-bit NF4
+ - **Optimizer:** Paged AdamW 8-bit
+ - **Hardware:** 2x NVIDIA T4 (Kaggle)

  ## Evaluation Results
  Evaluated on the **Spider** validation set:
+ - **Execution Accuracy (EX):** **69.5%**
  - **Exact Match (EM):** 61.2%
+ - **Max Context Length:** 2048 tokens
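Execution accuracy (EX) counts a prediction as correct when it returns the same result set as the gold query when executed against the database, even if the SQL text differs. A minimal sketch of that check using an in-memory SQLite database (the schema, data, and helper name are illustrative, not the Spider evaluation harness):

```python
import sqlite3

def execution_match(db: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    """True when the predicted query returns the same rows as the gold query.

    Rows are compared order-insensitively; un-executable predictions
    simply count as wrong.
    """
    gold = db.execute(gold_sql).fetchall()
    try:
        pred = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False
    return sorted(map(tuple, gold)) == sorted(map(tuple, pred))

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('Ana', 25), ('Bo', 41), ('Cy', 33);
""")
print(execution_match(db,
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31"))  # True
```

This is why EX and EM diverge (69.5% vs 61.2%): the two queries above execute identically but would fail an exact-match comparison.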

+ ## Architecture Specs
+ The merged weights utilize the standard Mistral-Nemo 12B architecture:
+ - **Parameters:** 12.2B
+ - **Layers:** 40
+ - **Attention:** Grouped Query Attention (GQA) with 8 KV heads
+ - **Vocabulary Size:** ~128k (Tekken tokenizer)
+ - **VRAM Requirements:** ~24 GB for inference in BF16/FP16
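GQA with 8 KV heads keeps the KV cache small relative to the weights. A back-of-the-envelope estimate for the 2048-token context used here, assuming Mistral-Nemo's published head dimension of 128 and a BF16 cache (2 bytes per value; both are assumptions, not stated in this card):

```python
# KV-cache size estimate for Mistral-Nemo 12B with GQA.
# Assumptions: head_dim = 128 (published config), BF16 cache entries.
layers, kv_heads, head_dim, bytes_per_val = 40, 8, 128, 2

per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # K and V planes
total = per_token * 2048  # max context length used in training/eval

print(per_token)            # 163840 bytes, i.e. ~160 KiB per token
print(round(total / 2**20)) # ~320 MiB for a full 2048-token context
```

So the cache adds only a few hundred MiB on top of the ~24 GB of BF16 weights; with 32 query heads sharing 8 KV heads, it is roughly a quarter of what full multi-head attention would need.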

+ ## Template used during training

+ ```
+ Context: {DDL}
+ Question: {NL_QUERY}
+ Answer:
+ ```
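At inference time the training template must be reproduced exactly, since the model completes the text after `Answer:` with the SQL query. A small helper (the function name is illustrative, and it assumes the template's line breaks are literal newlines):

```python
def build_prompt(ddl: str, question: str) -> str:
    """Assemble the training-time prompt; the model's completion
    after 'Answer:' is the SQL query."""
    return f"Context: {ddl}\nQuestion: {question}\nAnswer:"

ddl = "CREATE TABLE singer (name TEXT, age INTEGER);"
print(build_prompt(ddl, "How many singers are older than 30?"))
```

The string this produces would then be passed to the tokenizer as a plain completion prompt (no chat template), matching how the DDL context and question were concatenated during fine-tuning.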