manuelaschrittwieser/phi-3-mini-sql-assistant

@@ -1,199 +1,145 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: mit
+base_model: microsoft/Phi-3-mini-4k-instruct
+datasets:
+- b-mc2/sql-create-context
+tags:
+- peft
+- qlora
+- text-to-sql
+- phi-3
 ---
+# Enhanced QLoRA Adapter for Phi-3-mini: A Technical SQL Assistant (2 Epochs)
+This repository contains a QLoRA adapter for the `microsoft/Phi-3-mini-4k-instruct` model.
+This version was fine-tuned for **two full epochs** on a Text-to-SQL task, which improved output reliability compared to the single-epoch version.
+The model is designed to act as a technical assistant that generates SQL queries from natural-language questions, given a database schema.
+This project was developed for an engineering and deployment course, with a focus on creating a robust, reproducible, and practical AI artifact.
+## Key Improvements in This Version
+-   **Enhanced Reliability:** Training for two epochs improved the model's adherence to the required chat template format, which reduces parsing errors in automated pipelines.
+-   **Maintained Accuracy:** The model continues to generate syntactically correct and logically sound SQL queries in qualitative testing.
+-   **Robust Loading:** The usage instructions below load the quantized base model first and then apply the adapter, so the same steps work across different environments.
+## How to Use
+First, ensure you have a compatible environment by installing these specific library versions:
+```bash
+pip install transformers==4.38.2 peft==0.10.0 accelerate==0.28.0 bitsandbytes==0.43.0 torch
+```
+The following code loads the quantized base model, applies the adapter, and runs inference.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+from peft import PeftModel
+import torch
+# --- 1. Configuration ---
+base_model_id = "microsoft/Phi-3-mini-4k-instruct"
+# IMPORTANT: Replace with this adapter's repository ID on the Hugging Face Hub
+adapter_id = "YourUsername/YourNewModelName"
+# --- 2. Load the Quantized Base Model ---
+# This is required to fit the model in memory-constrained environments like Colab
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+base_model = AutoModelForCausalLM.from_pretrained(
+    base_model_id,
+    quantization_config=bnb_config,
+    device_map="auto",
+    trust_remote_code=True,
+)
+tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)
+tokenizer.pad_token = tokenizer.eos_token
+# --- 3. Load and Apply the LoRA Adapter ---
+model = PeftModel.from_pretrained(base_model, adapter_id)
+print("Successfully loaded quantized base model and applied adapter.")
+# --- 4. Prepare for Inference ---
+context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
+question = "What are the names of employees in the 'Engineering' department with a salary over 80000?"
+prompt = f"""<|user|>
+Given the database schema:
+{context}
+Generate the SQL query for the following request:
+{question}<|end|>
+<|assistant|>
+"""
+# --- 5. Generate the Response ---
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
+# Decode only the newly generated tokens. Splitting the full decoded text on
+# "<|assistant|>" is unreliable: skip_special_tokens=True can strip that marker.
+new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
+generated_sql = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+print(f"\nGenerated SQL: {generated_sql}")
+# Expected output: SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000
+```
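The prompt layout used in the example above can be wrapped in a small helper so that every request is formatted identically. The sketch below is illustrative only: `build_prompt` is a hypothetical helper name, not part of this repository or of any library, and it simply reproduces the `<|user|>` / `<|end|>` / `<|assistant|>` layout shown above.

```python
def build_prompt(context: str, question: str) -> str:
    """Format a schema and a question using the layout from the usage example.

    Hypothetical helper for illustration; not part of transformers or peft.
    """
    return (
        "<|user|>\n"
        "Given the database schema:\n"
        f"{context.strip()}\n"
        "Generate the SQL query for the following request:\n"
        f"{question.strip()}<|end|>\n"
        "<|assistant|>\n"
    )


context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
question = "How many employees work in the 'Sales' department?"
prompt = build_prompt(context, question)
print(prompt)
```

The returned string can be passed to the tokenizer in place of the hand-written f-string; keeping the template in one function makes it harder for the inference prompt to drift from the format used during fine-tuning.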
+## Training Procedure
+### Dataset
+The model was fine-tuned on a 10,000-sample subset of the `b-mc2/sql-create-context` dataset, split 90/10 into training and validation sets.
+### Fine-tuning Configuration (QLoRA)
+* **Quantization:** 4-bit NormalFloat (NF4) with `bfloat16` compute dtype.
+* **LoRA Rank (`r`):** 8
+* **LoRA Alpha (`lora_alpha`):** 16
+* **Target Modules:** All linear layers in the Phi-3 architecture (`q_proj`, `k_proj`, `v_proj`, `o_proj`, etc.).
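As a rough sketch, the settings listed above might map onto a `peft.LoraConfig` as follows. The `target_modules="all-linear"` shortcut and the `task_type` value are assumptions: the original training script is not shown on this card, and the exact linear-layer names depend on the Phi-3 implementation in use. Dropout and bias settings are not stated here, so they are left at the library defaults.

```python
from peft import LoraConfig

# Assumed reconstruction of the adapter settings listed above; not the author's script.
lora_config = LoraConfig(
    r=8,                          # LoRA rank stated on the card
    lora_alpha=16,                # stated on the card; effective scale is lora_alpha / r = 2.0
    target_modules="all-linear",  # assumption: let PEFT select every linear layer
    task_type="CAUSAL_LM",        # assumption: causal language modeling
)
print(lora_config)
```

Using the `all-linear` shortcut avoids hard-coding projection names, which may differ between model implementations.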
+### Training Hyperparameters
+* **Learning Rate:** 2e-4
+* **Epochs:** 2
+* **Effective Batch Size:** 8
+* **Optimizer:** Paged AdamW (32-bit)
+* **LR Scheduler:** Cosine
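To put these numbers in perspective, the step count implied by the card can be worked out directly. This is back-of-the-envelope arithmetic under stated assumptions: the effective batch size of 8 is taken to mean examples per optimizer step, and a final partial batch (if any) is kept rather than dropped.

```python
import math

total_samples = 10_000      # subset size stated in the Dataset section
train_fraction = 0.9        # 90/10 train/validation split
effective_batch_size = 8    # assumed: examples per optimizer step
epochs = 2

train_samples = round(total_samples * train_fraction)
val_samples = total_samples - train_samples
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs

print(train_samples, val_samples, steps_per_epoch, total_steps)
# 9000 1000 1125 2250
```

Under these assumptions the cosine schedule would decay the learning rate from 2e-4 over roughly 2,250 optimizer steps.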
+## Evaluation and Results
+Qualitative evaluation on a held-out test set indicates that the model consistently generates correct SQL queries.
+The extended training to two epochs has successfully addressed the primary limitation of the single-epoch version: inconsistent formatting.
+This model now reliably generates the `<|assistant|>` token, making it more suitable for automated parsing and deployment.
+## Deployment & Optimization Considerations
+For deployment in a production environment, consider the following optimizations:
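+1. **Merge Adapter Weights:** Before deploying, merge the adapter weights into the base model to create a single standalone model.
+This eliminates the overhead of applying the adapter at inference time and can improve performance. When the base model is loaded in 4-bit, merging can introduce small rounding differences, so consider reloading the base model in 16-bit precision before merging.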
+```python
+# After loading the model and adapter:
+merged_model = model.merge_and_unload()
+# Use 'merged_model' for all subsequent 'generate' calls.
+```
+2. **Further Quantization:** For CPU-based deployment or even more efficient GPU usage, the merged model can be further quantized into formats like **GGUF** (for use with `llama.cpp`) or **AWQ/GPTQ**.
+3. **API Serving:** Serve the model behind a web framework such as FastAPI, or use a dedicated LLM serving framework such as vLLM for better throughput and batching.
+## Limitations and Responsible AI
+* **Generalization:** The model is specialized for the Text-to-SQL task and the schema styles seen in its training data. It may not perform well on highly complex queries or on less common SQL dialects.
+* **Security:** This model is a proof-of-concept and has not been hardened against SQL injection attacks. All generated SQL should be treated as untrusted input and must be sanitized or executed in a sandboxed, read-only environment.
+* **Bias:** The model's knowledge comes from its training data, so any biases present in the `b-mc2/sql-create-context` dataset may be reflected in its outputs.
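The security point above can be illustrated with a minimal sketch of a read-only execution guard. SQLite is used purely for illustration, and `is_single_select` and `execute_read_only` are hypothetical helper names, not part of this repository. A real deployment would more likely rely on a database role with read-only permissions; the string check here is only a cheap first filter, not a complete defense.

```python
import sqlite3


def is_single_select(sql: str) -> bool:
    """Cheap pre-check: exactly one statement, and it starts with SELECT."""
    stripped = sql.strip().rstrip(";").strip()
    return ";" not in stripped and stripped.upper().startswith("SELECT")


def execute_read_only(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Run generated SQL with writes disabled on the connection."""
    if not is_single_select(sql):
        raise ValueError("only a single SELECT statement is allowed")
    conn.execute("PRAGMA query_only = ON")  # SQLite rejects writes while this is on
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.execute("PRAGMA query_only = OFF")


# Toy database matching the schema from the usage example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Engineering", 95000), ("Bob", "Engineering", 70000), ("Cy", "Sales", 90000)],
)
conn.commit()

generated_sql = "SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000"
rows = execute_read_only(conn, generated_sql)
print(rows)  # [('Ada',)]
```

The pre-check deliberately rejects any query containing a semicolon, even inside a string literal, which trades a few false rejections for a simpler rule.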