--- language: - en license: apache-2.0 tags: - text-to-sql - llama-3 - lora - spider - sql-generation - fine-tuned base_model: meta-llama/Meta-Llama-3-8B-Instruct datasets: - spider metrics: - exact_match - execution_accuracy --- # 🤖 Nano-Analyst: Fine-Tuned SQL Agent **A private, on-device SQL generation model fine-tuned on the Spider dataset.** ## Model Description This is a **LoRA fine-tuned** version of **Llama-3-8B-Instruct** specialized for Text-to-SQL generation. The model was trained on 6,300 examples from the Spider dataset and achieves 100% valid SQL generation. ### Key Features - ✅ **Privacy-Preserving**: Runs locally, no API calls required - ✅ **Efficient**: QLoRA fine-tuning (only 1.03% parameters trained) - ✅ **Production-Ready**: Includes self-correction and RAG retrieval - ✅ **Well-Documented**: Complete training and evaluation pipeline ## Model Details - **Base Model**: meta-llama/Meta-Llama-3-8B-Instruct - **Fine-Tuning Method**: QLoRA (4-bit quantization) - **Framework**: Unsloth (2x faster training) - **Training Data**: Spider dataset (6,300 training examples) - **LoRA Rank**: 32 - **LoRA Alpha**: 64 ## Training Results - **Final Training Loss**: 0.19 - **Validation Loss**: 0.31 - **Training Steps**: 1,182 - **Training Time**: ~3 hours on T4 GPU ## Evaluation Results Evaluated on 100 examples from Spider validation set: | Metric | Value | |--------|-------| | **Valid SQL Generation** | 100.0% | | **Exact String Match** | 2.0% | | **Successful Queries** | 100/100 | ## Usage ### Loading the Model ```python from transformers import AutoModelForCausalLM, AutoTokenizer from peft import PeftModel import torch # Load base model base_model = AutoModelForCausalLM.from_pretrained( "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.float16, device_map="auto" ) # Load LoRA adapters model = PeftModel.from_pretrained( base_model, "tanvicas/nano-analyst-sql" ) tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct") ``` ### Generating SQL ```python question = "How many users are from California?" schema = "CREATE TABLE users (id INT, name TEXT, state TEXT);" # Generate SQL using the model sql = generate_sql(question, schema) print(sql) # SELECT COUNT(*) FROM users WHERE state = 'California'; ``` ## License Apache 2.0 License. Base model subject to Meta's Llama 3 Community License. ## Contact - **GitHub**: github.com/tanvicas/nano-analyst --- **Built for learning and research** 🚀