---
license: mit
---

# SQL Socratic Models

This repository contains fine-tuned large language models for **Socratic SQL instruction** in higher education. The models guide learners through SQL concepts with structured reasoning rather than providing direct solutions.

## Models

- phi3_rq4
- qwen25
- gemma2

## Method

Our approach is designed to support **conceptual learning in STEM education** through Socratic interaction:

- **Phase 1 (Data Construction):** SQL instruction data is augmented with Socratic prompts emphasizing:
  - Question decomposition
  - Conceptual hints
  - Guided reasoning steps
- **Phase 2 (Fine-Tuning):** We apply full fine-tuning (FFT) on small, open-source LLMs with **pedagogical constraints** that explicitly discourage direct answer generation and instead promote:
  - Conceptual scaffolding
  - Incremental reasoning
  - Learner-centered guidance
- **Phase 3 (Evaluation):** Models are evaluated using:
  - **BERTScore** for semantic alignment with expected reasoning
  - **ROUGE-L** to measure and control **answer leakage** (i.e., avoidance of direct SQL solutions)

## Contributions

- Fine-tuning across multiple architectures (Phi-3, Qwen2.5, Gemma2) for **instructional SQL reasoning**
- Development of a **Socratic SQL prompting framework** for higher education contexts
- Evaluation of models on their ability to generate **guidance without revealing final answers**
- Ablation study identifying factors that enable LLMs to mimic effective instructors through:
  - Misconception-aware feedback
  - Iterative questioning
  - Structured reasoning support

## Task

Given a natural language SQL question, the model generates:

1. Socratic reasoning steps
2. Conceptual hints and guiding questions
3. Intermediate decomposition of the problem

**The model does NOT produce the final SQL query**, ensuring alignment with instructional use in higher education settings.

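The ROUGE-L leakage check named under Method can be illustrated with a small, dependency-free sketch. It is not the evaluation code used for these models: the function names, the whitespace tokenization, and the `threshold` value of 0.5 are all assumptions chosen for illustration. The idea is that a response sharing a long common subsequence with the gold SQL query is likely revealing the answer.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercased, whitespace-split tokens (a simplification)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def leaks_answer(response, gold_sql, threshold=0.5):
    """Flag a response whose overlap with the gold query reaches the threshold.

    The threshold is a hypothetical value, not one reported for these models.
    """
    return rouge_l_f1(response, gold_sql) >= threshold


gold = "SELECT name FROM students WHERE gpa > 3.5"
hint = "Which table holds the student records, and which column would you filter on?"

print(leaks_answer(hint, gold))  # Socratic hint: no token overlap with the query
print(leaks_answer(gold, gold))  # verbatim solution: maximal overlap
```

Under this sketch a lower score against the gold query means less leakage, so it complements BERTScore, which rewards semantic closeness to the expected reasoning.
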
This design supports:

- Active learning
- Conceptual understanding of SQL
- Integration of database concepts into broader STEM curricula

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repo ids take the form "namespace/name"; assumes each model sits in its own subfolder.
repo_id = "sriram882004/SQL-Socratic-Models"
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="phi3_rq4")
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="phi3_rq4")
```