---
license: mit
---

# SQL Socratic Models

This repository contains fine-tuned large language models for **Socratic SQL instruction** in higher education, focusing on guiding learners through SQL concepts using structured reasoning rather than providing direct solutions.

## Models
- phi3_rq4
- qwen25
- gemma2

## Method

Our approach is designed to support **conceptual learning in STEM education** through Socratic interaction:

- **Phase 1 (Data Construction):**  
  SQL instruction data is augmented with Socratic prompts emphasizing:
  - Question decomposition  
  - Conceptual hints  
  - Guided reasoning steps  

- **Phase 2 (Fine-Tuning):**  
  We apply full fine-tuning (FFT) on small, open-source LLMs with **pedagogical constraints** that explicitly discourage direct answer generation and instead promote:
  - Conceptual scaffolding  
  - Incremental reasoning  
  - Learner-centered guidance  

- **Phase 3 (Evaluation):**  
  Models are evaluated using:
  - **BERTScore** for semantic alignment with expected reasoning  
  - **ROUGE-L** to measure and control **answer leakage** (i.e., avoidance of direct SQL solutions)

## Contributions
- Fine-tuning across multiple architectures (Phi-3, Qwen2.5, Gemma2) for **instructional SQL reasoning**
- Development of **Socratic SQL prompting framework** for higher education contexts
- Evaluation of models on their ability to generate **guidance without revealing final answers**
- Ablation study identifying factors that enable LLMs to mimic effective instructors through:
  - Misconception-aware feedback  
  - Iterative questioning  
  - Structured reasoning support  

## Task

Given a natural language SQL question, the model generates:

1. Socratic reasoning steps  
2. Conceptual hints and guiding questions  
3. Intermediate decomposition of the problem  

**The model does NOT produce the final SQL query**, ensuring alignment with instructional use in higher education settings.

This design supports:
- Active learning  
- Conceptual understanding of SQL  
- Integration of database concepts into broader STEM curricula  

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("sriram882004/SQL-Socratic-Models/phi3_rq4")
tokenizer = AutoTokenizer.from_pretrained("sriram882004/SQL-Socratic-Models/phi3_rq4")