---
language: en
license: mit
tags:
- sql
- education
- text-classification
- sentence-transformers
- multi-tower
pipeline_tag: text-classification
---

# SQL Error Classifier (Multi-Tower)

A lightweight classifier that identifies **which SQL mistake area** a student is struggling with, given:

- **Question**: the natural-language task
- **Schema**: the available tables and columns
- **Correct query**: the reference solution
- **Student query**: what the student submitted
- **Error message** *(optional)*: the database error text

## Architecture

Multi-tower semantic comparison built on `sentence-transformers/all-MiniLM-L6-v2`:

1. **Intent tower**: question + schema
2. **Reference tower**: correct query
3. **Student tower**: student query (+ error message, if given)
4. **Comparison layer**: embedding differences, interaction features, cosine similarities, and SQL rule features
5. **Linear head**: predicts one of the 15 error categories

## Error Categories (15)

| ID | Category |
|----|----------|
| 0 | SYNTAX_ERROR |
| 1 | JOIN_ERROR |
| 2 | AGGREGATION_ERROR |
| 3 | HAVING_WHERE_ERROR |
| 4 | SUBQUERY_ERROR |
| 5 | WINDOW_FUNCTION_ERROR |
| 6 | NULL_HANDLING_ERROR |
| 7 | DATE_FUNCTION_ERROR |
| 8 | COLUMN_REFERENCE_ERROR |
| 9 | TABLE_REFERENCE_ERROR |
| 10 | DATA_TYPE_ERROR |
| 11 | DUPLICATE_RECORD_ERROR |
| 12 | LOGICAL_QUERY_ERROR |
| 13 | PERFORMANCE_ERROR |
| 14 | FILTERING_ERROR |

## Usage

```python
from src.huggingface import SQLLErrorClassifierHF

clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")

result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)

print(result["label_name"])    # LOGICAL_QUERY_ERROR
print(result["confidence"])    # 0.94
print(result["similarities"])  # semantic alignment scores
```

## Gradio Demo
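A Gradio demo mostly needs to wrap `predict` so that five text boxes map onto its arguments. The sketch below shows one way to do that; it is not the `app.py` shipped in this repository. The helper names `make_predict_fn` and `build_demo` are hypothetical, and so is the `error_message` keyword used for the optional error text, because the usage example above does not show how that input is passed to `predict`.

```python
"""Hypothetical building blocks for a Gradio app.py (a sketch, not the repo's file)."""
from typing import Any, Callable, Dict, Optional


def make_predict_fn(clf: Any) -> Callable[..., Dict[str, float]]:
    """Wrap any object exposing .predict(...) so Gradio can call it positionally."""

    def classify(
        question: str,
        schema: str,
        correct_query: str,
        student_query: str,
        error_message: Optional[str] = None,
    ) -> Dict[str, float]:
        kwargs: Dict[str, Any] = {
            "question": question,
            "schema": schema,
            "correct_query": correct_query,
            "student_query": student_query,
        }
        # ASSUMPTION: predict() accepts the optional error text as `error_message`.
        # An empty or whitespace-only textbox is treated as "no error message".
        if error_message and error_message.strip():
            kwargs["error_message"] = error_message
        result = clf.predict(**kwargs)
        # gr.Label renders a {label: confidence} mapping.
        return {result["label_name"]: float(result["confidence"])}

    return classify


def build_demo(clf: Any):
    """Create the Gradio interface; gradio is imported lazily here."""
    import gradio as gr

    return gr.Interface(
        fn=make_predict_fn(clf),
        inputs=[
            gr.Textbox(label="Question"),
            gr.Textbox(label="Schema"),
            gr.Textbox(label="Correct query", lines=3),
            gr.Textbox(label="Student query", lines=3),
            gr.Textbox(label="Error message (optional)"),
        ],
        outputs=gr.Label(label="Predicted error category"),
        title="SQL Error Classifier",
    )
```

On a Space, `app.py` would then end with something like `build_demo(SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")).launch()`. Importing gradio inside `build_demo` keeps the wrapper testable with a stub classifier even where gradio is not installed.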
Deploy as a [Hugging Face Space](https://huggingface.co/docs/hub/spaces) using the `app.py` from this repository.

## Model Details

- **Encoder**: `sentence-transformers/all-MiniLM-L6-v2` (loaded from the Hub, not bundled)
- **Head**: scikit-learn `SGDClassifier` with a `StandardScaler`
- **Size**: ~5 MB classifier head (the ~80 MB encoder is cached separately)
- **Inference**: ~100–200 ms on CPU

## Training Data

Synthetically generated from exercise templates with per-category error injectors: 1M samples, balanced across the 15 classes.

## Citation

```bibtex
@misc{sql-error-classifier,
  title     = {SQL Error Classifier - Multi-Tower},
  author    = {SQLErrorClassification},
  year      = {2025},
  publisher = {Hugging Face},
}
```