---
language: en
license: mit
tags:
- sql
- education
- text-classification
- sentence-transformers
- multi-tower
pipeline_tag: text-classification
---
# SQL Error Classifier (Multi-Tower)
A lightweight classifier that identifies which kind of SQL mistake a student is struggling with, given:
- Question: the natural-language task
- Schema: the available tables and columns
- Correct query: the reference solution
- Student query: what the student submitted
- Error message (optional): the database error text
## Architecture
Multi-tower semantic comparison using `sentence-transformers/all-MiniLM-L6-v2`:
- Intent tower: question + schema
- Reference tower: correct query
- Student tower: student query (+ error message, if provided)
- Comparison layer: embedding difference, interaction, cosine similarities, and SQL rule features
- Linear head: 15 error categories
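The card does not specify exactly how the comparison layer combines the tower outputs. The sketch below is one plausible reading, assuming 384-dimensional MiniLM embeddings (in practice obtainable with `SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(text)`), an element-wise product as the "interaction" term, and naive keyword-mismatch flags as the "SQL rule features". The names `tower_texts`, `rule_features`, `comparison_features`, the `[SCHEMA]`/`[ERROR]` separators and the keyword list are all hypothetical.

```python
import numpy as np

EMBED_DIM = 384  # all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings


def tower_texts(question, schema, correct_query, student_query, error_message=None):
    """Build the input text for each tower (separator tokens are an assumption)."""
    intent = f"{question} [SCHEMA] {schema}"
    student = student_query if not error_message else f"{student_query} [ERROR] {error_message}"
    return intent, correct_query, student


def cosine(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def rule_features(correct_query, student_query):
    """Hypothetical rule features: 1.0 where a keyword appears in exactly one query.

    Uses naive substring checks on upper-cased text, so it is only a rough signal.
    """
    keywords = ["JOIN", "GROUP BY", "HAVING", "WHERE", "OVER", "DISTINCT", "NULL"]
    c, s = correct_query.upper(), student_query.upper()
    return np.array([float((k in c) != (k in s)) for k in keywords])


def comparison_features(intent_emb, ref_emb, student_emb, correct_query, student_query):
    """Concatenate difference, interaction, cosine similarities and rule features."""
    diff = ref_emb - student_emb          # where the student drifts from the reference
    interaction = ref_emb * student_emb   # element-wise product (assumed form of "interaction")
    sims = np.array([
        cosine(ref_emb, student_emb),     # reference vs. student
        cosine(intent_emb, student_emb),  # intent vs. student
        cosine(intent_emb, ref_emb),      # intent vs. reference
    ])
    return np.concatenate([diff, interaction, sims, rule_features(correct_query, student_query)])
```

With these choices the feature vector has 2 × 384 + 3 + 7 = 778 entries; the real model's width depends on its actual rule features.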
## Error Categories (15)
| ID | Category |
|---|---|
| 0 | SYNTAX_ERROR |
| 1 | JOIN_ERROR |
| 2 | AGGREGATION_ERROR |
| 3 | HAVING_WHERE_ERROR |
| 4 | SUBQUERY_ERROR |
| 5 | WINDOW_FUNCTION_ERROR |
| 6 | NULL_HANDLING_ERROR |
| 7 | DATE_FUNCTION_ERROR |
| 8 | COLUMN_REFERENCE_ERROR |
| 9 | TABLE_REFERENCE_ERROR |
| 10 | DATA_TYPE_ERROR |
| 11 | DUPLICATE_RECORD_ERROR |
| 12 | LOGICAL_QUERY_ERROR |
| 13 | PERFORMANCE_ERROR |
| 14 | FILTERING_ERROR |
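When post-processing raw class IDs from the head, the table can be mirrored as a Python mapping. This is a convenience sketch; the repository may already define its own label list under a different name, and `ID2LABEL`, `LABEL2ID` and `label_name` are illustrative.

```python
# Class IDs and names exactly as listed in the Error Categories table.
ID2LABEL = {
    0: "SYNTAX_ERROR",
    1: "JOIN_ERROR",
    2: "AGGREGATION_ERROR",
    3: "HAVING_WHERE_ERROR",
    4: "SUBQUERY_ERROR",
    5: "WINDOW_FUNCTION_ERROR",
    6: "NULL_HANDLING_ERROR",
    7: "DATE_FUNCTION_ERROR",
    8: "COLUMN_REFERENCE_ERROR",
    9: "TABLE_REFERENCE_ERROR",
    10: "DATA_TYPE_ERROR",
    11: "DUPLICATE_RECORD_ERROR",
    12: "LOGICAL_QUERY_ERROR",
    13: "PERFORMANCE_ERROR",
    14: "FILTERING_ERROR",
}

# Reverse lookup: category name -> class ID.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_name(class_id: int) -> str:
    """Return the category name for a predicted class ID."""
    return ID2LABEL[class_id]
```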
## Usage
```python
from src.huggingface import SQLLErrorClassifierHF

# Replace YOUR_USERNAME with the Hub namespace hosting the model
clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")

result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)

print(result["label_name"])    # e.g. LOGICAL_QUERY_ERROR
print(result["confidence"])    # e.g. 0.94
print(result["similarities"])  # semantic alignment scores
```
## Gradio Demo
Deploy as a Hugging Face Space using `app.py` from this repository.
## Model Details
- Encoder: `sentence-transformers/all-MiniLM-L6-v2` (loaded from the Hub, not bundled)
- Head: scikit-learn `SGDClassifier` + `StandardScaler`
- Size: ~5 MB classifier head (the ~80 MB encoder is cached separately)
- Inference: ~100-200 ms on CPU
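The card names the head's components but not its hyperparameters. A minimal sketch of such a head follows, assuming `loss="modified_huber"` (one of the `SGDClassifier` losses that supports `predict_proba`, which a confidence score needs) and random placeholder features of a hypothetical width. `build_head` and `NUM_FEATURES` are illustrative names, not the repository's.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

NUM_CLASSES = 15
NUM_FEATURES = 778  # hypothetical; depends on the comparison layer's output width


def build_head(random_state=0):
    """Standardize features, then fit a linear SGD classifier.

    modified_huber is an assumed loss choice: it enables predict_proba.
    """
    return make_pipeline(
        StandardScaler(),
        SGDClassifier(loss="modified_huber", random_state=random_state),
    )


# Placeholder random features stand in for real comparison-layer outputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, NUM_FEATURES))
y = np.arange(600) % NUM_CLASSES  # balanced labels: 40 samples per class

head = build_head()
head.fit(X, y)

proba = head.predict_proba(X[:1])[0]  # one probability per class
pred = int(np.argmax(proba))          # most likely class ID
confidence = float(proba[pred])       # its probability, used as a confidence score
```

Scaling before SGD matters because SGD's step sizes are sensitive to feature magnitudes, and the cosine and rule features live on very different scales from raw embedding differences.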
## Training Data
Training data was synthetically generated from exercise templates using per-category error injectors: 1M samples, balanced across the 15 classes.
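The card does not describe the injectors themselves. As a minimal sketch, assuming an injector is a plain function that rewrites a correct query into a wrong one (or returns `None` when it does not apply), three hypothetical injectors and a balanced sampler might look like this. The function names, template fields, and the choice of three categories (the real setup covers all 15) are illustrative only.

```python
import random
import re
from collections import Counter

AGGREGATES = ["AVG", "SUM", "COUNT", "MIN", "MAX"]
FLIPS = {">=": "<", "<=": ">", "<>": "=", ">": "<=", "<": ">="}


def inject_aggregation_error(query, rng):
    """Swap one aggregate function for a different one (e.g. AVG -> SUM)."""
    found = [a for a in AGGREGATES if re.search(rf"\b{a}\s*\(", query, re.IGNORECASE)]
    if not found:
        return None
    old = rng.choice(found)
    new = rng.choice([a for a in AGGREGATES if a != old])
    return re.sub(rf"\b{old}(?=\s*\()", new, query, count=1, flags=re.IGNORECASE)


def inject_filtering_error(query, rng):
    """Invert the first comparison operator after WHERE (rng unused; uniform signature)."""
    m = re.search(r"\bWHERE\b.*?(>=|<=|<>|>|<)", query, re.IGNORECASE)
    if not m:
        return None
    return query[:m.start(1)] + FLIPS[m.group(1)] + query[m.end(1):]


def drop_distinct(query, rng):
    """Remove DISTINCT so duplicate rows leak into the result."""
    if not re.search(r"\bDISTINCT\s+", query, re.IGNORECASE):
        return None
    return re.sub(r"\bDISTINCT\s+", "", query, count=1, flags=re.IGNORECASE)


# Label names follow the Error Categories table.
INJECTORS = {
    "AGGREGATION_ERROR": inject_aggregation_error,
    "FILTERING_ERROR": inject_filtering_error,
    "DUPLICATE_RECORD_ERROR": drop_distinct,
}


def generate_balanced(templates, per_class, seed=0):
    """Draw up to per_class successful injections for every category."""
    rng = random.Random(seed)
    samples = []
    for label, inject in INJECTORS.items():
        made, attempts = 0, 0
        while made < per_class and attempts < per_class * 100:  # cap avoids endless loops
            attempts += 1
            template = rng.choice(templates)
            wrong = inject(template["correct_query"], rng)
            if wrong is None or wrong == template["correct_query"]:
                continue  # injector does not apply to this template
            samples.append({**template, "student_query": wrong, "label": label})
            made += 1
    return samples
```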
## Citation
```bibtex
@misc{sql-error-classifier,
  title     = {SQL Error Classifier - Multi-Tower},
  author    = {SQLErrorClassification},
  year      = {2025},
  publisher = {Hugging Face},
}
```