---
language: en
license: mit
tags:
- sql
- education
- text-classification
- sentence-transformers
- multi-tower
pipeline_tag: text-classification
---
# SQL Error Classifier (Multi-Tower)

Lightweight classifier that identifies **which SQL mistake area** a student is struggling with, given:

- **Question**: natural-language task
- **Schema**: available tables and columns
- **Correct query**: reference solution
- **Student query**: what the student submitted
- **Error message** *(optional)*: database error text
## Architecture

Multi-tower semantic comparison using `sentence-transformers/all-MiniLM-L6-v2`:

1. **Intent tower**: question + schema
2. **Reference tower**: correct query
3. **Student tower**: student query (+ error)
4. **Comparison layer**: embedding diff, interaction, cosine similarities, SQL rule features
5. **Linear head**: 15 error categories
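
The comparison layer can be pictured with a minimal sketch. It assumes the "embedding diff" is an element-wise subtraction and the "interaction" an element-wise product of the reference and student embeddings; the function names, the feature ordering, and the random stand-in vectors are illustrative assumptions, not the repository's actual code. SQL rule features are omitted.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def comparison_features(intent, reference, student):
    """Combine three tower embeddings into one feature vector (hypothetical layout)."""
    diff = reference - student          # embedding difference
    interaction = reference * student   # element-wise interaction
    sims = np.array([
        cosine(intent, reference),
        cosine(intent, student),
        cosine(reference, student),
    ])
    return np.concatenate([diff, interaction, sims])

# Random stand-ins for real embeddings; all-MiniLM-L6-v2 outputs 384 dimensions.
rng = np.random.default_rng(0)
intent, reference, student = rng.normal(size=(3, 384))
features = comparison_features(intent, reference, student)
print(features.shape)  # (771,) = 384 + 384 + 3
```

In a real pipeline the three vectors would come from encoding the tower inputs with the sentence-transformers model rather than from a random generator.
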
## Error Categories (15)

| ID | Category |
|----|----------|
| 0 | SYNTAX_ERROR |
| 1 | JOIN_ERROR |
| 2 | AGGREGATION_ERROR |
| 3 | HAVING_WHERE_ERROR |
| 4 | SUBQUERY_ERROR |
| 5 | WINDOW_FUNCTION_ERROR |
| 6 | NULL_HANDLING_ERROR |
| 7 | DATE_FUNCTION_ERROR |
| 8 | COLUMN_REFERENCE_ERROR |
| 9 | TABLE_REFERENCE_ERROR |
| 10 | DATA_TYPE_ERROR |
| 11 | DUPLICATE_RECORD_ERROR |
| 12 | LOGICAL_QUERY_ERROR |
| 13 | PERFORMANCE_ERROR |
| 14 | FILTERING_ERROR |
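
For downstream code that needs to translate between IDs and names, the table can be mirrored as a lookup. The variable names below are hypothetical and not part of the published package; only the ID-to-category pairs come from the table.

```python
# Category names in ID order, copied from the table above.
ERROR_CATEGORIES = [
    "SYNTAX_ERROR",
    "JOIN_ERROR",
    "AGGREGATION_ERROR",
    "HAVING_WHERE_ERROR",
    "SUBQUERY_ERROR",
    "WINDOW_FUNCTION_ERROR",
    "NULL_HANDLING_ERROR",
    "DATE_FUNCTION_ERROR",
    "COLUMN_REFERENCE_ERROR",
    "TABLE_REFERENCE_ERROR",
    "DATA_TYPE_ERROR",
    "DUPLICATE_RECORD_ERROR",
    "LOGICAL_QUERY_ERROR",
    "PERFORMANCE_ERROR",
    "FILTERING_ERROR",
]

ID_TO_LABEL = dict(enumerate(ERROR_CATEGORIES))
LABEL_TO_ID = {name: idx for idx, name in ID_TO_LABEL.items()}

print(ID_TO_LABEL[12])                 # LOGICAL_QUERY_ERROR
print(LABEL_TO_ID["FILTERING_ERROR"])  # 14
```
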
## Usage

```python
from src.huggingface import SQLLErrorClassifierHF

clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")
result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)
print(result["label_name"])    # LOGICAL_QUERY_ERROR
print(result["confidence"])    # 0.94
print(result["similarities"])  # semantic alignment scores
```
## Gradio Demo

Deploy as a [Hugging Face Space](https://huggingface.co/docs/hub/spaces) with `app.py` from this repository.
## Model Details

- **Encoder**: `sentence-transformers/all-MiniLM-L6-v2` (loaded from Hub, not bundled)
- **Head**: scikit-learn SGDClassifier + StandardScaler
- **Size**: ~5 MB classifier head (encoder ~80 MB, cached separately)
- **Inference**: ~100-200 ms on CPU
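
A head of this shape can be sketched with scikit-learn as below. The feature width, the synthetic data, the default SGD loss, and the use of a `Pipeline` are all assumptions made for illustration; the card does not state the actual hyperparameters or how the scaler and classifier are wired together.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

n_classes, n_features = 15, 64  # feature width is arbitrary here

# Synthetic stand-in for comparison-layer features, with balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, n_features))
y = np.arange(600) % n_classes

# Standardize features, then fit a linear classifier with SGD.
head = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
head.fit(X, y)

scores = head.decision_function(X[:1])  # one score per class, shape (1, 15)
label_id = int(head.predict(X[:1])[0])
print(scores.shape, label_id)
```

With the default hinge loss the scores are margins, not probabilities; a probabilistic loss would be needed to report a confidence value directly.
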
## Training Data

Synthetically generated from exercise templates with per-category error injectors.
1M balanced samples across 15 classes.
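
The idea of per-category injectors can be illustrated with a small sketch. The two injectors, the template query, and the `generate` helper are hypothetical examples written for this card; the real generator and its templates are not shown here.

```python
import random

def inject_null_handling_error(query):
    """Replace the first 'IS NULL' with '= NULL', a common NULL-comparison mistake."""
    return query.replace("IS NULL", "= NULL", 1)

def inject_filtering_error(query):
    """Flip the first ' > ' comparison to ' < ' to simulate a wrong filter."""
    return query.replace(" > ", " < ", 1)

# One injector per category; a full generator would cover all 15.
INJECTORS = {
    "NULL_HANDLING_ERROR": inject_null_handling_error,
    "FILTERING_ERROR": inject_filtering_error,
}

def generate(templates, per_class, seed=0):
    """Produce the same number of (correct, student, label) samples for every category."""
    rng = random.Random(seed)
    samples = []
    for label, inject in INJECTORS.items():
        for _ in range(per_class):
            correct = rng.choice(templates)
            samples.append({
                "correct_query": correct,
                "student_query": inject(correct),
                "label": label,
            })
    rng.shuffle(samples)
    return samples

templates = [
    "SELECT name FROM students WHERE department_id IS NULL AND score > 50",
]
samples = generate(templates, per_class=3)
print(len(samples))  # 6
```

Drawing a fixed `per_class` count for each injector is what keeps the resulting dataset balanced.
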
## Citation

```bibtex
@misc{sql-error-classifier,
  title = {SQL Error Classifier - Multi-Tower},
  author = {SQLErrorClassification},
  year = {2025},
  publisher = {Hugging Face},
}
```