---
language: en
license: mit
tags:
  - sql
  - education
  - text-classification
  - sentence-transformers
  - multi-tower
pipeline_tag: text-classification
---

# SQL Error Classifier (Multi-Tower)

Lightweight classifier that identifies **which SQL mistake area** a student is struggling with, given:

- **Question** — natural-language task
- **Schema** — available tables and columns
- **Correct query** — reference solution
- **Student query** — what the student submitted
- **Error message** *(optional)* — database error text

## Architecture

Multi-tower semantic comparison using `sentence-transformers/all-MiniLM-L6-v2`:

1. **Intent tower** — question + schema
2. **Reference tower** — correct query
3. **Student tower** — student query (+ error)
4. **Comparison layer** — embedding diff, interaction, cosine similarities, SQL rule features
5. **Linear head** — 15 error categories
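The comparison layer can be illustrated with a minimal sketch. The card does not spell out the exact operations, so this assumes that "embedding diff" is an element-wise absolute difference, that "interaction" is an element-wise product, and that the rule features are simple keyword-mismatch flags. The function names and the keyword list are hypothetical. The embeddings would come from the encoder named above; here they are plain sequences of floats.

```python
from __future__ import annotations

import math
import re
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rule_features(correct_query: str, student_query: str) -> list[float]:
    """Hypothetical rule features: 1.0 where a keyword appears in exactly one query."""
    keywords = ("JOIN", "GROUP BY", "HAVING", "WHERE", "DISTINCT", "OVER")

    def has(query: str, keyword: str) -> bool:
        pattern = r"\b" + keyword.replace(" ", r"\s+") + r"\b"
        return re.search(pattern, query, re.IGNORECASE) is not None

    return [float(has(correct_query, kw) != has(student_query, kw)) for kw in keywords]


def comparison_features(
    intent: Sequence[float],
    reference: Sequence[float],
    student: Sequence[float],
    correct_query: str,
    student_query: str,
) -> list[float]:
    """Concatenate diff, interaction, cosine similarities, and rule features."""
    diff = [abs(r - s) for r, s in zip(reference, student)]   # assumed: absolute difference
    interaction = [r * s for r, s in zip(reference, student)]  # assumed: element-wise product
    sims = [
        cosine(intent, reference),
        cosine(intent, student),
        cosine(reference, student),
    ]
    return diff + interaction + sims + rule_features(correct_query, student_query)
```

The resulting flat vector is the kind of input a linear head can consume directly. Its length here is twice the embedding dimension, plus three similarities, plus one flag per keyword.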

## Error Categories (15)

| ID | Category |
|----|----------|
| 0 | SYNTAX_ERROR |
| 1 | JOIN_ERROR |
| 2 | AGGREGATION_ERROR |
| 3 | HAVING_WHERE_ERROR |
| 4 | SUBQUERY_ERROR |
| 5 | WINDOW_FUNCTION_ERROR |
| 6 | NULL_HANDLING_ERROR |
| 7 | DATE_FUNCTION_ERROR |
| 8 | COLUMN_REFERENCE_ERROR |
| 9 | TABLE_REFERENCE_ERROR |
| 10 | DATA_TYPE_ERROR |
| 11 | DUPLICATE_RECORD_ERROR |
| 12 | LOGICAL_QUERY_ERROR |
| 13 | PERFORMANCE_ERROR |
| 14 | FILTERING_ERROR |
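For code that needs to translate between class IDs and names, the table above can be mirrored as a small lookup. The constant and function names here are illustrative and are not taken from the repository.

```python
# Category names in ID order, copied from the table above.
ERROR_CATEGORIES = [
    "SYNTAX_ERROR",
    "JOIN_ERROR",
    "AGGREGATION_ERROR",
    "HAVING_WHERE_ERROR",
    "SUBQUERY_ERROR",
    "WINDOW_FUNCTION_ERROR",
    "NULL_HANDLING_ERROR",
    "DATE_FUNCTION_ERROR",
    "COLUMN_REFERENCE_ERROR",
    "TABLE_REFERENCE_ERROR",
    "DATA_TYPE_ERROR",
    "DUPLICATE_RECORD_ERROR",
    "LOGICAL_QUERY_ERROR",
    "PERFORMANCE_ERROR",
    "FILTERING_ERROR",
]

# Reverse lookup from category name to class ID.
LABEL_TO_ID = {name: class_id for class_id, name in enumerate(ERROR_CATEGORIES)}


def label_name(class_id: int) -> str:
    """Return the category name for a class ID (0 to 14)."""
    return ERROR_CATEGORIES[class_id]
```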

## Usage

```python
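# Assumes the repository root is on the Python path so that `src` is importable.
from src.huggingface import SQLLErrorClassifierHF

# Replace YOUR_USERNAME with the Hub account that hosts the model.
clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")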

result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)

print(result["label_name"])    # e.g. LOGICAL_QUERY_ERROR
print(result["confidence"])    # e.g. 0.94
print(result["similarities"])  # semantic alignment scores
```
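To see which mistake areas dominate across a set of submissions, the per-query predictions can be tallied. The sketch below is a hypothetical helper, not part of the repository. It assumes only what the example above shows: `predict` accepts the four keyword arguments and returns a dict with a `label_name` key.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def tally_error_areas(clf: Any, submissions: Iterable[Mapping[str, str]]) -> Counter:
    """Count predicted error categories over many submissions.

    Each submission is a mapping with the keys `question`, `schema`,
    `correct_query`, and `student_query`, passed straight to `clf.predict`.
    """
    counts: Counter = Counter()
    for submission in submissions:
        result = clf.predict(**submission)
        counts[result["label_name"]] += 1
    return counts
```

Calling `tally_error_areas(clf, submissions).most_common(3)` would then list the three most frequent predicted categories for a class or assignment.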

## Gradio Demo

To host an interactive demo, deploy `app.py` from this repository as a [Hugging Face Space](https://huggingface.co/docs/hub/spaces).

## Model Details

- **Encoder**: `sentence-transformers/all-MiniLM-L6-v2` (loaded from Hub, not bundled)
- **Head**: scikit-learn SGDClassifier + StandardScaler
- **Size**: ~5 MB classifier head (encoder ~80 MB, cached separately)
- **Inference**: ~100–200 ms on CPU
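
The latency figure depends on hardware, so it is worth measuring locally. The helper below is a minimal, hypothetical timing sketch using only the standard library. `predict_fn` stands for any zero-argument callable that wraps a single prediction.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(
    predict_fn: Callable[[], object],
    n_runs: int = 20,
    warmup: int = 3,
) -> Dict[str, float]:
    """Time repeated calls to `predict_fn` and report latency in milliseconds."""
    # Warm-up calls absorb one-off costs such as model loading and caching.
    for _ in range(warmup):
        predict_fn()

    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_fn()
        timings.append((time.perf_counter() - start) * 1000.0)

    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}
```

With a loaded classifier, `predict_fn` could be a `lambda` that calls `predict` with a fixed set of inputs.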

## Training Data

Synthetically generated from exercise templates using per-category error injectors: 1M samples, balanced across the 15 classes.
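
The injectors themselves are not described in this card. As a rough illustration of the idea, the sketch below defines three hypothetical injectors, each of which corrupts a correct query in a way that matches one category, and cycles through them so that every category receives the same number of samples. The rewrite rules and all names are assumptions, not the actual generator.

```python
import re
from itertools import cycle, islice
from typing import Callable, Dict, List, Tuple


def inject_syntax_error(query: str) -> str:
    """Misspell the first SELECT keyword."""
    return re.sub(r"\bSELECT\b", "SELCT", query, count=1, flags=re.IGNORECASE)


def inject_having_where_error(query: str) -> str:
    """Filter on an aggregate with WHERE instead of HAVING."""
    return re.sub(r"\bHAVING\b", "WHERE", query, count=1, flags=re.IGNORECASE)


def inject_duplicate_record_error(query: str) -> str:
    """Drop DISTINCT so that duplicate rows come back."""
    return re.sub(r"\bDISTINCT\s+", "", query, count=1, flags=re.IGNORECASE)


# Hypothetical registry: category name -> injector (the real set covers 15 classes).
INJECTORS: Dict[str, Callable[[str], str]] = {
    "SYNTAX_ERROR": inject_syntax_error,
    "HAVING_WHERE_ERROR": inject_having_where_error,
    "DUPLICATE_RECORD_ERROR": inject_duplicate_record_error,
}


def generate_balanced(template_query: str, n_samples: int) -> List[Tuple[str, str]]:
    """Return (student_query, label) pairs, cycling through categories evenly."""
    labels = islice(cycle(INJECTORS), n_samples)
    return [(INJECTORS[label](template_query), label) for label in labels]
```

Cycling through the categories keeps the class counts equal whenever `n_samples` is a multiple of the number of injectors.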

## Citation

```bibtex
@misc{sql-error-classifier,
  title  = {SQL Error Classifier - Multi-Tower},
  author = {SQLErrorClassification},
  year   = {2025},
  publisher = {Hugging Face},
}
```