---
language: en
license: mit
tags:
- sql
- education
- text-classification
- sentence-transformers
- multi-tower
pipeline_tag: text-classification
---
# SQL Error Classifier (Multi-Tower)
Lightweight classifier that identifies **which SQL mistake area** a student is struggling with, given:
- **Question**: natural-language task
- **Schema**: available tables and columns
- **Correct query**: reference solution
- **Student query**: what the student submitted
- **Error message** *(optional)*: database error text
## Architecture
Multi-tower semantic comparison using `sentence-transformers/all-MiniLM-L6-v2`:
1. **Intent tower**: question + schema
2. **Reference tower**: correct query
3. **Student tower**: student query (+ error)
4. **Comparison layer**: embedding diff, interaction, cosine similarities, SQL rule features
5. **Linear head**: 15 error categories
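
As a rough illustration of the comparison layer, the sketch below builds a feature vector from three tower embeddings. It is not the repository's implementation: the function names, the use of an absolute difference for "embedding diff", an element-wise product for "interaction", and the choice of which pairs get a cosine similarity are all assumptions made for this example. Toy 4-dimensional vectors stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def comparison_features(intent, reference, student, rule_features=()):
    """Concatenate comparison features for the three tower embeddings (hypothetical layout)."""
    # Assumption: "embedding diff" is the absolute reference/student difference.
    diff = [abs(r - s) for r, s in zip(reference, student)]
    # Assumption: "interaction" is the element-wise product.
    interaction = [r * s for r, s in zip(reference, student)]
    # Assumption: one cosine similarity per pair of towers.
    sims = [
        cosine(reference, student),
        cosine(intent, reference),
        cosine(intent, student),
    ]
    return diff + interaction + sims + list(rule_features)

# Toy vectors standing in for real sentence embeddings.
intent = [1.0, 0.0, 0.0, 0.0]
reference = [1.0, 1.0, 0.0, 0.0]
student = [1.0, 0.0, 1.0, 0.0]

features = comparison_features(intent, reference, student, rule_features=[1.0, 0.0])
print(len(features))  # 4 diff + 4 interaction + 3 similarities + 2 rule features
```

The resulting flat vector is the kind of input a linear head could consume after scaling.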
## Error Categories (15)
| ID | Category |
|----|----------|
| 0 | SYNTAX_ERROR |
| 1 | JOIN_ERROR |
| 2 | AGGREGATION_ERROR |
| 3 | HAVING_WHERE_ERROR |
| 4 | SUBQUERY_ERROR |
| 5 | WINDOW_FUNCTION_ERROR |
| 6 | NULL_HANDLING_ERROR |
| 7 | DATE_FUNCTION_ERROR |
| 8 | COLUMN_REFERENCE_ERROR |
| 9 | TABLE_REFERENCE_ERROR |
| 10 | DATA_TYPE_ERROR |
| 11 | DUPLICATE_RECORD_ERROR |
| 12 | LOGICAL_QUERY_ERROR |
| 13 | PERFORMANCE_ERROR |
| 14 | FILTERING_ERROR |
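
For code that works with raw class IDs, the table above can be mirrored as a pair of lookup dictionaries. The names `ID2LABEL` and `LABEL2ID` are illustrative and may not match what the repository exposes.

```python
# Category names in ID order, copied from the table above.
CATEGORIES = [
    "SYNTAX_ERROR",
    "JOIN_ERROR",
    "AGGREGATION_ERROR",
    "HAVING_WHERE_ERROR",
    "SUBQUERY_ERROR",
    "WINDOW_FUNCTION_ERROR",
    "NULL_HANDLING_ERROR",
    "DATE_FUNCTION_ERROR",
    "COLUMN_REFERENCE_ERROR",
    "TABLE_REFERENCE_ERROR",
    "DATA_TYPE_ERROR",
    "DUPLICATE_RECORD_ERROR",
    "LOGICAL_QUERY_ERROR",
    "PERFORMANCE_ERROR",
    "FILTERING_ERROR",
]

# Hypothetical helper mappings between integer IDs and category names.
ID2LABEL = dict(enumerate(CATEGORIES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(ID2LABEL[12])                # LOGICAL_QUERY_ERROR
print(LABEL2ID["FILTERING_ERROR"]) # 14
```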
## Usage
```python
from src.huggingface import SQLLErrorClassifierHF
clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")
result = clf.predict(
question="What is the average score of students in each department?",
schema="students(id, name, score, department_id) | departments(id, name)",
correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)
print(result["label_name"]) # e.g. LOGICAL_QUERY_ERROR
print(result["confidence"]) # e.g. 0.94
print(result["similarities"]) # semantic alignment scores
```
## Gradio Demo
To run the demo, deploy this repository's `app.py` as a [Hugging Face Space](https://huggingface.co/docs/hub/spaces).
## Model Details
- **Encoder**: `sentence-transformers/all-MiniLM-L6-v2` (loaded from Hub, not bundled)
- **Head**: scikit-learn SGDClassifier + StandardScaler
- **Size**: ~5 MB classifier head (encoder ~80 MB, cached separately)
- **Inference**: ~100–200 ms on CPU
## Training Data
The training data is generated synthetically from exercise templates using per-category error injectors.
It contains 1M samples, balanced across the 15 classes.
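
To make the idea of an error injector concrete, here is a minimal sketch of one for `NULL_HANDLING_ERROR` (ID 6). It is a guess at the general approach, not the generator used to build the dataset: the function names, the sample dictionary layout, and the specific mistake injected (comparing against `NULL` with `=` or `!=` instead of `IS NULL` / `IS NOT NULL`) are assumptions for illustration.

```python
import re

NULL_HANDLING_ERROR = 6  # ID from the category table

# Matches "IS NULL" and "IS NOT NULL", capturing the optional "NOT ".
_NULL_TEST = re.compile(r"\bIS\s+(NOT\s+)?NULL\b", flags=re.IGNORECASE)

def inject_null_handling_error(correct_query):
    """Return a corrupted copy of the query, or None if it has no NULL test (hypothetical injector)."""
    if not _NULL_TEST.search(correct_query):
        return None
    # Replace only the first NULL test with an equality comparison, a common student mistake.
    return _NULL_TEST.sub(
        lambda m: "!= NULL" if m.group(1) else "= NULL",
        correct_query,
        count=1,
    )

def make_sample(question, schema, correct_query):
    """Build one labeled training sample from an exercise template (hypothetical layout)."""
    student_query = inject_null_handling_error(correct_query)
    if student_query is None:
        return None  # this template cannot express the error
    return {
        "question": question,
        "schema": schema,
        "correct_query": correct_query,
        "student_query": student_query,
        "label": NULL_HANDLING_ERROR,
    }

sample = make_sample(
    question="Which students have no recorded score?",
    schema="students(id, name, score, department_id)",
    correct_query="SELECT name FROM students WHERE score IS NULL",
)
print(sample["student_query"])
```

Under this scheme, balancing the dataset would amount to drawing the same number of samples from each category's injectors.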
## Citation
```bibtex
@misc{sql-error-classifier,
title = {SQL Error Classifier - Multi-Tower},
author = {SQLErrorClassification},
year = {2025},
publisher = {Hugging Face},
}
```