metadata
language: en
license: mit
tags:
  - sql
  - education
  - text-classification
  - sentence-transformers
  - multi-tower
pipeline_tag: text-classification

SQL Error Classifier (Multi-Tower)

Lightweight classifier that identifies which SQL mistake area a student is struggling with, given:

  • Question: natural-language task
  • Schema: available tables and columns
  • Correct query: reference solution
  • Student query: what the student submitted
  • Error message (optional): database error text
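
As a rough illustration of how these five fields might be grouped into the three tower inputs described under Architecture, here is a minimal sketch. The function name `build_tower_inputs`, the `error_message` parameter name, and the newline separator are assumptions made for this example; the model card does not specify how the fields are joined.

```python
def build_tower_inputs(question, schema, correct_query, student_query, error_message=None):
    """Group the raw fields into one text per tower (hypothetical helper).

    Assumption: fields that share a tower are joined with a newline.
    """
    intent_text = f"{question}\n{schema}"
    reference_text = correct_query
    student_text = student_query
    if error_message:
        # The error message is optional and is appended only when present.
        student_text = f"{student_query}\n{error_message}"
    return {"intent": intent_text, "reference": reference_text, "student": student_text}


inputs = build_tower_inputs(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)
```

Each of the three resulting strings would then be passed to the sentence encoder separately.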

Architecture

Multi-tower semantic comparison using sentence-transformers/all-MiniLM-L6-v2:

  1. Intent tower: question + schema
  2. Reference tower: correct query
  3. Student tower: student query (+ error)
  4. Comparison layer: embedding diff, interaction, cosine similarities, SQL rule features
  5. Linear head: 15 error categories
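
The comparison layer in step 4 can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: it assumes "diff" means the element-wise absolute difference between the reference and student embeddings, "interaction" means their element-wise product, and the cosine similarities are taken over the three tower pairs. The SQL rule features are omitted, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def comparison_features(intent, reference, student):
    """Build one flat feature vector from the three tower embeddings.

    Assumed layout (not confirmed by the model card):
    |reference - student|, reference * student, then three pairwise cosines.
    """
    diff = [abs(r - s) for r, s in zip(reference, student)]
    interaction = [r * s for r, s in zip(reference, student)]
    sims = [
        cosine(reference, student),
        cosine(intent, reference),
        cosine(intent, student),
    ]
    return diff + interaction + sims


# Toy 4-dimensional stand-ins; all-MiniLM-L6-v2 produces 384-dimensional embeddings.
intent = [0.5, 0.5, 0.0, 0.0]
reference = [1.0, 0.0, 0.0, 0.0]
student = [0.0, 1.0, 0.0, 0.0]
features = comparison_features(intent, reference, student)
```

With d-dimensional embeddings this layout yields 2d + 3 features, which would be concatenated with any rule-based features before reaching the linear head.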

Error Categories (15)

ID Category
0 SYNTAX_ERROR
1 JOIN_ERROR
2 AGGREGATION_ERROR
3 HAVING_WHERE_ERROR
4 SUBQUERY_ERROR
5 WINDOW_FUNCTION_ERROR
6 NULL_HANDLING_ERROR
7 DATE_FUNCTION_ERROR
8 COLUMN_REFERENCE_ERROR
9 TABLE_REFERENCE_ERROR
10 DATA_TYPE_ERROR
11 DUPLICATE_RECORD_ERROR
12 LOGICAL_QUERY_ERROR
13 PERFORMANCE_ERROR
14 FILTERING_ERROR
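
For downstream code it can be handy to have the table above as a lookup. The sketch below is a convenience, with the names `ERROR_CATEGORIES`, `ID_TO_LABEL`, and `LABEL_TO_ID` chosen for this example; they are not taken from the repository.

```python
# Category names in ID order, copied from the table above.
ERROR_CATEGORIES = [
    "SYNTAX_ERROR",
    "JOIN_ERROR",
    "AGGREGATION_ERROR",
    "HAVING_WHERE_ERROR",
    "SUBQUERY_ERROR",
    "WINDOW_FUNCTION_ERROR",
    "NULL_HANDLING_ERROR",
    "DATE_FUNCTION_ERROR",
    "COLUMN_REFERENCE_ERROR",
    "TABLE_REFERENCE_ERROR",
    "DATA_TYPE_ERROR",
    "DUPLICATE_RECORD_ERROR",
    "LOGICAL_QUERY_ERROR",
    "PERFORMANCE_ERROR",
    "FILTERING_ERROR",
]

# Hypothetical helper mappings between integer IDs and label names.
ID_TO_LABEL = dict(enumerate(ERROR_CATEGORIES))
LABEL_TO_ID = {name: idx for idx, name in ID_TO_LABEL.items()}
```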

Usage

from src.huggingface import SQLLErrorClassifierHF

clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")

result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)

print(result["label_name"])    # LOGICAL_QUERY_ERROR
print(result["confidence"])    # 0.94
print(result["similarities"])  # semantic alignment scores

Gradio Demo

To run the Gradio demo, deploy app.py from this repository as a Hugging Face Space.

Model Details

  • Encoder: sentence-transformers/all-MiniLM-L6-v2 (loaded from Hub, not bundled)
  • Head: scikit-learn SGDClassifier + StandardScaler
  • Size: ~5 MB classifier head (encoder ~80 MB, cached separately)
  • Inference: ~100–200 ms on CPU

Training Data

The training data is synthetically generated from exercise templates using per-category error injectors. It contains 1M samples, balanced across the 15 classes.
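
To make the idea of template-based error injection concrete, here is a toy sketch. The template, the two injectors, and the `generate` function are all hypothetical and cover only two of the 15 categories; the actual generator is not described in this card.

```python
import random
from collections import Counter

# Hypothetical exercise template: a correct reference query.
TEMPLATE = (
    "SELECT department_id, COUNT(*) FROM students "
    "GROUP BY department_id HAVING COUNT(*) > 5"
)

# One illustrative injector per category (assumed, not the real injectors).
INJECTORS = {
    "SYNTAX_ERROR": lambda q: q.replace("FROM", "FORM", 1),
    "HAVING_WHERE_ERROR": lambda q: q.replace("HAVING", "WHERE", 1),
}


def generate(per_class, seed=0):
    """Produce `per_class` samples for every category, then shuffle them."""
    rng = random.Random(seed)
    samples = []
    for label, inject in INJECTORS.items():
        for _ in range(per_class):
            samples.append(
                {
                    "correct_query": TEMPLATE,
                    "student_query": inject(TEMPLATE),
                    "label_name": label,
                }
            )
    rng.shuffle(samples)
    return samples


samples = generate(per_class=3)
class_counts = Counter(s["label_name"] for s in samples)
```

Generating a fixed number of samples per category is one simple way to obtain a balanced dataset; a real generator would also vary the templates and schemas.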

Citation

@misc{sql-error-classifier,
  title  = {SQL Error Classifier - Multi-Tower},
  author = {SQLErrorClassification},
  year   = {2025},
  publisher = {Hugging Face},
}