metadata

language: en
license: mit
tags:
  - sql
  - education
  - text-classification
  - sentence-transformers
  - multi-tower
pipeline_tag: text-classification

SQL Error Classifier (Multi-Tower)

A lightweight classifier that identifies which category of SQL mistake a student is struggling with, given:

Question — natural-language task
Schema — available tables and columns
Correct query — reference solution
Student query — what the student submitted
Error message (optional) — database error text

Architecture

Multi-tower semantic comparison using sentence-transformers/all-MiniLM-L6-v2:

Intent tower — question + schema
Reference tower — correct query
Student tower — student query (+ error)
Comparison layer — embedding diff, interaction, cosine similarities, SQL rule features
Linear head — 15 error categories
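
The tower layout above can be sketched in a few lines. This is an illustration only, not the repository's implementation: the function names, the `[SEP]` separator, the use of an absolute difference, and the choice of an element-wise product for the "interaction" term are all assumptions. Embeddings are passed in as plain lists of floats so the sketch runs without the encoder.

```python
import math
from typing import List, Optional, Sequence, Tuple


def build_tower_texts(
    question: str,
    schema: str,
    correct_query: str,
    student_query: str,
    error_message: Optional[str] = None,
) -> Tuple[str, str, str]:
    """Compose one text per tower. The " [SEP] " separator is an assumption."""
    intent = f"{question} [SEP] {schema}"
    reference = correct_query
    student = student_query
    if error_message:
        # The error text is optional and is appended to the student tower only.
        student = f"{student_query} [SEP] {error_message}"
    return intent, reference, student


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def comparison_features(
    intent: Sequence[float],
    reference: Sequence[float],
    student: Sequence[float],
    rule_features: Sequence[float] = (),
) -> List[float]:
    """Concatenate diff, interaction, cosine similarities, and rule features."""
    # Assumed form of the "embedding diff": absolute element-wise difference.
    diff = [abs(r - s) for r, s in zip(reference, student)]
    # Assumed form of the "interaction": element-wise product.
    interaction = [r * s for r, s in zip(reference, student)]
    sims = [
        cosine(intent, reference),
        cosine(intent, student),
        cosine(reference, student),
    ]
    return diff + interaction + sims + list(rule_features)
```

With a 384-dimensional encoder such as all-MiniLM-L6-v2, this layout would give a feature vector of 384 + 384 + 3 entries plus however many rule features are appended; the linear head then classifies that vector.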

Error Categories (15)

ID	Category
0	SYNTAX_ERROR
1	JOIN_ERROR
2	AGGREGATION_ERROR
3	HAVING_WHERE_ERROR
4	SUBQUERY_ERROR
5	WINDOW_FUNCTION_ERROR
6	NULL_HANDLING_ERROR
7	DATE_FUNCTION_ERROR
8	COLUMN_REFERENCE_ERROR
9	TABLE_REFERENCE_ERROR
10	DATA_TYPE_ERROR
11	DUPLICATE_RECORD_ERROR
12	LOGICAL_QUERY_ERROR
13	PERFORMANCE_ERROR
14	FILTERING_ERROR
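
For downstream code it can help to mirror the table above as a lookup. The following is a minimal sketch; the names `SQLErrorCategory`, `ID2LABEL`, and `LABEL2ID` are illustrative and are not taken from the repository.

```python
from enum import IntEnum


class SQLErrorCategory(IntEnum):
    """The 15 category IDs and names, copied from the table in this card."""
    SYNTAX_ERROR = 0
    JOIN_ERROR = 1
    AGGREGATION_ERROR = 2
    HAVING_WHERE_ERROR = 3
    SUBQUERY_ERROR = 4
    WINDOW_FUNCTION_ERROR = 5
    NULL_HANDLING_ERROR = 6
    DATE_FUNCTION_ERROR = 7
    COLUMN_REFERENCE_ERROR = 8
    TABLE_REFERENCE_ERROR = 9
    DATA_TYPE_ERROR = 10
    DUPLICATE_RECORD_ERROR = 11
    LOGICAL_QUERY_ERROR = 12
    PERFORMANCE_ERROR = 13
    FILTERING_ERROR = 14


# Plain dictionaries for converting between integer IDs and label names.
ID2LABEL = {category.value: category.name for category in SQLErrorCategory}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}
```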

Usage

from src.huggingface import SQLLErrorClassifierHF

clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")

result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)

print(result["label_name"])    # e.g. LOGICAL_QUERY_ERROR
print(result["confidence"])    # e.g. 0.94
print(result["similarities"])  # semantic alignment scores

Gradio Demo

Deploy as a Hugging Face Space with app.py from this repository.

Model Details

Encoder: sentence-transformers/all-MiniLM-L6-v2 (loaded from Hub, not bundled)
Head: scikit-learn SGDClassifier + StandardScaler
Size: ~5 MB classifier head (encoder ~80 MB, cached separately)
Inference: ~100–200 ms on CPU

Training Data

Synthetically generated from exercise templates using per-category error injectors. The dataset contains 1M samples, balanced across the 15 classes.
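
To make the idea of per-category error injectors concrete, here is a hedged sketch of how such a generator could look. It is not the actual generation pipeline: the two injectors, the `INJECTORS` registry, and `generate_balanced` are hypothetical, and a real setup would need one or more injectors for each of the 15 categories.

```python
import random
import re
from typing import Callable, Dict, List, Optional

# An injector takes a correct query and returns a broken variant,
# or None when the query has nothing it can break.
Injector = Callable[[str], Optional[str]]


def inject_having_where_error(sql: str) -> Optional[str]:
    """Replace the first HAVING with WHERE (hypothetical injector)."""
    mutated, count = re.subn(r"\bHAVING\b", "WHERE", sql, count=1, flags=re.IGNORECASE)
    return mutated if count else None


def inject_missing_group_by(sql: str) -> Optional[str]:
    """Drop the GROUP BY clause (hypothetical injector)."""
    pattern = r"\s+GROUP\s+BY\s+[\w.]+(?:\s*,\s*[\w.]+)*"
    mutated, count = re.subn(pattern, "", sql, count=1, flags=re.IGNORECASE)
    return mutated if count else None


INJECTORS: Dict[str, Injector] = {
    "HAVING_WHERE_ERROR": inject_having_where_error,
    "AGGREGATION_ERROR": inject_missing_group_by,
}


def generate_balanced(templates: List[str], per_class: int, seed: int = 0) -> List[dict]:
    """Produce up to `per_class` samples for every registered category."""
    rng = random.Random(seed)
    samples: List[dict] = []
    for label, inject in INJECTORS.items():
        produced, attempts = 0, 0
        # Cap attempts so an injector that never applies cannot loop forever.
        while produced < per_class and attempts < per_class * 20:
            attempts += 1
            correct = rng.choice(templates)
            student = inject(correct)
            if student is None:
                continue
            samples.append(
                {"correct_query": correct, "student_query": student, "label": label}
            )
            produced += 1
    rng.shuffle(samples)
    return samples
```

Generating the same number of samples per category, as in this sketch, is one simple way to keep the classes balanced.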

Citation

@misc{sql-error-classifier,
  title  = {SQL Error Classifier - Multi-Tower},
  author = {SQLErrorClassification},
  year   = {2025},
  publisher = {Hugging Face},
}