---
language: en
license: mit
tags:
- sql
- education
- text-classification
- sentence-transformers
- multi-tower
pipeline_tag: text-classification
---

# SQL Error Classifier (Multi-Tower)

A lightweight classifier that identifies which category of SQL mistake a student is making, given:

- Question: the natural-language task
- Schema: the available tables and columns
- Correct query: the reference solution
- Student query: what the student submitted
- Error message (optional): the database error text
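The five inputs map onto the three encoder towers described under Architecture. The sketch below shows one way to hold a single submission and group its fields into tower texts. The `SQLAttempt` class, its `tower_texts` method, and the newline separator are all hypothetical illustrations, not this repository's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SQLAttempt:
    """One student submission (hypothetical container; field names are illustrative)."""
    question: str
    schema: str
    correct_query: str
    student_query: str
    error_message: Optional[str] = None  # optional, as in the input list

    def tower_texts(self) -> dict:
        """Group the fields into intent / reference / student tower inputs.

        The newline separator is an assumption, not necessarily what the model uses.
        """
        student = self.student_query
        if self.error_message:  # append the database error only when present
            student = f"{student}\n{self.error_message}"
        return {
            "intent": f"{self.question}\n{self.schema}",
            "reference": self.correct_query,
            "student": student,
        }
```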

## Architecture

Multi-tower semantic comparison using `sentence-transformers/all-MiniLM-L6-v2`:

1. Intent tower: question + schema
2. Reference tower: correct query
3. Student tower: student query (+ error)
4. Comparison layer: embedding diff, interaction, cosine similarities, SQL rule features
5. Linear head: 15 error categories
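A minimal sketch of how a comparison layer like step 4 could turn three tower embeddings into one feature vector. Several choices here are assumptions rather than the model's documented design: the "diff" is taken as an absolute difference, the "interaction" as an element-wise product, the three cosine pairs are picked for illustration, and `SQL_KEYWORDS` / `rule_features` are a hypothetical, deliberately crude rule set based on substring matching. In practice the vectors would come from `SentenceTransformer.encode`, whose all-MiniLM-L6-v2 embeddings are 384-dimensional.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Small epsilon guards against division by zero for all-zero vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


# Hypothetical rule set: keywords whose presence/absence often signals a category.
SQL_KEYWORDS = ("JOIN", "GROUP BY", "HAVING", "WHERE", "OVER", "DISTINCT", "IS NULL")


def rule_features(correct_query: str, student_query: str) -> np.ndarray:
    """1.0 where a keyword appears in exactly one of the two queries (crude substring check)."""
    c, s = correct_query.upper(), student_query.upper()
    return np.array([float((kw in c) != (kw in s)) for kw in SQL_KEYWORDS])


def comparison_features(
    intent: np.ndarray,
    reference: np.ndarray,
    student: np.ndarray,
    correct_query: str,
    student_query: str,
) -> np.ndarray:
    diff = np.abs(reference - student)  # assumed form of "embedding diff"
    interaction = reference * student   # assumed form of "interaction"
    sims = np.array([
        cosine(reference, student),     # does the student query match the reference?
        cosine(intent, student),        # does it address the question + schema?
        cosine(intent, reference),
    ])
    return np.concatenate([diff, interaction, sims, rule_features(correct_query, student_query)])
```

With 384-dimensional embeddings this yields 384 + 384 + 3 + 7 = 778 features, which a linear head can consume directly.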

## Error Categories (15)

| ID | Category |
|----|----------|
| 0 | SYNTAX_ERROR |
| 1 | JOIN_ERROR |
| 2 | AGGREGATION_ERROR |
| 3 | HAVING_WHERE_ERROR |
| 4 | SUBQUERY_ERROR |
| 5 | WINDOW_FUNCTION_ERROR |
| 6 | NULL_HANDLING_ERROR |
| 7 | DATE_FUNCTION_ERROR |
| 8 | COLUMN_REFERENCE_ERROR |
| 9 | TABLE_REFERENCE_ERROR |
| 10 | DATA_TYPE_ERROR |
| 11 | DUPLICATE_RECORD_ERROR |
| 12 | LOGICAL_QUERY_ERROR |
| 13 | PERFORMANCE_ERROR |
| 14 | FILTERING_ERROR |
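If you need the table above as a lookup in code (for example to decode a predicted class ID), a plain mapping works. The names `ERROR_CATEGORIES`, `ID2LABEL`, and `LABEL2ID` are hypothetical; the repository may define its own mapping.

```python
# Category names in ID order, copied from the table above.
ERROR_CATEGORIES = [
    "SYNTAX_ERROR",
    "JOIN_ERROR",
    "AGGREGATION_ERROR",
    "HAVING_WHERE_ERROR",
    "SUBQUERY_ERROR",
    "WINDOW_FUNCTION_ERROR",
    "NULL_HANDLING_ERROR",
    "DATE_FUNCTION_ERROR",
    "COLUMN_REFERENCE_ERROR",
    "TABLE_REFERENCE_ERROR",
    "DATA_TYPE_ERROR",
    "DUPLICATE_RECORD_ERROR",
    "LOGICAL_QUERY_ERROR",
    "PERFORMANCE_ERROR",
    "FILTERING_ERROR",
]

ID2LABEL = dict(enumerate(ERROR_CATEGORIES))                  # 0 -> "SYNTAX_ERROR", ...
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}      # "SYNTAX_ERROR" -> 0, ...
```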

## Usage

```python
# `src.huggingface` is a module from this repository
from src.huggingface import SQLLErrorClassifierHF

clf = SQLLErrorClassifierHF.from_pretrained("YOUR_USERNAME/sql-error-classifier")

result = clf.predict(
    question="What is the average score of students in each department?",
    schema="students(id, name, score, department_id) | departments(id, name)",
    correct_query="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    student_query="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
)

print(result["label_name"])    # e.g. LOGICAL_QUERY_ERROR
print(result["confidence"])    # e.g. 0.94
print(result["similarities"])  # semantic alignment scores
```

## Gradio Demo

The model can be deployed as a [Hugging Face Space](https://huggingface.co/docs/hub/spaces) using `app.py` from this repository.

## Model Details

- Encoder: `sentence-transformers/all-MiniLM-L6-v2` (loaded from the Hub, not bundled)
- Head: scikit-learn SGDClassifier + StandardScaler
- Size: ~5 MB classifier head (the ~80 MB encoder is cached separately)
- Inference: ~100–200 ms on CPU

## Training Data

Synthetically generated from exercise templates with per-category error injectors: 1M samples, balanced across the 15 classes.
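To make "error injector" concrete, here is a minimal sketch of two hypothetical injectors that corrupt a correct query for a single category, plus a helper that splits a sample budget evenly across classes. The function names and rewrite rules are illustrative assumptions, not the repository's generators.

```python
import re


def inject_aggregation_error(query: str) -> str:
    # Hypothetical AGGREGATION_ERROR injector: swap the first AVG( for SUM(.
    return re.sub(r"\bAVG\(", "SUM(", query, count=1, flags=re.IGNORECASE)


def inject_filtering_error(query: str) -> str:
    # Hypothetical FILTERING_ERROR injector: flip the first ">" to "<" (crude, sketch only).
    return query.replace(">", "<", 1)


INJECTORS = {
    "AGGREGATION_ERROR": inject_aggregation_error,
    "FILTERING_ERROR": inject_filtering_error,
}


def balanced_counts(total: int, n_classes: int) -> list[int]:
    # Spread `total` as evenly as possible; the first `total % n_classes` classes get one extra.
    base, extra = divmod(total, n_classes)
    return [base + (1 if i < extra else 0) for i in range(n_classes)]
```

For 1M samples over 15 classes, `balanced_counts(1_000_000, 15)` gives 66,667 samples to each of the first 10 classes and 66,666 to each of the remaining 5.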

## Citation

```bibtex
@misc{sql-error-classifier,
  title = {SQL Error Classifier - Multi-Tower},
  author = {SQLErrorClassification},
  year = {2025},
  publisher = {Hugging Face},
}
```