---
title: SQL Error Classifier Training
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
hardware: t4-small
---

# SQL Error Classifier — CodeBERT Training Space

Train `microsoft/codebert-base` as a **cross-encoder** for multi-label SQL error classification.
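
A cross-encoder reads both sides of a comparison in one input sequence, so the model can attend across the student's query and its context. The sketch below shows one plausible way to build that pair from a dataset row. The function name `build_cross_encoder_pair`, the field order, and the `Question:` / `Schema:` / `Reference SQL:` prefixes are assumptions for illustration, not the Space's actual preprocessing.

```python
from typing import Tuple


def build_cross_encoder_pair(
    question: str, schema: str, student_sql: str, correct_sql: str
) -> Tuple[str, str]:
    """Return (segment_a, segment_b) for a pair-input tokenizer call.

    Assumption: the task context goes in segment A and the query being
    judged goes in segment B. The real layout may differ.
    """
    segment_a = (
        f"Question: {question}\n"
        f"Schema: {schema}\n"
        f"Reference SQL: {correct_sql}"
    )
    segment_b = student_sql
    return segment_a, segment_b


segment_a, segment_b = build_cross_encoder_pair(
    question="How many orders did each customer place?",
    schema="orders(id, customer_id, total)",
    student_sql="SELECT customer_id, COUNT(*) FROM orders",
    correct_sql="SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id",
)
print(segment_a)
print(segment_b)
```

With a Hugging Face tokenizer, the two segments would typically be passed together, for example `tokenizer(segment_a, segment_b, truncation=True, max_length=512)`, so that they end up in a single sequence joined by separator tokens.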

## Setup

1. **Hardware:** Settings → Hardware → **GPU t4-small** (recommended)
2. **Secrets:** Settings → Secrets → add `HF_TOKEN` (Hugging Face write token) to push models to your account
3. **Data:** Include `data/sql_errors_dev.parquet` in this Space repo, or upload a Parquet file at runtime
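
The data and secret steps above can be sketched as a small startup check. This is a minimal illustration that assumes Space secrets are exposed as environment variables and that an uploaded file takes precedence over the bundled one. The helper names `resolve_dataset` and `hub_token` are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Path of the bundled dataset, as named in the setup steps.
DEFAULT_DATA_PATH = Path("data/sql_errors_dev.parquet")


def resolve_dataset(
    uploaded: Optional[str] = None, bundled: Path = DEFAULT_DATA_PATH
) -> Path:
    """Pick the uploaded file if one was given, else the bundled dataset."""
    if uploaded:
        return Path(uploaded)
    if bundled.exists():
        return bundled
    raise FileNotFoundError(
        f"No upload provided and bundled dataset not found at {bundled}"
    )


def hub_token() -> Optional[str]:
    """Return the HF_TOKEN secret, or None if it is unset or empty."""
    return os.environ.get("HF_TOKEN") or None


if hub_token() is None:
    print("HF_TOKEN is not set; pushing to the Hub will not be possible.")
```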

## Usage

1. Choose the bundled dataset or upload your own Parquet file
2. Set the number of epochs, the batch size, and the maximum number of samples
3. Click **Start Training**
4. Optionally enable **Push to Hub** with a model id such as `your-username/sql-codebert-classifier` (requires the `HF_TOKEN` secret)

## Dataset columns

All five columns are required. Where an alias is listed, either name is accepted:

| Column | Aliases |
|--------|---------|
| `question` | — |
| `schema` | — |
| `student_sql` | `query` |
| `correct_sql` | `correct_query` |
| `error_labels` | `label_name` |
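
The alias table can be applied as a rename step before training. The sketch below works on column names only, so it runs without pandas. The function name `resolve_columns` is hypothetical, and preferring the canonical name when both it and its alias are present is an assumption.

```python
from typing import Dict, Iterable

# Alias -> canonical column name, taken from the table above.
ALIASES = {
    "query": "student_sql",
    "correct_query": "correct_sql",
    "label_name": "error_labels",
}
REQUIRED = ("question", "schema", "student_sql", "correct_sql", "error_labels")


def resolve_columns(columns: Iterable[str]) -> Dict[str, str]:
    """Return a rename mapping that converts aliases to canonical names.

    Raises ValueError if a required column is missing under both names.
    """
    present = set(columns)
    mapping = {
        alias: canonical
        for alias, canonical in ALIASES.items()
        if alias in present and canonical not in present
    }
    resolved = (present - set(mapping)) | set(mapping.values())
    missing = [name for name in REQUIRED if name not in resolved]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return mapping


mapping = resolve_columns(
    ["question", "schema", "query", "correct_query", "label_name"]
)
print(mapping)
```

With a pandas DataFrame, the mapping could then be applied with `df.rename(columns=mapping)`.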

## Labels (9-class multi-label)

`JOIN_ERROR`, `AGGREGATION_ERROR`, `FILTER_ERROR`, `WINDOW_FUNCTION_ERROR`,
`SUBQUERY_ERROR`, `NULL_HANDLING_ERROR`, `PERFORMANCE_ERROR`, `LOGICAL_ERROR`, `SYNTAX_ERROR`
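
In a multi-label setup, each example can carry several of these labels at once, so targets are usually multi-hot vectors and predictions are read off with an independent sigmoid per label. The sketch below illustrates that convention. The label order, the 0.5 threshold, and the assumption that `error_labels` holds a list of label names are illustrative choices, not confirmed details of this Space.

```python
import math
from typing import List, Sequence

# Assumption: label order follows the list above.
LABELS = [
    "JOIN_ERROR", "AGGREGATION_ERROR", "FILTER_ERROR",
    "WINDOW_FUNCTION_ERROR", "SUBQUERY_ERROR", "NULL_HANDLING_ERROR",
    "PERFORMANCE_ERROR", "LOGICAL_ERROR", "SYNTAX_ERROR",
]


def encode_labels(names: Sequence[str]) -> List[float]:
    """Turn a list of label names into a 9-element multi-hot vector."""
    unknown = set(names) - set(LABELS)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1.0 if label in names else 0.0 for label in LABELS]


def decode_logits(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Apply a sigmoid to each logit and keep labels at or above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(LABELS, probs) if p >= threshold]


target = encode_labels(["JOIN_ERROR", "SYNTAX_ERROR"])
predicted = decode_logits([2.0, -3.0, -0.5, -1.0, -1.0, -1.0, -1.0, -1.0, 1.5])
print(target)
print(predicted)
```

Because each label gets its own sigmoid, the predicted set can be empty or contain several labels, which is what distinguishes this from single-label softmax classification.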