---
language: en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - text-classification
  - intent-classification
  - onnx
  - triton-inference-server
datasets:
  - custom
pipeline_tag: text-classification
---

# distilbert-intent-sql-creative-general

Fine-tuned [distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) for 3-class intent routing in an LLM inference pipeline.

## Purpose

Routes user prompts to the appropriate vLLM LoRA adapter on a Triton Inference Server:

| Label | ID | Routes to |
|---|---|---|
| `GENERAL` | 0 | Qwen2.5-7B-Instruct (no LoRA) |
| `SQL` | 1 | `sql-expert` LoRA adapter |
| `CREATIVE` | 2 | `creative` LoRA adapter |
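The routing step itself reduces to a label-to-adapter lookup. A minimal sketch (the helper name, dict, and payload key are illustrative, not part of this repo):

```python
# Hypothetical routing helper: maps the classifier's predicted label to the
# extra request parameters for the downstream vLLM model on Triton.
# The "lora_name" key is an assumption; the real payload depends on how the
# vLLM backend is configured.
LABEL_TO_ADAPTER = {
    "GENERAL": None,          # base Qwen2.5-7B-Instruct, no LoRA
    "SQL": "sql-expert",      # sql-expert LoRA adapter
    "CREATIVE": "creative",   # creative LoRA adapter
}

def route(label: str) -> dict:
    """Return extra generation kwargs for the routed request."""
    adapter = LABEL_TO_ADAPTER[label]
    return {} if adapter is None else {"lora_name": adapter}
```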

## Training

- **Base model**: `distilbert/distilbert-base-uncased`
- **Dataset**: 84 hand-curated examples (SQL=30, CREATIVE=23, GENERAL=31)
- **Epochs**: 5
- **Learning rate**: 2e-5
- **Batch size**: 16
- **Max sequence length**: 128
- **Optimizer**: AdamW (weight_decay=0.01)
- **Val split**: 20% stratified
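For context, the hyperparameters above imply a very short training run; a back-of-envelope sketch (split arithmetic only, values taken from the card):

```python
import math

# Values reported in the card; the arithmetic below is illustrative.
NUM_EXAMPLES = 84
VAL_FRACTION = 0.20
BATCH_SIZE = 16
EPOCHS = 5

n_val = round(NUM_EXAMPLES * VAL_FRACTION)         # 17 validation examples
n_train = NUM_EXAMPLES - n_val                     # 67 training examples
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)  # 5 optimizer steps/epoch
total_steps = steps_per_epoch * EPOCHS             # 25 steps total
```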

## Deployment

Exported to ONNX (opset 17) via [optimum](https://github.com/huggingface/optimum) and served
through the ONNX Runtime backend of NVIDIA Triton Inference Server on GKE Autopilot
(NVIDIA L4 GPU).
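A Triton model configuration for the ONNX export might look like the sketch below. The model name, batch size, and tensor shapes are assumptions; the input names (`input_ids`, `attention_mask`) and the `logits` output match the standard optimum export of a DistilBERT sequence classifier:

```
name: "intent_classifier"
platform: "onnxruntime_onnx"
max_batch_size: 16
input [
  { name: "input_ids", data_type: TYPE_INT64, dims: [ -1 ] },
  { name: "attention_mask", data_type: TYPE_INT64, dims: [ -1 ] }
]
output [
  { name: "logits", data_type: TYPE_FP32, dims: [ 3 ] }
]
```

The `dims: [ 3 ]` output corresponds to the three intent classes in the table above.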

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="xczou/distilbert-intent-sql-creative-general")
classifier("Write a SQL query to find all orders above 100")
# [{'label': 'SQL', 'score': 0.98}]
```