---
language: en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- intent-classification
- onnx
- triton-inference-server
datasets:
- custom
pipeline_tag: text-classification
---
# distilbert-intent-sql-creative-general
Fine-tuned [distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) for 3-class intent routing in an LLM inference pipeline.
## Purpose
Routes each user prompt either to the base model or to the matching vLLM LoRA adapter on a Triton Inference Server:
| Label | ID | Routes to |
|---|---|---|
| `GENERAL` | 0 | Qwen2.5-7B-Instruct (no LoRA) |
| `SQL` | 1 | `sql-expert` LoRA adapter |
| `CREATIVE` | 2 | `creative` LoRA adapter |
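
The table above can be expressed as a small lookup on the classifier's output. The following is a minimal sketch, not the deployed router: the names `ROUTES`, `route`, and the `min_score` confidence threshold are hypothetical and do not come from this card. It assumes that low-confidence or unknown labels should fall back to the base model.

```python
# Hypothetical routing table: predicted label -> LoRA adapter name (None = base model only).
BASE_MODEL = "Qwen2.5-7B-Instruct"
ROUTES = {
    "GENERAL": None,        # no LoRA adapter
    "SQL": "sql-expert",
    "CREATIVE": "creative",
}


def route(label: str, score: float, min_score: float = 0.5) -> dict:
    """Map a classifier prediction to a serving target.

    min_score is an assumed threshold (not from the model card): predictions
    below it, or labels missing from ROUTES, fall back to the base model.
    """
    adapter = ROUTES.get(label) if score >= min_score else None
    return {"model": BASE_MODEL, "lora_adapter": adapter}


print(route("SQL", 0.98))       # confident SQL prediction -> sql-expert adapter
print(route("CREATIVE", 0.31))  # low confidence -> base model, no adapter
```

Falling back to `GENERAL` on uncertain predictions is one possible design choice; it keeps a misrouted prompt on the general-purpose model rather than on a specialised adapter.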
## Training
- **Base model**: `distilbert/distilbert-base-uncased`
- **Dataset**: 84 hand-curated examples (SQL=30, CREATIVE=23, GENERAL=31)
- **Epochs**: 5
- **Learning rate**: 2e-5
- **Batch size**: 16
- **Max sequence length**: 128
- **Optimizer**: AdamW (weight_decay=0.01)
- **Validation split**: 20%, stratified by class
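
To make the split and step counts concrete, here is a rough sketch of a 20% stratified split over the class counts listed above. The helper `stratified_split` is illustrative only. The card does not say which tool produced the actual split, and per-class rounding may differ between libraries, so the exact counts are an assumption.

```python
import math
import random
from collections import Counter

# Class counts and hyperparameters from this card; the examples themselves are placeholders.
CLASS_COUNTS = {"SQL": 30, "CREATIVE": 23, "GENERAL": 31}
VAL_FRACTION = 0.2
BATCH_SIZE = 16
EPOCHS = 5


def stratified_split(labels, val_fraction, seed=0):
    """Return (train_idx, val_idx), holding out val_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)  # assumed rounding rule
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = [name for name, n in CLASS_COUNTS.items() for _ in range(n)]
train_idx, val_idx = stratified_split(labels, VAL_FRACTION)

val_counts = Counter(labels[i] for i in val_idx)
steps_per_epoch = math.ceil(len(train_idx) / BATCH_SIZE)

print(len(train_idx), len(val_idx), dict(val_counts))
# 67 17 {'SQL': 6, 'CREATIVE': 5, 'GENERAL': 6}
print(steps_per_epoch, steps_per_epoch * EPOCHS)
# 5 25
```

Under these assumptions (single device, no gradient accumulation) training amounts to roughly 25 optimizer steps, and with about 17 validation examples each misclassification shifts validation accuracy by close to 6 percentage points.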
## Deployment
Exported to ONNX (opset 17) via [optimum](https://github.com/huggingface/optimum) and served
through the ONNX Runtime backend of NVIDIA Triton Inference Server on GKE Autopilot
(NVIDIA L4 GPU).
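
An ONNX export of a sequence-classification model typically returns raw logits rather than label strings, so a client of the Triton endpoint would likely need to post-process the output itself. The sketch below assumes the response has already been unpacked into a length-3 list of logits in label-ID order; `ID2LABEL`, `softmax`, and `decode` are illustrative names, and the logits are made up.

```python
import math

# Label IDs as listed in the routing table of this card.
ID2LABEL = {0: "GENERAL", 1: "SQL", 2: "CREATIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Turn one row of logits into a {'label', 'score'} prediction."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Made-up logits for illustration; index 1 (SQL) is the largest.
print(decode([-1.2, 3.4, -0.7]))
```

The result has the same shape as a single entry returned by the `transformers` pipeline shown under Usage, which makes it easier to swap one path for the other.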
## Usage
```python
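from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="xczou/distilbert-intent-sql-creative-general")

# Classify a single prompt; the result is a list with one dict per input.
classifier("Write a SQL query to find all orders above 100")
# e.g. [{'label': 'SQL', 'score': 0.98}]  (the exact score may differ)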
```