---
language: en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- intent-classification
- onnx
- triton-inference-server
datasets:
- custom
pipeline_tag: text-classification
---

# distilbert-intent-sql-creative-general

Fine-tuned [distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) for 3-class intent routing in an LLM inference pipeline.

## Purpose

Routes user prompts to the appropriate vLLM LoRA adapter on a Triton Inference Server:

| Label | ID | Routes to |
|---|---|---|
| `GENERAL` | 0 | Qwen2.5-7B-Instruct (no LoRA) |
| `SQL` | 1 | `sql-expert` LoRA adapter |
| `CREATIVE` | 2 | `creative` LoRA adapter |
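
The table above maps cleanly to a small dispatch step in the pipeline. A minimal sketch of that routing, assuming the adapter names from the table; the `route` helper and the base-model identifier `qwen2.5-7b-instruct` are hypothetical illustrations, not the actual pipeline code:

```python
# Hypothetical routing helper: maps the classifier's predicted label to the
# vLLM model/adapter name to request from Triton. Adapter names follow the
# table above; the base-model name is an assumption for illustration.
LABEL_TO_ADAPTER = {
    "GENERAL": None,          # no LoRA: fall through to the base model
    "SQL": "sql-expert",
    "CREATIVE": "creative",
}

def route(label: str, base_model: str = "qwen2.5-7b-instruct") -> str:
    """Return the model/adapter name to request for a predicted intent label."""
    adapter = LABEL_TO_ADAPTER.get(label)
    return adapter if adapter is not None else base_model

print(route("SQL"))      # sql-expert
print(route("GENERAL"))  # qwen2.5-7b-instruct
```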

## Training

- **Base model**: `distilbert/distilbert-base-uncased`
- **Dataset**: 84 hand-curated examples (SQL=30, CREATIVE=23, GENERAL=31)
- **Epochs**: 5
- **Learning rate**: 2e-5
- **Batch size**: 16
- **Max sequence length**: 128
- **Optimizer**: AdamW (weight_decay=0.01)
- **Val split**: 20% stratified
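
With only 84 examples, the stratified split matters: each class keeps roughly the same 80/20 train/val ratio. A minimal stdlib sketch of that idea (the `stratified_split` function is hypothetical; in practice this is typically done with `sklearn.model_selection.train_test_split(..., stratify=labels)`):

```python
import random
from collections import defaultdict

def stratified_split(examples, val_fraction=0.2, seed=42):
    """Split (text, label) pairs so each label contributes ~val_fraction to val."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Toy data with the card's class balance: SQL=30, CREATIVE=23, GENERAL=31.
data = [("q", "SQL")] * 30 + [("q", "CREATIVE")] * 23 + [("q", "GENERAL")] * 31
train, val = stratified_split(data)
print(len(train), len(val))  # 67 17
```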

## Deployment

Exported to ONNX (opset 17) via [optimum](https://github.com/huggingface/optimum) and served
as an ONNX Runtime backend model inside NVIDIA Triton Inference Server on GKE Autopilot
(NVIDIA L4 GPU).
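
A sketch of what the Triton `config.pbtxt` for such a model might look like. The model name, `max_batch_size`, and tensor names are assumptions (DistilBERT ONNX exports typically expose `input_ids`/`attention_mask` inputs and a `logits` output); the actual deployed config may differ:

```protobuf
name: "intent_classifier"
platform: "onnxruntime_onnx"
max_batch_size: 32
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 3 ]   # GENERAL, SQL, CREATIVE
  }
]
instance_group [ { kind: KIND_GPU } ]
```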

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="xczou/distilbert-intent-sql-creative-general")
classifier("Write a SQL query to find all orders above 100")
# [{'label': 'SQL', 'score': 0.98}]
```