---
language: en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- intent-classification
- onnx
- triton-inference-server
datasets:
- custom
pipeline_tag: text-classification
---

# distilbert-intent-sql-creative-general

Fine-tuned [distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) for 3-class intent routing in an LLM inference pipeline.

## Purpose

Routes user prompts to the appropriate vLLM LoRA adapter on a Triton Inference Server:

| Label | ID | Routes to |
|---|---|---|
| `GENERAL` | 0 | Qwen2.5-7B-Instruct (no LoRA) |
| `SQL` | 1 | `sql-expert` LoRA adapter |
| `CREATIVE` | 2 | `creative` LoRA adapter |

## Training

- **Base model**: `distilbert/distilbert-base-uncased`
- **Dataset**: 84 hand-curated examples (SQL=30, CREATIVE=23, GENERAL=31)
- **Epochs**: 5
- **Learning rate**: 2e-5
- **Batch size**: 16
- **Max sequence length**: 128
- **Optimizer**: AdamW (weight_decay=0.01)
- **Val split**: 20% stratified

## Deployment

Exported to ONNX (opset 17) via [optimum](https://github.com/huggingface/optimum) and served as an ONNX Runtime backend model inside NVIDIA Triton Inference Server on GKE Autopilot (NVIDIA L4 GPU).

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="xczou/distilbert-intent-sql-creative-general")
classifier("Write a SQL query to find all orders above 100")
# [{'label': 'SQL', 'score': 0.98}]
```
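
The routing step described under Purpose can be sketched as a plain lookup from predicted label to adapter name. This is a minimal illustration, not the deployed routing code: the `route` helper, the `LABEL_TO_ADAPTER` mapping, and the `min_score` fallback threshold are hypothetical names and choices introduced here. Only the labels, the base model, and the adapter names come from the table in this card.

```python
from typing import Optional, Tuple

# Base model and adapter names are taken from the routing table in this card.
BASE_MODEL = "Qwen2.5-7B-Instruct"
LABEL_TO_ADAPTER = {
    "GENERAL": None,          # base model, no LoRA
    "SQL": "sql-expert",
    "CREATIVE": "creative",
}


def route(prediction: dict, min_score: float = 0.5) -> Tuple[str, Optional[str]]:
    """Return (base_model, adapter_or_None) for one classifier prediction.

    `prediction` is one element of the pipeline output, e.g.
    {"label": "SQL", "score": 0.98}.
    """
    label = prediction["label"]
    score = prediction["score"]
    # Assumption (not stated in this card): unknown labels and
    # low-confidence predictions fall back to the base model without LoRA.
    if label not in LABEL_TO_ADAPTER or score < min_score:
        return BASE_MODEL, None
    return BASE_MODEL, LABEL_TO_ADAPTER[label]
```

With the `classifier` from the usage example, `route(classifier(prompt)[0])` would give the model and adapter to request from vLLM. Falling back to the base model on low confidence is one possible design choice. With only 84 training examples, a conservative threshold may be worth tuning against held-out prompts.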