language: en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
  - text-classification
  - intent-classification
  - onnx
  - triton-inference-server
datasets:
  - custom
pipeline_tag: text-classification

distilbert-intent-sql-creative-general

Fine-tuned distilbert-base-uncased for 3-class intent routing in an LLM inference pipeline.

Purpose

Routes each user prompt to the base model or to the matching vLLM LoRA adapter on a Triton Inference Server:

| Label | ID | Routes to |
|----------|----|-------------------------------|
| GENERAL | 0 | Qwen2.5-7B-Instruct (no LoRA) |
| SQL | 1 | sql-expert LoRA adapter |
| CREATIVE | 2 | creative LoRA adapter |
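
The routing table above can be written as a small lookup. This is only a sketch: the adapter names come from the table, but the `select_adapter` helper, the `min_score` threshold, and the fall-back to the base model on low confidence are assumptions added for illustration, not part of the published pipeline.

```python
# Label -> LoRA adapter name, taken from the routing table.
# None means "use the base Qwen2.5-7B-Instruct model without a LoRA adapter".
ADAPTER_BY_LABEL = {
    "GENERAL": None,
    "SQL": "sql-expert",
    "CREATIVE": "creative",
}


def select_adapter(label, score, min_score=0.5):
    """Return the adapter name for a prediction, or None for the base model.

    `min_score` is a hypothetical safeguard: predictions below it, and any
    unknown label, fall back to the base model.
    """
    if score < min_score:
        return None
    return ADAPTER_BY_LABEL.get(label)


print(select_adapter("SQL", 0.98))       # sql-expert
print(select_adapter("CREATIVE", 0.30))  # None (low confidence, base model)
```

Falling back to GENERAL keeps a misrouted prompt on the unspecialized model, which is usually the least harmful default for a classifier trained on a small dataset.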

Training

  • Base model: distilbert/distilbert-base-uncased
  • Dataset: 84 hand-curated examples (SQL=30, CREATIVE=23, GENERAL=31)
  • Epochs: 5
  • Learning rate: 2e-5
  • Batch size: 16
  • Max sequence length: 128
  • Optimizer: AdamW (weight_decay=0.01)
  • Val split: 20% stratified
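
To give a sense of scale, the settings above imply a very short training run. The sketch below reproduces the arithmetic with a plain-Python stratified split. The class counts, split fraction, batch size, and epoch count come from the list above; the per-class rounding, the absence of gradient accumulation, and the `stratified_split` helper itself are assumptions, so the exact counts may differ slightly from the original run.

```python
import math
import random

# Values from the model card.
CLASS_COUNTS = {"SQL": 30, "CREATIVE": 23, "GENERAL": 31}
VAL_FRACTION = 0.2
BATCH_SIZE = 16
EPOCHS = 5


def stratified_split(labels, val_fraction, seed=0):
    """Return (train_idx, val_idx), holding out ~val_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: round per class; other tools may round differently.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]
train_idx, val_idx = stratified_split(labels, VAL_FRACTION)

# Assumption: the last partial batch is kept and there is no gradient accumulation.
steps_per_epoch = math.ceil(len(train_idx) / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(len(train_idx), len(val_idx), total_steps)  # 67 17 25
```

Under these assumptions the model sees 67 training examples and takes about 25 optimizer steps, with 17 examples held out for validation.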

Deployment

Exported to ONNX (opset 17) via optimum and served with the ONNX Runtime backend of NVIDIA Triton Inference Server on GKE Autopilot (NVIDIA L4 GPU).
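
An ONNX export of a sequence-classification model typically returns raw logits rather than the labeled scores that the `transformers` pipeline produces, so a client of the Triton endpoint has to apply the softmax and the ID-to-label lookup itself. A minimal sketch, assuming the model returns one logit per class in ID order (0 = GENERAL, 1 = SQL, 2 = CREATIVE); the `decode` helper and the example logits are made up for illustration.

```python
import math

# ID -> label mapping from the routing table in this card.
ID2LABEL = {0: "GENERAL", 1: "SQL", 2: "CREATIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Turn raw class logits into a {'label', 'score'} dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Made-up logits standing in for one row of the model's output tensor.
example = decode([-1.2, 3.4, -0.7])
print(example["label"], round(example["score"], 3))
```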

Usage

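```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline("text-classification", model="xczou/distilbert-intent-sql-creative-general")

# Returns a list with one dict per input: the predicted label and its score.
classifier("Write a SQL query to find all orders above 100")
# e.g. [{'label': 'SQL', 'score': 0.98}]
```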