---
language: en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- intent-classification
- onnx
- triton-inference-server
datasets:
- custom
pipeline_tag: text-classification
---

# distilbert-intent-sql-creative-general

Fine-tuned [distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) for 3-class intent routing in an LLM inference pipeline.

## Purpose

Routes user prompts to the appropriate vLLM LoRA adapter on a Triton Inference Server:

| Label | ID | Routes to |
|---|---|---|
| `GENERAL` | 0 | Qwen2.5-7B-Instruct (no LoRA) |
| `SQL` | 1 | `sql-expert` LoRA adapter |
| `CREATIVE` | 2 | `creative` LoRA adapter |
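
The table above maps cleanly to a small dispatch step in the pipeline. A minimal sketch of that routing, assuming the adapter names from the table; the `route` helper and the base-model identifier `qwen2.5-7b-instruct` are hypothetical illustrations, not the actual pipeline code:

```python
# Hypothetical routing helper: maps the classifier's predicted label to the
# vLLM model/adapter name to request from Triton. Adapter names follow the
# table above; the base-model name is an assumption for illustration.
LABEL_TO_ADAPTER = {
    "GENERAL": None,          # no LoRA: fall through to the base model
    "SQL": "sql-expert",
    "CREATIVE": "creative",
}

def route(label: str, base_model: str = "qwen2.5-7b-instruct") -> str:
    """Return the model/adapter name to request for a predicted intent label."""
    adapter = LABEL_TO_ADAPTER.get(label)
    return adapter if adapter is not None else base_model

print(route("SQL"))      # sql-expert
print(route("GENERAL"))  # qwen2.5-7b-instruct
```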

## Training

- **Base model**: `distilbert/distilbert-base-uncased`
- **Dataset**: 84 hand-curated examples (SQL=30, CREATIVE=23, GENERAL=31)
- **Epochs**: 5
- **Learning rate**: 2e-5
- **Batch size**: 16
- **Max sequence length**: 128
- **Optimizer**: AdamW (weight_decay=0.01)
- **Val split**: 20% stratified
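
With only 84 examples, the stratified split matters: each class keeps roughly the same 80/20 train/val ratio. A minimal stdlib sketch of that idea (the `stratified_split` function is hypothetical; in practice this is typically done with `sklearn.model_selection.train_test_split(..., stratify=labels)`):

```python
import random
from collections import defaultdict

def stratified_split(examples, val_fraction=0.2, seed=42):
    """Split (text, label) pairs so each label contributes ~val_fraction to val."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    train, val = [], []
    for label, items in by_label.items():
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

# Toy data with the card's class balance: SQL=30, CREATIVE=23, GENERAL=31.
data = [("q", "SQL")] * 30 + [("q", "CREATIVE")] * 23 + [("q", "GENERAL")] * 31
train, val = stratified_split(data)
print(len(train), len(val))  # 67 17
```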

## Deployment

Exported to ONNX (opset 17) via [optimum](https://github.com/huggingface/optimum) and served
as an ONNX Runtime backend model inside NVIDIA Triton Inference Server on GKE Autopilot
(NVIDIA L4 GPU).
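
A sketch of what the Triton `config.pbtxt` for such a model might look like. The model name, `max_batch_size`, and tensor names are assumptions (DistilBERT ONNX exports typically expose `input_ids`/`attention_mask` inputs and a `logits` output); the actual deployed config may differ:

```protobuf
name: "intent_classifier"
platform: "onnxruntime_onnx"
max_batch_size: 32
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 3 ]   # GENERAL, SQL, CREATIVE
  }
]
instance_group [ { kind: KIND_GPU } ]
```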

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="xczou/distilbert-intent-sql-creative-general")
classifier("Write a SQL query to find all orders above 100")
# [{'label': 'SQL', 'score': 0.98}]
```