	---
	license: apache-2.0
	base_model: distilbert-base-uncased
	tags:
	- text-classification
	- intent-detection
	- agent-routing
	- mcp
	- ai-agents
	- distilbert
	- tool-use
	datasets:
	- custom
	language:
	- en
	metrics:
	- accuracy
	- f1
	pipeline_tag: text-classification
	library_name: transformers
	---

	# AgentIntentRouter

	A fast, lightweight intent classifier for AI agent and MCP tool routing. Given a user message, it predicts which tool or capability the agent should invoke — in under 50ms on CPU.

	Built on DistilBERT (66M parameters) and fine-tuned on about 9K synthetic examples (roughly 11K including the validation and test splits) across 8 intent categories.

	## Why This Exists

	Agent frameworks such as LangChain, LangGraph, CrewAI, and AutoGen often spend an entire LLM call just to figure out what the user wants. That can cost 1-3 seconds and roughly $0.01 per request, just for routing.

	AgentIntentRouter replaces that first LLM call with a 66M-parameter classifier that runs in ~10ms on CPU and ~2ms on GPU. Use it as the first step in your agent pipeline to route each message to the right tool.

	## Intent Categories

	| Label | Description | Example |
	|-------|-------------|---------|
	| `code_generation` | User wants code written, debugged, or refactored | "Write a Python function to parse CSV" |
	| `web_search` | User wants to find information online | "What's the latest news on AI regulation" |
	| `math_calculation` | User needs computation or conversion | "Calculate 15% of 4500" |
	| `file_operation` | User wants to read, write, or manage files | "Read the config.json file" |
	| `api_call` | User wants to interact with an external API | "Send a Slack message to the team" |
	| `creative_writing` | User wants text composed or drafted | "Write a professional email to the client" |
	| `data_analysis` | User wants data interpreted or compared | "Compare React vs Vue performance" |
	| `general_chat` | Casual conversation, greetings, feedback | "Hey, how are you?" |

	## Quick Start

	```python
	from transformers import pipeline

	router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")

	# Single prediction
	result = router("Write a Python function to sort a list")
	print(result)
	# [{'label': 'code_generation', 'score': 0.98}]

	# Batch prediction
	messages = [
	    "Search for the latest AI papers",
	    "What's 25% of 1200?",
	    "Draft an email to my boss about the deadline",
	    "Hello!",
	]
	results = router(messages)
	for msg, res in zip(messages, results):
	    print(f" {res['label']:>20} ({res['score']:.2f}) — {msg}")
	```

	## Use as Agent Router

	```python
	from transformers import pipeline

	router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")

	# The handle_* functions and fallback_llm_route are placeholders for your own
	# application code; define them before building TOOL_MAP.
	TOOL_MAP = {
	    "code_generation": handle_code_request,
	    "web_search": handle_search,
	    "math_calculation": handle_calculation,
	    "file_operation": handle_file_ops,
	    "api_call": handle_api_call,
	    "creative_writing": handle_writing,
	    "data_analysis": handle_analysis,
	    "general_chat": handle_chat,
	}

	def route(user_message: str):
	    # The pipeline returns a list; the first item is the top prediction.
	    intent = router(user_message)[0]

	    if intent["score"] < 0.5:
	        # Low confidence: fall back to an LLM for routing
	        return fallback_llm_route(user_message)

	    handler = TOOL_MAP[intent["label"]]
	    return handler(user_message)
	```
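
The routing function above can also be written so that the classifier, handlers, and fallback are passed in as arguments, which makes the threshold logic testable without loading the model. The following is a minimal sketch, not part of the released model: `make_router`, `fake_classify`, and the lambda handlers are hypothetical names, and `classify` is assumed to return the same list-of-dicts shape as the pipeline. It also falls back when a label is missing from the tool map instead of raising a `KeyError`.

```python
from typing import Callable, Dict, List

# Hypothetical helper, not part of the model or of transformers.
# `classify` is assumed to behave like the pipeline above: it takes a string
# and returns a list whose first item is {"label": str, "score": float}.
Classifier = Callable[[str], List[dict]]
Handler = Callable[[str], str]


def make_router(
    classify: Classifier,
    tool_map: Dict[str, Handler],
    fallback: Handler,
    threshold: float = 0.5,
) -> Handler:
    """Build a route() function that falls back on low confidence or unknown labels."""

    def route(user_message: str) -> str:
        intent = classify(user_message)[0]
        handler = tool_map.get(intent["label"])
        if handler is None or intent["score"] < threshold:
            return fallback(user_message)
        return handler(user_message)

    return route


# Stub classifier standing in for the real pipeline (illustrative scores only).
def fake_classify(message: str) -> List[dict]:
    if "calculate" in message.lower():
        return [{"label": "math_calculation", "score": 0.97}]
    return [{"label": "general_chat", "score": 0.41}]


route = make_router(
    classify=fake_classify,
    tool_map={"math_calculation": lambda m: f"calc:{m}"},
    fallback=lambda m: f"llm:{m}",
)
print(route("Calculate 15% of 4500"))
print(route("hmm, not sure what I need"))
```

With the real model, passing `classify=router` (the pipeline object from the block above) should work the same way, since the pipeline returns a list of dicts for a single string.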

	## Performance

	- Inference speed: ~10ms on CPU, ~2ms on GPU
	- Model size: ~260MB (DistilBERT-base)
	- Accuracy: 100% on the held-out synthetic test set (see the note under Evaluation Results)
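
Latency figures like the ones above depend on hardware, batch size, and input length, so it may be worth measuring on your own machine. The sketch below times single-message calls for any callable; `measure_latency_ms` and `dummy_predict` are hypothetical names, and the dummy predictor is only there so the example runs without downloading the model.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(
    predict: Callable[[str], object],
    messages: Sequence[str],
    warmup: int = 3,
    repeats: int = 20,
) -> dict:
    """Time single-message calls and report median and p95 latency in milliseconds."""
    for _ in range(warmup):
        predict(messages[0])  # warm-up calls are not timed

    timings = []
    for _ in range(repeats):
        for message in messages:
            start = time.perf_counter()
            predict(message)
            timings.append((time.perf_counter() - start) * 1000.0)

    timings.sort()
    p95_index = min(len(timings) - 1, int(0.95 * len(timings)))
    return {
        "median_ms": statistics.median(timings),
        "p95_ms": timings[p95_index],
        "calls": len(timings),
    }


# Stand-in predictor so the sketch runs without downloading the model.
def dummy_predict(message: str) -> str:
    return message.upper()


stats = measure_latency_ms(dummy_predict, ["Hello!", "Read the config.json file"])
print(stats)
```

To time the actual model, pass the pipeline object as `predict` and use messages that resemble your production traffic.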

	### Evaluation Results

	Results on held-out test set (1,124 examples):

	| Metric | Score |
	|--------|-------|
	| Accuracy | 1.000 |
	| F1 (weighted) | 1.000 |

	Per-class performance:

	| Intent | Precision | Recall | F1 | Support |
	|--------|-----------|--------|-----|---------|
	| code_generation | 1.000 | 1.000 | 1.000 | 130 |
	| web_search | 1.000 | 1.000 | 1.000 | 151 |
	| math_calculation | 1.000 | 1.000 | 1.000 | 153 |
	| file_operation | 1.000 | 1.000 | 1.000 | 154 |
	| api_call | 1.000 | 1.000 | 1.000 | 133 |
	| creative_writing | 1.000 | 1.000 | 1.000 | 160 |
	| data_analysis | 1.000 | 1.000 | 1.000 | 168 |
	| general_chat | 1.000 | 1.000 | 1.000 | 75 |

	> Note: These results are on synthetic test data from the same distribution as training. Real-world performance will vary. Use the confidence score threshold to handle ambiguous inputs gracefully.
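
Because the reported scores come from synthetic data, one way to pick a confidence threshold is to label a small sample of your own traffic and check how coverage and accuracy trade off. The sketch below assumes you already have `(label, score)` predictions and gold labels for a non-empty sample; `threshold_report` is a hypothetical helper and the numbers are made up for illustration.

```python
from typing import List, Sequence, Tuple


def threshold_report(
    predictions: Sequence[Tuple[str, float]],
    gold_labels: Sequence[str],
    thresholds: Sequence[float] = (0.5, 0.7, 0.9),
) -> List[dict]:
    """For each threshold, report coverage and accuracy on the covered examples."""
    report = []
    for threshold in thresholds:
        kept = [
            (label, gold)
            for (label, score), gold in zip(predictions, gold_labels)
            if score >= threshold
        ]
        correct = sum(1 for label, gold in kept if label == gold)
        report.append(
            {
                "threshold": threshold,
                "coverage": len(kept) / len(gold_labels),
                # Accuracy is undefined when nothing passes the threshold.
                "accuracy": correct / len(kept) if kept else None,
            }
        )
    return report


# Made-up predictions for illustration only; not results from this model.
preds = [
    ("web_search", 0.95),
    ("api_call", 0.62),
    ("code_generation", 0.55),
    ("general_chat", 0.91),
]
gold = ["web_search", "api_call", "api_call", "general_chat"]

for row in threshold_report(preds, gold):
    print(row)
```

Messages below the chosen threshold would go to the LLM fallback, so lower coverage means more fallback calls in exchange for fewer misroutes.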

	## Training Details

	- Base model: distilbert-base-uncased
	- Training data: 8,987 examples (synthetic, template-generated with natural language variation)
	- Validation: 1,123 examples
	- Test: 1,124 examples
	- Epochs: 3 (with early stopping, patience=2)
	- Learning rate: 2e-5
	- Batch size: 32
	- Max sequence length: 128
	- Training time: ~100 seconds on NVIDIA RTX 4070
	- Loss: 0.0015 (training) / 0.0017 (validation)
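
As a quick sanity check on these figures, the split sizes and step counts can be worked out directly. This is only arithmetic on the numbers listed above; it assumes a single device, no gradient accumulation, and that the last partial batch is kept, none of which is stated in the list.

```python
import math

# Figures taken from the list above.
train, val, test = 8_987, 1_123, 1_124
batch_size, epochs = 32, 3

total = train + val + test
# Assumes the last partial batch is kept rather than dropped.
steps_per_epoch = math.ceil(train / batch_size)
# Upper bound: early stopping could end training sooner.
total_steps = steps_per_epoch * epochs

print(f"total examples: {total}")
print(f"split: {train / total:.1%} / {val / total:.1%} / {test / total:.1%}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} over {epochs} epochs")
```

Under those assumptions the dataset totals 11,234 examples in an approximately 80/10/10 split, with 281 optimizer steps per epoch and at most 843 steps overall.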

	## Limitations

	- Trained on English text only
	- Template-generated training data may not cover all edge cases
	- Ambiguous messages (e.g., "help me with the API code") may get lower confidence scores — use the confidence threshold to fall back to an LLM
	- Not designed for multi-intent messages (e.g., "search for X and write code for Y")
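
For ambiguous or multi-intent messages, one heuristic is to look at the whole score distribution rather than only the top label. The sketch below works on a list of `{"label", "score"}` dicts; `candidate_intents` and its default cutoffs are hypothetical, and the score lists are hand-written. Since the model was trained for single-label classification, a split distribution is only a hint that a message may need the LLM fallback, not a reliable multi-intent prediction.

```python
from typing import List, Sequence


def candidate_intents(
    scores: Sequence[dict],
    min_score: float = 0.25,
    max_candidates: int = 2,
) -> List[str]:
    """Return the top labels whose score is at least min_score, best first."""
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in ranked[:max_candidates] if item["score"] >= min_score]


# Hand-written score lists for illustration; not real model output.
clear = [
    {"label": "web_search", "score": 0.96},
    {"label": "code_generation", "score": 0.02},
    {"label": "general_chat", "score": 0.02},
]
mixed = [
    {"label": "code_generation", "score": 0.48},
    {"label": "web_search", "score": 0.44},
    {"label": "general_chat", "score": 0.08},
]

print(candidate_intents(clear))
print(candidate_intents(mixed))
```

With the real pipeline, recent `transformers` versions should return scores for every label when called with `top_k=None`; check the behavior of your installed version before relying on it.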

	## License

	Apache 2.0. Commercial use is permitted, subject to the license terms (such as retaining the license and attribution notices).

	## Citation

	If you use this model, a star on the repo is appreciated!