---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- text-classification
- intent-detection
- agent-routing
- mcp
- ai-agents
- distilbert
- tool-use
datasets:
- custom
language:
- en
metrics:
- accuracy
- f1
pipeline_tag: text-classification
library_name: transformers
---
# AgentIntentRouter
A fast, lightweight intent classifier for AI agent and MCP tool routing. Given a user message, it predicts which tool or capability the agent should invoke — in under 50ms on CPU.
Built on DistilBERT (66M parameters) and fine-tuned on 8,987 synthetic examples (11,234 including the validation and test splits) across 8 intent categories.
## Why This Exists
Agent frameworks such as LangChain, LangGraph, CrewAI, and AutoGen typically spend an entire LLM call just to figure out *what the user wants*. That can cost 1-3 seconds and ~$0.01 per request, just for routing.
AgentIntentRouter replaces that first LLM call with a 66M-parameter classifier that runs in **~10ms on CPU** and **~2ms on GPU**. Use it as the first step in your agent pipeline to route to the right tool.
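Latency depends heavily on hardware, batch size, and message length, so it is worth measuring on your own machine. Below is a minimal timing sketch; `measure_latency_ms` and `fake_router` are hypothetical names introduced here for illustration, and the stand-in predictor exists only so the snippet runs without downloading the model.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(predict: Callable[[str], object],
                       messages: Sequence[str],
                       warmup: int = 3,
                       repeats: int = 20) -> dict:
    """Time single-message calls to `predict` and summarize them in milliseconds."""
    if not messages:
        raise ValueError("messages must not be empty")

    # Warm-up calls are excluded so one-time setup costs do not skew the numbers.
    for i in range(warmup):
        predict(messages[i % len(messages)])

    timings = []
    for i in range(repeats):
        msg = messages[i % len(messages)]
        start = time.perf_counter()
        predict(msg)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(timings),
        "max_ms": max(timings),
        "runs": len(timings),
    }


# Stand-in predictor (hypothetical) so the sketch runs offline.
def fake_router(message: str):
    return [{"label": "general_chat", "score": 1.0}]


stats = measure_latency_ms(fake_router, ["Hello!", "Calculate 15% of 4500"])
print(stats)
```

To time the real model, pass the `router` pipeline from the Quick Start section in place of `fake_router`.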
## Intent Categories
| Label | Description | Example |
|-------|-------------|---------|
| `code_generation` | User wants code written, debugged, or refactored | "Write a Python function to parse CSV" |
| `web_search` | User wants to find information online | "What's the latest news on AI regulation" |
| `math_calculation` | User needs computation or conversion | "Calculate 15% of 4500" |
| `file_operation` | User wants to read, write, or manage files | "Read the config.json file" |
| `api_call` | User wants to interact with an external API | "Send a Slack message to the team" |
| `creative_writing` | User wants text composed or drafted | "Write a professional email to the client" |
| `data_analysis` | User wants data interpreted or compared | "Compare React vs Vue performance" |
| `general_chat` | Casual conversation, greetings, feedback | "Hey, how are you?" |
## Quick Start
```python
from transformers import pipeline
router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
# Single prediction
result = router("Write a Python function to sort a list")
print(result)
# [{'label': 'code_generation', 'score': 0.98}]
# Batch prediction
messages = [
    "Search for the latest AI papers",
    "What's 25% of 1200?",
    "Draft an email to my boss about the deadline",
    "Hello!",
]
results = router(messages)
for msg, res in zip(messages, results):
    print(f"  {res['label']:>20} ({res['score']:.2f}) - {msg}")
```
## Use as Agent Router
```python
from transformers import pipeline
router = pipeline("text-classification", model="tripathyShaswata/AgentIntentRouter")
# The handle_* functions and fallback_llm_route are placeholders for your own
# application code; define or import them before building the map.
TOOL_MAP = {
    "code_generation": handle_code_request,
    "web_search": handle_search,
    "math_calculation": handle_calculation,
    "file_operation": handle_file_ops,
    "api_call": handle_api_call,
    "creative_writing": handle_writing,
    "data_analysis": handle_analysis,
    "general_chat": handle_chat,
}


def route(user_message: str):
    intent = router(user_message)[0]
    if intent["score"] < 0.5:
        # Low confidence: fall back to an LLM for routing
        return fallback_llm_route(user_message)
    handler = TOOL_MAP[intent["label"]]
    return handler(user_message)
```
## Performance
- **Inference speed:** ~10ms on CPU, ~2ms on GPU
- **Model size:** ~260MB (DistilBERT-base)
- **Accuracy:** 100% on the held-out synthetic test set
### Evaluation Results
*Results on held-out test set (1,124 examples):*
| Metric | Score |
|--------|-------|
| Accuracy | 1.000 |
| F1 (weighted) | 1.000 |
*Per-class performance:*
| Intent | Precision | Recall | F1 | Support |
|--------|-----------|--------|-----|---------|
| code_generation | 1.000 | 1.000 | 1.000 | 130 |
| web_search | 1.000 | 1.000 | 1.000 | 151 |
| math_calculation | 1.000 | 1.000 | 1.000 | 153 |
| file_operation | 1.000 | 1.000 | 1.000 | 154 |
| api_call | 1.000 | 1.000 | 1.000 | 133 |
| creative_writing | 1.000 | 1.000 | 1.000 | 160 |
| data_analysis | 1.000 | 1.000 | 1.000 | 168 |
| general_chat | 1.000 | 1.000 | 1.000 | 75 |
> **Note:** These results are on synthetic test data from the same distribution as training. Real-world performance will vary. Use the confidence score threshold to handle ambiguous inputs gracefully.
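Because the reported scores come from synthetic data, a small hand-labelled sample of your own traffic gives a more realistic picture. The sketch below computes the same per-class precision, recall, F1, and support as the table above in plain Python; `per_class_report` is a hypothetical helper, and the labels in the example are made up for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, dict]:
    """Per-label precision, recall, F1 and support from gold and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[gold] += 1   # gold label was missed

    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    return report


# Illustrative labels only; replace with your own annotated messages
# and the router's predictions for them.
gold = ["web_search", "web_search", "api_call", "general_chat"]
pred = ["web_search", "api_call", "api_call", "general_chat"]
for label, row in per_class_report(gold, pred).items():
    print(f"{label:>16}  P={row['precision']:.2f}  R={row['recall']:.2f}  "
          f"F1={row['f1']:.2f}  n={row['support']}")
```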
## Training Details
- **Base model:** distilbert-base-uncased
- **Training data:** 8,987 examples (synthetic, template-generated with natural language variation)
- **Validation:** 1,123 examples
- **Test:** 1,124 examples
- **Epochs:** 3 (with early stopping, patience=2)
- **Learning rate:** 2e-5
- **Batch size:** 32
- **Max sequence length:** 128
- **Training time:** ~100 seconds on NVIDIA RTX 4070
- **Loss:** 0.0015 (training) / 0.0017 (validation)
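For anyone trying to reproduce the run, the settings above can be collected into a small config. This is only a sketch: the key names follow Hugging Face `TrainingArguments` conventions as an assumption (the actual training script is not shown here), and the step count assumes a single device with no gradient accumulation.

```python
import math

# Values copied from the list above. Key names are assumptions modelled on
# Hugging Face TrainingArguments, not taken from the author's script.
HPARAMS = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,
    "max_seq_length": 128,         # applied at tokenization time
    "early_stopping_patience": 2,  # would be passed to an early-stopping callback
}
SPLITS = {"train": 8987, "validation": 1123, "test": 1124}

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(SPLITS["train"] / HPARAMS["per_device_train_batch_size"])
max_optimizer_steps = steps_per_epoch * HPARAMS["num_train_epochs"]
total_examples = sum(SPLITS.values())

print(steps_per_epoch)      # 281
print(max_optimizer_steps)  # 843
print(total_examples)       # 11234
```

The split sizes work out to roughly 80/10/10 of the 11,234 examples.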
## Limitations
- Trained on English text only
- Template-generated training data may not cover all edge cases
- Ambiguous messages (e.g., "help me with the API code") may get lower confidence scores — use the confidence threshold to fall back to an LLM
- Not designed for multi-intent messages (e.g., "search for X and write code for Y")
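One way to catch ambiguous or multi-intent messages is to look at the gap between the top two scores instead of the top score alone. The sketch below works on a list of `{"label", "score"}` dicts, which recent `transformers` releases return when the pipeline is called with `top_k=None`. `is_ambiguous` and the `min_margin` value of 0.2 are assumptions for illustration, not tuned settings; the 0.5 floor matches the router example above.

```python
from typing import Dict, List, Union

Score = Dict[str, Union[str, float]]


def is_ambiguous(scores: List[Score],
                 min_score: float = 0.5,
                 min_margin: float = 0.2) -> bool:
    """Return True when the top intent is weak or barely ahead of the runner-up."""
    if not scores:
        return True
    ranked = sorted(scores, key=lambda s: s["score"], reverse=True)
    best = ranked[0]["score"]
    runner_up = ranked[1]["score"] if len(ranked) > 1 else 0.0
    return best < min_score or (best - runner_up) < min_margin


# Hand-written score lists standing in for pipeline output.
confident = [
    {"label": "code_generation", "score": 0.97},
    {"label": "api_call", "score": 0.02},
]
mixed = [
    {"label": "api_call", "score": 0.55},
    {"label": "code_generation", "score": 0.41},
]

print(is_ambiguous(confident))  # False
print(is_ambiguous(mixed))      # True
```

Messages flagged this way can be sent to the LLM fallback, which is better placed to split a request into several tool calls.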
## License
Apache 2.0 — use it however you want, commercial included.
## Citation
If you use this model, a star on the repo is appreciated!