---
license: mit
base_model:
- microsoft/deberta-v3-small
datasets:
- tgupj/tiny-router-data
---

# tiny-router

`tiny-router` is a compact experimental multi-head routing classifier for short, domain-neutral messages with optional interaction context. It predicts four separate signals that downstream systems or agents can use for update handling, action routing, memory policy, and prioritization.

## What it predicts

```
relation_to_previous: new | follow_up | correction | confirmation | cancellation | closure
actionability: none | review | act
retention: ephemeral | useful | remember
urgency: low | medium | high
```

The model emits these heads independently at inference time, plus calibrated confidences and an `overall_confidence`.

## Intended use

- Route short user messages into lightweight automation tiers.
- Detect whether a message updates prior context or starts something new.
- Decide whether action is required, review is safer, or no action is needed.
- Separate disposable details from short-term useful context and longer-term memory candidates.
- Prioritize items by urgency.

Good use cases:

- routing message-like requests in assistants or productivity tools
- triaging follow-ups, corrections, confirmations, and closures
- conservative automation with review fallback

Poor use cases:

- fully autonomous high-stakes action without guardrails
- domains that need expert reasoning or regulated decisions

## Training data

This checkpoint was trained on the synthetic dataset splits in:

- `data/synthetic/train.jsonl`
- `data/synthetic/validation.jsonl`
- `data/synthetic/test.jsonl`

The data follows a structured JSONL schema with:

- `current_text`
- optional `interaction.previous_text`
- optional `interaction.previous_action`
- optional `interaction.previous_outcome`
- optional `interaction.recency_seconds`
- four label heads under `labels`
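
For illustration, one record in this schema might look like the following. The field values mirror the `predict.py` example later in this card, and the labels follow the example-inference output; the exact on-disk encoding of `labels` is an assumption:

```json
{
  "current_text": "Actually next Monday",
  "interaction": {
    "previous_text": "Set a reminder for Friday",
    "previous_action": "created_reminder",
    "previous_outcome": "success",
    "recency_seconds": 45
  },
  "labels": {
    "relation_to_previous": "correction",
    "actionability": "act",
    "retention": "useful",
    "urgency": "medium"
  }
}
```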

## Model details

- Base encoder: `microsoft/deberta-v3-small`
- Architecture: encoder-only multitask classifier
- Pooling: learned attention pooling
- Structured features:
  - canonicalized `previous_action` embedding
  - `previous_outcome` embedding
  - learned projection of `log1p(recency_seconds)`
- Head structure:
  - dependency-aware multitask heads
  - later heads condition on learned summaries of earlier head predictions
- Calibration:
  - post-hoc per-head temperature scaling fit on validation logits
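
The temperature-scaling step can be sketched as follows. This is a generic illustration, not the project's actual calibration code: one scalar temperature per head is chosen to minimize negative log-likelihood on validation logits, here by simple grid search.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits: np.ndarray, labels: np.ndarray) -> float:
    """Grid-search the scalar T that minimizes validation NLL of softmax(logits / T)."""
    def nll(T: float) -> float:
        probs = softmax(logits / T)
        return float(-np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean())
    return min(np.linspace(0.5, 5.0, 91), key=nll)
```

At inference time each head's logits are divided by that head's fitted temperature before the softmax; this leaves the argmax label unchanged but rescales the reported confidences.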

This checkpoint was trained with:

- `batch_size = 32`
- `epochs = 20`
- `max_length = 128`
- `encoder_lr = 2e-5`
- `head_lr = 1e-4`
- `dropout = 0.1`
- `pooling_type = attention`
- `use_head_dependencies = true`

## Current results

Held-out test results from `artifacts/tiny-router/eval.json`:

- `macro_average_f1 = 0.7848`
- `exact_match = 0.4570`
- `automation_safe_accuracy = 0.6230`
- `automation_safe_coverage = 0.5430`
- `ECE = 0.3440`
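
The `ECE` figure is expected calibration error. A minimal sketch of the standard binned computation (the bin count and equal-width binning are assumptions; the repo's evaluation code may differ):

```python
import numpy as np

def expected_calibration_error(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    """Bin predictions by confidence; ECE is the bin-size-weighted mean |accuracy - mean confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return float(ece)
```

An ECE of 0.344 means stated confidence and observed accuracy still diverge substantially on average, which is consistent with the conservative-threshold guidance below.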

Per-head macro F1:

- `relation_to_previous = 0.8415`
- `actionability = 0.7982`
- `retention = 0.7809`
- `urgency = 0.7187`

Ablation results (macro-average F1):

- `current_text_only = 0.7058`
- `current_plus_previous_text = 0.7478`
- `full_interaction = 0.7848`

Interpretation:

- interaction context clearly helps: macro F1 rises from 0.7058 with current text only to 0.7848 with the full interaction features
- actionability and urgency are usable but still imperfect
- high-confidence automation is possible only with conservative thresholds
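
A conservative-threshold consumer could look like the sketch below. The tier names and the 0.90 cutoff are illustrative choices, not part of the model:

```python
def route(prediction: dict, threshold: float = 0.90) -> str:
    """Automate only high-confidence 'act' predictions; anything uncertain falls back to human review."""
    actionability = prediction["actionability"]
    confident = (
        actionability["confidence"] >= threshold
        and prediction["overall_confidence"] >= threshold
    )
    if actionability["label"] == "act" and confident:
        return "automate"
    if actionability["label"] == "none" and confident:
        return "ignore"
    return "review"
```

Raising `threshold` trades coverage for accuracy on the automated slice, mirroring the `automation_safe_accuracy` / `automation_safe_coverage` pair reported above.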

## Limitations

- The benchmark is task-specific and internal to this repo.
- The dataset is synthetic, so distribution shift to real product traffic is likely.
- Label quality on subtle category boundaries remains a limiting factor.
- Confidence calibration is improved but not strong enough to justify broad unattended automation.

## Example inference

```json
{
  "relation_to_previous": { "label": "correction", "confidence": 0.94 },
  "actionability": { "label": "act", "confidence": 0.97 },
  "retention": { "label": "useful", "confidence": 0.76 },
  "urgency": { "label": "medium", "confidence": 0.81 },
  "overall_confidence": 0.87
}
```

## How to load

This repo uses a custom checkpoint format. Load it with the project's own loader:

```python
from tiny_router.io import load_checkpoint
from tiny_router.runtime import get_device

device = get_device(requested_device="cpu")
model, tokenizer, config = load_checkpoint("artifacts/tiny-router", device=device)
```

Or run inference with:

```bash
uv run python predict.py \
  --model-dir artifacts/tiny-router \
  --input-json '{"current_text":"Actually next Monday","interaction":{"previous_text":"Set a reminder for Friday","previous_action":"created_reminder","previous_outcome":"success","recency_seconds":45}}' \
  --pretty
```