tgupj committed d42dcd8 (verified) · 1 parent: 075f421

Update README.md

Files changed (1): `README.md` +153 -3
---
license: mit
base_model:
- microsoft/deberta-v3-small
---

# tiny-router

`tiny-router` is a compact, experimental multi-head routing classifier for short, domain-neutral messages with optional interaction context. It predicts four separate signals that downstream systems or agents can use for update handling, action routing, memory policy, and prioritization.

## What it predicts

- `relation_to_previous`
  - `new | follow_up | correction | confirmation | cancellation | closure`
- `actionability`
  - `none | review | act`
- `retention`
  - `ephemeral | useful | remember`
- `urgency`
  - `low | medium | high`

At inference time the model returns a separate label for each head, along with a calibrated confidence per head and an `overall_confidence` score.

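One way to pin down this output contract in client code is a small schema check. The sketch below assumes the prediction shape shown in the example inference section of this README (a `label` and `confidence` per head, plus `overall_confidence`); `HEAD_LABELS` and `validate_prediction` are hypothetical helpers, not part of the `tiny_router` package.

```python
# Hypothetical schema constants mirroring the label sets listed above.
HEAD_LABELS: dict[str, tuple[str, ...]] = {
    "relation_to_previous": (
        "new", "follow_up", "correction",
        "confirmation", "cancellation", "closure",
    ),
    "actionability": ("none", "review", "act"),
    "retention": ("ephemeral", "useful", "remember"),
    "urgency": ("low", "medium", "high"),
}


def _is_probability(value) -> bool:
    return isinstance(value, (int, float)) and 0.0 <= value <= 1.0


def validate_prediction(pred: dict) -> list[str]:
    """Return a list of problems in a prediction dict (empty list if valid).

    Assumed shape:
    {head: {"label": str, "confidence": float}, ..., "overall_confidence": float}
    """
    problems = []
    for head, labels in HEAD_LABELS.items():
        entry = pred.get(head)
        if entry is None:
            problems.append(f"missing head: {head}")
            continue
        if entry.get("label") not in labels:
            problems.append(f"{head}: unknown label {entry.get('label')!r}")
        if not _is_probability(entry.get("confidence")):
            problems.append(f"{head}: confidence missing or out of range")
    if not _is_probability(pred.get("overall_confidence")):
        problems.append("overall_confidence missing or out of range")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easy to log every schema violation from a single bad response.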
## Intended use

- Route short user messages into lightweight automation tiers.
- Detect whether a message updates prior context or starts something new.
- Decide whether action is required, review is safer, or no action is needed.
- Separate disposable details from short-term useful context and longer-term memory candidates.
- Prioritize items by urgency.

Good use cases:

- routing message-like requests in assistants or productivity tools
- triaging follow-ups, corrections, confirmations, and closures
- conservative automation with review fallback

Unsuitable use cases:

- fully autonomous high-stakes action without guardrails
- domains that need expert reasoning or regulated decisions

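The "conservative automation with review fallback" pattern can be sketched as a small routing policy on top of the `actionability` head. This is an illustrative policy, not one shipped with this repo: the function name `route` and both 0.9 thresholds are assumptions you would tune on your own validation data.

```python
def route(pred: dict, act_threshold: float = 0.9, none_threshold: float = 0.9) -> str:
    """Map a prediction to an automation tier (hypothetical policy).

    Thresholds are illustrative assumptions, not values from this repo.
    """
    action = pred["actionability"]
    label, conf = action["label"], action["confidence"]
    if label == "act" and conf >= act_threshold:
        return "auto"    # confident enough to act without a human
    if label == "none" and conf >= none_threshold:
        return "ignore"  # confident that nothing needs to happen
    return "review"      # everything else falls back to human review
```

Anything that is neither a confident `act` nor a confident `none` goes to review, so a drop in confidence can only move an item toward human review, never toward automation.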
## Training data

This checkpoint was trained on the synthetic dataset, split across:

- `data/synthetic/train.jsonl`
- `data/synthetic/validation.jsonl`
- `data/synthetic/test.jsonl`

Each line follows a structured JSONL schema with:

- `current_text`
- optional `interaction.previous_text`
- optional `interaction.previous_action`
- optional `interaction.previous_outcome`
- optional `interaction.recency_seconds`
- four label heads under `labels`

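A record in this schema might look like the sample below. The text and interaction fields reuse the `predict.py` example from the "How to load" section; the label values, and the assumption that each head stores a plain string label, are made up for illustration. `load_records` is a hypothetical helper, not part of this repo.

```python
import json
from typing import Iterator

# Illustrative record: label values are invented, not taken from the dataset.
SAMPLE_LINE = json.dumps({
    "current_text": "Actually next Monday",
    "interaction": {
        "previous_text": "Set a reminder for Friday",
        "previous_action": "created_reminder",
        "previous_outcome": "success",
        "recency_seconds": 45,
    },
    "labels": {
        "relation_to_previous": "correction",
        "actionability": "act",
        "retention": "useful",
        "urgency": "medium",
    },
})

REQUIRED_HEADS = ("relation_to_previous", "actionability", "retention", "urgency")


def load_records(lines) -> Iterator[dict]:
    """Yield parsed JSONL records; a missing `interaction` block becomes {}."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        record.setdefault("interaction", {})
        missing = [h for h in REQUIRED_HEADS if h not in record.get("labels", {})]
        if missing:
            raise ValueError(f"record missing label heads: {missing}")
        yield record
```

With a real split you would pass an open file handle, for example `with open("data/synthetic/train.jsonl") as f: records = list(load_records(f))`.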
## Model details

- Base encoder: `microsoft/deberta-v3-small`
- Architecture: encoder-only multitask classifier
- Pooling: learned attention pooling
- Structured features:
  - canonicalized `previous_action` embedding
  - `previous_outcome` embedding
  - learned projection of `log1p(recency_seconds)`
- Head structure:
  - dependency-aware multitask heads
  - later heads condition on learned summaries of earlier head predictions
- Calibration:
  - post-hoc per-head temperature scaling fit on validation logits

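Post-hoc temperature scaling divides a head's logits by a single scalar `T > 0` before the softmax, with `T` chosen to minimize negative log-likelihood on validation data. The sketch below fits `T` by a plain grid search in pure Python; the repo may well use a gradient-based optimizer instead, and `fit_temperature`, `calibrated_probs`, and the 0.05 to 5.0 grid are assumptions. It would be run once per head on that head's validation logits.

```python
import math


def log_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Log-probabilities of softmax(logits / temperature), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def mean_nll(logits_batch: list[list[float]], targets: list[int], temperature: float) -> float:
    """Mean negative log-likelihood of the gold class at a given temperature."""
    total = 0.0
    for logits, y in zip(logits_batch, targets):
        total -= log_softmax(logits, temperature)[y]
    return total / len(targets)


def fit_temperature(logits_batch, targets, grid=None) -> float:
    """Return the temperature in `grid` with the lowest validation NLL."""
    if grid is None:
        grid = [0.05 * i for i in range(1, 101)]  # 0.05, 0.10, ..., 5.0
    return min(grid, key=lambda t: mean_nll(logits_batch, targets, t))


def calibrated_probs(logits: list[float], temperature: float) -> list[float]:
    """Apply a fitted temperature at inference time."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]
```

Because dividing by a positive `T` preserves the ordering of the logits, temperature scaling never changes a head's predicted label; it only rescales the confidence attached to it.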
This checkpoint was trained with:

- `batch_size = 32`
- `epochs = 20`
- `max_length = 128`
- `encoder_lr = 2e-5`
- `head_lr = 1e-4`
- `dropout = 0.1`
- `pooling_type = attention`
- `use_head_dependencies = true`

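The separate `encoder_lr` and `head_lr` values imply two optimizer parameter groups. The sketch below captures the hyperparameters above in a config object; `TrainConfig` and `param_groups` are hypothetical names, and the repo's actual config format is not documented here. The returned list follows the per-parameter-group format that `torch.optim` optimizers accept, but which optimizer the repo uses is an assumption left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list above.
    batch_size: int = 32
    epochs: int = 20
    max_length: int = 128
    encoder_lr: float = 2e-5
    head_lr: float = 1e-4
    dropout: float = 0.1
    pooling_type: str = "attention"
    use_head_dependencies: bool = True


def param_groups(encoder_params, head_params, cfg: TrainConfig) -> list[dict]:
    """Build optimizer parameter groups with separate learning rates.

    The pretrained encoder gets the smaller learning rate so fine-tuning
    moves it less than the freshly initialized heads.
    """
    return [
        {"params": list(encoder_params), "lr": cfg.encoder_lr},
        {"params": list(head_params), "lr": cfg.head_lr},
    ]
```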
## Current results

Held-out test results from `artifacts/tiny-router/eval.json`:

- `macro_average_f1 = 0.7848`
- `exact_match = 0.4570`
- `automation_safe_accuracy = 0.6230`
- `automation_safe_coverage = 0.5430`
- `ECE = 0.3440`

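`ECE` here is presumably expected calibration error. A common definition bins predictions by confidence and sums the gap between accuracy and mean confidence in each bin, weighted by the fraction of samples in that bin: ECE = Σ_b (|B_b| / n) · |acc(B_b) - conf(B_b)|. The sketch below assumes 10 equal-width bins over the top-label confidence of a single head; how `eval.json` aggregates across the four heads is not stated in this README, and `expected_calibration_error` is a hypothetical helper.

```python
def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Top-label ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen label, each in [0, 1]
    correct: whether the chosen label matched the gold label
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 0.9 confidence but is right only half the time contributes a gap of 0.4, so an ECE of 0.3440 points to a sizable average mismatch between stated confidence and observed accuracy.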
Per-head macro F1:

- `relation_to_previous = 0.8415`
- `actionability = 0.7982`
- `retention = 0.7809`
- `urgency = 0.7187`

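The headline `macro_average_f1` is consistent with an unweighted mean of the four per-head macro F1 scores, which the arithmetic below reproduces; the README does not state this definition explicitly. The `macro_f1` helper is a hypothetical sketch of per-head macro F1 (unweighted mean of per-class F1); it scores a class with no true or predicted examples as 0.0, which is one common convention.

```python
def macro_f1(y_true, y_pred, labels) -> float:
    """Unweighted mean of per-class F1 over `labels`."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Per-head macro F1 values reported above.
per_head = {
    "relation_to_previous": 0.8415,
    "actionability": 0.7982,
    "retention": 0.7809,
    "urgency": 0.7187,
}
mean_f1 = sum(per_head.values()) / len(per_head)
print(round(mean_f1, 4))  # 0.7848, matching macro_average_f1
```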
Ablations (`macro_average_f1` by input configuration):

- `current_text_only = 0.7058`
- `current_plus_previous_text = 0.7478`
- `full_interaction = 0.7848`

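Assuming each ablation adds inputs on top of the previous one, as the configuration names suggest, the numbers can be read as successive gains in macro F1:

```python
ablations = {
    "current_text_only": 0.7058,
    "current_plus_previous_text": 0.7478,
    "full_interaction": 0.7848,
}
names = list(ablations)
for prev, cur in zip(names, names[1:]):
    gain = ablations[cur] - ablations[prev]
    print(f"{prev} -> {cur}: +{gain:.4f}")
# current_text_only -> current_plus_previous_text: +0.0420
# current_plus_previous_text -> full_interaction: +0.0370
```

Previous text and the structured interaction features each contribute roughly half of the total 0.079 improvement over text alone.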
Interpretation:

- interaction context helps: macro F1 rises from 0.7058 with the current text alone to 0.7478 with the previous text and to 0.7848 with the full interaction features
- actionability and urgency are usable but still imperfect
- high-confidence automation is possible only with conservative thresholds

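The trade-off behind "conservative thresholds" can be explored with a simple sweep: raising a confidence threshold automates fewer items (lower coverage) in exchange for higher accuracy on the ones it keeps. This README does not define `automation_safe_accuracy` or `automation_safe_coverage`, so the sketch below is a generic illustration, not a reproduction of those metrics; `coverage_accuracy` is a hypothetical helper.

```python
def coverage_accuracy(preds, threshold: float) -> tuple[float, float]:
    """Coverage and accuracy of predictions whose confidence clears `threshold`.

    preds: list of (confidence, is_correct) pairs, for example
    overall_confidence paired with whether every head was right.
    Accuracy is reported as 0.0 when nothing clears the threshold.
    """
    kept = [ok for conf, ok in preds if conf >= threshold]
    coverage = len(kept) / len(preds) if preds else 0.0
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return coverage, accuracy


# Toy validation pairs (invented) to show the shape of a sweep.
toy = [(0.95, True), (0.90, True), (0.70, False), (0.60, True)]
for t in (0.5, 0.8, 0.99):
    print(t, coverage_accuracy(toy, t))
```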
## Limitations

- The benchmark is task-specific and internal to this repo.
- The dataset is synthetic, so distribution shift to real product traffic is likely.
- Results remain sensitive to label quality on subtle class boundaries.
- Confidence calibration is improved but still not strong enough to justify broad unattended automation (test `ECE = 0.3440`).

## Example inference

```json
{
  "relation_to_previous": { "label": "correction", "confidence": 0.94 },
  "actionability": { "label": "act", "confidence": 0.97 },
  "retention": { "label": "useful", "confidence": 0.76 },
  "urgency": { "label": "medium", "confidence": 0.81 },
  "overall_confidence": 0.87
}
```

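The example output can be consumed with the standard `json` module. In this particular example, `overall_confidence` (0.87) equals the arithmetic mean of the four head confidences. The README does not say how `overall_confidence` is computed, and other aggregations (such as a geometric mean) also round to 0.87 here, so treat this as an observation about one example, not a definition.

```python
import json

raw = """
{
  "relation_to_previous": { "label": "correction", "confidence": 0.94 },
  "actionability": { "label": "act", "confidence": 0.97 },
  "retention": { "label": "useful", "confidence": 0.76 },
  "urgency": { "label": "medium", "confidence": 0.81 },
  "overall_confidence": 0.87
}
"""

prediction = json.loads(raw)
# Every key except overall_confidence is a per-head entry.
heads = {k: v for k, v in prediction.items() if k != "overall_confidence"}
labels = {head: entry["label"] for head, entry in heads.items()}
mean_conf = sum(entry["confidence"] for entry in heads.values()) / len(heads)

print(labels)
print(round(mean_conf, 2), prediction["overall_confidence"])  # 0.87 0.87
```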
## How to load

This repo uses a custom checkpoint format, so load it with this project's `tiny_router` package:

```python
from tiny_router.io import load_checkpoint
from tiny_router.runtime import get_device

# Resolve the device to run on; CPU is requested explicitly here.
device = get_device(requested_device="cpu")
# Load the model, its tokenizer, and the saved config from the checkpoint directory.
model, tokenizer, config = load_checkpoint("artifacts/tiny-router", device=device)
```

Or run inference from the command line:

```bash
uv run python predict.py \
  --model-dir artifacts/tiny-router \
  --input-json '{"current_text":"Actually next Monday","interaction":{"previous_text":"Set a reminder for Friday","previous_action":"created_reminder","previous_outcome":"success","recency_seconds":45}}' \
  --pretty
```
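
When calling the script from Python, building the payload as a dict and serializing it with `json.dumps` avoids hand-quoting the JSON argument. This sketch reuses only the `predict.py` flags shown in the shell example; `run_predict` is a hypothetical wrapper, and the script's output format is not documented beyond the example inference section.

```python
import json
import subprocess

# Same payload as the shell example, built as a dict so no manual quoting is needed.
payload = {
    "current_text": "Actually next Monday",
    "interaction": {
        "previous_text": "Set a reminder for Friday",
        "previous_action": "created_reminder",
        "previous_outcome": "success",
        "recency_seconds": 45,
    },
}

cmd = [
    "uv", "run", "python", "predict.py",
    "--model-dir", "artifacts/tiny-router",
    "--input-json", json.dumps(payload),
    "--pretty",
]


def run_predict(command: list[str]) -> str:
    """Run predict.py and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# print(run_predict(cmd))  # run from the repo root with uv installed
```

Passing the command as a list (without `shell=True`) hands each argument to the process verbatim, so the JSON string needs no escaping.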