prathamkode committed
Commit dbb5d78 · verified
1 Parent(s): 449878f

Make repo self-contained: rewrite docs, single-model benchmark, remove external references

.gitattributes CHANGED
@@ -1,4 +1,3 @@
1
  *.onnx filter=lfs diff=lfs merge=lfs -text
2
  *.pt filter=lfs diff=lfs merge=lfs -text
3
- benchmark/charts/intent_accuracy_delta.png filter=lfs diff=lfs merge=lfs -text
4
  benchmark/charts/per_intent_accuracy.png filter=lfs diff=lfs merge=lfs -text
 
1
  *.onnx filter=lfs diff=lfs merge=lfs -text
2
  *.pt filter=lfs diff=lfs merge=lfs -text
 
3
  benchmark/charts/per_intent_accuracy.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
14
 
15
  # Smartwatch LM v0.2
16
 
17
- Exported from [collab-run-2](../collab-run-2) training. Small GPT for wrist-wearable chat with intent tags like `<INTENT:GET_STEPS>`.
18
 
19
  ## Model details
20
 
@@ -24,20 +24,22 @@ Exported from [collab-run-2](../collab-run-2) training. Small GPT for wrist-wear
24
  | Context length | 256 tokens |
25
  | Vocab size | 5533 (BPE) |
26
  | Best val loss | 0.3243 |
27
- | Training data | tinydata.txt, deepdata.txt, tinydata1.txt, data1.txt, data3.txt |
28
  | Export version | 0.2 |
29
 
30
  ## Files
31
 
32
  | File | Purpose |
33
  |------|---------|
34
- | `smartwatch_lm_merged.onnx` | On-device inference (ONNX Runtime, opset 17, ~0 MB) |
35
  | `checkpoint.pt` | PyTorch weights |
36
  | `tokenizer.json` | BPE tokenizer |
37
  | `config.json` | Architecture + ONNX I/O |
38
  | `model.py` / `chat.py` | PyTorch load + REPL |
39
  | `reply_utils.py` | BPE cleanup, intent parse, slot fill |
40
  | `onnx_sample.py` | ONNX generate sample |
 
 
41
 
42
  ## Quick start
43
 
@@ -56,7 +58,23 @@ python chat.py
56
  - **Input:** `input_ids` int64 `[batch, seq]` (max seq = 256)
57
  - **Output:** `logits` float `[batch, seq, vocab_size]`
58
 
59
- See [Export-0.1 docs](docs/) for gibberish cleanup, intents, and device integration.
60
 
61
  ## License
62
 
 
14
 
15
  # Smartwatch LM v0.2
16
 
17
+ Small GPT for wrist-wearable chat. The model replies in natural language and emits intent tags like `<INTENT:GET_STEPS>` plus slot placeholders such as `<STEPS_TODAY>` that your app fills from live sensor data.
18
 
19
  ## Model details
20
 
 
24
  | Context length | 256 tokens |
25
  | Vocab size | 5533 (BPE) |
26
  | Best val loss | 0.3243 |
27
+ | ONNX size | ~60 MB |
28
  | Export version | 0.2 |
29
 
30
  ## Files
31
 
32
  | File | Purpose |
33
  |------|---------|
34
+ | `smartwatch_lm_merged.onnx` | On-device inference (ONNX Runtime, opset 17) |
35
  | `checkpoint.pt` | PyTorch weights |
36
  | `tokenizer.json` | BPE tokenizer |
37
  | `config.json` | Architecture + ONNX I/O |
38
  | `model.py` / `chat.py` | PyTorch load + REPL |
39
  | `reply_utils.py` | BPE cleanup, intent parse, slot fill |
40
  | `onnx_sample.py` | ONNX generate sample |
41
+ | `docs/` | Integration, intent reference, output cleanup |
42
+ | `benchmark/` | Golden prompts, quality report, charts |
43
 
44
  ## Quick start
45
 
 
58
  - **Input:** `input_ids` int64 `[batch, seq]` (max seq = 256)
59
  - **Output:** `logits` float `[batch, seq, vocab_size]`
60
 
61
+ Sample logits at the last position autoregressively until EOS or `max_new_tokens`. Recommended settings: `temperature=0.5`, `top_k=40`, `max_new_tokens=40`.
62
+
63
+ ## Documentation
64
+
65
+ | Guide | Description |
66
+ |-------|-------------|
67
+ | [Avoiding gibberish](docs/avoiding-gibberish.md) | BPE cleanup, truncation rules, sample scripts |
68
+ | [Intent reference](docs/intent-reference.md) | All 35 intents and slot placeholders |
69
+ | [Smartwatch integration](docs/smartwatch-integration.md) | End-to-end device wiring |
70
+
71
+ ## Benchmarks
72
+
73
+ Quality evaluation on 39 golden prompts is in [`benchmark/`](benchmark/):
74
+
75
+ - [`benchmark/report.json`](benchmark/report.json): full per-prompt results
76
+ - [`benchmark/charts/`](benchmark/charts/): metric charts
77
+ - [`benchmark/benchmark_prompts.json`](benchmark/benchmark_prompts.json): golden prompt set
78
 
79
  ## License
80
 
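The decoding guidance in the README diff above (sample the last-position logits until EOS or `max_new_tokens`, with `temperature=0.5`, `top_k=40`, `max_new_tokens=40`) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's `onnx_sample.py`: `sample_next_token`, `generate`, and the `step_fn` callback (assumed to return the last-position logits for a list of token ids) are hypothetical names introduced here.

```python
import math
import random


def sample_next_token(logits, temperature=0.5, top_k=40, rng=random):
    """Pick one token id from the top_k highest logits (temperature must be > 0)."""
    k = min(top_k, len(logits))
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights, shifted by the max for numerical stability.
    peak = max(logits[i] for i in top)
    weights = [math.exp((logits[i] - peak) / temperature) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(step_fn, prompt_ids, eos_id, max_new_tokens=40, max_seq=256,
             temperature=0.5, top_k=40, rng=random):
    """Autoregressive loop; step_fn is a hypothetical callback around the model."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        window = ids[-max_seq:]      # stay inside the 256-token context
        logits = step_fn(window)     # assumed: last-position logits, length vocab_size
        next_id = sample_next_token(logits, temperature, top_k, rng)
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids[len(prompt_ids):]     # only the newly generated ids
```

With an ONNX Runtime session, `step_fn` would presumably run the model on `input_ids` of shape `[1, seq]` and return `logits[0, -1]`; that wiring is left out because it depends on the session setup.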
benchmark/README.md CHANGED
@@ -1,52 +1,10 @@
1
- # Smartwatch LM Quality Benchmark
2
 
3
- Side-by-side quality comparison for exported models (default: **Export-0.1** vs **export-0.2**).
4
-
5
- ## Setup
6
-
7
- ```bash
8
- pip install -r benchmark/requirements.txt
9
- ```
10
-
11
- Both export folders must contain `checkpoint.pt` and `tokenizer.json`.
12
-
13
- ## Run
14
-
15
- ```bash
16
- python benchmark/benchmark_quality.py
17
- ```
18
-
19
- Common options:
20
-
21
- ```bash
22
- python benchmark/benchmark_quality.py --verbose
23
- python benchmark/benchmark_quality.py --output benchmark/report.json
24
- python benchmark/benchmark_quality.py --model-a Export-0.1 --model-b export-0.2 --seed 42
25
- ```
26
-
27
- Generation defaults match production guidance: `max_new_tokens=40`, `temperature=0.5`, `top_k=40`, fresh history per prompt, fixed seed for reproducibility.
28
-
29
- ## Metrics
30
-
31
- | Metric | Meaning |
32
- |--------|---------|
33
- | **Intent accuracy** | Predicted intent matches expected (or is in `expected_intents` for combo cases) |
34
- | **Intent parse rate** | Reply contains a valid `<INTENT:...>` tag |
35
- | **Clean output rate** | Raw decode has no BPE junk (`Ġ`, `Ċ`) before cleanup |
36
- | **Slot presence** | Fraction of expected slot placeholders found in the cleaned template |
37
-
38
- Checkpoint `best_val_loss` is printed for reference only. v0.1 and v0.2 used different tokenizers and training corpora, so val loss is **not** a fair head-to-head score.
39
 
40
  ## Golden prompts
41
 
42
- [`benchmark_prompts.json`](benchmark_prompts.json) holds ~39 prompts (one per intent found in `tinydata.txt`, plus Colab demo prompts).
43
-
44
- Regenerate after training data changes:
45
-
46
- ```bash
47
- python benchmark/extract_prompts.py
48
- python benchmark/extract_prompts.py --data tinydata/tinydata.txt --output benchmark/benchmark_prompts.json
49
- ```
50
 
51
  Each entry:
52
 
@@ -57,44 +15,44 @@ Each entry:
57
  "expected_intent": "GET_STEPS",
58
  "expected_intents": ["GET_STEPS"],
59
  "expected_slots": ["STEPS_TODAY", "STEP_GOAL"],
60
- "source": "tinydata.txt"
61
  }
62
  ```
63
 
64
- ## Output
65
 
66
- The runner prints:
67
 
68
- - Overall metric table for both models
69
- - Per-intent accuracy breakdown
70
- - Counts of prompts only v0.1 got right, only v0.2 got right, and both missed
71
 
72
- With `--verbose`, it also prints per-prompt predicted intents and replies.
73
 
74
- With `--output`, it writes a JSON report suitable for CI or regression tracking.
75
 
76
- ## Charts
77
 
78
- By default the runner saves four PNG charts to `benchmark/charts/`:
79
 
80
  | Chart | Description |
81
  |-------|-------------|
82
- | `overall_metrics.png` | Grouped bar chart of all four quality metrics |
83
- | `per_intent_accuracy.png` | Side-by-side accuracy for each intent |
84
- | `head_to_head_wins.png` | Prompts both got right, only v0.1, only v0.2, both wrong |
85
- | `intent_accuracy_delta.png` | Per-intent delta (export-0.2 minus Export-0.1) |
86
 
87
- ```bash
88
- python benchmark/benchmark_quality.py --output benchmark/report.json
89
- python benchmark/benchmark_quality.py --no-charts
90
- python benchmark/benchmark_quality.py --charts-dir path/to/charts
91
- ```
92
-
93
- Regenerate charts from an existing JSON report without re-running models:
94
 
95
  ```bash
 
96
  python benchmark/benchmark_charts.py --report benchmark/report.json
97
  python benchmark/benchmark_charts.py --report benchmark/report.json --output-dir benchmark/charts
98
  ```
99
-
100
- Requires `matplotlib` (included in `requirements.txt`).
 
1
+ # Smartwatch LM v0.2: Quality Benchmark
2
 
3
+ Quality evaluation for **Smartwatch LM v0.2** on a fixed set of 39 golden prompts covering all supported intents.
4
 
5
  ## Golden prompts
6
 
7
+ [`benchmark_prompts.json`](benchmark_prompts.json) holds 39 prompts: one per intent category plus multi-intent and conversational cases.
8
 
9
  Each entry:
10
 
 
15
  "expected_intent": "GET_STEPS",
16
  "expected_intents": ["GET_STEPS"],
17
  "expected_slots": ["STEPS_TODAY", "STEP_GOAL"],
18
+ "source": "golden_set"
19
  }
20
  ```
21
 
22
+ ## Metrics
23
 
24
+ | Metric | Meaning |
25
+ |--------|---------|
26
+ | **Intent accuracy** | Predicted intent matches expected (or is in `expected_intents` for combo cases) |
27
+ | **Intent parse rate** | Reply contains a valid `<INTENT:...>` tag |
28
+ | **Clean output rate** | Raw decode has no BPE junk (`Ġ`, `Ċ`) before cleanup |
29
+ | **Slot presence** | Fraction of expected slot placeholders found in the cleaned template |
30
 
31
+ Generation settings used for evaluation: `max_new_tokens=40`, `temperature=0.5`, `top_k=40`, fresh history per prompt, fixed seed for reproducibility.
 
 
32
 
33
+ ## v0.2 results
34
 
35
+ | Metric | Score |
36
+ |--------|------:|
37
+ | Intent accuracy | 100% |
38
+ | Intent parse rate | 100% |
39
+ | Clean output rate | 100% |
40
+ | Slot presence | 96.2% |
41
+ | Best val loss | 0.3243 |
42
 
43
+ Full per-prompt results are in [`report.json`](report.json).
44
 
45
+ ## Charts
46
 
47
  | Chart | Description |
48
  |-------|-------------|
49
+ | `charts/overall_metrics.png` | Bar chart of all four quality metrics |
50
+ | `charts/per_intent_accuracy.png` | Accuracy for each intent |
 
 
51
 
52
+ Regenerate charts from the stored report:
53
 
54
  ```bash
55
+ pip install -r benchmark/requirements.txt
56
  python benchmark/benchmark_charts.py --report benchmark/report.json
57
  python benchmark/benchmark_charts.py --report benchmark/report.json --output-dir benchmark/charts
58
  ```
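The four metrics described in the benchmark README above can be approximated per prompt as follows. This is a rough sketch, not the repository's benchmark runner or `reply_utils.py`: the helper names are hypothetical, the result keys mirror the per-case fields visible in `report.json`, and slot presence is checked on the raw reply here rather than on the cleaned template. Treating an empty `expected_slots` list as a score of 1.0 is an assumption.

```python
import re

# Matches tags such as <INTENT:GET_STEPS>.
INTENT_RE = re.compile(r"<INTENT:([A-Z_]+)>")

# Byte-level BPE markers that count as junk before cleanup.
BPE_JUNK = ("\u0120", "\u010a")  # the "G with dot" and "C with dot" markers


def parse_intent(reply):
    """Return the first intent name in the reply, or None if no tag is present."""
    match = INTENT_RE.search(reply)
    return match.group(1) if match else None


def slot_presence(reply, expected_slots):
    """Fraction of expected <SLOT> placeholders found in the reply."""
    if not expected_slots:
        return 1.0  # assumption: nothing expected counts as fully present
    found = sum(1 for slot in expected_slots if f"<{slot}>" in reply)
    return found / len(expected_slots)


def score_case(raw_reply, case):
    """Score one golden prompt entry against one raw model reply."""
    predicted = parse_intent(raw_reply)
    return {
        "predicted_intent": predicted,
        "has_intent_tag": predicted is not None,
        "intent_correct": predicted in case["expected_intents"],
        "clean_output": not any(mark in raw_reply for mark in BPE_JUNK),
        "slot_presence": slot_presence(raw_reply, case["expected_slots"]),
    }
```

Averaging each field over all 39 cases would give aggregate figures comparable to the overall metric table.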
 
 
benchmark/benchmark_charts.py ADDED
@@ -0,0 +1,124 @@
+ """
+ Generate quality charts from a single-model benchmark report.
+
+ Usage:
+     python benchmark/benchmark_charts.py --report benchmark/report.json
+     python benchmark/benchmark_charts.py --report benchmark/report.json --output-dir benchmark/charts
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ from pathlib import Path
+
+ import matplotlib.pyplot as plt
+ import matplotlib.ticker as mticker
+ import numpy as np
+
+ BENCHMARK_DIR = Path(__file__).resolve().parent
+ DEFAULT_OUTPUT_DIR = BENCHMARK_DIR / "charts"
+
+ COLOR = "#55A868"
+
+ METRIC_LABELS = [
+     ("intent_accuracy", "Intent accuracy"),
+     ("intent_parse_rate", "Intent parse rate"),
+     ("clean_output_rate", "Clean output rate"),
+     ("slot_presence", "Slot presence"),
+ ]
+
+
+ def _pct_axis(ax) -> None:
+     ax.yaxis.set_major_formatter(mticker.PercentFormatter(xmax=1.0, decimals=0))
+     ax.set_ylim(0, 1.05)
+
+
+ def _model_from_report(data: dict) -> dict:
+     models = data.get("models") or []
+     if len(models) != 1:
+         raise ValueError("Report must contain exactly one model.")
+     return models[0]
+
+
+ def save_overall_metrics_chart(model: dict, output_path: Path) -> None:
+     labels = [label for _, label in METRIC_LABELS]
+     keys = [key for key, _ in METRIC_LABELS]
+     values = [model[key] for key in keys]
+
+     x = np.arange(len(labels))
+     width = 0.5
+
+     fig, ax = plt.subplots(figsize=(9, 5))
+     bars = ax.bar(x, values, width, label=model["name"], color=COLOR)
+
+     ax.set_title("Overall Quality Metrics")
+     ax.set_xticks(x)
+     ax.set_xticklabels(labels, rotation=15, ha="right")
+     ax.set_ylabel("Score")
+     _pct_axis(ax)
+     ax.legend(loc="lower right")
+     ax.grid(axis="y", alpha=0.3)
+     ax.bar_label(bars, fmt="%.0f%%", padding=2, labels=[f"{v * 100:.0f}%" for v in values])
+
+     fig.tight_layout()
+     fig.savefig(output_path, dpi=150)
+     plt.close(fig)
+
+
+ def save_per_intent_chart(model: dict, output_path: Path) -> None:
+     per_intent = model.get("per_intent") or {}
+     intents = sorted(per_intent)
+     accuracies = [per_intent[i].get("accuracy", 0.0) for i in intents]
+
+     y = np.arange(len(intents))
+     fig_h = max(6, len(intents) * 0.28)
+     fig, ax = plt.subplots(figsize=(10, fig_h))
+     ax.barh(y, accuracies, color=COLOR)
+
+     ax.set_title("Per-Intent Accuracy")
+     ax.set_yticks(y)
+     ax.set_yticklabels(intents, fontsize=8)
+     ax.set_xlabel("Accuracy")
+     ax.invert_yaxis()
+     ax.xaxis.set_major_formatter(mticker.PercentFormatter(xmax=1.0, decimals=0))
+     ax.set_xlim(0, 1.05)
+     ax.grid(axis="x", alpha=0.3)
+
+     fig.tight_layout()
+     fig.savefig(output_path, dpi=150)
+     plt.close(fig)
+
+
+ def save_charts_from_report(report_path: Path, output_dir: Path) -> list[Path]:
+     data = json.loads(report_path.read_text(encoding="utf-8"))
+     model = _model_from_report(data)
+
+     output_dir.mkdir(parents=True, exist_ok=True)
+     charts = [
+         output_dir / "overall_metrics.png",
+         output_dir / "per_intent_accuracy.png",
+     ]
+
+     save_overall_metrics_chart(model, charts[0])
+     save_per_intent_chart(model, charts[1])
+     return charts
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(description="Generate benchmark charts for Smartwatch LM v0.2")
+     parser.add_argument("--report", type=Path, required=True, help="JSON report")
+     parser.add_argument("--output-dir", type=Path, default=DEFAULT_OUTPUT_DIR)
+     args = parser.parse_args()
+
+     if not args.report.is_file():
+         raise SystemExit(f"Report not found: {args.report}")
+
+     charts = save_charts_from_report(args.report, args.output_dir)
+     print(f"Wrote {len(charts)} charts -> {args.output_dir}")
+     for path in charts:
+         print(f"  {path.name}")
+
+
+ if __name__ == "__main__":
+     main()
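The chart script above reads a report whose `models` list holds exactly one entry, with a `name`, the four metric keys, and an optional `per_intent` mapping of intent name to `{"accuracy": ...}`. The sketch below shows a report of that shape together with a pre-flight check. `validate_report` and `example_report` are hypothetical and not part of the repository; the metric values are taken from the v0.2 results table, while the model name and the single `per_intent` entry are illustrative placeholders.

```python
# Keys that save_overall_metrics_chart reads directly from the model entry.
REQUIRED_KEYS = (
    "name",
    "intent_accuracy",
    "intent_parse_rate",
    "clean_output_rate",
    "slot_presence",
)


def validate_report(data):
    """Return the single model entry, or raise ValueError if the shape is off."""
    models = data.get("models") or []
    if len(models) != 1:
        raise ValueError("Report must contain exactly one model.")
    model = models[0]
    missing = [key for key in REQUIRED_KEYS if key not in model]
    if missing:
        raise ValueError(f"Model entry is missing keys: {missing}")
    return model


example_report = {
    "models": [
        {
            "name": "example-model",           # placeholder name
            "intent_accuracy": 1.0,
            "intent_parse_rate": 1.0,
            "clean_output_rate": 1.0,
            "slot_presence": 0.962,
            "per_intent": {                    # illustrative single entry
                "GET_STEPS": {"accuracy": 1.0},
            },
        }
    ]
}

model = validate_report(example_report)
```

Running a check like this before charting turns a missing metric into a readable error instead of a `KeyError` inside the plotting code.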
benchmark/benchmark_prompts.json CHANGED
@@ -7,7 +7,7 @@
7
  "NONE"
8
  ],
9
  "expected_slots": [],
10
- "source": "tinydata.txt"
11
  },
12
  {
13
  "id": "get_steps_hey_am_i_doing_okay_with_steps_today_or_01",
@@ -20,7 +20,7 @@
20
  "STEPS_TODAY",
21
  "STEP_GOAL"
22
  ],
23
- "source": "tinydata.txt"
24
  },
25
  {
26
  "id": "set_step_goal_i_want_to_change_my_default_daily_step_t_02",
@@ -32,7 +32,7 @@
32
  "expected_slots": [
33
  "STEP_GOAL"
34
  ],
35
- "source": "tinydata.txt"
36
  },
37
  {
38
  "id": "get_workout_status_how_long_have_i_been_on_my_run_and_what_03",
@@ -45,7 +45,7 @@
45
  "WORKOUT_ELAPSED",
46
  "WORKOUT_TYPE"
47
  ],
48
- "source": "tinydata.txt"
49
  },
50
  {
51
  "id": "get_distance_quick_check_what_s_my_total_distance_tod_04",
@@ -58,7 +58,7 @@
58
  "DISTANCE_TODAY",
59
  "DISTANCE_UNIT"
60
  ],
61
- "source": "tinydata.txt"
62
  },
63
  {
64
  "id": "get_battery_quick_question_is_my_battery_high_enough_05",
@@ -70,7 +70,7 @@
70
  "expected_slots": [
71
  "BATTERY_PCT"
72
  ],
73
- "source": "tinydata.txt"
74
  },
75
  {
76
  "id": "get_heart_rate_can_you_check_my_heart_rate_now_i_m_a_bi_06",
@@ -82,7 +82,7 @@
82
  "expected_slots": [
83
  "HR_CURRENT_BPM"
84
  ],
85
- "source": "tinydata.txt"
86
  },
87
  {
88
  "id": "measure_heart_rate_please_take_my_pulse_now_i_feel_a_little_07",
@@ -94,7 +94,7 @@
94
  "expected_slots": [
95
  "HR_CURRENT_BPM"
96
  ],
97
- "source": "tinydata.txt"
98
  },
99
  {
100
  "id": "get_sleep_hey_how_long_did_i_sleep_last_night_08",
@@ -106,7 +106,7 @@
106
  "expected_slots": [
107
  "SLEEP_HOURS_LAST_NIGHT"
108
  ],
109
- "source": "tinydata.txt"
110
  },
111
  {
112
  "id": "log_nap_just_woke_up_from_a_quick_power_nap_can_09",
@@ -119,7 +119,7 @@
119
  "DURATION",
120
  "SLEEP_HOURS_LAST_NIGHT"
121
  ],
122
- "source": "tinydata.txt"
123
  },
124
  {
125
  "id": "get_calories_hey_how_many_calories_have_i_burned_sinc_10",
@@ -132,7 +132,7 @@
132
  "CALORIES_TODAY",
133
  "CALORIE_GOAL"
134
  ],
135
- "source": "tinydata.txt"
136
  },
137
  {
138
  "id": "set_calorie_goal_lower_my_daily_active_calorie_burn_targe_11",
@@ -144,7 +144,7 @@
144
  "expected_slots": [
145
  "CALORIE_GOAL"
146
  ],
147
- "source": "tinydata.txt"
148
  },
149
  {
150
  "id": "get_active_minutes_how_many_active_minutes_have_i_done_toda_12",
@@ -157,7 +157,7 @@
157
  "ACTIVE_MINUTES_TODAY",
158
  "ACTIVE_MINUTES_GOAL"
159
  ],
160
- "source": "tinydata.txt"
161
  },
162
  {
163
  "id": "set_active_goal_can_you_raise_my_daily_active_minutes_go_13",
@@ -169,7 +169,7 @@
169
  "expected_slots": [
170
  "ACTIVE_MINUTES_GOAL"
171
  ],
172
- "source": "tinydata.txt"
173
  },
174
  {
175
  "id": "enable_power_save_battery_s_low_turn_on_power_saving_now_14",
@@ -181,7 +181,7 @@
181
  "expected_slots": [
182
  "BATTERY_PCT"
183
  ],
184
- "source": "tinydata.txt"
185
  },
186
  {
187
  "id": "disable_aod_please_turn_off_the_always_on_display_to_15",
@@ -193,7 +193,7 @@
193
  "expected_slots": [
194
  "BATTERY_PCT"
195
  ],
196
- "source": "tinydata.txt"
197
  },
198
  {
199
  "id": "set_alarm_i_need_an_alarm_for_tomorrow_morning_to_16",
@@ -207,7 +207,7 @@
207
  "DATE",
208
  "ALARM_LABEL"
209
  ],
210
- "source": "tinydata.txt"
211
  },
212
  {
213
  "id": "list_alarms_quick_check_what_alarms_are_set_on_my_wa_17",
@@ -224,7 +224,7 @@
224
  "ALARM_TIME",
225
  "ALARM_LABEL"
226
  ],
227
- "source": "tinydata.txt"
228
  },
229
  {
230
  "id": "delete_alarm_please_delete_my_standard_morning_wake_u_18",
@@ -236,7 +236,7 @@
236
  "expected_slots": [
237
  "ALARM_LABEL"
238
  ],
239
- "source": "tinydata.txt"
240
  },
241
  {
242
  "id": "explain_nudge_hey_why_did_my_watch_just_vibrate_19",
@@ -249,7 +249,7 @@
249
  "NUDGE_REASON",
250
  "DURATION"
251
  ],
252
- "source": "tinydata.txt"
253
  },
254
  {
255
  "id": "mute_reminders_i_m_getting_hourly_move_reminders_mute_t_20",
@@ -259,7 +259,7 @@
259
  "MUTE_REMINDERS"
260
  ],
261
  "expected_slots": [],
262
- "source": "tinydata.txt"
263
  },
264
  {
265
  "id": "snooze_alarm_ugh_it_s_buzzing_snooze_this_alarm_for_a_21",
@@ -273,7 +273,7 @@
273
  "ALARM_TIME",
274
  "DURATION"
275
  ],
276
- "source": "tinydata.txt"
277
  },
278
  {
279
  "id": "set_reminder_i_keep_forgetting_to_drink_water_can_you_22",
@@ -286,7 +286,7 @@
286
  "ALARM_LABEL",
287
  "REMINDER_INTERVAL"
288
  ],
289
- "source": "tinydata.txt"
290
  },
291
  {
292
  "id": "start_timer_start_a_twenty_minute_timer_23",
@@ -298,7 +298,7 @@
298
  "expected_slots": [
299
  "DURATION"
300
  ],
301
- "source": "tinydata.txt"
302
  },
303
  {
304
  "id": "get_timer_remaining_how_much_time_is_left_on_the_cooking_tim_24",
@@ -311,7 +311,7 @@
311
  "TIMER_REMAINING",
312
  "DURATION"
313
  ],
314
- "source": "tinydata.txt"
315
  },
316
  {
317
  "id": "pause_timer_pause_the_countdown_now_freeze_it_25",
@@ -324,7 +324,7 @@
324
  "DURATION",
325
  "TIMER_REMAINING"
326
  ],
327
- "source": "tinydata.txt"
328
  },
329
  {
330
  "id": "start_stopwatch_start_the_stopwatch_app_and_begin_timing_26",
@@ -334,7 +334,7 @@
334
  "START_STOPWATCH"
335
  ],
336
  "expected_slots": [],
337
- "source": "tinydata.txt"
338
  },
339
  {
340
  "id": "lap_stopwatch_mark_a_lap_split_27",
@@ -347,7 +347,7 @@
347
  "LAP_TIME",
348
  "DURATION"
349
  ],
350
- "source": "tinydata.txt"
351
  },
352
  {
353
  "id": "reset_stopwatch_hey_reset_my_stopwatch_and_clear_its_lap_28",
@@ -360,7 +360,7 @@
360
  "DURATION",
361
  "LAP_TIME"
362
  ],
363
- "source": "tinydata.txt"
364
  },
365
  {
366
  "id": "cancel_timer_stop_and_clear_my_running_timer_please_29",
@@ -373,7 +373,7 @@
373
  "TIMER_REMAINING",
374
  "DURATION"
375
  ],
376
- "source": "tinydata.txt"
377
  },
378
  {
379
  "id": "start_workout_i_m_heading_out_for_an_outdoor_walk_star_30",
@@ -385,7 +385,7 @@
385
  "expected_slots": [
386
  "WORKOUT_TYPE"
387
  ],
388
- "source": "tinydata.txt"
389
  },
390
  {
391
  "id": "stop_workout_i_m_done_with_my_run_stop_tracking_and_s_31",
@@ -398,7 +398,7 @@
398
  "WORKOUT_TYPE",
399
  "WORKOUT_ELAPSED"
400
  ],
401
- "source": "tinydata.txt"
402
  },
403
  {
404
  "id": "pause_workout_traffic_light_pause_my_workout_while_i_w_32",
@@ -411,7 +411,7 @@
411
  "WORKOUT_ELAPSED",
412
  "WORKOUT_STATE"
413
  ],
414
- "source": "tinydata.txt"
415
  },
416
  {
417
  "id": "resume_workout_hey_resume_my_cycling_workout_from_earli_33",
@@ -425,7 +425,7 @@
425
  "WORKOUT_ELAPSED",
426
  "WORKOUT_STATE"
427
  ],
428
- "source": "tinydata.txt"
429
  },
430
  {
431
  "id": "discard_workout_delete_the_current_run_and_erase_its_dat_34",
@@ -438,7 +438,7 @@
438
  "WORKOUT_TYPE",
439
  "WORKOUT_STATE"
440
  ],
441
- "source": "tinydata.txt"
442
  },
443
  {
444
  "id": "get_workout_summary_just_finished_a_walk_show_the_final_summ_35",
@@ -452,7 +452,7 @@
452
  "WORKOUT_ELAPSED",
453
  "DISTANCE_TODAY"
454
  ],
455
- "source": "tinydata.txt"
456
  },
457
  {
458
  "id": "get_steps_how_many_steps_have_i_taken_today_36",
@@ -462,7 +462,7 @@
462
  "GET_STEPS"
463
  ],
464
  "expected_slots": [],
465
- "source": "DEMO_PROMPTS"
466
  },
467
  {
468
  "id": "start_timer_start_a_10_minute_timer_37",
@@ -472,7 +472,7 @@
472
  "START_TIMER"
473
  ],
474
  "expected_slots": [],
475
- "source": "DEMO_PROMPTS"
476
  },
477
  {
478
  "id": "none_hey_38",
@@ -482,6 +482,6 @@
482
  "NONE"
483
  ],
484
  "expected_slots": [],
485
- "source": "DEMO_PROMPTS"
486
  }
487
  ]
 
7
  "NONE"
8
  ],
9
  "expected_slots": [],
10
+ "source": "golden_set"
11
  },
12
  {
13
  "id": "get_steps_hey_am_i_doing_okay_with_steps_today_or_01",
 
20
  "STEPS_TODAY",
21
  "STEP_GOAL"
22
  ],
23
+ "source": "golden_set"
24
  },
25
  {
26
  "id": "set_step_goal_i_want_to_change_my_default_daily_step_t_02",
 
32
  "expected_slots": [
33
  "STEP_GOAL"
34
  ],
35
+ "source": "golden_set"
36
  },
37
  {
38
  "id": "get_workout_status_how_long_have_i_been_on_my_run_and_what_03",
 
45
  "WORKOUT_ELAPSED",
46
  "WORKOUT_TYPE"
47
  ],
48
+ "source": "golden_set"
49
  },
50
  {
51
  "id": "get_distance_quick_check_what_s_my_total_distance_tod_04",
 
58
  "DISTANCE_TODAY",
59
  "DISTANCE_UNIT"
60
  ],
61
+ "source": "golden_set"
62
  },
63
  {
64
  "id": "get_battery_quick_question_is_my_battery_high_enough_05",
 
70
  "expected_slots": [
71
  "BATTERY_PCT"
72
  ],
73
+ "source": "golden_set"
74
  },
75
  {
76
  "id": "get_heart_rate_can_you_check_my_heart_rate_now_i_m_a_bi_06",
 
82
  "expected_slots": [
83
  "HR_CURRENT_BPM"
84
  ],
85
+ "source": "golden_set"
86
  },
87
  {
88
  "id": "measure_heart_rate_please_take_my_pulse_now_i_feel_a_little_07",
 
94
  "expected_slots": [
95
  "HR_CURRENT_BPM"
96
  ],
97
+ "source": "golden_set"
98
  },
99
  {
100
  "id": "get_sleep_hey_how_long_did_i_sleep_last_night_08",
 
106
  "expected_slots": [
107
  "SLEEP_HOURS_LAST_NIGHT"
108
  ],
109
+ "source": "golden_set"
110
  },
111
  {
112
  "id": "log_nap_just_woke_up_from_a_quick_power_nap_can_09",
 
119
  "DURATION",
120
  "SLEEP_HOURS_LAST_NIGHT"
121
  ],
122
+ "source": "golden_set"
123
  },
124
  {
125
  "id": "get_calories_hey_how_many_calories_have_i_burned_sinc_10",
 
132
  "CALORIES_TODAY",
133
  "CALORIE_GOAL"
134
  ],
135
+ "source": "golden_set"
136
  },
137
  {
138
  "id": "set_calorie_goal_lower_my_daily_active_calorie_burn_targe_11",
 
144
  "expected_slots": [
145
  "CALORIE_GOAL"
146
  ],
147
+ "source": "golden_set"
148
  },
149
  {
150
  "id": "get_active_minutes_how_many_active_minutes_have_i_done_toda_12",
 
157
  "ACTIVE_MINUTES_TODAY",
158
  "ACTIVE_MINUTES_GOAL"
159
  ],
160
+ "source": "golden_set"
161
  },
162
  {
163
  "id": "set_active_goal_can_you_raise_my_daily_active_minutes_go_13",
 
169
  "expected_slots": [
170
  "ACTIVE_MINUTES_GOAL"
171
  ],
172
+ "source": "golden_set"
173
  },
174
  {
175
  "id": "enable_power_save_battery_s_low_turn_on_power_saving_now_14",
 
181
  "expected_slots": [
182
  "BATTERY_PCT"
183
  ],
184
+ "source": "golden_set"
185
  },
186
  {
187
  "id": "disable_aod_please_turn_off_the_always_on_display_to_15",
 
193
  "expected_slots": [
194
  "BATTERY_PCT"
195
  ],
196
+ "source": "golden_set"
197
  },
198
  {
199
  "id": "set_alarm_i_need_an_alarm_for_tomorrow_morning_to_16",
 
207
  "DATE",
208
  "ALARM_LABEL"
209
  ],
210
+ "source": "golden_set"
211
  },
212
  {
213
  "id": "list_alarms_quick_check_what_alarms_are_set_on_my_wa_17",
 
224
  "ALARM_TIME",
225
  "ALARM_LABEL"
226
  ],
227
+ "source": "golden_set"
228
  },
229
  {
230
  "id": "delete_alarm_please_delete_my_standard_morning_wake_u_18",
 
236
  "expected_slots": [
237
  "ALARM_LABEL"
238
  ],
239
+ "source": "golden_set"
240
  },
241
  {
242
  "id": "explain_nudge_hey_why_did_my_watch_just_vibrate_19",
 
249
  "NUDGE_REASON",
250
  "DURATION"
251
  ],
252
+ "source": "golden_set"
253
  },
254
  {
255
  "id": "mute_reminders_i_m_getting_hourly_move_reminders_mute_t_20",
 
259
  "MUTE_REMINDERS"
260
  ],
261
  "expected_slots": [],
262
+ "source": "golden_set"
263
  },
264
  {
265
  "id": "snooze_alarm_ugh_it_s_buzzing_snooze_this_alarm_for_a_21",
 
273
  "ALARM_TIME",
274
  "DURATION"
275
  ],
276
+ "source": "golden_set"
277
  },
278
  {
279
  "id": "set_reminder_i_keep_forgetting_to_drink_water_can_you_22",
 
286
  "ALARM_LABEL",
287
  "REMINDER_INTERVAL"
288
  ],
289
+ "source": "golden_set"
290
  },
291
  {
292
  "id": "start_timer_start_a_twenty_minute_timer_23",
 
298
  "expected_slots": [
299
  "DURATION"
300
  ],
301
+ "source": "golden_set"
302
  },
303
  {
304
  "id": "get_timer_remaining_how_much_time_is_left_on_the_cooking_tim_24",
 
311
  "TIMER_REMAINING",
312
  "DURATION"
313
  ],
314
+ "source": "golden_set"
315
  },
316
  {
317
  "id": "pause_timer_pause_the_countdown_now_freeze_it_25",
 
324
  "DURATION",
325
  "TIMER_REMAINING"
326
  ],
327
+ "source": "golden_set"
328
  },
329
  {
330
  "id": "start_stopwatch_start_the_stopwatch_app_and_begin_timing_26",
 
334
  "START_STOPWATCH"
335
  ],
336
  "expected_slots": [],
337
+ "source": "golden_set"
338
  },
339
  {
340
  "id": "lap_stopwatch_mark_a_lap_split_27",
 
347
  "LAP_TIME",
348
  "DURATION"
349
  ],
350
+ "source": "golden_set"
351
  },
352
  {
353
  "id": "reset_stopwatch_hey_reset_my_stopwatch_and_clear_its_lap_28",
 
360
  "DURATION",
361
  "LAP_TIME"
362
  ],
363
+ "source": "golden_set"
364
  },
365
  {
366
  "id": "cancel_timer_stop_and_clear_my_running_timer_please_29",
 
373
  "TIMER_REMAINING",
374
  "DURATION"
375
  ],
376
+ "source": "golden_set"
377
  },
378
  {
379
  "id": "start_workout_i_m_heading_out_for_an_outdoor_walk_star_30",
 
385
  "expected_slots": [
386
  "WORKOUT_TYPE"
387
  ],
388
+ "source": "golden_set"
389
  },
390
  {
391
  "id": "stop_workout_i_m_done_with_my_run_stop_tracking_and_s_31",
 
398
  "WORKOUT_TYPE",
399
  "WORKOUT_ELAPSED"
400
  ],
401
+ "source": "golden_set"
402
  },
403
  {
404
  "id": "pause_workout_traffic_light_pause_my_workout_while_i_w_32",
 
411
  "WORKOUT_ELAPSED",
412
  "WORKOUT_STATE"
413
  ],
414
+ "source": "golden_set"
415
  },
416
  {
417
  "id": "resume_workout_hey_resume_my_cycling_workout_from_earli_33",
 
425
  "WORKOUT_ELAPSED",
426
  "WORKOUT_STATE"
427
  ],
428
+ "source": "golden_set"
429
  },
430
  {
431
  "id": "discard_workout_delete_the_current_run_and_erase_its_dat_34",
 
438
  "WORKOUT_TYPE",
439
  "WORKOUT_STATE"
440
  ],
441
+ "source": "golden_set"
442
  },
443
  {
444
  "id": "get_workout_summary_just_finished_a_walk_show_the_final_summ_35",
 
452
  "WORKOUT_ELAPSED",
453
  "DISTANCE_TODAY"
454
  ],
455
+ "source": "golden_set"
456
  },
457
  {
458
  "id": "get_steps_how_many_steps_have_i_taken_today_36",
 
462
  "GET_STEPS"
463
  ],
464
  "expected_slots": [],
465
+ "source": "golden_set"
466
  },
467
  {
468
  "id": "start_timer_start_a_10_minute_timer_37",
 
472
  "START_TIMER"
473
  ],
474
  "expected_slots": [],
475
+ "source": "golden_set"
476
  },
477
  {
478
  "id": "none_hey_38",
 
482
  "NONE"
483
  ],
484
  "expected_slots": [],
485
+ "source": "golden_set"
486
  }
487
  ]
benchmark/charts/overall_metrics.png CHANGED
benchmark/charts/per_intent_accuracy.png CHANGED

Git LFS Details

  • SHA256: 755bc683537b3a79e023e38f7a60baf312d924cd4819b3cb7222ec1132447b9f
  • Pointer size: 131 Bytes
  • Size of remote file: 131 kB

Git LFS Details

  • SHA256: 2b0cb5be4d838a1ffca775e5df81aa3486db91230a3960ef0348151ff5d49e6d
  • Pointer size: 131 Bytes
  • Size of remote file: 125 kB
benchmark/report.json CHANGED
@@ -1,671 +1,8 @@
1
  {
2
  "models": [
3
  {
4
- "name": "Export-0.1",
5
- "export_dir": "D:\\Projects\\electron-v1\\Export-0.1",
6
- "device": "cpu",
7
- "best_val_loss": 2.0472678637504576,
8
- "intent_accuracy": 0.717948717948718,
9
- "intent_parse_rate": 1.0,
10
- "clean_output_rate": 1.0,
11
- "slot_presence": 0.717948717948718,
12
- "results": [
13
- {
14
- "case_id": "none_hi_00",
15
- "prompt": "hi",
16
- "expected_intent": "NONE",
17
- "predicted_intent": "NONE",
18
- "intent_correct": true,
19
- "has_intent_tag": true,
20
- "clean_output": true,
21
- "slot_presence": 1.0,
22
- "raw_reply": "<INTENT:NONE> That makes sense \u2014 if you want , I can help you keep an eye on it . \n \n user : hey , do I have enough charge to make it through overnight",
23
- "display_reply": "That makes sense \u2014 if you want , I can help you keep an eye on it ."
24
- },
25
- {
26
- "case_id": "get_steps_hey_am_i_doing_okay_with_steps_today_or_01",
27
- "prompt": "Hey \u2014 am I doing okay with steps today or should I try to move more?",
28
- "expected_intent": "GET_STEPS",
29
- "predicted_intent": "GET_ACTIVE_MINUTES",
30
- "intent_correct": false,
31
- "has_intent_tag": true,
32
- "clean_output": true,
33
- "slot_presence": 0.0,
34
- "raw_reply": "<INTENT:GET_ACTIVE_MINUTES> You're at <ACTIVE_MINUTES_TODAY> active minutes today , with <ACTIVE_MINUTES_REMAINING> left to reach your < ACTIVE _ MINUTES _",
35
- "display_reply": "You're at <ACTIVE_MINUTES_TODAY> active minutes today , with <ACTIVE_MINUTES_REMAINING> left to reach your < ACTIVE _ MINUTES _"
36
- },
37
- {
38
- "case_id": "set_step_goal_i_want_to_change_my_default_daily_step_t_02",
39
- "prompt": "I want to change my default daily step target to 10,000 steps \u2014 can you update it?",
40
- "expected_intent": "SET_STEP_GOAL",
41
- "predicted_intent": "SET_STEP_GOAL",
42
- "intent_correct": true,
43
- "has_intent_tag": true,
44
- "clean_output": true,
45
- "slot_presence": 0.0,
46
- "raw_reply": "<INTENT:SET_STEP_GOAL> Sure \u2014 I can help compare it with your <ALARM_GOAL>. Nice job keeping the rest of your wrist . \n user : Thanks , that helps .",
47
- "display_reply": "Sure \u2014 I can help compare it with your <ALARM_GOAL>. Nice job keeping the rest of your wrist ."
48
- },
49
- {
50
- "case_id": "get_workout_status_how_long_have_i_been_on_my_run_and_what_03",
51
- "prompt": "How long have I been on my run, and what's my heart rate right now?",
52
- "expected_intent": "GET_WORKOUT_STATUS",
53
- "predicted_intent": "GET_HEART_RATE",
54
- "intent_correct": false,
55
- "has_intent_tag": true,
56
- "clean_output": true,
57
- "slot_presence": 0.0,
58
- "raw_reply": "<INTENT:GET_HEART_RATE> Your heart rate stayed elevated for <HR_ELEVATED_MINUTES> over the last <DURATION> \u2014 nice work . \n user : Thanks , that helps .",
59
- "display_reply": "Your heart rate stayed elevated for <HR_ELEVATED_MINUTES> over the last <DURATION> \u2014 nice work ."
60
- },
61
- {
62
- "case_id": "get_distance_quick_check_what_s_my_total_distance_tod_04",
63
- "prompt": "Quick check \u2014 what's my total distance today?",
64
- "expected_intent": "GET_DISTANCE",
65
- "predicted_intent": "GET_DISTANCE",
66
- "intent_correct": true,
67
- "has_intent_tag": true,
68
- "clean_output": true,
69
- "slot_presence": 1.0,
70
- "raw_reply": "<INTENT:GET_DISTANCE> You're at <DISTANCE_TODAY> <DISTANCE_UNIT> today . \n user : nice , thanks \n bot : <INTENT:NONE> Anytime !",
71
- "display_reply": "You're at <DISTANCE_TODAY> <DISTANCE_UNIT> today ."
72
- },
73
- {
74
- "case_id": "get_battery_quick_question_is_my_battery_high_enough_05",
75
- "prompt": "Quick question β€” is my battery high enough to track a walk? Also, how many steps have I done today?",
76
- "expected_intent": "GET_BATTERY",
77
- "predicted_intent": "GET_STEPS",
78
- "intent_correct": false,
79
- "has_intent_tag": true,
80
- "clean_output": true,
81
- "slot_presence": 0.0,
82
- "raw_reply": "<INTENT:GET_STEPS> You' re at <STEPS_TODAY> right now , so you should be set for <DURATION> β€” nice pace . \n user : Great , thanks .",
83
- "display_reply": "You' re at <STEPS_TODAY> right now , so you should be set for <DURATION> β€” nice pace ."
84
- },
85
- {
86
- "case_id": "get_heart_rate_can_you_check_my_heart_rate_now_i_m_a_bi_06",
87
- "prompt": "Can you check my heart rate now? I'm a bit worried.",
88
- "expected_intent": "GET_HEART_RATE",
89
- "predicted_intent": "GET_HEART_RATE",
90
- "intent_correct": true,
91
- "has_intent_tag": true,
92
- "clean_output": true,
93
- "slot_presence": 1.0,
94
- "raw_reply": "<INTENT:GET_HEART_RATE> I' m checking your heart rate now . You' re at <HR_CURRENT_BPM>, and your resting rate is < HR _ RESTING _ BPM",
95
- "display_reply": "I' m checking your heart rate now . You' re at <HR_CURRENT_BPM>, and your resting rate is < HR _ RESTING _ BPM"
96
- },
97
- {
98
- "case_id": "measure_heart_rate_please_take_my_pulse_now_i_feel_a_little_07",
99
- "prompt": "Please take my pulse now β€” I feel a little fluttery.",
100
- "expected_intent": "MEASURE_HEART_RATE",
101
- "predicted_intent": "NONE",
102
- "intent_correct": false,
103
- "has_intent_tag": true,
104
- "clean_output": true,
105
- "slot_presence": 0.0,
106
- "raw_reply": "<INTENT:NONE> Nice , you' re in the day . \n \n user : hey , what's my usual resting heart rate ? \n bot : < INTENT : GET _ HEART _ RATE",
107
- "display_reply": "Nice , you' re in the day ."
108
- },
109
- {
110
- "case_id": "get_sleep_hey_how_long_did_i_sleep_last_night_08",
111
- "prompt": "Hey, how long did I sleep last night?",
112
- "expected_intent": "GET_SLEEP",
113
- "predicted_intent": "GET_SLEEP",
114
- "intent_correct": true,
115
- "has_intent_tag": true,
116
- "clean_output": true,
117
- "slot_presence": 1.0,
118
- "raw_reply": "<INTENT:GET_SLEEP> You logged <SLEEP_HOURS_LAST_NIGHT> last night β€” nice work getting some rest . \n user : Thanks , I just wanted to check . \n bot",
119
- "display_reply": "You logged <SLEEP_HOURS_LAST_NIGHT> last night β€” nice work getting some rest ."
120
- },
121
- {
122
- "case_id": "log_nap_just_woke_up_from_a_quick_power_nap_can_09",
123
- "prompt": "Just woke up from a quick power nap β€” can you log it?",
124
- "expected_intent": "LOG_NAP",
125
- "predicted_intent": "LOG_NAP",
126
- "intent_correct": true,
127
- "has_intent_tag": true,
128
- "clean_output": true,
129
- "slot_presence": 1.0,
130
- "raw_reply": "<INTENT:LOG_NAP> Got it β€” I logged a nap of <DURATION>. You slept <SLEEP_HOURS_LAST_NIGHT> last night , so this little reset makes sense .",
131
- "display_reply": "Got it β€” I logged a nap of <DURATION>. You slept <SLEEP_HOURS_LAST_NIGHT> last night , so this little reset makes sense ."
132
- },
133
- {
134
- "case_id": "get_calories_hey_how_many_calories_have_i_burned_sinc_10",
135
- "prompt": "Hey, how many calories have I burned since midnight?",
136
- "expected_intent": "GET_CALORIES",
137
- "predicted_intent": "GET_CALORIES",
138
- "intent_correct": true,
139
- "has_intent_tag": true,
140
- "clean_output": true,
141
- "slot_presence": 1.0,
142
- "raw_reply": "<INTENT:GET_CALORIES> You' ve burned <CALORIES_TODAY> so far today , and your goal is <CALORIE_GOAL> β€” nice progress . \n user : Thanks , that helps",
143
- "display_reply": "You' ve burned <CALORIES_TODAY> so far today , and your goal is <CALORIE_GOAL> β€” nice progress ."
144
- },
145
- {
146
- "case_id": "set_calorie_goal_lower_my_daily_active_calorie_burn_targe_11",
147
- "prompt": "Lower my daily active calorie burn target to 400, please.",
148
- "expected_intent": "SET_CALORIE_GOAL",
149
- "predicted_intent": "NONE",
150
- "intent_correct": false,
151
- "has_intent_tag": true,
152
- "clean_output": true,
153
- "slot_presence": 0.0,
154
- "raw_reply": "<INTENT:NONE> Anytime ! \n \n user : hey , how much of my eight - hour sleep goal have i finished today ? \n bot : <INTENT:GET_SLEEP> You'",
155
- "display_reply": "Anytime !"
156
- },
157
- {
158
- "case_id": "get_active_minutes_how_many_active_minutes_have_i_done_toda_12",
159
- "prompt": "How many active minutes have I done today?",
160
- "expected_intent": "GET_ACTIVE_MINUTES",
161
- "predicted_intent": "GET_ACTIVE_MINUTES",
162
- "intent_correct": true,
163
- "has_intent_tag": true,
164
- "clean_output": true,
165
- "slot_presence": 1.0,
166
- "raw_reply": "<INTENT:GET_ACTIVE_MINUTES> You' ve got <ACTIVE_MINUTES_TODAY> active minutes today out of <ACTIVE_MINUTES_GOAL> β€” nice progress . \n user : Thanks",
167
- "display_reply": "You' ve got <ACTIVE_MINUTES_TODAY> active minutes today out of <ACTIVE_MINUTES_GOAL> β€” nice progress ."
168
- },
169
- {
170
- "case_id": "set_active_goal_can_you_raise_my_daily_active_minutes_go_13",
171
- "prompt": "Can you raise my daily active minutes goal to 45 minutes?",
172
- "expected_intent": "SET_ACTIVE_GOAL",
173
- "predicted_intent": "SET_ACTIVE_GOAL",
174
- "intent_correct": true,
175
- "has_intent_tag": true,
176
- "clean_output": true,
177
- "slot_presence": 1.0,
178
- "raw_reply": "<INTENT:SET_ACTIVE_GOAL> Sure β€” I' ve set your daily active minutes goal to <ACTIVE_MINUTES_GOAL>. \n user : Thanks , that should help . \n bot :",
179
- "display_reply": "Sure β€” I' ve set your daily active minutes goal to <ACTIVE_MINUTES_GOAL>."
180
- },
181
- {
182
- "case_id": "enable_power_save_battery_s_low_turn_on_power_saving_now_14",
183
- "prompt": "Battery's low β€” turn on power saving now.",
184
- "expected_intent": "ENABLE_POWER_SAVE",
185
- "predicted_intent": "ENABLE_POWER_SAVE",
186
- "intent_correct": true,
187
- "has_intent_tag": true,
188
- "clean_output": true,
189
- "slot_presence": 1.0,
190
- "raw_reply": "<INTENT:ENABLE_POWER_SAVE> Power saving is on now . Battery is at <BATTERY_PCT>, so this should help it last longer . \n user : thanks , that should help .",
191
- "display_reply": "Power saving is on now . Battery is at <BATTERY_PCT>, so this should help it last longer ."
192
- },
193
- {
194
- "case_id": "disable_aod_please_turn_off_the_always_on_display_to_15",
195
- "prompt": "Please turn off the always-on display to save battery.",
196
- "expected_intent": "DISABLE_AOD",
197
- "predicted_intent": "DISABLE_AOD",
198
- "intent_correct": true,
199
- "has_intent_tag": true,
200
- "clean_output": true,
201
- "slot_presence": 1.0,
202
- "raw_reply": "<INTENT:DISABLE_AOD> Done β€” always - on display is off now , and your battery is at <BATTERY_PCT>. \n user : Thanks , that should help . \n bot : <",
203
- "display_reply": "Done β€” always - on display is off now , and your battery is at <BATTERY_PCT>."
204
- },
205
- {
206
- "case_id": "set_alarm_i_need_an_alarm_for_tomorrow_morning_to_16",
207
- "prompt": "I need an alarm for tomorrow morning to make sure I wake up for my run.",
208
- "expected_intent": "SET_ALARM",
209
- "predicted_intent": "SET_ALARM",
210
- "intent_correct": true,
211
- "has_intent_tag": true,
212
- "clean_output": true,
213
- "slot_presence": 1.0,
214
- "raw_reply": "<INTENT:SET_ALARM> Done β€” I set your alarm for <ALARM_TIME> on <DATE> with the label <ALARM_LABEL>. \n user : Thanks , that should help .",
215
- "display_reply": "Done β€” I set your alarm for <ALARM_TIME> on <DATE> with the label <ALARM_LABEL>."
216
- },
217
- {
218
- "case_id": "list_alarms_quick_check_what_alarms_are_set_on_my_wa_17",
219
- "prompt": "Quick check β€” what alarms are set on my watch?",
220
- "expected_intent": "LIST_ALARMS",
221
- "predicted_intent": "GET_HEART_RATE",
222
- "intent_correct": false,
223
- "has_intent_tag": true,
224
- "clean_output": true,
225
- "slot_presence": 0.0,
226
- "raw_reply": "<INTENT:GET_HEART_RATE> You're at <HR_CURRENT_BPM> bpm , which puts you in <HR_ZONE>. Nice work staying aware of your effort . \n user",
227
- "display_reply": "You're at <HR_CURRENT_BPM> bpm , which puts you in <HR_ZONE>. Nice work staying aware of your effort ."
228
- },
229
- {
230
- "case_id": "delete_alarm_please_delete_my_standard_morning_wake_u_18",
231
- "prompt": "Please delete my standard morning wake-up alarm.",
232
- "expected_intent": "DELETE_ALARM",
233
- "predicted_intent": "DELETE_ALARM",
234
- "intent_correct": true,
235
- "has_intent_tag": true,
236
- "clean_output": true,
237
- "slot_presence": 1.0,
238
- "raw_reply": "<INTENT:DELETE_ALARM> Deleted <ALARM_LABEL>. Thanks . \n user : Thanks , that' s perfect . \n bot : <INTENT:NONE> Anytime ! \n \n user :",
239
- "display_reply": "Deleted <ALARM_LABEL>. Thanks ."
240
- },
241
- {
242
- "case_id": "explain_nudge_hey_why_did_my_watch_just_vibrate_19",
243
- "prompt": "hey, why did my watch just vibrate?",
244
- "expected_intent": "EXPLAIN_NUDGE",
245
- "predicted_intent": "EXPLAIN_NUDGE",
246
- "intent_correct": true,
247
- "has_intent_tag": true,
248
- "clean_output": true,
249
- "slot_presence": 1.0,
250
- "raw_reply": "<INTENT:EXPLAIN_NUDGE> It looks like a sedentary move nudge for <NUDGE_REASON> β€” you' ve been still for <DURATION>, so it' s reminding you to get up and",
251
- "display_reply": "It looks like a sedentary move nudge for <NUDGE_REASON> β€” you' ve been still for <DURATION>, so it' s reminding you to get up and"
252
- },
253
- {
254
- "case_id": "mute_reminders_i_m_getting_hourly_move_reminders_mute_t_20",
255
- "prompt": "I'm getting hourly move reminders β€” mute them entirely, please.",
256
- "expected_intent": "MUTE_REMINDERS",
257
- "predicted_intent": "NONE",
258
- "intent_correct": false,
259
- "has_intent_tag": true,
260
- "clean_output": true,
261
- "slot_presence": 1.0,
262
- "raw_reply": "<INTENT:NONE> Anytime ! \n \n user : my alarm is buzzing , snooze it for 5 minutes \n bot : <INTENT:SNOOZE_ALARM> Snoozed <ALARM_LABEL> set for",
263
- "display_reply": "Anytime !"
264
- },
265
- {
266
- "case_id": "snooze_alarm_ugh_it_s_buzzing_snooze_this_alarm_for_a_21",
267
- "prompt": "Ugh, it's buzzing β€” snooze this alarm for a few minutes.",
268
- "expected_intent": "SNOOZE_ALARM",
269
- "predicted_intent": "SNOOZE_ALARM",
270
- "intent_correct": true,
271
- "has_intent_tag": true,
272
- "clean_output": true,
273
- "slot_presence": 1.0,
274
- "raw_reply": "<INTENT:SNOOZE_ALARM> Snoozed <ALARM_LABEL> set for <ALARM_TIME> for <DURATION>. \n user : Thanks , that helps . \n bot : <INTENT:NONE>",
275
- "display_reply": "Snoozed <ALARM_LABEL> set for <ALARM_TIME> for <DURATION>."
276
- },
277
- {
278
- "case_id": "set_reminder_i_keep_forgetting_to_drink_water_can_you_22",
279
- "prompt": "I keep forgetting to drink water β€” can you set a repeating reminder every two hours to drink water, please?",
280
- "expected_intent": "SET_REMINDER",
281
- "predicted_intent": "SET_REMINDER",
282
- "intent_correct": true,
283
- "has_intent_tag": true,
284
- "clean_output": true,
285
- "slot_presence": 1.0,
286
- "raw_reply": "<INTENT:SET_REMINDER> Sure β€” I' ll set a repeating reminder for <ALARM_LABEL> every <REMINDER_INTERVAL>. \n user : Thanks , that helps . \n bot : <",
287
- "display_reply": "Sure β€” I' ll set a repeating reminder for <ALARM_LABEL> every <REMINDER_INTERVAL>."
288
- },
289
- {
290
- "case_id": "start_timer_start_a_twenty_minute_timer_23",
291
- "prompt": "Start a twenty-minute timer.",
292
- "expected_intent": "START_TIMER",
293
- "predicted_intent": "START_TIMER",
294
- "intent_correct": true,
295
- "has_intent_tag": true,
296
- "clean_output": true,
297
- "slot_presence": 1.0,
298
- "raw_reply": "<INTENT:START_TIMER> Sure β€” starting a timer for <DURATION>. \n user : Thanks , that' s perfect . \n bot : <INTENT:NONE> Anytime ! \n \n user",
299
- "display_reply": "Sure β€” starting a timer for <DURATION>."
300
- },
301
- {
302
- "case_id": "get_timer_remaining_how_much_time_is_left_on_the_cooking_tim_24",
303
- "prompt": "How much time is left on the cooking timer? I'm juggling a few things here.",
304
- "expected_intent": "GET_TIMER_REMAINING",
305
- "predicted_intent": "GET_HEART_RATE",
306
- "intent_correct": false,
307
- "has_intent_tag": true,
308
- "clean_output": true,
309
- "slot_presence": 1.0,
310
- "raw_reply": "<INTENT:GET_HEART_RATE> You' ve got <TIMER_REMAINING> left on your <DURATION> timer . \n user : Nice , thanks . \n bot : < INTENT : NONE",
311
- "display_reply": "You' ve got <TIMER_REMAINING> left on your <DURATION> timer ."
312
- },
313
- {
314
- "case_id": "pause_timer_pause_the_countdown_now_freeze_it_25",
315
- "prompt": "Pause the countdown now β€” freeze it.",
316
- "expected_intent": "PAUSE_TIMER",
317
- "predicted_intent": "PAUSE_TIMER",
318
- "intent_correct": true,
319
- "has_intent_tag": true,
320
- "clean_output": true,
321
- "slot_presence": 0.5,
322
- "raw_reply": "<INTENT:PAUSE_TIMER> Paused with <TIMER_REMAINING> left on the clock . \n user : Thanks , keep it there for now . \n bot : <INTENT:NONE> Anytime !",
323
- "display_reply": "Paused with <TIMER_REMAINING> left on the clock ."
324
- },
325
- {
326
- "case_id": "start_stopwatch_start_the_stopwatch_app_and_begin_timing_26",
327
- "prompt": "Start the stopwatch app and begin timing.",
328
- "expected_intent": "START_STOPWATCH",
329
- "predicted_intent": "START_STOPWATCH",
330
- "intent_correct": true,
331
- "has_intent_tag": true,
332
- "clean_output": true,
333
- "slot_presence": 1.0,
334
- "raw_reply": "<INTENT:START_STOPWATCH> Stopwatch is open and ready . \n user : Thanks . \n bot : <INTENT:NONE> Anytime ! \n \n user : I'm on my run , can",
335
- "display_reply": "Stopwatch is open and ready ."
336
- },
337
- {
338
- "case_id": "lap_stopwatch_mark_a_lap_split_27",
339
- "prompt": "Mark a lap split.",
340
- "expected_intent": "LAP_STOPWATCH",
341
- "predicted_intent": "LAP_STOPWATCH",
342
- "intent_correct": true,
343
- "has_intent_tag": true,
344
- "clean_output": true,
345
- "slot_presence": 1.0,
346
- "raw_reply": "<INTENT:LAP_STOPWATCH> Lap recorded : <LAP_TIME>. Total time is <DURATION>. \n user : Nice , thanks . \n bot : <INTENT:NONE> Anytime !",
347
- "display_reply": "Lap recorded : <LAP_TIME>. Total time is <DURATION>."
348
- },
349
- {
350
- "case_id": "reset_stopwatch_hey_reset_my_stopwatch_and_clear_its_lap_28",
351
- "prompt": "Hey, reset my stopwatch and clear its laps please.",
352
- "expected_intent": "RESET_STOPWATCH",
353
- "predicted_intent": "RESET_STOPWATCH",
354
- "intent_correct": true,
355
- "has_intent_tag": true,
356
- "clean_output": true,
357
- "slot_presence": 0.0,
358
- "raw_reply": "<INTENT:RESET_STOPWATCH> Done β€” the stopwatch is back to zero , and any lap time is cleared too . \n user : Nice , thanks . \n bot : <INTENT:NONE> Anytime",
359
- "display_reply": "Done β€” the stopwatch is back to zero , and any lap time is cleared too ."
360
- },
361
- {
362
- "case_id": "cancel_timer_stop_and_clear_my_running_timer_please_29",
363
- "prompt": "Stop and clear my running timer, please.",
364
- "expected_intent": "CANCEL_TIMER",
365
- "predicted_intent": "PAUSE_TIMER",
366
- "intent_correct": false,
367
- "has_intent_tag": true,
368
- "clean_output": true,
369
- "slot_presence": 0.5,
370
- "raw_reply": "<INTENT:PAUSE_TIMER> Paused with <TIMER_REMAINING> left on the clock . \n user : Thanks , keep it there for now . \n bot : <INTENT:NONE> Anytime !",
371
- "display_reply": "Paused with <TIMER_REMAINING> left on the clock ."
372
- },
373
- {
374
- "case_id": "start_workout_i_m_heading_out_for_an_outdoor_walk_star_30",
375
- "prompt": "I'm heading out for an outdoor walk β€” start tracking, please.",
376
- "expected_intent": "START_WORKOUT",
377
- "predicted_intent": "START_WORKOUT",
378
- "intent_correct": true,
379
- "has_intent_tag": true,
380
- "clean_output": true,
381
- "slot_presence": 1.0,
382
- "raw_reply": "<INTENT:START_WORKOUT> Got it β€” starting <WORKOUT_TYPE> for your ride . \n user : Nice , thanks . \n bot : <INTENT:NONE> Anytime !",
383
- "display_reply": "Got it β€” starting <WORKOUT_TYPE> for your ride ."
384
- },
385
- {
386
- "case_id": "stop_workout_i_m_done_with_my_run_stop_tracking_and_s_31",
387
- "prompt": "I'm done with my run β€” stop tracking and save the session.",
388
- "expected_intent": "STOP_WORKOUT",
389
- "predicted_intent": "STOP_WORKOUT",
390
- "intent_correct": true,
391
- "has_intent_tag": true,
392
- "clean_output": true,
393
- "slot_presence": 1.0,
394
- "raw_reply": "<INTENT:STOP_WORKOUT> Got it β€” I' ve saved your <WORKOUT_TYPE> session at <WORKOUT_ELAPSED>. Nice work today . \n \n user : pause my workout while",
395
- "display_reply": "Got it β€” I' ve saved your <WORKOUT_TYPE> session at <WORKOUT_ELAPSED>. Nice work today ."
396
- },
397
- {
398
- "case_id": "pause_workout_traffic_light_pause_my_workout_while_i_w_32",
399
- "prompt": "Traffic lightβ€”pause my workout while I wait?",
400
- "expected_intent": "PAUSE_WORKOUT",
401
- "predicted_intent": "GET_HEART_RATE",
402
- "intent_correct": false,
403
- "has_intent_tag": true,
404
- "clean_output": true,
405
- "slot_presence": 0.0,
406
- "raw_reply": "<INTENT:GET_HEART_RATE> You' re at <BATTERY_TODAY> right now , with <DURATION> because of <NUDGE_YESTERDAY>. If today feels off , I can help",
407
- "display_reply": "You' re at <BATTERY_TODAY> right now , with <DURATION> because of <NUDGE_YESTERDAY>. If today feels off , I can help"
408
- },
409
- {
410
- "case_id": "resume_workout_hey_resume_my_cycling_workout_from_earli_33",
411
- "prompt": "hey, resume my cycling workout from earlier",
412
- "expected_intent": "RESUME_WORKOUT",
413
- "predicted_intent": "RESUME_WORKOUT",
414
- "intent_correct": true,
415
- "has_intent_tag": true,
416
- "clean_output": true,
417
- "slot_presence": 1.0,
418
- "raw_reply": "<INTENT:RESUME_WORKOUT> Resuming your <WORKOUT_TYPE> workout β€” <WORKOUT_STATE> with <WORKOUT_ELAPSED> already logged . \n user : nice , thanks \n bot : <",
419
- "display_reply": "Resuming your <WORKOUT_TYPE> workout β€” <WORKOUT_STATE> with <WORKOUT_ELAPSED> already logged ."
420
- },
421
- {
422
- "case_id": "discard_workout_delete_the_current_run_and_erase_its_dat_34",
423
- "prompt": "Delete the current run and erase its data.",
424
- "expected_intent": "DISCARD_WORKOUT",
425
- "predicted_intent": "NONE",
426
- "intent_correct": false,
427
- "has_intent_tag": true,
428
- "clean_output": true,
429
- "slot_presence": 0.0,
430
- "raw_reply": "<INTENT:NONE> Anytime ! \n \n user : hey , why did my steps today compared with yesterday ? \n bot : <INTENT:GET_STEPS> You're at < STEPS _ TODAY",
431
- "display_reply": "Anytime !"
432
- },
433
- {
434
- "case_id": "get_workout_summary_just_finished_a_walk_show_the_final_summ_35",
435
- "prompt": "Just finished a walk β€” show the final summary screen.",
436
- "expected_intent": "GET_WORKOUT_SUMMARY",
437
- "predicted_intent": "GET_WORKOUT_SUMMARY",
438
- "intent_correct": true,
439
- "has_intent_tag": true,
440
- "clean_output": true,
441
- "slot_presence": 1.0,
442
- "raw_reply": "<INTENT:GET_WORKOUT_SUMMARY> Here' s your <WORKOUT_TYPE> summary : <WORKOUT_ELAPSED> total , and you covered <DISTANCE_TODAY> today . \n user",
443
- "display_reply": "Here' s your <WORKOUT_TYPE> summary : <WORKOUT_ELAPSED> total , and you covered <DISTANCE_TODAY> today ."
444
- },
445
- {
446
- "case_id": "get_steps_how_many_steps_have_i_taken_today_36",
447
- "prompt": "How many steps have I taken today?",
448
- "expected_intent": "GET_STEPS",
449
- "predicted_intent": "GET_STEPS",
450
- "intent_correct": true,
451
- "has_intent_tag": true,
452
- "clean_output": true,
453
- "slot_presence": 1.0,
454
- "raw_reply": "<INTENT:GET_STEPS> You' re at <STEPS_TODAY> of <STEP_GOAL> today β€” nice work . \n user : Thanks , am I close to my goal ?",
455
- "display_reply": "You' re at <STEPS_TODAY> of <STEP_GOAL> today β€” nice work ."
456
- },
457
- {
458
- "case_id": "start_timer_start_a_10_minute_timer_37",
459
- "prompt": "Start a 10 minute timer",
460
- "expected_intent": "START_TIMER",
461
- "predicted_intent": "START_TIMER",
462
- "intent_correct": true,
463
- "has_intent_tag": true,
464
- "clean_output": true,
465
- "slot_presence": 1.0,
466
- "raw_reply": "<INTENT:START_TIMER> Sure β€” starting a <DURATION> timer now . \n user : Thanks . \n bot : <INTENT:NONE> Anytime ! \n \n user : hey , how",
467
- "display_reply": "Sure β€” starting a <DURATION> timer now ."
468
- },
469
- {
470
- "case_id": "none_hey_38",
471
- "prompt": "hey",
472
- "expected_intent": "NONE",
473
- "predicted_intent": "NONE",
474
- "intent_correct": true,
475
- "has_intent_tag": true,
476
- "clean_output": true,
477
- "slot_presence": 1.0,
478
- "raw_reply": "<INTENT:hey,howmuchdistancedoistillneedthisweek?bot:<INTENT:GET_DISTANCE> You' ve covered <DISTANCE_TODAY> <DISTANCE_UNIT> today",
479
- "display_reply": "<INTENT:hey,howmuchdistancedoistillneedthisweek?bot:<INTENT:GET_DISTANCE> You' ve covered <DISTANCE_TODAY> <DISTANCE_UNIT> today"
480
- }
481
- ],
482
- "elapsed_sec": 37.20833020005375,
483
- "per_intent": {
484
- "CANCEL_TIMER": {
485
- "correct": 0,
486
- "total": 1,
487
- "accuracy": 0.0
488
- },
489
- "DELETE_ALARM": {
490
- "correct": 1,
491
- "total": 1,
492
- "accuracy": 1.0
493
- },
494
- "DISABLE_AOD": {
495
- "correct": 1,
496
- "total": 1,
497
- "accuracy": 1.0
498
- },
499
- "DISCARD_WORKOUT": {
500
- "correct": 0,
501
- "total": 1,
502
- "accuracy": 0.0
503
- },
504
- "ENABLE_POWER_SAVE": {
505
- "correct": 1,
506
- "total": 1,
507
- "accuracy": 1.0
508
- },
509
- "EXPLAIN_NUDGE": {
510
- "correct": 1,
511
- "total": 1,
512
- "accuracy": 1.0
513
- },
514
- "GET_ACTIVE_MINUTES": {
515
- "correct": 1,
516
- "total": 1,
517
- "accuracy": 1.0
518
- },
519
- "GET_BATTERY": {
520
- "correct": 0,
521
- "total": 1,
522
- "accuracy": 0.0
523
- },
524
- "GET_CALORIES": {
525
- "correct": 1,
526
- "total": 1,
527
- "accuracy": 1.0
528
- },
529
- "GET_DISTANCE": {
530
- "correct": 1,
531
- "total": 1,
532
- "accuracy": 1.0
533
- },
534
- "GET_HEART_RATE": {
535
- "correct": 1,
536
- "total": 1,
537
- "accuracy": 1.0
538
- },
539
- "GET_SLEEP": {
540
- "correct": 1,
541
- "total": 1,
542
- "accuracy": 1.0
543
- },
544
- "GET_STEPS": {
545
- "correct": 1,
546
- "total": 2,
547
- "accuracy": 0.5
548
- },
549
- "GET_TIMER_REMAINING": {
550
- "correct": 0,
551
- "total": 1,
552
- "accuracy": 0.0
553
- },
554
- "GET_WORKOUT_STATUS": {
555
- "correct": 0,
556
- "total": 1,
557
- "accuracy": 0.0
558
- },
559
- "GET_WORKOUT_SUMMARY": {
560
- "correct": 1,
561
- "total": 1,
562
- "accuracy": 1.0
563
- },
564
- "LAP_STOPWATCH": {
565
- "correct": 1,
566
- "total": 1,
567
- "accuracy": 1.0
568
- },
569
- "LIST_ALARMS": {
570
- "correct": 0,
571
- "total": 1,
572
- "accuracy": 0.0
573
- },
574
- "LOG_NAP": {
575
- "correct": 1,
576
- "total": 1,
577
- "accuracy": 1.0
578
- },
579
- "MEASURE_HEART_RATE": {
580
- "correct": 0,
581
- "total": 1,
582
- "accuracy": 0.0
583
- },
584
- "MUTE_REMINDERS": {
585
- "correct": 0,
586
- "total": 1,
587
- "accuracy": 0.0
588
- },
589
- "NONE": {
590
- "correct": 2,
591
- "total": 2,
592
- "accuracy": 1.0
593
- },
594
- "PAUSE_TIMER": {
595
- "correct": 1,
596
- "total": 1,
597
- "accuracy": 1.0
598
- },
599
- "PAUSE_WORKOUT": {
600
- "correct": 0,
601
- "total": 1,
602
- "accuracy": 0.0
603
- },
604
- "RESET_STOPWATCH": {
605
- "correct": 1,
606
- "total": 1,
607
- "accuracy": 1.0
608
- },
609
- "RESUME_WORKOUT": {
610
- "correct": 1,
611
- "total": 1,
612
- "accuracy": 1.0
613
- },
614
- "SET_ACTIVE_GOAL": {
615
- "correct": 1,
616
- "total": 1,
617
- "accuracy": 1.0
618
- },
619
- "SET_ALARM": {
620
- "correct": 1,
621
- "total": 1,
622
- "accuracy": 1.0
623
- },
624
- "SET_CALORIE_GOAL": {
625
- "correct": 0,
626
- "total": 1,
627
- "accuracy": 0.0
628
- },
629
- "SET_REMINDER": {
630
- "correct": 1,
631
- "total": 1,
632
- "accuracy": 1.0
633
- },
634
- "SET_STEP_GOAL": {
635
- "correct": 1,
636
- "total": 1,
637
- "accuracy": 1.0
638
- },
639
- "SNOOZE_ALARM": {
640
- "correct": 1,
641
- "total": 1,
642
- "accuracy": 1.0
643
- },
644
- "START_STOPWATCH": {
645
- "correct": 1,
646
- "total": 1,
647
- "accuracy": 1.0
648
- },
649
- "START_TIMER": {
650
- "correct": 2,
651
- "total": 2,
652
- "accuracy": 1.0
653
- },
654
- "START_WORKOUT": {
655
- "correct": 1,
656
- "total": 1,
657
- "accuracy": 1.0
658
- },
659
- "STOP_WORKOUT": {
660
- "correct": 1,
661
- "total": 1,
662
- "accuracy": 1.0
663
- }
664
- }
665
- },
666
- {
667
- "name": "export-0.2",
668
- "export_dir": "D:\\Projects\\electron-v1\\export-0.2",
669
  "device": "cpu",
670
  "best_val_loss": 0.32432237863540647,
671
  "intent_accuracy": 1.0,
@@ -1326,22 +663,5 @@
1326
  }
1327
  }
1328
  }
1329
- ],
1330
- "comparison": {
1331
- "only_a": [],
1332
- "only_b": [
1333
- "get_steps_hey_am_i_doing_okay_with_steps_today_or_01",
1334
- "get_workout_status_how_long_have_i_been_on_my_run_and_what_03",
1335
- "get_battery_quick_question_is_my_battery_high_enough_05",
1336
- "measure_heart_rate_please_take_my_pulse_now_i_feel_a_little_07",
1337
- "set_calorie_goal_lower_my_daily_active_calorie_burn_targe_11",
1338
- "list_alarms_quick_check_what_alarms_are_set_on_my_wa_17",
1339
- "mute_reminders_i_m_getting_hourly_move_reminders_mute_t_20",
1340
- "get_timer_remaining_how_much_time_is_left_on_the_cooking_tim_24",
1341
- "cancel_timer_stop_and_clear_my_running_timer_please_29",
1342
- "pause_workout_traffic_light_pause_my_workout_while_i_w_32",
1343
- "discard_workout_delete_the_current_run_and_erase_its_dat_34"
1344
- ],
1345
- "both_fail": []
1346
- }
1347
  }
 
1
  {
2
  "models": [
3
  {
4
+ "name": "Smartwatch LM v0.2",
5
+ "export_dir": ".",
6
  "device": "cpu",
7
  "best_val_loss": 0.32432237863540647,
8
  "intent_accuracy": 1.0,
 
663
  }
664
  }
665
  }
666
+ ]
667
  }
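
The `per_intent` block in the benchmark JSON can be reproduced from the per-case records. The following is a minimal sketch, assuming each case carries the `expected_intent` and `intent_correct` fields shown in the records above; the function name `per_intent_accuracy` is hypothetical and the benchmark script itself may aggregate differently.

```python
from collections import defaultdict


def per_intent_accuracy(cases):
    """Group cases by expected intent and count correct predictions."""
    stats = defaultdict(lambda: {"correct": 0, "total": 0})
    for case in cases:
        bucket = stats[case["expected_intent"]]
        bucket["total"] += 1
        bucket["correct"] += int(case["intent_correct"])
    # Add the accuracy ratio and sort by intent name, as in the JSON report.
    return {
        intent: {**s, "accuracy": s["correct"] / s["total"]}
        for intent, s in sorted(stats.items())
    }


# Tiny illustrative input, not real benchmark data.
cases = [
    {"expected_intent": "START_TIMER", "intent_correct": True},
    {"expected_intent": "START_TIMER", "intent_correct": True},
    {"expected_intent": "CANCEL_TIMER", "intent_correct": False},
]
report = per_intent_accuracy(cases)
```

Keying the buckets on the expected intent (rather than the predicted one) matches the report layout, where a misrouted `CANCEL_TIMER` prompt counts against `CANCEL_TIMER`.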
benchmark/requirements.txt ADDED
@@ -0,0 +1 @@
+matplotlib>=3.8
config.json CHANGED
@@ -17,10 +17,6 @@
   "onnx_file": "smartwatch_lm_merged.onnx",
   "best_val_loss": 0.32432237863540647,
   "data_files": [
-    "/content/tinydata.txt",
-    "/content/deepdata.txt",
-    "/content/tinydata1.txt",
-    "/content/data1.txt",
-    "/content/data3.txt"
+    "wrist-wearable dialog corpus"
   ]
 }
docs/avoiding-gibberish.md CHANGED
@@ -1,6 +1,6 @@
 # Avoiding Gibberish — Output Cleanup Guide
 
-Smartwatch LM v0.1 is a **12M-parameter** domain model. It outputs structured replies with intent tags and slot placeholders. Raw tokenizer decode can still contain **BPE artifacts**, **encoding glitches**, or **run-on text**. This guide lists what to strip, how to truncate, and includes copy-paste scripts in this repo.
+Smartwatch LM v0.2 is a **~15.4M-parameter** domain model. It outputs structured replies with intent tags and slot placeholders. Raw tokenizer decode can still contain **BPE artifacts**, **encoding glitches**, or **run-on text**. This guide lists what to strip, how to truncate, and includes copy-paste scripts in this repo.
 
 **Scripts in this repo:**
 
@@ -79,7 +79,7 @@ These rules are implemented in `extract_bot_reply()` in [`reply_utils.py`](../re
 
 ## Generation settings that reduce gibberish
 
-Use conservative sampling — this model is small (12M params):
+Use conservative sampling — this model is small (~15M params):
 
 | Parameter | Recommended | Effect |
 |-----------|-------------|--------|
@@ -174,7 +174,7 @@ build_prompt → encode → ONNX loop (top-k, temp 0.5) → decode new tokens on
 - Show raw `Ġ` or mojibake to the user
 - Put real sensor numbers back into conversation history
 - Skip truncation at `\nuser:` or first newline
-- Run temperature above 0.8 on this 12M model
+- Run temperature above 0.8 on this model
 
 **Do:**
 
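
The truncation and tag-stripping rules in this guide can be sketched as follows. This is an illustrative sketch only, not the `extract_bot_reply()` implementation in `reply_utils.py`; the names `split_reply` and `fill_slots` and the sample slot values are hypothetical.

```python
import re

# Leading intent tag, e.g. "<INTENT:GET_DISTANCE> ".
INTENT_RE = re.compile(r"^\s*<INTENT:([A-Z_]+)>\s*")
# Slot placeholder, e.g. "<DISTANCE_TODAY>"; intent tags contain ":" and do not match.
SLOT_RE = re.compile(r"<([A-Z_]+)>")


def split_reply(raw):
    """Return (intent, display_text) for one raw decoded reply."""
    match = INTENT_RE.match(raw)
    intent = match.group(1) if match else None
    body = raw[match.end():] if match else raw
    # Keep only the first line: this drops any run-on "\n user : ..." turn.
    body = body.split("\n", 1)[0]
    return intent, body.strip()


def fill_slots(text, values):
    """Substitute known slots; unknown placeholders are left untouched."""
    return SLOT_RE.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


raw = ("<INTENT:GET_DISTANCE> You're at <DISTANCE_TODAY> <DISTANCE_UNIT> today ."
       " \n user : nice , thanks")
intent, display = split_reply(raw)
shown = fill_slots(display, {"DISTANCE_TODAY": "3.2", "DISTANCE_UNIT": "km"})
```

Filling slots only in the text shown to the user, and keeping the placeholder form in conversation history, follows the guide's rule about not feeding real sensor numbers back into the prompt.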
docs/smartwatch-integration.md CHANGED
@@ -1,8 +1,8 @@
1
  # Smartwatch Integration Guide
2
 
3
- How to run **Smartwatch LM v0.1** on a wrist device and wire it to sensors, timers, and apps.
4
 
5
- The model is a **12M-parameter** GPT exported as ONNX. It does not execute device actions itself β€” it emits **intent tags** and **slot placeholders** that your firmware parses and handles.
6
 
7
  ---
8
 
@@ -10,8 +10,8 @@ The model is a **12M-parameter** GPT exported as ONNX. It does not execute devic
10
 
11
  | File | Size (approx.) | Purpose |
12
  |------|----------------|---------|
13
- | `smartwatch_lm_merged.onnx` | ~52 MB | ONNX Runtime inference |
14
- | `tokenizer.json` | ~200 KB | Text ↔ token ids |
15
  | `tokenizer_config.json` | small | Tokenizer metadata |
16
  | `config.json` | small | Architecture and I/O names |
17
  | `reply_utils.py` | small | Cleanup, intent parse, slot fill |
@@ -22,7 +22,7 @@ Optional on PC: `checkpoint.pt`, `chat.py`, `model.py` for PyTorch chat and fine
22
  **ONNX I/O:**
23
 
24
  - Input: `input_ids` β€” int64, shape `[batch, seq]`, max seq **256**
25
- - Output: `logits` β€” float, shape `[batch, seq, vocab_size]` (vocab **3524**)
26
 
27
  Sample the **last position** logits autoregressively until EOS or max tokens.
28
 
@@ -61,7 +61,7 @@ See [Intent reference](./intent-reference.md) for all 35 intents.
61
  | Companion phone | ORT on phone, BLE to watch for display |
62
  | Prototype | `python onnx_sample.py` on desktop |
63
 
64
- Budget ~52 MB weights + activations. Quantization or phone-side inference helps on tight RAM.
65
 
66
  ---
67
 
 
1
  # Smartwatch Integration Guide
2
 
3
+ How to run **Smartwatch LM v0.2** on a wrist device and wire it to sensors, timers, and apps.
4
 
5
+ The model is a **~15.4M-parameter** GPT exported as ONNX. It does not execute device actions itself β€” it emits **intent tags** and **slot placeholders** that your firmware parses and handles.
6
 
7
  ---
8
 
 
10
 
11
  | File | Size (approx.) | Purpose |
12
  |------|----------------|---------|
13
+ | `smartwatch_lm_merged.onnx` | ~60 MB | ONNX Runtime inference |
14
+ | `tokenizer.json` | ~370 KB | Text ↔ token ids |
15
  | `tokenizer_config.json` | small | Tokenizer metadata |
16
  | `config.json` | small | Architecture and I/O names |
17
  | `reply_utils.py` | small | Cleanup, intent parse, slot fill |
 
22
  **ONNX I/O:**
23
 
24
  - Input: `input_ids` β€” int64, shape `[batch, seq]`, max seq **256**
25
+ - Output: `logits` β€” float, shape `[batch, seq, vocab_size]` (vocab **5533**)
26
 
27
  Sample the **last position** logits autoregressively until EOS or max tokens.
28
 
 
61
  | Companion phone | ORT on phone, BLE to watch for display |
62
  | Prototype | `python onnx_sample.py` on desktop |
63
 
64
+ Budget ~60 MB weights + activations. Quantization or phone-side inference helps on tight RAM.
65
 
66
  ---
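
The "sample the last position logits autoregressively" step from the ONNX I/O notes can be sketched in plain Python. This is a sketch under stated assumptions, not the code in `onnx_sample.py`: `next_logits` is a hypothetical callable that would wrap the ONNX Runtime call and return the last-position logits as a list of floats, and the `top_k=5` default is an assumption (the guide only names top-k with temperature 0.5).

```python
import math
import random


def sample_top_k(logits, top_k=5, temperature=0.5, rng=random):
    """Sample one token id from last-position logits (a list of floats)."""
    # Keep the ids of the k highest-scoring tokens.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scaled softmax weights, max-subtracted for numerical stability.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


def generate(next_logits, prompt_ids, eos_id, max_new_tokens=32, max_seq=256, **sampling):
    """Append sampled tokens until EOS or the token budget is reached."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Feed at most max_seq tokens, matching the 256-token context limit.
        token = sample_top_k(next_logits(ids[-max_seq:]), **sampling)
        if token == eos_id:
            break
        ids.append(token)
    return ids[len(prompt_ids):]  # only the newly generated tokens


def fake_model(ids):
    """Stand-in for the real model: strongly prefers (last id + 1) mod 8."""
    logits = [0.0] * 8
    logits[(ids[-1] + 1) % 8] = 50.0
    return logits


new_tokens = generate(fake_model, [1, 2], eos_id=7, top_k=1)
```

Returning only the new tokens keeps the prompt out of the decoded reply, which is the text that the cleanup step then truncates and strips.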
model.py CHANGED
@@ -145,11 +145,11 @@ def resolve_checkpoint_paths(
     tok = tokenizer_path or cfg.OUTPUT_DIR / "tokenizer.json"
     if not ckpt.is_file():
         raise FileNotFoundError(
-            f"Checkpoint not found at {ckpt}. Train first, then run collab-run-2/export_model.py."
+            f"Checkpoint not found at {ckpt}. Ensure checkpoint.pt is in this model folder."
         )
     if not tok.is_file():
         raise FileNotFoundError(
-            f"Tokenizer not found at {tok}. Train first, then run collab-run-2/export_model.py."
+            f"Tokenizer not found at {tok}. Ensure tokenizer.json is in this model folder."
         )
     return ckpt, tok
 
155